Shader

From Wikipedia, the free encyclopedia

A shader is a program used in 3D computer graphics to determine the final surface properties of an object or image. This can include arbitrarily complex descriptions of light absorption and diffusion, texture mapping, reflection and refraction, shadowing, surface displacement and post-processing effects.

Programmable shaders are flexible and efficient. Complicated surfaces can be rendered from simple geometry, or at least made to appear that way. For example, a shader can be used to draw a grid of 3D ceramic tiles from a simple plane. A shading language usually has special data types such as color and normal.
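
As a minimal illustrative sketch in GLSL, a fragment shader can build such a tile pattern procedurally. The tile count, colors and the interpolated coordinate name are made up for this sketch, and it only colors the grout lines; truly three-dimensional tiles would additionally require a bump or displacement technique:

    // Illustrative GLSL fragment shader: a procedural tile pattern.
    // The tile count and colors are made-up values for this sketch.
    varying vec2 vTexCoord;          // hypothetical interpolated surface coordinate

    void main()
    {
        vec2 cell = fract(vTexCoord * 8.0);              // 8x8 grid of tiles
        // Darken fragments near the tile borders to suggest grout lines.
        float border = step(0.05, cell.x) * step(0.05, cell.y);
        vec3 ceramic = vec3(0.80, 0.85, 0.90);
        vec3 grout   = vec3(0.30, 0.30, 0.30);
        gl_FragColor = vec4(mix(grout, ceramic, border), 1.0);
    }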

Shading languages

Initially introduced in Pixar's RenderMan, shaders gained increasing momentum as the cost of computing power fell. The main benefit of using shaders is the great flexibility they allow, resulting in faster and cheaper development as well as a richer experience for the end user.

Because of the various targets and markets of 3D graphics, different shading languages have been developed, each emphasizing the aspects most important to its domain.

Production rendering

Shading languages of this kind are geared towards maximum image quality and ease of programming. Material properties are fully abstracted; little programming skill and no hardware knowledge are required. Such shaders are often developed by artists to get the right "look", just like texture mapping, lighting and other facets of their work.

Processing such shaders is usually time-consuming. The machinery needed to make this kind of shading language work can be rather expensive because of its ability to produce photorealistic results. Most of the time, these languages are run on large computer clusters.

RenderMan Shading Language

RenderMan Shading Language, which is defined in the RenderMan Interface Specification, is the most common shading language for production-quality rendering. RenderMan, by Rob Cook, is currently used in all of Pixar's products. It is also one of the first shading languages ever implemented.

The language actually defines five major shader types:

  • Light source shaders compute the color of the light emitted from a point on the light source towards a point on the surface being illuminated.
  • Surface shaders are used to model the optical properties of the object being illuminated. They output the final color and position of the point being illuminated by taking into account the incoming light and the physical properties of the object.
  • Displacement shaders manipulate the surface's geometry independent of its color.
  • Volume shaders manipulate the color of a light as it passes through a volume. They are used to create effects like fog.
  • Imager shaders describe a color transformation to final pixel values. This is much like an image filter; however, the imager shader operates on pre-quantized data, which typically has a greater dynamic range than can be displayed on the output device.

Most of the time, references to RenderMan actually mean PRMan, the RenderMan implementation from Pixar, which was the only one available for years. Further information can be found at the RenderMan repository.

Gelato Shading Language

The Gelato shading language was developed by NVIDIA, a graphics processing unit manufacturer, for its Gelato rendering software.

This language is meant to interact with the hardware[1], providing higher computation rates while retaining cinematic image quality and functionality.

Real-time rendering

Until recently, developers did not have this level of control over the output of the graphics pipeline of graphics cards, but shading languages for real-time rendering are now widespread. They provide both a higher level of hardware abstraction and a more flexible programming model than previous paradigms, which hard-coded the transformation and shading equations. This gives the programmer greater control over the rendering process and delivers richer content at lower overhead.

Somewhat surprisingly, these shaders, which are designed for maximum performance and executed directly on the GPU at the proper point in the pipeline, have also scored successes in general-purpose processing because of their stream programming model.

Shading languages of this kind are usually bound to a graphics API, although some applications also provide built-in shading languages with limited functionality.

Historically, only a few of these languages have succeeded in both establishing themselves and maintaining a strong market position; a short description of each follows below.

OpenGL Shading Language

Also known as GLSL or glslang, this standardized[2] high-level shading language is meant to be used with OpenGL.

The language has featured a rich feature set from the beginning, unifying vertex and fragment processing in a single instruction set and allowing conditionals, loops and (more generally) branches.

Historically, GLSL was preceded by various OpenGL extensions such as ARB_vertex_program, ARB_fragment_program and many others. These were low-level, assembly-like languages with various limitations. Their use is now discouraged, so they are only briefly mentioned here. These two extensions were in turn preceded by other proposals that did not survive into newer versions[3][4].

Cg programming language

This language[5], developed by NVIDIA, was designed for easy and efficient integration into production pipelines. The language is API-independent and comes with a large variety of free tools[6] to improve asset management.

The first Cg implementations were rather restrictive because of the hardware being abstracted, but they were still innovative, representing a great leap compared to previous methods. Cg seems to have survived the introduction of the newer shading languages very well, mainly because of its established momentum in the digital content creation area, although the language is seldom used in final products.

A distinctive feature of Cg is its use of connectors, special data structures that link the various stages of processing. Connectors define the input from the application to the vertex processing stage and the attributes to be interpolated and fed as input to fragment processing.

DirectX High-Level Shader Language

This is possibly the most successful language to date, mainly because of strong pressure from Microsoft. The High-Level Shader Language (HLSL for short) was released before its main competitor, GLSL, although its feature set was later extended in two different revisions to reach feature parity.

Real-time shader structure

There are different approaches to shading, mainly because of the various applications of the targeted technology. Production shading languages usually work at a higher abstraction level, avoiding the need to write specific code to handle lighting or shadowing. By contrast, real-time shaders usually integrate light and shadow computations, and the lights are passed to the shader itself as parameters.

Real-time shading languages actually serve two different kinds of shaders. Although their feature sets have converged, so that it is possible to write a vertex shader using the same functions as a fragment shader, the different purposes of the computations impose limitations that must be acknowledged.

Vertex shaders

Vertex shaders are applied to each vertex and run on a programmable vertex processor. They define a method to compute vector space transformations and other linearizable computations.

A vertex shader expects various inputs:

  • Uniform variables hold values that are constant for each shader invocation. The value of a uniform variable may only be changed between different shader invocation batches. These variables are usually 3-component arrays, but this does not need to be the case. Usually, only basic data types can be loaded from external APIs, so complex structures must be broken down into basic elements[7].
  • Vertex attributes, which are a special case of varying variables, are essentially per-vertex data such as the vertex position. Most of the time, each shader invocation performs its computation on a different data set. The external application usually does not access these variables "directly" but manages them as large arrays. Apart from this detail, applications are usually able to change a single vertex attribute with ease.

Vertex shader computations are meant to provide the following stages of the graphics pipeline with interpolatable fragment attributes. Because of this, a vertex shader must output at least the transformed homogeneous vertex position (in GLSL this means the variable gl_Position must be written). Outputs from different vertex shader invocations within the same batch are linearly interpolated across the primitive being rendered, and the result of this interpolation is fed to the next pipeline stage.
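
As a minimal sketch in GLSL (the uniform and varying names are illustrative, not mandated by the language), a vertex shader combining a uniform, the built-in vertex attributes and the mandatory position output could look like this:

    // Illustrative GLSL vertex shader.
    // "uScale" is a hypothetical uniform the application may change between batches.
    uniform float uScale;

    // A value to be linearly interpolated across the primitive
    // and handed to the fragment stage.
    varying vec3 vColor;

    void main()
    {
        // gl_Vertex and gl_Color are built-in per-vertex attributes;
        // gl_ModelViewProjectionMatrix is the built-in transformation matrix.
        vec4 scaled = vec4(gl_Vertex.xyz * uScale, 1.0);
        gl_Position = gl_ModelViewProjectionMatrix * scaled;   // mandatory output
        vColor = gl_Color.rgb;
    }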

Fragment shaders

Fragment shaders are applied to each fragment and run on a fragment processor, which usually features much more processing power than its vertex-oriented counterpart. At the time of writing (24 October 2005), some architectures are merging the two processors into a single one to improve transistor utilization and provide a degree of load balancing.

As previously stated, a fragment shader expects as input the interpolated vertex values. Its sources of information are therefore the following:

  • Uniform variables can still be used and provide interesting opportunities. A typical example is passing an integer giving the number of lights to be processed together with an array of light parameters. Textures are a special case of uniform values and can be applied to vertices as well, although vertex texturing is often more complicated.
  • Varying attributes is the name given to the fragment's varying (interpolated) variables. Because of their origin, the application has no direct control over the actual values of these variables.

A fragment shader is allowed to discard the result of its computation, meaning that the corresponding framebuffer position retains its current value. A fragment shader also does not need to write specific color information, because this is not always wanted; however, not producing a color output when one is expected gives undefined results in GLSL.
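
A correspondingly minimal fragment shader sketch (again with illustrative names) shows a uniform, an interpolated varying, the discard keyword and the color output:

    // Illustrative GLSL fragment shader.
    // "uThreshold" is a hypothetical uniform set by the application.
    uniform float uThreshold;

    // Interpolated from the vertex shader; the application has no direct control over it.
    varying vec3 vColor;

    void main()
    {
        // Discard dark fragments: the corresponding framebuffer position keeps its value.
        if (dot(vColor, vec3(0.299, 0.587, 0.114)) < uThreshold)
            discard;

        gl_FragColor = vec4(vColor, 1.0);   // final fragment color
    }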

Texturing

Some words should be spent on texture mapping with shaders (note that this applies specifically to GLSL; the information may not hold true for DirectX HLSL). The functionality itself continues to be applied "as usual"[8], with shading languages providing special ad-hoc functions and opaque objects.

It has been stated that textures are special uniform variables. Shading languages define special variable types to be used for textures, called samplers. Each sampler has a specific lookup mode indicated explicitly in its name. Looking up a texture means fetching an interpolated texel color at a specified position. For example, in GLSL a sampler2D accesses a specific texture, performing two-dimensional texturing and filtering; other details are specified by the function used to actually perform the lookup. For cube map textures, a samplerCube is used together with a textureCube function call.
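
A short illustrative sketch combining both sampler types (the sampler and varying names are made up; the application binds each sampler uniform to an image unit index, for example through glUniform1i):

    // Illustrative GLSL fragment shader performing two texture lookups.
    uniform sampler2D   uDiffuseMap;   // two-dimensional lookup with filtering
    uniform samplerCube uEnvironment;  // cube map lookup

    varying vec2 vTexCoord;            // interpolated 2D texture coordinate
    varying vec3 vReflection;          // interpolated reflection direction

    void main()
    {
        vec4 base = texture2D(uDiffuseMap, vTexCoord);
        vec4 env  = textureCube(uEnvironment, vReflection);
        gl_FragColor = mix(base, env, 0.25);   // simple fixed blend
    }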

Understanding the model completely also requires knowing a little about the previous shading model, commonly referred to as multitexturing or the texture cascade. For our purposes, we will just assume there is a limited set of units that can be linked to specific textures and somehow produce color results, possibly combining them in sequential order. This is largely redundant with the new programming model, which allows much greater flexibility.

To look up a specific texture, the sampler needs to know which of those texture units is to be used with the specified lookup mode. This means samplers are really integers referring to the texture unit used to carry out the lookup. It is then possible to bind an image texture to each texture unit just as usual. The "legacy" texture units turn out to be a subset of these units, which are referred to as image units. Most implementations actually allow more image units than texture units, both because they are less complex to implement and to push the new programming model. In short, samplers are linked to image units, which are in turn bound to textures[9].

For end users, this extra flexibility results in both improved performance and richer content because of the better utilization of hardware resources.

Lighting and shadowing

Considering the lighting equation, there has been a trend to move evaluations to fragment granularity. Initially, lighting computations were performed at the vertex level (the Phong lighting model), but improvements in fragment processor designs made it possible to evaluate much more complex lighting equations such as the Blinn lighting model, often referred to as bump mapping. In the latter technique, vertex shaders are used to set up a per-vertex local space (also called tangent space)[10], which is then used to compute per-pixel lighting vectors. The actual math for this can be quite involved and is beyond the scope of this article.
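
A heavily simplified sketch of the fragment-level part of such a technique (diffuse term only, with the light direction assumed to be already transformed into tangent space by the vertex shader; all names are illustrative):

    // Illustrative per-pixel (tangent-space) diffuse lighting in GLSL.
    uniform sampler2D uNormalMap;       // tangent-space normals packed into [0,1]
    uniform vec3      uLightColor;

    varying vec2 vTexCoord;
    varying vec3 vLightDirTangent;      // light direction in tangent space, from the vertex shader

    void main()
    {
        // Unpack the normal from [0,1] to [-1,1] and renormalize.
        vec3 n = normalize(texture2D(uNormalMap, vTexCoord).xyz * 2.0 - 1.0);
        vec3 l = normalize(vLightDirTangent);
        float diffuse = max(dot(n, l), 0.0);
        gl_FragColor = vec4(uLightColor * diffuse, 1.0);
    }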

It is well acknowledged that lighting really needs hardware support for dynamic loops (often referred to as DirectX Shader Model 3.0), because this allows many lights of the same type to be processed with a single shader. By contrast, previous shading models would have required the application to use multi-pass rendering (an expensive operation) because of their fixed loops, and this approach would also have needed more complicated machinery. For example, after finding that there are 13 "visible" lights, the application would need one shader to process 8 lights (supposing this is the hardware limit) and another shader to process the remaining 5; if there were 7 lights, the application would need a special 7-light shader. With dynamic loops, by contrast, the application can iterate on dynamic variables, defining a uniform array to be 13 (or 7) "lights long" and getting correct results, provided this actually fits within the hardware's capabilities[11]. At the time of writing (27 October 2005) there are enough resources to evaluate over 50 lights per pass when resources are managed carefully, a considerable improvement over older programming models.
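
A sketch of such a dynamic loop in GLSL (the uniform names and the diffuse-only accumulation are illustrative, and whether the loop actually runs in a single pass depends on the resource limits discussed above):

    // Illustrative GLSL fragment shader iterating over a uniform array of lights.
    const int MAX_LIGHTS = 16;                 // compile-time upper bound

    uniform int  uLightCount;                  // number of lights active this frame
    uniform vec3 uLightPos[MAX_LIGHTS];        // hypothetical eye-space light positions
    uniform vec3 uLightColor[MAX_LIGHTS];      // light colors

    varying vec3 vNormal;                      // interpolated surface normal
    varying vec3 vPosition;                    // interpolated eye-space position

    void main()
    {
        vec3 n = normalize(vNormal);
        vec3 color = vec3(0.0);
        for (int i = 0; i < uLightCount; ++i)  // dynamic loop bound
        {
            vec3 l = normalize(uLightPos[i] - vPosition);
            color += uLightColor[i] * max(dot(n, l), 0.0);
        }
        gl_FragColor = vec4(color, 1.0);
    }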

Computing accurate shadows makes this much more complicated, depending on the algorithm used; compare stencil shadow volumes and shadow mapping. In the first case, the algorithm requires at least some care to be applied to multiple lights at once, and there is no established multi-light shadow-volume-based version. Shadow mapping, by contrast, seems much better suited to future hardware improvements and to the new shading model, which also evaluates computations at fragment level. Shadow maps, however, need to be passed as samplers, which are limited resources: current hardware (27 October 2005) supports up to 16 samplers, so this is a hard limit unless some tricks are used. Future hardware improvements and packing multiple shadow maps into a single 3D texture may rapidly raise this resource availability.

Further reading

References

  1. ^ NVIDIA Gelato official website, http://film.nvidia.com/page/gelato.html
  2. ^ Official language specification, http://www.opengl.org/documentation/oglsl.html
  3. ^ Previous vertex shading languages (in no particular order) for OpenGL include EXT_vertex_shader, NV_vertex_program, the aforementioned ARB_vertex_program, NV_vertex_program2 and NV_vertex_program3.
  4. ^ For fragment shading, nvparse is possibly the first shading language featuring high-level abstraction, based on NV_register_combiners and NV_register_combiners2 for pixel math and on NV_texture_shader, NV_texture_shader2 and NV_texture_shader3 for texture lookups. ATI_fragment_shader and EXT_fragment_shader did not even provide a "string oriented" parsing facility. ARB_fragment_program has been very successful. NV_fragment_program and NV_fragment_program2 are similar, although the latter provides much more advanced functionality than the others.
  5. ^ Official Cg home page, http://developer.nvidia.com/object/cg_toolkit.html
  6. ^ FX Composer from NVIDIA home page, http://developer.nvidia.com/object/fx_composer_home.html
  7. ^ Search ARB_shader_objects for the issue "32) Can you explain how uniform loading works?". This is an example of how a complex data structure must be broken down into basic data elements.
  8. ^ The required machinery was introduced in OpenGL by ARB_multitexture, but this specification is no longer available separately since its integration into core OpenGL 1.2.
  9. ^ Search again ARB_shader_objects for the issue "25) How are samplers used to access textures?". You may also want to check out "Subsection 2.14.4 Samplers".
  10. ^ Search NVIDIA developer resources for various papers on per-pixel lighting.
  11. ^ You will be glad to see those limits are (at least theoretically) rather high. Check out The OpenGL® Shading Language for the string "52) How should resource limits for the shading language be defined?" and look up your favorite video card in the Delphi3d.net hardware database.