Volumetric Rendering

Bob Amidon
Cornell University, 1995

Overview____________________

Volumetric rendering is the process of visualizing three dimensional data sets. Data Explorer (DX) provides a basic volumetric rendering capability, however, it is limited to orthographic cameras and ambient shading.

The goal of this project was to create a more advanced volumetric rendering module for Data Explorer that provides realistic shading and perspective camera views. Due to the large amount of data that must be processed when volume rendering, secondary goals of the project were to conserve memory resources and keep the processing time to a minimum.

Interface___________________

To add the volumetric rendering module described above to Data Explorer, load the module description file Vol.mdf. This creates a module titled Vol under the Rendering category.

The Vol module has six inputs and one output. Here is a sample network showing how it is used.

Volumetric Data

The first input receives the volumetric data. This is a 7-vector of floats containing the color, opacity and normal vector for each voxel in the data set. Color is an RGB value, opacity is a scalar and the normal vector is a 3-vector. The Mark modules extract the DX colors and opacities components and the Compute module combines them with the normal vectors to form the 7-vector.

The Colormap module selects the colors and opacities based on the density of each voxel. For performance reasons, it is suggested that opacities be clipped when they fall below a minimum threshold value. The rendering algorithm is optimized to bypass the shading calculations when a voxel is completely transparent, therefore, by changing near-zero opacities to zero allows more voxels to make use of this optimization. If the threshold value is chosen well, no information should be lost in the rendering.

The normal vectors are calculated by the Gradient module. These vectors should not be normalized. During the rendering process, the normal vector for a voxel is calculated by interpolating the normal vectors from the eight surrounding voxels. By using the non-normalized vectors in the interpolation, the magnitudes of the vectors affect the outcome of the interpolation by giving a proportional weight to each vector.

When the data set being visualized is composed of discrete materials, then it may be desirable to assign colors and opacities based on the materials contained by the voxel rather than by density alone. For these data sets, a probablistic classifier can be used to extract the percentage of each material contained by the voxel from the density. Then attributes can be assigned accordingly. Refer to "Volumetric Rendering" by Drebin, Carpenter and Hanrahan for more information on extracting material percentages from density values.

Grid size/spacing information and shading properties are also extracted from the first input. The grid information includes grid dimension, grid counts, grid origin and grid deltas. The grid dimension must be 3, volumetric data. Non-orthogonal grids are permitted.

The surface shading properties include the ambient (Ka), diffuse (Kd) and specular (Ks) shading coefficients as well as the shininess level (n). These values are added to the volumetric data stream by using the Shade module. If a Shade module is not used, the default values of Ka=0.4, Kd=0.6, Ks=0.3 and n=15 are used.

Strength

An optional eighth component for the volumetric data is surface strength. Surface strength is calculated as the magnitude of the normal vector. It represents the percentage of a surface within the voxel and is used to diminish the light reflected as the surface percentage approaches zero.

The use of the surface strength component is selected by setting the second input value, strength, to 1. When this component is used, the densities must be normalized. This is necessary to keep the magnitudes of the normal vectors, and therefore the surface strength values, between 0.0 and 1.0, otherwise, the resulting image colors will be scaled past their maximum component values of 1.0 and the resulting image will look over-exposed.

If the data set being used does not have strong surface boundaries, i.e. the densities are continuous, it is probably better if the surface strength component is not used (set the strength input to 0).

Note that the strength input must be fed by an Integer interactor when the default value of zero is not used.

Camera

The third input specifies the camera parameters. A Camera module provides this input. The camera position, 'look to' point, up direction, field of view, aspect ratio, width, height and background color are all extracted from the DX camera structure. Only perspective cameras are supported.

The width and height parameters select the resolution of the viewing window and therefore, the size of the rendered image. Because the size of the view window and rendering time are directly related, it may be more efficient to render to a smaller window and then enlarge the result.

The view window size determines the number of samples taken along the x and y axis in view space. If the view window resolution is much greater than the data set size, then most of the samples are just going to be further interpolations between the same data points. By rendering to a smaller image, the amount of oversampling is reduced and the same result is accomplished by interpolating the pixels of the final image.

For example, a 64 cubed data set rendered at 640x480 pixels will take approximately 10 samples between data points along the x-axis and 7.5 samples between data points along the y-axis. By rendering to a 320x240 image, the job will finish in a 1/4 of the time and the volume is still being sampled adequately (5 x-samples/voxel and 3.25 y-samples/voxel).

640x480 resolution (time=2:25).

320x240 resolution; enlarged to 640x480 (time=0:37).

200x150 resolution; enlarged to 640x480 (time=0:16).

There is no noticeable difference between the 320x240 rendering and the 640x480 rendering. The 200x150 rendering starts to look jagged, though.

Also note that if a large data set is rendered to a small image, the effect will be that some the data points will not be used in the rendering. This could cause discontinuities in the image when abrupt changes occur in the data set or if the mismatch between the data set size and image size is significant.

Composite Image

The fourth connection inputs an external image to use as the background of the volumetric rendering. The ReadImage module is used to load the image file and feed this input. The size of the image must correspond to the view window size specified by the camera. This input is optional. When not connected, the background is set to the background color of the camera.

Z-Buffer

The fifth input reads in a z-buffer filename. This input is optional. When provided, it allows the background image to be composited with the volumetric data. The size of the z-buffer must match the size of the camera view window.

The z-buffer filename must be fed by a FileSelector interactor, otherwise the default of no z-buffer is assumed.

Light

The sixth input provides the light direction vector. A Light module provides this input. Only a single distant light source is allowed and white light is assumed. If this input is not connected, the default light direction vector of [0,0,1] is used.

Some ambient shading is needed to fill in the area on objects where the diffuse and specular terms go to zero. The spheres rendered below show the affect that no ambient lighting has. Also notice how the specular reflections appear throughout the object.

Output Image

The output of the "Vol" module returns the rendered image of the volume. It has the resolution of the camera view window when viewed with a Display module. An Image module can be used instead of the Display module if the image needs to be resized.

Algorithm____________________

The approach taken by this volumetric renderer is to divide screen space into a series of depth planes perpendicular to the camera, sample a regular grid of points on each plane, map the points to the volumetric data set to calculate the reflected light and then merge the resulting plane images to form the final image.

To avoid having to allocate memory for the intermediate images, the order that the depth planes are processed is from back to front. This allows each newly rendered point to be merged into the final image immediately, using the opacity to interpolate the point with the image pixel. Therefore, only the memory required for the final image need be allocated.

The increment parameter selects the resolution that screen space is sampled at. This parameter is a 3-vector of floats containing the X, Y and Z increments. The Z increment selects the distance between the depth planes and the XY increments specify the regular grid spacings that each depth plane is sampled at.

The depth planes map directly to the final image. Therefore, the XY increments are determined directly from the image size. Since screen space spans a distance of 2.0 (from -1.0 to 1.0) along the X and Y axis, these increments are calculates as 2.0 / (width-1) along the x-axis and 2.0 / (height-1) along the y-axis.

There is no rigid definition for choosing the Z increment as there is for the X and Y increments, however, to satisfy the Nyquist theorem, the Z increment is chosen to correspond to a 1X sample rate. The number of samples required for this rate is found by dividing the view depth (the distance between the far and near clipping planes) by the maximum distance along the z-axis between neighboring data points. Since screen space spans a distance of 1.0 along the Z axis, the Z increment is calculated by 1.0 / (#samples-1).

The images below illustrate the effect of increasing the number of depth planes used in the rendering.

1X sample rate (time=2:25)

2X sample rate (time=4:48)

4X sample rate (time=9:35)

To minimize the number of screen points that map outside of the data set, and therefore reduce the overall rendering time, the clipping planes are chosen so they abut the data set. To find these values, the corners of the data set are transformed to view space, and the minimum and maximum depths are located and assigned as the near and far clipping planes distances respectively. The auto_limits subroutine implements this process. The width and height boundaries of screen space are based on the near cutoff plane and the perspective values and are calculated in the view_to_screen subroutine where the perspective transform matrix is formed.

The auto_resolution subroutine calculates the increment parameter as described above. Of course, this routine could be bypassed and the increments chosen manually or by a different method (e.g. if an approximate X and Y sample rate is preferred to an exact image size or if a finer depth resolution is required).

The view space depth of a screen point is calculated prior to transforming it to grid space. The macro calculate_zv performs this calculation using the equation -df / (Zs(f-d)-f). Zs is the screen depth, f is the distance to the far cutoff plane and d is the distance to the near cutoff plane. This value only has to be calculated once per depth plane because all the points on a depth plane have the same view space depth.

The processing order of this algorithm makes compositing an external image straightforward. The memory allocated for the final image is initially assigned with the external image colors. Then, before each screen point is mapped to the data set, the view space depth of that point is compared with the corresponding z-buffer value. If the screen point is in front of the z-buffer depth, then the point is shaded, otherwise it is not. The XY position of the screen point maps directly to the image and z-buffer indices. Appendix B details the z-buffer file format.

The screen point is transformed into grid space to get the data required for the shading calculations. The steps required to perform this transform are:

append a w-component of 1.0 to the point to make it homogeneous
scale the homogeneous coordinate by the view space depth
multiply the scaled point by the transform matrix

The scaling by the view space depth, step 2, is done to counter-act the non-linear perspective divide of the forward transform. Instead of scaling each point individually, the transform matrix is scaled once for each depth plane because the view space depths for these points are the same.

The transform matrix used to convert screen points to grid points is the product of three matrices: screen-to-view, view-to-world and world-to-grid. The formation of these individual matrices are detailed in Appendix A. The only significant note about the transforms is that the world-to-grid matrix removes the assumption of an orthogonal grid.

After the matrix multiply, the w-component is dropped to return the grid point to three-dimensional space. The w-component can be dropped because the inverse perspective matrix makes this value 1.0 and the view and grid transform matrices do not alter its value.

When a screen point is transformed into grid space, it falls within a cube formed by the eight closest data points. The term "cube" is used loosely here since non-orthogonal grids are allowed. The integer value of the grid point locates the lower-right-front point of this cube (right-handed coordinate system for grid space) and the fractional part identifies the relative location of the point within the cube.

The grid indices of the eight neighboring data points are [X,Y,Z], [X,Y,Z+1], [X,Y+1,Z], [X,Y+1,Z+1], ..., [X+1,Y+1,Z+1], where X,Y,Z is the integer value of the grid point. The data set is stored with the z-axis as the fast axis, therefore, to access the data set memory, the following equation is used:

((X*Grid_Count_Y*Grid_Count_Z) + (Y*Grid_Count_Z) + Z) * Data_Size

Grid_Count_n is the number of grid points in the data set along the n-axis and Data_Size is the size of the data type that stores the information at each point. The array_index macro implements this equation a bit more efficiently.

Tri-linear interpolation is the method used to interpolate between these points. This involves performing seven linear interpolations. The first four are along the z-axis between the points:

[X,Y,Z] and [X,Y,Z+1]
[X,Y+1,Z] and [X,Y+1,Z+1]
[X+1,Y,Z] and [X+1,Y,Z+1]
[X+1,Y+1,Z] and [X+1,Y+1,Z+1]

The ordering of these points was chosen to reduce the number of cache misses. Sequentially accessing points that differ only by their z-coordinate will typically not cause a cache miss. This guarantees that four of the eight possible cache misses here are prevented. The ordering of the interpolations will eliminate two more potential cache misses if the size of the cache block is large enough to contain points that differ only by one in the y-coordinate. For moderate to larger data sets, the cache block size probably won't be large enough.

The next two interpolations are along the y-axis between the results of interpolations (1) and (2), and (3) and (4). The results of the initial four interpolations are stored in local memory and therefore the ordering of later interpolations has no affect on access time. The final interpolation is along the x-axis between the results of the last two interpolations.

The relative location of the point within the cube (i.e. the fractional part of the grid point) determines the weight of each point along each axis for the interpolations. All components of the data points (color, opacity and normal) are interpolated to get the 7-vector used to shade the screen point.

Tri-linear interpolation does a good job of creating a smooth image provided that the data set has been sampled adequately. If there are not enough data points to sufficiently reproduce the original object, the interpolation will fail and a "stair step" will be visible.

The first example below shows a sphere rendered from a 50 cubed volumetric data set. The edges appear jagged. When a 100 cubed data set is used instead, as in the second example, the jagginess goes away.

A way to increase the smoothness along object edges without increasing the size of the data set is to reduce the opacity delta between surfaces. In the above renderings, the opacity of the sphere was 1.0 and the surrounding area 0.0. In the image below, the opacity of the sphere was reduced to 0.025 and the shading coefficients were increased to brighten the object.

If further smoothing of the image is required, a filtering operation that uses more distant data points in the computation could be implemented.

Before the shading calculations are worked, the opacity and surface strength components of the point are examined. Surface strength is calculated by taking the magnitude of the interpolated normal vector. If either of these values are zero, then the point is completely transparent and is not shaded and the image color does not change. NOTE: If the surface strength calculations have been turned off by the user, then only opacities are checked here.

For the points with non-zero opacities and strengths, ambient, diffuse and specular shading terms are calculated. The Phong reflection model is used for these calculations. The normal vector of the data point, as calculated by the interpolation, must be normalized before the diffuse and specular terms are calculated. The specular calculation also requires the view direction vector to be formed and normalized.

There is an absolute value taken in the diffuse and specular terms that is not typical of Phong shading. The purpose of this operation is to make the normal vectors bidirectional. This makes the light reflect off the front and back sides of voxels. Since most of the voxels in the data set have some transparency, this is necessary so all surfaces, even those facing away from the light source, are visible.

Performance considerations force four simplifications to be made in the shading calculations. Most of these can be removed quite easily if their worth exceeds the performance hit.

There is only a single light source. To implement multiple light sources, the shading calculations would need to be executed once for each light source and then the results summed to get the total shading for the point.
The light source is positioned at infinity. This prevents the light direction vector from being recalculated (and normalized) for each screen point.
White light is assumed. For non-white light, the color of the data point is multiplied by the color of the light source in the ambient and diffuse calculations and the color of the light is used directly as a scaling factor in the specular calculation.
Light is not attenuated as it travels deeper into the volume. This means that each point is shaded as if there is nothing between it and the light source.
To overcome this last simplification, a volume of light intensities needs to be formed before the data set is rendered. This volume is generated much in the same way that the volumetric data is rendered. View space is divided into a series of planes perpendicular to the light source, a regular grid of points is sampled on each plane, each point is mapped back to the data set (using a transformation matrix based on the light source position) and then the opacity of the point is determined by interpolating the neighboring data set points. The opacity of the point and the intensity of the light at that point determine the intensity of the light at the point directly behind it.
Once the light intensity volume is created, the rendering process proceeds as normal until the shading calculations are reached. Here, the light intensity is needed to scale the diffuse and specular reflections. It is found by transforming the point into the light intensity volume and interpolating the surrounding values.
Adding this feature would have a significant impact on performance (I'd guess rendering time would double) and memory requirements, however, the result would have the effect of casting shadows within the volume.

After the point has been shaded, the color is diminished by the surface strength and then interpolated with the background color (the color of the light passing through the point) based on the opacity of the data point. This result replaces the previous image color of the corresponding pixel.

Once all of the screen points have been rendered, the final image is complete and returned to the calling routine.

This rendering algorithm is implemented by the render.h, render.c module. Vol.c is the interface routine to Data Explorer and the mainline routine for this program.

There are many inherent benefits to this algorithm:

Interpolation is a simple, integral part of the data sampling process that results in a smooth image given an adequately sampled data set. If further smoothing is required, this process can be expanded to provide a filtering operation that would use more of the surrounding data points in the interpolation.
Compositing an external image is done with very little impact on the rendering algorithm and performance.
The rendering time is mostly a function of the image size and not of the size of the data set.
Each depth plane is a complete image in itself and could be output to a separate file to make a cross-sectional animation of the data set.

The only drawback is that the volumetric data set is accessed randomly. This increases cache misses for data fetches and therefore limits performance.

Performance___________________

Improvements

There were many enhancements made after the initial implementation of the algorithm to improve performance.

Subroutine overhead in an application like this becomes significant because the same processing is repeated many times. This overhead was mostly eliminated by using macros for the short subroutines of about 2-3 lines. There were also a couple of calls to longer subroutines called within the main rendering loop that were replaced with the actual subroutine code. These routines had been broken out for readability reasons alone.

Cache misses are a big performance impediment when dealing with large amounts of data. Accessing data sequentially, whenever possible, is the only way to address this problem. The macros and inlined subroutines mentioned above helped to reduce cache misses on instruction fetches by extending the number of instructions that follow sequentially in memory. Cache misses on data access were controlled by accessing the image and z-buffer memories sequentially and by reducing the random access of the interpolation process as much as possible.

Computation time can be reduced by eliminating instructions with long execution times. Where possible, equations were rearranged to eliminate these long operations or reduce the total number of operations required. As mentioned before, the "loop" nature of this application magnifies any delay caused by inefficient calculations. Therefore, any calculations that could be moved outside of the rendering loop were.

Subroutines, such as the generic matrix multiply procedure, are very useful because of the flexibility they provide. However, that flexibility comes at the price of performance. Therefore, generic routines that were called from the rendering loop were replaced by macros that handled the job directly. The vector-matrix multiply macro is an example.

Conditional checks can go a long way in eliminating unnecessary calculations; they can also diminish performance if not chosen carefully. Three conditional checks are used by this algorithm. The first is required for compositing an external image. If the point is behind the composite image, the point is never transformed, interpolated or shaded. The second check is done after the screen point is mapped to the data set. If the point falls outside of the data set, it is not interpolated or shaded. The third check is made after the interpolation and checks the opacity and strength of the point. If either is zero, then the point is completely transparent and is not shaded.

Rendering Time

The conditional checks used to optimize the performance cause the rendering time to vary depending on how much of the volume is hidden behind the composite image, how many of the screen points map to the data set and the properties of the data set (e.g. the number of points that are transparent). Therefore, the worst case rendering time occurs when every single point in screen space maps to the data set, there re no hidden or transparent voxels and each point has some surface strength.

Since times vary depending on the machine used, an overall worst case time cannot be given. To get an idea of the time it will take to render a data set, the worst case times for a 320x240 image with 68 depth samples on an IBM R6000 workstation and an SGI Onyx are given:

63x69x68 data set
d      = 2000.0
f      = 2013.6
fov    = 0.00315  (0.1805 degree perspective angle)
aspect = 1.09524

SGI Onyx: 3 minutes 18 seconds

IBM R6000: 2 minutes 25 seconds

The effect that camera angle has on rendering time is based in the size of the view volume that screen space maps to. As the camera angle approaches 45 degrees on any axis, the orientation of the data set will increase the distance between the near and far clipping planes. This requires more depth planes to be processed in screen space in order to form the final image. (Note that the XY sampling remains the same because it is determined by the image size.) This is balanced out, though, by the increased number of screen points that don't map to the data set and, therefore, are not interpolated or shaded.

Rendering time increases linearly as the number of screen space samples is increased along any one axis. Therefore, doubling the number of depth samples will double the rendering time. Remember that the number of depth samples depend on the size and resolution of the data set and on the camera angle. Similarly, since the image size determines the number of X and Y samples taken in screen space, doubling the image size (e.g. 320x240 to 640x480) will result in quadrupling the rendering time.

Modules____________________

Color Model: ( color.h)

An RGB color model is being used for this application. This makes the color structure a three vector of floats (r,g,b). Five macros are provided to perform operations on this data structure. They are:

C_COLOR         -- Creates an RGB color data type given three scalars.
C_ADD           -- Adds two colors.  This routine is used when
		   multiple colors contribute to a final color (e.g.
		   diffuse + specular).
C_SCALE         -- Multiplies the R, G and B color components by a
		   scalar.  This routine is used to attenuate a color.
C_MULTIPLY      -- Multiplies each component (R, G, B) of one color
		   by the corresponding component of the other.  This
		   routine is used to determine the resulting color
		   when a colored surface is illuminated by a colored
		   light source.
C_MIX           -- Merges two colors by interpolating the resulting
		   color using the given ratio.  This routine is used
		   to overlay individual pixels from one image onto
		   another, using the opacity are the mix ratio.

Vector Operations: ( vector.h)

Vectors are represented by three floats (x,y,z). Ten macros are provided to perform operations on this data type. They are:

V_VECTOR        -- Creates a vector type given three scalars.
V_CREATE        -- Creates a vector type given two vectors.
V_DOT_PRODUCT   -- Calculates the dot product of two vectors.
V_CROSS_PRODUCT -- Calculates the cross product of two vectors.
V_MAGNITUDE     -- Calculates the magnitude of a vector.
V_ADD           -- Adds two vectors.
V_SUBTRACT      -- Subtracts two vectors.
V_SCALE         -- Scales a vector.
V_REFLECT       -- Calculates the reflected vector given a normal
		   and incident vectors.  Input vectors must be
		   normalized.
V_NORMALIZE     -- Normalizes a vector.

Matrix Operations: ( matrix.h, matrix.c)

A matrix is handled as a pointer-to-float type. Therefore, the size of the matrix must be indicated in some form to the routine operating on the matrix. Two of the three operations provided by this module assume the matrices passed in are of a given size. The third provides parameters to specify the size of the matrices input.

M_MULTIPLY      -- Multiplies two matrices.  This routine will
		   handle matrices of any size.  The size of the
		   matrices are input parameters. **1
M_MULT_WV       -- Multiplies a matrix with a vector.  This is a macro
		   that assumes a four vector is being multiplied with
		   a 4X4 matrix.  This macro was added to improve the
		   performance of this common operation. **1
M_INVERSE       -- Calculates the inverse matrix of a 3x3 matrix.

**1 result matrix must be unique (i.e. it cannot be one of the operand matrices). If not, the operand will be destroyed during the calculation and an incorrect result will occur.

Shading Model: ( shade.h)

This module provides the ambient, diffuse and specular shading calculations based on the Phong reflection model. It makes use of macros provided by the Color and Vector modules. Note that all input vectors to these macros must be normalized and that white light is assumed.

AMBIENT         -- Calculates the ambient term. 
DIFFUSE         -- Calculates the diffuse term.
SPECULAR        -- Calculates the ambient term.

Transforms: ( xform.h, xform.c)

This module contains the routines needed to calculate the transform matrices to do conversions between grid coordinates, world coordinates, view coordinates and screen coordinates.

AUTO_LIMITS     -- Calculates the near and far clipping planes in
		   view space.
AUTO_RESOLUTION -- Calculates the resolution that screen space is
		   sampled at for the rendering.
RT_PRODUCT      -- Forms a rotation:translation matrix product.
TR_PRODUCT      -- Forms a translation:rotation matrix product.
VIEW_ROTATION   -- Calculates the camera rotation matrix given the
		   camera position, up direction and look-to point.
GRID_TO_WORLD   -- Calculates the grid-to-world and world to-grid
		   transform matrices.
WORLD_TO_VIEW   -- Calculates the world-to-view and view-to-world
		   transform matrices.
VIEW_TO_SCREEN  -- Calculates the view-to-screen and screen-to-view
		   transform matrices.
TRANSFORMS      -- Main routine for forming the transformation
		   matrices and other inputs required by the
		   renderer.

Input/Output: ( io.h, io.c)

This module contains the routines that allocate resources, handle file io and write messages to the screen.

ALLOCATE_IMAGE  -- Allocates the memory required for the image and
		   initializes it with the background color.
READ_ZBUFFER    -- Opens and reads in the contents of a z-buffer file.
DEBUG_WRITE     -- Debug routine for writing out the contents of
		   various data structures.
WRITE_ERROR     -- Converts status values into error messages.

Appendix A: (Matrix Formation)

The module xform.c contains all the routines necessary to create the transforms matrices required as inputs by the rendering program. The transforms routine in this module is the main routine that calculates the forward and inverse transform matrices for the grid-to-world (Tgrid and Tgrid_inv), world-to-view (Tview and Tview_inv) and view-to-screen (Tpersp and Tpersp_inv) conversions. The product of Tpersp_inv*Tview_inv*Tgrid_inv is taken to create the overall inverse transform matrix from screen to grid coordinates.

Grid-To-World:

The grid-to-world matrix is formed by multiplying the grid deltas matrix with the grid origin matrix. The grid deltas matrix is a rotation operation that reduces to a scaling operation for orthogonal grids. Below are the individual and resulting 4X4 matrices:

Grid Deltas Matrix:               Grid Origin Matrix:
Xdx  Xdy  Xdz  0.0                1.0  0.0  0.0  0.0
Ydx  Ydy  Ydz  0.0                0.0  1.0  0.0  0.0
Zdx  Zdy  Zdz  0.0                0.0  0.0  1.0  0.0
0.0  0.0  0.0  1.0                Ox   Oy   Oz   1.0

Grid-To-World Matrix:
Xdx  Xdy  Xdz  0.0
Ydx  Ydy  Ydz  0.0
Zdx  Zdy  Zdz  0.0
Ox   Oy   Oz   1.0

Xdx reads "the rate of change in the world X-axis along the grid X-axis".
Xdy reads "the rate of change in the world Y-axis along the grid X-axis".
Ydx reads "the rate of change in the world X-axis along the grid Y-axis".
etc...

O = [Ox,Oy,Oz] is the location of the grid origin.

The inverse of the grid-to-world matrix is formed by multiplying the inverse grid translation matrix with the inverse grid rotation matrix. The result is shown below:

Inverse Grid Origin Matrix:       Inverse Grid Deltas Matrix:
1.0  0.0  0.0  0.0                Xdx  Ydx  Zdx  0.0
0.0  1.0  0.0  0.0                Xdy  Ydy  Zdy  0.0
0.0  0.0  1.0  0.0                Xdz  Ydz  Zdz  0.0
-Ox  -Oy  -Oz  1.0                0.0  0.0  0.0  1.0

World-To-Grid Matrix:
Xdx         Ydx         Zdx         0.0
Xdy         Ydy         Zdy         0.0
Xdz         Ydz         Zdz         0.0
(-O dot X)  (-O dot Y)  (-O dot Z)  1.0

X = [Xdx,Xdy,Xdz];  Y = [Ydx,Ydy,Ydz];  Z = [Zdx,Zdy,Zdz]

Note that the matrix notation for the world-to-grid matrix above is simplified by assuming that the rotation matrix from the grid-to-world transform is orthonormal and therefore the inverse matrix is just the transpose of the matrix. In the actual code, this assumption is not made and the inverse matrix is computed.

World-To-View:

The world-to-view matrix is formed by multiplying the camera translation matrix with the camera rotation matrix. The individual and result matrices are shown below:

Camera Translation Matrix:        Camera Rotation Matrix:
1.0  0.0  0.0  0.0                Ux   Vx   Nx   0.0
0.0  1.0  0.0  0.0                Uy   Vy   Ny   0.0
0.0  0.0  1.0  0.0                Uz   Vz   Nz   0.0
-Cx  -Cy  -Cz  1.0                0.0  0.0  0.0  1.0

World-To-View Matrix:
Ux          Vx          Nx          0.0
Uy          Vy          Ny          0.0
Uz          Vz          Nz          0.0
(-C dot U)  (-C dot V)  (-C dot N)  1.0

C is the camera position, N is camera direction vector, V is the camera "up" direction vector and U is the camera "right" direction vector (left-handed coordinate system).

The inverse of the world-to-view matrix is formed by multiplying the inverse camera rotation matrix with the inverse camera translation matrix. The result is shown below:

Inverse Camera Rotation Matrix:   Inverse Camera Translation Matrix:
Ux   Vx   Nx   0.0                1.0  0.0  0.0  0.0
Uy   Vy   Ny   0.0                0.0  1.0  0.0  0.0
Uz   Vz   Nz   0.0                0.0  0.0  1.0  0.0
0.0  0.0  0.0  1.0                Cx   Cy   Cz   1.0

View-To-World
Ux   Uy   Uz   0.0
Vx   Vy   Vz   0.0
Nx   Ny   Nz   0.0
Cx   Cy   Cz   1.0

The same assumption, concerning orthonormality, made in the world-to-grid matrix formation applies here, however, in this case, the rotation matrix should actually be orthonormal.

View-To-Screen:

The view-to-screen matrix is just the linear perspective matrix and is shown below. After a point is transformed by this matrix, it must be divided by its w-component to return it to 3-vector space. This divide is non-linear.

View-To-Screen Matrix:
d/w     0.0     0.0     0.0
0.0     d/h     0.0     0.0
0.0     0.0   f/(f-d)   1.0
0.0     0.0  -df/(f-d)  0.0

w is the view window width, h is the view window height, d is the distance to the near clipping plane and f is the distance to the far clipping plane.

The inverse of the perspective matrix was taken and the result is shown below. Since this matrix does not account for the non-linear divide performed after the view-to-screen transform, points to be transformed by this matrix must be multiplied by the view space depth prior to the operation.

Screen-To-View Matrix:
w/d     0.0     0.0     0.0
0.0     h/d     0.0     0.0
0.0     0.0     0.0  (f-d)/-df
0.0     0.0     1.0     1/d

Appendix B: (Z-Buffer Format)

The format of the z-buffer file is shown below. The subroutine read_zbuffer is used to read in this information.

data           type        quantity      size (bytes)
-------------- ----------- ------------  --------------
magic number   int         1             4
width          short int   1             2
height         short int   1             2
matrix 1       float       16            64
matrix 2       float       16            64
depth values   float       width*height  width*height*4

The magic number is 0x2F0867AB.
Matrix 1 is the product of the view and perspective matrices.
Matrix 2 is the view matrix.

Appendix C: (Error Messages)

Status  Message
------- ------------------------------------------------------
100     ERROR: File read error.
101     ERROR: Z-buffer file has wrong magic number.
102     ERROR: Z-buffer size is not compatible with image size.
103     ERROR: Could not allocate z-buffer memory.
104     ERROR: Premature EOF on z-buffer read.
105     ERROR: Could not open z-buffer file.
106     ERROR: Could not allocate image memory.
107     ERROR: Data set is not volumetric.
108     ERROR: Far and near clipping planes are equal.
109     ERROR: Perspective angle is zero.
110     ERROR: Increment value of zero detected.
111     ERROR: View direction vector is null.
112     ERROR: Up direction vector is null.
113     ERROR: Light source direction vector is null.

Bibliography

Alan Watt, "3D Computer Graphics", Addison-Wesley, 1993
Robert A. Drebin, Loren Carpenter and Pat Hanrahan, "Volume Rendering", Computer Graphics (SIGGRAPH '88 Proceedings) 22(4) pp. 65-73 (August 1988)
Steve Hill, "Tri-Linear Interpolation", Graphic Gems 4, pp. 521-525, Academic Press, Inc., 1994
James D. Foley, "Computer Graphics: Principles and Practice", Addison-Wesley, 1990
Philip J. Davis, "The Mathematics of Matrices", Wiley, 1973, pp.179-183
Ed Catmull and Alvy Ray Smith, "3-D Transformations of Images in Scanline Order", ACM 1980, pp. 279-284