Theory

What follows is some of the theory behind hierarchical modeling in computer graphics and some of the issues related to the camera and perspective transforms used in the viewing system.

Hierarchical Modeling

Hierarchical Modeling is a way of representing complex objects as hierarchical combinations of simple objects. A face, for example, could be represented as an oblong sphere with some simple objects "pasted" on to represent the eyes, ears, nose and mouth. Similarly, a wheel could be represented as a thin torus with a series of cylinders inside to represent the spokes. A more complicated object, such as a cart, could then hierarchically include several of these wheels together with a box for the body.

The basic premise of hierarchical modeling is thus to create a series of simple objects and allow them to be manipulated and duplicated. In this way, one creates complicated objects that take advantage not only of the features of their sub-objects, but also of their hierarchical properties.
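As an illustration of this premise, here is a minimal sketch of the cart example in Python. The Node class, its flatten method and the make_wheel helper are hypothetical names invented for this sketch, and each node carries only a translation relative to its parent; a fuller system would store a complete transformation matrix per node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A named object positioned relative to its parent (hypothetical structure)."""
    name: str
    offset: tuple = (0.0, 0.0, 0.0)
    children: list = field(default_factory=list)

    def flatten(self, origin=(0.0, 0.0, 0.0)):
        """Return (name, world_position) pairs for this node and all descendants."""
        pos = tuple(o + d for o, d in zip(origin, self.offset))
        result = [(self.name, pos)]
        for child in self.children:
            # Children inherit the accumulated position of their parent.
            result.extend(child.flatten(pos))
        return result


def make_wheel(offset):
    # A wheel is a rim (torus) plus spokes (cylinders), all defined
    # relative to the wheel's own origin, so it can be duplicated freely.
    return Node("wheel", offset, [Node("rim"), Node("spoke"), Node("spoke")])


# The cart reuses the same wheel definition twice, at different offsets.
cart = Node("cart", (10.0, 0.0, 0.0), [
    Node("box", (0.0, 0.0, 1.0)),
    make_wheel((-1.0, 0.0, 0.0)),
    make_wheel((1.0, 0.0, 0.0)),
])
```

Moving the cart only requires changing the cart's own offset; every wheel, rim and spoke follows because its position is computed relative to its parent.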

Coordinate Spaces

In general, 3-dimensional coordinate spaces have an origin and 3 axes that do not all lie in one plane. Typically, however, these axes are taken to be orthogonal. When modeling objects, it is convenient to place them in a world-coordinate system. In order to view these objects, the notion of a camera is used. The camera has a location in world-coordinates (the camera position, C), a direction in which it points (the N vector), an up vector (U) and a vector to denote increasing "x" coordinates (V). In order to take objects and view them with the camera, the Camera Transform is used to put the objects into View Coordinate Space.
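One way to picture what the change into View Coordinate Space does: if V, U and N are unit length and mutually orthogonal (an assumption of this sketch, not something stated above), then a world point can be expressed in the camera's frame by subtracting C and projecting onto each camera axis. The function names here are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def world_to_view(p, c, v, u, n):
    """Express world point p in the camera frame with origin c and axes v, u, n.

    Assumes v, u and n are orthonormal (unit length, mutually perpendicular).
    """
    rel = tuple(pi - ci for pi, ci in zip(p, c))  # position relative to the camera
    return (dot(rel, v), dot(rel, u), dot(rel, n))


# Example: camera 5 units back along -Y, looking along +Y, with +Z as up.
camera_pos = (0.0, -5.0, 0.0)
view_point = world_to_view((1.0, 0.0, 2.0), camera_pos,
                           v=(1.0, 0.0, 0.0), u=(0.0, 0.0, 1.0), n=(0.0, 1.0, 0.0))
```

In this example the point ends up 1 unit along V, 2 units along U and 5 units along N, that is, 5 units in front of the camera.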

Camera Transform

Another way of representing the camera position is with two angles and a camera distance. Theta is the angle of the camera's position measured in the XY plane, and Phi is the angle the camera is "away" from the Z axis. We shall call the distance from the camera to the world-coordinate origin U (in this section U denotes this distance, not the up vector introduced above). The transformation from world coordinate to view coordinate space is carried out using the TView matrix.

The TView matrix can be derived through a sequential application of the following transformations (the first three of which are essentially one step):

A Translation of -U*Cos(Theta)*Sin(Phi) in X
A Translation of -U*Sin(Theta)*Sin(Phi) in Y
A Translation of -U*Cos(Phi) in Z
A Rotation about the Z axis of 90-Theta degrees
A Rotation about the X axis of Phi-180 degrees
An optional flip of the x coordinates, depending on whether a left-handed or right-handed view coordinate system is wanted.
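The sequence above can be sketched in Python as a product of 4x4 matrices. The text does not fix a matrix convention, so this sketch makes several assumptions: angles are in degrees, points are column vectors (so the first transformation applied is the rightmost factor), and rotations are the standard counterclockwise rotations of points. All function names are hypothetical. Under these assumptions the camera position maps to the view-space origin and the world origin lands on the view Z axis at distance U.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, p):
    """Apply an affine 4x4 matrix to a 3D point (column vector, w = 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rotate_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def rotate_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]


def flip_x():
    return [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def t_view(theta, phi, dist, flip=True):
    """Compose the view matrix from the steps listed above (angles in degrees)."""
    t, p = math.radians(theta), math.radians(phi)
    # Steps 1-3: translate so the camera position becomes the origin.
    m = translate(-dist * math.cos(t) * math.sin(p),
                  -dist * math.sin(t) * math.sin(p),
                  -dist * math.cos(p))
    m = matmul(rotate_z(90 - theta), m)   # step 4: rotation about Z
    m = matmul(rotate_x(phi - 180), m)    # step 5: rotation about X
    if flip:
        m = matmul(flip_x(), m)           # step 6: optional handedness flip
    return m
```

A quick sanity check is to call apply(t_view(30, 60, 5), (0, 0, 0)): up to rounding, the world origin should come out on the view Z axis, 5 units from the camera.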

Perspective Transform

The perspective transform takes points from View coordinates to Screen coordinates, scaling them according to their Z coordinate. In essence, objects that are far away from the camera appear smaller on the screen than objects that are closer. In its most basic form, the perspective transform divides the last column of the transformation matrix by d, the distance from the camera to the viewing screen. This means setting the homogeneous coordinate factor to 1/d, so that each point's homogeneous coordinate becomes z/d. Dividing through by the homogeneous coordinate then divides the x and y coordinates by z/d, which gives the desired scaling effect.
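The basic perspective divide can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the camera sits at the view-space origin looking along positive z, with the screen at distance d.

```python
def perspective(point, d):
    """Project a view-space point (x, y, z) onto a screen at distance d."""
    x, y, z = point
    # In homogeneous form, (x, y, z, 1) becomes (x, y, z, z/d).
    w = z / d
    if w == 0:
        raise ValueError("point lies in the plane of the camera (z = 0)")
    # Dividing by w scales x and y by d/z: larger z gives smaller screen values.
    return (x / w, y / w)


near = perspective((2.0, 1.0, 10.0), 5.0)
far = perspective((2.0, 1.0, 20.0), 5.0)
```

Here near is (1.0, 0.5) and far is (0.5, 0.25): the same x and y offsets shrink by half when the point is twice as far from the camera.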