Introduction
Multivariate or high-dimensional (HiD) systems are hard to visualize because we are wired for a 3D world. Many different systems have been suggested to help visualize HiD data. Most of them use some system of subspace selection to reduce the dimensionality to 2 or 3 (e.g. 1-4) or use some procedure to identify important axes (e.g. principal component analysis or projection pursuit).
The parallel coordinate (PC) scheme due to Inselberg and others (5-13) attempts to plot HiD systems in a different manner. Since plotting more than 3 orthogonal axes is impossible, parallel coordinate schemes plot all the axes parallel to each other in a plane, and each data point becomes a polyline connecting its coordinate values on successive axes. Amazingly enough, squashing the space in this manner does not destroy too much of the geometric structure. The geometric structure is, however, projected in such a fashion that most geometric intuition has to be relearned.
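As a minimal sketch of this construction, a point with n coordinates can be turned into the vertices of its PC polyline, one vertex per axis. The function name to_polyline and the unit spacing between axes are assumptions made for illustration; they do not come from the cited papers.

```python
def to_polyline(point, spacing=1.0):
    """Map an n-dimensional point to the vertices of its PC polyline.

    Axis i is assumed to be a vertical line at horizontal position
    i * spacing; the vertex on that axis sits at height point[i].
    """
    return [(i * spacing, value) for i, value in enumerate(point)]


# A 5-dimensional point becomes a polyline with 5 vertices (4 segments).
vertices = to_polyline([0.2, 1.5, -0.3, 4.0, 2.2])
print(vertices)
# [(0.0, 0.2), (1.0, 1.5), (2.0, -0.3), (3.0, 4.0), (4.0, 2.2)]
```

Drawing the plot then amounts to connecting consecutive vertices with straight segments, once per data point.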
The plan here is to build up intuition for HiD representations in parallel coordinates:
Examples
The two clusters shown are separated along the x1 and x2 axes, but not on the x3 axis. The cluster centers are [0,0,0] and [5,5,0].
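A small sketch of comparable data may help here. The cluster centers are the ones given above; the Gaussian shape, the unit standard deviation, the 200 points per cluster, and the helper name make_cluster are assumptions, since the text does not say how the clusters were generated.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def make_cluster(center, n=200, sigma=1.0):
    """Draw n points from an isotropic Gaussian around center (assumed shape)."""
    return [[random.gauss(c, sigma) for c in center] for _ in range(n)]


cluster_a = make_cluster([0, 0, 0])
cluster_b = make_cluster([5, 5, 0])

# Compare where the two clusters cross each parallel axis.
axis_gaps = []
for axis in range(3):
    mean_a = statistics.mean(p[axis] for p in cluster_a)
    mean_b = statistics.mean(p[axis] for p in cluster_b)
    axis_gaps.append(abs(mean_b - mean_a))
    print(f"x{axis + 1}: mean A = {mean_a:.2f}, mean B = {mean_b:.2f}")
```

On the x1 and x2 axes the two bundles of polylines cross at clearly different heights, while on the x3 axis they overlap, which is the separation pattern described above.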
Such a plot rapidly gets cluttered when it contains a large number of lines. If you squint at the PC plot, you get the feeling that the density of lines might be important for determining the center of the cluster. The next plot shows an example of distinguishing cluster shape by computing line densities.
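One way such a density could be computed is sketched below. The text does not say how its densities were obtained, so the whole scheme is an assumption: each segment between two adjacent axes is sampled at a few horizontal positions and the sampled heights are counted in bins. The name line_density, the number of slices, and the bin width are all illustrative choices.

```python
import random
from collections import Counter

random.seed(1)


def line_density(points, axis, n_slices=5, bin_width=0.5):
    """Count polyline crossings per (slice, height-bin) cell between two axes.

    The segment joining axis `axis` to axis `axis + 1` is sampled at
    n_slices evenly spaced horizontal positions, and each sampled height
    is assigned to the bin floor(height / bin_width).
    """
    counts = Counter()
    for p in points:
        left, right = p[axis], p[axis + 1]
        for k in range(n_slices):
            s = k / (n_slices - 1)  # 0 at the left axis, 1 at the right axis
            height = (1 - s) * left + s * right
            counts[(k, int(height // bin_width))] += 1
    return counts


# Assumed test data: one Gaussian cluster centered at the origin.
points = [[random.gauss(0, 1) for _ in range(3)] for _ in range(500)]
density = line_density(points, axis=0)
(peak_slice, peak_bin), peak_count = density.most_common(1)[0]
print(f"densest cell: slice {peak_slice}, heights from {peak_bin * 0.5}, count {peak_count}")
```

The densest cells sit near the height of the cluster center, which is the visual impression that squinting at the plot gives.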
The line segment x1=t; x2=1-.5*t; x3=0.5-.1*t; with 0<t<1 is drawn in normal 3D coordinates and in PC. The color indicates the relative t value. Note that each point on the line becomes two line segments in PC (x1-to-x2 and x2-to-x3). The obvious convergence of all the line segments between the x1 and x2 axes is a direct consequence of the linearity of the points and partly defines the line. A less obvious convergence of line segments to the right of x3 completes the definition of the line.
If the slope of the line relating one axis to another (e.g. x1 to x2) is m and the intercept is b in normal coordinates, then the position of the convergence in PC is the point [ 1/(1-m) , b/(1-m) ], assuming that the first axis (x1) is located at horizontal position zero and x2 is located at horizontal position 1. For the line given above, the slope from x1 to x2 is m=-0.5 and the intercept is b=1. Thus the first convergence point should be [0.67, 0.67] as shown. The slope from x2 to x3 is m=0.2 and the intercept is b=0.3. Thus the second convergence point should be [1.25, 0.375], measured with x2 at horizontal position zero and x3 at position 1, or just to the right of the x3 axis.
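The formula can be checked directly. The height of a PC segment at horizontal position h is x1 + h*(x2 - x1); substituting x2 = m*x1 + b gives x1*(1 - h*(1-m)) + h*b, and the x1 term vanishes at h = 1/(1-m), leaving the height b/(1-m) for every point on the line. The sketch below confirms this numerically for the example line; the function names convergence_point and segment_height are illustrative only.

```python
def convergence_point(m, b):
    """PC point where all segments of the line x_next = m * x_prev + b meet.

    Coordinates are relative to the pair of axes: the first axis of the
    pair is at horizontal position 0 and the second at position 1.
    Undefined for m == 1, where the segments are parallel.
    """
    if m == 1:
        raise ValueError("segments are parallel when m == 1")
    return (1 / (1 - m), b / (1 - m))


def segment_height(left, right, h):
    """Height at horizontal position h of the segment from (0, left) to (1, right)."""
    return left + h * (right - left)


# The line from the text: x1 = t, x2 = 1 - 0.5 t, x3 = 0.5 - 0.1 t.
p12 = convergence_point(-0.5, 1.0)  # x1 -> x2
p23 = convergence_point(0.2, 0.3)   # x2 -> x3, relative to the x2 axis

for t in (0.1, 0.5, 0.9):
    x1, x2, x3 = t, 1 - 0.5 * t, 0.5 - 0.1 * t
    # Every segment, extended if necessary, passes through the same point.
    assert abs(segment_height(x1, x2, p12[0]) - p12[1]) < 1e-12
    assert abs(segment_height(x2, x3, p23[0]) - p23[1]) < 1e-12

print(p12, p23)
```

Since 0 < 1/(1-m) < 1 only when m is negative, the convergence point lies between the two axes for negatively sloped relations (as for x1 to x2 here) and outside them otherwise (as for x2 to x3).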
The middle panel shows a plane in 3D space. The plane is given by
x1 = t
x2 = -.5*t + .5*u
x3 = 1.1*u
where t and u are uniform random variables from 0 to 1. The color code is proportional to t.
The top panel shows the PC representation of the plane. There is clearly structure to the points, but it is hard to see what it means, except of course that the color mapping is along x1.
We would like some way to detect a plane by a visual technique. The bottom panel shows one of two characteristic points of this plane plotted in PC according to the following scheme. Pairs of points in the plane define lines in the plane. For a large number of pairs of points (lines in the plane), compute the two convergence points in PC which describe each 3D line, as in the example above, then connect those two points with a line. If all the constructed lines pass through a common point, then the data points fall on a plane. If the plane were noisy, then the lines would 'almost' pass through a common point.
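The scheme above can be sketched as follows for the plane given earlier. Several choices here are assumptions rather than anything stated in the text: the axes x1, x2, x3 are placed at horizontal positions 0, 1, 2; the two PC points are held in homogeneous coordinates (X, Y, W) so that no division by 1-m is needed; and the common point is estimated by least squares. The helper names are illustrative.

```python
import math
import random

random.seed(2)


def plane_point():
    """Sample the plane from the text: x1 = t, x2 = -0.5 t + 0.5 u, x3 = 1.1 u."""
    t, u = random.random(), random.random()
    return (t, -0.5 * t + 0.5 * u, 1.1 * u)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pc_line(p, q):
    """Line (a, b, c), a*x + b*y + c = 0, through the two PC points of line pq.

    Returns None when the two PC points (nearly) coincide.
    """
    d1, d2, d3 = (q[i] - p[i] for i in range(3))
    pt12 = (d1, p[1] * d1 - p[0] * d2, d1 - d2)           # [1, b, 1-m] times d1
    pt23 = (2 * d2 - d3, p[2] * d2 - p[1] * d3, d2 - d3)  # same, shifted one axis right
    a, b, c = cross(pt12, pt23)
    norm = math.hypot(a, b)
    return None if norm < 1e-6 else (a / norm, b / norm, c / norm)


lines = [ln for ln in (pc_line(plane_point(), plane_point()) for _ in range(300)) if ln]

# Least-squares point closest to all constructed lines (2x2 normal equations).
saa = sum(a * a for a, b, c in lines)
sab = sum(a * b for a, b, c in lines)
sbb = sum(b * b for a, b, c in lines)
sac = sum(a * c for a, b, c in lines)
sbc = sum(b * c for a, b, c in lines)
det = saa * sbb - sab * sab
x0 = (-sac * sbb + sbc * sab) / det
y0 = (-sbc * saa + sac * sab) / det
worst = max(abs(a * x0 + b * y0 + c) for a, b, c in lines)
print(f"common point ~ ({x0:.4f}, {y0:.4f}), worst miss = {worst:.2e}")
```

For exact plane data the worst miss is at the level of rounding error. Adding noise to the sampled points would increase it, which gives a simple numerical measure of how 'almost' the lines pass through a common point.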
References
(1) Calibrate your eyes to recognize high-dimensional objects from their projections. Diane Cook and Peter Sutherland. http://www.public.iastate.edu/~dicook/JSS/paper/paper.html
(2) Visualisation of High Dimensional Data. Dr. Carolina Cruz-Neira and Laura Arns. http://www.vrac.iastate.edu/research/visualization/multivariate/
(3) Polytope visualization. Gordon Kindelmann. http://www.graphics.cornell.edu/~gordon/peek/
(4) N-Land: a Graphical Tool for Exploring N-Dimensional Data. Matthew O. Ward, Jeffrey T. LeBlanc, and Rajeev Tipnis. http://davis.wpi.edu/~matt/courses/nland/cgi93.html
(5) Don't panic ... just do it in parallel! Al Inselberg. Computation Stat 14 (1): 53-77, 1999.
(6) Visual data mining with parallel coordinates. Al Inselberg. Computation Stat 13 (1): 47-63, 1998.
(7) Multidimensional Lines I: Representation. A. Inselberg and B. Dimsdale. SIAM J Appl Math 54 (2): 559-577, Apr 1994.
(8) Multidimensional Lines II: Proximity and Applications. A. Inselberg and B. Dimsdale. SIAM J Appl Math 54 (2): 578-596, Apr 1994.
(9) Hyperdimensional Data Analysis Using Parallel Coordinates. E. J. Wegman. J Am Stat Assoc 85 (411): 664-675, Sep 1990.
(10) The Analysis of T48 Low Pressure Turbine Inlet Temperatures Using Parallel Coordinates. Frank S. Budny. http://www.che.ufl.edu/visualize/THESIS/thesis.html
(11) Visualizing the Behavior of Higher Dimensional Dynamical Systems. R. Wegenkittl, H. Löffelmann, and E. Gröller. http://www.cg.tuwien.ac.at/research/vis/dynsys/ndim/
(12) Hierarchical Parallel Coordinates. Ying-Huey Fua. http://davis.wpi.edu/~yingfua/cs563_1/hiervis.html
(13) High Dimensional Clustering Using Parallel Coordinates and the Grand Tour. Edward J. Wegman and Qiang Luo. http://www.galaxy.gmu.edu/papers/inter96.html
Copyright Cornell University, 2001