Classification of Behaviors by Motion Estimation of Visual Stimuli

Damian Elias and Bruce Land.


Neurobiologists and researchers studying animal behavior often want to measure and classify visual stimuli (1). At present there is no well-defined way to characterize the motion of a stimulus for comparison with other stimuli. We are attempting to develop a scheme for measuring the motion properties of a motion sequence. The goal is to summarize the motion in a way that conveys its essential aspects without overwhelming the investigator with detail. The first application is to spider visual courtship displays.


There are several ways of estimating motion. The two basic approaches are object-based (e.g. 2) and image-based methods. Object-based methods use information about the 3D geometry of the structure generating the motion to estimate parameters such as joint rotation or limb velocity. Image-based methods use only the information available from a single videotape, or observable by an animal under test. We chose to concentrate on image-based methods because we need to compare video sequences of spider behavior.

The image-based scheme we used had several processing steps:

  1. Video sequences were shot at either 30 fps or 500 fps. The high-speed sequences were decimated by two in time for analysis. Frames were cropped so that the animal was completely within the frame and filled ½ to ¾ of the frame. A series of video frames was extracted from an AVI file; typically 100 to 400 frames were used. A short Matlab program reads the AVI file and converts it to Matlab movie format. The images were downsampled to a resolution of about 120 pixels for analysis.
  2. The intensity of each frame was normalized because the high-speed camera AGC tended to oscillate a bit. The correction factor used was (SequenceAveragePixelIntensity/FrameAveragePixelIntensity)^0.75.
  3. The motion estimation was based on optical flow computation between frames (3). A simple gradient optical flow calculation was used to estimate motion. The video data was arranged as an N by M by T matrix, where N is the number of pixels in the x-direction, M the number in the y-direction, and T the number of video frames. The 3D matrix was smoothed with a 5x5x5 Gaussian convolution kernel with a standard deviation of one pixel. Derivatives in all three directions were computed using a second-order (centered, 3-point) algorithm. The motion estimate is based on the notion that pixel intensities change from frame to frame only because of motion. If true, and if I is the array of intensities at every point in a frame, and v is the vector velocity of the object seen by the pixel, then
    dI/dt = -∇I · v
    Put in words: the rate of change of intensity at a point equals the spatial gradient of the intensity projected onto the velocity. The dot product implies that only the projection of v onto the gradient can be detected; put differently, motion at right angles to the gradient causes no intensity change. Solving for the component of v in the direction of the gradient, v_g (perhaps normal to an edge), gives
    v_g = -(dI/dt) ∇I / |∇I|²
    For each frame, the total pixel motion was estimated by averaging the speeds of all pixels. The "average speed" of a frame was a simple average of the magnitude of v over all the pixels in the frame, as used by Peters et al. (1) in motion studies of the Jacky dragon. We also defined a "speed surface" (consciously trying to mimic the usefulness of a spectrogram). The speed surface is a 2D plot with the x-axis being frame number, the y-axis being pixel speed, and the color proportional to the log of the number of pixels moving at a given speed. In other words, at each frame time we plotted a histogram of pixel speeds.
  4. Similarity of the various signals was computed by circular cross-correlation. Waveforms being compared were padded to the same length and rotated through frame number. Both average speed waveforms and speed surfaces were analyzed, using 1D correlation for the speed waveforms and 2D correlation (with shifts only along time) for the speed surfaces. For the next stages of the analysis it was more useful to have a measure of dissimilarity, so we used one minus the maximum correlation as a distance measure. Since every signal is correlated with every other signal, the result is a matrix of correlations.
  5. Given a matrix of distances (of every signal to every other signal), one can compute the strength of the clustering of the signals (perhaps by species) and the entropy of the clustering. The entropy of the clustering indicates how well the signals conform to some a priori clustering scheme.
  6. The distance matrix can also be used as input to a multidimensional scaling (MDS) scheme, which attempts to find the best 2D or 3D fit to the distance data. Clusters which appear after MDS are determined by a global relaxation of distance fits, rather than by computing a priori category distances.
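The intensity normalization in step 2 is simple to state precisely. The original analysis was done in Matlab; the following NumPy sketch applies the same per-frame correction factor (the function and variable names here are ours, not the original program's):

```python
import numpy as np

def normalize_intensity(movie, gamma=0.75):
    """Compensate for AGC drift (step 2): scale each frame by
    (sequence mean / frame mean) ** gamma."""
    seq_mean = movie.mean()
    frame_means = movie.mean(axis=(1, 2), keepdims=True)  # one mean per frame
    return movie * (seq_mean / frame_means) ** gamma

# Example: two frames whose mean intensities differ by a factor of two
movie = np.stack([np.full((4, 4), 1.0), np.full((4, 4), 2.0)])
balanced = normalize_intensity(movie)
```

With the exponent of 0.75 the correction deliberately undershoots, pulling frame means toward the sequence mean without forcing them to be exactly equal; an exponent of 1.0 would equalize them completely.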

The programs

The following Matlab programs implemented the various features noted above.

Step 1 above was carried out by a short program that reads an AVI file, selects a frame range, and writes a MAT file containing a Matlab movie.
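The decimation and downsampling stage of step 1 can be sketched as follows. The actual reader was a short Matlab program; this NumPy version assumes the frames have already been decoded into a grayscale array (decoding the AVI itself is omitted), and its names are ours:

```python
import numpy as np

def prepare_frames(frames, start, stop, time_step=2, block=4):
    """Sketch of step 1 after decoding: select a frame range, decimate
    in time (by two for the 500 fps material), and downsample each
    frame by block averaging. `frames` is a (T, H, W) grayscale array."""
    clip = frames[start:stop:time_step]            # temporal decimation
    t, h, w = clip.shape
    h, w = (h // block) * block, (w // block) * block
    clip = clip[:, :h, :w]                         # trim to a multiple of block
    clip = clip.reshape(t, h // block, block, w // block, block)
    return clip.mean(axis=(2, 4))                  # spatial block average

frames = np.random.rand(200, 128, 160)
small = prepare_frames(frames, 0, 200)             # -> 100 frames of 32x40
```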

Steps 2 and 3 are carried out by a program which takes as input the MAT file containing a movie, decimates the resolution, builds a 3D array (x, y, t) of images, normalizes the intensities, computes the motion, and outputs two files containing, respectively, arrays of speed and speed histograms (speed surfaces). Each individual analysis took 2 to 15 minutes on a 2.6 GHz Pentium with 1 GByte of memory.
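The core of the step 3 computation, the gradient optical flow magnitude and the speed surface, can be sketched in NumPy (the original programs were Matlab; the 5x5x5 Gaussian presmoothing is omitted here for brevity, and the function names are ours):

```python
import numpy as np

def flow_speed(movie):
    """Gradient optical flow (step 3), magnitude only:
    |v_g| = |dI/dt| / |grad I|.  `movie` is a (T, H, W) array."""
    It = np.gradient(movie, axis=0)        # centered temporal derivative
    Iy = np.gradient(movie, axis=1)
    Ix = np.gradient(movie, axis=2)
    mag = np.sqrt(Ix**2 + Iy**2) + 1e-8    # guard against zero gradient
    return np.abs(It) / mag

def speed_surface(speeds, bins):
    """Per-frame histogram of pixel speeds: the 'speed surface'."""
    return np.stack([np.histogram(s, bins=bins)[0] for s in speeds])

# Sanity check: an intensity ramp translating at 1 pixel/frame
# should give a speed of 1 everywhere.
t, y, x = np.meshgrid(np.arange(10), np.arange(16), np.arange(16),
                      indexing='ij')
movie = (x - t).astype(float)
speeds = flow_speed(movie)
```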

Steps 4 and 5 are carried out by a program which takes as input the arrays from step 3, pads the sequences to the same length, computes the circular cross-correlation for each pair of signals, and forms a distance matrix representing the dissimilarity of all possible signal combinations. The entropy is then computed as in Victor and Purpura (4). There were actually two versions of this program, one for the Hpug data and one for the Hdos data; the only differences were the hard-coded file names and group indices.
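The distance computation of step 4 can be sketched as follows. This NumPy version (names ours, not the original Matlab program's) assumes the signals have already been padded to a common length, and uses the FFT to evaluate the circular cross-correlation at all shifts:

```python
import numpy as np

def circular_xcorr_max(a, b):
    """Maximum normalized circular cross-correlation between two
    equal-length 1D signals, evaluated at all shifts via the FFT."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    corr = np.fft.ifft(np.fft.fft(a).conj() * np.fft.fft(b)).real
    return corr.max()

def distance_matrix(signals):
    """Pairwise dissimilarity (step 4): one minus the maximum
    circular correlation for each pair of signals."""
    n = len(signals)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d[i, j] = 1.0 - circular_xcorr_max(signals[i], signals[j])
    return d

rng = np.random.default_rng(0)
s = rng.standard_normal(64)
signals = [s, np.roll(s, 5), rng.standard_normal(64)]
d = distance_matrix(signals)
```

A signal and a rotated copy of itself have distance zero, since the circular correlation recovers the shift; unrelated signals land farther apart.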

Step 6 is carried out by a program modified from reference (5). The MDS is computed, then a priori groups are marked with different colors. In addition, this program carries out an ANOVA along a selected axis of the MDS analysis. There were actually two versions of this program, one for the Hpug data and one for the Hdos data. A 3D fit was used for both data sets, with ANOVA performed along axis 1, and for the Hpug data along axis 3 as well.
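The analysis used Steyvers' nonmetric MDS (5); as a simpler stand-in, the classical metric (Torgerson) variant below shows in NumPy how a distance matrix becomes low-dimensional coordinates. It is a sketch of the idea, not the program actually used:

```python
import numpy as np

def classical_mds(d, k=3):
    """Classical (Torgerson) metric MDS: embed an n x n distance
    matrix into k dimensions. Simpler stand-in for the nonmetric
    MDS of reference (5)."""
    n = d.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (d ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]         # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Points that truly live in 2D are recovered up to rotation/reflection
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
coords = classical_mds(d, k=2)
```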

Examples of Analysis results

All of the files resulting from step 3 are available. The zipped version is here. The files with names starting with Fake represent calibration files not actually used in the final analysis. The file names which include gram are 2D histograms; the dimensions are time and speed bin, and the values are the number of pixels in each speed bin. The file names which include wave are 1D average speeds, with the dimension being time.

Habronattus pugillis (Galiuros)

The following image is a link to a matlab movie with animated speed trace. In the movie, you can see the correlation between average pixel speed (blue line) and leg motion (inset). Frame number is on the horizontal axis, average speed on the vertical axis. Movie size is 2.5 MByte. The spider is Habronattus pugillis (Galiuros).

The following images are the 2D speed surfaces for Habronattus pugillis (Galiuros). There are two versions shown. The first is a 2D histogram with red indicating a large number of pixels at the given speed and black indicating a low number of pixels. The second projects the histogram into a 3D surface.

Habronattus pugillis (Santa Catalinas)

The following three images (the first is a link to an animation) show data from another species which has a dramatic leg-flick.

MultiDimensional Scaling

The 1D speed traces and 2D speed surfaces were analyzed by computing the maximum cross-correlation between each pair of signals, for a total of 23 signals. This resulted in a 23x23 matrix of "similarities" between signals. MDS attempts to find a low-dimensional space which adequately captures the (potentially) high-dimensional nature of distances between signals. For distance we used (1-similarity) of the 1D speed traces. A 3D fit seemed to capture most of the variation between signals for the four Habronattus pugillis variants we tested. A 3D plot is shown below with the four variants in different colors. The next two images show two projections of the 3D data. It is clear that the 4 groups can be separated in 3D. Analysis of variance along the relevant axes confirms the significant separation.

Links to manuscript and figures submitted in Sept 2005.


  1. Peters RA, Clifford CWG, and Evans CS, Measuring the structure of dynamic signals, Animal Behaviour, 64, 131-146, 2002
  2. Agrawala M, Model-based motion estimation for synthetic animations, ACM Multimedia, 1995
  3. Beauchemin SS and Barron JL, The computation of optical flow, ACM Computing Surveys, Vol 27, No 3, pp 433-467, 1995
  4. Victor, JD and Purpura KP, Metric-space analysis of spike trains: theory, algorithms and application. Network 8, 127-164 (1997)
  5. Steyvers M, Nonmetric Multidimensional Scaling (MDS) software.