Real-Time Image Segmentation Function Using Pfinder

by Sadahiro Iwamoto
CS 490 Project Spring 1996


Table of Contents
  • Introduction
  • Pfinder
  • Segmentation Function and How to Use It
  • Test Programs
  • Results
  • Conclusions
  • Future Work
  • Source Code
  • Acknowledgements
  • References

    Introduction

    On the CAVE currently running at the Cornell Theory Center, a video camera captures real-time images of the user and maps them onto a 3D plane in the virtual world so that participants in the CAVE's virtual world can "see" each other. Since the whole video image is mapped onto the plane, users see the background (the real world) as well as the person. This inclusion of the background destroys the illusion of the CAVE's virtual world. A processing step that extracts only the person's image would therefore improve the CAVE's virtual environment and also reduce the transmission bandwidth of the images.
    A program called Pfinder, developed at MIT's Perceptual Computing Lab, performs real-time segmentation, contour extraction, and gesture analysis of humans in video images. Pfinder accomplishes these tasks with statistical color analysis of the scene and by detecting the large changes in the scene that humans cause. Pfinder tracks humans using heuristic dynamic models.
    I used Pfinder to create a function that accepts captured video images and returns the segmented image in the alpha channel in real-time. To test the segmentation function, I wrote a simple video capture and display program. I also wrote a simple head tracking function as an application for the segmented image.
    Due to my limited knowledge of the Pfinder program, I could only get Pfinder to segment the upper-left quarter of the captured image. In that region, segmentation and head tracking work quite well on an Indycam-equipped SGI Indy.

    Pfinder

    All segmentation work in my function is done by Pfinder. Pfinder first initializes itself by building a statistical model of the scene from 10 frames of background images (the scene without humans). Each subsequent frame is compared to this model. When a person walks into the scene, the person creates differences between the image and the model; Pfinder designates any region where the differences exceed a threshold as a person and segments it. For each frame, Pfinder keeps track of the person's centroid so it can accurately locate the person in subsequent frames. Pfinder also updates the background model to adjust for lighting variations, slight camera movement, and other background changes. For more technical information on Pfinder, see the technical report by Wren et al. [1].
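
    The core background-differencing idea can be illustrated with a short sketch. This is only a simplified illustration: the real Pfinder keeps richer per-pixel color statistics and adapts them every frame, and all names, types, and the distance test below are my own assumptions, not Pfinder code.

        #include <math.h>

        #define NUM_BG_FRAMES 10   /* background frames used to build the model */

        /* Simplified per-pixel background model: one mean color per pixel. */
        typedef struct { float r, g, b; } MeanPixel;

        /* Accumulate one RGBA_8 background frame into the running mean.
         * Call once for each of the NUM_BG_FRAMES background frames. */
        void accumulateBackground(MeanPixel *model, const unsigned char *rgba,
                                  int pixelCount)
        {
            int i;
            for (i = 0; i < pixelCount; i++) {
                model[i].r += rgba[4 * i + 0] / (float)NUM_BG_FRAMES;
                model[i].g += rgba[4 * i + 1] / (float)NUM_BG_FRAMES;
                model[i].b += rgba[4 * i + 2] / (float)NUM_BG_FRAMES;
            }
        }

        /* Mark pixels whose color distance from the model exceeds the
         * threshold as person (255); everything else is background (0). */
        void classifyFrame(const MeanPixel *model, const unsigned char *rgba,
                           unsigned char *mask, int pixelCount, float threshold)
        {
            int i;
            for (i = 0; i < pixelCount; i++) {
                float dr = rgba[4 * i + 0] - model[i].r;
                float dg = rgba[4 * i + 1] - model[i].g;
                float db = rgba[4 * i + 2] - model[i].b;
                mask[i] = (sqrt(dr * dr + dg * dg + db * db) > threshold)
                              ? 255 : 0;
            }
        }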

    Segmentation Function and How to Use It

    Pfinder is a complete program that includes video capture, contour extraction, a gesture interpreter, and an X interface as well as segmentation. Using the original program as a template, I wrote a simplified function that a host program can call to segment captured video images. The function returns the segmented image in the alpha channel, which can be used to mask out the red, green, and blue channels or as a transparency map in a texture map. Background pixels have the value 0 and pixels belonging to the person have the value 255.
    Two procedure calls are necessary to segment the image: pfinderinit to initialize the underlying Pfinder program and segmentFrame for frame segmentation.
    Calling pfinderinit initializes the statistical model and allocates memory for the background frames. After capturing a frame, a call to segmentFrame with the image will return a segmented image in the alpha channel. The segmentFrame function performs segmentation by calling a series of Pfinder functions. At the end of the function, the segmented image is copied to the alpha channel.
    Below are the additions and modifications required to use these functions in a program (a sketch putting the steps together follows the list):
    1. Include iveImage.h.
    2. Create an iveImage variable and set image width, height, number of pixels, and amount of data in bytes (iveImage.width, iveImage.height, iveImage.pixelCount, and iveImage.byteCount respectively).
    3. Set up the video capture as non-interlaced and in RGBA_8 format.
    4. Call pfinderinit to initialize: pfinderinit(int width, int height).
    5. Set iveImage variable's pointer (iveImage.data) to the captured image.
    6. Call segmentFrame after capture and pass the captured frame: segmentFrame(iveImage *frame).
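
    Putting these steps together, a host program's main loop might look like the sketch below. The iveImage fields and the two calls follow the descriptions above, but the field types, the return types of pfinderinit and segmentFrame, and the captureFrame/displayFrame routines (stand-ins for vidtowin's actual VL capture and GL display code) are assumptions on my part.

        #include "iveImage.h"

        #define WIDTH  160
        #define HEIGHT 120

        /* Prototypes from the segmentation function source (return types
         * assumed). */
        extern void pfinderinit(int width, int height);
        extern void segmentFrame(iveImage *frame);

        /* Hypothetical stand-ins for the host program's VL capture and GL
         * display code. */
        extern unsigned char *captureFrame(void);
        extern void displayFrame(iveImage *frame);

        int main(void)
        {
            iveImage frame;

            /* Steps 1-2: describe the captured image. */
            frame.width      = WIDTH;
            frame.height     = HEIGHT;
            frame.pixelCount = WIDTH * HEIGHT;
            frame.byteCount  = WIDTH * HEIGHT * 4;  /* RGBA_8: 4 bytes/pixel */

            /* Step 4: initialize Pfinder's model and background buffers. */
            pfinderinit(WIDTH, HEIGHT);

            for (;;) {
                /* Steps 3 and 5: capture a non-interlaced RGBA_8 frame and
                 * point the iveImage at the captured data (iveImage.data is
                 * assumed to be a pointer to the RGBA_8 pixels). */
                frame.data = captureFrame();

                /* Step 6: segment; the result comes back in the alpha
                 * channel (0 = background, 255 = person). */
                segmentFrame(&frame);

                displayFrame(&frame);
            }
            return 0;
        }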

    Test Programs

    I modified Dan Brown's vidtowin program to capture images from the Indycam (using VL) at 160x120 (1:4 zoom) and display them in a window. Vidtowin (source code) runs an infinite loop that captures images from the camera and displays them with GL. I inserted all the modifications described in the previous section (calling segmentFrame in the loop after each capture) and added code to render the segmented image as a red ghost image.
    As an application of the segmented image, I wrote a simple head tracking function called findhead. The function looks for the first non-zero pixel in a left-to-right, top-to-bottom scan of the segmented image and labels that pixel as the top center of the head. It then draws a white box of user-specified dimensions around the head location.
    I placed the function in the modified vidtowin file. The program calls findhead after the call to segmentFrame.
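
    A minimal version of the findhead scan is sketched below. It assumes the RGBA_8 layout described earlier (alpha in the fourth byte of each pixel); the actual box drawing is reduced to computing the box corners, and the names here are illustrative rather than the exact findhead code.

        /* Scan the alpha channel left to right within each row, rows top
         * to bottom, and report the first non-zero pixel as the top center
         * of the head. Returns 1 if a head was found, 0 otherwise. */
        int findHeadTop(const unsigned char *rgba, int width, int height,
                        int *headX, int *headY)
        {
            int x, y;
            for (y = 0; y < height; y++) {
                for (x = 0; x < width; x++) {
                    if (rgba[4 * (y * width + x) + 3] != 0) { /* alpha != 0 */
                        *headX = x;
                        *headY = y;
                        return 1;
                    }
                }
            }
            return 0;   /* no person in this frame */
        }

        /* Corners of the user-specified box, centered horizontally on the
         * detected head-top location; the caller draws the white box
         * with GL. */
        void headBox(int headX, int headY, int boxW, int boxH,
                     int *left, int *top, int *right, int *bottom)
        {
            *left   = headX - boxW / 2;
            *top    = headY;
            *right  = headX + boxW / 2;
            *bottom = headY + boxH;
        }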

    Results

    I compiled my modified vidtowin program with Pfinder and ran the resulting executable. The program properly captures from the Indycam at 160x120 and passes the frames to segmentFrame. segmentFrame segments the frame, but only the upper-left quarter of the screen (and sometimes the upper-right quarter), and it then maps this quarter-size segmentation to the whole image, so a 2X-zoomed segmentation of the upper-left quarter shows up in the window. I tried to fix this problem but could not figure out the cause, so I ended up mapping the segmented image to only the upper-left quarter of the alpha channel. In that case, segmentFrame correctly segmented the upper-left quarter of the image.
    When segmentation occurred in the upper-left quarter of the image, the head tracking function worked quite well. Sometimes the box lost track due to noise in the segmented image, but the box stayed around the head most of the time.

    Conclusions

    The quarter-size problem in the segmented image cripples the performance of the segmentation function. Since the Pfinder program itself can segment the whole image, I must have missed some setting or function in Pfinder. Other than this problem, the segmentation function works quite well: the segmented image has little noise and correctly captures the person's silhouette.
    The simple head tracking function works well for scenes where the head is the highest (vertical position) moving point. In frames where the head is not the highest point (raised arms, for example), the function tracks the highest non-head point instead. The head tracking is vulnerable to noise, but fortunately Pfinder's segmentation is fairly clean. The function is also limited by the fixed dimensions of the box: the head must be roughly a certain size in the frame, so if the person gets too close to or too far from the camera, the box no longer fits the head.

    Future Work

    Obviously, the quarter-size segmentation problem needs to be fixed. I spent many hours looking at this problem but could not find the solution. My guesses as to the cause are an internal setting in Pfinder or an undetected conflict introduced by compiling the host program together with Pfinder.
    Pfinder has many features that I did not exploit in the segmentation function. For example, Pfinder can find the locations of the head, arms, hands, and feet and can interpret gestures. An improved version of the segmentation function, or a new function, could expose these features and add more functionality. For example, using Pfinder's body-part tracking would be more accurate than my simple head tracking function.
    The head tracking function itself could be improved by incorporating knowledge of past head locations and by estimating the head size from the region surrounding the top-center location of the head.
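
    As one possible form of the first refinement, the tracker could blend each new detection with the previous frame's location so that a single noisy frame cannot jerk the box away from the head. The smoothing scheme and every name below are my own assumptions, not existing code.

        /* Exponentially smoothed head position: a gain near 1 trusts the
         * new detection, a gain near 0 trusts the track history. */
        typedef struct { float x, y; int valid; } HeadTrack;

        void updateHeadTrack(HeadTrack *track, int newX, int newY, float gain)
        {
            if (!track->valid) {        /* first detection: adopt it as-is */
                track->x = (float)newX;
                track->y = (float)newY;
                track->valid = 1;
                return;
            }
            track->x = gain * newX + (1.0f - gain) * track->x;
            track->y = gain * newY + (1.0f - gain) * track->y;
        }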

    Source Code

    The source code for the segmentation function is here.
    The source code for the modified vidtowin and head tracking function is here.
    The makefile for modified vidtowin is here.
    The Pfinder source code is required to use the segmentation function. To obtain the Pfinder source code and documentation, contact the MIT Vision and Modeling Group at pfinder-request@media.mit.edu.

    Acknowledgements

    I want to thank Professor Alex Pentland and his Vision and Modeling Group at MIT for allowing me to use the Pfinder software. I also want to thank Bruce Land and Dan Brown for their help and advice.

    References

    [1] Wren C., A. Azarbayejani, T. Darrell, and A. Pentland. "Pfinder: Real-Time Tracking of the Human Body", M.I.T. Media Laboratory Perceptual Computing Section Technical Report No. 353, 1995.

    Created on May 16, 1996.