The primary goal of this project is to use digital signal processing on a video feed to track a ping-pong ball in a game of table tennis. We propose that by tracking the movement of a ball, a processing system is able to maintain the state of the game, and determine the score in real time with no direct user input. This allows players to focus on their performance, without breaking the flow of the game to recall the score, or determine who should be serving.
The system we designed uses this ball tracking and its model of the game state to superimpose the score of the game on a live video stream of the action, so players can see it easily while playing. It also indicates which player should serve, and it plays audio cues for certain game events, such as scoring or serving, to provide an augmented table tennis experience.
The system also features a full calibration mode that adds several elements to the video stream: a ball-tracking location indicator, so users can observe the tracking for themselves; green-pixel highlighting, to help users eliminate background noise that may impair the system; and guidelines for lining up the table height and net location to ensure accurate game state detection.
High Level Design
Our score tracker works by processing a live NTSC video feed of a table tennis game from a single digital camera. The camera captures a side view of the game, with the net at the center of the image and a player at either side. The ping-pong ball is colored bright green, so the tracker can locate it by converting the video feed to YCbCr and finding the greenest region of the screen; because only the chroma components are used, detection is largely insensitive to lighting. To reduce noise from other green objects in the room, the screen is divided into a 30x40 grid of boxes (30 rows by 40 columns), and the box with the highest number of green pixels is taken to contain the ball. To improve tracking reliability, green pixels closer to the ball's location in the previous frame count more toward a box's total than pixels that are far away.
Given a stream of ball positions, the score tracker follows the state of the current point with a state machine and awards points to the players in real time. For example, a bounce is detected when the ball switches from traveling downward to traveling upward while continuing in the same horizontal direction, provided the ball is in the region of the screen where a bounce can occur. A hit is detected when the ball switches horizontal direction. From these events, the state machine can award points to the correct player.
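The direction-change rules above can be sketched in Python as follows. This is an illustration rather than the project's hardware: it assumes the ball's motion has already been reduced to per-frame box displacements `(dx, dy)`, with `dy > 0` meaning downward on screen, and that a separate check supplies whether the ball is in the bounce region. The names `classify_event` and `sign` are hypothetical.

```python
from typing import Optional, Tuple

Velocity = Tuple[int, int]  # (dx, dy) in boxes per frame; +dy points down the screen


def sign(v: int) -> int:
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def classify_event(prev: Velocity, cur: Velocity, in_bounce_region: bool) -> Optional[str]:
    """Label the change from one motion step to the next as 'hit', 'bounce', or None."""
    pdx, pdy = sign(prev[0]), sign(prev[1])
    cdx, cdy = sign(cur[0]), sign(cur[1])
    # Hit: the horizontal direction reverses.
    if pdx != 0 and cdx != 0 and pdx != cdx:
        return "hit"
    # Bounce: downward motion turns upward while the horizontal direction is unchanged,
    # and only where the ball could actually be touching the table.
    if in_bounce_region and pdy > 0 and cdy < 0 and pdx == cdx != 0:
        return "bounce"
    return None
```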
Top-down system view
The score is displayed on a television connected over VGA, superimposed on a live video feed of the game. The display also shows which player should serve and who won the most recent point. Whenever a point is awarded, a sound plays to mark the end of the rally, and the system stops tracking the ball until a player raises the ball above his or her head to signal that he or she is ready to serve and the next rally can begin. The system then plays a second sound to confirm that it is tracking again and that the server may proceed. The game ends when a player reaches the winning score (11 or 21, depending on the selected setting, with a lead of at least two points). At that point the state machine enters an absorbing state, and the TV displays the winner's score in green.
Hardware Design
System Block Diagram
We use a Sony Handycam DCR-DVD108 camcorder to record the table tennis game. Its NTSC video output is fed into the DE2-115 board, where each frame is split into YCbCr components to isolate its green content. The luma (Y) component is ignored so that detection tolerates variations in lighting. Thresholds on the blue-difference and red-difference components (Cb and Cr, respectively) were determined experimentally to match the color of the green balls. To further reduce noise from other green objects in the room, we spray-painted the balls neon green so that their color was distinct, much like the green-screen technique used in film and television.
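A minimal sketch of the chroma-only test is shown below, assuming 8-bit Cb and Cr values. The threshold constants are hypothetical placeholders, since the report only says the real bounds were found experimentally; they are chosen so that saturated green passes and neutral gray fails.

```python
# Hypothetical bounds on 8-bit Cb and Cr. Saturated green (roughly Cb = 54, Cr = 34
# in BT.601) falls inside them; neutral gray (Cb = Cr = 128) does not.
CB_MIN, CB_MAX = 40, 110
CR_MIN, CR_MAX = 20, 110


def is_ball_green(y: int, cb: int, cr: int) -> bool:
    """Classify one YCbCr pixel as ball-colored using only its chroma.

    The luma value y is accepted but deliberately unused, so the same color
    is accepted whether it is brightly or dimly lit.
    """
    return CB_MIN <= cb <= CB_MAX and CR_MIN <= cr <= CR_MAX
```

Ignoring `y` is what makes the test tolerant of lighting, but it is also why the fluorescent-lighting problem described next can leak into the detection.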
YCbCr plane for fixed Y
One difficulty we encountered was the fluorescent lighting in the room. Fluorescent lamps have a strong green spectral component compared to other forms of artificial illumination. As a result, many reflective or light-colored surfaces in the room reflected light that fell within our detection thresholds, adding noise to the ball detection.
Color temperature of different lighting sources
We found that the camera's color accuracy under this greenish lighting was greatly impaired: although the ball was neon green to the naked eye, it appeared white or gray in the camera feed. Correcting the camera's white balance helped, but providing a black backdrop for the game was ultimately the deciding factor in improving the color accuracy of the video feed. After processing, the YCbCr video feed is converted to RGB and sent to the score monitor.
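For reference, a common form of the YCbCr-to-RGB step is the ITU-R BT.601 studio-range conversion sketched below. The report does not say which coefficients the Altera pipeline uses, and hardware would normally use fixed-point arithmetic, so this floating-point version is an assumption that only illustrates the math.

```python
from typing import Tuple


def _clamp8(v: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(v)))


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> Tuple[int, int, int]:
    """Convert studio-range BT.601 YCbCr (Y 16..235, Cb/Cr 16..240) to 8-bit RGB."""
    luma = 1.164 * (y - 16)
    r = luma + 1.596 * (cr - 128)
    g = luma - 0.392 * (cb - 128) - 0.813 * (cr - 128)
    b = luma + 2.017 * (cb - 128)
    return _clamp8(r), _clamp8(g), _clamp8(b)
```

Black (Y = 16) and white (Y = 235) with neutral chroma map to (0, 0, 0) and (255, 255, 255).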
To determine the position of the ball, each 640x480 video frame is divided into 1200 squares, 40 across and 30 down, each 16 by 16 pixels. The number of green pixels in each square, or box, is determined using the thresholding procedure described above. To find the location of the ball, or the “hotbox”, the system only needs to consider one row of boxes at a time: green-pixel counts are stored for the boxes in the current row rather than for the entire screen. When a row is complete, its best box is compared with the best candidate found in earlier rows, and if it has more green pixels it becomes the new candidate. The counts for that row are then discarded before the next row is processed. After every row has been checked, the ball's presumed position is updated to the best candidate.
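The row-at-a-time search can be modeled in Python as below. This is a software sketch, not the hardware: the input is assumed to be a 480-row by 640-column boolean mask of pixels that passed the green test, scanned in raster order, and the name `find_hotbox` is hypothetical. Only one row of 40 counters is held at any time.

```python
from typing import List, Optional, Tuple

BOX = 16    # box edge length in pixels
COLS = 40   # boxes per row; 480 pixel rows / 16 gives the 30 box rows


def find_hotbox(mask: List[List[bool]]) -> Tuple[Optional[Tuple[int, int]], int]:
    """Scan a frame in raster order and return ((col, row), count) of the greenest box."""
    best_box: Optional[Tuple[int, int]] = None
    best_count = -1
    counts = [0] * COLS                          # counters for the current box row only
    for py, line in enumerate(mask):
        for px, green in enumerate(line):
            if green:
                counts[px // BOX] += 1
        if py % BOX == BOX - 1:                  # last pixel row of a box row
            col = max(range(COLS), key=counts.__getitem__)
            if counts[col] > best_count:         # strictly better than earlier rows
                best_count, best_box = counts[col], (col, py // BOX)
            counts = [0] * COLS                  # discard this row's counts
    return best_box, best_count
```

On a frame with no green pixels this sketch returns box (0, 0) with a count of 0; a real design might instead keep the previous hotbox in that case.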
To make tracking more robust, green pixels are weighted by their distance from the current hotbox. For each green pixel, the system takes the Manhattan distance, in boxes, from that pixel's box to the current hotbox and subtracts it from 69; the result is the value the pixel adds to its box's count. Since the largest possible Manhattan distance on the 40x30 grid is 39 + 29 = 68, every green pixel still counts at least 1. This weighting makes the system favor the assumption that the ball moves along a smooth, continuous path from frame to frame, which held true in testing even for our fastest shots.
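The weighting can be sketched as follows. Because every pixel in a box has the same distance to the hotbox, multiplying a box's raw green count by one weight is equivalent to summing per-pixel weights; the dictionary of per-box counts and the function names are assumptions made for clarity.

```python
from typing import Dict, Tuple

Box = Tuple[int, int]  # (col, row) on the 40 x 30 grid
MAX_WEIGHT = 69        # 69 - 68 = 1, so even the farthest box keeps weight 1


def pixel_weight(box: Box, hotbox: Box) -> int:
    """Value of one green pixel in `box`: 69 minus its Manhattan distance to the hotbox."""
    return MAX_WEIGHT - (abs(box[0] - hotbox[0]) + abs(box[1] - hotbox[1]))


def next_hotbox(green_counts: Dict[Box, int], hotbox: Box) -> Box:
    """Pick the box whose distance-weighted green count is largest."""
    return max(green_counts, key=lambda b: green_counts[b] * pixel_weight(b, hotbox))
```

With 20 green pixels next to the current hotbox and 30 far across the screen, the nearby box wins (20 × 68 = 1360 against 30 × 35 = 1050). The same 30 pixels win once the hotbox is beside them.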
Game State Machine
Game finite state machine
The figure above shows the state machine used to score the game. Green arrows represent transitions during normal play, and red arrows represent transitions in which a point is scored. The state machine starts in a “pre-serve” state, where it waits for the player to indicate that he or she is ready to serve. When the ball is raised above a certain height, the system transitions to the “serve” state, where it waits for the player to serve the ball, and it leaves this state once a bounce is detected. Bounces are detected as follows: each time the ball moves to a new box, the system records the direction in which the ball is traveling and keeps a running window of the last 6 directions. If the window ever contains exactly 3 upward and 3 downward box transitions, the system registers a bounce.
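The sliding-window bounce test can be sketched as below. The class name is hypothetical, and recording purely horizontal moves as "level" is an assumption, since the report does not say how those are stored. Note that the count test alone does not distinguish a valley from a peak; the overview above restricts bounces to the region of the screen where the ball can bounce, which this sketch leaves out.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class BounceDetector:
    """Sliding window over the vertical direction of the last 6 box-to-box moves."""

    WINDOW = 6

    def __init__(self) -> None:
        self.dirs: Deque[str] = deque(maxlen=self.WINDOW)
        self.last_box: Optional[Tuple[int, int]] = None

    def update(self, box: Tuple[int, int]) -> bool:
        """Feed the current hotbox (col, row), row growing downward; True means bounce."""
        prev, self.last_box = self.last_box, box
        if prev is None or box == prev:
            return False                      # no move to a new box, nothing recorded
        dy = box[1] - prev[1]
        # Assumption: purely horizontal moves are recorded as "level".
        self.dirs.append("down" if dy > 0 else "up" if dy < 0 else "level")
        return (len(self.dirs) == self.WINDOW
                and self.dirs.count("up") == 3
                and self.dirs.count("down") == 3)
```

Feeding three downward moves followed by three upward moves reports a bounce on the sixth move.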
After the serve bounce, the system enters the “before-net” state, where it waits for the ball to cross the center of the screen. If the ball bounces again, or the timeout expires, before the ball crosses the net, the system concludes that the server lost the point on an improper serve. If the ball does reach the other side of the net, the system enters the “over-net” state. From there, a bounce advances it to the “expecting-hit” state. If the receiving player instead hits the ball before it bounces, the point is awarded to the server. If the system times out without seeing a hit or a bounce, it assumes the ball missed the opponent's side of the table and awards the point to the receiving player. Whenever a point is awarded, the system returns to the “pre-serve” state and indicates which player is to serve next. Using one of two game-mode switches, players can choose whether the serve changes hands every 2 serves or every 5 serves. Once the state machine reaches the “expecting-hit” state, a paddle hit from the receiving player returns it to the “before-net” state, and the cycle continues.
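The rally logic can be modeled as a small event-driven state machine. This is a sketch under stated assumptions: events arrive as the hypothetical strings "raise", "bounce", "cross_net", "hit", and "timeout"; the rules written for the serve are applied to whoever struck the ball last; and the timeout in the “expecting-hit” state, which the text does not spell out, is assumed to award the point to the last hitter.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    PRE_SERVE = auto()
    SERVE = auto()
    BEFORE_NET = auto()
    OVER_NET = auto()
    EXPECTING_HIT = auto()


class Rally:
    def __init__(self, server: int) -> None:
        self.state = State.PRE_SERVE
        self.server = server      # player 0 or 1
        self.hitter = server      # player who struck the ball last

    def _point(self, winner: int) -> int:
        self.state = State.PRE_SERVE
        return winner

    def step(self, event: str) -> Optional[int]:
        """Feed one event; return the index of the player who wins the point, or None."""
        s, other = self.state, 1 - self.hitter
        if s is State.PRE_SERVE and event == "raise":
            self.hitter = self.server
            self.state = State.SERVE
        elif s is State.SERVE and event == "bounce":
            self.state = State.BEFORE_NET
        elif s is State.BEFORE_NET:
            if event in ("bounce", "timeout"):
                return self._point(other)          # ball never made it over the net
            if event == "cross_net":
                self.state = State.OVER_NET
        elif s is State.OVER_NET:
            if event == "bounce":
                self.state = State.EXPECTING_HIT
            elif event == "hit":
                return self._point(self.hitter)    # struck before it bounced
            elif event == "timeout":
                return self._point(other)          # missed the opponent's side
        elif s is State.EXPECTING_HIT:
            if event == "hit":
                self.hitter = other                # the receiver returns the ball
                self.state = State.BEFORE_NET
            elif event == "timeout":
                return self._point(self.hitter)    # assumption: no return was made
        return None
```

For example, a serve from player 0 that is returned by player 1 and then times out over the net awards the point to player 0.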
Finally, each time the state machine enters the “pre-serve” state, the system checks whether the win condition has been met and declares a winner if appropriate. The win condition depends on the game mode: players can choose an 11-point or a 21-point game. In either case, a player must lead by at least two points to win. If, in a 21-point game, the score is 20-20 and a player scores, the game does not end, because that player leads by only one point. The game is then in overtime, and the serve switches after every point (rather than every 2 or 5 serves as set by the game mode). The game continues until one player has reached the winning score and leads by at least two points.
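The win condition and serve rotation can be sketched as two small functions. The names are hypothetical, and the overtime bookkeeping (counting how many serving turns have elapsed) is one possible way to implement the rule, not necessarily how the hardware does it.

```python
from typing import Optional, Tuple

Score = Tuple[int, int]


def game_winner(scores: Score, target: int) -> Optional[int]:
    """Return 0 or 1 once a player has reached `target` with a lead of 2+, else None."""
    a, b = scores
    if max(a, b) >= target and abs(a - b) >= 2:
        return 0 if a > b else 1
    return None


def next_server(scores: Score, first_server: int, serves_per_turn: int, target: int) -> int:
    """Which player (0 or 1) serves the next rally."""
    total = sum(scores)
    if min(scores) >= target - 1:
        # Overtime: both players reached target - 1, so the serve alternates every point.
        overtime_start = 2 * (target - 1)
        turns = overtime_start // serves_per_turn + (total - overtime_start)
    else:
        turns = total // serves_per_turn
    return (first_server + turns) % 2
```

In an 11-point game with 2 serves per turn, 21-19 is not yet a win at target 11, but 11-9 is; and from 10-10 onward the server alternates on every point.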
Graphical feedback is output over VGA to a large monitor for visibility. On-screen characters are drawn by shifting the luma values of certain regions, along with their chroma values to set the hue of the text, producing semi-transparent text through which players can still see the live feed. This is done by looking up each pixel's x and y location in a lookup table and adjusting its YCbCr value according to the region it falls in. We chose this approach because the Altera VGA module we use provides pixels as a continuous stream rather than through a framebuffer that the controller can access. The display can be placed in one of two modes to suit the user’s needs.
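A per-pixel model of this overlay is sketched below. A list of rectangles stands in for the hardware lookup table, and the region offsets in `REGION_STYLE` are hypothetical. Because each region adds an offset instead of replacing the pixel, the underlying video still shows through, which is what makes the text semi-transparent.

```python
from typing import List, Tuple

YCbCr = Tuple[int, int, int]
Rect = Tuple[int, int, int, int, int]  # (x0, y0, x1, y1, region_id), x1/y1 exclusive

# Hypothetical per-region adjustments: (luma offset, Cb offset, Cr offset).
REGION_STYLE = {
    1: (+60, -20, -20),   # e.g. green-tinted score digits
    2: (+60, +30, -10),   # e.g. blue-tinted serve indicator
}


def clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, v))


def region_at(x: int, y: int, rects: List[Rect]) -> int:
    """Region id covering pixel (x, y), or 0 if the pixel is untouched."""
    for x0, y0, x1, y1, rid in rects:
        if x0 <= x < x1 and y0 <= y < y1:
            return rid
    return 0


def overlay_pixel(x: int, y: int, ycc: YCbCr, rects: List[Rect]) -> YCbCr:
    """Adjust one streamed pixel in place; no framebuffer is needed."""
    rid = region_at(x, y, rects)
    if rid == 0:
        return ycc
    dy, dcb, dcr = REGION_STYLE[rid]
    luma, cb, cr = ycc
    return (clamp(luma + dy, 16, 235), clamp(cb + dcb, 16, 240), clamp(cr + dcr, 16, 240))
```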
Display Modes: Standard (Left), Calibration (Right)
The standard gameplay mode displays the score in the center of the screen, superimposed on the feed of the action at the table. Each score is passed to a module that splits it into tens and ones digits, and each digit is used to fetch that number's image from a lookup table (LUT). The image above shows the boxes highlighted for each number. Several colored indicators inform the players, as detailed below:
- This player's turn to serve; raise the ball when ready.
- This player may serve.
- The rally is over; this player received the last point.
- The game is over; this player has won.
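The ones/tens split and digit LUT described above can be sketched as a pixel test. The 3x5 font, the 8x scale, the one-column spacer, and the function name are all hypothetical; the project's own font is the one pictured. Scores are assumed to be at most 99, and a leading zero is drawn for single-digit scores.

```python
# Hypothetical 3x5 digit font ('#' = lit); the project's own font differs.
FONT_3X5 = {
    0: ("###", "#.#", "#.#", "#.#", "###"),
    1: (".#.", "##.", ".#.", ".#.", "###"),
    2: ("###", "..#", "###", "#..", "###"),
    3: ("###", "..#", "###", "..#", "###"),
    4: ("#.#", "#.#", "###", "..#", "..#"),
    5: ("###", "#..", "###", "..#", "###"),
    6: ("###", "#..", "###", "#.#", "###"),
    7: ("###", "..#", "..#", "..#", "..#"),
    8: ("###", "#.#", "###", "#.#", "###"),
    9: ("###", "#.#", "###", "..#", "###"),
}
CELL = 4  # 3 glyph columns plus 1 blank spacer column


def score_pixel_on(score: int, x: int, y: int, scale: int = 8) -> bool:
    """Is pixel (x, y) >= (0, 0), measured from the score's top-left corner, lit?"""
    tens, ones = divmod(score, 10)                # split into tens and ones places
    cell, col = divmod(x // scale, CELL)          # which digit, which glyph column
    row = y // scale
    if cell > 1 or col == CELL - 1 or not 0 <= row < 5:
        return False
    digit = tens if cell == 0 else ones
    return FONT_3X5[digit][row][col] == "#"
```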
The calibration display mode lets users see more of what is happening under the hood, while still showing the visual features of the standard mode at the bottom of the screen. First, any pixels that fall within the system's thresholds for the ball's green are shown in magenta, which lets players identify sources of background noise that may degrade system performance. Second, the current “hotbox” is shown as a dark square, so players can watch the system track the ball; this effect is created by right-shifting the luma (Y) component of the hotbox's pixels by 2 bits, which divides it by 4. Finally, guidelines mark the boundaries that the state machine treats as the table surface and the net. These help players set up the system, since the lines must be aligned with the corresponding physical boundaries for correct game state detection.
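The calibration effects can be sketched as one per-pixel function. The priority order (guidelines, then green pixels, then the hotbox) and the guideline color are assumptions; the magenta value is an approximate BT.601 studio-range encoding of RGB (255, 0, 255).

```python
from typing import Tuple

YCbCr = Tuple[int, int, int]

MAGENTA: YCbCr = (106, 202, 222)   # approx. BT.601 studio-range YCbCr of RGB (255, 0, 255)
GUIDE: YCbCr = (235, 128, 128)     # hypothetical guideline color (white)


def calibration_pixel(ycc: YCbCr, box: Tuple[int, int], hotbox: Tuple[int, int],
                      is_green: bool, on_guideline: bool) -> YCbCr:
    """Apply the calibration-mode effects to one streamed pixel."""
    y, cb, cr = ycc
    if on_guideline:
        return GUIDE                 # table-surface and net guidelines
    if is_green:
        return MAGENTA               # pixel passed the ball-color test
    if box == hotbox:
        return (y >> 2, cb, cr)      # darken the hotbox: luma shifted right by 2 bits
    return ycc
```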
Number font for scores
Audio feedback lets players follow the state of the game without constantly watching the score monitor. Sound clips play whenever a player scores a point, whenever the serve switches, and whenever a player holds the ball above his or her head to indicate readiness to serve. The sounds were produced by reading .wav files in MATLAB and converting the audio samples into entries of a memory initialization file (.mif), which was used to initialize one ROM table per sound. Whenever the game condition for a sound is met, the ROM for that sound is read sequentially, starting at address zero. A mux selects among the outputs of the three ROM tables according to the condition that was met, ensuring that only one sound plays at a time.
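The authors did this conversion in MATLAB; a rough Python equivalent is sketched below. It assumes 16-bit mono PCM input and writes hexadecimal data, both of which are choices of this sketch rather than details from the report. The header keywords (`WIDTH`, `DEPTH`, `ADDRESS_RADIX`, `DATA_RADIX`, `CONTENT BEGIN` ... `END;`) follow the Quartus .mif format.

```python
import struct
import wave
from typing import List


def read_wav_samples(path: str) -> List[int]:
    """Return the samples of a 16-bit mono PCM .wav file as signed ints."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        frames = w.readframes(w.getnframes())
    return list(struct.unpack(f"<{len(frames) // 2}h", frames))


def to_mif(samples: List[int], width: int = 16) -> str:
    """Format signed samples as a memory initialization file, one word per address."""
    mask = (1 << width) - 1                 # two's-complement encoding of negatives
    digits = (width + 3) // 4               # hex digits per word
    lines = [f"WIDTH={width};", f"DEPTH={len(samples)};", "",
             "ADDRESS_RADIX=UNS;", "DATA_RADIX=HEX;", "", "CONTENT BEGIN"]
    for addr, s in enumerate(samples):
        lines.append(f"\t{addr} : {s & mask:0{digits}X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"
```

Writing `to_mif(read_wav_samples("score.wav"))` to a file would produce one ROM image per sound; the file name is only an example.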
The table and pie chart below document the accuracy of our system over five 21-point games. We sorted the outcome of each point into one of five categories:
- Correct - The system adds a point to the appropriate side
- Lost tracking - The ball tracking or green detection system misfires, giving the FSM incorrect info and leading to the wrong result.
- Out of frame - A player hits the ball to a part of the table that is in the camera's blind spot.
- Serve obscured - The system doesn't register the initial bounce of the serve, so the FSM falls out of step with the actual play.
- Incomplete logic - A point-ending action occurs that the FSM cannot handle, e.g. ball hitting the net on serve but still crossing.
Overall, we were very satisfied with our results: over the course of our tests, the system correctly scored over 90% of the points. One caveat is that all of the testers knew the system's weaknesses, which may have reduced both out-of-frame and incomplete-logic errors. However, these types of error are also the easiest to reduce, as discussed in our conclusion below.
Table and pie chart: # of points in each outcome category
Through the research and development of this system, we found that we could indeed follow the logical progression of a standard game of table tennis, and keep its score, by tracking ball movement with digital signal processing. We believe that with some refinement our design could become a product. In our experience, the system enhances gameplay not only by relieving players of tracking the score and serve, but also by providing exciting visual and auditory feedback. Controlling the game state with natural motions, such as lifting the ball above your head to serve, lets the system integrate seamlessly into play without disruption.
Though our system performed well, and had the baseline functionality we were looking for, there are a few improvements that can be made to our design. Some improvements would be enhancements or refinements of baseline functionality, while others would be new alternative features that may increase the system's appeal as a product.
In terms of performance, a wider shot would help eliminate the out-of-frame regions of the table that the camera cannot capture in our current setup. A camera with a faster frame rate and greater color accuracy would also improve ball tracking and help eliminate the occasional loss of the ball, improving game state detection. Additional logic could help here as well, such as Kalman filtering to predict ball trajectories. Such improvements may help decode game events that are difficult to detect and are therefore currently unhandled, such as a serve that hits the net but still continues over it.
Additional features could make the design more user-friendly and more appealing as a product. Chiefly, we considered adding an on-screen menu, paired with an IR remote, so users could select specific settings without using switches. Additional visual feedback could add excitement as well, such as a ball-speed display or an instant replay of points won after a long rally. It is also worth noting that our entire design is currently implemented in hardware logic. Using software for front-end user control could make it easier for users to enter custom settings, such as player names to display on the scoreboard during gameplay, or custom color options.
Intellectual property considerations
Our project made use of the TV to VGA IP provided by Altera
All other IP is the original work and property of team members
A. Tasks carried out by members
All team members were tasked with an equal amount of design, development, testing and debugging.
Libraries and References
We would like to extend our sincere gratitude to Jessica Stephenson Edmister for loaning us her lighting equipment, to Patricia Gonyea for card access to the M.Eng lounge, to the M.Eng students of Cornell ECE for putting up with our constant (noisy) presence, to Altera for providing IP that served as the base for our design, and finally, last but not least, to Dr. Bruce Land for another great semester.