Chris Fairfax (cwf38) Matheus Ogleari (mao65), Aadeetya Shreedhar. (as889)
Our project implements a two-player game based on real-time edge detection. The design is built on a Cyclone II FPGA on an Altera DE2 development board.
The project uses the Sobel operator to detect the edges of a map drawn on a whiteboard. A physics engine implemented on a Pancake processor controls the motion of a ball on the map. The map and the ball are displayed on a VGA screen.
The game is played by controlling the switches and keys on the DE2 development board. Player 1 draws an arbitrary number of straight lines on the map. Player 1 and Player 2 each try and move the ball to the destination. The ball’s motion is controlled by a fixed impulse that can be applied at any angle. The number of impulses required to reach the destination is recorded and displayed on a seven segment display. The player who applies the minimum number of impulses to the ball wins. Player 2 then draws the obstacles on the map, and the game proceeds as before.
High Level Design
The project uses concepts about embedded systems and systems-on-chip learnt as part of ECE 4760 and ECE 5760. Edge detection is an excellent example of a computation that benefits from hardware acceleration, while the behavior for motion and collisions of a ball is easier to implement in software. Our project implements a general purpose processor and an edge-detection hardware accelerator on the FPGA. The resulting system-on-chip effectively combines hardware and software design concepts to create a fun application of real time edge-detection.
The system-on-programmable-chip (SOPC) has a video decoder to acquire video input from the camera. The Video Input and Grayscale module converts the input video to a grayscale image. The grayscale image is passed through the Edge Detection Accelerator in the SOPC to generate the game map.
The Pancake processor outputs the coordinates of the ball. A window of pixels around the ball’s position on the map is input to the processor from the Edge Detection Accelerator. The processor uses the input data to detect the ball’s collisions with edges on the map, and to calculate the ball’s new position and velocity vector. The Ball Drawer module uses the computed ball position to draw the ball on the map. A VGA Controller displays the map and the ball on a VGA screen.
The Impulse Controller module reads the input switches to determine the initial ball velocity. The Bounce Counter keeps track of the number of bounces per player.
Image Processing Filter Background: Gaussian Filter
The grayscale image generated by the Video Input and Grayscale module is passed through a Gaussian filter to attenuate image noise and enable cleaner edge detection. The system uses a 5x5 Gaussian kernel with equal elements to simulate a Gaussian radius of infinity. The filter thus effectively acts as an averaging function.
Each pixel in the map is influenced by 24 of its neighbors. Spatial averaging helps get rid of several unwanted video artifacts including specular reflection. The filter is implemented by convolving the the kernel with the camera output. The detailed RTL implementation is described in the Hardware Design section of this page.
Image Processing Filter Background: Sobel Operator
The cleaned-up image from the Gaussian filter is passed through a Sobel operator to detect edges. The Sobel operator performs discrete differentiation to get the approximate vertical and horizontal gradients of the image intensity. The Sobel filter is implemented by convolving a 3x3 kernel with the image twice, once for each dimension.
The Sobel operator is computationally inexpensive in hardware. Like the Gaussian filter, implementation of the sobel filter requires multiple line buffers that will add delay to the system; however the edge-detection accelerator does not violate the real-time operation of the system.
The tradeoff for the computational simplicity of the Sobel operator is its inability to handle high frequency variations in image intensity. The Sobel operator is sufficiently accurate for the purposes of the game.
Hardware and Software Tradeoffs
Physics Implementation in Hardware
The computation of the ball’s coordinates, collision detection and velocity-change computation are all done in software running on the Pancake processor rather than in register-transfer logic because physics calculations are easier to implement in software. These computations have to be performed once each time a VGA frame is displayed; this timing requirement is sufficiently relaxed to make hardware acceleration unnecessary.
Limited Collision Detection
The collision detection scheme, described in the Hardware Design and Software Design sections of this page, uses only four points on the ball to detect collisions with edges in the map. The ball bounces when the pixels on the north, south, west or east points of the ball meet an edge on the map. Using only four points for collision detection makes the design easier to implement. A more sophisticated collision detection scheme would make the game smoother and enable more complicated shapes to be drawn on the map; however, the four-point scheme provides sufficient accuracy for the game to be played smoothly.
Averaging Instead of Using a True Gaussian Function
The image noise attenuation scheme uses a Gaussian function of infinite radius approximated to a 5x5 matrix. A Gaussian kernel with a limited radius would provide less distortion, but did not reduce the noise sufficiently. The system uses an averaging kernel that requires multiple adders and a single multiplier. A traditional Gaussian kernel requires several multipliers. The noise attenuation provided by an averaging kernel is sufficient to achieve clean edge detection for the purposes of the game.
Sobel Filter instead of a Canny Filter for Edge-Detection
A Canny filter is a more sophisticated edge-detection algorithm than a Sobel filter. A Canny filter adds hysteresis to a Sobel filter to detect a wide range of edges. The accuracy of a Sobel filter was determined to be sufficient for the purposes of detecting edges on a map for the game, hence the computational overhead of a Canny Filter is unnecessary for this application.
The target area for the game’s victory condition is defined as the left edge of the map. The inputs used to control the game are listed in the table below.
|Key Pressed/ Switch Toggled||Action|
|Key 1||Rotate impulse clockwise|
|Key 2||Fire impulse|
|Key 3||Rotate impulse counter-clockwise|
|Switch 0||Toggle between grayscaled image and edge detected image|
|Switch 1||Turn on gaussian filtering|
|Switch 2 through 17||Control edge detection threshold|
The video input originates from the TD_DATA input of the FPGA. This data comes from the camera and is in ITU-R 656 format. That input is downsampled from 720 to 640, converted to YUV 4:2:2 format and then converted to YUV 4:4:4. Until further conversion, the video input is stored in SDRAM and read when the proper control signals are asserted.
The video format is then converted to 10-bit RGB which can then be readily used for our purposes and sent to the VGA output. We added one additional modification to the image which is to mirror the image horizontally. We did this for a mirror effect so that things on the right side looking into the camera appear on the right side of the video image, simplifying user interaction with the game.
The video input and conversion code was taken from the “Air String” project from ECE 5760 in Fall 2011. See the IP Considerations section for more information.
The image taken from the video input is converted to grayscale before any further operations are performed on it. We do this because it allows us to merge the three color channels (R, G, and B) into a single value. The equation used for converting a channel to grayscale is as follows:
The three colors are weighted differently because green is the color channel that the human eye senses the most and blue is the color channel that the human eye senses the least. We adjusted the weights to be powers of two to simplify the math done in hardware (using logical shifts instead of divides).
If edge detection is done on each channel individually, the three color channels appear on the screen misaligned with each other, producing three confusing monochromatic images.
The Gaussian filter module takes a 5x5 window of pixels and performs the aforementioned computation using adders and a fixed point multiplication to multiply the result by the correct fraction.
The filter needs an array of wires. However, since Verilog modules cannot have arrays of ports, the array of wires are packed into one wide bus which is the input port. In the case of the Gaussian Filter, the port is 250 bits wide. Within the code, the ports are unpacked into a wire array which are then operated upon.
There are two grid generator modules that store arrays of pixels read from the video input. The array of pixels use line buffers to store horizontal lines of pixels until they are overwritten by new pixels. There are two grid generators because one is for a 3x3 window of pixels and the other is for a 5x5 window of pixels. The line buffers are 640 elements long to accommodate a single horizontal line of the VGA frame.
The Gaussian Filter operates on a 5x5 array of pixels so it connects to the 5x5 grid generator, storing the output gaussian value to each of the pixels. The 3x3 grid of pixels is connected to the Sobel Convolution module because the calculation performed there only requires a 3x3 pixel window. Similarly to the Gaussian filter, port packing and unpacking is done to simplify the code and reduce the number of port names.
The edge_detect module performs the Sobel Convolution. We use logical shifts in place of multiplies and divides to save FPGA resources. However, once we convolve the x and y edges (independently), their magnitudes are adjusted using their squares before the result is output. We notice through inspection that horizontal edges were weakly detected compared to vertical edges. As an adjustment, we amplify horizontal edges by a factor of four. Thresholding is done at end of the computation and is determined by the number encoded from the FPGA’s switches.
The Time Averager module takes the output from the edge detector and averages the pixels over time. This module stores data about the RGB value of the pixels from the previous frames and averages them. This module is extremely helpful in reducing jitter and noise, and removing random flashes from the VGA which would periodically blur the screen.
The code for this Time Averager module was taken from the “Video Realtime Cartoonifier” project from ECE 5760 fall 2010. See the IP Considerations section for more information.
Interface with Processor
The Pancake processor is used to compute the physics involving the ball. To do this, it is continuously fed an 18x19 window of pixels which surrounds the ball, centered on the ball itself. These pixels are updated in real time along with the coordinates of the ball, which is also controlled and determined by the Pancake.
Each pixel input to the Pancake processor is a logic one or zero depending on whether the pixel is part of an edge on the map.
Drawing the Ball
A module called Ball Drawer draws the ball on the screen in place of part of the video feed. The x and y coordinates of the center of the ball on the VGA are handled by Pancake and sent to this module. The ball coordinates are compared to the current VGA coordinates. If the current VGA coordinates fall within the area of the ball, the white pixels are overwritten with red pixels from the ball, according to the following equation:
The Impulse controller allows the user to use the push buttons to control the angle of the impulse thrust. KEY rotates the angle counterclockwise while KEY rotates the angle clockwise. KEY is used to apply the impulse to the ball. Positive-edge detection is used for each of these push buttons so that the user is required to depress the button before registering another press. The impulse button, once registered, is sent to the Pancake processor and then used to apply a force over a short time-step to the ball, altering its velocity.
The angle for thrust are adjusted in increments of 30 degrees, so there are 12 total angles possible. The angle is also sent to the Pancake and the Pancake uses its built-in functions to apply an impulse function in the specific angle to the ball.
To provide the user with visual feedback to know at what angle the impulse would be directed, a representation of the angle is displayed in the lower left of the display. The Impulse Drawer module handles this by drawing an impulse direction minimap at the bottom left of the screen, overriding the video output. A magenta diamond shows all the possible directions while a green dot traces the diamond showing the current direction of the impulse. A Look Up Table (LUT) is used to determine whether a given pixel on the VGA is contained within this minimap.
The 7-segment displays are used to communicate messages between the user and the Pancake machine. The four rightmost displays are used to display the win or lose message. The LEDs display “ACEd” for a win condition and “dEAd” for a loss condition. The two middle displays show the current count (in decimal) of the number of impulses used since the start of the game.
The line buffers are implemented using m4k blocks. The video input uses SDRAM to read and store frames. The Pancake processor uses m4k blocks to store its instructions and data memory. The Time Averager uses SRAM to store frames from previous cycles to use for averaging. We use so many different types of memory because we have various modules which require memory, and few I/O ports per memory type that we can utilize.
Collision detection is handled by the pancake processor using information about the area surrounding the ball. The collision_check function compares the north, south, west and east corners of the bounding box representing the ball with the corresponding points in the window of pixels input to the processor; if corners lie on a pixel representing an edge, a collision is detected, and the normal vector to the tangent along the point of collision is updated. The normal vector R is used to compute the ball’s new velocity vector after a collision.
It was sometimes possible for multiple collisions to be detected when only one collision should have occurred. To prevent this, two counters were added, one for the x dimension, and one for the y dimension. When the ball collides in the x dimension, the counter is set to 12, and then decremented on each subsequent frame cycle. If either frame counter is greater than zero, collisions in the corresponding direction are not detected. This mechanism ensures clean bouncing where only a single collision is detected each time the ball strikes an edge on the map. Additionally, the x-frame counter is incremented each time a collision occurs in the y direction, and vice versa.
The change in velocity due to the collision is computed by the dot product of the normal vector R obtained from the collision detection function, and the ball’s velocity vector V.
A damping function decreases the energy of the ball with each collision. The velocity computed by the collision function is used to update the ball’s position.
The position of the ball is updated by a simple addition of the velocity vector to the ball’s position vector. The vertical component of the ball’s velocity is incremented by a fixed value once per VGA frame update to simulate gravity. The ball’s velocity is saturated if it increases past a preset value to prevent the ball from going fast enough that it breaks through walls. The ball’s position is output to the Ball Drawer module.
The edge-detection is remarkably clean and robust. The edge-detection system meets real-time deadlines without noticeable lag. The threshold for detecting edges can be varied to display different amounts of detail on the VGA screen.
The following images show the VGA frame output from different stages of the edge detection accelerator. The raw image is the unprocessed image captured by the camera. The edge detected image is the final output of the edge-detection accelerator after passing through the Gaussian filter, time averager and the sobel operator. The Ball Drawer module is disabled to focus on the results of image processing. The grayscale image and the edge detected image are mirrored. This feature was added to enable future work like moving paddles in real-time on the screen.
The following image show the results of edge detection where the raw image has more details and a wider range of gradients. The edge detection logic is good enough to make out different objects in the lab. A more advanced image processing scheme is required for applications like computer vision that need higher precision.
The following images show the results of changing the threshold for edge detection. The best threshold varies by application. The game does not require a high amount of detail, so a high threshold is best for a clean map display.
The collision detection and velocity update mechanism works well enough for the purposes of the game. A more sophisticated physics engine would ease some of the restrictions on the game. For example, the system only detects collisions when the north, south, west or east corners of the ball’s bounding box strike and edge on the map. Increasing the number of points on the bounding box where collisions are detected will make bouncing more realistic. The improved physics engine could also keep track of the velocity of objects approaching the ball by measuring how quickly pixels in the window around the ball fill up; this can be leveraged to make collisions responsive to relative speeds of the ball and the edge and enable features like striking the ball with a paddle.
The game produces a relatively clean display on the VGA screen. The display is free from artifacts like flickering and screen tearing. Before adding the time averager, the game suffered from flickering effects produced by noise from the camera or in the decoder. These flickers only lasted one frame, so time averaging completely eliminated these problems.
We were able to use the concepts of system-on-chip design to create a successful application of edge detection. The design met our expectations for quality and speed of edge-detection, and robustness of the game engine. We were able to apply our understanding of state-machines, parallel computations, hardware-software interfacing and embedded control systems to create a clean, working finished product.
While we were satisfied with the way we handle collisions and edge detection, it may be possible to further improve these functionalities. The edge detection could possibly be improved through the use of a Canny algorithm, however, this would require further experimentation. While the physics engine performed as required, it would be useful to have a physics engine that is capable of detecting the motion of objects and using this to add energy to the ball more realistically.
Related Work and IP Considerations
- The RTL for video acquisition was taken from the Fall 2011 project for ECE 5760 - Air String, made by Young Hwa (Terry) Kim and Varsha Madhuranath.
- The RTL for time averaging was taken from the Fall 2010 project for ECE 5760 - Real-time Cartoonifier, made by Jeff Yates, Tom Gowing, and Kerran Flanangan.
- We used several RTL modules generated by Altera’s Megawizard Plug-in Manager including the SRAM, M4K blocks for processor memory, and the line buffers.
- The Pancake processor was adapted by Professor Bruce Land from the processor designed by K. Nakano, K. Kawakami, K, Shigemoto, and Y. Kamada and Y. Ito. We also used the associated compiler, Syrup designed by Professor Land.
- The VGA Controller was taken from the VGA Examples page of the ECE 5760 website
- W.H. Tsang, P.W.M. Tsang, “Suppression of false edge detection due to specular reflection in color images”, Pattern Recognition Letters, Volume 18, Issue 2, February 1997
- Wikipedia: Sobel Operator
- Wikipedia: Gaussian Filter
Appendix A: Code
Appendix B: Task Breakdown
All tasks were distributed evenly between group members.