ECE 5760: Graphics Processing Unit
Here are some sample pictures of what we were able to render:
The following models were rendered with flat shading and lower color depth:
The program renders all of the above images in less than a second. The original goal was to render at several frames per second, which was possible with small objects such as a few triangles or a cube. However, once we started using objects with 500-1000 triangles, the frame rate dropped to 2-3 frames per second; that is, drawing an image on the screen took a third to a half of a second, depending on the number of triangles. This is long enough to see a few of the triangles appear on the screen, but the result still looks close to instantaneous. As far as we could see, the VGA output showed no noticeable flicker.
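The frame rates above translate directly into time budgets per frame and per triangle. The sketch below is only arithmetic on the reported numbers; pairing 1000 triangles with 2 frames per second and 500 triangles with 3, and attributing the whole frame time to triangle drawing, are our assumptions.

```python
def frame_budget(triangles, fps):
    """Return (milliseconds per frame, average milliseconds per triangle)."""
    frame_ms = 1000.0 / fps
    return frame_ms, frame_ms / triangles

# Assumed pairings of model size and measured frame rate.
for triangles, fps in [(1000, 2), (500, 3)]:
    frame_ms, tri_ms = frame_budget(triangles, fps)
    print(f"{triangles} triangles at {fps} fps: {frame_ms:.0f} ms/frame, {tri_ms:.2f} ms/triangle")
```

Both cases work out to roughly half a millisecond per triangle on average.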
The models are interactive in the sense that the user can rotate the camera around the model and view it from 8 different angles. This is shown in the images above with the different views of the rabbit. When the user moved the camera with the push buttons, the screen cleared and the new image was drawn in the same amount of time as described above.
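The text does not say how the 8 views are arranged. Assuming they are spaced 45 degrees apart around the vertical (y) axis, switching views amounts to rotating every vertex by a multiple of 45 degrees before projection, since orbiting the camera by +θ is equivalent to rotating the model by -θ. The function names and the rotation direction below are hypothetical.

```python
import math

NUM_VIEWS = 8  # per the text; 45-degree spacing about the y axis is an assumption

def rotate_y(point, degrees):
    """Rotate an (x, y, z) point about the y axis by `degrees`."""
    x, y, z = point
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return (c * x + s * z, y, -s * x + c * z)

def vertices_for_view(vertices, view_index):
    """Camera-space vertices for one of the NUM_VIEWS orbit positions.

    Orbiting the camera by +theta is equivalent to rotating the model by -theta.
    """
    theta = (view_index % NUM_VIEWS) * 360.0 / NUM_VIEWS
    return [rotate_y(v, -theta) for v in vertices]

# Distinct sine values needed across all 8 views: only 0, +/-1 and +/-sqrt(2)/2.
print(sorted({round(math.sin(math.radians(k * 45)), 6) for k in range(NUM_VIEWS)}))
```

With 45-degree steps the rotation needs only the constants 0, ±1 and ±√2/2, so a hardware version could keep the eight sine/cosine pairs in a tiny constant table instead of computing them.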
There are still a few small glitches in this program. The interpolative shading works correctly in most cases, but not all. As seen in the Sonic image on the right, our z-buffer has precision issues. These show up as multiple colors in the same pixel where different parts of his body meet at shared vertices. They also show up where his white eyes sit directly in front of, and in contact with, his blue head: there is some mixing of white and blue on his right eye.
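The eye/head artifact is what one would expect when two surfaces are closer in depth than one z-buffer step. With 7 bits there are only 2^7 = 128 depth levels, so surfaces closer than about 1/128 of the depth range can quantize to the same value, and draw order rather than depth then decides which one is visible. This sketch assumes depths normalized to [0, 1] with smaller meaning nearer and a strict less-than depth test; the depths given to the eye and head are made up for illustration.

```python
def quantize_depth(z, bits):
    """Map a normalized depth z in [0, 1] (0 = nearest) to a `bits`-wide integer."""
    levels = 1 << bits
    return min(int(z * levels), levels - 1)

def visible_surface(fragments, bits):
    """Resolve one pixel. `fragments` is a list of (name, depth) in draw order."""
    stored = 1 << bits  # cleared buffer: farther than any quantized depth
    winner = None
    for name, z in fragments:
        q = quantize_depth(z, bits)
        if q < stored:  # strict less-than: on a tie the earlier fragment stays
            stored, winner = q, name
    return winner

eye, head = ("eye", 0.500), ("head", 0.503)  # hypothetical depths
print(visible_surface([eye, head], 7))   # eye
print(visible_surface([head, eye], 7))   # head: both quantize to 64, so draw order decides
print(visible_surface([head, eye], 16))  # eye
```

With a 16-bit buffer the same two depths land on 32768 and 32964, so the nearer surface wins regardless of order.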
Interestingly, whenever we wrote our own input files for small objects, each triangle was interpolated correctly and did not affect neighboring triangles. However, with larger objects obtained from other sources, small glitches appeared: neighboring triangles would be shaded in different colors, although the overall interpolation still seemed to work.
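Interpolative shading of this kind can be illustrated with barycentric interpolation of per-vertex colors, a standard way to implement Gouraud-style shading. The text does not say which formulation the hardware uses, and this floating-point sketch ignores the fixed-point widths a hardware version would need. It does show one property relevant to the observation above: on an edge, the color depends only on that edge's two endpoint colors, so two triangles sharing an edge blend seamlessly only if they assign the same colors to the shared vertices. Whether that explains the glitches is not established here.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def interpolate_color(tri, colors, px, py):
    """Barycentric blend of the three vertex colors at pixel (px, py).

    Returns None if the pixel is outside the triangle or the triangle is degenerate.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return None
    # Each weight is the sub-triangle opposite a vertex divided by the full area.
    w0 = edge(x1, y1, x2, y2, px, py) / area
    w1 = edge(x2, y2, x0, y0, px, py) / area
    w2 = edge(x0, y0, x1, y1, px, py) / area
    if min(w0, w1, w2) < 0:
        return None
    # zip(*colors) walks the channels: (R0, R1, R2), (G0, G1, G2), (B0, B1, B2).
    return tuple(w0 * c0 + w1 * c1 + w2 * c2 for c0, c1, c2 in zip(*colors))

tri = ((0, 0), (4, 0), (0, 4))
colors = ((255, 0, 0), (0, 255, 0), (0, 0, 255))
print(interpolate_color(tri, colors, 2, 2))  # midpoint of the green-blue edge
```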
Accuracy / Board Usage:
The DE2 board limits the project's accuracy in some ways. Between the M4K blocks and the SRAM combined (the memories with a 1-cycle read and write latency), we had 21 bits per pixel for z-buffer and color information. This forced a trade-off between color depth and z-buffer depth. So that the interpolative shading could actually produce a wide range of colors, we chose a color depth of 14 bits. The remaining 7-bit z-buffer then proved to be a problem in some cases, as described above.
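The 21-bit budget splits as 14 color bits and 7 z bits, with 5 of the z bits in M4K and 2 in SRAM. The DE2's SRAM is 16 bits wide, and 14 color bits plus 2 z bits fill one SRAM word exactly, which suggests (but the text does not confirm) that the two SRAM z bits share a word with the color. This sketch assumes that layout and assumes the upper z bits are the ones kept in M4K; both choices are hypothetical.

```python
COLOR_BITS = 14                     # color depth chosen in the text
Z_BITS = 7                          # remaining z-buffer depth
Z_M4K_BITS = 5                      # z bits held in M4K
Z_SRAM_BITS = Z_BITS - Z_M4K_BITS   # z bits held in SRAM (2)

def pack_pixel(color, z):
    """Split one 21-bit pixel into (M4K word, SRAM word).

    Hypothetical layout: the upper 5 z bits go to M4K, and the SRAM word is
    {z[1:0], color[13:0]}, which fills a 16-bit SRAM word exactly.
    """
    if not (0 <= color < 1 << COLOR_BITS and 0 <= z < 1 << Z_BITS):
        raise ValueError("color or z out of range")
    m4k_word = z >> Z_SRAM_BITS
    sram_word = ((z & ((1 << Z_SRAM_BITS) - 1)) << COLOR_BITS) | color
    return m4k_word, sram_word

def unpack_pixel(m4k_word, sram_word):
    """Inverse of pack_pixel: recover (color, z)."""
    color = sram_word & ((1 << COLOR_BITS) - 1)
    z = (m4k_word << Z_SRAM_BITS) | (sram_word >> COLOR_BITS)
    return color, z

m4k, sram = pack_pixel(0x2ABC, 0x55)
print(hex(m4k), hex(sram))  # 0x15 0x6abc
```

The asymmetry of the trade-off is visible in the field widths: 2^14 = 16,384 colors against only 2^7 = 128 depth levels.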
The board has 70 built-in 9-bit hardware multipliers, which limits the number of multiplications we could do and, in turn, how much we could parallelize. Initially, we considered building multiple units in parallel to speed up computation; however, once the hardware was finished, we were using up to 66 multipliers when an object used all 3 color bands, had a camera, etc. This prevented us from implementing multiple GPUs in parallel.
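On the Cyclone II FPGA used on the DE2, each embedded multiplier block can act as one 18x18 multiplier or two 9x9 multipliers, so any multiply wider than 9 bits costs at least two of the 70 elements. The rough cost model below is an assumption for illustration, not how Quartus reports usage; it shows why a single 66-element pipeline leaves no room for a second copy.

```python
import math

AVAILABLE_9BIT_MULTIPLIERS = 70  # per the text

def multiplier_elements(a_bits, b_bits):
    """Rough cost, in 9-bit multiplier elements, of one a_bits x b_bits multiply.

    Assumed mapping: a product that fits in 9x9 uses one element; anything
    larger uses 18x18 blocks, each of which consumes two 9-bit elements.
    """
    if a_bits <= 9 and b_bits <= 9:
        return 1
    return 2 * math.ceil(a_bits / 18) * math.ceil(b_bits / 18)

def max_parallel_units(elements_per_unit):
    """How many identical pipelines fit in the multiplier budget."""
    return AVAILABLE_9BIT_MULTIPLIERS // elements_per_unit

print(multiplier_elements(8, 8), multiplier_elements(16, 16), multiplier_elements(27, 27))  # 1 2 8
print(max_parallel_units(66))  # 1: a second copy would need 132 elements
```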
Our models used anywhere between 13,000 and 30,000 logic elements on the board. The baseline hardware accounted for 13,000 of these; the LUTs holding each object's data took up the rest. This limited the size of the objects we could render, but we chose it for performance: had we placed the object data elsewhere, such as in SDRAM or flash, the latency would have been significantly higher. As mentioned, we used the M4K blocks for z-buffer storage: 90 M4K blocks held 5 bits of each z-buffer value, and the other 2 bits were stored in SRAM. Finally, we used 1 PLL to generate an accurate clock for the VGA monitor; the entire GPU ran on this clock.
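The write-up does not say how each object was turned into LUTs. One common way to place object data in logic elements rather than block RAM is to generate a combinational ROM (a Verilog `case` statement) from the model file. The Python generator below sketches that idea; the module name, the 10-bit coordinate width, and the packing order are all hypothetical. Because the ROM has no clocked read, synthesis would typically build it from logic elements rather than M4K blocks, and every triangle adds a case entry, which is consistent with logic usage growing with model size.

```python
def triangles_to_verilog_rom(name, triangles, coord_bits=10):
    """Emit a combinational Verilog ROM holding one triangle per address.

    Each triangle is three (x, y, z) vertices, packed as nine coord_bits-wide
    two's-complement fields, first coordinate in the most significant bits.
    """
    width = 9 * coord_bits
    addr_bits = max(1, (len(triangles) - 1).bit_length())
    mask = (1 << coord_bits) - 1
    lines = [
        f"module {name}(input [{addr_bits - 1}:0] addr, output reg [{width - 1}:0] data);",
        "  always @(*) begin",
        "    case (addr)",
    ]
    for i, triangle in enumerate(triangles):
        word = 0
        for coord in (c for vertex in triangle for c in vertex):
            word = (word << coord_bits) | (coord & mask)  # mask keeps negatives in range
        lines.append(f"      {addr_bits}'d{i}: data = {width}'h{word:0{(width + 3) // 4}x};")
    lines += [f"      default: data = {width}'d0;", "    endcase", "  end", "endmodule"]
    return "\n".join(lines)

print(triangles_to_verilog_rom("cube_rom", [((0, 0, 0), (1, 0, 0), (0, 1, 0))]))
```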