FPGA Ray Tracer
Scott Bingham / Donald Zhang


Results

Design Difficulties
Our biggest design challenge turned out to be signed arithmetic. We found that we had to manage the signs of multiplication operands explicitly: make both operands positive, multiply, and then fix the sign of the result if necessary. We also found that the arithmetic shift operator '>>>' did not always work as expected. Compile times at the end of the project approached half an hour, which made debugging very difficult. Meeting the clock timing requirements while packing as many calculations as possible into each state was also tricky at times. Once a large number of spheres, shadows, and reflections were added, debugging on a per-pixel basis became impossible, so we had to rely mostly on the VGA output to see whether our changes were correct. Finally, toggling one of the input switches occasionally made the state machine jump to a state it was never designed to reach. To solve this, we simply made every unused state return to the initial state for the ray.
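The sign-handling workaround described above can be modeled in software. The Python sketch below is our illustration, not the authors' Verilog: FRAC_BITS = 8 is an assumed fixed-point format (the report does not state the format used), and the function names are hypothetical. It compares the make-positive, multiply, fix-sign approach with a plain signed multiply followed by an arithmetic right shift. The two agree on exact products, but on a negative product that has to be rounded they can differ by one least-significant bit: truncating the magnitude rounds toward zero, while an arithmetic shift rounds toward negative infinity.

```python
FRAC_BITS = 8  # assumed fixed-point format: 8 fractional bits (hypothetical)


def to_fixed(x: float) -> int:
    """Convert a real number to a fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(x * (1 << FRAC_BITS)))


def mul_sign_magnitude(a: int, b: int) -> int:
    """Make both operands positive, multiply and shift, then restore the sign."""
    negative = (a < 0) != (b < 0)
    magnitude = (abs(a) * abs(b)) >> FRAC_BITS  # truncates the magnitude toward zero
    return -magnitude if negative else magnitude


def mul_arith_shift(a: int, b: int) -> int:
    """Signed multiply followed by an arithmetic right shift."""
    return (a * b) >> FRAC_BITS  # Python's >> on negative ints rounds toward -infinity


if __name__ == "__main__":
    # Exact product (-1.5 * 2.0): both methods agree.
    print(mul_sign_magnitude(to_fixed(-1.5), to_fixed(2.0)),
          mul_arith_shift(to_fixed(-1.5), to_fixed(2.0)))
    # Inexact product (-4.5 LSBs after the shift): the methods differ by one LSB.
    print(mul_sign_magnitude(-384, 3), mul_arith_shift(-384, 3))
```

A related general Verilog rule, which the report does not confirm as the cause of its '>>>' problems: '>>>' only shifts in copies of the sign bit when its left operand is signed, and a part-select or an unsigned operand elsewhere in the expression makes the expression unsigned. Wrapping the operand in $signed() is the usual fix.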

On the software side, one of the biggest problems was timing and handshaking between the hardware and the software. Problems included assigning sphere data to the wrong sphere registers and cutting off the first or last sphere because of timing problems. These problems were all fixed when we handed control of the interface over to the software: the software tells the hardware when the data is valid and which sphere register the data should be stored in.
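The software-controlled handshake described above can be sketched as a small protocol model. This is a Python illustration, not the authors' NIOS II or Verilog code: the names SphereRegisterFile, clock, and send_spheres, the (x, y, z, radius) packing, and the one-write-per-clock timing are all hypothetical. The default of four sphere registers follows the sphere limit given in the Hardware Usage section. The point it shows is that the hardware ignores the data lines unless the software asserts valid, and the software, not the hardware, chooses the destination register.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical packing of one sphere record: (x, y, z, radius).
Sphere = Tuple[int, int, int, int]


@dataclass
class SphereRegisterFile:
    """Hardware-side model: a register is written only on a clock edge where valid is high."""

    num_slots: int = 4
    registers: List[Optional[Sphere]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.registers = [None] * self.num_slots

    def clock(self, valid: bool, index: int, data: Sphere) -> None:
        # Ignore the bus unless the software says it is valid, so values that
        # are still being set up never reach a register.
        if valid:
            self.registers[index] = data


def send_spheres(hw: SphereRegisterFile, spheres: List[Sphere]) -> None:
    """Software-side model: the software drives valid and the target register index."""
    if len(spheres) > hw.num_slots:
        raise ValueError("more spheres than sphere registers")
    for index, sphere in enumerate(spheres):
        hw.clock(valid=False, index=index, data=sphere)  # set up index and data first
        hw.clock(valid=True, index=index, data=sphere)   # then mark them valid
        hw.clock(valid=False, index=index, data=sphere)  # deassert before the next sphere


hw = SphereRegisterFile()
send_spheres(hw, [(10, 20, 30, 5), (-10, 0, 40, 8)])
print(hw.registers)
```

Setting up the index and data one cycle before raising valid means the hardware never samples them while they are changing, and because the software sends the index explicitly, the first and last spheres land in known registers.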

Statistical Analysis of Ray Tracer
This section presents performance statistics for our ray tracer. A total of 17 scenes were drawn to measure its speed in rays per second and frames per second: four configurations for each of four different scenes, all at a resolution of 512x480, plus a best-case run (a low-resolution 320x240 scene with no features turned on). Note that the times below do not include the time the hardware spends waiting for the software to send the sphere list.

Scene / configuration     Rays/frame   Cycles/frame (25 MHz)   Rays/sec   Frames/sec

Image 1: 8 spheres, no plane
  simple                     277,814              12,662,119    548,514        1.974
  full reflection            310,175              14,003,794    553,734        1.785
  full anti-aliasing       2,500,295             113,981,083    548,401        0.219
  full AA + reflection     2,792,565             126,094,828    553,664        0.198

Image 1: 8 spheres, 4 planes
  simple                     490,699              22,300,008    550,111        1.121
  full reflection          1,802,813              78,305,473    575,571        0.319
  full anti-aliasing       4,413,303             200,596,329    550,023        0.125
  full AA + reflection    16,213,574             704,338,405    575,489        0.035

Image 2: 13 spheres, no planes
  simple                     353,528              22,544,244    392,038        1.109
  full reflection            547,627              33,850,568    404,444        0.739
  full anti-aliasing       3,182,440             203,017,700    391,892        0.123
  full AA + reflection     4,929,447             304,791,218    404,330        0.082

Image 2: 13 spheres, 4 planes
  simple                     492,358              28,868,738    426,376        0.866
  full reflection          1,804,498             102,301,169    440,977        0.244
  full anti-aliasing       4,428,189             259,797,987    426,118        0.096
  full AA + reflection    16,225,005             920,108,168    440,845        0.027

Best case: 320x240
  no features                108,422               4,678,168    579,404        5.344

Table 2


Figure 12


The number of rays per second varied between roughly 390,000 and 580,000 depending on the scene. As expected, both the number of spheres and the number of planes have a big impact on the performance of the ray tracer, and several relationships can be observed from the graphs above. The number of rays per second depends mostly on the number of spheres and planes and much less on which features are turned on. The frame rate, on the other hand, depends mostly on which features are turned on. For the same scene, turning on both 8x anti-aliasing and 3 levels of reflection can reduce the frame rate by more than a factor of 30 compared with the simple case (for example, from 1.121 to 0.035 frames/sec for Image 1 with 4 planes).
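Both metrics in Table 2 are derived from the cycle counts at the 25 MHz clock, which explains why they respond differently to the features: rays per second depends only on how many cycles each ray costs, while frames per second also falls as anti-aliasing and reflections multiply the number of rays per frame. The Python sketch below is our illustration (the function names are hypothetical); it recomputes two Image 1 rows from their rays/frame and cycles/frame values.

```python
CLOCK_HZ = 25_000_000  # the 25 MHz clock the Table 2 cycle counts refer to


def frames_per_second(cycles_per_frame: int) -> float:
    """One frame takes cycles_per_frame clock cycles."""
    return CLOCK_HZ / cycles_per_frame


def rays_per_second(rays_per_frame: int, cycles_per_frame: int) -> float:
    """Rays traced per frame times frames per second."""
    return rays_per_frame * CLOCK_HZ / cycles_per_frame


# Image 1 (8 spheres, 4 planes) from Table 2: (rays/frame, cycles/frame).
simple = (490_699, 22_300_008)
full_aa_reflection = (16_213_574, 704_338_405)

for name, (rays, cycles) in [("simple", simple), ("full AA + reflection", full_aa_reflection)]:
    print(f"{name}: {rays_per_second(rays, cycles):,.0f} rays/s, "
          f"{frames_per_second(cycles):.3f} frames/s")

# Frame-rate ratio between the two configurations of the same scene.
slowdown = frames_per_second(simple[1]) / frames_per_second(full_aa_reflection[1])
print(f"frame-rate slowdown: {slowdown:.1f}x")
```

It reproduces the table's 550,111 rays/sec for the simple case and gives a frame-rate slowdown of about 31.6x for full anti-aliasing plus reflection, while rays per second changes by only a few percent.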

Hardware Usage
In the end, we had to limit our design because of the limited number of logic elements (LEs) available on the DE2 board. The Cyclone II EP2C35 has 33,216 LEs in total, and the NIOS II CPU uses about 1,400 to 1,800 of them, which leaves roughly 31,400 to 31,800 LEs for the ray tracer hardware. We had to reduce the number of spheres that can be drawn to 4 so that the design would fit on the FPGA. Below are some statistics from the compilation report of our final design.

Family                   Cyclone II
Device                   EP2C35F672C6
Total logic elements     32,529 / 33,216 (98%)
Total registers          6,348
Total memory bits        94,688 (20%)
Total PLLs               2 / 4 (50%)

Table 3
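The percentages in Table 3 and the logic budget mentioned above can be checked with a few lines of arithmetic. In this Python sketch, the 483,840 total memory bits of the EP2C35 come from the Cyclone II device datasheet rather than from the report, and treating the NIOS II estimate of 1,400 to 1,800 LEs as part of the 32,529 used LEs is our assumption.

```python
TOTAL_LES = 33_216           # logic elements in the EP2C35 (Table 3)
USED_LES = 32_529            # LEs used by the final design (Table 3)
TOTAL_MEMORY_BITS = 483_840  # EP2C35 embedded RAM bits (from the datasheet, not the report)
USED_MEMORY_BITS = 94_688    # memory bits used (Table 3)
NIOS_LES = (1_400, 1_800)    # approximate NIOS II footprint quoted in the text


def percent(used: int, total: int) -> float:
    """Utilization as a percentage of the device total."""
    return 100.0 * used / total


print(f"logic:  {percent(USED_LES, TOTAL_LES):.1f}%")
print(f"memory: {percent(USED_MEMORY_BITS, TOTAL_MEMORY_BITS):.1f}%")
for nios in NIOS_LES:
    # Assumption: the NIOS II LEs are included in USED_LES.
    print(f"NIOS II at {nios:,} LEs -> about {USED_LES - nios:,} LEs of ray tracer logic")
print(f"spare LEs: {TOTAL_LES - USED_LES:,}")
```

With only 687 LEs left unused, even one more sphere's worth of intersection hardware would not have fit.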