Design Difficulties
Our biggest design challenge turned out to be signed
arithmetic. We had to manage operand signs explicitly in every
multiplication: make both operands positive, multiply, and then correct
the sign of the result if necessary. We also found that the arithmetic
shift operator '>>>' did not always work as expected.
Compile times approached half an hour by the end of the project, which
made debugging very difficult. Meeting the clock cycle timing requirements
while packing as many calculations as possible into a given state was
also tricky at times. Debugging on a per-pixel basis was also impossible
once a large number of spheres, shadows, and reflections were added, so
we had to rely mostly on the VGA output to see whether our changes were
correct. Finally, toggling one of the input switches would occasionally
make the state machine jump to a state that it was never designed to
reach. To solve this, we made every unused state return to the initial
state for the ray.
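The sign handling described above can be sketched in software. This is only a minimal Python model of the idea, not our hardware: the 24-bit word width, the 12 fractional bits, and all function names are assumptions chosen for illustration and are not taken from the design.

```python
WIDTH = 24        # assumed word width in bits (hypothetical, not from the design)
FRAC_BITS = 12    # assumed number of fractional bits (hypothetical)
MASK = (1 << WIDTH) - 1
SIGN_BIT = 1 << (WIDTH - 1)


def negate(word):
    """Two's-complement negation of a WIDTH-bit word."""
    return (~word + 1) & MASK


def signed_fixed_mul(a, b):
    """Multiply two fixed-point words by handling the signs explicitly."""
    a_neg = bool(a & SIGN_BIT)
    b_neg = bool(b & SIGN_BIT)
    # Step 1: make both operands positive.
    a_mag = negate(a) if a_neg else a
    b_mag = negate(b) if b_neg else b
    # Step 2: unsigned multiply, then drop the extra fractional bits.
    product = ((a_mag * b_mag) >> FRAC_BITS) & MASK
    # Step 3: fix the sign when exactly one operand was negative.
    return negate(product) if a_neg != b_neg else product


def from_float(x):
    """Encode a float as a WIDTH-bit two's-complement fixed-point word."""
    return int(round(x * (1 << FRAC_BITS))) & MASK


def to_float(word):
    """Decode a WIDTH-bit two's-complement fixed-point word to a float."""
    word &= MASK
    signed = word - (1 << WIDTH) if word & SIGN_BIT else word
    return signed / (1 << FRAC_BITS)


if __name__ == "__main__":
    print(to_float(signed_fixed_mul(from_float(-1.5), from_float(2.25))))
```

The sketch ignores overflow and the most negative representable value, whose negation does not fit in the same width; a real design would have to decide how to treat both.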
On the software side, one of the biggest problems was timing
and handshaking between the hardware and the software. Problems included
sphere data being assigned to the wrong sphere registers, and the first
or the last sphere being cut off because of timing problems. These problems
were all fixed once we handed control of the interface over to the
software: the software tells the hardware when the data is valid and which
sphere register it should be stored in.
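The software-controlled handshake can be illustrated with a toy model. The sketch below is written in Python purely for illustration; the class, the signal names (`data_valid`, `sphere_index`), the three-step strobe sequence, and the register count are all assumptions and do not reflect the actual interface between the CPU and the ray tracer.

```python
from dataclasses import dataclass, field

NUM_SPHERE_REGISTERS = 4  # illustrative register count (assumption)


@dataclass
class SphereRegisterFile:
    """Toy model of the hardware side: it stores data only when told to."""
    registers: list = field(
        default_factory=lambda: [None] * NUM_SPHERE_REGISTERS
    )

    def clock(self, data_valid, sphere_index, sphere_data):
        # The hardware latches data only while the software asserts
        # data_valid, and the software (not a hardware counter) selects
        # which register receives it.
        if data_valid:
            self.registers[sphere_index] = sphere_data


def send_sphere_list(hw, spheres):
    """Software side: drive the index and the valid flag for each sphere."""
    for index, sphere in enumerate(spheres):
        hw.clock(False, index, sphere)  # set up index and data, valid low
        hw.clock(True, index, sphere)   # assert valid: hardware stores it
        hw.clock(False, index, sphere)  # release valid before moving on


if __name__ == "__main__":
    hw = SphereRegisterFile()
    send_sphere_list(hw, [(0, 0, 5, 1), (1, 1, 6, 2)])
    print(hw.registers)
```

Because the register index travels with the data, a late or early start on the hardware side cannot shift spheres into the wrong registers or drop the first or last one, which is the class of problem described above.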
Statistical Analysis of Ray Tracer
This section presents some performance statistics for our ray tracer.
A total of 17 scenes were drawn to show the speed of the ray tracer in
terms of rays per second and frames per second. These 17 scenes comprise
four configurations of each of four different scenes, all at a resolution
of 512x480, plus a best-case scene (a low-resolution 320x240 scene with
no features turned on). Note that the times below do not include
the time the hardware waits for the software to send the sphere list.
| Scene | Configuration | Rays per frame | Cycles (25 MHz) per frame | Rays/sec | Frames/sec |
|---|---|---|---|---|---|
| Image 1, 8 spheres, no plane | simple | 277,814 | 12,662,119 | 548,514 | 1.974 |
| Image 1, 8 spheres, no plane | full reflection | 310,175 | 14,003,794 | 553,734 | 1.785 |
| Image 1, 8 spheres, no plane | full anti-aliasing | 2,500,295 | 113,981,083 | 548,401 | 0.219 |
| Image 1, 8 spheres, no plane | full AA + reflection | 2,792,565 | 126,094,828 | 553,664 | 0.198 |
| Image 1, 8 spheres, 4 planes | simple | 490,699 | 22,300,008 | 550,111 | 1.121 |
| Image 1, 8 spheres, 4 planes | full reflection | 1,802,813 | 78,305,473 | 575,571 | 0.319 |
| Image 1, 8 spheres, 4 planes | full anti-aliasing | 4,413,303 | 200,596,329 | 550,023 | 0.125 |
| Image 1, 8 spheres, 4 planes | full AA + reflection | 16,213,574 | 704,338,405 | 575,489 | 0.035 |
| Image 2, 13 spheres, no planes | simple | 353,528 | 22,544,244 | 392,038 | 1.109 |
| Image 2, 13 spheres, no planes | full reflection | 547,627 | 33,850,568 | 404,444 | 0.739 |
| Image 2, 13 spheres, no planes | full anti-aliasing | 3,182,440 | 203,017,700 | 391,892 | 0.123 |
| Image 2, 13 spheres, no planes | full AA + reflection | 4,929,447 | 304,791,218 | 404,330 | 0.082 |
| Image 2, 13 spheres, 4 planes | simple | 492,358 | 28,868,738 | 426,376 | 0.866 |
| Image 2, 13 spheres, 4 planes | full reflection | 1,804,498 | 102,301,169 | 440,977 | 0.244 |
| Image 2, 13 spheres, 4 planes | full anti-aliasing | 4,428,189 | 259,797,987 | 426,118 | 0.096 |
| Image 2, 13 spheres, 4 planes | full AA + reflection | 16,225,005 | 920,108,168 | 440,845 | 0.027 |
| Best case | | 108,422 | 4,678,168 | 579,404 | 5.344 |
Table 2
Figure 12
The number of rays per second varied between roughly 390,000 and 580,000
depending on the specific scene. As expected, the number of spheres and
planes both have a big impact on the performance of the ray tracer. Certain
relationships can be observed from the graphs above. The number of rays per
second is affected more by the number of spheres and planes, and less by
which features are turned on. The number of frames per second, on the other
hand, is affected mostly by which features are turned on. For the same
scene, turning on both 8 times anti-aliasing and 3 levels of reflection can
reduce the frame rate by as much as a factor of 30 compared with the
standard case.
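The two rate columns follow directly from the rays per frame, the cycles per frame, and the 25 MHz clock. The short Python sketch below shows that arithmetic using values from the table; the function names are ours and are used only for illustration.

```python
CLOCK_HZ = 25_000_000  # 25 MHz clock, as stated in the table header


def frames_per_second(cycles_per_frame):
    """Frames per second = clock frequency / cycles needed per frame."""
    return CLOCK_HZ / cycles_per_frame


def rays_per_second(rays_per_frame, cycles_per_frame):
    """Rays per second = rays per frame * frames per second."""
    return rays_per_frame * CLOCK_HZ / cycles_per_frame


if __name__ == "__main__":
    # Image 1, 8 spheres, no plane, simple configuration.
    print(round(frames_per_second(12_662_119), 3))
    print(round(rays_per_second(277_814, 12_662_119)))

    # Image 1, 8 spheres, 4 planes: simple vs. full AA + reflection.
    simple = frames_per_second(22_300_008)
    full = frames_per_second(704_338_405)
    print(round(simple / full, 1))  # slowdown factor between the two
```

For the scene with 8 spheres and 4 planes, the ratio of the two frame rates comes out a little above 31, which is where the factor of about 30 comes from.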
Hardware Usage
In the end we had to limit our design because of the limited
number of logic elements (LEs) available to us on the DE2 board. The NIOS II
CPU uses about 1400-1800 of the 33216 LEs on the FPGA, which leaves roughly
31,400 to 31,800 LEs for the ray tracer hardware. We had to reduce the
number of spheres that can be drawn to 4 so that our design could fit on
the FPGA. Below are some statistics from the compilation report of our
final design.
| Item | Value |
|---|---|
| Family | Cyclone II |
| Device | EP2C35F672C6 |
| Total logic elements | 32529/33216 (98%) |
| Total registers | 6348 |
| Total memory bits | 94688 (20%) |
| Total PLLs | 2/4 (50%) |
Table 3