ECE 5760 Final Report: Gravity Simulator
By Zilin Wang (zw543), Yilu Zhou (yz2797), and Haolin Li (hl992)
This project aims to develop a 2D gravitational simulator that uses FPGA technology to simulate gravitational interactions between objects with different masses. The main objective is to create a highly scalable simulator that handles as many objects as possible while computing their gravitational interactions as accurately as possible. The project also aims to develop a user-friendly interface where users can see information on selected objects. The resulting simulator will provide a valuable tool for understanding and visualizing the complex dynamics of gravitational interactions.
This project aims to develop a 2D gravitational simulator that utilizes FPGA technology to provide a highly scalable, accurate, and user-friendly platform for simulating gravitational interactions between objects with different masses. The simulator will be versatile and can be used for various purposes, such as an educational tool for physics or astronomy courses or a career development opportunity for students interested in FPGA programming, simulation modeling, and user interface design.
The simulator will incorporate Newton's law of gravity to accurately simulate the interactions between multiple objects with varying masses. Users can interact with the simulator through a user-friendly interface that provides information about the simulated objects, their masses, and their interactions. Overall, this project has the potential to contribute to the fields of physics and astronomy by providing a powerful and accurate simulation tool that can aid in understanding the complex dynamics of gravitational interactions.
The simulator's underlying design is based on Newton's Law of Gravity, which states that the force of gravity between two objects is proportional to the product of their masses and inversely proportional to the square of the distance between them. The gravitational constant $G$ is a fundamental constant that determines the strength of the gravitational force between two objects. The mathematical expression for Newton's Law of Gravity is $F = G \frac{m_1 m_2}{r^2}$, where $m_1$ and $m_2$ are the masses of the objects, and $r$ is the distance between the objects.
The 2D gravitational simulator is designed to run on the DE1-SoC platform. It uses custom 27-bit floating-point hardware to power the Gravity Computation Engines, which perform the gravitational calculations between objects. The simulation uses Newton's Law of Gravity to calculate the gravitational forces between groups of objects, and Euler integration to accumulate the contributions from different objects into a net velocity.
The Gravity Computation Engines consist of eight modules, each of which can accept one external object as input to calculate its force applied on another object preloaded into the compute module. This means that the computation module can simultaneously calculate eight pairs of objects. The Gravity Engine preloads one object to all eight computation modules and passes eight different external objects to calculate the acceleration applied to this object. Once the calculations are complete, the acceleration is sent to the HPS side for integration. A new set of eight external objects is sent to all modules for further calculations until all external objects have been used. Then, the second object is preloaded to eight computing modules to repeat the previous processes.
In summary, the simulator has been optimized for the DE1-SoC platform and uses custom 27-bit floating-point hardware to power the Gravity Computation Engines. It applies Newton's Law of Gravity and Euler integration to calculate gravitational forces and generate net velocity. The Gravity Computation Engines consist of eight modules that together calculate eight pairs of objects simultaneously. Each module is preloaded with the same object, receives a different external object, and computes the acceleration that the external object applies to the preloaded one. The process is repeated until all external objects have been used. Below is the high-level logical structure of the system.
Figure 1. High-level logical structure diagram
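The preload-and-stream schedule described above can be modelled in a few lines of software. The following is a minimal sketch, not the project's code: `NUM_ENGINES`, `batches_per_target`, and `total_batches` are hypothetical names, and the sketch assumes that an object is never paired with itself.

```c
#include <assert.h>

#define NUM_ENGINES 8 /* compute modules working in parallel (from the report) */

/* Rounds needed to pair one preloaded object with every other object when
 * NUM_ENGINES external objects are processed per round.
 * Assumption: an object is not paired with itself. */
unsigned batches_per_target(unsigned n_objects) {
    assert(n_objects >= 1);
    unsigned externals = n_objects - 1;
    return (externals + NUM_ENGINES - 1) / NUM_ENGINES; /* ceiling division */
}

/* Rounds per frame: every object takes one turn as the preloaded object. */
unsigned total_batches(unsigned n_objects) {
    return n_objects * batches_per_target(n_objects);
}
```

Under these assumptions, 72 particles need 9 rounds per preloaded object (71 external objects, eight at a time) and 648 rounds per frame.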
The complex nature of gravitational simulations, which require the calculation of interactions between multiple objects, demands significant computational resources. Traditional software-based solutions cannot efficiently perform these calculations in parallel, leading to significant performance bottlenecks. This is where FPGA technology comes in, providing a highly scalable and efficient platform for parallel computing.
By using FPGA technology, we can calculate the accelerations between many pairs of objects in parallel. However, since the software runs on the HPS and the hardware is on the FPGA, the communication between them over the bus limits the speed of calculations. As a compromise, we designed the simulator to run the gravity calculations in hardware and the integration in software. This trade-off allows us to take advantage of the speed of hardware acceleration for complex calculations while retaining the ease of software for simpler tasks, such as integration.
It is true that while putting everything on the hardware side would result in faster calculations, it would also significantly increase the complexity of the implementation, making it more challenging to debug. Nonetheless, our design aims to balance speed and ease of development, resulting in a highly scalable and efficient 2D gravitational simulator.
No known existing patents, copyrights, or trademarks are relevant to the design of the FPGA gravitational simulator.
We wrote the C code so that it could complete the following tasks:
The C program was designed to control the compute module via a reset signal. At the positive edge of the reset signal, the compute module will save the values from the AXI bus to the registers inside the compute module.
We also implemented some other features on the HPS side. As in Lab 1, a pthread reads keyboard input, and that input changes the simulation speed shown on the VGA display. By selecting a different simulation speed, the user trades simulation speed against simulation accuracy. More specifically, with a smaller time step the particles move more slowly on screen, because each step advances a particle according to the Euler position update:
$$x_{t+\Delta t} = x_t + v_t \, \Delta t$$
If the user increases the time step, the software needs fewer steps, and therefore less time, to cover the same simulated period t. The cost is a loss of accuracy.
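The effect of the time step can be illustrated with a single Euler step. This is a sketch under assumptions rather than the project's integration code: `body_state` and `euler_step` are hypothetical names, and the ordering (position first, then velocity) is the plain forward-Euler choice, which the report does not specify.

```c
#include <assert.h>

/* Position and velocity of one particle in 2D (hypothetical layout). */
typedef struct {
    float x, y;   /* position */
    float vx, vy; /* velocity */
} body_state;

/* One forward-Euler step: the position advances with the current velocity,
 * then the velocity advances with the supplied net acceleration. */
void euler_step(body_state *b, float ax, float ay, float dt) {
    assert(b != 0);
    b->x += b->vx * dt;
    b->y += b->vy * dt;
    b->vx += ax * dt;
    b->vy += ay * dt;
}
```

With the same velocity, a step of `dt = 10` moves a particle twice as far per frame as `dt = 5`, which is why a larger time step looks faster on screen but follows curved trajectories less closely.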
Moreover, the user can get the relative location and velocity for any pair of particles by entering their particle numbers, and the selected particles are marked on the VGA display. This is possible because all the particle information is kept on the HPS side. The user can also track any single particle by entering its particle number. The trickiest part of the software design was ensuring that the correct memory address was used for each compute module. We double-checked the memory mapping to ensure that the correct addresses were used.
We used the 27-bit floating-point module developed and tested by previous groups [1]. Given the limited hardware resources on the FPGA, a 32-bit floating-point module would waste resources, so the 27-bit format is the better choice because it balances resource utilization and accuracy. We used the following 27-bit floating-point computation modules:
● FpNegate: Change the sign of the floating point number.
● FpInvSqrt: Calculate the inverse square root using the fast inverse square root method.
● FpMul: Calculate the product of 2 floating point numbers.
● FpAdd: Calculate the sum of 2 floating point numbers.
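For readers unfamiliar with the fast inverse square root method named in the list above, here is its classic software form for 32-bit floats. It is only an illustration: `fast_inv_sqrt` is a hypothetical name, and whether the 27-bit hardware module uses the same constant or the same Newton refinement step is not stated in this report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0 using the well-known bit-level initial
 * guess followed by one Newton-Raphson refinement step. */
float fast_inv_sqrt(float x) {
    assert(x > 0.0f);
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);    /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1);  /* initial guess from the exponent/mantissa */
    memcpy(&y, &bits, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson iteration */
    return y;
}
```

The appeal for hardware is that the method needs only a shift, a subtraction, and a few multiplications, with no divider or iterative square-root unit.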
The gravitational compute module in this project has been fully implemented on the DE1-SoC board. The fundamental design of our compute module is based on the GRAPE-6, a specialized and massively-parallel computer used for astrophysical N-body simulations [2]. Our module implements the calculation of the acceleration of object i due to the gravitational interaction with object j using the equation
$$\vec{a}_{ij} = \frac{G \, m_j \, \vec{r}_{ij}}{(r^2 + \epsilon^2)^{3/2}}$$
Here, $\vec{r}_{ij}$ is the vector from object i to object j, $m_j$ is the mass of object j, the parameter $r$ denotes the distance between the two objects, and $\epsilon$ is a minute value introduced to prevent the denominator from approaching zero. With this module, we can accurately compute the acceleration of object i resulting from the gravitational interaction with object j. It is important to note that our simulation is a 2D simulation, thus requiring the calculation of two acceleration components, i.e., x and y. Below, we provide a detailed implementation diagram of our hardware implementation of this equation.
Figure 2. Implementation of acceleration between two objects due to gravitational interaction.
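As a software reference for the softened pairwise acceleration, the sketch below computes both components for one pair. It is an illustration under assumptions, not a model of the Verilog: `accel2d`, `inv_sqrt`, and `pair_accel` are hypothetical names, the softening enters as `eps * eps` inside the square root, and the gravitational constant is passed in as `g` because the report does not say where the hardware applies it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Acceleration contribution in 2D, one component per axis. */
typedef struct {
    float ax;
    float ay;
} accel2d;

/* Software stand-in for an inverse-square-root unit: a bit-level first
 * guess refined by three Newton-Raphson steps. Requires v > 0. */
static float inv_sqrt(float v) {
    uint32_t bits;
    float y;
    memcpy(&bits, &v, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    for (int k = 0; k < 3; ++k) {
        y = y * (1.5f - 0.5f * v * y * y);
    }
    return y;
}

/* Acceleration of object i at (xi, yi) caused by object j of mass mj at
 * (xj, yj): a = g * mj * d / (|d|^2 + eps^2)^(3/2), with d = pos_j - pos_i. */
accel2d pair_accel(float g, float mj, float xi, float yi,
                   float xj, float yj, float eps) {
    assert(eps > 0.0f);
    float dx = xj - xi;
    float dy = yj - yi;
    float inv_r = inv_sqrt(dx * dx + dy * dy + eps * eps);
    float factor = g * mj * inv_r * inv_r * inv_r; /* g * mj / r^3 */
    accel2d a = { factor * dx, factor * dy };
    return a;
}
```

Because of the softening term, two objects at the same position yield a zero acceleration instead of a division by zero.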
Figure 3. Gravitational Compute Module State Machine
The gravitational compute module comprises 17 states, each with its designated function. The first state, State 0, is the reset state, where object 1 and object 2's properties are inputted, and the adder clock is set to 0. States 1 through 14 control the adder clock to perform adding operations. The 15th state is designated for computing the inverse square root, which takes five cycles to complete. Finally, states 16 to 17 are responsible for writing the results to the register and preparing for the next cycle of operations. Below is the RTL view of one gravitation compute module.
Figure 4. RTL view for one gravitation compute module.
This project involves the implementation of a VGA video display at a resolution of 640x480, using SDRAM as a frame buffer to facilitate 16-bit color representation. The top-level Verilog contains only the connections between the Qsys exported signals and the I/O pins, with no additional logic.
Figure 5: VGA Implementation.
Inside the VGA subsystem, the vga_pixel_dma module address is set to the SDRAM base, 0x00000000. The address mode is set to consecutive, and the color space is changed to 16-bit.
The dual-clock FIFO module's color width is changed to 16 bits. The RGB resampler is updated to support 16-bit input. The VGA DMA controller's output in the top-level Qsys is disconnected from the on-chip SRAM and linked exclusively to SDRAM. The HPS AXI-bus master stays connected to SDRAM, enabling the HPS to read and write the VGA screen.
The AXI-bus master base address is set to 0xC000_0000, which the HPS C program uses for high-speed I/O to the FPGA. The light-weight AXI-bus base address is 0xFF20_0000, which the HPS C program uses for low-speed control I/O to the FPGA.
Graphics routines are defined as follows:
● VGA_text: Displays text at specified positions.
● VGA_text_clear: Clears the text buffer but not graphics.
● VGA_box: Draws a box with specified corners and color.
● VGA_line: Draws a line between two specified points with a specified color.
● VGA_disc: Draws a disc with specified center, radius, and color.
A pixel writing macro is modified to allow 16-bit writes to the bus using the consecutive format.
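The 16-bit consecutive addressing can be sketched as follows. The helper names (`pack_rgb565`, `pixel_byte_offset`, `put_pixel`) are hypothetical; the bit layout (5 bits red, 6 bits green, 5 bits blue) and the two-bytes-per-pixel, 640-pixels-per-row offset follow the color definitions and pixel macro in the code listing at the end of this report.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_WIDTH 640
#define SCREEN_HEIGHT 480

/* Pack a color into 16 bits: red in bits 15..11, green in 10..5, blue in 4..0. */
uint16_t pack_rgb565(unsigned r, unsigned g, unsigned b) {
    assert(r < 32 && g < 64 && b < 32);
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Byte offset of pixel (x, y) from the frame-buffer base in consecutive
 * mode: rows are stored back to back, two bytes per pixel. */
uint32_t pixel_byte_offset(unsigned x, unsigned y) {
    assert(x < SCREEN_WIDTH && y < SCREEN_HEIGHT);
    return (uint32_t)(y * SCREEN_WIDTH + x) << 1;
}

/* Write one pixel into a frame buffer viewed as an array of 16-bit words. */
void put_pixel(uint16_t *framebuffer, unsigned x, unsigned y, uint16_t color) {
    framebuffer[pixel_byte_offset(x, y) >> 1] = color;
}
```

A full 640x480 frame therefore occupies 614,400 bytes, and the last pixel sits at byte offset 614,398.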
The Qsys design layout depicts the communication setup between the HPS and FPGA sides. The process involves preloading the properties of different objects on the HPS side and sending them to the FPGA side to calculate the x and y acceleration of one object acting on another object. The resulting acceleration values are then returned to the HPS side for integration into the net velocity.
To achieve this, the HPS sends the mass, x coordinates, y coordinates, and object numbers of the first and second objects to the FPGA side as inputs for one of the eight compute modules. After all properties have been updated, the compute module begins the calculation upon receiving a reset signal. Once the calculation is completed, the corresponding complete signal from the compute module is sent back to the HPS side, along with the x and y acceleration values calculated by the module. The HPS side receives this signal and identifies which of the eight compute modules sent it. It then integrates the acceleration values and sends new properties to the module for the next computation. All eight compute modules simultaneously receive data and compute the acceleration. Below is one compute module sample Qsys connection.
Figure 6. Qsys layout for one compute module.
We first tried to implement the whole design entirely on the FPGA side so that the AXI bus would not be a bottleneck. To achieve this, the compute module on the FPGA side needed to compute the next location and the next velocity and modify the content in the VGA memory. We integrated all the floating-point modules into one compute module and read the computed values from its output ports. We also designed a complex state machine with around 30 states to handle data fetching, clock generation for the floating-point adder, and write-back to M10K memory. To eliminate the waiting time for memory writes, we also implemented a linked list in Verilog so that each compute module would generate results at different times. One major challenge was that the complexity of the design made debugging hard, and the waveforms generated by the models were hard to verify. Despite this, we managed to integrate one compute module into the design to verify that it was actually synthesizable. After synthesis, we abandoned the design because a single compute module consumed around 35% of all available logic elements. This meant the design was unlikely to be the most efficient one, because we could fit at most three complete compute modules.
After the first attempt, we decided to simplify each compute module so that more of them would fit on the FPGA. Instead of using the FPGA side to store all the information on the particles, we decided to use the HPS side to store all the particle information. We developed a C program on the HPS to control the linked list. On the positive edge of the control clock, the linked list would append the new value to the head of the linked list. Then, the compute module would compute the new location, velocity, and acceleration and output these back to the HPS side via the AXI bus. Although the new design introduced a bottleneck on the AXI bus, we could potentially put more compute modules on the FPGA side to improve the parallelism of the design. However, we later found that the linked list was limiting computation speed because only one new value could be assigned to the linked list at a time. To solve this problem, we simplified the design and eliminated the linked list. The new design connected each compute module directly to the AXI bus so that a new value could be sent directly to each module with improved throughput.
Based on the hardware and software co-design described earlier, at most eight compute modules fit on the DE1-SoC development board. A compilation report was generated to analyze the resource utilization of the design. The report showed that all 87 DSP blocks were utilized due to the large number of floating-point calculations required for the simulation. Therefore, the number of available DSP blocks on the board is the main constraint for our gravitation simulator. The simulation is capable of updating 72 particles at a rate of about 8 frames per second, which meets our expectations. The frame rate is limited by the pairwise force calculation, which has a time complexity of O(n^2).
Figure 7: Quartus Compilation Report.
Figure 8: VGA Display.
To test the functionality of the compute module, we utilized ModelSim, a simulation and verification tool. Since the hardware uses a 27-bit floating point, we modified the input and output ports to accommodate a 32-bit floating point for easy viewing and analysis in ModelSim. The testing process involved defining the initial conditions and comparing the calculated results with those obtained from a calculator to ensure the accuracy and correctness of the compute module.
Figure 9: ModelSim Result.
Number of particles | Frame time [ms]
16 | 6.6
32 | 26
64 | 103
72 | 130
128 | 413
1024 | 26446
Table 1: Frame Time vs. Number of Particles.
Figure 10: Frame Time vs. Number of Particles Chart.
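The measurements in Table 1 can be checked against quadratic scaling with a short calculation. The helper names below (`predict_frame_ms`, `frames_per_second`) are hypothetical, and the model assumes that frame time is dominated by the pairwise calculation.

```c
#include <assert.h>

/* Predict the frame time at n_target particles from a measurement taken at
 * n_ref particles, assuming the time grows with the square of the count. */
double predict_frame_ms(double t_ref_ms, unsigned n_ref, unsigned n_target) {
    assert(n_ref > 0);
    double ratio = (double)n_target / (double)n_ref;
    return t_ref_ms * ratio * ratio;
}

/* Convert a frame time in milliseconds to frames per second. */
double frames_per_second(double frame_ms) {
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}
```

Scaling the 128-particle measurement (413 ms) to 1024 particles predicts 26,432 ms, close to the measured 26,446 ms. Scaling the 64-particle measurement (103 ms) to 72 particles predicts about 130.4 ms, matching the measured 130 ms, which corresponds to roughly 7.7 frames per second.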
To optimize the frame time, we attempted to incorporate as many compute modules as possible on the FPGA. Initially, we aimed to integrate 20 compute modules; however, we soon realized that this led to an exhaustion of logic elements on the hardware. Consequently, we had to reduce the number of compute modules to eight to fit within the available resources.
Since we used the 27-bit floating point on the FPGA side, we expect the accuracy to be lower than that of 32-bit floating point. Because the masses and distances we simulate are large in magnitude, we considered this loss of accuracy negligible.
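To give a feel for the size of the precision loss, the sketch below truncates a 32-bit float to 27 bits. It rests on an assumption that this report does not confirm: that the 27-bit format keeps the sign and 8-bit exponent of an IEEE 754 single and simply drops the 5 lowest fraction bits, leaving 18. The name `truncate_to_27_bits` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DROPPED_BITS 5 /* 32-bit width minus 27-bit width (assumed layout) */

/* Clear the lowest DROPPED_BITS fraction bits of an IEEE 754 single,
 * emulating a format with an 18-bit fraction (assumption, see lead-in). */
float truncate_to_27_bits(float x) {
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~((UINT32_C(1) << DROPPED_BITS) - 1u);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Under that assumption, the spacing between representable values near 1.0 grows from 2^-23 (about 1.2e-7) to 2^-18 (about 3.8e-6), so a value such as 1 + 2^-20 collapses to 1.0 while 1 + 2^-18 survives.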
Users can interact with the command line by following the action prompts displayed. They have several options available: they can either speed up or slow down the animation, achieved by adjusting the time step in integration. Additionally, they can choose to pause the animation. If users want to compare the relative speed and position of two specific particles, they can do so by entering the respective particle's ID numbers, which will then be highlighted within a circle. Users also have the ability to display the position, velocity, and mass of a selected particle. This is done by entering the particle's ID number, which will consequently be accentuated within a circle for easy viewing. All commands have text prompts, which are straightforward and easy to understand for the user.
To conclude, the development of the gravitational simulator was a successful endeavor, leading to a functional and user-friendly software tool that enables individuals to simulate and visualize gravitational interactions between objects. The project aimed to offer an educational and interactive platform for understanding the concepts of gravity and its impacts on celestial bodies, and it has achieved this objective.
Throughout the project, different components were designed and implemented, including the software architecture, the user interface, the gravitational compute module, and the visualization capabilities using the VGA display. These components were integrated to create a cohesive and robust gravitational simulator. We used the floating-point calculation modules provided by previous groups who worked on a similar project; our design would not have been possible without them. We also used Altera's VGA subsystem to implement the output to the display, and the SDRAM and SRAM on the SoC to store the pixel color information and reduce the frame time. Although our design did not follow the GRAPE design, the GRAPE design showed us that using an FPGA to accelerate gravitational simulation was possible.
The project's outcomes have shown the efficiency and functionality of the simulator. Users had the ability to easily interact with the software, simulate gravitational scenarios, and observe the resulting effects. Feedback from users suggested a high level of satisfaction with the software's performance, interface design, and user-friendly controls.
Furthermore, the project discovered optimizations to enhance the simulator's efficiency and accuracy. By executing efficient algorithms and leveraging hardware acceleration techniques, the simulator attained real-time simulation abilities despite a large number of objects.
While the project was successful in meeting its goals, there are potential areas for future improvement. These include expanding the simulator's abilities by incorporating extra physics simulations and interactions, such as gravitational slingshots and orbital mechanics. Additional improvements might also be made to the visualization, such as more advanced rendering methods and customizable visual effects.
The group approves this report for inclusion on the course website. The group approves the video for inclusion on the course YouTube channel.
Throughout the entire duration of the project, we ensured an equitable distribution of work. Although the following breakdown provides an approximate distribution, each member participated in all tasks.
Zilin Wang | Haolin Li | Yilu Zhou
Gravitational module design | Bus communication design | VGA graph implementation
HPS side C implementation | HPS side C implementation | HPS side C implementation
FPGA and HPS debug | FPGA and HPS debug | FPGA and HPS debug
[2] J. Makino, T. Fukushige, M. Koga, and K. Namura, “GRAPE-6: Massively-Parallel Special-Purpose Computer for Astrophysical Particle Simulations,” Publ. Astron. Soc. Jpn., vol. 55, no. 6, pp. 1163–1187, Dec. 2003, doi: 10.1093/pasj/55.6.1163.
VGA.c
// same obj1, different obj2 for all compute modules
#include <fcntl.h>
#include <math.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#define particle_number 72
#define dt 10
#define scale 100
typedef struct object {
unsigned int number;
float mass;
float x_coord;
float y_coord;
float x_vel;
float y_vel;
unsigned int color;
unsigned int radius;
unsigned int old_display_x;
unsigned int old_display_y;
} particle;
// video display
#define SDRAM_BASE 0xC0000000
#define SDRAM_END 0xC3FFFFFF
#define SDRAM_SPAN 0x04000000
// characters
#define FPGA_CHAR_BASE 0xC9000000
#define FPGA_CHAR_END 0xC9001FFF
#define FPGA_CHAR_SPAN 0x00002000
/* Cyclone V FPGA devices */
#define HW_REGS_BASE 0xff200000
// #define HW_REGS_SPAN 0x00200000
#define HW_REGS_SPAN 0x00005000
// graphics primitives
void VGA_text(int, int, char *);
void VGA_text_clear();
void VGA_box(int, int, int, int, short);
void VGA_rect(int, int, int, int, short);
void VGA_line(int, int, int, int, short);
void VGA_Vline(int, int, int, short);
void VGA_Hline(int, int, int, short);
void VGA_disc(int, int, int, short);
void VGA_circle(int, int, int, int);
// 16-bit primary colors
// layout: blue in bits 4..0, green in bits 10..5, red in bits 15..11
#define red (0 + (0 << 5) + (31 << 11))
#define dark_red (0 + (0 << 5) + (15 << 11))
#define green (0 + (63 << 5) + (0 << 11))
#define dark_green (0 + (31 << 5) + (0 << 11))
#define blue (31 + (0 << 5) + (0 << 11))
#define dark_blue (15 + (0 << 5) + (0 << 11))
#define yellow (0 + (63 << 5) + (31 << 11))
#define cyan (31 + (63 << 5) + (0 << 11))
#define magenta (31 + (0 << 5) + (31 << 11))
#define black (0x0000)
// the red field is 5 bits wide, so its value must not exceed 31
#define gray (15 + (31 << 5) + (15 << 11))
#define white (0xffff)
int colors[] = {red, dark_red, green, dark_green, blue, dark_blue,
                yellow, cyan, magenta, gray, black, white};
// pixel macro
#define VGA_PIXEL(x, y, color) \
    do { \
        int *pixel_ptr; \
        pixel_ptr = (int *)((char *)vga_pixel_ptr + (((y)*640 + (x)) << 1)); \
        *(short *)pixel_ptr = (color); \
    } while (0)
// the light weight buss base
void *h2p_lw_virtual_base;
// pixel buffer
volatile unsigned int *vga_pixel_ptr = NULL;
void *vga_pixel_virtual_base;
// character buffer
volatile unsigned int *vga_char_ptr = NULL;
void *vga_char_virtual_base;
// /dev/mem file id
int fd;
// measure time
struct timeval t1, t2;
double elapsedTime;
// main bus; PIO
#define FPGA_AXI_BASE 0xC0000000
#define FPGA_AXI_SPAN 0x00001000
// main axi bus base
void *h2p_virtual_base;
volatile unsigned int *axi_pio_ptr = NULL;
volatile unsigned int *axi_pio_read_ptr = NULL;
// lw bus; PIO
#define FPGA_LW_BASE 0xff200000
#define FPGA_LW_SPAN 0x00001000
// HPS_to_FPGA FIFO status address = 0
volatile unsigned int *lw_pio_ptr = NULL;
volatile unsigned int *lw_pio_read_ptr = NULL;
unsigned int *obj_1_num_a =
NULL;
float *obj_1_mass_a = NULL;
float *obj_1_x_coord_a = NULL;
float *obj_1_y_coord_a = NULL;
unsigned int *obj_2_num_a0 =
NULL;
float *obj_2_mass_a0 = NULL;
float *obj_2_x_coord_a0 = NULL;
float *obj_2_y_coord_a0 = NULL;
volatile float *obj_1_delta_x_acc_b0 =
NULL;
volatile float *obj_1_delta_y_acc_b0 =
NULL;
volatile bool *complete0 =
NULL;
unsigned int *obj_2_num_a1 =
NULL;
float *obj_2_mass_a1 = NULL;
float *obj_2_x_coord_a1 = NULL;
float *obj_2_y_coord_a1 = NULL;
volatile float *obj_1_delta_x_acc_b1 =
NULL;
volatile float *obj_1_delta_y_acc_b1 =
NULL;
volatile bool *complete1 =
NULL;
unsigned int *obj_2_num_a2 =
NULL;
float *obj_2_mass_a2 = NULL;
float *obj_2_x_coord_a2 = NULL;
float *obj_2_y_coord_a2 = NULL;
volatile float *obj_1_delta_x_acc_b2 =
NULL;
volatile float *obj_1_delta_y_acc_b2 =
NULL;
volatile bool *complete2 =
NULL;
unsigned int *obj_2_num_a3 =
NULL;
float *obj_2_mass_a3 = NULL;
float *obj_2_x_coord_a3 = NULL;
float *obj_2_y_coord_a3 = NULL;
volatile float *obj_1_delta_x_acc_b3 =
NULL;
volatile float *obj_1_delta_y_acc_b3 =
NULL;
volatile bool *complete3 =
NULL;
unsigned int *obj_2_num_a4 =
NULL;
float *obj_2_mass_a4 = NULL;
float *obj_2_x_coord_a4 = NULL;
float *obj_2_y_coord_a4 = NULL;
volatile float *obj_1_delta_x_acc_b4 =
NULL;
volatile float *obj_1_delta_y_acc_b4 =
NULL;
volatile bool *complete4 =
NULL;
unsigned int *obj_2_num_a5 =
NULL;
float *obj_2_mass_a5 = NULL;
float *obj_2_x_coord_a5 = NULL;
float *obj_2_y_coord_a5 = NULL;
volatile float *obj_1_delta_x_acc_b5 =
NULL;
volatile float *obj_1_delta_y_acc_b5 =
NULL;
volatile bool *complete5 =
NULL;
unsigned int *obj_2_num_a6 =
NULL;
float *obj_2_mass_a6 = NULL;
float *obj_2_x_coord_a6 = NULL;
float *obj_2_y_coord_a6 = NULL;
volatile float *obj_1_delta_x_acc_b6 =
NULL;
volatile float *obj_1_delta_y_acc_b6 =
NULL;
volatile bool *complete6 =
NULL;
unsigned int *obj_2_num_a7 =
NULL;
float *obj_2_mass_a7 = NULL;
float *obj_2_x_coord_a7 = NULL;
float *obj_2_y_coord_a7 = NULL;
volatile float *obj_1_delta_x_acc_b7 =
NULL;
volatile float *obj_1_delta_y_acc_b7 =
NULL;
volatile bool *complete7 =
NULL;
bool *my_reset_ptr = NULL;
bool *data_sent_ptr = NULL;
// read offset is 0x10 for both busses
// remember that each axi master bus needs unique address
#define OBJ1_NUM_A 0x00
#define OBJ1_MASS_A 0x10
#define OBJ1_X_COORD_A 0x20
#define OBJ1_Y_COORD_A 0x30
#define COMPLETE0 0xa0
#define RESET 0xc0
#define OBJ2_NUM_A0 0x40
#define OBJ2_MASS_A0 0x50
#define OBJ2_X_COORD_A0 0x60
#define OBJ2_Y_COORD_A0 0x70
#define OBJ1_DELTA_X_ACC0 0x80
#define OBJ1_DELTA_Y_ACC0 0x90
#define OBJ2_NUM_A1 0x100
#define OBJ2_MASS_A1 0x110
#define OBJ2_X_COORD_A1 0x120
#define OBJ2_Y_COORD_A1 0x130
#define COMPLETE1 0x140
#define OBJ1_DELTA_X_ACC1 0x150
#define OBJ1_DELTA_Y_ACC1 0x160
#define OBJ2_NUM_A2 0x200
#define OBJ2_MASS_A2 0x210
#define OBJ2_X_COORD_A2 0x220
#define OBJ2_Y_COORD_A2 0x230
#define COMPLETE2 0x240
#define OBJ1_DELTA_X_ACC2 0x250
#define OBJ1_DELTA_Y_ACC2 0x260
#define OBJ2_NUM_A3 0x300
#define OBJ2_MASS_A3 0x310
#define OBJ2_X_COORD_A3 0x320
#define OBJ2_Y_COORD_A3 0x330
#define COMPLETE3 0x340
#define OBJ1_DELTA_X_ACC3 0x350
#define OBJ1_DELTA_Y_ACC3 0x360
#define OBJ2_NUM_A4 0x400
#define OBJ2_MASS_A4 0x410
#define OBJ2_X_COORD_A4 0x420
#define OBJ2_Y_COORD_A4 0x430
#define COMPLETE4 0x440
#define OBJ1_DELTA_X_ACC4 0x450
#define OBJ1_DELTA_Y_ACC4 0x460
#define OBJ2_NUM_A5 0x500
#define OBJ2_MASS_A5 0x510
#define OBJ2_X_COORD_A5 0x520
#define OBJ2_Y_COORD_A5 0x530
#define COMPLETE5 0x540
#define OBJ1_DELTA_X_ACC5 0x550
#define OBJ1_DELTA_Y_ACC5 0x560
#define OBJ2_NUM_A6 0x600
#define OBJ2_MASS_A6 0x610
#define OBJ2_X_COORD_A6 0x620
#define OBJ2_Y_COORD_A6 0x630
#define COMPLETE6 0x640
#define OBJ1_DELTA_X_ACC6 0x650
#define OBJ1_DELTA_Y_ACC6 0x660
#define OBJ2_NUM_A7 0x700
#define OBJ2_MASS_A7 0x710
#define OBJ2_X_COORD_A7 0x720
#define OBJ2_Y_COORD_A7 0x730
#define COMPLETE7 0x740
#define OBJ1_DELTA_X_ACC7 0x750
#define OBJ1_DELTA_Y_ACC7 0x760
#define DATA_SENT 0x800
int delta_t = dt;
int choice = -1;
int go = -1;
int particle_1 = -1;
int particle_2 = -1;
float x_rel_vel = 0;
float y_rel_vel = 0;
float x_rel_pos = 0;
float y_rel_pos = 0;
float particle_mass = 0;
float x_coord = 0;
float y_coord = 0;
float x_vel = 0;
float y_vel = 0;
particle *p_old;
particle *p_reset;
// keyboard thread: reads user commands from the terminal
void *read1(void *arg) {
    (void)arg; // unused; the parameter is required by the pthread start-routine signature
    while (1) {
        printf("Enter 1 to change speed, enter 2 for relative speed, enter 3 for particle status:");
        scanf("%d", &choice);
        // flag = false;
        int delta_t_old = 0;
        switch (choice) {
        case 1:
            printf("Enter a value between 5 - 100 to change speed: ");
            scanf("%d", &delta_t);
            break;
        case 2:
            // pause the simulation (time step 0) while the comparison is shown
            delta_t_old = delta_t;
            delta_t = 0;
            printf("Enter two particle number you want to compare (between 0 and %d)\n", particle_number - 1);
            printf("First particle:\n");
            scanf("%d", &particle_1);
            printf("Second particle:\n");
            scanf("%d", &particle_2);
            x_rel_pos = p_old[particle_1].x_coord - p_old[particle_2].x_coord;
            y_rel_pos = p_old[particle_1].y_coord - p_old[particle_2].y_coord;
            x_rel_vel = p_old[particle_1].x_vel - p_old[particle_2].x_vel;
            y_rel_vel = p_old[particle_1].y_vel - p_old[particle_2].y_vel;
            printf("X relative position is: %fm\n", x_rel_pos);
            printf("Y relative position is: %fm\n", y_rel_pos);
            printf("X relative velocity is: %fm/s\n", x_rel_vel);
            printf("Y relative velocity is: %fm/s\n", y_rel_vel);
            printf("Continue? (1 for yes, 2 for no)\n");
            // highlight both particles on the VGA display
            VGA_circle(p_old[particle_1].old_display_x, p_old[particle_1].old_display_y, 10, green);
            VGA_circle(p_old[particle_2].old_display_x, p_old[particle_2].old_display_y, 10, green);
            // wait until the user enters 1
            while (1) {
                scanf("%d", &go);
                if (go == 1) {
                    break;
                }
            }
            // erase the highlight circles and resume the simulation
            VGA_circle(p_old[particle_1].old_display_x, p_old[particle_1].old_display_y, 10, black);
            VGA_circle(p_old[particle_2].old_display_x, p_old[particle_2].old_display_y, 10, black);
            delta_t = delta_t_old;
            break;
        case 3:
            // pause the simulation (time step 0) while the status is shown
            delta_t_old = delta_t;
            delta_t = 0;
            printf("Enter the particle number you want to show details (between 0 and %d):\n", particle_number - 1);
            scanf("%d", &particle_1);
            x_coord = p_old[particle_1].x_coord;
            y_coord = p_old[particle_1].y_coord;
            x_vel = p_old[particle_1].x_vel;
            y_vel = p_old[particle_1].y_vel;
            particle_mass = p_old[particle_1].mass;
            printf("mass is: %fkg\n", particle_mass);
            printf("X position is: %fm\n", x_coord);
            printf("Y position is: %fm\n", y_coord);
            printf("X velocity is: %fm/s\n", x_vel);
            printf("Y velocity is: %fm/s\n", y_vel);
            printf("Continue? (1 for yes, 2 for no)\n");
            // highlight the selected particle on the VGA display
            VGA_circle(p_old[particle_1].old_display_x, p_old[particle_1].old_display_y, 10, yellow);
            // wait until the user enters 1
            while (1) {
                scanf("%d", &go);
                if (go == 1) {
                    break;
                }
            }
            // erase the highlight circle and resume the simulation
            VGA_circle(p_old[particle_1].old_display_x, p_old[particle_1].old_display_y, 10, black);
            delta_t = delta_t_old;
            break;
        default:
            break;
        }
        // flag = true;
    }
    return NULL;
}
int main(void) {
pthread_t thread_read;
// // For portability,
explicitly create threads in a joinable state
// // thread attribute
used here to allow JOIN
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// thread
pthread_create(&thread_read, NULL, read1, NULL);
// === need to mmap:
=======================
// FPGA_CHAR_BASE
// FPGA_ONCHIP_BASE
// HW_REGS_BASE
// === get FPGA addresses
==================
// Open /dev/mem
if ((fd =
open("/dev/mem", (O_RDWR |
O_SYNC))) == -1)
{
printf("ERROR: could not open \"/dev/mem\"...\n");
return (1);
}
// get virtual addr that
maps to physical
h2p_lw_virtual_base =
mmap(NULL, HW_REGS_SPAN, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, HW_REGS_BASE);
if (h2p_lw_virtual_base == MAP_FAILED) {
printf("ERROR: mmap1() failed...\n");
close(fd);
return (1);
}
// === get VGA char addr
=====================
// get virtual addr that
maps to physical
vga_char_virtual_base =
mmap(NULL, FPGA_CHAR_SPAN, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, FPGA_CHAR_BASE);
if (vga_char_virtual_base == MAP_FAILED) {
printf("ERROR: mmap2() failed...\n");
close(fd);
return (1);
}
// Get the address that
maps to the FPGA LED control
vga_char_ptr =
(unsigned int *)(vga_char_virtual_base);
// === get VGA pixel addr
====================
// get virtual addr that
maps to physical
vga_pixel_virtual_base =
mmap(NULL, SDRAM_SPAN, (PROT_READ |
PROT_WRITE), MAP_SHARED, fd, SDRAM_BASE);
if (vga_pixel_virtual_base == MAP_FAILED) {
printf("ERROR: mmap3() failed...\n");
close(fd);
return (1);
}
// Get the address that maps to the FPGA pixel buffer
vga_pixel_ptr = (unsigned int *)(vga_pixel_virtual_base);
// ===========================================
// get virtual address for
// AXI bus addr
h2p_virtual_base = mmap(NULL, FPGA_AXI_SPAN, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, FPGA_AXI_BASE);
if (h2p_virtual_base == MAP_FAILED) {
printf("ERROR: mmap4() failed...\n");
close(fd);
return (1);
}
//============================================
// get virtual addr that maps to physical
// for light weight AXI bus
h2p_lw_virtual_base = mmap(NULL, FPGA_LW_SPAN, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, FPGA_LW_BASE);
if (h2p_lw_virtual_base == MAP_FAILED) {
printf("ERROR: mmap5() failed...\n");
close(fd);
return (1);
}
//============================================
// particle* p1 = (particle*)calloc(1,sizeof(particle));
p_old = (particle *)calloc(particle_number, sizeof(particle));
p_reset = (particle *)calloc(particle_number, sizeof(particle));
obj_1_num_a = (unsigned int *)(h2p_lw_virtual_base + OBJ1_NUM_A);
obj_1_mass_a = (float *)(h2p_lw_virtual_base + OBJ1_MASS_A);
obj_1_x_coord_a = (float *)(h2p_lw_virtual_base + OBJ1_X_COORD_A);
obj_1_y_coord_a = (float *)(h2p_lw_virtual_base + OBJ1_Y_COORD_A);
obj_2_num_a0 = (unsigned int *)(h2p_lw_virtual_base + OBJ2_NUM_A0);
obj_2_mass_a0 = (float *)(h2p_lw_virtual_base + OBJ2_MASS_A0);
obj_2_x_coord_a0 = (float *)(h2p_lw_virtual_base + OBJ2_X_COORD_A0);
obj_2_y_coord_a0 = (float *)(h2p_lw_virtual_base + OBJ2_Y_COORD_A0);
obj_1_delta_x_acc_b0 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_X_ACC0);
obj_1_delta_y_acc_b0 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_Y_ACC0);
complete0 = (bool *)(h2p_lw_virtual_base + COMPLETE0);
obj_2_num_a1 = (unsigned int *)(h2p_lw_virtual_base + OBJ2_NUM_A1);
obj_2_mass_a1 = (float *)(h2p_lw_virtual_base + OBJ2_MASS_A1);
obj_2_x_coord_a1 = (float *)(h2p_lw_virtual_base + OBJ2_X_COORD_A1);
obj_2_y_coord_a1 = (float *)(h2p_lw_virtual_base + OBJ2_Y_COORD_A1);
obj_1_delta_x_acc_b1 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_X_ACC1);
obj_1_delta_y_acc_b1 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_Y_ACC1);
complete1 = (bool *)(h2p_lw_virtual_base + COMPLETE1);
obj_2_num_a2 = (unsigned int *)(h2p_lw_virtual_base + OBJ2_NUM_A2);
obj_2_mass_a2 = (float *)(h2p_lw_virtual_base + OBJ2_MASS_A2);
obj_2_x_coord_a2 = (float *)(h2p_lw_virtual_base + OBJ2_X_COORD_A2);
obj_2_y_coord_a2 = (float *)(h2p_lw_virtual_base + OBJ2_Y_COORD_A2);
obj_1_delta_x_acc_b2 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_X_ACC2);
obj_1_delta_y_acc_b2 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_Y_ACC2);
complete2 = (bool *)(h2p_lw_virtual_base + COMPLETE2);
obj_2_num_a3 = (unsigned int *)(h2p_lw_virtual_base + OBJ2_NUM_A3);
obj_2_mass_a3 = (float *)(h2p_lw_virtual_base + OBJ2_MASS_A3);
obj_2_x_coord_a3 = (float *)(h2p_lw_virtual_base + OBJ2_X_COORD_A3);
obj_2_y_coord_a3 = (float *)(h2p_lw_virtual_base + OBJ2_Y_COORD_A3);
obj_1_delta_x_acc_b3 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_X_ACC3);
obj_1_delta_y_acc_b3 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_Y_ACC3);
complete3 = (bool *)(h2p_lw_virtual_base + COMPLETE3);
obj_2_num_a4 = (unsigned int *)(h2p_lw_virtual_base + OBJ2_NUM_A4);
obj_2_mass_a4 = (float *)(h2p_lw_virtual_base + OBJ2_MASS_A4);
obj_2_x_coord_a4 = (float *)(h2p_lw_virtual_base + OBJ2_X_COORD_A4);
obj_2_y_coord_a4 = (float *)(h2p_lw_virtual_base + OBJ2_Y_COORD_A4);
obj_1_delta_x_acc_b4 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_X_ACC4);
obj_1_delta_y_acc_b4 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_Y_ACC4);
complete4 = (bool *)(h2p_lw_virtual_base + COMPLETE4);
obj_2_num_a5 = (unsigned int *)(h2p_lw_virtual_base + OBJ2_NUM_A5);
obj_2_mass_a5 = (float *)(h2p_lw_virtual_base + OBJ2_MASS_A5);
obj_2_x_coord_a5 = (float *)(h2p_lw_virtual_base + OBJ2_X_COORD_A5);
obj_2_y_coord_a5 = (float *)(h2p_lw_virtual_base + OBJ2_Y_COORD_A5);
obj_1_delta_x_acc_b5 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_X_ACC5);
obj_1_delta_y_acc_b5 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_Y_ACC5);
complete5 = (bool *)(h2p_lw_virtual_base + COMPLETE5);
obj_2_num_a6 = (unsigned int *)(h2p_lw_virtual_base + OBJ2_NUM_A6);
obj_2_mass_a6 = (float *)(h2p_lw_virtual_base + OBJ2_MASS_A6);
obj_2_x_coord_a6 = (float *)(h2p_lw_virtual_base + OBJ2_X_COORD_A6);
obj_2_y_coord_a6 = (float *)(h2p_lw_virtual_base + OBJ2_Y_COORD_A6);
obj_1_delta_x_acc_b6 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_X_ACC6);
obj_1_delta_y_acc_b6 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_Y_ACC6);
complete6 = (bool *)(h2p_lw_virtual_base + COMPLETE6);
obj_2_num_a7 = (unsigned int *)(h2p_lw_virtual_base + OBJ2_NUM_A7);
obj_2_mass_a7 = (float *)(h2p_lw_virtual_base + OBJ2_MASS_A7);
obj_2_x_coord_a7 = (float *)(h2p_lw_virtual_base + OBJ2_X_COORD_A7);
obj_2_y_coord_a7 = (float *)(h2p_lw_virtual_base + OBJ2_Y_COORD_A7);
obj_1_delta_x_acc_b7 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_X_ACC7);
obj_1_delta_y_acc_b7 = (float *)(h2p_lw_virtual_base + OBJ1_DELTA_Y_ACC7);
complete7 = (bool *)(h2p_lw_virtual_base + COMPLETE7);
my_reset_ptr = (bool *)(h2p_lw_virtual_base + RESET);
data_sent_ptr = (bool *)(h2p_lw_virtual_base + DATA_SENT);
struct timespec req, rem;
req.tv_sec = 0;
req.tv_nsec = 1;
int i = 0;
int j = 0;
struct timeval t1, t2;
double elapsedTime;
int k = 0;
// init
p_old[0].number = 0;
p_old[0].mass = 8000000000000000;
p_old[0].x_coord = 30000;
p_old[0].y_coord = 20000;
p_old[0].x_vel = 0;
p_old[0].y_vel = 0;
p_old[0].color = yellow;
p_old[0].radius = 4;
p_old[0].old_display_x = p_old[0].x_coord / scale;
p_old[0].old_display_y = p_old[0].y_coord / scale;
p_old[1].number = 1;
p_old[1].mass = 2000000000000000;
p_old[1].x_coord = 30000 - 7000;
p_old[1].y_coord = 20000;
p_old[1].x_vel = -3.25;
p_old[1].y_vel = 6;
p_old[1].color = red;
p_old[1].radius = 4;
p_old[1].old_display_x = p_old[1].x_coord / scale;
p_old[1].old_display_y = p_old[1].y_coord / scale;
p_old[2].number = 2;
p_old[2].mass = 90000000000000;
p_old[2].x_coord = 30000 + 10000;
p_old[2].y_coord = 20000;
p_old[2].x_vel = 3.25;
p_old[2].y_vel = 4;
p_old[2].color = red;
p_old[2].radius = 4;
p_old[2].old_display_x = p_old[2].x_coord / scale;
p_old[2].old_display_y = p_old[2].y_coord / scale;
for (k = 3; k < particle_number; k++) {
p_old[k].number = k;
p_old[k].mass = random_range(100000, 1000000000);
p_old[k].x_coord = random_range(50 * 100, 589 * 100);
p_old[k].y_coord = random_range(50 * 100, 429 * 100);
p_old[k].x_vel = random_range(0, 6) - 3;
p_old[k].y_vel = random_range(0, 4) - 2;
p_old[k].color = white;
p_old[k].radius = 1;
p_old[k].old_display_x = p_old[k].x_coord / scale;
p_old[k].old_display_y = p_old[k].y_coord / scale;
}
memcpy(p_reset, p_old, particle_number * sizeof(particle));
particle *p_new;
VGA_box(0, 0, 639, 479, black);
char screen_text[32];
while (1) {
p_new = (particle *)calloc(particle_number, sizeof(particle));
memcpy(p_new, p_old, particle_number * sizeof(particle));
gettimeofday(&t1, NULL);
for (i = 0; i < particle_number; i++) {
*obj_1_num_a = p_old[i].number;
*obj_1_mass_a = p_old[i].mass;
*obj_1_x_coord_a = p_old[i].x_coord;
*obj_1_y_coord_a = p_old[i].y_coord;
int j = 0;
float obj_1_sum_x_acc = 0;
float obj_1_sum_y_acc = 0;
// Load the 8 hardware units with 8 consecutive particles per pass.
// NOTE: p_old[j] is read past the end of the array unless
// particle_number is a multiple of 8.
while (j < particle_number) {
*obj_2_num_a0 = p_old[j].number;
*obj_2_mass_a0 = p_old[j].mass;
*obj_2_x_coord_a0 = p_old[j].x_coord;
*obj_2_y_coord_a0 = p_old[j].y_coord;
j++;
*obj_2_num_a1 = p_old[j].number;
*obj_2_mass_a1 = p_old[j].mass;
*obj_2_x_coord_a1 = p_old[j].x_coord;
*obj_2_y_coord_a1 = p_old[j].y_coord;
j++;
*obj_2_num_a2 = p_old[j].number;
*obj_2_mass_a2 = p_old[j].mass;
*obj_2_x_coord_a2 = p_old[j].x_coord;
*obj_2_y_coord_a2 = p_old[j].y_coord;
j++;
*obj_2_num_a3 = p_old[j].number;
*obj_2_mass_a3 = p_old[j].mass;
*obj_2_x_coord_a3 = p_old[j].x_coord;
*obj_2_y_coord_a3 = p_old[j].y_coord;
j++;
*obj_2_num_a4 = p_old[j].number;
*obj_2_mass_a4 = p_old[j].mass;
*obj_2_x_coord_a4 = p_old[j].x_coord;
*obj_2_y_coord_a4 = p_old[j].y_coord;
j++;
*obj_2_num_a5 = p_old[j].number;
*obj_2_mass_a5 = p_old[j].mass;
*obj_2_x_coord_a5 = p_old[j].x_coord;
*obj_2_y_coord_a5 = p_old[j].y_coord;
j++;
*obj_2_num_a6 = p_old[j].number;
*obj_2_mass_a6 = p_old[j].mass;
*obj_2_x_coord_a6 = p_old[j].x_coord;
*obj_2_y_coord_a6 = p_old[j].y_coord;
j++;
*obj_2_num_a7 = p_old[j].number;
*obj_2_mass_a7 = p_old[j].mass;
*obj_2_x_coord_a7 = p_old[j].x_coord;
*obj_2_y_coord_a7 = p_old[j].y_coord;
j++;
*my_reset_ptr = false;
usleep(2);
*my_reset_ptr = true;
usleep(2);
*my_reset_ptr = false;
usleep(2);
while (!(*complete0))
;
// calculate net acceleration
obj_1_sum_x_acc += *obj_1_delta_x_acc_b0 + *obj_1_delta_x_acc_b1 + *obj_1_delta_x_acc_b2
+ *obj_1_delta_x_acc_b3
+ *obj_1_delta_x_acc_b4
+ *obj_1_delta_x_acc_b5
+ *obj_1_delta_x_acc_b6
+ *obj_1_delta_x_acc_b7;
obj_1_sum_y_acc += *obj_1_delta_y_acc_b0 + *obj_1_delta_y_acc_b1 + *obj_1_delta_y_acc_b2
+ *obj_1_delta_y_acc_b3
+ *obj_1_delta_y_acc_b4
+ *obj_1_delta_y_acc_b5
+ *obj_1_delta_y_acc_b6
+ *obj_1_delta_y_acc_b7;
}
float obj_1_new_x_vel = p_old[i].x_vel + obj_1_sum_x_acc * delta_t;
float obj_1_new_y_vel = p_old[i].y_vel + obj_1_sum_y_acc * delta_t;
float obj_1_new_x_coord = p_old[i].x_coord + obj_1_new_x_vel * delta_t;
float obj_1_new_y_coord = p_old[i].y_coord + obj_1_new_y_vel * delta_t;
int obj_1_new_display_x = (obj_1_new_x_coord < 0) ? 0 : (obj_1_new_x_coord > 639) ? 639
: (int)obj_1_new_x_coord;
int obj_1_new_display_y = (obj_1_new_y_coord < 0) ? 0 : (obj_1_new_y_coord > 479) ? 479
: (int)obj_1_new_y_coord;
int obj_1_old_display_x = (p_old[i].x_coord < 0) ? 0 : (p_old[i].x_coord > 639) ? 639
: (int)p_old[i].x_coord;
int obj_1_old_display_y = (p_old[i].y_coord < 0) ? 0 : (p_old[i].y_coord > 479) ? 479
: (int)p_old[i].y_coord;
p_new[i].x_coord = obj_1_new_x_coord;
p_new[i].y_coord = obj_1_new_y_coord;
p_new[i].x_vel = obj_1_new_x_vel;
p_new[i].y_vel = obj_1_new_y_vel;
p_new[i].old_display_x = p_new[i].x_coord / scale;
p_new[i].old_display_y = p_new[i].y_coord / scale;
VGA_disc(p_old[i].old_display_x, p_old[i].old_display_y, p_old[i].radius, black); // remove old
VGA_disc(p_new[i].old_display_x, p_new[i].old_display_y, p_old[i].radius, p_old[i].color); // add new
}
gettimeofday(&t2, NULL);
elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0; // sec to ms
elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0; // us to ms
VGA_text_clear();
snprintf(screen_text, sizeof(screen_text), "%f ms.", elapsedTime);
VGA_text(5, 52, screen_text);
snprintf(screen_text, sizeof(screen_text), "%d particles.", particle_number);
VGA_text(5, 53, screen_text);
// delete old memory
free(p_old);
p_old = p_new;
}
} // end main
/**************************************************************************************
 * Random number within the inclusive range [lower, upper]
 **************************************************************************************/
int random_range(int lower, int upper) {
    int num = (rand() % (upper - lower + 1)) + lower;
    return num;
}
/****************************************************************************************
 * Subroutine to send a string of text to the VGA monitor
 ****************************************************************************************/
void VGA_text(int x, int y, char *text_ptr) {
    volatile char *character_buffer = (char *)vga_char_ptr; // VGA character buffer
    int offset;
    /* assume that the text string fits on one line */
    offset = (y << 7) + x;
    while (*(text_ptr)) {
        // write to the character buffer
        *(character_buffer + offset) = *(text_ptr);
        ++text_ptr;
        ++offset;
    }
}
/****************************************************************************************
 * Subroutine to clear the text on the VGA monitor
 ****************************************************************************************/
void VGA_text_clear() {
    volatile char *character_buffer = (char *)vga_char_ptr; // VGA character buffer
    int offset, x, y;
    for (x = 0; x < 79; x++) {
        for (y = 0; y < 59; y++) {
            /* each text row is 128 bytes apart in the character buffer */
            offset = (y << 7) + x;
            // write a blank to the character buffer
            *(character_buffer + offset) = ' ';
        }
    }
}
/****************************************************************************************
 * Draw a filled rectangle on the VGA monitor
 ****************************************************************************************/
#define SWAP(X, Y) \
    do { \
        int temp = X; \
        X = Y; \
        Y = temp; \
    } while (0)
void VGA_box(int x1, int y1, int x2, int y2, short pixel_color) {
    char *pixel_ptr;
    int row, col;
    /* check and fix box coordinates to be valid */
    if (x1 > 639) x1 = 639;
    if (y1 > 479) y1 = 479;
    if (x2 > 639) x2 = 639;
    if (y2 > 479) y2 = 479;
    if (x1 < 0) x1 = 0;
    if (y1 < 0) y1 = 0;
    if (x2 < 0) x2 = 0;
    if (y2 < 0) y2 = 0;
    if (x1 > x2) SWAP(x1, x2);
    if (y1 > y2) SWAP(y1, y2);
    for (row = y1; row <= y2; row++)
        for (col = x1; col <= x2; ++col) {
            // 640x480
            // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
            // set pixel color
            //*(char *)pixel_ptr = pixel_color;
            VGA_PIXEL(col, row, pixel_color);
        }
}
/****************************************************************************************
 * Draw an outline rectangle on the VGA monitor
 ****************************************************************************************/
#define SWAP(X, Y) \
    do { \
        int temp = X; \
        X = Y; \
        Y = temp; \
    } while (0)
void VGA_rect(int x1, int y1, int x2, int y2, short pixel_color) {
    char *pixel_ptr;
    int row, col;
    /* check and fix box coordinates to be valid */
    if (x1 > 639) x1 = 639;
    if (y1 > 479) y1 = 479;
    if (x2 > 639) x2 = 639;
    if (y2 > 479) y2 = 479;
    if (x1 < 0) x1 = 0;
    if (y1 < 0) y1 = 0;
    if (x2 < 0) x2 = 0;
    if (y2 < 0) y2 = 0;
    if (x1 > x2) SWAP(x1, x2);
    if (y1 > y2) SWAP(y1, y2);
    // left edge
    col = x1;
    for (row = y1; row <= y2; row++) {
        // 640x480
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
    }
    // right edge
    col = x2;
    for (row = y1; row <= y2; row++) {
        // 640x480
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
    }
    // top edge
    row = y1;
    for (col = x1; col <= x2; ++col) {
        // 640x480
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
    }
    // bottom edge
    row = y2;
    for (col = x1; col <= x2; ++col) {
        // 640x480
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
    }
}
/****************************************************************************************
 * Draw a horizontal line on the VGA monitor
 ****************************************************************************************/
#define SWAP(X, Y) \
    do { \
        int temp = X; \
        X = Y; \
        Y = temp; \
    } while (0)
void VGA_Hline(int x1, int y1, int x2, short pixel_color) {
    char *pixel_ptr;
    int row, col;
    /* check and fix line coordinates to be valid */
    if (x1 > 639) x1 = 639;
    if (y1 > 479) y1 = 479;
    if (x2 > 639) x2 = 639;
    if (x1 < 0) x1 = 0;
    if (y1 < 0) y1 = 0;
    if (x2 < 0) x2 = 0;
    if (x1 > x2) SWAP(x1, x2);
    // line
    row = y1;
    for (col = x1; col <= x2; ++col) {
        // 640x480
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
    }
}
/****************************************************************************************
 * Draw a vertical line on the VGA monitor
 ****************************************************************************************/
#define SWAP(X, Y) \
    do { \
        int temp = X; \
        X = Y; \
        Y = temp; \
    } while (0)
void VGA_Vline(int x1, int y1, int y2, short pixel_color) {
    char *pixel_ptr;
    int row, col;
    /* check and fix line coordinates to be valid */
    if (x1 > 639) x1 = 639;
    if (y1 > 479) y1 = 479;
    if (y2 > 479) y2 = 479;
    if (x1 < 0) x1 = 0;
    if (y1 < 0) y1 = 0;
    if (y2 < 0) y2 = 0;
    if (y1 > y2) SWAP(y1, y2);
    // line
    col = x1;
    for (row = y1; row <= y2; row++) {
        // 640x480
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
    }
}
/****************************************************************************************
 * Draw a filled circle on the VGA monitor
 ****************************************************************************************/
void VGA_disc(int x, int y, int r, short pixel_color) {
    char *pixel_ptr;
    int row, col, rsqr, xc, yc;
    rsqr = r * r;
    for (yc = -r; yc <= r; yc++)
        for (xc = -r; xc <= r; xc++) {
            col = xc;
            row = yc;
            // add the r to make the edge smoother
            if (col * col + row * row <= rsqr + r) {
                col += x; // add the center point
                row += y; // add the center point
                // check for valid 640x480
                if (col > 639) col = 639;
                if (row > 479) row = 479;
                if (col < 0) col = 0;
                if (row < 0) row = 0;
                // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
                // set pixel color
                //*(char *)pixel_ptr = pixel_color;
                VGA_PIXEL(col, row, pixel_color);
            }
        }
}
/****************************************************************************************
 * Draw a circle on the VGA monitor
 ****************************************************************************************/
void VGA_circle(int x, int y, int r, int pixel_color) {
    char *pixel_ptr;
    int row, col, rsqr, xc, yc;
    int col1, row1;
    rsqr = r * r;
    // walk the rows and plot the right and left edge of each one
    for (yc = -r; yc <= r; yc++) {
        // row = yc;
        col1 = (int)sqrt((float)(rsqr + r - yc * yc));
        // right edge
        col = col1 + x; // add the center point
        row = yc + y;   // add the center point
        // check for valid 640x480
        if (col > 639) col = 639;
        if (row > 479) row = 479;
        if (col < 0) col = 0;
        if (row < 0) row = 0;
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
        // left edge
        col = -col1 + x; // add the center point
        // check for valid 640x480
        if (col > 639) col = 639;
        if (row > 479) row = 479;
        if (col < 0) col = 0;
        if (row < 0) row = 0;
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
    }
    // walk the columns and plot the bottom and top edge of each one
    for (xc = -r; xc <= r; xc++) {
        // col = xc;
        row1 = (int)sqrt((float)(rsqr + r - xc * xc));
        // bottom edge
        col = xc + x;   // add the center point
        row = row1 + y; // add the center point
        // check for valid 640x480
        if (col > 639) col = 639;
        if (row > 479) row = 479;
        if (col < 0) col = 0;
        if (row < 0) row = 0;
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
        // top edge
        row = -row1 + y; // add the center point
        // check for valid 640x480
        if (col > 639) col = 639;
        if (row > 479) row = 479;
        if (col < 0) col = 0;
        if (row < 0) row = 0;
        // pixel_ptr = (char *)vga_pixel_ptr + (row<<10) + col ;
        // set pixel color
        //*(char *)pixel_ptr = pixel_color;
        VGA_PIXEL(col, row, pixel_color);
    }
}
// =============================================
// === Draw a line
// =============================================
// plot a line
// at x1,y1 to x2,y2 with color
// Code is from David F. Rogers,
// "Procedural Elements for Computer Graphics", 1985
void VGA_line(int x1, int y1, int x2, int y2, short c) {
    int e;
    signed int dx, dy, j, temp;
    signed int s1, s2, xchange;
    signed int x, y;
    char *pixel_ptr;
    /* check and fix line coordinates to be valid */
    if (x1 > 639) x1 = 639;
    if (y1 > 479) y1 = 479;
    if (x2 > 639) x2 = 639;
    if (y2 > 479) y2 = 479;
    if (x1 < 0) x1 = 0;
    if (y1 < 0) y1 = 0;
    if (x2 < 0) x2 = 0;
    if (y2 < 0) y2 = 0;
    x = x1;
    y = y1;
    // take absolute value
    if (x2 < x1) {
        dx = x1 - x2;
        s1 = -1;
    }
    else if (x2 == x1) {
        dx = 0;
        s1 = 0;
    }
    else {
        dx = x2 - x1;
        s1 = 1;
    }
    if (y2 < y1) {
        dy = y1 - y2;
        s2 = -1;
    }
    else if (y2 == y1) {
        dy = 0;
        s2 = 0;
    }
    else {
        dy = y2 - y1;
        s2 = 1;
    }
    xchange = 0;
    if (dy > dx) {
        temp = dx;
        dx = dy;
        dy = temp;
        xchange = 1;
    }
    e = ((int)dy << 1) - dx;
    for (j = 0; j <= dx; j++) {
        // video_pt(x,y,c); //640x480
        // pixel_ptr = (char *)vga_pixel_ptr + (y<<10)+ x;
        // set pixel color
        //*(char *)pixel_ptr = c;
        VGA_PIXEL(x, y, c);
        if (e >= 0) {
            if (xchange == 1)
                x = x + s1;
            else
                y = y + s2;
            e = e - ((int)dx << 1);
        }
        if (xchange == 1)
            y = y + s2;
        else
            x = x + s1;
        e = e + ((int)dy << 1);
    }
}
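The per-particle update at the end of the inner loop in main() advances the velocity first and then advances the position using the updated velocity, which is the semi-implicit Euler scheme. The sketch below isolates that step so it can be checked on a host machine without the FPGA. The `state_t` type and the `euler_step` name are hypothetical and are not part of the project code, and the acceleration is passed in as an argument instead of being read from the hardware registers.

```c
#include <assert.h>

/* Hypothetical host-side state; the project's `particle` struct has more fields. */
typedef struct {
    float x, y;   /* position */
    float vx, vy; /* velocity */
} state_t;

/* One semi-implicit Euler step: update velocity from the net acceleration,
 * then update position from the NEW velocity, in the same order as main(). */
static state_t euler_step(state_t s, float ax, float ay, float dt) {
    assert(dt > 0.0f); /* assumption of this sketch: time only moves forward */
    state_t next = s;
    next.vx = s.vx + ax * dt;
    next.vy = s.vy + ay * dt;
    next.x = s.x + next.vx * dt;
    next.y = s.y + next.vy * dt;
    return next;
}
```

For a particle at rest at the origin with ax = 0.25 and dt = 2, the step gives vx = 0.5 and x = 1.0. An explicit Euler step, which uses the old velocity for the position update, would leave x at 0 after the first step.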
DE1-SoC-Computer
module DE1_SoC_Computer (
////////////////////////////////////
// FPGA Pins
////////////////////////////////////
// Clock pins
CLOCK_50,
CLOCK2_50,
CLOCK3_50,
CLOCK4_50,
// ADC
ADC_CS_N,
ADC_DIN,
ADC_DOUT,
ADC_SCLK,
// Audio
AUD_ADCDAT,
AUD_ADCLRCK,
AUD_BCLK,
AUD_DACDAT,
AUD_DACLRCK,
AUD_XCK,
// SDRAM
DRAM_ADDR,
DRAM_BA,
DRAM_CAS_N,
DRAM_CKE,
DRAM_CLK,
DRAM_CS_N,
DRAM_DQ,
DRAM_LDQM,
DRAM_RAS_N,
DRAM_UDQM,
DRAM_WE_N,
// I2C Bus for Configuration of the Audio and Video-In Chips
FPGA_I2C_SCLK,
FPGA_I2C_SDAT,
// 40-Pin Headers
GPIO_0,
GPIO_1,
// Seven Segment Displays
HEX0,
HEX1,
HEX2,
HEX3,
HEX4,
HEX5,
// IR
IRDA_RXD,
IRDA_TXD,
// Pushbuttons
KEY,
// LEDs
LEDR,
// PS2 Ports
PS2_CLK,
PS2_DAT,
PS2_CLK2,
PS2_DAT2,
// Slider Switches
SW,
// Video-In
TD_CLK27,
TD_DATA,
TD_HS,
TD_RESET_N,
TD_VS,
// VGA
VGA_B,
VGA_BLANK_N,
VGA_CLK,
VGA_G,
VGA_HS,
VGA_R,
VGA_SYNC_N,
VGA_VS,
////////////////////////////////////
// HPS Pins
////////////////////////////////////
// DDR3 SDRAM
HPS_DDR3_ADDR,
HPS_DDR3_BA,
HPS_DDR3_CAS_N,
HPS_DDR3_CKE,
HPS_DDR3_CK_N,
HPS_DDR3_CK_P,
HPS_DDR3_CS_N,
HPS_DDR3_DM,
HPS_DDR3_DQ,
HPS_DDR3_DQS_N,
HPS_DDR3_DQS_P,
HPS_DDR3_ODT,
HPS_DDR3_RAS_N,
HPS_DDR3_RESET_N,
HPS_DDR3_RZQ,
HPS_DDR3_WE_N,
// Ethernet
HPS_ENET_GTX_CLK,
HPS_ENET_INT_N,
HPS_ENET_MDC,
HPS_ENET_MDIO,
HPS_ENET_RX_CLK,
HPS_ENET_RX_DATA,
HPS_ENET_RX_DV,
HPS_ENET_TX_DATA,
HPS_ENET_TX_EN,
// Flash
HPS_FLASH_DATA,
HPS_FLASH_DCLK,
HPS_FLASH_NCSO,
// Accelerometer
HPS_GSENSOR_INT,
// General Purpose I/O
HPS_GPIO,
// I2C
HPS_I2C_CONTROL,
HPS_I2C1_SCLK,
HPS_I2C1_SDAT,
HPS_I2C2_SCLK,
HPS_I2C2_SDAT,
// Pushbutton
HPS_KEY,
// LED
HPS_LED,
// SD Card
HPS_SD_CLK,
HPS_SD_CMD,
HPS_SD_DATA,
// SPI
HPS_SPIM_CLK,
HPS_SPIM_MISO,
HPS_SPIM_MOSI,
HPS_SPIM_SS,
// UART
HPS_UART_RX,
HPS_UART_TX,
// USB
HPS_CONV_USB_N,
HPS_USB_CLKOUT,
HPS_USB_DATA,
HPS_USB_DIR,
HPS_USB_NXT,
HPS_USB_STP
);
//=======================================================
// PARAMETER declarations
//=======================================================
//=======================================================
// PORT declarations
//=======================================================
////////////////////////////////////
// FPGA Pins
////////////////////////////////////
// Clock pins
input CLOCK_50;
input CLOCK2_50;
input CLOCK3_50;
input CLOCK4_50;
// ADC
inout ADC_CS_N;
output ADC_DIN;
input ADC_DOUT;
output ADC_SCLK;
// Audio
input AUD_ADCDAT;
inout AUD_ADCLRCK;
inout AUD_BCLK;
output AUD_DACDAT;
inout AUD_DACLRCK;
output AUD_XCK;
// SDRAM
output [12: 0] DRAM_ADDR;
output [ 1: 0] DRAM_BA;
output DRAM_CAS_N;
output DRAM_CKE;
output DRAM_CLK;
output DRAM_CS_N;
inout [15: 0] DRAM_DQ;
output DRAM_LDQM;
output DRAM_RAS_N;
output DRAM_UDQM;
output DRAM_WE_N;
// I2C Bus for Configuration of the Audio and Video-In Chips
output FPGA_I2C_SCLK;
inout FPGA_I2C_SDAT;
// 40-pin headers
inout [35: 0] GPIO_0;
inout [35: 0] GPIO_1;
// Seven Segment Displays
output [ 6: 0] HEX0;
output [ 6: 0] HEX1;
output [ 6: 0] HEX2;
output [ 6: 0] HEX3;
output [ 6: 0] HEX4;
output [ 6: 0] HEX5;
// IR
input IRDA_RXD;
output IRDA_TXD;
// Pushbuttons
input [ 3: 0] KEY;
// LEDs
output [ 9: 0] LEDR;
// PS2 Ports
inout PS2_CLK;
inout PS2_DAT;
inout PS2_CLK2;
inout PS2_DAT2;
// Slider Switches
input [ 9: 0] SW;
// Video-In
input TD_CLK27;
input [ 7: 0] TD_DATA;
input TD_HS;
output TD_RESET_N;
input TD_VS;
// VGA
output [ 7: 0] VGA_B;
output VGA_BLANK_N;
output VGA_CLK;
output [ 7: 0] VGA_G;
output VGA_HS;
output [ 7: 0] VGA_R;
output VGA_SYNC_N;
output VGA_VS;
////////////////////////////////////
// HPS Pins
////////////////////////////////////
// DDR3 SDRAM
output [14: 0] HPS_DDR3_ADDR;
output [ 2: 0] HPS_DDR3_BA;
output HPS_DDR3_CAS_N;
output HPS_DDR3_CKE;
output HPS_DDR3_CK_N;
output HPS_DDR3_CK_P;
output HPS_DDR3_CS_N;
output [ 3: 0] HPS_DDR3_DM;
inout [31: 0] HPS_DDR3_DQ;
inout [ 3: 0] HPS_DDR3_DQS_N;
inout [ 3: 0] HPS_DDR3_DQS_P;
output HPS_DDR3_ODT;
output HPS_DDR3_RAS_N;
output HPS_DDR3_RESET_N;
input HPS_DDR3_RZQ;
output HPS_DDR3_WE_N;
// Ethernet
output HPS_ENET_GTX_CLK;
inout HPS_ENET_INT_N;
output HPS_ENET_MDC;
inout HPS_ENET_MDIO;
input HPS_ENET_RX_CLK;
input [ 3: 0] HPS_ENET_RX_DATA;
input HPS_ENET_RX_DV;
output [ 3: 0] HPS_ENET_TX_DATA;
output HPS_ENET_TX_EN;
// Flash
inout [ 3: 0] HPS_FLASH_DATA;
output HPS_FLASH_DCLK;
output HPS_FLASH_NCSO;
// Accelerometer
inout HPS_GSENSOR_INT;
// General Purpose I/O
inout [ 1: 0] HPS_GPIO;
// I2C
inout HPS_I2C_CONTROL;
inout HPS_I2C1_SCLK;
inout HPS_I2C1_SDAT;
inout HPS_I2C2_SCLK;
inout HPS_I2C2_SDAT;
// Pushbutton
inout HPS_KEY;
// LED
inout HPS_LED;
// SD Card
output HPS_SD_CLK;
inout HPS_SD_CMD;
inout [ 3: 0] HPS_SD_DATA;
// SPI
output HPS_SPIM_CLK;
input HPS_SPIM_MISO;
output HPS_SPIM_MOSI;
inout HPS_SPIM_SS;
// UART
input HPS_UART_RX;
output HPS_UART_TX;
// USB
inout HPS_CONV_USB_N;
input HPS_USB_CLKOUT;
inout [ 7: 0] HPS_USB_DATA;
input HPS_USB_DIR;
input HPS_USB_NXT;
output HPS_USB_STP;
//=======================================================
// REG/WIRE declarations
//=======================================================
wire [15: 0] hex3_hex0;
//wire [15: 0] hex5_hex4;
//assign HEX0 = ~hex3_hex0[ 6: 0]; // hex3_hex0[ 6: 0];
//assign HEX1 = ~hex3_hex0[14: 8];
//assign HEX2 = ~hex3_hex0[22:16];
//assign HEX3 = ~hex3_hex0[30:24];
assign HEX4 = 7'b1111111;
assign HEX5 = 7'b1111111;
// HexDigit Digit0(HEX0, hex3_hex0[3:0]);
// HexDigit Digit1(HEX1, hex3_hex0[7:4]);
// HexDigit Digit2(HEX2, hex3_hex0[11:8]);
// HexDigit Digit3(HEX3, hex3_hex0[15:12]);
//=======================================================
// Structural coding
//=======================================================
Computer_System The_System (
////////////////////////////////////
// FPGA Side
////////////////////////////////////
// Global signals
.system_pll_ref_clk_clk (CLOCK_50),
.system_pll_ref_reset_reset (1'b0),
// AV Config
.av_config_SCLK (FPGA_I2C_SCLK),
.av_config_SDAT (FPGA_I2C_SDAT),
// VGA Subsystem
.vga_pll_ref_clk_clk (CLOCK2_50),
.vga_pll_ref_reset_reset (1'b0),
.vga_CLK (VGA_CLK),
.vga_BLANK (VGA_BLANK_N),
.vga_SYNC (VGA_SYNC_N),
.vga_HS (VGA_HS),
.vga_VS (VGA_VS),
.vga_R (VGA_R),
.vga_G (VGA_G),
.vga_B (VGA_B),
// SDRAM
.sdram_clk_clk (DRAM_CLK),
.sdram_addr (DRAM_ADDR),
.sdram_ba (DRAM_BA),
.sdram_cas_n (DRAM_CAS_N),
.sdram_cke (DRAM_CKE),
.sdram_cs_n (DRAM_CS_N),
.sdram_dq (DRAM_DQ),
.sdram_dqm ({DRAM_UDQM,DRAM_LDQM}),
.sdram_ras_n (DRAM_RAS_N),
.sdram_we_n (DRAM_WE_N),
////////////////////////////////////
// HPS Side
////////////////////////////////////
// DDR3 SDRAM
.memory_mem_a (HPS_DDR3_ADDR),
.memory_mem_ba (HPS_DDR3_BA),
.memory_mem_ck (HPS_DDR3_CK_P),
.memory_mem_ck_n (HPS_DDR3_CK_N),
.memory_mem_cke (HPS_DDR3_CKE),
.memory_mem_cs_n (HPS_DDR3_CS_N),
.memory_mem_ras_n (HPS_DDR3_RAS_N),
.memory_mem_cas_n (HPS_DDR3_CAS_N),
.memory_mem_we_n (HPS_DDR3_WE_N),
.memory_mem_reset_n (HPS_DDR3_RESET_N),
.memory_mem_dq (HPS_DDR3_DQ),
.memory_mem_dqs (HPS_DDR3_DQS_P),
.memory_mem_dqs_n (HPS_DDR3_DQS_N),
.memory_mem_odt (HPS_DDR3_ODT),
.memory_mem_dm (HPS_DDR3_DM),
.memory_oct_rzqin (HPS_DDR3_RZQ),
// Ethernet
.hps_io_hps_io_gpio_inst_GPIO35 (HPS_ENET_INT_N),
.hps_io_hps_io_emac1_inst_TX_CLK (HPS_ENET_GTX_CLK),
.hps_io_hps_io_emac1_inst_TXD0 (HPS_ENET_TX_DATA[0]),
.hps_io_hps_io_emac1_inst_TXD1 (HPS_ENET_TX_DATA[1]),
.hps_io_hps_io_emac1_inst_TXD2 (HPS_ENET_TX_DATA[2]),
.hps_io_hps_io_emac1_inst_TXD3 (HPS_ENET_TX_DATA[3]),
.hps_io_hps_io_emac1_inst_RXD0 (HPS_ENET_RX_DATA[0]),
.hps_io_hps_io_emac1_inst_MDIO (HPS_ENET_MDIO),
.hps_io_hps_io_emac1_inst_MDC (HPS_ENET_MDC),
.hps_io_hps_io_emac1_inst_RX_CTL (HPS_ENET_RX_DV),
.hps_io_hps_io_emac1_inst_TX_CTL (HPS_ENET_TX_EN),
.hps_io_hps_io_emac1_inst_RX_CLK (HPS_ENET_RX_CLK),
.hps_io_hps_io_emac1_inst_RXD1 (HPS_ENET_RX_DATA[1]),
.hps_io_hps_io_emac1_inst_RXD2 (HPS_ENET_RX_DATA[2]),
.hps_io_hps_io_emac1_inst_RXD3 (HPS_ENET_RX_DATA[3]),
// Flash
.hps_io_hps_io_qspi_inst_IO0 (HPS_FLASH_DATA[0]),
.hps_io_hps_io_qspi_inst_IO1 (HPS_FLASH_DATA[1]),
.hps_io_hps_io_qspi_inst_IO2 (HPS_FLASH_DATA[2]),
.hps_io_hps_io_qspi_inst_IO3 (HPS_FLASH_DATA[3]),
.hps_io_hps_io_qspi_inst_SS0 (HPS_FLASH_NCSO),
.hps_io_hps_io_qspi_inst_CLK (HPS_FLASH_DCLK),
// Accelerometer
.hps_io_hps_io_gpio_inst_GPIO61 (HPS_GSENSOR_INT),
//.adc_sclk (ADC_SCLK),
//.adc_cs_n (ADC_CS_N),
//.adc_dout (ADC_DOUT),
//.adc_din (ADC_DIN),
// General Purpose I/O
.hps_io_hps_io_gpio_inst_GPIO40 (HPS_GPIO[0]),
.hps_io_hps_io_gpio_inst_GPIO41 (HPS_GPIO[1]),
// I2C
.hps_io_hps_io_gpio_inst_GPIO48 (HPS_I2C_CONTROL),
.hps_io_hps_io_i2c0_inst_SDA (HPS_I2C1_SDAT),
.hps_io_hps_io_i2c0_inst_SCL (HPS_I2C1_SCLK),
.hps_io_hps_io_i2c1_inst_SDA (HPS_I2C2_SDAT),
.hps_io_hps_io_i2c1_inst_SCL (HPS_I2C2_SCLK),
// Pushbutton
.hps_io_hps_io_gpio_inst_GPIO54 (HPS_KEY),
// LED
.hps_io_hps_io_gpio_inst_GPIO53 (HPS_LED),
// SD Card
.hps_io_hps_io_sdio_inst_CMD (HPS_SD_CMD),
.hps_io_hps_io_sdio_inst_D0 (HPS_SD_DATA[0]),
.hps_io_hps_io_sdio_inst_D1 (HPS_SD_DATA[1]),
.hps_io_hps_io_sdio_inst_CLK (HPS_SD_CLK),
.hps_io_hps_io_sdio_inst_D2 (HPS_SD_DATA[2]),
.hps_io_hps_io_sdio_inst_D3 (HPS_SD_DATA[3]),
// SPI
.hps_io_hps_io_spim1_inst_CLK (HPS_SPIM_CLK),
.hps_io_hps_io_spim1_inst_MOSI (HPS_SPIM_MOSI),
.hps_io_hps_io_spim1_inst_MISO (HPS_SPIM_MISO),
.hps_io_hps_io_spim1_inst_SS0 (HPS_SPIM_SS),
// UART
.hps_io_hps_io_uart0_inst_RX (HPS_UART_RX),
.hps_io_hps_io_uart0_inst_TX (HPS_UART_TX),
// USB
.hps_io_hps_io_gpio_inst_GPIO09 (HPS_CONV_USB_N),
.hps_io_hps_io_usb1_inst_D0 (HPS_USB_DATA[0]),
.hps_io_hps_io_usb1_inst_D1 (HPS_USB_DATA[1]),
.hps_io_hps_io_usb1_inst_D2 (HPS_USB_DATA[2]),
.hps_io_hps_io_usb1_inst_D3 (HPS_USB_DATA[3]),
.hps_io_hps_io_usb1_inst_D4 (HPS_USB_DATA[4]),
.hps_io_hps_io_usb1_inst_D5 (HPS_USB_DATA[5]),
.hps_io_hps_io_usb1_inst_D6 (HPS_USB_DATA[6]),
.hps_io_hps_io_usb1_inst_D7 (HPS_USB_DATA[7]),
.hps_io_hps_io_usb1_inst_CLK (HPS_USB_CLKOUT),
.hps_io_hps_io_usb1_inst_STP (HPS_USB_STP),
.hps_io_hps_io_usb1_inst_DIR (HPS_USB_DIR),
.hps_io_hps_io_usb1_inst_NXT (HPS_USB_NXT),
.complete_0_external_connection_export(complete_w[0]), // complete_0_external_connection.export
.complete_1_external_connection_export(complete_w[1]), // complete_1_external_connection.export
.complete_2_external_connection_export(complete_w[2]), // complete_2_external_connection.export
.complete_3_external_connection_export(complete_w[3]), // complete_3_external_connection.export
.complete_4_external_connection_export(complete_w[4]), // complete_4_external_connection.export
.complete_5_external_connection_export(complete_w[5]), // complete_5_external_connection.export
.complete_6_external_connection_export(complete_w[6]), // complete_6_external_connection.export
.complete_7_external_connection_export(complete_w[7]), // complete_7_external_connection.export
.obj1_delta_x_acc_0_external_connection_export(obj_1_delta_x_acc_o_w[0]), // obj1_delta_x_acc_0_external_connection.export
.obj1_delta_x_acc_1_external_connection_export(obj_1_delta_x_acc_o_w[1]), // obj1_delta_x_acc_1_external_connection.export
.obj1_delta_x_acc_2_external_connection_export(obj_1_delta_x_acc_o_w[2]), // obj1_delta_x_acc_2_external_connection.export
.obj1_delta_x_acc_3_external_connection_export(obj_1_delta_x_acc_o_w[3]), // obj1_delta_x_acc_3_external_connection.export
.obj1_delta_x_acc_4_external_connection_export(obj_1_delta_x_acc_o_w[4]), // obj1_delta_x_acc_4_external_connection.export
.obj1_delta_x_acc_5_external_connection_export(obj_1_delta_x_acc_o_w[5]), // obj1_delta_x_acc_5_external_connection.export
.obj1_delta_x_acc_6_external_connection_export(obj_1_delta_x_acc_o_w[6]), // obj1_delta_x_acc_6_external_connection.export
.obj1_delta_x_acc_7_external_connection_export(obj_1_delta_x_acc_o_w[7]), // obj1_delta_x_acc_7_external_connection.export
.obj1_delta_y_acc_0_external_connection_export(obj_1_delta_y_acc_o_w[0]), // obj1_delta_y_acc_0_external_connection.export
.obj1_delta_y_acc_1_external_connection_export(obj_1_delta_y_acc_o_w[1]), // obj1_delta_y_acc_1_external_connection.export
.obj1_delta_y_acc_2_external_connection_export(obj_1_delta_y_acc_o_w[2]), // obj1_delta_y_acc_2_external_connection.export
.obj1_delta_y_acc_3_external_connection_export(obj_1_delta_y_acc_o_w[3]), // obj1_delta_y_acc_3_external_connection.export
.obj1_delta_y_acc_4_external_connection_export(obj_1_delta_y_acc_o_w[4]), // obj1_delta_y_acc_4_external_connection.export
.obj1_delta_y_acc_5_external_connection_export(obj_1_delta_y_acc_o_w[5]), // obj1_delta_y_acc_5_external_connection.export
.obj1_delta_y_acc_6_external_connection_export(obj_1_delta_y_acc_o_w[6]), // obj1_delta_y_acc_6_external_connection.export
.obj1_delta_y_acc_7_external_connection_export(obj_1_delta_y_acc_o_w[7]), // obj1_delta_y_acc_7_external_connection.export
.obj1_mass_a_external_connection_export(obj_1_mass_i_w), // obj1_mass_a_external_connection.export
.obj1_num_a_external_connection_export(obj_1_num_i_w), // obj1_num_a_external_connection.export
.obj1_xcoord_a_external_connection_export(obj_1_x_coord_i_w), // obj1_xcoord_a_external_connection.export
.obj1_ycoord_a_external_connection_export(obj_1_y_coord_i_w), // obj1_ycoord_a_external_connection.export
.obj2_mass_a0_external_connection_export(obj_2_mass_i_w[0]), // obj2_mass_a0_external_connection.export
.obj2_mass_a1_external_connection_export(obj_2_mass_i_w[1]), // obj2_mass_a1_external_connection.export
.obj2_mass_a2_external_connection_export(obj_2_mass_i_w[2]), // obj2_mass_a2_external_connection.export
.obj2_mass_a3_external_connection_export(obj_2_mass_i_w[3]), // obj2_mass_a3_external_connection.export
.obj2_mass_a4_external_connection_export(obj_2_mass_i_w[4]), // obj2_mass_a4_external_connection.export
.obj2_mass_a5_external_connection_export(obj_2_mass_i_w[5]), // obj2_mass_a5_external_connection.export
.obj2_mass_a6_external_connection_export(obj_2_mass_i_w[6]), // obj2_mass_a6_external_connection.export
.obj2_mass_a7_external_connection_export(obj_2_mass_i_w[7]), // obj2_mass_a7_external_connection.export
.obj2_num_a0_external_connection_export(obj_2_num_i_w[0]), // obj2_num_a0_external_connection.export
.obj2_num_a1_external_connection_export(obj_2_num_i_w[1]), // obj2_num_a1_external_connection.export
.obj2_num_a2_external_connection_export(obj_2_num_i_w[2]), // obj2_num_a2_external_connection.export
.obj2_num_a3_external_connection_export(obj_2_num_i_w[3]), // obj2_num_a3_external_connection.export
.obj2_num_a4_external_connection_export(obj_2_num_i_w[4]), // obj2_num_a4_external_connection.export
.obj2_num_a5_external_connection_export(obj_2_num_i_w[5]), // obj2_num_a5_external_connection.export
.obj2_num_a6_external_connection_export(obj_2_num_i_w[6]), // obj2_num_a6_external_connection.export
.obj2_num_a7_external_connection_export(obj_2_num_i_w[7]), // obj2_num_a7_external_connection.export
.obj2_xcoord_a0_external_connection_export(obj_2_x_coord_i_w[0]), // obj2_xcoord_a0_external_connection.export
.obj2_xcoord_a1_external_connection_export(obj_2_x_coord_i_w[1]), // obj2_xcoord_a1_external_connection.export
.obj2_xcoord_a2_external_connection_export(obj_2_x_coord_i_w[2]), // obj2_xcoord_a2_external_connection.export
.obj2_xcoord_a3_external_connection_export(obj_2_x_coord_i_w[3]), // obj2_xcoord_a3_external_connection.export
.obj2_xcoord_a4_external_connection_export(obj_2_x_coord_i_w[4]), // obj2_xcoord_a4_external_connection.export
.obj2_xcoord_a5_external_connection_export(obj_2_x_coord_i_w[5]), // obj2_xcoord_a5_external_connection.export
.obj2_xcoord_a6_external_connection_export(obj_2_x_coord_i_w[6]), // obj2_xcoord_a6_external_connection.export
.obj2_xcoord_a7_external_connection_export(obj_2_x_coord_i_w[7]), // obj2_xcoord_a7_external_connection.export
.obj2_ycoord_a0_external_connection_export(obj_2_y_coord_i_w[0]), // obj2_ycoord_a0_external_connection.export
.obj2_ycoord_a1_external_connection_export(obj_2_y_coord_i_w[1]), // obj2_ycoord_a1_external_connection.export
.obj2_ycoord_a2_external_connection_export(obj_2_y_coord_i_w[2]), // obj2_ycoord_a2_external_connection.export
.obj2_ycoord_a3_external_connection_export(obj_2_y_coord_i_w[3]), // obj2_ycoord_a3_external_connection.export
.obj2_ycoord_a4_external_connection_export(obj_2_y_coord_i_w[4]), // obj2_ycoord_a4_external_connection.export
.obj2_ycoord_a5_external_connection_export(obj_2_y_coord_i_w[5]), // obj2_ycoord_a5_external_connection.export
.obj2_ycoord_a6_external_connection_export(obj_2_y_coord_i_w[6]), // obj2_ycoord_a6_external_connection.export
.obj2_ycoord_a7_external_connection_export(obj_2_y_coord_i_w[7]), // obj2_ycoord_a7_external_connection.export
.reset_external_connection_export(reset_w),
.compute_module_clk_out_clk(clk_w)
);
wire [31:0] obj_1_num_i_w, obj_1_mass_i_w,
obj_1_x_coord_i_w, obj_1_y_coord_i_w;
wire [31:0] obj_2_num_i_w [0:7];
wire [31:0] obj_2_mass_i_w [0:7];
wire [31:0] obj_2_x_coord_i_w [0:7];
wire [31:0] obj_2_y_coord_i_w [0:7];
wire [31:0] obj_1_delta_x_acc_o_w [0:7];
wire [31:0] obj_1_delta_y_acc_o_w [0:7];
wire complete_w [0:7];
wire clk_w, reset_w, data_sent_w;
wire [5:0] state_w;
genvar A;
generate
for(A = 0; A < 8; A = A + 1) begin: gravitymodule_generate
// only instance 0 exports its state (shown on the HEX displays)
if(A == 0) begin
gravity_module_simplified g_cal (
.obj_1_num_i(obj_1_num_i_w),
.obj_1_mass_i(obj_1_mass_i_w),
.obj_1_x_coord_i(obj_1_x_coord_i_w),
.obj_1_y_coord_i(obj_1_y_coord_i_w),
.obj_2_num_i(obj_2_num_i_w[A]),
.obj_2_mass_i(obj_2_mass_i_w[A]),
.obj_2_x_coord_i(obj_2_x_coord_i_w[A]),
.obj_2_y_coord_i(obj_2_y_coord_i_w[A]),
.obj_1_delta_x_acc_o(obj_1_delta_x_acc_o_w[A]),
.obj_1_delta_y_acc_o(obj_1_delta_y_acc_o_w[A]),
.complete(complete_w[A]),
.clk(clk_w),
.reset(reset_w),
.state_o(state_w)
);
end
else begin
gravity_module_simplified g_cal (
.obj_1_num_i(obj_1_num_i_w),
.obj_1_mass_i(obj_1_mass_i_w),
.obj_1_x_coord_i(obj_1_x_coord_i_w),
.obj_1_y_coord_i(obj_1_y_coord_i_w),
.obj_2_num_i(obj_2_num_i_w[A]),
.obj_2_mass_i(obj_2_mass_i_w[A]),
.obj_2_x_coord_i(obj_2_x_coord_i_w[A]),
.obj_2_y_coord_i(obj_2_y_coord_i_w[A]),
.obj_1_delta_x_acc_o(obj_1_delta_x_acc_o_w[A]),
.obj_1_delta_y_acc_o(obj_1_delta_y_acc_o_w[A]),
.complete(complete_w[A]),
.clk(clk_w),
.reset(reset_w),
.state_o()
);
end
end
endgenerate
HexDigit Digit0(HEX0, state_w[3:0]);
HexDigit Digit1(HEX1, {2'b0, state_w[5:4]}); // zero-pad the upper two state bits to a full hex digit
endmodule
module gravity_module_simplified(
input [31:0] obj_1_num_i,
input [31:0] obj_1_mass_i,
input [31:0] obj_1_x_coord_i,
input [31:0] obj_1_y_coord_i,
input [31:0] obj_2_num_i,
input [31:0] obj_2_mass_i,
input [31:0] obj_2_x_coord_i,
input [31:0] obj_2_y_coord_i,
output [31:0] obj_1_delta_x_acc_o,
output [31:0] obj_1_delta_y_acc_o,
output complete,
input clk,
input reset,
output [5:0] state_o
);
// object 1 is stored in the registers
// object 2 is shifted to the next calculation module
reg [26:0] obj_1_num, obj_1_mass, obj_1_x_coord,
obj_1_y_coord;
wire [26:0] obj_1_x_coord_neg, obj_1_y_coord_neg;
reg [26:0] obj_2_num, obj_2_mass, obj_2_x_coord,
obj_2_y_coord;
wire [26:0] r_x_wire, r_y_wire;
wire [26:0] r_x_sq_wire, r_y_sq_wire;
wire [26:0] r_x_sq_plus_r_y_sq_wire;
wire [26:0] dd_wire, dd_sq_wire, ddd_wire, d_wire;
wire [26:0] d_times_obj_2_mass_wire;
wire [26:0] obj_1_delta_x_acc_wire_no_G,
obj_1_delta_y_acc_wire_no_G;
wire [26:0] obj_1_delta_x_acc_wire, obj_1_delta_y_acc_wire;
reg [26:0] d_times_obj_2_mass_reg;
reg [26:0] r_x_reg, r_y_reg;
reg [26:0] dd_reg;
reg [5:0] state;
reg [26:0] eps = 27'b0_00001111_000000000000000000; // 2^-112, added to r^2 so the denominator is never zero
reg [26:0] G = 27'b0_01011101_001001011000011010; // approx. 6.674e-11
//reg [26:0] G = 27'b0_01111111_000000000000000000; // 1.0
reg add_clk_0 = 0;
reg add_clk_1 = 0;
reg add_clk_2 = 0;
reg inv_sqrt_clk_0 = 0;
reg [2:0] inv_sqrt_counter = 3'b0;
FpNegate neg_x_coord(obj_1_x_coord, obj_1_x_coord_neg);
FpNegate neg_y_coord(obj_1_y_coord, obj_1_y_coord_neg);
FpAdd r_x_add(add_clk_0, obj_1_x_coord_neg, obj_2_x_coord, r_x_wire);
FpAdd r_y_add(add_clk_0, obj_1_y_coord_neg, obj_2_y_coord, r_y_wire);
FpMul r_x_sq(r_x_reg, r_x_reg, r_x_sq_wire);
FpMul r_y_sq(r_y_reg, r_y_reg, r_y_sq_wire);
FpAdd r_x_sq_plus_r_y_sq(add_clk_1, r_x_sq_wire, r_y_sq_wire, r_x_sq_plus_r_y_sq_wire);
FpAdd dd(add_clk_2, r_x_sq_plus_r_y_sq_wire, eps, dd_wire);
FpMul dd_sq(dd_reg, dd_reg, dd_sq_wire);
FpMul ddd_mul(dd_reg, dd_sq_wire, ddd_wire);
FpInvSqrt ddd_to_d(inv_sqrt_clk_0, ddd_wire, d_wire);
FpMul d_times_obj_2_mass(d_wire, obj_2_mass, d_times_obj_2_mass_wire);
FpMul obj_1_delta_x_acc_no_G(d_times_obj_2_mass_reg, r_x_reg, obj_1_delta_x_acc_wire_no_G);
FpMul obj_1_delta_y_acc_no_G(d_times_obj_2_mass_reg, r_y_reg, obj_1_delta_y_acc_wire_no_G);
FpMul obj_1_delta_x_acc(obj_1_delta_x_acc_wire_no_G, G, obj_1_delta_x_acc_wire);
FpMul obj_1_delta_y_acc(obj_1_delta_y_acc_wire_no_G, G, obj_1_delta_y_acc_wire);
always @(posedge clk) begin
if(reset)
begin
// reset state
state <= 6'd0;
end
else begin
case (state)
6'd0: begin
// latch the inputs and clear the pipeline clocks
obj_1_num <= obj_1_num_i[26:0];
obj_1_mass <= obj_1_mass_i[31:5];
obj_1_x_coord <= obj_1_x_coord_i[31:5];
obj_1_y_coord <= obj_1_y_coord_i[31:5];
obj_2_num <= obj_2_num_i[26:0];
obj_2_mass <= obj_2_mass_i[31:5];
obj_2_x_coord <= obj_2_x_coord_i[31:5];
obj_2_y_coord <= obj_2_y_coord_i[31:5];
inv_sqrt_counter <= 3'b0;
add_clk_0 <= 0;
add_clk_1 <= 0;
add_clk_2 <= 0;
inv_sqrt_clk_0 <= 0;
state <= 6'd1;
end
6'd1: begin
// first add_clk_0 pulse (FpAdd needs two rising edges)
add_clk_0 <= 1;
state <= 6'd2;
end
6'd2: begin
add_clk_0 <= 0;
state <= 6'd3;
end
6'd3: begin
add_clk_0 <= 1;
state <= 6'd4;
end
6'd4: begin
// write r_x and r_y regs
add_clk_0 <= 0;
r_x_reg <= r_x_wire;
r_y_reg <= r_y_wire;
state <= 6'd5;
end
6'd5: begin
// generate posedge for add_clk_1
add_clk_1 <= 1;
state <= 6'd6;
end
6'd6: begin
// wait
add_clk_1 <= 0;
state <= 6'd7;
end
6'd7: begin
add_clk_1 <= 1;
state <= 6'd8;
end
6'd8: begin
add_clk_1 <= 0;
state <= 6'd9;
end
6'd9: begin
add_clk_2 <= 1;
state <= 6'd10;
end
6'd10: begin
add_clk_2 <= 0;
state <= 6'd11;
end
6'd11: begin
add_clk_2 <= 1;
state <= 6'd12;
end
6'd12: begin
add_clk_2 <= 0;
state <= 6'd13;
end
6'd13: begin
dd_reg <= dd_wire;
state <= 6'd14;
end
6'd14: begin
inv_sqrt_counter <= inv_sqrt_counter + 1'b1;
inv_sqrt_clk_0 <= 1;
state <= 6'd15;
end
6'd15: begin
inv_sqrt_clk_0 <= 0;
if(inv_sqrt_counter > 3'd5) begin
state <= 6'd16;
inv_sqrt_counter <= 3'b0;
end
else begin
state <= 6'd14;
end
end
6'd16: begin
d_times_obj_2_mass_reg <= d_times_obj_2_mass_wire;
state <= 6'd17;
end
6'd17: begin
state <= 6'd17;
end
endcase
end
end
assign complete = (state == 6'd17) ? 1'b1 : 1'b0; // wait for new obj 2
assign state_o = state;
assign obj_1_delta_x_acc_o = (obj_1_num == obj_2_num) ? 32'b0 : {obj_1_delta_x_acc_wire, 5'b0};
assign obj_1_delta_y_acc_o = (obj_1_num == obj_2_num) ? 32'b0 : {obj_1_delta_y_acc_wire, 5'b0};
endmodule
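/**************************************************************************
 * Simulation-only sketch, not part of the original design.
 * It drives one gravity_module_simplified instance with two bodies and
 * prints the raw outputs once `complete` goes high. The testbench name,
 * the signal names, and the test values (masses 1.0, object 1 at (0,0),
 * object 2 at (2.0,0)) are illustrative assumptions. Masses and
 * coordinates sit in bits [31:5] of each 32-bit word, matching the
 * slicing done in state 0. The x output should be close to
 * G * m2 * r_x / (r^2 + eps)^(3/2); no exact bit pattern is claimed here.
 *************************************************************************/
module gravity_module_simplified_tb;
reg clk, reset;
reg [31:0] obj_1_num, obj_1_mass, obj_1_x, obj_1_y;
reg [31:0] obj_2_num, obj_2_mass, obj_2_x, obj_2_y;
wire [31:0] delta_x_acc, delta_y_acc;
wire complete;
wire [5:0] state;
gravity_module_simplified dut (
.obj_1_num_i(obj_1_num),
.obj_1_mass_i(obj_1_mass),
.obj_1_x_coord_i(obj_1_x),
.obj_1_y_coord_i(obj_1_y),
.obj_2_num_i(obj_2_num),
.obj_2_mass_i(obj_2_mass),
.obj_2_x_coord_i(obj_2_x),
.obj_2_y_coord_i(obj_2_y),
.obj_1_delta_x_acc_o(delta_x_acc),
.obj_1_delta_y_acc_o(delta_y_acc),
.complete(complete),
.clk(clk),
.reset(reset),
.state_o(state)
);
always #5 clk = ~clk;
initial begin
clk = 0;
reset = 1;
obj_1_num = 32'd0;
obj_2_num = 32'd1; // different IDs, so the output is not forced to zero
obj_1_mass = {27'b0_01111111_000000000000000000, 5'b0}; // 1.0
obj_2_mass = {27'b0_01111111_000000000000000000, 5'b0}; // 1.0
obj_1_x = 32'b0; // 0.0
obj_1_y = 32'b0; // 0.0
obj_2_x = {27'b0_10000000_000000000000000000, 5'b0}; // 2.0
obj_2_y = 32'b0; // 0.0
// change reset on falling edges to avoid racing the posedge logic
repeat (2) @(negedge clk);
reset = 0;
wait (complete);
@(negedge clk);
$display("delta_x_acc = %h, delta_y_acc = %h", delta_x_acc, delta_y_acc);
$finish;
end
endmodule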
/**************************************************************************
 * Following modules written by Bruce Land
 * March 2017
 *************************************************************************/
/**************************************************************************
 * 16-bit integer to floating point
 * Combinational
 * Every signed 16-bit input is representable, so no clipping occurs
 *************************************************************************/
module Int2Fp(
input signed [15:0] iInteger,
output[26:0] oA
);
// output fields
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
wire [15:0] abs_input ;
// get output sign bit
assign A_s = (iInteger < 0);
// remove sign from input
assign abs_input = (iInteger < 0)? -iInteger : iInteger ;
// find the most significant (nonzero) bit
wire [7:0] shft_amt;
assign shft_amt = abs_input[15] ? 8'd3 :
abs_input[14] ? 8'd4 : abs_input[13] ? 8'd5 :
abs_input[12] ? 8'd6 : abs_input[11] ? 8'd7 :
abs_input[10] ? 8'd8 : abs_input[9] ? 8'd9 :
abs_input[8] ? 8'd10 : abs_input[7] ? 8'd11 :
abs_input[6] ? 8'd12 : abs_input[5] ? 8'd13 :
abs_input[4] ? 8'd14 : abs_input[3] ? 8'd15 :
abs_input[2] ? 8'd16 : abs_input[1] ? 8'd17 :
abs_input[0] ? 8'd18 : 8'd19;
// exponent 127 + (18-shift_amt)
// 127 is 2^0
// 18 is amount '1' is shifted
assign A_e = 127 + 18 - shft_amt ;
// where the intermediate value is formed
wire [33:0] shift_buffer ;
// remember that the high-order '1' is not stored,
// but is shifted to bit 18
assign shift_buffer = {16'b0, abs_input} << shft_amt ;
assign A_f = shift_buffer[17:0];
assign oA = (iInteger==0)? 27'b0 : {A_s, A_e, A_f};
endmodule //Int2Fp
/**************************************************************************
 * Floating Point to 16-bit integer
 * Combinational
 * Numbers with magnitude >= 32768 get clipped to +32767 or -32767
 *************************************************************************/
module Fp2Int(
input [26:0] iA,
output reg [15:0] oInteger
);
// Extract fields of A.
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
assign A_s = iA[26];
assign A_e = iA[25:18];
assign A_f = iA[17:0];
wire [15:0] max_int = 16'h7fff ; // 32767
wire [33:0] shift_buffer ;
// form (1.A_f) and shift it to position
assign shift_buffer = {15'b0, 1'b1, A_f}<<(A_e-127) ;
// If exponent less than 127, oInteger=0
// If exponent greater than 127+14 oInteger=max value
// Between these two values:
// set up input mantissa with 1.mantissa
// and the "1." in the lowest bit of an extended word.
// shift-left by A_e-127
// If the sign bit is set, negate oInteger
always @(*) begin
if (A_e < 127) oInteger = 16'b0;
else if (A_e > 141) begin
if (A_s) oInteger = -max_int;
else oInteger = max_int;
end
else begin
if (A_s) oInteger = -shift_buffer[33:18];
else oInteger = shift_buffer[33:18];
end
end
endmodule //Fp2Int
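/**************************************************************************
 * Simulation-only sketch, not part of the original design.
 * Round-trip check: an integer pushed through Int2Fp and then Fp2Int
 * should come back unchanged. The testbench name and the chosen test
 * values are illustrative assumptions. -32768 is deliberately left out
 * because Fp2Int saturates at +/-32767.
 *************************************************************************/
module Int2Fp_Fp2Int_tb;
reg signed [15:0] value_in;
wire [26:0] value_fp;
wire [15:0] value_out;
integer errors;
Int2Fp to_fp (.iInteger(value_in), .oA(value_fp));
Fp2Int to_int (.iA(value_fp), .oInteger(value_out));
task check;
input signed [15:0] v;
begin
value_in = v;
#1; // let the combinational logic settle
if (value_out !== v) begin
errors = errors + 1;
$display("MISMATCH: in=%d fp=%b out=%d", v, value_fp, $signed(value_out));
end
end
endtask
initial begin
errors = 0;
check(16'sd0);
check(16'sd1);
check(-16'sd1);
check(16'sd5);
check(-16'sd3);
check(16'sd1000);
check(16'sd32767);
check(-16'sd32767);
$display("round-trip mismatches: %0d", errors);
$finish;
end
endmodule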
/**************************************************************************
 * Floating Point shift
 * Combinational
 * Negative shift input is right shift
 *************************************************************************/
module FpShift(
input [26:0] iA,
input [7:0] iShift,
output [26:0] oShifted
);
// Extract fields of A.
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
assign A_s = iA[26];
assign A_e = iA[25:18];
assign A_f = iA[17:0];
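// Add the shift amount to the exponent
// (the underflow/overflow check below is disabled, so the exponent can wrap)
// zero the output if underflow/overflow
// assign oShifted = (A_e+iShift<8'd254 && A_e+iShift>8'd2)?
// {A_s, A_e+iShift, A_f}
assign oShifted = {A_s, A_e+iShift, A_f} ;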
endmodule //FpShift
/**************************************************************************
 * Floating Point sign negation
 * Combinational
 *************************************************************************/
module FpNegate(
input [26:0] iA,
output [26:0] oNegative
);
// Extract fields of A.
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
assign A_s = iA[26];
assign A_e = iA[25:18];
assign A_f = iA[17:0];
// Flip bit 26
assign oNegative = {~A_s, A_e, A_f};
endmodule //FpNegate
/**************************************************************************
 * Floating Point absolute
 * Combinational
 *************************************************************************/
module FpAbs(
input [26:0] iA,
output [26:0] oAbs
);
// Extract fields of A.
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
assign A_s = iA[26];
assign A_e = iA[25:18];
assign A_f = iA[17:0];
// zero bit 26
assign oAbs = {1'b0, A_e, A_f};
endmodule //Fp absolute
/**************************************************************************
 * Floating Point compare
 * Combinational
 * output=1 if A>=B
 *************************************************************************/
module FpCompare(
input [26:0] iA,
input [26:0] iB,
output reg oA_larger
);
// Extract fields of A and B.
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
wire B_s;
wire [7:0] B_e;
wire [17:0] B_f;
assign A_s = iA[26];
assign A_e = iA[25:18];
assign A_f = iA[17:0];
assign B_s = iB[26];
assign B_e = iB[25:18];
assign B_f = iB[17:0];
// Determine which of A, B is larger
wire A_mag_larger ;
assign A_mag_larger =(A_e > B_e) ? 1'b1 :
((A_e == B_e) && (A_f >= B_f)) ? 1'b1 :
1'b0;
// now do the sign checks
always @(*) begin
if (A_s==0 && B_s==1) begin // A positive, B negative
oA_larger = 1'b1 ;
end
if (A_s==1 && B_s==0) begin // A negative, B positive
oA_larger = 1'b0 ;
end
if (A_s==0 && B_s==0) begin // A positive, B positive
oA_larger = A_mag_larger ;
end
if (A_s==1 && B_s==1) begin // A negative, B negative
oA_larger = ~A_mag_larger ;
end
end
endmodule //FpCompare
/**************************************************************************
 * Mark Eiding mje56
 * ECE 5760
 * Modified IEEE single precision FP
 * bit 26: Sign (0: pos, 1: neg)
 * bits[25:18]: Exponent (unsigned)
 * bits[17:0]: Fraction (unsigned)
 * (-1)^SIGN * 2^(EXP-127) * (1+.FRAC)
 * (http://en.wikipedia.org/wiki/Single-precision_floating-point_format)
 * Adapted from Skyler Schneider ss868
 *************************************************************************/
/**************************************************************************
 * Floating Point Fast Inverse Square Root
 * 5-stage pipeline
 * http://en.wikipedia.org/wiki/Fast_inverse_square_root
 * Magic number 27'd49920718
 * 1.5 = 27'd33423360
 *************************************************************************/
module FpInvSqrt (
input iCLK,
input [26:0] iA,
output [26:0] oInvSqrt
);
// Extract fields of A.
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
assign A_s = iA[26];
assign A_e = iA[25:18];
assign A_f = iA[17:0];
//Stage 1
wire [26:0] y_1, y_1_out, half_iA_1;
assign y_1 = 27'd49920718 - (iA>>1);
assign half_iA_1 = {A_s, A_e-8'd1,A_f};
FpMul s1_mult ( .iA(y_1), .iB(y_1), .oProd(y_1_out) );
//Stage 2
reg [26:0] y_2, mult_2_in, half_iA_2;
wire [26:0] y_2_out;
FpMul s2_mult ( .iA(half_iA_2), .iB(mult_2_in), .oProd(y_2_out) );
//Stage 3
reg [26:0] y_3, add_3_in;
wire [26:0] y_3_out;
FpAdd s3_add ( .iCLK(iCLK), .iA({~add_3_in[26],add_3_in[25:0]}), .iB(27'd33423360), .oSum(y_3_out) );
//Stage 4
reg [26:0] y_4;
//Stage 5
reg [26:0] y_5, mult_5_in;
FpMul s5_mult ( .iA(y_5), .iB(mult_5_in), .oProd(oInvSqrt) );
always @(posedge iCLK) begin
//Stage 1 to 2
y_2 <= y_1;
mult_2_in <= y_1_out;
half_iA_2 <= half_iA_1;
//Stage 2 to 3
y_3 <= y_2;
add_3_in <= y_2_out;
//Stage 3 to 4
y_4 <= y_3;
//Stage 4 to 5
y_5 <= y_4;
mult_5_in <= y_3_out;
end
endmodule
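/**************************************************************************
 * Simulation-only sketch, not part of the original design.
 * It clocks FpInvSqrt six times with a constant input (the same number of
 * pulses the gravity module's state machine issues) and prints the result
 * as a real number. The testbench name and the helper fp27_to_real are
 * illustrative assumptions: the helper re-packs the 27-bit format into an
 * IEEE-754 double (exponent re-biased by 1023-127 = 896) for $bitstoreal.
 * For an input of 4.0 the printed value should be close to 0.5; the exact
 * digits depend on the approximation and are not claimed here.
 *************************************************************************/
module FpInvSqrt_tb;
reg clk;
reg [26:0] a;
wire [26:0] inv_sqrt;
FpInvSqrt dut (.iCLK(clk), .iA(a), .oInvSqrt(inv_sqrt));
function real fp27_to_real;
input [26:0] v;
reg [63:0] bits;
begin
if (v[25:18] == 8'd0) fp27_to_real = 0.0; // exponent 0 is treated as zero
else begin
bits = {v[26], 11'd896 + {3'b0, v[25:18]}, v[17:0], 34'b0};
fp27_to_real = $bitstoreal(bits);
end
end
endfunction
initial begin
clk = 0;
a = 27'b0_10000001_000000000000000000; // 4.0
repeat (6) begin
#5 clk = 1;
#5 clk = 0;
end
#1;
$display("1/sqrt(%f) ~= %f", fp27_to_real(a), fp27_to_real(inv_sqrt));
$finish;
end
endmodule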
/**************************************************************************
 * Floating Point Multiplier
 * Combinational
 *************************************************************************/
module FpMul (
input [26:0] iA, // First input
input [26:0] iB, // Second input
output [26:0] oProd // Product
);
// Extract fields of A and B.
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
wire B_s;
wire [7:0] B_e;
wire [17:0] B_f;
assign A_s = iA[26];
assign A_e = iA[25:18];
assign A_f = {1'b1, iA[17:1]};
assign B_s = iB[26];
assign B_e = iB[25:18];
assign B_f = {1'b1, iB[17:1]};
// XOR sign bits to determine product sign.
wire oProd_s;
assign oProd_s = A_s ^ B_s;
// Multiply the fractions of A and B
wire [35:0] pre_prod_frac;
assign pre_prod_frac = A_f * B_f;
// Add exponents of A and B
wire [8:0] pre_prod_exp;
assign pre_prod_exp = A_e + B_e;
// If top bit of product frac is 0, shift left one
wire [7:0] oProd_e;
wire [17:0] oProd_f;
assign oProd_e = pre_prod_frac[35] ? (pre_prod_exp-9'd126) : (pre_prod_exp - 9'd127);
assign oProd_f = pre_prod_frac[35] ? pre_prod_frac[34:17] : pre_prod_frac[33:16];
// Detect underflow
wire underflow;
assign underflow = pre_prod_exp < 9'h80;
// Detect zero conditions (either product frac doesn't start with 1, or underflow)
assign oProd = underflow ? 27'b0 :
(B_e == 8'd0) ? 27'b0 :
(A_e == 8'd0) ? 27'b0 :
{oProd_s, oProd_e, oProd_f};
endmodule
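/**************************************************************************
 * Simulation-only sketch, not part of the original design.
 * A few hand-computed products for the combinational FpMul, written in
 * the 27-bit format (1 sign, 8 exponent, 18 fraction bits). The testbench
 * name and the test vectors are illustrative assumptions.
 *************************************************************************/
module FpMul_tb;
reg [26:0] a, b;
wire [26:0] prod;
integer errors;
FpMul dut (.iA(a), .iB(b), .oProd(prod));
task mul_and_check;
input [26:0] in_a, in_b, expected;
begin
a = in_a;
b = in_b;
#1; // let the combinational logic settle
if (prod !== expected) begin
errors = errors + 1;
$display("MISMATCH: %b * %b = %b, expected %b", in_a, in_b, prod, expected);
end
end
endtask
initial begin
errors = 0;
// 1.5 * 2.0 = 3.0
mul_and_check(27'b0_01111111_100000000000000000,
27'b0_10000000_000000000000000000,
27'b0_10000000_100000000000000000);
// -2.0 * 2.0 = -4.0
mul_and_check(27'b1_10000000_000000000000000000,
27'b0_10000000_000000000000000000,
27'b1_10000001_000000000000000000);
// 1.5 * 1.5 = 2.25
mul_and_check(27'b0_01111111_100000000000000000,
27'b0_01111111_100000000000000000,
27'b0_10000000_001000000000000000);
// 0 * 2.0 = 0 (a zero exponent forces a zero product)
mul_and_check(27'b0,
27'b0_10000000_000000000000000000,
27'b0);
$display("FpMul mismatches: %0d", errors);
$finish;
end
endmodule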
/**************************************************************************
 * Floating Point Adder
 * 2-stage pipeline
 *************************************************************************/
module FpAdd (
input iCLK,
input [26:0] iA,
input [26:0] iB,
output reg [26:0] oSum
);
// Extract fields of A and B.
wire A_s;
wire [7:0] A_e;
wire [17:0] A_f;
wire B_s;
wire [7:0] B_e;
wire [17:0] B_f;
assign A_s = iA[26];
assign A_e = iA[25:18];
assign A_f = {1'b1, iA[17:1]};
assign B_s = iB[26];
assign B_e = iB[25:18];
assign B_f = {1'b1, iB[17:1]};
wire A_larger;
// Shift fractions of A and B so that they align.
wire [7:0] exp_diff_A;
wire [7:0] exp_diff_B;
wire [7:0] larger_exp;
wire [36:0] A_f_shifted;
wire [36:0] B_f_shifted;
assign exp_diff_A = B_e - A_e; // if B bigger
assign exp_diff_B = A_e - B_e; // if A bigger
assign larger_exp = (B_e > A_e) ? B_e : A_e;
assign A_f_shifted = A_larger ? {1'b0, A_f, 18'b0} :
(exp_diff_A > 9'd35) ? 37'b0 :
({1'b0, A_f, 18'b0} >> exp_diff_A);
assign B_f_shifted = ~A_larger ? {1'b0, B_f, 18'b0} :
(exp_diff_B > 9'd35) ? 37'b0 :
({1'b0, B_f, 18'b0} >> exp_diff_B);
// Determine which of A, B is larger
assign A_larger = (A_e > B_e) ? 1'b1 :
((A_e == B_e) && (A_f > B_f)) ? 1'b1 :
1'b0;
// Calculate sum or difference of shifted fractions.
wire [36:0] pre_sum;
assign pre_sum = ((A_s^B_s) & A_larger) ? A_f_shifted - B_f_shifted :
((A_s^B_s) & ~A_larger) ? B_f_shifted - A_f_shifted :
A_f_shifted + B_f_shifted;
// buffer midway results
reg [36:0] buf_pre_sum;
reg [7:0] buf_larger_exp;
reg buf_A_e_zero;
reg buf_B_e_zero;
reg [26:0] buf_A;
reg [26:0] buf_B;
reg buf_oSum_s;
always @(posedge iCLK) begin
buf_pre_sum <= pre_sum;
buf_larger_exp <= larger_exp;
buf_A_e_zero <= (A_e == 8'b0);
buf_B_e_zero <= (B_e == 8'b0);
buf_A <= iA;
buf_B <= iB;
buf_oSum_s <= A_larger ? A_s : B_s;
end
// Convert to positive fraction and a sign bit.
wire [36:0] pre_frac;
assign pre_frac = buf_pre_sum;
// Determine output fraction and exponent change with position of first 1.
wire [17:0] oSum_f;
wire [7:0] shft_amt;
assign shft_amt = pre_frac[36] ? 8'd0 : pre_frac[35] ? 8'd1 :
pre_frac[34] ? 8'd2 : pre_frac[33] ? 8'd3 :
pre_frac[32] ? 8'd4 : pre_frac[31] ? 8'd5 :
pre_frac[30] ? 8'd6 : pre_frac[29] ? 8'd7 :
pre_frac[28] ? 8'd8 : pre_frac[27] ? 8'd9 :
pre_frac[26] ? 8'd10 : pre_frac[25] ? 8'd11 :
pre_frac[24] ? 8'd12 : pre_frac[23] ? 8'd13 :
pre_frac[22] ? 8'd14 : pre_frac[21] ? 8'd15 :
pre_frac[20] ? 8'd16 : pre_frac[19] ? 8'd17 :
pre_frac[18] ? 8'd18 : pre_frac[17] ? 8'd19 :
pre_frac[16] ? 8'd20 : pre_frac[15] ? 8'd21 :
pre_frac[14] ? 8'd22 : pre_frac[13] ? 8'd23 :
pre_frac[12] ? 8'd24 : pre_frac[11] ? 8'd25 :
pre_frac[10] ? 8'd26 : pre_frac[9] ? 8'd27 :
pre_frac[8] ? 8'd28 : pre_frac[7] ? 8'd29 :
pre_frac[6] ? 8'd30 : pre_frac[5] ? 8'd31 :
pre_frac[4] ? 8'd32 : pre_frac[3] ? 8'd33 :
pre_frac[2] ? 8'd34 : pre_frac[1] ? 8'd35 :
pre_frac[0] ? 8'd36 : 8'd37;
wire [53:0] pre_frac_shft, uflow_shift;
// the shift +1 is because high order bit is not stored, but implied
assign pre_frac_shft = {pre_frac, 17'b0} << (shft_amt+1); //? shft_amt+1
assign uflow_shift = {pre_frac, 17'b0} << (shft_amt); //? shft_amt for overflow
assign oSum_f = pre_frac_shft[53:36];
wire [7:0] oSum_e;
assign oSum_e = buf_larger_exp - shft_amt + 8'b1;
// Detect underflow
wire underflow;
// this incorrectly sets uflow for 10-10.1
//assign underflow = ~oSum_e[7] && buf_larger_exp[7] && (shft_amt != 8'b0);
// if top bit of mantissa is not set, then denorm
assign underflow = ~uflow_shift[53];
always @(posedge iCLK) begin
oSum <= (buf_A_e_zero && buf_B_e_zero) ? 27'b0 :
buf_A_e_zero ? buf_B :
buf_B_e_zero ? buf_A :
underflow ? 27'b0 :
(pre_frac == 0) ? 27'b0 :
{buf_oSum_s, oSum_e, oSum_f};
end //output update
endmodule
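/**************************************************************************
 * Simulation-only sketch, not part of the original design.
 * FpAdd is a 2-stage pipeline, so each check below applies the operands,
 * issues two rising clock edges, and only then compares the sum. The
 * testbench name and the hand-computed test vectors are illustrative
 * assumptions.
 *************************************************************************/
module FpAdd_tb;
reg clk;
reg [26:0] a, b;
wire [26:0] sum;
integer errors;
FpAdd dut (.iCLK(clk), .iA(a), .iB(b), .oSum(sum));
task add_and_check;
input [26:0] in_a, in_b, expected;
begin
a = in_a;
b = in_b;
repeat (2) begin // one edge per pipeline stage
#5 clk = 1;
#5 clk = 0;
end
#1;
if (sum !== expected) begin
errors = errors + 1;
$display("MISMATCH: %b + %b = %b, expected %b", in_a, in_b, sum, expected);
end
end
endtask
initial begin
clk = 0;
errors = 0;
// 1.0 + 2.0 = 3.0
add_and_check(27'b0_01111111_000000000000000000,
27'b0_10000000_000000000000000000,
27'b0_10000000_100000000000000000);
// 3.0 + 1.0 = 4.0 (carry out of the fraction bumps the exponent)
add_and_check(27'b0_10000000_100000000000000000,
27'b0_01111111_000000000000000000,
27'b0_10000001_000000000000000000);
// 1.0 + (-1.0) = 0
add_and_check(27'b0_01111111_000000000000000000,
27'b1_01111111_000000000000000000,
27'b0);
// 0 + 2.0 = 2.0 (a zero-exponent operand passes the other one through)
add_and_check(27'b0,
27'b0_10000000_000000000000000000,
27'b0_10000000_000000000000000000);
$display("FpAdd mismatches: %0d", errors);
$finish;
end
endmodule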
/// end
////////////////////////////////////////////////////////