DE1-SoC: ARM HPS and FPGA
Addresses and Communication
Cornell ece5760

FPGA Programming abstraction for Linux using absolute hardware addresses

From the programmer's point of view, the main abstraction is a mapping from the virtual addresses that Linux uses during program execution to the physical addresses of memory-mapped peripherals. I assume that we are running with root privileges. Opening the device /dev/mem and then calling mmap gives access to a limited range of physical addresses, determined by the mmap parameters. The code example attempts to blink the HPS LED and read the switch, which are attached directly to port GPIO1 bits 24 and 25, respectively. All device addresses are from the HPS Technical Reference Manual. The switch read works, but the LED does not blink. The code was transferred by copy/paste into the vi editor, then compiled from the console command line with a simple gcc test_led.c -o test_led.
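As a rough illustration of the /dev/mem and mmap approach described above, here is a minimal sketch. It is not the test_led.c program itself. The GPIO1 base address and register offsets are my reading of the Cyclone V HPS Technical Reference Manual and should be checked there; the function names map_physical, led_write and key_read are made up for this example. The sketch makes no claim to fix the LED problem noted above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed values: verify against the HPS Technical Reference Manual. */
#define GPIO1_BASE      0xFF709000u  /* physical base of GPIO1 */
#define GPIO_SPAN       0x1000u      /* map one 4 KB page */
#define GPIO_DR         0x00u        /* output data register */
#define GPIO_DDR        0x04u        /* direction register, 1 = output */
#define GPIO_EXT_PORTA  0x50u        /* input pin state */
#define LED_BIT         (1u << 24)   /* HPS LED, GPIO1 bit 24 */
#define KEY_BIT         (1u << 25)   /* HPS switch, GPIO1 bit 25 */

/* Map `span` bytes of physical address space starting at the
 * page-aligned address `phys_base`. Returns NULL on failure.
 * Requires root privileges. */
volatile uint32_t *map_physical(uint32_t phys_base, size_t span)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)phys_base);
    close(fd);  /* the mapping stays valid after close */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Configure the LED pin as an output and drive it high or low. */
void led_write(volatile uint32_t *gpio, int on)
{
    assert(gpio != NULL);
    gpio[GPIO_DDR / 4] |= LED_BIT;
    if (on)
        gpio[GPIO_DR / 4] |= LED_BIT;
    else
        gpio[GPIO_DR / 4] &= ~LED_BIT;
}

/* Return 1 if the switch pin currently reads high, else 0. */
int key_read(volatile uint32_t *gpio)
{
    assert(gpio != NULL);
    return (gpio[GPIO_EXT_PORTA / 4] & KEY_BIT) != 0;
}
```

On the board, a caller would obtain the pointer with map_physical(GPIO1_BASE, GPIO_SPAN), check it for NULL, and then call led_write and key_read in a loop.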

-- Speed test the HPS-to-FPGA bus
If the MSEL switches are set correctly (5'b01010), then the default boot process loads the DE1_SoC_Computer.rbf config file (in /home/root) to the FPGA. Running the increment_led program (part of the UP Linux image in /home/root/increment_leds) controls the red LEDs attached to the FPGA side. A slightly modified version of the demo code increments the FPGA red LEDs as fast as possible. Using the DE1_SoC_Computer.rbf included with the UP Linux image, the maximum toggle speed is 830 kHz. One full toggle period spans two increments, so one add and loop takes about 600 ns, which seems slow. Replacing the register increment with a C variable increment, which is then loaded into the register, doubles the toggle speed to 1.61 MHz, for a loop time of about 300 ns. This implies that the bus transactions are dominating execution speed. The Avalon bus clock in this case is 50 MHz, a 20 ns cycle. At 300 ns per loop, the transaction must take about 15 bus cycles to transfer a word across the AXI-to-Avalon bridge and then from the Avalon bus to the parallel i/o port. (But see below for a higher speed connection.)

-- First steps in controlling the FPGA.
This example uses serial control on the ARM to set the hex digits and the LED count rate on the FPGA. Two 32-bit parallel ports were added (using QSYS) to the my_first_hps_fpga example on the DE1-SoC_v.5.0.1_HWrevF_SystemCD. The parallel output ports were wired to a small amount of verilog to blink the red LEDs and to drive the first four 7-seg digits. The QSYS layout made it easy to add a port, and the exported i/o signal bus is named in the verilog header generated by QSYS. If the parallel port is named pio_test, then the exported signal name is pio_test_external_connection, and the signal which appears in the *.v file is pio_test_external_connection_export. The signal is added to the top-level soc_system module instance.
soc_system u0 (
// === added BRL4 ===
.pio_led_external_connection_export (count_control),
.pio_test_external_connection_export (hex_control),
// === end add ===
.memory_mem_a ( HPS_DDR3_ADDR), // memory.mem_a
.memory_mem_ba ( HPS_DDR3_BA), // .mem_ba
.memory_mem_ck .....

The offsets for the LEDs and hex digits used in the C code are the offsets specified in the QSYS layout.
The C code, top-level module, and Quartus archive.
A slightly cleaner version puts the hex-digit decoding into hardware and simplifies the C program.
The C code, top-level module, and Quartus archive. The QSYS layout is unchanged.
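To show how the QSYS offsets mentioned above turn into C pointers, here is a small sketch. The two offset values are placeholders, not the ones in the actual layout, and pio_reg is a hypothetical helper. The base pointer is assumed to be the virtual address returned by mmap for the HPS-to-FPGA bridge that the ports hang on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder offsets: substitute the values shown in your QSYS layout. */
#define PIO_LED_OFFSET   0x00u
#define PIO_TEST_OFFSET  0x10u

/* Return a pointer to the 32-bit register located `offset` bytes
 * above `bridge_base`, the mapped virtual address of the bridge. */
volatile uint32_t *pio_reg(void *bridge_base, size_t offset)
{
    assert(bridge_base != NULL);
    assert(offset % 4 == 0);  /* registers are word aligned */
    return (volatile uint32_t *)((uint8_t *)bridge_base + offset);
}
```

A write such as *pio_reg(base, PIO_TEST_OFFSET) = 0x1234; then appears on the pio_test_external_connection_export signal in the verilog.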


-- HPS to FPGA FIFO with feedback via SRAM scratchpad
This example is a step toward full FIFO serial communication between HPS and FPGA. The Qsys layout defines a dual-port FIFO with its input driven from the HPS bus and its input status connected to the HPS light-weight bus. The FIFO output is exported to the FPGA fabric, where a state machine takes a word from the FIFO and copies it to the SRAM scratchpad. The SRAM scratchpad is also dual port. The state machine writes the SRAM via an exported interface, while the HPS uses a Qsys bus connection to read it. The FPGA state machine repeatedly queries the FIFO status until there is a valid entry, then adds one and copies the result to the SRAM. The HPS test program queries the user for a value, waits for a flag set by the state machine to indicate that there is valid data in the SRAM, then prints it. (HPS program, top-level, ZIP)
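The HPS side of the handshake described above could be sketched roughly as follows. This is not the linked HPS program. The function name, the bounded poll count, and the step where the HPS clears the flag after reading are all assumptions made for the example; the real program may handle the flag differently. The sketch also skips the check that the FIFO has room before writing.

```c
#include <assert.h>
#include <stdint.h>

/* Write one word into the FIFO, then poll the flag word in the SRAM
 * scratchpad until the state machine marks the data as valid.
 * Returns 0 and stores the data in *result on success, or -1 if the
 * flag never appears within max_polls reads. */
int send_and_fetch(volatile uint32_t *fifo_in,
                   volatile uint32_t *sram_flag,
                   volatile uint32_t *sram_data,
                   uint32_t value, uint32_t max_polls,
                   uint32_t *result)
{
    assert(fifo_in && sram_flag && sram_data && result);
    *fifo_in = value;                 /* push the word to the FPGA */
    for (uint32_t i = 0; i < max_polls; i++) {
        if (*sram_flag != 0) {        /* state machine says data is valid */
            *result = *sram_data;
            *sram_flag = 0;           /* assumed: HPS clears the flag */
            return 0;
        }
    }
    return -1;                        /* timed out */
}
```

With the state machine described above, a call that sends 7 would be expected to return 8 in *result, since the FPGA adds one before writing the SRAM.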

-- Full FIFO communication: HPS-to-FPGA and FPGA-to-HPS
This example generates two FIFOs in Qsys, one for each direction of communication with the HPS. The FIFO dialog sets up a depth of 256 words, but you could increase this if necessary. In the dialog, make sure that Allow Backpressure is turned off. One port of each FIFO is exported to the FPGA fabric, where you build state machines to use the data from the FIFO. For this loop-back example, the HPS-to-FPGA receive state machine waits for data in the FIFO, then reads the data word into a buffer and sets a ready flag. The FPGA-to-HPS state machine waits for space in the FPGA-to-HPS FIFO, then writes the data to the FIFO and clears the ready flag. Timing for the FIFO read/write is not specified in the user's manual! The HPS-to-FPGA read operation takes TWO cycles, but the read-enable line can only be held high for ONE cycle. Holding it high for two cycles results in two reads. The HPS program asks the user for the number of items to send (0<N<500), reads the fill level of each of the FIFOs, then prints out the returned values and fill levels. Note that for N greater than 256, using block-write, the FPGA-to-HPS FIFO will fill, then stall, while the HPS-to-FPGA FIFO keeps filling. The performance is about 1.1 MegaWords/sec (4.4 MByte/sec), round trip to the FPGA. This is consistent with requiring about four bus operations per value sent and received (check write FIFO, write, check read FIFO, read). Eliminating the write-check gives about 1.7 MegaWords/sec.
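The fill-then-stall behavior for N greater than 256 can be seen in a small software model. This is only a counting model written for illustration, not the FPGA logic: it assumes the loop-back state machines always keep up with the HPS, and it ignores the one-word buffer between the two state machines, so real fill levels may differ by one.

```c
#include <assert.h>

#define FIFO_DEPTH 256u  /* depth set in the FIFO dialog */

/* Fill levels of the two FIFOs (model only). */
struct fill_levels {
    unsigned int hps_to_fpga;
    unsigned int fpga_to_hps;
};

/* Model a block write of n words with no HPS reads in between.
 * After each HPS write, the loop-back logic moves every word it can
 * from the HPS-to-FPGA FIFO into the FPGA-to-HPS FIFO. */
struct fill_levels block_write_model(unsigned int n)
{
    struct fill_levels f = {0, 0};
    /* Beyond two FIFO depths a real blocking write would hang. */
    assert(n <= 2 * FIFO_DEPTH);
    for (unsigned int i = 0; i < n; i++) {
        f.hps_to_fpga++;                  /* HPS writes one word */
        while (f.hps_to_fpga > 0 && f.fpga_to_hps < FIFO_DEPTH) {
            f.hps_to_fpga--;              /* receive machine reads */
            f.fpga_to_hps++;              /* send machine writes */
        }
    }
    return f;
}
```

In this model, a block write of 300 words leaves 256 words in the FPGA-to-HPS FIFO and 44 words waiting in the HPS-to-FPGA FIFO, while a block write of 100 words passes straight through.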
(HPS program, top-level, ZIP)
Note that the nonblocking read/write macros in the HPS program are not well tested.
If you use nonblocking read/write, you must check the return value for success.
The first six macros read out the state of the read/write FIFOs.

#define WRITE_FIFO_FILL_LEVEL (*FIFO_write_status_ptr)
#define READ_FIFO_FILL_LEVEL  (*FIFO_read_status_ptr)
#define WRITE_FIFO_FULL       ((*(FIFO_write_status_ptr+1)) & 1)
#define WRITE_FIFO_EMPTY      ((*(FIFO_write_status_ptr+1)) & 2)
#define READ_FIFO_FULL        ((*(FIFO_read_status_ptr+1)) & 1)
#define READ_FIFO_EMPTY       ((*(FIFO_read_status_ptr+1)) & 2)
// FIFO_WRITE, FIFO_READ and WAIT are defined elsewhere in the HPS program
// arg a is data to be written
#define FIFO_WRITE_BLOCK(a)     {while (WRITE_FIFO_FULL){WAIT};FIFO_WRITE=(a);}
// arg a is data to be written, arg b is success/fail of write: b==1 means success
// the status is sampled once, so b always matches what was actually done
#define FIFO_WRITE_NOBLOCK(a,b) {b=!WRITE_FIFO_FULL; if(b)FIFO_WRITE=(a);}
// arg a is data read
#define FIFO_READ_BLOCK(a)      {while (READ_FIFO_EMPTY){WAIT};a=FIFO_READ;}
// arg a is data read, arg b is success/fail of read: b==1 means success
// the status is sampled once, so b always matches what was actually done
#define FIFO_READ_NOBLOCK(a,b)  {b=!READ_FIFO_EMPTY; if(b)a=FIFO_READ;}
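As an example of checking the success flag of a nonblocking write, here is a sketch that retries a bounded number of times. So that it compiles on its own, the FIFO registers are replaced by ordinary variables (the fake_ names are stand-ins, not part of the HPS program), and the write macro is restated in a form that samples the status register once. The helper write_with_retry is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the mapped FIFO registers so the sketch runs anywhere.
 * On the board these would be pointers obtained from mmap. */
static volatile uint32_t fake_write_status[2];  /* [0] fill level, [1] flags */
static volatile uint32_t fake_write_data;

#define FIFO_write_status_ptr   fake_write_status
#define FIFO_WRITE              fake_write_data
#define WRITE_FIFO_FULL         ((*(FIFO_write_status_ptr+1)) & 1)
/* arg a is data to be written, arg b is success flag: b==1 means success */
#define FIFO_WRITE_NOBLOCK(a,b) {b=!WRITE_FIFO_FULL; if(b)FIFO_WRITE=(a);}

/* Try to write `word` up to max_tries times.
 * Returns 1 if the word was written, 0 if the FIFO stayed full. */
int write_with_retry(uint32_t word, unsigned int max_tries)
{
    int ok = 0;
    assert(max_tries > 0);
    for (unsigned int i = 0; i < max_tries && !ok; i++) {
        FIFO_WRITE_NOBLOCK(word, ok);
    }
    return ok;
}
```

The caller decides what to do when the return value is 0, for example draining the read FIFO before trying again. The same pattern applies to FIFO_READ_NOBLOCK.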

 


Copyright Cornell University May 26, 2017.