DE1-SoC: ARM HPS pThreads
Cornell ece5760

 

HPS pthreads Programming

POSIX pthreads are a light-weight addition to Linux processes that allow many threads to execute independently within one process. The threads are scheduled by the kernel and seem to use the multiprocessor environment well. There are many good resources available to learn pthreads. I liked the LLNL tutorial. Also the CMU tutorial. The HPS GCC recognizes pthreads if you include the -pthread flag on the compile command. Examples from the LLNL site (slightly cleaned up) show how to create threads and use mutex and condition variables. The create example shows that thread creation and execution are async with respect to execution of main. The mutex/condtion example implements two counting threads incrementing the same counter and checking for a certain count. When that count is reached, the count thread signals the condition variable of the watch thread, which prints a message, then the count threads continue on, to their max count.

-- Threaded: Keyboard input, console print, counter
-- This example uses a thread to read the console keyboard, another thread to print what was just typed and zero a counter, and one or two more threads to increment the counter. There is a condition variable which indicates when the print is done and next input can begin (using the shared buffer) and a condition variable wich indicates when the input is done and the print can begin. Clearly, one of them has to be signaled initially for the system to come out of deadlock. This is accomplished (somewhat crudely) by waiting one second and signalling the condition variable.
-- Semaphores are POSIX defined, but not part of pthreads.
Including semaphore operations makes the code more readable to me, and makes initialization better defined. With one count thread, and full mutex protection of the count and text buffer, the system will support about 10 million/sec counts while the keyboard and display threads take very small cpu time.
-- With two count threads, and full mutex protection of the count and text buffer, and using the default scheduling, the system will support about 5 million/sec because the scheduler puts the two heavy performance threads on different processors and the mutex operations are apparently slow across processors! A small modification of the code to include pthread_setaffinity_np forces the two counting threads onto one processor. The two threads on one processor can count at the same rate as one thread (of course).

--Threaded: Matlab to UDP Audio
The two-process code from the UDP audio modified from the University Computer and UDP page
Matlab >udp> ARM receive process >ipc> ARM fpga process >bus> FPGA audio FIFO

was converted to two threads running in a single process. The code requires the the FPGA address header file. To use this you have to load the correct DE1-SoC computer sof file to the FPGA using the QuartusPrime programmer (archive). Running the Matlab script triggers playback of the audio segment by sending a start/reset command to the UDP thread, which then initializes the shared buffer. The Matlab script then blasts audio samples in UDP packets as fast as it can. The sample buffer only holds about 8 seconds of sound at 8 Ksamples/sec. The buffer shared between the two threads is protected by a mutex. After 128 samples have be received, the write_fpga thread fills a FIFO on the FPGA as fast as it can. The FIFO only holds 128 audio samples playing at 48 kHz, so it needs to be filled at least every 3 milliseconds. The thread attempts to fill it every time there is space in the FIFO.


 


References:

DE1-SOC literature list

Using the DE1-SOC FPGA by Ahmed Kamel

Stereoscopic Depth on an FPGA via OpenCL by Ahmed Kamel and Aashish Agarwal

Running Linux on DE1-SOC by MANISH PATEL and SYED TAHMID MAHBUB

OpenCL on DE1-SOC Sahil P Potnis (spp66@cornell.edu) Aashish Agarwal (aa2264@cornell.edu) Ahmed Kamel (ayk33@cornell.edu)

Audio Core (Qsys University Program 15.1) local copy

Video Core (Qsys University Program 15.1) local copy

External to Avalon Bus Master (external here means in the FPGA, but not in the Qsys bus structure)

Avalon to External Bus Slave (external here means in the FPGA, but not in the Qsys bus structure)


Copyright Cornell University December 19, 2016