Cornell University ECE4760
ProtoThreads

PIC32MX250F128B

Protothreads on PIC32
Protothreads is a very lightweight, stackless threading library written entirely as C macros by Adam Dunkels. As such, it is trivial to move to PIC32. Adam Dunkels' documentation is very good and easy to understand. There is support for a thread to wait for an event, spawn a thread, and use semaphores. The Protothreads system is a cooperative multithread system, so there is no thread preemption. All thread switching happens at explicit wait or yield statements. There is no scheduler. You can write your own, use mine (version 1.3.2), or just use a simple round-robin scheme, calling each thread in succession in main. Because there is no preemption, handling shared variables is easier: you always know exactly when a thread switch will occur. Because there is no separate stack for each thread, the memory footprint is quite small, but automatic (stack) local variables do not keep their values across a wait or yield and must be avoided. You can use static local variables.
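To make the stackless, cooperative idea concrete, here is a minimal sketch of how a switch-based protothread can work, with a round-robin loop that calls each thread in succession. The MINI_* macros and all function names are hypothetical stand-ins written for this illustration; they imitate the general technique and are not the real pt.h or the Cornell header.

```c
#include <assert.h>

/* Hypothetical MINI_* macros: a cut-down imitation of the switch-based
   protothread idea, not the real pt.h. */
struct mini_pt { unsigned short lc; };   /* remembers where to resume */

#define MINI_INIT(pt)   ((pt)->lc = 0)
#define MINI_BEGIN(pt)  switch ((pt)->lc) { case 0:
#define MINI_YIELD(pt)  do { (pt)->lc = __LINE__; return 0; \
                             case __LINE__:; } while (0)
#define MINI_END(pt)    } (pt)->lc = 0; return 1

static int count_a, count_b;

/* Does one unit of work per call, then yields. */
static int thread_a(struct mini_pt *pt)
{
    MINI_BEGIN(pt);
    while (1) {
        count_a++;
        MINI_YIELD(pt);
    }
    MINI_END(pt);
}

/* Yields three times between increments; the loop index is static
   so that it survives each return from the function. */
static int thread_b(struct mini_pt *pt)
{
    static int i;
    MINI_BEGIN(pt);
    while (1) {
        for (i = 0; i < 3; i++) {
            MINI_YIELD(pt);
        }
        count_b++;
    }
    MINI_END(pt);
}

/* Round-robin "scheduler": call each thread in succession. */
static void run_round_robin(int passes)
{
    struct mini_pt pt_a, pt_b;
    count_a = count_b = 0;
    MINI_INIT(&pt_a);
    MINI_INIT(&pt_b);
    for (int p = 0; p < passes; p++) {
        thread_a(&pt_a);
        thread_b(&pt_b);
    }
}
```

Each thread is an ordinary function that returns at every yield and jumps back to the saved line on the next call, which is why no per-thread stack is needed.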

You must read sections 1.3-1.6 of the reference manual (local copy) to see all of the implementation details.
I added:

Debugging ProtoThreads

If your program just sits there and does nothing:
-- Did you turn on interrupts? The thread timer depends on an interrupt.
-- Did you write each thread to include at least one YIELD (or YIELD_UNTIL, or YIELD_TIME_msec) in the while(1) loop?
-- Did you schedule the threads?
-- Did you ask to initialize the TFT display in the code, while not having a TFT actually connected?
-- Does the board have correct Vdd?

If your program reboots about 5 times/sec:
-- Did you exit from any scheduled thread? This blows out the stack.
-- Did you turn on any interrupt source and fail to write an ISR for that source?
-- Did you divide by zero? A divide by zero is an untrapped exception.
-- Did you write to a non-existent memory location, perhaps by using a negative (or very large) array index?
A write to a non-existent memory location is an untrapped exception.

If your thread state variables seem to change on their own:
-- Did you define any automatic local variable? Local variables in a thread should be static.
-- Are your global arrays big enough to clobber the stack in high memory? You should be able to use over 30 kbytes.
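The automatic-variable problem can be shown with a small sketch. It assumes a switch-based thread mechanism like the one protothreads is built on; the MINI_* macros and function names are hypothetical and written only for this illustration. The automatic counter is deliberately initialized so the behavior is well defined; on real hardware an uninitialized automatic would simply hold whatever happens to be on the stack.

```c
#include <assert.h>

/* Hypothetical cut-down thread macros, not the real pt.h. */
struct mini_pt { unsigned short lc; };
#define MINI_INIT(pt)   ((pt)->lc = 0)
#define MINI_BEGIN(pt)  switch ((pt)->lc) { case 0:
#define MINI_YIELD(pt)  do { (pt)->lc = __LINE__; return 0; \
                             case __LINE__:; } while (0)
#define MINI_END(pt)    } (pt)->lc = 0; return 1

static int seen_auto, seen_static;

static int thread_auto(struct mini_pt *pt)
{
    int n = 0;              /* automatic: re-created on every call */
    MINI_BEGIN(pt);
    while (1) {
        n++;
        seen_auto = n;      /* never gets past 1 */
        MINI_YIELD(pt);
    }
    MINI_END(pt);
}

static int thread_static(struct mini_pt *pt)
{
    static int n;           /* static: one copy that persists between calls */
    MINI_BEGIN(pt);
    n = 0;
    while (1) {
        n++;
        seen_static = n;    /* counts every pass */
        MINI_YIELD(pt);
    }
    MINI_END(pt);
}

static void run_both(int passes)
{
    struct mini_pt pt1, pt2;
    MINI_INIT(&pt1);
    MINI_INIT(&pt2);
    for (int p = 0; p < passes; p++) {
        thread_auto(&pt1);
        thread_static(&pt2);
    }
}
```

After five passes the static counter has reached 5, while the automatic one is still 1 because every resume starts from a fresh copy.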


Version 1_3_2: Works with the Big development board (SECABB)
Use config_1_3_2.h and pt_cornell_1_3_2.h on the big development board (SECABB).
ZIP file. Keep reading for the details. All current example codes for 1_3_2.
All previous software examples for the SECABB will run with this version.
You must still select the hardware features you need in config_1_3_2.h.
-- Turning on the use_vref_debug feature disables RB14 for everything except Vref output.
-- Turning on the use_uart_serial feature disables RB10 and RA1 for everything except USART2.
(For serial connection to a PC, use the cable specified near the bottom of the serial page.)
See the tables below for the Protothreads Function Summary.

Added features:

This demo code uses improved scheduler syntax.
Test example SECABB_test_1_3_2.c (code, ZIP) includes:


Protothreads Function Summary
The following table gives the syntax for the thread macros and functions written by Adam Dunkels.

Protothreads Functions (from Adam Dunkels) Description
PT_INIT(pt)
Initialize a protothread.
PT_THREAD(name_args)
Declaration of a protothread. name_args The name and arguments of the C function implementing the protothread.
PT_WAIT_THREAD(pt, thread)
Wait until a child protothread completes.
pt
A pointer to the protothread control structure.
thread
The child protothread with arguments
PT_WAIT_UNTIL(pt, condition)
PT_WAIT_WHILE(pt, condition)
Block the thread until the condition is true (PT_WAIT_UNTIL) or while the condition is true (PT_WAIT_WHILE).
PT_BEGIN(pt)
Declare the start of a protothread inside the C function implementing the protothread.
pt
A pointer to the protothread control structure.
PT_END(pt)
Declare the end of a protothread.
pt
A pointer to the protothread control structure.
PT_EXIT(pt)
This macro causes the protothread to exit. If the protothread was spawned by another protothread, the parent protothread will become unblocked and can continue to run.
pt A pointer to the protothread control structure.
PT_RESTART(pt)
This macro will block and cause the running protothread to restart its execution at the place of the PT_BEGIN() call. pt A pointer to the protothread control structure.
PT_SCHEDULE(f)
This function schedules a protothread. The return value of the function is non-zero if the protothread is running or zero if the protothread has exited.
f The call to the C function implementing the protothread to be scheduled
PT_SPAWN(pt, child, thread)
Spawn a child protothread (example) and block the thread until it exits. Parameters:
pt A pointer to the protothread control structure.
child A pointer to the child protothread’s control structure.
thread The child protothread with arguments
PT_YIELD(pt)
Yield from the protothread, thereby allowing other processing to take place in the system.
PT_YIELD_UNTIL(pt, cond)
Yield from the protothread until a condition occurs.
PT_SEM_INIT(s, c)
This macro initializes a semaphore with a value for the counter. Internally, the semaphores use an "unsigned int" to represent the counter, and therefore the "count" argument should be within range of an unsigned int. s A pointer to the pt_sem struct representing the semaphore
PT_SEM_SIGNAL(pt, s)
This macro carries out the "signal" operation on the semaphore. The signal operation increments the counter inside the semaphore, which eventually will cause waiting protothreads to continue executing. pt A pointer to the protothread (struct pt) in which the operation is executed. s A pointer to the pt_sem struct representing the semaphore
PT_SEM_WAIT(pt, s)
This macro carries out the "wait" operation on the semaphore. The wait operation causes the protothread to yield while the counter is zero. When the counter reaches a value larger than zero, the protothread will continue.
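Spawn and semaphore operations are often combined when several threads share one non-reentrant child thread. The sketch below shows that pattern: each sender claims a semaphore, loads a shared buffer, spawns the child, and releases the semaphore when the child ends. The MINI_* macros, the buffers, and the function names are hypothetical imitations written for this example, assuming a switch-based mechanism; they are not the real pt.h or pt-sem.h definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cut-down macros, not the real pt.h / pt-sem.h. */
struct mini_pt  { unsigned short lc; };
struct mini_sem { unsigned int count; };

#define MINI_INIT(pt)   ((pt)->lc = 0)
#define MINI_BEGIN(pt)  switch ((pt)->lc) { case 0:
#define MINI_END(pt)    } (pt)->lc = 0; return 1      /* 1 = ended */
#define MINI_YIELD(pt)  do { (pt)->lc = __LINE__; return 0; \
                             case __LINE__:; } while (0)
#define MINI_WAIT_UNTIL(pt, cond) \
    do { (pt)->lc = __LINE__; case __LINE__: \
         if (!(cond)) return 0; } while (0)           /* 0 = still waiting */
#define MINI_SPAWN(pt, child, call) \
    do { MINI_INIT(child); MINI_WAIT_UNTIL(pt, (call) != 0); } while (0)
#define MINI_SEM_WAIT(pt, s) \
    do { MINI_WAIT_UNTIL(pt, (s)->count > 0); (s)->count--; } while (0)
#define MINI_SEM_SIGNAL(pt, s)  ((s)->count++)

static char send_buffer[8];       /* shared input to the child thread */
static char out[32];              /* stands in for the serial output */
static int  out_len;
static struct mini_sem buffer_lock;
static struct mini_pt  pt_child;  /* one shared child control structure */

/* Child: emits one character per scheduling pass; not reentrant. */
static int put_buffer(struct mini_pt *pt)
{
    static int i;
    MINI_BEGIN(pt);
    for (i = 0; send_buffer[i] != '\0'; i++) {
        out[out_len++] = send_buffer[i];
        MINI_YIELD(pt);
    }
    MINI_END(pt);
}

/* Parent: guards the shared buffer and child with the semaphore. */
static int sender(struct mini_pt *pt, const char *msg)
{
    MINI_BEGIN(pt);
    MINI_SEM_WAIT(pt, &buffer_lock);
    strcpy(send_buffer, msg);
    MINI_SPAWN(pt, &pt_child, put_buffer(&pt_child));
    MINI_SEM_SIGNAL(pt, &buffer_lock);
    MINI_END(pt);
}

static const char *run_senders(void)
{
    struct mini_pt pt1, pt2;
    int done1 = 0, done2 = 0;
    out_len = 0;
    buffer_lock.count = 1;
    MINI_INIT(&pt1);
    MINI_INIT(&pt2);
    while (!done1 || !done2) {
        if (!done1) done1 = sender(&pt1, "ab");
        if (!done2) done2 = sender(&pt2, "XY");
    }
    out[out_len] = '\0';
    return out;
}
```

With the semaphore in place the second sender waits until the first string has been sent completely, so the two strings come out one after the other rather than interleaved.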

The following table has the protothread macro extensions and functions
I wrote for the PIC32 and which are included in the Cornell header file.

Added Protothreads functions Description

PT_YIELD_TIME_msec(delay_time) Causes the current thread to yield (stop executing) for the delay_time in milliseconds. The time is derived from a 1 mSec timer ISR. Note that this is a non-blocking delay, not an interval timer.
PT_GET_TIME() Returns the current millisecond count since boot time. Overflows in about 5 weeks. The time is derived from a 1 mSec timer ISR.
PT_RATE_INIT() Sets up variables for the optional rate scheduler. Called once before main schedule loop.
PT_RATE_LOOP() House keeping for the optional rate scheduler. Called once at the beginning of the main schedule loop.
PT_RATE_SCHEDULE(f,rate) For thread f, set the rate=0 to execute always, rate=1 to execute every other traversal for PT_RATE_LOOP, rate=2 to every fourth traversal, rate=3 to every 8th, and rate=4 to every 16th. rate=5 DISABLES the thread.
PT_DEBUG_VALUE(level, duration) Causes a voltage level from 0 to 15 (1 implies ~150 mV) to appear at pin 25 (CVrefOut) for duration microseconds (approximately). Zero duration means hold the voltage until changed by another call. To use this, config_1_x_x.h must contain #define use_vref_debug
int PT_GetSerialBuffer(struct pt *pt) A thread which is spawned to get nonblocking string input from UART2. This function assumes that a human is typing at a terminal and thus supports backspace, termination by the enter key, and echoes characters! The string is returned in char PT_term_buffer[max_chars]. If more than one thread can spawn this thread, then there must be semaphore protection. Control returns to the scheduler after every character is received. The thread dies when it receives an <enter>. To use this, config_1_x_x.h must contain #define use_uart_serial
int PT_GetMachineBuffer(struct pt *pt) A thread which is spawned to get nonblocking string input from UART2. This function assumes that a machine is sending data and thus does not assume any particular termination condition and does not echo characters. You can specify the termination method: a terminator character (PT_terminate_char), a character count (PT_terminate_count), or a timeout (PT_terminate_time). A zero for any of the three disables that termination. The string is returned in char PT_term_buffer[max_chars]. No string is returned if there is a timeout. If more than one thread can spawn this thread, then there must be semaphore protection. Control returns to the scheduler after every character is received. The thread dies when it receives a termination condition. To use this, config_1_x_x.h must contain #define use_uart_serial
The 1_3_2 version uses DMA transfer for higher speed.
int PutSerialBuffer(struct pt *pt) A thread which is spawned to send a string out through UART2. The string to be sent is in char PT_send_buffer[max_chars]. If more than one thread can spawn this thread, then there must be semaphore protection. Control returns to the scheduler after every character is loaded to be sent. The thread dies after it sends the entire string. To use this, config_1_x_x.h must contain #define use_uart_serial
int PT_DMA_PutSerialBuffer(struct pt *pt) A thread which is spawned to send a string out through UART2. The string to be sent is in char PT_send_buffer[max_chars]. If more than one thread can spawn this thread, then there must be semaphore protection. Control returns to the scheduler immediately. The thread dies after it sends the entire string. To use this, config_1_x_x.h must contain #define use_uart_serial
void PT_setup (void) Configures system frequency, UART2, a DMA channel 1 for the UART2 send, system timer, and the debug pin Vref controller, depending on the features chosen in config_1_x_x.h.
Added in version 1_3_2 -- System time switched to Timer1
-- DMA GetMachineBuffer!
-- Modified scheduler!
See example code!
The 1_2_3 code style still works.
thread_identifier = pt_add(function_name, thread_scheduler_rate) A thread is now defined only by its function name and there is an additional rate scheduler setting. See example code!
The old code style still works.
PT_SET_RATE(thread_identifier, new_rate) Sets a new relative execution rate for a thread.
PT_GET_RATE(thread_identifier) Reads the current relative execution rate for a thread.
PT_INIT(&pt_sched) ;
pt_sched_method = SCHED_ROUND_ROBIN ;
// or
// pt_sched_method = SCHED_RATE
PT_SCHEDULE(protothread_sched(&pt_sched));
Main now starts a scheduler thread (see examples). Using the SCHED_RATE option executes some threads more often than others: rate=0 fastest, rate=1 half, rate=2 quarter, rate=3 eighth, rate=4 sixteenth, rate=5 or greater DISABLEs the thread!
In round_robin mode, the rate parameter has no effect.
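One way to get the power-of-two rates described for the rate scheduler is to test the low bits of a traversal counter. The following is a sketch under that assumption, not the code in pt_cornell_1_3_2.h; rate_due, runs_in, and MAX_RATE are hypothetical names.

```c
#include <assert.h>

#define MAX_RATE 4   /* rates above this disable the thread */

/* Returns 1 if a thread with the given rate should run on this traversal.
   Rate r runs when the low r bits of the traversal count are all zero,
   that is, once every 2^r traversals. */
static int rate_due(unsigned int traversal, int rate)
{
    if (rate < 0 || rate > MAX_RATE) return 0;
    unsigned int mask = (1u << rate) - 1u;
    return (traversal & mask) == 0;
}

/* Counts how often a thread of the given rate runs in n traversals. */
static int runs_in(unsigned int n, int rate)
{
    int runs = 0;
    for (unsigned int t = 0; t < n; t++) {
        runs += rate_due(t, rate);
    }
    return runs;
}
```

Over 16 traversals this gives 16, 8, 4, 2, and 1 executions for rates 0 through 4, and none for rate 5.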

Development History and Details:


Version 1_2_3: Works with the Big development board (SECABB)
Conversion to use the big development board.
U2TX moved to RB10. U2RX moved to RA1. Use the cable specified on the serial page.
Use config_1_2_3.h and pt_cornell_1_2_3.h on the big development board.
You must still select the hardware features you need in config_1_2_3.h.
---- Turning on the use_vref_debug feature disables RB14 for everything except Vref output.
---- Turning on the use_uart_serial feature disables RB10 and RA1 for everything except USART2.
Peripheral test example (code, ZIP) includes:
(1) DAC with DDS and with appropriate SPI share with port_expander
(2) Port_expander running keypad with appropriate SPI share with ISR (critical sections)
(3) keypad attached to port_expander with shift-key.
(4) TFT showing system time, color patches, and keypad button presses
(5) Serial with formatting showing protothreads version,
and a formatted command line for setting DDS frequency and showing system time.
Added features:
(1) A get_machine_serial_buffer routine designed for machine (non-human) communication.
(2) Better format control for serial console interactions with humans.
(3) Higher default baud rate in config file. You may need to change this!


Version 1_2_2: Works with the Big development board
Conversion to use the big development board.
U2TX moved to RB10. U2RX moved to RA1. Use the cable specified on the serial page.
Use config_1_2_2.h and pt_cornell_1_2_2.h on the big development board.
Example code is on the Development board page. Search for PuTTY on that page.


TEST Version 1_3_0: Works with the Big development board (SECABB), but is to be considered BETA
This new version has cleaner scheduler format, improved readability, but no new functionality. As before, there is a round-robin and a rate scheduler, but they are packaged up so that a single thread now acts as the scheduler and was moved to the header file. (Yes indeed, a thread is the thread scheduler.) A new structure hides the protothread structs and is set up using one function call per thread:
thread_identifier = pt_add(function_name, thread_scheduler_rate)
The rate can be set from 0 to 4 with each increasing integer value halving the scheduled rate.
Setting the rate to a value greater than 4 freezes the thread until the rate is set between 0-4.
If you are using the round-robin scheduler, then the rate parameter has no effect.
The rate can later be modified with
PT_SET_RATE(thread_identifier, new_rate)
and read with
PT_GET_RATE(thread_identifier)

The last few lines of main will be:
PT_INIT(&pt_sched) ;
CHOOSE the scheduler method:
pt_sched_method = SCHED_RATE ; // or SCHED_ROUND_ROBIN ;
The scheduler never exits
PT_SCHEDULE(protothread_sched(&pt_sched));

The demo code has the same functions as Version 1_2_3 above, but with the improved scheduler syntax.
(Demo code, config_1_3_0, pt_cornell_1_3_0, project ZIP)
Based on the work of edartuz and properly licensed in the code.

A very simple test program in which three threads each just increment a counter, then yield (plus a print thread and a 100 kHz ISR), gives an estimate of the number of thread yields/sec the scheduler can sustain. For the rate scheduler, at maximum rate (rate=0) for each of the three, about 3.2x10^5 yields/sec is possible. At rates of 0, 1, and 2 for the three threads respectively, about 2.1x10^5 yields/sec. For the simpler round-robin scheduler, which ignores the relative thread rates, you get about 4.2x10^5 yields/sec. Thus, for lightweight, fast-executing threads you probably want to just use the simpler scheduler. Turning off the ISR in the round-robin test program increases the thread rate to 8.6x10^5 yields/sec.

What about threads that have a fixed execution speed ratio? By matching each thread's scheduler rate to its compute time, you can approximate a rate-monotonic scheduler using the rate scheduler. For three threads which take 1, 2, and 4 mSec respectively to execute, and using the round-robin scheduler, the best you can get is about 143 executions for each of the three threads in 1 second because each one must wait for the others (1000 mS/(1+2+4)=143). The rate scheduler, set up with rates 0, 1, and 2 respectively, gives 336 executions (per second) for the fast thread, 168 for the medium speed thread, and 84 for the slow thread. So the fast one executes more, the slow one less. Overall, I think that most of the time you want to use the simple round-robin scheduler.
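The round-robin figure follows from simple arithmetic, and a similar idealized estimate can be made for the rate scheduler. The sketch below assumes zero scheduler overhead and that a thread of rate r runs on one traversal in every 2^r; the function names are made up for illustration. For the 1, 2, and 4 mSec threads it gives 143 executions per second under round-robin and about 333, 167, and 83 under the rate scheduler, close to the 336, 168, and 84 quoted in the text.

```c
#include <assert.h>

/* Round-robin: every thread runs once per pass, and a pass costs the
   sum of all compute times. Result is rounded to the nearest integer. */
static int round_robin_per_sec(const int *ms, int n)
{
    int pass_ms = 0;
    for (int i = 0; i < n; i++) pass_ms += ms[i];
    return (1000 + pass_ms / 2) / pass_ms;
}

/* Rate scheduler (idealized): over one full cycle of 2^max_rate
   traversals, thread i runs 2^(max_rate - rate[i]) times. */
static int rate_sched_per_sec(const int *ms, const int *rate, int n, int which)
{
    int max_rate = 0, cycle_ms = 0;
    for (int i = 0; i < n; i++) {
        if (rate[i] > max_rate) max_rate = rate[i];
    }
    for (int i = 0; i < n; i++) {
        cycle_ms += (1 << (max_rate - rate[i])) * ms[i];
    }
    int runs = 1 << (max_rate - rate[which]);
    return (runs * 1000 + cycle_ms / 2) / cycle_ms;
}
```

For the example, one full cycle is 4 traversals costing 4x1 + 2x2 + 1x4 = 12 mSec, which is where the 4:2:1 split of executions comes from.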

Version 1.2.1:
Fixes a bug caused by starting a second DMA-to-UART DMA burst, before the UART transmit FIFO is empty. pt_cornell_1_2_1.h.
// Wait for the DMA transfer to complete -- existed in 1.2
PT_YIELD_UNTIL(pt, DmaChnGetEvFlags(DMA_CHANNEL1) & DMA_EV_BLOCK_DONE);
// Wait until the UART transmit buffer is empty -- added in 1.2.1 based on section 21.5.2 of Reference Manual
PT_YIELD_UNTIL(pt, U2STA&0x100);

Version 1.2:
To run protothreads 1.2 you need to download config.h, pt_cornell_1_2.h, plus the TFT routines, or the project ZIP (see below) file.
The main change from Version 1.1 is a fix for the limitation on having any thread-yield statement inside a switch statement.
This version allows yield, spawn, and wait statements anywhere. This version depends on documented, but seldom used, features of GCC.
You must still select the hardware features you need in config.h.
--Turning on the use_vref_debug feature disables pin 25 for everything except Vref output.
--Turning on the use_uart_serial feature disables pins 21 and 22 for everything except the USART
--Make sure that all of the special feature pins are disabled so that you can use them as i/o. Select:
-- #pragma config POSCMOD = OFF, FWDTEN = OFF, FSOSCEN = OFF, JTAGEN = OFF, DEBUG = OFF
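The text does not name the GCC feature that version 1.2 depends on. A likely candidate is the "labels as values" extension (the && operator and computed goto), which lets a thread remember a resume address without wrapping its body in a hidden switch. The sketch below illustrates that technique with hypothetical ADDR_* macros; it requires GCC or a compatible compiler and is not copied from pt_cornell_1_2.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macros using GCC's labels-as-values extension. */
struct addr_pt { void *lc; };            /* saved resume address */

#define ADDR_CONCAT2(a, b) a##b
#define ADDR_CONCAT(a, b)  ADDR_CONCAT2(a, b)
#define ADDR_INIT(pt)   ((pt)->lc = NULL)
#define ADDR_BEGIN(pt)  do { if ((pt)->lc != NULL) goto *(pt)->lc; } while (0)
#define ADDR_YIELD(pt) \
    do { (pt)->lc = &&ADDR_CONCAT(resume_, __LINE__); return 0; \
         ADDR_CONCAT(resume_, __LINE__):; } while (0)
#define ADDR_END(pt)    (pt)->lc = NULL; return 1

static char log_buf[16];
static int  log_len;

/* A thread that yields from inside an ordinary switch statement. */
static int stepper(struct addr_pt *pt, int mode)
{
    ADDR_BEGIN(pt);
    while (1) {
        switch (mode) {
        case 0:
            log_buf[log_len++] = 'a';
            ADDR_YIELD(pt);      /* no hidden switch to collide with */
            log_buf[log_len++] = 'b';
            break;
        default:
            log_buf[log_len++] = '?';
            break;
        }
    }
    ADDR_END(pt);
}

static const char *run_stepper(void)
{
    struct addr_pt pt;
    ADDR_INIT(&pt);
    log_len = 0;
    for (int k = 0; k < 3; k++) stepper(&pt, 0);
    log_buf[log_len] = '\0';
    return log_buf;
}
```

In a switch-based implementation the resume point is itself a case label, so it would bind to the inner switch and the thread could not resume correctly. Here the resume point is a plain label, so the user's switch is unaffected.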

Version 1.1:
To run protothreads 1.1 you need to download config.h, pt_cornell_1_1.h, plus the TFT routines, or one of the project ZIP (see below) files.
The ProtoThreads include file structure has been simplified. This version of Protothreads uses a switch-statement type construct to handle thread switching, so it is not possible to embed a thread-wait statement in a switch stanza.
The config.h file now sets:

There are examples:

Bugs to be fixed:


Timers, Output Compare, PWM, and Input Capture
All of the following examples use Protothreads. The PIC architecture separates timers from output compare units and from input capture units. This means that one timer can drive several output compare units for waveform generation, or act as a time reference for several input capture units. In all the examples, the cpu is running at 64 MHz and the peripheral bus at 32 MHz. It might be safer to run everything at 40 MHz.
-- This example sets up timer2 to drive two pulse trains from OC2 and OC3. Either of these pulse trains can be hooked to an input capture unit, which uses timer3 as a time reference. Timer three is set up to overflow so that periods are correct when computed from sequential edge capture times. The print thread prints out the generated interval, and the min, max and current value of the captured interval. The command thread listens for user input to set the timer2 period, and a one second clock thread gives system time (using timer5, as explained below in the protothreads section). The example code.
-- Example 2 sets up OC3 as a PWM unit with settable timer2 period (and thus PWM resolution) and settable PWM on-time. The on-time is then auto incremented in the timer2 ISR to sweep the on-time from zero to the timer2 period. Setting the timer2 period and OC2 pulse period in the user interface thread is cleaner.
-- Example 3 sets up OC3 as a PWM unit with timer2 period (and thus PWM resolution) equal to 64 cycles (500 kHz). PWM on-time is set by a sine wave Direct Digital Synthesis (DDS) unit. The frequency synthesized is set by the UART user interface. The PWM output (Pin 18) must be passed through an analog lowpass filter. Choose the time constant of the filter consistent with the frequencies you wish to generate. Spectral purity is about 32 dB at low frequencies. You could get better spectral purity by increasing the PWM resolution, but that, of course, lowers the sample rate. Eight-bit samples have a PWM sample rate of 125 kHz.
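The numbers in Example 3 can be checked with a short sketch of the PWM rate and the DDS phase arithmetic. It assumes the 32 MHz peripheral bus clock stated above and a 32-bit phase accumulator whose top 8 bits index a sine table; the accumulator width, the table size, and all names are assumptions for illustration, not taken from the example code.

```c
#include <assert.h>
#include <stdint.h>

#define PB_CLOCK_HZ  32000000u   /* peripheral bus clock from the text */
#define PWM_PERIOD   64u         /* timer2 counts per PWM cycle */
#define SAMPLE_HZ    (PB_CLOCK_HZ / PWM_PERIOD)   /* 500 kHz */

/* PWM (and DDS sample) rate for a given timer period in counts. */
static uint32_t pwm_rate_hz(uint32_t period_counts)
{
    return PB_CLOCK_HZ / period_counts;
}

/* Phase increment for a 32-bit accumulator: f_out * 2^32 / f_sample. */
static uint32_t dds_increment(uint32_t f_out_hz)
{
    return (uint32_t)(((uint64_t)f_out_hz << 32) / SAMPLE_HZ);
}

/* One DDS step, as it might run once per PWM cycle: advance the phase
   and return the top 8 bits, which would index a 256-entry sine table
   whose entries are scaled to the PWM period. */
static uint8_t dds_step(uint32_t *phase, uint32_t increment)
{
    *phase += increment;
    return (uint8_t)(*phase >> 24);
}
```

The trade-off mentioned in the text is visible here: a 64-count period gives a 500 kHz sample rate with 6 bits of resolution, while 256 counts gives 8 bits at 125 kHz.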


=============================

Version 1.0
To run protothreads you need to download pt_cornell.h and you need to download software from Dunkels' site or use a local copy. The Example1 test code also requires a UART connection to a terminal, as explained in a project further down the page. The test code toggles three i/o pins and supports a small user interface through the UART. It also emits three different amplitude debugging pulses on pin 25. By default this version of protothreads starts timer5 and uses a timer ISR to count milliseconds.

=============================
To run these older examples you need to download software from Dunkels' site or use a local copy. Most examples also require a UART connection to a terminal, as explained in a project further down the page.
Older examples:
-- The first example has two threads executing at a rate based on a hardware timer ISR, which generates a millisecond time counter. Each thread yields for a waiting time and when executing prints the thread number and time. Thread 1 executes once per second. Thread 2 executes every 4 seconds. Main just sets up the timer ISR and UART, then initializes the threads and schedules them.
-- The second example has three threads. Threads 1 and 2 wait on semaphores, each of which is signaled by the other thread. The two threads therefore alternate. Thread 3 just executes every few seconds. I defined a new macro to make it easier for a thread to wait for a specific time. PT_YIELD_TIME(wait_time) takes the wait time parameter and uses a local variable and the millisecond timer variable to yield the processor to another thread for wait_time milliseconds. The second example also has a small routine to compute approximate microseconds since reset and return it as a 64-bit long long int.

#define PT_YIELD_TIME(delay_time) \
    do { static int time_thread; \
    PT_YIELD_UNTIL(pt, milliSec >= time_thread); \
    time_thread = milliSec + delay_time ;} while(0);

-- The third example has three threads. Threads 1 and 2 wait on semaphores, each of which is signaled by the other thread. The two threads therefore alternate. Thread 3 takes input from a serial terminal. The actual input routine is a thread which is spawned by thread 3. Thread 3 then waits for the input thread to terminate, which it does when the human presses <enter>. The input thread yields the processor while it is waiting for the slow human to type each character, so other threads do not stall. The key statement is below, which causes protothreads to wait/yield on a hardware flag. The flag is defined as part of plib.h.
PT_YIELD_UNTIL(pt, UARTReceivedDataIsAvailable(UART2));
Note that the spawn command
PT_SPAWN(pt, &pt_input, GetSerialBuffer(&pt_input) );
initializes the input thread and schedules it. The three parameters are the current thread structure, a pointer to the spawned thread, and the actual thread function. If more than one thread is using serial input, then the spawn command should be surrounded by semaphore wait/signal commands because GetSerialBuffer is not reentrant.
-- The fourth example investigates non-blocking UART transmit. In a printf, there is a wait loop for each character. We can replace that with a thread yield on a per-character basis. Doing this speeds up processing by a factor of 2 or so. But how fast is the switch between two threads? Is it worth a thread yield on every character? Commenting out all UART code and just waiting/signaling on a semaphore between thread 1 and thread 2 gives a switch time between threads (twice) of 2.1 microseconds or about 126 cpu cycles. This value includes the signaling, waiting, and thread switch code two times (thread 1 to thread 2 and back). For a 1 mSec character transmit time, the thread switch is worth the overhead.
-- The fifth example implements a terminal command interface in thread 3 using non-blocking UART send/receive. Threads 1 and 2 toggle and are dependent upon signalling each other unless turned off by a flag from the interactive input. Thread 4 just toggles at a fixed rate, unless it is turned off by the interactive input, working through the scheduler. The code assumes that port pins B0, A0, and A1 are connected to LEDs (with 300 ohm resistor to ground). Hitting the <enter> key to finish a command results in a 9 microSec pause in the toggling of the other threads.
There are 8 commands:

command effect
t1 time sets blink rate of thread 1/2 to time
t2 time sets blink rate of thread 4 to time
g1 starts thread 1/2 blink
s1 stops thread 1/2 blink
g2 starts thread 4 blink
s2 stops thread 4 blink
k kills the interactive input until RESET
p prints the current blink times

-- The sixth example runs the same interface as example five, but uses a DMA channel to drive the UART output with no software overhead. The DMA pattern matching feature detects the end of a string to stop the UART automatically. Using the DMA transfer allows a per-string thread yield, rather than a per-character thread yield. The code assumes that port pins B0, A0, and A1 are connected to LEDs (with 300 ohm resistor to ground). The thread switch after hitting <enter> now takes 5 microSec. With both t1 and t2 set to 1 milliSec, the dispersion in actual times for both is less than 10 microSec (<1%).
-- The seventh example adds a microsecond resolution yield option. The option is marginally useful down to about 10 microseconds, where the timing uncertainty reaches about 10%. At 100 microseconds the accuracy is good. This means that you could attempt audio synthesis in a thread at 10 kHz sample rate. Thread 4 is timed by the microsecond timer. With three threads running below 100 microSec repeat rate, the system starts to miss events. The previous PT_YIELD_TIME macro has been replaced by two: one for millisecond timing and one for microsecond timing. The millisecond timer overflows about once/month. The microsecond timer overflows every 64 milliseconds. The maximum time delay using the microsecond timer is 64000 microseconds.

// macro to time a thread execution interval
#define PT_YIELD_TIME_msec(delay_time) \
    do { static int time_thread; \
    time_thread = milliSec + delay_time ; \
    PT_YIELD_UNTIL(pt, milliSec >= time_thread); \
    } while(0);
// macro to time a thread execution interval
// parameter is in MICROSEC < 64000
//ReadTimer2()
#define PT_YIELD_TIME_usec(delay_time) \
    do { static unsigned int time_thread, T3, c ; \
      time_thread = T3 + delay_time ; c = 0;\
      if(time_thread >= 0xffff) { c = 0xffff-T3; }\
      PT_YIELD_UNTIL(pt, ((ReadTimer3()+c)& 0xffff) >= ((time_thread+c) & 0xffff)); \
      T3 = ReadTimer3() ;\
    } while(0); 

-- The eighth example introduces a minimal scheduler which allows each thread to execute at a rate determined as a fraction of full speed. The default protothreads thread swap is so fast that it is a challenge to introduce scheduling which does not slow down thread execution rates. The approach taken is to allow some threads to execute every time through the main while-loop, but allow others to only execute at 1/2, 1/4, 1/8, or 1/16 of the times through the main loop. The approach is consistent with a nonpreemptive thread system and gives better execution consistency if one thread has to execute at a much higher rate than the others. Rate 0 executes every time through the loop, rate 1 every other time, rate 2 every four times, rate 3 every 8 times, and rate 4 every 16 times through the main while-loop. Any other value freezes the thread execution. With thread 4 executing at a nominal 10 microSec period, the actual time varies from 11 to 13 microSec, but the actual time can vary widely depending on the exact interval picked due to coincidence with other processes. This version also fixes the microsecond timer by using timer45 as a 32-bit counter.
-- Finally we get to something like a final version of the code.
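The PT_YIELD_TIME_msec macro shown with the seventh example compares the millisecond counter against an absolute wake-up time, which can misbehave if that sum wraps past the top of the counter. A common alternative, sketched here assuming a 32-bit unsigned millisecond count, compares elapsed time instead. Both function names are hypothetical and this is not taken from the Cornell header.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 once at least delay_ms have passed since start_ms.
   Unsigned subtraction is modulo 2^32, so the result stays correct
   when the millisecond counter wraps around. */
static int ms_elapsed(uint32_t now_ms, uint32_t start_ms, uint32_t delay_ms)
{
    return (uint32_t)(now_ms - start_ms) >= delay_ms;
}

/* For contrast: comparing against an absolute deadline. If
   start_ms + delay_ms wraps, the deadline looks already reached. */
static int deadline_naive(uint32_t now_ms, uint32_t start_ms, uint32_t delay_ms)
{
    uint32_t deadline = start_ms + delay_ms;
    return now_ms >= deadline;
}
```

For a delay of 32 mSec started 16 counts before the counter wraps, the naive form reports the delay as finished one millisecond later, while the elapsed-time form waits the full 32 mSec.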


Copyright Cornell University March 6, 2020