Cornell University ECE4760
RP2040 testing


The RP2040 is a dual-core Cortex-M0+ microcontroller produced by Raspberry Pi. It is attractive for this course because it is programmed bare-metal, supports C, inline assembler, and MicroPython, and has an interesting set of hardware co-processors. In addition to the two M0+ cores and the usual peripheral hardware devices (ADC, UART, I2C, SPI, USB, PWM, timer), there are several heavy-weight hardware state-machine co-processors. These include:

For more information:

The following is organized by date for now.
Later there will be topics.

Setup for MicroPython (1/28/21)
Getting the system running requires the MicroPython UF2 image file to be downloaded to the board. Follow the simple directions on the linked page. Once you have installed MicroPython (MP) you can connect to the board with a terminal program, but Thonny is a simple IDE which includes an editor, downloader, file handler, and console.
After installing Thonny, set up the connection to the board with:

  1. in Tools > Options > Interpreter Tab
    Choose: micropython(raspberry Pi PICO)
    Choose: USB serial device (with appropriate device name)
  2. in Run menu choose: Stop/restart
    at this point you should see a python >>> prompt near the bottom of the window.
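  3. In the edit pane paste in:
      from machine import Pin, Timer
      led = Pin(25, Pin.OUT)   # onboard LED on the Pico
      def tick(timer): led.toggle()   # timer callback (the ISR)
      tim = Timer(freq=2.5, mode=Timer.PERIODIC, callback=tick)   # assumed blink rate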
  4. In Run menu choose: Run current script
    The LED should blink if everything is correctly connected.
  5. <cntl>c normally stops a program, but the test program starts an interrupt-service-routine.
    <cntl>d will force a soft reset and kill the ISR.
    The command tim.deinit() also stops the timer-triggered routine
  6. The ISR runs at 10 kHz, but fails at 100 kHz and HANGS the system!
    You have to unplug/plug the PICO to restart! My guess is that the ISR takes more than 10 microseconds and never actually exits if the ISR rate is too high. To test this hypothesis paste in:
        from machine import Pin
        pin = Pin(15, Pin.OUT)   # assumed: any free GPIO will do; scope this pin
        x = 0
        while (x<300000):
           pin.toggle()
           x += 1
    This produces a 24.6 kHz square wave with around 5% jitter => the loop runs at ~50 kHz, or around 20 microseconds per loop.
  7. Thonny can save a script directly to the MCU flash drive and can erase a file. If you save a script as test.py, then typing import test at the command line will run it from the local file system.
  8. Choosing Tools>Open system shell opens a console directly to the >>> prompt.
    To reconnect Thonny, you will have to close the console and restart the connection to the PICO.
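
The arithmetic behind item 6 can be checked with a few lines of plain Python. This is only a sketch: the helper names are made up for illustration, and it assumes the loop toggles the pin once per pass, so one full square-wave period takes two passes.

```python
# Sketch: convert the measured square-wave frequency into a loop time.
# Assumes one pin toggle per loop pass (hypothetical helper names).

def loop_time_us(square_wave_hz):
    """Time per loop pass in microseconds, given the scoped frequency."""
    loop_rate_hz = 2 * square_wave_hz      # two toggles per output period
    return 1e6 / loop_rate_hz

def isr_budget_us(isr_rate_hz):
    """Time available to each timer callback before the next one is due."""
    return 1e6 / isr_rate_hz

print(round(loop_time_us(24.6e3), 1))   # time per pass of the Python loop
print(isr_budget_us(10e3))              # budget at a 10 kHz callback rate
print(isr_budget_us(100e3))             # budget at a 100 kHz callback rate
```

A bare loop pass already costs about 20 microseconds, while a 100 kHz callback rate leaves only 10 microseconds per call, which is consistent with the guess that the ISR never gets a chance to exit.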

The need for speed (2/2/21)
To get faster i/o you need to use hardware and not python loops. A very interesting feature of this chip is an 8-way parallel, hardware state machine dedicated to fast i/o processing. Each of the 8 processors can run simple, deterministic, assembler programs (NOT ARM assembler, although that is also available). I stripped down the simplest example program to see how fast it would toggle a pin. The original example blinks the LED at a human rate. The following toggles a pin at 25 MHz! The clock frequency of the state machine can be set as high as 125 MHz (the system clock frequency). In the blink routine, the wrap statements act as a zero-time, unconditional jump. The set commands set/clear a pin, while the [1] represents a delay parameter. The result is one state machine cycle for each set, followed by a one-cycle delay after each set, for a loop time of 4 cycles. At 100 MHz, that is a 25 MHz square wave. The main program just turns on the state machine for 3 seconds. If you delete the two [1] delays, the frequency will be 50 MHz.

  import time
  import rp2
  from rp2 import PIO, asm_pio
  from machine import Pin
  @asm_pio(set_init=PIO.OUT_LOW)
  def blink():
     wrap_target()
     set(pins, 1)   [1] 
     set(pins, 0)   [1] 
     wrap()
# Instantiate a state machine with the blink program, at 100MHz, with set bound to Pin(15) 
  sm = rp2.StateMachine(0, blink, freq=100000000, set_base=Pin(15))
# Run the state machine for 3 seconds and scope pin 15.
  sm.active(1)
  time.sleep(3)
  sm.active(0)
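
The cycle counting described above can be written out as a small calculation. This is a sketch in plain Python (it runs on the Pico or on a PC); the function name is made up for illustration.

```python
# Sketch: output frequency of the two-instruction PIO toggle loop.
# Each set takes 1 cycle plus its [delay]; the wrap costs no cycles.

def toggle_freq_hz(sm_clock_hz, delay_cycles):
    cycles_per_set = 1 + delay_cycles
    cycles_per_period = 2 * cycles_per_set   # one set high, one set low
    return sm_clock_hz / cycles_per_period

print(toggle_freq_hz(100000000, 1))   # with the [1] delays: 25 MHz
print(toggle_freq_hz(100000000, 0))   # delays removed: 50 MHz
```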

High speed interface: DVI from RP2040

The ARM assembler (2/2/21)
Micropython supports the ARM-Thumb assembler instructions. CPU registers R0-R7 can be used by the assembler, with function inputs in R0-R2 and function results being returned in R0. Also, when you name an array as an input parameter to an assembler function, python actually passes in the address of the array. To test this, I coded up a single-instruction assembler function which returns the absolute memory address of an array. Python does not really want you to know about addresses, but DMA controllers need pointers to arrays to move data. The following routine just takes the input in R0, copies it to itself, and exits, returning the address of the input array. (formatted code)

# test assembler and
# implement 'addressof'
import array
import machine
import micropython
a=array.array('i',[ 1, 2, 3])
# invoke assembler
# passing an array name to the assembler
# actually passes in the address
@micropython.asm_thumb
def addressof(r0):
    # r0 is the output register, so address becomes output
    mov(r0, r0)
# now use the assembler routine  
addr_a = addressof(a)
print(machine.mem32[addr_a+8]) # prints 3 (third element, 8 bytes past the base)
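
The same base-plus-offset idea can be tried on a PC. The following is a desktop CPython analogue, not MicroPython code: it assumes the standard array.buffer_info() and ctypes modules in place of the assembler routine and machine.mem32.

```python
# Desktop CPython analogue of the addressof() check (not MicroPython code).
# array.buffer_info() returns (address, length); ctypes reads raw memory.
import array
import ctypes

a = array.array('i', [1, 2, 3])
base, count = a.buffer_info()

# Element k lives k * itemsize bytes past the base address.
offset = 2 * a.itemsize
third = ctypes.c_int.from_address(base + offset).value
print(a.itemsize, offset, third)
```

With 4-byte integers the offset of the third element is 8 bytes, which is why the listing above reads at the base address plus 8.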

The complete two-assemblers (PIO and ARM_thumb) test code: code.
Note that some browsers do not like to see python code! You will need to rename the *.txt file.
Right-click the link and choose save link as...
(An image of the code.)

Code generation options in micropython (2/11/21)
Micropython (MP) generates a bytecode executable by default. Bytecode is compact, but slow compared to inline code. There are at least three other code generators (see maximizing speed). The native code generator takes unmodified MP and generates a mix of inline and MP calls. The viper code generator requires some modifications/simplifications of MP source, including variable typing in some cases. It can generate inline assembler in some cases, but cannot optimize across function calls. The assembler code generator allows you to enter ARM-M0 Thumb assembly code, linked by a simple memory map to the rest of the MP code.
I wrote a program to start testing these options, as well as other timing in MP.
( As usual, Windoze does not like to download Python, so you will need to rename it. )
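
As a rough illustration of how the generators are selected, the sketch below applies the native and viper decorators to the same small function. It is not the test program mentioned above. The stand-in class in the except branch is hypothetical and exists only so the file also runs on a desktop Python, where the decorators become no-ops.

```python
# Sketch of the bytecode, native, and viper emitters on one function.
# On a desktop Python there is no micropython module, so a stand-in
# (hypothetical, for illustration only) turns the decorators into no-ops.
try:
    import micropython
except ImportError:
    class micropython:
        native = staticmethod(lambda f: f)
        viper = staticmethod(lambda f: f)

def sum_bytecode(n):            # default: compiled to bytecode
    total = 0
    for i in range(n):
        total += i
    return total

@micropython.native
def sum_native(n):              # unmodified source, native emitter
    total = 0
    for i in range(n):
        total += i
    return total

@micropython.viper
def sum_viper(n: int) -> int:   # type hints let viper use machine integers
    total = 0
    for i in range(n):
        total += i
    return total

print(sum_bytecode(1000), sum_native(1000), sum_viper(1000))
```

All three versions should return the same sum; on the Pico, wrapping each call with time.ticks_us() is one way to compare their speed.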

Micropython co-routines (cooperative scheduling) and threading (multicore) (2/12/2021)

-- Cooperative scheduling
Cooperative scheduling on the rp2040 is similar to Protothreads on the PIC32. Asyncio is a stackless, non-preemptive, light-weight task handler. Some descriptive material:
Cooperative scheduling Tutorial (uasyncio module) and demo code
Documentation for version 3
Async version 3 overview
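
A minimal sketch of two cooperative tasks is shown here. It is not the demo code linked above: the task names and periods are made up, and the import falls back to the desktop asyncio module so the same file can be tried on a PC.

```python
# Two cooperative tasks sharing one core; each yields at every await.
try:
    import uasyncio as asyncio   # MicroPython
except ImportError:
    import asyncio               # desktop Python

log = []

async def worker(name, period_s, count):
    for i in range(count):
        log.append((name, i))
        await asyncio.sleep(period_s)   # hand control back to the scheduler

async def main():
    fast = asyncio.create_task(worker("fast", 0.01, 4))
    slow = asyncio.create_task(worker("slow", 0.02, 2))
    await fast
    await slow

asyncio.run(main())
print(log)
```

Neither task is ever interrupted; a task that never reaches an await would starve the other, which is the main thing to keep in mind with non-preemptive scheduling.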

-- True threading is currently very limited. From the command line (import _thread, then help(_thread)) you can see that the functionality is a subset of the CPython _thread module. According to the Pico Python SDK manual, section 3.5, you can start just one thread on the second core. However, the CPU manual indicates that there is hardware support for core-to-core communication in the form of two unidirectional FIFOs, and up to 32 hardware spin locks.
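
A minimal sketch of the _thread interface follows, using only start_new_thread and allocate_lock. The function and variable names are made up for illustration. On the Pico the new thread runs on core1; on a desktop Python the same code runs as an ordinary OS thread.

```python
# Sketch: one extra thread (core1 on the Pico) plus a lock-protected counter.
import _thread
import time

counter = 0
finished = False
lock = _thread.allocate_lock()

def core1_task(n):
    global counter, finished
    for _ in range(n):
        with lock:             # only one thread updates counter at a time
            counter += 1
    finished = True

_thread.start_new_thread(core1_task, (1000,))

for _ in range(1000):          # the main thread (core0) does the same work
    with lock:
        counter += 1

while not finished:            # wait for the second thread to finish
    time.sleep(0.01)
print(counter)
```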

The first example starts two cooperative tasks on core0 using the asyncio library and one separate thread on core1. Programs on each core can determine which core they are running on by reading a memory address. The first part of the code defines the SIO base address, which is also the address of the CPU ID register, and defines the core1 functions. Core1 just toggles an LED and prints out the core id. The second part defines the two asyncio, cooperative tasks running on core0. One task handles i/o to UART0, and the other blinks the onboard LED. The third part starts the two asyncio tasks on core0 and the thread on core1, and allows for stopping the program with cntl-c on the REPL console. On both cores, the print command prints on the REPL (USB connection), and the streamwriter on core1 prints to UART0.

The second example tests the inter-core FIFO communication and inter-core spinlock hardware. Each core can write to one FIFO and read from the other FIFO. Each core has a FIFO read address, FIFO write address, and FIFO status, with bits to indicate that space is available to write, and that data is available to read. We need at least five functions: FIFO_read, FIFO_write (both blocking), FIFO_read_status, FIFO_write_status, and FIFO_purge. FIFO_purge ensures that there are no items in the FIFO left over from another program. The spinlocks require two functions, spin_acquire and spin_release. The spin_acquire function takes a lock number, 0 to 31, and can be either blocking or nonblocking. The code uses spin locks to protect a variable incremented by both cores. The FIFOs send the variable back and forth between the two cores. The overall effect is to increment the variable by two every time it is printed to the REPL console in CORE1 (CORE1 code). One of two tasks on CORE0 bounces the variable back to CORE1. To avoid deadlock after another program runs, the task initially clears the spinlock and checks to see if there is data in the FIFO.
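
The helper functions named above might look roughly like the sketch below. It is not the linked example code. The register offsets are taken from the SIO section of the RP2040 datasheet, and the mem argument is an assumption made here so the logic can be exercised on a PC: pass machine.mem32 on the Pico, or a plain dict when testing.

```python
# Sketch of the FIFO and spinlock helpers. `mem` is anything indexable by
# address: machine.mem32 on the Pico, or a dict when trying the logic on a PC.
SIO_BASE  = 0xd0000000
FIFO_ST   = SIO_BASE + 0x050   # bit0 VLD: data to read, bit1 RDY: room to write
FIFO_WR   = SIO_BASE + 0x054
FIFO_RD   = SIO_BASE + 0x058
SPINLOCK0 = SIO_BASE + 0x100   # 32 locks, one 32-bit word each

def FIFO_read_status(mem):
    return mem[FIFO_ST] & 0x01

def FIFO_write_status(mem):
    return (mem[FIFO_ST] >> 1) & 0x01

def FIFO_write(mem, value):
    while not FIFO_write_status(mem):   # block until there is room
        pass
    mem[FIFO_WR] = value

def FIFO_read(mem):
    while not FIFO_read_status(mem):    # block until data arrives
        pass
    return mem[FIFO_RD]

def FIFO_purge(mem):
    while FIFO_read_status(mem):
        mem[FIFO_RD]                    # read and discard stale entries

def spin_addr(lock_num):
    return SPINLOCK0 + 4 * lock_num

def spin_acquire(mem, lock_num, blocking=True):
    # Reading the lock register returns nonzero if this read claimed the lock.
    while True:
        if mem[spin_addr(lock_num)] != 0:
            return True
        if not blocking:
            return False

def spin_release(mem, lock_num):
    mem[spin_addr(lock_num)] = 1        # any write releases the lock
```

On the Pico a call would look like FIFO_write(machine.mem32, 42). With a dict the status bits never change on their own, so only the non-blocking paths are useful for a dry run.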

Setting up for C (Hunter 2/14/2021)

Hardware divider (2/28/2021)
The SIO contains an 8-cycle hardware divider for each core. I wrote test code for it in micropython, then coded it again as an assembler program to test speed. If you stick to python, it is faster to just use the integer divide operator, //, than to invoke the hardware divider directly. The code shows how to directly touch hardware registers. Every time you load a divisor or dividend into the hardware divide inputs, a calculation is started. You can check a done-bit, or just wait 8 cycles. One test result is below. Dividing 36 by 7 gives 5, with remainder 1. The assembler loop time is 168 nSec, of which 72 nSec is the actual divide.
HW div 7 36 5 1
asm_divloop_time = 0.16851 count= 100000
asm div 7 36 5 1
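
The printed result and the quoted times can be sanity-checked in plain Python. This sketch assumes the 125 MHz system clock mentioned earlier; the helper name is made up for illustration.

```python
# Sketch: check the divide result and convert the quoted times to cycles.
# Assumes a 125 MHz system clock; change CLOCK_HZ if the board runs otherwise.
CLOCK_HZ = 125000000

def ns_to_cycles(ns):
    return round(ns * 1e-9 * CLOCK_HZ)

quotient, remainder = divmod(36, 7)
print(quotient, remainder)        # same pair the hardware divider reports
print(ns_to_cycles(168))          # whole assembler loop
print(ns_to_cycles(72))           # divide portion of the loop
```

Under that clock assumption the divide portion works out to about 9 cycles, close to the 8-cycle figure for the divider itself.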

DMA direct memory access (3/5/2021)
Python does not really want you to know about addresses, but DMA controllers need pointers to arrays to move data. The array data type packs data sequentially, like C, and is addressable. A one-line assembler program (see above) extracts the address of the zeroth element of an array, and the other elements can be accessed from this base pointer by addition. The DMA system has LOTS of options, including the ability to loop on a section of memory, send one address repeatedly (perhaps from a peripheral), chain channels, start a transfer triggered by one of about 60 events, and transmit 8, 16 or 32-bit wide data. The data rate is very high, approaching one transfer per clock cycle.
-- The first example just copies one array to another. The code sets up the options for channel control: first the addresses of the control registers, then the configuration bits. Then two arrays are defined, their addresses determined, and channel zero configured for no-interrupt, permanent trigger, read and write increment, and a data width of 32 bits.

DMA_src_addr = addressof(DMA_src)
machine.mem32[DMA_RD_ADDR0] = DMA_src_addr
DMA_dst_addr = addressof(DMA_dst)
machine.mem32[DMA_WR_ADDR0] = DMA_dst_addr
machine.mem32[DMA_TR_CNT0] = len(DMA_src)
# the control register write below starts the transfer
perm_trig = 0x3f # always go
data_32 = 0x02 # 32 bit
# set up control and start the DMA
machine.mem32[DMA_CTRL0] = (DMA_IRQ_QUIET | DMA_TREQ(perm_trig) |
                    DMA_WR_INC | DMA_RD_INC |
                    DMA_DATA_WIDTH(data_32) |
                    DMA_EN )
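
The listing above uses register addresses and bit-field helpers whose definitions are not shown. The sketch below gives one plausible set, with addresses and bit positions taken from the DMA chapter of the RP2040 datasheet. The names mirror the listing, but the definitions themselves are assumptions and may differ from the original code.

```python
# Sketch: channel 0 register addresses and control-word bit fields.
DMA_BASE     = 0x50000000
DMA_RD_ADDR0 = DMA_BASE + 0x00
DMA_WR_ADDR0 = DMA_BASE + 0x04
DMA_TR_CNT0  = DMA_BASE + 0x08
DMA_CTRL0    = DMA_BASE + 0x0c   # writing this alias triggers the channel

DMA_EN        = 1 << 0           # channel enable
DMA_RD_INC    = 1 << 4           # increment read address after each transfer
DMA_WR_INC    = 1 << 5           # increment write address after each transfer
DMA_IRQ_QUIET = 1 << 21          # no interrupt at the end of the block

def DMA_DATA_WIDTH(code):        # 0 = 8 bit, 1 = 16 bit, 2 = 32 bit
    return (code & 0x3) << 2

def DMA_TREQ(source):            # 0x3f = unpaced, transfer as fast as possible
    return (source & 0x3f) << 15

ctrl = (DMA_IRQ_QUIET | DMA_TREQ(0x3f) | DMA_WR_INC | DMA_RD_INC |
        DMA_DATA_WIDTH(0x02) | DMA_EN)
print(hex(ctrl))                 # 0x3f8039
```

Printing the control word before writing it to the hardware is a cheap way to compare the bit pattern against the datasheet table.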

MicroPython Notes (mostly for myself) (started 1/28/21)

  1. PICO micropython examples
  2. Micropython libraries/applications/projects --
    1. awesome micropython
    2. Paul Sokolovsky
    3. libraries -- may be a copy of 1
    4. projects -- hackster
    5. projects -- hackaday
  3. Typing help('modules') gives a list of available modules
    (but excluding any modules you may have put into flash memory)
  4. Once you have the module names, typing help(module) gives contents of the module
    BUT you need to import it first!
    >>> help(machine)
    object <module 'umachine'> is of type module
    __name__ -- umachine
    reset -- <function>
    reset_cause -- <function>
    bootloader -- <function>
    freq -- <function>
    mem8 -- <8-bit memory>
    mem16 -- <16-bit memory>
    mem32 -- <32-bit memory>
    ADC -- <class 'ADC'>
    I2C -- <class 'I2C'>
    SoftI2C -- <class 'SoftI2C'>
    Pin -- <class 'Pin'>
    PWM -- <class 'PWM'>
    SPI -- <class 'SPI'>
    SoftSPI -- <class 'SoftSPI'>
    Timer -- <class 'Timer'>
    UART -- <class 'UART'>
    WDT -- <class 'WDT'>
    PWRON_RESET -- 1
    WDT_RESET -- 3
  5. Wait! What is mem8, mem16, mem32?
    Mostly useful for read/write control registers
    read: machine.mem32[address]
    write: machine.mem32[address] = integer
  6. The contents of the RP2 module (PIO assembler functions)
    is not well explained yet. Your best bet is to go to the examples,
    because in the end, only the source matters.
  7. The experimental 2 core _thread
    >>> import _thread
    >>> help(_thread)
    object <module '_thread'> is of type module
    __name__ -- _thread
    LockType -- <class 'lock'>
    get_ident -- <function>
    stack_size -- <function>
    start_new_thread -- <function>
    exit -- <function>
    allocate_lock -- <function>

    The Locktype has the methods:
    >>> help(_thread.LockType)
    object <class 'lock'> is of type type
    acquire -- <function>
    release -- <function>
    locked -- <function>




Copyright Cornell University March 5, 2021