Cornell University ECE4760
The RP2040 is a dual-core Cortex M0 produced by Raspberry Pi. It is attractive for this course because it is programmed bare-metal, supports C, inline ssembler, and MicroPython, and has an interesting set of hardware co-processors. In addition to the two M0 cores, and the usual peripheral hardware devices (ADC, UART, I2C, SPI, USB, PWM, timer), there are several heavy-weight hardware state-machine co-processors. These include:
For more information:
The following is organized by date for now.
Later there will be topics.
Setup for MicroPython (1/28/21)
Getting the system running requires the micropython image UF2 file to be downloaded to the board. Follow the simple directions on the linked page. Once you have installed micropython (MP) you can connect to the board with a terminal program, but Thonny is a simple IDE which includes an editor, downloader, file handler, and console.
After installing Thonny, setup the connection to the board with:
from machine import Pin, Timer led=Pin(25,Pin.OUT) tim=Timer() def tick(timer): led.toggle() tim.init(freq=2.5,mode=Timer.PERIODIC,callback=tick)
from machine import Pin, Timer led=Pin(15,Pin.OUT) x=0 while (x<300000): led.toggle() x += 1This produces 24.6 KHz square wave with around 5% jitter=> loop at ~50 KHz or around 20 microsec/per loop.
The need for speed (2/2/21)
To get faster i/o you need to use hardware and not python loops. A very interesting feature of this chip is an 8-way parallel, hardware, state machine dedicated to fast i/o processing. Each of the 8 processors can run simple, deterministic, assembler programs (NOT ARM assembler, although that is also available). I stripped down the simplest example program to see how fast it would toggle a pin. The original example blinks the LED at a human rate. The folowing toggles a pin at 25 MHz! The clock frequency of the state machine can be set as high as 125 MHz (the system clock frequency). In the blink routine, the wrap statements act as a zero time, unconditional jump. The set commands set/clear a pin, while the  represents a delay parameter. The result is one state machine cycle for each set, followed by a one cycle delay after each set, for a loop-time of 4 cycles. At 100 MHz, that is a 25 MHz squarewave. The main program just turns on the statemachine for 3 seconds. If you delete the two  delays, the frequency will be 50 MHz.
import time from rp2 import PIO, asm_pio from machine import Pin # @asm_pio(set_init=rp2.PIO.OUT_LOW) def blink(): wrap_target() set(pins, 1)  set(pins, 0)  wrap() # Instantiate a state machine with the blink program, at 100MHz, with set bound to Pin(15) sm = rp2.StateMachine(0, blink, freq=100000000, set_base=Pin(15)) # Run the state machine for 3 seconds and scope pin 15. sm.active(1) time.sleep(3) sm.active(0)
High speed interface: DVI from RP2040
The ARM assembler (2/2/21)
Micropython supports the ARM-Thumb assembler instructions. CPU registers R0-R7 can be used by the assembler, with function inputs in R0-R2 and function results being returned in R0. Also, when you name an array as an input parameter to an assembler funciton, python actually passes in the address to the array. To test this, I coded up a single instruction assembler function which returns the absolute memory address of an array. Python does not really want you to know about addresses, but DMA controllers need pointers to arrays to move data. The following routine just takes the input in R0, copies to itself, and exits, returning the address of the input array name. (formatted code)
# test assembler and # implement 'addressof' a=array.array('i',[ 1, 2, 3]) # invoke assembler @micropython.asm_thumb # passing an array name to the assembler # actually passes in the address def addressof(r0): # r0 is the output register, so address beomes output mov(r0, r0) # now use the assembler routine addr_a = addressof(a) print(addr_a) print(machine.mem32[addressof(a)+8]) # returns '3'
The complete two-assemblers (PIO and ARM_thumb) test code: code.
Note that some browsers do not like to see python code! You will need to rename the *.txt file.
Right-click the link and choose save link as...
(An image of the code.)
Code generation options in micropython (2/11/21)
Micropython (MP) generates a bytecode executable by default. Bytecode is compact, but slow compared to inline code. There are a least three other code generators. (see maximizing speed). The native code generator takes unmodified MP and generates a mix of inline and MP calls. The viper code generator requires some modifications/simplifications of MP source, including variable typing in some cases. It can generate inline assembler in some cases, but cannlot optimize across function calls. The assembler code generator allows you to enter ARM-M0 Thumb assembly code, linked by a simple memory map to the rest of the MP code.
I wrote a program to start testing these options, as well as other timing in MP.
( As usual, Windoze does not like to download Python, so you will need to rename it. )
Micropython co-routines (cooperative scheduling) and threading (multicore) (2/12/2021)
-- Cooperative scheduling
Cooperative scheduling on the rp2040 is similar to Protothreads on the PIC32. Asyncio is a stackless, non-premptive, light-weight task handler. Some descriptive material:
Cooperative scheduling Tutorial (uasyncio module) and demo code
Documentation for version 3
Async version 3 overview
--True threading is currently very limited. From the command line (import _thread, then help(_thread)) you can see that the functionality is a subset of Cpython thread. According to the pico python SDK manual, section 3.5, You can start just one thread on the second core. However the cpu manual indicates that there is hardware support for core-to-core communication in the form of two unidirectional FIFOs, and up to 32 hardware spin locks.
The first example starts two cooperative tasks on core0 using the asyncio library and one separate thread on core1. Programs on each core can determine which core they are running on by reading a memory address. The first part of the code defines the SIO base address, which happens to be the cpu id, and defines the core1 functions. Core1 just toggles an LED and prints out the core id. The second part defines the two asyncio, cooperative tasks running on core0. One task handles i/o to UART0, and the other blinks the onboard LED. The third part starts the two syncio tasks on core0 and the thread on core1, and allows for stopping the program with cntl-c on the REPL console. On both cores, the print command prints on the REPL (USB connection), and the streamwriter on core1 prints to UART0.
The second example tests the inter-core FIFO communication and inter-core spinlock hardware. Each core can write to one FIFO and read from the other FIFO. Each core has a FIFO read address, FIFO write address, and FIFO status, with bits to indicate that space is available to write, and that data is available to read. We need at least five functions, FIFO_read, FIFO_write (both bolcking), FIFO_read_status, FIFO_write_status, and FIFO_purge. FIFO_purge insures that there are no items in the FIFO left over from another program. The spinlocks require two functions, spin_acquire and spin_release. The spin_acquire function takes a lock number, 0 to 31, and can be either blocking or nonblocking. The code uses spin locks to protect a varialbe incremented by both cores. The FIFOs send the variable back and forth between the two cores. The overall effect is to increment the variable by two every time it is printed to the REPL console in CORE1 (CORE1 code). One of two tasks on CORE0 bounces the variable back to CORE1. To avoid deadlock after another program runs, the task initially clears the spinlock and checks to see if there is data in the FIFO.
Setting up for C (Hunter 2/14/2021)
Hardware divider (2/28/2021)
The SIO contains a 8-cycle hardware divider for each core. I wrote test code for it in micropython, then coded it again as a assembler program to test speed. If you stick to python, it is faster to just use the integer divide operator, //, than to invoke the hardware divider directly. The code shows how to directly touch hardware registers. Every time you load a divisor or dividend into the hardware divide inputs, a calculation is started. You can check a done-bit, or just wait 8 cycles. One test result is below. Dividing 37/6 give 5, with remainder 1. The assembler loop time is 168 nSec, of which 72 nSec is the actual divide.
HW div 7 36 5 1
asm_divloop_time = 0.16851 count= 100000
asm div 7 36 5 1
DMA direct memory access (3/5/2021)
Python does not really want you to know about addresses, but DMA controllers need pointers to arrays to move data. The array data type packs data sequentially, like C, and is adressable. A one line assembler program(see above) extracts the address of the zeroth element of an array, and the other elements can be accessed from this base pointer by addition.The DMA system has LOTs of options, including ability to loop on a section of memory, send one address repeatedly (perhaps from a peripherial), chain channels, start a transfer triggered by one of about 60 events, and transmit 8, 16 or 32-bit wide data. The data rate is very high, approaching one transfer per clock cycle.
-- The first example just copies one array to another. The code just sets up the options for channel control. First the addresses of the control registers, then the configuration bits. Then, Two arrays are defined, their addresses determined, and channel zero configured for no-interrupt, permanent trigger, read and write increment, and a data width of 32-bits.
DMA_src_addr = addressof(DMA_src) machine.mem32[DMA_RD_ADDR0] = DMA_src_addr DMA_dst_addr = addressof(DMA_dst) machine.mem32[DMA_WR_ADDR0] = DMA_dst_addr machine.mem32[DMA_TR_CNT0] = len(DMA_src) # this writ starts the transfer perm_trig = 0x3f # alwas go data_32 = 0x02 # 32 bit # set up control and start the DMA machine.mem32[DMA_CTRL0] = (DMA_IRQ_QUIET | DMA_TREQ(perm_trig) | DMA_WR_INC | DMA_RD_INC | DMA_DATA_WIDTH(data_32) | DMA_EN )
MicroPython Notes (mostly for myself) (started 1/28/21)
Copyright Cornell University March 5, 2021