rp2350 SWIFT

Cornell University ECE4760
Sliding Windowed Infinite Fourier Transform (SWIFT)
Pi Pico RP2350

SWIFT algorithm.
The SWIFT algorithm is a Fourier transform technique which updates the transform at every input sample time, rather than computing a transform on a block of samples, like the FFT. SWIFT also uses an exponentially weighted window, and hence has an infinitely wide window, unlike the finite window of the FFT. These features have several very handy consequences:

The latency for an updated spectrum is one sample, hence completely real-time.
An exponential window gives more weight to recent data.
You only have to transform the frequences you need.
You do not need to produce a full spectrum, although you can.
Since the window is infinite, frequencies are not at fixed spacing like the FFT.
There is a bandwidth parameter for each transform frequency bin you want to compute.
Setting the time-constant higher makes the bandwidth narrower.
Of course, time/bandwidth limits mean that very narrow frequency bins may converge slowly.

Each transform frequency bin, at frequency ω, is computed as a damped, complex resonator, with a real input. The estimate of the frequency bin amplitude at time n is X_n(ω) which is, of course, a complex value. X_n(ω) is compute iteratively from the n-1 time step estimate by slightly decaying the value toward zero, advancing the phase by one time step, then adding the real input sample.

X_n(ω) = e^(-1/τ) × e^(jω) × X_n-1(ω) + x_n

The units of ω are radians per sample time,
ω=2*π*F/Fs.
Fs is the sample rate and F is the frequency of the transform bin in Hz.

The decay time constant, τ, has units of number of samples. The time constant determines the bandwidth of the bin. Bigger time constant means narrower bandwidth. Typical values might range from 20 to 100, but could be different for each frequency bin. For speech, τ(ω)=3*Fs/F seems to work well.

Since both τ and ω are constants, the exponentials can be computed once and stored in a table. The real-time computation reduces to one complex multiply and one real add per sample. The transform noise level can be significantly lowered with a better window consisting of two exponentials. This doubles the arithmetic. In the paper linked above this is refered to as the α-SWIFT version, which is what I actually implemented. The α-SWIFT algorithm is:

X_n(ω)_α = X_n(ω)_slow - X_n(ω)_fast

The only difference between the fast and slow spectral estimates is the value of τ Where τ_slow > τ_fast > 0. The fast time constant is set to a few samples. The slow time constant is set as above. The overall effect of the two time constants is to reduce spectral leakage by removing the discontinuity in the window when new samples are added.

Example of speech analysis and re-synthesis.
The speech example runs at a sample rate of 10,000 samples/sec. Most of the energy in speech is below 4 KHz, and above ~100 Hz. For testing I transformed the signal into 200 bins from 0 to 4 KHz, with 20 Hz bin spacing. The resulting spectrum, log spectrum and spectrogram was displayed on the VGA.

Since the spectral estimate of each frequency is computed as a time-varying phasor, a crude reconstruction of the original signal was made by just summing the real parts of all the frequency bins.

Code, Project ZIP