Cornell University ECE4760
Sliding Windowed Infinite Fourier Transform (SWIFT)
Pi Pico RP2350
SWIFT algorithm.
The SWIFT algorithm is a Fourier transform technique which updates the transform at every input sample time, rather than computing a transform on a block of samples, like the FFT. SWIFT also uses an exponentially weighted window, and hence has an infinitely wide window, unlike the finite window of the FFT. These features have several very handy consequences:
Each transform frequency bin, at frequency ω, is computed as a damped, complex resonator, with a real input. The estimate of the frequency bin amplitude at time n is Xn(ω) which is, of course, a complex value. Xn(ω) is compute iteratively from the n-1 time step estimate by slightly decaying the value toward zero, advancing the phase by one time step, then adding the real input sample.
Xn(ω) = e(-1/τ) × e(jω) × Xn-1(ω) + xn
The units of ω are radians per sample time,
ω=2*π*F/Fs.
Fs is the sample rate and F is the frequency of the transform bin in Hz.
The decay time constant, τ, has units of number of samples. The time constant determines the bandwidth of the bin. Bigger time constant means narrower bandwidth. Typical values might range from 20 to 100, but could be different for each frequency bin. For speech, τ(ω)=3*Fs/F seems to work well.
Since both τ and ω are constants, the exponentials can be computed once and stored in a table. The real-time computation reduces to one complex multiply and one real add per sample. The transform noise level can be significantly lowered with a better window consisting of two exponentials. This doubles the arithmetic. In the paper linked above this is refered to as the α-SWIFT version, which is what I actually implemented. The α-SWIFT algorithm is:
Xn(ω)α = Xn(ω)slow - Xn(ω)fast
The only difference between the fast and slow spectral estimates is the value of τ Where τslow > τfast > 0. The fast time constant is set to a few samples. The slow time constant is set as above. The overall effect of the two time constants is to reduce spectral leakage by removing the discontinuity in the window when new samples are added.
Example of speech analysis and re-synthesis.
The speech example runs at a sample rate of 10,000 samples/sec. Most of the energy in speech is below 4 KHz, and above ~100 Hz. For testing I transformed the signal into 200 bins from 0 to 4 KHz, with 20 Hz bin spacing. The resulting spectrum, log spectrum and spectrogram was displayed on the VGA.
Since the spectral estimate of each frequency is computed as a time-varying phasor, a crude reconstruction of the original signal was made by just summing the real parts of all the frequency bins.
Code, Project ZIP
Copyright Cornell University May 13, 2026