Cornell University ECE4760
Sliding Windowed Infinite Fourier Transform (SWIFT)
Pi Pico RP2350

SWIFT algorithm.
The SWIFT algorithm is a Fourier transform technique which updates the transform at every input sample time, rather than computing a transform on a block of samples, like the FFT. SWIFT also uses an exponentially weighted window, and hence has an infinitely wide window, unlike the finite window of the FFT. These features have several very handy consequences:

  1. The latency for an updated spectrum is one sample, hence completely real-time.
    An exponential window gives more weight to recent data.
  2. You only have to transform the frequences you need.
    You do not need to produce a full spectrum, although you can.
  3. Since the window is infinite, frequencies are not at fixed spacing like the FFT.
    There is a bandwidth parameter for each transform frequency bin you want to compute.
    Setting the time-constant higher makes the bandwidth narrower.
    Of course, time/bandwidth limits mean that very narrow frequency bins may converge slowly.

Each transform frequency bin, at frequency ω, is computed as a damped, complex resonator, with a real input. The estimate of the frequency bin amplitude at time n is Xn(ω) which is, of course, a complex value. Xn(ω) is computed iteratively from the n-1 time step estimate by slightly decaying the complex amplitude toward zero, advancing the phase by one time step, then adding the real input sample.

Xn(ω) = e(-1/τ) × e(jω) × Xn-1(ω) + xn

The units of ω are radians per sample time,
ω=2*π*F/Fs.
Fs is the sample rate and F is the frequency of the transform bin in Hz.

The decay time constant, τ, has units of number of samples. The time constant determines the bandwidth of the bin. Bigger time constant means narrower bandwidth. Typical values might range from 20 to 100, but could be different for each frequency bin. For speech, τ(ω)=3*Fs/F seems to work well.

Since both τ and ω are constants, the exponentials can be computed once and stored in a table. The real-time computation reduces to one complex multiply and one real add per sample. The transform noise level can be significantly lowered with a better window consisting of two exponentials. This doubles the arithmetic. In the paper linked above this is refered to as the α-SWIFT version, which is what I actually implemented. The α-SWIFT algorithm is:

Xn(ω)α = Xn(ω)slow - Xn(ω)fast

The only difference between the fast and slow spectral estimates is the value of τ Where τslow > τfast > 0. The fast time constant is set to a few samples. The slow time constant is set as above. The overall effect of the two time constants is to reduce spectral leakage by removing the discontinuity in the window when new samples are added.

Example of speech analysis and re-synthesis.
The speech example runs at a sample rate of 10,000 samples/sec. Most of the energy in speech is below 4 KHz, and above ~100 Hz. For testing I transformed the signal into 200 bins from 0 to 4 KHz, with 20 Hz bin spacing. The resulting spectrum, log spectrum and spectrogram was displayed on the VGA. The spectrogram time slice is around 20 milliseconds.

The signal is high-passed at around one Hz to remove residual DC offset. Input comes from a line-level output from a computer. Since the spectral estimate of each frequency is computed as a time-varying phasor, a crude reconstruction of the original signal was made by just summing the real parts of all the frequency bins and feeding the scaled result to a DAC. The sampling ISR, which computes the α-SWIFT and inverse uses about 0.6 of one cpu at 150 MHz.

Code, Project ZIP


Animation of SWIFT algorithm
SWIFT computes the Fourier transform for each frequency separately, just as the DFT does. Unlike the DFT, the correlation between the complex frequency and the input is computed sample-by-sample, with a exponentially decreasing weight for older samples. One way to think of this is to consider the current transform estimate for a given frequency to be a slowly-decaying phasor rotating at that frequency. The next input real-valued sample is just added to the phasor at whatever the phasor's current phase happens to be. The video below shows an input sine wave with frequency of 0.02 radians/time step applied to five complex phasor resonators. The phasor is shown as a rotating white line. The real input is shown as a horizontal yellow line oscillating left-right. The output waveform is the real part of the phasor polotted vs time. You can see that the output of the center phasor becomes much greater than the other because the input freqeucy is always leading the phasor, and therefore driving it to higher amplitude.

 

This scheme requires about eight real adds and eight real multiplies per computed transform frequency, per time sample. It is thus less efficient in terms of cycles compared to FFT, but has several advantages. The main advantage is that it is completely real-time, so there are no FFT block artifacts resulting from artificial block boundaries. Another advantage is that the bandwidth of each trasformed frequency can be set to a different bandwidth, unlike the FFT which is constrained by the number of points in a block and the block duration. The time constant of the exponential weighting function sets the bandwidth. Longer time constant means narrower bandwidth, the usual time/frequency tradoff. Also, since the exponential window is infinite, the values of the transform frequencies are not quantized. The FFT frequency increment is fixed at the reciprocal of the block duration.

The program has a double-buffered animation thread and a serial thread. The serial thread recognizes several commands to control the graphics display. There are commands to

Code, ZIP of project


Investigating the decay-time vs bandwidth tradeoff.
A


Copyright Cornell University June 3, 2026