Introduction

Digital audio effects are an important tool in the generation of musical signals. Modern popular music producers use a variety of effects to create the high-quality sounds we hear today. One such effect is pitch shifting: the process of changing the pitch, or frequency, of an audio signal without changing its duration. It can be used to correct the pitch or intonation of a musical instrument or voice, or to give the impression that there is more than one vocalist in a composition.

Pitch shifting can be performed on an FPGA in real time using various digital audio processing techniques. One can take the Fast Fourier Transform (FFT) of a signal, alter it in the frequency domain, and then take the inverse FFT. This is called frequency domain pitch shifting. It is mathematically intensive and therefore complicated to implement in hardware.

Pitch shifting can also be done in the time domain. Mathematically, the concepts are less complex than those of frequency domain pitch shifting and are therefore easier to implement. We approached our solution in the time domain: the technique involves time stretching the audio signal and then resampling it in real time.

Theory

Pitch shifting in the time domain involves two main steps. First, the audio signal must be time stretched by a factor of a, the user-specified pitch shifting factor. A value of a greater than one raises the pitch, and a value less than one lowers it. Time stretching a signal increases or decreases its duration but preserves the original frequencies. A simple example of time stretching a sine wave is shown below.

Figure 1: Time stretching by a factor of 2 and 0.5

An effective algorithm for time stretching called Synchronous Overlap and Add (SOLA) is explained in DAFX: Digital Audio Effects. It involves segmenting the input signal x(n) into smaller overlapping blocks x_i(n), each time shifted by S_a samples. These blocks are then overlapped again with a new time shift of a*S_a and added together to produce the time stretched signal. To produce a high-quality time stretched signal, the parts of the blocks that actually overlap must be faded in and out as they are being added together. Below is a series of figures which shows the steps of this algorithm.

Figure 2: The basic time stretching algorithm
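The block segmentation and cross-fading steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is a plain overlap-add with a linear cross-fade and omits SOLA's search for the best alignment between blocks. The function name and the default values of block_len and s_a are hypothetical.

```python
def ola_time_stretch(x, a, block_len=512, s_a=128):
    """Time stretch the list x by a factor a using fixed-position overlap-add.

    block_len and s_a (the analysis hop S_a) are illustrative defaults, not
    values from the project. SOLA's alignment search is deliberately omitted.
    """
    s_s = int(round(a * s_a))        # synthesis hop, a*S_a
    overlap = block_len - s_s        # samples shared by consecutive output blocks
    if s_s <= 0 or overlap <= 0:
        raise ValueError("need 0 < a*s_a < block_len")
    if len(x) < block_len:
        raise ValueError("input is shorter than one block")

    n_blocks = (len(x) - block_len) // s_a + 1
    y = list(x[:block_len])          # the first block is copied unchanged
    for i in range(1, n_blocks):
        block = x[i * s_a : i * s_a + block_len]
        pos = i * s_s                # where this block starts in the output
        # Linear cross-fade: the existing output fades out, the new block fades in.
        for k in range(overlap):
            fade_in = (k + 1) / (overlap + 1)
            y[pos + k] = (1.0 - fade_in) * y[pos + k] + fade_in * block[k]
        y.extend(block[overlap:])    # the rest of the block is new output
    return y
```

With these defaults, a 4096-sample input stretched by a = 2 yields 7680 output samples, slightly less than twice the input length because only whole blocks are used.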

As it turns out, we were not able to implement this exact algorithm in hardware due to its complexity and the timing constraints of our project. Instead, we used a simplified time stretching method. Rather than overlapping the data blocks and cross-fading between them, we decided to repeat or truncate an audio block a set number of times to produce a signal with a longer or shorter duration while preserving the frequency. This algorithm is explained in more detail in the "Hardware Implementation" section.
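One way to picture the repeat-or-truncate idea is the Python sketch below. It is an assumption about how such a scheme could work, not a description of the hardware design: each block is emitted a whole number of times, followed by a truncated copy that covers any fractional part of a. The function name and the block_len default are hypothetical.

```python
def block_stretch(x, a, block_len=256):
    """Stretch the list x by repeating or truncating fixed-size blocks.

    Assumed scheme: emit each block int(a) times, then the leading
    fraction (a - int(a)) of the block. There is no overlap or cross-fade,
    so block boundaries may produce audible discontinuities.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    whole = int(a)                   # number of full repetitions
    frac = a - whole                 # fractional remainder of the factor
    out = []
    for start in range(0, len(x), block_len):
        block = list(x[start : start + block_len])
        out.extend(block * whole)                     # full copies of the block
        out.extend(block[: round(frac * len(block))])  # truncated copy
    return out
```

For example, with block_len = 4 the input [0, 1, 2, 3, 4, 5, 6, 7] becomes [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7] for a = 2 and [0, 1, 4, 5] for a = 0.5.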

After time stretching, all that remains is to resample the signal at a rate of 1/a times the original sampling frequency. Since the time stretched signal is a*n samples long (where n is the original number of samples) and still contains the original frequencies, resampling at this rate restores the original signal duration of n samples but scales the frequencies by a. The result of time stretching and resampling a sine wave is shown below.

Figure 3: Time stretching and then resampling by a factor of 2
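The resampling step can be illustrated with a short Python sketch. It assumes linear interpolation between neighboring samples, which is one simple choice and not necessarily the one used in our hardware. The function name is hypothetical, and no anti-aliasing filter is applied, so a > 1 can alias high-frequency content.

```python
import math

def resample_linear(y, a):
    """Read the list y at a step of a samples, giving about len(y)/a outputs.

    Played back at the original sampling frequency, the result has its
    duration divided by a and its frequencies multiplied by a.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    n_out = int(len(y) / a)
    out = []
    for m in range(n_out):
        pos = m * a                      # fractional read position in y
        i0 = int(pos)
        i1 = min(i0 + 1, len(y) - 1)     # clamp at the final sample
        frac = pos - i0
        out.append((1.0 - frac) * y[i0] + frac * y[i1])
    return out

# A sine with a period of 64 samples, standing in for a time stretched signal.
stretched = [math.sin(2 * math.pi * k / 64) for k in range(128)]
shifted = resample_linear(stretched, 2.0)   # 64 samples, period now 32 samples
```

Here the 128-sample sine shrinks back to 64 samples and its period halves from 64 to 32 samples, which corresponds to a pitch one octave higher.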