Hardware Implementation

The pitch shifting was done entirely in hardware on the FPGA, in real time. The DE2 audio codec sampled the microphone input and sent the pitch-shifted voice to the audio DAC. To select the microphone as the ADC source, we set the analog audio path control register (0x08) over an I2C interface. With the default settings provided to us in a previous lab (lab 3), the output to the speakers was not loud enough, so we boosted the gain on both input channels and both output channels (registers 0x00, 0x02, 0x04, and 0x06). Figure 4 lists the registers that control the audio codec and their functions.

Name                            Address  Functions
------------------------------  -------  ---------
Line In                         00h      bit 4:0 is line volume.
                                         bit 7=1 mutes.
                                         bit 8=1 locks line vol together.
Right Line In                   02h      bit 4:0 is line volume.
                                         bit 7=1 mutes.
                                         bit 8=1 locks line vol together.
Left headphone out              04h      bit 6:0 is volume.
                                         bit 7=1 enables zero crossing.
                                         bit 8=1 locks vol together.
Right headphone out             06h      bit 6:0 is volume.
                                         bit 7=1 enables zero crossing.
                                         bit 8=1 locks vol together.
Analog Audio Path Control       08h      bit 0=1 is mic boost.
                                         bit 1=1 is mic mute.
                                         bit 2 selects input to ADC (mic=1 or line=0)
                                         bit 3=1 enables bypass
                                         bit 4=1 turns on DAC
                                         bit 5=1 turns on sidetone
                                         bit 7:6 is sidetone volume
Digital Audio Path Control      0ah      bit 0=1 disables highpass filter
                                         bit 2:1 is de-emphasis control 11=48kHz, 10=44kHz, 01=32kHz, 00=disable
                                         bit 3=1 is DAC soft mute
                                         bit 4=1 stores DC offset
Power Down Control              0ch
Digital Audio Interface Format  0eh      bit 1:0 is format, use 01=MSB-first, left-justified
                                         bit 3:2 is bit length, use 00=16 bits
Sampling control                10h      bit 0=0 normal mode
                                         bit 1=1 for best oversampling
Active control                  12h      bit 0=1 turns on codec
Reset control                   1eh      writing 0x00 resets

Figure 4. Audio Codec control registers
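
As a rough illustration of how values for the registers in Figure 4 might be packed for the I2C interface, the sketch below builds 16-bit control words. It assumes that the address column is a 7-bit register index shifted left by one, so that data bit 8 lands in the lowest bit of the address byte. The function name codec_control_word and the particular bit settings are hypothetical and are not the settings used in the project.

```python
def codec_control_word(address, data):
    """Pack a Figure 4 address byte and a 9-bit data value into 16 bits.

    Assumption: the address column is the 7-bit register index shifted
    left by one, so data bit 8 shares the address byte's lowest bit.
    """
    if address & 1 or not 0 <= address <= 0xFF:
        raise ValueError("address must be an even byte value from Figure 4")
    if not 0 <= data <= 0x1FF:
        raise ValueError("data must fit in 9 bits")
    return ((address | (data >> 8)) << 8) | (data & 0xFF)


# Illustrative settings only (not the project's actual values):
# mic boost (bit 0), microphone selected (bit 2), DAC on (bit 4).
analog_path = codec_control_word(0x08, 0b0001_0101)
# Full headphone volume (bits 6:0) with both channels locked (bit 8).
left_headphone = codec_control_word(0x04, 0x100 | 0x7F)

print(hex(analog_path), hex(left_headphone))
```

Packing bit 8 into the address byte is what lets a single write to 0x04 set both headphone channels when the lock bit is used.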

 

Filling the input buffer

Two state machines were implemented to sample the microphone input, stretch it, and output the result to the audio DAC. Four M4K memory blocks were used: two to buffer the data coming from the microphone input and two to buffer the data sent to the DAC for output to the speakers. M4K blocks can be accessed in parallel; that is, we can read from one M4K block and write to another at the same time. This was very handy because it not only increased the efficiency of our algorithm but also simplified the two state machines. We chose to pitch shift blocks of 256 samples, so each input M4K block held at most 256 words. Each output M4K block held up to 1024 words, to allow frequency shifts of up to a 4x increase.

One state machine (Figure 5) stored the samples from the microphone input in one M4K block and fed the DAC from another. At any given point, we were putting samples into one input buffer and playing from one output buffer. Initially, buffer in2 was the input and buffer p1 was the output. After 256 ticks of the audio DAC clock, buffer in1 became the input and buffer p2 the output. This buffer switching was done so that, at any point, one output buffer, p1 or p2, was being filled with time stretched data while the other was being played. The same reasoning applies to the input buffers in1 and in2. Once an input buffer was full, this state machine signaled the other state machine, via the control signal input_ready, that data was ready to be processed.
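
The buffer switching described above can be modeled in software roughly as follows. This is only a behavioral sketch, not the Verilog state machine: the class name, the tick method, the full_name attribute, and the choice to hold input_ready high for a single tick are all assumptions made for illustration, while in1, in2, p1, p2, and input_ready are the names used in the text.

```python
BLOCK = 256  # samples per input block, as in the text


class FillStateMachine:
    """Software model of the ping-pong buffering (hypothetical, not the HDL)."""

    def __init__(self):
        self.inputs = {"in1": [0] * BLOCK, "in2": [0] * BLOCK}
        self.outputs = {"p1": [0] * (4 * BLOCK), "p2": [0] * (4 * BLOCK)}
        self.fill_name, self.play_name = "in2", "p1"  # initial state from the text
        self.full_name = None  # most recently completed input buffer
        self.buffer_index = 0
        self.input_ready = False

    def tick(self, mic_sample):
        """One audio-clock tick: store a mic sample, return a DAC sample."""
        self.input_ready = False
        self.inputs[self.fill_name][self.buffer_index] = mic_sample
        dac_sample = self.outputs[self.play_name][self.buffer_index]
        self.buffer_index += 1
        if self.buffer_index == BLOCK:
            # Block complete: swap both buffer pairs and flag the stretcher.
            self.buffer_index = 0
            self.full_name = self.fill_name
            self.fill_name = "in1" if self.fill_name == "in2" else "in2"
            self.play_name = "p2" if self.play_name == "p1" else "p1"
            self.input_ready = True
        return dac_sample


fsm = FillStateMachine()
for n in range(BLOCK):
    fsm.tick(n)
print(fsm.fill_name, fsm.play_name, fsm.input_ready)
```

In this model the output buffer is read at the same index as the input buffer, which corresponds to the case with no time stretching.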

 

Figure 5: Input buffer filling state machine

 

Time stretching the sampled input

Once one of the input buffers was full, the fill state machine signaled the other state machine, which performed the time stretching. The algorithm first determined which output buffer needed to be cleared so that new time stretched data could be stored. This was done with two state registers called playing and clear_data, which kept track of the previous state of the fill state machine. For all intents and purposes, these two registers held the same value at any time, but we used separate ones for clearing and for filling the appropriate buffer to make the code easier to read. To clear the appropriate M4K block, we iterated through the entire block length of 1024 words and wrote a zero to each location. Once the block was prepped, the state machine began to time stretch the data.

The user could choose to either increase or decrease the frequency by a factor of 2 or 4. This was selected using switches SW[2:0] on the DE2 board:

Switch Position (SW 2-0)  Frequency Shift
------------------------  ---------------
000                       No change
001                       2x increase
010                       4x increase
100                       2x decrease
101                       4x decrease

A 2x increase in frequency meant that the input data would be stretched to twice its normal length: if the input was 256 words, the output would contain 512 words. A 2x decrease meant that the opposite would happen: if the input was 256 words, the output would contain 128 words.
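
The mapping from switch setting to output length can be written out as a small lookup. The sketch below is a software illustration only; the dictionary name is hypothetical, the 64-word length for a 4x decrease is inferred by the same arithmetic rather than stated in the text, and rejecting unlisted switch patterns is an assumption, since the text does not say how they were handled.

```python
BLOCK = 256  # input block length in words

# SW[2:0] pattern -> stretch factor, following the switch table.
STRETCH_FACTOR = {
    0b000: 1.0,   # no change
    0b001: 2.0,   # 2x increase
    0b010: 4.0,   # 4x increase
    0b100: 0.5,   # 2x decrease
    0b101: 0.25,  # 4x decrease
}


def transform_length(switches):
    """Return the number of output words for one 256-word input block."""
    try:
        factor = STRETCH_FACTOR[switches & 0b111]
    except KeyError:
        # Assumption: patterns missing from the table are treated as errors.
        raise ValueError("switch setting not listed in the table") from None
    return int(BLOCK * factor)


for sw in sorted(STRETCH_FACTOR):
    print(format(sw, "03b"), transform_length(sw))
```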

As it turned out, we were unable to fully implement the time stretching algorithm detailed in the “Theory” section. Since that algorithm requires the overlapping blocks to be faded in and out using the cross-correlation, it was computationally too expensive to perform in real time. We were forced to come up with a simpler way to time stretch the signal.

Since we are only transforming blocks that are 256 samples long, we made the assumption that the signal over this time would be periodic (meaning that the pitch doesn’t change much in this short interval). This assumption is mostly correct, especially for a person’s voice.

Time stretching a periodic signal can be achieved by repeating or truncating the signal a certain number of times. For example, to stretch a sine wave by a factor of 2, you only have to repeat the same data to produce a sine wave of double the length with the same frequency. Likewise, to stretch it by a factor of 0.5, you just have to truncate the signal half-way through.
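
A minimal software sketch of this repeat-or-truncate idea is shown below. The function name stretch_block is hypothetical, and the test signal is chosen to be exactly periodic within the block, which is the ideal case for this method.

```python
import math


def stretch_block(block, out_length):
    """Repeat (or truncate) block cyclically until out_length samples exist."""
    return [block[i % len(block)] for i in range(out_length)]


# A sine wave with exactly 8 periods in 256 samples.
block = [math.sin(2 * math.pi * 8 * n / 256) for n in range(256)]

doubled = stretch_block(block, 512)  # stretch by 2: the block plays twice
halved = stretch_block(block, 128)   # stretch by 0.5: the first half only

print(len(doubled), len(halved))
```

Because the block holds a whole number of periods, the repeated copy joins the first one without a jump; a block that ends mid-period would not.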

We decided to take this approach to time stretching because it could be performed within the time constraints of our system and produced fairly good results. Our method does produce some discontinuities in the output signal, especially when truncating the signal. This occurs because the signal might not be perfectly periodic within the 256 sample long interval, or truncating it might result in some data being lost. An example of when these problems would occur is shown below.

 

Figure 6: Errors that can occur using our time stretching method

 

We attempted to remove some of the discontinuities caused by this problem using a three point averaging filter, which did help somewhat. We implemented the time stretching in hardware by setting the transform_length variable appropriately (512 for a = 2, 128 for a = 0.5, etc.). The following diagram illustrates the time stretching (or compressing) state machine.

 

Figure 7: The buffer clearing and time stretching state machine
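
The three point averaging filter mentioned above could look something like the following in software. This is a sketch under assumptions: the text does not say how the block edges were treated or how the division was done in hardware, so the edge handling and the integer division here are illustrative choices, and the function name is hypothetical.

```python
def three_point_average(samples):
    """Replace each sample with the mean of itself and its two neighbors.

    Assumption: the first and last samples reuse themselves as the missing
    neighbor; the text does not say how the block edges were treated.
    """
    n = len(samples)
    out = []
    for i in range(n):
        left = samples[max(i - 1, 0)]
        right = samples[min(i + 1, n - 1)]
        # Floor division keeps the result an integer, like an audio sample.
        out.append((left + samples[i] + right) // 3)
    return out


# A step such as one left at a repeat or truncation boundary.
step = [0, 0, 0, 300, 300, 300]
smoothed = three_point_average(step)
print(smoothed)
```

The filter spreads the jump over neighboring samples, which softens the discontinuity without removing it entirely.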

 

Playing the time stretched samples

In order to play the time stretched data and obtain our pitch shifted waveform, we resampled the data at a rate relative to the amount of time stretching we performed. This was done in the input buffer filling state machine. Since we could not change the sample rate of the output without changing the sample rate of the input, we instead changed which samples were played from the output buffer. Normally, if the switches are set so that there is no time stretching, the output is sampled at the same rate as the input. This is done by using the same address index for both the input and output buffers. The address was determined by a register called bufferIndex.

However, if the output data was time stretched or compressed, the data would need to be sampled at a higher or lower frequency. This was accomplished by shifting the contents of bufferIndex left or right before putting it into the output buffer address. So if the samples were stretched by a factor of 2x, bufferIndex would be multiplied by a factor of 2, which is a shift by one to the left.
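
The address shifting can be sketched as below. The function name play_addresses and its signed shift parameter are conventions invented for this illustration; only the idea of shifting bufferIndex left or right comes from the description above.

```python
BLOCK = 256  # DAC ticks per block


def play_addresses(shift):
    """Output-buffer addresses read during one block of 256 DAC ticks.

    shift > 0 shifts bufferIndex left (stretched data, pitch raised);
    shift < 0 shifts it right (compressed data, pitch lowered).
    """
    if shift >= 0:
        return [buffer_index << shift for buffer_index in range(BLOCK)]
    return [buffer_index >> -shift for buffer_index in range(BLOCK)]


# 2x increase: a 512-word stretched buffer read at every second address.
up = play_addresses(1)
# 2x decrease: a 128-word buffer in which each address is read twice.
down = play_addresses(-1)

print(up[:4], max(up))
print(down[:4], max(down))
```

In both cases the addresses stay inside the stretched or compressed length (512 and 128 words here), so one full output block is consumed in the same 256 ticks it takes to fill one input block.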

Other Methods Tried

 We also tried to accomplish pitch shifting in the frequency domain using the built-in FFT megafunction in Quartus. The approach seemed simple enough and Altera provided documentation on using the FFT block generated by the wizard. In fact, the FFT block supported a variety of input types as well as the ability to do an inverse transform. However, we found that following the simple timing diagram provided in the FFT documentation was not enough. Like many of the complex functions provided by Altera, the documentation was insufficient to create a working piece of hardware in a short amount of time. Perhaps if we had started with this approach rather than tried it halfway through our project, we would have had more success, but that was not the case.