Bolero Surround Sound

A project that allows a user to listen to synthesized instruments located in different places around a room.

Introduction

Maurice Ravel’s Bolero is a piece where many instruments carry and pass along the same melodic line. This project aims to recreate the experience of standing in a room of musicians playing a snippet of this iconic work. Five different instrument sounds are synthesized to play the score’s three most common parts. The user can change the location of the instruments and hear their parts come from different directions. A VGA display draws the instruments at their various locations and updates as the user moves them around.

High Level Design

This project was broken down into three main parts: music synthesis, sound spatialization, and a user interface.

Sound Synthesis

To synthesize various instruments, the Karplus-Strong Algorithm was used (see Figure 2). The Karplus-Strong Algorithm is a digital waveguide synthesis algorithm that can model a string using a drive function, delay line, and filter. The drive function provides the initial positions for every node on a string. Once the string’s nodes are “released,” the feedback delay line creates the string’s oscillations. The length of the delay line determines the frequency of the oscillations. The lossy filter recreates the string’s damping. A delay line with more loss will cause the oscillations to decay at a faster rate than a delay line with less loss.

Sound Spatialization

To manipulate the sound so it has a directional component, we used the Center for Image Processing and Integrated Computing (CIPIC) Head Related Transfer Function (HRTF) Database from the University of California, Davis. Each instrument’s single mono channel is fed into a linear FIR filter module, which uses coefficients from the CIPIC HRTF database to produce a filtered instrument sound that seems to come from the left, the right, or straight ahead. After each instrument is filtered, the left channels of all instruments are summed and the right channels of all instruments are summed before being output by the audio bus master.

User Interface

The user interacts with the system through a combination of the command line and a VGA monitor. The command line is obtained by opening an SSH connection to the HPS and is used to start the program as well as to let the user move the instruments around a virtual room. The VGA monitor displays the virtual room and the current locations of the instruments so the user can visually place where the sounds are coming from, which adds to the overall illusion.

Figure 0: A system diagram showing the different modules and the separation between the FPGA and HPS

FPGA Design

Instrument Module

An instrument module is used to generate a note. It takes eight inputs: instr, type, duration, string_len, clk, GlobalReset, reset, and FIR_done. The instr input selects the type of instrument; for example, it can select whether the module outputs a flute, clarinet, oboe, cello, or marimba sound. The type input determines whether the module should output a plucked sound or a sustained sound. The duration input determines the length of a sustained sound. The string_len input determines the pitch (i.e. frequency) of the sound. The clk input is the global 50 MHz clock. The GlobalReset input tells the module to load a new drive function into M10K memory. The reset input starts a new note. The FIR_done input tells this module that the FIR filter has completed its calculation and is ready for a new sample. At its fastest, the instrument module can output a new note sample once every four cycles; in practice, its output rate is synced with the output rate of the FIR filter.

To generate the sound samples, the Karplus-Strong Algorithm was used. This algorithm takes a drive function and uses feedback to create a note, as shown in Figure 2. The difference between the algorithm for a plucked sound and a sustained sound is the repetition of the drive function. If the drive function is only input once into the system, the output will be a plucked sound. If the drive function is input periodically, the output will be a sustained sound.

Figure 1: The instrument module

Figure 2: The Karplus-Strong Algorithm for plucked sound (top); the Karplus-Strong Algorithm for a sustained sound (bottom). The drive function is highlighted in blue; the delay line is highlighted in red.

The pitch of the output sound is determined by the length of the drive function, which is also the length of the delay line. Because the audio CODEC has an output rate of 48,000 samples/second, the frequency of the output note is (48,000 samples/second)/(N samples) = 48,000/N Hz, where N is the length of a single period of the drive function.

The Karplus-Strong feedback loop uses a low-pass filter (LPF) and contains loss. The LPF is a simple average of the last two samples. The loss is on the order of 0.7%. This amount was chosen because it allows the algorithm to generate a tone without overflow: without any loss, the loop would build up due to positive feedback and overflow; with too much loss, the drive function would completely dominate and the output would not converge to a sinusoidal-like wave.
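
To make the loop concrete, here is a minimal software sketch of the plucked case in C. The 48 kHz sample rate, the 512-sample drive-function length, the two-sample averaging LPF, and the ~0.7% loss come from the design described above; the noise-burst drive function is our own stand-in (the actual per-instrument drive functions are shown in Figure 3).

/* Minimal C sketch of the plucked Karplus-Strong loop. Illustrative only:
 * the project implements this in fixed-point Verilog with M10K memory. */
#include <stdio.h>
#include <stdlib.h>

#define SAMPLE_RATE 48000
#define MAX_LEN     512           /* matches the 20x512 M10K block */
#define LOSS        0.993         /* 1 - 0.007: enough loss to avoid overflow */

int main(void) {
    double delay[MAX_LEN];
    int N = SAMPLE_RATE / 440;    /* string_len for roughly 440 Hz (A4) */

    /* Drive function: the initial positions of the string's nodes.
       A noise burst is used here as a placeholder. */
    for (int i = 0; i < N; i++)
        delay[i] = (rand() / (double)RAND_MAX) * 2.0 - 1.0;

    int idx = 0;
    for (int n = 0; n < 2 * SAMPLE_RATE; n++) {  /* two seconds of output */
        int prev = (idx + N - 1) % N;
        /* LPF: average of the last two samples, scaled by the loss. */
        double out = LOSS * 0.5 * (delay[idx] + delay[prev]);
        delay[idx] = out;         /* feed the result back into the delay line */
        printf("%f\n", out);      /* one 48 kHz sample per line */
        idx = (idx + 1) % N;
    }
    return 0;
}

The % N wraparound of idx plays the same role as the circulating M10K read pointer described below: only the first N = string_len samples of the drive function are used.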

Different instruments, or timbres, are produced by altering the drive function. The different drive functions are shown in Figure 3. Interestingly, we found that a plucked clarinet sounded like a plucked cello, and a plucked flute sounded like a marimba. Thus, the instr input selects the clarinet drive function when a cello sound is wanted and the flute drive function when a marimba sound is wanted.

Figure 3: The drive functions for different instruments.

To save register memory, the selected instrument’s drive function and delay-line values are stored in M10K memory. In our first implementation of this project, we only used register (ALM) memory. We quickly discovered that our project would not fit in ALM memory due to the use of large register arrays in the instrument and the FIR modules. Two 20x512 M10K blocks were used (per instrument module) to decrease our ALM memory usage. This ultimately allowed our project to fit on the SoC.

The instrument module’s finite state machine (FSM) is shown in Figure 4. Upon a user-input reset or a change of instruments, the instrument module enters its GlobalReset state. In that state, the selected instrument’s drive function is written to M10K over the course of 512 cycles. In each of the 512 cycles, a new drive sample value is written to M10K.

In the subsequent states, the drive function and previous-delay-line values are read from M10K. During the first iteration of the loop, the drive function is the output, so a read from the previous-delay-line M10K block is not necessary for the output calculation. Once the output is calculated, it gets stored in the previous-delay-line M10K block and the module returns to the Read State.

Figure 4: Instrument model FSM

Every time there is a GlobalReset, 512 values are written to the drive-function M10K block. To achieve different output pitches, the pointer that reads from the M10K blocks wraps around after string_len samples. Higher-frequency notes therefore use only a small part of the drive function, while lower-frequency notes read more of it.

The duration input is used if a note is sustained (i.e. type is sustained). The instrument module continues to feed the drive function into the feedback loop for duration samples. Once the drive function stops being fed into the feedback loop, the output naturally decays (similar to a pluck’s decay).

The duration input is also used to amplitude-modulate the drive function as it is fed into the loop. This was originally done to achieve a string instrument’s bowed sound; however, the feature was kept because it adds smoothness to the start of a wind instrument’s sustained notes.

Figure 5: Amplitude modulation of an example drive function
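
To make the interplay of type, duration, and this amplitude modulation concrete, here is a hypothetical envelope helper in C. The linear attack ramp is our assumption; Figure 5 shows the modulation shape actually used.

/* Hypothetical drive-function envelope (C sketch). n is the number of
 * samples since the note's reset; ramp_len (> 0) sets the attack length. */
double drive_envelope(int n, int duration, int ramp_len) {
    if (n >= duration)
        return 0.0;                       /* stop feeding; the note decays */
    if (n < ramp_len)
        return (double)n / ramp_len;      /* smooth, bowed-style attack */
    return 1.0;                           /* full-strength drive */
}

A plucked note would bypass this entirely: its drive function enters the loop only once.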

Score Module

The score module instantiates and drives three instrument modules. One of the instrument modules is a plucked cello and one is a marimba. The third instrument module switches between three instruments: a flute, clarinet, and oboe.

Figure 6: The implemented score

The flute plays the melody on the first pass through the piece, the clarinet on the second, and the oboe on the third. The entire piece (flute, clarinet, oboe) loops until the user stops it via the HPS command line. Figure 6 shows the score that was implemented in this module.

A counter variable determines the place in the piece. This variable increments every time the FIR filter reads a new sample. When a new note is to be played, the score module drives an instrument module’s reset, duration, and string_len inputs. The duration is calculated using:

Duration = (Number of beats) * (60/BPM seconds/beat) * (48,000 samples/second)

In the above equation, BPM is Beats Per Minute, the standard unit for tempo. The tempo of the piece is 70 BPM. As an example, a sixteenth note (which is a quarter of a beat) has a duration of:

0.25 beats * 1/70 min/beat * 60 sec/min * 48000 samples/sec = 10,285 counts

Some of the longer notes are given a duration that is shorter than what the above equation calculates. This was done to allow the sound to decay before the next note is played. The string_len input was calculated using:

Length = (48,000 samples/sec) / (Desired frequency, Hz)

This equation provides the necessary length (i.e. necessary number of samples) of the drive function to get the desired frequency. Because the drive function samples are stored in a 20x512 M10K block, the lowest obtainable note frequency is 93.75 Hz (48,000/512). The highest obtainable note is around 3,135 Hz; above that, the delay-line lengths of adjacent notes differ by less than one sample. Because note frequencies are logarithmically spaced, adjacent higher-frequency notes map to more closely spaced delay-line lengths than adjacent lower-frequency notes do. The Bolero clip uses the frequency range 130 - 587 Hz, which is well within our obtainable range.
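
As a quick sanity check of both formulas, the following C helper (function names are ours, not the project’s) reproduces the sixteenth-note count computed above and the pointer-rounding behavior that causes the small pitch errors reported in Table 1.

/* C sketch of the score module's two conversions; constants are from
 * the report, function names are illustrative. Compile with -lm. */
#include <stdio.h>
#include <math.h>

#define SAMPLE_RATE 48000
#define TEMPO_BPM   70

/* Samples spanned by a note lasting `beats` beats at 70 BPM. */
int duration_samples(double beats) {
    return (int)(beats * (60.0 / TEMPO_BPM) * SAMPLE_RATE);
}

/* Rounded drive-function length for a desired pitch. */
int string_len(double freq_hz) {
    return (int)round(SAMPLE_RATE / freq_hz);
}

int main(void) {
    printf("sixteenth note: %d samples\n", duration_samples(0.25)); /* 10285 */
    int n = string_len(392.0);                                      /* G4 -> 122 */
    printf("G4: string_len = %d, produced pitch = %.1f Hz\n",
           n, (double)SAMPLE_RATE / n);                             /* 393.4 Hz */
    return 0;
}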

FIR_filter Module

The FIR filters are a collection of three different modules, each corresponding to a different direction. Based on the output of a large mux, each instrument is fed into a different filter module to achieve the effect of spatial sound.

At reset, the 400 HRIR coefficients (200 left and 200 right) are stored in two 200-entry arrays of 27-bit registers, which are used for the remainder of the program. This process takes 200 cycles because only one index of each array can be filled per cycle. In addition to these HRIR arrays, there is another array, x, that saves the previous 200 sample values within the filter module; it is initialized to all zeros. After these first 200 cycles have passed, the module is ready to begin filtering samples. Once a sample is synthesized and reaches the filter module, it is saved in the first index of x. Every cycle, a counter is incremented and the corresponding value of x is multiplied by its HRIR coefficient. The multiply is performed by two signed_27_mult modules so that the left and right channels are computed in parallel.

Figure 7: A depiction of how a sample is filtered. Cycles 0-199 only occur when the module is reset, while Cycles 200-399 occur every time a sample is filtered.

The product of every sample-coefficient pair is accumulated so that after 200 cycles, the sample is filtered and can be output. The filter module waits for the audio_done flag, which signals that the audio bus master is ready to accept a sample, before raising its own done flag, which tells the bus master it can read the output from the module. After this handshake between the filter and bus master, the filter saves the next sample to index 0 of x and shifts all previous values by one index to the right. This effectively makes x a running array of the previous 200 input samples.
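
In software terms, the per-sample work of one filter module looks roughly like the C sketch below. Floating point is used for clarity; the hardware uses 27-bit signed fixed-point multiplies, and the coefficient arrays are loaded from the CIPIC data during the 200-cycle reset (they are left as zero placeholders here).

/* C sketch of one direction's FIR filter step (illustrative). */
#include <string.h>

#define TAPS 200

static float hrir_left[TAPS];    /* filled from CIPIC HRIR data at reset */
static float hrir_right[TAPS];
static float x[TAPS];            /* last 200 input samples, initially zeros */

/* Filter one mono instrument sample into left and right outputs. */
void fir_step(float sample, float *out_l, float *out_r) {
    /* Shift the history right by one index and save the new sample at 0. */
    memmove(&x[1], &x[0], (TAPS - 1) * sizeof(float));
    x[0] = sample;

    /* Accumulate the 200 sample-coefficient products per ear (done over
       200 cycles in hardware, with the two multiplies in parallel). */
    float acc_l = 0.0f, acc_r = 0.0f;
    for (int i = 0; i < TAPS; i++) {
        acc_l += x[i] * hrir_left[i];
        acc_r += x[i] * hrir_right[i];
    }
    *out_l = acc_l;
    *out_r = acc_r;
}

int main(void) {
    float l, r;
    fir_step(1.0f, &l, &r);      /* with zero coefficients, exercises the path */
    return 0;
}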

HPS Design

User Interface

The user interface is run entirely on the HPS and gives the user a way to interact with the sound output as well as view the progress of the system. The program starts by spawning two pthreads, write_VGA and read_terminal. At startup, the read_terminal thread uses scanf() calls to wait for the user to press enter to start the program. Meanwhile, the write_VGA thread writes the welcome screen to the VGA and waits to be given the signal from the other thread that the user is ready to begin. After the user presses enter, the read_terminal thread monitors the command line for user inputs to turn the instruments left or right. This thread is also responsible for relaying the current instrument configuration to the FPGA through a parallel input/output port. The VGA thread constantly checks if the configuration of the instruments has changed and if so, draws the instruments in the new correct locations.
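
A skeleton of this two-thread structure, assuming pthreads and simple shared flags, is sketched below. The thread names follow the report; the command letters, the flag handling, and the drawing stubs are our illustrative assumptions, not the project’s C source.

/* Skeleton of the HPS user-interface program (C, illustrative). */
#include <pthread.h>
#include <stdio.h>

static volatile int started = 0;  /* set once the user presses enter */
static volatile int config  = 0;  /* instrument layout, relayed to the FPGA */

void *read_terminal(void *arg) {
    getchar();                    /* wait for enter to start the program */
    started = 1;
    char cmd;
    while (scanf(" %c", &cmd) == 1) {
        if (cmd == 'l') config--; /* move instruments left  */
        if (cmd == 'r') config++; /* move instruments right */
        /* ...relay config to the FPGA via the parallel I/O port... */
    }
    return NULL;
}

void *write_VGA(void *arg) {
    /* ...draw the welcome screen... */
    while (!started)
        ;                         /* wait for the signal from read_terminal */
    int last = -1;
    while (1) {                   /* runs until the program is killed */
        if (config != last) {     /* redraw only when the layout changes */
            last = config;
            /* ...draw the instruments at their new locations... */
        }
    }
    return NULL;
}

int main(void) {
    pthread_t vga, term;
    pthread_create(&vga,  NULL, write_VGA,     NULL);
    pthread_create(&term, NULL, read_terminal, NULL);
    pthread_join(term, NULL);
    pthread_join(vga,  NULL);
    return 0;
}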

Results

Sound Synthesis

The instrument and score modules were able to create the Bolero clip shown in Figure 6. Both modules were tested in ModelSim before FPGA synthesis.

In ModelSim, the instrument module was able to produce Figure 8’s waveform for a plucked-string sound. As shown in the screenshot, the output builds up quickly during the first few cycles and then decays exponentially.

Figure 8: ModelSim waveform for a plucked string

The instrument module produced Figure 9’s waveform for a sustained sound. Amplitude modulation is used at the start of this waveform, which is partly why the output builds up gradually. Once the drive function stops being fed into the loop, the output decays exponentially.

Figure 9: ModelSim waveform for sustained sound

Putting the plucked and sustained sounds together produced the waveform in Figure 10. The melodic line is dominant throughout the waveform; it provides the main build-up and fall of output amplitude. The melodic line is dominant because it is a sustained sound, and therefore builds up more energy due to the consistent feeding of the drive function into the feedback loop. The other parts are visible in the “fuzz” around the melodic waveform. These parts synthesize lower-frequency notes, which is why they appear sparser.

Figure 10: Modelsim waveform for plucked and sustained sounds together, driven by the score module

The frequencies produced by the instrument module are accurate enough to be recognizable notes; however, they are not perfectly in tune. The input used for note frequency, string_len, had to be rounded because it is used as a memory pointer. The frequency errors are tabulated in Table 1. All the produced frequencies lie within 2 Hz of the correct frequency.

Note Correct Frequency (Hz) Produced Frequency (Hz) Frequency Error (Hz)
C3 130.8 130.79 -0.01
G3 196 195.9 -0.1
C4 261.6 262.3 +0.7
D4 293.7 294.4 +0.7
E4 329.6 328.8 -0.8
F4 349.2 350.3 +1.1
G4 392 393.4 +1.4
A4 440 440.3 +0.3
B4 493.9 494.8 +0.9
C5 523.3 521.7 -1.6
D5 587.3 585.4 -1.9

Table 1: Table of frequency error

Sound Spatialization

Before synthesizing our filters, we needed some way to verify they were working correctly. We first developed the filters in ModelSim, exported the filtered output, and compared it to a signal we knew was filtered correctly. Figure 11 shows the same phase-shifted sine wave filtered by our Verilog FIR filter and by the MATLAB filter() function. The figure shows that our filter module works correctly and generates filtered samples comparable to MATLAB’s.

Figure 11: The same phase-shifted sine wave filtered by our Verilog FIR filter and by the MATLAB filter() function

Besides graphically comparing the two types of filters, we can also audibly confirm that our Verilog filters work correctly by listening to the filtered sine wave. You can click here to listen to a short clip that plays the original unfiltered sine wave, then a MATLAB left-filtered sine wave, and finally a Verilog left-filtered sine wave. Use headphones to hear the full effect, as there is quite a bit of distortion with regular speakers.

Conclusion

Our final project accomplished all our original goals. We intended to synthesize different instruments, which we accomplished using the Karplus-Strong Algorithm. We also intended to change the instruments’ sound direction, which we accomplished using HRTF coefficients in FIR filters.

With that being said, this project could be expanded and improved in a variety of ways. Currently, there is clicking in the output sound. This is caused by discontinuities in the samples given to the audio CODEC. The clicking could be removed if an additional averaging filter were used between output samples with large discontinuities.

Additionally, this project could be expanded to include more instruments and more sound directions. More instruments could be added by designing more drive functions; more directions could be supported by adding more FIR filters. Storing the filters’ coefficients in M10K memory would allow more instruments and filters to fit on the SoC.

Overall, we are satisfied with our design, and would like to thank Hunter Adams and Bruce Land for their guidance on this project.

Appendix

Appendix A

The group approves this report for inclusion on the course website.
The group approves the video for inclusion on the course YouTube channel.

Appendix B

Check out the Verilog and C code here!

Appendix C

Diane was responsible for the sound synthesis, and wrote the instrument and score modules. Owen was responsible for the sound directionality and user interface, and wrote the FIR filter module and C script. Both members worked together to mesh and test the project components.

Appendix D

References

Get in touch

We had a blast making this project! Contact Owen or Diane with any questions.

Owen: ov37@cornell.edu

Diane: dms486@cornell.edu