ECE 4760 Final Project: Phased Array Speaker System

Introduction

Project Soundbyte "A phased array speaker system to generate flexible, directional sound"

For our ECE 4760 final project, we designed and built an array of 12 independently controllable speakers to implement an acoustic phased-array system. The system samples a standard audio input signal at approximately 44.1 kHz and then outputs this signal to each of the 12 speakers, each with a variable delay. The idea behind a phased array is that changing the relative timing with which the speakers are driven shifts the angle of maximum intensity of the output wave. This is explored further in the high-level design section. The array was also built to support more advanced design challenges, including longer-range acoustic modem transmission and sonar imaging.

High Level Design

Rationale

The initial rationale behind this project was our desire to build a sound system that can direct sound or shape it into a specific pattern. We knew about phased arrays used in microwave and EM applications, and knew the math was the same for acoustic waves. We also found that building a wide-band phased array requires true time delay between the elements, a task for which a microcontroller is particularly well suited. The mathematical modeling we did showed that we would be able to steer sound intensity effectively given the size and complexity of the array we could build. Finally, all commercially available speaker phased arrays are prohibitively expensive - the least expensive we found was over $2000. These facts all inspired us to implement our own affordable speaker array.

Background Math

The physics of a phased-array of transmitters can be derived from the math for a diffraction grating. A diffraction grating assumes N sources of aperture a spaced distance d apart transmitting at a wavelength $\lambda$. Below is an example of such a system (in this case, the system we built).

In the far-field, the wave function of such a system is $$\psi = \psi_0 \frac{\sin(\frac{\pi a}{\lambda}\sin \theta)}{\frac{\pi a}{\lambda} \sin \theta} \frac{\sin (\frac{N}{2} \frac{2\pi d}{\lambda} \sin \theta)}{\sin(\frac{\pi d}{\lambda} \sin \theta)}$$ When we add a phase term, $\phi$, between adjacent elements to the second part of the equation, it becomes $$\psi = \psi_0 \frac{\sin(\frac{\pi a}{\lambda}\sin \theta)}{\frac{\pi a}{\lambda} \sin \theta} \frac{\sin (\frac{N}{2}(\frac{2\pi d}{\lambda} \sin \theta+\phi))}{\sin(\frac{\pi d}{\lambda} \sin \theta + \frac{\phi}{2})}$$ Next, we square the wave function to get the quantity we are interested in, the intensity of the wave. $$I = I_0 \bigg(\frac{\sin(\frac{\pi a}{\lambda}\sin \theta)}{\frac{\pi a}{\lambda} \sin \theta}\bigg)^2 \bigg(\frac{\sin (\frac{N}{2}(\frac{2\pi d}{\lambda} \sin \theta+\phi))}{\sin(\frac{\pi d}{\lambda} \sin \theta + \frac{\phi}{2})}\bigg)^2$$ $$I = I_0 \bigg(\frac{\sin(\frac{\pi a}{\lambda}\sin \theta)}{\frac{\pi a}{\lambda} \sin \theta}\bigg)^2 \bigg(\frac{\sin (\frac{\pi}{\lambda}Nd \sin \theta+\frac{N}{2} \phi)}{\sin(\frac{\pi d}{\lambda} \sin \theta + \frac{\phi}{2})}\bigg)^2$$ Since the second term is the only term dependent on the phase of the elements, we can estimate the angle of maximum intensity by setting the numerator of the second term to its maximum value, 1, and solving for $\theta$ in terms of $\phi$. 
$$\sin \bigg (\frac{\pi}{\lambda}Nd \sin \theta+\frac{N}{2} \phi \bigg)=1$$ $$\frac{\pi}{\lambda}Nd \sin \theta+\frac{N}{2} \phi = \frac{\pi}{2}$$ $$\frac{\pi}{\lambda}Nd \sin \theta = \frac{\pi}{2} - \frac{N}{2} \phi$$ $$\sin \theta = \frac{\lambda}{2 N d} - \frac{\lambda}{2 \pi d} \phi$$ Assuming N is significantly larger than $\pi$, we can see that the first term becomes negligible and $$\sin \theta = - \frac{\lambda}{2\pi d}\phi$$ $$\theta = \sin ^{-1}\bigg(-\frac{\lambda}{2\pi d} \phi\bigg).$$ Thus, we can see that we should expect the angle of maximum amplitude to be a function of both the phase shift of the wave, and its wavelength (assuming the distance between the sources is fixed). For microcontroller coding, we would prefer to not require different calculations for each frequency. Fortunately, some mathematical manipulation shows us an easier method of adjusting the angle for all frequencies at once.
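The closed-form result above can be checked numerically. The following is a minimal sketch, not the MATLAB simulation used in the project: it evaluates the intensity expression on a grid of angles and compares the brute-force peak with $\sin^{-1}(-\frac{\lambda}{2\pi d}\phi)$. The array geometry (N = 12, d = 0.085 m, a = 0.07 m) comes from the text; the 1 kHz tone, the 343 m/s speed of sound, and the phase step of -1 rad are assumptions chosen for illustration. The array term is written with $x = \frac{\pi d}{\lambda}\sin\theta + \frac{\phi}{2}$ in both numerator and denominator.

```python
import math

def intensity(theta, n, d, a, lam, phi):
    """Relative far-field intensity of n sources (width a, spacing d) with a
    phase step phi between neighbours, evaluated at angle theta (radians)."""
    s = math.sin(theta)
    # Single-element envelope: (sin(y) / y)^2
    y = math.pi * a / lam * s
    envelope = 1.0 if abs(y) < 1e-12 else (math.sin(y) / y) ** 2
    # Array term: (sin(n*x) / sin(x))^2 with x = pi*d/lam*sin(theta) + phi/2
    x = math.pi * d / lam * s + phi / 2.0
    if abs(math.sin(x)) < 1e-12:
        array = float(n * n)  # limiting value at a principal maximum
    else:
        array = (math.sin(n * x) / math.sin(x)) ** 2
    return envelope * array

def peak_angle_deg(n, d, a, lam, phi, step_deg=0.05):
    """Brute-force search for the angle of maximum intensity in [-90, 90] degrees."""
    count = int(round(180.0 / step_deg)) + 1
    angles = [-90.0 + i * step_deg for i in range(count)]
    return max(angles, key=lambda deg: intensity(math.radians(deg), n, d, a, lam, phi))

N, D, A = 12, 0.085, 0.07   # array geometry given in the text
LAM = 343.0 / 1000.0        # assumed: 1 kHz tone at 343 m/s
PHI = -1.0                  # assumed phase step between neighbours, in radians

numeric = peak_angle_deg(N, D, A, LAM, PHI)
closed_form = math.degrees(math.asin(-LAM * PHI / (2 * math.pi * D)))
print(f"numeric peak {numeric:.2f} deg, closed form {closed_form:.2f} deg")
```

The two values should agree closely but not exactly, because the single-element envelope falls off away from 0 degrees and pulls the peak of the product slightly toward the center.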

First, we make a substitution based on the fundamental wave equation relating wavelength, frequency, and the velocity of sound. $$\lambda f = v_s$$ $$\theta = \sin ^{-1}\bigg(-\frac{v_s}{2\pi d} \frac{\phi}{f}\bigg).$$ We then recognize that for any frequency, $\frac{\phi}{f}$ is equivalent to a time delay. Thus we get $$\theta = \sin ^{-1}\bigg(-\frac{v_s}{2\pi d} t_d\bigg).$$ Now we have an equation that relates the maximum power angle to the delay time between the elements. This equation has no dependence on wavelength, indicating it should hold true for a large band of frequencies. Since many approximations were made throughout this derivation, we did significant simulation in MATLAB to show that the principles found here apply.

MATLAB Simulation

The intensity equation was implemented in MATLAB to determine the wave intensity output of the speaker array with respect to the viewing angle. The adjustable parameters are the number of elements N, the distance between adjacent speakers d, the width of the speakers a, the wavelength of the sound wave $\lambda$, and the time delay between signals going to adjacent speakers. Once the mechanical set up of the speaker system was finished the only parameters that were free to change were the wavelength and time delay. The number of speakers was 12, the distance between adjacent speakers was 0.085m, and the width of the speakers was 0.07m.

Using the simulation we were able to show that the main lobe for different frequencies could be adjusted with a time delay. This is important because if different frequencies were not shifted evenly for a given time delay, a more complicated solution would have to be implemented, such as taking the FFT (Fast Fourier Transform) of the input signal, adjusting the phase terms in the frequency domain, and then returning to the time domain using the IFFT (Inverse Fast Fourier Transform). The problem with that solution is that the microcontroller would not be able to perform the calculations before it was time to play the next data sample. The simulation also showed that a time delay shifts the beam in only one direction. For the beam to be steered to the other side a negative time delay is necessary, which corresponds to changing which end speaker is defined as the lead speaker. The following plots show steering multiple frequencies by changing the time delay. The first plot shows no time delay, and the maximum intensities of all frequencies are centered at 0 degrees.

The next two plots show time delay differences between the outputs of +0.3 ms and -0.3 ms respectively. As the graphs show, all the frequencies are shifted by the same angular amount, having their maximum at the same angle.

Thus, we have shown through simulation that wide-band beams can be steered through pure time delay, and no more advanced phase shifting is necessary.

Logical Structure

(Note: more detail on all the hardware and software discussed in this section can be found in the design section of this site)

The overall structure of our hardware design is illustrated in the flow-chart below:

Our input is taken from any standard audio source. Inspection of multiple audio outputs with an oscilloscope confirmed that standard output is generally a signal varying between plus and minus 1V. To get the full use of our Analog to Digital converter (ADC), we needed a signal that varies between 0 and 5V. Therefore, we used an input amplifier to re-bias and amplify the signal before going into the ADC. The ADC then communicates the digital value of this input signal to the ATmega microcontroller. The microcontroller performs the necessary mathematics for the desired phase shift between the speakers, and then outputs each channel to a Digital to Analog converter (DAC). This output is then passed to a speaker amplifier, which both applies a low-pass filter to the signal and buffers it before outputting it to the speaker. The low-pass filter is necessary because the use of DACs introduces quantization noise which we wanted to remove, and the buffer is necessary because the impedance of the speaker is only 8 ohms, much lower than the output impedance of the DAC.

The overall structure of our software design is illustrated below:

Upon power-up the microcontroller initializes its hardware. This includes setting up SPI communication with the ADC, setting up the internal ADC, and setting up the timer interrupts to ensure we get sound capture and playback at a well-defined frequency. Once this hardware is initialized, the main loop of the program runs. This loop is fairly simple: it reads an input channel connected to a potentiometer, which acts as the user interface, and then sets the delay for the desired angle accordingly. This main loop is interrupted at a frequency of 44.1 kHz to perform audio capture and playback. In the interrupt, the time-delayed outputs are sent to each channel of the DACs, and the new input data is sampled from the ADC. This code seems rather simplistic, but the biggest challenges came in the implementation details - specifically, it was challenging both to get this code to run at 44.1 kHz and to have it fit in the RAM effectively. These challenges will be discussed in the software design section.

Design Trade-offs

We had to make a few design trade-offs in hardware, most of which were budget-related. First, we had to trade-off between the number of speakers we used, and the quality of the audio produced by the speakers. Due to our budget constraints, higher-quality speakers with more advanced amplifier circuits would necessarily mean we had to use fewer speakers. We settled on using 12 speakers, which according to our simulation would give us relatively good frequency control in our desired range, while still being affordable. We also chose very low-priced speakers which necessarily hurt our sound quality, but we were mostly interested in this project being a proof-of-concept, and we knew we would get more effective results with more speakers in a larger array.

Another hardware design trade-off we made was in our selection of power-supply. We chose to use a computer power supply that we scavenged since it gave us the power we needed at the necessary voltage, and didn't count against our budget. The trade-off here is that the rails from the computer power supply are slightly noisier than those from an isolating power supply, and some of that noise came through on the speakers. Again, we were more interested in having a large array to prove our concept worked than having perfect sound quality, so budget-wise it made most sense to use the computer power-supply.

A more physical trade-off we had to consider was the distance between our speakers versus the size of the whole array. Our simulation showed that a larger array gives us better control over lower frequencies, while elements spaced more closely together give us better control over higher frequencies. Since we could only fit 12 speakers into our budget, these two variables were at odds with each other. We decided on a spacing of 0.5 inches between adjacent speakers, yielding an array a little more than 1 meter long. Our simulation showed us these values gave us the best control over the frequencies we were interested in. The simulated results for the array spacing we chose are below, with N = 12, d = 0.085 m, a = 0.07 m, and td = 0.272 ms:

An example of another possible array layout is one with a slightly larger spacing between the elements of .17m.

By looking at the differences in the plots we can see the trade-off in adjusting the distance between the speakers. A small distance between adjacent speakers causes lower frequencies to have a very wide main lobe, which reduces the effective directionality of the phased array. However, if the distance between adjacent speakers becomes too large, the higher frequencies tend to have grating lobes, which are side lobes with amplitudes close to the level of the main lobe. This tends to happen when the distance between adjacent elements is larger than half the wavelength being emitted.

Another hardware consideration was the selection of our DACs. For simplicity of construction, we initially wanted to use DACs driven via SPI, as this would only require 3 or 4 wires per DAC. Unfortunately, we were limited by the maximum SPI output speed of the microcontroller. Given a maximum SPI clock speed of 10 MHz, and 12 outputs with 8 bits of data each, it takes approximately 9.6$\mathrm{\mu}$s to complete the output, which is almost half of the time we have for the interrupt, which repeats every 22.7$\mathrm{\mu}$s. This wouldn't be a problem on its own, but the fastest ADC we found within our budget took approximately 2/3 of the interrupt to complete its communication. Therefore, we instead decided to use a parallel-input DAC. The benefit of this style of DAC is that we can set each output in two cycles - one to pick the output and one to write the output bits. Running at 20 MHz, this only takes 1.2$\mathrm{\mu}$s to set all the channels, only about 5% of the time available for interrupts. Unfortunately, this led to more physical complexity in the design.
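The timing budget in this paragraph can be reproduced with a few lines of arithmetic. This is a sketch using only the figures quoted in the text (12 channels, 8 bits each, a 10 MHz SPI clock, two CPU cycles per parallel write at 20 MHz, and a 44.1 kHz interrupt); the function names are made up for illustration.

```python
SAMPLE_RATE_HZ = 44_100
period_us = 1e6 / SAMPLE_RATE_HZ  # time available per interrupt, in microseconds

def spi_time_us(channels, bits_per_channel, spi_clock_hz):
    """Time to shift every channel's bits out over SPI, one bit per clock."""
    return channels * bits_per_channel / spi_clock_hz * 1e6

def parallel_time_us(channels, cycles_per_channel, cpu_clock_hz):
    """Time to write every channel of a parallel DAC at a fixed cycle cost."""
    return channels * cycles_per_channel / cpu_clock_hz * 1e6

serial_us = spi_time_us(12, 8, 10e6)
parallel_us = parallel_time_us(12, 2, 20e6)

print(f"interrupt period: {period_us:.1f} us")
print(f"serial DACs:   {serial_us:.1f} us ({100 * serial_us / period_us:.0f}% of the period)")
print(f"parallel DACs: {parallel_us:.1f} us ({100 * parallel_us / period_us:.0f}% of the period)")
```

The parallel figure counts only the two cycles per channel mentioned in the text; any extra instructions needed to fetch each sample would add to it.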

A hardware trade-off we made during testing was our decision to use fans instead of additional circuitry to take care of thermal run-away in our circuits. The details behind the thermal run-away will be discussed in the hardware design section, but our decision to cool with fans again came down to budget. Fixing the thermal runaway issue in hardware would require more components that we would be unable to afford, while the fans we scavenged from old computers were free and kept the problem in check.

The main software trade-off we ended up making was with our user interface. We were initially hoping to have both a physical user-interface and a UART serial interface for increased flexibility. Unfortunately, we found that operating at 44.1 kHz to avoid aliasing took up nearly all of our processor time, and we couldn't afford the necessary overhead for serial communications. Therefore, we decided to just use the potentiometer input as our user interface.

Intellectual Property and Standards

The only applicable standard in the interface of our system is in the audio input to our system. Since we want it to conform with any consumer audio device, we need to match our input amplifier to the standard set forth for line level, or the voltage level for audio lines. Multiple sources confirmed that consumer audio line levels are at -10dBV, or an RMS of 0.316V with a peak amplitude of 0.447V (0.894V peak-to-peak). Our own measurements of a few devices, including a laptop and a cell phone, confirmed these numbers were approximately correct and apply for normal volume levels. Furthermore, at peak volume the devices actually output a signal closer to 2V peak-to-peak.

There is one existing patent on a phased-array speaker system (No. US 7,130,430 - Phased Array Sound System). Since this patent is for a very similar system, we intentionally did not read it to avoid reverse engineering their design. Our design was entirely based on our understanding of the physics of phased-arrays and methods of audio sampling and synthesis. There are no ethical concerns as we do not intend to seek a patent for our device, nor do we intend to sell it, nor have we reverse-engineered a purchased device. As of now, there is no doctrine for fair-use in patent law to refer to.

Another concern is that some of the music we have tested with is copyrighted. Fortunately, there is a fair-use doctrine for copyrights, and copyrighted music used in a nonprofit, educational environment has traditionally been protected. Since we are not distributing the music to others, and simply using it as an educational example, we are well within the bounds of fair use as defined by both the US Government, and Cornell University's policy on sharing music.

Implementation


Software

The crux of the software system is sampling and producing audio at 44.1 kHz to be compatible with standard audio.

MCU Initialization

In this section of the code we set up all the hardware on the microcontroller. First, we set up timer 1 to fire an interrupt at 44.15 kHz. This interrupt both samples our input audio, and writes to our output DACs. Since the range of human hearing is about 22kHz, 44.1 kHz is selected to be slightly higher than the frequency required by the Nyquist sampling theorem. The initialization also sets up the SPI communications, as well as the internal ADC input from one channel to provide the user interface.

Main Loop

The purpose of the main loop is to set the delay between consecutive speakers based on the input provided by the user on the potentiometer. A trade-off exists here between the number of states the system has and the effective resolution of the potentiometer. Instead of directly linking the ADC value to the delay, we created 9 delay bins. The reason for this is that it is much easier to center the potentiometer in a bin, and you don't have to worry about the delay switching with noise. We chose delays roughly evenly spaced from -50 cycles to 50 cycles (-1.1 to 1.1 ms), giving us a steering range of approximately -55 to 55 degrees in the room for a 1kHz sine wave.
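The binning idea can be sketched as follows. This is an illustration only: the text gives the number of bins (9) and the end points (-50 and 50 cycles), but the intermediate delay values, the 8-bit potentiometer reading, and the names below are assumptions, not the project's actual code.

```python
SAMPLE_RATE_HZ = 44_100
ADC_MAX = 255  # assumed: potentiometer read as an 8-bit value

# Hypothetical delays in sample periods, roughly evenly spaced from -50 to 50.
DELAY_BINS = [-50, -37, -25, -12, 0, 12, 25, 37, 50]

def delay_for_reading(adc_value):
    """Map a potentiometer reading onto one of the 9 delay bins."""
    index = adc_value * len(DELAY_BINS) // (ADC_MAX + 1)
    return DELAY_BINS[index]

def cycles_to_ms(cycles):
    """Convert a delay in sample periods to milliseconds."""
    return cycles / SAMPLE_RATE_HZ * 1000.0

for reading in (0, 100, 128, 255):
    cycles = delay_for_reading(reading)
    print(f"reading {reading:3d} -> {cycles:+3d} cycles ({cycles_to_ms(cycles):+.2f} ms)")
```

Because each bin covers about 28 consecutive readings, a few counts of noise near the middle of a bin do not change the selected delay.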

Timer 1 ISR

Writing this ISR was the trickiest part of the code. The tasks it has to perform are not particularly difficult, but doing all of them in 22.7$\mu$s required some clever coding. The most limiting factor was the SPI speed of our input ADC, which could only be run at an SPI clock speed of 1.25MHz. This alone would not have been a problem, as 8 bits at that speed only take 6.4$\mu$s. However, this particular ADC requires 10 clock cycles to return its full conversion, as the first two cycles return no data, and we did not find any way to get the SPI hardware on the ATmega to clock 10 cycles instead of 8. Therefore, we had to do two SPI transfers. This put us at 12.8$\mu$s, which is over half of the interrupt period. We found that if we completed the entire SPI transfer and then set our output DACs, the interrupt wouldn't complete fast enough, and this would lead to high-frequency noise. Furthermore, we couldn't make the SPI communications interrupt-driven because they would not have been able to interrupt the timer interrupt. Our solution was to write a byte to SPDR, perform half of our DAC output tasks, then check for the SPI complete flag. Once that flag was set, we sent another byte out on SPDR, did the other half of our DAC tasks, then waited for the flag to be raised again. By completing tasks in parallel with the SPI communication, we were able to run at 44.1 kHz.

Another issue we ran into in the ISR was that checking the bounds of the array indices was too computationally intensive, since there are 13 pointers into the array. Our first thought was just to use the natural roll-over of a uint8_t or uint16_t, but an array of 256 entries was too short, and an array of 65536 entries was far too large to fit into RAM. Our solution was to use modulo addressing, a technique commonly found in Digital Signal Processors (DSPs). This is implemented by making all the pointers 16 bits long, but then doing a bitwise AND with a mask to make them roll over sooner. This method requires an array whose length is a power of 2, so we chose $2^{11}$, which gave us 2048 bytes, plenty for our storage but still small enough to fit into RAM. Therefore, every time we accessed the array we did a bitwise AND of our pointer and 0x07FF. This caused the index to roll over back to zero every time it reached decimal 2048, which kept us from having to use bounds checking in the ISR.
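The modulo-addressing scheme can be modeled in a few lines. This is a Python sketch of the idea, not the AVR code: the buffer length (2048), the mask (0x07FF), and the 16-bit index come from the text, while the class, its method names, and the way a negative delay reverses the lead speaker are illustrative assumptions.

```python
BUFFER_SIZE = 2048             # 2**11 samples
INDEX_MASK = BUFFER_SIZE - 1   # 0x07FF
NUM_SPEAKERS = 12

class DelayLine:
    """Circular sample buffer addressed with a bit mask instead of bounds checks."""

    def __init__(self):
        self.samples = [0] * BUFFER_SIZE
        self.write_index = 0

    def push(self, sample):
        """Store the newest input sample and advance the write index."""
        self.samples[self.write_index & INDEX_MASK] = sample
        # Emulate a uint16_t that rolls over naturally at 65536.
        self.write_index = (self.write_index + 1) & 0xFFFF

    def tap(self, delay):
        """Return the sample stored `delay` pushes before the most recent one."""
        return self.samples[(self.write_index - 1 - delay) & INDEX_MASK]

    def outputs(self, step):
        """One output per speaker, each delayed by `step` more than its neighbour.
        A negative step makes the opposite end of the array the lead speaker."""
        if step >= 0:
            return [self.tap(k * step) for k in range(NUM_SPEAKERS)]
        return [self.tap((NUM_SPEAKERS - 1 - k) * -step) for k in range(NUM_SPEAKERS)]

line = DelayLine()
for n in range(3000):          # more than 2048 pushes, so the index wraps
    line.push(n)
print(line.outputs(50))
```

With the largest delay of 50 cycles, the last speaker reads 11 x 50 = 550 samples back, which is why a 256-entry buffer would be too short while 2048 entries leave ample room.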

Hardware

The entire setup was constructed from two scrap pieces of plywood held together with two aluminum box-beams. Holes the size of the speakers were cut in one of the pieces using a Dremel tool, and each speaker was secured using four 4-40 screws and nuts. The exact dimensions of the plywood are not important, as long as there is sufficient room for the full array of speakers and all the electrical components. The power supply was attached to the board with zip-ties, and the fan supply was taped on top of that. The boards were all held in place by stand-offs drilled through the board. The fans were restrained by hot glue. In the photo below, (1) is the power supply, (2) is the fans, (3) is the speakers, (4) is the input amplifier, ADC, microcontroller, and DAC, and (5) is the speaker amplifiers, of which there are 4 per board.

Input Amplifier

The initial amplifier is designed to re-bias and amplify the audio input of -1 to 1V to 0 to 5V and is drawn below.

The first stage is a high-pass filter and re-biasing circuit, implemented by the first capacitor and the resistor divider. The corner of this input high-pass filter is set by $\frac{1}{RC} = \frac{1}{.1 \mu F ~~ 80 k\Omega} = 125$ rad/s, which corresponds to a corner frequency of $\frac{1}{2\pi RC} \approx 20 Hz$. This is well below the low-frequency limit of our speakers' response, which is 200Hz, so it will allow all important frequencies through. The resistor divider between 12V and ground lowers the voltage to $12V \frac{100k\Omega}{100k\Omega+390k\Omega} = 2.44V$. Since we want the zero-amplitude wave to be centered at 2.5V, this does a good job.

The second stage of the input amplifier is implemented with an LF353P op-amp in a non-inverting topology, and it amplifies only the high-frequency components due to the capacitor between the lower resistor and ground. The amplification factor is $1 + \frac{R_1}{R_2} = 1 + \frac{200k\Omega}{150k\Omega} = 2.33$ for high frequencies. This will turn our -1 to 1V swing into a -2.33 to 2.33 V swing about the bias point, which gets us near the rails, specifically to .11V and 4.77V. The corner for this high-pass gain is set by $\frac{1}{RC} = \frac{1}{.1 \mu F ~~ 250 k \Omega} = 40$ rad/s, or about 6.4Hz, which is high enough to block DC amplification, but low enough to amplify the frequency range we are interested in.
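The input-amplifier numbers can be recomputed from the component values quoted above. This is a back-of-the-envelope sketch, not a circuit simulation. It assumes the high-pass resistance of the first stage is the parallel combination of the two divider resistors (which is where a value near 80 kΩ would come from) and expresses corner frequencies in hertz using $f_c = \frac{1}{2\pi RC}$, the same form used for the speaker coupling capacitor later in the document.

```python
import math

def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def divider(v_in, r_top, r_bottom):
    """Output voltage of an unloaded resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)

def corner_hz(r, c):
    """Corner frequency of a first-order RC filter, in hertz."""
    return 1.0 / (2.0 * math.pi * r * c)

bias_v = divider(12.0, 390e3, 100e3)             # re-bias point, volts
f_in = corner_hz(parallel(100e3, 390e3), 0.1e-6) # input high-pass corner
gain = 1.0 + 200e3 / 150e3                       # non-inverting gain
f_gain = corner_hz(250e3, 0.1e-6)                # gain corner, R taken from the text
low, high = bias_v - gain * 1.0, bias_v + gain * 1.0  # swing for a +/-1 V input

print(f"bias {bias_v:.2f} V, gain {gain:.2f}, swing {low:.2f} V to {high:.2f} V")
print(f"input corner {f_in:.1f} Hz, gain corner {f_gain:.1f} Hz")
```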

An interesting problem while building this stage was that initially the op-amp would rail too easily. This only allowed us to get a voltage swing of about 1.5 to 3.5V, which was lower than we wanted. Instead of solving this by finding a different op-amp, we realized we had voltage rails available to us higher than 5V and lower than 0V. Using those prevented the op-amp from railing, but if one were building a system where only 5V is available, a different op-amp that needs less headroom should be substituted. Furthermore, it is important that the op-amp have a fast enough slew rate to track a 22kHz sine wave from 0 to 5V.

Analog Digital Converter (ADC)

The purpose of the ADC is to sample the input audio signal and transmit it to the microcontroller. Its schematic is found below.

The ADC we used was the TLC0831. We chose this ADC because it sampled just fast enough for us, was easy to use, and was available to us as a free sample. It is connected to the microcontroller via the standard SPI lines. We want the serial clock to be low when idle, which corresponds to CPOL = 0, and we want to sample on the leading edge and setup on the falling edge, which is equivalent to CPHA = 1. Because this ADC has no configuration options, it has only a data-out line. Therefore, it doesn't matter what is written to SPDR, as writing any value will activate the serial clock and read values in from MISO. There is one important note about this ADC, and it can be seen from the SPI timing diagram from the datasheet.

Notice the number of clock cycles in the data transmission. It takes 10 cycles to transmit the 8 bits of data. Unfortunately, it is not possible to have the ATmega 644 run the SPI clock a custom number of times; the count must be a multiple of 8. Therefore, two bytes must be written to SPDR to get the full ADC value back, and the bits must be shifted and stored appropriately.
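The shift-and-store step might look like the sketch below. It assumes a particular bit layout that should be checked against the datasheet timing diagram: the first two clocked bits carry no data, the eight data bits follow most significant bit first, and the last six bits of the second byte are ignored. The function names are hypothetical, and `pack_adc` exists only to generate test inputs.

```python
def unpack_adc(first_byte, second_byte):
    """Rebuild the 8-bit sample from two received SPI bytes (assumed layout:
    2 leading non-data bits, 8 data bits MSB first, 6 trailing bits)."""
    word = (first_byte << 8) | second_byte  # the 16 clocked bits, in arrival order
    return (word >> 6) & 0xFF               # drop 6 trailing bits, mask off 2 leading bits

def pack_adc(sample):
    """Inverse of unpack_adc with the non-data bits set to zero (test helper)."""
    word = (sample & 0xFF) << 6
    return (word >> 8) & 0xFF, word & 0xFF

first, second = pack_adc(0xAB)
print(hex(first), hex(second), hex(unpack_adc(first, second)))
```

On the microcontroller the same result could be formed from the two bytes read back from SPDR as `(first << 2) | (second >> 6)` truncated to 8 bits, again under the layout assumed here.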

If another ADC is substituted, the important parameters are that it be 8-bit, have a range from 0-5V, and can sample and communicate the result at a rate of at least 44.1kHz, preferably a little faster to allow for additional flexibility.

Microcontroller Setup

The microcontroller used in this setup is the ATmega 644. Another microcontroller could be substituted, provided it has at least 13 I/O pins, 1 ADC input, and SPI capabilities. Its SPI must also be fast enough to complete the ADC transfer within each 44.1kHz sample period. The code is written using the AVR macros, so different chips can be substituted by compiling with a different chip selected in AVR Studio. The microcontroller should be connected to power, decoupling capacitors, and an external crystal as recommended by Atmel. The crystal used in this setup should be 20MHz. Below is a table of the port connections of the microcontroller.

Microcontroller pin	Connection
Pin A7..5	A0..2 of both DAC chips
Pin A4	!WR of DAC 1
Pin A3	!WR of DAC 2
Pin A3	Wiper of potentiometer between +5V and Gnd
Pin B7	CLK of ADC
Pin B6	DO of ADC
Pin B4	!CS of ADC
Pin C7..0	D7..0 of both DAC chips

Digital to Analog Converter (DAC)

The DACs we used were two Maxim MX7228 octal, 8-bit DACs. The schematic for how we hooked them up is below.

The MX7228 is a parallel DAC. This means the output voltage is a function of the 8 bits set on DB0..7. Given an 8-bit value x, the MX7228 sets the output to $(V_{ref}-Gnd)\cdot\frac{x}{256}$. The output that is set is defined by the 3 address bits, A0..2, and the register is written when the !WR line is pulled low. The existence of the !WR line allows us to address both DACs from the same 3 pins of the microcontroller and still maintain independent control of all 12 speakers. The code order for writing to one output channel is (1) Set Port C to desired output value, (2) Set Port A to the correct address and pull one !WR low, (3) Set Port A to pull both !WR lines high.
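The three-step write sequence can be illustrated with a toy model. This is not driver code for the real part: the `OctalDac` class is a simplified stand-in that latches the data bus into the addressed channel whenever its !WR input is low, the 5 V reference is an assumption, and the mapping of speakers 0 to 7 onto the first chip and 8 to 11 onto the second is hypothetical.

```python
V_REF = 5.0  # assumed reference voltage, with Gnd at 0 V

def dac_voltage(code, v_ref=V_REF):
    """Output voltage for an 8-bit code, following V_ref * x / 256."""
    return v_ref * code / 256

class OctalDac:
    """Toy 8-channel parallel DAC: the addressed channel takes the bus value while !WR is low."""

    def __init__(self):
        self.codes = [0] * 8

    def drive(self, address, data, wr_n):
        if wr_n == 0:
            self.codes[address & 0x07] = data & 0xFF

def write_channel(dacs, channel, code):
    """Write one speaker channel using the shared data and address lines."""
    chip, address = divmod(channel, 8)  # hypothetical channel-to-chip mapping
    # Steps (1) and (2): data and address are shared; only one chip sees !WR low.
    for index, dac in enumerate(dacs):
        dac.drive(address, code, wr_n=0 if index == chip else 1)
    # Step (3): return both !WR lines high so later bus changes are ignored.
    for dac in dacs:
        dac.drive(address, code, wr_n=1)

dacs = [OctalDac(), OctalDac()]
write_channel(dacs, 9, 200)
print(dacs[1].codes[1], dac_voltage(dacs[1].codes[1]))
```

Because both chips share the address and data pins, the separate !WR lines are the only thing that selects which chip latches a value, which is the point made in the paragraph above.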

One important issue we ran across with this DAC is that when we were originally applying 5V to Vdd and 0V to Vss the output was unable to achieve the entire 5V range. Fortunately, we had a 12V and -5V rail available to us, and by applying these we were able to achieve the full desired range of the DAC.

If a substitution is desired, the DAC must be fast enough that 12 channels can be written at a rate of 44.1kHz, or preferably faster to allow more flexibility. This usually means a parallel-input DAC should be used, since even at a high SPI frequency it takes more time to write to a serial DAC than a parallel one, and there is not a shortage of output ports on the Microcontroller.

Speaker Amplifier

To drive the speakers, we needed an additional stage after the DAC. Our speaker amplifier is below.

Although a DAC can produce a sine wave by stepping through analog values, there is still quantization noise, since the output of a DAC is not continuous over the output range. Therefore a second-order Butterworth low-pass filter is implemented to reject frequencies above the audible region. The filter is implemented using the Sallen-Key topology, and the resistors and capacitors are selected so the cutoff frequency is about 23kHz.

A Butterworth filter was selected over cascading two passive RC filters because cascading two RC filters has a gain of -6dB at the cutoff frequency and the output of the signal would need to be buffered before being fed into the common collector amplifier. By using a Butterworth filter the filter is closer to an ideal low pass filter and the signal is buffered as Sallen-Key topology was used. Sallen-Key topology has a very large input resistance and a very low output resistance which is ideal for buffering a signal.

When selecting the op-amp for the filter the main parameter that has to be taken into consideration is the slew rate. An op-amp with a really high slew rate may be necessary if the application is dealing with frequencies above the human range of hearing. For audible frequency ranges a slew rate of 0.5$\frac{V}{\mu s}$ was sufficient. One way to determine if the slew rate of the op-amp is not fast enough for the circuit is if a buffer is designed using the op amp. If a sinusoid wave is passed through the input of the buffer and the output is a triangle wave then the slew rate is too low for the circuit application. The LM324N is a quad op amp package that is sample-able from TI which was used for implementing the low pass filter. Each IC would filter four channels from the DAC that would be sent to the speakers.

The output of the filter is then passed to a common collector amplifier. A common collector amplifier has the following current gain, voltage gain, and output resistance: $$A_i = \beta_0+1$$ $$A_v = \frac{g_m R_E}{g_m R_E + 1}$$ $$r_{out} = R_E||\frac{r_\pi+R_{source}}{\beta_0+1}$$ The speaker is connected in series with a 680$\mu F$ capacitor to create a high-pass filter intended to remove the DC bias of the output signal. The equation for a high-pass filter implemented with a passive RC circuit is as follows: $$f_c = \frac{1}{2\pi R C}$$ where R is the resistance of the speaker, which is 8$\Omega$. This creates a cutoff frequency of about 30Hz. The capacitor used for the high-pass filter was a polarized aluminum capacitor. It is fine to use a polarized capacitor as long as the voltage drop across the capacitor is smaller than the voltage rating of the capacitor. The capacitor used was rated to 10V, which is fine since the sinusoid being fed into the speaker has an amplitude of about 4V.

A common collector amplifier is needed to drive a speaker because the speaker has a very low impedance. Any circuit that drives the speaker therefore needs a low output impedance; otherwise there will be a significant voltage drop, since the circuit forms a voltage divider. To improve the linearity and efficiency of the amplifier, a current source should be used in place of the emitter resistance.

To implement a current source, an NPN transistor is biased to provide a reference current by connecting a 430 $\Omega$ resistor between the base of the transistor and the 5V rail. The current output by the emitter of the NPN transistor is then fed into a current mirror implemented by two N-channel MOSFETs. The current mirror copies the reference current and outputs it through the other MOSFET, which is connected to the common collector amplifier.

MOSFETs were selected over BJTs for the current mirror because there is no current flowing through the gate of a MOSFET, but there is a current flowing through the base of a BJT. This causes a BJT to have the following equation for the current mirror: $$I_{copy} = \frac{I_{REF}}{1+\frac{2}{\beta}}$$ The MOSFET current mirror equation is simply $I_{copy} = I_{REF}$. These equations are based on the assumption that the two transistors used in the current mirror are identical to each other. However, transistors tend to have a large variation in their parameters caused by manufacturing. Transistors tend to have less variation in their parameters if array packages are used. Ideally a package containing two transistors would have been used. However, due to budget constraints it was cheaper to get individual MOSFET packages.

When selecting the MOSFET for the current mirror it is important to use a MOSFET that has a low $R_{DS}$. A high $R_{DS}$ will limit how much current will be flowing through the reference current side of the current mirror. An $R_{DS}$ smaller than 1 $\Omega$ is small enough to allow sufficient current to flow through the NPN BJT being used as a current reference and improve the performance of the common collector amplifier.

Since the emulated current source replaces the emitter resistor, the emitter resistance becomes effectively infinite, which means that the voltage gain and output resistance equations change to the following: $$A_v = 1$$ $$r_{out} \approx \frac{1}{g_m} + \frac{R_{source}}{\beta_0}.$$ Since the source of the input signal for the common collector amplifier is the output of a Sallen-Key stage, the $R_{source}$ value is almost zero. Also, the value of $\beta_0$ is in the range of 100 to 1000, so the term $\frac{R_{source}}{\beta_0}$ is effectively zero. The term $g_m$ can be calculated with the following relation: $$g_m = \frac{I_C}{V_T}$$ where $I_C$ is the DC collector current and $V_T$ is the thermal voltage, which is typically around 26mV. With the speakers being driven at around 0.75A we see that $g_m$ is about 29 $\Omega^{-1}$, which means that the output resistance of the amplifier is much smaller than the impedance of the speaker. This allows most of the voltage drop of the signal to be across the speaker and not somewhere else in the amplifier circuit.
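As a rough worked check of these figures, taking the 0.75A drive current as $I_C$ and $V_T \approx 26mV$ from the text, and neglecting the $\frac{R_{source}}{\beta_0}$ term: $$g_m = \frac{I_C}{V_T} = \frac{0.75 A}{0.026 V} \approx 28.8~\Omega^{-1}$$ $$r_{out} \approx \frac{1}{g_m} \approx 0.035~\Omega$$ Treating the amplifier output and the 8$\Omega$ speaker as a voltage divider, the fraction of the signal that appears across the speaker is then approximately $$\frac{8~\Omega}{8~\Omega + 0.035~\Omega} \approx 0.996.$$ This is only a small-signal estimate under the stated assumptions; it ignores wiring resistance and the coupling capacitor.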

This resistor was selected by using a potentiometer to find the point where the output signal of the common collector amplifier looked identical to the filtered input signal. Lowering the current limiting base resistor would improve the quality of the output signal, but would increase total power consumption and cause the NPN transistors and N-channel MOSFETs to become significantly hot.

Unfortunately, with the transistors selected, the output signal was not an accurate representation of the original input signal unless the current limiting base resistor was relatively low, which caused the transistors to get hot. This is very problematic since it allows thermal runaway to occur. Thermal runaway in BJTs is caused by the fact that the leakage current of a BJT increases as the temperature increases. A positive feedback loop is formed: as the leakage current increases, the total current flowing through the transistor increases, and as the total current increases the temperature increases, which in turn increases the leakage current. Thermal runaway in MOSFETs occurs because the on-resistance of the MOSFET increases as the temperature increases. This causes more power to be dissipated in the MOSFET, since $P = I^2 R$, which increases the temperature further.

Thermal runaway could be observed by looking at the current drawn by the amplifier circuit. When the circuit was left running, the current it drew would steadily increase and the transistors would get hotter. A cooling system was implemented by arranging a series of 12V power supply cooling fans so that they create constant airflow across all the amplifier circuits. To keep the fans isolated from the microcontroller, they were powered from a supply separate from the computer power supply.

At first an isolating DC/DC converter was used to create the isolation. The PTB78560C (sample-able from TI) would take 24V from the power supply (implemented by using the 12V and -12V rails) and output an isolated 12V, using the feedback resistors and capacitors specified in the datasheet. However, the output power of the DC/DC converter was not high enough to power all the fans. Instead of using multiple DC/DC converters to split up the number of fans powered by each one, a PA-215 switching adapter was used to supply an isolated 12V rated for 21.6W of power.

Results

Software Execution

Because of the careful use of time in our ISR, our code runs smoothly with no delays or problems. There is no discernible delay in the input potentiometer: changing it changes the delay to the speakers immediately. Furthermore, we measured that our ISR was executing at 44.15kHz by toggling a debug pin upon entering and exiting the ISR and observing it on an oscilloscope. By looking at the following waveforms we can see that the microcontroller enters and exits the ISR in the expected amount of time. Since all of the main processing for computing the time delays and the values to output to the DAC is done in the ISR, we can measure the CPU usage percentage by looking at the duty cycle of the waveform. The high side of the duty cycle is 15.9$\mu$s and the total time was measured to be 22.7$\mu$s, giving us a CPU usage percentage of 70.04%.
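
The duty-cycle calculation can be written out explicitly. The sketch below takes the two oscilloscope readings quoted above and returns the CPU usage and the ISR rate implied by the measured period; the function name is an assumption made for this example.

```python
def isr_stats(high_time_s, period_s):
    """Return (CPU usage in percent, ISR rate in Hz) from debug-pin timing.

    high_time_s: time the debug pin is high, i.e. time spent inside the ISR
    period_s:    time between consecutive ISR entries
    """
    usage_percent = 100.0 * high_time_s / period_s
    rate_hz = 1.0 / period_s
    return usage_percent, rate_hz


if __name__ == "__main__":
    usage, rate = isr_stats(15.9e-6, 22.7e-6)
    print(f"CPU usage = {usage:.2f}%, ISR rate = {rate / 1000.0:.2f} kHz")
```

Note that the period reading is rounded to three significant figures, so the rate computed from it is only approximately equal to the measured ISR frequency.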

Oscilloscope Plot showing 22.7 microseconds between ISR triggers and 70.04% CPU usage.

Accuracy

Overall, the phased array speaker system was a success: we were able to correctly implement group delay on all twelve speakers within the time available in the interrupt handler that samples the music. The following waveforms show the group delay between speakers in the phased array and the amount of time the microcontroller spends in the ISR updating the speakers based on the sampled music.

Oscilloscope plot of 4 speaker outputs. The probes are attached to speakers 1, 4, 8, and 12. Notice the delay between the channels.

As for the accuracy of sound capture and reproduction, we were able to confirm both subjectively and objectively that our system reproduced sounds faithfully in the range of human hearing. We did this by both comparing frequency input and output as sourced by a signal generator, as well as by playing music through the system and confirming that, at zero phase shift, there was no distortion.

After tuning the threshold values of the potentiometer by selecting how many group delay positions were allowed, the signals were shifted without any jitter. With too many steps (which decreases the minimum angle by which the beam can be steered), there would be jitter, since the ADC readings would fluctuate between two adjacent group delay values.
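
The trade-off between step count and jitter can be illustrated with a simple mapping from an 8-bit ADC code (the TLC0831 is an 8-bit converter) to a group delay position. This is only a sketch of the idea: the evenly spaced thresholds and the function names are assumptions, and our actual threshold values were tuned by hand.

```python
ADC_LEVELS = 256  # 8-bit ADC codes: 0..255


def adc_to_position(adc_code, n_positions):
    """Map an ADC code to one of n_positions evenly sized group delay bins."""
    return adc_code * n_positions // ADC_LEVELS


def jitter_prone_codes(n_positions):
    """Count ADC codes where a +1 LSB fluctuation would change the position."""
    return sum(
        1
        for code in range(ADC_LEVELS - 1)
        if adc_to_position(code, n_positions) != adc_to_position(code + 1, n_positions)
    )


if __name__ == "__main__":
    for n in (8, 16, 64):  # example step counts, chosen for illustration
        print(f"{n} positions -> {jitter_prone_codes(n)} codes sit next to a threshold")
```

More positions give finer steering but leave more potentiometer settings where one LSB of ADC noise toggles the delay.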

Although we were able to show that the group delays were correct, one thing that could not be done was predicting the exact location of nodes and antinodes when single notes were being played in a room. This was tested by running the simulation for a single note and seeing at what viewing angles the nodes and antinodes were located. To prevent bias, one group member would run the simulation to see where the nodes and antinodes should be located while the other group member roamed around the room listening for them. Although some of the node and antinode locations were correct to within a few degrees, others were significantly off. Multipath interference caused these errors, since the sound waves reflected off all the walls, columns, seats, etc. in the lecture hall where this was tested. To test this properly, the project must be tested in an acoustic chamber to reduce interference.

When the phased array system was tested when streaming audio it was evident that the change in the group delay would move the main lobe of all the frequencies around. However some frequencies would get moved into angles that we did not expect and others would disappear altogether due to multipath interference.

Safety Considerations

Safety in the design was ensured by not allowing the output of the sound waves to be excessively loud. This is very important because the volume of the main lobe is very loud due to all of the constructive interference. Also, with the cooling fans in place to prevent thermal runaway, the transistors no longer get hot to the point where they start burning and potentially start fires, which would cause the wooden board and speaker mounts to burn. We also made sure to keep the controls of the system far away from the fans, to avoid any person accidentally sticking their hand into a fan while trying to adjust the system. Every wire we soldered together was wrapped in electrical tape to prevent accidental shorts, which could either cause damage to our system or possibly start an electrical fire. Finally, since we had power supplies in the system drawing AC power, we made sure to unplug these devices while working on the wiring of the system to avoid any AC shocks.
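
To give a sense of how much constructive interference adds, the sketch below uses the standard result that N equal, in-phase sources raise the sound pressure level by $20\log_{10}N$ dB over a single source. Taking the 86dB rating from the parts list as the single-speaker level is an assumption made here for illustration; it is one plausible way to arrive at a figure close to the 107.5 dB quoted under Ethical Considerations, not a record of how that number was obtained.

```python
import math


def coherent_spl_db(single_source_db, n_sources):
    """SPL at a point where n equal sources add perfectly in phase.

    Pressures add linearly, so the level rises by 20*log10(n) dB.
    """
    return single_source_db + 20.0 * math.log10(n_sources)


if __name__ == "__main__":
    # 86 dB per speaker (rating from the parts list), 12 speakers in phase.
    print(f"Main lobe level: {coherent_spl_db(86.0, 12):.1f} dB")
```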

Interference and Accessibility

There was some interference in the output of the speakers caused by the internal fan for the power supply. This caused minor hissing in the speakers. This was the only fan that was causing noise because all the other fans were on a separate ground which isolates the fans electrically from the rest of the hardware. Other than this, our project did not suffer from any external interference. Furthermore, to reduce the interference our project had on others, we made sure to do most of our testing in empty classrooms to avoid disrupting other groups.

Our project is easy to use for anyone who has fine-motor control of their hands. In the current setup, a regular screwdriver is used to turn a potentiometer, which changes the direction of the sound. This system could easily be upgraded to be accessible to even those without control of their hands, as the input is simply a 0-5V analog signal.

Conclusion

Analysis of Results

From a hardware and firmware perspective the results were nearly flawless. The microcontroller was able to sample data at the appropriate sampling frequency of 44.1kHz and apply the appropriate group delays given the voltage readings coming from the potentiometer acting as a voltage divider. The only issue was caused by the fan inside the power supply, which led to hissing in the speakers. Next time the power supply should be opened and its fan connected to the isolated ground created by the PA-215 power supply. Furthermore, serial communication could have been implemented by using a separate microcontroller, such as the ATmega16, to talk to the computer using RS-232. Another microcontroller is needed because the majority of the CPU time is dedicated to meeting the ISR deadline for sampling music and implementing the group delays. The ATmega16 would probably be used since it does not increase the budget. The MAX233 would then be used for the TTL/RS-232 level shifting, since there are unused parts in the CUAUV lab. By talking over serial, the MATLAB simulator could be improved so that it takes the frequency and time delay readings from the microcontroller and updates the intensity against viewing angle plots.

Further Applications

Another way to design a phased array speaker would be to use ultrasonic transducers to produce sound from ultrasound. This phenomenon occurs when audible signals are modulated onto an ultrasound carrier frequency. Ultrasound frequencies are sound waves that are above the human hearing range. Sound from ultrasound uses the nonlinear properties of air to make the air act as a demodulator. The advantage of using sound from ultrasound would be that the frequencies are in a smaller range relative to each other. If a carrier frequency of 40kHz was used, then the frequency range of the ultrasound phased array would be about 40kHz to 60kHz. Therefore, to prevent grating lobes the spacing between adjacent elements would need to be less than 11.44mm. This allows ultrasound phased array speaker systems to be much smaller than audio phased array speaker systems.

One issue with phased array speaker systems is that human hearing with respect to intensity is on a logarithmic scale. Therefore the main lobe intensity peaks on a linear scale need to be significantly larger than any of the side lobes. Using the mechanical setup of our system for a 1kHz tone with no phase shifts, we have the following plots for wave intensity and for how our ears perceive it:

Intensity of 1kHz tone, linear scale.

Intensity of 1kHz tone, log scale.

What is noticed on the logarithmic intensity scale is that the side lobes are of comparable magnitude to the main lobe. This makes it difficult to tell where the main lobe is. However, the nodes are much easier to find, since at those exact viewing angles there is almost no sound. For the intensity to be drastically different on a logarithmic scale, the number of speakers would have to be increased dramatically. If the number of speakers is increased to 120, we have the following results for a 1kHz tone without any phase shifts.

Intensity of 1kHz tone on larger setup, linear scale.

Intensity of 1kHz tone on larger setup, log scale.
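
The linear-versus-logarithmic comparison can be sketched numerically from the grating term of the intensity equation in the Background Math section, with no phase shift. This is a simplified sketch: it ignores the single-element aperture term, and the 0.1 m element spacing, the function name, and the sampled angles are assumptions for illustration rather than the dimensions of our array.

```python
import math


def array_intensity_db(theta_deg, n_elements, spacing_m, freq_hz, c=343.3):
    """Normalized far-field intensity (dB) of a uniform line array, zero phase shift.

    Uses (sin(N*u) / (N*sin(u)))**2 with u = pi*d/lambda * sin(theta),
    so the main lobe peak is 0 dB.
    """
    wavelength = c / freq_hz
    u = math.pi * spacing_m / wavelength * math.sin(math.radians(theta_deg))
    denominator = math.sin(u)
    if abs(denominator) < 1e-12:
        ratio = 1.0  # limit of |sin(N*u) / (N*sin(u))| as sin(u) -> 0
    else:
        ratio = math.sin(n_elements * u) / (n_elements * denominator)
    # Clamp to avoid log10(0) exactly at a node.
    return 10.0 * math.log10(max(ratio * ratio, 1e-12))


if __name__ == "__main__":
    spacing = 0.1  # metres; hypothetical spacing, not our measured value
    for n in (12, 120):
        levels = [array_intensity_db(t, n, spacing, 1000.0) for t in (0, 10, 20, 40)]
        print(n, [f"{level:.1f} dB" for level in levels])
```

Printing the levels in dB rather than as linear ratios shows how much of the side-lobe energy remains audible on a logarithmic scale.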

Acoustic phased array systems can also be applied to sonar. Acoustic data modems work really well for underwater communication, since the electromagnetic frequencies used for land communication do not propagate through water. In water, acoustic communication is ideal because the molecules in a liquid are more tightly packed than in air. Therefore the speed of sound in water is significantly larger (343.3 m/s for air and 1484 m/s for water). By using a phased array acoustic system, the sound waves can be focused into a beam, which would improve the communication's signal to noise ratio. This would allow underwater communication to run at high frequencies or over larger distances.

Another application to phased array acoustic systems would be to implement active sonar. By changing the set up of the phased array from a linear setup to a two dimensional array a grid of microphones can also be made. This will allow for two dimensional scanning of the phased array and the microphone grid can be used to determine the time delay between the pulse and the reception. By measuring the time delay the distance can be determined and with enough elements an image with a large enough resolution can be created.
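
The distance calculation mentioned above follows from the round trip of the pulse: the sound travels to the target and back, so the range is half the product of the speed of sound and the measured delay. A minimal sketch, with the function name and the example delay chosen purely for illustration:

```python
def sonar_range_m(round_trip_delay_s, speed_of_sound_m_s=1484.0):
    """Target distance from the delay between an emitted pulse and its echo.

    The default speed is the value for water quoted earlier in this section.
    """
    return speed_of_sound_m_s * round_trip_delay_s / 2.0


if __name__ == "__main__":
    # A 10 ms echo delay in water (hypothetical measurement).
    print(f"Range: {sonar_range_m(0.010):.2f} m")
```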

The advantage of using microphones in these sonar systems is that the receivers can be implemented with microphones that have linear responses to intensity. This is important because it means that fewer elements are needed in the array to create a main lobe with a magnitude that will be perceived as significantly larger than the side lobes.

Standards and Conformity

Since this project was designed as a platform for other projects to build off of, there are no standards that we needed to conform to. Future projects that have very specific applications will have standards for their application and will have to conform to those standards.

IP Considerations

All of the intellectual property regarding the implementation of the phased array speaker system belongs to us, since we did not actively search for patents or current implementations for the design. We did not sign any non-disclosure agreements to obtain any of the parts we sampled. The information required to design this project came from knowledge learned in classes taken at Cornell University, datasheets provided by the part manufacturers, and example circuits found online. The speaker driving circuitry was based on the design implemented in the acoustic data modem project designed by Greg Malysa and Arseney Romanenko for ECE 4760 in Spring 2010, and was used with their permission.

Furthermore, we are open to the idea that other research/projects groups can use our design as a stepping stone and further improve the design to fit their application needs.

Ethical Considerations

During the design and testing of our project we complied with the IEEE code of ethics. We accepted responsibility in making safety decisions, disclosing the possible ways that the use of our project could injure a person. We discussed the details relating to the safety hazards of having very hot transistors and loud volumes of sound created from the constructive interference. With the constructive interference created by the speakers, the pressure level of the sound waves can be up to 107.5 dB. For this reason the speakers should not be driven at their maximum power if being used for audio purposes where the receiver is the human ear. We picked a project that avoided conflicts of interest, since neither of us is actively involved in working with phased array technology. Although this project has many applications that could benefit CUAUV, such as improved acoustic data communications or active sonar, this was not the main application focus. The project was originally intended for phased array speaker systems for focused audio systems. We did not receive any incentives for this project and we do not plan on receiving any in the future for our work. We were honest about the results of our project, as we showed that the nodes and antinodes did not always agree with what the simulation predicted due to multipath interference. We sought help when needed and offered help and advice to groups that requested it. We did not discriminate against any person for any reason, and we did not injure anyone during the design and testing of this project.

Legal Considerations

There should not be any legal considerations with the use of our phased array speaker system. Our design does not allow sound waves at the antinodes to exceed sound pressure levels that may result in hearing loss due to long or short term exposure. The project should not be in violation of Ithaca's noise regulations, since people can still talk over the sound produced by the project, even when standing in the main lobe.

Appendices

Source Code

A copy of the source code used is available here.

Additionally, the two MATLAB functions we wrote to simulate phased-array acoustics are found here for a single-frequency simulation, and here for multi-frequency simulation.

Schematics

Click on the schematic for a much larger version where all of the details can be seen.

Complete System Schematic

Parts List

Part Description	DigiKey Part #	Quantity	Total Price
ATmega 644 Microcontroller	ATMEGA644-20PU	1	$6.00
Custom PC Board	N/A	1	$4.00
8 Ohm 3W 86dB Speaker	668-1240-ND	12	$46.32
2N2222 NPN Transistor	P2N2222AGOS-ND	24	$5.93
F12N10L N-Channel MOSFET	N/A*	24	Free ²,³
LM324N Quad Op-Amp	LM324NFS-ND	3	Free ¹
LF353P Op-Amp	296-7139-5-ND	1	Free ²
TLC0831 SPI ADC	296-2856-5-ND	1	Free ¹
MX7228KN+ Parallel DAC	MX7228KN+-ND	2	Free ¹
10k$\Omega$ Potentiometer	3362P-103LF-ND	1	Free ³
470$\Omega$ Resistor	CF18JT470RTR-ND	12	Free ³
10k$\Omega$ Resistor	CF18JT10K0TR-ND	24	Free ³
12k$\Omega$ Resistor	CF18JT12K0TR-ND	3	Free ³
100k$\Omega$ Resistor	CF18JT100KTR-ND	1	Free ³
150k$\Omega$ Resistor	CF18JT150KTR-ND	1	Free ³
200k$\Omega$ Resistor	CF18JT200KTR-ND	1	Free ³
390k$\Omega$ Resistor	CF18JT390KTR-ND	1	Free ³
.47 nF Capacitor	490-4265-ND	12	Free ³
1 nF Capacitor	445-2422-ND	12	Free ³
.1 $\mu$F Capacitor	490-5369-ND	2	Free ³
680 $\mu$F Capacitor	P12365-ND	12	$6.00
QMAXLC-B350ATX Power Supply	N/A	1	Free ³
PA-215 Switching Adapter	N/A	1	Free ³
3110GL-B4W-B44 Cooling Fans	N/A	4	Free ³
Audio Jack Cable	N/A	1	Free ³
Plywood Board	N/A	2	Free ²
4-40 Screws	N/A	60	Free ³
4-40 Nuts	N/A	48	Free ³
Spacers	N/A	12	Free ³
Aluminum Box Beam	N/A	2	Free ³
8-32 Bolts	N/A	9	Free ³
8-32 Nuts	N/A	9	Free ³
Multipurpose PC Board with 417 Holes	N/A	4	Free ³
RadioShack Matching Printed Circuit Board	N/A	2	$6.38
	Total		$74.63

* The part has been discontinued and is no longer on DigiKey.
¹ Obtained via free sample from the manufacturer, no NDA required.
² These parts were already available in the 476 Lab for free.
³ These parts were already available in the CUAUV Lab for free.

Task Division

Ed Szoka

Speaker amplifier design and debugging
Speaker amplifier construction (x12)
MATLAB Simulation
Software debugging
Report
Website

Tom Jackson

Input amplifier design and construction
Microcontroller/peripheral interface design and construction
Microcontroller code design/debugging
Mechanical construction
Report
Website

Acknowledgements

We would like to thank Bruce Land and the 476 TAs for keeping long lab hours and helping us with our problems. We would also like to thank the CUAUV team for allowing us the use of team resources to complete our project. We would also like to thank Maxim IC, Texas Instruments, Analog Devices, and Atmel for donating parts (either to us or to 4760 as a class) that made our project possible.

We would like to acknowledge Mark Bunney, who lent us some fans to assist in our cooling system. We would like to acknowledge Chris Peratrovich, Andre Vazquez, Kevin Fuhr, and Markus Burkardt for mechanical consultation and assistance, as well as Michael Mahoney, who opened doors for us (literally). We would also like to thank Mark and Markus for helping us test our system. We would like to acknowledge Arseney Romanenko and Greg Malysa, from whose project we took the website template to modify. We'd also like to thank Greg for consulting with us on amplifier design and phased-array mathematics.

Phased Array Speaker System

Edward Szoka (ecs227) and Tom Jackson (tcj26)