Composition Assistant, End-to-end Music Annotator, Metronome and Tuner, ECE 4760 Final Project
PICcompose is a tool that converts raw audio data into an editable music score!
Over the course of this project, we built a PIC32-driven solution to facilitate the music composition process. We extracted frequencies from an audio source and converted them to the correct note in the MIDI format. We then figured out note timings and sent the data to an external computer, which compiled the data into an actual MIDI file, saved it, and loaded it into MuseScore, our group's favorite (free) music editing software! At the end of the project, we had a nicely packaged product that demonstrated our ability to notate simple melodies with 100% accuracy!
An overview of our project.
The three of us are musicians, and we know that composing music can be a time consuming task. It can be almost as difficult to notate the melodies as it is to find inspiration for them. When brainstorming ideas for tunes, oftentimes a composer will try it out on their instrument of choice, make notes, and then translate their thoughts into a music score editing software of their choice to generate the sheet music. The goal of our project was to eliminate the middleman in the music composition process by creating a tool to directly generate sheet music after playing an instrument. A block diagram of our overall system is shown below.
The core technical portions of our project were the math required for audio processing and an understanding of the MIDI protocol.
The raw audio input to PICcompose needed to be amplified.
We chose to use a Sallen-Key filter due to its complex conjugate poles.
The corner frequency of a Sallen-Key filter, above which input frequencies are attenuated, is calculated as follows: fc = 1 / (2π·√(R1·R2·C1·C2)).
Sallen Key Corner Frequency Equation
The Q value of a unity-gain Sallen-Key filter is calculated as follows: Q = √(R1·R2·C1·C2) / (C2·(R1 + R2)).
Sallen Key Q Value Equation
A fast Fourier transform (FFT) is used to determine the frequency of the input note. A Fourier transform converts a signal from the time domain to the frequency domain, and a discrete Fourier transform (DFT) requires that the input time-domain signal and the output frequency-domain signal both be discrete. An FFT is an algorithm that computes the DFT quickly. The DFT equation is shown below: X[k] = Σ_{n=0}^{N−1} x[n]·e^(−j·2πkn/N).
Discrete Fourier Transform Equation
To take an FFT of continuous-time audio data, the data must first be sampled to generate the input signal; we use the ADC on the microcontroller to sample input from the microphone. The FFT takes the real and imaginary components of the signal at discrete time intervals and outputs one pair of real and imaginary amplitude components for each sampled frequency. The output frequencies are evenly spaced from 0 to the Nyquist frequency, which is half of the sampling rate: n samples taken at m Hz yield n/2 discrete frequencies starting at 0 and incrementing by m/n Hz for each “bin”. We chose an 8 kHz sampling rate with 1024 samples for the FFT, giving a frequency spacing of about 8 Hz. We found the bin with the largest amplitude, then used the bin spacing to convert this back to a real frequency and later to a MIDI note.
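As a sketch of this bin arithmetic (in Python rather than the C running on the PIC32, with illustrative function names), the mapping from FFT bin to frequency looks like:

```python
SAMPLE_RATE = 8000           # Hz
N = 1024                     # FFT length

bin_width = SAMPLE_RATE / N  # ~7.8 Hz per bin

def bin_to_freq(bin_index):
    """Convert an FFT bin index to its center frequency in Hz."""
    return bin_index * bin_width

def peak_bin(amplitudes, skip=15):
    """Index of the largest-amplitude bin, ignoring the low-frequency bins."""
    best = skip
    for i in range(skip, len(amplitudes)):
        if amplitudes[i] > amplitudes[best]:
            best = i
    return best
```

For example, bin 56 maps to 437.5 Hz, the closest this spacing can get to A4 (440 Hz).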
MIDI (Musical Instrument Digital Interface) is an industry standard method to store song information.
This project involves creating MIDI files, which we do by writing MIDI messages to a .mid file.
Each main MIDI message is encoded into bytes of the following format.
delta time | message_type | data
Delta time is the time, measured in MIDI “ticks,” since the last message. The data is two bytes long, and the meaning of those bytes depends on the message type. For our purposes, we only use the note_on message, which looks like this:
delta time | 0x90 | note_number | velocity
Note number goes from 0 to 127 and corresponds to a note between C-1 and G9. For context, a flute’s range is between B3 and C6, so MIDI fully encompasses the range of almost all instruments. The velocity value corresponds to volume; for our purposes, we set this to a default intermediate value. MIDI also supports meta-messages, which relate to the file at large. These include tempo, key signature, file name, and other similar messages. They have the following structure:
FF | meta_type | length | data
The delta time in this case doesn’t matter.
Since some of the data can be encoded with different numbers of bytes depending on the meta message type,
a length field needs to be specified.
For our project, we mainly use the
set_tempo meta message, which appears like this:
FF | 0x51 | 0x03 | TT | TT | TT
The tempo is encoded as the number of microseconds per quarter note. The default tempo is 120 BPM, which corresponds to a MIDI tempo value of 500000. MIDI supports many other musical encodings, but these are the features we found useful for our project.
A block diagram of our overall system's logical structure is shown below. Most of the processing was done on the PIC32 side.
PICcompose Block Diagram
This project did come with some hardware-software tradeoffs. One major tradeoff involved the filtering. Only the low pass filter was implemented in pure hardware; low frequencies were actually cut off in software. We could have built a band pass filter in hardware, but figured the software implementation might be cleaner. Additionally, we use a keypad as part of the user interface. With some more advanced TFT printing techniques, we may have been able to reduce the number of buttons/change how the tempo settings were done. The keypad could have been swapped out for + and - buttons to increase/decrease tempo, which would have taken up less space visually. However, we decided a keypad provided a little more user flexibility, and that TFT manipulation would have taken too much time to implement cleanly.
We believe that our project is fully compliant with IEEE and other standards.
Our project is designed to be safe and user-friendly, and most of its electrical components are hidden inside the box.
It is easy to use, with simple visual feedback and user controls.
Surprisingly few automatic sheet music generators exist. AnthemScore, ScoreCloud, and Melody Scanner are some major ones, but they are paid products, so we did not use any of their code when creating our project. In addition, our project involves software-hardware co-design, rather than the pure software implementations found in similar products. We discuss this project in relation to our competitors in the Conclusions section.
The physical components of our system.
The primary hardware components of this project are the microphone with its associated filtering and amplification circuitry, the keypad, a toggle switch and LED, the UART serial interface, and the TFT LCD. The keypad connects directly to GPIO pins on the port expander to interface with the PIC32 microcontroller. The output of the microphone’s analog circuitry is read by an ADC pin on the PIC32. The toggle switch and LED also connect to GPIO pins, while the UART is connected to the UART receive and transmit pins of the board. The TFT display, which was integral to the user interface aspect of the project, also connected directly to the board. Our final circuitry was soldered onto a solder board, and our hardware was packed into a clean box for the user.
The first end of our system is the amplification and filtering done in hardware. Raw audio is picked up by an Adafruit electret microphone. The signal is then sent through a non-inverting op-amp with a gain of 50.
The Microphone Circuit
A gain of 50 was chosen because it increases the microphone output signal amplitude from 15 mV to 750 mV, an amplitude usable by the PIC32’s ADC. After amplification, the signal is sent through an anti-aliasing Sallen-Key filter, implemented with a pole at 2.5 kHz and a Q of 1. We chose 2.5 kHz because the FFT (done on the PIC32) sampled at 8 kHz, and 2.5 kHz corresponds to a note well out of the range of a flute. The output of the Sallen-Key filter is the input to the PIC32’s ADC. After prototyping on a normal breadboard, the circuit was ported over to the small solder board. Both op-amps were powered by the PIC32 at 3.3 V. We used the MCP6242 op-amp because that model can be powered with voltages as low as 1.8 V.
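To illustrate the design equations, here is a small Python sketch. The component values are hypothetical picks that hit the stated targets (a 2.5 kHz pole and a Q of 1), not necessarily the parts used in the build:

```python
import math

# Unity-gain Sallen-Key design equations (sketch, hypothetical components).
def sallen_key_fc(r1, r2, c1, c2):
    """Corner frequency in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(r1 * r2 * c1 * c2))

def sallen_key_q(r1, r2, c1, c2):
    """Q of the unity-gain configuration."""
    return math.sqrt(r1 * r2 * c1 * c2) / (c2 * (r1 + r2))

# With equal resistors, choosing C1 = 4*C2 gives Q = 1 exactly.
R = 3183.0      # ohms (hypothetical value for fc near 2.5 kHz)
C2 = 10e-9      # farads
C1 = 4 * C2
```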
The Amplifier Output Signal
The Amplifier Output FFT
The Filter Output Signal
The Filter Output FFT
The keypad, which is for the user to input tempo, has four row pins and three column pins. When a button is pressed it closes a switch between the pins corresponding to the column and row that the pressed button is in. We used the port expander on the board to interface with the keypad, and used the SPI bus to read from and write to the GPIO pins in port Y. We connected the three column pins to GPIO pins Y4 - Y6, and the four row pins through 330Ω resistors to GPIO pins Y0 - Y3. Pins Y0-Y3 are configured as outputs, while pins Y4-Y6 are configured as inputs with pullup resistors.
The Keypad Configuration
The switch is a simple toggle switch for the user to turn record mode on and off. The schematic for this simple circuit is shown below. One side of the switch is connected to power and the other to ground, while the middle pin, the output pin, is connected through a 330Ω resistor to a GPIO pin on the port expander. The pin will then simply read 1 or 0 depending on which way the switch is flipped.
The Switch Circuit
The purpose of the LED is to blink at the set tempo when in record mode, acting as a metronome for the user, who is presumably playing an instrument. The user can look at the LED while playing to determine whether they are rushing or dragging. Surprisingly, this feature was pivotal to the design: if the user started to rush or drag, the notes would be interpreted with the wrong lengths or occur at the wrong times in the sheet music. The LED circuit is simple, with one pin to ground and one to the PIC via a 330Ω resistor.
The LED Circuit
The UART serial interface for this project involved connecting a computer to the PIC over USB in order to send data for MIDI file generation. On the computer side, the connection was as simple as plugging in the USB cable; the rest of the setup was done in software. On the PIC side, we connected 3 of the 4 wires from the serial-USB cable, avoiding the USB +5V red wire. We connected the UART receive pin (U2RX) to the green wire on the cable, the UART transmit pin (U2TX) to the white wire, and MCU ground to the black wire. The connector is shown in the following image.
The Adafruit USB to TTL Serial Cable
We decided to CAD and laser-cut a box to make our project more presentable. The CAD was completed using Autodesk Fusion 360, and the box was designed to expose only the TFT screen, the record switch, the LED, the microphone, and the keypad to the user. The dimensions were 22 cm x 15 cm x 10 cm. The box was laser-cut from scrap red acrylic in the Cornell Rapid Prototyping Lab (RPL).
CAD for the Box Design
Overall, the hardware implementation was fairly straightforward.
Future improvements could include increasing the gain on the amplifier.
Increasing the gain would allow people to play their instruments quietly, or further away from the box.
However, this would come at the cost of picking up undesired noise, such as people talking.
Additionally, when the box was laser-cut, the hole designated for the microphone was slightly too small. We eventually filed away some acrylic to allow the microphone to fit. If this box were to be printed in the future, the microphone hole size should be increased.
The code behind it all.
Our software consisted of a ProtoThreads C implementation running on the PIC32 and a Python script running on an external computer. In the C code, we had an FFT function implementation, which an FFT thread called; that thread found the peak frequency from the FFT and calculated the MIDI number and octave. A metronome thread toggled an LED at the user-input tempo, thanks to a variable yield time. A timer thread running every 1 ms updated the TFT screen when transitioning in and out of “record” mode, displayed the note currently being played, and sent serial note information to the external computer. Additionally, we included a keypad thread which read input from the keypad and handled some locking, in addition to other user inputs. The main C function set up all of these threads and displayed an intro screen welcoming the user to PICcompose. On the Python side, the serial data sent over from the PIC is converted to MIDI messages and saved to a file. Finally, a subprocess call is made to open MuseScore, where the recorded MIDI file can be displayed!
In order to interpret the notes in our audio data, we ran an FFT on the data coming into the ADC from the microphone circuit.
Our FFT code was adapted from sample code on the ECE 4760 website.
We used the same function and FFT thread, but changed the resolution of the FFT from a 512-point to a 1024-point FFT in order to more accurately distinguish between notes in our frequency range of interest.
Above middle C (C4), adjacent notes are separated by roughly 16 Hz or more, and in the main range of a flute the difference between notes is even larger.
With an 8kHz sampling rate, a 1024-point FFT gives us a resolution of about 8Hz, which allows us to tell apart the notes we care about.
To reduce the spectral artifacts caused by the abrupt start and stop of the FFT sample window, a ramp (windowing) function was used. Essentially, this ramp tapers the amplitude at the beginning and end of the FFT sample.
In the FFT thread, we added calculations to determine the highest-amplitude frequency and convert it into a MIDI note number. To do this, we simply added to a pre-existing loop that iterates through the FFT bins, comparing amplitudes and saving both the amplitude and frequency of any bin with a higher amplitude than the previously saved maximum. The frequency is calculated by multiplying the bin width (in our case, about 8 Hz) by the bin number. Due to the presence of some large low-frequency components, we ignore the first 15 bins, giving us a frequency detection threshold of 117 Hz. Once we know the strongest frequency present in the FFT, we convert the frequency to its MIDI number using the following equation, where m is the MIDI number and fm is the note frequency: m = 69 + 12·log2(fm / 440).
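A minimal Python sketch of this frequency-to-MIDI conversion (function names are illustrative):

```python
import math

# m = 69 + 12 * log2(f / 440), rounded to the nearest integer.
# MIDI 69 is A4 (440 Hz); each semitone adds 1 to the MIDI number.
def freq_to_midi(freq_hz):
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def midi_octave(m):
    """Octave of a MIDI number; MIDI 60 is C4, so octave = m // 12 - 1."""
    return m // 12 - 1
```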
This MIDI number is saved as the current MIDI note (the note from the most recent run of the FFT), and used later in the serial communication. We also calculate which octave the note is in from its MIDI number, which is used when we display the note name to the TFT screen.
An entire thread was dedicated to the metronome aspect of record mode.
The metronome thread was responsible for blinking the LED at the user-set tempo.
To implement this, the frequency of the metronome was changed every time the tempo was set.
During this thread, the LED would toggle. By using the bpm to determine the thread’s yield time,
we were able to coordinate the tempo to the frequency at which the thread runs and
therefore the frequency at which the LED toggles.
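The yield-time calculation can be sketched as follows. The assumption that the LED toggles twice per beat (once on, once off per blink) is ours; the C code may divide the period differently:

```python
# Sketch: derive the metronome thread's yield time from the BPM.
def yield_time_ms(bpm):
    beat_ms = 60_000 / bpm   # one quarter note in milliseconds
    return beat_ms / 2       # toggle the LED twice per beat (assumption)
```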
The timing thread was primarily responsible for sending the MIDI numbers, via serial communication, to the external computer. This thread runs every millisecond, and revolves around whether the system is in record mode. While in record mode, whenever a note starts or stops, this thread sends a serial message over UART to the listening Python script containing the note’s MIDI number and the millisecond time stamp. It does so by saving the previous note’s MIDI number as well as the current one and comparing these to each other and to the “zero” frequency, which is the frequency registered by the FFT when no note is being played. The UART messages are sent by spawning a separate serial transmit thread.
The thread increments the recording time each time the thread executes in record mode.
This thread also prints the name of the note being played to the TFT display,
both in record mode and out, and re-prints the whole TFT display when the system transitions into and out of record mode.
We used a Python script in order to generate the actual MIDI files by listening for serial inputs, primarily using a library called mido for MIDI parsing and the pySerial library to read in the serial stream.
The first thing the script does is initialize a MidiFile object, fill it with the default data (common between all MIDI files), and establish a serial connection by listening at a 38400 baud rate. On the computer end, that connection involved downloading the Prolific drivers for the UART cable, which are available for macOS. A few default global values are also set, such as a serial timeout length, whether or not the system is in record mode, and some tempo-related data.
From here, an infinite while loop was used to process the serial stream. When a command is sent over by the PIC,
the script splits the message using spaces as delimiters and saves it in a message array.
The loop then goes through a series of cases depending on what is contained in the message.
The five valid serial messages begin with “BPM”, “BEGIN”, “START”, “STOP”, and “END”. “BPM” was used to set the tempo, “BEGIN” and “END” were used to define the recording interval, and “START” and “STOP” were used for individual note length and value determination. If the message contains “BPM”, the next value in the array is assumed to be a tempo in beats per minute that the user input from the keypad. That value was saved as a global variable, which all note length calculations used, and also written to the MIDI file as a set_tempo meta message.
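A condensed Python sketch of this dispatch (the state dictionary and function name are illustrative, not the script's actual structure):

```python
# Sketch: dispatch on the five serial commands sent by the PIC.
def handle_message(line, state):
    parts = line.strip().split(" ")
    cmd = parts[0]
    if cmd == "BPM":
        state["bpm"] = int(parts[1])            # tempo from the keypad
    elif cmd == "BEGIN":
        state["recording"] = True               # enter record mode
    elif cmd in ("START", "STOP"):
        note, time_ms = int(parts[1]), int(parts[2])
        state["events"].append((cmd, note, time_ms))
    elif cmd == "END":
        state["recording"] = False              # save file after this
    return state
```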
If the script received a message with “BEGIN” in it, the system was put into record mode by setting the recording global variable to True.
The “START” and “STOP” messages performed similar functions by writing MIDI messages to the MidiFile object corresponding to when a note started and stopped.
In the split array, following the “START” or “STOP” keywords, two pieces of information were sent over by the PIC:
MIDI note and global time in milliseconds.
When writing out the command in MIDI bytes, four numbers are required (delta time, MIDI message type, note number, velocity).
We could directly use the note number, a default value for velocity, and the same MIDI message type (note_on) for all notes. However, we needed to perform some additional processing to get the required delta time value.
MIDI tracks record time in “ticks,” and the PPQ (ticks per quarter note) is stored as a default (480) for MIDI tracks created with mido.
Both “START” and “STOP” messages required a delta time (that’s how rests are encoded), so we could calculate them similarly
using the following formula:
tick_diff = (start_ms - last_ms) * (1 min / 60000 ms) * (BPM qn/min) * (PPQ ticks/qn)
Essentially, we found the difference in milliseconds between the current note and the last note, converted that time to minutes, and then used the BPM (qn/min) and PPQ (ticks/qn) to get the note length in ticks, and wrote that number to the MidiFile object as the delta time.
Unfortunately, this system turned out to be too accurate, and resulted in random rests
and note blips of sixteenth note value or less. To mitigate this, we reduced our “resolution” to quarter note lengths,
i.e. the shortest note length possible was a quarter note.
Messages with calculated delta time values of less than half a quarter note were ignored; everything else was “snapped” to a quarter note length using integer division and rounding.
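The delta-time math and snapping described above can be sketched as follows (names are illustrative):

```python
# Sketch: convert a millisecond gap to MIDI ticks, then snap to quarter notes.
PPQ = 480  # ticks per quarter note (mido's default)

def ms_to_ticks(start_ms, last_ms, bpm):
    minutes = (start_ms - last_ms) / 60_000
    return minutes * bpm * PPQ

def snap_to_quarter(ticks):
    """Ignore blips shorter than half a quarter note; round the rest."""
    if ticks < PPQ / 2:
        return 0
    return round(ticks / PPQ) * PPQ
```

For example, a 500 ms gap at 120 BPM is exactly one beat, i.e. 480 ticks.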
Apart from note length determination, some additional filtering (such as removing consecutive notes with an octave or
more of a difference) was done in the loop.
Finally, when the script received an “END” message, the recording global variable was set to False, and the MidiFile object was saved to a .mid file named “recording” plus the current date and time in ISO format.
The Python script then triggered a subprocess call that would open up the MuseScore application on the computer,
where a user could open up the created MIDI file.
Once the MuseScore application was closed, the loop could be restarted with the defaults reset,
in order to make a new recording.
The user interface involved a series of panels drawn using the functions from the TFT library.
The first pane is an intro screen only displayed once, when the system is first powered on.
The text is displayed using the tft printline() functions, and the eighth note was drawn using the TFT graphics drawing functions.
The following two panes are displayed to the user during the tempo input mode, and change based on button presses.
The first of the two is displayed while a user is inputting the tempo, and the latter is displayed when the # key is pressed to "lock" the tempo input. The user prompts at the bottom change as well.
tft_fillRoundRect() functions are used to update parts of the display, such as the note currently being played.
The final pane is switched to when the record switch on the front of the box is switched; switching it back returns the user to the tempo input screen.
The main differences from the previous two panes, besides the color, are the user prompts at the bottom and the size of the tempo/note being displayed.
Both modes allow you to see both pieces of information, but one is more important than the other depending on the current mode.
All four panes are shown below:
Tempo Set Screen
Tempo Locked Screen
The purpose of the keypad is to take user input to leave the start-up screen, and to take user input for setting the tempo.
A keypad thread was written to implement this functionality, and starts with a finite state machine that debounces key presses.
Keys are debounced to prevent a single key press from getting interpreted as multiple key presses.
After the keys are debounced, this thread interprets what key was pressed, and what action to take accordingly.
If the user is in tempo input mode and the tempo has not been entered (via the press of the # button),
this thread displays the key pressed on the TFT display in green.
This thread blocks users from entering tempos that are over three digits long.
If a user attempts to continue entering digits beyond three, these digits do not appear on the TFT display.
Once a user has entered their desired tempo, they press the # button. The tempo is then locked, and further key presses are ignored unless the * button is pressed. The * button allows users to clear and reset the tempo.
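As a generic illustration of such a debounce state machine (the states and transitions here are a common pattern, not necessarily the exact ones in our C code):

```python
# Sketch: press/release debounce FSM. A press is only confirmed after two
# consecutive scans read the key as down, filtering out switch bounce.
NO_PUSH, MAYBE_PUSH, PUSHED, MAYBE_RELEASE = range(4)

def debounce(state, raw_pressed):
    """Advance the FSM one scan; return (new_state, confirmed_press)."""
    if state == NO_PUSH:
        return (MAYBE_PUSH, False) if raw_pressed else (NO_PUSH, False)
    if state == MAYBE_PUSH:
        # A second scan with the key down confirms a real press.
        return (PUSHED, True) if raw_pressed else (NO_PUSH, False)
    if state == PUSHED:
        return (PUSHED, False) if raw_pressed else (MAYBE_RELEASE, False)
    # MAYBE_RELEASE: a second scan with the key up confirms the release.
    return (PUSHED, False) if raw_pressed else (NO_PUSH, False)
```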
The purpose of the switch was to switch between tempo input mode and record mode. When the switch is flipped from tempo input mode to record mode, the LED begins blinking and the Python script begins listening for messages. If a tempo has been set by the user, the recording tempo is that user-set tempo. If the user did not enter a tempo, the recording tempo defaults to 120 BPM.
When the switch is flipped from record mode to tempo input mode, the previous tempo is cleared. The user can then set a new tempo for the next recording.
As a whole, we ran into little trouble implementing our full software structure and thread organization. Although the FFT is computationally expensive, the complexity and computation time of our other threads were relatively low, which allowed everything to function smoothly from the user interface perspective, with no noticeable delays in the timer and metronome threads. We also did not run into any problems with running out of memory on the PIC32, despite the fact that increasing the FFT resolution used considerably more memory. In addition, the Python script interacted well with the UART serial interface. The most tedious part of the software was designing the user interface, as it required us to meticulously plan out where everything should be drawn on the screen.
The execution speed of both the note detection and the serial communication was fast enough that our project was not hindered. The only noticeable latency is between when a user plays a note and the first time that note is displayed; the FFT does take some time to execute, but users get feedback as soon as possible. On the notation end, fully real-time generation of sheet music was non-essential for our design; the annotated score pops up at the end for a user to evaluate. You can see our project in action here:
Our YouTube demo!
As can be seen in our demo video, we are able to notate simple melodies, such as Twinkle Twinkle Little Star, with 100% accuracy in both note detection as well as note lengths. Our note resolution in the flute’s octave range is also error-free for a short chromatic scale. Screenshots from our notated scores are shown below:
A Short Chromatic Scale
Twinkle Twinkle Little Star
These were recorded at two different tempos in our demo, with no change to accuracy.
We anticipate our notation accuracy will decrease with more complex music, but have not verified this yet.
However, since the score is editable, even if there are errors in it a user can go in and correct them using their notation software and save an updated copy.
But, more robust testing of the product will definitely be required, especially if we want to develop this further.
We enforced safety in the design by encapsulating our electronic components in a box, which prevents a user from poking around and potentially getting shocked by any of our circuitry. Our project did not cause or experience interference from other people’s designs.
Our final thoughts.
Our project met most of our expectations. We expected our project to take audio from a user and notate that audio, and we successfully demonstrated this.
The only expectation that we did not meet was the notation of notes shorter than a quarter note.
When we initially tested our project, we discovered that it was too accurate, in that it could pick up and notate artifacts caused by tonguing, vibrato, note decay, etc. We could see this affect our score: undesired 32nd and 64th notes would appear, notating our every inaccuracy.
To resolve this issue, we chose to decrease the note length resolution in the Python script.
A note would only be notated as a quarter note if that note were detected for over half the quarter length period,
as designated by the tempo. With this fix, our accuracy was immensely improved.
With that being said, we could further improve this project by modifying how we resolve our notes. Currently, our project only supports quarter-note-level resolution, so the shortest note we can notate is a quarter note. However, we can resolve quarter notes at very fast tempos: an eighth note at 60 BPM is the same length as a quarter note at 120 BPM, and we are able to resolve quarter notes at both tempos. Because of this, we believe that with some changes in how note resolution is calculated and further fine-tuning in the code, we could increase our scope to faster notes.
Another design improvement would be to implement a Bluetooth connection. While there was nothing inherently wrong with using a serial interface, a Bluetooth connection would provide the user with more convenience: the user would be able to pick up our project and place it in different locations without carrying a laptop with them.
One great aspect of our project is that it conforms to the universal MIDI standard. MIDI files may be opened in almost all available score-editing software. We opened our scores in MuseScore 3, but a user could choose to open their scores in other software such as Sibelius or Finale.
There are potentially publication or patent opportunities associated with this project. Automatic sheet music generators exist, but they are very few in number. Some popular implementations include AnthemScore, ScoreCloud, and Melody Scanner. However, our project is distinct from them in two major ways. First, since we are working with a PIC32 microcontroller, our project is a software-hardware hybrid implementation. Other sheet music generators are pure software implementations; because of that, those products support robust calculations, including some machine learning tactics. Ours only uses a software implementation of an FFT, making it much faster and executable in an embedded environment. Given that it includes hardware in its implementation, PICcompose only needs to be purchased once (if productized), while the other products are typically subscription based, making them financially inaccessible to some people. The second major difference is that our project is cross-platform compatible, thanks to the MIDI format we export. The other products typically have a sheet music editor built into their software; ours can be used directly with pre-existing notation software such as MuseScore and Finale, meaning that a composer could use our product without changing their workflow. These aspects of our project are relatively novel, and possibly patent-worthy with further development.
During the design, construction, and use of this project, we abided by the IEEE Code of Ethics. We also did not have any safety concerns or legal considerations throughout the process.
Overall, we loved this project and had a great time working on it together! Stay tuned for updates if we continue development work :)
The group approves this report for inclusion on the course website.
The group approves the video for inclusion on the course YouTube channel.
The following are schematics of the circuitry for each of our main hardware components and how they are connected to the microcontroller board.