ECE 4760 Final Project A Small Smart Voice Decoder System

ECE 4760 Spring 2011
A Smart Voice Decoder System for Vowels

By: Annie (Wei) Dai (wd65@cornell.edu) and Youchun Zhang (yz526@cornell.edu)

Conclusion

Expectation

Overall, our final project achieved all of the goals we defined in our project proposal. Our speech recognition system is able to accurately identify the vowel the user has said. We extended this implementation to simulate a security system in which the user must say the vowels in a particular sequence to decode a secret message.
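The sequence check behind such a security system can be pictured as a small state machine. The sketch below is only an illustration in Python, not the code that runs on our MCU; the class name `VowelLock`, the method `feed`, and the example secret sequence are hypothetical names chosen for this example.

```python
class VowelLock:
    """Tracks progress through a secret vowel sequence (hypothetical helper)."""

    def __init__(self, secret):
        if not secret:
            raise ValueError("secret sequence must not be empty")
        self.secret = list(secret)
        self.position = 0  # index of the next vowel we expect

    def feed(self, vowel):
        """Consume one recognized vowel; return True when the full sequence matched."""
        if vowel == self.secret[self.position]:
            self.position += 1
        else:
            # A wrong vowel restarts the attempt. If it happens to equal the
            # first vowel of the secret, count it as a fresh first step.
            self.position = 1 if vowel == self.secret[0] else 0
        if self.position == len(self.secret):
            self.position = 0  # re-arm the lock for the next attempt
            return True
        return False


# Example secret, assumed for illustration only.
lock = VowelLock(["ah", "ee", "oh"])
unlocked = [lock.feed(v) for v in ["ah", "ee", "oh"]]
```

Each call to `feed` takes the label produced by the vowel recognizer, so the lock only opens when the vowels arrive in the expected order.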

Future Improvements

There are two improvements that can be made in the future. The first concerns the precision of the current system. Although our system recognizes five vowels clearly and works for most users, we still need more experiments to identify the characteristics of each vowel and narrow the search for peaks. Furthermore, we may develop characteristics for other vowels and even for non-vowel sounds. This also requires more research on vowel classification.

The second improvement would be recognition of more complicated words, building on the completed vowel identification. We are already able to tell apart five basic vowels, and, interestingly, most words can be pronounced and classified as a sequence of basic vowels. For instance, the waveform of "yes" is similar to [ae] and [ee], while "no" can be told apart simply by [oh]. Likewise, in a word like "starbucks", the [ah] sound is obvious. The difficulty with word recognition, even when based only on a sequence of vowels, is that our current method relies on a cumulative probability that does not strictly correspond to time. As a result, the algorithm we implemented for decoding would no longer be useful. We may need to compute a separate cumulative probability for each of the words that sound alike and report the most probable one as the result. This may pose a significant challenge for accuracy.
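One possible way to keep the time order that a cumulative tally loses is sketched below in Python. This is an assumption about how a future version might work, not something we implemented: the function names, the `min_run` threshold, and the word templates (taken from the "yes" and "no" examples above) are all hypothetical.

```python
from itertools import groupby

# Hypothetical word templates built from the examples in the text.
WORD_TEMPLATES = {
    "yes": ["ae", "ee"],
    "no": ["oh"],
}


def collapse_frames(frame_labels, min_run=2):
    """Turn per-frame vowel labels into an ordered vowel sequence.

    Runs shorter than min_run frames are treated as noise and dropped
    (the threshold is an assumed value, not a measured one).
    """
    sequence = []
    for label, run in groupby(frame_labels):
        long_enough = len(list(run)) >= min_run
        if long_enough and (not sequence or sequence[-1] != label):
            sequence.append(label)
    return sequence


def match_word(frame_labels, templates=WORD_TEMPLATES):
    """Return the word whose template equals the observed sequence, else None."""
    sequence = collapse_frames(frame_labels)
    for word, template in templates.items():
        if sequence == template:
            return word
    return None


# A single stray "oh" frame between the two vowels is ignored.
guess = match_word(["ae", "ae", "ae", "oh", "ee", "ee"])
```

Exact template matching is the simplest choice here. Scoring every candidate word and keeping the most probable one, as described above, would be the natural next step for words that sound alike.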

Ethical Considerations

We have done our best to conform to the IEEE Code of Ethics in the design and execution of our project. The FWT algorithm we used in our design was written by Bruce Land. The button state machines we used were modified from code in previous lab exercises. We know of no systems that implement vowel recognition on a similar MCU. In fact, most of the speech recognition systems available today require far more computation power than the mega644 provides and are able to analyze much more complex voice inputs.

We have been honest in reporting the results of our system, and our summary results are as accurate as the precision and real-time computation capability of the MCU allow.

This system can be used to implement a security system with speech recognition. This could be more convenient than a keypad security system for people with impaired vision. Furthermore, with more computation power, our system could recognize individual voices and much more complex voice inputs.

Legal Considerations

Our project is a simple audio input and addressing device. It does not cause any interference with other devices and does not violate any regulations.