Our project for ECE 5030 is to build a gesture recognition system based on EMG signals. We use three electrodes in total: one is the reference, and the other two (one positive, one negative) form the input channel. The differential EMG signal is sent into Matlab through a DAQ board. A k-nearest neighbor (KNN) algorithm sorts the signals into four classes: left, right, squeeze and turn. The classification result is then used to control a set of pictures: switching to the left picture, switching to the right picture, zooming out and rotating.
High Level Design
Rationale and Sources
Hand gesture recognition is a typical form of human-computer interaction (HCI), with applications ranging from medical rehabilitation to electronics control. There are many ways to perform gesture recognition, but most approaches fall into two branches: acceleration-based and electromyogram-based. Some research combines the two, such as Xu Zhang and Jihai Yang’s method, which uses an accelerometer to determine the hand orientation (for example palm-up or palm-down) and EMG signals for motion detection.
Accurate gesture recognition usually requires multiple channels whose signals are compared or differenced. However, we found that if the proper muscle is chosen, the EMG signal from a single channel can be very sensitive to certain gestures. In the paper by Jonghwa Kim, Stephan Mastnik and Elisabeth Andre describing their one-channel real-time hand gesture recognition system, all three electrodes are placed on the middle part of the forearm, parallel to the forearm muscle fibers: the first electrode near the wrist, the reference in the middle, and then the last one. We tried placing electrodes on different forearm muscles and finally decided to place the first two electrodes at the positions described in that paper but move the last one to the extensor carpi radialis longus muscle, whose location is described in detail in the Hardware part. This electrode placement not only gives substantial differences between the signals of different gestures, but also reduces noise.
Existing Commercial Products
We didn’t find any commercially used hand gesture recognition products based on EMG.
After extracting the gesture signals from the background signal, we feed them to a KNN algorithm. The k-nearest neighbor (KNN) algorithm is a non-parametric classification method. First we built a database: we sampled each gesture 60 times and discarded obviously erroneous samples, then calculated several features from the data and used them to build a high-dimensional feature space in which the signals of different gestures occupy different regions. When a new data point is fed into the space, the algorithm finds its k nearest neighbors and assigns the new point to the class that holds the majority among those k points.
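A minimal sketch of the k-nearest neighbor rule described above, written in Python rather than the Matlab used in the project. It assumes each gesture sample has already been reduced to a numeric feature vector and uses plain Euclidean distance; the names `knn_classify` and `train` are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=5):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs (the hypothetical database)
    query: feature vector of the newly extracted gesture
    """
    # Euclidean distance from the query to every stored sample, nearest first.
    by_distance = sorted(
        (math.dist(vector, query), label) for vector, label in train
    )
    # Count the labels of the k closest samples.
    votes = Counter(label for _, label in by_distance[:k])
    # Pick the label with the most votes; on a tie, max() keeps the label
    # that first appears among the nearest neighbors.
    return max(votes.items(), key=lambda item: item[1])[0]
```

Because the features mix units (seconds, volts, hertz), scaling each feature, for example to zero mean and unit variance, before computing distances is a common refinement; whether the original implementation did this is not stated.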
Logical Structure and Hardware/Software Tradeoffs
The structure of our project can be divided into two parts: the hardware part and the software part.
First, the EMG signal is sent to a differential amplifier, and the differential signal then passes through an amplifier, an offset stage, a high-pass filter and a low-pass filter.
Since the hardware removes noise of all kinds effectively, we skipped digital filtering entirely, directly extracted the valid gesture signals from the background and classified them with the KNN algorithm. We then used the classification result to process pictures.
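A minimal sketch of how a classification result could drive the picture viewer, following the mapping in the project summary (left and right switch pictures, squeeze zooms out, turn rotates). The `ViewerState` type, the label strings, the wrap-around at the ends of the picture set, the 0.5 zoom factor and the 90° rotation step are all hypothetical choices, not details from the project.

```python
from dataclasses import dataclass, replace

# Hypothetical step sizes; the project does not specify them.
ZOOM_OUT_FACTOR = 0.5
ROTATE_STEP_DEG = 90


@dataclass(frozen=True)
class ViewerState:
    index: int = 0      # which picture in the set is displayed
    zoom: float = 1.0   # display scale
    angle: int = 0      # rotation in degrees


def apply_gesture(state, label, n_pictures):
    """Return the new viewer state after one classified gesture."""
    if label == "left":     # show the picture to the left
        return replace(state, index=(state.index - 1) % n_pictures)
    if label == "right":    # show the picture to the right
        return replace(state, index=(state.index + 1) % n_pictures)
    if label == "squeeze":  # zoom out
        return replace(state, zoom=state.zoom * ZOOM_OUT_FACTOR)
    if label == "turn":     # rotate
        return replace(state, angle=(state.angle + ROTATE_STEP_DEG) % 360)
    raise ValueError(f"unknown gesture label: {label!r}")
```

Keeping the state immutable makes each gesture a pure transition, which is easy to test independently of the GUI.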
Figure 1. Block diagram of isolated EMG circuit
Figure 2. Position of extensor carpi radialis longus muscle
We built a circuit with three electrodes forming one channel that acquires the EMG signal from the extensor carpi radialis longus muscle (block diagram in Fig. 1, muscle position in Fig. 2). Our circuit and electrode connections are shown in Fig. 3.
Figure 3. Circuit and connection
The general procedure of gesture recognition is shown in Fig. 4.
Signal acquisition, DAQ board and GUI setup
We used a DAQ USB-4008 to acquire raw EMG data from our circuit. To capture every detail of the EMG signal, the DAQ board was set to its maximum sample rate of 10 kS/s. We acquired 5 seconds of data at a time, so the DAQ board was set to acquire 50 kS per trigger (10 kS/s × 5 s) with the ‘Single Ended’ input type.
The GUI setup is shown in Fig. 5. It displays the 5 s of raw EMG, the extracted gesture signal, the type of gesture and a picture. Start, stop and quit buttons are used to restart, stop and quit the program. A calibration text box shows whether the calibration procedure has finished: it turns green once calibration is complete.
Figure 5. GUI interface
We used a while loop to acquire data. As long as the quit button has not been clicked, the program stays in the loop, and all EMG processing takes place inside it.
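The acquisition loop can be sketched as follows. The three callbacks are hypothetical stand-ins for the DAQ read, the GUI quit button and the processing chain (extraction, classification, display); only the 10 kS/s rate and the 5 s block length come from the text.

```python
SAMPLE_RATE = 10_000   # samples per second (DAQ maximum, per the text)
BLOCK_SECONDS = 5      # seconds acquired per trigger
SAMPLES_PER_TRIGGER = SAMPLE_RATE * BLOCK_SECONDS  # 50,000 samples


def run_acquisition(acquire_block, quit_requested, process_block):
    """Acquire and process 5 s blocks until the user asks to quit.

    acquire_block(n) -> list of n samples   (hypothetical DAQ read)
    quit_requested() -> bool                (hypothetical quit-button poll)
    process_block(block)                    (hypothetical processing chain)
    Returns the number of blocks processed.
    """
    blocks = 0
    while not quit_requested():
        block = acquire_block(SAMPLES_PER_TRIGGER)
        process_block(block)
        blocks += 1
    return blocks
```

Passing the hardware and GUI in as callbacks keeps the loop testable without a DAQ board attached.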
Gesture signal extraction
Before gesture signal extraction, the mean of the 5 s EMG block was subtracted from the raw data to remove the DC offset. Two thresholds, calculated during a calibration procedure, were used to determine the beginning and the end of a gesture signal. During calibration the user must remain still, so that all of the acquired signal is background noise. The first threshold, used to detect the beginning of a gesture signal, was set to 10 times the root mean square (RMS) of the background. The second threshold, used to detect the end of a gesture signal, was set to 1.5 times the RMS of the background.
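A minimal sketch of the DC removal and calibration steps, assuming calibration uses one block recorded while the user is still. The names `remove_dc`, `rms` and `calibrate` are hypothetical; the factors 10 and 1.5 come from the text.

```python
import math


def remove_dc(block):
    """Subtract the block mean so the signal has zero DC offset."""
    mean = sum(block) / len(block)
    return [v - mean for v in block]


def rms(values):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def calibrate(background, onset_factor=10.0, offset_factor=1.5):
    """Return (onset threshold, offset threshold) from a rest recording."""
    noise_rms = rms(remove_dc(background))
    return onset_factor * noise_rms, offset_factor * noise_rms
```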
After the end of the previous gesture signal, the first data point whose value exceeded the first threshold was taken as the beginning of the next gesture signal. To detect the end of a gesture signal, we compared the RMS of the last 1000 data points with the second threshold; if this value fell below the second threshold twice in a row, the end of the gesture signal was marked. However, this made the extracted gesture signal much longer than it should be. To address this, in the following step we cut from the end of the extracted signal all consecutive data points whose values were smaller than the first threshold. If the end of a gesture signal could not be found within the 5 s acquired block, the gesture signal had been truncated by the 5 s acquisition procedure. In that case, the extracted portion was first joined with the next 5 s block, and the end of the signal was then detected in the new data sequence. The flow chart of the gesture signal extraction procedure is shown in Fig. 6.
Figure 6. Flow chart of gesture signal extraction procedure
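A minimal Python sketch of the extraction procedure, assuming the block has already had its mean removed and that "value larger than the threshold" refers to the sample magnitude. The two RMS checks are taken over consecutive, non-overlapping windows of `window` samples (1000 in the text); that windowing, and the names `find_gesture`, `t_on`, `t_off` and `trim_level`, are assumptions for illustration.

```python
import math


def rms(values):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def find_gesture(x, start, t_on, t_off, window=1000, trim_level=None):
    """Find one gesture in the DC-free block x, searching from index start.

    Returns (onset, end) with `end` exclusive, (onset, None) if the gesture
    is still active when the block ends (it must be joined with the next
    block), or None if no sample exceeds the onset threshold.
    """
    if trim_level is None:
        trim_level = t_on  # trailing samples below this level are cut off
    # Onset: first sample whose magnitude exceeds the onset threshold.
    onset = next((i for i in range(start, len(x)) if abs(x[i]) > t_on), None)
    if onset is None:
        return None
    below = 0
    pos = onset + window
    # Offset: window RMS must fall below t_off twice in a row.
    while pos <= len(x):
        if rms(x[pos - window:pos]) < t_off:
            below += 1
            if below == 2:
                end = pos
                # Remove the quiet "tail" left over from the window test.
                while end > onset + 1 and abs(x[end - 1]) < trim_level:
                    end -= 1
                return onset, end
        else:
            below = 0
        pos += window
    return onset, None
```

When `find_gesture` returns `(onset, None)`, the caller would keep `x[onset:]` and prepend it to the next 5 s block before searching again. The trim level is a parameter because the later discussion of signal extraction trims at 1.5 times the noise RMS rather than at the onset threshold.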
The choice of the two thresholds is the most critical part of this procedure. If the thresholds are too low, the end of the gesture signal may be detected too late, and high-voltage noise spikes may also be detected as gestures. If they are too high, the gesture signal may not be extracted completely.
Feature extraction and classification
In this project, we tested our system on only one person. Rather than training the system for each user by recording several samples of each gesture at the beginning of every session, we recorded 50 samples of each gesture from the same user only once. We then used the features extracted from each gesture to build the clusters used by the k-nearest neighbor algorithm. The features we chose to classify gestures are mean value, standard deviation, duration, largest peak value, largest peak location, number of peaks and maximum frequency. Everything related to the sampling process is implemented to run as fast as possible, since this is what limits our sampling rate.
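One way to compute the seven features listed above for an extracted segment is sketched below. The text names the features but not their exact definitions, so the choices here (population standard deviation, local maxima counted as peaks, locations in seconds, and "maximum frequency" taken as the strongest non-DC bin of a DFT) are assumptions, and `extract_features` is a hypothetical name.

```python
import cmath
import math


def extract_features(seg, fs=10_000):
    """Return the seven features named in the text for one gesture segment.

    Definitions are illustrative assumptions: population standard deviation,
    local maxima as "peaks", strongest non-DC DFT bin as "maximum frequency".
    """
    n = len(seg)
    mean = sum(seg) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seg) / n)
    duration = n / fs  # seconds
    peak_idx = max(range(n), key=lambda i: seg[i])
    peak_value = seg[peak_idx]
    peak_location = peak_idx / fs  # seconds from the gesture onset
    # Count local maxima: larger than the left neighbour, not smaller
    # than the right neighbour.
    n_peaks = sum(1 for i in range(1, n - 1) if seg[i - 1] < seg[i] >= seg[i + 1])
    # Dominant frequency from a naive O(n^2) DFT, skipping the DC bin.
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(seg[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    max_frequency = best_bin * fs / n  # hertz
    return [mean, std, duration, peak_value, peak_location, n_peaks, max_frequency]
```

The naive DFT keeps the example dependency-free but is slow for long segments; an FFT such as `numpy.fft.rfft` would normally replace it.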
The most dangerous aspect is that the DAQ board is connected to the ground of the whole building. To protect against anything unexpected happening on the building ground, we built an isolator to separate the test subject from that ground.
Signal Extraction and Noise
The onset of the signal is detected by a simple threshold at 10 times the RMS of the background noise. However, the offset of the signal cannot be detected the same way, because within a valid gesture the signal falls below that level and even crosses zero. We used essentially the same method as Jonghwa Kim’s group: when the RMS of the last 16 incoming values falls below a threshold twice in a row, the end of the gesture is marked. However, this can include a large portion of background noise that is not part of the valid gesture signal. Building on their method, we therefore added a step that scans the gesture signal backward from its end to find the point after which it stays below a specific threshold (1.5 times the RMS noise level in our case). This method greatly improved the accuracy of gesture signal extraction, as can be seen in the following figures:
Figure 7. Signal for turning left in Jonghwa Kim’s project
Figure 8. Signal for turning left in our project
As can be seen, our signal has almost no “tail” of background noise, whereas with Kim’s method the “tail” makes up more than half of the whole signal.
We added a ground reference electrode to reduce noise, and it works well. Figure 9 shows the noise when no gesture is performed:
Figure 9. Noise
As can be seen from Figures 8 and 9, the noise level is about 0.15 V while the signal is usually above 1 V, so the amplitude SNR is at least about 7 and typically around 10, which is clear enough for the subsequent analysis.
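The SNR estimate here is an amplitude ratio. A minimal sketch that also expresses it in decibels, assuming the quoted voltages are comparable amplitude measures (for example both peak or both RMS); `amplitude_snr` is a hypothetical helper.

```python
import math


def amplitude_snr(signal_amplitude, noise_amplitude):
    """Return the amplitude ratio and the same ratio in decibels."""
    ratio = signal_amplitude / noise_amplitude
    # 20*log10 (not 10*log10) because these are amplitudes, not powers.
    return ratio, 20 * math.log10(ratio)
```

With the 1 V and 0.15 V figures it gives a ratio of about 6.7 (about 16.5 dB); a 1.5 V signal would give a ratio of 10 (20 dB).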
Rate of Successful Detections
We performed each gesture 50 times and recorded the results, shown in Table 1:
As can be seen, most of the errors occur for the “Left” and “Right” gestures, especially “Left”. This is because the test subject’s gestures are hard to reproduce consistently: a gesture is sometimes performed with more or less strength, which interferes with the threshold used to extract the signal. Also, if the test subject moves suddenly, a sharp peak appears in the signal and the interface displays an error message. Such cases are not counted in the results above.
Characteristics of Different Gestures
We chose these four gestures for recognition because they differ in many features. We calculated the mean value, standard deviation, duration, largest peak value, largest peak location, number of peaks and maximum frequency of every gesture. The data are shown in the following tables:
For turn left:
For turn right:
The relative standard deviation of each feature ranges roughly from 5% to 35%, so the chosen features are stable and reliable enough for recognition.
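The relative standard deviation (RSD) quoted here is the standard deviation of a feature across the samples of one gesture divided by its mean. A minimal sketch, assuming the sample (n - 1) standard deviation and one RSD per feature; `relative_std` and `feature_rsd` are hypothetical names.

```python
import statistics


def relative_std(values):
    """Relative standard deviation: sample standard deviation / |mean|."""
    return statistics.stdev(values) / abs(statistics.mean(values))


def feature_rsd(samples):
    """samples: list of feature vectors for one gesture.

    Returns one RSD per feature (one per column of the sample matrix).
    """
    return [relative_std(column) for column in zip(*samples)]
```

Multiplying each value by 100 gives the percentages used in the text.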
To determine which features play the most important role in distinguishing gestures, we calculated the differences in each feature between gesture groups and obtained the relative deviation of each feature. The results are shown in the following table and figure:
Figure 10. Relative deviation of different index for each group
The peak of each line in Figure 10 therefore indicates the most important features. For the “Left” group, the most important feature is the number of peaks; for the “Right” group, the mean value, STD and highest peak value; for the “Squeeze” group, the number of data points (duration), highest peak location and maximum frequency; and for the “Rotate” group, the number of data points (duration), highest peak location and maximum frequency.
The recognition has a success rate of 94%, with most false detections occurring for the turn-left gesture. We collect data in 5-second blocks, so a signal may be split across two blocks; in that case we store the truncated part and combine it with the next block. The drawbacks are: 1. If there is a sudden peak in the noise, the program mistakes it for a gesture signal because it passes the onset threshold; this could be overcome by discarding signals shorter than a certain length. 2. We used only one EMG channel, so if the test subject changes how he performs the gestures (too hard or too gently, for example), the program will not work properly. This could be solved by adding more channels and using the differences between channels to recognize the gestures.
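The first remedy suggested above, rejecting candidate gestures that are too short to be real, can be sketched as a filter applied after extraction. Segments are assumed to be (start, end) sample indices with `end` exclusive, and the 0.1 s minimum duration is a hypothetical placeholder.

```python
def drop_short_segments(segments, fs=10_000, min_duration=0.1):
    """Discard (start, end) segments shorter than min_duration seconds.

    min_duration is a hypothetical value; it would need tuning so that real
    gestures pass while isolated noise spikes are rejected.
    """
    min_samples = round(min_duration * fs)
    return [(start, end) for start, end in segments if end - start >= min_samples]
```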
Intellectual property considerations
We followed and modified the method of reference 1, “EMG-based Hand Gesture Recognition for Realtime Biosignal Interfacing.” All of the code was written by ourselves, so there should be no IP concerns regarding the code.
1. Jonghwa Kim, Stephan Mastnik, Elisabeth Andre, “EMG-based Hand Gesture Recognition for Realtime Biosignal Interfacing,” IUI ’08: Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 30-39.
3. Tanyang Zhang's Work: Reduce noise in hardware, KNN code programming; Jingyuan Dong's Work: Build the circuit, interface design.
We really want to thank Bruce Land for his guidance and the cheerful spirit he brought to us. Sincere thanks also to our dear TA and classmates for their support and interest in our project.