Sight Via Sound

Michael Guan (mhg99), Jake Loynd (jrl346), Alex Koenigsberger (ak826)

Final Project for ECE 4760

Introduction

In this project, we investigated the possibility of creating a sight-via-sound machine. The machine would first read an image and then create tones based on the respective pixel values, enabling people to quite literally hear pictures. We chose this project because we all had an interest in both digital displays and audio synthesis.

High-Level Design

This project was inspired by a video on the YouTube channel 3Blue1Brown in which Grant Sanderson demonstrates the applicability of space-filling Hilbert curves to sight-via-sound. Feeling that it was a great intersection of our combined interests in fractals, sound synthesis, and graphics, we decided to implement this system using a PIC32 microcontroller.

We use a space-filling Hilbert curve, pictured above, to map each pixel in Cartesian space to a point in frequency space, so that an image corresponds to a unique sound. For instance, an early pixel along the curve maps to a low frequency and the final point on the curve maps to the highest frequency. We use a Hilbert curve (as opposed to a "snake" scheme) because, as the resolution of the image increases, a given location on the screen keeps mapping to approximately the same frequency. The intensity of a pixel governs the volume of its frequency. Our project so far supports only black-and-white images up to 8x8 resolution, so all white pixels' frequencies play at equal volume. There is potential for extensibility, which is discussed later.
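
Concretely, looking up the tone for a pixel is just a Hilbert-index table lookup; a minimal sketch using the xy2d and f_LUT names from the code listings in the appendix:

    int idx  = xy2d(HILBERT_ORDER, NPixels, x, y);  // Hilbert index of pixel (x, y)
    int freq = f_LUT[idx];                          // that pixel's frequency in Hz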

The frequencies are mapped to pixel locations by their Hilbert index using a lookup table. We based the frequencies of the lookup table on the half steps of a piano, with adjacent notes having a frequency ratio of 2^(1/12) (the ratio between octaves is two and there are 12 half steps between octaves). Similarly, the ratio between adjacent frequencies in the lookup table is (max_freq / min_freq)^(1/(# of pixels - 1)). Our minimum frequency was 100 Hz and our maximum frequency was 4000 Hz.
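
For example, on our 8x8 display (64 pixels) the ratio works out to freq_step = (4000 / 100)^(1/63) ≈ 1.0603, just slightly wider than a piano half step (2^(1/12) ≈ 1.0595).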

To play the sound, we use Direct Digital Synthesis (DDS), one oscillator per pixel, with a lookup table (DDS_incr) storing the wavetable index increment for each pixel's frequency. This table is updated whenever a pixel is drawn or cleared.
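
The increments come from a freq_to_DDS_incr helper used in the appendix listings; a minimal sketch of what it could look like, assuming 32-bit phase accumulators, a 256-entry sine table (consistent with DDS_accum[i] >> 24 in the ISR), and the 10 kHz sample rate reported in Results:

    // Sketch (constants assumed): convert a frequency in Hz to a 32-bit DDS
    // phase increment; the accumulator's top 8 bits index the 256-entry sine table.
    #define SAMPLE_RATE 10000.0f   // Hz, audio sample rate
    unsigned int freq_to_DDS_incr(float freq){
        return (unsigned int)(freq * 4294967296.0f / SAMPLE_RATE);
    }

At this sample rate, our 4000 Hz maximum frequency sits safely below the 5 kHz Nyquist limit.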

The video below is a demonstration of how the display is traversed by a Hilbert curve and the frequencies each position maps to.

References and Other Sight-via-Sound Applications

To aid our mapping of Hilbert index to Cartesian coordinates and vice versa, we adapted a library from FSU (link). We modified it so that the curve starts at the top-left corner of the display and ends at the top-right corner for all orders of the curve, which was not the case in the original code.

We also found an article about a system from 1992 called vOICe (link to sciencemag article). This one, impressively, shows a demonstration of a visually impaired man identifying faces using the system. vOICe works differently: it repeatedly sweeps across the image, using each column as a frequency axis, so it differs from the Hilbert approach in that an image takes time to manifest as sound rather than playing as one constant combination of tones.

On the 3Blue1Brown subreddit, of all places, we found an implementation of sight-via-sound as described in the video. Unlike our project, this one is written in Python and imports an image to generate a sound; it does not update in real time or include a display, so we did not reference this implementation while developing ours.

Applications

This kind of system can be of practical use to those with visual impairments. If the ear is finely trained to the frequency-to-location mapping, it is possible to decode the sound back into an image.

Hardware

For this lab we used only the preset hardware setup: a PIC32MX microcontroller with a TFT display attached, a digital-to-analog converter (DAC) on an SPI channel, and a serial connection to the lab computer. The PIC32 served as the central processor: it tracked the image displayed on the screen and output the corresponding audio signal to the DAC via direct digital synthesis. The PIC also maintained the serial connection to the lab computer so that it could interact with the Python GUI running there, which controlled the images displayed on the TFT. The DAC took in the signal generated by the PIC, converted it to analog form, and passed it to the lab computer to be played on the speakers. As the setup was entirely preset, no additional hardware was designed or assembled.

Software

Our software consists of an interactive Python GUI and C code that runs on the PIC32.

Python GUI

We created a Python GUI that allowed us to control our Sight-Via-Sound project. The GUI was organized into six main demo routines. The first colored the pixels by traversing the Hilbert curve coordinates in order, either drawing each pixel and playing its associated note one at a time or letting the pixels persist on the screen. The next two demos drew a smiley face and a frowny face on the display, demonstrating the difference between two very similar images. Next, we had sliders to control vertical and horizontal sweeps. Finally, we had a customizable routine that allowed us to draw, delete, and move rectangles and individual pixels anywhere on the TFT display.

Hilbert Library

This library provided several very useful functions for converting back and forth between one-dimensional Hilbert curve indices and Cartesian coordinates. We modified it slightly so that the Hilbert curve is always traversed in a consistent counterclockwise direction.

Downsampled Display Library

We created a custom library that makes it easier to draw downsampled pixels to the display: it takes downsampled coordinate values and uses the tft_fillRect function to draw each enlarged, rectangular "pixel". The library provides methods to draw both a single pixel and a rectangle in downsampled display coordinates, and it updates the DDS increment table for every pixel it touches.
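
The coordinate conversions themselves are simple scalings; a sketch of what the xds_to_x and yds_to_y helpers in the listing could look like, assuming XMAX/YMAX are the largest TFT coordinates and X_ds/Y_ds are the downsampled dimensions (8 each in our case):

    // Sketch (definitions assumed): top-left TFT coordinate of a downsampled "pixel"
    int xds_to_x(int xds){ return xds * (XMAX + 1) / X_ds; }
    int yds_to_y(int yds){ return yds * (YMAX + 1) / Y_ds; }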

ISR for DDS

This interrupt service routine populates the DAC_data variable using Direct Digital Synthesis: it uses each pixel's DDS unit to index into the sine table and sums all the corresponding frequency components. Here we also normalize the DAC data and keep it within the proper range by clamping to maximum and minimum DAC values.
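
For scale: with all 64 tones at full amplitude (each sine-table sample being up to ±2047), the raw sum can reach ±131,008 (64 × 2047), so a SHIFT_AMT of 6 would guarantee the ±2047 DAC range; a smaller shift trades that guarantee for loudness on sparser images, with the final clamp catching any overflow.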

Buttons, Toggle, Slider, Radio Threads

These threads process input changes from the buttons, toggles, sliders, and radio buttons in the Python GUI.

String Input Thread

This thread processes string inputs from the Python GUI; we modified it to read text inputs that trigger the demo routines. If this thread receives a string starting with the character 'd', it parses the following value and triggers the corresponding demo.
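
A minimal sketch of that check, assuming the course protothreads framework's serial receive buffer (PT_term_buffer) and a hypothetical demo_id variable:

    // Sketch (buffer and variable names assumed): "d3" selects demo routine 3
    if (PT_term_buffer[0] == 'd'){
        demo_id = atoi(&PT_term_buffer[1]);  // parse the value after 'd'
    }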

Demo Trigger Thread

This thread responds to the demo routine invoked from the string input thread and dispatches to the six different routines that we have. It also sets a signal flag to tell the drawing thread what to draw when a routine runs continuously. The draw signals include the Hilbert draw routine, the horizontal and vertical sweeps, and the interactive custom draw mode; a sketch of the signal values follows.
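
The names below match the drawing-thread listing in the appendix; the specific values are our assumption, with NONE first so that draw_signal > NONE means there is something to draw:

    // Sketch (values assumed): flags shared between the trigger and draw threads
    enum { NONE = 0, HILBERT, HORIZ_SWEEP, VERT_SWEEP, INTERACTIVE };
    static int draw_signal = NONE;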

Continuous Drawing Thread

This thread provides functionality for continuously drawing pixels by erasing previous pixels and drawing new ones. Depending on the draw signal set by the demo trigger thread, this thread performs different actions. In the Hilbert traversal, it draws pixels in linear Hilbert index order. In the horizontal and vertical sweeps, a pixel continuously moves back and forth horizontally or vertically. In the interactive mode, shapes can be moved directly based on which directional button was pressed in the Python GUI.

Main

In addition to setting up the interrupts and scheduling all the threads, main populates the wave table for the DDS and the frequency lookup table indexed by Hilbert coordinate.

Results

When testing the machine, we were able to render and synthesize an image of 8x8 resolution while maintaining a reasonable (10 kHz) audio sample rate. In operation, the machine successfully output the multiple tones that "made up" an image while maintaining a distinct sound for each different image. To test how well it worked as a sight-via-sound device, we conducted multiple trials in which Alex would close his eyes and try to guess what the image was. These included differentiating a smile from a frown, moving a box around the screen and guessing its size and location, and removing part of a box and guessing which part was removed. Ultimately, he was able to identify the images with decent accuracy despite only limited, informal ear training. His background as a musician probably helped with learning, but we do not think this is necessary to learn this kind of mapping.

Hilbert Demo
Blind Test

Conclusion

While our device is quite limited in scale, there is a lot of room for improvement. Beyond increasing the resolution of the images that can be processed, there are additional features the machine could include to enhance the user's experience. For example, rather than restricting images to strictly black or white pixels, grayscale images could be supported by mapping the intensity of a pixel to the volume of its tone. Color is another potential aspect to include; however, this may be difficult, as one would need to find additional dimensions in which to express that information. Another point of improvement would be stereo vision: two separate images, left and right, could be processed and played in the corresponding ear, potentially giving the user a sense of depth perception. In summary, there are many great possibilities for extending this project. Overall, this proved to be a very interesting project. We all learned a lot about various screen-mapping techniques and were able to produce a clean-sounding set of tones via direct digital synthesis.
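
As a sketch of the grayscale extension mentioned above, assuming a hypothetical amp[] table of 8-bit pixel intensities, the summation line in the DDS ISR could scale each tone's contribution:

    // Sketch (amp[] is hypothetical): scale each oscillator by its pixel's brightness
    DAC_data += (amp[i] * sin_table[DDS_accum[i] >> 24]) >> 8;   // amp[i] in 0..255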

All of the engineering fun aside, this type of machine can be used to do good and improve accessibility for those with visual disabilities. As shown in the vOICe article, it is possible to learn a sight-via-sound system to surprisingly good accuracy, enabling people to “see” their surroundings with sounds. We wonder if, with more compute power and additional hardware, we could stream in higher-resolution images from a camera and play their sound in real time. Given the potentially charitable nature of this project, we do not see any dangers or other ethical concerns per the IEEE Code of Ethics.

Class Demo

Appendix

A: Permissions

The group approves this report for inclusion on the course website.

The group approves the video for inclusion on the course YouTube channel.

B: Commented Code Listing


Downsampled Display

// ===== Draws a down-sampled "Pixel" ===================
void draw_ds_pixel(int x, int y, unsigned short color){
    int idx;
    // Fills in the "pixel"
    tft_fillRect((short) xds_to_x(x), (short) yds_to_y(y), 
            (short) (XMAX+1)/X_ds, (short) (YMAX+1)/Y_ds, color);
    // Do the DDS incr conversion
    idx = xy2d(HILBERT_ORDER, NPixels, x, y);
    DDS_incr[idx] = (color == 0) ? 0 : freq_to_DDS_incr(f_LUT[idx]);
}

// ===== Draws a rectangle in downsampled coordinates ===================
void fillRect_ds(int x, int y, int w, int h, unsigned short color){
    // Fills in the rectangle
    tft_fillRect((short) xds_to_x(x), (short) yds_to_y(y), 
            (short) w*(XMAX+1)/X_ds, (short) h*(YMAX+1)/Y_ds, color);
    // Update pixel info lookup tables
    int i, j, idx;
    for (i = x; i < x+w; i++){
        for (j = y; j < y+h; j++){
            idx = xy2d(HILBERT_ORDER, NPixels, i, j);
            DDS_incr[idx] = (color == 0) ? 0 : freq_to_DDS_incr(f_LUT[idx]);
        }
    }
}

Hilbert Library

//Converts a Hilbert curve index to Cartesian coordinates
void d2xy(int o, int n, int d, int* x, int* y) {
  *x = *y = 0;
  int rx, ry, s;
  int t = d;
  int temp;
  for(s = 1; s < n; s = s*2){
      rx = 1 & ( t / 2 );
      ry = 1 & ( t ^ rx );
      rot ( s, x, y, rx, ry );
      *x = *x + s * rx;
      *y = *y + s * ry;
      t = t / 4;
  }
  // Swap x and y coordinates at certain orders to preserve start and end points
  if (o % 2){
      temp = *x;
      *x = *y;
      *y =  temp;
  }
  return;
}


//Converts Cartesian coordinates to a Hilbert curve index
int xy2d(int o, int n, int x, int y){
    int rx, ry, s, temp;
    int d = 0;
    // Flip first if necessary so it is oriented properly
    if (o % 2){
        temp = x;
        x = y;
        y =  temp;
    }
    for (s = n/2; s > 0; s = s/2){
        rx = ( x & s ) > 0;
        ry = ( y & s ) > 0;
        d = d + s * s * ( ( 3 * rx ) ^ ry );
        rot ( s, &x, &y, rx, ry );
    }
    return d;
}


//Rotates/reflects coordinates within a quadrant
void rot(int n, int *x, int *y, int rx, int ry){
  int t;
  if ( ry == 0 ){
    //Reflect.
    if ( rx == 1 ){
      *x = n - 1 - *x;
      *y = n - 1 - *y;
    }
    //Flip.
    t = *x;
    *x = *y;
    *y =  t;
  }
  return;
}

Direct Digital Synthesis

void __ISR(_TIMER_2_VECTOR, ipl2) Timer2Handler(void)
{
    // you MUST clear the ISR flag
    mT2ClearIntFlag();
    
    int i;
    
    DAC_data = 0;
    // Use all the DDS units to index into wavetable and sum all frequency components
    for(i = 0; i < NPixels; i++){
        DDS_accum[i] += DDS_incr[i];
        DAC_data += sin_table[DDS_accum[i]>>24];
    }
    
    // Normalize the DAC data to its range
    DAC_data = DAC_data >> SHIFT_AMT;
    // Just as additional protection, prevent from exceeding DAC range
    if(DAC_data > 2047){
        DAC_data = 2047;
    }
    if(DAC_data < -2047){
        DAC_data = -2047;
    }
        
    // === DAC Channel A =============
    mPORTBClearBits(BIT_4);   // CS low to start transaction
    // write to spi2
    WriteSPI2( DAC_config_chan_A | ((DAC_data + 2048) & 0xfff));
    while (SPI2STATbits.SPIBUSY) WAIT; // wait for end of transaction
    mPORTBSetBits(BIT_4);     // CS high to end transaction
    
   // Read Timer 2 to check ISR cycle count
    t2 = ReadTimer2();
}

Continuous Drawing Thread

// === Continuous Drawing Thread  =============================================
// Handles any demo routines that need to erase and redraw autonomously
static PT_THREAD (protothread_draw(struct pt *pt))
{
    PT_BEGIN(pt);
    // Hilbert variables
    static int d;
    int xds, yds;
    // to store previous and current coordinates
    static int xprev, yprev, x, y;
    // Sweep variables
    static int sweepdir;
    // Captures interactive box parameters
    static int w, h;
    while(1){
        PT_YIELD_UNTIL(pt, draw_signal > NONE);
        // HILBERT TRAVERSAL DEMO
        if (draw_signal == HILBERT){
            // Traverse in hilbert index order
            for(d = 0; d < NPixels; d++){
                // go from hilbert index to cartesian
                xds = 0;
                yds = 0;
                d2xy(HILBERT_ORDER, NPixels, d, &xds, &yds);
                // Erase previous pixel if in blink mode
                if(hil_blink){
                    draw_ds_pixel(xprev, yprev, ILI9340_BLACK);
                }
                // Fill in pixel at calculated index
                draw_ds_pixel(xds, yds, ILI9340_WHITE);
                xprev = xds;
                yprev = yds;
                PT_YIELD_TIME_msec(250);
            }  // for each pixel
            draw_signal = 0;
        }// IF HILBERT
        
        // HORIZONTAL SWEEP DEMO
        else if (draw_signal == HORIZ_SWEEP){
            // Reset variables
            xprev = yprev = x = y = 0;
            sweepdir = 1;
            while(draw_signal == HORIZ_SWEEP){
                // clear previous pixel
                draw_ds_pixel(xprev, yprev, ILI9340_BLACK);
                // find and draw next pixel
                x += sweepdir;
                draw_ds_pixel(x, hsweep_y, ILI9340_WHITE);
                xprev = x;
                yprev = hsweep_y;
                // switch directions at the end
                sweepdir = (x == 0 || x == 7) ? -sweepdir : sweepdir;
                PT_YIELD_TIME_msec(200);
            }
        }// HORIZ SWEEP
        
        // VERTICAL SWEEP DEMO
        else if (draw_signal == VERT_SWEEP){
            // Reset variables
            xprev = yprev = x = y = 0;
            sweepdir = 1;
            // works exactly like horizontal sweep
            while(draw_signal == VERT_SWEEP){        
                draw_ds_pixel(xprev, yprev, ILI9340_BLACK);
                y += sweepdir;
                draw_ds_pixel(vsweep_x, y, ILI9340_WHITE);
                xprev = vsweep_x;
                yprev = y;
                sweepdir = (y == 0 || y == 7) ? -sweepdir : sweepdir;
                PT_YIELD_TIME_msec(200);
            }
        }// VERT SWEEP
        
        // INTERACTIVE MODE
        else if (draw_signal == INTERACTIVE){
            // Capture the box parameters
            x = int_pos_x;
            y = int_pos_y;
            w = int_w;
            h = int_h;
            while(draw_signal == INTERACTIVE){
                PT_YIELD_UNTIL(pt, shape_move > NONE || draw_signal != INTERACTIVE);
                // Choose what edge to erase and draw opposite side
                if(shape_move == UP){
                    fillRect_ds(x, y + h-1, w, 1, ILI9340_BLACK);
                    y -= 1;
                    fillRect_ds(x, y, w, 1, ILI9340_WHITE);
                }
                if(shape_move == DOWN){
                    fillRect_ds(x, y, w, 1, ILI9340_BLACK);
                    y += 1;
                    fillRect_ds(x, y + h-1, w, 1, ILI9340_WHITE);
                }
                if (shape_move == LEFT){
                    fillRect_ds(x + w-1, y, 1, h, ILI9340_BLACK);
                    x -= 1;
                    fillRect_ds(x, y, 1, h, ILI9340_WHITE);
                }
                if (shape_move == RIGHT){
                    fillRect_ds(x, y, 1, h, ILI9340_BLACK);
                    x += 1;
                    fillRect_ds(x + w-1, y, 1, h, ILI9340_WHITE);
                }
                shape_move = NONE;
            }
        } // INTERACTIVE
    } // END WHILE(1)   
    PT_END(pt);  
}

Main: Lookup Table Generation

  // === build the sine lookup table =======
   // scaled to produce values between -2047 and 2047
   int ii;
   for (ii = 0; ii < sine_table_size; ii++){
         sin_table[ii] = (int)(2047*sin((float)ii*6.283/(float)sine_table_size));
    }
 
    // === build the frequency lookup table =======  
   f_LUT[0] = base_freq;
   // Calculate the ratio between adjacent frequencies, exponential like a piano
   freq_step = pow((max_freq / base_freq), (float)1/(float)(NPixels - 1));
   // Populate frequency lookup table
   for (ii = 1; ii < NPixels; ii++){
       f_LUT[ii] = (int)( (_Accum)(f_LUT[ii-1]) * freq_step );
   }
  

C: References

3Blue1Brown - "Hilbert's Curve: Is infinite math useful?"

vOICe on sciencemag

Hilbert Library from FSU

Python implementation from 3Blue1Brown Reddit