Cornell University ECE4760
Fixed Point Arithmetic



Fixed point arithmetic is faster than floating point, and often has enough dynamic range and accuracy for animation and DSP.

  1. Fixed point arithmetic performance
    -- Fixed point arithmetic is the first step to building DSP functions. I decided to implement 2.30 and 2.14 formats. This means two bits to the left of the binary point, one of which is the sign bit. The dynamic range of the systems is either -2 to 2-2^-14 or -2 to 2-2^-30. The resolution is either 2^-14 = 6*10^-5 or 2^-30 = 9*10^-10. This resolution is necessary to make stable, accurate filters. The dynamic range is sufficient for Butterworth IIR filters made with second order sections (SOS). SOS help to minimize filter roundoff errors. This program defines the data types and the macros for float-to-fix conversion, fix-to-float conversion, and fixed point multiply. Add and subtract just work with the ordinary integer operators. The program uses timer2 to count cycles to profile the time for the add and multiply operations, then uses the UART (see section below) to print the results. The 2.30 format takes 40 cycles to do a multiply-and-accumulate (MAC) operation. The 2.14 format takes 17 cycles for a MAC operation (level 0 opt). The 2.14 result (1.5*0.05-0.25) is in error by 4*10^-5; the 2.30 result is correct to 8 places. The macros for the 2.30 format follow:
    typedef signed int fix32 ;
    #define multfix32(a,b) ((fix32)(((( signed long long)(a))*(( signed long long)(b)))>>30)) //multiply two fixed 2:30
    #define float2fix32(a) ((fix32)((a)*1073741824.0)) // 2^30
    #define fix2float32(a) ((float)(a)/1073741824.0) 
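    As a rough illustration of the MAC test described above, the sketch below evaluates 1.5*0.05-0.25 in both formats and measures the error against double precision. The 2.30 macros are copied from the text. The 2.14 macros are not shown in the text, so fix14, multfix14 and float2fix14 are assumed analogs (16-bit storage, 32-bit product), and the helper functions are hypothetical names. The comparison converts through double rather than fix2float32, since a float holds only 24 significant bits and cannot show all 30 fractional bits.

```c
#include <assert.h>
#include <stdio.h>

/* 2.30 macros, copied from the text above. */
typedef signed int fix32;
#define multfix32(a,b) ((fix32)((((signed long long)(a))*((signed long long)(b)))>>30))
#define float2fix32(a) ((fix32)((a)*1073741824.0)) /* 2^30 */

/* ASSUMED 2.14 analog (not given in the text): 16-bit storage, 32-bit product. */
typedef signed short fix14;
#define multfix14(a,b) ((fix14)((((signed int)(a))*((signed int)(b)))>>14))
#define float2fix14(a) ((fix14)((a)*16384.0)) /* 2^14 */

/* Hypothetical helpers: multiply-and-accumulate, acc + a*b, in each format. */
static fix32 mac_fix32(fix32 acc, fix32 a, fix32 b) {
    return acc + multfix32(a, b);
}

static fix14 mac_fix14(fix14 acc, fix14 a, fix14 b) {
    return (fix14)(acc + multfix14(a, b));
}

/* Absolute error of 1.5*0.05 - 0.25 in 2.30, measured against double. */
static double mac_error_fix32(void) {
    assert(sizeof(fix32) == 4); /* the format assumes a 32-bit int */
    fix32 r = mac_fix32(float2fix32(-0.25), float2fix32(1.5), float2fix32(0.05));
    double err = (double)r / 1073741824.0 - (1.5 * 0.05 - 0.25);
    return err < 0 ? -err : err;
}

/* Same computation in the assumed 2.14 format. */
static double mac_error_fix14(void) {
    fix14 r = mac_fix14(float2fix14(-0.25), float2fix14(1.5), float2fix14(0.05));
    double err = (double)r / 16384.0 - (1.5 * 0.05 - 0.25);
    return err < 0 ? -err : err;
}

/* Hypothetical reporting helper; call it from main() to see both errors. */
static void print_mac_errors(void) {
    printf("2.30 error: %.2e\n", mac_error_fix32());
    printf("2.14 error: %.2e\n", mac_error_fix14());
}
```

    Calling print_mac_errors() from main() should report a 2.30 error below 10^-8 and a 2.14 error below 10^-4, in line with the figures quoted above. The sums are not checked for overflow, so operands must keep every intermediate result inside the -2 to 2 range.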
    For animation, another fixed point system useful over a larger integer range is the 16.16 format, with a range of +/-32767 and a resolution of 1.5x10^-5.
    Note that this is the system used in the particle animations on the TFT and NTSC page.
    The macros for this system are:
    typedef signed int fix16 ;
    #define multfix16(a,b) ((fix16)(((( signed long long)(a))*(( signed long long)(b)))>>16)) //multiply two fixed 16:16
    #define float2fix16(a) ((fix16)((a)*65536.0)) // 2^16
    #define fix2float16(a) ((float)(a)/65536.0)
    #define fix2int16(a) ((int)((a)>>16))
    #define int2fix16(a) ((fix16)((a)<<16))
    #define divfix16(a,b) ((fix16)((((signed long long)(a)<<16)/(b))))
    #define sqrtfix16(a) (float2fix16(sqrt(fix2float16(a)))) // needs <math.h>
    #define absfix16(a) abs(a) // needs <stdlib.h>
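    A minimal sketch of how the 16.16 macros might be used in a particle update follows. The particle struct and the particle_step and particle_print functions are hypothetical and are not taken from the TFT or NTSC code; the step is a plain Euler update, assumed here only for illustration. It also assumes that right-shifting a negative value is an arithmetic shift, which holds for gcc-based compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdio.h>

/* 16.16 macros, copied from the text above. */
typedef signed int fix16;
#define multfix16(a,b) ((fix16)((((signed long long)(a))*((signed long long)(b)))>>16))
#define float2fix16(a) ((fix16)((a)*65536.0)) /* 2^16 */
#define fix2float16(a) ((float)(a)/65536.0)
#define fix2int16(a) ((int)((a)>>16))
#define int2fix16(a) ((fix16)((a)<<16))

/* Hypothetical particle state: position and velocity on one axis. */
typedef struct {
    fix16 x;
    fix16 vx;
} particle;

/* Hypothetical Euler step: v += a*dt, then x += v*dt.
   Adds are ordinary integer adds; only the products need the macro. */
static void particle_step(particle *p, fix16 accel, fix16 dt) {
    assert(p != NULL);
    p->vx += multfix16(accel, dt);
    p->x  += multfix16(p->vx, dt);
}

/* Hypothetical debug helper: prints the state and the integer pixel column. */
static void particle_print(const particle *p) {
    printf("x = %f (pixel %d), vx = %f\n",
           fix2float16(p->x), fix2int16(p->x), fix2float16(p->vx));
}
```

    For example, starting from x = 10 and vx = 2, one step with an acceleration of -2.0 and dt = 0.5 leaves vx = 1 and x = 10.5, and fix2int16 then gives pixel column 10.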
    The performance of the operations varies. At level 1 opt, fixed multiply is about 2.4 times faster than floating point (23 cycles), and fixed add is about 8 times faster (8 cycles). However, fixed divide is the same speed as float, and fixed square root runs at 0.6 the speed of the float operation (the sqrtfix16 macro converts to float, calls sqrt, and converts back). Test code. Fortunately, DSP uses only add and multiply, in about equal numbers. Animation operations depend on the force law used.
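    To illustrate the point that DSP needs only add and multiply, here is a sketch of one second order section in 2.30 format. It is not the course's filter code: the sos_fix32 struct, the sos_step function and the direct form I layout are all assumptions made for this example, and the sums are not protected against overflow.

```c
#include <assert.h>

/* 2.30 macros, copied from the text above. */
typedef signed int fix32;
#define multfix32(a,b) ((fix32)((((signed long long)(a))*((signed long long)(b)))>>30))
#define float2fix32(a) ((fix32)((a)*1073741824.0)) /* 2^30 */

/* Hypothetical state for one second order section, direct form I. */
typedef struct {
    fix32 b0, b1, b2; /* feedforward coefficients */
    fix32 a1, a2;     /* feedback coefficients (a0 assumed to be 1) */
    fix32 x1, x2;     /* previous two inputs */
    fix32 y1, y2;     /* previous two outputs */
} sos_fix32;

/* One output sample:
   y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2
   Five multiplies and four adds/subtracts, no divide or square root. */
static fix32 sos_step(sos_fix32 *s, fix32 x) {
    assert(s != 0);
    fix32 y = multfix32(s->b0, x)
            + multfix32(s->b1, s->x1)
            + multfix32(s->b2, s->x2)
            - multfix32(s->a1, s->y1)
            - multfix32(s->a2, s->y2);
    /* Shift the delay lines. */
    s->x2 = s->x1;
    s->x1 = x;
    s->y2 = s->y1;
    s->y1 = y;
    return y;
}
```

    With b0 = 0.5, a1 = -0.5 and all other coefficients zero, the section reduces to y[n] = 0.5*x[n] + 0.5*y[n-1], so a constant input of 1.0 produces 0.5, 0.75, 0.875 on successive calls.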

Copyright Cornell University October 7, 2015