Cornell University ECE4760
Fixed Point Arithmetic
PIC32MX250F128B
Introduction
Fixed point arithmetic is generally faster than floating point, and often has enough dynamic range and accuracy for animation and DSP. I decided to implement 2.30, 16.16 and 2.14 formats. The format 2.30 means two bits to the left of the binary point, one of which is the sign bit, and 30 fraction bits to the right. Fixed point is typically used in DSP, animation loops, and control loops where speed is the limiting factor.
Fixed point arithmetic systems
2.30
The dynamic range of the 2.30 format is -2 to 2-2^{-30}. The resolution is 2^{-30} = 9.3*10^{-10}. The high resolution is necessary to make stable, very accurate IIR filters. The dynamic range is sufficient for Butterworth IIR filters made with second order sections (SOS). SOS help to minimize filter roundoff errors.
The macros for the 2.30 follow:
typedef signed int fix32 ;
#define multfix32(a,b) ((fix32)(((( signed long long)(a))*(( signed long long)(b)))>>30)) //multiply two fixed 2.30
#define float2fix32(a) ((fix32)((a)*1073741824.0)) // 2^30
#define fix2float32(a) ((float)(a)/1073741824.0)
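As a rough illustration of how the 2.30 macros behave, the sketch below scales a sample by a gain. The helper name scale_sample is hypothetical (it is not part of the course code), and the sketch assumes a 32-bit int and an arithmetic right shift for negative values, as on the PIC32 with its GCC-based compiler.

```c
/* 2.30 fixed point macros, copied from the text above.
   Assumes int is 32 bits and >> on a negative value is an arithmetic shift. */
typedef signed int fix32;
#define multfix32(a,b) ((fix32)((((signed long long)(a))*((signed long long)(b)))>>30))
#define float2fix32(a) ((fix32)((a)*1073741824.0)) /* 2^30 */
#define fix2float32(a) ((float)(a)/1073741824.0)

/* Hypothetical helper: scale a 2.30 sample by a 2.30 gain.
   The 64-bit intermediate product holds 60 fraction bits;
   the shift by 30 returns it to 2.30. */
fix32 scale_sample(fix32 x, fix32 gain)
{
    return multfix32(x, gain);
}
```

For example, scale_sample(float2fix32(0.5), float2fix32(0.5)) should give the 2.30 representation of 0.25. Both operands and the result must stay inside the -2 to +2 range, since nothing here checks for overflow.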
16.16
For animation, another fixed point system useful over a larger integer range is the 16.16 format, with a range of +/-32767 and a resolution of 1.5x10^{-5}.
This is the system used in the particle animations. The TFT_animation_BRL4.c example on the ProtoThreads page uses 16.16 to simulate a projectile.
The macros for this system are:
typedef signed int fix16 ;
#define multfix16(a,b) ((fix16)(((( signed long long)(a))*(( signed long long)(b)))>>16))
#define float2fix16(a) ((fix16)((a)*65536.0)) // 2^16
#define fix2float16(a) ((float)(a)/65536.0)
#define fix2int16(a) ((int)((a)>>16))
#define int2fix16(a) ((fix16)((a)<<16))
#define divfix16(a,b) ((fix16)((((signed long long)(a)<<16)/(b))))
#define sqrtfix16(a) (float2fix16(sqrt(fix2float16(a))))
#define absfix16(a) abs(a)
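To show how these macros might combine in an animation loop, here is a minimal sketch of one projectile update step. The names particle_t and particle_step, and the drag term, are assumptions made for illustration; this is not the code from TFT_animation_BRL4.c.

```c
/* 16.16 macros from the text above (subset needed for this sketch). */
typedef signed int fix16;
#define multfix16(a,b) ((fix16)((((signed long long)(a))*((signed long long)(b)))>>16))
#define float2fix16(a) ((fix16)((a)*65536.0)) /* 2^16 */
#define fix2int16(a)   ((int)((a)>>16))
#define int2fix16(a)   ((fix16)((a)<<16))

/* Hypothetical particle state: position and velocity, all in 16.16. */
typedef struct { fix16 x, y, vx, vy; } particle_t;

/* One time step: apply gravity, scale velocity by a drag factor,
   then move. Adds and subtracts work directly on the fixed values;
   only the multiply needs the macro. */
void particle_step(particle_t *p, fix16 gravity, fix16 drag)
{
    p->vy += gravity;
    p->vx = multfix16(p->vx, drag);
    p->vy = multfix16(p->vy, drag);
    p->x += p->vx;
    p->y += p->vy;
}
```

When drawing, fix2int16(p->x) and fix2int16(p->y) would give the integer pixel coordinates, while the fractional bits keep slow motion smooth between frames.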
2.14
A narrower 2.14 format is good for fast filters of reasonable accuracy. This is a 2-bit integer (sign bit and one bit) with a 14-bit fraction (2.14 format). Range is -2 to 2-2^{-14} and resolution is 2^{-14} = 6*10^{-5}.
For more poles than a 2-pole IIR, you must use second-order-section filters (example 6 on DSP page). The macros for this system are:
// == 16-bit fixed point 2.14 format ===============================
// == resolution 2^-14 = 6.1035e-5
// == dynamic range is +1.9999/-2.0
typedef signed short fix14 ;
#define multfix14(a,b) ((fix14)((((long)(a))*((long)(b)))>>14)) //multiply two fixed 2.14
#define float2fix14(a) ((fix14)((a)*16384.0)) // 2^14
#define fix2float14(a) ((float)(a)/16384.0)
#define absfix14(a) abs(a)
Note that there are no integer conversion macros for this format because the integer range is only +/-2.
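As a sketch of the kind of fast filter this format suits, the following one-pole low-pass computes y = y_prev + alpha*(x - y_prev) in 2.14. The function name lowpass_step is hypothetical, and the sketch assumes inputs stay within about +/-1 so that the difference x - y_prev fits in the +/-2 range.

```c
/* 2.14 macros from the text above. */
typedef signed short fix14;
#define multfix14(a,b) ((fix14)((((long)(a))*((long)(b)))>>14))
#define float2fix14(a) ((fix14)((a)*16384.0)) /* 2^14 */
#define fix2float14(a) ((float)(a)/16384.0)

/* Hypothetical one-pole low-pass step.
   alpha is a 2.14 value between 0 and 1; larger means faster response.
   Assumes |x| and |y_prev| are at most about 1 so the difference
   does not overflow the 2.14 range. */
fix14 lowpass_step(fix14 y_prev, fix14 x, fix14 alpha)
{
    fix14 diff = (fix14)(x - y_prev);
    return (fix14)(y_prev + multfix14(alpha, diff));
}
```

With alpha = 0.25 and a constant input of 1.0, the output starting from 0 moves to 0.25 after one step and 0.4375 after two, approaching 1.0 geometrically.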
Fixed point arithmetic performance
The performance of individual operations varies, but fixed point is faster than floating point on this architecture. At optimization level -O1, the following table gives the timing in cpu cycles for a multiply operation and a multiply-accumulate (MAC) operation. MAC is common in DSP. Also given is the number of assembler opcodes executed (to see the assembler listing in MPLABX use menu item Window>Debugging>Output>Disassembly).
The times in the table include the time to load/store variables. For 16.16 operations some of the multiplies are 2 cycles. (Code)
         multiply cycles   multiply opcodes   MAC cycles   MAC opcodes
fix14    8                 8                  9            9
fix16    12                10                 20           16
float    50-53             library call       99-119       library call
Fixed multiply for 16.16 is about 4 times faster than floating point, and 2.14 multiply is about 6.5 times faster than floating point. Other operations do not have as great a ratio. For example, fixed 16.16 divide is the same speed as float, and fixed square root runs at 0.6 times the speed of the float operation. Fortunately, DSP uses only add and multiply, in about equal numbers. Compiling without optimization does not slow down floating point (probably because the libraries are optimized), but slows the fixed operations by about a factor of 2-3.
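To make the multiply-accumulate pattern concrete, here is a minimal sketch of a dot product in 16.16, the inner loop of an FIR filter. The function name dot_fix16 is hypothetical, and this is not the benchmark code used for the timings above. Each loop iteration is one MAC: a multfix16 followed by an add.

```c
/* 16.16 macros from the text above (subset needed for this sketch). */
typedef signed int fix16;
#define multfix16(a,b) ((fix16)((((signed long long)(a))*((signed long long)(b)))>>16))
#define float2fix16(a) ((fix16)((a)*65536.0)) /* 2^16 */
#define int2fix16(a)   ((fix16)((a)<<16))

/* Hypothetical dot product of two 16.16 arrays of length n.
   The accumulator is also 16.16, so the running sum must stay
   within the +/-32767 range; no overflow check is made. */
fix16 dot_fix16(const fix16 *a, const fix16 *b, int n)
{
    fix16 acc = 0;
    for (int i = 0; i < n; i++) {
        acc += multfix16(a[i], b[i]); /* one multiply-accumulate */
    }
    return acc;
}
```

For instance, the vectors (1.5, -2.0, 0.25) and (2.0, 0.5, 4.0) should give 3.0 - 1.0 + 1.0 = 3.0, which is int2fix16(3) in this format.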
Copyright Cornell University July 21, 2017