Cornell University

Electrical Engineering 476

Fixed Point mathematical functions

in Codevision C and assembler

**Introduction**

Floating point arithmetic is too slow for small, 8-bit processors to handle, except when human interaction is involved. Scaling a human input in floating point is generally fast enough (compared to the human). However in fast loops, such as IIR filters or animation, you are going to need to use fixed point arithmetic. This page concentrates on *8:8 fixed point* in which there are 8 bits of integer and 8 bits of fraction stored for each number. Numbers are stored in 2's complement form.

This page has the following sections:

- 8:8 fixed point representation
- Basic Arithmetic
- Optimized multipiler in assembler
- Optimized divider in assembler/C
- Square root in assembler/C
- Sine and Cosine

- 4:12 Fixed Point multiply

**8:8 Fixed Point representation **

This section will concentrate on numbers stored as 16-bit `signed ints`

, with the binary-point between bit 7 and bit 8. There will be 8 bits of integer and 8 bits of fraction, so we will refer to this as 8:8 fixed point. This representation allows a dynamic range of `+/- 127`

, with a resolution of `1/256=0.00396`

. Sign representation will be standard 2's complement. For instance to get the fixed point representation for -1.5, we can take the representation for +1.5, which is `0x0180`

, invert the bits and add 1. So inverting, we get `0xfe7f`

and adding one (to the least significant bit) we get `0xfe80`

.

Examples:

Decimal Value |
Fixed Representation |

`0.0` |
`0x0000` |

`1.0` |
`0x0100` |

`1.5` |
`0x0180` |

`1.75` |
`0x01c0` |

`1.00396` |
`0x0101` |

`-1.0` |
`0xff00` |

`-1.5` |
`0xfe80` |

`-2` |
`0xfe00` |

`-127` |
`0x8100` |

`-0.5` |
`0xff80` |

`-0.25` |
`0xffc0` |

`0.5` |
`0x0080` |

`-128` |
`0x8000` |

`127` |
`0x7f00` |

`2.25` |
`0x0240` |

`-2.25` |
`~(0x0240)+1 = 0xfdc0 ` |

**Basic 8:8 Arithmetic **

Addition and subtraction can just use the standard C operators, but multiplication and division need to be defined. These arithmetic functions were initially defined as macros rather than functions. While useful for summarizing the technique and for testing, it turned out to be *much faster* to implement the fixed point multiply as a function (see section below).

#define multfixSlow(a,b) ((int)((((long)(a))*((long)(b)))>>8)) //multiply two fixed #don't use this-too slow: #define divfixSlow(a,b) ((int)((((long)(a))<<8)/((long)(b)))) //divide two fixed #:don't use this-too slow:

We will also need to convert integers to fixed, float to fixed, fixed to integers, perfrom addition, subtraction, multiplication and division, as well as other operations. An example program uses the following macros.

Type conversions:

#define int2fix(a) (((int)(a))<<8) //Convert char to fix. a is a char #define fix2int(a) ((signed char)((a)>>8)) //Convert fix to char. a is an int #define float2fix(a) ((int)((a)*256.0)) //Convert float to fix. a is a float #define fix2float(a) ((float)(a)/256.0) //Convert fix to float. a is an int

**Optimized 8:8 multiplication**

A fast mulitply operation is essential for digital filtering, control systems, and animated games. The macro-based multiply speed was disappointing so I rewrote it as an assembler function call based on the information in Atmel application note AVR201. The speed increase over the macro version (on a Mega32) was a factor of 3 and over the floating version a factor of 6. The entire multiply takes about 52 machine cycles. The time to load operands, multiply, and store is about 26 cycles. The rest of the 52 cycles is C function overhead and pushing/popping registers. This program includes the optimized multiply routine and some timing comparisions between float and fixed operations. The muliply test code is here. I validated the fast multiply routine by running all `2^32`

possible 16-bit operand pairs through it and comparing each output to the macro version. The version below reduces the execution time to 48 cycles by replacing stack operations with register moves.

int mult_opt(int a, int b) begin #asm ;push r20 ;push r21 mov r24,r20 mov r25,r21 ldd R22,Y+2 ;load a ldd R23,Y+3 ld r20,Y ;load b ldd r21,Y+1 muls r23, r21 ; (signed)ah * (signed)bh mov r31, r0 ; mul r22, r20 ; al * bl mov r30, r1 ; ;mov r16, r0 mulsu r23, r20 ; (signed)ah * bl add r30, r0 ; adc r31, r1 ; mulsu r21, r22 ; (signed)bh * al add r30, r0 ; adc r31, r1 ; mov r20,r24 mov r21,r25 ;pop r21 ;pop r20 #endasm end

**Optimized 8:8 division **

A fast division operation is occasionally nice to have, but is harder to implement than multiplcation. The fastest way I have seen uses Newton-Raphson iteration and consists of several steps:

- Strip off the sign of the denominator (D) to make it positive.
- Scale D to the range 0.5<=D<=1.0 and record the number of shifts (N) necessary to to this.
- Estimate of the reciprocal of the scaled D using a linear appoximation requiring only shifts and adds.
- Use Newton's method to iteratatively improve the reciprocal approximation. 8:8 accuracy takes only one iteration to converge.
- Shift the scaled approximation by N back to the correct magnitude.
- Restore the sign to form the complete reciprocal.
- Multiply the numerator by the reciprocal of D.

Threre are two versions of the reciprocal and division routines. The first version is slower, but understandable. The second version is optimized. The optimized version executes about 2 to 3 times faster the the floating point divide routine, depending to the value of the denominator.

D input value |
division cycles |
---|---|

0.5 to 1.0 |
166 |

4 or 0.25 |
190 |

10 or 0.1 |
224 |

100 or 0.01 |
260 |

**Square Root for 8:8 **

The `sqrtfix`

version in this program is at least 18 times faster than the floating point library version. It executes in less than 290 cycles, depending on the value of the input. Shorter times at high input values are caused by limited bits to be computed and some detailed range checking to prevent internal overflow. A zero or negative input yields a zero output. The algorithm first determines one of 12 different possible integer parts of the square root (0 to 11), then uses successive approximation to get the fractional bits.

input value |
cycles |
---|---|

0.00 to 120.99 |
< 290 |

121.00 to 126.56 |
260 |

126.57 to 127.99 |
210 |

A fixed point square root macro which depends on the `math.h`

libary `lsqrt`

function is shown below. This fixed point square root macro is slower, and in some cases, *much slower* than the floating operation:

#define sqrtfixSlow(a) (lsqrt(((long)(a))<<8)) //Don't use this square root -- too slow

**Sine and Cosine for 8:8 **

Trig functions can be calculated using the CORDIC algorithm. Also see this page and code from Christopher Giese. The following two functions compute the sine and cos of an angle from -90 to +90 degrees. The `angle`

is specified in degrees.

int sinefix(int angle) int cosfix(int angle)

The following routine computes both the sine and cos at the same time.

//===CORDIC================================================ //for CORDIC sine/cos only flash int Angles[]={ float2fix(45.0), float2fix(26.565), float2fix(14.0362), float2fix(7.12502), float2fix(3.57633), float2fix(1.78991), float2fix(0.895174),float2fix(0.447614),float2fix(0.223811), float2fix(0.111906),float2fix(0.055953),float2fix(0.027977) }; //From: http://my.execpc.com/~geezer/embed/cordic.htm --- geezer@execpc.com //call as SinCos(int2fix(ang), &s, &c); ang in degrees with s retuning the sine and c the cosine SinCos(int TargetAngle, int *sin, int *cos) begin int X, Y, CurrAngle; unsigned Step; Y=0; CurrAngle=0; X = 0x009b; //0.607 for(Step=0; Step < 12; Step++) { int NewX; if(TargetAngle > CurrAngle) { NewX=X - (Y >> Step); Y=(X >> Step) + Y; X=NewX; CurrAngle += Angles[Step]; } else { NewX=X + (Y >> Step); Y=-(X >> Step) + Y; X=NewX; CurrAngle -= Angles[Step]; }} *sin = Y; //the sine *cos = X; //the cos endThe speed of the sine and cos routines appear to be about 2600 cycles, about twice that of a floating operation, but you can gain another factor of two if you need both sine and cosine.

**4:12 Fixed Point **

Some applications need more fractional accuracy, but less dynamic range than provided by 8:8 notation. The 4:12 notation range is +/-7 with a fractional accuracy of 1/4096=0.00024. The 4:12 notation can be used to make more accurate, but slightly slower, IIR filters. This multiply routine could be used in second order IIR filters down to a relative frequency of 0.05, or maybe lower.

Copyright Cornell University Dec 2007

Bruce R. Land