Cornell University
Electrical Engineering 476
Fixed Point mathematical functions
in Codevision C and assembler

Introduction

Floating point arithmetic is too slow for small, 8-bit processors to handle, except when human interaction is involved. Scaling a human input in floating point is generally fast enough (compared to the human). However in fast loops, such as IIR filters or animation, you are going to need to use fixed point arithmetic. This page concentrates on 8:8 fixed point in which there are 8 bits of integer and 8 bits of fraction stored for each number. Numbers are stored in 2's complement form.

This page has the following sections:


8:8 Fixed Point representation

This section will concentrate on numbers stored as 16-bit signed ints, with the binary-point between bit 7 and bit 8. There will be 8 bits of integer and 8 bits of fraction, so we will refer to this as 8:8 fixed point. This representation allows a dynamic range of +/- 127, with a resolution of 1/256=0.00396. Sign representation will be standard 2's complement. For instance to get the fixed point representation for -1.5, we can take the representation for +1.5, which is 0x0180, invert the bits and add 1. So inverting, we get 0xfe7f and adding one (to the least significant bit) we get 0xfe80.

Examples:

Decimal Value
Fixed Representation
0.0
0x0000
1.0
0x0100
1.5
0x0180
1.75
0x01c0
1.00396
0x0101
-1.0
0xff00
-1.5
0xfe80
-2
0xfe00
-127
0x8100
-0.5
0xff80
-0.25
0xffc0
0.5
0x0080
-128
0x8000
127
0x7f00
2.25
0x0240
-2.25
~(0x0240)+1 = 0xfdc0

Basic 8:8 Arithmetic

Addition and subtraction can just use the standard C operators, but multiplication and division need to be defined. These arithmetic functions were initially defined as macros rather than functions. While useful for summarizing the technique and for testing, it turned out to be much faster to implement the fixed point multiply as a function (see section below).

Multplication and division:
#define multfixSlow(a,b) ((int)((((long)(a))*((long)(b)))>>8)) //multiply two fixed #don't use this-too slow:
#define divfixSlow(a,b)  ((int)((((long)(a))<<8)/((long)(b)))) //divide two fixed #:don't use this-too slow:

We will also need to convert integers to fixed, float to fixed, fixed to integers, perfrom addition, subtraction, multiplication and division, as well as other operations. An example program uses the following macros.

Type conversions:

#define int2fix(a)   (((int)(a))<<8)            //Convert char to fix. a is a char
#define fix2int(a)   ((signed char)((a)>>8))    //Convert fix to char. a is an int
#define float2fix(a) ((int)((a)*256.0))         //Convert float to fix. a is a float
#define fix2float(a) ((float)(a)/256.0)         //Convert fix to float. a is an int 

 


Optimized 8:8 multiplication

A fast mulitply operation is essential for digital filtering, control systems, and animated games. The macro-based multiply speed was disappointing so I rewrote it as an assembler function call based on the information in Atmel application note AVR201. The speed increase over the macro version (on a Mega32) was a factor of 3 and over the floating version a factor of 6. The entire multiply takes about 52 machine cycles. The time to load operands, multiply, and store is about 26 cycles. The rest of the 52 cycles is C function overhead and pushing/popping registers. This program includes the optimized multiply routine and some timing comparisions between float and fixed operations. The muliply test code is here. I validated the fast multiply routine by running all 2^32 possible 16-bit operand pairs through it and comparing each output to the macro version. The version below reduces the execution time to 48 cycles by replacing stack operations with register moves.

int mult_opt(int a, int b)
begin
    #asm
    ;push r20
    ;push r21
    mov  r24,r20
    mov  r25,r21
    
    ldd  R22,Y+2  ;load a
    ldd  R23,Y+3 
	
    ld  r20,Y   ;load b
    ldd r21,Y+1 
	
    muls	r23, r21	; (signed)ah * (signed)bh
    mov		r31, r0		;
    mul		r22, r20	; al * bl
    mov 	r30, r1		;
    ;mov	r16, r0
    mulsu	r23, r20	; (signed)ah * bl
    add		r30, r0     ;
    adc		r31, r1     ;
    mulsu	r21, r22	; (signed)bh * al
    add		r30, r0     ;
    adc		r31, r1     ;
	
    mov r20,r24
    mov r21,r25
    ;pop r21
    ;pop r20
    #endasm
end

Optimized 8:8 division

A fast division operation is occasionally nice to have, but is harder to implement than multiplcation. The fastest way I have seen uses Newton-Raphson iteration and consists of several steps:

  1. Strip off the sign of the denominator (D) to make it positive.
  2. Scale D to the range 0.5<=D<=1.0 and record the number of shifts (N) necessary to to this.
  3. Estimate of the reciprocal of the scaled D using a linear appoximation requiring only shifts and adds.
  4. Use Newton's method to iteratatively improve the reciprocal approximation. 8:8 accuracy takes only one iteration to converge.
  5. Shift the scaled approximation by N back to the correct magnitude.
  6. Restore the sign to form the complete reciprocal.
  7. Multiply the numerator by the reciprocal of D.

Threre are two versions of the reciprocal and division routines. The first version is slower, but understandable. The second version is optimized. The optimized version executes about 2 to 3 times faster the the floating point divide routine, depending to the value of the denominator.

D input value
division cycles
0.5 to 1.0
166
4 or 0.25
190
10 or 0.1
224
100 or 0.01
260

 


Square Root for 8:8

The sqrtfix version in this program is at least 18 times faster than the floating point library version. It executes in less than 290 cycles, depending on the value of the input. Shorter times at high input values are caused by limited bits to be computed and some detailed range checking to prevent internal overflow. A zero or negative input yields a zero output. The algorithm first determines one of 12 different possible integer parts of the square root (0 to 11), then uses successive approximation to get the fractional bits.

input value
cycles
0.00 to 120.99
< 290
121.00 to 126.56
260
126.57 to 127.99
210

A fixed point square root macro which depends on the math.h libary lsqrt function is shown below. This fixed point square root macro is slower, and in some cases, much slower than the floating operation:

#define sqrtfixSlow(a)   (lsqrt(((long)(a))<<8))  //Don't use this square root -- too slow

Sine and Cosine for 8:8

Trig functions can be calculated using the CORDIC algorithm. Also see this page and code from Christopher Giese. The following two functions compute the sine and cos of an angle from -90 to +90 degrees. The angle is specified in degrees.

int sinefix(int angle)
int cosfix(int angle)

The following routine computes both the sine and cos at the same time.

//===CORDIC================================================
//for CORDIC sine/cos only
flash int Angles[]={
	float2fix(45.0),	float2fix(26.565),	float2fix(14.0362),
	float2fix(7.12502),	float2fix(3.57633),	float2fix(1.78991),
	float2fix(0.895174),float2fix(0.447614),float2fix(0.223811),
	float2fix(0.111906),float2fix(0.055953),float2fix(0.027977) };  
	
//From: http://my.execpc.com/~geezer/embed/cordic.htm --- geezer@execpc.com
//call as SinCos(int2fix(ang), &s, &c); ang in degrees with s retuning the sine and c the cosine
SinCos(int TargetAngle, int *sin, int *cos)
begin
    int X, Y, CurrAngle;
	unsigned Step;
	
	Y=0;
	CurrAngle=0;
	X = 0x009b;    //0.607
	
	for(Step=0; Step < 12; Step++)
	{	int NewX;

		if(TargetAngle > CurrAngle)
		{	NewX=X - (Y >> Step);
			Y=(X >> Step) + Y;
			X=NewX;
			CurrAngle += Angles[Step]; }
		else
		{	NewX=X + (Y >> Step);
			Y=-(X >> Step) + Y;
			X=NewX;
			CurrAngle -= Angles[Step]; }} 
		*sin = Y;   //the sine
		*cos = X;   //the cos
end
The speed of the sine and cos routines appear to be about 2600 cycles, about twice that of a floating operation, but you can gain another factor of two if you need both sine and cosine.


4:12 Fixed Point

Some applications need more fractional accuracy, but less dynamic range than provided by 8:8 notation. The 4:12 notation range is +/-7 with a fractional accuracy of 1/4096=0.00024. The 4:12 notation can be used to make more accurate, but slightly slower, IIR filters. This multiply routine could be used in second order IIR filters down to a relative frequency of 0.05, or maybe lower.


Copyright Cornell University Dec 2007
Bruce R. Land