Consider the following C code:
#include <stdio.h>
#define SKIP 0x00FFFFF8
int main(void)
{
union { float fv; int iv; } count = { 0.0f };  /* start the count at zero */
long i;
for (i = 0; i < SKIP; i++)
{ count.fv += 1;
}
for (i = 0; i < 16; i++)
{ count.fv += 1;
printf(" +1 = %8.0f 0x%08X\n", count.fv, count.iv);
}
return 0;
}
At first blush, it seems this counter should be able to count up to a really high number, because even a 32-bit float can represent very large values. The problem is that a float has only a 24-bit mantissa. Once the running count needs more than 24 bits to represent as an integer, the increment stops working due to roundoff error. In other words, each +1 gets rounded off to be equivalent to +0, and the counter stops counting.
Here's the output of that program compiled with gcc v3.4.4 (header line added):
Count Hex Representation
+1 = 16777209 0x4B7FFFF9
+1 = 16777210 0x4B7FFFFA
+1 = 16777211 0x4B7FFFFB
+1 = 16777212 0x4B7FFFFC
+1 = 16777213 0x4B7FFFFD
+1 = 16777214 0x4B7FFFFE
+1 = 16777215 0x4B7FFFFF
+1 = 16777216 0x4B800000
+1 = 16777217 0x4B800000
+1 = 16777217 0x4B800000
+1 = 16777217 0x4B800000
+1 = 16777217 0x4B800000
+1 = 16777217 0x4B800000
+1 = 16777217 0x4B800000
+1 = 16777217 0x4B800000
+1 = 16777217 0x4B800000
As you can see, once the count reaches 2^24 (16,777,216, or 16M), the counter stops counting.
This sort of thing can be hard to find in system test, because you may not run the test long enough to catch the problem. For an example of a similar sort of bug that contributed to a tragedy, read about the Patriot Missile bug at http://www.fas.org/spp/starwars/gao/im92026.htm
Here it is in a nutshell: Never Use A Float For A Running Sum
More generally, avoid floats in embedded systems if possible.
You can often get away with a double, but most times you are better off just using an int or long int.
Detailed notes:
IEEE single precision floating point has a 23-bit mantissa with an implicit leading 1 bit. So I say it is a 24-bit value even though only 23 of those bits show up in the actual binary representation.
The fact that the printed floating point value increments one more time but the hex value doesn't change is interesting. Likely it involves the compiled code temporarily storing the increment value in a double precision (or larger) floating point holding register and using that longer value for the printf. Note that the hex value and thus the 32-bit stored value does stop incrementing at the predicted point.
Reading from count.iv after you have written to count.fv is undefined behavior in C99 (See ANSI C 6.2.6.1.6). Hence, your Hex Representation column could show anything.
If you use an (unsigned) int instead of float, you have to check for an overflow. How is that different to check for an increment?
Btw have a look at the nexttowardf function in math.h ;)
In this particular case, the hex representation is showing the equivalent data in the floating point number. This sort of trick is always hazardous for production code, and I used it here just for illustrative purposes.
Yes, you'd have to worry about an overflow if you used an int, but it wouldn't happen until 4G counts if you used an unsigned 32-bit int. The real point here is that many programmers don't realize that a float will for practical purposes overflow after a count of only 16M. The sole purpose of this code is to illustrate the problem.
@qznc Yes, of course, you are right that an integer also has a "count limit"
But the problem here is not the limit as such, but the increment: the programmer has assumed that the float can go up to really huge values, so he can just keep incrementing...
Skynet used floating point counters on T2.