Friday, February 3, 2012

Avoid Floating Point Counters

Every once in a while I encounter code that uses a float to track time, keep count of events, or otherwise add small increments to a running sum. This can lead to some nasty bugs that don't show up during system test.

Consider the following C code:

#include
#define SKIP 0x00FFFFF8

int main(void)
{
  union  { float fv;  int iv; } count;
  long i;

  for (i = 0; i < SKIP; i++)
  { count.fv += 1;
  }

  for (i = 0; i < 16; i++)
  { count.fv += 1;
    printf(" +1 = %8.0f   0x%08X\n", count.fv, count.iv);
  }
 return;
}

At first blush, it seems this counter will be able to count up to a really high number, because even a 32-bit float can handle very large numeric values. But, the problem is that a float only has a 24-bit mantissa value. So if you have a total count that requires more than a 24-bit integer to represent, the increment will stop working due to roundoff error.  In other words, each incremental +1 gets rounded off to be equivalent to +0, and the counter stops counting.

Here's the output of that program compiled with gcc v3.4.4 (header line added):
       Count     Hex Representation
 +1 = 16777209   0x4B7FFFF9
 +1 = 16777210   0x4B7FFFFA
 +1 = 16777211   0x4B7FFFFB
 +1 = 16777212   0x4B7FFFFC
 +1 = 16777213   0x4B7FFFFD
 +1 = 16777214   0x4B7FFFFE
 +1 = 16777215   0x4B7FFFFF
 +1 = 16777216   0x4B800000
 +1 = 16777217   0x4B800000
 +1 = 16777217   0x4B800000
 +1 = 16777217   0x4B800000
 +1 = 16777217   0x4B800000
 +1 = 16777217   0x4B800000
 +1 = 16777217   0x4B800000
 +1 = 16777217   0x4B800000
 +1 = 16777217   0x4B800000

As you can see, once the count hits a 24-bit value (16M), the counter stops counting.

This sort of thing can be hard to find in system test, because you may not run the test long enough to catch the problem. For an example of a similar sort of bug that contributed to a tragedy, read about the Patriot Missile bug at http://www.fas.org/spp/starwars/gao/im92026.htm

Here it is in a nutshell: Never Use A Float For A Running Sum
More generally, avoid floats in embedded systems if possible.
You can often get away with a double, but most times you are better off just using an int or long int.

Detailed notes:
IEEE single precision floating point has a 23 bit mantissa with an implicit leading 1 bit. So I say it is a 24 bit value even though only 23 of those bits show up in the actual binary representation.

The fact that the printed floating point value increments one more time but the hex value doesn't change is interesting. Likely it involves the compiled code temporarily storing the increment value in a double precision (or larger) floating point holding register and using that longer value for the printf. Note that the hex value and thus the 32-bit stored value does stop incrementing at the predicted point.

Monday, January 2, 2012

New Blog on Checksums and CRCs

I've decided to start a separate blog for checksum and CRC postings. There are only a few posts at this point, but I expect it will accumulate quite a bit more material over time.

http://checksumcrc.blogspot.com/
Checksum and CRC Central: Cyclic Redundancy Codes, Checksums, and related issues for computer system data integrity.

The most recent post gives CRC polynomials for optimal protection of 8-bit, 16-bit, and 32-bit data words such as you might want for ensuring integrity of key variables. The blog you're reading now (Better Embedded System SW) will continue as before.

Here's wishing everyone a happy and prosperous new year!

Monday, December 19, 2011

Using the Inline Keyword Instead of a Macro

Really good code usually consists of a large number of relatively small subroutines (or methods) that can be composed as building blocks in many different ways. But when I review production embedded software I often see the use of fewer, bigger, and less composable subroutines.This makes the code more bug-prone and often makes it more difficult to test as well.

The reason often given for not using small subroutines is runtime execution cost. Doing a subroutine call to perform a small function can slow down a program significantly if it is done all the time. One of my soapboxes is that you should almost always buy a bigger CPU rather than make software more complex -- but for now I'm going to assume that it is really important that you minimize execution time.

Here's a toy example to illustrate the point. Consider a saturating increment, that will add one to a value, but will make sure that the value doesn't exceed the maximum positive value for an unsigned integer:

  int SaturatingIncrement(int x)
  { if (x != MAXINT)
    { x++;
    }
    return(x);
  }

So you might have some code that looks like this:
  ...
  x = SaturatingIncrement(x);
  ...
  z = SaturatingIncrement(z);

You might find that if you do a lot of saturating increments your code runs slowly. Usually when this happens I see one of two solutions.  Some folks just paste the actual code in like this:
  ...
  if (x != MAXINT)  { x++; }
  ...
  if (z != MAXINT)  { z++; }


A big problem with this is that if you find a bug, you get to to track down all the places where the code shows up and fix the bug. Also, code reviews are harder because at each point you have to ask whether or not it is the same as the other places or if there has been some slight change. Finally, testing can be difficult because now you have to test MAXINT for every variable to get complete coverage of all the branch paths.

A slightly better solution is to use a macro:
#define SaturatingIncrement(w)  { if ((w) != MAXINT)  { (w)++; } }
which lets you go back to more or less the original code. This macro works by pasting the text in place of the macro. So the source you write is:
  ...
  SaturatingIncrement(x);
  ...
  SaturatingIncrement(z);

but the preprocessor uses the macro to feed into the compiler this code:
  ...
  if (x != MAXINT)  { x++; }
  ...
  if (z != MAXINT)  { z++; }
 thus eliminating the subroutine call overhead.

The nice things about a macro are that if there is a bug you only have to fix it one place, and it is much more clear what you are trying to do when there is code review. However, complex macros can be cumbersome and there can be arcane bugs with macros.  (For example, do you know why I put "(w)" in the macro definition instead of just "w"?) Arguably you can unit test a macro by invoking it, but that test may well miss strange macro expansion bugs.

The good news is that in most newer C compilers there is a better way. Instead of using a macro, just use a subroutine definition with the "inline" keyword.
  inline int SaturatingIncrement(int x)
  { if (x != MAXINT)
    { x++; }
    return(x);
  }

The inline keyword tells the compiler to expand the code definition in-line with the calling function as a macro would do. But instead of doing textual substitution with a preprocessor, the in-lining is done by the compiler itself. So you can write your code using as many inline subroutines as you like without paying any run-time speed penalty. Additionally, the compiler can do type checking and other analysis to help you find bugs that can't be done with macros.

There can be a few quirks to inline. Some compilers will only inline up to a certain number of lines of code (there may be a compiler switch to set this). Some compilers will only inline functions defined in the same .c file (so you may have to #include that .c file to be able to inline it). Some compilers may have a flag to force inlining rather than just making that keyword a suggestion to the compiler. To be sure inline is really working you'll need to check the assembly language output of your compiler. But, overall, you should use inline instead of macros whenever you can, which should be most of the time.

Sunday, November 27, 2011

Avoiding EEPROM corruption problems

EEPROM is invaluable for storing operating parameters, error codes, and other non-volatile data. But it can also be a source of problems if stored values get corrupted.  Here are some good practices for avoiding EEPROM problems.
  • Pay attention to error return codes when writing values, and make sure that the data you wanted to write actually gets written. Read it back after the write and checking for a correct value.
  • It takes a while to write to data. If your EEPROM supports a "finished" signal, use that instead of a fixed timeout that might or might not be long enough.
  • Don't access the EEPROM with marginal voltage. It is common for an external EEPROM chip to need a higher minimum voltage than your microcontroller. That means the microcontroller can be running happily while the EEPROM doesn't have enough voltage to operate properly, causing corrupted writes. In other words, you might have to set your brownout protection at a higher voltage than normal to protect EEPROM operation.
  • Have a plan for power loss during a write cycle. Will your hold-up cap keep things afloat long enough to finish the write? Do you re-initialize the EEPROM when the CPU powers up in case the EEPROM was left half-way through a write cycle when the CPU was reset?
  • Don't use address zero of the EEPROM. It is common for corruption problems to hit address zero, which can be the default address pointed to in the EEPROM if that chip is reset or otherwise has a problem during a write cycle.
  • Watch out for wearout. EEPROM can only be written a finite number of times. Make sure you stay well within the wearout rating for your EEPROM.
  • Consider using error coding such as a CRC protection for critical data items or blocks of data items stored in the EEPROM. Make sure the CRC isn't re-written so often that it causes EEPROM wearout.
If you've seen other problems or strategies that you'd like to share, let me know!

Friday, November 4, 2011

Embedded System Code Review Checklist

This past summer I had the opportunity to work with my friend Gautam Khattak on several industry design reviews. By the time the dust had settled, we'd had a lot of time to think about patterns of common problems and the types of things that designers should be looking for in peer reviews for embedded system software.

As a result, we've written a two-page checklist for embedded system code reviews that you are free to use however you like.
The sections include: function, style, architecture, exception handling, timing, validation/test, and hardware. This is a pretty thorough starting point, but no doubt you will want to tailor these to your specific situation.

The checklists are broken into sections to make it easier to do perspective-based reviews. The idea is to divide up the sections among 3 or 4 reviewers so that each reviewer is thinking about somewhat different aspects of the code, resulting in better review coverage of potential issues.

Everyone has their own favorite thing to look for in a review. If you think we've missed one please let us know with a comment. One thing we have intentionally left out is system-level problems that are better done in a system-level review. These are primarily intended for module reviews that look at a couple hundred lines of code at a time.

Again, you can use these within your company or for any other purpose without having to ask our permission. But we'd be happy to hear from you if you've found them especially useful or if you have suggestions for things we've missed.

Friday, October 14, 2011

Embedded Security Pitfalls

Every once in a while I'm asked to take a look at a scheme for ensuring that an embedded system is secure for a variety of reasons. Generally the issue isn't really secrecy so much as making sure that an embedded device is really what you think it is, or that a license fee has been paid for a service, or that some bad guy hasn't broken in to a system to issue bogus commands. These are hard problems for any system, but can be made especially difficult for embedded systems because severe cost constraints can discourage the use of bank-grade IT security solutions.

There is plenty out there on the web about security, so I'm not going to attempt a comprehensive tutorial here. (There is a chapter in my book that covers the basics as applied to embedded systems.) But one thing that can be difficult to find is a comprehensive list of high-level pitfalls for embedded security all in one compact list. So here it is. Ignore these hard-won lessons from the security community at your own peril.
  • Security by obscurity never works against a serious adversary. Any sentence that contains the phrase "the bad guys will never figure out how" is false.
  • Almost nobody is good enough to cook up their own cryptography. Use a good standard algorithm. If that costs too much, then realize you're just keeping out the clueless.
  • Secret algorithms never remain secret. Any security that hinges upon the secrecy of an encryption algorithm is a bad idea. (Those sorts of systems are snake oil.)
  • Tamper-proofing only raises the bar to steal the secrets inside a chip. Tamper-proofing is worth doing, but don't count on it as a definitive security strategy.
  • Even if an algorithm is secret, it can be broken.
  • Weak algorithms will be broken, and probably publicized for bragging rights.
  • Even if an algorithm isn't broken, it might be subject to simple cloning or a replay attack that captures and repeats an encrypted message such as a door unlock command.
  • Never use a back-door key, or manufacturer secret key. The information will get out. Place your faith only in a large, unique-per-device random secret key.
  • Make sure you don't generate weak or non-random keys. All the above fallacies apply to the algorithm you use for generating random keys.
  • Don't forget the possibility of a disgruntled employee or a rubber hose attack disclosing your algorithm or master key.
  • Don't use encryption as your primary tool when what you really need is a secure hash, signature, or MAC for authentication. There are endless possibilities for getting it wrong if you use the wrong tool for the job.
  • Don't forget to have a plan for upgrading security or fixing bugs when they appear. Remember that update mechanisms can be attacked (for example, via a trojan horse update file).
  • The system owner might be an attacker (heard of jailbreaking?)
  • The attacker does not need to be sophisticated to break tough security (see "script kiddies")
  • Figure out export control issues as you design your system, and adapt the design if required.
  • Instances of poor security biting embedded system designers are ubiquitous. You just haven't heard most of them because nobody is allowed to talk about their own company's problems.
The point of the above list is the following: if you don't understand what a bullet is and why I'm saying it, you should probably dig deeper into security before you ship a product that needs to be secure. There are always exceptions, but they are rare. This is one of those cases where 80% of ordinary designers think they're the 1% exception case that is clever enough to get away with bending the rules -- and it's just not so.


If you have suggestions for additional items for high level of pitfalls I'd love to hear them.

Friday, September 30, 2011

The NOP Trick For Speed Profiling

An important principle in code optimization is speed up the parts that matter. One of the interpretations of Amdahl's Law is that speedup is limited by the fraction of total time taken by a particular piece of code. If for example a particular snippet of code is 10% of overall execution time, then making it execute infinitely fast is only going to give you a 10% speedup. 10% might be worthwhile, but if instead a snippet of code only takes 0.001% of total execution time, then it is unlikely spending time optimizing it is worth your while. In other words, there is no point wasting hours optimizing parts of the code that won't make any difference to overall execution speed.

But, how do you know what parts of the code matter? Intuition can help sometimes, but in practice it is hard to know your code well enough that you always guess the bottleneck locations correctly.

If you have a tool-rich development environment you can use a profiling tool (wikipedia has a description and a list of common tools). But sometimes you are on a platform that is too small to use these tools, or you're in too much of a hurry to use them because you are just absolutely sure you are right about where all the time is being spent.

Here is a trick that can save you a lot of time doing optimization. I use it pretty much every time I'm going to optimize code as a sanity check (even if a tool tells me where to optimize, because tools aren't perfect and I've learned to be careful before investing valuable time optimizing). The trick is: insert some NOPs in the code you want to optimize and see how much slower it goes. If you don't see a speed difference, then you are wasting your time optimizing that part of the code to make it go faster. If you do see a speed difference, that difference can help you estimate how much speedup you are going to get if you do the optimization.

Here's some example C code before optimization:
  for(int i = 0; i < 1000; i++)
  { x[i] += y[i];
  }

Depending on your compiler it might well be that you can optimize this by switching to pointers instead of indexed arrays. Assuming that you've decided it is worth the trouble to optimize your code instead of buying a faster processor (that's a whole other discussion!) then probably this code can be made faster. But, if this code is only called once in a while and is only a small part of your overall program it might not give enough speedup to be worth optimizing.

Let's say you think you can squeeze 5 clock cycles per loop out of this code with a rewrite. Before you optimize it, use a timer (or an oscilloscope watching an output pulse on an I/O pin, or a stopwatch) to run the code as-is, and then time the following code to see if there is a speed difference:

  for(int i = 0; i < 1000; i++)
  { x[i] += y[i];
    asm("nop");
    asm("nop");
    asm("nop");
    asm("nop");
    asm("nop");
  }

This inserts 5 NOP do-nothing instructions that are executed every time the loop is executed. Assuming a NOP instruction takes 1 clock, it slows the loop down by 5 clock cycles.

If the whole program with the NOPs in the loop runs 10% slower, then you know that optimizing the loop to save 5 clocks will likely cause the whole program to run about 10% faster. (The slowdown and speedup aren't quite the same percent if you think about the math, but for small percents it's close enough to not worry about this.)

If on the other hand adding a bunch of NOPs makes no difference to the overall program speed, then you are wasting your time optimizing that loop. If adding instructions doesn't make things noticeably slower, then removing them won't make things faster.


Monday, August 29, 2011

Compile-Time Constants For I/O conversion

In several design reviews I've run across code that looks something like this.

unsigned int VoltLimitMin =  122;   /* 3.30 Volts */
unsigned int VoltLimitMax = 179;   /* 5.10 Volts */
unsigned int VoltLimitTrip =  205;   /* 5.55 Volts */

The purpose of this code is to set min, max, and emergency shutdown thresholds for voltage coming from some A/D converter. The programmer has manually converted the voltage value to the number of A/D integer steps that corresponds to.  (Similar code might be used to express time in terms of timer ticks as well.) Do you see the bug in the above code?  Would you have noticed if I hadn't told you a bug was there? 

Here is a way to save some trouble and get rid of that type of bug. First, figure out how many steps there are per unit that you care about, such as:
// A/D converter has 37 steps per Volt ( 37 = 1 Volt, 74 = 2 Volts, etc.)
#define UNITS_PER_VOLT      37   

then, create a macro that does the conversion such as:
#define VOLTS(v)   ((unsigned int) (((v)*UNITS_PER_VOLT)+.5))

Here is an example to explain how that works.  If you say VOLTS(3.30) then "v" in the macro is 3.30.  It gets multiplied by 37 to give 122.1.   Adding 0.5 lets it round to nearest positive integer, and the "(int)" ensures that the compiler knows you want it to be an unsigned integer result instead of a floating point number.

Then you can use:


unsigned int VoltLimitMin =  VOLTS(3.30);
unsigned int VoltLimitMax = VOLTS(5.10);
unsigned int VoltLimitTrip =  VOLTS(5.55);

Because the macros are expanded in-line by the preprocessor, most compilers are able to compile exactly the same code as if you had hand-computed the number (i.e., they compile to a constant integer value). So this macro shouldn't cost you anything at run time, but will remove the risk of for hand computation bugs or someone forgetting to update comments if the integer value is changed. Give it a try with your favorite compiler and see how it works.


Notes:
  • The rounding trick of adding 0.5 only works with non-negative numbers.Usually A/D converters output non-negative integers so this trick usually works.
  • This may not work on all compilers, but it works on all the ones I've tried on various microcontroller architectures. I saw one compiler do the float-to-int conversion at run-time if I left out the "(unsigned int)" in the macro, so make sure you put it in. Do a disassembly on your code to make sure it is working for you.
  • The "const" keyword available in some compilers can optimize things even further and possibly avoid the need for a macro if your compiler is smart enough, but I'll leave that up to you to play with if you use this trick in a real program.
  • As mentioned by one of the comments, you might also check for overflow with an ASSERT.


Monday, August 1, 2011

Proper use of .h and .c files

I recently worked with some embedded system teams who were struggling with the best way to use .c and .h files for their source code. As I was doing this, I remembered that back when I was learning how to program that it took me quite a while to figure all this out too!  So, here are some guidelines on a reasonable way to use .c and .h files for organizing your source code. They are listed as a set of rules, but it helps to apply common sense too:

  • Use multiple .c files, not just one "main.c" file. Every .c file should have a set of variables and functions that are tightly related to each other, and only loosely related to the other .c files.  "Main.c" should only have the main loop in it.
  • .C files allocate storage and define executable code. Only .c files have a function defined in them.  Only .c files allocate storage for variables
  • .H files define external storage and define function prototypes The .h files give other modules the information they need to work with a particular .c file, but don't actually define storage and don't actually define code.  The keyword "extern" should generally be showing up in .h files only.
  • Every .c file should have a corresponding .h file. The .h file provides the external interface information to other modules for using the corresponding .c file.

Here is an example of how this works. Let's say you have files:  main.c    adc.c   output.c  process.c  watchdog.c

main.c would have the main loop that polls the A/D converter, processes values, sends outputs, and pets the watchdog timer.  This would look like a sequence of consecutive subroutine calls within an infinite loop.  There would be a corresponding main.h that might have global definitions in it  (you can also have "globals.h" although I prefer not to do that myself).   Main.c would #include  main.h, adc.h, output.h, process.h and watchdog.h because it needs to call functions from all the corresponding .c files.

adc.c would have the A/D converter code and functions to poll the A/D and store most recent A/D values in a data structure.  The corresponding adc.h would have extern declarations and function prototypes for any other routine that uses A/D calls (for example, a call to look up a recent A/D value from polling).  Adc.c would #include adc.h and perhaps nothing else.

output.c would take values and send them to outputs.  The corresponding output.h would have information for calling output functions.  Output.c would probably just #include output.h.  (Whether main.c actually calls something in output.c depends on your particular code structure, but I'm assuming that outputs have two steps: process.c queues outputs, and the main.c call actually sends them.)

process.c would take A/D values, compute on them, and queue results for output. It might need to call functions in adc.c to get recent values, and functions in output.c to send results out. For that reason it would #include process.h, adc.h, output.h

watchdog.c would set up and service the watchdog timer. It would #include watchdog.h


One wrinkle is that some compilers can only optimize in a single .c file (e.g., only do "inline" within a single file). This is no reason to put everything in a single .c file!  Instead, you can just #include all the other .c files from within main.c. (You might have to make sure you include each .h file once depending on what's in them, but often that isn't necessary.) You should avoid #including a .c file from within another .c file unless there is a compelling reason to do so such as getting your optimizer to actually work.

This is only intended to convey the basics. There are many hairs to split depending on your situation, but if you follow the above guidelines you're off to a good start.

NOTE: Michael Barr published another, compatible, take on this topic in May.  I just saw it at: http://www.eetimes.com/discussion/barr-code/4215934/What-belongs-in-a-header-file

Friday, July 1, 2011

The Grand Challenge of Embedded System Dependability

The following is the extended version of a position statement I wrote for a panel session on Grand Challenges in Dependability for DSN 2011 in Hong Kong. Even if you are not a researcher in this area, it might provide some food for thought about the big picture, especially for embedded system security and safety.
You can find a printable version here.

-------------------------------------------

The Grand Challenge of Embedded System Dependability

Philip Koopman
ECE Department
Carnegie Mellon University
Pittsburgh, PA, United States
koopman@cmu.edu

Abstract: Four significant challenges in embedded system dependability are: embedded-specific security approaches, unifying security with safety, dealing with composable emergent properties, and enabling domain experts to use advanced dependability techniques.

Embedded systems permeate our everyday lives, including applications as diverse as cars, consumer electronics, thermostats, and industrial process controls. We have a surprising amount of reliance upon these systems, and we take their dependability almost for granted. Given extreme cost constraints, tremendous deployment scales, and the wide range of application domains, it is amazing that things more or less work well today. But, as application complexity increases, more applications become safety critical, and more embedded systems are attached to the Internet, we cannot expect business as usual with design approaches to maintain the level of dependability we want and need from such systems.

In my opinion the biggest challenges facing embedded systems lie in the areas of creating more suitable security techniques, finding a more unified approach to safety+security, dealing with composable emergent properties, and deploying dependability techniques to small product development teams.

Deeply embedded system security has significantly different constraints and requirements than enterprise and personal computing security. Embedded control systems often have severe resource constraints, limited development budgets, and stringent real time performance requirements. But an even more pressing security problem in many embedded systems is that the effects of a malicious fault can cause physical damage to people and the environment. It is less difficult to reverse or adequately insure against most malicious financial transactions than it is to reverse the release of toxins into the environment or “roll back” a multi-vehicle collision. Additionally, most embedded systems to date have been designed with near-zero security once an attacker has access to the internal control network. IT-based techniques for addressing that situation are unlikely to suffice due to matters of cost, real time dynamics, and lack of complete physical isolation from attackers.

Inevitably, embedded system safety and security will have to merge into a unified discipline, or at least a tightly-coordinated set of sub-disciplines. It is questionable to build safety cases for most everyday systems upon a faulty presumption of perfect security. At the same time, security techniques will need to take into account the safety implications of vulnerabilities and system outages. One element of a safety and security unification strategy might be to look at security faults as an attack on the assumptions of the safety case (e.g., an attacker negates the random independence assumption of fault arrival rates).

Due to the limitations and realities of embedded system development, workable dependability approaches will likely include some notion of cost-effective resilience in the face of inevitable faults, as well as a way to balance the tension among the often conflicting goals of safety, security, performance, and reliability. The good news will be that there are opportunities to exploit domain characteristics such as physical process inertia in ways not practical in desktop and enterprise computing.

A long-standing problem has been increasing the composability of emergent system properties. It is desirable to have building blocks that can be composed arbitrarily without surprises, and by the same token have an ability to decompose a system architecture so that predictable building blocks can be identified in a way that minimizes cross-coupled quality attributes. Much progress has been made on this in the area of real time systems, but much remains to be done in other areas such as safety and security. An additional challenge will be ensuring the composability of massively deployed distributed systems so that, for example, a city full of smart thermostats doesn’t display emergent aggregate behavior that takes down the power grid.

Finally, the serious challenges posed by creating a dependable system are made more difficult when the development teams are typically composed of a handful of domain experts who may have no formal computer training beyond an introductory programming course. The traditional way to deploy advanced knowledge is via synthesis and analysis tools, and this has been done with astonishing success in IC design. More recently, model-based design has been helping embedded system designers in some domains perform code synthesis from relatively high level system behavioral descriptions. But will be a long time before tools can provide us with push-button automation that addresses the myriad aspects of embedded system dependability.
-------------------------------------------