Better Embedded System SW: optimization

Showing posts with label optimization. Show all posts

Monday, May 8, 2017

Optimize for V&V, not for writing code

Writing code should be made more difficult so that Verification &Validation can be made easier.

I first heard this notion years ago at a workshop in which several folks from industry who build high assurance software (think flight controls) stood up and said that V&V is what matters. You might expect that from flight control folks, but their reasoning applies to pretty much every embedded project. That's because it is a matter of economics.

Multiple speakers at that workshop said that aviation software can require 4 or 5 hours of V&V for every 1 hour of creating software. It makes no economic sense to make life easy for the 1 hour side of the ratio at the expense of making life painful for the 5 hour side of the ratio.

Good, but non-life-critical, embedded software requires about 2 hours of V&V for every 1 hour of code creation. So the economic argument still holds, with a still-compelling multiplier of 2:1. I don't care if you're Vee, Agile, hybrid model or whatever. You're spending time on V&V, including at least some activities such as peer review, unit test, created automated tests, performing testing, chasing down bugs, and so on. For embedded products that aren't flaky, probably you spend more time on V&V than you do on creating the code. If you're doing TDD you're taking an approach that has the idea of starting with a testing viewpoint built in already, by starting from testing and working outward from there. But that's not the only way to benefit from this observation.

The good news is that making code writing "difficult" does not involve gratuitous pain. Rather, it involves being smart and a bit disciplined so that the code you produce is easier for others to perform V&V on. A bit of up front thought and organization can save a lot on downstream effort. Some examples include:

Writing concise but helpful code comments so that reviewers can understand what you meant.
Writing code to be obvious rather than clever, again to help reviewers.
Follow a style guide to make your code consistent, and thus easier to understand.
Writing code that compiles clean for static analysis, avoiding time wasted finding defects in test that a tool could have found, and avoiding a person having to puzzle out which warnings matter, and which don't.
Spending some time to make your unit interfaces easier to test, even if it requires a bit more work designing and coding the unit.
Spending time making it easy to trace between your design and the code. For example, if you have a statechart, make sure the statechart uses names that map directly to enum names rather than using arbitrary state variables such as "magic number" integers between 1 and 7. This makes it easier to ensure that the code and design match. (For that matter, just using statecharts to provide a guide to what the code does also helps.)
Spending time up front documenting module interaction so that integration testers don't have to puzzle out how things are supposed to work together. Sequence diagrams can help a lot.
Making the requirements both testable and easy to trace. Make every requirement idea a stand-alone sentence or paragraph and give it a number so it's easy to trace to a specific test primarily designed to test that particular requirement. Avoid having requirements in huge paragraphs of free-form text that mix lots of different concepts together.

Sure, these sound like a good idea, but many developers skip or skimp on them because they don't think they can afford the time. They don't have time to make their code clean because they're too busy writing bugs to meet a deadline. Then they, and everyone else, pay for this during the test cycle. (I'm not saying the programmers are necessarily the main culprits here, especially if they didn't get a vote on their deadline. But that doesn't change the outcome.)

I'm here to say you can't afford not to follow these basic code quality practices. That's because every hour you're saving by cutting corners up front is probably costing you double (or more) downstream by making V&V more painful than it should be. It's always hard to invest in downstream benefits when the pressure is on, but doing so is costing you dearly when you skimp on code quality.

Do you have any tricks to make code easier to understand that I missed?

Monday, December 19, 2011

Using the Inline Keyword Instead of a Macro

Really good code usually consists of a large number of relatively small subroutines (or methods) that can be composed as building blocks in many different ways. But when I review production embedded software I often see the use of fewer, bigger, and less composable subroutines.This makes the code more bug-prone and often makes it more difficult to test as well.

The reason often given for not using small subroutines is runtime execution cost. Doing a subroutine call to perform a small function can slow down a program significantly if it is done all the time. One of my soapboxes is that you should almost always buy a bigger CPU rather than make software more complex -- but for now I'm going to assume that it is really important that you minimize execution time.

Here's a toy example to illustrate the point. Consider a saturating increment, that will add one to a value, but will make sure that the value doesn't exceed the maximum positive value for an unsigned integer:

  int SaturatingIncrement(int x)

  { if (x != MAXINT)

    { x++;

}

    return(x);

}

So you might have some code that looks like this:

...

  x = SaturatingIncrement(x);

...

  z = SaturatingIncrement(z);

You might find that if you do a lot of saturating increments your code runs slowly. Usually when this happens I see one of two solutions. Some folks just paste the actual code in like this:

...

  if (x != MAXINT)  { x++; }

...

  if (z != MAXINT)  { z++; }

A big problem with this is that if you find a bug, you get to to track down all the places where the code shows up and fix the bug. Also, code reviews are harder because at each point you have to ask whether or not it is the same as the other places or if there has been some slight change. Finally, testing can be difficult because now you have to test MAXINT for every variable to get complete coverage of all the branch paths.

A slightly better solution is to use a macro:

#define SaturatingIncrement(w)  { if ((w) != MAXINT)  { (w)++; } }

which lets you go back to more or less the original code. This macro works by pasting the text in place of the macro. So the source you write is:

...

  SaturatingIncrement(x);

...

  SaturatingIncrement(z);

but the preprocessor uses the macro to feed into the compiler this code:

...

  if (x != MAXINT)  { x++; }

...

  if (z != MAXINT)  { z++; }

thus eliminating the subroutine call overhead.

The nice things about a macro are that if there is a bug you only have to fix it one place, and it is much more clear what you are trying to do when there is code review. However, complex macros can be cumbersome and there can be arcane bugs with macros. (For example, do you know why I put "(w)" in the macro definition instead of just "w"?) Arguably you can unit test a macro by invoking it, but that test may well miss strange macro expansion bugs.

The good news is that in most newer C compilers there is a better way. Instead of using a macro, just use a subroutine definition with the "inline" keyword.

  inline int SaturatingIncrement(int x)

  { if (x != MAXINT)

    { x++; }

    return(x);

}

The inline keyword tells the compiler to expand the code definition in-line with the calling function as a macro would do. But instead of doing textual substitution with a preprocessor, the in-lining is done by the compiler itself. So you can write your code using as many inline subroutines as you like without paying any run-time speed penalty. Additionally, the compiler can do type checking and other analysis to help you find bugs that can't be done with macros.

There can be a few quirks to inline. Some compilers will only inline up to a certain number of lines of code (there may be a compiler switch to set this). Some compilers will only inline functions defined in the same .c file (so you may have to #include that .c file to be able to inline it). Some compilers may have a flag to force inlining rather than just making that keyword a suggestion to the compiler. To be sure inline is really working you'll need to check the assembly language output of your compiler. But, overall, you should use inline instead of macros whenever you can, which should be most of the time.

Friday, September 30, 2011

The NOP Trick For Speed Profiling

An important principle in code optimization is speed up the parts that matter. One of the interpretations of Amdahl's Law is that speedup is limited by the fraction of total time taken by a particular piece of code. If for example a particular snippet of code is 10% of overall execution time, then making it execute infinitely fast is only going to give you a 10% speedup. 10% might be worthwhile, but if instead a snippet of code only takes 0.001% of total execution time, then it is unlikely spending time optimizing it is worth your while. In other words, there is no point wasting hours optimizing parts of the code that won't make any difference to overall execution speed.

But, how do you know what parts of the code matter? Intuition can help sometimes, but in practice it is hard to know your code well enough that you always guess the bottleneck locations correctly.

If you have a tool-rich development environment you can use a profiling tool (wikipedia has a description and a list of common tools). But sometimes you are on a platform that is too small to use these tools, or you're in too much of a hurry to use them because you are just absolutely sure you are right about where all the time is being spent.

Here is a trick that can save you a lot of time doing optimization. I use it pretty much every time I'm going to optimize code as a sanity check (even if a tool tells me where to optimize, because tools aren't perfect and I've learned to be careful before investing valuable time optimizing). The trick is: insert some NOPs in the code you want to optimize and see how much slower it goes. If you don't see a speed difference, then you are wasting your time optimizing that part of the code to make it go faster. If you do see a speed difference, that difference can help you estimate how much speedup you are going to get if you do the optimization.

Here's some example C code before optimization:

  for(int i = 0; i < 1000; i++) 

  { x[i] += y[i];

  }

Depending on your compiler it might well be that you can optimize this by switching to pointers instead of indexed arrays. Assuming that you've decided it is worth the trouble to optimize your code instead of buying a faster processor (that's a whole other discussion!) then probably this code can be made faster. But, if this code is only called once in a while and is only a small part of your overall program it might not give enough speedup to be worth optimizing.

Let's say you think you can squeeze 5 clock cycles per loop out of this code with a rewrite. Before you optimize it, use a timer (or an oscilloscope watching an output pulse on an I/O pin, or a stopwatch) to run the code as-is, and then time the following code to see if there is a speed difference:

  for(int i = 0; i < 1000; i++) 

  { x[i] += y[i];

    asm("nop");

    asm("nop");

    asm("nop");

    asm("nop");

    asm("nop");

  }

This inserts 5 NOP do-nothing instructions that are executed every time the loop is executed. Assuming a NOP instruction takes 1 clock, it slows the loop down by 5 clock cycles.

If the whole program with the NOPs in the loop runs 10% slower, then you know that optimizing the loop to save 5 clocks will likely cause the whole program to run about 10% faster. (The slowdown and speedup aren't quite the same percent if you think about the math, but for small percents it's close enough to not worry about this.)

If on the other hand adding a bunch of NOPs makes no difference to the overall program speed, then you are wasting your time optimizing that loop. If adding instructions doesn't make things noticeably slower, then removing them won't make things faster.

Monday, August 29, 2011

Compile-Time Constants For I/O conversion

In several design reviews I've run across code that looks something like this.

unsigned int VoltLimitMin = 122;   /* 3.30 Volts */
unsigned int VoltLimitMax = 179; /* 5.10 Volts */
unsigned int VoltLimitTrip = 205;   /* 5.55 Volts */

The purpose of this code is to set min, max, and emergency shutdown thresholds for voltage coming from some A/D converter. The programmer has manually converted the voltage value to the number of A/D integer steps that corresponds to. (Similar code might be used to express time in terms of timer ticks as well.) Do you see the bug in the above code? Would you have noticed if I hadn't told you a bug was there?

Here is a way to save some trouble and get rid of that type of bug. First, figure out how many steps there are per unit that you care about, such as:
// A/D converter has 37 steps per Volt ( 37 = 1 Volt, 74 = 2 Volts, etc.)
#define UNITS_PER_VOLT      37

then, create a macro that does the conversion such as:
#define VOLTS(v)   ((unsigned int) (((v)*UNITS_PER_VOLT)+.5))

Here is an example to explain how that works. If you say VOLTS(3.30) then "v" in the macro is 3.30. It gets multiplied by 37 to give 122.1.   Adding 0.5 lets it round to nearest positive integer, and the "(int)" ensures that the compiler knows you want it to be an unsigned integer result instead of a floating point number.

Then you can use:

unsigned int VoltLimitMin = VOLTS(3.30);
unsigned int VoltLimitMax = VOLTS(5.10);
unsigned int VoltLimitTrip = VOLTS(5.55);

Because the macros are expanded in-line by the preprocessor, most compilers are able to compile exactly the same code as if you had hand-computed the number (i.e., they compile to a constant integer value). So this macro shouldn't cost you anything at run time, but will remove the risk of for hand computation bugs or someone forgetting to update comments if the integer value is changed. Give it a try with your favorite compiler and see how it works.

Notes:

The rounding trick of adding 0.5 only works with non-negative numbers.Usually A/D converters output non-negative integers so this trick usually works.
This may not work on all compilers, but it works on all the ones I've tried on various microcontroller architectures. I saw one compiler do the float-to-int conversion at run-time if I left out the "(unsigned int)" in the macro, so make sure you put it in. Do a disassembly on your code to make sure it is working for you.
The "const" keyword available in some compilers can optimize things even further and possibly avoid the need for a macro if your compiler is smart enough, but I'll leave that up to you to play with if you use this trick in a real program.
As mentioned by one of the comments, you might also check for overflow with an ASSERT.

Wednesday, November 3, 2010

Embedded Software Risk Areas -- Implementation -- Part 1

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are some of the Implementation red flags (part 1 of 2):

Inconsistent coding style

Coding style varies dramatically across the code base, and often there is no written coding style guideline. Code comments vary significantly in frequency, level of detail, and type of content. This makes it more difficult to understand and maintain the code.

Resources too full

Memory or CPU resources are overly full, leading to risk of missing real time deadlines and significantly increased development costs. An extreme example is zero bytes of program and data memory left over on a small processor. Significant developer time and energy can be spent squeezing software and data to fit, leaving less time to develop or refine functionality.

Too much assembly language

Assembly language is used extensively when an adequate high level language compiler is available. Sometimes this is due to lack of big enough hardware resources to execute compiled code. But more often it is due to developer preference, reuse of previous project code, or a need to economize on purchasing development tools. Assembly language software is usually more expensive to develop and more bug-prone than high level language code.

Too many global variables

Global variables are used instead of parameters for passing information among software modules. The result is often code that has poor modularity and is brittle to changes.

Monday, May 31, 2010

More than 80% full is too full

It is common for embedded systems to optimize hardware costs without really looking at the effect that has on software development costs. Many of us have spent a few hours searching for a trick that will save a handful of memory bytes. That only makes sense if you believe engineering is free. (It isn't.)

Most developers have a sense that 99%+ is too full for memory and CPU cycles. But it is much less clear where to draw the line. Is 95% too full? How about 90%?

As it turns out, there is very little guidance on this area. But performing some what-if analysis with some classic software cost data leads to a rather startling conclusion:

If your memory or CPU is more than 80% full
and you are making fewer than 1 million units
then you should get more memory and a faster CPU.

This may seem too conservative. But what is happening between about 60% full and 80% full is that software is gradually becoming more difficult to develop. In part that is because a lot of optimizations have to be added. And in part that is because there is limited room for run-time monitoring and debug support. You have to crunch the data to see the curves (chapter 18 of my book has the details). But this is where you end up.

This is a rule of thumb rather than an absolute cutoff. 81% isn't much different than 80%. Neither is 79%. But by the time you are getting up to 90% full, you are spending dramatically more on the overall product than you really should be. Chapter 18 of my book discusses the true cost of nearly full resources and explains where these numbers come from.
---

Better Embedded System SW