Monday, February 25, 2013

Using Profiling To Improve System-Level Test Coverage


Sometimes I run into confusion about how to deal with "coverage" during system level testing. Usually this happens in a context where system level testing is the only testing that's being done, and coverage is thought about in terms of lines of code executed.  Let me explain those concepts, and then give some ideas about what to do when you find yourself in that situation.

"Coverage" is the fraction of the system you've exercised during testing. For unit tests of small chunks of code, it is typically what fraction of the code was exercised during testing (e.g., 95% means in a 100 line program 95 lines were executed at least once and 5 weren't executed at all).  You'd think it would be easy to get 100%, but in fact getting the last few lines of code in test is really difficult even if you are just testing a single subroutine in isolation (but that's another topic). Let's call this basic block coverage, where a basic block is a segment of code with no branches in or out, so if you execute one line of a basic block's code you have to execute all of them as a set. So in a perfect world you get 100% basic block coverage, meaning every chunk of code in the system has been executed at least once during testing. (That's not likely to be "perfect" testing, but if you have less than 100% basic block coverage you know you have holes in your test plan.)

By system level test I mean testing that exercises a more or less complete, running system. It is common to see this as the only level of testing, especially when complex I/O and timing constraints make it painful to exercise the software any other way. Often system level tests are based purely on the functional specification and not on the way the software is constructed. For example, it is the rare set of system level tests that checks to make sure the watchdog timer operates properly.  Or whether the watchdog is even turned on. But I digress...

The problem comes when you ask the question of what basic block coverage is achieved by a certain set of system-level tests. The question is usually posed less precisely, in the form "how good is our testing?"  The answer is usually that it's pretty low basic block coverage. That's because if it is difficult to reach into the corner cases when testing a single subroutine, it's almost impossible to reach all the dusty corners of the code in a system level test.  Testers think they're good at getting coverage with system level test (I know I used to think that myself), but I have a lot of doubt that coverage from system-level testing is high. I know -- your system is different, and it's just my opinion that your system level test coverage is poor. But consider it a challenge. If you care about how good your testing is for an embedded product that has to work (even the obscure corner cases), then you need data to really know how well you're doing.

If you don't have a fancy test coverage tool available to you, a reasonable way to go about getting coverage data is to use some sort of profiler to see if you're actually hitting all the basic blocks. Normally a profiler helps you know what code is executed the most for performance optimization. But in this case what you want to do is use the profiler while you're running system tests to see what code is *not* executed at all.  That's where your corner cases and imperfect coverage are.  I'm not going to go into profiler technology in detail, but you are likely to have one of two types of profiler. Maybe your profiler inserts instructions into every basic block and counts up executions.  If so, you're doing great and if the count is zero for any basic block you know it hasn't been tested (that would be how a code coverage tool is likely to work). But more likely your profiler uses a timer tick to sample where the program counter is periodically.  That makes it easy to see what pieces of code are executed a lot (which is the point of profiling for optimization), but almost impossible to know whether a particular basic block was executed zero times or one time during a multi-hour test suite.

If your profiler only samples, you'll be left with a set of basic blocks that haven't been seen to execute. If you want to go a step further, you may need to put your own counters (or just flags) into those basic blocks by hand and then run your tests to see if they ever get executed. But at some point that gets hard too. Running the test suite a few times may help catch the moderately rare pieces of code that only execute occasionally. So may increasing your profile sample rate if you can do that. But there is no silver bullet here -- except getting a code coverage tool if one is available for your system.  (And even then the tool will likely affect system timing, so it's not a perfect solution.)
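
For example, here is a minimal hand-instrumentation sketch. The flag array, the COV_MARK macro, and the use of printf are all made up for illustration; on a real embedded target you would more likely dump the table over a serial port or inspect it with a debugger.

#include <stdio.h>

/* Hypothetical hand-rolled coverage flags for suspect basic blocks.        */
/* Put a COV_MARK() in each block you think isn't being exercised, run the  */
/* test suite, then dump the table to see which flags never got set.        */
#define NUM_COV_POINTS  3
static volatile unsigned char cov_flag[NUM_COV_POINTS];

#define COV_MARK(id)  (cov_flag[(id)] = 1)   /* drop one into each suspect block */

void cov_report(void)                        /* call after the test suite runs   */
{
  for (int i = 0; i < NUM_COV_POINTS; i++)
  { printf("coverage point %d: %s\n", i, cov_flag[i] ? "hit" : "NEVER EXECUTED"); }
}

/* Usage inside the code under test:
 *   if (rare_error_condition) {
 *     COV_MARK(2);              // flag this hard-to-reach basic block
 *     ... error handling ...
 *   }
 */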

At any rate, after profiling and some hand analysis you'll be left with some idea of what's getting tested and what's not.  Try not to be shocked if it's a low fraction of the code (and be very pleased with yourself if it is above 90%).

If you want to improve the basic block coverage number you can use coverage information to figure out what the untested code does, and add some tests to help. These tests are likely to be more motivated by how the software is built rather than by the product-level functional specification.  But that's why you might want a software test plan in addition to a product test plan.  Product tests never cover all the software corner cases.

Even with expanded testing, at some point it's going to be really, really hard to exercise every last line of code -- or even every last subroutine. For those pieces you can test portions of the software in isolation to make it easier to reach the corner cases. Peer reviews are another alternative that is more cost effective than testing if you have limited resources and limited time to do risk reduction before shipping. (But I'm talking about formal peer reviews, not just asking your next cube neighbor to look over your code at lunch time.) Or there is the tried-and-true strategy of using a debugger to put a breakpoint just before a hard-to-reach basic block, and then changing variables to force the software down the path you want it to go. (Whether such a test is actually useful depends upon the skill of the person doing it.)

The profiler-based coverage improvement strategy I've discussed is really about how to get yourself out of a hole if all you have is system tests and you're getting pressure to ship the system. It's better than just shipping blind and finding out later that you had poor test coverage. But the best way to handle coverage is to get very high basic block coverage via unit test, then shake out higher level problems with integration test and system test.

If you have experience using these ideas I'd love to hear about them -- let me know.


Friday, January 25, 2013

Exception Handling Fishbone Diagram

Exception handling is the soft underbelly of many software systems. A common observation is that there are a lot more ways for things to go wrong than there are for them to go right, so designing and testing for exceptional conditions is difficult to do well. Anecdotally, it is also where you find many bugs that, while infrequently encountered, can cause dramatic system failures. While not every software system has to be bullet-proof, embedded systems often have to be quite robust. And it's a lot easier to make a system robust if you have a checklist of things to consider when designing and testing.

Fortunately, just such a checklist already exists. Roy Maxion and Bob Olszewski at Carnegie Mellon created a structured list of exceptional conditions to consider when designing a robust system in the form of a fishbone diagram (click on the diagram to see the full detail in a new window).




(Source: Maxion & Olszewski, Improving Software Robustness with Dependability Cases, FTCS, June 1998.)

The way to read this diagram is that an exception failure could be caused by any of the general causes listed in the boxes at the ends of the fishbone segments, and the arrows feeding into each bone are more specific examples of those types of problems.

If you don't have the picture handy, a way to remember the main branches is that their first letters spell CHILDREN (a brief code sketch illustrating a few of these categories follows the list):
C - Computational problem
H - Hardware problem
I  - I/O and file problem
L - Library function problem
D - Data input problem
R - Return value problem
E - External user/client problem  (in embedded systems this may include control network exceptions)
N - Null pointer or memory problems
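
As a contrived illustration (my own example, not taken from the paper), here is roughly what guarding against a few of these categories can look like in C. The read_setpoint() routine and its limits are hypothetical.

#include <stdio.h>

int read_setpoint(const char *filename, long *setpoint_out)
{
  if (setpoint_out == NULL) { return -1; }      /* N: null pointer problem  */

  FILE *f = fopen(filename, "r");
  if (f == NULL) { return -1; }                 /* I: I/O and file problem  */

  long value;
  if (fscanf(f, "%ld", &value) != 1)            /* R: return value problem  */
  { fclose(f); return -1; }

  if ((value < 0) || (value > 1000))            /* D: data input problem    */
  { fclose(f); return -1; }

  *setpoint_out = value;
  fclose(f);
  return 0;                                     /* success */
}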

There isn't a silver bullet for exception handling -- getting it right takes attention to detail and careful work. But, this fishbone diagram does help developers avoid missing exception vulnerabilities. You can read more about the idea and the human subject experiments showing its effectiveness in the free on-line copy of their conference paper: Improving Software Robustness with Dependability Cases.

You can read more detail in the (non-free unless you have a subscription) journal paper:
Eliminating exception handling errors with dependability cases: a comparative, empirical study, IEEE Transactions on Software Engineering, Sept. 2000.  http://dx.doi.org/10.1109/32.877848



Tuesday, December 25, 2012

Global Variables Are Evil sample chapter

My publisher has authorized me to release a free sample chapter from my book.  They let me pick one, and I decided to go with the one on global variables. If this is a success, a couple more chapters may be released one way or another, so I'd welcome input on which other topics from the table of contents would be the best ones to release next.


Entire chapter in Adobe Acrobat (166 KB):
http://koopman.us/bess/chap19_globals.pdf



Chapter 19
Global Variables Are Evil

• Global variables are memory locations that are directly visible to an entire software system.
• The problem with using globals is that different parts of the software are coupled in ways that increase complexity and can lead to subtle bugs.
• Avoid globals whenever possible, and at most use only a handful of globals in any system.

Contents:

19.1 Chapter overview

19.2 Why are global variables evil?
Globals have system-wide visibility, and aren’t quite the same as static variables. Globals increase coupling between modules, increase risk of bugs due to interference between modules, and increase risk of data concurrency problems. Pointers to globals are even more evil.

19.3 Ways to avoid or reduce the risk of globals
If optimization forces you into using globals, you’re better off getting a faster processor. Use the heap or static locals instead of globals if possible. Hide globals inside objects if possible, and document any globals that can’t be eliminated or hidden.

19.4 Pitfalls

19.5 For more information



19.1. Overview

“Global Variables Are Evil” is a pretty strong statement! Especially for something that is so prevalent in older embedded system software. But, if you can build your system without using global variables, you’ll be a lot better off.

Global variables (called globals for short) can be accessed from any part of a software system, and have a globally visible scope. In its plainest form, a global variable is accessed directly by name rather than being passed as a parameter. By way of contrast, non-global variables can only be seen from a particular module or set of related methods. We’ll show how to recognize global variables in detail in the following sections.
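
For example (an illustrative fragment of my own, not taken from the chapter text, with made-up names):

int g_speed_limit;                          /* global: visible program-wide        */

void check_speed_global(int speed)
{ if (speed > g_speed_limit) { /* ... */ }  /* global accessed directly by name    */
}

void check_speed_param(int speed, int speed_limit)
{ if (speed > speed_limit) { /* ... */ }    /* same value passed as a parameter    */
}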

19.1.1. Importance of avoiding globals
The typical motivation for using global variables is efficiency. But sometimes globals are used simply because a programmer learned his trade with a non-scoped language (for example, classical BASIC doesn’t support variable scoping – all variables are globals). Programmers moving to scoped languages such as C need to learn new approaches to take full advantage of the scoping and parameter passing mechanisms available.

The main problem with using global variables is that they create implicit couplings among various pieces of the program (various routines might set or modify a variable, while several more routines might read it). Those couplings are not well represented in the software design, and are not explicitly represented in the implementation language. This type of opaque data coupling among modules results in difficult to find and hard to understand bugs.
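
For instance (a made-up two-module fragment, not from the chapter, showing the hidden coupling):

/* motor.c -- defines and writes the global */
unsigned int g_motor_speed;              /* nothing here says who else reads it     */

void set_motor_speed(unsigned int rpm)
{ g_motor_speed = rpm; }

/* display.c -- silently depends on motor.c */
extern unsigned int g_motor_speed;       /* the coupling shows up only as an extern */

void show_speed(void)
{ /* ... display g_motor_speed ...  a change to when or how motor.c updates it
       can break this code, and the compiler won't complain */
}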

19.1.2. Possible symptoms
You may have problems with global variable use if you observe any of the following when reviewing your implementation:

- More than a handful of variables are defined as globally visible (ideally, there are none). In C or C++, they are defined outside the scope of any procedure, so they are visible to all procedures. An indicator is the use of the keyword extern to access global variables defined outside the scope of your compiled module.

- Variables are used in a routine that are neither defined locally, nor passed as parameters. (This is just another way of saying they are global.)

- In assembly language, variables are accessed by label from subroutines or modules rather than via being passed on the stack as a parameter value. Any access to a labeled memory location (other than in a single module that has exclusive access to that location) is using a global.


The use of global variables is sometimes defensible. But their usage should be for a very few, special values, and not a matter of routine. Consider their occasional use a necessary evil.

19.1.3. Risks of using globals
The problem with global variables is that they make programs unnecessarily complex. That can lead to:

- Bugs caused by hidden coupling between modules. For example, misbehaving code in one place breaks things in another place.

- Bugs created in one module due to a change in a seemingly unrelated second module. For example, a change to a program that writes a global variable might break the behavior of code that reads that variable.

It may be necessary to have variables that are global, or at least serve a similar purpose as global variables. The risk comes from using global variables too freely, and not using mitigation strategies that are available.

19.2. Why are global variables evil?

Global variables should be avoided to the maximum extent possible. At a high level, you can think of global variables as analogous to GOTO statements in programming languages (this idea dates back at least as far as Wulf (1973)). Most of us reflexively avoid GOTO statements. It’s odd that we still think globals are OK.

First, let’s discuss what globals are, then talk about why they should be avoided.

... read more ...


(c) Copyright 2010 Philip Koopman

Sunday, December 16, 2012

Software Timing Loops

Once in a while I see someone using a timing loop to wait for some amount of time to go by. The code might look like this:

// You should *NOT* use this code as-is!
#define SCALE   15501    // System-dependent time scaling factor

void WaitMS(unsigned long ms)
{ unsigned long counter = ms * SCALE; // Compute how long to wait
  while(counter > 0) { counter--;}    // Waste time
}



The idea is that the designer tweaks "SCALE" so that WaitMS takes exactly one msec for each integer value of the input ms.  In other words WaitMS(5) waits for 5 msec. But, there are some significant problems with this code.
  • Different compiler versions and different optimization levels could dramatically change the timing of this loop depending upon the code that is generated. If you aren't careful, your system could stop working for this reason when you do a recompile, without you even knowing the timing has changed or why that has happened.
  •  Changes to hardware can change the timing even if the code is recompiled. Changing the system clock speed is a fairly obvious problem. But other more subtle problems include clock throttling on high-end processors due to thermal management, putting the code in memory with wait states for access, moving to a processor with instruction cache, or using a different CPU variant that has different instruction timings.
  • The timing will change based on interrupt service routines delaying execution of the while loop. You are unlikely to see the worst case timing disruption in testing unless you have a very deterministic system and/or really great testing.
  • This approach ties up the CPU, using power and preventing other tasks from running.
What I recommend is that you don't use software-based timing loops!  Instead you should change WaitMS() to look at a hardware timer, possibly going to sleep (or yielding to other tasks) until the amount of time desired has passed. In this approach, the inner loop checks a hardware timer or a time of day value set by a timer interrupt service routine.
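
If you have a periodic timer interrupt available, the tick-based version might look roughly like this. This is only a sketch: the g_tick_ms variable, the ISR hookup, and the 1 msec tick rate are assumptions for illustration, not details of any particular platform.

#include <stdint.h>

volatile uint32_t g_tick_ms = 0;    // incremented once per msec by the timer ISR

void TimerTickISR(void)             // hooked to a 1 kHz hardware timer interrupt
{ g_tick_ms++; }

void WaitMS(uint32_t ms)
{ uint32_t start = g_tick_ms;
  while ((uint32_t)(g_tick_ms - start) < ms)
  {
    // Optionally sleep or yield to other tasks here instead of spinning.
  }
}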


Sometimes there isn't a hardware timer or there is some other compelling reason to use a software loop. If that's the case, the following advice might prove helpful.
  • Make sure you put the software timing loop in one place like this instead of having lots of small in-line timing loops all over the place. It is hard enough to get one timing loop right!
  • The variable "counter" should be defined as "volatile" to make sure it is actually decremented each time through the loop. Otherwise the optimizer might decide to just eliminate the entire while loop.
  • You should calibrate the "SCALE" value somehow to make sure it is accurate. In some systems it makes sense to calibrate a variable during system startup, or do an internal sanity check during outgoing system test to make sure that it isn't too far from the right value. This is tricky to do well, but if you skip it you have no safety net in case the timing does change.

(There are no doubt minor problems that readers will point out as well depending upon style preferences.   As with all my code examples, I'm presenting a simple "how I usually see it" example to explain the concept.  If you can find style problems then probably you already know the point I'm making. Some examples:  WaitMS might check for overflow, uint32_t is better than "unsigned long," "const unsigned long SCALE = 15501L" is better if your compiler supports it, and there may be mixed-length math issues with the multiplication of ms * SCALE if they aren't both 32-bit unsigned integers.  But the above code is what I usually see in code reviews, and these style details tend to distract from the main point of the article.)

Monday, November 26, 2012

Top 10 Best Practices for Peer Reviews

Here is a nice list of best practices for peer reviews from SmartBear. These parallel the recommendations I usually give, but it is nice to have the longer version readily available too (see link below).

1. Review fewer than 200-400 lines of code at a time.
2. Aim for an inspection rate of less than 300-500 LOC/hr (But, see comment below)
3. Take enough time for a proper, slow review, but not more than 60-90 minutes
4. Authors should annotate source code before the review begins
5. Establish quantifiable goals for code review and capture metrics so you can improve your process
6. Checklists substantially improve results for both authors and reviewers
7. Verify that defects are actually fixed
8. Managers must foster a good code review culture in which finding defects is viewed positively
9. Beware the "Big Brother" effect  (don't use metrics to punish people)
10. The Ego Effect: do at least some code review, even if you don't have time to review it all

And now my comments:  The data I've seen shows 300-500 LOC/hr is too high by a factor of 2 or so. I recommend 100-200 lines of code per hour for 60-120 minutes. It may be that SmartBear's tool lets you go faster, but I believe that comes at a cost that exceeds the time saved.

I deleted their best practice #11, which says that lightweight reviews are great stuff, because I don't entirely buy it.  Everything I've seen shows that lightweight reviews (which they advocate) are better than no reviews, and for that reason perhaps they make a good first step. But if you skip the in-person review meeting you're losing out on a lot of potential benefit.  Your mileage may vary.

You can get the full white paper here: http://support.smartbear.com/resources/cc/11_Best_Practices_for_Peer_Code_Review.pdf
 (They are not compensating me for posting this, and I presume they don't mind the free publicity.)

Monday, June 4, 2012

Cool Reliability Calculation Tools

I ran across a cool set of tools for computing reliability properties, including reliability improvements due to redundancy, MTBF based on testing data, availability, spares provisioning, and all sorts of things.  The interfaces are simple but useful, and the tools are a lot easier than looking up the math and doing the calculations from scratch.  If you need to do anything with reliability it's worth a look:

http://reliabilityanalyticstoolkit.appspot.com

The one I like the most right now is the one that tells you how long to test to determine MTBF based on test data, even if you don't see any failures in testing:
http://reliabilityanalyticstoolkit.appspot.com/mtbf_test_calculator

Here is a nice rule of thumb based on the results of that tool. If you want to use testing to ensure that MTBF is at least some value X, you need to test about 3 times longer than X without ANY failures being observed. That's a lot of testing! If you do observe a failure, you have to test even longer to determine if it was an unlucky break or whether MTBF is smaller than it needs to be. (This rule of thumb assumes 95% confidence level and no testing failures observed -- as well as random independent failures. Play with the tool to evaluate other scenarios.)
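
If you want to see where the factor of three comes from, here is a back-of-the-envelope sketch using the standard zero-failure exponential test model (my own illustration; the tool itself may use a more general chi-squared calculation that also handles observed failures).

#include <stdio.h>
#include <math.h>

/* Zero-failure demonstration test: with an exponential failure model and no    */
/* failures observed, failure-free test time T demonstrates MTBF >= T/(-ln(1-C)) */
/* at confidence level C. So to demonstrate a target MTBF you need roughly       */
/* T = -ln(1-C) * MTBF of failure-free testing (about 3x the MTBF for C = 0.95). */
int main(void)
{
  double confidence = 0.95;                     /* 95% confidence level    */
  double target_mtbf_hours = 10000.0;           /* example target MTBF     */
  double multiplier = -log(1.0 - confidence);   /* about 3.0 for 95%       */

  printf("Test time multiplier: %.2f\n", multiplier);
  printf("Failure-free test hours needed: %.0f\n", multiplier * target_mtbf_hours);
  return 0;
}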



Tuesday, May 15, 2012

CRC and Checksum Tutorial

I published a tutorial on CRCs and Checksums on the blog I maintain on that topic.  Since I get a lot of hits on those search terms here, I'm providing a pointer for those who are interested:
http://checksumcrc.blogspot.com/2012/05/crc-and-checksum-tutorial-slides.html

