Monday, February 25, 2013

Using Profiling To Improve System-Level Test Coverage

Sometimes I run into confusion about how to deal with "coverage" during system level testing. Usually this happens in a context where system level testing is the only testing that's being done, and coverage is thought about in terms of lines of code executed.  Let me explain those concepts, and then give some ideas about what to do when you find yourself in that situation.

"Coverage" is the fraction of the system you've exercised during testing. For unit tests of small chunks of code, it is typically what fraction of the code was exercised during testing (e.g., 95% means in a 100 line program 95 lines were executed at least once and 5 weren't executed at all).  You'd think it would be easy to get 100%, but in fact getting the last few lines of code in test is really difficult even if you are just testing a single subroutine in isolation (but that's another topic). Let's call this basic block coverage, where a basic block is a segment of code with no branches in or out, so if you execute one line of a basic block's code you have to execute all of them as a set. So in a perfect world you get 100% basic block coverage, meaning every chunk of code in the system has been executed at least once during testing. (That's not likely to be "perfect" testing, but if you have less than 100% basic block coverage you know you have holes in your test plan.)

By system level test I mean that the testing involves more or less a completely running full system. It is common to see this as the only level of testing, especially when complex I/O and timing constraints make it painful to exercise the software any other way. Often system level tests are based purely on the functional specification and not on the way the software is constructed. For example, it is the rare set of system level tests that checks to make sure the watchdog timer operates properly.  Or whether the watchdog is even turned on. But I digress...

The problem comes when you ask the question of what basic block coverage is achieved by a certain set of system-level tests. The question is usually posed less precisely, in the form "how good is our testing?"  The answer is usually that it's pretty low basic block coverage. That's because if it is difficult to reach into the corner cases when testing a single subroutine, it's almost impossible to reach all the dusty corners of the code in a system level test.  Testers think they're good at getting coverage with system level test (I know I used to think that myself), but I have a lot of doubt that coverage from system-level testing is high. I know -- your system is different, and it's just my opinion that your system level test coverage is poor. But consider it a challenge. If you care about how good your testing is for an embedded product that has to work (even the obscure corner cases), then you need data to really know how well you're doing.

If you don't have a fancy test coverage tool available to you, a reasonable way to go about getting coverage data is to use some sort of profiler to see if you're actually hitting all the basic blocks. Normally a profiler helps you know what code is executed the most for performance optimization. But in this case what you want to do is use the profiler while you're running system tests to see what code is *not* executed at all.  That's where your corner cases and imperfect coverage are.  I'm not going to go into profiler technology in detail, but you are likely to have one of two types of profiler. Maybe your profiler inserts instructions into every basic block and counts up executions.  If so, you're doing great and if the count is zero for any basic block you know it hasn't been tested (that would be how a code coverage tool is likely to work). But more likely your profiler uses a timer tick to sample where the program counter is periodically.  That makes it easy to see what pieces of code are executed a lot (which is the point of profiling for optimization), but almost impossible to know whether a particular basic block was executed zero times or one time during a multi-hour test suite.

If your profiler only samples, you'll be left with a set of basic blocks that haven't been seen to execute. If you want to go another step further you may need to put your own counters (or just flags) into those basic blocks by hand and then run your tests to see if they ever get executed. But at some point that may be hard too. Running the test suite a few times may help catch the moderately rare pieces of code that are executed. So may increasing your profile sample rate if you can do that. But there is no silver bullet here -- except getting a code coverage tool if one is available for your system.  (And even then the tool will likely affect system timing, so it's not a perfect solution.)

At any rate, after profiling and some hand analysis you'll be left with some level of idea of what's getting tested and what's not.  Try not to be shocked if it's a low fraction of the code (and be very pleased with yourself it if is above 90%).  

If you want to improve the basic block coverage number you can use coverage information to figure out what the untested code does, and add some tests to help. These tests are likely to be more motivated by how the software is built rather than by the product-level function specification.  But that's why you might want both a software test plan in addition to a product test plan.  Product tests never cover all the software corner cases.

Even with expanded testing, at some point it's going to be really really hard to exercise every last line of code -- or even every last subroutine. For those pieces you can test portions of the software to make it easier to get into the corner cases. Peer reviews are another alternative that is more cost effective than testing if you have limited resources and limited time to do risk reduction before shipping. (But I'm talking about formal peer reviews, not just asking your next cube neighbor to look over your code at lunch time.) Or there is the tried-and-true strategy of putting a breakpoint in before a hard-to-get-to basic block with a debugger, and then changing variables to force the software down the path you want it to go. (Whether such a test is actually useful depends upon the skill of the person doing it.)

The profiler-based coverage improvement strategy I've discussed is really about how to get yourself out of a hole if all you have is system tests and you're getting pressure to ship the system. It's better than just shipping blind and finding out later that you had poor test coverage. But the best way to handle coverage is to get very high basic block coverage via unit test, then shake out higher level problems with integration test and system test.

If you have experience using these ideas I'd love to hear about them -- let me know.

Static Analysis Ranked Defect List

  Crazy idea of the day: Static Analysis Ranked Defect List. Here is a software analysis tool feature request/product idea: So many times we...