Monday, August 28, 2017

The Spaghetti Factor -- A Software Complexity Metric Proposal


I've had to review code that has spaghetti-level complexity in control flow (too high cyclomatic complexity).  And I've had to review code that has spaghetti-level complexity its data flow (too many global variables mixed together into a single computation).  And I've had to review procedures that just go on for page after page with no end in sight. But the stuff that will really make your brain hurt is code that has all of these problems.

There are many complexity metrics out there. But I haven't seen a one that directly tries to help balance three key points of complexity into a single intuitive number: code complexity, data complexity, and module size. So here is a proposal that could help drive improvement in a lot of the terrible embedded control code I've seen:



The Spaghetti Factor metric (SF):

SF = SCC + (Globals*5) + (SLOC/20)

SCC = Strict Cyclomatic Complexity
Globals = # of read/write global variables
SLOC = # source lines of non-comment code (e.g., C statements)

Scoring:
5-10 - This is the sweet spot for most code except simple helper functions
15 - Don't go above this for most modules
20 - Look closely; review to see if refactoring makes sense
30 - Refactor the design
50 - Untestable; throw the module away and fix the design
75 - Unmaintainable; throw the module away; throw the design away; start over
100 - Nightmare; probably you need to throw the whole subsystem away and re-architect it



Notation:

SCC is Strict Cyclomatic Complexity (sometimes called CC2).  This is a variant of McCabe Cyclomatic complexity (MCC). In general terms, MCC is based on the number of branches in the program. SCC additionally considers complexity based on the number of conditions within each branch. SCC is an approximation of how many test cases it takes to exercise all the paths through code including all the different ways there are to trigger each branch. In other words, it is an estimate of how much work it is to do unit testing. Think of it as an approximation to the effort required for MC/DC testing. But in practice it is also a measure of how hard it is to understand the code.  The idea is to keep SCC low enough that it is feasible to understand and test paths through the code.

Globals is the number of read/write global variables accessed by the module. This does not include "const" values, nor file static variables.  In an ideal world you have zero or near-zero global variables. If you have inherent global state, you should encapsulated that in a state object with appropriate access functions to enforce well-disciplined writes.  Referencing an unstructured pile of dozens or hundreds of global variables can make software difficult to test, and can make subsystem testing almost impossible. Partly that is due to the test scaffolding required, but partly that is simply due to the effort of chasing down all the globals and trying to figure out what they do both inbound and outbound. Moreover, too many globals can make it nearly impossible to chase down bugs or understand the effects of changing one part of the code on the rest of the code. An important goal of this part of the metric is to discourage use of many disjoint global variables to implicitly pass data around from routine to routine instead of passing parameters along with function calls.

SLOC is the number of non-comment "Source Lines of Code."  For C programs, this is the number of programming statements. Typical guidelines are a maximum 100-225 maximum lines of code for a module, with most modules being smaller than that.

As an example calculation, if you have 100 lines of code with an SCC of 9 and 1 global reference, your score will be  SF = 9 + (1*5) + (100/20) = 19.  A score of 19 is on the upper edge of being OK. If you have a distribution of complexity across modules, you'd want most of them to be a bit lower in complexity than this example calculation.

Discussion:

The guideline values are taken primarily from MCC, which typically has a guideline of 10 for most modules, 15 as a usual bound, and 30 as limit.  To account for globals and length, based on my experience, I've changed that to 15 for most modules, 20 as a soft limit and 30 as a hard limit.  You might wish to adjust the threshold and multipliers based on your system and experience. In particular it is easy to make a case that these limits aren't strict enough for life-critical software, and a case can be made for being a little more relaxed in throw-away GUI management code.  But I think this is a good starting point for most every-day embedded software that is written by a human (as opposed to auto-generated code).

The biggest exception is usually what to do about switch statements.  If you exempt them you can end up with multiple switches in one module, or multiple switch/if/switch/if layered nesting.  (Neither is a pretty sight.) I think it is justifiable to exempt modules that have ONLY a switch and conditional logic to do sanity checking on the switch value.  But, because 30 is a pretty generous limit, you're only going to see this rarely. Generally the only legitimate reason to have a switch bigger than that is for something like processing a message type for a communication protocol.  So I think you should not blanket exempt switch statements, but rather include them in an overall case-by-case sign-off by engineering management as to which few exceptions are justifiable.

Some might make the observation that this metric discourages extensive error checking.  That's a different topic, and certainly the intent is NOT to discourage error checking. But the simple answer is that error checking has to be tested and understood, so you can't simply ignore that part of the complexity. One way to handle that situation is to put error checking into a subroutine or wrapper function to get that complexity out of the way, then have that wrapper call the actual function that does the work.  Another way is to break your overall code down into smaller pieces so that each piece is simple enough for you to understand and test both the functionality and the error checking.

Finally, any metric can be gamed, and that is surely true of simple metrics like this one.  A good metric score doesn't necessarily mean your code is fantastic. Additionally, this metric does not consider everything that's important, such as the total number of globals across your code base. On the other hand, if you score poorly on this metric, most likely your code is in need of improvement.

What I recommend is that you use this metric as a way to identify code that is needlessly complex.  It is the rare piece of code indeed that unavoidably needs to have a high score on this complexity metric. And if all your code has a good score, that means it should be that much easier to do peer review and unit testing to ensure that other aspects of the code are in good shape.

References:

A NIST paper on applying metrics is here: http://www.mccabe.com/pdf/mccabe-nist235r.pdf including an interesting discussion of the pitfalls of handling switch statements within a complexity framework.

Static Analysis Ranked Defect List

  Crazy idea of the day: Static Analysis Ranked Defect List. Here is a software analysis tool feature request/product idea: So many times we...