Monday, August 28, 2017

The Spaghetti Factor -- A Software Complexity Metric Proposal


I've had to review code that has spaghetti-level complexity in its control flow (cyclomatic complexity too high).  And I've had to review code that has spaghetti-level complexity in its data flow (too many global variables mixed together into a single computation).  And I've had to review procedures that just go on for page after page with no end in sight. But the stuff that will really make your brain hurt is code that has all of these problems at once.

There are many complexity metrics out there. But I haven't seen one that directly tries to balance three key sources of complexity in a single intuitive number: code complexity, data complexity, and module size. So here is a proposal that could help drive improvement in a lot of the terrible embedded control code I've seen:



The Spaghetti Factor metric (SF):

SF = SCC + (Globals*5) + (SLOC/20)

SCC = Strict Cyclomatic Complexity
Globals = # of read/write global variables
SLOC = # source lines of non-comment code (e.g., C statements)

Scoring:
5-10 - This is the sweet spot for most code except simple helper functions
15 - Don't go above this for most modules
20 - Look closely; review to see if refactoring makes sense
30 - Refactor the design
50 - Untestable; throw the module away and fix the design
75 - Unmaintainable; throw the module away; throw the design away; start over
100 - Nightmare; probably you need to throw the whole subsystem away and re-architect it



Notation:

SCC is Strict Cyclomatic Complexity (sometimes called CC2).  This is a variant of McCabe Cyclomatic Complexity (MCC). In general terms, MCC is based on the number of branches in the program. SCC additionally counts complexity based on the number of conditions within each branch. SCC is an approximation of how many test cases it takes to exercise all the paths through the code, including all the different ways there are to trigger each branch. In other words, it is an estimate of how much work it is to do unit testing. Think of it as an approximation to the effort required for MC/DC testing. But in practice it is also a measure of how hard it is to understand the code.  The idea is to keep SCC low enough that it is feasible to understand and test paths through the code.
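As a rough illustration of the difference (function and counts are mine, using the CC2 convention of also counting boolean operators within a condition):

```c
/* Illustrative only: one if-branch whose condition contains
   three atomic conditions joined by && and ||.
   MCC counts just the branch:              1 decision  -> MCC = 2
   SCC also counts each boolean operator:   3 decisions -> SCC = 4
   So MC/DC-style testing needs cases for each atomic condition,
   not just for the branch taken/not-taken. */
int in_range_or_zero(int x, int lo, int hi)
{
    if ((x >= lo && x <= hi) || x == 0) {
        return 1;
    }
    return 0;
}
```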

Globals is the number of read/write global variables accessed by the module. This does not include "const" values, nor file static variables.  In an ideal world you have zero or near-zero global variables. If you have inherent global state, you should encapsulate it in a state object with appropriate access functions to enforce well-disciplined writes.  Referencing an unstructured pile of dozens or hundreds of global variables can make software difficult to test, and can make subsystem testing almost impossible. Partly that is due to the test scaffolding required, but partly it is simply the effort of chasing down all the globals and trying to figure out what they do, both inbound and outbound. Moreover, too many globals can make it nearly impossible to chase down bugs or to understand the effects of changing one part of the code on the rest of the code. An important goal of this part of the metric is to discourage the use of many disjoint global variables to implicitly pass data around from routine to routine instead of passing parameters along with function calls.
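A minimal sketch of the state-object approach (all names here are hypothetical, not from any particular codebase): the state lives in one file-static struct, and every write goes through an accessor that can enforce discipline.

```c
/* Sketch: inherent global state encapsulated in one file-static
   struct, with access functions instead of scattered bare globals. */
typedef struct {
    int mode;
    int error_count;
} controller_state_t;

static controller_state_t g_state = { 0, 0 };  /* file static, not global */

int controller_get_mode(void)
{
    return g_state.mode;
}

void controller_set_mode(int new_mode)
{
    /* A single choke point for writes makes it easy to add range
       checks, assertions, or logging later. */
    if (new_mode >= 0 && new_mode <= 3) {
        g_state.mode = new_mode;
    } else {
        g_state.error_count++;   /* reject out-of-range writes */
    }
}

int controller_get_error_count(void)
{
    return g_state.error_count;
}
```

For metric purposes, a module that touches this state through the accessors is coupled to one well-defined interface rather than to a pile of independent globals.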

SLOC is the number of non-comment "Source Lines of Code."  For C programs, this is the number of programming statements. Typical guidelines set a maximum of 100-225 lines of code for a module, with most modules being smaller than that.

As an example calculation, if you have 100 lines of code with an SCC of 9 and 1 global reference, your score will be SF = 9 + (1*5) + (100/20) = 19.  A score of 19 is on the upper edge of being OK. If you have a distribution of complexity across modules, you'd want most of them to be a bit less complex than this example.
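The arithmetic above can be sketched as a tiny helper (the function name and signature are mine, not part of the metric definition):

```c
/* Hypothetical helper: compute the Spaghetti Factor from its three inputs.
   SF = SCC + (Globals * 5) + (SLOC / 20)
   Uses floating point so that SLOC/20 is not truncated. */
double spaghetti_factor(int scc, int globals, int sloc)
{
    return (double)scc + (globals * 5.0) + (sloc / 20.0);
}
```

For the worked example: spaghetti_factor(9, 1, 100) gives 19, right at the "look closely" edge of the scoring table.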

Discussion:

The guideline values are taken primarily from MCC, which typically has a guideline of 10 for most modules, 15 as a usual bound, and 30 as a limit.  To account for globals and length, based on my experience, I've changed that to 15 for most modules, 20 as a soft limit, and 30 as a hard limit.  You might wish to adjust the thresholds and multipliers based on your system and experience. In particular, it is easy to make a case that these limits aren't strict enough for life-critical software, and a case can be made for being a little more relaxed in throw-away GUI management code.  But I think this is a good starting point for most everyday embedded software that is written by a human (as opposed to auto-generated code).

The biggest exception is usually what to do about switch statements.  If you exempt them you can end up with multiple switches in one module, or multiple switch/if/switch/if layered nesting.  (Neither is a pretty sight.) I think it is justifiable to exempt modules that have ONLY a switch and conditional logic to do sanity checking on the switch value.  But, because 30 is a pretty generous limit, you're only going to see this rarely. Generally the only legitimate reason to have a switch bigger than that is for something like processing a message type for a communication protocol.  So I think you should not blanket exempt switch statements, but rather include them in an overall case-by-case sign-off by engineering management as to which few exceptions are justifiable.

Some might observe that this metric discourages extensive error checking.  That's a different topic, and the intent is certainly NOT to discourage error checking. But the simple answer is that error checking has to be tested and understood, so you can't simply ignore that part of the complexity. One way to handle the situation is to put the error checking into a subroutine or wrapper function to get that complexity out of the way, then have that wrapper call the actual function that does the work.  Another way is to break your overall code down into smaller pieces so that each piece is simple enough for you to understand and test both the functionality and the error checking.
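The wrapper idea can be sketched like this (a hypothetical example of my own, not from any real codebase): the wrapper carries all the parameter-validation complexity, so the worker stays simple enough to understand and unit test on its own.

```c
#include <stddef.h>

/* Worker: does the actual computation. Precondition (enforced by the
   wrapper): samples is non-NULL and n > 0. */
static int average_unchecked(const int *samples, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return (int)(sum / (long)n);
}

/* Wrapper: all error checking lives here. Returns 0 on success and
   -1 on bad input; the result is delivered through *out. */
int average_checked(const int *samples, size_t n, int *out)
{
    if (samples == NULL || out == NULL || n == 0) {
        return -1;
    }
    *out = average_unchecked(samples, n);
    return 0;
}
```

Each function scores low on the metric individually, even though the pair together still does all the error checking.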

Finally, any metric can be gamed, and that is surely true of simple metrics like this one.  A good metric score doesn't necessarily mean your code is fantastic. Additionally, this metric does not consider everything that's important, such as the total number of globals across your code base. On the other hand, if you score poorly on this metric, most likely your code is in need of improvement.

What I recommend is that you use this metric as a way to identify code that is needlessly complex.  It is the rare piece of code indeed that unavoidably needs to have a high score on this complexity metric. And if all your code has a good score, that means it should be that much easier to do peer review and unit testing to ensure that other aspects of the code are in good shape.

References:

A NIST paper on applying metrics is here: http://www.mccabe.com/pdf/mccabe-nist235r.pdf including an interesting discussion of the pitfalls of handling switch statements within a complexity framework.

10 comments:

  1. Although I've not used it in years, I seem to recall that McCabe takes account of module sloc and function size when calculating parts of its metrics.

    ReplyDelete
  2. There are a ton of McCabe-defined metrics here: http://www.mccabe.com/iq_research_metrics.htm and it might be you're thinking of one or more of those.
    The classical McCabe Cyclomatic Complexity (MCC) metric is just paths; not module size.

    ReplyDelete
  3. When you speak of SCC at the module level, how do you compute that? Directly add the SCC of all of the member functions? Or take the worst SCC of the member functions? Let's say I have a driver to a peripheral device that has 20 very simple functions within it which all have an SCC of 3. Would that mean my module SCC is 3, or 60?

    ReplyDelete
    Replies
    1. The important part of MCC/SCC has to do with ability to unit test. In general, you'd want to manage each member function as its own function for complexity. So if you have a peripheral device with 20 very simple functions and you can unit test each of those very simple functions directly, then your complexity will be low. If you want to use the max of the member functions that sounds like a reasonable way to boil down the numbers.

      Delete
  4. So..., how would you count this workaround? I inherited some network code that used two globals. They were pointers to a struct that contained about 80 variables. Every variable used was in the giant struct, referenced through the pointer. Two globals only count as 10 points in your formula.
    Don't worry, we fixed it.

    ReplyDelete
  5. I feel like you guys are thinking about this too much. Use an IDE like IntelliJ to refactor. May the source be with you.

    ReplyDelete
    Replies
    1. Once the metric value gets too high, refactoring is exactly what you should be doing.
      A metric can be useful because it provides a more quantitative way to identify code that has been let slide too far towards spaghetti. In other words, you need to trigger the refactoring, especially if the complexity builds up over time due to accumulated changes. It's hard to justify a complete module refactor if you only made one small change to the code without a metric like this. But eventually those small changes add up, and this type of metric points out when you've gotten to the tipping point for refactoring.

      Delete
  6. I agree with Phil Koopman: once the metric value gets too high, refactoring is exactly what you should be doing.

    ReplyDelete
  7. I received a direct e-mail query that I'm answering here for others as well:
    "do you have any more details about how you calculate this for the Spaghetti Factor (SF) Metric ?"

    Some tools, such as Understand, calculate SCC for you. If you don't have such a tool available, this page contains a very concrete description of "CC2": http://www.aivosto.com/project/help/pm-complexity.html

    ReplyDelete
  8. For C language based programs (typical for embedded systems), I would also look at fan-in and fan-out of modules to judge coupling. Assuming files are organized properly in directories corresponding to modules, Doxygen can give a good structural overview. High fan-in for upper-level modules should be a concern.

    ReplyDelete

Please send me your comments. I read all of them, and I appreciate them. To control spam I manually approve comments before they show up. It might take a while to respond. I appreciate generic "I like this post" comments, but I don't publish non-substantive comments like that.

If you prefer, or want a personal response, you can send e-mail to comments@koopman.us.
If you want a personal response please make sure to include your e-mail reply address. Thanks!
