Monday, June 9, 2014

Minimize Use of Global Variables

Critical embedded software should use the minimum practicable variable scope for each variable, and should minimize use of global variables. As one of the chapters in my book says, Global Variables Are Evil.  Over-use of globals can reasonably be expected to result in significantly increased defect rates and the presence of difficult-to-find software defects that are likely to render a system unsafe.

Consequences: Using too many global variables increases software complexity and can be reasonably expected to increase the number of bugs, as well as make the code difficult to maintain properly. Defining variables as globals that could instead be defined as locals can be reasonably expected to significantly increase the risk of the data being improperly used globally, as well as to make it more difficult to track and analyze the few variables that should legitimately be global. In short, excessive use of globals leads to an increased number of software defects.

Accepted Practices:
  • A significant minority of variables (or fewer) should be global. Ideally zero variables should be global. (Special globals such as mathematical constants and configuration information might be excluded from this metric.) The exact number varies with the system, but an expected range would be from less than 1% to perhaps 10% of all statically allocated variables (even this is probably too high), with an extremely strong preference for the lower side of that range Exceeding this range can reasonably be expected to lead to an increase in software defects.
  • The need for each global or category of globals should be specifically justified as required for effective software construction. Speed increases are generally not sufficient justification, nor is limited memory space.
  • In any system with more than one task (including systems that just have a main task plus interrupts), every non-constant global should be declared volatile, and every access to a global should be protected by a form of concurrency management (e.g., disabling interrupts or using a mutex). Failing to do either can normally be expected to result in concurrency defects somewhere in your code.
  • Each variable should be declared locally if possible. Variables used by a collection of functions in the same C programming language module should be declared as top-level “static” within the corresponding .c file to limit visibility to functions declared within that .c file. Variables used by only one function should be declared within that function and thus not be visible to any other function.
  • Global variables must be declared consistently throughout the system, including having exactly the same type information (e.g., if a global is declared as “unsigned” in one place, it needs to be “unsigned” everywhere). 
Discussion:

Global variables (often called “globals” for short) are programming language variables that are visible to the entirety of the program. In contrast, non-global variables (often called local variables, or locals for short), have a reduced scope that makes them visible only in the part of the program where they are used, and invisible to the rest of the program. Excessive use of global variables must be avoided in safety critical software due to the complexity introduced as well as the risk of concurrency hazards.

One reason to avoid globals is that use of many globals can be reasonably expected to lead to high coupling among many disparate portions of a program. Any variable that is globally visible might be read or written from anywhere in the code, increasing program complexity and thus the chance for software defects. While analysis might be performed to determine where globals actually are read and written, doing so for a large number of globals is time consuming, and must be re-done any time any substantive change is made to the code. (For example, it would be expected that such analysis would have to be re-done for each software release.)

Another reason to avoid globals is that they introduce concurrency hazards (see the discussion on concurrency in an upcoming post). Because it is difficult to keep track of what parts of the program are reading and writing a global, safe code must assume that other tasks can access the global and use full concurrency protection each time a global is referenced. This includes both locking access to the global when making a change and declaring the global “volatile” to ensure any changes propagate throughout the software.

Even if concurrency hazards are generally avoided, if more than one place in the program modifies a global it is easy to have unexpected software behavior due to two portions of the program modifying the globals’ value in an unanticipated sequence. This can be reasonably expected to lead to infrequent and subtle (but potentially severe) concurrency defects.

As an analogy, consider if you keep your grocery shopping list on a whiteboard on your fridge. If you live alone and are careful, this may work out fine, and you will never accidentally have an item erased or added in error to your list. But if you live in a busy house (perhaps a dormitory), something you wrote on that whiteboard might be changed or accidentally erased, and you might have no idea that it happened or who did it. In this analogy, everything you write on that whiteboard is a global variable – anyone who enters the house can see it, act upon the information, and potentially modify it. As an extension of this analogy, if that whiteboard is a “to do” list, you are using it to communicate information to others in the house. If that list is modified, corrupted, or overwritten, your communication will fail, with that sort of problem becoming more likely as more and more people share the whiteboard for a diversity of purposes.

Using too many globals can be thought of as the data flow version of spaghetti code. With code “control” flow (conditional “if” statements and the like) that is tangled, it is difficult to follow the flow of control of the software. Similarly, with too many globals it is difficult to follow the flow of data through the program – you get “spaghetti data.” In both cases (tangled data flow and tangled control flow) designers can reasonably be expected that the spaghetti software will have elevated levels of software defects. Excessive use of global variables makes unit testing difficult, because it requires identifying and setting specific values in all the globals referenced by a module that is being unit tested.

In some cases the risk of a global may seem only theoretical rather than practical, if a variable is only used in a single module but happens to be defined as a global. However, this still presents a latent risk of some other piece of the software modifying the variable intentionally (defeating the principle of information hiding), or by accident via a typographical error that was intended to refer to a similarly named variable defined elsewhere. Therefore, it is an important accepted practice to only define variables as global if it is essential to do so.

There are two related classes of globals that are an exception, and are generally considered acceptable for use. One class is constants that represent physical properties (e.g., the number of degrees in a circle being 360), system configuration information (e.g., the fact that a vehicle has a 6 cylinder engine or a 4-speed transmission), and other values that don’t change at run time. These are properly defined as constant values using the programming language keyword “const” in the code or other similar approach. 

Another class of generally acceptable globals is system state information that must be generally accessible across most software, such as engine speed in a vehicle. These system state variables should be written in exactly one place in the code, should not be duplicated (to avoid confusion and the chance for different copies of the variable to get out of synch), should be kept few in number, and ideally should be read via access functions rather than directly to enforce modularity. These are the sorts of globals that should be kept to a minimum and count toward the statement that having a few well-chosen globals might be OK. In distributed embedded systems these globals usually show up as broadcast bus values rather than as actual memory-based global variables.

Entirely eliminating the use of non-constant globals is sometimes unachievable. But when complete elimination isn’t practical, globals should nonetheless be very few (handfuls, not thousands of them) and strategically selected. The globals that are used should be used carefully, and reviewed to ensure each is essential.

Selected Sources:

Wulf & Shaw in 1973 wrote a paper entitled: “Global variable considered harmful,” (Wulf 1973) describing it as the data version of a “goto” statement – which a previous paper by Dijkstra had famously declared “harmful.” After that publication, the concept of avoiding globals has mostly been a matter of clarifying the gray areas and educating new programmers.



McConnell says: “Global data is like a love letter between routines – it might go where you want it to go, or it might get lost in the mail.” (McConnell 1993, pg. 88). McConnell also says: “global-data coupling is undesirable because the connection between routines is neither intimate nor visible. The connection is so easy to miss that you could refer to it as information hiding's evil cousin – ‘information losing.’” (McConnell 1993, pg. 90).


Ganssle advises “Minimize global variables, both to reduce interaction between routines and to reduce the number of places where interrupts will cause problems.” (Ganssle 1992, pg. 186). A few years later, Ganssle and designers in general appreciated that global variables were even more of a problem than previously thought. He then wrote, resorting to humor to make his point: “One of the greatest evils in the universe, an evil in part responsible for global warming, ozone depletion, and male pattern baldness, is the use of global variables.” Humor aside, he goes on to confirm the problems with globals, and gives recommendations. “Every firmware standard – backed up by the rigorous checks of code inspections – must set rules about global use.” He finishes by saying: “I feel that defining a global is such a source of problems that the team leader should approve every one.” (Ganssle 2000, p. 38)

NASA recommends avoiding too many inter-component dependencies: “components should not depend on each other in a complex way,” which includes as a significant factor the use of globals to communicate data between components (NASA 2004, p. 93).

MISRA Software Guidelines recommend against using global variables (MISRA Report 5, p. 10).  IEC 61508-3 highly recommends a modular approach (p. 79), information hiding/encapsulation (id., p. 93), and a fully defined interface (id., p. 93), which sum up to recommending against the use of global variables.

Update: you can find an example of getting rid of a global variable here:
  http://betterembsw.blogspot.com/2013/09/getting-rid-of-global-variables.html

References:
  • Ganssle, J., The Art of Programming Embedded Systems, Academic Press, 1992.
  • Ganssle, J., The Art of Designing Embedded Systems, Newnes, 2000.
  • IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems (E/E/PE, or E/E/PES), International Electrotechnical Commission, 1998. Parts 2,3,7. 
  • McConnell, Code Complete, Microsoft Press, 1993.
  • MISRA, Development Guidelines for Vehicle Based Software, November 1994 (PDF version 1.1, January 2001).
  • MISRA, Report 5: Software Metrics, February 1995.
  • NASA-GB-8719.13, NASA Software Safety Guidebook, NASA Technical Standard, March 31, 2004.
  • Wulf & Shaw, Global variable considered harmful, SIGPLAN Notices, Feb 1973, pp. 28-34.

5 comments:

  1. I agree with all the arguments for avoiding global variables, but what is the solution?

    The performance overhead when calling a function to retrieve data, can still be an issue with many embedded systems. I have tried the “inline” method of executing “functions”, but that seems to be supported by only a handful of compilers.
    Ideally I would like to find a solution that works across different platforms, and I don’t want to rely on a particular compiler/processor that will limit my options in the future.

    Thanks,

    ReplyDelete
  2. There are several answers to this including the following:

    - Your time is valuable. If you have a choice (which you may or may not), only work with compiler/processor pairs that have decent compilers that support things line inline. Saving a couple cents per system in exchange for terrible tool chains only makes sense if you are making millions of things, not thousands. (I realize there are many organizational disfunctions that may force use of bad tool chains.) Or buy a faster processor (same economic argument)

    - Restructure your code so you do more parameter passing and avoid passing values between procedures with globals. Taking a global-ridden software structure and simply wrapping everything in function calls isn't a magic wand that will fix poorly organized code.

    - Even if it does cost more, the choice is between having bug-prone code and code that actually works. Do you want to sell a cheap product that doesn't work or a slightly more expensive one that does work? (I realize both answers are valid depending on your situation, but that might in the end be the choice you face.)

    I understand none of the above might resolve the question you pose for your particular situation. Engineering is always about tradeoffs. There is no one-size-fits-all solution.

    ReplyDelete
  3. Thanks for all the great resources! After finding your blog and YouTube channel I ordered your book, and really look forward to reading it.

    I have a question: if I protect a global variable with a mutex, do I still need it to declare it as volatile? Doesn't the mutex act as a memory barrier?

    ReplyDelete
  4. Yes, you still have to declare it as volatile, at least in C. The mutex helps ensure multiple processes are well behaved, but the compiler has no idea the mutex is associated with a particular variable. So you also need to use volatile to make sure the compiler writes the value back out before you release the mutex (among other things).

    ReplyDelete
  5. thats right. i also agree with you and others opinion too. you need to use volatile to make sure the compiler

    ReplyDelete

Please send me your comments. I read all of them, and I appreciate them. To control spam I manually approve comments before they show up. It might take a while to respond. I appreciate generic "I like this post" comments, but I don't publish non-substantive comments like that.

If you prefer, or want a personal response, you can send e-mail to comments@koopman.us.
If you want a personal response please make sure to include your e-mail reply address. Thanks!

Job and Career Advice

I sometimes get requests from LinkedIn contacts about help deciding between job offers. I can't provide personalize advice, but here are...