Monday, September 16, 2013

Getting Rid of Global Variables

Summary: Global variables are evil. Here is an example of how to get rid of many of them.



Global variables are well known to be evil -- and you can read all about why that is in my free sample book chapter by that name. This posting gives a running example of changes that fix a common type of global variable.

Let's start with a pretty typical situation in a C program. You have a "globals.c" file that defines a mess of globals, including:
   int g_ErrCount;
which might be used to tally the number of run-time errors seen by the system. I've used a "g_" naming convention to emphasize that is a global, which means that every .c file in the program can read and write this variable with wild abandon.

Let's say you also have the following places this variable is referenced, including globals.c just mentioned:
globals.c:    int g_ErrCount;      // define the variable
globals.h:    extern int g_ErrCount; // other files include this
init.c:       g_ErrCount = 0; // init when program starts
moduleX.c:   g_ErrCount++;   // tally another error
moduleY.c:    XVar = g_ErrCount; // get current number of errors
moduleZ.c:    g_ErrCount = 0;  // clear number of reported errors

There are all sorts of risks with this approach... but let's concentrate on fixing them instead of diving into the Globals Are Evil discussion.

The first thing we're going to do is collect all the error counter functions into a single module, ErrCount.c, which would contain error counting, error reporting, and so on. This gets rid of the need to define g_ErrCount in globals.c, giving the below. We've also changed to using ErrCount.h for the extern definition:
globals.c:       // not needed any more for this variable
ErrCount.c:    int g_ErrCount;      // define the variable
ErrCount.h:    extern int g_ErrCount; // other files include this
init.c:       g_ErrCount = 0; // init when program starts
moduleX.c:   g_ErrCount++;   // tally another error
moduleY.c:    XVar = g_ErrCount; // get current number of errors
moduleZ.c:    g_ErrCount = 0;  // clear number of reported errors

Now let's get rid of the initialization. Having a central init.c is asking for problems if you forget to call an initialization function. Also, having a separate init.c forces variables to be global. So let's initialize the variable where it is defined:
globals.c:       // not needed any more for this variable
ErrCount.c:    int g_ErrCount = 0;    // define and init variable
ErrCount.h:    extern int g_ErrCount; // other files include this
init.c:       // no longer needed
moduleX.c:   g_ErrCount++;   // tally another error
moduleY.c:    XVar = g_ErrCount; // get current number of errors
moduleZ.c:    g_ErrCount = 0;  // clear number of reported errors

Instead of having the variable be global, let's hide it as a static variable inside ErrCount.c. Using the "static" keyword in defining a variable outside a function makes it invisible to other .c files. This step results in the program being broken, because other .c files can't get at the static variable. (We've also renamed the variable without the "g_" prefix because it's not global any more.)
ErrCount.c:    static int ErrCount = 0; // only visible in this file
ErrCount.h:    // static variables are invisible outside .c file
moduleX.c:   g_ErrCount++;   // tally another error
moduleY.c:    XVar = g_ErrCount; // get current number of errors
moduleZ.c:    g_ErrCount = 0;  // clear number of reported errors

To fix the problem with .c files seeing the static variable, we're going to add some access functions to ErrCount.c to provide the ability to touch the value without making the variable global.:
ErrCount.c:    static int ErrCount = 0; // only visible in this file
           inline void ErrCount_Incr() { ErrCount++; }
           inline int ErrCount_Get() { return(ErrCount); }
           inline void ErrCount_Reset() { ErrCount = 0; }
ErrCount.h:    inline void ErrCount_Incr();  // increment the count
           inline int ErrCount_Get();   // get current count value
           inline void ErrCount_Reset(); // reset count
           // Note that there is NO access to ErrCount directly
moduleX.c:      ErrCount_Incr();   // tally another error
moduleY.c:     XVar = ErrCount_Get(); // get current number of errors
moduleZ.c:     ErrCount_Reset();  // clear number of reported errors

And that's it -- we're there.  ErrCount is no longer a global variable. It is visible only inside ErrCount.c, and any accesses to the variable are performed via access functions that increment, read, and reset the value. Note that the keyword "inline" should, with a good compiler, make this code just as fast and efficient as the global variable version of the code -- except without actually having a global variable. In fact, what we've been doing is a C-based approach for making ErrCount into an object (the variable) with access methods to increment, read, and reset the object. Not quite as clean as you might see in C++, but it gets the job done with C syntax.

Some folks might just say this is slight of hand. If it generates the same code, why bother?  Here are some reasons that at least some developers find it useful to take this approach:
  • Software authors can only perform intended functions specific to an error counter: increment, read, and reset. Setting to an arbitrary value isn't allowed. If you don't want the value changed other than via incrementing, you can just delete the reset function. This prevents some types of bugs from ever happening.
  • If you need to change the data type or representation of the counter used that all happens inside ErrCount.c with no effect on the rest of the code. For example, if you find a bug with error counts overflowing, it is a lot easier to fix that in one place than every place that increments the counter! 
  • If you are debugging with a breakpoint debugger it is easier to know when the variable has been modified, because you can get rid of the "inline" keywords and put a breakpoint in the access functions. Otherwise, you need watchpoints, which aren't always available.
  • If different tasks in a multitasking system need to access the variable, then it is a lot easier to get the concurrency management right inside a few access functions than to remember to get it right everywhere the variable is read or written  (get it right once, use those functions over and over). Don't forget to make the variable volatile and disable interrupts when accessing it if concurrency is an issue.
I'm sure different readers have different ways of approaching this problem, And some globals are harder to get rid of than others. But I've seen a lot of code that is structured just like the "before" code. (I'm sure I must have written things that way myself in my mis-spent youth!) This approach cleans up a large fraction of globals with minimal pain and often no speed penalty.




8 comments:

  1. Thanks for the article. I like following your blog. Actually, I instantly thought of a object-oriented solution in C++ but it's also nice to see how to properly solve this in plain C.

    ReplyDelete
  2. Actually, inlining won't help if you do separate compilation of each .c file; as the inliner can only access the C code that its actively compiling.

    You'd need to have a compiler that allows reading many C files in a single compilation step.

    However, even if the result is bigger code, its more robust. And robust matter more than just about anything else in embedded systems.

    ReplyDelete
  3. You're right about this. I've seen embedded systems that support inline across an entire project. Some don't, so that's something to ask your compiler vendor about. You're also right that in embedded systems having fast but buggy code isn't the right way to play the optimization game.

    ReplyDelete
  4. Thx Phil for your solution. Just starters like us read something like "Global are evil" and also get the explanation but never the solution but you offer solutions which I really appreciate. Btw your book just arrived today.

    ReplyDelete
  5. Due to possible inefficiency of inlining mentioned above, I'd rather use NDEBUG switch to select between access by macros (production code) and normal functions (development code):

    ErrCount.c:
    #ifdef NDEBUG
    int ErrCount = 0;
    #else
    static int ErrCount = 0;
    void ErrCount_Incr() { ErrCount++; }
    int ErrCount_Get() { return (ErrCount); }
    void ErrCount_Reset() { ErrCount = 0; }
    #endif

    ErrCount.h:
    #ifdef NDEBUG
    #define ErrCount_Incr() do { ErrCount++; } while (0)
    #define ErrCount_Get() (ErrCount)
    #define ErrCount_Reset() do { ErrCount = 0; } while (0)
    extern int ErrCount;
    #else
    void ErrCount_Incr();
    int ErrCount_Get();
    void ErrCount_Reset();
    #endif

    moduleX.c: ErrCount_Incr();
    moduleY.c: XVar = ErrCount_Get();
    moduleZ.c: ErrCount_Reset();

    ReplyDelete
  6. I do NOT recommend this approach if you can find a way to use the inline approach. You are changing the debug code from the production code, and you are using macros. And this is complicated enough it's harder to tell if there is a bug. Adding an apparently null do/while around a single execution is not making your code easier to understand. Most compilers will do just fine with the inline keyword. If your compiler doesn't handle that properly you should look into getting a better compiler. (Yes, I know, there are always projects that are under-resourced in so many ways. Those projects also have an elevated probability of failure.)

    ReplyDelete
  7. Hi Phil, I develop in C++ and typically I organize the application into a few components such as 'display', 'settings', 'logging', and top-level control is typically a stateMachine 'mainController'. All events are sent to the mainController, who then (depending on its state) dispatches to the subsytems by calling their memberfunctions.

    So at the top-level I have one instance of these objects, and I prefer to have these as 'global'. Alternatives are to have them on the stack or heap, but I don't like this :
    * the object may have a large data-footprint (eg buffer for a display)
    * I know I will have exactly 1 instance (objects are made nonCopyable)
    * Even if the objects are global, they are properly encapsulated and all internals are private, only an API is exposed

    Do you consider this use of global objects a case of using global variables ? (and as such a practice to be avoided), If so, how would you recommend to do it better ?

    Thanks,

    Pascal

    ReplyDelete
  8. I can't provide individual advice here, but some general observations:

    - Remember that static allocation is only one of two salient aspects of globals; you can get it while restricting variable scope. You can make the storage file static in main.c or even static in main and pass pointers down to the functions if that makes sense.

    - You can make each of these components their own object with accessor functions. The accessor functions might have global scope, but not the actual data structures. sometimes this makes sense.

    - Sometimes having a couple system-level info data structures that are globally visible can make sense, but only if other alternatives don't work.

    In your case it sounds like you might be conflating "global" with "statically allocated" so it might be helpful to tease those concepts apart in your design.

    ReplyDelete

Please send me your comments. I read all of them, and I appreciate them. To control spam I manually approve comments before they show up. It might take a while to respond. I appreciate generic "I like this post" comments, but I don't publish non-substantive comments like that.

If you prefer, or want a personal response, you can send e-mail to comments@koopman.us.
If you want a personal response please make sure to include your e-mail reply address. Thanks!

Static Analysis Ranked Defect List

  Crazy idea of the day: Static Analysis Ranked Defect List. Here is a software analysis tool feature request/product idea: So many times we...