Monday, September 16, 2013

Getting Rid of Global Variables

Summary: Global variables are evil. Here is an example of how to get rid of many of them.



Global variables are well known to be evil -- and you can read all about why that is in my free sample book chapter by that name. This posting gives a running example of changes that fix a common type of global variable.

Let's start with a pretty typical situation in a C program. You have a "globals.c" file that defines a mess of globals, including:
   int g_ErrCount;
which might be used to tally the number of run-time errors seen by the system. I've used a "g_" naming convention to emphasize that is a global, which means that every .c file in the program can read and write this variable with wild abandon.

Let's say you also have the following places this variable is referenced, including globals.c just mentioned:
globals.c:    int g_ErrCount;      // define the variable
globals.h:    extern int g_ErrCount; // other files include this
init.c:       g_ErrCount = 0; // init when program starts
moduleX.c:   g_ErrCount++;   // tally another error
moduleY.c:    XVar = g_ErrCount; // get current number of errors
moduleZ.c:    g_ErrCount = 0;  // clear number of reported errors

There are all sorts of risks with this approach... but let's concentrate on fixing them instead of diving into the Globals Are Evil discussion.

The first thing we're going to do is collect all the error counter functions into a single module, ErrCount.c, which would contain error counting, error reporting, and so on. This gets rid of the need to define g_ErrCount in globals.c, giving the below. We've also changed to using ErrCount.h for the extern definition:
globals.c:       // not needed any more for this variable
ErrCount.c:    int g_ErrCount;      // define the variable
ErrCount.h:    extern int g_ErrCount; // other files include this
init.c:       g_ErrCount = 0; // init when program starts
moduleX.c:   g_ErrCount++;   // tally another error
moduleY.c:    XVar = g_ErrCount; // get current number of errors
moduleZ.c:    g_ErrCount = 0;  // clear number of reported errors

Now let's get rid of the initialization. Having a central init.c is asking for problems if you forget to call an initialization function. Also, having a separate init.c forces variables to be global. So let's initialize the variable where it is defined:
globals.c:       // not needed any more for this variable
ErrCount.c:    int g_ErrCount = 0;    // define and init variable
ErrCount.h:    extern int g_ErrCount; // other files include this
init.c:       // no longer needed
moduleX.c:   g_ErrCount++;   // tally another error
moduleY.c:    XVar = g_ErrCount; // get current number of errors
moduleZ.c:    g_ErrCount = 0;  // clear number of reported errors

Instead of having the variable be global, let's hide it as a static variable inside ErrCount.c. Using the "static" keyword in defining a variable outside a function makes it invisible to other .c files. This step results in the program being broken, because other .c files can't get at the static variable. (We've also renamed the variable without the "g_" prefix because it's not global any more.)
ErrCount.c:    static int ErrCount = 0; // only visible in this file
ErrCount.h:    // static variables are invisible outside .c file
moduleX.c:   g_ErrCount++;   // tally another error
moduleY.c:    XVar = g_ErrCount; // get current number of errors
moduleZ.c:    g_ErrCount = 0;  // clear number of reported errors

To fix the problem with .c files seeing the static variable, we're going to add some access functions to ErrCount.c to provide the ability to touch the value without making the variable global.:
ErrCount.c:    static int ErrCount = 0; // only visible in this file
           inline void ErrCount_Incr() { ErrCount++; }
           inline int ErrCount_Get() { return(ErrCount); }
           inline void ErrCount_Reset() { ErrCount = 0; }
ErrCount.h:    inline void ErrCount_Incr();  // increment the count
           inline int ErrCount_Get();   // get current count value
           inline void ErrCount_Reset(); // reset count
           // Note that there is NO access to ErrCount directly
moduleX.c:      ErrCount_Incr();   // tally another error
moduleY.c:     XVar = ErrCount_Get(); // get current number of errors
moduleZ.c:     ErrCount_Reset();  // clear number of reported errors

And that's it -- we're there.  ErrCount is no longer a global variable. It is visible only inside ErrCount.c, and any accesses to the variable are performed via access functions that increment, read, and reset the value. Note that the keyword "inline" should, with a good compiler, make this code just as fast and efficient as the global variable version of the code -- except without actually having a global variable. In fact, what we've been doing is a C-based approach for making ErrCount into an object (the variable) with access methods to increment, read, and reset the object. Not quite as clean as you might see in C++, but it gets the job done with C syntax.

Some folks might just say this is slight of hand. If it generates the same code, why bother?  Here are some reasons that at least some developers find it useful to take this approach:
  • Software authors can only perform intended functions specific to an error counter: increment, read, and reset. Setting to an arbitrary value isn't allowed. If you don't want the value changed other than via incrementing, you can just delete the reset function. This prevents some types of bugs from ever happening.
  • If you need to change the data type or representation of the counter used that all happens inside ErrCount.c with no effect on the rest of the code. For example, if you find a bug with error counts overflowing, it is a lot easier to fix that in one place than every place that increments the counter! 
  • If you are debugging with a breakpoint debugger it is easier to know when the variable has been modified, because you can get rid of the "inline" keywords and put a breakpoint in the access functions. Otherwise, you need watchpoints, which aren't always available.
  • If different tasks in a multitasking system need to access the variable, then it is a lot easier to get the concurrency management right inside a few access functions than to remember to get it right everywhere the variable is read or written  (get it right once, use those functions over and over). Don't forget to make the variable volatile and disable interrupts when accessing it if concurrency is an issue.
I'm sure different readers have different ways of approaching this problem, And some globals are harder to get rid of than others. But I've seen a lot of code that is structured just like the "before" code. (I'm sure I must have written things that way myself in my mis-spent youth!) This approach cleans up a large fraction of globals with minimal pain and often no speed penalty.




Static Analysis Ranked Defect List

  Crazy idea of the day: Static Analysis Ranked Defect List. Here is a software analysis tool feature request/product idea: So many times we...