Monday, July 22, 2013

Making Main Loop Scheduling Timing More Deterministic

Summary: Wasting time in a main loop scheduler can make testing system-level timing a whole lot easier.



It's common enough to see a main loop scheduler in an embedded system along the lines of the following:

for(;;)
{ Task1();
  Task2();
  Task3();
}

I've heard this referred to as a "static scheduler," "cyclic executive," "main loop scheduler," or "static non-preemptive scheduler" among other terms. Regardless of what you call it, the idea is simple: run all the tasks that need to be run, then go back and do it again until the system is shut down or reset. There might be interrupt service routines (ISRs) also running in the system.

The main appeal of this approach is that it is simple. You don't need a real time operating system and, even more importantly, it would appear to be difficult to get wrong. But, there's a little more to it than that...

The first potential problem is those pesky ISRs. They might have timing problems, cause concurrency problems, and so on. Those issues are beyond what I want to talk about today except for making the point that they can affect the execution speed of one iteration through the main loop in ways that may not be obvious. You should do timing analysis for the ISRs (chapter 14 of my book has gory details).  But for today's discussion we're going to assume that you have the ISRs taken care of.

The next problem is timing analysis of the main loop. The worst case response time for running a task is one trip through the loop. But how long that trip is might vary depending on the calculations each task performs and how much time ISRs steal. It can be difficult to figure out the absolute worst case execution time (you should try, but it might not be easy). But the really bad news is, even if you know the theoretical worst case timing you're unlikely to actually see it during testing.

Consider the tester trying to make sure the system will function with worst case timing. How do you get the above static scheduler to take the worst case path through the code a bunch of times in a row to see what breaks? It's a difficult task, and probably most testers don't have a way to pull that off. So what is happening is you are shipping product that has never been tested for worst case main loop execution time. Will it work?  Who knows.  Do you want to take that chance with 10,000 or 100,000 units in the field? Eventually one of them will see worst case conditions and you haven't actually tested what will happen.

Fortunately there is an easy way to mitigate this risk. Add a time-waster at the end of the main loop. The time waster should convert the above main loop, which runs as fast as it can, to a main loop that runs exactly once per a defined period (for example, once every 100 msec):

for(;;)
{ StartTimer(100);  // start a 100 msec countdown
  Task1();
  Task2();
  Task3();
  WaitForTimer(0);  // wait for the 100 msec countdown to reach 0
}

This is just a sketch of the code -- how you build it will depend upon your system. The idea is that you waste time in the WaitForTimer routine until you've spent 100 msec in the main loop, then you run the loop again. Thus, the main loop runs exactly once every 100 msec.  If the tasks run faster than 100 msec as determined by a hardware timer, you waste time at the end, waiting for the 100 msec period to be up before starting the next main loop iteration. If the tasks take exactly 100 msec then you just start the main loop again immediately. If the tasks run longer than 100 msec, then you should log an error or perform some other action so you know something went wrong.

The key benefit to doing this is to ensure that in testing the average timing behavior is identical to the worst case timing behavior. That way, if something works when the system is fast, but breaks when it actually takes 100 msec to complete the main loop, you'll see it right away in testing. A second benefit is that since you are actively managing the main loop timing, you have a way to know the timing ran a little long on some loops even if it isn't bad enough to cause a watchdog reset.






Job and Career Advice

I sometimes get requests from LinkedIn contacts about help deciding between job offers. I can't provide personalize advice, but here are...