
Monday, February 1, 2016

Multi-Rate Main Loop Task Timing


In the past couple of posts I've talked about how to build a multi-rate main loop scheduler and the two biggest mistakes I tend to see with timing analysis. In this posting I'll wrap up the topic (at least for now) by describing how to determine whether a particular task in a non-preemptive multi-rate system is going to meet its deadlines.  The math gets complex and painful, so I'll work an example and give the general rules without deriving the iterative math behind the method. Even so, this posting will take more work to understand completely than my usual posts, but it's just a complex topic and this isn't the place for an entire book chapter's worth of math. On real systems you can generally do the checking with a spreadsheet once you understand the technique.

Consider the following task set:

Task  Period  Compute  CPU Load 
 0       7       2       28.6%
 1      10       2       20%
 2      20       3       15%    
 3     101       5        5%
 4     199       3        1.5%
         Total CPU Load  70.1%  (numbers rounded to nearest 0.1%)

If we want to find out the worst case response time for Task 2, we need to look at the following worst case:
  • All the tasks in the task set become ready to run simultaneously. In practice this means the system timer is at zero or at a multiple of the least common multiple of all the task periods.
  • The most obnoxious task with priority lower than the task we care about sneaks in and starts running just before the highest priority task gets to run. (Generally this happens because that task got a chance to start at the last instant before the counter ticked.)
  • All the tasks with higher priority run one or more times (based on their periods) before the task we care about runs. In general some tasks will run multiple times before our task gets a chance to run.
So for the example case of Task #2, that means:
  • Identify Task #3 as the task with the largest compute time from those tasks with a priority lower than Task #2.  Assume it starts running at time zero because it snuck in ahead of all the other tasks.
  • Because it is time zero, all the other tasks also become ready to run at time zero. But because Task #3 snuck in first, they all sit in the run queue waiting their turn.
Now we do an iterative calculation as follows, with each numbered step being one iteration: run the highest priority queued task, then compute the new system time when it ends.
  1. Start at time 0 with Task 3 running.
    - Tasks 0, 1, 2, 4 queue to run at time zero.
    - Running Task 3 takes us to time 5 when Task 3 completes.
    - At time 5 tasks still queued are:  0, 1, 2, 4
  2. Time 5: run highest priority queued task: Task 0.
    - This runs for 2 msec, bringing us to time 7.
    - At time 7 we reach the Task 0 period, so another copy of Task 0 queues to run.
    - At time 7 tasks still queued are: 0, 1, 2, 4
  3. At time 7 run highest priority queued task: Task 0.
    - This runs for 2 msec, bringing us to time 9.
    - At time 9 tasks still queued are: 1, 2, 4
  4. At time 9 run highest priority queued task: Task 1.
    - This runs for 2 msec, bringing us to time 11.
    - At time 10 another copy of Task 1 is queued (note that we have missed the deadline for Task 1 since it should have completed at time 10, but ran until time 11).
    - At time 11 tasks still queued are: 1, 2, 4
  5. At time 11 run highest priority queued task: Task 1.
    - This runs for 2 msec, bringing us to time 13.
    - At time 13 tasks still queued are: 2, 4
  6. At time 13 run highest priority queued task: Task 2.
    - This runs for 3 msec, bringing us to time 16.
    - At time 14 another copy of Task 0 is queued.
    - At time 16 tasks still queued are: 0, 4.
  7. At time 16 Task #2 has completed, ending our analysis.
We've computed that the worst case time to complete Task 2 is 16 msec, which is less than its 20 msec period. So Task 2 will meet its deadline. However, along the way we noticed that even though the CPU is only loaded to 70.1%, Task 1 is going to miss its deadline.

Graphically, the above scenario looks like this:


The blue arrows show where tasks become queued to run, and the boxes show which task is running when.

To determine if your system is schedulable you need to repeat this analysis for every task in the system. In this case, repeating it for Task 1 will reveal that Task 1 misses its deadlines even though the CPU is only loaded at 70.1%.

In general the technique is to assume the longest-running task (biggest compute time) with priority lower than yours starts running just before everything else, and all other tasks queue at that same time. Then play out the sequence to see if you meet your deadline. There are a few notes on special cases. The longest lower priority task may vary depending on which task you are evaluating: for example, the blocking task for Task #2 is Task #3, but the blocking task for Task #3 is Task #4 (the only task with priority lower than Task #3). And the lowest priority task doesn't have a lower priority blocking task at all, so when evaluating Task #4 you can just assume it starts at time zero (there is no initial blocker).  This can be expressed as an iterative equation that has to be cycled until it converges. If you really want a math-based approach take a look at the CAN performance lecture slides in my grad course, which use pretty much the same math. But trying to explain the math takes a lot more words and complicated typography than are practical in a blog entry.
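To make the iterative equation concrete, here is a minimal sketch in C (my own illustration, not code from the course slides) of the converging calculation. It counts releases as floor(w/T)+1 so that a higher priority task released exactly when our task could start still goes first, matching the sneak-in worst case above, and it assumes deadline = period and task order = priority. Run against the example task set it reproduces the walkthrough: Task 2 completes at 16 msec, and Task 1 misses its deadline.

/* Sketch: worst case response time for a non-preemptive multi-rate
   main loop. Assumes deadline = period and task order = priority. */
#include <stdio.h>

#define NTASKS 5
static const unsigned period[NTASKS]  = { 7, 10, 20, 101, 199 };
static const unsigned compute[NTASKS] = { 2,  2,  3,   5,   3 };

/* Worst case response time of task i, or 0 if it misses its deadline. */
static unsigned worst_case_response(unsigned i)
{ unsigned blocking = 0;                 /* longest lower priority task */
  for (unsigned j = i + 1; j < NTASKS; j++)
  { if (compute[j] > blocking) { blocking = compute[j]; } }

  unsigned w = blocking, prev;           /* iterate start time until stable */
  do
  { prev = w;
    w = blocking;
    for (unsigned j = 0; j < i; j++)     /* higher priority task instances  */
    { w += ((prev / period[j]) + 1u) * compute[j]; }
    if (w + compute[i] > period[i]) { return 0; }  /* deadline miss */
  } while (w != prev);

  return w + compute[i];                 /* start time + own compute time */
}

int main(void)
{ for (unsigned i = 0; i < NTASKS; i++)
  { unsigned r = worst_case_response(i);
    if (r != 0) { printf("Task %u: worst case response %u msec\n", i, r); }
    else        { printf("Task %u: misses its deadline\n", i); }
  }
  return 0;
}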

If you find out you are missing deadlines and want to fix it, there are two general techniques that can help:
  • Make any really long-running low priority task run shorter, possibly by breaking it up into multiple shorter tasks that work together, or that can "yield" control back to the main loop every once in a while during their execution and pick up where they left off the next time they run (see the sketch after this list). This will reduce the length of the initial blocking effect for all higher priority tasks.
  • Schedule tasks so their periods evenly divide each other (so that the least common multiple of all periods equals the largest period). This corresponds to the approach of harmonic task periods discussed in the real time scheduling chapter of my book. For non-preemptive tasking it will NOT support 100% CPU usage, but it will probably make worst case latency better.
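As a concrete illustration of the first bullet, here is a minimal sketch of a long task restructured as a state machine (the step breakdown and names are made up for illustration). Each activation does one chunk of work and then returns, "yielding" to the main loop; a static variable remembers where to resume:

/* Sketch: long task split into steps that yield between activations. */
void LongTask(void)
{ static unsigned char step = 0;   /* persists across activations */

  switch (step)
  { case 0:  /* ... first chunk of the long computation ... */
             step = 1; break;
    case 1:  /* ... second chunk ... */
             step = 2; break;
    case 2:  /* ... final chunk; start over next activation ... */
             step = 0; break;
    default: step = 0; break;      /* defensive: recover from bad state */
  }
}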

Notes for readers who want to be thorough:
  • If you have interrupts you have to include the ISR contribution to CPU workload when doing this analysis. Doing so can get a bit tricky.  The real time scheduling chapter of my book shows how to do this for a single-rate main loop scheduler.
  • It may be that you want to resolve a situation in which the fastest task gets blocked for too long by putting that faster task into an ISR. If you do that, keep it short and don't forget that it still affects CPU workload. This approach is discussed for a single-rate main loop scheduler, also in the real time scheduling chapter of my book.
  • We don't include the current task in the "most obnoxious" consideration because we start by assuming the system will meet deadlines where deadline = task period. Any particular task has to have finished running before it can run again unless it misses its deadline. So a task can't self-block in a system that is schedulable.

Monday, January 4, 2016

Top Two Mistakes In Main Loop Scheduling


In my previous posting I showed you how to build a multi-rate main loop. In this posting we're going to start digging into the critical issue of how to do timing analysis for a multi-rate main loop cooperative scheduler. I'll explain the top two mistakes I see in industry code that lead to missed task deadlines.

I'm going to assume your multi-rate main loop is implemented more or less the way I described in my previous blog posting. The important property we care about is whether each task gets to run to completion sometime within its assigned time period.

Multi-Rate Main Loop CPU Utilization

Let's say you have a set of tasks that looks like this, with the table columns being the task number, the period the task runs at (in msec), and the amount of CPU time it takes to run the task (again in msec):

Task  Period  Compute   
 0       5       1    
 1      10       2    
 2      20       3        
 3     100      11     

In this example, Task 2 runs once every 20 msec, and takes 3 msec to run once it has started.

Here is the first big question: is the CPU overloaded to more than 100%? To answer that, for each row in the table add a CPU % that is computed by dividing each task's run time by its period. For example task 2 runs 3 msec out of every 20 msec, so the CPU utilization is 3/20 = 15%.
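In equation form, with C for compute time and T for period, that per-task load and the total are (just restating the arithmetic above):

$$U_i = \frac{C_i}{T_i}, \qquad U_{total} = \sum_i \frac{C_i}{T_i}$$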

Task  Period  Compute  CPU Load 
 0       5       1       20%
 1      10       2       20%
 2      20       3       15%    
 3     100      11       11%
         Total CPU Load  66%

OK, total CPU load is only 66%.  This is good news so far.

To do this math in the real world, you also need to add in the CPU time spent by interrupt service routines. Sometimes you end up with more than 100%, and that is obviously a problem.

It might seem impossible to get this wrong. But I would not be writing this if I hadn't seen this problem in a real industry design review. More than once. The way teams get this wrong is not by getting the math wrong. The problem is that they don't do the analysis in the first place, and have no idea what their CPU load is because they never measure it.

An especially tricky problem is when a task might normally run quickly, but once in a while has a very long compute time. This means that the system might meet its deadlines most of the time, but not all the time. If you were to monitor typical CPU load on a testbed things would be fine. But you might still miss deadlines once in a while -- perhaps only once in a really long while that you don't see during system test.

When doing a CPU loading calculation, the compute time has to be the worst case path through each and every task in the code (sometimes called Worst Case Execution Time: WCET), not just a typical path. Determining WCET can be a bit of a pain, but doing so for each task -- even the very infrequent ones -- is a required first step in knowing if you are going to miss deadlines.


Blocking Time

But we're not out of the woods yet! It is pretty easy to design a system that misses deadlines despite low CPU loads. Have you spotted the problem with the example task set above yet?

The issue is that in a non-preemptive system like this once a task starts running, it runs to completion. If it is an infrequent task it might account for a small CPU load, but still hog the system for a long time.

Take another look at Task 3. It only runs every 100 msec, but it runs for 11 msec. That's only 11% CPU load. But 11 msec is more than twice as long as Task 0's period! So that means no matter what you do, once Task 3 starts running, Task 0 will miss a deadline.  If you want to see a simple example, consider the following timeline:

Time 100:   Task 0 runs for 1 msec
Time 101:   Task 3 runs for 11 msec
Time 112:   Task 0 can start running.  But it needed to run again between times 105 and 110, so it has already missed its deadline.

This problem would occur regardless of CPU load. Even if Task 3 executes only once per day, every day Task 0 would miss its deadline. In fact, the longer the period for Task 3, the less frequently the problem will happen and the harder it is going to be to find this problem in testing.  (Sure, some systems can occasionally miss a deadline. But you at least want to know that's going to happen to make sure you can design the system to be robust to a missed deadline. And for a safety critical system, having a system that misses any deadlines due to a design flaw is a bad idea regardless of system robustness.)

In general, if any task has a compute time longer than twice the period of the fastest task, you're going to miss deadlines.  (A more precise formulation is if the sum of twice the compute time of the fastest task plus the compute time of the longest task is more than twice the period of the fastest task, you'll miss deadlines.) Note that this is a necessary but not sufficient condition for schedulability.
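Written out, with C for compute time and T for period, that more precise formulation from the paragraph above is:

$$2\,C_{\text{fastest}} + C_{\text{longest}} > 2\,T_{\text{fastest}} \;\Rightarrow\; \text{a missed deadline}$$

For the example task set: 2(1) + 11 = 13 > 2(5) = 10, so Task 0 will miss its deadline.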

In design reviews I see this problem of low CPU % but blocked high priority task frequently.  Very frequently. So don't let it bite you!

Even if this isn't the case it is still possible to miss deadlines with less than 100% CPU load, but that analysis takes a while to explain, so it will have to wait for another posting.

If you simply can't wait, the easy-to-explain (but tedious) way to check things is to lay out a timeline of when tasks will execute using a spreadsheet, with one row per msec and starting with all tasks being ready to run at time zero. Put down when each task will run as a series of occupied cells in the timeline. Repeat until you reach the least common multiple (LCM) of all the task periods. If they all fit, you should be OK.  If not, you need to dig deeper into what's going on. (Note that this neglects the effects of jitter in task start times caused by race conditions between the timer ISR and the main loop if..else chain. So exercise care with that analysis.) 
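If you'd rather let code do that tedious timeline for you, here is a brute-force sketch (my own, not from the post) of the same idea: release every task at its period, always run the highest priority queued task to completion, and report any instance that finishes after its deadline. Like the spreadsheet, it assumes simultaneous release at time zero and ignores the jitter caveat above.

/* Sketch: brute-force timeline check for a non-preemptive prioritized
   main loop, using the example task set. Deadline = period. */
#include <stdio.h>

#define NTASKS  4
#define SIM_END 200u   /* two passes through the 100 msec LCM */

static const unsigned period[NTASKS]  = { 5, 10, 20, 100 };
static const unsigned compute[NTASKS] = { 1,  2,  3,  11 };

int main(void)
{ unsigned done[NTASKS] = { 0 };  /* completed instances per task */
  unsigned t = 0;                 /* current time in msec */

  while (t < SIM_END)
  { int ran = 0;
    for (unsigned i = 0; i < NTASKS; i++)
    { unsigned released = (t / period[i]) + 1u;  /* releases at 0,T,2T,... */
      if (done[i] < released)                    /* oldest instance pending */
      { unsigned deadline = (done[i] + 1u) * period[i];
        t += compute[i];                         /* run it to completion */
        if (t > deadline)
        { printf("Task %u missed deadline %u (finished at %u)\n",
                 i, deadline, t); }
        done[i]++;
        ran = 1;
        break;  /* rescan from the top: highest priority pending first */
      }
    }
    if (!ran) { t++; }  /* nothing pending: idle until the next release */
  }
  return 0;
}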

Monday, December 7, 2015

Multi-Rate Main Loop Tasking


Recently I was looking for an example of a prioritized, cooperative multitasking main loop and realized it is surprisingly difficult to find one that is (1) readily available and (2) comprehensible.

Sure, once you understand the concept maybe you can dig through an industrial-strength implementation with all sorts of complexity. But good luck getting up the learning curve!  So, here is a simple (I hope!) description of a multi-rate main loop scheduler.

First of all, we're talking about non-preemptive multitasking. This is variously called main loop scheduling, a main loop tasker, a prioritized cooperative multitasker, a cyclic executive, a non-preemptive scheduler, and no doubt a bunch of other terms. (Not all of these terms are precisely descriptive, but in terms of search terms they'll get you in the ballpark.) The general idea is to be able to schedule multiple tasks that operate at different periods without having to use an RTOS and without having to use the generally bad idea of stuffing everything into timer-based interrupts.

Single Rate Main Loop

The starting point is a single-rate main loop. This is a classical main loop schedule in which all tasks are run to completion over and over again:

void main(void)
{ ... initialization ...

  while(1)
  { Task0();
    Task1();
    Task2();
    Task3();
  }
}


The good news about this is that you don't need an RTOS, and it's really hard to get things wrong. (In other words reviews and testing are easy to do well.)

The bad news is that all tasks have to run at the same period. That means that if one task needs to run really quickly you'll miss its deadlines.

I've seen way too many hacks that use conditional execution based on CPU load to sometimes skip tasks. But ad hoc approaches have the problem that you can't really know if you'll miss deadlines. Instead, you should use a methodical approach that can be analyzed mathematically to make sure you'll meet deadlines. The way people generally go is with some variation of a multi-rate main loop.

Multi-Rate Main Loop

The idea behind a multi-rate main loop is that you can run each task at a different periodic rate. Each task (which is just a subroutine) still runs to completion, so this is not a full-up preemptive multitasking system. But it is relatively simple to build, and flexible enough for many embedded systems.

Here is some example code of the main loop itself, with some explanation to follow.

void main(void)
{ ... initialization ...

  while(1)
  { if(       flag0 )  { flag0 = 0; Task0(); }
    else if ( flag1 )  { flag1 = 0; Task1(); }
    else if ( flag2 )  { flag2 = 0; Task2(); } 
    else if ( flag3 )  { flag3 = 0; Task3(); }
  }
}

The way this code works is as follows.  All the tasks that need to be run have an associated flag set to 1.  So if Task1 and Task2 are the only tasks that need to run, flag0 is 0, flag1 is 1, flag2 is 1, and flag3 is 0. The code crawls through an "else if" cascade looking for the first non-zero flag.  When it finds a non-zero flag it executes that task, and only that task.

Note that each task sets its flag to zero so that it runs exactly one time when it is activated by its flag. If all flags are zero then no task is executed and the while(1) loop simply spins away until a flag finally becomes non-zero. More about how flags get set to 1 in a moment.

After executing at most one task, the loop goes back to the beginning. Because at most one task is executed per iteration of the main while(1) loop, the tasks are prioritized. Task0 has the highest priority, and Task3 the lowest priority.

Yes, this prioritization means that if your CPU is overloaded Task0 may execute many times and Task3 may never get a turn. That's why it's important to get scheduling right (this will be a topic in a later blog posting).

Multi-Rate Timers

The main loop wasn't so bad, except we swept under the rug the messy business of getting the flags set properly.  Trying to do that in the main loop generally leads to problems, because a long task will cause many milliseconds to go by between timer checks, and it is too easy to have a bug that misses setting a flag some of the time. Thus, in general you tend to see flag maintenance in the timer interrupt service routine.

Conceptually the code looks like this, and for our example lives in a timer interrupt service routine (ISR) that is called every 1 msec.  A variable called "timer" keeps track of system time and is incremented once every msec.

 // in a timer ISR that is called once every msec
 timer++;
 if ((timer %   5) == 0) { flag0 = 1; } // enable 5 msec task0 

 if ((timer %  10) == 0) { flag1 = 1; } // enable 10 msec task1
 if ((timer %  20) == 0) { flag2 = 1; } // enable 20 msec task2
 if ((timer % 100) == 0) { flag3 = 1; } // enable 100 msec task3
 if (timer >= 100) { timer = 0; } // avoid timer overflow bug


Every 5 msec the timer will be zero modulo 5, every 10 msec the timer will be zero modulo 10, and so on.  So this gives us tasks with periods of 5, 10, 20, and 100 msec.

Division is slow, and on many low-end microcontrollers should be avoided in an ISR. So it is common to see a set of counters (one per flag), where each counter is set to the period of a particular task and counts down toward zero once per msec. When a counter reaches zero the associated flag is set to 1 and the counter is reset to the task's period. This takes a little more RAM, but avoids division. How it's implemented depends upon your system tradeoffs.
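Here is a minimal sketch of that counter-based alternative, in the same fragment style as the ISR code above (the counter names are mine; the flags are the same ones the main loop checks):

 // in the timer ISR, replacing the modulo checks above
 static unsigned char ctr0 = 5, ctr1 = 10, ctr2 = 20, ctr3 = 100;

 if (--ctr0 == 0) { flag0 = 1; ctr0 = 5;   } // 5 msec task0
 if (--ctr1 == 0) { flag1 = 1; ctr1 = 10;  } // 10 msec task1
 if (--ctr2 == 0) { flag2 = 1; ctr2 = 20;  } // 20 msec task2
 if (--ctr3 == 0) { flag3 = 1; ctr3 = 100; } // 100 msec task3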

The last line of the original timer ISR code (the timer reset) avoids weird things happening when the timer overflows. The reset to zero should run at the least common multiple of all periods, which in this case happens to be equal to the longest period.

Concurrency Issues

As with any scheduling system there are potential concurrency issues. One subtle one is that the timer ISR can run part-way down the else..if structure in the main loop. This could cause a low-priority task to run before a high priority task if they both have their flags set to 1 on the same timer tick. It turns out that this doesn't make the worst case latency much worse. You could try to synchronize things, but that adds complexity. Another way to handle this is to copy the current time into a temporary variable and do the checks for when to run each task in the main loop instead of the timer ISR.

It's also important to note that there is a potential concurrency problem in writing the flags, because both the ISR and the main loop can write the flag variables. In practice the concurrency bug will only bite when you're missing deadlines, but good coding style dictates disabling interrupts when you update the flag values in the main loop. That isn't shown in the main loop code above to keep the explanation simple, but it is sketched just below.
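A minimal sketch of what that protection looks like, using hypothetical DISABLE_INTERRUPTS()/ENABLE_INTERRUPTS() macros that stand in for your compiler's interrupt control intrinsics:

  while(1)
  { if( flag0 )
    { DISABLE_INTERRUPTS();   // keep the timer ISR away while flag0 is updated
      flag0 = 0;
      ENABLE_INTERRUPTS();
      Task0();
    }
    else if ( flag1 )
    { DISABLE_INTERRUPTS();
      flag1 = 0;
      ENABLE_INTERRUPTS();
      Task1();
    }
    // ... and so on for the remaining tasks ...
  }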

The Big Picture

OK, that's pretty much it.  We have a main loop that runs each task when its ready-to-run flag is set, and a timer ISR that sets a ready-to-run flag for each task at the desired period. The result is a system that has the following properties:
  • Each task runs once during its assigned period
  • The tasks are prioritized, so for example task 2 only runs when task 0 and task 1 do not need to run
The big benefit is that, so long as you pay attention to schedulability math, you can run both fast and slow tasks without needing a fancy RTOS and without missing deadlines.

In terms of practical application this is quite similar to what I often see in commercial systems. Sometimes developers use arrays of counters, arrays of flags, and sometimes even arrays of pointers to functions if they have a whole lot of functions, allowing the code to be a generic loop rather than spelling out each flag name and each task name. This might be necessary, but I recommend keeping things simple and avoiding arrays and pointers if it is practical for your system.

Coming soon ... real time analysis of a multi-rate main loop

Monday, May 19, 2014

Task Death In Safety Critical Systems

Safety critical systems must be designed with an expectation that one or more tasks on the same CPU might fail via hanging, might fail to be scheduled, or otherwise might not execute in a periodic manner as intended, leading to a partial software failure.

Consequences:
Failing to monitor each and every task and failing to mitigate task execution faults can lead to a task “dying” without being recovered. If a task dies, it no longer performs its functions even though other tasks are still operational. This can leave the system in an uncontrolled state, cause loss of fail-safe functionality (if the fail-safe is in a dead task), cause stale recorded failure data (if the task collecting that data dies), etc.

Accepted Practice:
  • Each task must be continually monitored for completion within a defined period. This ensures that the task is scheduled and that it is completed on time. A properly designed watchdog timer that monitors all tasks is an accepted way to accomplish this (see the sketch after this list).
  • If a task fails to periodically complete execution, fault mitigation must be performed. Common fault mitigation techniques include restarting the task or restarting the system. 
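As one concrete illustration of these practices, here is a minimal sketch (mine, not from the cited sources) of a common pattern: each task sets an "alive" bit when it completes a cycle, and the hardware watchdog is kicked only when every monitored task has checked in during the window. KickHardwareWatchdog() is a placeholder for the platform-specific watchdog service operation.

/* Sketch: multi-task watchdog monitor. */
#define NUM_TASKS  3u
#define ALL_ALIVE  ((1u << NUM_TASKS) - 1u)

extern void KickHardwareWatchdog(void);   /* platform specific */

static volatile unsigned char alive_bits = 0;

void TaskCheckIn(unsigned char task_id)   /* each task calls this per cycle */
{ alive_bits |= (unsigned char)(1u << task_id); }

void WatchdogMonitor(void)                /* called periodically */
{ if (alive_bits == ALL_ALIVE)
  { KickHardwareWatchdog();   /* every task alive: hold off the reset */
    alive_bits = 0;           /* require fresh check-ins next window  */
  }
  /* Otherwise at least one task failed to complete on time; withholding
     the kick lets the hardware watchdog reset the system as mitigation. */
}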
Discussion

Embedded systems typically have one or more periodic software tasks that must complete execution within some maximum period (each task has its own deadline, and by default the deadline equals the period unless some other deadline is specified). In a properly designed system, careful analysis and testing has been performed to ensure that each task will continue operation and meet its deadline even under worst case operating conditions.

However, competent safety-critical embedded system designers realize that faults may occur, resulting in a partial failure of the software system, and plan accordingly. Each task must be monitored to ensure the following: each task executes at least once in every period that it is supposed to execute; each task finishes execution without abnormally terminating; and each task completes execution within its predetermined deadline. This means that the system must not only make sure that once a task is scheduled to run it meets its deadline, but also make sure that the task is scheduled to run in the first place.

As an analogy, consider if you have a set of things you need to do every evening, but have a bad memory. One way you might handle this is to put post-it notes on your fridge to remind yourself what needs to be done every evening before you go to sleep. Let’s say you’re very methodical, so you take each note down in turn, do the action, then put it back on the fridge, in order. One of the notes says “check that basement door is locked” because you’ve found the kids have a habit of forgetting to lock it when they come in from playing. For a while this works fine. But one day you get distracted while locking the basement door and set the note down, forgetting to put it back on the fridge. The next night you might forget to check the door and not even realize it (at least not until you wake up in the middle of the night wondering about it – but embedded CPUs don’t have that feature!). What has happened is that a disruption caused one of your tasks to fall off the scheduling list. Just because you finished the entire list before morning doesn’t mean everything got done, because something was missing from the list. The same thing can happen in a computer if one of its tasks dies and doesn’t get put back on the task to-do list – the computer “thinks” it is running all its tasks, but in reality one of the tasks hasn’t been run because it isn’t even on the to-do list.

Selected Sources

Kleidermacher points out that tasks can die, saying “When a thread faults (for example, due to a stack overflow), the kernel should provide some mechanism whereby notification can be sent to the supervisor thread. If necessary, the supervisor can then make a system call to close down the faulted thread, or the entire process, and restart it. The supervisor might also be hooked into a software ‘watchdog’ setup, whereby thread deadlocks and starvation can be detected as well.” (Kleidermacher 2001, pg. 23).

Ganssle refers to the necessity of checking all tasks by saying “This problem multiplies in a system with an RTOS, as a reliable watchdog monitors all of the tasks. If some of the tasks die but others stay alive – perhaps tickling the [Watchdog Timer] – then the system’s operation is at best degraded.”  (Ganssle 2000, p. 125)

Tasks can be expected to die due to a variety of causes, and this is especially likely to happen in a system without hardware support for memory protection, such as OSEK, because software from one task can accidentally corrupt another task's data. Task death is an expected possibility in such an RTOS. For example, a technical summary of the OSEK operating system shows “Failed” as a task state, indicating that tasks can be expected to fail and need to be restarted to restore system operation. (Feiler 2003, Fig 1, shown below).


OSEK tasks can be expected to fail, requiring a task restart. (Feiler 2003, Fig. 1.)


Another contemporaneous real time operating system, VxWorks, did not support memory protection and did not support automatic task restart. As a result, it was dramatically more prone to subtle but significant partial software failures if software applications misbehaved (Koopman 2008, pp. 220-221, recounting work done several years earlier).  In particular, parts of the system would appear to be working, but key pieces of the system would have failed, giving a superficial appearance of a system that was working (the keyboard accepted commands and displayed them on the computer screen) when in fact the system was mostly dead, and programs could not be run on it. (VxWorks has since added memory protection capabilities, so these experimental results do not necessarily represent current VxWorks systems.)

Safety critical system designers should reasonably expect that a task might die (cease to finish computation in a periodic manner). They should have complete analysis to determine which task deaths might lead to an unsafe system (by default, this is all tasks in the system), and should take proactive measures to detect and recover from one or more tasks dying, regardless of the underlying cause -- even if no specific cause for the task death can be imagined by the designers.

References:
  • Feiler, Real-time application development with OSEK: a review of the OSEK standards, Carnegie Mellon University Software Engineering Institute, CMU/SEI-2003-TN-004, Nov 2003.
  • Ganssle, J., The Art of Designing Embedded Systems, Newnes, 2000.
  • Kleidermacher, D. & Griglock, M., Safety-Critical Operating Systems, Embedded Systems Programming, Sept. 2001, pp. 22-36.
  • Koopman, P., DeVale, K. & DeVale, J., "Interface robustness testing: experiences and lessons learned from the Ballista Project," In: Kanoun, K. & Spainhower, L., Eds., Dependability Benchmarking for Computer Systems, IEEE Press, 2008, pp. 201-226.


Monday, May 12, 2014

Real Time Scheduling Analysis for Critical Systems

Critical embedded software must be created using methodical scheduling analysis that ensures that every task can meet its deadline under worst-case fault-free operational conditions. Looking just at CPU load is not only inadequate to determine if you will meet real time deadlines -- it is a specifically bad practice that is to be avoided.

Consequences:
Failing to do an accurate real time schedule design for a safety critical system means that designers do not know if the system will miss deadlines. It is impractical in a complex system (which means almost any real-world product) to do enough testing with a multitasked RTOS-based system to ensure that deadlines will be met under worst case conditions. Thus, a design team that fails to do scheduling analysis should appreciate that they have an unknown probability of missing deadlines, and can reasonably expect to miss deadlines in the worst case if the CPU load is above about 69% for a large number of non-harmonic tasks.

 Accepted Practices:
  • Using Rate Monotonic Analysis (RMA) when using an RTOS with multi-tasking capability, or using some other mathematically sound scheduling analysis. Rate Monotonic Scheduling (RMS) is what you get as a result of RMA.
  • Documenting the RMA analysis including at least listing: all tasks, WCET for each task, period of each task, and worst case system blocking time. Assumptions used in the analysis should be stated and justified (for example, an assumption that no low priority task blocks a high priority task must be justified via written explanation, or if untrue then advanced techniques must be used to ensure schedulability). CPU utilization must be less than 100% if a set of favorable assumptions such as harmonic task periods can be documented to hold, and may be restricted to as low as 69.3% CPU used (meaning 30.7% unused CPU capacity) in the general case.
  • A specifically bad practice is basing real time performance decisions solely on spare capacity (e.g., “CPU is only 80% loaded on average”) in the absence of mathematical scheduling analysis, because it does not guarantee safety critical tasks will meet their deadlines. Similarly, monitoring spare CPU capacity as the only way to infer whether deadlines are being met is a specifically bad practice, because it does not actually tell you whether high frequency deadlines are being met or not.
  • Permitting more than one instance of a real-time task to queue is a specifically bad practice, because this can only happen when real time deadlines are being missed. This practice is a bandaid for a real-time system, and indicates that the system is missing its real-time deadlines.
Discussion

Real time scheduling is a mathematical approach to ensuring that every task in a real time embedded system meets its deadlines under all specified operating conditions. Using a mathematical approach is required because testing can only exercise some of the system operating conditions. There are almost always real time scheduling situations that can’t be adequately tested in a complex piece of software, requiring either simplifying the system to make testing feasible, or using a rigorous mathematical approach to ensure that complex scheduling is guaranteed to work. When a multi-tasking real time operating system such as OSEK is used, a mathematical approach must be used to ensure deadlines are met.

Rate Monotonic Analysis

The generally accepted method for scheduling critical real time systems is to perform Rate Monotonic Analysis (RMA), possibly with one of a number of adaptations for special circumstances, and create a system with a Rate Monotonic Schedule. RMA has the virtue of providing a simple set of rules that guarantees all tasks in a system will meet their deadlines. To achieve this, RMA requires rules to be followed such as all tasks having a defined fastest time period at which they will run, and the period of tasks being harmonic multiples to permit using 100% of the CPU capacity. (Task periods are considered harmonic if they are exact multiples of each other. For example, periods of 1, 10, 100, 1000 msec are harmonic, but 1, 9, 98, 977 msec are not harmonic.) To implement RMA, designers sort tasks based on period, and assign the fastest task to the highest priority, second fastest task to the second highest priority, and so on. If a designer wishes to have multiple tasks at the same priority that is acceptable, but it is required that all tasks at a given priority execute at the same period. 

The benefit of RMA is that there is a mathematical proof that all tasks will meet deadlines if a certain set of rules is followed. If any of the rules is bent, however, then more complex mathematical analysis must be performed or other special techniques used to ensure that deadlines will be met. In particular, if task periods are not harmonic multiples of each other, guaranteeing that deadlines will be met requires leaving slack capacity on the CPU. For a system with many tasks, RMA can only guarantee that task deadlines will be met if the CPU load is a maximum of 69.3%. (For this reason, it is common for embedded systems to use harmonic multiples of task periods.)
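The number behind that 69.3% figure is the classic Liu & Layland utilization bound (from the 1973 paper cited below): for n tasks with compute times C and periods T, deadlines are guaranteed if

$$\sum_{i=1}^{n} \frac{C_i}{T_i} \;\le\; n\left(2^{1/n} - 1\right) \;\xrightarrow{\;n \to \infty\;}\; \ln 2 \approx 0.693$$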


Example execution of a system scheduled with RMA.
(Source: http://blogs.abo.fi/alexeevpetr/2011/11/24/generating-multithread-code-from-simulink-model-for-embedded-target/)

Employing RMA requires at least the following steps at design time: listing all tasks in the system including interrupt service routines, determining the fastest period of each task, determining the worst case execution time (WCET) of each task, and determining the maximum blocking time of the system (longest time during which interrupts are disabled). Given those numbers, tasks can be assigned priorities and designers can know if the system will always meet its deadlines. Given correct RMA scheduling and a fault-free operational system, every task will complete by the end of its period, so it will never be possible to have a second occurrence of a task enqueued while waiting for a first occurrence to complete.

If the CPU is over-subscribed, there are established methods of ensuring that critical tasks always complete while non-critical tasks get whatever CPU resources are left. The simplest method is to simply assign all non-critical tasks priorities lower than the lowest priority critical task (which works so long as those non-critical tasks can't substantively delay the operation of the critical tasks). If it is important to share available CPU time across a number of noncritical tasks even when the CPU is overloaded, this can be done by having non-critical tasks take turns at the lowest priority. Note that an overloaded CPU does not cause any critical tasks to miss their deadlines in this scenario; it is simply a matter of making non-critical tasks wait a bit longer to execute if the CPU is overloaded. (This assumes that the CPU can handle worst case demand from critical tasks. This assumption must be ensured by the design process for the system to be safe.)

Less Than 100% CPU Load Does Not Guarantee Deadlines Are Met

A specifically bad practice is looking at idle time during testing to determine whether or not the CPU is overloaded, and inferring from less than 100% usage that the system will meet its deadlines. That simply does not account for the worst case, or potentially even just an infrequent heavy load case.

As a non-computer example of missing deadlines with less than 100% loading, let’s say you want to spend five days per week working and three days out of every 12 volunteering at a homeless shelter (the fact that it is out of 12 days instead of out of 14 days makes these periods non-harmonic). Because work has a period of 7 days, it will have higher priority than the 12 day volunteer service period. This means you’ll complete all of your work the first 5 days (Monday – Friday) out of each 7-day work week before you start volunteering on weekends. Most weeks this will work fine (you’ll have enough time for both). But when the 12-day volunteer period starts on a Monday, you’ll only have one weekend (two days) in the 12 day period for volunteering that runs Monday of one week through Friday of the next week.  Thus you’ll be a day short for volunteering whenever the volunteer period starts on a Monday. The amount of time you’ve committed is 5 days out of 7 plus 3 days out of 12:  5/7 + 3/12 = 96.4%. But you’re going to come up a whole day short on volunteering once in a while even though you are less than 100% committed and you are taking some weekend days off, performing neither task. You could solve this by committing 4 days out of 14 to volunteering, which is actually a slightly higher workload (100% total), but changes the period to be harmonic with the weekly work cycle. (3.5 days out of every 14 would also work.) Thus, as shown by this example, non-harmonic task periods can result in missed deadlines even with a workload that is less than 100%.

Example of 79% loaded CPU missing a task deadline with only 4 tasks.

As a computer-based example, the above figure shows another example in terms of a task in a four-task system missing its deadline on a system with non-harmonic periods even though the CPU is only 79% loaded. This scenario would happen rarely, and only when all the tasks’ periods synchronize, which for this example would be only once every Least Common Multiple of (19,24,29,34) = 224,808 time units – and even then only if each task actually took its worst case time to execute. Thus, while Task 4 would be expected to miss its deadline in operation, detecting that situation might be difficult with limited duration testing.

Selected Sources

Rate Monotonic Scheduling was developed to address the problem of creating an easy-to-follow recipe for ensuring that real time schedules can be met in a system with multiple tasks. Influential early papers include (Liu & Layland 1973 and Lehoczky et al. 1989).

Ganssle recommends the use of RMA as early as 1992 (Ganssle 1992, pg. 200-201) and sketches its use, giving a reference for more details.

Douglass provides a pattern for static priority real time scheduling and states: “the most common policy for the selection of policies is rate monotonic scheduling or RMS”.  (Douglass 2002, section 5.9.5; note that rate monotonic scheduling is what you get as a result of rate monotonic analysis).

In the context of safety critical operating systems, Kleidermacher says “Rate monotonic analysis (RMA) is frequently used by system designers to analyze and predict the timing behavior of systems.” (Kleidermacher 2001, pg. 30).

MISRA Report 3 discusses the use of real-time kernels (which for our purposes are operating systems that use some sort of scheduling approach). It notes that there are a number of accepted scheduling techniques, and that the use of "best available technology" such as, for example, a certified RTOS brings benefit, providing a reference to a number of texts.

References:
  • Douglass, Doing Hard Time: Developing Real-Time Systems with UML, Objects, Frameworks, and Patterns, Addison-Wesley Professional, 1999.
  • Ganssle, J., The Art of Programming Embedded Systems, Academic Press, 1992.
  • Kleidermacher, D. & Griglock, M., Safety-Critical Operating Systems, Embedded Systems Programming, Sept. 2001, pp. 22-36.
  • Lehoczky, J.; Sha, L.; Ding, Y. "The rate monotonic scheduling algorithm: exact characterization and average case behavior", IEEE Real-Time Systems Symposium, 1989, pp. 166–171.
  • Liu, C. L.; Layland, J., "Scheduling algorithms for multiprogramming in a hard real-time environment", Journal of the ACM, 1973, (1): 46–61
  • MISRA, Report 3: Noise, EMC and Real-Time, February 1995.

Monday, January 6, 2014

Do Not Re-Enable Interrupts In An ISR


Summary: It might sound intuitive to re-enable interrupts within an ISR so higher priority interrupts can run without delay. But, in fact, this is probably the worst thing you can do for delay, and might even make the code unsafe due to stack overflow.



A previous post on rules for using interrupts included the rule:
  • "Don't re-enable interrupts within an Interrupt Service Routine (ISR).  That's just asking for subtle race condition and stack overflow problems."
Some developers take a different point of view, and feel that it is best to re-enable interrupts to let higher priority ISRs run without delay. The idea is that high priority interrupts should run as soon as possible, without having to wait for low priority interrupts to complete. Re-enabling interrupts within an ISR seems to let that happen. BUT, while there might be some intuitive appeal to this notion, this is a dangerous practice that makes things worse instead of better.

First, to recap, when an interrupt triggers an ISR one of the first things that happens is that further interrupts get masked by the interrupt handling hardware mechanisms. Once the ISR starts running, at some point (best is at the beginning) it acknowledges the interrupt source, clearing the interrupt request that triggered the ISR. At that point the ISR can re-enable interrupts if it wants to, or leave them masked until the ISR completes execution. (The "return from interrupt" instruction will typically restore interrupt flags, re-enabling interrupts as appropriate when the ISR completes.) If interrupts are re-enabled within the ISR, then another interrupt can suspend the ISR and run some other, second ISR. This means that if a higher priority interrupt comes along, it can run its ISR right away.

The problem is that other, bad, things can also happen once interrupts are re-enabled:
  •  If a lower priority interrupt comes along, it also gets to run, suspending the currently running, higher priority ISR. Interrupt priority hardware does not keep track of interrupt history after an ISR starts and acknowledges its interrupt source, and so loses track of how high the priority is for the running ISR. Worse, if a high and low priority interrupt happen at the same time, this approach guarantees that the high priority ISR waits for the low priority ISR. This happens because the high priority ISR runs first, then gets preempted by the lower priority ISR as soon as interrupts are re-enabled. In the case where no other interrupts are pending, the low priority ISR runs to completion before the high priority ISR gets to finish.
  • The ISRs nest as above rather than running one at a time, filling up the stack. You might be able to account for this by allocating enough stack for all ISRs to be active at the same time. But if you leave interrupts masked in ISRs, the worst case is only the single biggest ISR stack use. (Some hardware has multiple tiers/levels/classes ... pick your favorite term ... of interrupts, but in that case it is still only one ISR of stack use per tier rather than one per ISR source.)
  • The same ISR might run more than once at a time, especially if it got unlucky and was preempted by other ISRs, delaying its completion time. For example, if you get a burst of noise on an ISR hardware line you might kick off a half dozen or so copies of the same ISR. Or once in a while hardware events happen close together and re-trigger the ISR. This will lead to trouble if your ISR code is not re-entrant. It also could overflow the stack, ending up with memory corruption, etc., unless you can accurately predict or limit how many times ISRs can be re-triggered in absolute worst-case conditions.
You could say "the highest priority ISR doesn't re-enable interrupts" -- but what about the second-highest priority ISR? Once you get more than a couple ISRs involved this gets hopeless to untangle. You could try to write some sort of ISR handler to mitigate some of these risks, but it's going to be difficult to get right, and it will add overhead to every ISR. In all, the situation sounds pretty messy and prone to problems ... and it is. You might get away with this on some systems some of the time if you are really good (and never make mistakes). But getting concurrency-related tricky code right is notoriously difficult.  Re-enabling interrupts is just asking for problems.

So let's look at the alternative. What is the true cost you might be trying to avoid in terms of delaying that oh-so-urgent high priority ISR because you're not re-enabling interrupts in an ISR? 

The worst case is that the longest-running low priority ISR runs to completion, making all the higher priority ISRs wait for it to complete before they can start. But after that all the remaining ISRs that are pending will complete in priority order -- highest to lowest priority. That's exactly what you want except for the low priority ISR clogging up the works. So if you have an obnoxiously long low priority ISR that's a problem. But if none of your ISRs run for very long (which is how you're supposed to write ISRs), you're fine. Put into scheduling terms, you want to make sure none of your ISRs runs long, because a long-running ISR gives you a high blocking time, and blocking time delays high priority tasks from completing. 

Let's compare outcomes for the two alternative strategies. If you re-enable interrupts, the worst case latency for the highest priority ISR in the system is that it arrives, and then gets preempted by every other ISR in the system (including the longest-running ISR if it comes in later, but before the high priority ISR has a chance to complete).  If you leave interrupts masked, the worst case is that the longest-running ISR has to complete, but then the high priority ISR goes immediately afterward. So, leaving interrupts disabled (masked) during every ISR is clearly a win for the worst case, in that you only have to wait for the longest-running ISR to complete before running the highest priority ISR, instead of waiting for all ISRs to complete. The worst case is typically what you care about in a real time embedded system, so you should leave interrupts disabled in ISRs to ensure the fastest worst-case completion time of high priority ISRs. And, leaving interrupts disabled in ISRs also gets rid of the risks of stack overflow and re-triggered ISRs we mentioned.

UPDATE: To avoid confusion, it's important to note that the above is talking about what happens at ONE level of interrupts, such that when one ISR is running no other interrupts run until the ISR completes or ISRs at that level complete. Many architectures have multiple levels, in which one ISR can interrupt another ISR at a lower level even if that lower level has interrupts masked. This corresponds to the comment about one ISR per level being active in the worst case. Also, note that if an architecture can change the priorities of interrupts within a single level that's irrelevant -- it is the existence of levels that are each individually maskable and that are prioritized as groups of interrupts per level that gives a way around some of these problems. So, going back to what the title says: do not RE-enable the same level of interrupts within an ISR.

Monday, July 22, 2013

Making Main Loop Scheduling Timing More Deterministic

Summary: Wasting time in a main loop scheduler can make testing system-level timing a whole lot easier.



It's common enough to see a main loop scheduler in an embedded system along the lines of the following:

for(;;)
{ Task1();
  Task2();
  Task3();
}

I've heard this referred to as a "static scheduler," "cyclic executive," "main loop scheduler," or "static non-preemptive scheduler" among other terms. Regardless of what you call it, the idea is simple: run all the tasks that need to be run, then go back and do it again until the system is shut down or reset. There might be interrupt service routines (ISRs) also running in the system.

The main appeal of this approach is that it is simple. You don't need a real time operating system and, even more importantly, it would appear to be difficult to get wrong. But, there's a little more to it than that...

The first potential problem is those pesky ISRs. They might have timing problems, cause concurrency problems, and so on. Those issues are beyond what I want to talk about today except for making the point that they can affect the execution speed of one iteration through the main loop in ways that may not be obvious. You should do timing analysis for the ISRs (chapter 14 of my book has gory details).  But for today's discussion we're going to assume that you have the ISRs taken care of.

The next problem is timing analysis of the main loop. The worst case response time for running a task is one trip through the loop. But how long that trip is might vary depending on the calculations each task performs and how much time ISRs steal. It can be difficult to figure out the absolute worst case execution time (you should try, but it might not be easy). But the really bad news is, even if you know the theoretical worst case timing you're unlikely to actually see it during testing.

Consider the tester trying to make sure the system will function with worst case timing. How do you get the above static scheduler to take the worst case path through the code a bunch of times in a row to see what breaks? It's a difficult task, and probably most testers don't have a way to pull that off. So what is happening is you are shipping product that has never been tested for worst case main loop execution time. Will it work?  Who knows.  Do you want to take that chance with 10,000 or 100,000 units in the field? Eventually one of them will see worst case conditions and you haven't actually tested what will happen.

Fortunately there is an easy way to mitigate this risk. Add a time-waster at the end of the main loop. The time waster should convert the above main loop, which runs as fast as it can, to a main loop that runs exactly once per a defined period (for example, once every 100 msec):

for(;;)
{ StartTimer(100);  // start a 100 msec countdown
  Task1();
  Task2();
  Task3();
  WaitForTimer(0);  // wait for the 100 msec countdown to reach 0
}

This is just a sketch of the code -- how you build it will depend upon your system. The idea is that you waste time in the WaitForTimer routine until you've spent 100 msec in the main loop, then you run the loop again. Thus, the main loop runs exactly once every 100 msec.  If the tasks run faster than 100 msec as determined by a hardware timer, you waste time at the end, waiting for the 100 msec period to be up before starting the next main loop iteration. If the tasks take exactly 100 msec then you just start the main loop again immediately. If the tasks run longer than 100 msec, then you should log an error or perform some other action so you know something went wrong.
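For concreteness, here is one hedged sketch of how the helper routines could be built on a 1 msec timer tick interrupt. LogOverrunError() is a hypothetical hook for the ran-too-long case; note that with 1 msec granularity, "exactly on time" and "late" both show up as an expired countdown here.

static volatile unsigned char countdown = 0;
extern void LogOverrunError(void);      // hypothetical error logging hook

void TimerTickISR(void)                 // called by hardware once per msec
{ if (countdown > 0) { countdown--; } }

void StartTimer(unsigned char msec)     // begin the per-loop countdown
{ countdown = msec; }

void WaitForTimer(unsigned char limit)  // spin until countdown reaches limit
{ if (countdown == limit)               // already there? loop may have run long
  { LogOverrunError(); }
  while (countdown > limit) { }         // waste time until the period is up
}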

The key benefit to doing this is to ensure that in testing the average timing behavior is identical to the worst case timing behavior. That way, if something works when the system is fast, but breaks when it actually takes 100 msec to complete the main loop, you'll see it right away in testing. A second benefit is that since you are actively managing the main loop timing, you have a way to know the timing ran a little long on some loops even if it isn't bad enough to cause a watchdog reset.






Saturday, May 25, 2013

Adding Prioritization to a Single Level Interrupt Priority System


Summary of technique: Add a software structure that executes only the highest priority pending interrupt within the ISR polling loop. Then start again at the top of the polling loop instead of polling all possible ISRs. This gives you a prioritized non-preemptive interrupt service routine scheduler.

- - - - - - - - - - - - - - - - - - - -

With some microcontrollers, all of your interrupts come in at the same priority level (for example, via an external interrupt request pin). The usual thing to do in that case is create a polling loop to check all the sources of interrupts and see which one needs to be serviced by looking at peripheral status registers.  For example:
if(HWTimerTick)  { ... ISR to service hardware timer tick ... }
if(ADCReady)  { ... ISR to service A to D converter ... }
if(SerialPortDataInReady ) { ... ISR to read a serial port byte... }
if(SerialPortDataOutReady) { ... ISR to write a serial port byte ... }
...
(Of course this isn't real code ... I'm just sketching a flow that you've seen before if you've written this type of ISR that polls all the devices that can cause interrupts to see which one actually needs to be serviced.)

If only one of these devices is active, then this approach should work pretty well. And if you do system-level testing probably things will work fine -- at least most of the time.

But the way you can get into trouble is if one of the interrupts has a short deadline for being serviced. Let's say you have the above code and are seeing serial input bytes being dropped once in a while.  What could be happening?

One cause of dropping bytes might be that the HW Timer Tick and/or the ADC Ready interrupts are active at the same time that the serial port data input interrupt is ready. You need to execute them before you can get data from the serial port. If the sum of their two execution times is longer than the time between serial byte arrivals, you're going to take too long to get to the serial port input ISR and will drop bytes.

You might buy a faster processor (which might be unnecessary as we'll see), but before doing that you might reorganize the code to put the serial input first in the list of ISRs so you can get to it faster when an interrupt comes in:
if(SerialPortDataInReady ) { ... read a serial port byte... }
if(HWTimerTick)  { ... service hardware timer tick ... }
if(ADCReady)  { ... service A to D converter ... }
if(SerialPortDataOutReady) { ... write a serial port byte ... }

And that will *almost* work. Things might get a little better, but it won't cure the problem. (Or, MUCH worse, it will cure the problem in testing only to have the problem reappear in the field after you've shipped a lot of systems!)  Now when you get an interrupt you'll service the serial port input ISR first. But, then you'll go off and do the other ISRs. If those other ISRs take enough time, you will be stuck in those other ISRs too long and will miss the next byte -- you won't get back to the top of the list of ISRs in time.

You might try re-enabling interrupts inside any long ISRs to let the serial port get processed sooner. But resist the temptation -- that probably won't work, and will likely result in stack overflows due to recursive interrupt processing. (Simple rule: NEVER re-enable interrupts from inside an ISR.)

What we really need here is prioritization. And it's pretty easy to get even though we don't have hardware interrupt prioritization. All you have to do is (1) put the checks for each ISR in priority order, and (2) only execute the first one in the list each time you process interrupts. This can be done as follows:

if(SerialPortDataInReady ) { ... read a serial port byte... }
else if(HWTimerTick)  { ... service hardware timer tick ... }
else if(ADCReady)  { ... service A to D converter ... }
else if(SerialPortDataOutReady) { ... write a serial port byte ... }

Now only the first active interrupt will be serviced and the rest ignored. When you drop out of this structure and exit, any pending interrupt will re-trigger the checks from the beginning, again executing the highest priority interrupt that is still active (i.e., the first active one in the list). This will continue until all pending interrupts have been processed. You can use a "while" loop around the code above, or in many systems it may make sense just to exit interrupt processing and let the hardware interrupts re-trigger to re-run the polling code as a new interrupt.
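Here is a sketch of the "while" loop variant (the status flags and service functions are illustrative names following the earlier pseudocode, not a real device's API):

void IRQ_Handler(void)
{ int pending = 1;
  while (pending)    // keep going until no interrupt source is active
  { if      (SerialPortDataInReady ) { ServiceSerialIn();  }
    else if (HWTimerTick)            { ServiceTimerTick(); }
    else if (ADCReady)               { ServiceADC();       }
    else if (SerialPortDataOutReady) { ServiceSerialOut(); }
    else                             { pending = 0; }  // all sources idle
  }
}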


This approach means that the worst case delay between processing serial input bytes is no longer all the ISRs running (if all interrupts are active). Rather, the worst case is the single longest ISR happens to be running, completes, and the serial port input ISR runs next. This happens because the list only runs at most one ISR rather than all of them. If that one ISR runs too long to meet deadlines, then it's probably too "fat" and should be simplified or its job moved out of ISRs and into the main loop.
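In scheduling terms, with C as each service routine's execution time, the change improves the worst case wait before the serial input routine starts from all of the other routines running first to just the single longest one:

$$W_{\text{poll-all}} = \sum_{j \neq \text{serial-in}} C_j \qquad \text{vs.} \qquad W_{\text{prioritized}} = \max_{j \neq \text{serial-in}} C_j$$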

There is no free lunch. The lowest priority ISR (the one at the end of the list) might starve. Making sure you meet all your ISR deadlines is trickier with this structure. Without the "else if" approach the worst case timing is easy to compute -- it is the combined run time of all the ISRs -- but it might be too slow to live with. With this structure you have a non-preemptive prioritized scheduling system for ISRs, and you need to use the corresponding scheduling math. Generally you'd want to use rate monotonic analysis (RMA) suitably adapted for the ISRs being non-preemptive. The analysis is a little more complex, but this approach might help you salvage a situation in which you're missing deadlines and have already committed to a certain speed of microcontroller.


(Note on terminology: technically the whole thing is one big ISR that calls a different function depending upon what's active. But I'm calling each such function an ISR because that is really what it does ... you're using a software dispatcher to pick which ISR to run instead of hardware prioritization logic.)

Thursday, April 25, 2013

Why Short Interrupt Service Routines Matter


Most embedded systems I see use interrupts to handle high priority events, which are typically triggered by some peripheral device. So far so good. But it is also common for these systems to have significant timing problems even though their CPUs are not 100% loaded.

Let's take an example of three interrupts and see how this type of thing can happen.  Let's call their service routines IntH, IntM, and IntL (for high/medium/low priority), and assume this is a single-level interrupt priority system. By that I mean that these Interrupt Service Routines (ISRs) can't be interrupted by any of the others once they start executing.

Say that you write your software and measure the idle task as taking 80% of the CPU, so the system is only 20% loaded.  The most important ISR has the highest priority, and so on.  And maybe this time it works fine.  But eventually you'll run into a system that has timing problems.  You're nowhere near 100% loaded; how could you have timing problems? To find out why, we need to dig deeper.

The first step is to measure the worst case (longest) execution time and worst case (fastest) period for each ISR.  Let's say it turns out this way:

IntH: Execution time = 10 msec       Period = 1000 msec
IntM: Execution time =  0.01 msec    Period =    1 msec
IntL: Execution time =  2 msec       Period =  100 msec

Let's take a look at the numbers. This task set loads the CPU at:  (10/1000) + (0.01/1) + (2/100) = 4%.
BUT it will miss deadlines! How can that be?

IntM and IntL are both going to miss their deadlines (if we assume deadline = period) periodically.  IntM will miss its deadline up to 10 times every time IntH runs, because the CPU is tied up for 10 msec with IntH, but IntM needs to run every 1 msec. So once per second IntM will miss its deadlines because it is starved by IntH.

OK, so maybe you saw that one coming.  But there is a more insidious problem here. IntM can also miss its deadline because of IntL.  Once IntL executes, it ties up the CPU for 2 msec, causing IntM to miss its 1 msec period. Even though IntL has a lower priority, once it runs it can't be interrupted, so it hogs the CPU and causes a deadline miss.

There are plenty of bandaids that can be tossed at this system (and I have the feeling I've seen them all in design reviews). The obvious hack of re-enabling interrupts partway through an ISR is dangerous and should not be used under any circumstance.  It leads to timing-dependent stack overflows, race conditions and so on. And more importantly, re-enabling interrupts in an ISR is, in my experience, a sign that the designers didn't understand the root cause of the timing problems.

But there is a principled way to solve these problems involving two general rules:
 - If possible, sort ISR and task priorities by period: the shortest period gets the highest priority.  This maximizes the CPU utilization you can schedule while still meeting deadlines. To understand why this works you'll need to read up on Rate Monotonic Scheduling and related techniques.
 - Keep ISR worst case execution time as small as possible -- only a few handfuls of instructions.  If you need to get more done, dump data from the ISR into a buffer and kick off a non-ISR task to do the processing (see the sketch after this list). This prevents one ISR from making another miss its deadline, and largely sidesteps the problem of ISRs not necessarily being assigned the priority you'd like on your particular hardware.
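To illustrate the second rule, here is a minimal sketch of a serial receive ISR that does nothing but move a byte into a ring buffer, leaving the real work to a main loop task. The register name UART_DATA_REG, its address, the buffer size, and ProcessByte() are all made up for the example.

/* Hypothetical memory-mapped UART receive data register. */
#define UART_DATA_REG (*(volatile unsigned char *)0x40001000u)
#define BUF_SIZE 64u
static volatile unsigned char buf[BUF_SIZE];
static volatile unsigned int head = 0;   /* written only by the ISR       */
static volatile unsigned int tail = 0;   /* written only by the main loop */

void ProcessByte(unsigned char c);       /* the lengthy work, done outside the ISR */

void SerialInISR(void)                   /* short: just capture the byte */
{ unsigned char c = UART_DATA_REG;       /* reading typically also clears the request */
  unsigned int next = (head + 1u) % BUF_SIZE;
  if (next != tail) { buf[head] = c; head = next; }  /* if the buffer is full, the byte is dropped */
}

void MainLoopSerialTask(void)            /* called from the main loop, not from an ISR */
{ while (tail != head)
  { ProcessByte(buf[tail]);
    tail = (tail + 1u) % BUF_SIZE;
  }
}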

The key insight is that "important" and "priority" are not the same things. Priority is about making real time scheduling math work, and boils down to assigning highest priority to short-period and short-deadline tasks. Getting that to work in turn requires all ISRs (even low priority ones) to be short. The importance of an ISR from the point of view of functionality ("this function is more important to the customer") is largely irrelevant -- the point of real time scheduling is to make sure every task meets its deadline every period.  Sometimes "important" and "short deadline" correspond, but not always. It is the deadline that should be paid attention to when assigning priorities if you want to meet real-time deadlines.  (Or, put another way, "important" means real-time and unimportant means non-real-time.)

The discussion above also applies to systems with multiple levels of interrupt priorities. Within each priority level (assuming one level can interrupt ISRs at another level), once a pig ISR starts, none of the other interrupts at that same level can interrupt it.

Make all your ISRs short, and do the analysis to make sure the worst case clumping of ISR executions doesn't overload your CPU.

Sunday, December 16, 2012

Software Timing Loops

Once in a while I see code that uses a software timing loop to wait for some amount of time to go by. It might look like this:

// You should *NOT* use this code as-is!
#define SCALE   15501    // System-dependent time scaling factor

void WaitMS(unsigned long ms)
{ unsigned long counter = ms * SCALE; // Compute how long to wait
  while(counter > 0) { counter--; }   // Waste time
}


The idea is that the designer tweaks "SCALE" so that WaitMS takes exactly one msec for each integer value of the input ms.  In other words WaitMS(5) waits for 5 msec. But, there are some significant problems with this code.
  • Different compiler versions and different optimization levels could dramatically change the timing of this loop depending upon the code that is generated. If you aren't careful, your system could stop working for this reason when you do a recompile, without you even knowing the timing has changed or why that has happened.
  • Changes to hardware can change the timing even if the code itself is unchanged. Changing the system clock speed is a fairly obvious problem. But other more subtle problems include clock throttling on high-end processors due to thermal management, putting the code in memory with wait states for access, moving to a processor with an instruction cache, or using a different CPU variant that has different instruction timings.
  • The timing will change based on interrupt service routines delaying execution of the while loop. You are unlikely to see the worst case timing disruption in testing unless you have a very deterministic system and/or really great testing.
  • This approach ties up the CPU, using power and preventing other tasks from running.
What I recommend is that you don't use software-based timing loops!  Instead you should change WaitMS() to look at a hardware timer, possibly going to sleep (or yielding to other tasks) until the amount of time desired has passed. In this approach, the inner loop checks a hardware timer or a time of day value set by a timer interrupt service routine.
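Here is a minimal sketch of what I mean, assuming a free-running counter g_msTicks that a timer ISR (not shown) increments once per millisecond; the name and the tick rate are made up for the example.

volatile unsigned long g_msTicks;   /* incremented every 1 msec by a timer ISR (not shown) */

void WaitMS(unsigned long ms)
{ unsigned long start = g_msTicks;
  while ((g_msTicks - start) < ms)        /* unsigned math handles counter rollover */
  { /* optionally sleep or yield to other tasks here instead of spinning */
  }
}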


Sometimes there isn't a hardware timer or there is some other compelling reason to use a software loop. If that's the case, the following advice might prove helpful.
  • Make sure you put the software timing loop in one place like this instead of having lots of small in-line timing loops all over the place. It is hard enough to get one timing loop right!
  • The variable "counter" should be defined as "volatile" to make sure it is actually decremented each time through the loop. Otherwise the optimizer might decide to just eliminate the entire while loop.
  • You should calibrate the "SCALE" value somehow to make sure it is accurate. In some systems it makes sense to calibrate a variable during system startup, or do an internal sanity check during outgoing system test to make sure that it isn't too far from the right value. This is tricky to do well, but if you skip it you have no safety net in case the timing does change. (A sketch that pulls this advice together appears after this list.)
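If you really are stuck with a software loop, a version that follows the above advice might look something like this minimal sketch; the initial scale value is still just a placeholder that has to be calibrated for your particular compiler, options, and hardware.

/* Software timing loop -- use only when no hardware timer is available. */
static volatile unsigned long scale = 15501UL;  /* recalibrated at startup or during system test */

void WaitMS(unsigned long ms)
{ volatile unsigned long counter = ms * scale;  /* volatile: the loop can't be optimized away */
  while (counter > 0UL) { counter--; }          /* waste time */
}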

(There are no doubt minor problems that readers will point out as well depending upon style preferences.   As with all my code examples, I'm presenting a simple "how I usually see it" example to explain the concept.  If you can find style problems then probably you already know the point I'm making. Some examples:  WaitMS might check for overflow, uint32_t is better than "unsigned long," "const unsigned long SCALE = 15501L" is better if your compiler supports it, and there may be mixed-length math issues with the multiplication of ms * SCALE if they aren't both 32-bit unsigned integers.  But the above code is what I usually see in code reviews, and these style details tend to distract from the main point of the article.)

Wednesday, November 10, 2010

Embedded Software Risk Areas -- Implementation -- Part 2

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are some of the Implementation red flags (part 2 of 2):
  • Ignoring compiler warnings
Programs compile with ignored warnings and/or the compilers used do not have robust warning capability. A static analysis tool is not used to make up for poor compiler warning capabilities. The result can be that software defects which could have been caught by the compiler must be found via testing, or escape detection entirely. If assembly language is used extensively, it may contain the types of bugs that a good static analysis tool would have caught in a high level language.
  • Inadequate concurrency management
Mutexes or other appropriate concurrent data access approaches aren’t being used. This leads to potential race conditions and can result in tricky timing bugs.
  • Use of home-made RTOS
An in-house developed RTOS is being used instead of an off-the-shelf operating system. While the result is sometimes technically excellent, this approach commits the company to maintaining RTOS development skills as a core competency, which may not be the best strategic use of limited resources.

Thursday, October 21, 2010

Embedded Software Risk Areas -- Design

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are the Design red flags:
  • Design is skipped or is created after code is written
Developers create the design (usually in their heads) as they are writing the code instead of designing each module before that module is implemented. The design might be written down after code is written, but usually there is no written design. As a result, the structure of the implementation is messier than it ought to be.
  • Flowcharts are used when statecharts would be more appropriate
Flowcharts are used to represent designs for functions that are inherently state-based or modal and would be better represented using a state machine design abstraction. Associated code usually has deeply nested, repetitive “if” condition clauses to determine what state the system is in, rather than having an explicit state variable used to control a case statement structure in the implementation. The result is code that is significantly more bug-prone and more difficult to understand than code based on a state machine design (see the small sketch after this list).
  • No real time schedule analysis
There is no methodical approach to real time scheduling. Typically an ad hoc approach to real time scheduling is used, frequently featuring conditional execution of some tasks depending upon system load. Testing rather than an analytic approach is used to ensure real time deadlines will be met. Often there is no sure way to know if worst case timing has been experienced during such testing, and there is risk that deadlines will be missed during system operation.
  • No methodical approach to user interface design
The user interface does not follow established principles (e.g., [5]), making use of the product difficult or error-prone. The interface might not take into account the needs of users in different demographic groups (e.g., users who are colorblind, hearing impaired, or who have trouble with fine motor control).
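For reference, here is a minimal sketch of the "explicit state variable controlling a case statement" structure mentioned above; the states and inputs are invented purely for illustration.

typedef enum { ST_IDLE, ST_RUNNING, ST_FAULT } SysState;
static SysState state = ST_IDLE;

void StateMachineStep(int startRequested, int faultDetected)
{ switch (state)
  { case ST_IDLE:    if (startRequested) { state = ST_RUNNING; }  break;
    case ST_RUNNING: if (faultDetected)  { state = ST_FAULT;   }  break;
    case ST_FAULT:   /* stay here until a reset; recovery omitted in this sketch */  break;
    default:         state = ST_FAULT;   /* defensive: unknown state forces the safe state */  break;
  }
}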

Monday, August 16, 2010

100% CPU Use With Rate Monotonic Scheduling

Rate Monotonic Scheduling (RMS) is probably what you should be using if you are using an RTOS. With rate monotonic scheduling you assign fixed priorities based solely on task execution periods. The shortest period gets the highest priority, and longer periods get progressively lower priorities. It's that simple -- except for the math that limits maximum allowable CPU use.

The descriptions of Rate Monotonic theory (often called Rate Monotonic Analysis -- RMA) point out that you can't use 100% of the CPU if you want to guarantee schedulability. Maximum usage ranges from 69% to 85% of the CPU (see, for example, the Wikipedia article on rate-monotonic scheduling for this analysis). And, as a result, many developers shy away from RMS. Who wants to pay up to about a third of their CPU capacity to the gods of scheduling?  Nobody. Not even if this is an optimal way to schedule using fixed priorities.
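For the record, the classic Liu & Layland bound behind that 69% number is U = n*(2^(1/n) - 1) for n tasks: 100% for a single task, about 82.8% for two, about 78.0% for three, and approaching ln(2), roughly 69.3%, as the number of tasks grows.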

The good news is that you can have your cake and eat it too.

The secret to making Rate Monotonic Scheduling practical is to use harmonic task periods. This requires that every task period evenly divides every longer task period. So, for example, the periods {20, 60, 120} are harmonic because 20 divides 60 evenly, 20 divides 120 evenly, and 60 divides 120 evenly. The period set {20, 67, 120} is not harmonic because, for example, 20 doesn't divide 67 evenly.
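If you want to check a candidate period set mechanically, a quick sketch like the following does the job (the period arrays are just examples). With the periods sorted in ascending order it is enough to check each adjacent pair, since divisibility carries through the chain.

#include <stdio.h>

/* Returns 1 if every period evenly divides every longer period.
   Assumes the periods are sorted in ascending order. */
int IsHarmonic(const unsigned long periods[], unsigned int n)
{ for (unsigned int i = 0; i + 1 < n; i++)
  { if ((periods[i+1] % periods[i]) != 0) { return 0; } }
  return 1;
}

int main(void)
{ unsigned long a[] = {20, 60, 120};
  unsigned long b[] = {20, 67, 120};
  printf("%d %d\n", IsHarmonic(a, 3), IsHarmonic(b, 3));   /* prints: 1 0 */
  return 0;
}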

If you have harmonic periods, you can schedule up to 100% of the CPU. You don't need to leave slack!  (Well, I'd leave a little slack in the real world, but the point is the math says you can go to 100% instead of saying you top out at a much lower utilization.)  So, in a real system, you can use RMA without giving up a lot of CPU capacity.

What if your tasks aren't harmonic? Then make them so by running some tasks faster. For example, if your task periods are {20, 67, 120}, change them to {20, 60, 120}. In other words, run the 67 msec task at 60 msec, which is a little faster than you would otherwise need to run it. Counter-intuitively, speeding up one task will actually lower your total CPU requirements in typical situations, because now you don't have to leave unused CPU capacity to make the scheduling math work. Let's face it; in real systems task frequencies often have considerable leeway. So make your life easy and choose them to be harmonic.
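To put some made-up numbers on that: suppose the 67 msec task has a 10 msec compute time. At a 67 msec period it contributes 10/67, or about 14.9%, utilization; sped up to a 60 msec period it contributes 10/60, or about 16.7%. The raw load goes up a bit, but the harmonic period set can be scheduled all the way to 100% utilization, while the non-harmonic set has to stay under the much lower guaranteed RMA bound -- so in the end you have more usable CPU headroom, not less.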

You can find out more about RMA and how to do scheduling analysis in Chapter 14 of my book.
