Monday, January 6, 2014

Do Not Re-Enable Interrupts In An ISR


Summary: It might sound intuitive to re-enable interrupts within an ISR so higher priority interrupts can run without delay. But, in fact, this is probably the worst thing you can do for worst-case delay, and it might even make the code unsafe due to stack overflow.



A previous post on rules for using interrupts included the rule:
  • "Don't re-enable interrupts within an Interrupt Service Routine (ISR).  That's just asking for subtle race condition and stack overflow problems."
Some developers take a different point of view, and feel that it is best to re-enable interrupts to let higher priority ISRs run without delay. The idea is that high priority interrupts should run as soon as possible, without having to wait for low priority interrupts to complete. Re-enabling interrupts within an ISR seems to let that happen. BUT, while there might be some intuitive appeal to this notion, this is a dangerous practice that makes things worse instead of better.

First, to recap: when an interrupt triggers an ISR, one of the first things that happens is that the interrupt handling hardware masks further interrupts. Once the ISR starts running, at some point (preferably at the beginning) it acknowledges the interrupt source, clearing the interrupt request that triggered the ISR. At that point the ISR can re-enable interrupts if it wants to, or leave them masked until the ISR completes execution. (The "return from interrupt" instruction will typically restore the interrupt flags, re-enabling interrupts as appropriate when the ISR completes.) If interrupts are re-enabled within the ISR, then another interrupt can suspend the ISR and run some other, second ISR. This means that if a higher priority interrupt comes along, its ISR can run right away.

The problem is that other, bad, things can also happen once interrupts are re-enabled:
  • If a lower priority interrupt comes along, it also gets to run, suspending the currently running, higher priority ISR. Interrupt priority hardware does not keep track of interrupt history after an ISR starts and acknowledges its interrupt source, and so loses track of how high the priority is for the running ISR. Worse, if a high and a low priority interrupt arrive at the same time, this approach guarantees that the high priority ISR waits for the low priority ISR. This happens because the high priority ISR runs first, then gets preempted by the lower priority ISR as soon as interrupts are re-enabled. In the case where no other interrupts are pending, the low priority ISR runs to completion before the high priority ISR gets to finish.
  • The ISRs nest as described above rather than running one at a time, filling up the stack. You might be able to account for this by allocating enough stack for all ISRs to be active at the same time. But if you leave interrupts masked in ISRs, the worst case is only the stack use of the single biggest ISR. (Some hardware has multiple tiers/levels/classes ... pick your favorite term ... of interrupts, but in that case it is still only one ISR of stack use per tier rather than one per ISR source.)
  • The same ISR might run more than once at a time, especially if it got unlucky and was preempted by other ISRs, delaying its completion time. For example, if you get a burst of noise on an interrupt request line, you might kick off a half dozen or so copies of the same ISR. Or, once in a while, hardware events happen close together and re-trigger the ISR. This will lead to trouble if your ISR code is not re-entrant. It could also overflow the stack, resulting in memory corruption, etc., unless you can accurately predict or limit how many times ISRs can be re-triggered under absolute worst-case conditions.
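To make the stack argument concrete, here is a minimal C sketch that computes the worst-case ISR stack requirement under the two policies. It assumes you already have a worst-case stack figure for each ISR (including whatever the hardware pushes on entry) and a tier number for each ISR; the `isr_info_t` type, `MAX_TIERS`, and both function names are hypothetical. The all-nested figure also assumes each ISR is active at most once, so it understates the risk from re-triggered ISRs described in the last bullet.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TIERS 4   /* hypothetical number of hardware interrupt tiers */

/* Hypothetical per-ISR data gathered from stack analysis. */
typedef struct {
    size_t stack_bytes;  /* worst-case stack use of one activation */
    int tier;            /* hardware tier/level, 0 .. MAX_TIERS-1 */
} isr_info_t;

/* Every ISR re-enables interrupts: in the worst case every ISR is nested
   on top of every other one. Assumes each ISR is active at most once;
   re-triggering makes the real figure larger (potentially unbounded). */
static size_t stack_all_nested(const isr_info_t *isrs, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += isrs[i].stack_bytes;
    return total;
}

/* Interrupts stay masked within each tier: at most one ISR per tier is
   active, so the worst case is the largest ISR of each tier, summed. */
static size_t stack_masked_per_tier(const isr_info_t *isrs, size_t n)
{
    size_t worst[MAX_TIERS] = {0};
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(isrs[i].tier >= 0 && isrs[i].tier < MAX_TIERS);
        if (isrs[i].stack_bytes > worst[isrs[i].tier])
            worst[isrs[i].tier] = isrs[i].stack_bytes;
    }
    for (int t = 0; t < MAX_TIERS; t++)
        total += worst[t];
    return total;
}
```

For example, four ISRs using 64, 128, 32, and 96 bytes, all in a single tier, need 320 bytes if every ISR re-enables interrupts, but only 128 bytes if interrupts stay masked. Splitting them into two tiers ({64, 128} and {32, 96}) raises the masked figure to 128 + 96 = 224 bytes, which is still one ISR per tier.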
You could say "the highest priority ISR doesn't re-enable interrupts" -- but what about the second-highest priority ISR? Once you get more than a couple of ISRs involved, this gets hopeless to untangle. You could try to write some sort of ISR handler to mitigate some of these risks, but it's going to be difficult to get right, and it adds overhead to every ISR. In all, the situation sounds pretty messy and prone to problems ... and it is. You might get away with this on some systems some of the time if you are really good (and never make mistakes). But getting tricky concurrency-related code right is notoriously difficult. Re-enabling interrupts is just asking for problems.

So let's look at the alternative. What is the true cost you might be trying to avoid in terms of delaying that oh-so-urgent high priority ISR because you're not re-enabling interrupts in an ISR? 

The worst case is that the longest-running low priority ISR runs to completion, making all the higher priority ISRs wait for it to complete before they can start. But after that, all the remaining pending ISRs will complete in priority order -- highest to lowest priority. That's exactly what you want, except for the low priority ISR clogging up the works. So if you have an obnoxiously long low priority ISR, that's a problem. But if none of your ISRs run for very long (which is how you're supposed to write ISRs), you're fine. Put into scheduling terms, you want to make sure none of your ISRs runs long, because a long-running ISR gives you a high blocking time, and blocking time delays high priority tasks from completing.
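The difference in completion order can be seen in a small tick-based simulation. This is a host-side model, not code for any particular CPU, and all names in it are hypothetical. It assumes each interrupt source fires once, every ISR acknowledges its source on entry, and, under the re-enabling policy, any pending interrupt of any priority is accepted at the next tick. The hardware priority logic is modeled only as choosing the highest priority pending request; it does not consider the priority of the ISR that is already running.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IRQS 8

/* Hypothetical model of one interrupt source that fires exactly once. */
typedef struct {
    int priority;    /* 0 = highest priority */
    int arrival;     /* tick at which the request is raised */
    int exec_ticks;  /* ISR execution time, must be >= 1 */
    int remaining;   /* ticks still to run (set by simulate) */
    bool started;    /* ISR has been entered and its source acknowledged */
    int finish;      /* tick at which the ISR returned, -1 if not yet */
} irq_t;

typedef enum { MASKED_IN_ISR, REENABLED_IN_ISR } mask_policy_t;

/* Runs all sources to completion and returns the maximum ISR nesting
   depth observed; each source's finish time is stored in irqs[i].finish. */
static int simulate(irq_t *irqs, int n, mask_policy_t policy)
{
    int stack[MAX_IRQS];   /* indices of active (nested) ISRs */
    int depth = 0, max_depth = 0, done = 0;

    assert(n > 0 && n <= MAX_IRQS);
    for (int i = 0; i < n; i++) {
        assert(irqs[i].exec_ticks >= 1);
        irqs[i].remaining = irqs[i].exec_ticks;
        irqs[i].started = false;
        irqs[i].finish = -1;
    }

    for (int t = 0; done < n; t++) {
        /* Requests are accepted when no ISR is active, or at any time if
           ISRs re-enable interrupts right after acknowledging the source. */
        bool enabled = (depth == 0) || (policy == REENABLED_IN_ISR);
        if (enabled) {
            /* Pick the highest priority pending request; note that this
               does not compare against the ISR that is already running. */
            int best = -1;
            for (int i = 0; i < n; i++) {
                if (!irqs[i].started && irqs[i].arrival <= t &&
                    (best < 0 || irqs[i].priority < irqs[best].priority))
                    best = i;
            }
            if (best >= 0) {
                irqs[best].started = true;
                stack[depth++] = best;
                if (depth > max_depth)
                    max_depth = depth;
            }
        }
        if (depth == 0)
            continue;                          /* idle tick */
        irq_t *cur = &irqs[stack[depth - 1]];  /* innermost ISR runs */
        if (--cur->remaining == 0) {
            cur->finish = t + 1;
            depth--;                           /* return from interrupt */
            done++;
        }
    }
    return max_depth;
}
```

With a high priority ISR (5 ticks) and a low priority ISR (10 ticks) both arriving at tick 0, leaving interrupts masked finishes the high priority ISR at tick 5 and the low priority ISR at tick 15, with a nesting depth of 1. Re-enabling interrupts finishes the low priority ISR first, at tick 11, and the high priority ISR last, at tick 15, with two ISRs nested on the stack: the reverse-order effect described above.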

Let's compare outcomes for the two alternative strategies. If you re-enable interrupts, the worst case latency for the highest priority ISR in the system is that it arrives, and then gets preempted by every other ISR in the system (including the longest-running ISR if it comes in later, but before the high priority ISR has a chance to complete).  If you leave interrupts masked, the worst case is that the longest-running ISR has to complete, but then the high priority ISR goes immediately afterward. So, leaving interrupts disabled (masked) during every ISR is clearly a win for the worst case, in that you only have to wait for the longest-running ISR to complete before running the highest priority ISR, instead of waiting for all ISRs to complete. The worst case is typically what you care about in a real time embedded system, so you should leave interrupts disabled in ISRs to ensure the fastest worst-case completion time of high priority ISRs. And, leaving interrupts disabled in ISRs also gets rid of the risks of stack overflow and re-triggered ISRs we mentioned.
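The two worst cases can be written down directly. This is a minimal sketch, assuming each ISR's worst-case execution time is known and that each ISR runs at most once before the highest priority ISR completes (the same at-most-once assumption discussed in the comments). With interrupts re-enabled, the highest priority ISR's worst-case completion time is the sum of all ISR execution times; with interrupts masked, it is the longest other ISR (the blocking time) plus its own execution time. The function names and microsecond units are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Interrupts re-enabled in every ISR: the highest priority ISR can be
   preempted by every other ISR, so in the worst case it waits for all of
   them (each assumed to run at most once) plus its own execution time. */
static unsigned long wc_reenabled(const unsigned long *exec_us, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += exec_us[i];
    return total;
}

/* Interrupts left masked: the highest priority ISR (index hi) waits for at
   most one already-running ISR (the longest one), then runs to completion. */
static unsigned long wc_masked(const unsigned long *exec_us, size_t n, size_t hi)
{
    unsigned long blocking = 0;
    assert(hi < n);
    for (size_t i = 0; i < n; i++)
        if (i != hi && exec_us[i] > blocking)
            blocking = exec_us[i];
    return blocking + exec_us[hi];
}
```

With execution times of 5, 20, 50, and 10 microseconds, where the 5 microsecond ISR has the highest priority, leaving interrupts masked gives 50 + 5 = 55 microseconds, while re-enabling interrupts gives 5 + 20 + 50 + 10 = 85 microseconds.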

UPDATE: To avoid confusion, it's important to note that the above is talking about what happens at ONE level of interrupts, such that when one ISR is running, no other interrupts at that level run until that ISR completes. Many architectures have multiple levels, in which one ISR can interrupt another ISR at a lower level even if that lower level has interrupts masked. This corresponds to the comment about one ISR per level being active in the worst case. Also, note that it is irrelevant whether an architecture can change the priorities of interrupts within a single level -- it is the existence of levels that are each individually maskable, and that are prioritized as groups of interrupts per level, that gives a way around some of these problems. So, going back to what the title says: do not RE-enable the same level of interrupts within an ISR.
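The per-level rule can be expressed as an acceptance test that decides whether an incoming interrupt may preempt whatever is currently running. This is a hypothetical model rather than any CPU's actual mask registers: levels are numbered so that a larger number is a higher level, `running_level` is negative when no ISR is active, and the masked policy represents a single-level system. Under the higher-levels-only policy, every nested ISR must be at a strictly higher level than the one it interrupts, so nesting depth, and therefore ISR stack use, is bounded by the number of levels.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies; not any particular CPU's mask registers. */
typedef enum {
    POLICY_MASK_ALL_IN_ISR,      /* single level: masked until return */
    POLICY_REENABLE_ALL,         /* re-enable every interrupt in the ISR */
    POLICY_REENABLE_HIGHER_ONLY  /* unmask only strictly higher levels */
} isr_policy_t;

/* Levels: a larger number is a higher (more urgent) level.
   running_level < 0 means no ISR is active. Returns true if an interrupt
   at incoming_level may preempt whatever is currently running. */
static bool irq_accepted(isr_policy_t policy, int running_level, int incoming_level)
{
    assert(incoming_level >= 0);
    if (running_level < 0)
        return true;                 /* main program: always interruptible */
    switch (policy) {
    case POLICY_MASK_ALL_IN_ISR:
        return false;
    case POLICY_REENABLE_ALL:
        return true;                 /* even a lower level gets in */
    case POLICY_REENABLE_HIGHER_ONLY:
        return incoming_level > running_level;
    }
    return false;
}
```

Under the re-enable-all policy, `irq_accepted(POLICY_REENABLE_ALL, 2, 0)` is true: a level 0 interrupt preempts a running level 2 ISR, which is exactly the problem described above. Under the higher-levels-only policy the same call returns false, and a same-level interrupt is also refused.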

14 comments:

  1. Thanks for the post. It is an interesting observation that with interrupts re-enabled the ISRs are actually executed in reverse order of their priority levels.
    BTW, isn't the worst case latency in the interrupt re-enable approach actually unbounded?

    1. You are correct in the strictly worst timing case. But you're going to overflow your stack before you hit an infinitely long time. My summary was assuming that you can make assurances that each ISR will execute at most once before the highest priority ISR finally gets to go (e.g., as one might do using Rate Monotonic Analysis).

  2. I see your point, though I am not sure if RMA would give any meaningful result in such setting, where priorities are practically non-existent.

  3. Thanks for the interesting post. Another option would be to re-enable only the handling of higher priority interrupts within the ISR, i.e., modify the interrupt mask accordingly. What do you think about this?

    1. Yes, that's fine, and I mentioned its effects but didn't emphasize them. If you have multiple levels of interrupts, then the worst case is one ISR per level, which is usually fine. The word "level" might be some other term in your system. An example is "hi" and "lo" priority levels, where a hi priority interrupt can preempt a lo priority one, but can't preempt another hi priority level interrupt. Another way to look at this: if you have 4 different mask bits for 4 different levels, then the worst case is one ISR per mask bit taking up stack space = 4 total.

  4. I am indeed using a system with different "levels" or "priorities". For us it is mandatory that a high priority ISR can interrupt a lower priority one ASAP (very critical timing, only for the highest prio ISRs). Re-enabling interrupts (globally, therefore allowing higher prio ISRs only) is absolutely possible and safe. Your remarks on stack usage are correct, but in this case are limited to the number of ISR priorities that you use.
    Anyway, thanks for sharing this!

    1. Right -- if you have a fixed number of priority LEVELS that can interrupt lower priority LEVELS that is fine so long as you allocate enough stack space for all priority levels to have active interrupts at the same time. The part where you get into trouble is when multiple prioritized interrupts at the same level can interrupt each other.

      Beyond your comment, I think there is a lot of confusion over this because different CPUs do things so differently. Some processor families have lots of levels, and some have only one or two levels. The fewer levels you have, the more likely this issue is to become a big deal in your designs.

  5. What if you clear the stack after every interrupt call? Like:
    1. interrupt call
    2. ISR initiation
    3. clear stack
    4. run rest of the ISR
    5. end ISR
    6. back to main program

    1. I'd avoid it. There are just so many ways this can go wrong. At some point if you clear the stack either the stuff you are clearing out needs to get put somewhere (which can itself overflow -- so you've just punted the problem to another memory range) or you lose information by throwing it away.
      Just as importantly, this doesn't address the problem with lower priority ISRs stalling the high priority ISR as described.

    2. What if I don't need that information?

    3. BTW, how do you clear stacks?

    4. I would suggest not clearing stacks even if you think you don't need the information on them. It is better to build your system so you aren't messing with stacks outside the normal mechanisms. It is just too easy to get things wrong initially, and a problem is even more likely if you have to make changes to the system later that violate assumptions you initially made.

  6. I really don't need the information in the stacks. No assumptions will be violated. So maybe you could help me?

    1. Subject to my previous caution that messing with the stacks is an overall dangerous idea, the answer depends on your CPU and the compiler you are using. Generally it is something like an assembly language instruction that changes the stack pointer value (maybe popping items off it; more often adding an offset to the current value to change it). You'll need to consult specific reference materials for your system. (And, in general, if you can't figure it out from those materials, you probably shouldn't be doing it.) I don't give system-specific advice on this blog.


Please send me your comments. I read all of them, and I appreciate them. To control spam I manually approve comments before they show up. It might take a while to respond.

If you prefer, or want a personal response, you can send e-mail to comments@koopman.us.
If you want a personal response please make sure to include your e-mail reply address. Thanks!