Monday, September 13, 2010

Testing a Watchdog Timer

I recently got a query about how to test a watchdog timer. This is a special case of the more general question of how you test for a fault that is never supposed to happen. This is a tricky topic in general, but the simple answer (if you can figure out how to do it easily) is to insert the fault intentionally into your system and see what happens.

Here are some ideas that I've seen or thought up that you may find helpful:

(1) Set the watchdog timer to shorter and shorter periods until it trips in normal operation. This will give you an idea of how close to the edge you are. But it is easy to forget to change the watchdog period back to its normal value afterward, so this isn't a good test for nearly-final code.
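One way to reduce the risk of shipping with a shortened period is to make the test value a build-time override that a release build refuses to accept. The following is only a minimal sketch of that idea: the macro names `WDT_PERIOD_MS_NOMINAL`, `WDT_MARGIN_TEST_PERIOD_MS`, and `RELEASE_BUILD`, the 100 ms value, and the function `wdt_period_ms()` are all hypothetical, and the actual code that programs the watchdog hardware is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical nominal period; use the value from your own timing budget. */
#define WDT_PERIOD_MS_NOMINAL 100u

/* Assumed convention: the build system defines RELEASE_BUILD for shipping
 * builds, and a tester defines WDT_MARGIN_TEST_PERIOD_MS only for margin
 * experiments. Defining both at once stops the build. */
#if defined(WDT_MARGIN_TEST_PERIOD_MS) && defined(RELEASE_BUILD)
#error "Watchdog margin-test period must not be set in a release build"
#endif

#ifdef WDT_MARGIN_TEST_PERIOD_MS
#define WDT_PERIOD_MS WDT_MARGIN_TEST_PERIOD_MS
#else
#define WDT_PERIOD_MS WDT_PERIOD_MS_NOMINAL
#endif

static_assert(WDT_PERIOD_MS > 0u, "watchdog period must be nonzero");

/* Period that the hardware-specific watchdog setup code would program. */
uint32_t wdt_period_ms(void)
{
    return WDT_PERIOD_MS;
}
```

With this arrangement, a margin experiment might be built with something like `-DWDT_MARGIN_TEST_PERIOD_MS=40u`, and the nominal value comes back automatically when the override is dropped.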

(2) Add a delay loop (for example, a do-nothing loop that you have made sure won't be removed by your optimizer). Increase the loop's delay until the watchdog timer trips. This similarly gives you an idea of how close to the edge you are in terms of timing, but with finer granularity. It also has the advantage of testing operation without modifying the watchdog code itself.
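A common way to keep a do-nothing loop from being optimized away in C is to have it touch a `volatile` variable, since the compiler must perform every volatile access. The sketch below assumes that approach; the function name `burn_cycles()` and the limit `BURN_CYCLES_MAX` are hypothetical, and the time each iteration takes depends on your processor and compiler, so the iteration count has to be calibrated by measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound, so a typo does not turn the test into a
 * very long hang. */
#define BURN_CYCLES_MAX 100000000u

/* Waste time by performing `iterations` volatile writes.
 * Returns the number of iterations actually executed. */
uint32_t burn_cycles(uint32_t iterations)
{
    assert(iterations <= BURN_CYCLES_MAX);

    /* volatile: the compiler may not delete or merge these accesses. */
    volatile uint32_t counter = 0u;

    for (uint32_t i = 0u; i < iterations; i++) {
        counter = counter + 1u;
    }
    return counter;
}
```

Calling `burn_cycles(n)` once per pass through the main loop and raising `n` between runs until the watchdog trips gives the timing margin in units of loop iterations, which can then be converted to time with a scope or a hardware timer.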

(3) Use a jumper that, when inserted, activates a time-wasting task (similar to idea #2). When the jumper is installed, it enables a task that wastes so much time it is guaranteed to trip the watchdog. When the jumper is removed, that task doesn't run and the system operates properly. You can insert the jumper during system test to make sure the watchdog function works properly. (Just jumping to the watchdog handling code isn't the idea. You have to simulate a situation of CPU overload for this to be a realistic test.) When you ship the system, you make sure the jumper has been removed. To avoid problems, put the watchdog test first on the outgoing test plan instead of last. That way the jumper is sure to be removed before shipment. If you want to be really clever, that same jumper hardware could be used to disable the watchdog in early testing, and the code could be changed to have it trip the watchdog as the system nears completion. But whether you want to be that tricky is a matter of taste and the type of system you are building.
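The jumper idea can be sketched on a host machine with a simulated watchdog that counts ticks instead of real time. Everything here is an assumption made for illustration: the type `sim_wdt_t`, the tick constants, and the function `main_loop_trips_watchdog()` are hypothetical, and on real hardware the jumper would be read from an input pin and a trip would reset the processor rather than set a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timing figures, in arbitrary ticks. */
#define WDT_TIMEOUT_TICKS 10u  /* watchdog trips if not kicked in time   */
#define NORMAL_WORK_TICKS  3u  /* cost of one normal main-loop pass      */
#define HOG_TASK_TICKS    20u  /* cost of the jumper-enabled hog task    */

/* Simulated watchdog: counts ticks since the last kick. */
typedef struct {
    uint32_t elapsed;
    bool     tripped;
} sim_wdt_t;

static void sim_wdt_advance(sim_wdt_t *w, uint32_t ticks)
{
    assert(w != NULL);
    w->elapsed += ticks;
    if (w->elapsed > WDT_TIMEOUT_TICKS) {
        w->tripped = true;  /* real hardware would reset the CPU here */
    }
}

static void sim_wdt_kick(sim_wdt_t *w)
{
    assert(w != NULL);
    w->elapsed = 0u;
}

/* Run `passes` main-loop passes. Returns true if the watchdog tripped. */
bool main_loop_trips_watchdog(bool jumper_installed, uint32_t passes)
{
    sim_wdt_t wdt = { 0u, false };

    for (uint32_t i = 0u; i < passes; i++) {
        sim_wdt_advance(&wdt, NORMAL_WORK_TICKS);
        if (jumper_installed) {
            /* Overload the CPU; do not jump straight to the handler. */
            sim_wdt_advance(&wdt, HOG_TASK_TICKS);
        }
        if (wdt.tripped) {
            return true;
        }
        sim_wdt_kick(&wdt);
    }
    return false;
}
```

In this sketch the hog task runs inside the normal loop, so the watchdog trips through the same overload path a real timing fault would take. With the jumper absent the loop finishes each pass well inside the timeout and the watchdog never trips.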

1 comment:

  1. Testing a watchdog timer raises some very interesting and complex issues. A closely related problem that is well known in computer science is the halting problem, analyzed thoroughly over the years since Turing first raised it.
    Since we cannot anticipate what a bug in code might do, we can't really do a test. Systems like the Martian rovers with watchdogs are primarily responding to radiation-induced errors, and the element of randomness is probably necessary for any real WDT test. Somewhat like Schrödinger's cat in that regard!

