Monday, May 10, 2010

Which Error Detection Code Should You Use?

Any time you send a message or save some data that might be corrupted in storage, you should think about using some sort of error detection code so you can tell if the data has been corrupted. If you do a web search you will find a lot of information about error detection codes. Some of it is great stuff. But much of it is incorrect, or on a good day merely suboptimal. It turns out that the usual rule of thumb of "do what the other guys do and you can't go far wrong" works terribly for error detection. There is lots of folk wisdom that just isn't right.


So, here is a guide to simple error detection in embedded systems. There is a journal paper with all the details (see this link), but this is the short version.

If you want the fastest possible computation with basic error detection:
  • Parity is fine if you have only one bit to spend, but takes about as much work to compute as a checksum.
  • Stay away from XOR checksums (often called LRCs or Longitudinal Redundancy Checks). Use an additive checksum instead to get better error detection at the same cost.
  • Use an additive checksum if you want something basic and fast. If possible, use a one's complement additive checksum instead of normal addition. This involves adding up all the bytes or words of your data using one's complement addition and saving the final sum as the checksum. One's complement addition cuts in half the vulnerability to undetected errors in the top bit of the checksum (see the sketch just after this list). In a pinch normal integer addition will work, but it gives up some error detection capability.
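
Here is a minimal C sketch of a one's complement additive checksum (the function name and the 8-bit checksum width are just example choices, not any particular standard). The point to notice is that each carry out of the top bit is folded back into the sum instead of being discarded:

```c
#include <stdint.h>
#include <stddef.h>

/* One's complement additive checksum over a byte buffer (illustrative sketch). */
uint8_t ones_complement_checksum8(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
        sum = (sum & 0xFFu) + (sum >> 8);   /* end-around carry: fold the carry back in */
    }
    sum = (sum & 0xFFu) + (sum >> 8);       /* one more fold in case the last add carried */
    return (uint8_t)sum;
}
```

The end-around carry is what keeps the top bit of the sum mixed back in with the rest; with plain two's complement addition, carries out of the top bit are simply lost, which is where that kind of checksum is weakest.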

If you want intermediate computation speed and intermediate error detection:
  • Use a Fletcher checksum. Make sure that you use one's complement addition in computing the two parts of that checksum, not normal integer addition. Normal integer addition just kills error detection performance for this approach (a sketch follows this list).
  • Don't use an Adler checksum. In most cases it isn't as good as a Fletcher checksum and it is a bit slower to compute. The Adler checksum seems like a cool idea but it doesn't really pay off compared to a Fletcher checksum of the same size.
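
As a sketch of that advice, here is a simple Fletcher-16 over 8-bit blocks in C. The modulo-255 reductions are the one's complement additions; the way the two running sums are packed into the 16-bit result is just one common convention:

```c
#include <stdint.h>
#include <stddef.h>

/* Fletcher-16 sketch: two running sums, each reduced modulo 255
 * (equivalent to one's complement addition for 8-bit blocks). */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sumA = 0;   /* one's complement sum of the data bytes        */
    uint16_t sumB = 0;   /* running sum of sumA values (position-dependent) */
    for (size_t i = 0; i < len; i++) {
        sumA = (sumA + data[i]) % 255u;
        sumB = (sumB + sumA) % 255u;
    }
    return (uint16_t)((sumB << 8) | sumA);
}
```

The second sum makes the result depend on the position of each byte as well as its value, which is what gives a Fletcher checksum its edge over a plain additive checksum.
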
If you can afford to spend a little more computation time to get a lot better error detection:
  • Use a CRC (cyclic redundancy check).
  • If you are worried about speed, there are a variety of table lookup methods that trade memory for speed. CRCs aren't really as slow as people think they will be. Probably you can use a CRC, and you should if you can. Mike Barr has a posting on CRC implementations (a table-driven sketch follows this list).
  • Use an optimal CRC polynomial if you don't have to conform to a standard. If you use a commonly used polynomial just because other people use it, you are probably missing out on a lot of error detection capability. (More on this topic in a later post.)
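
For a sense of what the table lookup approach looks like, here is a generic table-driven CRC-8 sketch in C (the function names are mine, and the MSB-first convention and 8-bit width are just example choices). The polynomial is passed in with the x^8 term implicit; 0x07 in the comment is the common x^8 + x^2 + x + 1 used only as a placeholder, not a recommendation:

```c
#include <stdint.h>
#include <stddef.h>

static uint8_t crc8_table[256];

/* Build the 256-entry lookup table for an 8-bit CRC, MSB-first.
 * 'poly' is the feedback polynomial with the x^8 term implicit,
 * e.g. crc8_init_table(0x07) for x^8 + x^2 + x + 1 (placeholder choice only). */
void crc8_init_table(uint8_t poly)
{
    for (int byte = 0; byte < 256; byte++) {
        uint8_t crc = (uint8_t)byte;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ poly)
                                : (uint8_t)(crc << 1);
        }
        crc8_table[byte] = crc;
    }
}

/* Process the message one byte at a time: a single table lookup per byte
 * replaces eight bit-by-bit shift/XOR steps. */
uint8_t crc8(const uint8_t *data, size_t len, uint8_t init)
{
    uint8_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc = crc8_table[crc ^ data[i]];
    }
    return crc;
}
```

The same idea extends to 16-bit and 32-bit CRCs (with a shift folded into the lookup step), and nibble-wide tables or multi-table "slicing" variants trade table size against speed in both directions.
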
You can find more discussion about CRCs and Checksums at my blog on that topic: http://checksumcrc.blogspot.com/

    9 comments:

    1. Regarding: "If you are worried about speed there are a variety of table lookup methods that trade memory for speed."

      What table methods are best for speed? Can you elaborate on this?

    2. There are several points in the tradeoff space between speed and size. Mike Barr's posting (follow the link in the blog posting) gives the short version of generic table methods.

      I have a paper that goes into this in significant detail. It turns out that if you pick the right type of polynomial with a lot of zero terms you can reduce table size and further increase speed. (Click below link to see the paper.)

      Ray, J., & Koopman, P. "Efficient High Hamming Distance CRCs for Embedded Applications," DSN06, June 2006

    3. Personally I would trade off speed for better detection. A CRC is a mathematical check used to make sure that data retains its integrity while being transferred. It is a checking process that detects damaged data.

      If you receive such a CRC alert, it means that the file being read by your computer or application is corrupted.

      It really depends on what your processing objective is here.

    4. Hi Phil, when using a CRC (cyclic redundancy check), these two are commonly used: binary CRC codes and binary non-CRC codes. According to some research I've done, binary CRC seems to be effective for error detection, but the software implementation is not really that efficient. Do you have any recommendations as to which is better and more efficient in software?

    5. Generally embedded systems use binary CRCs because they are easy to compute and provide good error detection vs. complexity+speed tradeoffs. If it is good enough for your application it is usually a good choice to just stay with a binary CRC (in other words, just a plain CRC).

      If you need something with better error detection performance you need to get a coding theory expert to help you pick a code (or be ready to wade through some heavy math in a few books). There are plenty of codes and they all have effectiveness, complexity, speed, and size tradeoffs. But there is no one "best" code for all situations.

    6. Natasha Z. Cullison, April 3, 2011 at 9:09 AM

      I tried this error detection and I used a parity bit. Parity bits can not only detect that an error has occurred, but also which bits have been inverted, and should therefore be re-inverted to restore the original data.

      Natasha Z. Cullison
      My Blog: http://www.SpanishLearn.org

    7. Natasha,

      You may be referring to a Hamming Code (such as a single-error correction, double error detection code). You can also figure out which bits have been inverted with a CRC to correct them (although it is tricky to do). But, people usually use CRCs when they are only concerned with error detection.

    8. I am in agreement with @Joe Kochesky about trading off speed for better detection.

      These days the field suffers from vaguely defined error-generation scenarios and a lack of formalization, which is the foundation of routine error detection. Moreover, automating error detection codes will significantly affect the value and effectiveness of software testing. I believe more in-depth research on detection codes needs to be done.

      Phil, I would be glad if you could suggest a link about research on error detection codes.

      Thanks

    9. It is a big field and most of the research is pretty math-heavy. If you are just starting out this is as good a place as any to look (and has pointers to follow):

      http://en.wikipedia.org/wiki/Error_detection_and_correction


    Please send me your comments. I read all of them, and I appreciate them. To control spam I manually approve comments before they show up. It might take a while to respond. I appreciate generic "I like this post" comments, but I don't publish non-substantive comments like that.

    If you prefer, or want a personal response, you can send e-mail to comments@koopman.us.
    If you want a personal response please make sure to include your e-mail reply address. Thanks!
