Unintentional modification of data values can cause arbitrarily bad system behavior, even if only one bit is modified. Therefore, safety critical systems should take measures to prevent or detect such corruptions, consistent with the estimated risk of hardware-induced data corruption and anticipated software quality. Note that even best practices may not prevent corruption due to software concurrency defects.
- When hardware data corruption may occur and automatic error correction is desired, use a hardware single error correction/multiple error detection circuitry (SECMED) form of Error Detection and Correction circuitry (EDAC), sometimes just called Error Correcting Code circuitry (ECC) for all bytes of RAM. This would protect against hardware memory corruption, including hardware corruption of operating system variables. However, it does not protect against software memory corruption.
- Use a software approach such as a cyclic redundancy code CRC (preferred), or checksum to detect a corrupted program image, and test for corruption at least when the system is booted.
- Use a software approach such as keeping redundant copies to detect software data corruption of RAM values.
- Use fault injection to test data corruption detection and correction mechanisms.
- Perform memory tests to ensure there are no hard faults.
- Brown, D., “Solving the software safety paradox,” Embedded System Programming, December 1998, pp. 44-52.
- Kleidermacher, D. & Griglock, M., Safety-Critical Operating Systems, Embedded Systems Programming, Sept. 2001, pp. 22-36.
- Seeger, M., Fault-Tolerant Software Techniques, SAE Report 820106, International Congress & Exposition, Society of Automotive Engineers, 1982, pp. 119-125.
- Stepner, D., Nagarajan, R., & Hui, D., Embedded application design using a real-time OS, Design Automation Conference, 1999, pp. 151-156.
- Vinter, J., Aidemark, J., Folkesson, P. & Karlsson, J., Reducing critical failures for control algorithms using executable assertions and best effort recovery, International Conference on Dependable Systems and Networks, 2001, pp. 347-356.