Controller Area Network (CAN)is a very popular embedded network protocol. It's widely used in automotive applications, and the low cost that results in makes it popular in a wide variety of other applications.
CAN is a very nice embedded protocol, and it's very appropriate for a wide variety of applications. But, like most protocols it has some faults. And like many protocol faults, they aren't widely advertised. They are the sort of thing you have to worry about if you need to create an ultra-dependable system. For a robust system architecture for every day control functions, probably they are not a huge deal.
Here is a summary of CAN Protocol faults I've run into.
(1) Unprotected header. The length bits of a CAN message aren't protected by the CRC. That means that a single corrupted bit in the length field can cause the CRC checker to look in the wrong place for the CRC. For example, if a 3-byte payload is corrupted to look like a 1-byte payload, the last two bytes of the payload will be interpreted as a CRC. This could cause an undetected 1-bit error. There are some other framing considerations that might or might not cause the error to be discovered, but it is a vulnerability in CAN that is fixed in later protocols via use of a dedicated header CRC (e.g., as done in FlexRay). I've never seen this written up for CAN in particular, but this is discussed on page 20 of [Driscoll09]
(2) Stuff bit errors compromising the CRC. A pair of bit errors in the bit-stuffed format of a CAN message (i.e., one on the wire) can cause a cascading of bit errors in the message, leading to undetected two-bit errors that are undetected by the CRC. See [Tran99] and slides 15-16 of
In general there are issues in any embedded network if the physical layer encoding can cause the propagation of one physical data bit fault into affecting multiple data bits.
(NOTE: added 10/16/2013. I just found about about the CAN-FD protocol. Among other things it includes the stuff bits in the CRC calculation to mitigate this problem. Whether or not there are any holes in the scheme I don't know, but it is clearly a step in the right direction.)
(3) CAN is intended to provide exactly-once delivery of messages via the use of a broadcast acknowledgement mechanism. However, there are faults with this mechanism. In one fault scenario, some nodes see an acknowledge but others don't, so some nodes will receive two copies of a message (thinking they are independent copies rather than a retransmission) while others receive just one copy. In another less likely (but plausible) scenario, the above happens but the sender fails before retransmitting, resulting in some recipients having a copy of the message and others never having received a copy. [Rufino98] Further info here: [Tindel20b]
(4) It is possible for a CAN node to lock up the network due to retries if a transmitter sees a different value on the bus than it transmitted. (This is a failure of the transmitter, but the point is one node can take down the network.) [Perez03]
(5) If a CAN nodes' transmitter gets stuck at a dominant bit value, it will jam up the entire network. The error counters inside each CAN controller are supposed to help mitigate this, but if you have a stuck-at-"on" transmitter that ignores "turn off" commands there isn't anything you can do with a standard CAN setup.
(6) CAN suffers from a priority inversion issue, which can cause delays of high priority messages due to queuing issues in nodes that locally use FIFO transmission order. [Tindell20] Note that strictly speaking this is a CAN driver and hardware issue rather than something inherent to the on-wire protocol. To quote Ken Tindell via a social media message: "Any software that does FIFO queuing of CAN frames in a critical application is broken."
There are various countermeasures you might wish to take to combat the above failure modes for highly dependable applications. For example, you might put an auxiliary CRC or checksum in the payload (although that doesn't solve the stuff bit vulnerability). You might shut down the network if the CAN controller error counter shows a high error rate. Or you might just switch to a more robust protocol such as FlexRay.
CAN is also not secure in that provides no message authentication. It was not designed to be secure, so this is more of limitation in scope than a design defect.
Have you heard of any other CAN failure modes? If so, let me know.
[Driscoll09] Driscoll, K., Hall, B., Koopman, P., Ray, J., DeWalt, M., Data Network Evaluation Criteria Handbook, AR-09/24, FAA, 2009.
[Tran99] Eushiuan Tran Tsung, Multi-Bit Error Vulnerabilities in the Controller Area Network Protocol, MS Thesis, Carnegie Mellon University ECE Dept, May 1999
[Rufino98] Rufino, J.; Verissimo, P.; Arroz, G.; Almeida, C.; Rodrigues, L., Fault-tolerant broadcasts in CAN, FTCS 1998, pp. 150-159.
[Perez03] Perez, J., Reorda, M., Veiolante, M.; Accurate dependability analysis of CAN-based networked systems, 2003 SBBCCI.
[Tindell20] Tindell, K., CAN Priority Inversion, June 29, 2020.
[Tindell20b[ Tindell, K., "CAN, atomic broadcast and Buridan's Ass," https://kentindell.github.io/2020/07/11/can-atomic-multicast/
A connection on LinkedIn asked me for help deciding between job offers. I can't provide personalize advice, but here are my thoughts in ...
There are only a handful of hardcover books left of the first edition, so I spend some time converting things over to an eBook & Paperba...
(If you want to know more, see my Webinar on CRCs and checksums based on work sponsored by the FAA.) If you are looking for a lightwei...
Cyclomatic complexity is a measure of the number of paths through a particular piece of code (a higher number means the software is more com...