Monday, August 17, 2015

Three Suggestions from Les Chambers

Les Chambers is a knowledgeable software engineer with plenty of experience in critical embedded systems. He made the following three comments, all of them spot on, about which kinds of "paper" (documentation) are essential for larger-scale embedded systems. (His points appear in the bullets below, with permission and light editing for this format.)
  • Without an architectural design document it is impossible to plan and manage any software project with more than two or three people coding. You need to know how many of things there are to estimate resource requirements: how many functions, how many screens, how many objects. I've regularly seen big teams come together and flounder around with nothing productive to do because the architect is still pulling his ideas together and has no vehicle for communicating them to the team.

Les is correct. Any time you have a project with more than a couple of people, you really need an architectural document of some sort: ideally a single sheet of paper, usually with boxes and arrows, that shows what the pieces are and how they fit together. Once you have five people on the team this is absolutely mandatory, and in my experience, as Les says, you will have chaos until that picture is nailed down.

A related problem I've seen is an architecture document that shows all the hardware boxes and communication links, but no software anywhere in the picture. You need either to put software on that same diagram or to have a separate picture of the software structure that is compatible with the hardware architecture. Chapter 10 of my book has some general rules to help in constructing these types of diagrams. However, I'll be the first to admit that creating a good architecture is as much art as science. If you really want to delve into this, the best systems architecture book I've found is Rechtin's Systems Architecting. (The first edition is by far the best for an initial read, if you can find it. The 3rd edition, by Maier & Rechtin, has a lot more material, but is a bit more complex if you are just going for the essentials.)
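One way to keep the software picture compatible with the hardware architecture is to write down, as data, which software component runs on which hardware box and which components talk to each other, and then have a script check that against the hardware diagram. The following is a minimal Python sketch of that idea under stated assumptions, not a method taken from the book; every node name, component name, and the `check_allocation` helper are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical hardware architecture: the boxes and links from the hardware diagram.
HARDWARE_NODES = {"engine_ecu", "brake_ecu", "display_unit"}
HARDWARE_LINKS = {
    frozenset({"engine_ecu", "brake_ecu"}),
    frozenset({"brake_ecu", "display_unit"}),
}


@dataclass(frozen=True)
class SoftwareComponent:
    """A box on the software diagram (hypothetical structure)."""
    name: str
    node: str              # hardware box this component is allocated to
    talks_to: tuple = ()   # names of components it exchanges data with


def check_allocation(components):
    """Return a list of places where the software view disagrees with the hardware view."""
    problems = []
    by_name = {c.name: c for c in components}
    for c in components:
        if c.node not in HARDWARE_NODES:
            problems.append(f"{c.name}: runs on unknown node {c.node!r}")
        for peer_name in c.talks_to:
            peer = by_name.get(peer_name)
            if peer is None:
                problems.append(f"{c.name}: talks to undefined component {peer_name!r}")
            elif peer.node != c.node and frozenset({c.node, peer.node}) not in HARDWARE_LINKS:
                problems.append(
                    f"{c.name} -> {peer_name}: no link between {c.node} and {peer.node}"
                )
    return problems


components = [
    SoftwareComponent("fuel_control", "engine_ecu", ("abs_control",)),
    SoftwareComponent("abs_control", "brake_ecu", ("driver_display",)),
    SoftwareComponent("driver_display", "display_unit"),
    SoftwareComponent("diagnostics", "display_unit", ("fuel_control",)),
]
for problem in check_allocation(components):
    print(problem)
# prints: diagnostics -> fuel_control: no link between display_unit and engine_ecu
```

In this made-up example the check flags that the diagnostics software needs a display-to-engine connection that the hardware diagram doesn't show, the kind of mismatch that is easy to miss when the two pictures are drawn by different people.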
  • Another issue you've hinted at but not explicitly stated is the importance of detailed rationales for design approaches. The symptom: six months down the track, someone questions a nonintuitive design approach, spends a week working through the design rationale, and then decides, "... oh yes, it was right in the first place." Worse, someone who doesn't have the facts changes an approach that was chosen for sound reasons and injects bugs.

Yes, I've seen this one too. It can be an especially serious problem if a design decision is made for an extra-functional purpose such as safety. For example, consider an aircraft in which two cable bundles run down different sides of the airframe. Someone might later conclude that it is cheaper and easier to run them next to each other, since functionally there is (at least at first glance) no difference. But the point of separating the bundles was that if physical damage occurs to one part of the aircraft, only one of the two bundles will be affected. (Could this happen? Read about the crash of United Airlines Flight 232, in which three hydraulic lines were damaged where they ran too close together.)

In general, it is a good idea to capture not just requirements but also design decisions with their rationale, so that the basis for important decisions is not lost. This is especially important for long-lived systems that are likely to be maintained and updated over decades, which is a common situation in the embedded systems world.
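A recorded rationale does the most good when it sits next to something that would break if the decision were reversed. As a hypothetical sketch (the decision ID, the routing-table format, and the `check_separation` helper are all made up for illustration), the cable-bundle decision could be captured as data together with its rationale, plus a design-rule check that quotes that rationale back to whoever proposes running both bundles down the same side:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignDecision:
    """A design decision recorded together with the reason it was made."""
    ident: str
    decision: str
    rationale: str


# Hypothetical record for the cable-bundle example.
SEPARATE_BUNDLES = DesignDecision(
    ident="DD-017",
    decision="Cable bundles A and B run down opposite sides of the aircraft.",
    rationale="Physical damage to one side must not be able to affect both bundles.",
)


def check_separation(routing, pair, decision):
    """Return None if the two bundles run on different sides, else an error citing the rationale.

    routing maps a bundle name to the side of the airframe it runs along (hypothetical format).
    """
    a, b = pair
    if routing[a] != routing[b]:
        return None
    return (
        f"{a} and {b} are both routed {routing[a]}: violates {decision.ident}. "
        f"Decision: {decision.decision} Rationale: {decision.rationale}"
    )


original = {"bundle_A": "left", "bundle_B": "right"}
cheaper = {"bundle_A": "left", "bundle_B": "left"}  # the "cheaper and easier" proposal
print(check_separation(original, ("bundle_A", "bundle_B"), SEPARATE_BUNDLES))  # prints: None
print(check_separation(cheaper, ("bundle_A", "bundle_B"), SEPARATE_BUNDLES))
```

The same record could just as well live in a requirements or design database; the point of the sketch is that the reason travels with the decision instead of living only in someone's memory.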
  • Another piece of paper I think should be added is the configuration management documentation: exactly which versions of which software are running on which versions of which hardware, and where. I once had to tackle this problem on a project with more than 200 computers deployed all over a [geographically distributed embedded system] network. The symptoms were: people in the development shop spending a week trying to reproduce a bug found on site and being unable to, because they were working on the wrong version of the code; and large deployment teams turning up for site installation and having to abandon it because they were armed with the wrong version of the software, incompatible with the other control computers. The obvious solution was a database application, which took me three months to build and turned out to be very useful, more useful than a stack of procedures and paper records.

And again, I've seen this one as well. It is common for configuration management on older systems to be a filing cabinet full of hard-copy printouts, sometimes with the software for every single field installation customized by a field engineer. Usually the paper copy is out of date with reality. If you find a bug, how can you fix it if you don't even know what software is out in the field?

Configuration management is important not just for your build process, but also for keeping track of what's out in the field. The most basic requirement is that your device must be able to tell you what version of software is installed (for example, with a start-up message). Beyond that, you really want a database that you can run queries on to find out what's out there. Such databases often go stale, so it also helps to make a configuration audit part of every maintenance visit, which keeps the database current.
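On the device, reporting the version can be as simple as printing a banner at start-up. The sketch below covers the database side in Python with the standard `sqlite3` module: the banner a technician captures during each maintenance visit updates the record for that unit, and queries answer "which units run this version?" and "which records haven't been audited lately?". The banner format `FW <version> HW <revision>`, the table layout, and the site and unit names are all hypothetical assumptions, not a real product's format.

```python
import re
import sqlite3

# Hypothetical start-up banner format; real devices will differ.
BANNER_RE = re.compile(r"^FW (?P<version>\d+\.\d+\.\d+) HW (?P<hw>\S+)$")


def open_db(path=":memory:"):
    """Create the installed-base table (use a file path for a persistent database)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS installs (
               site TEXT, unit TEXT, hw_rev TEXT, sw_version TEXT,
               audited TEXT,                 -- ISO date of the last audit
               PRIMARY KEY (site, unit))"""
    )
    return db


def record_audit(db, site, unit, banner, date):
    """Parse the banner captured during a maintenance visit and update that unit's record."""
    m = BANNER_RE.match(banner.strip())
    if m is None:
        raise ValueError(f"unrecognized banner: {banner!r}")
    db.execute(
        "INSERT OR REPLACE INTO installs VALUES (?, ?, ?, ?, ?)",
        (site, unit, m["hw"], m["version"], date),
    )
    db.commit()


def units_running(db, version):
    """Which units in the field run a given software version?"""
    return db.execute(
        "SELECT site, unit FROM installs WHERE sw_version = ? ORDER BY site, unit",
        (version,),
    ).fetchall()


def stale_records(db, before):
    """Units whose configuration has not been audited since 'before' (ISO date)."""
    return db.execute(
        "SELECT site, unit, audited FROM installs WHERE audited < ? ORDER BY audited",
        (before,),
    ).fetchall()


db = open_db()
record_audit(db, "plant_north", "ctrl_01", "FW 2.3.1 HW revB", "2015-06-02")
record_audit(db, "plant_north", "ctrl_02", "FW 2.2.0 HW revB", "2014-11-20")
record_audit(db, "plant_south", "ctrl_01", "FW 2.3.1 HW revC", "2015-07-15")
print(units_running(db, "2.3.1"))
# [('plant_north', 'ctrl_01'), ('plant_south', 'ctrl_01')]
print(stale_records(db, "2015-01-01"))
# [('plant_north', 'ctrl_02', '2014-11-20')]
```

Because `INSERT OR REPLACE` keys on site and unit, each audit overwrites the previous record for that unit, so the audit date doubles as a rough indicator of how far to trust each entry.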

Les writes a thought-provoking (and nicely styled) long-form blog on system engineering:
The stories are interesting and well told, with a few twists and turns along the way. For example, the article on Fagan inspections provides insight into how culture changes if you are serious about creating high-quality software. The title gives you an idea of the style: "Extreme Review: A Tale of Nakedness, Alsatians and Fagan Inspection." Highly recommended reading.