Tuesday, October 2, 2018

Cost of highly safety critical software

It's always interesting to see data on industry software costs. I recently came across a report on software costs for the aviation industry. The context was flight-critical radio communications, but the safety standards discussed were DO-178B and DO-254, which apply to flight controls as well.

Here's the most interesting picture from the report for my purposes:

(Figure from the report: SLOC/day productivity by assurance level)
(Source: Page 28 https://www.eurocontrol.int/sites/default/files/content/documents/communications/29012009-certification-cost-estimation-for-fci-platform.pdf.pdf )

Translating from DO-178B terminology, this means:

  • DAL A (failure would be "catastrophic"): 3 - 12 SLOC/day
  • DAL B (failure would be "hazardous"): 8 - 20 SLOC/day
  • DAL C (failure would be "major"): 15 - 40 SLOC/day
  • DAL D (failure would be "minor"): 25 - 64 SLOC/day

Worth noting: in my experience, really solid mission-critical but NOT life-critical embedded software can be done at up to 16 SLOC per day by well-run, experienced teams, which lines up with DAL B costs.


For interpretation, "DAL" is a criticality level (a "Development Assurance Level"), with more critical software requiring more rigorous processes.  The document has quite a lot to say about how the engineering process works, and is worth a read if you want to see how the aviation folks do business.  (I'm aware that DO-178C is out, but this paper discusses the older "B" version.)  Note that there are other, less pessimistic cost models in that report, but this is the one that says "industry experience."
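
If you want to turn those productivity ranges into rough project numbers, here is a minimal back-of-the-envelope sketch in Python. The 100 KSLOC code size, the $1000/person-day labor rate, and the simple linear model are my own illustrative assumptions, not figures from the report:

    # Back-of-the-envelope effort estimate from SLOC/day productivity ranges.
    # All inputs below are illustrative assumptions, not figures from the report.

    DAL_SLOC_PER_DAY = {   # productivity ranges from the chart above
        "A": (3, 12),
        "B": (8, 20),
        "C": (15, 40),
        "D": (25, 64),
    }

    def effort_range(sloc, dal, cost_per_day=1000.0):
        """Return ((min, max) person-days, (min, max) dollars) for a given DAL."""
        low, high = DAL_SLOC_PER_DAY[dal]
        days = (sloc / high, sloc / low)   # best case assumes highest productivity
        dollars = (days[0] * cost_per_day, days[1] * cost_per_day)
        return days, dollars

    # Example: a hypothetical 100 KSLOC flight-critical (DAL A) code base
    days, dollars = effort_range(100_000, "A")
    print(f"{days[0]:,.0f} to {days[1]:,.0f} person-days "
          f"(~${dollars[0]:,.0f} to ${dollars[1]:,.0f} at $1000/person-day)")

For this made-up example that works out to roughly 8,000 to 33,000 person-days, which is why the assurance level matters so much to project budgets.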

Have you found other cost of software data for embedded or mission critical systems?

Saturday, September 8, 2018

Different types of risk analysis: ALARP, GAMAB, MEM and more

When we talk about how much risk is acceptable, it is common to compare the risk to that of current systems, or to argue about whether something is more (or less) likely than events such as being killed by lightning. There are established ways to think about this topic, each with tradeoffs.

(Image: tightrope walker)

The next time you need to think about how much risk is appropriate in a safety-critical system, try these existing approaches on for size instead of making up something on your own:

ALARP: "As Low As Reasonably Practicable"  Some risks are acceptable. Some are unacceptable. Some are worth taking in exchange for benefit, but if that is done the risk must be reduced to be ALARP.

GAMAB: "Globalement Au Moins Aussi Bon"  Offer a level of risk at least as good as the risk offered by an equivalent existing system. (i.e., no more dangerous than what we have already for a similar function)

MEM: "Minimum Endogenous Mortality"  The technical system must not add significant risk compared to risks that already exist. In practice, the system should cause at most a minimal increase in overall death rate compared to the existing population death rate.

MGS: "Mindestens Gleiche Sicherheit"   (At least the same level of safety) Deviations from accepted practices must be supported by an explicit safety argument showing at least the same level of safety. This is more about waivers than whole-system evaluation.

NMAU: "Nicht Mehr Als Unvermeidbar"  (Not more than unavoidable)  Assuming there is a public benefit to the operation of the system, hazards should be avoided by reasonable safety measures implemented at reasonable cost.

Each of these approaches has pros and cons.  The above terms were paraphrased from this nice discussion:
Kron, On the evaluation of risk acceptance principles,
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.455.4506&rep=rep1&type=pdf
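
To make the ALARP idea a bit more concrete, here is a minimal sketch of the classic three-region classification (unacceptable / tolerable only if ALARP / broadly acceptable). The numeric thresholds are made-up placeholders; real limits come from the applicable standard, regulator, or safety case:

    # Minimal sketch of an ALARP-style three-region risk classification.
    # The probability thresholds are made-up placeholders for illustration only;
    # real limits come from the applicable standard, regulator, or safety case.

    UNACCEPTABLE_PER_HOUR = 1e-5   # above this, risk must be reduced regardless of cost
    ACCEPTABLE_PER_HOUR = 1e-8     # below this, risk is broadly acceptable

    def classify_risk(prob_per_hour):
        """Place a hazard's probability of occurrence into an ALARP region."""
        if prob_per_hour >= UNACCEPTABLE_PER_HOUR:
            return "unacceptable: reduce the risk or abandon the function"
        if prob_per_hour <= ACCEPTABLE_PER_HOUR:
            return "broadly acceptable: no further reduction required"
        return "tolerable only if reduced As Low As Reasonably Practicable"

    for p in (1e-4, 1e-6, 1e-9):
        print(f"{p:.0e}/hr -> {classify_risk(p)}")

The interesting (and contentious) part in practice is the middle region, where "reasonably practicable" forces an explicit argument about the cost of further risk reduction versus the benefit gained.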

There is an interesting set of slides that covers similar ground here, and works some examples. In particular, the graphs comparing whether risks are taken voluntarily in different scenarios are thought provoking:
http://agse3.informatik.uni-kl.de/teaching/suze/ws2014/material/folien/SRES_03_Risk_Acceptance.pdf

In general, if you want to dig deeper into this area, a search on
    gamab mem alarp 
will bring up a number of hits.

Also note that legal and other types of considerations exist, especially regarding product liability.

Monday, March 12, 2018

Embedded Code Quality and Best Practices Training Videos (full length)

I've posted the full series of my available embedded system code quality and related best practices videos on YouTube.  These are full-length narrated slides of the core set of safety topics from my new course.  They concentrate on getting the big picture about code quality and good programming practices.
Each lecture is posted to YouTube as a playlist, with each video covering a slide or two. The full lecture consists of playing the entire playlist, with most lectures being 5-7 videos in sequence. (The slide download has been updated for my CMU grad course, so it generally has a little more material than the original video. They'll get synchronized eventually, but for now this is what I have.)

Obviously there is more to code quality and safety than just these topics. Additional topics are available as slides only. You can see the full set of course slides, including those lectures and others, here:
  https://users.ece.cmu.edu/~koopman/lectures/index.html#642

Sunday, February 25, 2018

New Blog on Self-Driving Car Safety

I'm doing a lot more work on self-driving car (autonomous vehicle) safety, so I've decided to split my blogging for that activity.  I'll still post more general embedded system topics here, perhaps with reduced frequency.

You can see my new blog on self-driving car safety here:
    https://safeautonomy.blogspot.com

Just to keep perspective, self-driving cars are still very complex embedded systems. You need to get the basics right (this blog) if you want them to be safe!

Friday, February 16, 2018

Robustness Testing of Autonomy Software (ASTAA Paper Published)

I'm very pleased that our research team will present a paper on Robustness Testing of Autonomy Software at the ICSE Software Engineering in Practice session in late May. You can see a preprint of the paper here:  https://goo.gl/Pkqxy6

The work summarizes what we've learned across several years of research stress testing many robots, including self-driving cars.

ABSTRACT
As robotic and autonomy systems become progressively more present in industrial and human-interactive applications, it is increasingly critical for them to behave safely in the presence of unexpected inputs. While robustness testing for traditional software systems is long-studied, robustness testing for autonomy systems is relatively uncharted territory. In our role as engineers, testers, and researchers we have observed that autonomy systems are importantly different from traditional systems, requiring novel approaches to effectively test them. We present Automated Stress Testing for Autonomy Architectures (ASTAA), a system that effectively, automatically robustness tests autonomy systems by building on classic principles, with important innovations to support this new domain. Over five years, we have used ASTAA to test 17 real-world autonomy systems, robots, and robotics-oriented libraries, across commercial and academic applications, discovering hundreds of bugs. We outline the ASTAA approach and analyze more than 150 bugs we found in real systems. We discuss what we discovered about testing autonomy systems, specifically focusing on how doing so differs from and is similar to traditional software robustness testing and other high-level lessons.
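
For readers who have not seen this kind of testing before, here is a toy Python sketch of the general idea behind exceptional-value robustness testing: throw classically troublesome values at an interface and flag crashes or silently bad outputs. This illustrates the general principle only; it is not the ASTAA implementation, and the function under test is a made-up placeholder:

    # Toy illustration of exceptional-value robustness testing.
    # NOT the ASTAA implementation -- just the general principle.
    import math

    EXCEPTIONAL_VALUES = [0, -1, 2**31 - 1, -2**31, float("nan"),
                          float("inf"), -float("inf"), None, "", 1e308]

    def set_speed_setpoint(speed_m_per_s):
        """Hypothetical unit under test: compute a motor command from a speed."""
        return math.sqrt(speed_m_per_s) * 10.0

    def robustness_test(func, values=EXCEPTIONAL_VALUES):
        """Feed exceptional inputs to func; report crashes and silent bad outputs."""
        failures = []
        for v in values:
            try:
                result = func(v)
                if isinstance(result, float) and not math.isfinite(result):
                    failures.append((v, f"silent bad output: {result}"))
            except Exception as e:   # an unhandled exception is a robustness failure
                failures.append((v, f"{type(e).__name__}: {e}"))
        return failures

    for value, problem in robustness_test(set_speed_setpoint):
        print(f"input {value!r}: {problem}")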

Authors:
Casidhe Hutchison
Milda Zizyte
Patrick Lanigan
David Guttendorf
Mike Wagner
Claire Le Goues
Philip Koopman

