Tuesday, September 13, 2016

How Safe Is The Tesla Autopilot?


It was only a matter of time until there was a fatality involving a "self-driving" car, because no car is likely to be 100% safe.  It happened in May this year, and Tesla disclosed it at the end of June.  This week, Tesla announced a software update involving a number of strategy changes:  https://www.tesla.com/blog/upgrading-autopilot-seeing-world-radar

In this post I'm going to look at what safety claims can, and cannot, be made for the Tesla Autopilot based on publicly available information.  To get a broad idea of where they are, and where they aren't, it's sufficient to look at what Tesla itself says and assume it is 100% accurate.  Short version: they have a really long way to go to demonstrate they are at least as good as a normal vehicle.

The Math:

First, let's take Tesla's blog posting from June about that tragic death.
   https://www.tesla.com/blog/tragic-loss
Tesla claims this was the first known fatality in just over 130 million miles of Autopilot driving, compared with one fatality every 94 million miles for all vehicles in the US.

One might be tempted to conclude from Tesla's statement that Autopilot is safer than manual driving, because 130 million is larger than 94 million.  The data do not support that conclusion; the comparison is far too simplistic.  Rather, the jury is still very much out on whether Tesla Autopilot will prove safer than everyday vehicles.

From a purely statistical approach, the question is: how many miles does Tesla have to drive to demonstrate they are at least as good as a human (a mean of 94 million miles between fatalities)?

This tool:  http://reliabilityanalyticstoolkit.appspot.com/mtbf_test_calculator tells you how long you need to test to demonstrate a 94M mile Mean Time Between Failures (MTBF) with 95% confidence.  Assuming that a failure is a fatal mishap in this case, you need to test 282M miles with no fatalities to be 95% sure that the mean is at least 94 million miles.  Yes, it's hard to get that lucky.  But if you do have a mishap, you have to do more testing to tell whether going 130 million miles before that mishap was luck, or just normal statistical fluctuation for a system that meets the target average mishap rate.  If you have only one mishap, you need a total of 446M miles to reach 95% confidence.  If another mishap occurs, that extends to 592M miles for a failure budget of two mishaps, and so on.  It would be no surprise if Tesla needs about 1-2 billion miles of accumulated experience for the statistical fluctuations to settle out, assuming the system actually is as good as a human driver.
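
For readers who want to check the test-length figures without the web tool, here is a rough sketch.  It assumes the calculator uses the standard bound for a time-terminated test with exponentially distributed (constant-rate) failures, which is the chi-square quantile with 2r+2 degrees of freedom divided by two; that is an assumption about the tool, not something its page is quoted as saying here.  The function names are made up for illustration.  The chi-square quantile is obtained from the equivalent Poisson sum so that only the standard library is needed.

```python
import math


def poisson_cdf(max_failures: int, expected: float) -> float:
    """P(N <= max_failures) for a Poisson count N with the given mean."""
    term = math.exp(-expected)
    total = term
    for k in range(1, max_failures + 1):
        term *= expected / k
        total += term
    return total


def length_factor(allowed_failures: int, confidence: float) -> float:
    """Required test length as a multiple of the target MTBF.

    Solves P(N <= allowed_failures | mean = x) = 1 - confidence for x by
    bisection.  This x equals chi2_quantile(confidence, 2r + 2) / 2.
    """
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(allowed_failures, mid) > 1 - confidence:
            lo = mid  # too likely to see this few failures: test longer
        else:
            hi = mid
    return (lo + hi) / 2


def required_test_miles(target_mtbf_miles: float, allowed_failures: int,
                        confidence: float = 0.95) -> float:
    """Miles of testing needed to demonstrate the target MTBF."""
    return target_mtbf_miles * length_factor(allowed_failures, confidence)


if __name__ == "__main__":
    for failures in (0, 1, 2):
        miles = required_test_miles(94e6, failures)
        print(f"{failures} mishaps allowed: {miles / 1e6:.0f}M miles")
```

Under those assumptions, the sketch gives values that round to the 282M, 446M and 592M mile figures quoted above for zero, one and two mishaps.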

Looking at it another way, given the current data (1 mishap in 130 million miles), this tool: http://reliabilityanalyticstoolkit.appspot.com/field_mtbf_calculator tells us that Tesla has only demonstrated an MTBF of 27.4M miles or better at 95% confidence at this point, which is less than a third of the way to break-even with a human.
(Please note, I did NOT say that they are only a third as safe as a human.  What I am saying is that the data available only supports a claim of about 29.1% as good.  Beyond that, the jury is still out.)
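
The demonstrated-MTBF figure can be approximated the same way, running the calculation in the other direction.  This is only a sketch, assuming the field calculator reports the one-sided lower confidence bound for a time-terminated observation period with constant-rate failures; the function names are illustrative and not taken from the tool.

```python
import math


def poisson_cdf(max_failures: int, expected: float) -> float:
    """P(N <= max_failures) for a Poisson count N with the given mean."""
    term = math.exp(-expected)
    total = term
    for k in range(1, max_failures + 1):
        term *= expected / k
        total += term
    return total


def length_factor(failures: int, confidence: float) -> float:
    """chi2_quantile(confidence, 2 * failures + 2) / 2, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(failures, mid) > 1 - confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def demonstrated_mtbf(miles_observed: float, failures: int,
                      confidence: float = 0.95) -> float:
    """One-sided lower confidence bound on MTBF from field experience."""
    return miles_observed / length_factor(failures, confidence)


if __name__ == "__main__":
    lower_bound = demonstrated_mtbf(130e6, failures=1)
    print(f"Demonstrated MTBF: {lower_bound / 1e6:.1f}M miles")
    print(f"Fraction of 94M mile target: {lower_bound / 94e6:.1%}")
```

With 130 million miles and one mishap this lands at about 27.4M miles, a bit under 30% of the 94M mile human benchmark.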

In other words, they need a whole lot more testing to make strong claims about safety.  This is because it is really difficult (usually impractical) to test safety into a system.  You usually need to do something more, which is one of the reasons why software safety standards such as ISO 26262 exist.

Threats To Validity:

With any analysis like this, it's important to be clear about the assumptions and possible flaws in analysis, of which there are often many.  Here are some that come to mind.

- The calculations above assume that it is more or less the same software for all 130 million miles.  If Tesla makes a "major" change to software, pretty much all bets are off as to whether the new software will be better or worse in practice unless a safety critical assurance methodology (e.g., ISO 26262) has been used.  (In fact, one can argue that *any* change invalidates previous test data, but that is a fine point beyond what I want to cover here.) Tesla says they're making a dramatic switch to radar as a primary sensor with this version.  That sounds like it could be a major change.  It would be no surprise if this software version resets the clock to zero miles of experience in terms of software field reliability both for better and for worse.

- The Tesla is most definitely not a fully autonomous vehicle.  Even Tesla makes it quite clear in their public announcements that constant driver attention and supervision are required.  So the Tesla experience isn't really about fully self-driving cars per se. Rather, it is about the safety of a human+car partnership (Level 2 autonomy in NHTSA-speak, since the driver must monitor the system at all times). That includes both the pros (humans can react to weird things the software might get wrong), and the cons (humans easily "drop out" of systems that don't require their attention).  If Tesla moves to Level 3 or higher autonomy later, at best they're going to have to make a very nuanced argument to take credit for Level 2 autonomy experience.

- The calculations assume random independent failures, which in general is probably not a good assumption for software.  Whether it is a valid assumption for this scenario is an interesting question.

(Calculation note:  I just plugged miles instead of hours into the MTBF calculations, because the units don't really matter as long as you are consistent.  If you want to translate to hours and back, 30 mph is not a bad approximation for vehicle speed accounting for city and highway driving.  If you'd like 90% confidence or 99% confidence feel free to plug the numbers into the tools for yourself.)

Thursday, September 1, 2016

Toyota Unintended Acceleration Case Study Talk Update

My Toyota UA talk from a couple years ago has gotten a lot of views, but watching the video was a bit of a pain because of the proprietary streaming format it was in.

I was finally able to convert the files to .mp4, so now it's a lot more accessible, especially from mobile devices.
Please see my original posting with more details about various other ways to view and download the materials.
