Tuesday, September 13, 2016

How Safe Is The Tesla Autopilot?


It was only a matter of time until there was a fatality involving a "self-driving" car, because no car is likely to be 100% safe.  And it happened in June this year.  This week, Tesla announced a software update involving a number of strategy changes.  https://www.tesla.com/blog/upgrading-autopilot-seeing-world-radar

In this post I'm going to try to take a look at what safety claims can, and cannot, be made for the Tesla Autopilot based on publicly available information.  It's sufficient just to look at what Tesla itself says and assume it is 100% accurate to get a broad idea of where they are, and where they aren't.  Short version: they have a really long way to go to demonstrate they are at least as good as a normal vehicle.

The Math:

First, let's take Tesla's blog posting about the tragic death that occurred in June.
   https://www.tesla.com/blog/tragic-loss
Tesla claims 130 million miles of Autopilot driving, and a US fatality every 94 million miles.

One might be tempted to draw an incorrect conclusion from Tesla's statement: that Autopilot is safer than manual driving because 130 million is larger than 94 million. However, that is far too simplistic an approach.  Rather, the jury is still very much out on whether Tesla Autopilot will prove safer than everyday vehicles.

From a purely statistical approach, the question is: how many miles does Tesla have to drive to demonstrate they are at least as good as a human (94 million mile mean time to fatality)?

This tool:  http://reliabilityanalyticstoolkit.appspot.com/mtbf_test_calculator tells you how long you need to test to demonstrate a 94M mile Mean Time Between Failure (MTBF) with 95% confidence.  Assuming that a failure is a fatal mishap in this case, you need to test 282M miles with no fatalities to be 95% sure that the mean time is at least 94 million miles.  Yes, it's hard to get that lucky.  But if you do have a mishap, you have to do more testing to tell whether the 130 million miles was a lucky streak or just reflects normal statistical fluctuations around a target average mishap rate.  If you only have one mishap, you need a total of 446M miles to reach 95% confidence.  If another mishap occurs, that extends to 592M miles for a failure budget of two mishaps, and so on.  It would be no surprise if Tesla needs about 1-2 billion miles of accumulated experience for the statistical fluctuations to settle out, assuming the system actually is as good as a human driver.
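For readers who want to see where numbers like 282M, 446M and 592M can come from, here is a rough Python sketch. It assumes the usual constant-mishap-rate (Poisson) model for a test that runs for a fixed number of miles; I am not claiming this is exactly what the linked tool does internally. The function names are made up for this illustration. The idea: find the mileage at which seeing no more than the allowed number of mishaps would happen only 5% of the time if the true MTBF were exactly the target.

```python
import math


def poisson_cdf(r, m):
    """Probability of seeing at most r mishaps when m are expected on average."""
    term = math.exp(-m)
    total = term
    for i in range(1, r + 1):
        term *= m / i
        total += term
    return total


def required_test_multiple(failures, confidence=0.95):
    """Test length, as a multiple of the target MTBF, needed to demonstrate
    that MTBF at the given confidence with at most `failures` mishaps.
    Assumption: constant mishap rate (Poisson counts), fixed-length test."""
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until "at most `failures` mishaps" is rare enough.
    while poisson_cdf(failures, hi) > alpha:
        hi *= 2.0
    # Bisect: poisson_cdf decreases as the expected mishap count grows.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def required_miles(target_mtbf_miles, failures, confidence=0.95):
    """Miles of testing needed for the given target MTBF and mishap budget."""
    return target_mtbf_miles * required_test_multiple(failures, confidence)


if __name__ == "__main__":
    for mishaps in range(3):
        miles = required_miles(94e6, mishaps)
        print(f"{mishaps} mishap(s): about {miles / 1e6:.0f}M miles")
```

With a 94M mile target this gives roughly 282M, 446M and 592M miles for zero, one and two mishaps. The zero-mishap case is just 94M times ln(20), which is about 3.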

Looking at it another way, given the current data (1 mishap in 130 million miles), this tool: http://reliabilityanalyticstoolkit.appspot.com/field_mtbf_calculator tells us that Tesla has only demonstrated an MTBF of 27.4M miles or better at 95% confidence at this point, which is less than a third of the way to break-even with a human.
(Please note, I did NOT say that they are only a third as safe as a human.  What I am saying is that the data available only supports a claim of about 29.1% as good.  Beyond that, the jury is still out.)
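The same reasoning can be run in the other direction. The sketch below takes the miles driven and mishaps seen so far and returns the MTBF that has been demonstrated at 95% confidence. Again, this assumes a constant mishap rate and a fixed-length observation period, and the function names are invented for illustration; the linked tool may do its arithmetic differently.

```python
import math


def poisson_cdf(r, m):
    """Probability of seeing at most r mishaps when m are expected on average."""
    term = math.exp(-m)
    total = term
    for i in range(1, r + 1):
        term *= m / i
        total += term
    return total


def upper_bound_expected_mishaps(failures, confidence=0.95):
    """Largest expected mishap count still consistent (at the given confidence)
    with having observed only `failures` mishaps. Assumes Poisson counts."""
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def demonstrated_mtbf(miles_driven, failures, confidence=0.95):
    """One-sided lower confidence bound on MTBF from field experience."""
    return miles_driven / upper_bound_expected_mishaps(failures, confidence)


if __name__ == "__main__":
    bound = demonstrated_mtbf(130e6, 1)
    print(f"Demonstrated MTBF: at least {bound / 1e6:.1f}M miles")
    print(f"Fraction of the 94M mile human baseline: {bound / 94e6:.3f}")
```

For 1 mishap in 130 million miles this comes out at about 27.4M miles, a bit under 30% of the 94M mile baseline.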

In other words, they need a whole lot more testing to make strong claims about safety.  This is because it is really difficult (usually impractical) to test safety into a system.  You usually need to do something more, which is one of the reasons why software safety standards such as ISO 26262 exist.

Threats To Validity:

With any analysis like this, it's important to be clear about the assumptions and possible flaws in analysis, of which there are often many.  Here are some that come to mind.

- The calculations above assume that it is more or less the same software for all 130 million miles.  If Tesla makes a "major" change to software, pretty much all bets are off as to whether the new software will be better or worse in practice unless a safety critical assurance methodology (e.g., ISO 26262) has been used.  (In fact, one can argue that *any* change invalidates previous test data, but that is a fine point beyond what I want to cover here.) Tesla says they're making a dramatic switch to radar as a primary sensor with this version.  That sounds like it could be a major change.  It would be no surprise if this software version resets the clock to zero miles of experience in terms of software field reliability both for better and for worse.

- The Tesla is most definitely not a fully autonomous vehicle.  Even Tesla makes it quite clear in their public announcements that constant driver attention and supervision are required.  So the Tesla experience isn't really about fully self-driving cars per se. Rather, it is about the safety of a human+car partnership (Level 2 autonomy in NHTSA-speak, since the driver must monitor at all times). That includes both the pros (humans can react to weird things the software might get wrong), and the cons (humans easily "drop out" of systems that don't require their attention).  If Tesla moves to Level 4 autonomy later, at best they're going to have to make a very nuanced argument to take credit for Level 2 autonomy experience.

- The calculations assume random independent failures, which in general is probably not a good assumption for software.  Whether it is a valid assumption for this scenario is an interesting question.

(Calculation note:  I just plugged miles instead of hours into the MTBF calculations, because the units don't really matter as long as you are consistent.  If you want to translate to hours and back, 30 mph is not a bad approximation for vehicle speed accounting for city and highway driving.  If you'd like 90% confidence or 99% confidence feel free to plug the numbers into the tools for yourself.)
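If you do want to translate between miles and hours, the conversion is a one-liner. The sketch below uses the 30 mph figure from the note above as a default; the function names are just for illustration.

```python
def miles_to_hours(miles, avg_speed_mph=30.0):
    """Convert accumulated miles to operating hours at an assumed average speed."""
    return miles / avg_speed_mph


def hours_to_miles(hours, avg_speed_mph=30.0):
    """Convert operating hours back to miles at the same assumed average speed."""
    return hours * avg_speed_mph


if __name__ == "__main__":
    hours = miles_to_hours(94e6)
    print(f"94M miles at 30 mph is about {hours / 1e6:.2f}M hours")
```

Because the same speed is used in both directions, the confidence results do not change; only the units do.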

5 comments:

  1. Good observation and comments! Also, I assume drivers use "self-driving mode" when riding on the highway, where there are fewer obstacles, and avoid using it in environments with too many risk factors. That means the miles they claim are "easy miles" and do not prove that the car is handling difficult situations. Unfortunately, it is difficult to distinguish "easy" miles (where driving conditions are simple) from "hard" ones (lots of potential issues/hazards that must be handled by the software).

    That being said, these numbers only cover driver safety. Considering public safety could also be a good thing, accounting for the number of bike/pedestrian accidents. This is an area where many humans fail and where self-driving cars could definitely help make cities safer. I am not an expert on car safety and what the rules are - might be interesting to see how they address this.

    Replies
    1. Julien -- these are all excellent points! Thanks for contributing. I think part of the overall problem is that people often say that "self driving cars are safer" when they should be saying "self driving cars have the potential to be safer". Whether/when they will achieve that potential is an important topic that should not be swept under the rug.

  2. There are countries in the developing world which have a much higher fatality rate, and the number 94 million may be way lower there.

    If we plug a smaller number into the tool, we get a smaller number of miles needed to validate the required MTBF. The entire point is, are we using/assuming the same distribution of road and driving conditions for human drivers and these (semi)autonomous vehicles? Given that these cars are accessible only to a certain economic demographic, is the distribution random enough to draw statistical conclusions?

  3. This is another item on the threats to validity list. (I'm sure there are more, but it's more interesting to see which ones come in via comments.) If you believe that the Tesla field experience is representative of driving in the US, then arguably the computation is useful as an upper bound on credible safety claims. But the argument breaks down as soon as a vehicle gets into a situation that is not similar to that field experience. The first time a Tesla entered Pittsburgh (or pick your favorite city that can be confusing for human drivers) it was unclear how the car would do, and it's more difficult to argue that previous field experience assures a certain level of safety. Even more so for a different country that has different infrastructure, different driver behaviors, and so on.

    Overall, the point of this posting is to say that even under extremely favorable (probably unrealistic) assumptions, Tesla has not even come close to establishing equivalent field safety to a human driver in the US. If you want to change variables such as country, the posting gives both a methodology for thinking about a bound on plausible safety claims given experience in that country, and a (growing) list of reasons why that bound might be unrealistically optimistic, as well as the limitations of applying such an analysis to a different driving environment.

  4. Good points. I also wonder how it compares to human drivers who are sober, drug-free and rested. The figure of 94M miles per deadly incident may well include states that most drivers would not drive in.

