Better Embedded System SW: management

Showing posts with label management. Show all posts

Monday, October 9, 2017

Top Five Embedded Software Management Misconceptions

Here are five common management-level misconceptions I run into when I do design reviews of embedded systems. How many of these have you seen recently?

(1) Getting to compiled code quickly indicates progress. (FALSE!)

Many projects are judged by "coding completed" to indicate progress. Once the code has been written, compiles, and kind of runs for a few minutes without crashing, management figures that they are 90% there. In reality, a variant of the 90/90 rule holds: the first 90% of the project is in coding, and the second 90% is in debugging.

Measuring teams on code completion pressures them to skip design and peer reviews, ending up with buggy code. Take the time to do it right up front, and you'll more than make up for those "delays" with fewer problems later in the development cycle. Rather than measure "code completed" do something more useful, like measure the fraction of modules with "peer review completed" (and defects found in peer review corrected). There are many reasonable ways to manage, but waterfall-ish projects that treat "code completed" as the most critical milestone is not one of them.

(2) Smart developers can write production-quality code on a long weekend (FALSE!)

Alternate form: marketing sets both requirements and end date without engineering getting a chance to spend enough time on a preliminary design to figure out if it can actually be done.

The true bit is anyone can slap together some code that doesn't work. Some folks can slap together code in a long weekend that almost works. But even the best of us can only push so many lines of code in a short amount of time without making mistakes, much less producing something anyone else can understand. Many of us remember putting together hundreds or thousands of lines on an all-nighter when we were students. That should not be mistaken for writing production embedded code.

Good embedded code tends to cost about an hour for every 1 or 2 lines of non-comment code all-in, including testing (on a really good day 3 lines/hr). Some teams come from the Lake Wobegone school, where all the programmers are above average. (Is that really true for your team? Really? Good for you! But you still have to pay attention to the other four items on this list.) And sure, you can game this metric if you try. Nonetheless, it is remarkable how often I see a number well above about 2 SLOC/hour of deeply embedded code corresponding to a project that is in trouble.

Regardless of the precise productivity number, if you want your system to really work, you need to treat software development as a core competency. You need an appropriately methodical and rigorous engineering process. Slapping together code quickly gives the illusion of progress, but it doesn't produce reliable products for full-scale production.

(3) A “mostly working,” undisciplined prototype can be deployed. (FALSE!)

Quick and dirty prototypes provide value by giving stakeholders an idea of what to expect and allowing iterations to converge on the right product. They are invaluable for solidifying nebulous requirements. However, such a prototype should not be mistaken for an actual product! If you've hacked together a prototype, in my experience it's always more expensive to clean up the mess than it is to take a step back and start a project from scratch or a stable production code base.

What the prototype gives you is a solid sense of requirements and some insight into pitfalls in design.

A well executed incremental deployment strategy can be a compromise to iteratively add functionality if you don't know all your requirements up front. But an well-run Agile project is not what I'm talking about when I say "undisciplined prototype." A cool proof of concept can be very valuable. It should not be mistaken for production code.

(4) Testing improves software quality (FALSE!)

If there are code quality problems (possibly caused by trying to bring an undisciplined prototype to market), the usual hammer that is brought to bear is more testing. Nobody ever solved code quality problems by testing. All that testing does is make buggy code a little less buggy. If you've got spaghetti code that is full of bugs, testing can't possibly fix that. And testing will generally miss most subtle timing bugs and non-obvious edge cases.

If you're seeing lots of bugs in system test, your best bet is to use testing to find bug farms. The 90/10 rule applies: many times 90% of the bugs are in bug farms -- the worst 10% of the modules. That's only an approximate ratio, but regardless of the exact number, if you're seeing a lot of system test failures then there is a good chance some modules are especially bug-prone. Generally the problem is not simply programming errors, but rather poor design of these bug-prone modules that makes bugs inevitable. When you identify a bug farm, throw the offending module away, redesign it clean, and write the code from scratch. It's tempting to think that each bug is the last one, but after you've found more than a handful of bugs in a module, who are you kidding? Especially if it's spaghetti code, bug farms will always be one bug away from being done, and you'll never get out of system test cleanly.

(5) Peer review is too expensive (FALSE!)

Many, many projects skip peer review to get to completed code (see item #1 above). They feel that they just don't have time to do peer reviews. However, good peer reviews are going to find 50-75% of your bugs before you ever get to testing, and do so for about 10% of your development budget. How can you not afford peer reviews? (Answer: you don't have time to do peer reviews because you're too busy writing bugs!)

Have you run into another management misconception on a par with these? Let me know what you think!

Monday, May 8, 2017

Optimize for V&V, not for writing code

Geralt / CC0 PD/noattrib.

Writing code should be made more difficult so that Verification &Validation can be made easier.

I first heard this notion years ago at a workshop in which several folks from industry who build high assurance software (think flight controls) stood up and said that V&V is what matters. You might expect that from flight control folks, but their reasoning applies to pretty much every embedded project. That's because it is a matter of economics.

Multiple speakers at that workshop said that aviation software can require 4 or 5 hours of V&V for every 1 hour of creating software. It makes no economic sense to make life easy for the 1 hour side of the ratio at the expense of making life painful for the 5 hour side of the ratio.

Good, but non-life-critical, embedded software requires about 2 hours of V&V for every 1 hour of code creation. So the economic argument still holds, with a still-compelling multiplier of 2:1. I don't care if you're Vee, Agile, hybrid model or whatever. You're spending time on V&V, including at least some activities such as peer review, unit test, created automated tests, performing testing, chasing down bugs, and so on. For embedded products that aren't flaky, probably you spend more time on V&V than you do on creating the code. If you're doing TDD you're taking an approach that has the idea of starting with a testing viewpoint built in already, by starting from testing and working outward from there. But that's not the only way to benefit from this observation.

The good news is that making code writing "difficult" does not involve gratuitous pain. Rather, it involves being smart and a bit disciplined so that the code you produce is easier for others to perform V&V on. A bit of up front thought and organization can save a lot on downstream effort. Some examples include:

Writing concise but helpful code comments so that reviewers can understand what you meant.
Writing code to be obvious rather than clever, again to help reviewers.
Follow a style guide to make your code consistent, and thus easier to understand.
Writing code that compiles clean for static analysis, avoiding time wasted finding defects in test that a tool could have found, and avoiding a person having to puzzle out which warnings matter, and which don't.
Spending some time to make your unit interfaces easier to test, even if it requires a bit more work designing and coding the unit.
Spending time making it easy to trace between your design and the code. For example, if you have a statechart, make sure the statechart uses names that map directly to enum names rather than using arbitrary state variables such as "magic number" integers between 1 and 7. This makes it easier to ensure that the code and design match. (For that matter, just using statecharts to provide a guide to what the code does also helps.)
Spending time up front documenting module interaction so that integration testers don't have to puzzle out how things are supposed to work together. Sequence diagrams can help a lot.
Making the requirements both testable and easy to trace. Make every requirement idea a stand-alone sentence or paragraph and give it a number so it's easy to trace to a specific test primarily designed to test that particular requirement. Avoid having requirements in huge paragraphs of free-form text that mix lots of different concepts together.

Sure, these sound like a good idea, but many developers skip or skimp on them because they don't think they can afford the time. They don't have time to make their code clean because they're too busy writing bugs to meet a deadline. Then they, and everyone else, pay for this during the test cycle. (I'm not saying the programmers are necessarily the main culprits here, especially if they didn't get a vote on their deadline. But that doesn't change the outcome.)

I'm here to say you can't afford not to follow these basic code quality practices. That's because every hour you're saving by cutting corners up front is probably costing you double (or more) downstream by making V&V more painful than it should be. It's always hard to invest in downstream benefits when the pressure is on, but doing so is costing you dearly when you skimp on code quality.

Do you have any tricks to make code easier to understand that I missed?

Monday, November 7, 2016

Embedded System Software Quality: Why is it so often terrible? What can we do about it?

I had a great time meeting old friends and new folks at ISSRE 2016.

Here are slides from my keynote address:

Embedded System Software Quality:
Why is it so often terrible? What can we do about it?

Failures of embedded system software increasingly make the news. Everyday products we rely upon are suffering from safety issues, security issues, and just plain bugs. While perfection is unrealistic, surely we can improve this situation. Two key ideas apply: (1) embedded products often aren’t created by computer specialists, and (2) teaching application domain specialists just how to code is more of a problem than a solution.

Phil Koopman's ISSRE 2016 Keynote from edgecaseresearch

Monday, February 2, 2015

Tester To Developer Ratio Should Be 1:1 For A Typical Embedded Project

Believe it or not, most high quality embedded software I've seen has been created by organizations that spend twice as much on validation as they do on software creation. That's what it takes to get good embedded software. And it is quite common for organizations that don't have this ratio to be experiencing significant problems with their projects. But to get there, the head count ratio is often about 1:1, since developers should be spending a significant fraction on their time doing "testing" (in broad terms).

First, let's talk about head count. Good embedded software organizations tend to have about equal number of testers and developers (i.e., 50% tester head count, 50% developer head count). Also, typically, the testers are in a relatively independent part of the organization so as to reduce pressure to sign off on software that isn't really ready to ship. Ratios can vary significantly depending on the circumstance. 5:1 tester to developer ratio is something I'd expect in safety-critical flight controls for an aerospace application. 1:5 tester to developer ratio might be something I'd expect on ephemeral web application development. But for a typical embedded software project that is expected to be solid code with well defined functionality, I typically expect to see 1:1 tester to developer staffing ratios.

Beyond the staffing ratio is how the team spends their time, which is not 1:1. Typically I expect to see the developers each spend about two thirds of their time on development, and one-third of their time on verification/validation (V&V). I tend to think of V&V as within the "testing" bin in terms of effort in that it is about checking the work product rather than creating it. Most of the V&V effort from developers is spent doing peer reviews and unit testing.

For the testing group, effort is split between software testing, system testing, and process quality assurance (SQA). Confusingly, testing is often called "quality assurance," but by SQA what I mean is time spent creating, training the team on, and auditing software processes themselves (i.e., did you actually follow the process you were supposed to -- not actual testing). A common rule of thumb is that about 5%-6% of total project effort should be SQA, split about evenly between process creation/training and process auditing. So that means in a team of 20 total developers+testers, you'd expect to see one full time equivalent SQA person. And for a team size of 10, you'd expect to see one person half-time on SQA.

Taking all this into account, below is a rough shot at how effort across a project should break down in terms of head counts and staff.

Head count for a 20 person project:

10 developers
9 testers
1 SQA (both training and auditing)

Note that this does not call out management, nor does it deal with after-release problem resolution, support, and so on. And it would be no surprise if more testers were loaded at the end of the project than the beginning, so these are average staffing numbers across the length of the project from requirements through product release.

Typical effort distribution (by hours of effort over the course of the project):

33%: Developer team: Development (left-hand side of "V" including requirements through implementation)
17%: Developer team: V&V (peer reviews, unit test)
44%: V&V team: integration testing and system testing
3%: V&V team: Process auditing
3%: V&V team: Process training, intervention, improvement

If you're an agile team you might slice up responsibilities differently and that may be OK. But in the end the message is that you're probably only spending about a third of your time actually doing development compared to verification and validation if you want to create very high quality software and ensure you have a rigorous process while doing so. Agile teams I've seen that violated this rule of thumb often did so at the cost of not controlling their process rigor and software quality. (I didn't say the quality was necessarily bad, but rather what I'm saying is they are flying blind and don't know what their software and process quality is until after they ship and have problems -- or don't. I've seen it turn out both ways, so you're rolling the dice if you sacrifice V&V and especially SQA to achieve higher lines of code per hour.)

There is of course a lot more to running a project than the above. For example, there might need to be a cross-functional build team to ensure that product builds have the right configuration, are cleanly built, and so on. And running a regression test infrastructure could be assigned to either testers or developers (or both) depending on the type of testing that was included. But the above should set rough expectations. If you differ by a few percent, probably no big deal. But if you have 1 tester for every 5 developers and your effort ratio is also 1:5, and you are expecting to ship a rock-solid industrial-control or similar embedded system, then think again.

(By way of comparison, Microsoft says they have a 1:1 tester to developer head count. See: How We Test Software at Microsoft, Page, Johston & Rollison.)

Thursday, January 13, 2011

Embedded Software Risk Areas -- Five Forbodes Failure

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. The results of this study inspired the chapters in my book.

To conclude these series of postings, here is an observation to ponder.

One of the informal observations made across the course of these reviews was that developer teams with exactly 5 primary contributors have the most spectacular project failures. Invariably these teams had previously completed a project with 3 or 4 members successfully, and increased the team size to tackle a more complex project without making any changes in their software process. But they failed with the new, 5-person team.

While this is an anecdotal result, projects that grow past 4 developers in size should seriously consider switching to a heavier weight software process (more paper, more formality, more methodical rigor). Smaller teams still seem to benefit from good process, but basically can get away with informality with less dramatic risks than larger teams (5 or more developers) working on more complex projects.

Does this mean with fewer than 5 people you can simply ignore all the risk areas I've posted? No. Most of the reviews (perhaps 80%) were conducted on teams with fewer than 5 people. What this does mean is that with only one or a few developers you can get away with a lot and only have a few red flag risk areas. If you are lucky you will survive them, and if you are unlucky they will bite you. Hard. But most of the time you will slide by well enough that you will work nights, weekends, and have no social life -- but you will still have a job.

But, if you have more than 5 people you probably have to do most or even all of the process activities listed in these risk areas. If you blow off using a rigorous process, it is pretty likely you will fail. Probably you will fail spectacularly.

At least this is what I have observed doing 95 reviews over 10 years in industry. Your mileage may vary.

Wednesday, December 22, 2010

Embedded Software Risk Areas -- People

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are the People red flags:

High turnover and developer overload

Developers have a high turnover rate. As a result, code quality and style varies. Lack of a robust paper trail makes it difficult to continue development. Often more important is that replacement developers may lack the domain experience necessary for understanding the details of system requirements.

No training for managing outsource relationships

Engineers who are responsible for interacting with outsource partners do not have adequate time and skills to do so, especially for multi-cultural partnering. This can lead to significant ineffectiveness or even failure of such relationships.

Thursday, May 27, 2010

Managing developer staff by head count

Last I checked, engineers were paid pretty good salaries. So why does management act like they are free?

Let me explain. It is common to manage engineers by headcount. Engineering gets, say, 12 people, and their job is to make products happen. But this approach means engineers are an overhead resource that is just there to use without really worrying about who pays for it. Like electricity. Or free parking spots.

The good part about engineering headcount is it is simple to specify and implement. And it can let the engineering staff concentrate on doing design instead of spending a lot of time of marketing themselves to internal customers. But it can cause all sorts of problems. Here are some of the more common ones:

Decoupling of cost from workload. Most software developers are over-committed. You get the peanut butter effect. No matter how big your slice of bread, it always seems possible to spread the peanut butter a little thinner to cover it all. After too much spreading you spend all your time fighting fires and no time being productive. (Europeans -- feel free to substitute Nutella in this analogy! I'm not sure how well the analogy works Down Under -- I've only been brave enough to eat Vegemite once.)
Inability to justify tool spending. Usually tools, outside consultants, and other expenses come from "real" money budgets. Usually those budgets are limited. This often results in head-count engineers spending or wasting many hours doing something that they could get from outside much more cheaply. If a $5000 software tool saves you a month of time that's a win. But you can't do it if you don't have the $5000 to spend.
Driving use of the lowest possible cost hardware. Many companies still price products based on hardware costs and assume software is free. If you have an engineering headcount based system, engineers are in fact "free" (as far as the accountants can tell). This is a really bad idea for most products. Squeezing things into constrained software gets really expensive! And even if you can throw enough people at it, resource constraints increase the risk of bugs.

There are no doubt other problems with using an engineering headcount approach (drop me a line if you think of them!). And there are some benefits as well. But hopefully this gives you the big picture. In my opinion head counts do more harm than good.

Corporate budgeting is not a simple thing, so I don't pretend to have a simple magic wand to fix things. But in my opinion adding elements of the below will help over the long term:

Include engineering cost in product development cost. You probably budget for manufacturing tooling and other up-front (NRE) costs. Why isn't software development budgeted? This will at the very least screen out ill conceived products with expensive (complex, hard to get right) software put into marginally viable products.
If you must use head-counts, revisit them annually based on workload and adjust accordingly.
Treat headcount as a fixed resource that is rationed, not an infinite resource. Or at least make product lines pay into the engineering pool in proportion to how much engineering they use, even if they can't budget for it beforehand.

There aren't any really easy answers, but establishing some link between engineering workload and cost probably helps both the engineers and the company as a whole. I doubt everyone agrees with my opinions, so I'd like to hear what you have to say!

Better Embedded System SW