Monday, February 21, 2011

Categorizing Bug Severity

One of the imponderables when using a bug tracking system such as Bugzilla is how to assign "severity" to a particular defect entry. It is instinctive to assign severity based on how dramatic the outcome is. So in that sort of system an LED that sometimes doesn't blink might be "medium" or "low," and a system crash might be "critical."  BUT, that's usually not the best way to look at severity.

Instead, defect severity should be based on how urgent the fix is in the business environment your product lives in. To use the above example, what if the non-blinking LED causes 1000 field support calls because people think the device isn't working properly (perhaps it is a cable modem and "blinking" means "I'm working properly")?  What if the system crash is likely to happen to only one out of every 100,000 customers, and happens in a situation in which the customer is very likely just to cycle power to clear the problem without any big deal?  In that situation the first defect might be "critical" and the second might be only "medium" severity.

So when you are thinking about assigning issue severity, consider the business context and not just how the defect feels to a tester or developer at a test bench. The general idea is that defect severity should correspond to the business value of fixing the defect, not to how embarrassing the defect might be to the developers. (You can read more about good practices for defect tracking in chapter 24 of my book.)
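
To make the idea concrete, here is a minimal sketch (in Python, independent of any particular bug tracker) of severity driven by expected business cost. The dollar thresholds and incident rates are invented assumptions for illustration, not recommended values.

```python
# Illustrative sketch: rank severity by expected business cost rather than by
# how dramatic the failure looks at the test bench. All dollar thresholds and
# rates below are made-up assumptions for this example.

def business_severity(incidents_per_year, cost_per_incident):
    """Map the estimated annual business cost of a defect to a severity label."""
    annual_cost = incidents_per_year * cost_per_incident
    if annual_cost >= 100_000:
        return "critical"
    elif annual_cost >= 10_000:
        return "high"
    elif annual_cost >= 1_000:
        return "medium"
    else:
        return "low"

# Non-blinking LED: 1000 support calls at an assumed $100 per call.
print(business_severity(1000, 100))   # -> critical

# Rare crash cleared by a power cycle: assume 100 affected customers per
# year at an assumed $50 of support cost each.
print(business_severity(100, 50))     # -> medium
```

The point of the sketch is only that the same two defects swap rankings once the inputs are business numbers instead of gut reactions.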

Monday, January 24, 2011

Peer Reviews and the 50/50 Rule

Surprisingly, there is a fairly easy way to know if your peer reviews are effective. They should be finding about 50% of your defects, and testing should be finding the other 50%. In other words, the 50/50 rule for peer reviews is that they should find half the defects you find before shipping the product.  If peer reviews aren't finding that many defects, something is wrong.

I base this rule of thumb on some study data and my observations of a number of real projects. If you want to find out whether teams are really doing peer reviews (and doing ones that are effective), just ask what fraction of defects are found in peer reviews. If you get an answer in the 40%-60% range, they're probably doing well.  If the answer is below 40%, peer reviews are being skipped or are being done ineffectively. If the answer is "we do them but don't log them," then most of the time they are being done ineffectively, but you need to dig deeper to find out what is going on.
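
If you want to try this on your own data, here is a minimal sketch of the arithmetic; the defect counts shown are invented for illustration.

```python
# Illustrative sketch: apply the 50/50 rule to counts pulled from a defect
# tracker. The counts below are invented for the example.

def peer_review_yield(review_defects, test_defects):
    """Fraction of pre-ship defects found by peer review rather than by test."""
    total = review_defects + test_defects
    return review_defects / total if total else 0.0

fraction = peer_review_yield(review_defects=120, test_defects=180)
print(f"Peer review found {fraction:.0%} of pre-ship defects")

if 0.40 <= fraction <= 0.60:
    print("Within the healthy 40%-60% band.")
elif fraction < 0.40:
    print("Below 40%: reviews are likely skipped or ineffective.")
else:
    print("Above 60%: double-check that testing itself is thorough.")
```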

If you are trying to find all your defects in test (instead of letting peer review get half of them for you), you are taking some big risks. Test is usually a more expensive way to find defects. More importantly, peer review tends to find many defects or poor design choices that are difficult to find by testing with any reasonable effort.

So, why make your testing expensive and your product more bug prone? Try some peer reviews and see what they find.

Thursday, January 13, 2011

Embedded Software Risk Areas -- Five Forebodes Failure

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. The results of this study inspired the chapters in my book.


To conclude this series of postings, here is an observation to ponder.

One of the informal observations made across the course of these reviews was that development teams with exactly 5 primary contributors have the most spectacular project failures. Invariably these teams had previously completed a project successfully with 3 or 4 members, then increased the team size to tackle a more complex project without making any changes to their software process. But they failed with the new, 5-person team.

While this is an anecdotal result, projects that grow past 4 developers should seriously consider switching to a heavier-weight software process (more paper, more formality, more methodical rigor). Smaller teams still seem to benefit from good process, but can basically get away with informality at less dramatic risk than larger teams (5 or more developers) working on more complex projects.

Does this mean that with fewer than 5 people you can simply ignore all the risk areas I've posted? No. Most of the reviews (perhaps 80%) were conducted on teams with fewer than 5 people. What it does mean is that with only one or a few developers you can get away with a lot and have only a few red flag risk areas. If you are lucky you will survive them, and if you are unlucky they will bite you. Hard. But most of the time you will slide by well enough that you will work nights and weekends and have no social life -- but you will still have a job.

But, if you have 5 or more people, you probably have to do most or even all of the process activities listed in these risk areas. If you blow off using a rigorous process, it is pretty likely you will fail. Probably you will fail spectacularly.

At least this is what I have observed doing 95 reviews over 10 years in industry.  Your mileage may vary.

Wednesday, December 22, 2010

Embedded Software Risk Areas -- People

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are the People red flags:
  • High turnover and developer overload
Developers have a high turnover rate. As a result, code quality and style vary. Lack of a robust paper trail makes it difficult to continue development. Often more important, replacement developers may lack the domain experience necessary for understanding the details of system requirements.
  • No training for managing outsource relationships
Engineers who are responsible for interacting with outsource partners do not have adequate time and skills to do so, especially for multi-cultural partnering. This can lead to significant ineffectiveness or even failure of such relationships.

Saturday, December 18, 2010

International Shipping To Additional Countries and FBA

My publisher has recently added Fulfillment By Amazon (FBA) support for shipping. What this means is that the book is kept in stock at an Amazon warehouse and can be shipped as if it were any other Amazon.com product. This includes overnight shipping and international shipping to many more countries than can be supported via the PayPal fulfillment channel. Amazon Prime shipping rates (where available) and most other Amazon policies apply.  The discount from retail is smaller than on the author web site, primarily because Amazon charges a significant fee for providing this service, but the channel does have advantages for many readers.

You can, of course, choose whichever channel makes sense to you based on total cost, delivery time, and whether you prefer to do business with Amazon.  This link has pointers to both options.

Perhaps most importantly for many readers, the Amazon web site seems to indicate they will ship to India and China. If you have feedback about the Amazon service (both good and bad) please let me know.  In particular, if you are from India or China and find that the service worked for you that would be very helpful to know. Thanks!

Wednesday, December 15, 2010

Embedded Software Risk Areas -- Project Management -- Part 2

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are some of the Project Management red flags (part 2 of 2):
  • Schedule not taken seriously
The software development schedule is externally imposed on an arbitrary basis or otherwise not grounded in reality. As a result, developers may burn out or simply feel they have no stake in following development schedules.
  • Presumption in project management that software is free
Project managers and/or customers (and sometimes developers) make decisions that presume software costs virtually nothing to develop or change. This is one contributing cause of requirements churn.
  • Risk of problems with external tools and components
External tools, software components, and vendors are a critical part of the system development plan, and no strategy is in place to deal with unexpected bugs, personnel turnover, or business failure of partners and vendors.
  • Disaster recovery not tested
Backups and disaster recovery plans may be in place but untested. Data loss can occur if backups are not being done properly. (A minimal restore-drill sketch follows this list.)
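
Here is a minimal sketch of what a periodic restore drill might look like, assuming a tar-style backup archive and a hand-maintained checksum manifest. The backup path, file list, and digests are placeholder assumptions, not a prescription.

```python
# Illustrative sketch of a restore drill: actually unpack the latest backup
# and verify file contents instead of trusting that the backup job ran.
# The backup path, sentinel files, and digests are placeholder assumptions.

import hashlib
import tarfile
import tempfile
from pathlib import Path

BACKUP = Path("/backups/latest.tar.gz")   # assumed backup archive location

# Known-good SHA-256 digests for a few sentinel files; record real values
# when a backup is taken ("0" * 64 is a stand-in placeholder here).
MANIFEST = {
    "projects/firmware/main.c": "0" * 64,
}

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def restore_drill() -> bool:
    """Return True only if the backup restores and sentinel checksums match."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(BACKUP) as tar:
            tar.extractall(scratch)        # the restore step being tested
        for rel_path, expected in MANIFEST.items():
            restored = Path(scratch) / rel_path
            if not restored.is_file() or sha256(restored) != expected:
                return False
    return True

if __name__ == "__main__":
    print("restore drill passed" if restore_drill() else "RESTORE FAILED")
```

The design point is that the drill exercises the restore path end to end; a backup that has never been restored is untested by definition.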

Wednesday, December 8, 2010

Embedded Software Risk Areas -- Project Management -- Part 1

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are some of the Project Management red flags (part 1 of 2):
  • No version control
Sometimes source code is not under version control. More commonly, the source code is under version control but the associated tools, libraries, and other support software components are not. As a result, it may be difficult or impossible to recreate and modify old software versions to fix bugs. (A build-time toolchain check sketch follows this list.)
  • No backward compatibility and version management plan
There is no plan for dealing with backward compatibility with old products, product migration, or installations with a mix of old and new product versions. The result may be incompatibilities with fielded equipment or a combinatorial explosion of multi-component compatibility testing scenarios necessary for system validation.
  • Use of cheap tools (software components, etc.) instead of good ones
Developers have inadequate or substandard tools (for example, free demo compilers instead of paid-for full-featured compilers) because tool costs can’t be reckoned against savings in developer time in the cost accounting system being used. As a result, developers spend significant time creating or modifying tools to avoid spending money on tool procurement.
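
As one way of putting tools under version control, here is a minimal sketch of a build-time check that fails when installed tools drift from a manifest kept in the repository. The tool names, version strings, and manifest layout are assumptions for illustration.

```python
# Illustrative sketch: pin the build toolchain alongside the source by keeping
# expected tool versions under version control and failing the build when the
# installed tools drift. Tool names and versions below are assumptions.

import subprocess
import sys

# In practice this table would live in a file checked into the repository.
EXPECTED = {
    "arm-none-eabi-gcc": "10.3.1",
    "make": "4.3",
}

def installed_version(tool: str) -> str:
    """First line of `tool --version`, which normally names the release."""
    try:
        out = subprocess.run([tool, "--version"],
                             capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return ""
    return out.stdout.splitlines()[0] if out.stdout else ""

def check_toolchain() -> bool:
    ok = True
    for tool, want in EXPECTED.items():
        line = installed_version(tool)
        if want not in line:
            print(f"TOOL MISMATCH: {tool}: expected {want}, got {line!r}")
            ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if check_toolchain() else 1)
```

Run as the first step of the build so that a stale compiler stops the build immediately instead of producing a binary nobody can reproduce later.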
