Better Embedded System SW: December 2010

Wednesday, December 22, 2010

Embedded Software Risk Areas -- People

Series Intro: this is one of a series of posts summarizing the different red flag areas I've encountered in more than a decade of doing design reviews of industry embedded system software projects. You can read more about the study here. If one of these bullets applies to your project, you should consider whether that presents undue risk to project success (whether it does or not depends upon your specific project and goals). The results of this study inspired the chapters in my book.

Here are the People red flags:

High turnover and developer overload

Developers have a high turnover rate. As a result, code quality and style varies. Lack of a robust paper trail makes it difficult to continue development. Often more important is that replacement developers may lack the domain experience necessary for understanding the details of system requirements.

No training for managing outsource relationships

Engineers who are responsible for interacting with outsource partners do not have adequate time and skills to do so, especially for multi-cultural partnering. This can lead to significant ineffectiveness or even failure of such relationships.

Saturday, December 18, 2010

International Shipping To Additional Countries and FBA

My publisher has recently added Fulfillment By Amazon (FBA) support for shipping. What this means is that the book is kept in stock at an Amazon warehouse and can be shipped as if it were any other Amazon.com product. This includes overnight shipping and international shipping to many more countries than can be supported via the Paypal fulfillment channel. Amazon prime shipping rates (where available) and most other Amazon policies apply. The discount from retail is less than on the author web site primarily because Amazon charges a significant fee for providing this service, but it does have advantages for many readers.

You can, of course, choose whichever channel makes sense to you based on total cost, delivery time, and whether you prefer to do business with Amazon. This link has pointers to both options.

Perhaps most importantly for many readers, the Amazon web site seems to indicate they will ship to India and China. If you have feedback about the Amazon service (both good and bad) please let me know. In particular, if you are from India or China and find that the service worked for you that would be very helpful to know. Thanks!

Wednesday, December 15, 2010

Embedded Software Risk Areas -- Project Management -- Part 2

Schedule not taken seriously

The software development schedule is externally imposed on an arbitrary basis or otherwise not grounded in reality. As a result, developers may burn out or simply feel they have no stake in following development schedules.

Presumption in project management that software is free

Project managers and/or customers (and sometimes developers) make decisions that presume software costs virtually nothing to develop or change. This is one contributing cause of requirements churn.

Risk of problems with external tools and components

External tools, software components, and vendors are a critical part of the system development plan, and no strategy is in place to deal with unexpected bugs, personnel turnover, or business failure of partners and vendors.

Disaster recovery not tested

Backups and disaster recovery plans may be in place but untested. Data loss can occur if backups are not being done properly.

Wednesday, December 8, 2010

Embedded Software Risk Areas -- Project Management -- Part 1

No version control

Sometimes source code is not under version control. More commonly, the source code is under version control but associated tools, libraries, and other support software components are not. As a result, it may be difficult or impossible to recreate and modify old software versions to fix bugs.

No backward compatibility and version management plan

There is no plan for dealing with backward compatibility with old products, product migration, or installations with a mix of old and new product versions. The result may be incompatibilities with fielded equipment or a combinatorial explosion of multi-component compatibility testing scenarios necessary for system validation.

Use of cheap tools (software components, etc.) instead of good ones

Developers have inadequate or substandard tools (for example, free demo compilers instead of paid-for full-featured compilers) because tool costs can’t be reckoned against savings in developer time in the cost accounting system being used. As a result, developers spend significant time creating or modifying tools to avoid spending money on tool procurement.

Saturday, December 4, 2010

Jack Ganssle Book Review

Jack Ganssle has published a review of my book. You can see it at this link.

Thanks Jack!

Thursday, December 2, 2010

Embedded Software Risk Areas -- An Industry Study

I've had the opportunity to do many design reviews of real embedded software projects over the past decade or so. About 95 reviews since 1996. For each review I usually had access to the project's source code and design documentation. And in most cases I got to spend a day with the designers in person. The point of the reviews was usually identifying risk areas that they should address before the product went into production. Sometimes the reviews were post mortems -- trying to find out what caused a project failure so it could be fixed. And sometimes the reviews were more narrow (for example, just look at security or safety issues for a project). But in most cases I (sometimes with a co-reviewer) found one or more "red flag" issues that really needed to be addressed.

In other postings I'll summarize the red flag issues I found from all those reviews. Perhaps surprisingly, even engineers with no formal training in embedded systems tend to get the basics right. The books that are out there are good enough for a well trained non-computer engineer to pick up what they need to get basic functionality right most of the time. Where they have problems are in the areas of complex system integration (for example, real time scheduling) and software process. I'm a hard-core lone cowboy techie at heart, and process is something I've learned gradually over the years as that sort of thing proved to be a problem for projects I observed. Well, based on a decade of design reviews I'm here to tell you that process and a solid design methodology matters. A lot. Even for small projects and small teams. Even for individuals. Details to follow in upcoming posts.

I'm giving a keynote talk at an embedded system education workshop at the end of October. But for non-academics, you'd probably just like me to summarize what I found:

(The green bar means it is things most embedded system experts think are the usual problems -- they were still red flags!) In other words, of all the red flag issues identified in these reviews, only about 1/3 were technical issues. The rest were either pieces missing from the software process or things that weren't being written down well enough.

Before you dismiss me as just another process guy who wants you to generate lots of useless paper, consider these points:

I absolutely hate useless paper. Seriously! I'm a techie, not a process guy. So I got dragged kicking and screaming to the conclusion that more process and more paper help (sometimes).

These were not audits or best practice reviews. What I mean by that is we did NOT say "the rule book says you have to have paper of type X, and it is missing, so you get a red flag." What we DID say, probably most of the time, was "this paper is missing, but it's not going to kill you -- so not a red flag." Only once in a while was a process or paperwork problem such high risk that it got a red flag.

Most reviews had only a handful of red flags, and every one was a time bomb waiting to explode. Most of the time bombs were in process and paperwork, not technology.

In other postings (click here to see the list as it grows) I'll summarize the 43 areas that had red flags (yes, including the technology red flags). They'll be organized, loosely, by phase in the design cycle. No doubt you have seen some project-killer risks that aren't on the lists, and I welcome brief war story postings about them.

Wednesday, December 1, 2010

Agile for Embedded Software and SQA (updated)

I'll be the first to admit that I'm not an eager adopter of Agile Methods, but I believe that there are some good ideas in the agile methods arena (and good ideas elsewhere as well). Based on some recent experiences I believe that agile can produce good long-lived embedded software ... but that it requires some things that aren't standard agile practices to get there.

Some folks use the "agile" banner as an excuse to have an ad hoc process. That's not going to work out so well for most embedded software. So we'll just move past that and assume that we're talking about an agile methods approach that uses a well defined process. (I didn't say "tons of paper" -- I said "well defined." That definition could be a single process flow chart on a whiteboard, some powerpoint slides, or the name of a book that is being followed.)

One of my major concerns with Agile as I have seen it practiced (and described by advocates) is that it is too light in Software Quality Assurance (SQA). This is by design -- as I understand it a primary point of Agile is to trust developers to do the right things, and audits aren't part of a typical trust picture. I've seen Agile do well, and I've seen Agile do not so well. But most interestingly, from what I've seen it is very difficult to tell if a gung-ho, competent Agile team that is saying they are doing all the right things is actually doing well or not in terms of resultant code quality. In other words, an agile team that says (and genuinely thinks) they are doing a good job might be producing great code or crummy code. And if nobody is there to find out which is happening, the team won't know the difference until it is much too late to fix things.

SQA is designed to make sure you're following the process you think you are following (whether agile or otherwise). Without an external check and balance it is all too easy to have a process failure. The failure could be skipped steps. It could be following the process too superficially, leading to ineffective efforts. It could be honest misunderstanding of the intent of the process rather than the letter of the process law. But in the end, I believe that you are taking a huge chance with any software process if you have no external check and balance

This isn't an issue of integrity, capability, or any other character flaw in developers. The simple fact is that developers and their management are paid to get the product done. While they can try their best to stay objective about the quality of their process, that isn't their primary job. The actual code is what they concentrate on, not the process. But everyone needs an external check and balance. And without one, upper management has no objective way to judge whether software development processes are working or broken (at least until some disaster happens, but then it's too late).

If I'm evaluating an Agile Methods group I ask the following questions: (1) Show me the written process you are following. (It can be an agile book; it can be a list of steps on a piece of paper; it can be just the Agile Manifesto. It can be anything.) (2) Explain to me how an external person can tell whether you are actually following that process. (3) Explain to me how an external person can tell that the process is producing the quality of software you want. I am flexible in terms of answers that are satisfactory, and none of these things preclude you from using an Agile Methods approach. But questions (2) and (3) typically aren't part of defined Agile Method approaches.

You'd be surprised how hard it is to answer those questions for many agile projects with an answer that amounts to something other than "trust us, we're professionals." I'm a professional too, but I'm not perfect. Nobody is. And I have learned in many different scenarios that everyone needs an external mechanism to make sure they are on track. Sometimes the answer is more or less "Agile always works." It doesn't just automatically work. Nothing does.

Sometimes an agile team works really well because of a single individual leader who makes it work. That's great, and it can be very effective. But in a year, or five, or ten, when that leader is promoted, wins the lottery, retires, or otherwise leaves the scene, you are at very high risk of having the team slip into a situation that isn't so good. So while getting a good leader to get things rolling however it can be done is great, a truly excellent leader will also plan for the day when (s)he isn't there. And that means establishing both a defined process and a quality check-and-balance on that process.

In my mind the above three questions capture the essence of SQA in a practical sense for most embedded system software projects. If you are happy with your answers for these questions, then Agile might work well for you. But if you simply blow off those topics and skimp on SQA, you are increasing your risk. And generally it's the kind of risk that eventually bites you.

Embedded Software Risk Areas -- Dependability -- Part 2

Insufficient consideration of system reset approach

System resets might not ensure a safe state during reboots that occur when the system is already in operation, resulting in unsafe transient actuator commands.

Neither run-time fault instrumentation nor error logs

There is no run-time instrumentation to record anomalous operating conditions, nor are there error logs to record events such as software crashes. This makes it difficult to diagnose problems in devices returned from customers.

No software update plan

There is no plan for distributing patches or software updates, especially for systems which do not have continuous Internet access. This can be an especially significant problem if the security strategy ends up requiring regular patch deployment. Updating software may require technician visits, equipment replacement, or other expensive and inconvenient measures.

No IP protection plan

There is no plan to protect the intellectual property of the product from code extraction, reverse engineering, or hardware/software cloning. (Protection strategies can be legal as well as technical.) As a result, competitors may find it excessively easy to successfully extract and sell products with exact software images or extracted proprietary software technology.

Better Embedded System SW