Fred Schenkelberg

Risk Management

Failure Analysis

What have you really learned?

Published: Thursday, November 5, 2015 - 16:21

Why do so many avoid confronting the reality of failure? In plant asset management, we are surrounded by people who steadfastly want neither to know about failures nor to talk about them. Yet failure does happen; let’s not ignore this simple fact.

The blame game

Unlike a murder mystery, failure analysis (FA) is not a game of whodunnit. The knee-jerk response to blame someone rarely solves the problem, nor does it create reliability in the workplace.

If the routine is to blame someone when a failure is revealed, fewer people will reveal failures. If it’s clear we don’t want to talk about failures in a civilized manner, well, we’ll just not talk about failures.

Of course, failures will still occur. In the blame-centric organization, the majority of people who have the ability to understand and solve problems simply turn and avoid “seeing” failures. When friends and colleagues are vilified in their attempt to “solve problems,” it becomes clear that this is not a safe environment in which to point out failures.

Root cause analysis

Root cause analysis is the core of the FA process, and it’s critical to get it right. The basic idea is to understand the circumstances and events leading to failure at the fundamental (i.e., molecular, physical, chemical, and/or material property) level. We should be able to reproduce the issue at will, and also be able to turn it off or avoid the failure at will. In this way, we come to understand the root cause.
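One way to make the "reproduce it and turn it off at will" test concrete is to record controlled trials with the suspected cause present and absent. The following Python sketch is only an illustration under stated assumptions: the `Trial` record, the `is_root_cause_confirmed` function, and the trial data are all hypothetical names and values invented here, not part of any standard FA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One controlled run: was the suspected cause present, and did the unit fail?"""
    cause_present: bool
    failed: bool


def is_root_cause_confirmed(trials: list[Trial]) -> bool:
    """Return True only if failure tracks the suspected cause in every trial.

    At least one trial with the cause present and one with it absent are
    required, so the failure has been both reproduced and turned off.
    """
    with_cause = [t for t in trials if t.cause_present]
    without_cause = [t for t in trials if not t.cause_present]
    if not with_cause or not without_cause:
        return False  # no demonstrated control over the failure yet
    reproduced = all(t.failed for t in with_cause)
    avoided = not any(t.failed for t in without_cause)
    return reproduced and avoided


# Made-up trial data, for illustration only.
trials = [
    Trial(cause_present=True, failed=True),
    Trial(cause_present=True, failed=True),
    Trial(cause_present=False, failed=False),
]
print(is_root_cause_confirmed(trials))
```

A single failure that still occurs with the suspected cause removed is enough to send the team back to the analysis, which is the point of the test.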

Techniques like 5 Whys provide a framework to ensure that we understand the cause of failure. Equipment from magnifying lenses to scanning electron microscopes helps us “see” the physical and chemical clues.
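As a rough illustration of how a 5 Whys chain hangs together, the Python sketch below links each answer to the next question, so the last answer is the candidate root cause. The `five_whys` helper and the pump example are hypothetical, made up for this sketch; a real analysis would rest on physical evidence rather than a list of strings.

```python
def five_whys(problem: str, answers: list[str]) -> list[tuple[str, str]]:
    """Build (effect, cause) pairs, where each cause becomes the next effect."""
    chain = []
    effect = problem
    for cause in answers:
        chain.append((effect, cause))
        effect = cause  # the answer to one "why" is the subject of the next
    return chain


# Invented example, for illustration only.
problem = "The pump stopped"
answers = [
    "The bearing seized",
    "The bearing ran without lubricant",
    "The lubrication line was blocked",
    "The line filter was never replaced",
    "Filter replacement is not on the maintenance schedule",
]

chain = five_whys(problem, answers)
for depth, (effect, cause) in enumerate(chain, start=1):
    print(f"Why {depth}: {effect}? Because: {cause}")

candidate_root_cause = chain[-1][1]
```

Note that the final answer in this invented chain points at a process gap rather than a person, which fits the no-blame approach described earlier.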

The failure analysis process

The eight disciplines (8D) is a common FA process. There are many variations, yet the pattern tends to remain the same:

First, upon initial recognition of a failure, you must gather information, symptoms, and circumstances. If needed, you must implement an emergency response (e.g., extinguish a fire, apply first aid, or contain the chemical spill).

Second, form a team. This can be composed of a couple of people or a formal multidiscipline team depending on the magnitude of the failure and associated consequences.

Third, describe the problem in terms of what is and is not known. The more details and facts here, the better.

Fourth, contain the problem. Isolate the batch, stop shipments of suspect products, etc. Limit the occurrence of additional failures if at all possible. If there is an immediate workaround or patch, use that to mitigate and avoid failures. These aren’t the solutions, just stopgap actions.

Fifth, undertake root cause analysis. This is the sleuthing part, and it has nothing to do with assigning blame; rather, you want to determine what actually happened at a fundamental level. One piece of advice: Don’t send suspect components to suppliers or vendors for FA work. It takes too long and rarely results in a meaningful root cause analysis. Instead, use internal or contracted FA labs. It may cost more to get the analysis, but it will be quicker and clearer.

Sixth, implement a corrective action, but only once armed with the accurate fundamental understanding of the root cause. This may include a design, material, or process change.

Seventh, test the solution and verify that it actually works. Monitor as long as necessary to validate that the solution provides a fundamental resolution.

Finally, based on what the team learned, determine what the organization should learn to avoid similar issues in the future. This is often the most difficult step. Step back from the immediate problem and review the processes in design and production that created the situation in which the failure occurred. This is not the step to add more controls and checks; rather, it’s the time to assess the process and improve your ability to make better decisions in the future. For example, if the root cause for a material defect is the use of an unstable additive, then simply concluding that there is a need to add that additive to a “do not use” list is shortsighted. Instead, determine what part of the process should have revealed the faulty material choice, and ask why the stability question wasn’t asked earlier in the process. Perhaps it was a lack of resources, or the team’s focus on time to market. Maybe a system structure blinded the ability to identify the issue earlier.
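The eight steps above can be sketched as a simple ordered checklist. This Python sketch is a hypothetical illustration, not a standard 8D tool: the step names follow this article's wording rather than any official numbering, the `Step` and `FailureInvestigation` names are invented here, and enforcing a strict order is a simplifying assumption. Its one purpose is to show a corrective action being refused until root cause analysis is done.

```python
from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    """The eight steps, named after this article's description."""
    GATHER_INFORMATION = 1
    FORM_TEAM = 2
    DESCRIBE_PROBLEM = 3
    CONTAIN_PROBLEM = 4
    ROOT_CAUSE_ANALYSIS = 5
    CORRECTIVE_ACTION = 6
    VERIFY_SOLUTION = 7
    ORGANIZATIONAL_LEARNING = 8


class FailureInvestigation:
    """Tracks which steps are done and refuses to skip ahead."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.completed: list[Step] = []

    def next_step(self) -> Optional[Step]:
        """Return the next step due, or None when all eight are complete."""
        if len(self.completed) == len(Step):
            return None
        return Step(len(self.completed) + 1)

    def complete(self, step: Step) -> None:
        """Mark a step complete, but only if it is the next one due."""
        expected = self.next_step()
        if expected is None:
            raise ValueError("All steps are already complete")
        if step is not expected:
            raise ValueError(
                f"Cannot complete {step.name} before {expected.name}"
            )
        self.completed.append(step)


# Invented example: containment is done, but the root cause is still unknown.
investigation = FailureInvestigation("Cracked housing on line 3")
for step in (Step.GATHER_INFORMATION, Step.FORM_TEAM,
             Step.DESCRIBE_PROBLEM, Step.CONTAIN_PROBLEM):
    investigation.complete(step)

try:
    investigation.complete(Step.CORRECTIVE_ACTION)
except ValueError as error:
    print(error)
```

A real investigation may revisit earlier steps as new facts emerge, so a practical tracker would probably need to allow reopening a step instead of only moving forward.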

Learn from the failure: not just how to resolve the immediate issue, but how to avoid making similar mistakes in the future.


Every organization has stories about failures, especially organizations that don’t talk about failures. Failures happen, and when they do we can use the information gleaned from them to learn and improve.

So, what are your failure stories? Share one in the comments or send me a note directly.


About The Author

Fred Schenkelberg

Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. Schenkelberg is developing the site Accendo Reliability, which provides you access to materials that focus on improving your ability to be an effective and influential reliability professional.