The Risks in Root Cause Analysis

A good design of experiments can prevent incorrect conclusions

What’s wrong with root cause analysis? Let’s begin with the name, which is singular. It implies that there is only one root cause, when in reality most problems are caused by a complex combination of several factors, some of which are more significant than others.

To appreciate this point, readers should reflect on the results of formal design of experiments (DOE). So let’s take a closer look at root cause analysis and how it might be improved.
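To make the DOE point concrete, here is a minimal Python sketch of a two-level full factorial design. The `response` function and its coefficients are hypothetical, invented only for illustration: factor C has no effect on its own and matters only through its interaction with A. Each effect is estimated as the mean response at the factor's high level minus the mean at its low level, the usual contrast for a two-level design.

```python
from itertools import product
from math import prod

FACTORS = ("A", "B", "C")

# Hypothetical process model in coded units (-1 = low, +1 = high), for illustration only.
# A and B act on their own; C matters only through its interaction with A.
def response(a, b, c):
    return 10 + 3 * a + 1 * b + 2 * a * c

def full_factorial(k):
    """All 2**k runs of a two-level full factorial design."""
    return list(product((-1, 1), repeat=k))

def effects(runs, ys):
    """Estimate every main effect and interaction as mean(y at +1) - mean(y at -1)."""
    k = len(runs[0])
    result = {}
    for mask in range(1, 2 ** k):
        cols = [i for i in range(k) if mask >> i & 1]  # factors in this term
        name = "".join(FACTORS[i] for i in cols)
        signs = [prod(run[i] for i in cols) for run in runs]  # contrast column
        hi = [y for s, y in zip(signs, ys) if s > 0]
        lo = [y for s, y in zip(signs, ys) if s < 0]
        result[name] = sum(hi) / len(hi) - sum(lo) / len(lo)
    return result

runs = full_factorial(3)
ys = [response(*run) for run in runs]
estimates = effects(runs, ys)
for name, value in estimates.items():
    print(f"{name:>3}: {value:+.1f}")
```

Running it reports effects of +6.0 for A, +2.0 for B and +4.0 for the AC interaction, with every other effect at zero. Three separate terms share responsibility for the response, and a view that looks only at main effects would credit A and B while overlooking C entirely.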

…

Comments

QTech goal!

I strongly recommend reading Aldous Huxley's 1941 book "Grey Eminence" (Vintage UK Random House edition, ISBN 97800994477822); pages 14 to 24 analyze the risks of oversimplification in depth. I wish you a good read.

No such thing as a root cause

There are two major problems with "root cause analysis". Both are about working out what the problem really is.

First is understanding the difference between cause and effect; in reality there is no difference, as Dean Gano's Apollo Root Cause Analysis approach makes clear. Timing and sequence are the issues to get a handle on.

Second is what I call "Solutions First Syndrome" (SFS), where "common sense" indicates a "solution" and the "evidence" is then bent to fit it. This is my main objection to the Ishikawa diagram when it is used against standard cause areas. SFS is a human behaviour that happens so often it's usually missed completely; Dr Deming called it tampering. "Common sense", of course, is a complete snare and delusion as a concept (see Dean Gano again): since all our life experiences are different, just what is "common sense"?

Unintended occurrences, bad or good, arise because there is at least one "hole" in the "planned arrangements" (to use ISO 9000-speak) and one action that exploited it. In his Swiss Cheese Model of organisational accidents, Prof James Reason called the first "latent conditions" (LC) and the second "unsafe acts" (UA). Since you need one of each for an occurrence, the minimum number of "root causes" is two.

Once you've understood that cause and effect are the same, you can see that each LC and UA has an LC and UA of its own, so the relationship is binary. Exploring this in a team environment gives very clear problem definitions very quickly, in true KISS fashion. It's rarely necessary to go more than three or four levels down.
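The binary LC/UA structure described above can be sketched as a small tree. This is an illustrative sketch, not Reason's or Gano's own method; the `Cause` class and the `expand` and `causes_per_level` helpers are hypothetical names. Each node gets one latent condition and one unsafe act as children, so the number of candidate causes doubles with every level.

```python
from dataclasses import dataclass, field

@dataclass
class Cause:
    kind: str                                   # "event", "LC" (latent condition) or "UA" (unsafe act)
    label: str
    children: list = field(default_factory=list)  # child Cause nodes

def expand(node, depth):
    """Hypothetical helper: give every cause its own LC and UA, down to `depth` levels."""
    if depth == 0:
        return node
    for kind in ("LC", "UA"):
        child = Cause(kind, f"{node.label}.{kind}")
        node.children.append(expand(child, depth - 1))
    return node

def causes_per_level(root):
    """Count the causes found at each level below the occurrence."""
    counts, level = [], root.children
    while level:
        counts.append(len(level))
        level = [c for node in level for c in node.children]
    return counts

root = expand(Cause("event", "occurrence"), depth=3)
print(causes_per_level(root))
```

This prints `[2, 4, 8]`: three levels already produce 14 causes to examine, and a fourth level would bring the total to 30.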

Hope this helps.

cause to effect or effect to cause?

I agree that the 'popular' or 'common' use of the fishbone diagram is a very ineffective approach to determining the causal mechanism, primarily because it breaks the true cause-effect links and reduces the user to guessing at, or advocating for, specific single factors (this is best referred to as a random walk). This method is a "cause to effect" approach: it relies on selecting a factor, or several factors, and then testing to prove that it (or they) creates the undesirable effect. Too often it results in either one-factor-at-a-time experiments that 'hold the other factors constant' or a very large fractional factorial.
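The weakness of one-factor-at-a-time (OFAT) experiments shows up clearly when an effect needs two factors together. The sketch below assumes a hypothetical defect that appears only when factors A and B are both at their high level; starting from a low/low baseline, OFAT sees no change from either factor, while a 2x2 full factorial finds the defect in four runs.

```python
from itertools import product

# Hypothetical defect model: the defect appears only when A and B are both high (+1).
def defect(a, b):
    return 1 if (a == 1 and b == 1) else 0

baseline = (-1, -1)

# OFAT: change each factor alone while "holding the other constant" at baseline.
ofat = {
    "A": defect(1, baseline[1]) - defect(*baseline),
    "B": defect(baseline[0], 1) - defect(*baseline),
}

# 2x2 full factorial: run every combination of the two factors.
factorial = {(a, b): defect(a, b) for a, b in product((-1, 1), repeat=2)}

print("OFAT changes:", ofat)
print("Factorial runs:", factorial)
```

Both OFAT changes come out as zero, so neither factor looks guilty, yet the factorial shows the defect at (1, 1). Holding the other factor at its low level is exactly what hides the effect; a real study would also need replication and noise handling that this toy model omits.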

A much more effective approach is one that moves from effect back to cause. It also uses, as you point out, a series of iterative small experiments that disprove large groups of factors until the final causal mechanism reveals itself. The pity is that this approach is rarely taught or written about in our literature, let alone practiced.
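The idea of disproving large groups of factors can be sketched as adaptive group splitting: test a whole group, discard it in one run if the effect does not follow it, and otherwise split it in half and test each half. This is a generic technique chosen for illustration, not necessarily the commenter's exact method. The `shows_effect` function is a hypothetical stand-in for a small experiment, and the sketch assumes each active factor produces the effect on its own; factors that act only in combination would need a different design.

```python
def find_active(factors, shows_effect):
    """Return (active factors, number of experiments run) by recursive group splitting."""
    if not shows_effect(factors):
        return [], 1                      # whole group disproved in a single run
    if len(factors) == 1:
        return list(factors), 1           # a single factor that reproduces the effect
    mid = len(factors) // 2
    left, n_left = find_active(factors[:mid], shows_effect)
    right, n_right = find_active(factors[mid:], shows_effect)
    return left + right, 1 + n_left + n_right

# Hypothetical setup: 64 candidate factors, of which F3 and F40 actually drive the effect.
factors = [f"F{i}" for i in range(1, 65)]
ACTIVE = {"F3", "F40"}

def shows_effect(group):
    """Stand-in for a small experiment telling whether the effect follows this group."""
    return any(f in ACTIVE for f in group)

found, runs = find_active(factors, shows_effect)
print(found, runs)
```

With these 64 hypothetical candidates, the search identifies F3 and F40 in 23 runs rather than 64 one-at-a-time tests, because most runs rule out an entire group at once.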

DFSS

Nice article. In a broad sense, is doing DOE as part of a DFSS approach best, so that we make evidence-based decisions?

Root Cause

I have been saying for years that it's quite impossible to blame the failure of a multimodal activity (a process) on a single event. Time and time again I see people attempt to use root cause as the only tool for solving problems, and in almost every instance not related to machines, it fails. When I said in a recent meeting with managers that root cause was the wrong problem-solving tool for processes, they looked at me as if I were from another planet.

I went on to explain that processes have multiple players and multiple parties of influence, and that it was nearly impossible for any one of those to become the single cause of failure. I stated that process failures are more likely the result of a set of participants and influences creating common causes of variation at a single point in time (the whole is the sum of its parts), not a single root cause. This idea was completely foreign to them, and most of them, despite their college degrees, rejected it outright. Then I presented the Challenger case, a study of the Challenger failure. While it's been stated that the seal was the "root cause" of the failure, the evidence simply does not support that hypothesis.

One would reasonably have to consider that if multiple, diverse actions are occurring upon a process, each of those actions has an equal chance of contributing to the failure of that process. It is therefore more likely that process failure results from the concerted actions of multiple participants and influences, not from any single event. Just as in the Challenger failure, the issue was not simply the failure of a seal. The design of the engines, the design of the fuel tanks, management decisions, time, expense, weather, validation testing and political pressure all became contributing factors in the failure of the Challenger. To say that the seal was the single root cause of the failure does an extreme disservice to those Americans who lost their lives in that tragic set of circumstances.

While the Challenger is a very touchy subject for many, it's just the type of subject to dispel the silly and dangerous notion that all process failures are the result of a single root cause. After nearly three decades of working under this notion, and some digging into its origin, I think the failure was one of misunderstanding which tools to apply, and when, during problem solving.

Look at Ishikawa and Deming; both knew well that there were many contributing factors to any process and that all of these factors needed consideration. The problem came about when a focusing tool such as the 5 Whys, which was supposed to bring focus to each of the factors being considered under the various branches of the Ishikawa diagram, was instead used to resolve the problem in its entirety. Common causes were ignored, and the hunt for the root cause became the only focus. Root cause thus became the problem-solving tool, and the contributing factors were cast aside, allowing those causes to be ignored and the failure to recur. I see the same result with risk analysis when it's considered the preventive measure that resolves all potential failures, which is sloppy thinking. Yet even risk analysis indicates that multiple causes of failure can potentially occur within any one process at any given time. Risk analysis therefore stands against the notion of a single root cause of process failure.

Common sense should prevail in problem solving, and the multiple influencing and contributing factors of a process should be considered when problems need resolution. Further, the potential of each influence and each contributing factor to contribute to a failure needs to be identified and considered; only then can an effective resolution be achieved. In short, businesses need to learn when to apply the proper problem-solving tools.

Ishikawa presents us with a tool to consider the influences and inputs into a process related to the problem.

Deming presents us with the notion of common and special causes of variation.

Toyota provides us with the focus of the 5 Whys.

It's only when these tools are used together, in the correct concert, that the identification of problems becomes effective.