John Flaig

Six Sigma

The Risks in Root Cause Analysis

A good design of experiments can prevent incorrect conclusions

Published: Thursday, April 4, 2013 - 12:14


What’s wrong with root cause analysis? Let’s begin with the name, which is singular. It implies that there is only one root cause, when in reality most problems are usually caused by a complex combination of several factors, some of which are more significant than others.

To appreciate this point, readers should reflect on the results of formal design of experiments (DOE). So let’s take a closer look at root cause analysis and how it might be improved.

As engineers we are constantly faced with problems that must be solved. Root cause analysis (RCA) is a popular tool in the engineer’s problem-solving tool kit. However, the RCA effort sometimes does not result in complete success, so it is important to understand how things can go wrong. The first source of risk is the problem-solvers themselves. If they choose to attack the problem alone, they reduce the probability of reaching a robust solution in a timely manner. The smart analyst will opt for having a cross-functional team help him explore the possible causes of, and potential solutions to, the problem.

The additional brainpower will generally help, but if there is no structured approach for investigating the problem, then again the probability of failure increases. To address this issue, the sharp problem-solver will usually have the team brainstorm the problem, its causes, and possible solutions (brainstorming is used by about 90 percent of all problem-solving teams). This approach improves the chances of success, but there are several pitfalls in brainstorming that can severely impair its effectiveness.

The hidden deficiencies of the brainstorming approach are:
• It promotes linear thinking.
• It promotes groupthink.
• It may lead to a cause but not the causes.
• It can lead to a minor cause and miss a major one.
• It offers only a limited recognition of multivariate causality.
• It does not recognize nonlinear causes or quantify their effect.
• It does not recognize causal interactions or quantify their effect.
• In fact, it does not quantify the magnitude of the effect of any causal factor.

Dealing with some of these issues requires a structured approach to brainstorming, one that guides the problem-solving team through the various categories of potential causal factors so that the team does not overlook potentially important causes. The tool that is often used is the cause and effect diagram (aka a fishbone or Ishikawa diagram; see figure 1). For manufacturing problems the typical causal categories are manpower, machines, materials, methods, measurements, and the environment, also referred to as the 5Ms and 1E. (There are other methodological and psychological tools to increase the efficiency of the brainstorming activity, but we won’t discuss them in this short article.) Using this structured approach to the investigation helps to mitigate some of the issues listed above, but it does not remove all of them.

Figure 1. The cause and effect diagram.

The next trap the typical problem-solving team members fall into is thinking they know the so-called “root causes” based on their subjective judgment. Looking at a fishbone diagram like the example in figure 1, they declare that X2 and X3 are “root causes.” In one sense they are root causes because they are the ends of a root, but that is just a picture. The real issue is their effect on the response variable Y. The hidden, but invalid, assumption is that there is a perfect correlation between the “root cause” Xs and the response Y. Expressed mathematically this is r(X2, Y) = 1 and r(X3, Y) = 1.

The problem is that the deeper the “root cause” is in the tree diagram, generally the lower the correlation between the so-called “root cause” and the actual effect (Y). Further, it can be shown that the longer the causal chain, generally the smaller the effect on the response variable Y, and the greater the modeling error (i.e., the poorer the predictability of the model). So, in fact, a “root cause” may have only a minor effect on the response Y, and an inconsistent one at that. Unfortunately, improvement teams frequently fall into this trap. They make the prescribed adjustment to the “root cause” variable, only to discover that the process improvement is negligible.
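One simple way to see why a longer causal chain weakens the link to Y is the sketch below. It rests on an assumption that is not in the article: the chain is linear, every variable is standardized, and each link adds its own independent noise. Under that assumption the correlation between the deepest cause and Y is the product of the link correlations. The function names and the 0.8 link correlation are hypothetical, chosen only for illustration.

```python
import math
import random


def chain_correlation(link_correlations):
    """Correlation between the deepest cause and Y for a linear chain.

    Assumes independent noise at every link, so the end-to-end
    correlation is the product of the individual link correlations.
    """
    result = 1.0
    for rho in link_correlations:
        result *= rho
    return result


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_chain(link_correlations, n=20000, seed=1):
    """Simulate the chain and return the observed corr(deepest cause, Y)."""
    rng = random.Random(seed)
    causes, responses = [], []
    for _ in range(n):
        cause = rng.gauss(0.0, 1.0)
        value = cause
        # Each link passes on a fraction rho of the signal and adds noise,
        # keeping the variance of every variable in the chain equal to 1.
        for rho in link_correlations:
            noise = rng.gauss(0.0, 1.0)
            value = rho * value + math.sqrt(1.0 - rho * rho) * noise
        causes.append(cause)
        responses.append(value)
    return pearson(causes, responses)


# Hypothetical chain: three links, each with correlation 0.8.
links = [0.8, 0.8, 0.8]
print(chain_correlation(links))   # about 0.512
print(simulate_chain(links))      # simulated estimate of the same quantity
```

With three links that each look strong on their own (0.8), the deepest cause ends up correlating only about 0.51 with Y, which corresponds to roughly a quarter of the variance in Y. This is far from the r = 1 that the fishbone picture tacitly suggests.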

So how can the practitioner improve her chances of finding a viable solution to a given problem? The best approach is a properly designed, sequential set of experiments. If a solution exists, then good DOE offers the best chance for understanding the possibly complex causal relationships while addressing many of the brainstorming deficiencies listed above.
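As a minimal illustration of what a designed experiment quantifies that brainstorming cannot, the sketch below estimates the main effects and the interaction from a two-factor, two-level full factorial. The factor names A and B, the function name, and the response values are all hypothetical, not data from the article. Each effect is the usual contrast: the average response at the high level minus the average response at the low level.

```python
from itertools import product


def factorial_effects(responses):
    """Estimate effects from a 2x2 full factorial.

    `responses` maps coded factor settings (a, b), each -1 or +1,
    to the observed response at that setting.
    """
    runs = list(product((-1, 1), repeat=2))
    half = len(runs) / 2  # number of runs at each level of a contrast
    effect_a = sum(a * responses[(a, b)] for a, b in runs) / half
    effect_b = sum(b * responses[(a, b)] for a, b in runs) / half
    effect_ab = sum(a * b * responses[(a, b)] for a, b in runs) / half
    return {"A": effect_a, "B": effect_b, "AB": effect_ab}


# Hypothetical responses for the four runs (A, B in coded units).
y = {
    (-1, -1): 20.0,
    (1, -1): 22.0,
    (-1, 1): 21.0,
    (1, 1): 35.0,
}

effects = factorial_effects(y)
print(effects)  # {'A': 8.0, 'B': 7.0, 'AB': 6.0}
```

In this made-up data set, raising A changes the response by only 2 units when B is low but by 14 units when B is high. A one-factor-at-a-time test run at low B would dismiss A as a minor cause, whereas the factorial exposes both the average effect of A and the interaction AB.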



About The Author


John Flaig

John J. Flaig, Ph.D., is a fellow of the American Society for Quality and is managing director of Applied Technology at www.e-at-usa.com, a training and consulting company. Flaig has given lectures and seminars in Europe, Asia, and throughout the United States. His special interests are in statistical process control, process capability analysis, supplier management, design of experiments, and process optimization. He was formerly a member of the Editorial Board of Quality Engineering, a journal of the ASQ, and associate editor of Quality Technology and Quantitative Management, a journal of the International Chinese Association of Quantitative Management.

Comments

Root Cause

I have been saying for years that it's quite impossible to blame a single event for a failure in a multiple modal activity (Process). I see people time and time again attempt to use root cause as the only tool for solving problems, and almost every instance that is not related to machines fails. When I made the statement in a recent meeting with managers that root cause was wrong as the problem-solving tool for processes, they looked at me as if I were from another planet.

I went on to explain that processes have multiple players and multiple parties of influence and that it was near impossible for any one of those to become the single cause of failure. I stated that it's more likely process failures are the result of a set of participants and influences creating common causes of variation at a single point in time (the whole is the sum of its parts), and not a single root cause. This idea was completely foreign to them, and most of them, with college degrees, rejected the idea outright. Then I produced the Challenger event, which is a study of the Challenger failure. While it's been stated that the seal was the "Root Cause" of the failure, the evidence simply does not support that hypothesis.

One would have to reasonably consider that if there are multiple and diverse actions occurring upon a process, then each of those actions has an equal chance of contributing to the failure of that process. Therefore, it's more likely that process failure is the result of a concert of the actions of multiple participants and influences, and not any one single event. Just as in the Challenger failure, the issue was not simply the failure of a seal. Design of the engines, design of the fuel tanks, management decisions, time, expense, weather, validation testing, and political pressure all became contributing factors in the failure of the Challenger. To say that it was the seal that was the single root cause of the failure is to do an extreme disservice to those Americans who lost their lives in that tragic set of circumstances.

While the Challenger is a very touchy subject for many, it's just the type of subject to dissolve this silly and dangerous notion that all process failures are the result of a single root cause. After nearly 3 decades of working under this notion, and some digging into its origin, I think the failure was one of misunderstanding when to apply which tools during problem solving.

Look at Ishikawa and Deming; both knew well that there were many contributing factors to any process and that all of these factors needed consideration. The problem came about when a focus tool such as 5 whys, which was supposed to bring focus to each of the factors being considered under the various branches of the Ishikawa diagram, was instead used to resolve the problem in its entirety. Common causes became ignored and the hunt for the root cause became the only focus. Therefore root cause became the problem-solving tool and contributing factors to the problem were cast aside, allowing the causes to become ignored and the failure to reoccur. I see the same result with risk analysis when it's considered the preventive measure to resolve all potential failures, which is soddish thinking. But even risk analysis indicates that there is the potential for multiple causes of failure within any one process occurring at any one given time. Risk analysis therefore stands against the notion of Root Cause related to process failure.

Common sense should prevail in problem solving, and the multiple influencing and contributing factors of a process should be considered when problems need resolution. Further, each influence's and each contributing factor's potential to contribute to a failure needs to be identified and considered; only then can effective resolution come to fulfillment. In short, businesses need to learn when to apply the proper tools of problem solving.

Ishikawa presents us with a tool to consider influence and input into a process related to the problem.

Deming presents us with the notion of common and special causes of variation.

Toyota provides us with the focus of 5 whys.

It's only when these tools are used in correct concert that the identification of problems becomes effective.

 

DFSS

Nice article. In a broad sense, doing DOE as part of a DFSS approach is best, so that we make evidence-based decisions. Is that right?

cause to effect or effect to cause?

I agree that the 'popular' or 'common' use of the fishbone diagram is a very ineffective approach to determining the causal mechanism, primarily because it breaks the true cause-effect links and reduces the user to guessing or advocating for specific single factors. (This is best referred to as a random walk.) This method is a "cause to effect" approach. It relies on selecting a factor - or several factors - and then testing to prove that it - or they - creates the (undesirable) effect. Too often it results in either one-factor-at-a-time experiments that 'hold the other factors constant' OR a very large fractional factorial.

A much more effective approach is one that moves from effect back to cause.  It also utilizes - as you point out - a series of iterative small experiments that disprove large groups of factors until the final causal mechanism reveals itself.  The pity is that this approach is rarely taught or written about in our literature, let alone practiced. 

QTech goal !

I strongly recommend reading Aldous Huxley's 1941 "Grey Eminence": in the Vintage UK Random House edition, ISBN 97800994477822, the risks of over-simplification are deeply analyzed on pages 14 to 24. I wish you a good reading.

No such thing as a root cause

Two major problems with "root cause analysis".  They are about working out what the problem really is...

First is understanding the difference between cause and effect; there is no difference in reality, as Dean Gano's Apollo Root Cause Analysis approach makes clear. Timing and sequence are the issues to get a handle on.

Second is what I call "Solutions First Syndrome" (SFS) where "common sense" indicates a "solution" based on "evidence" that is bent to fit.  This is my main objection to the Ishikawa Diagram used against standard cause areas.   SFS is a human behaviour that happens so often it's usually completely missed. Dr Deming called it tampering.  "Common sense", of course, as a concept is a complete snare and delusion...see Dean Gano again, but since all our life experiences are different just what is "common sense"?

Unintended occurrences, bad or good, arise because there is at least one "hole" in the "planned arrangements" (to use ISO 9000 speak) and one action that exploited it. In his Swiss Cheese Model of organisational accidents, Prof James Reason called the first "latent conditions" (LC) and the second "unsafe acts" (UA). Since you need one of each for an occurrence, the minimum number of "root causes" is two.

Once you've understood that cause and effect are the same, each LC and UA has an LC and UA of its own, so there's a binary relationship.  Exploring this in a team environment gives very clear problem definitions very quickly in true KISS fashion.  It's rarely necessary to go more than three or four levels down. 

Hope this helps.