Bryan Christiansen

Operations

Applied Root Cause Analysis, Part 2

Tools and techniques

Published: Monday, May 17, 2021 - 12:03

Root cause analysis (RCA) is not a single path to an answer. It is a conceptual framework for investigating the true reasons behind the events we observe. Many tried-and-tested frameworks are available for carrying out RCA. None of these methods is foolproof, but they provide a solid base for investigating root problems. Let's discuss some of the prominent tools and techniques.

Each RCA tool has its own merits, and some methods suit particular industries and types of problems better than others. Each company and its management team should have a protocol to adhere to when conducting RCA. Different companies might prefer different techniques. In some instances, external consultants might be brought in to conduct RCA, and they will have a preferred technique or combination of techniques. This is one of the reasons why it is hard to create a universal RCA template that everyone can follow.

Oftentimes, the company will have a preferred RCA technique. If that one does not give the needed answers, other techniques might be explored.

5 Why analysis

The 5 Whys technique, developed for root cause analysis, addresses everything with a “why,” just like a curious child. When we ask why the visible problem occurred, we can trace its cause. Then we ask why again, this time about the cause we just identified.

A flow of questions aimed at discovering the root cause of an event

This process continues until there is no need to ask why any further. At that point, we should have reached the root cause of the problem. As a rule of thumb, asking and answering five successive whys should be enough to unveil the root cause of most problems. Hence the name 5 Why analysis.
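The questioning loop can be sketched in a few lines of Python. This is only an illustration: the `WHY` map and the `five_whys` helper are hypothetical names, and the answers paraphrase the machine-stoppage example given later in this article. In practice each answer comes from investigation, not from a lookup table.

```python
from typing import Dict, List

# Hypothetical cause map: each problem points to the answer to "why did this happen?".
# The entries paraphrase the machine-stoppage example later in this article.
WHY: Dict[str, str] = {
    "machine stopped": "fuse blew from overload",
    "fuse blew from overload": "bearing was not sufficiently lubricated",
    "bearing was not sufficiently lubricated": "lubrication pump was not pumping enough",
    "lubrication pump was not pumping enough": "pump shaft was worn",
    "pump shaft was worn": "metal scrap got into the pump",
}


def five_whys(problem: str, why: Dict[str, str], max_depth: int = 5) -> List[str]:
    """Follow the answers until no further 'why' is recorded or max_depth whys were asked."""
    chain = [problem]
    while chain[-1] in why and len(chain) - 1 < max_depth:
        chain.append(why[chain[-1]])
    return chain


chain = five_whys("machine stopped", WHY)
for depth, step in enumerate(chain):
    prefix = "why? -> " if depth else ""
    print("  " * depth + prefix + step)
```

The last element of `chain` is the candidate root cause. The `max_depth` limit mirrors the rule of thumb of five whys, but it is a convention, not a guarantee that the chain ends at the true root cause.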

Fishbone diagram (aka Ishikawa diagram)

The Ishikawa method for root cause analysis emerged from quality control techniques that were employed in the Japanese shipbuilding industry by Kaoru Ishikawa. The shape of the resulting diagram looks like a fishbone, which is why it is also called a fishbone diagram. This diagram is predicated on the idea that multiple factors, including the five main ones called the 5 Ms, can lead to the failure/event/effect we are investigating.

The 5 M framework (aka a fishbone diagram) from the Toyota Production System is used for RCA with the Ishikawa method.

The 5 Ms are:
1. Man/mind power
2. Machines
3. Measurement
4. Methods
5. Material

The problem or fault is written down at the right end of the diagram, where the fish head is presumed to be. The cause for it is represented along the horizontal line. Further effects and their respective causes are written along the fish bones that represent each of the 5 Ms. This process is continued until the team conducting it is convinced that the root cause is identified.
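As a rough illustration, the brainstormed causes can be recorded as a small data structure before they are drawn. The sketch below assumes a plain dictionary keyed by the 5 Ms; `render_fishbone` is a hypothetical helper, and the effect and causes in the sample are made up for illustration.

```python
from typing import Dict, List

FIVE_MS = ("Man/mind power", "Machines", "Measurement", "Methods", "Material")


def render_fishbone(effect: str, causes: Dict[str, List[str]]) -> str:
    """Return a text outline: the effect (fish head) first, then one 'bone' per M."""
    unknown = set(causes) - set(FIVE_MS)
    if unknown:
        raise KeyError(f"not one of the 5 Ms: {sorted(unknown)}")
    lines = [f"Effect (fish head): {effect}"]
    for category in FIVE_MS:
        lines.append(f"  {category}")
        for cause in causes.get(category, []):
            lines.append(f"    - {cause}")
    return "\n".join(lines)


# Hypothetical brainstorming result for a distorted molded part.
sample = {
    "Machines": ["cooling conduits poorly arranged", "barrel heater drifting"],
    "Measurement": ["thermocouple out of calibration"],
    "Material": ["resin moisture too high"],
}
print(render_fishbone("part distortion", sample))
```

Keeping every M in the outline, even when it has no causes yet, shows the team which categories have not been explored.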

The fishbone diagram serves as a visual aid for structured brainstorming sessions. The same technique is also used for product design, ergonomic design, and process improvement.

Failure mode and effects analysis (FMEA)

FMEA is a proactive approach to root cause analysis that aims to prevent potential failures of a machine or system. It systematically combines reliability engineering, safety engineering, and quality control efforts. It tries to predict future failures and defects by analyzing past data.

Failure mode and effects analysis (FMEA)

A diverse cross-functional team is essential to undertake FMEA. The scope of the analysis must be well-defined and conveyed clearly to all the team members. Each subsystem, design, and process is brought under the microscopic scrutiny of the cross-functional team. The purpose, need, and function of each system are questioned. Potential failure modes are brainstormed. Failure of similar processes and products in the past can also be analyzed to supplement the process.

Risk priority number (RPN)

The potential effects and disruptions that each identified failure mode could cause are assessed and used to calculate its risk priority number (RPN). If a failure mode has a higher RPN than the company is comfortable with, it must be addressed by changing one or more of the factors outlined in the image above (conventionally the severity, occurrence, and detection ratings).
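A minimal sketch of the RPN calculation follows. It assumes the common convention of multiplying three ratings (severity, occurrence, and detection), each on a 1 to 10 scale. The failure modes, their ratings, and the threshold of 100 are hypothetical values chosen for illustration, not figures from this article.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible effect) to 10 (catastrophic effect)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (almost always detected) to 10 (almost never detected)

    def rpn(self) -> int:
        """Risk priority number = severity x occurrence x detection."""
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("ratings must be between 1 and 10")
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and ratings.
modes = [
    FailureMode("pump shaft wear", severity=7, occurrence=4, detection=6),  # 168
    FailureMode("blown fuse", severity=5, occurrence=3, detection=2),       # 30
    FailureMode("blocked filter", severity=6, occurrence=5, detection=7),   # 210
]

RPN_THRESHOLD = 100  # hypothetical comfort level set by the company

# Failure modes above the threshold, highest risk first.
needs_action = sorted(
    (m for m in modes if m.rpn() > RPN_THRESHOLD),
    key=lambda m: m.rpn(),
    reverse=True,
)
for m in needs_action:
    print(f"{m.name}: RPN {m.rpn()}")
```

Because the RPN is a product, lowering any single rating (for example, improving detection through inspection) lowers the overall number.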

Fault tree analysis

Fault tree analysis is a method for root cause analysis that uses Boolean logic to figure out the cause of failure. It was developed at Bell Laboratories to evaluate an intercontinental ballistic missile (ICBM) launch-control system for the U.S. Air Force.

Fault tree analysis (FTA).

Fault tree analysis tries to map the logical relationships between faults and the subsystems of a machine. The fault we are analyzing is placed at the top of the chart, and information flows down through various “gates” symbolizing the relationships between input and output events. If either of two causes is enough on its own to produce the effect, they are combined with a logical OR gate (depicted by the purple symbol in the illustration above). For example, if a machine can fail while in operation or while under maintenance, it is a logical OR relationship.

If two causes need to occur simultaneously for the fault to occur, the relationship is represented with a logical AND gate. For example, if a machine fails only when the operator pushes the wrong button and the relay fails to activate, it is a logical AND relationship. It is represented using the Boolean AND symbol (depicted by the turquoise symbol in the illustration above).
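The OR and AND gates described above can be sketched as a small tree that is evaluated from the bottom up. The `Basic` and `Gate` classes and the `occurs` function are hypothetical names, and the way the two examples are combined into one tree is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Basic:
    """A basic event (leaf) that either happened or did not."""
    name: str


@dataclass
class Gate:
    """An intermediate event whose inputs are combined with AND or OR."""
    kind: str  # "AND" or "OR"
    inputs: List["Node"]


Node = Union[Basic, Gate]


def occurs(node: Node, state: Dict[str, bool]) -> bool:
    """Return True if the event at this node occurs, given the basic-event states."""
    if isinstance(node, Basic):
        return state.get(node.name, False)
    results = [occurs(child, state) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: the machine fails in operation only if the operator pushes
# the wrong button AND the relay fails; it can also fail under maintenance.
in_operation = Gate("AND", [Basic("wrong button pushed"), Basic("relay fails")])
top_fault = Gate("OR", [in_operation, Basic("failure under maintenance")])

print(occurs(top_fault, {"wrong button pushed": True}))
print(occurs(top_fault, {"wrong button pushed": True, "relay fails": True}))
```

The first call prints `False` because the AND gate needs both inputs, and the second prints `True`. Real fault trees usually attach probabilities to basic events as well, which this sketch leaves out.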

The symbols used in the diagrams represent different kinds of events: A circle is used for a basic event, a pentagon for an external event, a rhombus for an undeveloped event, an ellipse for a conditioning event, and a rectangle for an intermediate event.

The fault tree created for a failure is analyzed for possible improvements and risk management. This is an effective tool to conduct RCA for automated machines and systems.

Pareto charts

Italian economist Vilfredo Pareto recognized a common theme in almost all the frequency distributions he could observe: There is a huge asymmetry between causes and the effects they produce. As a rule of thumb, he indicated that, in any system, 80 percent of the results (or failures) are caused by 20 percent of all potential reasons.

The principle is dubbed the Pareto principle (some know it as the 80–20 rule). This skew between cause and effect is evident in many different distributions, from wealth distribution among people to failures in a machine.

Pareto chart

With the Pareto principle in mind, failures and their possible causes are analyzed. A bar graph and line graph are drawn indicating the frequency of faults and the causes for the faults. With this graph, we are able to observe the skew between causes and failures. Usually, we will discover how a small percentage of factors causes the majority of faults.

The causes that contribute to the greatest number of faults are then analyzed further, and corrective actions are taken to eliminate the most common faults.
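The numbers behind a Pareto chart can be sketched as follows: sort the causes by fault count (the bars) and accumulate the percentages (the line). The helper names and the fault counts are hypothetical, and the 80 percent cutoff is the rule of thumb, not a fixed requirement.

```python
from typing import Dict, List, Tuple


def pareto_table(counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (cause, fault count, cumulative percent) rows, most frequent cause first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((cause, n, 100.0 * running / total))
    return rows


def vital_few(counts: Dict[str, int], cutoff: float = 80.0) -> List[str]:
    """Return the top causes that together account for at least `cutoff` percent of faults."""
    picked = []
    for cause, _, cumulative in pareto_table(counts):
        picked.append(cause)
        if cumulative >= cutoff:
            break
    return picked


# Hypothetical fault counts per cause, e.g. tallied from maintenance logs.
fault_counts = {
    "lubrication": 42,
    "electrical": 23,
    "operator error": 15,
    "wear": 9,
    "calibration": 6,
    "other": 5,
}

for cause, n, cumulative in pareto_table(fault_counts):
    print(f"{cause:15s} {n:3d} {cumulative:6.1f}%")
print("Investigate first:", vital_few(fault_counts))
```

With these made-up counts, three of the six causes account for 80 percent of the faults. Real data rarely splits exactly 80-20, so the cutoff is best treated as a starting point for prioritization.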

Pareto charts are excellent tools for deciding which problems to take up first in root cause analysis. According to the Pareto principle, eliminating the 20 percent of causes that are most common can reduce the overall number of malfunctions by 80 percent. Pareto charts will indicate the top failure causes to be further investigated and addressed, according to the criticality of the machine, the impact of a specific part's failure, or a combination of the two.

Honorable mentions

Root cause analysis is open-ended, and it has many widely used tools in various industries. Major ones were mentioned in the sections above. Still, there are other noteworthy tools for RCA. Here are a few honorable mentions:
• Cause and effect diagrams. The fishbone diagram is one example of a cause and effect diagram. There are many similar tools that try to map the relationship between causes and effects in a system.
• Kaizen. It is another tool from the stable of Japanese process improvements. It is a continuous process improvement method. Root cause analysis is embedded within the structure of kaizen.
• Barrier analysis. It is an RCA technique commonly used for safety incidents. It is conducted on the premise that a barrier between personnel and potential hazards can prevent most safety incidents.
• Change analysis. When a potential incident occurs due to a change in a single element or factor, change analysis is employed as the root cause analysis technique.
• Scatter diagram. A scatter diagram is a statistical tool that plots the relationship between two variables in a two-dimensional chart. It can also be used as an RCA tool.

Root cause analysis examples

RCA example No. 1
Injection-molding machines are widely used around the world to create plastic parts in almost any shape or form. Each part produced by the machine should match its specifications within the allowable tolerance.

Let’s imagine there is a high incidence rate of faulty products, and we need to get to the bottom of it.

First, the problem needs to be well defined. This includes describing the precise defect in the plastic output. By observing the output, we can determine whether it is one of the four main defects that can occur with injection molding. They are:
• Flash
• Gassing and venting
• Part distortion
• Short mold

Let’s presume that the defect is part distortion. The problem has to be clearly written down, with the number of defects occurring as a percentage. Once that portion is completed, all the available data must be collected. Maintenance logs can be pulled from a computerized maintenance management system (CMMS), manuals from the injection-mold machine manufacturer can be reviewed, etc.

Information should be collected on each defective product. From this, the deviation from specifications should be measured. The heat signature of the product is taken once it comes out of the mold. The temperature of molten plastic in the barrel is also measured.

We know that part distortion almost always occurs due to temperature problems. But we can’t be sure where the temperature problem is: in the barrel while heating or in the mold while cooling. From the data collected, we would be able to identify that. Let’s assume the heat signature of the finished product is different from the expected one.

This indicates that the problem is in the cooling process. Further investigation concludes that the root problem is the wrong spatial arrangement of cooling liquid conduits.

Changing the conduit arrangement to one that best fits the mold currently in use will solve the problem of part distortion.

RCA example No. 2
Imagine an investigation into a machine that stopped because it overloaded, and the fuse blew. Investigation shows that the machine overloaded because it had a bearing that wasn’t being sufficiently lubricated. The investigation proceeds further and finds that the automatic lubrication mechanism had a pump that was not pumping sufficiently; hence, the lack of lubrication. Investigation of the pump shows that it has a worn shaft. Investigation of why the shaft was worn discovers that there isn’t an adequate mechanism in place to prevent metal scraps getting into the pump. This enabled scraps to get into the pump and damage it.

The apparent root cause of the problem is metal scrap contaminating the lubrication system. Fixing this problem ought to prevent the whole sequence of events recurring. The real root cause could be a design issue if there is no filter to prevent the metal scrap getting into the system. Or if it has a filter that was blocked due to a lack of routine inspection, then the real root cause is a maintenance issue.

Compare this with an investigation that does not find the causal factor: Replacing the fuse, the bearing, or the lubrication pump will probably allow the machine to go back into operation for a while. But there is a risk that the problem will simply recur until the root cause is dealt with.

This example originally appeared here.

Additional RCA resources

Root cause analysis is a vast umbrella term that can’t be exhaustively explained in a single article. Here are some additional resources to learn more about RCA, its tools, and its techniques:
• This 70-minute video from the consulting firm Kepner-Tregoe is a good place to glean a broad understanding of RCA and major techniques.
• Six Sigma Development Solutions is an accredited provider of lean Six Sigma certifications. They have extensive material on root cause analysis and also provide online courses and certifications for it. You have the option to choose between classes with different structures that can accommodate your schedule.
• A root cause analysis course is available from the University System of Georgia, which can be accessed on Coursera. You can enroll in the course for free and receive certification for a small fee. Coursera courses are widely recognized.
• The textbook Root Cause Analysis (Productivity Press, 2014) by Matthew A. Barsalou is an excellent guide to choosing the right RCA tool for the right context.
• Root Cause Analysis: The Core of Problem Solving and Corrective Action (ASQ Quality Press, 2009) by Duke Okes is another comprehensive and authoritative resource on root cause analysis.

Technical analysis should not be done by cutting corners

Root cause analysis is a complex methodology and should not be done on a whim. The team might be tempted to cut corners to save time and speed up the process. But if you want to get to the bottom of a complex event, rushing the process can be detrimental to the whole project. If you have a good reason to conduct RCA, then it is in your best interest to create an environment in which the process can be executed successfully.

First published April 14, 2021, on the Limble CMMS blog.

About The Author

Bryan Christiansen

Bryan Christiansen is the founder and CEO of Limble CMMS. Limble is a modern, easy-to-use mobile CMMS software that takes the stress and chaos out of maintenance by helping managers organize, automate, and streamline their maintenance operations.

Comments

RCA - 5 Why's

I was taught that you have not reached the final why, in the 5 why's method, until the answer returns a process improvement. In the example you give, I would say the answer to the final why would be that the equipment needs to be inspected more often to verify that metal shavings are being properly cleaned out.