Jody Muelaner


Understanding the Cause of Faults in the Lean Factory

Five kaizen tools for manufacturing

Published: Wednesday, January 29, 2020 - 13:03

Understanding the causes of faults and defects, and then improving the system or process so it won’t happen again, is central to lean manufacturing. This article looks at some of the methods used to identify the root causes of issues so that you can prevent downtime and move toward zero-defect manufacturing.

The emphasis on understanding the root causes of failures began early in the Toyota Production System. Toyota instructed workers to stop the production line whenever they identified a defect or problem. They would then ask the 5 Whys to dive deeper into why the defect had occurred. This was driven by a desire to eliminate waste from the production system. If a defective part is allowed to continue down the production line, the opportunity to understand why the defect occurred is lost. The mistake will probably be repeated, resulting in more waste. To avoid this situation, an andon light is used to notify workers that a problem has occurred. In the original Toyota Production System, workers had a pull cord they could use to stop the line and activate the light. Automated machines may automatically trigger an andon light if a fault is detected. This is a critical first step in understanding the cause of defects. It allows them to be dealt with immediately.

Once the root cause has been identified, the process or system should be improved so that the problem cannot occur again. This type of error proofing is known as poka-yoke within lean. Identifying and eliminating root causes of defects and failures should be part of a process of continuous improvement carried out by the whole team and is known as kaizen in lean.

The 5 Whys

The 5 Whys is the original method used to identify the root causes of failures within the Toyota Production System. This is a simple method where you ask the question, “Why did this fault occur?” Once you have identified the direct cause, you then ask, “Why?” again. This time, the focus is on the reason for the cause just identified. This process can be repeated to dive deeper into the underlying, or root, cause. It was named the 5 Whys because five iterations were usually enough to reach the root cause of most problems. However, the number isn’t intended to be prescriptive. The process should stop whenever the root cause is identified, and the root cause should be actionable.

The process can be understood with a simple example. Imagine that you find a defect in a machined part:
1. Why did the defect occur? The cutting tool broke.
2. Why did it break? There wasn’t any coolant.
3. Why wasn’t there any coolant? The operator had not checked the coolant level.
4. Why hadn’t he checked it? There is no written process saying he should check it.
5. Why isn’t there a written process for this? The shop-floor processes have not been fully documented.

As you can see, deciding when you have reached the root cause is subjective. The process could easily have stopped after the fourth why, identifying the lack of a written process as the root cause. The value of the 5 Whys is that it should lead you to a corrective action that makes your processes much more robust, eliminating future defects and failures. Therefore, it is important to direct the questioning toward controllable causes and avoid concluding that the root cause is something outside your control. This can sometimes be achieved by rephrasing the question as, “Why did the process fail?”
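The machined-part example can be recorded as a simple chain of questions and answers. The sketch below is only an illustration: the `Why` class, the `actionable` flag, and the choice of which answers count as actionable are assumptions made for this example, not part of the 5 Whys method itself.

```python
from dataclasses import dataclass


@dataclass
class Why:
    question: str
    answer: str
    actionable: bool  # assumed flag: can the team act directly on this cause?


# The chain from the machined-part example; the actionable flags are a judgment call.
chain = [
    Why("Why did the defect occur?", "The cutting tool broke.", False),
    Why("Why did it break?", "There was no coolant.", False),
    Why("Why was there no coolant?",
        "The operator had not checked the coolant level.", False),
    Why("Why had he not checked it?",
        "There is no written process saying he should check it.", True),
    Why("Why is there no written process for this?",
        "The shop-floor processes have not been fully documented.", True),
]


def root_cause(whys):
    """Return the deepest actionable cause in the chain, or None if there is none."""
    actionable = [w for w in whys if w.actionable]
    return actionable[-1] if actionable else None


print(root_cause(chain).answer)
```

Stopping after the fourth why, as in `root_cause(chain[:4])`, returns the missing written process instead, which mirrors how subjective the stopping point is.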

It should be noted that issues often have multiple root causes. These can be discovered by repeating the process but asking the questions in different ways. Ishikawa or fishbone diagrams are often used to identify multiple root causes and may be used together with a 5 Whys approach.

Ishikawa or fishbone diagrams

An Ishikawa diagram, aka fishbone or cause-and-effect diagram, looks a bit like a fish, with the problem or effect at the head, a spine running horizontally, and the main categories of causes radiating out on both sides like fish bones. A number of standard headings are often used to provide an initial stimulus for the generation of ideas. Traditionally, the five Ms have been used: machine, method, material, manpower, and measurement. These are often adapted to fit different organizations.

Figure 1: An Ishikawa diagram, aka fishbone or cause-and-effect diagram

An Ishikawa diagram is simply a hierarchical diagram. It contains the same information structure as a tree or mind map. There is no reason you couldn’t use one of these diagrams instead to record the causes of an effect. In fact, other types of hierarchical diagrams can be more convenient for representing the type of cascade of causes that a 5 Whys analysis will uncover. It is, however, traditional to use the Ishikawa diagram to identify the causes of a problem.
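Because the structure is just a hierarchy, it can be held in a nested dictionary and printed as an indented tree. The following is a minimal sketch; the causes placed under each of the five Ms are illustrative assumptions based on the machined-part example, and `render` is a hypothetical helper name.

```python
# Effect at the top, the five Ms as categories, causes nested beneath them.
# The individual causes are illustrative, not taken from a real analysis.
diagram = {
    "Defect in machined part": {
        "Machine": {"Cutting tool broke": {"No coolant": {}}},
        "Method": {"No written process for checking coolant": {}},
        "Material": {},
        "Manpower": {"Operator did not check coolant level": {}},
        "Measurement": {},
    }
}


def render(node, depth=0):
    """Flatten a nested dict into indented lines, one line per cause."""
    lines = []
    for name, children in node.items():
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


print("\n".join(render(diagram)))
```

The nesting under "Machine" shows how a cascade of causes from a 5 Whys analysis fits into the same structure.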

Root cause analysis

The 5 Whys and Ishikawa diagram are both techniques that can be used for root cause analysis (RCA). Variation breakdown, or a thought map, is a similar method of identifying root causes. In some ways, it is preferable because it explicitly directs you to identify multiple causes at the top level, like an Ishikawa diagram, and directs you to drill all the way down to the root cause for each of these, like the 5 Whys.

Figure 2: A simple thought map diagram can be a more useful way of identifying root causes than an Ishikawa diagram.

Process mapping may also be carried out as part of an RCA to gain a deeper understanding of the process and all of its inputs. A cause-and-effect matrix and failure mode and effects analysis (FMEA) may also be used.

Failure mode and effects analysis

Failure mode and effects analysis is an important method of understanding the potential causes of problems. It is often used proactively when designing a process. It evaluates the subjective likelihood and severity of different events using a table, much like a risk analysis. A number of different names are sometimes used such as a failure mode effects and criticality analysis (FMECA) or a process failure mode and effects analysis (PFMEA).

An FMEA should start with a system definition, which may involve creating a system block diagram. However, most of the work is often done by filling out a table. Many manufacturing companies have their own formats for FMEA tables, often created as an Excel spreadsheet. The first column should list the system components, or process steps for a PFMEA. For each of these, multiple failure modes are listed. Each failure mode may in turn have multiple effects. Finally, each effect may have multiple causes. There is, therefore, a hierarchy consisting of, from top to bottom:
• Process step or system component
• Multiple potential failure modes for each step or component
• Multiple potential effects for each failure mode
• Multiple potential causes for each effect

Each cause has its own row with the process steps or system components, failure modes, effects, and causes spanning a number of cause rows. Related to each cause of failure are additional columns used to input estimates for likelihood and severity; methods of mitigation, such as controlling the process through prevention or detection; other bespoke requirements; and columns that calculate combined values.
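The row-per-cause layout can be sketched in a few lines of code. This is an assumption-laden illustration rather than a standard format: the 1 to 10 scales, the example rows, and the combined value (likelihood multiplied by severity) are all chosen for this example, and many company formats use different columns or add a detection rating.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    step: str          # process step or system component
    failure_mode: str
    effect: str
    cause: str
    likelihood: int    # assumed scale: 1 (rare) to 10 (frequent)
    severity: int      # assumed scale: 1 (minor) to 10 (critical)

    @property
    def risk(self) -> int:
        """Combined value; the simple product is an assumption for this sketch."""
        return self.likelihood * self.severity


# One row per cause; step, failure mode, and effect repeat across rows.
rows = [
    FmeaRow("Milling", "Tool breaks", "Defective part", "No coolant", 4, 7),
    FmeaRow("Milling", "Tool breaks", "Defective part", "Worn tool", 6, 7),
    FmeaRow("Milling", "Tool breaks", "Machine downtime", "No coolant", 4, 5),
]

# Rank the causes so the highest combined values are reviewed first.
ranked = sorted(rows, key=lambda r: r.risk, reverse=True)
for r in ranked:
    print(f"{r.risk:3d}  {r.step} / {r.failure_mode} / {r.effect} / {r.cause}")
```

Sorting by the combined value is one way to avoid spending time on insignificant possibilities.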

Figure 3: An FMEA is often completed using a spreadsheet with a standard company format. Note the hierarchy of rows because of the process step having multiple failure modes. These potentially have multiple effects, and each effect has multiple causes.

The structured nature of an FMEA can really help draw out ideas and identify methods of preventing problems from occurring in the future. This method has become widely used within the manufacturing industry. It can, however, be time-consuming, forcing the user to spend time considering insignificant possibilities.

Failure mode effects and criticality analysis (FMECA) is an excellent hazard analysis and risk assessment tool, but it has other limitations. It does not consider combined failures, nor does it typically include software and human interaction considerations. It also usually provides an optimistic estimate of reliability. Therefore, FMECA should be used in conjunction with other analytical tools when developing reliability estimates.

Fault tree analysis

Fault tree analysis is a rigorous way of quantitatively assessing the causes of faults. It is typically used in safety and reliability engineering, especially within aerospace, nuclear power, and chemical processing. Whereas FMEA is a qualitative assessment, fault tree analysis uses probabilities of individual events, combined with Boolean logic, to give an overall probability of system failure. This is a top-down process in which you start with the possible system failure and work down through the causes that could lead to it. As you move down through the lower levels, they are connected back to the system failure at the top through a network of Boolean logic. This provides a quantitative understanding of how a system could fail, leading to the identification of optimal methods of reducing this risk.

For each possible system failure condition, the severity is first determined to establish the extent of analysis required. The most severe failure conditions should be evaluated using a full fault tree analysis. For each of these, the system failure condition is written at the top of the chart, and a fault tree is drawn below it. The fault tree shows different types of events that might contribute to the failure condition. The Boolean logic shows how these would combine or cascade to result in the failure.

The following types of events are used in fault tree analysis:
• Basic events are the lowest level of events, which can’t be developed any further. They may be considered root causes; asking, “Why?” won’t generate any useful underlying reasons why this event happened.
• Undeveloped events are events that have not been developed any further but may have the potential to be developed.
• Intermediate events are events that come in between the failure condition and the root cause.
• Transfer events are used to continue a tree on another diagram when the tree is too large to view as a single diagram.

Figure 4: Event symbols used in fault tree analysis

Events are connected using two main types of Boolean logic gates: AND gates and OR gates. An AND gate is used when the output event occurs only if all the input events occur. An OR gate is used when the output event will occur if any one of the input events occurs. The simple example used for the previous types of analysis will clearly show this principle.

Figure 5: A simple fault tree analysis

More complex systems may also include exclusive OR, priority AND, and inhibit gates. An exclusive OR causes the output event if exactly one input event occurs. A priority AND causes the output if all inputs occur in a specific sequence. An inhibit gate results in the output event if the input occurs while a specified conditioning event is also satisfied.
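The gate behavior described above can be written as small Boolean functions. This is a sketch only: the function names are hypothetical, and modeling the priority AND gate with occurrence times (with `None` meaning the event did not occur) is an assumption made to capture the ordering requirement.

```python
def and_gate(*inputs):
    """Output occurs only if every input event occurs."""
    return all(inputs)


def or_gate(*inputs):
    """Output occurs if at least one input event occurs."""
    return any(inputs)


def xor_gate(*inputs):
    """Output occurs if exactly one input event occurs."""
    return sum(bool(i) for i in inputs) == 1


def priority_and_gate(times):
    """Output occurs if all events occurred, in the listed order.

    `times` holds each event's occurrence time, or None if it did not occur.
    """
    if any(t is None for t in times):
        return False
    return all(a < b for a, b in zip(times, times[1:]))


def inhibit_gate(input_event, condition):
    """Output occurs if the input occurs while the conditioning event holds."""
    return bool(input_event and condition)


print(xor_gate(True, False), xor_gate(True, True))
print(priority_and_gate([1.0, 2.0]), priority_and_gate([2.0, 1.0]))
```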

By using fault tree analysis, it is possible to model complex chains of events leading to failure. When probabilities are assigned to the basic events and the undeveloped events, it is possible to calculate the probability of the system failure condition.
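As a rough illustration of that calculation, the sketch below combines basic-event probabilities through AND and OR gates. It assumes the basic events are statistically independent, which real systems often violate. The tree layout, the `probability` function, and the numbers are all invented for this example.

```python
import math


def probability(node):
    """Probability of a node, assuming independent basic events.

    A node is either a number (a basic or undeveloped event) or a
    (gate, children) tuple where gate is "AND" or "OR".
    """
    if isinstance(node, (int, float)):
        return float(node)
    gate, children = node
    probs = [probability(child) for child in children]
    if gate == "AND":
        # All inputs must occur: multiply the probabilities.
        return math.prod(probs)
    if gate == "OR":
        # At least one input occurs: one minus the chance that none occur.
        return 1.0 - math.prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate}")


# Illustrative tree: the tool breaks if (coolant runs low AND the level
# is not checked) OR the tool is worn. The probabilities are made up.
tree = ("OR", [
    ("AND", [0.1, 0.5]),
    0.02,
])

top = probability(tree)
print(f"Probability of top event: {top:.3f}")
```

With these assumed inputs, the AND branch contributes 0.05 and the top event comes out at 1 - (0.95 × 0.98) = 0.069.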


Different methods can be used to understand the causes of faults and defects. One approach is to do this work reactively after a problem is detected. The 5 Whys and RCA are both normally used in this way. Proactive or preventive analysis may also be carried out to identify the possible causes of faults before they happen. FMEA is often used in this way. Ideally, FMEA should be carried out for new processes to proactively eliminate potential causes of failure. Reactive analysis should also be carried out if any problems are encountered. For safety-critical processes, more rigorous methods, such as fault tree analysis, may be required.

First published Dec. 9, 2019, on the engineering.com blog.


About The Author

Jody Muelaner

Jody Muelaner is a mechanical engineer with expertise in metrology and advanced manufacturing. Muelaner’s website provides information on topics ranging from the basics of metrology and measurement systems analysis to specific guides such as how to perform a gauge R&R study in Excel.