The classic version of FMEA is an engineering tool for quality and reliability improvement through project prioritization. It was formally released by the U.S. government with MIL-P-1629 in 1949 and updated in 1980 as MIL-STD-1629A. The classic FMEA methodology has proven to be a reasonably effective tool for product, service, and process improvement over the years, but it's by no means optimal.
Each part of the system, subsystem, or component is analyzed for potential failure modes, possible causes, and the possible effects. The possible failure mode is given a rank score from 1 to 10 in three categories: severity, occurrence, and detection. Multiplying these three category ranks together will yield a number called the risk priority number, or RPN, which is between 1 and 1,000. The RPN results are reviewed for each failure mode, and corrective action projects are prioritized based on the RPN (i.e., the higher the RPN, the higher the corrective action priority).
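The classic RPN calculation described above can be sketched in a few lines. This is a minimal illustration, not a standard implementation; the function name and the example scores are invented for demonstration.

```python
def rpn(severity, occurrence, detection):
    """Classic FMEA risk priority number: S x O x D, each ranked 1-10."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("rank scores must be integers between 1 and 10")
    return severity * occurrence * detection

# Illustrative failure mode: a moderately severe, occasional, hard-to-miss defect.
print(rpn(7, 4, 3))  # 84 -- the higher the RPN, the higher the priority
```

Because each factor runs from 1 to 10, the product is always between 1 and 1,000, as the article states.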
Below we will explore some of the deficiencies in classic FMEA to see where improvements might be made.
Issue 1: Bad math
The rank scores used in FMEA as measures of severity (S), occurrence (O), and detection (D) are subjectively generated ordinal numbers. Multiplying ordinal scores together is an invalid mathematical operation because multiplication assumes a distance metric is defined on the space, and no such metric exists for rank scores (see Donald Wheeler's article "Problems With Risk Priority Numbers"). The resultant RPN score therefore has no invariant meaning: the "distance" between rank scores is purely subjective, so no distance metric can be defined on them.
Issue 2: RPN prioritization ambiguity
When analyzing a system, you can get several failure modes with exactly the same RPNs, but common sense tells you that they should have different corrective action priorities. For example, RPN (10, 1, 9) = RPN (1, 10, 9) = 90, but are they of equal priority? Hence, some people have serious concerns that one factor may be more important than another. For example, does (10, 1, 9) have a higher corrective action priority than (1, 10, 9) because it has a higher severity score, even though the other failure mode has a higher occurrence rate?
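The ambiguity is easy to demonstrate: two very different failure modes collapse to the same RPN. The scores below are the article's own example; the labels are invented for illustration.

```python
def rpn(s, o, d):
    """Classic RPN: the product of the three 1-10 rank scores."""
    return s * o * d

mode_a = rpn(10, 1, 9)  # catastrophic (S=10) but very rare (O=1)
mode_b = rpn(1, 10, 9)  # trivial (S=1) but very frequent (O=10)
print(mode_a, mode_b)   # both are 90, so classic FMEA ranks them equally
```

The product throws away which factor contributed the magnitude, which is exactly the information a prioritization decision needs.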
Issue 3: The weakness of detection
Detection is really composed of two components: control and containment. Controls are tools and techniques to prevent the failure from being created, and containment comprises tools and techniques to prevent the failure from going downstream or out to the customer. However, the meaning of the word "detection" in common discourse seems to be linked more to inspection (containment) than to prevention (control). So when asked to think about reducing risk by improving detection, many FMEA teams think only about containment techniques and not at all about control. Breaking detection out into its two components forces the team to think about both issues and thus ensures a more robust analysis, leading to greater failure rate reduction.
Issue 4: Lack of independence
The factors in the RPN model should be independent. However, when we examine the model, we find that D = f1(S), O = f2(D), and D = f3(C1, C2). For example:
• A very severe scratch is easier to detect than a light one.
• The performance of the detection system determines our estimate of the occurrence frequency.
• The level of control and containment activity establishes the detection capability.
Issue 5: Model problems
The RPN model claims to estimate risk, but the form of the equation doesn’t appear to do this in a common sense way. The criticality of the failure is given by S x O, and then to get the RPN, we multiply by D to estimate engineering “risk.” This doesn’t make sense because we should actually be dividing by the complement of D, denoted DC (i.e., DC = 11 – D). The actual risk estimation equation should be S x O/DC.
For example, if there’s no control or containment, then D = 10 and DC = 1, from which we see that the engineering risk is S x O/1. Now, if changes are made to increase detection to, say, D = 5, then DC = 6 and the risk estimate becomes S x O/6, thus reducing the engineering “risk” based on the effect of the mitigation. As D goes from 10 to 1, DC goes from 1 to 10. So if D = 1, then DC = 10, and the estimated risk is reduced to one-tenth its former value. This formula seems like a much more realistic way to estimate risk and risk reduction.
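The corrected formula can be sketched directly from the definitions above. This is a sketch of the article's proposal, with illustrative severity and occurrence scores:

```python
def engineering_risk(s, o, d):
    """Article's corrected risk estimate: S x O / DC, where DC = 11 - D."""
    dc = 11 - d
    return s * o / dc

# With no control or containment (D = 10, DC = 1), risk is simply S x O:
print(engineering_risk(8, 6, 10))  # 48.0
# Improving detection all the way to D = 1 (DC = 10) cuts risk to one-tenth:
print(engineering_risk(8, 6, 1))   # 4.8
```

Note how improving detection now lowers the risk estimate, whereas in the classic model a lower D score lowers the RPN only by multiplication, with no complement involved.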
Issue 6: Economic risk
As indicated above, there are many issues with the classic RPN approach to risk assessment, including the fact that the formula just doesn’t seem to make mathematical and intuitive sense. There are several factors that generate our personal sense of risk, and an important one is the possibility of economic loss. However, as we have seen, the RPN risk metric doesn’t include this component in its formulation. So it’s probably time to consider how FMEA can be improved.
The expected cost of failure is estimated by multiplying the cost of failure by the probability of failure occurrence. Then the economic risk metric is defined by C x P/DC, where C = cost of failure, P = the estimated probability of failure, D = detection, and DC = (11 – D). Recall that the engineering risk (RPN) will be an integer between 1 and 1,000, whereas the economic risk is the adjusted expected cost of failure and can be any real number dollar amount greater than or equal to $0.
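The economic risk metric defined above is equally simple to compute. The dollar figure and probability below are invented purely to illustrate the formula:

```python
def economic_risk(cost, probability, detection):
    """Article's economic risk metric: C x P / DC, where DC = 11 - detection."""
    dc = 11 - detection
    return cost * probability / dc

# Illustrative numbers: $50,000 failure cost, 2% chance of occurrence,
# and no control or containment (D = 10, so DC = 1).
print(economic_risk(50_000, 0.02, 10))  # 1000.0 -- the raw expected cost C x P
```

Unlike the integer-valued RPN, this result is a real dollar amount, so ties between failure modes become far less likely.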
Improving FMEA
There are two good reasons for switching from the engineering rank score for severity to the economic estimate of failure cost. First, it eliminates the issues of bad math and RPN prioritization ambiguity discussed above. This is because cost and probability are real numbers that have a distance metric defined on them so they can be meaningfully multiplied together. Also, since the product and quotient of real numbers is a real number, the chance of getting equal economic risk results becomes far less likely, thus reducing the priority ambiguity issues.
Second, management will be able to appreciate the value of FMEA much better if it is presented in the language they understand, i.e., the language of finance. For example, if there is no control or containment action applied, then D = 10, DC = 1, and the expected cost of failure is the economic risk estimate: C x P/DC = C x P/1 = C x P. When control or containment actions are taken, the expected cost of failure (i.e., economic risk) is reduced by the effect these actions have on the probability of failure occurrence. The first order approximation model for economic risk seems to align well with our intuitive feelings of how we assess risk in our daily lives.
Of course, the first order approximation can be improved by adjusting the part failure rates that are assumed to be constants to age-specific rates. In addition, the cost of failure could be discounted to present value, i.e., risk = E (cost of system failure), discounted to present value. In any case, the economic risk model seems to be a more accurate platform on which to build our FMEA analysis than the classic RPN model.
Comments
Very interesting!
I have long shared this unease with the ordinal scales used in FMEA, and in many other risk/contingency plans. You make a great case, and your proposition sounds very workable.
I have used probability estimation instead of the occurrence scale, but had never made the translation you make to get to the "DC" factor for the detection scale.
My one concern is in trying to universally equate severity with cost in dollars. Cost of production, rework, scrap, etc. can probably be estimated with reasonable precision--so boiling things down to purely economic terms probably works well in process FMEA and maybe in Project FMEA. For other types of hazard (personnel safety, for example), the calculations would get much more complex, I would think, to estimate in dollars.
The New FMEA
RIP,
Thank you for your very thoughtful comments. I agree with you that estimating the cost of failure can sometimes be difficult. Typically, Cost of Poor Quality reporting is a good start, but as you point out, there are many costs that are unknown and unknowable. Many times people just think about the cost to their company; however, in many cases the cost to the customer can be several times the cost to the supplier.
John
Prevention Control Vs Occurrence
Hello John:
Please can you further explain the difference between Detection Control (prevention) and Occurrence? I am confused as to what the difference is. I always looked at Detection as the ability to stop escapes and Occurrence as the likelihood of the failure happening, including the effect of prevention controls.
Quality Digest: It would be great if you could link Dr. Wheeler's article on the problems with RPN calculations with Dr. Flaig's article. They complement each other nicely.
Thank you, Dirk
Never Mind!
Quality Digest: The link to Dr. Wheeler's article is right in Dr. Flaig's article. I have to adjust my Detection score from a 3 to an 8! :-)
The New FMEA
Dirk,
Detection has two components: control and containment. Controls are tools or techniques that prevent failures from being generated (e.g., Poka-Yoke, PIDs, control charts, MSA, SOPs, PM, etc.). Containment comprises tools or techniques that prevent failures from going downstream or out to the customer (e.g., inspection, test, etc.). Occurrence is the relative frequency of failures observed (i.e., the probability).
John
Still Confused
Hello Dr. Flaig:
I am still confused. It seems to me that tools or techniques that prevent failures from being generated would affect the relative frequency of failures.
And if failures are being prevented from being generated, how is that related to detecting them? There isn't anything to detect. But the occurrence has been reduced or eliminated.
I am not a fan of the AIAG FMEA method. I am not defending it. I am trying to learn and break through my mental models.
Thank you, Dirk
Safety
If the failure mode is a safety issue, the severity goes to 10 and action must be taken, regardless of occurrence and detection numbers. The cost calculation is for prioritizing everything else.
Problems with Occurrence
This was a nice article, John. I especially liked the idea of linking risk to economic consequences. One other problem I have run into over the years lies with the determination of occurrence (O). When evaluating a new process or system, the true rate of failure is unknown. If the team is composed of individuals with prior understanding of similar systems or historic performance, it is easy to estimate the new occurrence rating by examining historic experience. Often, though, the default is to rank occurrence with a high number due to lack of experience. In a risk-averse environment, the result is that high risk is everywhere due to high occurrence or detection numbers. The FMEA then becomes unmanageable and loses any advantage of prioritization.