Failure mode and effects analysis (FMEA) is an engineering tool that has been heavily adapted for use in Six Sigma programs where it is commonly used to decide which problem to work on. In this usage a risk priority number (RPN) is computed for each of several problems, and the problem with the largest RPN value is selected. The purpose of this column is to explain the inherent problems of RPN values.
Typically a list of several candidate problems will be rated on three scales: severity of failure (S), likelihood of occurrence (O), and difficulty of detection in advance (D). The problem will be assigned a rating of 1 to 10 on each scale, with 10 being severe, very likely to occur, and impossible to detect. These three ratings are then multiplied together to obtain a value known as a risk priority number, and these RPN values are then used to rank the problems. The idea is that the problem with the highest RPN value is the one that needs to be worked on first. This approach has been the subject of textbooks and has been used as the basis for several different types of voting and ranking schemes. Unfortunately, there are two major problems with the use of risk priority numbers.
The first problem is the fact that while the RPN values range from 1 to 1,000, there are only 120 possible values for the RPN values. Moreover, these 120 possible values are not uniformly spread out between 1 and 1,000. This nonuniform spacing may be seen on the horizontal axis of Figure 1.
Figure 1:
This restriction on the number of possible values for the RPN scores leads to the second problem. With 10 levels of severity, 10 levels of occurrence, and 10 levels of difficulty of detection, we will have 1,000 different possible problem descriptions. The RPN scores will sort these 1,000 problem descriptions into 120 distinct classes. Some RPN values will group up to 24 problem descriptions together, while other RPN values will correspond to only one problem description, as shown by the frequencies in Figure 1. So, the second problem is that the RPN values sort the 1,000 problem descriptions into 120 artificial groupings of different sizes.
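The counts quoted above can be checked by brute force. The following is a minimal sketch, assuming nothing more than the three 1-to-10 scales described in the text; the variable names are illustrative and are not part of any FMEA standard.

```python
from collections import Counter
from itertools import product

# Every (severity, occurrence, detection) triple on the 1-to-10 scales.
ratings = range(1, 11)
rpn_counts = Counter(s * o * d for s, o, d in product(ratings, repeat=3))

n_descriptions = sum(rpn_counts.values())  # total problem descriptions
n_distinct = len(rpn_counts)               # distinct RPN values
largest_group = max(rpn_counts.values())   # most descriptions sharing one RPN
singletons = sorted(v for v, n in rpn_counts.items() if n == 1)

print(n_descriptions, n_distinct, largest_group)  # 1000 120 24
print("RPN values held by a single description:", singletons)
```

Sorting `rpn_counts.items()` by value gives the frequency for each possible RPN value.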
To see the artificiality of these groupings, consider the group having an RPN score of 360. On a scale of 1 to 1,000, 360 does not sound like a very high score. However, consideration of Figure 1 will show that 862 of the 1,000 problem descriptions will have a smaller RPN score. Using the criteria given in the auto industry FMEA Manual, Figure 2 lists the 15 problems that outrank 86 percent of the other possible problems.
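The figures for an RPN of 360 can be reproduced the same way. This sketch only enumerates the rating triples; it does not reproduce the wording of the FMEA Manual criteria, and the order in which it prints the 15 triples is an assumption that may differ from the row order of Figure 2.

```python
from itertools import product

ratings = range(1, 11)
triples = list(product(ratings, repeat=3))  # (severity, occurrence, detection)

target = 360
below = sum(1 for s, o, d in triples if s * o * d < target)
tied = [(s, o, d) for s, o, d in triples if s * o * d == target]

print(below)      # 862 descriptions have a smaller RPN
print(len(tied))  # 15 descriptions share the RPN of 360
for s, o, d in sorted(tied, reverse=True):
    print(f"S={s:2d}  O={o:2d}  D={d:2d}")
```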
Figure 2:
According to the auto industry FMEA Manual, the problem in Row 1 of Figure 2 involves a hazardous failure mode that would affect the safe operation of the vehicle and that would occur without warning. This problem would have a very high incidence of occurrence, affecting approximately one vehicle in three. And this problem would have a moderately high chance of being detected during the design phase and eliminated from the vehicle before production begins.
In the same way the problem in Row 15 of Figure 2 corresponds to a failure mode that would affect fit and finish. This failure mode would affect approximately one vehicle in three and cannot be detected in the design phase.
Does it seem reasonable to you that the two problems above should be equivalent? Is a hazardous problem affecting one vehicle in three that might be caught before production of equal importance with an appearance problem affecting one vehicle in three that cannot be caught at the design phase? The risk priority numbers rank these two problems the same!
The problem with FMEA is not the subjective ordering of the three different aspects of a problem. It is not even a problem to have more levels than adjectives. The problem is with the risk priority numbers and their use to create a ranking between the problems.
One of my clients decided that a 10-point scale was too detailed. They shifted to using a four-point scale (1 = very low, 3 = low, 6 = moderate, and 9 = high) for severity, occurrence, and detectability. It is left for the reader to confirm that using the RPN scores here will map 64 problem descriptions onto 16 possible values ranging from 1 to 729. Thus, it is not a problem with the number of levels, but with the nonsensical notion that you can multiply rankings together.
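For readers who would rather not do the confirmation by hand, here is a short sketch of it. The scale values come from the text; everything else is illustrative.

```python
from itertools import product

scale = (1, 3, 6, 9)  # the client's four-point scale: very low, low, moderate, high
products = [s * o * d for s, o, d in product(scale, repeat=3)]
distinct = sorted(set(products))

print(len(products), len(distinct))  # 64 16
print(distinct[0], distinct[-1])     # 1 729
```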
When we place a series of categories in order in some continuum such as severity, occurrence, or detectability, we may represent this ordering with numbers. Such numbers are rankings. If we assign the value of 1 to the lowest ranked category in the continuum, then 1 is below 2, 2 is below 3, 3 is below 4, and so on. Values with this property of order are called “ordinal-scale data.” The rankings on severity, occurrence, and detectability are intended to be ordinal-scale data.
However, before the operations of addition and subtraction are meaningful, you absolutely and positively must have interval-scale data. Interval-scale data are data that possess both ordering and distance—not only is 1 less than 2, and 2 is less than 3, but also the distance from 1 to 2 is exactly the same as the distance from 2 to 3. It is this notion of distance that gives meaning to addition and subtraction. Without the metric imposed by distance, you are operating in Wonderland, where 1 + 2 is equal to whatever the Red Queen wants it to be today.
Before the operations of multiplication and division can be meaningful, you must have ratio-scale data. Ratio-scale data are data that possess ordering, distance, and an absolute zero point. A classic example of data that are interval-scale but not ratio-scale is temperature in degrees Fahrenheit or Celsius. Since both of these scales use an arbitrary zero point, multiplication and division do not make sense. However, addition and subtraction do result in meaningful numbers. For example, in either system, the following is a true statement: 60° + 10° = 70°. But in either system the following equation is nonsense: 60°/80° = 0.75.
Figure 3:
Clearly, using the operation of division with interval-scale values will result in nonsense. Because division and multiplication are two facets of the same operation, neither of these operations makes sense with interval-scale data.
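One way to see this is to express the same two temperatures in different units. In the sketch below the conversion formulas are the standard ones, and the choice of 60°F and 80°F simply echoes the example above. The difference between the two temperatures survives a change of units (up to the 9/5 scale factor), but the "ratio" changes with every choice of zero point.

```python
def fahrenheit_to_celsius(f):
    return (f - 32.0) * 5.0 / 9.0

def fahrenheit_to_kelvin(f):
    return fahrenheit_to_celsius(f) + 273.15

a_f, b_f = 60.0, 80.0

# The same pair of temperatures, three different "ratios".
ratio_f = a_f / b_f
ratio_c = fahrenheit_to_celsius(a_f) / fahrenheit_to_celsius(b_f)
ratio_k = fahrenheit_to_kelvin(a_f) / fahrenheit_to_kelvin(b_f)

# Differences agree once the unit size (9/5) is accounted for.
diff_f = b_f - a_f
diff_c = fahrenheit_to_celsius(b_f) - fahrenheit_to_celsius(a_f)

print(round(ratio_f, 3), round(ratio_c, 3), round(ratio_k, 3))
print(diff_f, round(diff_c * 9.0 / 5.0, 6))
```

Only the Kelvin ratio refers to an absolute zero point, which is why it is the only one of the three with a physical interpretation.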
Thus, with ratio-scale data we can add, subtract, multiply, and divide numbers to get meaningful results.
With interval-scale data we can add and subtract numbers to get meaningful results, but multiplication and division will result in nonsense.
With ordinal-scale data addition, subtraction, multiplication, and division are all nonsense operations. And as the product of three ordinal-scale values, any RPN value is nonsense squared. The lack of a distance function and the lack of an absolute zero will combine to result in inconsistencies where both serious and trivial problems have the same RPN value, and where some trivial problems end up with larger RPN values than other, more serious, problems.
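A small illustration of why the lack of a distance function matters: with ordinal data, any relabeling that preserves the order of the categories is just as legitimate as the original labels. The sketch below uses two hypothetical problems and a hypothetical relabeling (adding 10 to every rank, so that 1 through 10 becomes 11 through 20). Neither comes from the FMEA Manual; they are assumptions chosen only to show that the product ranking depends on the arbitrary labels.

```python
from math import prod

def rpn(ranks, relabel=lambda r: r):
    """Product of the three ranks after an optional relabeling."""
    return prod(relabel(r) for r in ranks)

def shifted(r):
    # Order-preserving relabeling: 1..10 becomes 11..20.
    return r + 10

problem_a = (3, 3, 3)   # hypothetical (severity, occurrence, detection)
problem_b = (1, 2, 10)  # hypothetical

print(rpn(problem_a), rpn(problem_b))                    # 27 20: A outranks B
print(rpn(problem_a, shifted), rpn(problem_b, shifted))  # 2197 2640: B outranks A
```

The order within each scale is untouched by the relabeling, yet the product reverses the ranking of the two problems.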
This is why any attempt to use RPN values is an exercise in absurdity. Their use in the same room with a mathematician will tend to produce a spontaneous explosion. They are utter and complete nonsense.
If you feel that you have successfully used RPN values to identify problems to work on, you have been deluding yourself with the elaborate mumbo-jumbo that surrounds the calculation of the RPN values. While you may have been successful, that success did not come from the use of the RPN values.
When working at the design phase, there is a rationale to doing an FMEA. In this case you may use the three scales, and use the rankings, but you should not use the RPN values. If you feel that you absolutely must have a systematic overall ranking of all the failure modes, then use 1 through 5 instead of 1 to 10 for the rankings of each aspect, and then create a three-digit code for each failure mode, where the first digit is severity, the second digit is occurrence, and the third digit is detectability. To signify the ordering of these three aspects, designate this three-digit code as the SOD code. An SOD code based on rankings of 1 through 5 will result in 125 values for 125 situations. When these SOD codes are placed in descending numerical order, they will prioritize the situations first by severity, second by occurrence within each level of severity, and lastly by detectability within each combination of severity and occurrence. Notice that this approach uses the original rankings without distorting them. This will allow you to rationally choose the problems that need to be addressed.
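A minimal sketch of the SOD code follows. The function name `sod_code` and the four failure modes with their ratings are hypothetical and are used only to show the sorting.

```python
def sod_code(severity, occurrence, detection):
    """Three-digit code: severity, then occurrence, then detectability."""
    for rank in (severity, occurrence, detection):
        if not 1 <= rank <= 5:
            raise ValueError("each ranking must be between 1 and 5")
    return 100 * severity + 10 * occurrence + detection

# Hypothetical failure modes with (S, O, D) rankings on 1-to-5 scales.
failure_modes = {
    "A": (5, 2, 1),
    "B": (2, 5, 5),
    "C": (5, 2, 4),
    "D": (3, 3, 3),
}

ranked = sorted(failure_modes,
                key=lambda name: sod_code(*failure_modes[name]),
                reverse=True)

for name in ranked:
    print(name, sod_code(*failure_modes[name]))
# C 524
# A 521
# D 333
# B 255
```

With these hypothetical ratings the product S × O × D would put B first (50) and A last (10), even though A has the highest severity ranking. The SOD code keeps both high-severity failure modes at the top of the list.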
When working with an existing process, there is no need for the elaborate exercise of an FMEA since the process behavior charts can spotlight the problems that need to be addressed in spite of all the fog and confusion associated with production.
Comments
An excellent analysis, as always Don. However I think that no amount of revealing the nonsense behind Six Sigma will make industry wake up and bring it out of the crisis.
My feeling is that rating "likelihood of occurrence" is quite unrealistic. Accidents, for example, are commonly a combination of a series of "impossible" events, which seem to happen all too frequently.
FMEA as the lesser evil
Yes, ADB, the job of rating the likelihood of occurrence is very subjective. This is why I never use FMEAs. However, at the preproduction stage, when data are unavailable, it can be helpful to go through an FMEA to see if anything has been overlooked that needs to be addressed. It is always imperfect, but it beats the alternative of doing nothing.
Donald J. Wheeler, Ph.D.
Fellow American Statistical Association
Fellow American Society for Quality
Change in RPN approach?
Advancing technology means we can improve on "the way we've always done it." As an example, traditional attribute charts are hopelessly obsolete when it is easy enough to compute exact control limits (0.00135 false alarm risk at each end) on a computer. I would say the same of the R and s chart because it is easy enough to calculate exact 0.00135 risk control limits (or any other specified risk) instead of relying on the normal approximation. The same might be possible with FMEA and PFMEA but it requires quantification of all three risks. For severity, what is the dollar cost of the failure (including intangibles such as enraged customers who never come back and recommend the same to their friends--the airline industry comes to mind immediately)? There are already guidelines for the probability of occurrence and probability of non-detection. Multiply them to get the RPN: a direct measurement of the estimated (decision theory) cost of the failure in question. When the severity involves a threat to human life, perhaps set the cost at a very high "penalty cost" (similar to that for enforcing constraints in linear programming, in which the solution must be revised to eliminate "M" from the objective function). All this assumes, of course, that exact quantification is possible, which is far easier in theory than in practice. If not, perhaps state an optimistic, most likely, and pessimistic RPN based on best and worst-case scenarios.
the danger in RPNs
Well said - great point about the problems using math on ordinal numbers; good suggestion in combining (vs. multiplying) the digits to avoid the problems.
Ken B
Donald, do you have a preferred risk analysis tool that you like to use better than FMEA? What you say makes a lot of sense, but as you know, leaders tend to want to see a ranking of some sort.
An alternative for DWILEN
The SOD code explained in the article will provide a ranking that can be explained and used in a rational manner.
I used only five levels with the SOD code because that will sort the problems into 125 categories, which is more than enough.
Donald J. Wheeler, Ph.D.
Fellow American Statistical Association
Fellow American Society for Quality
Thanks for the article
Thanks for the article Donald, a mathematical analysis of FMEA. I too have been puzzled at how relatively low severity problems can merit the same RPN as potentially dangerous ones. My solution thus far has been to automatically prioritize high severity failure modes above low severity ones with comparable RPNs. However, now I am planning to try your approach.
High severity ratings
"I too have been puzzled at how relatively low severity problems can merit the same RPN as potentially dangerous ones. " I think books on FMEA say that anything with a high severity rating, especially at the danger to human life or safety level, automatically demands attention even if the occurrence and detection numbers are very low. The RPN is an aid to engineering judgment and not a substitute for it.
Alternatives
Excellent article, as usual. Working in product development, I find that FMEAs are often required by customers, especially the automotive OEMs, and better than nothing for capturing and communicating the ideas and knowledge that team members have about potential failures.
When we use FMEAs, I suggest to my teams that they follow the normal practice with regards to rankings and RPN, but then sort the RPNs into three classes based on the Severity ranking, where a 9 and 10 (causes injury or death) are in one class, 4 - 8 (likely to cause customer dissatisfaction) in another class and 1 - 3 (more or less undetectable to customer) in the third class. They then sort by RPN within the classes and address the most severe class first. I don't recall where I first heard of using classes to mitigate some of the problems of RPNs, but it seems to work reasonably well and still meet with most customer requirements for documentation. I think this would have nearly the same result as your SOD approach, but without the arguments with the customer over whether or not we should have 5 or 10 levels for each ranking.
The biggest problem with FMEAs, and alluded to in a previous comment, is that most failure modes are complex and not easily captured in the simplistic format of the FMEA. Overcoming this limitation while still meeting customer demands to have things in an FMEA format is my biggest challenge.
Sorting then using the RPN values
All RPN numbers are nonsense. The SOD code can be done with 10 categories (I would suggest using 0 through 9 rather than 1 through 10). When you sort by severity you are already doing the same thing as the first step in using the SOD code. The essence of FMEA is the three ordinal rankings; the RPN value is just a bit of nonsense tacked on for those who couldn't handle 1,000 problem descriptions.
AIAG Manuals
Between the faulty percentages derived from the GRR studies in the MSA manual and the error of multiplication used to calculate RPNs in the FMEA manual, it seems the AIAG books are doing more harm than good.
Does anyone know if the committees that develop these procedures are aware of their shortcomings?
Rich
Agree and Disagree
Yes, the FMEA method is subjective, but so were the metric and standard measurement systems when first developed. Who decided an inch was an inch? A consensus was made and then agreed upon. For any FMEA to work the definitions and labels must be defined and used consistently. It can also be used for more than the design phase.
I use it to assign resources and monitor changes. Agreed that identical RPN values may not mean the same thing, so you cannot ignore the variables that were used to calculate them. The intent of the tool is not to look at risk alone. The intent is to go get facts before you give the rating in each category.
What I often see are groups who say that working at 1,000 feet has a greater chance of injury than working at 10 feet. So by risk alone, they constantly assign the resources to 1,000-foot tasks. Guess what? If the process for controlling that risk is in control then you will have a low frequency rating.
If the inherent risk of working at 10 feet is lower but you have 60 incidents a month, where would you assign your process improvement resources?
By using a living FMEA you can set a baseline specific to the process you are measuring, improve it, and then reexamine the categories. I also only use 1, 3, and 9 as ratings.
So is it subjective? Yes! Can it be used if used consistently with defined parameters? Yes!
There is no measurement (categorical or mathematical) that is 100 percent. That is why there are Type I and Type II errors and risks. Heck, people can't even agree on .005, .001 or .01. Does that mean you throw it away? No! Use it for its intended purpose, understand what was used to develop the measurements and do not abuse it.
Good luck
Christopher Vallee
chris@taproot.com
ps. great meeting you Don at the ASQ conference this year.
Good thinking if based on firm foundation
Only people in our line of work would likely say that was an enjoyable read, but it certainly was. Such thresholds probably do contribute to a high rate of ineffective FMEA's, but they don't tell the whole story. People are loath to score too low or too high, so while the sample set of possibilities favors the extremely low, the real world favors the 5th to 35th percentile of the range in my experience. That seems to be governed by the team's tendency to central scoring. A very human influence.
I admire the attempt to steer the ship toward something meaningful, but also challenge the (in my mind) excessive debate over the scoring of FMEA's. If we're going to exert mental energy, then the lion's share of it needs to be in the direction of maximum effectiveness. Is the proper method of scoring improperly specified risks truly effective? In our constant debate of scoring FMEA, we too often miss the point of the exercise to begin with: to diligently think about our risks and what we're going to do about them.
It's odd that a tool whose origins reach back nearly 70 years now is still so misunderstood and so inconsistently employed, but here we stand with it as such. And it alarms me that we so often get hung up on the nuances of scoring while missing the meat on the bone of definition of failure effect, failure mode, root cause and accurately categorized controls. Without diligence to those items first, any scoring and any way of slicing that scoring will yield nonsense.
Most CQE's are now trained to consider RPN's in a method similar to what you've laid out, but the attention to defining the failure mode, what prevention controls are, what detection controls are, and how the categories do or do not interact seems to be lacking.
Look at some FMEA's you've seen recently with a probing eye. Is the failure mode well defined and connected to the requirements in a way that makes it actionable? Are root causes relevant to the process at hand? Are controls appropriately categorized? It can be alarming to see how often we fall short there, but encouraging to know that we're only a simple turn in thinking away from truly honoring the intent of the tool.
Maybe I'm a hopeless optimist, but I'd like to think that if we really do our best to state what our risks are and what causes them, that nonsensical scores expose themselves, and the real work that needs done will get done.
SOD
Agree that SOD can signify the ordering of three aspects (severity, occurrence, and detection). However, it also has a disadvantage when using a large FMEA. If we need to prioritize the risk based on SOD, it could be misleading. Example: 799 or 811, which one is the higher risk? In my view, RPN should be used with a risk matrix (priority matrix) to define the higher risk (www.fmea-analysis.com/news/risk-priority-number or www.iqasystem.com/news/risk-priority-number)