Failure mode and effects analysis (FMEA) is an engineering tool that has been heavily adapted for use in Six Sigma programs, where it is commonly used to decide which problem to work on. In this usage a risk priority number (RPN) is computed for each of several problems, and the problem with the largest RPN value is selected. The purpose of this column is to explain the inherent problems of RPN values.
Typically a list of several candidate problems will be rated on three scales: severity of failure (S), likelihood of occurrence (O), and difficulty of detection in advance (D). The problem will be assigned a rating of 1 to 10 on each scale, with 10 being severe, very likely to occur, and impossible to detect. These three ratings are then multiplied together to obtain a value known as a risk priority number, and these RPN values are then used to rank the problems. The idea is that the problem with the highest RPN value is the one that needs to be worked on first. This approach has been the subject of textbooks and has been used as the basis for several different types of voting and ranking schemes. Unfortunately, there are two major problems with the use of risk priority numbers.
The first problem is that although RPN values range from 1 to 1,000, only 120 distinct values are possible. Moreover, these 120 possible values are not uniformly spread out between 1 and 1,000. This nonuniform spacing may be seen on the horizontal axis of Figure 1.
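For readers who would like to check this count themselves, here is a minimal sketch that enumerates every combination of the three ratings and collects the distinct products. The variable names are my own and are not part of any FMEA standard.

```python
from itertools import product

ratings = range(1, 11)  # ratings of 1 through 10 on each scale

# Multiply severity, occurrence, and detection for every combination
rpn_values = {s * o * d for s, o, d in product(ratings, repeat=3)}

distinct_count = len(rpn_values)               # 120
top_five = sorted(rpn_values, reverse=True)[:5]

print(distinct_count)  # 120
print(top_five)        # [1000, 900, 810, 800, 729]
```

The gaps between the five largest values already show how unevenly the possible scores are spread.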
This restriction on the number of possible values for the RPN scores leads to the second problem. With 10 levels of severity, 10 levels of occurrence, and 10 levels of difficulty of detection, we will have 1,000 different possible problem descriptions. The RPN scores will sort these 1,000 problem descriptions into 120 distinct classes. Some RPN values will group up to 24 problem descriptions together, while other RPN values will correspond to only one problem description, as shown by the frequencies in Figure 1. So, the second problem is that the RPN values sort the 1,000 problem descriptions into 120 artificial groupings of different sizes.
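The group sizes can be tallied in the same way. The following sketch, again using names of my own choosing, counts how many of the 1,000 problem descriptions land on each RPN value.

```python
from collections import Counter
from itertools import product

ratings = range(1, 11)

# Number of (S, O, D) descriptions that share each RPN value
group_sizes = Counter(s * o * d for s, o, d in product(ratings, repeat=3))

total_descriptions = sum(group_sizes.values())  # 1000
largest_group = max(group_sizes.values())       # 24
singletons = sorted(v for v, n in group_sizes.items() if n == 1)

print(total_descriptions, len(group_sizes), largest_group)  # 1000 120 24
```

An RPN of 60, for example, is shared by 24 different problem descriptions, while an RPN of 1,000 belongs to exactly one.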
To see the artificiality of these groupings, consider the group having an RPN score of 360. On a scale of 1 to 1,000, 360 does not sound like a very high score. However, consideration of Figure 1 will show that 862 of the 1,000 problem descriptions will have a smaller RPN score. Figure 2 uses the criteria given in the auto industry FMEA Manual to describe the 15 problems that share an RPN of 360 and therefore outrank 86 percent of the other possible problems.
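The counts for the RPN 360 group can be confirmed with a short enumeration. This is only a sketch: the tuples are (severity, occurrence, detection) ratings, and the code does not reproduce the wording of the FMEA Manual.

```python
from itertools import product

triples = list(product(range(1, 11), repeat=3))

# Every (S, O, D) description whose product is exactly 360
tied_at_360 = [(s, o, d) for s, o, d in triples if s * o * d == 360]

# How many descriptions receive a smaller score
below_360 = sum(1 for s, o, d in triples if s * o * d < 360)

print(len(tied_at_360))  # 15
print(below_360)         # 862
```

Both (10, 9, 4) and (4, 9, 10) appear in the tied list, even though they describe very different problems.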
According to the auto industry FMEA Manual, the problem in row 1 of Figure 2 involves a hazardous failure mode that would affect the safe operation of the vehicle and that would occur without warning. This problem would have a very high incidence of occurrence, affecting approximately one vehicle in three. And this problem would have a moderately high chance of being detected during the design phase and eliminated from the vehicle before production begins.
In the same way, the problem in row 15 of Figure 2 corresponds to a failure mode that would affect fit and finish. This failure mode would affect approximately one vehicle in three and cannot be detected in the design phase.
Does it seem reasonable to you that the two problems above should be equivalent? Is a hazardous problem affecting one vehicle in three that might be caught before production of equal importance with an appearance problem affecting one vehicle in three that cannot be caught at the design phase? The risk priority numbers rank these two problems the same!
The problem with FMEA is not the subjective ordering of the three different aspects of a problem. It is not even a problem to have more levels than adjectives. The problem is with the risk priority numbers and their use to create a ranking between the problems.
One of my clients decided that a 10-point scale was too detailed. They shifted to using a four-point scale (1 = very low, 3 = low, 6 = moderate, and 9 = high) for severity, occurrence, and detectability. It is left for the reader to confirm that using the RPN scores here will map 64 problem descriptions onto 16 possible values ranging from 1 to 729. Thus, it is not a problem with the number of levels, but with the nonsensical notion that you can multiply rankings together.
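For the reader who would rather let a computer do the confirming, here is a minimal sketch for the four-point scale described above. The names are my own.

```python
from itertools import product

levels = (1, 3, 6, 9)  # very low, low, moderate, high

# RPN for every combination of severity, occurrence, and detectability
scores = [s * o * d for s, o, d in product(levels, repeat=3)]
distinct_scores = sorted(set(scores))

print(len(scores))           # 64
print(len(distinct_scores))  # 16
print(distinct_scores[0], distinct_scores[-1])  # 1 729
```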
When we place a series of categories in order in some continuum such as severity, occurrence, or detectability, we may represent this ordering with numbers. Such numbers are rankings. If we assign the value of 1 to the lowest ranked category in the continuum, then 1 is below 2, 2 is below 3, 3 is below 4, and so on. Values with this property of order are called “ordinal-scale data.” The rankings on severity, occurrence, and detectability are intended to be ordinal-scale data.
However, before the operations of addition and subtraction are meaningful, you absolutely and positively must have interval-scale data. Interval-scale data are data that possess both ordering and distance—not only is 1 less than 2, and 2 is less than 3, but also the distance from 1 to 2 is exactly the same as the distance from 2 to 3. It is this notion of distance that gives meaning to addition and subtraction. Without the metric imposed by distance, you are operating in Wonderland, where 1 + 2 is equal to whatever the Red Queen wants it to be today.
Before the operations of multiplication and division can be meaningful, you must have ratio-scale data. Ratio-scale data are data that possess ordering, distance, and an absolute zero point. A classic example of data that are interval-scale but not ratio-scale is temperature in degrees Fahrenheit or Celsius. Since both of these scales use an arbitrary zero point, multiplication and division do not make sense. However, addition and subtraction do result in meaningful numbers. For example, in either system, the following is a true statement: 60° + 10° = 70°. But in either system the following equation is nonsense: 60°/80° = 0.75.
Clearly, using the operation of division with interval-scale values will result in nonsense. Because division and multiplication are two facets of the same operation, neither of these operations makes sense with interval-scale data.
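One way to see why the ratio is nonsense is to express the same two temperatures in the other system. The sketch below is my own illustration; it uses the standard Fahrenheit-to-Celsius conversion and shows that the ratio changes with the choice of zero point, while the difference changes only by the size of the degree.

```python
def fahrenheit_to_celsius(deg_f):
    """Standard conversion from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0

low_f, high_f = 60.0, 80.0
low_c = fahrenheit_to_celsius(low_f)
high_c = fahrenheit_to_celsius(high_f)

# The ratio of the same two temperatures depends on the arbitrary zero point
ratio_f = low_f / high_f  # 0.75
ratio_c = low_c / high_c  # 28/48, about 0.583

# The difference is consistent: 20 Fahrenheit degrees is 20 * 5/9 Celsius degrees
diff_f = high_f - low_f
diff_c = high_c - low_c

print(round(ratio_f, 3), round(ratio_c, 3))  # 0.75 0.583
print(round(diff_c, 3), round(diff_f * 5.0 / 9.0, 3))  # 11.111 11.111
```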
Thus, with ratio-scale data we can add, subtract, multiply, and divide numbers to get meaningful results.
With interval-scale data we can add and subtract numbers to get meaningful results, but multiplication and division will result in nonsense.
With ordinal-scale data, addition, subtraction, multiplication, and division are all nonsense operations. And as the product of three ordinal-scale values, any RPN value is nonsense squared. The lack of a distance function and the lack of an absolute zero will combine to result in inconsistencies where both serious and trivial problems have the same RPN value, and where some trivial problems end up with larger RPN values than other, more serious, problems.
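A small experiment shows what goes wrong. Because rankings carry only order, any other set of increasing labels would serve equally well as rankings. The sketch below compares two hypothetical problems (the ratings are invented for illustration) under the usual 1 to 10 labels and under the equally ordered labels 1, 3, 5, ..., 19. The relabeling is my own choice; any order-preserving relabeling would be as legitimate.

```python
def rpn(ratings):
    """Product of the severity, occurrence, and detection ratings."""
    s, o, d = ratings
    return s * o * d

def relabel(ratings):
    """Order-preserving relabeling: 1, 2, ..., 10 becomes 1, 3, ..., 19."""
    return tuple(2 * r - 1 for r in ratings)

# Hypothetical problems, given as (severity, occurrence, detection)
problem_a = (10, 1, 1)
problem_b = (3, 3, 1)

a_first_original = rpn(problem_a) > rpn(problem_b)                     # 10 > 9
a_first_relabeled = rpn(relabel(problem_a)) > rpn(relabel(problem_b))  # 19 > 25

print(a_first_original, a_first_relabeled)  # True False
```

Nothing about the ordering of the categories changed, yet the product reverses which problem comes first. The ranking produced by the RPN depends on the arbitrary labels rather than on the problems themselves.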
This is why any attempt to use RPN values is an exercise in absurdity. Their use in the same room with a mathematician will tend to produce a spontaneous explosion. They are utter and complete nonsense.
If you feel that you have successfully used RPN values to identify problems to work on, you have been deluding yourself with the elaborate mumbo-jumbo that surrounds the calculation of the RPN values. While you may have been successful, that success did not come from the use of the RPN values.
When working at the design phase, there is a rationale to doing an FMEA. In this case you may use the three scales, and use the rankings, but you should not use the RPN values. If you feel that you absolutely must have a systematic overall ranking of all the failure modes, then use 1 through 5 instead of 1 through 10 for the rankings of each aspect, and then create a three-digit code for each failure mode, where the first digit is severity, the second digit is occurrence, and the third digit is detectability. To signify the ordering of these three aspects, designate this three-digit code as the SOD code. An SOD code based on rankings of 1 through 5 will result in 125 values for 125 situations. When these SOD codes are placed in descending numerical order, they will prioritize the situations first by severity, second by occurrence within each level of severity, and lastly by detectability within each combination of severity and occurrence. Notice that this approach uses the original rankings without distorting them. This will allow you to rationally choose the problems that need to be addressed.
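The SOD code is simple enough to sketch in a few lines. The function name and the four failure modes below are hypothetical and serve only to show the ordering that results.

```python
from itertools import product

def sod_code(severity, occurrence, detectability):
    """Three-digit code: severity, then occurrence, then detectability."""
    for rating in (severity, occurrence, detectability):
        if not 1 <= rating <= 5:
            raise ValueError("each ranking must be between 1 and 5")
    return 100 * severity + 10 * occurrence + detectability

# Hypothetical failure modes with (severity, occurrence, detectability)
failure_modes = {
    "A": (5, 2, 1),
    "B": (3, 5, 5),
    "C": (5, 2, 4),
    "D": (1, 5, 5),
}

# Descending order: severity first, then occurrence, then detectability
ranked = sorted(failure_modes,
                key=lambda name: sod_code(*failure_modes[name]),
                reverse=True)

# Every one of the 125 situations receives its own code
all_codes = {sod_code(s, o, d) for s, o, d in product(range(1, 6), repeat=3)}

print(ranked)          # ['C', 'A', 'B', 'D']
print(len(all_codes))  # 125
```

Note that failure mode B, with high occurrence and poor detectability, still ranks below both severity-5 failure modes, which is exactly the priority the SOD ordering is meant to enforce.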
When working with an existing process, there is no need for the elaborate exercise of an FMEA since the process behavior charts can spotlight the problems that need to be addressed in spite of all the fog and confusion associated with production.