Measurement error is ubiquitous. As a result, over the past 250 years, different areas of science and engineering have come up with many different ways to deal with the problem. One approach to the problem of measurement error was developed during the 1960s within General Motors. Throughout the years it was modified and revised, until in 1989, it was turned over to the Automotive Industry Action Group (AIAG). Since that time, the AIAG Gauge Repeatability and Reproducibility (R&R) Study has been promoted throughout many different industries. Unfortunately, the original procedure contained some fundamental problems that have not been corrected over the years. This column will address these historic problems and suggest solutions.
The gauge R&R study starts out with a sound strategy for collecting data. A simple, fully crossed experiment is performed in which two or more operators measure each of three to 10 parts two or three times apiece. As an example, we will use the data shown in figure 1, where three operators measure each of five parts two times apiece.
The average and range are shown for each of the 15 pairs of measurements in figure 1. Since each of these pairs represents the same operator measuring the same part, the ranges characterize the basic component of measurement error. This component has many names, among them test-retest error, repeatability, and equipment variation. It may be estimated by dividing the average range, $\bar{R}$, by the appropriate bias correction factor d2, which is 1.128 for ranges of two values:

$$\text{Repeatability} = \frac{\bar{R}}{d_2} \tag{1}$$
Next, under the assumption that the operators are different, an estimate of the reproducibility is obtained by computing an average value for each operator and finding the range of these averages. Here the operator averages are 181.0, 172.5, and 173.9, and the range of these three averages is Ro = 8.5 mils. This range is divided by the bias correction factor for one range of three values, d2* = 1.906. Then, with o = number of operators, p = number of parts, and n = number of repeated measurements, the reproducibility is estimated to be:

$$\text{Reproducibility} = \sqrt{\left(\frac{R_o}{d_2^*}\right)^2 - \frac{o}{o\,p\,n}\,\text{Repeatability}^2} \tag{2}$$
Once we have estimates of both the repeatability and the reproducibility, we can combine them to estimate the combined R&R component as:

$$\text{Combined R\&R} = \sqrt{\text{Repeatability}^2 + \text{Reproducibility}^2} \tag{3}$$
Next, the product variation is estimated by computing averages for each of the p parts and using their range, Rp. Here the five part averages are 158.0, 206.167, 182.0, 184.833, and 148.0, so the range of these part averages is Rp = 58.167. The bias correction factor used is d2* = 2.477, which is for one range of five values. The product variation is estimated to be:

$$\text{Product Variation} = \frac{R_p}{d_2^*} = \frac{58.167}{2.477} = 23.48 \text{ mils} \tag{4}$$
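Finally, the total variation for the set of product measurements is estimated by combining the repeatability, the reproducibility, and the product variation to get:

$$\text{Total Variation} = \sqrt{\text{Repeatability}^2 + \text{Reproducibility}^2 + \text{Product Variation}^2} \tag{5}$$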
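A minimal Python sketch of formulas 1 through 5 follows. It assumes the within-cell ranges have already been averaged. The function name `gauge_components` and its arguments are hypothetical. The default d2 and d2* values apply only to this layout (pairs of readings, three operators, five parts). Clamping a negative reproducibility variance at zero is a common convention, not something this column specifies. Figure 1's average range is not reproduced here, so the example call uses a placeholder value of 4.0 mils.

```python
import math

# Bias correction factors for this layout (assumed defaults; change them for
# other numbers of trials, operators, or parts).
D2_PAIRS = 1.128      # d2 for ranges of n = 2 repeated measurements
D2STAR_OPS = 1.906    # d2* for one range of three operator averages
D2STAR_PARTS = 2.477  # d2* for one range of five part averages


def gauge_components(avg_range, operator_avgs, part_avgs, n,
                     d2=D2_PAIRS, d2star_o=D2STAR_OPS, d2star_p=D2STAR_PARTS):
    """Return the estimates of formulas 1-5 (same units as the data)."""
    o, p = len(operator_avgs), len(part_avgs)

    # Formula 1: repeatability from the average within-cell range.
    repeatability = avg_range / d2

    # Formula 2: range of operator averages, less the repeatability that each
    # operator average (based on p*n readings) still carries.
    r_o = max(operator_avgs) - min(operator_avgs)
    repro_sq = (r_o / d2star_o) ** 2 - (o / (o * p * n)) * repeatability ** 2
    reproducibility = math.sqrt(max(repro_sq, 0.0))  # clamp at zero (assumption)

    # Formula 3: variances add, so combine in quadrature.
    combined = math.hypot(repeatability, reproducibility)

    # Formula 4: product variation from the range of the part averages.
    r_p = max(part_avgs) - min(part_avgs)
    product = r_p / d2star_p

    # Formula 5: total variation, again by adding variances.
    total = math.sqrt(repeatability ** 2 + reproducibility ** 2 + product ** 2)

    return {"repeatability": repeatability, "reproducibility": reproducibility,
            "combined": combined, "product": product, "total": total}


operators = [181.0, 172.5, 173.9]
parts = [158.0, 206.167, 182.0, 184.833, 148.0]
est = gauge_components(4.0, operators, parts, n=2)  # 4.0 is a placeholder R-bar
print(round(est["product"], 2))  # 23.48, from Rp = 58.167 and d2* = 2.477
```

Only the product variation in this example depends on data quoted in the column. The other four values move with the placeholder average range.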
Up to this point everything is okay. While the estimators defined here are not the only formulas that could have been used, and while they are not always unbiased estimators, they do provide reasonable estimates for the quantities described. (While the AIAG gauge R&R computational worksheet simplifies the formulas by defining several “K-factors,” the AIAG formulas are algebraically equivalent to formulas 1–5.)
The train wreck begins when the AIAG gauge R&R study tries to use the estimates from formulas 1–5 to characterize relative utility. In the current version the first four quantities are all expressed as a percentage of the total variation from formula 5.
The repeatability (formula 1) is divided by the total variation (formula 5) and multiplied by 100 to be expressed as a percentage. This ratio is said to represent the percentage of the total variation that is consumed by repeatability. Here we find:

$$100 \times \frac{\text{Repeatability}}{\text{Total Variation}} = 15.65\% \tag{6}$$
Next, the reproducibility (formula 2) is divided by the total variation (formula 5) and multiplied by 100. This ratio is said to represent the percentage of the total variation that is consumed by reproducibility. Here we find:

$$100 \times \frac{\text{Reproducibility}}{\text{Total Variation}} = 17.77\% \tag{7}$$
The combined repeatability and reproducibility (formula 3) is divided by the total variation (formula 5) and multiplied by 100. This ratio is said to represent the percentage of the total variation that is consumed by the combined repeatability and reproducibility. Here we find:

$$100 \times \frac{\text{Combined R\&R}}{\text{Total Variation}} = 23.68\% \tag{8}$$
Finally, the product variation (formula 4) is divided by the total variation (formula 5) and multiplied by 100. This ratio is said to represent the percentage of the total variation that is consumed by the product variation. Here we find:

$$100 \times \frac{\text{Product Variation}}{\text{Total Variation}} = 97.15\% \tag{9}$$
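The following Python sketch computes ratios 6 through 9 the way the AIAG study does, dividing each standard deviation by the total variation. The function name is hypothetical. The inputs are hypothetical values, chosen so that the combined R&R comes out to exactly 5 and the total variation to exactly 13, which keeps the lack of additivity easy to see.

```python
import math


def aiag_percent_of_total(repeatability, reproducibility, product):
    """Ratios 6-9 as the AIAG study forms them: each standard deviation
    divided by the total variation and multiplied by 100."""
    combined = math.hypot(repeatability, reproducibility)
    total = math.sqrt(repeatability ** 2 + reproducibility ** 2 + product ** 2)
    return {
        "ratio6_repeatability": 100 * repeatability / total,
        "ratio7_reproducibility": 100 * reproducibility / total,
        "ratio8_combined": 100 * combined / total,
        "ratio9_product": 100 * product / total,
    }


# Hypothetical inputs: combined = 5 and total = 13 exactly.
r = aiag_percent_of_total(3.0, 4.0, 12.0)
print({k: round(v, 2) for k, v in r.items()})
# ratio 6 + ratio 7 is about 53.85, not ratio 8 (about 38.46);
# ratio 8 + ratio 9 is about 130.77, not 100.
```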
Following these ratios, the current version of the AIAG manual has a simple statement to the effect, “The sum of the percent consumed by each factor will not equal 100%.” This statement has no explanation attached. No guidance is offered on how to proceed now that common sense and every rule of arithmetic have been violated. There is just a simple statement that these numbers do not mean what they were just interpreted to mean, and the user is left to his or her own devices. Unfortunately, unlike Humpty Dumpty in Through the Looking-Glass, when it comes to arithmetic we do not get to say that things mean whatever we want them to mean.
The repeatability percentage of formula 6 and the reproducibility percentage of formula 7 do not add up to the combined repeatability and reproducibility percentage of formula 8 because they are not proportions. Likewise, the combined R&R percentage of formula 8 and the product variation percentage of formula 9 do not add up to 100 percent because they are not proportions. Instead of being proportions, it turns out that the quantities computed above are all trigonometric functions. Figure 2 shows how the five estimates of variation are related.
Thus, based on figure 2, and recalling some of our high school trigonometry, we see that the ratio of the repeatability to the total variation is:
The ratio of the reproducibility to the total variation is:
The ratio of the combined repeatability and reproducibility to the total variation is:
And the ratio of the product variation to the total variation is:
In this form we can begin to see why these quantities do not add up. While they were dressed up to look like proportions, and while they were interpreted as proportions, they are, and always have been, nothing more than trigonometric functions. And trigonometric functions do not satisfy the conditions required for a set of ratios to be interpreted as proportions.
A set of ratios will be proportions only when the denominator is the sum of the numerators. This additivity of the numerators is the essence of proportions. So what is additive in a gauge R&R study? Look at the structure of formula 5. It is the variances that are additive:
$$\text{Total Variation}^2 = \text{Repeatability}^2 + \text{Reproducibility}^2 + \text{Product Variation}^2$$
Because these variances are additive, we know from the Pythagorean Theorem that the standard deviations cannot be additive. However, the ratios computed by the AIAG gauge R&R study implicitly assume that the standard deviations are additive. This implicit assumption of additivity violates the Pythagorean Theorem, and it is what makes it impossible to make sense of the ratios in the AIAG gauge R&R study. This is why the percentages in formulas 6 through 9 do not add up, and this is why engineers have told me they never could figure out exactly what the final numbers in a gauge R&R study represented. The ratios sound like nonsense simply because it is nonsensical to interpret trigonometric functions as proportions.
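The arithmetic can be checked directly with the four percentages this column reports for ratios 6 through 9. The short Python check below is offered as a sketch, and the helper name is hypothetical. Adding the percentages directly fails. Adding their squares (as fractions) agrees with the Pythagorean relationship, to within the rounding of the reported values.

```python
def as_fraction(pct):
    """Turn a percentage into a fraction of the total variation."""
    return pct / 100.0


# Percentages reported in this column for ratios 6 through 9.
ratio6, ratio7, ratio8, ratio9 = 15.65, 17.77, 23.68, 97.15

# Read as proportions, neither sum works.
print(round(ratio6 + ratio7, 2))  # 33.42, yet ratio 8 is 23.68
print(round(ratio8 + ratio9, 2))  # 120.83, not 100

# Read as sides of right triangles, the squares add up.
print(round(as_fraction(ratio6) ** 2 + as_fraction(ratio7) ** 2, 4))  # 0.0561
print(round(as_fraction(ratio8) ** 2, 4))                             # 0.0561
print(round(as_fraction(ratio8) ** 2 + as_fraction(ratio9) ** 2, 4))  # 0.9999
```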
Formula 5 above suggests the solution to the problem of how to characterize the contribution of the various components to the total variation in the product measurements. For example, the repeatability contribution to the total variation could be estimated by:
Honest ratio 6:

$$\frac{\text{Repeatability}^2}{\text{Total Variation}^2}$$
which is 2.45 percent rather than the 15.65 percent erroneously found earlier. The reproducibility contribution to the total variation could be estimated by:
Honest ratio 7:

$$\frac{\text{Reproducibility}^2}{\text{Total Variation}^2}$$
which is 3.16 percent rather than the 17.77 percent erroneously found earlier. The combined repeatability and reproducibility contribution to the total variation could be estimated by:
Honest ratio 8:

$$\frac{\text{Combined R\&R}^2}{\text{Total Variation}^2}$$
which is 5.61 percent rather than the 23.68 percent erroneously found earlier. We also should note that this 5.61 percent is the sum of the 2.45 percent and the 3.16 percent, which is what we expect when computing proportions. The product variation contribution to the total variation could be estimated by:
Honest ratio 9:

$$\frac{\text{Product Variation}^2}{\text{Total Variation}^2}$$
which is 94.38 percent rather than the 97.15 percent erroneously found earlier. When this value is added to the combined R&R percentage we effectively get 100 percent. Now we have correctly accounted for the various components of the variation in the product measurements.
Honest Ratio 9 is the traditional measure of relative utility introduced by Sir Ronald Fisher in 1921. In this context it is commonly known as the intraclass correlation. Here the value of 0.9438 tells us that more than 94 percent of the variation in the product measurements is attributable to the variation in the product stream, and less than 6 percent of the variation in the product measurements is attributable to variation in the measurement system (combined R&R). Thus, this one number summarizes the essential information contained in all four of the honest ratios above and provides a characterization of the relative utility of the measurement system for measuring a particular product.
Since any attempt to use ratios 6 through 9 from the AIAG gauge R&R study is a violation of the Pythagorean Theorem, you will need to convert the ratios into the correct quantities shown as honest ratio 6 through honest ratio 9 before attempting to interpret or use them in any manner. No exceptions. No loopholes. No excuses.
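Because each of ratios 6 through 9 is a standard deviation divided by the total variation, the conversion amounts to squaring the fraction. A minimal Python sketch follows, with a hypothetical function name. It reproduces the honest values given above from the AIAG percentages. It applies only to ratios whose denominator is the total variation, not to the tolerance-based ratios.

```python
def honest_ratio(aiag_percent):
    """Convert one of the AIAG ratios 6-9 (a percentage of the total
    variation) into the matching honest ratio (a share of total variance).
    Valid only for ratios whose denominator is the total variation."""
    fraction = aiag_percent / 100.0
    return fraction ** 2


reported = {"ratio 6": 15.65, "ratio 7": 17.77, "ratio 8": 23.68, "ratio 9": 97.15}
for name, pct in reported.items():
    print(name, round(100 * honest_ratio(pct), 2))
# ratio 6 2.45 / ratio 7 3.16 / ratio 8 5.61 / ratio 9 94.38
```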
The ratios in formulas 6 through 9 were added to the General Motors gauge R&R study in its 1984 revision. Prior to 1984 there were only three ratios. These compared the repeatability (formula 1), the reproducibility (formula 2), and the combined R&R (formula 3) to the specified tolerance.
Assume that the specifications for the gaskets of figure 1 are 145 mils to 225 mils, so the specified tolerance is USL - LSL = 80 mils. How much of this specification range is “lost” due to measurement error? In an attempt to answer this question, the GM gauge R&R study multiplied the repeatability by 5.15 (presumably to account for 99 percent of the measurement error), divided by the specified tolerance, and then multiplied by 100 to express the result as a percentage. In an attempt to account for more than 99 percent of the measurement error, the AIAG gauge R&R study eventually changed the initial multiplier from 5.15 to 6.00. Thus, the following ratio is said to represent the amount of the specified tolerance that is consumed by repeatability:

$$100 \times \frac{6 \times \text{Repeatability}}{\text{USL} - \text{LSL}} = 100 \times \frac{6 \times \text{Repeatability}}{80} \approx 28\% \tag{10}$$
In a similar manner, reproducibility is said to consume the following amount of the specified tolerance:

$$100 \times \frac{6 \times \text{Reproducibility}}{\text{USL} - \text{LSL}} \tag{11}$$
However, the combined repeatability and reproducibility is said to consume the following amount of the specified tolerance:

$$100 \times \frac{6 \times \text{Combined R\&R}}{\text{USL} - \text{LSL}} \approx 43\% \tag{12}$$
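The tolerance-based ratios can be sketched in Python the same way. The function name is hypothetical. The specification limits are the gasket limits used in this column. The repeatability and reproducibility values are hypothetical (3 and 4 mils), chosen so that formula 3 gives a combined R&R of exactly 5 mils. Ratios 10 and 11 then sum to far more than ratio 12.

```python
def percent_of_tolerance(sigma, lsl, usl, multiplier=6.0):
    """Ratios 10-12 as computed in the AIAG study: multiplier * sigma over the
    specified tolerance, times 100. The older GM study used multiplier=5.15."""
    return 100.0 * multiplier * sigma / (usl - lsl)


LSL, USL = 145.0, 225.0  # gasket specifications from this column, in mils

# Hypothetical sigmas: 3 and 4 combine (formula 3) to exactly 5.
r10 = percent_of_tolerance(3.0, LSL, USL)   # 22.5
r11 = percent_of_tolerance(4.0, LSL, USL)   # 30.0
r12 = percent_of_tolerance(5.0, LSL, USL)   # 37.5
print(r10 + r11, r12)                        # 52.5 37.5
```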
Once again, these three ratios suffer from the same lack of additivity that corrupted the ratios in formulas 6, 7, 8, and 9. Since these ratios cannot all be right, they must all be wrong. Moreover, since reproducibility cannot occur except in conjunction with repeatability, the ratio in formula 11 is not only incorrect, but it also would not make any sense even if it were correct.
As we saw in my columns in June and July, “Is the Part in Spec?,” and “Where Do Manufacturing Specifications Come From?,” the appropriate way to adjust the specifications for measurement error is to use manufacturing specifications. When we use the approach given there we find that, based on the repeatability alone, the 99 percent manufacturing specifications are 152 mils to 218 mils, which represent a loss of only 17 percent of the specified tolerance, rather than the 28 percent erroneously computed above.
Using the combined R&R, the 99 percent manufacturing specifications become 156 mils to 214 mils, which represents a loss of 27 percent rather than the 43 percent erroneously computed above.
Thus, as with the erroneous ratios in formulas 6 through 9 that are based on the total variation, the AIAG ratios in formulas 10, 11, and 12 that are based on the specified tolerance also overstate the damage due to measurement error. This point is especially important in light of the widespread use of the precision to tolerance (P/T) ratio. The P/T ratio for the data of figure 1 is:

$$P/T = \frac{6 \times \text{Combined R\&R}}{\text{USL} - \text{LSL}} \approx 0.43$$
which is the same as the erroneous ratio from formula 12 from the AIAG study divided by 100. And just like ratio 12, the P/T ratio overstates the damage due to measurement error without providing any useful information on how to adjust the specifications for measurement error.
Finally, the AIAG gauge R&R study computes a quantity it calls the “number of distinct categories (ndc)” using the formula:

$$\text{ndc} = 1.41 \times \frac{\text{Product Variation}}{\text{Combined R\&R}} \tag{13}$$
The AIAG manual suggests rounding this value off to an integer, and that values of 5 or greater are “good.” (Simple algebra will show that this value is inconsistent with the guidelines given below, but inconsistencies of this sort are found throughout the AIAG manual.)
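Here is a sketch of the “simple algebra” in Python. The function names are hypothetical. Since Product Variation²/Combined R&R² equals ρ/(1 - ρ), where ρ is the intraclass correlation (honest ratio 9), any ndc value fixes ρ and therefore fixes ratio 8. Under this mapping, an ndc of 5, labeled “good,” corresponds to a ratio 8 of about 27 percent. The guidelines listed below would call that value marginal.

```python
import math

NDC_FACTOR = 1.41  # the AIAG constant, approximately the square root of 2


def ndc(product, combined):
    """Formula 13: 1.41 * product variation / combined R&R (before rounding)."""
    return NDC_FACTOR * product / combined


def ratio8_for_ndc(target):
    """Ratio 8 (percent) implied by a given ndc value.

    Uses Product^2 / Combined^2 = rho / (1 - rho), where rho is the
    intraclass correlation, and ratio 8 / 100 = sqrt(1 - rho).
    """
    k = (target / NDC_FACTOR) ** 2   # Product^2 / Combined^2
    rho = k / (1.0 + k)              # intraclass correlation
    return 100.0 * math.sqrt(1.0 - rho)


print(round(ratio8_for_ndc(5), 1))  # 27.1
```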
The citation given in the AIAG manual for formula 13 is the first edition of Evaluating the Measurement Process by myself and Richard Lyday. In that text we defined a quantity which we called the classification ratio, and the formula above does provide an estimate of this classification ratio. However, nowhere in that text did we ever suggest that this ratio would define the number of distinct categories.
Unfortunately, as I have discovered after much effort, there is no simple interpretation for the classification ratio in practice. While the classification ratio does approximate the relative sizes of the major and minor axes in the intraclass correlation plot, this plot does not lend itself to any practical interpretation. So while the ratio given in formula 13 does describe one aspect of a particular plot of the data, neither that plot nor formula 13 provides any practical characterization of the number of distinct categories for the product.
In spite of the many revisions made to the gauge R&R study over the years, the guidelines for interpreting the proportions that are not really proportions have remained unchanged. Whether they were applied to the ratios in formulas 6, 7, or 8, or to the ratios in formulas 10, 11, or 12, the guidelines have always been:
• Ratios that are less than 10 percent are said to be good.
• Ratios that fall between 10 percent and 30 percent are said to be marginal.
• Ratios that exceed 30 percent are said to be unacceptable.
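The guidelines reduce to a simple threshold rule, sketched below in Python with a hypothetical function name. The bullets do not say which side the boundaries fall on. Treating exactly 10 and exactly 30 percent as marginal is an assumption. The loop applies the rule to the ratio values reported in this column (ratios 10 and 12 rounded to whole percents).

```python
def aiag_verdict(percent):
    """Apply the AIAG guidelines to a ratio expressed as a percentage.
    Boundary handling (10 and 30 count as marginal) is an assumption."""
    if percent < 10:
        return "good"
    if percent <= 30:
        return "marginal"
    return "unacceptable"


# Ratios reported in this column (6, 7, 8 and the rounded 10 and 12).
for name, pct in [("ratio 6", 15.65), ("ratio 7", 17.77), ("ratio 8", 23.68),
                  ("ratio 10", 28.0), ("ratio 12", 43.0)]:
    print(name, aiag_verdict(pct))
```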
Using these guidelines we would interpret the ratios computed earlier, as shown in figure 3.
Here we find that the measurement system used to measure the gaskets in figure 1 is simultaneously marginal, unacceptable, and good! Any questions?
While the ratios 6, 7, and 8 have the same numerators as ratios 10, 11, and 12 respectively, they have different denominators. In spite of these different denominators, the same guidelines are used for each set of ratios in the AIAG Gauge R&R Study. This is presumably one of the benefits of using guidelines that have absolutely no contact with reality. Since no justification for these guidelines exists, they can be adapted to fit any set of ratios that might be computed. After all, when attempting to interpret patent nonsense, it is always best to use arbitrary guidelines.
The inconsistency of the results in figure 3 is an inherent feature of the AIAG gauge R&R study. Once you ignore the Pythagorean Theorem, mere guidelines are not going to remedy the situation. After finding reasonable estimates for repeatability, reproducibility, combined R&R, product variation, and the total variation with formulas 1 through 5, the eight ratios (6 through 13) computed in the AIAG gauge R&R study are erroneous, fallacious, naive, incorrect, and, to put it quite simply, wrong.
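In my December column, we looked at how the intraclass correlation coefficient could be used to define four classes of process monitors. To see how the gauge R&R guidelines compare with the four classes of process monitors, we need to begin by noting that the ratio in formula 8 is 100 times the following:

$$\frac{\text{Combined R\&R}}{\text{Total Variation}} = \sqrt{1 - \frac{\text{Product Variation}^2}{\text{Total Variation}^2}} = \sqrt{1 - \text{Intraclass Correlation}}$$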
Thus, if we plotted values for ratio 8 on one scale (in ascending order), and plotted values for the intraclass correlation on another scale (in descending order) we could connect equivalent points on the two scales to get figure 4.
The AIAG gauge R&R guidelines will pronounce a measurement system to be good only when it has an intraclass correlation in excess of 0.99; marginal measurement systems will have an intraclass correlation between 0.91 and 0.99; and everything else is “in need of improvement.” Contrast this to the results from last month’s column which are summarized in figure 5. There we see that even third-class monitors can be used to track process improvements.
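The correspondence between the two scales can be sketched in a few lines of Python. The function names are hypothetical. The sketch uses only the identity that ratio 8 divided by 100 equals the square root of one minus the intraclass correlation. It recovers the 0.99 and 0.91 cutoffs quoted above from the 10 percent and 30 percent guidelines.

```python
import math


def icc_from_ratio8(ratio8_percent):
    """Ratio 8 / 100 = Combined R&R / Total Variation = sqrt(1 - rho)."""
    return 1.0 - (ratio8_percent / 100.0) ** 2


def ratio8_from_icc(rho):
    """Inverse mapping: the ratio 8 percentage implied by an intraclass correlation."""
    return 100.0 * math.sqrt(1.0 - rho)


for cutoff in (10, 30):
    print(cutoff, round(icc_from_ratio8(cutoff), 2))  # 10 -> 0.99, 30 -> 0.91
```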
When we compare the results in figures 4 and 5, it becomes clear that the main purpose of the AIAG gauge R&R study is to condemn the measurement process. To achieve this end the summary ratios from formulas 6 through 12 are inflated so as to overstate the damage of measurement error while the guidelines used to interpret these inflated ratios are excessively conservative.
So what can you learn from the AIAG gauge R&R study? Virtually nothing that is true, correct, or useful! You have taken the time and gone to the trouble to collect good data, and then you have wasted the information contained in those data by performing a hopelessly flawed analysis.
The Pythagorean Theorem would need to be repealed before the summary ratios 6 through 12 of the AIAG gauge R&R study could even begin to make sense. Since that is unlikely to happen anytime soon, you need to stop using the erroneous ratios of the AIAG gauge R&R study.
The number of distinct categories value from ratio 13 does not represent anything that can be expressed in practical terms. So even though I may be the author of this ratio, it is useless in practice. I personally quit using it back in the 1980s. I suggest that you do the same, starting immediately.
The erroneous ratios 10, 11, and 12 overstate the impact of measurement error upon specifications. Manufacturing specifications provide a way to adjust the specifications to make allowances for measurement error that is both theoretically sound and easy to implement. The use of manufacturing specifications will eliminate the need to compute the erroneous ratios 10, 11, and 12 as well as the related precision to tolerance ratio.
Honest ratio 6 through honest ratio 9 replace the erroneous ratios 6 through 9. Honest ratio 9 is an estimate of the traditional, theoretically sound, intraclass correlation coefficient which was introduced in my July column and explained more fully in my December column.
Because many software packages currently give these nonsense ratios as part of any “measurement system analysis,” you will need to identify these fallacious numbers on the output in order to avoid being misled by them.
This paper and my earlier papers provide you with practical, theoretically sound, and easy-to-use alternatives to the nonsense ratios of the AIAG gauge R&R study. They allow you to learn and use the correct formulas for evaluating the measurement process. Further documentation and explanation can be found in my book EMP III: Evaluating the Measurement Process and Using Imperfect Data.
However, if you still want to obfuscate and confuse the issues, if you want to needlessly condemn measurement systems, and if you want to continue to beat vendors over the head, then by all means continue to use the AIAG gauge R&R study.