Gary Phillips


The Basics of Gauge R&R

Collecting the right data and getting correct results

Published: Tuesday, November 4, 2014 - 10:11

Every manufacturing company that gets audited, anywhere in the world, is required to do gauge repeatability and reproducibility (R&R) studies. In some cases, this one study is the only chance to find unknown problems with measurement quality. (When problems do occur, they are often downstream from calibration.) It’s important to understand gauge R&R studies clearly and to make the most of them. This article will explain the details and show how to perform gauge R&R studies using a popular software product.

The basics

A gauge R&R study measures the sources of variation in a production measuring process; the primary ones are repeatability and reproducibility. The purpose is to confirm that variation is not excessive, or to take action if it is. A study is required for each production measuring process but not for each gauge: you might have 3,000 gauges and only 200 production measuring processes.

Who does it?

Choose three people who do the measurements in production. These might be production people, quality inspectors, or lab technicians, depending on the situation. (During the tryout phase of a new part, you may have to use substitutes for the people who will actually do the measurements in the future.)

It doesn’t matter who collects the data. A calibration technician would often be a good choice. He or she would serve as a resource to answer questions and would have access to gauge R&R software.

What do they have to know?

The people who do the measurements have to know how to measure, of course. They must also never know which part they are measuring. The parts have to be temporarily numbered in a way that keeps the number hidden from the person taking the reading. Operators can often influence a gauge to a considerable extent, and just knowing what reading to expect will tend to reduce the observed variation. This can happen even when an operator is consciously trying not to influence the result.

Setting up the study

Normal sample sizes are 10 parts, three operators, and three trials, for a total of 90 measurements. Smaller sample sizes can be used if there is a reason: for example, if you have only eight parts and two operators, or if the trials are very expensive.

Collecting the data

The person collecting the data should present the parts in random order, but record the measurements according to the temporary part number. Figure 1 shows data for a typical gauge R&R study entered into GAGEtrak software.
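The presentation order described above can be sketched in a few lines of Python. This is only an illustration, not part of GAGEtrak: the function name `build_run_order`, the operator labels, and the choice to reshuffle the parts for every operator and trial are assumptions made for the example.

```python
import random

def build_run_order(n_parts=10, operators=("A", "B", "C"), n_trials=3, seed=None):
    """Return (trial, operator, part_number) tuples in presentation order.

    Assumption: each operator measures every part once per trial, and the
    parts are reshuffled each time so the operator cannot anticipate which
    part is being handed over.
    """
    rng = random.Random(seed)
    runs = []
    for trial in range(1, n_trials + 1):
        for operator in operators:
            parts = list(range(1, n_parts + 1))  # temporary part numbers
            rng.shuffle(parts)
            runs.extend((trial, operator, part) for part in parts)
    return runs

runs = build_run_order(seed=1)
print(len(runs))  # 90 runs for the default 10 x 3 x 3 study
```

The data collector works down this list, hands over the part with the hidden number, and records the reading against that temporary part number.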

Choosing a calculation method

There are three ways to calculate gauge R&R results. The most familiar method is not the best. The familiar method is called “average and range,” or “long AIAG.” This method is intended for spreadsheets or pocket calculators, but it is not recommended for professional software. The average and range method assumes that an error term called “appraiser × part interaction” equals zero. If this assumption is not true (and it sometimes isn’t), then the calculations will not be reliable. A second method is called “range,” or “short AIAG.” It is reserved for special situations. This article will use the workhorse method called “ANOVA,” which stands for analysis of variance. When using computer software, we should typically choose ANOVA.
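To make the ANOVA method more concrete, here is a minimal sketch of the usual two-way crossed ANOVA variance-component calculation, assuming a balanced study (every operator measures every part the same number of times). The function name `grr_anova` and the synthetic data are assumptions for illustration; commercial software, including the package used in this article, may differ in details such as whether a nonsignificant interaction term is pooled into repeatability.

```python
from statistics import fmean

def grr_anova(data):
    """Variance components for a balanced crossed study.

    data[i][j][k] is the reading for part i, operator j, trial k.
    """
    p, o, r = len(data), len(data[0]), len(data[0][0])
    grand = fmean(x for part in data for cell in part for x in cell)
    cell_mean = [[fmean(cell) for cell in part] for part in data]
    part_mean = [fmean(row) for row in cell_mean]
    oper_mean = [fmean(cell_mean[i][j] for i in range(p)) for j in range(o)]

    # Sums of squares
    ss_part = o * r * sum((m - grand) ** 2 for m in part_mean)
    ss_oper = p * r * sum((m - grand) ** 2 for m in oper_mean)
    ss_cell = r * sum((m - grand) ** 2 for row in cell_mean for m in row)
    ss_inter = ss_cell - ss_part - ss_oper
    ss_error = sum((x - cell_mean[i][j]) ** 2
                   for i in range(p) for j in range(o) for x in data[i][j])

    # Mean squares
    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))
    ms_error = ss_error / (p * o * (r - 1))

    # Variance components; negative estimates are set to zero
    repeatability = ms_error
    interaction = max(0.0, (ms_inter - ms_error) / r)
    reproducibility = max(0.0, (ms_oper - ms_inter) / (p * r))
    part_to_part = max(0.0, (ms_part - ms_inter) / (o * r))
    grr = repeatability + reproducibility + interaction
    return {
        "repeatability": repeatability,
        "reproducibility": reproducibility,
        "interaction": interaction,
        "grr": grr,
        "part_to_part": part_to_part,
        "total": grr + part_to_part,
    }

# Synthetic data (not from figure 1): 10 parts, 3 operators, 3 trials,
# with identical operators and a small spread between repeated trials.
demo = [[[part + offset for offset in (-0.1, 0.0, 0.1)] for _ in range(3)]
        for part in range(10)]
for name, variance in grr_anova(demo).items():
    print(f"{name}: {variance:.4f}")
```

The results are variances. Taking square roots gives the standard deviations that are compared with the tolerance or with total variation.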

Evaluating the results

Figure 2 evaluates the results in two different ways.

The “% of Tol” column evaluates the measurement process in terms of capability to determine whether parts meet tolerance. GRR% of Tol = 13.5%, which is “fairly good.” GRR is the combined uncertainty (i.e., variation) including repeatability on production parts, reproducibility, and appraiser × part interaction. GRR is summed by a special method called RSS (root sum square). The individual variables are described following figure 2.

The “% of TV” column evaluates the measurement process in terms of capability to detect changes in total variation (TV, an estimate of process variation). GRR% of TV = 32.2%, which is not acceptable.

Therefore, if we need a gauge to use for experiments to reduce process variation, we should choose a different gauge for that purpose. If we need a gauge only to determine whether parts meet tolerance, this gauge will likely be adequate.
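The two evaluations above can be sketched as follows. This is a hedged illustration rather than the software's actual calculation: the function names, the 6-sigma spread used for the tolerance comparison, and the 10 percent and 30 percent cutoffs (a commonly cited guideline) are assumptions, and the standard deviations in the example are made up.

```python
import math

def rss(*components):
    """Root sum square: combine independent standard deviations."""
    return math.sqrt(sum(c * c for c in components))

def pct_of_tolerance(sigma_grr, tolerance, spread=6.0):
    """GRR as a percentage of the tolerance width (assumed 6-sigma spread)."""
    return 100.0 * spread * sigma_grr / tolerance

def pct_of_total_variation(sigma_grr, sigma_part):
    """GRR as a percentage of TV, where TV = rss(GRR, part-to-part)."""
    return 100.0 * sigma_grr / rss(sigma_grr, sigma_part)

def verdict(pct, marginal=10.0, reject=30.0):
    """Classify a GRR percentage using assumed 10% / 30% cutoffs."""
    if pct > reject:
        return "not acceptable"
    if pct >= marginal:
        return "marginal"
    return "acceptable"

# Hypothetical standard deviations, in the same units as the tolerance
sigma_grr = rss(0.010, 0.004, 0.0)  # repeatability, reproducibility, interaction
tol_pct = pct_of_tolerance(sigma_grr, tolerance=0.50)
tv_pct = pct_of_total_variation(sigma_grr, sigma_part=0.030)
print(f"% of Tol: {tol_pct:.1f} ({verdict(tol_pct)})")
print(f"% of TV:  {tv_pct:.1f} ({verdict(tv_pct)})")
```

Because the components combine by RSS, their percentages do not add up linearly. The same gauge can also pass one test and fail the other, as it does in figure 2.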

Description of variables

Repeatability: This is variation that is observed when one or more operators repeat the same measurement, on the same part and characteristic, using the same gauge. In figure 1, repeatability will be based on 30 sets of three readings: Using the same gauge, three operators will measure the same dimension on 10 different part samples, three times each. Figure 2 shows the results. Repeatability establishes the upper control limits, within which some other variables may default to zero. This particular measure of variation does not distinguish between operators, and repeatability is not always influenced by human (operator) variation. To see whether human variation may be a repeatability issue, view the software’s “repeatability range control chart.”

Reproducibility: This is additional variation that is observed when multiple operators are unable to reproduce the same test-group average within limits predicted by repeatability. In figure 1, reproducibility will be based on one average for each of three operators. Figure 2 shows that the averages vary within limits.

Appraiser × part interaction: This is additional variation that is observed when multiple operators are unable to reproduce the same pattern of part variation within limits predicted by repeatability. In figure 1, interaction will be based on 10 part-averages for each of three operators. Figure 2 shows that operators agree on which parts are larger and which are smaller.

Part-to-part: This is either the actual variation (% TV column), or the allowable variation (% Tol column), in the test parts the gauge is trying to measure.

Interpreting gauge capability measures

A few companies prefer to use “number of distinct categories” (ndc) instead of GRR%. We can visualize ndc as “categories” in an imaginary histogram. As GRR% gets smaller, the categories also get smaller, and there is room for more categories. Using ndc will make no difference to acceptance decisions, with one potential exception: Users of ndc may choose to define “not acceptable” as ndc less than five categories. In that case, the corresponding rejection value would be GRR% more than 27 percent.
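The link between ndc and GRR% can be checked with a short calculation. The sketch below assumes the commonly used definition ndc = √2 × (part variation ÷ GRR), often rounded to 1.41 in practice, and a GRR% expressed against total variation. The function names are illustrative only.

```python
import math

def ndc_from_grr_pct(grr_pct_of_tv):
    """Number of distinct categories implied by a given GRR% of TV."""
    g = grr_pct_of_tv / 100.0
    pv_over_grr = math.sqrt(1.0 - g * g) / g  # since TV^2 = GRR^2 + PV^2
    return math.sqrt(2.0) * pv_over_grr

def grr_pct_for_ndc(ndc):
    """GRR% of TV at which the (unrounded) ndc equals the given value."""
    return 100.0 / math.sqrt(1.0 + ndc * ndc / 2.0)

print(f"GRR% of TV at ndc = 5: {grr_pct_for_ndc(5):.1f}")
print(f"ndc at GRR% of TV = 32.2: {ndc_from_grr_pct(32.2):.2f}")
```

Under these assumptions, ndc = 5 corresponds to a GRR% of TV of about 27 percent, which matches the rejection value quoted above.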

For those who like the idea of visualizing, GRR% of tolerance can also be visualized. Think of two “zones of doubt.” Within these zones, the gauge can make mistakes (i.e., the measurement and the “true value” can be on opposite sides of the limit). The zones of doubt will tend to be centered on each specification limit. Zone width in the current example is 13.5 percent of tolerance for each zone.

Illustration of reproducibility and “appraiser × part interaction”

In figure 2, appraiser × part interaction is 0.0 percent, meaning that the pattern of part variation, as measured by each operator, is approximately the same.

Figure 4 shows what this looks like. Reproducibility is also 0.0 percent, meaning that the three patterns have approximately the same average. How close the averages have to be is determined by repeatability.

By contrast (not using data from figure 1), figure 5 has both variation from interaction and variation from reproducibility. See figure 6 for the corresponding calculated results.

Reproducibility and interaction are both about how operator variation can increase measurement variation. Of course, the solution might be to get a gauge that is less sensitive to operator variation. The company that contributed the data for figure 5 had used two experienced operators, and one “volunteer” for the study. Also, the gauge did not control location of the measurement.


With today’s software, we only need to collect the data and push a button. With the information in this article, you will be able to get the right data and interpret the results.

Software used for this article is GAGEtrak Calibration Management Software furnished by CyberMetrics.

Gary Phillips is a measurement systems consultant for CyberMetrics, a Quality Digest content sponsor.


About The Author


Gary Phillips

Gary Phillips has consulted for several years on measurement systems analysis, quality engineering, and quality standards. Previously, he was a quality manager at GM’s Cadillac division. Phillips can be reached through CyberMetrics at 1-800-777-7020 or qmd@cybermetrics.com.


Statistical justification

Is there any formal statistical justification for the statement, "Normal sample sizes are 10 parts, three operators, and three trials for a total of 90 measurements"?

Thank you

Gage R & R

We make a product that is very hard to measure. We use tape measures, but it is very easy to get whatever result you want out of them, and three people could easily come up with different results. My question is this: do you need to use the product you are measuring to conduct the R&R? You are measuring the measurement system, so couldn't you just measure blocks of wood? Thanks.

Sampling for a GRR

In my study of GRR, the range of parts you choose for the 10 samples greatly affects the ndc or %GRR. My opinion has been that I want my gauge to be able to tell the difference between good and no-good parts, so I have chosen parts that range from out of tolerance low to out of tolerance high. Recently I have seen posts saying that my sampling should be random according to the current process, but I don't think this tests the gauge's ability to tell good from no-good.

What is your opinion of manufacturing the study to include out of tolerance parts?

Alternative Measures of Gauge Capability

The previous comment raises non-basic, but interesting questions; you are encouraged to use whichever gauge capability ratio is your personal preference. You don’t have to use GRR%. The software used in this case presents 8 different choices out of a dozen or so possibilities. If I may use Dr. Wheeler for example, and assume that Dr. Wheeler still prefers to use “intraclass correlation coefficient” (ICC); when evaluating Figure 2 in the article, he would look at the 8 choices and choose “PV % Contribution,” (because ICC isn’t there, and PV % Contribution is numerically equivalent to ICC).

PV % Contribution is locked in a mathematical relationship with GRR% and with all of the other choices. GRR% of TV = 30% is always precisely equivalent to PV % Contribution = 91%. In Figure 2, users of GRR% of TV will reject because GRR% exceeds 30%, and users of PV % Contribution will reject because PV % Contribution is less than 91%. The two groups will use different statistics to arrive at identical decisions.
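The relationship quoted above follows from the fact that the variance contributions add up to 100 percent, while GRR% of TV is a ratio of standard deviations. A minimal sketch, assuming GRR% is expressed against total variation (the function name is illustrative):

```python
def pv_pct_contribution(grr_pct_of_tv):
    """PV % Contribution implied by GRR% of TV.

    GRR % Contribution is the square of the GRR fraction of TV, and the
    GRR and PV contributions sum to 100 percent.
    """
    grr_contribution = grr_pct_of_tv ** 2 / 100.0
    return 100.0 - grr_contribution

print(pv_pct_contribution(30.0))            # 91.0
print(round(pv_pct_contribution(32.2), 1))  # 89.6
```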

Automotive customers don’t care which way you do it. Either way, they get the level of protection specified in the contract.

    Gary Phillips

% contribution column

The only column that is useful in this study is the % contribution one which is calculated using variances not std deviation. %GRR + %PV HAS TO EQUAL 100%.

If you want to "get correct results," please read Wheeler's article on GRR studies.

There's a lot more to understanding how to perform useful statistical studies than pressing a button using some software package.