
James Wells  |  02/23/2010


Gauge R&R for Transactional Six Sigma Projects

Measuring the unmeasurable

How many times has this happened to you? You’re leading a Six Sigma project on a transactional process of some kind, something not directly tied to manufacturing or measurement of product quality. You get to the measure phase of your Six Sigma project and struggle to figure out how to satisfy the requirement for a gauge repeatability and reproducibility (R&R) statistic to interpret. If that’s ever happened to you, read on for a solution to this sticky problem.

Where gauge R&R fits into a Six Sigma project

Before we get into the details, I want to spend a few words talking about where a gauge study fits into a Six Sigma project and a little bit on the “spirit” of the gauge R&R requirement.

Gauge R&R is the second step in the measure phase of the define, measure, analyze, improve, control (DMAIC) process, which is one of the two key methods on which Six Sigma projects are based. Gauge R&R comes after process mapping and building a data collection plan, and before we calculate the baseline capability of the process to be improved. Gauge R&R comes up again in the control phase of DMAIC, where it ensures that we can measure the critical control parameters well enough to maintain the gains we have achieved.

There are good reasons why gauge R&R is placed where it is in the process. A gauge study follows process mapping because we must understand the process we are trying to improve and where the data about that process can be found before we can measure it. A gauge study precedes calculating baseline capability, because we need to be able to ensure that the data is good before we use it.

The reason we want to do a gauge study boils down to confidence and good decision making. In the measure phase, we do a gauge study of the data used to generate the project Y or “critical to quality” (CTQ) measurement, which is the issue that is most important to the customer of the process. Why do we need to have confidence in this data? Because as we carry that data forward to capability analysis and to root cause analysis in the analyze phase, we need to trust the conclusions we draw and the results we see. That’s it: confidence and good decision making.

To understand the importance of a gauge study, imagine your automobile speedometer for a moment. Imagine you’re driving down the road and your speedometer indicates that you’re traveling at 55 miles per hour just as you pass a parked police car. Imagine your surprise if that policeman pulled you over and wrote you a ticket for going 65 miles per hour. It would have been great to know that your speedometer was inaccurate by 10 miles per hour. You might have made a different decision while passing the parked police car.

Back to our initial Six Sigma project problem

We are leading a Six Sigma project with attribute data rather than continuous data measured on a device. What do we do to ensure that we can trust the data, decisions, and conclusions that will follow? Attribute agreement is the answer.

Attribute agreement is a method of comparing the responses made by “appraisers” when judging the characteristic of interest. In an attribute agreement study there are four possible levels of analysis of the responses: appraiser against themselves, appraiser against other appraisers, appraiser against a standard (if one exists), and overall appraiser capability.
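The four levels of analysis can be sketched as simple percent-agreement calculations. The Python below is a minimal illustration, not the procedure any particular statistics package follows. The appraiser names, ratings, standard, and function names are all hypothetical, and it assumes each appraiser judged the same five samples in two trials.

```python
from itertools import chain

# Hypothetical data: 3 appraisers, 2 trials each, 5 samples.
# Category codes: SL = shelf life, EP = experimental product, RS = retained sample.
standard = ["SL", "EP", "SL", "RS", "SL"]
ratings = {
    "A1": [["SL", "EP", "SL", "RS", "SL"], ["SL", "EP", "EP", "RS", "SL"]],
    "A2": [["SL", "EP", "SL", "RS", "SL"], ["SL", "EP", "SL", "RS", "SL"]],
    "A3": [["SL", "SL", "SL", "RS", "EP"], ["EP", "SL", "SL", "RS", "SL"]],
}


def all_same(values):
    """True when every value in the sequence is identical."""
    return len(set(values)) == 1


def within_appraiser(trials):
    """Appraiser against themselves: share of samples rated the same in every trial."""
    per_sample = list(zip(*trials))
    return sum(all_same(s) for s in per_sample) / len(per_sample)


def versus_standard(trials, standard):
    """Appraiser against the standard: share of samples where every trial matches it."""
    per_sample = list(zip(standard, *trials))
    return sum(all_same(s) for s in per_sample) / len(per_sample)


def between_appraisers(ratings):
    """Appraiser against other appraisers: share of samples where all trials of all appraisers agree."""
    every_trial = list(chain.from_iterable(ratings.values()))
    return within_appraiser(every_trial)


def all_versus_standard(ratings, standard):
    """Overall capability: share of samples where every rating matches the standard."""
    every_trial = list(chain.from_iterable(ratings.values()))
    return versus_standard(every_trial, standard)


for name, trials in ratings.items():
    print(name, within_appraiser(trials), versus_standard(trials, standard))
print("between", between_appraisers(ratings))
print("overall", all_versus_standard(ratings, standard))
```

Percent agreement is easy to read, but it does not account for agreement that would happen by chance, which is why a chance-corrected statistic is also reported.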

A Six Sigma project case study

A case study helps explain the tool and how to interpret the results.

A Six Sigma project has been chartered to look into the high occurrence of off-quality product due to expired shelf life. This type of off-quality product typically accumulates to about $1 million annually.

Our data: Classification codes of off-quality reasons from enterprise resource planning

Our problem: Determine whether we can trust that everything classified as shelf life is really a shelf-life issue

Possible choices: Shelf life (SL), experimental product (EP), retained sample (RS)

 

Each appraiser judged the samples twice.

If you’re using Minitab, here are the commands to follow:

Figure 1

 


Figure 2

 


Figure 3

 


Figure 4

 

Once the proper selections have been made, go ahead and conduct the analysis and you’ll get results that look like this:

Figure 5

 

The graph on the left side of figure 5 shows how much each appraiser agrees with their own earlier decisions across successive trials. This graph indicates that we may have a training issue with appraiser No. 3 regarding their understanding of the criteria for the decision. The graph on the right shows the percentage of agreement with the standard, if one exists. (If no standard is chosen, this panel will be blank.) This graph indicates that appraiser No. 2 agrees 100 percent with the standard, while appraisers No. 1 and 3 appear to be somewhat confused about the standard.

Fleiss’ Kappa statistic

Next we move on to interpret the session window statistics, but before we go there, here is a brief explanation of the Kappa statistic.

The basis for the Kappa statistic is a comparison to random chance. Imagine flipping a coin to make a quality decision on a process: That’s random chance. Kappa compares the results gathered through the study with the possibility that those results could be randomly generated, as if by flipping a coin or rolling a die.

Kappa ranges from -1 to +1, with a value of zero indicating random chance. The closer the Kappa statistic gets to 1, the less likely it is that the agreement is due to random chance. Said a different way, the less random-chance-like the results, the more likely that the appraisers (getting back to the Six Sigma project case study) are actually able to discern differences between the categories.
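In general form, a Kappa statistic can be written as below. The symbols are our own shorthand rather than anything from the software output: $P_o$ is the observed proportion of agreement and $P_e$ is the proportion of agreement expected from chance alone.

```latex
\kappa = \frac{P_o - P_e}{1 - P_e}
```

When the observed agreement equals the chance agreement ($P_o = P_e$), Kappa is zero. When agreement is perfect ($P_o = 1$), Kappa is 1.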

Kappa values less than zero indicate agreement that is worse than random chance would generate. It’s sort of the statistical equivalent of the old multiple-choice test-taking advice to answer C when you don’t know the answer: You’ll be right some of the time. A negative value indicates that the appraiser cannot distinguish the categories or is not willing to try.
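To make the chance comparison concrete, here is a minimal Python sketch of the standard Fleiss’ Kappa calculation. It is an illustration under stated assumptions, not a reproduction of any package’s output: the function name and the example ratings are hypothetical, and every sample is assumed to have the same number of ratings.

```python
from collections import Counter


def fleiss_kappa(rows):
    """Fleiss' Kappa for a list of samples.

    rows: one list of category labels per sample, one label per rating.
    Every sample must carry the same number of ratings (at least 2).
    """
    n_samples = len(rows)
    n_ratings = len(rows[0])
    if n_ratings < 2 or any(len(row) != n_ratings for row in rows):
        raise ValueError("each sample needs the same number of ratings (>= 2)")

    totals = Counter()  # how often each category was used overall
    observed = 0.0      # running sum of per-sample agreement
    for row in rows:
        counts = Counter(row)
        totals.update(counts)
        # Fraction of rating pairs on this sample that agree.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_ratings * (n_ratings - 1))

    p_observed = observed / n_samples
    # Chance agreement: sum of squared overall category proportions.
    p_chance = sum((t / (n_samples * n_ratings)) ** 2 for t in totals.values())
    if p_chance == 1.0:
        # Only one category was ever used, so Kappa is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings: two trials by one appraiser on four samples.
perfect = [["SL", "SL"], ["EP", "EP"], ["SL", "SL"], ["RS", "RS"]]
one_miss = [["SL", "SL"], ["EP", "SL"], ["SL", "SL"], ["RS", "RS"]]
always_opposite = [["SL", "EP"], ["EP", "SL"]]

print(fleiss_kappa(perfect))
print(fleiss_kappa(one_miss))
print(fleiss_kappa(always_opposite))
```

The three examples show full agreement, partial agreement, and the less-than-zero case where the ratings disagree more often than chance would predict.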

The hypotheses regarding Kappa are as follows:

  • H0: The agreement within appraiser is due to chance
  • H1: The agreement within appraiser is not due to chance

 

The way to relate the Kappa statistic to a typical gauge R&R result is to subtract Kappa from 1 to get an approximation of a gauge R&R value. So if Kappa is 0.9, subtract 0.9 from 1 and the remainder is 0.1, or 10-percent gauge R&R. This is just a way to translate the Kappa result into terms that Six Sigma Master Black Belts and Black Belts understand. The same rules for interpreting gauge study results apply to attribute studies. Just to refresh, the Automotive Industry Action Group (AIAG) guidelines for acceptability of gauge studies are:

Gauge R&R > 30 percent = unacceptable, measurement process needs improvement

Gauge R&R between 10 percent and 30 percent = marginal, may be acceptable depending on the application, but the measurement system should be improved

Gauge R&R < 10 percent = acceptable

 

Interpret attribute studies using the same rules.
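The translation from Kappa to an approximate gauge R&R figure, and the guideline bands above, can be sketched in a few lines of Python. The function names are hypothetical. Treating exactly 10 percent and exactly 30 percent as marginal is our reading of the thresholds as listed, and the rounding step is there only to avoid floating-point noise at those boundaries.

```python
def approx_gauge_rr_percent(kappa):
    """Rule of thumb from the text: (1 - Kappa), expressed as a percentage."""
    # Rounding removes floating-point noise such as 9.999999999999998.
    return round((1.0 - kappa) * 100.0, 6)


def acceptability(gauge_rr_percent):
    """Classify a gauge R&R percentage using the bands listed above."""
    if gauge_rr_percent < 10:
        return "acceptable"
    if gauge_rr_percent <= 30:  # assumption: both boundaries count as marginal
        return "marginal"
    return "unacceptable"


for kappa in (0.95, 0.9, 0.5):
    percent = approx_gauge_rr_percent(kappa)
    print(kappa, percent, acceptability(percent))
```

A Kappa of 0.9 lands on the 10-percent boundary, so under this reading it is classed as marginal rather than acceptable.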

Below are the statistical results for the two-panel graph shown in figure 5, along with the specific interpretation (figures 6 and 7).

Figure 6

 


Figure 7

 


The final session window section shows the comparison between appraisers. The interpretation is also included:

Figure 8

 

Six Sigma project case study conclusion

The final conclusion from this Six Sigma project case study was that something needed to be done to help the engineers categorize scrap product more consistently. Correcting this one finding reduced the occurrence of the problem by nearly 50 percent and allowed the team to correctly interpret the magnitude of the problem originally stated. Had the attribute-agreement issues not been addressed, the team would have arrived at a vastly different set of solutions.

Use attribute agreement analysis for good decision making

Attribute agreement analysis is an effective method for delivering a statistical interpretation of subjective judgments made by people, allowing fact-based improvements to be identified, implemented, and measured. It allows those leading Six Sigma projects without continuous data to measure the quality of that data, and it boosts confidence in both the capability of the system and the decisions made to improve it.

This article was first published on Six Sigma IQ, a division of IQPC.

About The Author

James Wells

James Wells has been a quality professional for over 12 years, implementing quality management systems compliant with ISO 9001, ISO 14001, TS16949, and FDA cGMP requirements, and lean Six Sigma continuous improvement systems that have delivered $36 million in savings and over 90 percent reduction in defects. Wells is certified by the American Society for Quality as a certified manager of quality/organizational excellence, certified as a Six Sigma Master Black Belt and certified as a lean specialist. He can be reached at www.Qualitypractice.blogspot.com.

Comments

Nice coverage of Attribute Agreement Analysis

Nice job, James, on covering AAA...it's an extremely valuable tool, not only for transactional processes, but for manufacturing. I had one manufacturing client that was losing millions of dollars a year because their "standards" for identifying machining defects, staining, blemishes, etc. were all victims of very low agreement. Companies can save themselves a lot of trouble if they learn this skill.
I do want to suggest, however, that Measurement Systems Analysis, whether it's Gage R&R (for continuous metrics) or AAA for discrete categorical metrics, should be used throughout DMAIC. My own approach is admittedly a little different (see "A DMAIC Makeover" in QP Dec 2008)...I insist on a baseline in Define. Even if you wait to do it in Measure, though, you can't just do it for your baseline data and then not use it for all the different process measures you find in Measure and Analyze. While most measurement error will generally have a minimal impact on SPC charts, it is absolutely vital in more distribution-dependent tools such as hypothesis testing, regression and DOE.