Six Sigma

Published: Tuesday, July 6, 2010 - 08:40

Evidently Steven Ouellette did not like my June column, “Is the Part in Spec?” The adjectives he used were “complicated,” “unhelpful,” “backward,” “confusing,” “unnecessary,” “crazy,” and “disastrous.” (Ouellette’s response, “Know the Process Before Altering Its Specifications,” can be read here.) Yet, before he published his column he had in his possession the full mathematical explanation for the results I presented in that column. Without going into all the calculus, this column will outline the justification for manufacturing specifications and explain their use.

For the record, my June column had nothing to say about the important questions of process performance and measurement system acceptability, yet these are the only two questions addressed in Ouellette’s reply. So to be clear on this point: The only way to avoid shipping some nonconforming product is to avoid making nonconforming product in the first place. To do this, you must have a capable process and then you will need to operate that process predictably and on target. In my books I call this “operating in the ideal state.” Moreover, to track process changes in a timely manner, you will need a measurement system that is at least a “Third Class Monitor.” I will say more on this topic later.

For those who are not operating in the ideal state, there is still inspection, imperfect though it may be. This is where guardbanding is sometimes used. Historically, the various guardbanding schemes have been overly conservative, resulting in unnecessary costs for the supplier. Therefore, back in 1984, I used the appropriate probability models to determine how to create appropriate guardbands. The following is an outline of that argument and a summary of those results.

Let’s begin by letting Y denote the value of an item in the product stream. When we measure this item, we will get some observed value. Denote this value by X. The problem of measurement error is that X is seldom the same as Y. Because we have two variables here, we need to use a bivariate probability model. Moreover, because the normal distribution is the classic distribution for measurement error, we shall use a bivariate normal model. Thus, we can place the product values Y along the vertical axis and our observed values X along the horizontal axis. Our bivariate model creates the ellipse in figure 1. The better the measurement system, the thinner the ellipse and the stronger the correlation between the product values and the observations.

Now consider the experiment of measuring the same item repeatedly. The item being measured will have a value of Y = y. This value of Y will define a horizontal slice through the ellipse. The width of that slice will define a range of measurement values that will occur in conjunction with Y = y. The distribution shown on the horizontal axis and labeled *f(x | y)* defines the conditional probability model for the measurements X given that Y = y. This conditional distribution of X given Y has a mean of:

MEAN(x | y) = y

Thus, the measurements will cluster around the product value. This distribution also has a standard deviation of:

SD(x | y) = σ_{e} = standard deviation for measurement error

The fact that the standard deviation of the conditional distribution of X given Y is the standard deviation of measurement error is the reason that all measurement system studies are built upon repeated measurements of a collection of product samples. However, the distribution of X given Y will not help in answering the question of whether an item is conforming.

When you are standing at the end of the production line holding an item that you have just measured, the question of interest is, “Given this observed value X, is it likely the product value Y is within the specifications?” To answer this question, we begin with a single observed value X = x. This value for X creates a vertical slice through the ellipse and defines a range of product values Y that could have given rise to the observed value X = x. The conditional distribution of Y given that X = x is labeled as *f(y | x)* and shown on the vertical axis in figure 1. This conditional distribution defines the probability model for this range of values for Y. This distribution is a normal distribution with mean of:

MEAN(y | x) = *ρ* x + ( 1 – *ρ*) MEAN(X)

and a standard deviation of:

SD(y | x) = √*ρ* σ_{e}

where *ρ* denotes the intraclass correlation coefficient. (This intraclass correlation coefficient is the square of the correlation between X and Y, and may be interpreted as the correlation between two measurements of the same thing.) The mean of this conditional distribution immediately establishes the intraclass correlation as the metric for use in evaluating the acceptability of a measurement system simply because it defines how the mean value for Y becomes less and less dependent upon the value for X as the intraclass correlation drops.

The conditional distribution of Y given X is the distribution we must use in answering the question, “Is this item likely to be conforming?” Specifically, the probability that the measured item will be conforming may be found by integrating the conditional distribution of Y given X, with respect to Y, between the lower watershed specification limit and the upper watershed specification limit.
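The integral described above can be sketched numerically. This is a minimal illustration, not the calculation behind the figures: it assumes the additive model X = Y + E with independent normal measurement error, for which the conditional standard deviation works out to √ρ σ_e, and the function and parameter names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_conforming(x, mean_x, rho, sigma_e, lower_ws, upper_ws):
    """P(lower_ws < Y < upper_ws | X = x) under a bivariate normal model.

    x        : observed measurement
    mean_x   : mean of the product measurements, MEAN(X)
    rho      : intraclass correlation, 0 < rho <= 1
    sigma_e  : standard deviation of measurement error
    lower_ws, upper_ws : watershed specification limits
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    # Conditional mean from the text: rho * x + (1 - rho) * MEAN(X)
    cond_mean = rho * x + (1.0 - rho) * mean_x
    # Assumed conditional SD for X = Y + E with independent error
    cond_sd = sqrt(rho) * sigma_e
    upper = normal_cdf((upper_ws - cond_mean) / cond_sd)
    lower = normal_cdf((lower_ws - cond_mean) / cond_sd)
    return upper - lower
```

Calling `prob_conforming` with the most extreme value of X that falls inside a set of manufacturing specifications gives the kind of worst-case probability discussed in the rest of this column.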

The integral described in the previous paragraph is going to treat the Y axis as a continuum. In practice, our X values are discrete, with each value rounded off to a specific measurement increment. We have to make an adjustment for this discrepancy between our discrete measurements and the underlying continuum from which they came. According to general practice, specifications are stated in terms of A to B, where both A and B are acceptable values. Say A = 0.7 and B = 1.2, and our measurements are recorded to the nearest 0.1 unit. Under these conditions the first nonconforming values would be 0.6 and 1.3. Thus, our watershed specification values are 0.65 to 1.25. This is the portion of the continuum that corresponds to the acceptable values of 0.7 to 1.2, as shown in figure 2.
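The watershed adjustment is simple arithmetic: move each stated limit outward by half of the measurement increment. A short sketch, with a hypothetical function name, using the values from the paragraph above:

```python
def watershed_specs(lower_spec, upper_spec, increment):
    """Widen stated specs by half a measurement increment on each end."""
    half = increment / 2.0
    return lower_spec - half, upper_spec + half


# Stated specs of 0.7 to 1.2, recorded to the nearest 0.1 unit.
lower_ws, upper_ws = watershed_specs(0.7, 1.2, 0.1)
# Round only for display, to hide floating-point noise.
print(round(lower_ws, 10), round(upper_ws, 10))  # 0.65 1.25
```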

However we define our manufacturing specifications, it should be clear that the most extreme values within them are the ones most likely to represent an item that might be nonconforming. Therefore, we evaluate the different options that follow by looking at the most extreme values for X that fall within the manufacturing specifications. For these values, we evaluate the probability of conforming product using the integral defined above. Of course, these probabilities will vary depending upon the process capabilities. (If that was the point of Ouellette’s article, then he is correct in stating that I did not cover this aspect of the problem in the earlier article. I omitted it for the sake of simplicity.)

If we use the stated specifications as our manufacturing specifications, and if we get an observed value that is either the minimum acceptable value or the maximum acceptable value, and we calculate the probability of conforming product, we will get the curve shown in figure 3 as a function of the process capability.

Thus, without regard for the capability, when you ship using the stated specifications, you can be sure that there is at least a 64-percent chance that the shipped material will be conforming. With a capability of 1.0, this minimum probability goes up to at least 74 percent. With a capability of 2.0, this minimum probability goes up to at least 83 percent. If your customer is happy with these numbers, then guardbanding is not for you. Figure 3 is the basis for saying that the watershed specifications define 64-percent manufacturing specifications.

For those unwilling to live with the risks of figure 3, there is always the option of tightening the specifications by some amount. Most schemes for doing this do not take advantage of the mathematics above, and as a result they end up tightening the specifications too much. In terms of what increment to use in defining different options, I used probable error because it is a function of the standard deviation of measurement error, and it also defines the amount of round-off that is appropriate for the measurements.

Probable Error = 0.675 σ_{e}

First, I considered what would happen if the watershed specifications were tightened by one probable error. Looking at the most extreme observed values within these tightened limits and calculating the probability of conforming product, we get the curve in figure 4.

When the watershed specifications are tightened by one probable error on each end, you will have at least an 85-percent chance of conforming product. With a capability of 1.0, this will go up to at least 91 percent. With a capability of 2.0, this will go up to at least 95 percent. Thus, when your manufacturing specifications are the watershed specifications tightened by one probable error, you will have at least an 85-percent chance of conforming product.

When the watershed specifications are tightened by two probable errors on each end, you will have at least a 96-percent chance of conforming product. With a capability of 1.0, this will go up to at least 97.8 percent. With a capability of 2.0, this will go up to at least 99 percent. This curve is the bottom of the three curves in figure 5. Thus, when your manufacturing specifications are the watershed specifications tightened by two probable errors, you will have at least a 96-percent chance of conforming product.

When the watershed specifications are tightened by three probable errors on each end, you will have at least a 99-percent chance of conforming product. With a capability of 1.0, this will go up to at least 99.6 percent. With a capability of 2.0, this will go up to at least 99.8 percent. This curve is the middle of the three curves in figure 5. Thus, when your manufacturing specifications are the watershed specifications tightened by three probable errors, you will have at least a 99-percent chance of conforming product. Notice that three probable errors will be approximately 2σ_{e}, rather than the more common, but incorrect, 99-percent guardband value of 3σ_{e}.

When the watershed specifications are tightened by four probable errors on each end, you will have at least a 99.9-percent chance of conforming product regardless of your capability. This curve is the top curve in figure 5.

All of these adjustments are much smaller than the traditional values commonly used in guardbanding, which saves the supplier money while providing the protection needed.
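The guardbanding options above reduce to one calculation: widen the stated specifications to the watershed values, then tighten each end by a chosen number of probable errors. The sketch below uses hypothetical names; the lookup table simply restates the minimum probabilities quoted in the text, and the numbers come from the viscosity examples discussed later in this column (readings to the nearest 10 cs, probable error of 37 cs).

```python
# Minimum chance of conforming product quoted in the text, keyed by the
# number of probable errors used to tighten the watershed specifications.
MIN_PROB_CONFORMING = {0: 0.64, 1: 0.85, 2: 0.96, 3: 0.99, 4: 0.999}


def manufacturing_specs(lower_spec, upper_spec, increment, probable_error, n_pe):
    """Watershed specs tightened by n_pe probable errors on each end."""
    half = increment / 2.0
    lower = lower_spec - half + n_pe * probable_error
    upper = upper_spec + half - n_pe * probable_error
    if lower >= upper:
        raise ValueError("guardband leaves no room between the limits")
    return lower, upper


# Stated specs 2,500 +/- 175 cs tightened by two probable errors.
print(manufacturing_specs(2325, 2675, 10, 37, 2))  # (2394.0, 2606.0)
# Stated specs 2,500 +/- 350 cs tightened by four probable errors.
print(manufacturing_specs(2150, 2850, 10, 37, 4))  # (2293.0, 2707.0)
```

Keeping the number of probable errors as a parameter makes the trade-off explicit: each extra probable error raises the minimum probability of conforming product but narrows the range of values that can be shipped.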

As part of operating in the ideal state as a way to guarantee 100-percent conforming product, you must have a measurement system that can give a timely warning of any process excursion. It turns out that the value of the intraclass correlation defines the relationships and therefore provides the appropriate metric for judging the acceptability of a measurement system for a given application.

The intraclass correlation *ρ* defines that proportion of the variation in the measurements that can be attributed to the variation in the product stream. The complement of the intraclass correlation (1 - *ρ*) defines that amount of variation in the measurements that is attributable to the measurement system. The intraclass correlation statistic is commonly computed according to:

Estimated *ρ* = 1 – (estimated variance of measurement error) / (estimated variance of product measurements)

The estimated variance of measurement error would be the square of our estimate of σ_{e} from some measurement error study. The estimated variance of product measurements should be obtained from some within-subgroup measure of dispersion using measurements drawn from the product stream. (Global measures of variation should be avoided here.)

An explanation of what the intraclass correlation is and does is given in my book *EMP III: Evaluating the Measurement Process and Using Imperfect Data* (SPC Press, 2006). The following is a synopsis of the results established there, although the expression of some of these results has been updated here.

Any signal of a change in the production process will be attenuated by measurement error. This attenuation is characterized by:

Attenuation of signals from production process = 1 – √*ρ*

The limits on a process behavior chart will be inflated by measurement error. This inflation can be characterized by:

Inflation of process behavior chart limits = (1 / √*ρ*) – 1

Of course, between the signal attenuation and the inflation of the limits, measurement error will affect the ability of a process behavior chart to detect process changes in a timely manner. The traditional way of characterizing the sensitivity to a signal is to consider the average run length (ARL). (The ARL is the average number of subgroups between the point when a signal occurs and the point when it is detected.) Here we look at a process shift equal to three sigma(Y) and consider using the four detection rules of the Western Electric Zone Tests. Figure 6 gives the ARL as a function of the intraclass correlation for different combinations of detection rules. These ARL curves are shown in figure 7.

These characterizations of how measurement error will affect the ability of a process behavior chart to detect process changes allow us to fully characterize the relative utility of a measurement system for a given application. In doing this we end up with four classes of measurement systems.

When the intraclass correlation is between 1.00 and 0.80, you will have a First Class Monitor. Here any signals from the production process will be attenuated by less than 10 percent. Using detection rule one, the ARL for detecting a three-sigma shift will be less than 2.6 subgroups (compared to 2.0 subgroups for a perfect measurement system).

When the intraclass correlation is between 0.80 and 0.50, you will have a Second Class Monitor. Here, any signals from the production process will be attenuated by 10 percent to 30 percent. Using detection rule one, the ARL for detecting a three-sigma shift will be less than 5.5 subgroups. Using detection rules one, two, three, and four, the ARL for detecting a three-sigma shift will be less than 2.7 subgroups.

When the intraclass correlation is between 0.50 and 0.20, you will have a Third Class Monitor. Here, any signals from the production process will be attenuated by 30 percent to 55 percent. Using detection rules one, two, three, and four, the ARL for detecting a three-sigma shift will be less than 5.7 subgroups.

When the intraclass correlation is less than 0.20, you will have a Fourth Class Monitor. Here, the measurement system is on the ropes and should only be used in desperation. Signals from the production process are attenuated by more than 55 percent, and the ability to detect process signals rapidly vanishes as measurement error completely dominates the observations.
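The four classes can be expressed as a simple lookup on the intraclass correlation. The sketch below makes two assumptions that the text does not state outright: a value that falls exactly on a boundary (0.80, 0.50, or 0.20) is assigned to the better class, and signal attenuation is taken to be 1 – √ρ, which agrees with the 10, 30, and 55 percent figures quoted for the class boundaries. Function names are hypothetical.

```python
from math import sqrt


def monitor_class(rho: float) -> str:
    """Classify a measurement system by its intraclass correlation."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1")
    if rho >= 0.80:  # boundary handling is an assumption
        return "First Class Monitor"
    if rho >= 0.50:
        return "Second Class Monitor"
    if rho >= 0.20:
        return "Third Class Monitor"
    return "Fourth Class Monitor"


def signal_attenuation(rho: float) -> float:
    """Fraction by which a process signal is attenuated (assumed 1 - sqrt(rho))."""
    return 1.0 - sqrt(rho)


for rho in (0.80, 0.50, 0.20):
    print(rho, monitor_class(rho), f"{signal_attenuation(rho):.1%}")
```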

Steven Ouellette used the measurement system described in my June column as the basis for three examples. That measurement system recorded viscosities to the nearest 10 centistokes (cs). The probable error for a single reading was found to be 37 cs, and the standard deviation for measurement error was 54.4 cs. In these examples, Ouellette assumed that single determinations would be used to characterize each batch of product.

In Ouellette’s first example, he postulated specifications of 2,500 ± 175 cs and a process with a capability of 1.00. In computing his watershed specifications, Ouellette made two mistakes. First, he used 0.1 times the probable error for his adjustment, rather than half of the measurement increment, and then he tightened the specifications rather than widening them.

In defining the capability to be 1.00 he defined the standard deviation for the product measurements to be 58.33 cs. This leads to an intraclass correlation of:

*ρ* = 1 – (54.4)² / (58.33)² = 0.13

This means that the measurement system in this example is a Fourth Class Monitor. Only 13 percent of the variation in the product measurements actually comes from variation in the product stream. Here, the ARL for detecting a three-sigma process shift using rule one is 42 subgroups. With all four rules it is still 8 subgroups.
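The arithmetic behind this example can be checked in a few lines. The sketch assumes the usual definition of capability, (upper spec – lower spec) / (6 × standard deviation), and the usual estimate of the intraclass correlation as one minus the ratio of the measurement error variance to the variance of the product measurements. Function names are hypothetical.

```python
def sd_from_capability(lower_spec, upper_spec, capability):
    """Standard deviation implied by a capability ratio (assumed Cp definition)."""
    return (upper_spec - lower_spec) / (6.0 * capability)


def intraclass_correlation(sigma_e, sigma_x):
    """Estimated rho = 1 - (measurement error variance / product measurement variance)."""
    return 1.0 - (sigma_e ** 2) / (sigma_x ** 2)


sigma_x = sd_from_capability(2325, 2675, 1.00)  # specs of 2,500 +/- 175 cs
rho = intraclass_correlation(54.4, sigma_x)     # sigma_e = 54.4 cs
print(round(sigma_x, 2), round(rho, 2))         # 58.33 0.13
```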

So, this measurement system will not detect process changes in a timely manner, but can it be used to decide whether to ship product? The specifications are 2,500 ± 175 = 2,325 to 2,675. Using these stated specifications will allow you to ship virtually all your batches, but all that you can say for certain about the marginal batches is that they have at least a 64-percent chance of conforming. But doesn’t figure 3 show 74 percent for a capability of 1.0? Yes, it does. But with a Fourth Class Monitor, you are not likely to know when your process changes; hence, the minimum likelihood of 64 percent for the marginal batches.

However, guardbanding your specifications by two probable errors will result in manufacturing specs of 2,394 to 2,606. Here, you will have to blend the marginal batches (3.5 percent from each end), but you can assure your customer that the shipped batches have at least a 96-percent chance of conforming to the stated specifications of 2,500 ± 175.

Thus, depending upon what risks you and your customer are willing to take, this Fourth Class Monitor might still be useful in deciding what batches to ship (which I believe was Ouellette’s point). However, the fact that a Fourth Class Monitor cannot track process changes in a timely manner means that this process could go on walkabout, and you would not know it for quite some time. Here, the guardbanding protects you from the limitations of the weak measurement system.

In Ouellette’s second example, the measurement system is still a Fourth Class Monitor with an intraclass correlation of 0.13. However, the specifications were changed to 2,500 ± 350, which boosts the capability up to 2.0. Guardbanding the specs by four probable errors will give manufacturing specs of 2,293 to 2,707. Virtually all the product will get shipped, and even if the process changes you can still assure your customer that the batches you ship have at least a 99.9-percent chance of conforming. Thus, guardbanding protects the supplier and the customer here despite the inability of the measurement system to track process changes.

In Ouellette’s third example, he postulates specifications of 2,500 ± 88 cs and a process with a capability of 0.50. The measurement system is still a Fourth Class Monitor. Using the stated specifications, you will have 64-percent manufacturing specs, and about 13.5 percent of the batches will be rejected and will have to be blended. About 20 percent of the stuff you end up shipping to your customer will have about one chance in three of being nonconforming. This is not a pretty picture, but at least we can quantify the risks inherent in using a weak measurement system with tight specs.

However, if we used the average of four determinations, rather than using a single determination, we could turn this Fourth Class Monitor into a high-end Second Class Monitor. Here the probable error would be:

Probable error for average of four readings = 37 cs / √4 = 18.5 cs

and the intraclass correlation would be:

Now the measurement system can track process changes and also help in improving the production process. Guardbanding the specs by one probable error would result in manufacturing specifications of 2,425 to 2,575. This would increase the number of blended batches from 13.5 percent to about 19.5 percent, but now you could assure your customers that the shipped batches would have at least an 88-percent chance of conforming.
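The effect of averaging on the guardband can be sketched as follows. The only assumption beyond the text is the standard result that the error of an average of n independent readings shrinks by a factor of √n; the function names are hypothetical, and the inputs are the stated specifications of 2,500 ± 88 cs, the 10 cs measurement increment, and the 37 cs probable error for a single reading.

```python
from math import sqrt


def probable_error_of_average(pe_single, n_readings):
    """Probable error for the average of n independent readings."""
    return pe_single / sqrt(n_readings)


def guardbanded_specs(lower_spec, upper_spec, increment, probable_error, n_pe):
    """Watershed specs tightened by n_pe probable errors on each end."""
    half = increment / 2.0
    return (lower_spec - half + n_pe * probable_error,
            upper_spec + half - n_pe * probable_error)


pe_avg = probable_error_of_average(37, 4)
print(pe_avg)                                           # 18.5
print(guardbanded_specs(2412, 2588, 10, pe_avg, 1))     # (2425.5, 2574.5)
```

Recorded to the nearest whole centistoke, these limits correspond to the manufacturing specifications of 2,425 to 2,575 given above.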

All of which reminds me of a conversation with one of my clients, who declared, “We never make any nonconforming product here.”

“Oh, really?” I asked.

“Yeah, if a batch doesn’t qualify, it will always qualify as ‘base fluid’,” the client replied.

“So what does that do for you?” I asked.

“Well, at one point we had a two-year supply of base fluid on hand,” answered the client.

Guardbanding doesn’t solve the problems of bad measurement processes, nor does it make the production process any better. It simply buys a piece of insurance at the point of shipping the product. It can be used with good measurement systems and poor measurement systems. It can be used with processes having small capabilities and also with those having large capabilities. It is a technique for quality assurance, rather than one for quality improvement. Although it is always better to avoid burning the toast, once it is burned it is time to think about how to scrape it.

## Comments

## Manf. Specs

Donald - you did a masterful job explaining this. I read both previous articles, and found this explanation very thorough and complete. Steve made a good point in his comment that your first article could have been misinterpreted by someone without the knowledge base you possess (I almost made the very error counseled against in the latest paper due to my lack of experience).

This article was both helpful and clear, without being overbearing. Thanks for taking time to respond to Steve's critique - it shows your willingness to listen, and shows in part why you have been as successful as you have. The lack of arrogance is refreshing, and the tone of the article was anything but confrontational or defensive; both would have been easy mistakes.

Props to both of you for expanding my knowledge base, and for taking time to explain the intricate thought processes that go into making a decision.

Doug

## Six Sigma Nonsense

Some of my objections are:

The "Belt" mentality

The mythical 1.5 sigma process shift

Does not address the homogeneity question (predictable vs. unpredictable process)

Obsession with the Normal Distribution

Inflated $ money saved claims

Rich DeRoeck

## LOL

I propose the formation of ASSS - the Anti-Six Sigma Society.

## @Rich

My $0.02: see the Wikipedia entry for Six Sigma. I initiated the "Criticism" section a few years ago and took dead aim at the Belt Machine (I earned my SSBB certification from ASQ and let it lapse after 3 years due to the rigor of retaking the test and the hypocrisy of recerting by attending dinner meetings), pointing out that Toyota, IMHO the greatest company in the history of manufacturing, simply did what was necessary without smoke and mirrors. The entry has since evolved to a less acerbic tone than I used, which is just as well b/c Wikipedia ought not to be a flamefest, but the points are all valid.

## Thank you Don. Both of your

Thank you Don. Both of your articles are very clear, useful and easy to understand.

By contrast, Ouellette’s responses are tainted by emotion “crazy”, “disastrous”, perhaps driven by tall poppy syndrome; are vague; and show a lack of understanding. At least Ouellette calls himself a six sigma "heretic", so I can assume he is not quite as deluded as the masses that blindly follow the six sigma nonsense.

## SS "Nonsense"

I made a habit of being outspoken and iconoclastic until I got slapped down one too many times. Now I realize that all companies have the right to mediocrity and most of them excel at it. Even mighty Toyota has stumbled! It's good to see someone else still has some fight in them. While I agree that SS earns the "nonsense" badge (which is not to say it's complete nonsense), I'm curious, ADB, exactly what about it you find objectionable. Regardless of the name or slogans applied, a properly implemented CI effort with valid goals must deliver results. That's a tautology.

## Hi ADB

Good to see your comments again - you might recall our discussions about the usefulness of "sigma" many moons ago (not much if you recall...) I put that debate to rest (I wish) in a sequence of articles a while ago - I hope you caught them!

I am sorry that my article seemed vague to you - have you read the MSA articles that are linked within it? That is where I lay out the way I do MSA and provides the context for that last article. I don't think I lack understanding about Wheeler's approach, I just don't think that it is very useful or practical except in very restricted circumstances. And I (and others I have talked with and had e-mail from) think that last article could have been misleading if you lacked knowledge about Wheeler's greater body of knowledge. Consider how you might have read that article without such expertise. Not everyone may read it the way I did, but I know a lot of people did. See the comments under my article.

As far as being tainted by emotion, well, that is the gadfly persona, I am afraid. If it gets a person to read about a concept that they need to know and otherwise might have skipped, I'll take the hit in karma. At the least, we can hope that someone understands something more about the decomposition of the as-measured variance into measurement vs. product.

And I have no delusions about Six Sigma, I assure you. I re-train Master Black Belts and Black Belts every day. :-) That doesn't mean that I can't agitate for some meaningful change from within, though....shhhhh don't tell anyone though.

## Watershed Specifications

In Wheeler's column "Is the Part in Spec?", there is a section titled Watershed Specifications in which he describes how the manufacturing specifications are adjusted to account for discontinuity as a result of gage resolution. I'm not sure I would go so far as to say Wheeler would consider this equivalent to "widening" the specifications, but the effect is similar in that the numerical value range of the Watershed Specifications is indeed larger than that of the Manufacturing Specifications.

I have read Wheeler's manuscripts from his reading room and I must say there are some very thorough arguments on how traditional education of gage R&R makes a number of "leaps of faith" - Wheeler does a good job of filling those gaps with sound mathematics, presenting the intraclass correlation coefficient as the correct workhorse for characterizing the measurement system in context of the product being judged.

Side note: I think the QD community benefits when experts such as Wheeler and Ouellette spar a bit... here's to both of you for stickin' your necks out for the benefit of the rest of us :)

## Thanks VPSchroder! I (and

Thanks VPSchroder!

I (and I'll be so bold as to speak for Wheeler to say he is too) are scientists at heart, so I hope that I'll be questioning and learning until my grave!

On ICC, you might investigate the history of the P/T ratio before you toss it. It really does answer a different question than the ICC. Nothing wrong with ICC unless you use it as a metric for gauge acceptability.

## Luxury!

Steve apparently has the luxury of seeing lots of processes with Cpks greater than 0.5; my experience is that "most" companies are still pretty far from having this universal. I have worked with many companies where they had no SPC going at all yet. Just getting them to the threshold state (in control, some nonconformance) was a challenge.

## Hi Rip!

Heh, well maybe so. In my experience, most companies are right at the borderline - achieving control usually gets them pretty close to Cpk = 1, and then we start wading into the bushes to reduce variability - always the tough part. I'd say it is about 25-75 in terms of majority of variation coming from the "good ol' measurement system" vs. the majority coming from the process.

## Six Sigma Heretic Responds... :)

Hi all, Steve Ouellette, Six Sigma Heretic here.

First off, I am glad to see Dr. Wheeler add to his earlier article. If he had included the second and third paragraphs in his first article, I would have had much less concern that our readers would have made the erroneous conclusion that I feared they would.

Dr. Wheeler starts off with a strawman argument: that I am arguing with the math. The math is not the issue; what I was arguing against was changing rejection specifications without an understanding of your process. Following that first article without knowing more than was in the article would have led to, yes, disaster.

I will hazard a guess and say that Dr. Wheeler and I would agree that you should tighten the rejection specifications as shown in his original article if:

- you have no clue about your process through time (control)
- you have a measurement system with large measurement error as compared to the specification AND an as-measured Cpk lower than 1
- you have a process where there is large liability associated with making an incorrect classification and a measurement system that varies by some "large" amount as compared to the specification OR the process is out of control

My position though, is that if any of those pertain at best you use the tightened rejection limits for a very limited amount of time until you fix your process. If these situations do not pertain, you are in control and your as-measured Cpk (which includes measurement error) is greater than or equal to 1, tightening the rejection limits only costs you money for protection against an eventuality that is very unlikely to happen.

Heck, even if you have a Cpk of less than 1, with a huge enough %R&R you probably are still not making anything out of spec. The problem is convincing anyone of that...and detecting a real process shift in time to stop it from shipping.

In my experience, most processes today are not like the ones described above. Most industries no longer live in the world of Cpks of 0.5. And if you do, get out! :-)

Oh, and the ICC used for gauge measurement *acceptability* is bogus. Plain and simple. It is the right answer to the wrong question: "Can my gauge detect out-of-control events on the control chart?" The right question for gauge acceptability is: "Can I use this gauge to properly make conformance decisions while controlling risk?" Which is properly informed by (not decided by) %R&R or P/T ratios. (My previous articles too talked about taking an average of readings in order to reduce measurement error - just be sure your measurement system is in control before doing so!)

I think the whole thing hinges on understanding the difference between measurement system capability (comparing measurement error to the spec width, e.g. P/T or %R&R) and measurement system acceptability (can I use the gauge in this application to make conformance decisions), which uses capability as one input.

Ironically, I don't think Dr. Wheeler and I disagree on actions to take most of the time. My concern was his first article as-written told people to do something that probably doesn't make sense for them to do, and didn't clarify when the procedure was useful. Hopefully my article provided a service in doing so.

Oh, and on my "math errors" - loosening the spec in the presence of gauge error doesn't make any sense in any context, and isn't what the article described, so not sure what Dr. Wheeler is referring to here. Feel free to educate me. Guardbanding still isn't needed in the vast majority of processes I see anyway...

## Article and rebuttal

These are robust academic arguments, but I'd like to see these arguments laid out in the context of a real-world scenario with all the complications that can arise in a plant. I'll leave it to the imagination what those are.

## MSA Scenarios

Hmm, that was part of what I was trying to do by extending the viscosity measurement device and describing some of those scenarios. You might check out my MSA articles below for real examples, though I spend more time talking about the rationale and basis for MSA than in fully fleshing out the scenarios.

Letting You In On a Little Secret

http://www.qualitydigest.com/inside/quality-insider-column/letting-littl...

The Mystery Measurement Theatre

http://www.qualitydigest.com/inside/quality-insider-column/mystery-measu...

Performing a Short-Term MSA Study

http://www.qualitydigest.com/inside/quality-insider-column/performing-sh...

Performing a Long-Term MSA Study

http://www.qualitydigest.com/inside/quality-insider-column/performing-lo...

Destructive Gauges and Measurement System Analysis

http://www.qualitydigest.com/inside/quality-insider-column/destructive-g...