Published: Wednesday, September 16, 2015 - 15:36
When considering how good a production process is, it’s important to ask, “Can we expect the output to be fully conforming?” An assessment of process capability can answer this. Data are needed, but how many? Is “30” the right number? This article examines these last two questions.
There’s an old joke about statisticians not knowing the difference between 30 and infinity, and figure 1 should shed light on its origin. Degrees of freedom, shown on the x-axis and hereafter referred to as “d.f.,” help to determine how precise, or “solid,” an estimate of standard deviation is, given its estimated uncertainty (the y-axis).^{1} Figure 1 shows that by the time an estimate of standard deviation is based on 30 d.f., it’s about as precise an estimate as it’s likely to get. (If 30 d.f. aren’t sufficient, getting up to 120 d.f.—a fourfold increase—is necessary to reduce the uncertainty by half.) This is potentially important because an estimate of standard deviation is essential to make an assessment of process capability possible.
Figure 1
Unfortunately, statistical language about d.f. is often met with blank looks and exclusion from the team intent on assessing process capability. Instead, ask, “How many data?” because this will lead to the more important question, “What’s the relationship between the number of data collected, d.f., and estimate uncertainty?”
The classical, global standard deviation statistic (STDEV in Microsoft Excel) applied to all the data in “one go” has d.f. = total number of values minus one. With 30 data, there are 30 – 1 = 29 d.f., and the uncertainty is 13.1 percent. However, this is not the method to use when working with capability.
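The link between d.f. and uncertainty can be sketched numerically. The snippet below assumes the common approximation that the relative uncertainty of a standard deviation estimate is about 1/sqrt(2 × d.f.). That formula is not stated in this article, but it reproduces the 13.1 percent quoted above for 29 d.f. The function names are illustrative only.

```python
import math


def uncertainty_pct(df):
    """Approximate relative uncertainty (%) of an SD estimate with `df` d.f.

    Assumption: coefficient of variation ~ 1 / sqrt(2 * df).
    """
    return 100.0 / math.sqrt(2.0 * df)


def df_for_uncertainty(target_pct):
    """Invert the same approximation: d.f. needed for a target uncertainty (%)."""
    return 1.0 / (2.0 * (target_pct / 100.0) ** 2)


for df in (29, 30, 120):
    print(df, "d.f. ->", round(uncertainty_pct(df), 1), "percent")
```

Under this approximation the uncertainty shrinks with the square root of d.f., which is why halving it takes four times as many d.f. (30 to 120).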
For example, let’s consider the case of individual values, which correctly starts with an assessment of capability using a process behavior chart (i.e., an XmR control chart for individual values). In an estimate of standard deviation from 30 data, there are 18 d.f., not 29, corresponding to an uncertainty of 16.7 percent. This estimate of standard deviation—hereafter called “SD_{within}”—comes from the average moving range method (where SD_{within} = “Average moving range” ÷ 1.128). Figure 2 renders this “detail” into a picture.
Figure 2
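As a minimal sketch of the average moving range method, the snippet below computes SD_{within} and the resulting 3-sigma limits for a chart of individual values. The readings are invented for illustration and are not the article's data; the function name `xmr_summary` is likewise hypothetical.

```python
from statistics import fmean

D2 = 1.128  # bias-correction constant for moving ranges of two successive values


def xmr_summary(values):
    """Return (average, SD_within, lower limit, upper limit) for an X chart."""
    # Moving range = absolute difference between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sd_within = fmean(moving_ranges) / D2
    center = fmean(values)
    return center, sd_within, center - 3 * sd_within, center + 3 * sd_within


# Hypothetical readings, invented for illustration only.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
center, sd, lower, upper = xmr_summary(readings)
print(f"average={center:.3f}  SD_within={sd:.3f}  limits=({lower:.3f}, {upper:.3f})")
```

Note that n values give n - 1 moving ranges, so 30 data yield 29 moving ranges.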
While 18 d.f. does give a little more uncertainty than 29 d.f., figure 2 shows that 18 d.f. is a reasonable number—in part because the curve is on its way to plateauing. Hence, with about 15 or more d.f., an estimate of standard deviation has started to “solidify,” which lends credibility to the case for 30 data in a capability assessment (even if the original case may have been based on 30 d.f., implying 48–49 data and not ~30, when using an XmR chart).
Yes, a lot more. Consider these two simple examples:
1. If you get data easily, quickly, and at low cost, 30 data might be regarded as a minimum; perhaps you’d get to 30 data within a day or less.
2. If you get one value per production run, and the process is only occasionally in operation, then waiting for 30 data might make no business sense.
So, determining how many data are needed requires more than what mathematical theory puts on the table. The four different cases described below will illustrate this further, but first we have to introduce the data at our disposal.
A stream of 100 individual values obtained as a byproduct of an ongoing manufacturing operation will be used. These 100 process data have an average of 9.99 and an estimate of SD_{within} of 0.117, based on the average of the 99 moving ranges. The raw data are found in figure 3, with the classical, global standard deviation statistic being 0.110.
Figure 3
The data in figure 3 have been organized in six ways, leading to six different charts for individual values (see figure 4). A summary is below, including the effective number of d.f. and estimated uncertainties in each case:^{2}
No. of values | Values | Effective d.f. | Uncertainty in SD_{within}
10 | 1 to 10 | 5.9 | 29.1%
20 | 1 to 20 | 11.9 | 20.5%
30 | 1 to 30 | 18.0 | 16.7%
40 | 1 to 40 | 24.0 | 14.4%
50 | 1 to 50 | 30.1 | 12.9%
100 | 1 to 100 | 60.3 | 9.1%
All six charts in figure 4 allow for a characterization of process behavior as predictable. This statement is valid even though the uncertainty in SD_{within}, and therefore in the 3-sigma limits, is about three times greater with 10 values than with 100 (29.1% vs. 9.1%, respectively).
What of a chart from just 10 data (the first chart in figure 4)? Ten values are not a lot, but if that is all you have, or all you can get, put them on a chart. A characterization of process behavior as predictable based on 10 data is viable, although the same conclusion with, for example, 20 or 30 values would be somewhat more solid.
Having characterized process behavior as predictable for these data, the computed capability statistics can be interpreted as being well-defined (in the sense that they are reliable indicators of what the process could be expected to do in the future). A minimum capability requirement for Cp and Cpk of 1.3 has been fixed.
Figure 4
The context of case one:
• Specifications are 9.5 to 10.5
• Approximately 5 data values are obtained per production run (based on rational sampling considerations^{3})
• The production process operates five days per week on average, so ~25 values per week are obtainable
• Measurement is inexpensive
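For readers who want to see the arithmetic behind Cp and Cpk, here is a minimal sketch using the standard textbook definitions, which the article does not spell out. The inputs are the rounded summary statistics quoted earlier for all 100 values (average 9.99, SD_{within} 0.117) together with the specifications above, so the results may differ slightly from those in figure 6. The function name `cp_cpk` is illustrative.

```python
def cp_cpk(mean, sd_within, lsl, usl):
    """Standard capability ratios based on the within-process SD estimate."""
    cp = (usl - lsl) / (6.0 * sd_within)                     # spec width vs. process width
    cpk = min(usl - mean, mean - lsl) / (3.0 * sd_within)    # distance to nearer spec limit
    return cp, cpk


# Rounded summary statistics for the 100 values; specifications 9.5 to 10.5.
cp, cpk = cp_cpk(mean=9.99, sd_within=0.117, lsl=9.5, usl=10.5)
print(f"Cp={cp:.2f}  Cpk={cpk:.2f}  meets minimum of 1.3: {min(cp, cpk) >= 1.3}")
```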
To complement the process behavior charts seen in figure 4, histograms are shown in figure 5, and the computed capability statistics are shown in figure 6.
Figure 5
Figure 6
Given the context of these data, along with a thoughtful interpretation of figures 4, 5, and 6, how many data are needed to reach a “sound” conclusion about capability?
A judgment of 30 data, or approximately so, seems reasonable. Making do with just 10 or 20 values seems hard to justify. This is not because Cpk is suddenly > 1.3 with 30 or more data, as seen in figure 6, but because data are relatively easy to obtain, and the element of consistency from one production run to the next is well-assessed using roughly 30 data (30 data would be expected to cover six different production runs).
What about using all 100 data values? Aren’t more data better because they would give the lowest possible uncertainty in SD_{within}? Not necessarily. Using fewer data (30 in this case) to arrive at a positive conclusion about capability means that monitoring the production process for continual process improvement could start earlier. By extrapolating the natural process limits (i.e., the 3-sigma limits on the chart for individual values) and charting new data as they come along, it would be possible to identify and deal with assignable causes of excessive variation. A manufacturer might prefer to use these extra data for improvement purposes, or to firm up the capability statistic by reducing its uncertainty.
Finally, the confidence limits for Cpk are included in figure 6 to help confirm that, when using statistics, we live in an uncertain world. It’s left to the reader to contemplate how useful they are in reaching a good business decision.
The context of case two:
• Tighter specifications of 9.7 to 10.3 are now in place, the only difference compared to the data in case one.
• Compared to the statistics found in figure 6, the capability values are now smaller, as seen in figure 7. With a minimum capability requirement of 1.3, things look pretty bleak.
Figure 7
Here, wouldn’t 10 data, and certainly no more than 20, be sufficient to reach a sound decision? The histogram for data values 1 to 20 is shown in figure 8; the location and width of the natural process limits (of 9.6 and 10.35) relative to the specifications (of 9.7 to 10.3) help to visualize why the capability statistics are lower than 1. This state of affairs provides pretty good justification for giving the process improvement team a call. How relevant is figure 1 for coming to this conclusion? We’ve also concluded that the process isn’t doing the job we need it to do, even though all measurements are in specification.
Figure 8
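The comparison of natural process limits with specifications can also be sketched in a few lines. The snippet below uses the limits and specifications quoted in the text for data values 1 to 20 (9.6 and 10.35 vs. 9.7 to 10.3). It assumes the limits are symmetric about the average and span six SD_{within}; because the quoted limits are rounded, the ratios it prints are only approximate and may not match figure 7 exactly. The function name is illustrative.

```python
def compare_limits(lnpl, unpl, lsl, usl):
    """Compare natural process limits (NPL) with specification limits."""
    center = (lnpl + unpl) / 2.0
    half_width = (unpl - lnpl) / 2.0  # equals 3 * SD_within
    return {
        "beyond_lsl": max(0.0, lsl - lnpl),  # how far the lower NPL falls outside spec
        "beyond_usl": max(0.0, unpl - usl),  # how far the upper NPL falls outside spec
        "cp": (usl - lsl) / (unpl - lnpl),
        "cpk": min(usl - center, center - lsl) / half_width,
    }


result = compare_limits(lnpl=9.6, unpl=10.35, lsl=9.7, usl=10.3)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Both natural process limits fall outside the specifications, which is the picture behind capability statistics lower than 1.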
The context of case three:
• Same data as in cases one and two
• Specifications are now 8 to 12
• Measurement is expensive.
• One data value per production run is the way to capture the process’s routine variation.
• The production process is in operation once every two to four weeks.
• After seven months, 13 data values have been obtained (data values 1 to 13 in figure 3; see the column “Observation number”).
• Those “in the know” think that the process has been operating correctly and consistently during the last seven months, and this is expected to continue in the future.
Figure 9 shows the X chart for these 13 individual values. This chart, along with the context above, supports a characterization of process behavior as predictable.
Figure 9
The histogram, with specifications and natural process limits included, is found in figure 10. Unlike case two, the location and width of the natural process limits relative to the specifications help to visualize why the capability statistics are much bigger than 1 (the capability statistics follow figure 10).
Figure 10
For these 13 data, the capability statistics are:
• Cp: 5.04
• Cpk: 4.96
• Cpk lower confidence limit: 2.97
• Cpk upper confidence limit: 6.95
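To give a feel for where confidence limits of this width come from, the sketch below applies a widely used normal-approximation interval for Cpk. It is not Minitab's code; the formula and the choice of 12 d.f. (number of values minus one) are assumptions, adopted here because they reproduce the limits quoted above to two decimals. The function name is illustrative.

```python
import math
from statistics import NormalDist


def cpk_confidence_limits(cpk, n, df, confidence=0.95):
    """Approximate two-sided confidence limits for Cpk (normal approximation).

    Assumption: standard error ~ sqrt(1 / (9 * n) + cpk**2 / (2 * df)).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * math.sqrt(1.0 / (9.0 * n) + cpk ** 2 / (2.0 * df))
    return cpk - half_width, cpk + half_width


# 13 values, Cpk of 4.96; df=12 is an assumption (see lead-in).
lower, upper = cpk_confidence_limits(cpk=4.96, n=13, df=12)
print(f"95% limits for Cpk: {lower:.2f} to {upper:.2f}")
```

With so few data the interval is wide, yet even its lower end sits far above the minimum requirement of 1.3.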
Given these data and their context, would you be prepared to say that the process is capable, or would you keep your audience waiting at least six more months before offering an opinion? If you’re inclined to conclude yes (i.e., a positive conclusion on capability):
• How important was figure 1 in reaching this decision?
• How much influence did the Cpk confidence intervals have in your decision?
• Would you have trouble convincing your colleagues that 13 values might be insufficient in other situations?
The context of case four (no specific data referred to):
• From one standard production run, the collection of about 20 data (individual values) captures the process’s routine variation.
• Obtaining samples from the line and the cost of measurement aren’t a constraint.
• The time between production runs is short.
Would you prefer to assess capability with data from just one production run, or from at least two? The advantage of collecting data from at least two different production runs is that you see how the process behaves both within and between production runs. Getting data from at least two production runs would, in most if not all cases, make good business sense. This case suggests that anything from 40 data upwards is in order. (This is the context used to give the comment in the summary table below.)
Conversely, if the time interval between production runs was long (e.g., one or two months), the business case for observing between-run effects in the capability assessment would be much weaker. Figure 1 doesn’t know this context.
So, determining whether data from two or more production runs are appropriate would come through context and process knowledge.
Ultimately, the decision to be made is whether action to improve the process is needed or not. An assessment of capability is merely an aid in reaching the right decision.
The four cases discussed here can be summarized as follows:
Case | Appropriate no. of data | Decision on capability | Looking forward
1 | In the order of 30 | Positive | Chart new data; investigate and act on assignable causes
2 | 10 | Negative | Improve the process
3 | 13 | Positive | Chart new data; investigate and act on assignable causes
4 | 40 or more | (No data presented to decide on capability) |
The two starting questions were: “How many data are the ‘right number’?” and “Is it 30?” Answers are offered based on the “context approach” to the problem at hand.
1. Lack of context approach: Mathematical theory as represented in figure 1 can be used to make an argument for 30 data.
2. Context approach: Cases one through four have shown that, if the context of the problem is both known and understood, the “right number of data” will vary: sometimes it will be 30, sometimes considerably less than 30, and sometimes quite a bit more than 30.
So, even though our colleagues might remain frustrated, probably the best answer to the question, “How many data should we use to assess capability?” is, “It depends.”
References:
1. Wheeler, D. J. Advanced Topics in Statistical Process Control, Second Edition (SPC Press, 2004). Pages 80–83 and 180–186 provide all key details on degrees of freedom and estimate uncertainty related to this article. For those without the book, consult Wheeler’s article, “How Much Data Do I Need?” and/or “Process Behavior Charts for Non-Normal Data, Part 1” for some details.
2. The effective d.f. values come from table 23 (p. 446) of Wheeler’s Advanced Topics in Statistical Process Control.
3. Some recommended reading for determining the frequency of sampling:
Donald J. Wheeler’s two recent Quality Digest Daily articles: “Rational Subgrouping” and “Rational Sampling.”
Nelson, L. S. “Control Charts: Rational Subgroups and Effective Applications,” Journal of Quality Technology, vol. 20, No. 1, pp. 73–75.
Palm, A. C. “Some Aspects of Sampling for Control Charts,” ASQ Statistics Division Newsletter, summer 1992, pp. 20–23.
4. Confidence intervals for Cpk are obtained from Minitab (formulas can be found in Help/Methods and Formulas/Quality and process improvement/Process capability/Process capability (Normal)/Confidence intervals and bounds Cpk).
Comments
Tolerance Intervals and Process Capability
Hello Scott,
I'm a student of Dr. Wheeler's books like you. So your article made sense to me. It's an approach I use when I have the chance.
However, my employer, a medical device company, uses "Tolerance Intervals" to estimate process capability. I'm not strong in that approach.
I'm wondering if the two approaches are linked and if so, how they are linked. Any guidance?
Best regards,
Shrikant Kalegaonkar
Shrikant's point
Hello Shrikant,
Whether we rely on Dr. Wheeler's books or others (e.g. Montgomery) don't we end up with the same way of doing things? Like in ANOVA, a within-subgroup estimator of dispersion is needed to estimate the interval within which process output should fall (if stable/predictable), i.e. the +/- 3-sigma limits. If the voice of the process is well defined, how does it compare to the voice of the customer? Hopefully, favourably... If so, we should find a "good" capability statistic(s).
Scott.
Prerequisite to Capability is Stability
Hi Scott,
Thank you for your fast response. I agree with your point. Where I stumble with the approach to estimating process capability using tolerance intervals is in the assumptions it makes. I believe it assumes 1] that the sample used for the estimation comes from a stable process and 2] that that stable process has a normal distribution. Am I correct?
The Shewhart charts don't make such assumptions. If I understand Dr. Wheeler correctly, the Shewhart charts check for stability. And only when no signs of instability are seen is it appropriate to estimate the process's capability. Furthermore, the Shewhart charts don't make an assumption of the data's distribution.
In design and development efforts where you're trying to validate a process and estimate its capability, is it appropriate to make the assumptions the former approach makes?
Lastly, is there a sample size advantage of one method over the other? Which one has a lower required sample size to achieve the same level of confidence in the estimate?
Best regards,
Shrikant Kalegaonkar
Process Capability: How many Data
Hello Scott. First, I'd like to congratulate you on an excellent article. I still need help to understand effective d.f. and uncertainty in SD. I am wondering how you calculated the "Effective d.f." and "Uncertainty in SD" values in the table below. I would appreciate your help. My email address is pete.thakor@resmed.com.
No. of values | Values (in Figure 3) | Effective d.f. | Uncertainty in SDwithin
10 | 1 to 10 | 5.9 | 29.1%
20 | 1 to 20 | 11.9 | 20.5%
30 | 1 to 30 | 18.0 | 16.7%
40 | 1 to 40 | 24.0 | 14.4%
50 | 1 to 50 | 30.1 | 12.9%
100 | 1 to 100 | 60.3 | 9.1%
Process Capability: How many Data
Scott, thank you for the article again. Can you also provide the details on how to calculate the degrees of freedom for the within-subgroup standard deviation, where you got a result of 18 vs. 29 degrees of freedom for the overall standard deviation? Thank you.
Robert's question
With the global SD having n-1 d.f., you know that d.f. is the number of data minus 1, so with 30 data you have 29 d.f. With the other estimator used in the article, i.e. SD within based on the average moving range, it is done differently.
With e.g. 30 data using av. mR method, you end up with an uncertainty effectively the same as that from approx. 19 data using the global SD statistic, leading to the effective d.f. of 18 (uncertainty 16.7%). As said, this should not be used to propose using the global SD as being better than the average mR method for capability applications.
If you want to “get a feel” for this, try the following (it soon gets pretty tedious!):
- In Excel use e.g. =NORMINV(RAND();0;1) and generate a few thousand observations in one column
- Arrange these into sets as per the size of each set you want (keeping constant)
- Suppose you want to use 30 data in each set, observations 1-30 are “set 1”, 31-60 are “set 2”, 61-90 are “set 3” and so on until the end
- For each set of 30 values find the 29 moving ranges, take the average, divide by 1.128 to get the SDwithin estimate
- Suppose you generate 90,000 total observations (you could easily do a lot more), you have 3,000 "sets" so 3,000 estimates of SDwithin for your generated data
- Determine the SD of these 3,000 SDwithin statistics
- Determine the average of these 3,000 SDwithin statistics
- Estimate the uncertainty using [(SD of statistic) / (Average of statistic)] x 100
- You should find you get a value in the region of 16.7% (but don’t expect exactly 16.7%)
The uncertainty is equivalent to the y-axis on Figure 1 in the article.
If you really need/want detail, I suggest consulting the book I cited. Finally, the estimated uncertainty applies to the estimate of SDwithin and its multiple of 3 used to generate the upper and lower limits on the X chart.
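For anyone who would rather not do this in a spreadsheet, the steps listed above can be sketched in Python. This is a rough equivalent under stated assumptions, not the calculation behind the published table: it draws standard normal values with the standard library, uses 3,000 sets of 30, and fixes an arbitrary seed so the run is repeatable. The function names are illustrative.

```python
import random
import statistics

D2 = 1.128  # bias-correction constant for moving ranges of two successive values


def sd_within(values):
    """Average moving range divided by 1.128."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.fmean(moving_ranges) / D2


def simulated_uncertainty_pct(set_size=30, n_sets=3000, seed=1):
    """Estimate the uncertainty (%) in SDwithin by simulation."""
    rng = random.Random(seed)  # fixed seed: an arbitrary choice for repeatability
    estimates = []
    for _ in range(n_sets):
        data = [rng.gauss(0.0, 1.0) for _ in range(set_size)]
        estimates.append(sd_within(data))
    # [(SD of statistic) / (average of statistic)] x 100
    return 100.0 * statistics.stdev(estimates) / statistics.fmean(estimates)


print(round(simulated_uncertainty_pct(), 1))
```

As with the spreadsheet version, expect a value in the region of 16.7% rather than exactly 16.7%; changing `set_size` lets you explore other numbers of data.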