Published: 09/16/2015
When considering how good a production process is, it’s important to ask, “Can we expect the output to be fully conforming?” An assessment of process capability can answer this. Data are needed, but how many? Is “30” the right number? This article examines these last two questions.
There’s an old joke about statisticians not knowing the difference between 30 and infinity, and figure 1 should shed light on its origin. Degrees of freedom, shown on the x-axis and hereafter referred to as “d.f.,” help to determine how precise, or “solid,” an estimate of standard deviation is, given its estimated uncertainty (the y-axis).1 Figure 1 shows that by the time an estimate of standard deviation is based on 30 d.f., it’s about as precise an estimate as it’s likely to get. (If 30 d.f. aren’t sufficient, getting up to 120 d.f.—a fourfold increase—is necessary to reduce the uncertainty by half.) This is potentially important because an estimate of standard deviation is essential to make an assessment of process capability possible.
Figure 1: Plot of uncertainty in standard deviation against degrees of freedom
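For readers who want to reproduce figure 1's curve, the y-axis is well approximated by the standard result that the relative uncertainty of a standard deviation estimate is roughly 1 ÷ √(2 × d.f.). A minimal sketch in Python of that approximation (which also reproduces the fourfold-for-half relationship just described):

```python
import math

def sd_uncertainty(df):
    """Approximate relative uncertainty (coefficient of variation)
    of a standard deviation estimate carrying the given degrees of freedom."""
    return 1 / math.sqrt(2 * df)

# 30 d.f. gives ~12.9%; a fourfold increase to 120 d.f. is needed
# to cut that roughly in half (~6.5%), as described above.
for df in (10, 30, 120):
    print(f"d.f. = {df:>3}: uncertainty = {sd_uncertainty(df):.1%}")
```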
Unfortunately, statistical language about d.f. is often met with blank looks, and it can shut people out of the discussion on assessing process capability. Instead, ask, "How many data?" because this leads to the more important question: "What's the relationship between the number of data collected, d.f., and estimate uncertainty?"
The classical, global standard deviation statistic (STDEV in Microsoft Excel) applied to all the data in “one go” has d.f. = total number of values minus one. With 30 data, there are 30 – 1 = 29 d.f., and the uncertainty is 13.1 percent. However, this is not the method to use when working with capability.
Consider, for example, the case of individual values, where an assessment of capability correctly starts with a process behavior chart (i.e., an XmR control chart for individual values). Here, the appropriate estimate of standard deviation—hereafter called "SDwithin"—comes from the average moving range method (SDwithin = average moving range ÷ 1.128). An estimate of SDwithin from 30 data carries 18 d.f., not 29, corresponding to an uncertainty of 16.7 percent. Figure 2 renders this "detail" into a picture.
Figure 2: Reproduction of figure 1, but with the uncertainties and number of data associated with 18 and 29 degrees of freedom based on the average moving range method
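To make the two calculations concrete, here is a minimal sketch contrasting the classical, global statistic with the average moving range method; the ten data values are invented purely for illustration:

```python
import statistics

def sd_within(values):
    """SDwithin from the average moving range method: the mean of
    successive absolute differences, divided by d2 = 1.128."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.mean(moving_ranges) / 1.128

data = [9.9, 10.1, 10.0, 9.8, 10.2, 10.0, 9.9, 10.1, 10.0, 9.9]  # illustrative only

print("Classical, global sd:", round(statistics.stdev(data), 4))  # n - 1 = 9 d.f.
print("SDwithin (XmR basis):", round(sd_within(data), 4))         # ~5.9 effective d.f.
```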
While 18 d.f. does give a little more uncertainty than 29 d.f., figure 2 shows that 18 d.f. is a reasonable number—in part because the curve is on its way to plateauing. Hence, with about 15 or more d.f., an estimate of standard deviation has started to “solidify,” which lends credibility to the case for 30 data in a capability assessment (even if the original case may have been based on 30 d.f., implying 48–49 data and not ~30, when using an XmR chart).
Is there more to consider than this mathematical theory? Yes, a lot more. Consider these two simple examples:
1. If you get data easily, quickly, and at low cost, 30 data might be regarded as a minimum; perhaps you’d get to 30 data within a day or less.
2. If you get one value per production run, and the process is only occasionally in operation, then waiting for 30 data might make no business sense.
So, determining how many data are needed requires more than what mathematical theory puts on the table. The four different cases described below will illustrate this further, but first we have to introduce the data at our disposal.
A stream of 100 individual values obtained as a byproduct of an ongoing manufacturing operation will be used. These 100 process data have an average of 9.99 and an estimate of SDwithin of 0.117, based on the average of the 99 moving ranges. The raw data are found in figure 3, with the classical, global standard deviation statistic being 0.110.
Figure 3: 100 values from a predictable process of average 9.99 and SDwithin of 0.117
The data in figure 3 have been organized in six ways, leading to six different charts for individual values (see figure 4). A summary is below, including the effective number of d.f. and estimated uncertainties in each case:2
No. of values | Values   | Effective d.f. | Uncertainty in SDwithin
10            | 1 to 10  | 5.9            | 29.1%
20            | 1 to 20  | 11.9           | 20.5%
30            | 1 to 30  | 18.0           | 16.7%
40            | 1 to 40  | 24.0           | 14.4%
50            | 1 to 50  | 30.1           | 12.9%
100           | 1 to 100 | 60.3           | 9.1%
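The uncertainty column is simply the 1 ÷ √(2 × d.f.) relationship of figure 1 applied to the effective d.f. column, as this quick check reproduces:

```python
import math

# Effective d.f. for the average moving range method (Wheeler, table 23)
effective_df = {10: 5.9, 20: 11.9, 30: 18.0, 40: 24.0, 50: 30.1, 100: 60.3}

for n, df in effective_df.items():
    uncertainty = 1 / math.sqrt(2 * df)   # same relationship as figure 1
    print(f"n = {n:>3}: d.f. = {df:>4}, uncertainty = {uncertainty:.1%}")
# Reproduces 29.1%, 20.5%, 16.7%, 14.4%, 12.9%, and 9.1%.
```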
All six charts in figure 4 allow for a characterization of process behavior as predictable. This statement is valid even though the uncertainty in SDwithin—and therefore in the 3-sigma limits—is about three times greater with 10 values than with 100 (29.1% vs. 9.1%).
What of a chart from just 10 data (the first chart in figure 4)? Ten values are not a lot, but if that is all you have, or all you can get, put them on a chart. A characterization of process behavior as predictable based on 10 data is viable, although the same conclusion with, for example, 20 or 30 values would be somewhat more solid.
Having characterized process behavior as predictable for these data, the computed capability statistics can be interpreted as being well-defined (in the sense that they are reliable indicators of what the process could be expected to do in the future). A minimum capability requirement for Cp and Cpk of 1.3 has been fixed.
Figure 4: Six different X charts for individual values with the data organized in six ways: 1) values 1 to 10; 2) values 1 to 20; 3) values 1 to 30; 4) values 1 to 40; 5) values 1 to 50; and 6) all values 1 to 100
Case one. The context:
• Specifications are 9.5 to 10.5
• Approximately 5 data values are obtained per production run (based on rational sampling considerations3)
• The production process operates five days per week on average, so ~25 values per week are obtainable
• The cost of measurement is low
To complement the process behavior charts seen in figure 4, histograms are shown in figure 5, and the computed capability statistics are shown in figure 6.
Figure 5: Six different histograms, with the specifications of 9.5 (LSL) and 10.5 (USL) included, for the data in figure 4
Figure 6: Process capability statistics Cp and Cpk for the six different organizations of the 100 data. Columns four and five include the lower and upper confidence limits for Cpk.4
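The statistics in figure 6 follow the standard definitions Cp = (USL − LSL) ÷ (6 × SDwithin) and Cpk = min(USL − average, average − LSL) ÷ (3 × SDwithin). A minimal sketch using the full-data summary statistics quoted earlier; each row of figure 6 uses its own subset's average and SDwithin, so only the 100-value row should be close to this result:

```python
def capability(average, sd_within, lsl, usl):
    """Cp and Cpk computed from the within-process standard deviation."""
    cp = (usl - lsl) / (6 * sd_within)
    cpk = min(usl - average, average - lsl) / (3 * sd_within)
    return cp, cpk

cp, cpk = capability(average=9.99, sd_within=0.117, lsl=9.5, usl=10.5)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")  # roughly 1.42 and 1.40
```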
Given the context of these data, along with a thoughtful interpretation of figures 4, 5, and 6, how many data are needed to reach a “sound” conclusion about capability?
A judgment of 30 data, or approximately so, seems reasonable. Making do with just 10 or 20 values seems hard to justify. This is not because Cpk is suddenly > 1.3 with 30 or more data, as seen in figure 6, but because data are relatively easy to obtain, and the element of consistency from one production run to the next is well-assessed using roughly 30 data (30 data would be expected to cover six different production runs).
What about using all 100 data values? Aren't more data better because they would give the lowest possible uncertainty in SDwithin? Not necessarily. Using fewer data (30 in this case) to arrive at a positive conclusion about capability means that monitoring the production process for continual improvement could start earlier. By extrapolating the natural process limits—i.e., the 3-sigma limits on the chart for individual values—and charting new data as they come along, it would be possible to identify and deal with assignable causes of excessive variation. A manufacturer might well prefer to put the extra data to work for improvement rather than merely to firm up the capability statistic by further reducing its uncertainty.
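As a sketch of that monitoring step (the three "new" values below are invented), the natural process limits are average ± 2.66 × average moving range, which is identical to average ± 3 × SDwithin because 2.66 = 3 ÷ 1.128:

```python
average, sd_within = 9.99, 0.117      # summary statistics quoted for these data
avg_moving_range = sd_within * 1.128  # average moving range implied by SDwithin

lnpl = average - 2.66 * avg_moving_range  # lower natural process limit, ~9.64
unpl = average + 2.66 * avg_moving_range  # upper natural process limit, ~10.34

for x in (10.05, 9.97, 10.41):        # hypothetical new values as they arrive
    if not lnpl <= x <= unpl:
        print(f"{x}: outside extrapolated limits; look for an assignable cause")
```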
Finally, the confidence limits for Cpk are included in figure 6 to help confirm that, when using statistics, we live in an uncertain world. It’s left to the reader to contemplate how useful they are in reaching a good business decision.
Case two. The context:
• Tighter specifications of 9.7 to 10.3 are now in place; this is the only difference compared with case one (the data are unchanged).
• Compared to the statistics found in figure 6, the capability values are now smaller, as seen in figure 7. With a minimum capability requirement of 1.3, things look pretty bleak.
Figure 7: Process capability statistics based on the tighter product specifications of 9.7 to 10.3
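A quick check with the same full-data summary statistics (again only a rough stand-in for the subset-by-subset values in figure 7) shows how far short the process now falls:

```python
average, sd_within = 9.99, 0.117   # same data, same summary statistics
lsl, usl = 9.7, 10.3               # the tighter specifications

cp = (usl - lsl) / (6 * sd_within)                         # ~0.85
cpk = min(usl - average, average - lsl) / (3 * sd_within)  # ~0.83
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")  # both well below the 1.3 requirement
```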
Here, wouldn’t 10 data, and certainly no more than 20, be sufficient to reach a sound decision? The histogram for data values 1 to 20 is shown in figure 8; the location and width of the natural process limits (of 9.6 and 10.35) relative to the specifications (of 9.7 to 10.3) help to visualize why the capability statistics are lower than 1. This state of affairs provides pretty good justification for giving the process improvement team a call. How relevant is figure 1 for coming to this conclusion? We’ve also concluded that the process isn’t doing the job we need it to do, even though all measurements are in specification.
Figure 8: Histogram of data values 1 to 20, with the product specifications and the lower and upper natural process limits included
Case three. The context:
• Same data as in cases one and two
• Specifications are now 8 to 12
• The cost of measurement is high.
• One data value per production run is the way to capture the process’s routine variation.
• The production process is in operation once every two to four weeks.
• After seven months 13 data values have been obtained (the data values 1 to 13 seen in figure 3—see column “Observation number”).
• Those “in the know” think that the process has been operating correctly and consistently during the last seven months, and this is expected to continue in the future.
Figure 9 shows the X chart for these 13 individual values. This chart, along with the context above, supports a characterization of process behavior as predictable.
Figure 9: Chart for individual values for data values 1 to 13 found in figure 3
The histogram, with specifications and natural process limits included, is found in figure 10. In contrast with case two, the location and width of the natural process limits relative to the specifications help to visualize why the capability statistics are much bigger than 1 (the statistics are listed after figure 10).
Figure 10: Histogram of the 13 individual values found in figure 9 with the specification limits of 8 to 12 also included
For these 13 data, the capability statistics are:
• Cp: 5.04
• Cpk: 4.96
• Cpk lower confidence limit: 2.97
• Cpk upper confidence limit: 6.95
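These limits can be reproduced with the large-sample formula documented in Minitab's help (see reference 4), Cpk ± z × √(1 ÷ (9n) + Cpk² ÷ (2ν)). A minimal sketch assuming the default 95-percent confidence level and taking ν = n − 1 = 12, which matches the quoted figures:

```python
import math

n, cpk = 13, 4.96
z = 1.96                 # two-sided 95% confidence
nu = n - 1               # d.f. assumed for the standard deviation estimate
se = math.sqrt(1 / (9 * n) + cpk**2 / (2 * nu))
print(f"Cpk 95% CI: {cpk - z * se:.2f} to {cpk + z * se:.2f}")  # ~2.97 to 6.95
```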
Given these data and their context, would you be prepared to say that the process is capable, or would you keep your audience waiting at least six more months before offering an opinion? If you’re inclined to conclude yes (i.e., a positive conclusion on capability):
• How important was figure 1 in reaching this decision?
• How much influence did the Cpk confidence intervals have in your decision?
• Would you have trouble convincing your colleagues that 13 values might be insufficient in other situations?
Case four. The context of the situation (no specific data referred to):
• From one standard production run, the collection of about 20 data (individual values) captures the process’s routine variation.
• Obtaining samples from the line and the cost of measurement aren’t a constraint.
• The time between production runs is short.
Would you prefer to assess capability with data from just one production run, or from at least two? The advantage of collecting data from at least two different production runs is seeing how the process behaves both within and between runs. In most, if not all, cases this would make good business sense. This case therefore suggests that anything from 40 data upward is in order. (This is the context behind the comment in the summary table below.)
Conversely, if the time interval between production runs was long (e.g., one or two months between production runs), the business case to observe between production-run effects in the capability assessment would be much weaker. Figure 1 doesn’t know this context.
So, determining whether data from two or more production runs are appropriate would come through context and process knowledge.
Ultimately, the decision to be made is whether action to improve the process is needed or not. An assessment of capability is merely an aid in reaching the right decision.
The four cases discussed here can be summarized as follows:
Case | Appropriate no. of data | Decision on capability                      | Looking forward
1    | In the order of 30      | Positive                                    | Chart new data; investigate and act on assignable causes
2    | 10                      | Negative                                    | Improve the process
3    | 13                      | Positive                                    | Chart new data; investigate and act on assignable causes
4    | 40 or more              | (No data presented to decide on capability) |
The two starting questions were, "How many data are the right number?" and "Is it 30?" Answers can be offered both without and with reference to the context of the problem at hand:
1. The lack-of-context approach: Mathematical theory, as represented in figure 1, can be used to make an argument for 30 data.
2. The context approach: Cases one through four have shown that, if the context of the problem is both known and understood, the "right number of data" will vary: sometimes it will be 30, sometimes considerably less than 30, and sometimes quite a bit more than 30.
So, even though our colleagues might remain frustrated, probably the best answer to the question, “How many data should we use to assess capability?” is, “It depends.”
References:
1. Wheeler, D. J. Advanced Topics in Statistical Process Control, Second Edition (SPC Press, 2004). Pages 80–83 and 180–186 provide all key details on degrees of freedom and estimate uncertainty related to this article. For those without the book, consult Wheeler’s article, “How Much Data Do I Need?” and/or “Process Behavior Charts for Non-Normal Data, Part 1” for some details.
2. The effective number of d.f. values come from table 23 (p. 446) of Wheeler’s Advanced Topics in Statistical Process Control.
3. Some recommended reading for determining the frequency of sampling:
Donald J. Wheeler’s two recent Quality Digest Daily articles: “Rational Subgrouping” and “Rational Sampling.”
Nelson, L. S. "Control Charts: Rational Subgroups and Effective Applications," Journal of Quality Technology, vol. 20, no. 1, pp. 73–75.
Palm, A. C. “Some Aspects of Sampling for Control Charts,” ASQ Statistics Division Newsletter, summer 1992, pp. 20–23.
4. Confidence intervals for Cpk are obtained from Minitab (formulas can be found in Help/Methods and Formulas/Quality and process improvement/Process capability/Process capability (Normal)/Confidence intervals and bounds Cpk).