Process Capability: How Many Data?

Is 30 the ‘right number?’

When considering how good a production process is, it’s important to ask, “Can we expect the output to be fully conforming?” An assessment of process capability can answer this. Data are needed, but how many? Is “30” the right number? This article examines these last two questions.

First, why 30?

There’s an old joke about statisticians not knowing the difference between 30 and infinity, and figure 1 should shed light on its origin. Degrees of freedom, shown on the x-axis and hereafter referred to as “d.f.,” help to determine how precise, or “solid,” an estimate of standard deviation is, given its estimated uncertainty (the y-axis).¹ Figure 1 shows that by the time an estimate of standard deviation is based on 30 d.f., it’s about as precise an estimate as it’s likely to get. (If 30 d.f. aren’t sufficient, getting up to 120 d.f.—a fourfold increase—is necessary to reduce the uncertainty by half.) This is potentially important because an estimate of standard deviation is essential to make an assessment of process capability possible.
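The fourfold remark follows from a widely used approximation: the relative uncertainty (coefficient of variation) of a standard deviation estimate is roughly 1/sqrt(2 × d.f.). Figure 1 is not reproduced in this excerpt, so treating it as a plot of this curve is an assumption, although the d.f. and uncertainty pairs quoted in the comments (for example 18.0 d.f. with 16.7%) are consistent with it. A minimal Python sketch, with illustrative function names:

```python
import math


def sd_uncertainty(df):
    """Approximate relative uncertainty of an SD estimate with `df` degrees of freedom.

    Assumes the common approximation CV ~ 1 / sqrt(2 * d.f.).
    """
    return 1.0 / math.sqrt(2.0 * df)


def effective_df(uncertainty):
    """Invert the approximation: d.f. implied by a given relative uncertainty."""
    return 1.0 / (2.0 * uncertainty ** 2)


# Compare a few d.f. values, including the 30 -> 120 fourfold step.
for df in (10, 30, 120):
    print(f"{df:>4} d.f. -> uncertainty of about {100 * sd_uncertainty(df):.1f}%")
```

Under this approximation the uncertainty scales with the inverse square root of the d.f., which is why going from 30 to 120 d.f. only halves it.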

…


Comments

Process Capability: How many Data

Hello Scott. First, I'd like to congratulate you on an excellent article. I still need help understanding effective d.f. and uncertainty in SD. How did you calculate the "Effective d.f." and "Uncertainty in SDwithin" values in the table below? I would appreciate your help. My email address is pete.thakor@resmed.com.

No. of values   Values (in Figure 3)   Effective d.f.   Uncertainty in SDwithin
10              1 to 10                5.9              29.1%
20              1 to 20                11.9             20.5%
30              1 to 30                18.0             16.7%
40              1 to 40                24.0             14.4%
50              1 to 50                30.1             12.9%
100             1 to 100               60.3             9.1%

Process Capability: How many Data

Scott, thank you again for the article. Can you also provide the details on how to calculate the degrees of freedom for the within-subgroup standard deviation, where you got a result of 18 vs. 29 degrees of freedom for the overall standard deviation? Thank you.

Process Capability: How many Data

Robert's question

The global SD has n-1 d.f., i.e. the number of data minus 1, so with 30 data you have 29 d.f. With the other estimator used in the article, i.e. SD within based on the average moving range, it is done differently.

With, e.g., 30 data using the average mR method, you end up with an uncertainty effectively the same as that from approximately 19 data using the global SD statistic, which gives the effective d.f. of 18 (uncertainty 16.7%). As said, this should not be taken to mean that the global SD is better than the average mR method for capability applications.

If you want to “get a feel” for this, try the following (it soon gets pretty tedious!):

- In Excel use e.g. =NORMINV(RAND();0;1) and generate a few thousand observations in one column

- Arrange these into sets of whatever size you want (keeping the size the same for every set)

- Suppose you want to use 30 data in each set, observations 1-30 are “set 1”, 31-60 are “set 2”, 61-90 are “set 3” and so on until the end

- For each set of 30 values find the 29 moving ranges, take the average, divide by 1.128 to get the SDwithin estimate

- Suppose you generate 90,000 total observations (you could easily do a lot more), you have 3,000 "sets" so 3,000 estimates of SDwithin for your generated data

- Determine the SD of these 3,000 SDwithin statistics

- Determine the average of these 3,000 SDwithin statistics

- Estimate the uncertainty using [(SD of statistic) / (Average of statistic)] x 100

- You should find you get a value in the region of 16.7% (but don’t expect exactly 16.7%)

The uncertainty is equivalent to the y-axis on Figure 1 in the article.
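The spreadsheet steps above can also be sketched in Python. This is only an illustration of the same simulation, using the standard library; the function names, the seed, and the default of 3,000 sets of 30 values are choices made here, not taken from the article.

```python
import random
import statistics

D2 = 1.128  # bias-correction constant for the range of two values


def sd_within(values):
    """SDwithin estimate: average moving range divided by 1.128."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.fmean(moving_ranges) / D2


def simulated_uncertainty_pct(n_per_set=30, n_sets=3000, seed=1):
    """Simulate sets of standard normal data and return
    100 * (SD of the SDwithin estimates) / (average of the SDwithin estimates)."""
    rng = random.Random(seed)
    estimates = [
        sd_within([rng.gauss(0.0, 1.0) for _ in range(n_per_set)])
        for _ in range(n_sets)
    ]
    return 100.0 * statistics.stdev(estimates) / statistics.fmean(estimates)


uncertainty_pct = simulated_uncertainty_pct()
print(f"Estimated uncertainty in SDwithin: {uncertainty_pct:.1f}%")
```

As with the spreadsheet version, the result should land in the region of 16.7% rather than exactly on it, and it will shift slightly with the seed and the number of sets.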

If you really need or want the detail, I suggest consulting the book I cited. Finally, the estimated uncertainty applies to the estimate of SDwithin and therefore also to its multiple of 3, which is used to generate the upper and lower limits on the X chart.
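To make the link to the X chart concrete, here is a small Python sketch of limits computed as the average plus and minus 3 times SDwithin. The data values are hypothetical, made up purely for illustration, and the function name is not from the article.

```python
import statistics

D2 = 1.128  # bias-correction constant for the range of two values


def x_chart_limits(values):
    """Return (lower limit, centre line, upper limit, SDwithin) for an X chart,
    using the average moving range to estimate SDwithin."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sd_within = statistics.fmean(moving_ranges) / D2
    centre = statistics.fmean(values)
    return centre - 3 * sd_within, centre, centre + 3 * sd_within, sd_within


# Hypothetical measurements, for illustration only.
data = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7]
lower, centre, upper, sd = x_chart_limits(data)
print(f"SDwithin = {sd:.3f}, limits = {lower:.2f} to {upper:.2f}")
```

Because the limits are the centre line plus or minus a fixed multiple of SDwithin, any relative uncertainty in SDwithin carries over directly to the distance between the centre line and each limit.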

Tolerance Intervals and Process Capability

Hello Scott,

I'm a student of Dr. Wheeler's books like you. So your article made sense to me. It's an approach I use when I have the chance.

However, my employer, a medical device company, uses "Tolerance Intervals" to estimate process capability. I'm not strong in that approach.

I'm wondering if the two approaches are linked and if so, how they are linked. Any guidance?

Best regards,

Shrikant Kalegaonkar

Shrikant's point

Hello Shrikant,

Whether we rely on Dr. Wheeler's books or others (e.g. Montgomery), don't we end up with the same way of doing things? As in ANOVA, a within-subgroup estimator of dispersion is needed to estimate the interval within which process output should fall (if stable/predictable), i.e. the +/- 3-sigma limits. If the voice of the process is well defined, how does it compare to the voice of the customer? Hopefully, favourably... If so, we should find "good" capability statistics.

Scott.

Prerequisite to Capability is Stability

Hi Scott,

Thank you for your fast response. I agree with your point. Where I stumble with the approach to estimating process capability using tolerance intervals is in the assumptions it makes. I believe it assumes (1) that the sample used for the estimation comes from a stable process and (2) that the stable process has a normal distribution. Am I correct?

The Shewhart charts don't make such assumptions. If I understand Dr. Wheeler correctly, the Shewhart charts check for stability. And only when no signs of instability are seen is it appropriate to estimate the process's capability. Furthermore, the Shewhart charts don't make an assumption of the data's distribution.

In design and development efforts where you're trying to validate a process and estimate its capability, is it appropriate to make the assumptions the former approach makes?

Lastly, is there a sample size advantage of one method over the other? Which one has a lower required sample size to achieve the same level of confidence in the estimate?

Best regards,

Shrikant Kalegaonkar
