Capability and the Pooled Variance Statistic

Pooled variance doesn’t always give optimum results in every technique


Donald J. Wheeler

SPC Press

Thu, 01/22/2026 - 12:02

Performance indexes use the global standard deviation statistic to describe the past. Capability indexes use a within-subgroup measure of dispersion to characterize the process potential. However, some within-subgroup measures are better than others. This article will explain why you should not use the square root of the pooled variance statistic to compute either capability indexes or limits on process behavior charts.


The pooled variance statistic

The purpose of statistical analysis is to detect signals by first filtering out the noise. One of the major statistical advances of the 20th century was the use of the within-subgroup variation as the filter. In the analysis of variance (ANOVA), this is done by computing the pooled variance: A variance statistic for each treatment (subgroup) is found, and these values are averaged. In ANOVA, this pooled variance, also known as the mean square within (MSW), is the denominator of the F-ratio.

The numerator of the F-ratio is the average of the sum of squared deviations of the treatment averages from the grand average. These deviations are the potential signals. Thus, the F-ratio is a signal-to-noise ratio that compares the squared signals with the pooled variance. This F-ratio has proven to be a robust way to detect differences between subgroup averages. It works over a broad range of conditions.
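As a concrete sketch of these computations, the MSW and the F-ratio can be written out directly; the treatment data below are made up for illustration.

```python
# A minimal one-way ANOVA signal-to-noise computation, with made-up
# treatment data (k = 3 treatments, n = 4 observations each).
treatments = [[10, 12, 11, 13], [14, 15, 13, 16], [11, 10, 12, 11]]
k, n = len(treatments), len(treatments[0])

def mean(xs): return sum(xs) / len(xs)

def var(xs):                    # sample variance (n - 1 divisor)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

grand = mean([x for g in treatments for x in g])

# Noise: the pooled variance, i.e., the mean square within (MSW)
msw = mean([var(g) for g in treatments])

# Signal: the mean square between (MSB), from the deviations of the
# treatment averages around the grand average
msb = n * sum((mean(g) - grand) ** 2 for g in treatments) / (k - 1)

f_ratio = msb / msw             # the F-ratio: squared signals over noise
print(msw, msb, f_ratio)
```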

Thus, the pooled variance statistic has a central role in ANOVA. It is an unbiased estimator of the variance due to noise, it is robust, and it has the maximum number of degrees of freedom (i.e., it uses all of the within-subgroup data). In various ANOVA-related techniques, such as Tukey's post-hoc test, when we need an estimate of the standard deviation we use the square root of the pooled variance (more commonly known as the root mean square within, or RMSW).

This use of the RMSW is natural and convenient with ANOVA-related techniques. However, as an estimator of the standard deviation parameter, the RMSW is biased and nonrobust. This limits its usefulness with techniques that are not related to ANOVA.

Other within-subgroup estimators of the standard deviation exist. These estimators have the advantage of being both unbiased and more robust than the RMSW. Consequently, outside of the structured environment of an experimental study, these other estimators of the standard deviation are preferred over the RMSW.

The following examples illustrate this point.

Example 1

This example uses actual process data that have been coded to preserve confidentiality. Forty observations on one product characteristic were arranged into 10 subgroups of size four. The coded specifications for this characteristic are 0 to 120 units. Figure 1 shows the data, the subgroup averages, the subgroup ranges, and the subgroup standard deviations.


Figure 1: 10 subgroups of size four

We’ll use these data to compare three within-subgroup estimators by computing estimates of SD(X), three sigma limits for the average chart, and the capability indexes. These three estimators will be based on the average range, the average standard deviation, and the root mean square within.

Ranges

Our first unbiased estimator for the process standard deviation is the average range divided by its bias correction factor: Sigma(X) = R-bar/d2, where d2 = 2.059 for subgroups of size four.

Three-sigma limits for averages are thus:

And the capability indexes are:

All of these estimates are said to have 27.6 degrees of freedom.

Standard deviations

Our second unbiased estimator for the process standard deviation is the average subgroup standard deviation divided by its bias correction factor: Sigma(X) = s-bar/c4, where c4 = 0.9213 for subgroups of size four.

Three-sigma limits for averages are thus:

And the capability indexes are:

All of these estimates are said to have 28 degrees of freedom.

Pooled variances

Our third estimator for the process standard deviation is the root mean square within statistic. Here it may be found by squaring the 10 subgroup standard deviations, averaging them, and finding the square root.

Three-sigma limits for averages are thus:

And the capability indexes are:

All of these estimates are said to have 30 degrees of freedom.
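The three estimators can be sketched in Python. Since the Figure 1 values are not reproduced in this text, the subgroups below are hypothetical; the bias-correction factors d2 = 2.059 and c4 = 0.9213 are the standard constants for subgroups of size four, and the 0-to-120 specifications are the ones stated in the example.

```python
import math

# Hypothetical subgroups of size four (Figure 1's actual values are not
# reproduced here); coded specifications 0 to 120 as stated in the example.
subgroups = [
    [55, 60, 58, 63], [62, 57, 61, 59], [48, 54, 57, 52], [66, 61, 58, 63],
    [53, 59, 55, 60], [64, 58, 62, 57], [51, 56, 59, 54], [60, 65, 58, 61],
    [56, 52, 58, 55], [63, 59, 61, 66],
]
n, LSL, USL = 4, 0.0, 120.0
d2, c4 = 2.059, 0.9213          # bias-correction factors for n = 4

def mean(xs): return sum(xs) / len(xs)

def stdev(xs):                  # sample standard deviation (n - 1 divisor)
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

ranges = [max(g) - min(g) for g in subgroups]
sds = [stdev(g) for g in subgroups]

sigma_r = mean(ranges) / d2                         # average range estimator
sigma_s = mean(sds) / c4                            # average std. dev. estimator
sigma_rmsw = math.sqrt(mean([s * s for s in sds]))  # root mean square within

grand = mean([mean(g) for g in subgroups])
for sigma in (sigma_r, sigma_s, sigma_rmsw):
    lo = grand - 3 * sigma / math.sqrt(n)           # three-sigma limits
    hi = grand + 3 * sigma / math.sqrt(n)           # for subgroup averages
    cp = (USL - LSL) / (6 * sigma)                  # capability ratio
    print(round(sigma, 3), round(lo, 2), round(hi, 2), round(cp, 2))
```

With consistent subgroups like these, all three estimates land close together, which is the point of Example 1.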

To compare these three estimators, we begin by placing all three sets of limits on the average chart in Figure 2. There we see that there is no practical difference between the limits. They all tell the same story about this process.


Figure 2: Average chart for Example 1

To compare the three sets of capability indexes, we plot the 90% interval estimates (the 90% confidence intervals) in Figure 3. The inherent uncertainties of these point estimates are shown by these 90% interval estimates. These uncertainties overwhelm the differences between the point estimates. So while the different within-subgroup statistics yield different capability index numbers, these differences are artifacts of the computations rather than real differences in the capability of the process.

 

Figure 3: 90% interval estimates for capability for Example 1

Here we see that the three capability ratios are equivalent and the three centered capability ratios are equivalent. So where is the advantage in using one of these estimators over another?

The equivalence above occurs only when the subgroups all display a similar amount of variation. Figure 4 shows the range and standard deviation charts for Example 1. The formulas for the upper limits are UCL(R) = D4 R-bar and UCL(s) = B4 s-bar, where D4 = 2.282 and B4 = 2.266 for subgroups of size four.

Figure 4: Range and standard deviation charts for Example 1

These charts and limits tell the same story. With no points outside the limits in Figure 4, we judge the within-subgroup variation to be consistent throughout this part of the production run. Here, the three estimators of the standard deviation parameter are equivalent.

All within-subgroup estimators are robust to differences between the subgroup averages. The question of robustness that concerns us here is what happens when different subgroups have different amounts of variation. Example 2 will answer this question.

Example 2

One additional subgroup from the production process above is added in Figure 5. Using these 11 subgroups of size four, we repeat the computations done above.


Figure 5: 11 subgroups of size four

The extreme value in the last subgroup inflates the average range to give the following results:

Three-sigma limits for averages are thus:

And the capability indexes become:

All of these estimates are said to have 30.3 degrees of freedom.

Likewise, the extreme value in the last subgroup inflates the average standard deviation to give the following results:

Three-sigma limits for averages are thus:

And the capability indexes become:

All of these estimates have 30.8 degrees of freedom.

However, the extreme value in the last subgroup hyperinflates the root mean square within to give the following results:

Three-sigma limits for averages are thus:

And the capability indexes become:

All of these estimates are said to have 33 degrees of freedom.

To compare these three estimators, we place the three sets of limits on the average chart in Figure 6. The limits shown in gray are the original limits. The limits based on the average range and the average standard deviation are inflated by about 34%, yet they still show the last average as being outside the limits.

Figure 6: Average chart for Example 2

But the limits based on the root mean square within are inflated by 76%. This is enough to hide the signal contained in the last subgroup. This lack of robustness is why the pooled variance should not be used with a process behavior chart. A single unusual value out of 44 values inflates the limits and hides the signal that we want to detect.

Figure 7 shows how the capability indexes changed with the addition of Subgroup 11. While the capability ratios based on the average range and the average standard deviation dropped by about 26%, the ratio based on the RMSW dropped 43%.


Figure 7: The effects of Subgroup 11 on capabilities

The centered capability ratios based on the average range and the average standard deviation dropped about 20%, while those based on the RMSW dropped 38%.

This greater sensitivity to extreme values makes the root mean square within less robust than the other within-subgroup measures of dispersion. This lack of robustness is inherent in the computation itself. By squaring the standard deviation statistics prior to averaging them, we give extra weight to extreme values, which can bias the result. Although this bias is of slight consequence in the F-ratio where we are making comparisons with the squared signals, it becomes problematic when estimating the standard deviation. This is why we should not use the root mean square within with SPC techniques.
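The extra weight that squaring gives to extreme values is easy to see numerically. The subgroup standard deviations below are made up: ten consistent values and one inflated one.

```python
import math

# Hypothetical subgroup standard deviations: ten consistent, one extreme.
sds = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 8.0]
base = sds[:10]                       # the consistent subgroups alone

avg_s = sum(sds) / len(sds)                              # average of s
rmsw = math.sqrt(sum(s * s for s in sds) / len(sds))     # sqrt of mean s^2
avg_s0 = sum(base) / len(base)
rmsw0 = math.sqrt(sum(s * s for s in base) / len(base))

print(avg_s / avg_s0)   # the average s is inflated modestly
print(rmsw / rmsw0)     # the RMSW is inflated substantially more
```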

A point of confusion

When we want to know the standard deviation parameter for the sum of two independent random variables, SD(X+Y), we have to add the variance of X to the variance of Y and find the square root:

This is required because variance is essentially rotational inertia, while the standard deviation is the radius of gyration, and only rotational inertia is additive. This means that we can’t add the SD(X) parameter and the SD(Y) parameter together without violating both the Pythagorean theorem and the laws of physics.
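This addition rule is easy to check by simulation; the standard deviations of 3 and 4 below are arbitrary, chosen so that the Pythagorean combination is exactly 5.

```python
import math
import random

# Variances add for independent variables; standard deviations do not.
random.seed(1)
sd_x, sd_y = 3.0, 4.0
xs = [random.gauss(0, sd_x) for _ in range(100_000)]
ys = [random.gauss(0, sd_y) for _ in range(100_000)]

def sd(v):                      # sample standard deviation
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

sd_sum = sd([x + y for x, y in zip(xs, ys)])

print(sd_sum)                           # close to 5.0
print(math.sqrt(sd_x**2 + sd_y**2))     # 5.0, the sqrt of added variances
# Naively adding SD(X) + SD(Y) = 7.0 would overstate the dispersion.
```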

However, when we have multiple statistics that are all independent estimates of the same parameter, such as several subgroup ranges or subgroup standard deviations, we may average these statistics together without violating any principles of statistics, mathematics, or physics. This is why the pooled variance statistic is no more “correct” than the average range or the average standard deviation statistic.

Summary

Statisticians are well acquainted with the pooled variance statistic because of its role in ANOVA. In an experimental study, the pooled variance statistic provides an unbiased estimator of the background variance. Moreover, it has the maximum number of degrees of freedom. This is why, in the context of ANOVA, we can do no better than to use the pooled variance.

However, the pooled variance doesn’t always give optimum results in every technique. While the F-test for means is robust, the F-test for equality of variances is not robust. Likewise, when estimating the standard deviation parameter, the square root of the pooled variance is biased and nonrobust. Other within-subgroup estimators are unbiased and more robust. This makes the root mean square within a suboptimal choice for estimating the process standard deviation.

Nevertheless, because of its role in ANOVA, some software packages erroneously default to using the RMSW with SPC computations. Others mistakenly include the RMSW as an optional computation. Do not be misled by this. While computing capabilities and limits using the root mean square within is not completely wrong, it is definitely suboptimal, nonrobust, and biased.

Postscript

Figure 8 lists 15 within-subgroup estimators for the standard deviation parameter, SD(X). This article looked at the three most commonly used of these.

Figure 8: Some within-subgroup estimators of SD(X)

These 15 estimators display three different levels of robustness to extreme values. As demonstrated in this article, the estimator based on the pooled variance statistic (RMSW) is, by far, the least robust. You can use the datasets given here to verify that the six estimators based on medians are, as a group, the most robust estimators. The remaining eight estimators based on averages have an intermediate level of robustness. And since the job of filtering out the noise places a premium on getting a good estimate even when the data are not well-behaved, we need to use estimators that are reasonably robust. Thus, with 14 estimators available that are more robust, there is simply no justification for ever using the square root of the pooled variance statistic to compute limits on a process behavior chart or to compute a capability index.
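The relative robustness of average-based and median-based statistics can be illustrated with a small sketch. The subgroup ranges below are hypothetical, and the bias-correction factors are omitted since each statistic is compared only against its own baseline.

```python
# Sensitivity of the average range vs. the median range to one extreme
# subgroup (hypothetical subgroup ranges; bias corrections omitted).
ranges = [5, 6, 4, 5, 7, 5, 6, 4, 5, 6]   # ten consistent subgroups
ranges_plus = ranges + [25]                # one extreme subgroup added

def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

avg0, avg1 = sum(ranges) / len(ranges), sum(ranges_plus) / len(ranges_plus)
med0, med1 = median(ranges), median(ranges_plus)

print(avg1 / avg0)   # the average range inflates noticeably
print(med1 / med0)   # the median range does not move at all here
```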

Donald J. Wheeler’s complete “Understanding SPC” seminar may be streamed for free; for details, see spcpress.com.

© 2026 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute Inc.
