SPCTool

What Is a Rational Subgroup?

by Donald J. Wheeler

In the April column, I outlined three ways to compute the limits for an average chart: the right way, a wrong way and a very wrong way. Several readers wrote that they were using the very wrong way and that they were happy with this method.

I have seen dozens of examples given in attempts to justify the incorrect ways of computing limits. In every case, the problem was a failure to subgroup the data in a rational manner.

We compute limits for an average chart based upon the average range. The average range is the average amount of variation within the subgroups. Thus, the limits on an average chart depend upon the amount of variation inside the subgroups. You must organize the data into subgroups in such a way that this computation makes sense. We want to collect into each subgroup a set of values that were collected under essentially the same conditions.
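The computation described above can be sketched in a few lines of Python. This is a minimal sketch. It assumes equal-size subgroups of 2 to 5 values and the standard tabled constants A2 and D4 for those sizes, and the function name `average_and_range_limits` is hypothetical. Notice that the half-width of the limits on the averages is A2 times the average range, so the limits come entirely from the variation within the subgroups.

```python
# Standard tabled constants for average and range charts, subgroup sizes 2..5.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}
D4 = {2: 3.268, 3: 2.574, 4: 2.282, 5: 2.114}
# D3 is zero for these subgroup sizes, so the lower range limit is 0.


def average_and_range_limits(subgroups):
    """Hypothetical helper: return (grand average, average range,
    lower and upper limits for averages, upper limit for ranges)."""
    n = len(subgroups[0])
    if any(len(s) != n for s in subgroups):
        raise ValueError("all subgroups must have the same size")
    if n not in A2:
        raise ValueError("constants are tabled here only for n = 2..5")
    averages = [sum(s) / n for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    grand_avg = sum(averages) / len(averages)
    avg_range = sum(ranges) / len(ranges)
    # The width of the limits depends only on the within-subgroup ranges.
    half_width = A2[n] * avg_range
    return (grand_avg, avg_range,
            grand_avg - half_width, grand_avg + half_width,
            D4[n] * avg_range)
```

For example, `average_and_range_limits([[10, 12, 11], [11, 13, 12], [9, 11, 10]])` gives a grand average of 11, an average range of 2, and limits for the averages of about 8.95 to 13.05.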

For example, some asthma patients measure their peak exhalation flow rates four times each day: morning before and after medication, and evening before and after medication. The data for one patient are shown in Figure 1.

Now think about what happens when we make each column in the table into a subgroup of size 4. Within each subgroup, we would have the four scores from a single day, and from one subgroup to the next, we would have the day-to-day variation. But the four scores for a single day are collected under different conditions!

The variation within a subgroup is more than just background variation: it includes both the medication effects and the morning-to-evening swings of the patient. These effects will make the ranges larger than they need to be to characterize the day-to-day variation. As a result of this subgrouping, the limits will be far too wide, and the averages and ranges will hug the central lines. This mistake is called stratification.
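To see the stratification effect numerically, here is a sketch that uses a hypothetical table of peak flow readings. The values are invented for illustration and are not the Figure 1 data. There is one row per measurement condition and one column per day. Each column becomes a subgroup of size 4, and the limits use the tabled constant A2 = 0.729 for n = 4.

```python
# Hypothetical peak-flow readings (L/min), invented for illustration; NOT the Figure 1 data.
# Rows are the four measurement conditions; columns are five days.
readings = {
    "AM pre":  [300, 310, 305, 295, 300],
    "AM post": [380, 385, 375, 390, 380],
    "PM pre":  [340, 345, 350, 335, 340],
    "PM post": [410, 400, 405, 415, 410],
}
A2_N4 = 0.729  # tabled average-chart constant for subgroups of size 4

# Subgroup by column: each day's four readings (all four conditions) form one subgroup.
days = [list(day) for day in zip(*readings.values())]
averages = [sum(day) / len(day) for day in days]
ranges = [max(day) - min(day) for day in days]
grand_avg = sum(averages) / len(averages)
avg_range = sum(ranges) / len(ranges)
lcl = grand_avg - A2_N4 * avg_range
ucl = grand_avg + A2_N4 * avg_range

print(f"ranges: {ranges}")
print(f"averages: {averages}")
print(f"limits: {lcl:.1f} to {ucl:.1f}")
```

With these invented numbers the daily ranges run from 90 to 120 because every subgroup spans the medication and time-of-day swings. The limits come out at roughly 281 to 436, while the five daily averages all sit between 357.5 and 360, hugging the central line.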

What if we made each row of the table into a subgroup of size 5? Now the different conditions would no longer be contained within the subgroups. But what about the variation inside these subgroups? With this arrangement of the data, the day-to-day variation would be within each subgroup. Because the variation within the subgroups is used to construct the limits, this subgrouping will result in limits that make allowance for the day-to-day variation, but do not make any allowance for the variation from morning to evening, or from before to after medication. This average chart will be "out of control." But did we really need to prove that there is a difference from morning to evening and from pre-medication to post-medication? Unless we are trying to document these differences, this is an inappropriate subgrouping.
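The row subgrouping can be sketched the same way. The readings are again a hypothetical table of invented values, not the Figure 1 data. Each row (one condition over five days) becomes a subgroup of size 5, and the limits use the tabled constant A2 = 0.577 for n = 5.

```python
# Hypothetical peak-flow readings (L/min), invented for illustration; NOT the Figure 1 data.
readings = {
    "AM pre":  [300, 310, 305, 295, 300],
    "AM post": [380, 385, 375, 390, 380],
    "PM pre":  [340, 345, 350, 335, 340],
    "PM post": [410, 400, 405, 415, 410],
}
A2_N5 = 0.577  # tabled average-chart constant for subgroups of size 5

# Subgroup by row: each condition's five daily readings form one subgroup,
# so only the day-to-day variation ends up inside each subgroup.
averages = [sum(row) / len(row) for row in readings.values()]
ranges = [max(row) - min(row) for row in readings.values()]
grand_avg = sum(averages) / len(averages)
avg_range = sum(ranges) / len(ranges)
lcl = grand_avg - A2_N5 * avg_range
ucl = grand_avg + A2_N5 * avg_range

# Conditions whose subgroup average falls outside the limits.
outside = [name for name, avg in zip(readings, averages) if not lcl <= avg <= ucl]

print(f"ranges: {ranges}")
print(f"averages: {averages}")
print(f"limits: {lcl:.1f} to {ucl:.1f}")
print(f"outside the limits: {outside}")
```

Here each subgroup range is only 15, so the limits are roughly 350 to 367, and all four condition averages (302, 382, 342 and 408) fall outside them. The chart is "out of control," but it only confirms that the conditions differ.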

So we must avoid the two errors of stratification and inappropriate subgrouping. Two conditions are required for any subgrouping to be rational: Each subgroup must be logically homogeneous, and the variation within the subgroups must be the proper yardstick for setting limits on the routine variation between subgroups.

When the values within the subgroups are not collected under essentially the same conditions, you have failed to satisfy the homogeneity condition.

When the variation from subgroup to subgroup represents sources of variation that are not present within the subgroups, and when these sources of variation from subgroup to subgroup are known to be larger than the sources of variation within the subgroups, then you have failed the yardstick criterion.

In either case, the computations will break down because you will have failed to create rational subgroups. The remedy is not to change the computations, but to change the subgrouping into one that is appropriate for your data.

While the data in the table do constitute a time series, they are not easily arranged into rational subgroups because each value is collected under different conditions. In other words, our logical subgroup size is n = 1. You will learn more about the data in the table by plotting them as a time series of 20 values than you ever will by subgrouping them and using an average and range chart.

At the same time, you should resist the temptation to turn this time series of 20 values into an XmR chart. The fact that this time series is a mixture of values collected under different conditions will contaminate the moving ranges and make the limits meaningless.
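Here is a sketch of why the moving ranges become contaminated. It again uses a hypothetical table of invented readings (not the Figure 1 data), strung out in time order as morning pre, morning post, evening pre and evening post for each day. The sketch compares the average moving range of this mixed series with the average of the day-to-day moving ranges computed within each condition. The factor 2.66 is the usual scaling constant for XmR limits on individual values.

```python
# Hypothetical peak-flow readings (L/min), invented for illustration; NOT the Figure 1 data.
readings = {
    "AM pre":  [300, 310, 305, 295, 300],
    "AM post": [380, 385, 375, 390, 380],
    "PM pre":  [340, 345, 350, 335, 340],
    "PM post": [410, 400, 405, 415, 410],
}

# Time order: for each day, AM pre, AM post, PM pre, PM post (20 values).
series = [value for day in zip(*readings.values()) for value in day]

# Moving ranges of the mixed series: almost every pair straddles a change in condition.
moving_ranges = [abs(b - a) for a, b in zip(series, series[1:])]
mixed_mr_bar = sum(moving_ranges) / len(moving_ranges)

# For comparison: day-to-day moving ranges computed within each condition separately.
within = [abs(b - a) for row in readings.values() for a, b in zip(row, row[1:])]
within_mr_bar = sum(within) / len(within)

# XmR limits for individual values: center line plus or minus 2.66 average moving ranges.
center = sum(series) / len(series)
lower = center - 2.66 * mixed_mr_bar
upper = center + 2.66 * mixed_mr_bar

print(f"average moving range, mixed series: {mixed_mr_bar:.1f}")
print(f"average moving range, within conditions: {within_mr_bar:.1f}")
print(f"X chart limits from mixed series: {lower:.1f} to {upper:.1f}")
```

With these invented numbers the mixed series has an average moving range of about 71, nearly nine times the within-condition value of about 8. Limits built on the larger figure describe the swings between conditions rather than the routine variation.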

There is more to rational subgrouping than can be presented in this column. However, the two principles above should get you started down the right road.

About the author

Donald J. Wheeler is an internationally known consulting statistician and the author of Understanding Variation: The Key to Managing Chaos and Understanding Statistical Process Control, Second Edition.