## The Importance of a Proper SPC Subgroup Sampling Technique

### Methodically finding useful data in out-of-control conditions

Published: Friday, November 13, 2009 - 10:16

A common error among Six Sigma and operations research professionals is failing to select the correct subgroup sampling technique when constructing a statistical process control (SPC) chart. The problem has grown worse in the modern computing age, perhaps because most practitioners try to “fit” their data into the graphical user interface template of the major statistical software packages. Consequently, many practitioners produce aesthetically appealing charts that are simply not effective at identifying out-of-control (OOC) conditions. This article discusses proper SPC subgroup sampling techniques and illustrates the principles of subgroup sampling selection from a practitioner’s perspective.

The selection of an SPC sampling technique should be based on the question that needs to be addressed and on an analysis of historical data that is representative of the current process. The importance of selecting the proper SPC subgroup sampling technique cannot be stressed enough, yet most practitioners overlook it.

Today, many public utilities such as electric and telephone companies maintain all of their data for regulatory purposes. In an instance like this, a practitioner has a population of data, so the practitioner can simply choose to measure every element in the population and comment on the observations with complete certainty. Alternatively, if the practitioner is in a high-volume, low-mix environment where measuring every process outcome is impractical or impossible, then the practitioner has a sample and not a population. In this case, the practitioner must rely on inferential statistics to make an educated guess concerning the value of population parameters. SPC charts were developed to support real-time analysis of product quality in an environment where a sample of data is available.

Suppose the practitioner decides to partition the process data based on increments of production and to monitor the process using an X-bar/R chart. Many practitioners are driven by the statistical software package interface: they partition all of the data into small subgroups of three to five units and let the software generate an X-bar/R chart. There are several problems with this approach. First, successive subgroup averages are likely to be correlated rather than independent. Second, distinguishing between within-group and between-group variation becomes problematic, so the practitioner will most likely have developed a process monitoring system that is not sensitive to alarm conditions.

There are two common approaches to sampling the data partitions to construct the rational subgroups: First, draw a random sample from the partition with at least the minimum number of elements in the subgroup; and second, systematically sample the data partitions and collect the minimum number of elements in the subgroup. One example of the second approach might be to collect the serial observations from the tail of the partition. The last observations are often preferable to the initial observations because the process is being observed in its “steady-state” behavior. The next question is, “How many observations should be in a rational subgroup?”
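The two approaches can be sketched in a few lines of Python. This is only an illustration: the function names, the synthetic lot, and its mean and standard deviation are assumptions made for the example and are not the article's data.

```python
import random

def random_subgroup(lot, n, rng):
    """Approach 1: simple random sample of n items from one partition (lot)."""
    return rng.sample(lot, n)

def tail_subgroup(lot, n):
    """Approach 2: the last n serial observations of one partition (lot)."""
    return lot[-n:]

# Hypothetical lot of 100 lengths in cm; the mean and spread are assumed values.
rng = random.Random(0)
lot = [round(rng.gauss(5.0, 0.1), 3) for _ in range(100)]

subgroup_random = random_subgroup(lot, 5, rng)
subgroup_tail = tail_subgroup(lot, 5)
print(subgroup_random)
print(subgroup_tail)
```

Passing the random generator in explicitly keeps the random draw reproducible, which is convenient when two sampling schemes are compared on the same historical data.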

Consider a normally distributed, capable production process that has produced 20 lots of 100 items. The length of each item was measured and recorded to assess the proper sampling technique in this high-volume, low-mix environment. In this scenario, the desirable length of the item was 5 cm and acceptable quality is between 4.5 cm and 5.5 cm. Figure 1 contains a process capability analysis of the 2,000 measurements that were taken.

Figure 1: Capability Study of Capable Process Under Examination

The size of a rational subgroup is a function of the change to be detected, the acceptable probabilities of Type I and Type II errors, and the variation in the data. Practitioners can calculate the minimum size of a rational subgroup using a statistical software package. Statgraphics Centurion is one example of a statistical software package that supports this critical step in the proper construction of an SPC chart.

To design an alarm system that will alert us to an OOC condition, we must be able to distinguish between the variation within a group and the variation between groups. Because this production consisted of 20 lots, we must identify the partition (or manufacturing lot in this case) with the greatest variation so that we can determine the minimum size of the rational subgroup. In this example, the greatest variation within any partition was 0.50. Figure 2 contains the calculation for the minimum rational subgroup sample size for this problem.

Figure 2: Determination of Minimum Rational Subgroup Size Using Statgraphics Centurion
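For readers without access to the software, one common textbook approximation for the subgroup size needed to detect a mean shift is n = ((z_{1-α/2} + z_{1-β}) σ / δ)², rounded up. The sketch below applies it. It is not necessarily the calculation Statgraphics Centurion performs. The shift δ and the error probabilities α and β are illustrative assumptions, and reading the 0.50 figure as a standard deviation is also an assumption.

```python
import math
from statistics import NormalDist

def min_subgroup_size(sigma, delta, alpha=0.05, beta=0.10):
    """Normal-approximation sample size to detect a mean shift of `delta`.

    sigma: assumed within-partition standard deviation
    delta: smallest shift in the mean worth detecting (assumed)
    alpha: two-sided Type I error probability (assumed)
    beta:  Type II error probability (assumed)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Illustrative inputs only: sigma = 0.50 (assumed reading of the article's
# figure), delta = 0.75 (hypothetical shift of interest).
n_min = min_subgroup_size(sigma=0.50, delta=0.75)
print(n_min)
```

Smaller shifts or tighter error probabilities drive the required subgroup size up quickly, because n grows with the square of σ/δ.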

The goal of this investigation is to determine the sampling technique that will be the most effective at detecting an OOC condition. The historical production data that was provided above is representative of the present state of the process. Therefore, the investigation will be made by using the 20 partitions of production data.

Because we need detailed insight into the nature of our production process, we have measured every product of the process in each of the 20 partitions. Consequently, we can state with certainty the mean and variance of each partition. Our goal is to determine, based on our data, whether we can better detect OOC conditions by randomly sampling five elements from each partition and forming a rational subgroup from them, or by taking the last five elements of each partition and forming the rational subgroup from those.

The 20 partitions of data comprise 18 partitions in which the variance is essentially constant while the process mean “drifted” between subsequent partitions, and two partitions in which both the process mean and the variance differed significantly from those of the subsequent partitions. Therefore, we can use the data to determine the best sampling approach.

Because the true mean of each partition is known, we can identify each production lot where an OOC condition occurred. The question is, which sampling technique will be the most effective at identifying those OOC conditions? Figure 3 contains X-bar/R charts generated from two sets of rational subgroups: five randomly selected elements from each partition, and the last five elements of each partition.

Figure 3: X-bar/R chart for Each Sampling Technique

Notice that the R chart for the sampling technique that selected the last five units produced in each production lot captures the change in the standard deviation better than the R chart of the elements that were captured using a random sample. Therefore, while both sampling techniques were able to identify the OOC conditions on the X-bar/R chart, sampling the last five units in each production lot was the best approach for this data set.

Because it is important that we interpret X-bar/R charts based on both the X-bar and the R charts, the second approach of selecting the last five units in each production lot was a better sampling approach because it was able to capture the variation within the subgroups better.
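To make the comparison concrete, the limits of an X-bar/R chart can be computed with the standard Shewhart constants for subgroups of five (A2 = 0.577, D3 = 0, D4 = 2.114). The following is a minimal sketch on made-up subgroups, not the article's data. It flags only points beyond the control limits and does not apply run rules.

```python
from statistics import mean

# Standard Shewhart chart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return subgroup means, ranges, and (LCL, center, UCL) for both charts."""
    xbars = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = mean(xbars)
    rbar = mean(ranges)
    xbar_limits = (grand_mean - A2 * rbar, grand_mean, grand_mean + A2 * rbar)
    r_limits = (D3 * rbar, rbar, D4 * rbar)
    return xbars, ranges, xbar_limits, r_limits

def out_of_control(values, limits):
    """Indices of points that fall outside the lower or upper control limit."""
    lcl, _, ucl = limits
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical data: nine stable subgroups followed by one with a shifted mean.
subgroups = [[5.0, 5.1, 4.9, 5.0, 5.0]] * 9 + [[5.6, 5.7, 5.5, 5.6, 5.6]]
xbars, ranges, xbar_limits, r_limits = xbar_r_limits(subgroups)
flagged_xbar = out_of_control(xbars, xbar_limits)
flagged_r = out_of_control(ranges, r_limits)
print(flagged_xbar, flagged_r)
```

Because the X-bar limits are built from R-bar, the choice of sampling scheme affects both charts: a scheme that yields a smaller average range also yields tighter limits on the means.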

Every SPC chart should be accompanied by an out-of-control action plan (OOCAP). The astute practitioner who selected the proper sampling technique could quickly identify a “process drift,” a change in variation, or a combination of both. Consequently, the OOCAP could lead to immediate corrective action for a process drift or abnormal variation because the rational subgroups were selected and constructed properly.

Many Six Sigma and operations research practitioners have failed to understand the rational subgroup concept or its importance in constructing proper SPC charts. This discussion has used a pragmatic approach to demonstrate the importance of proper SPC rational subgroup sampling techniques. Even though it may be expedient to ignore this important concept, the astute practitioner will benefit immensely if care is taken when selecting the sampling technique.

References:

*Introduction to Statistical Quality Control, Fifth Edition*, by Douglas C. Montgomery (John Wiley & Sons, 2005), chapter 4

*Understanding Statistical Process Control, Second Edition*, by Donald J. Wheeler (SPC Press, 1992), chapter 3

## Comments

## A couple of points...

First, I don't think you missed anything, Dan. The little piles of data above and below the center pile are from the data in the high and low "out-of-control" points on the Xbar-R charts.

Secondly, I'm glad we have an article that talks about rational subgrouping; this is an important subject that deserves a lot of discussion, especially these days when there are too many Black Belts and Master Black Belts who have too little understanding of SPC. However, these examples may not be the best to illustrate the concept.

For one thing, sampling theory as usually practiced for enumerative studies may be completely irrelevant in analytic studies; monitoring process data over time for SPC usually falls under the latter type. There is no underlying population. There may be a pile of historical data, but our task is not extrapolating from a smaller sample to characterize the population, it is to extrapolate from the past to the future. The population of interest does not exist, and never will.

The control chart indicates that we do not have "normally distributed data." You never actually have normally distributed data, but when you have an out-of-control condition, you can't say anything at all about the distribution; without homogeneity, there is no distribution. In fact, what we appear to have in this case is maybe three different distributions. Further study within the lots to find out whether they are internally homogeneous might be helpful, as one of the primary goals of rational subgrouping is ensuring internal homogeneity.

You could, I think, make an argument that either scheme discussed here might work. While it's true that the chart for "last five" reveals one extra assignable cause signal, the chart for the "random sample" has a smaller R-bar, thus tighter limits; it may, therefore, be more sensitive and stand a better chance of revealing other signals over the long run.

Anyone interested in really exploring rational subgrouping would do well to study closely chapter 5 (especially sections 5.6-5.7) in the Wheeler text John cites.

## 100% data

I still have some doubt as to how to chart 100% data. It seems like the only way to chart 100% data is with an I-MR chart. Any kind of X-bar - MR chart will violate rational sampling. Or is a better solution to periodically draw X-bar subgroups from the 100% data, in order to get a better sense of mean shifts? It just seems counterintuitive to only use part of the data if 100% data exists.

By the way, I agree with referring to Wheeler.

JT

## Imaginary Processes

It is meaningless to discuss rational subgrouping in an imaginary theoretical normal process.

You would be well advised to read Wheeler's book "Advanced Topics in SPC".

## Sampling from a Trimodal Distribution

I guess I missed something. I see a "trimodal" distribution. And the X-Bar/R Charts appear very similar in shape. At least it appears that no matter what sampling plan was selected, both ended up with similar results. I think only one point different(?)

It appears to me that the sidemodes are from the early groups and last sample groups. Given these data, I didn't see much of a difference in the charts or analysis presented -- enough to draw a conclusion that subsampling change had any impact at all.

Dan