## spctoolkit

by Donald J. Wheeler

### Why Three-Sigma Limits?

Three-sigma limits filter out nearly all probable noise and isolate the potential signals.

The key to Walter Shewhart's choice of three-sigma limits lies in the title of his first book, Economic Control of Quality of Manufactured Product, where he emphasizes the economics of decisions. For example, Shewhart writes: "As indicated the method of attack is to establish limits of variability such that, when [a value] is found outside these limits, looking for an assignable cause is worthwhile."

Here Shewhart makes a fundamental distinction: some processes are predictable while others are not. He shows that by examining the data a process produces, we can determine whether that process is predictable. If the data show that a process has been predictable in the past, it's reasonable to expect that it will remain predictable in the future. When a process is predictable, it's said to display common-cause, or chance-cause, variation. When a process is unpredictable, it's said to display assignable-cause variation. Therefore, the ability to distinguish between a predictable process and an unpredictable one depends upon your ability to distinguish between common-cause and assignable-cause variation.

What's the difference? Shewhart writes that a predictable process can be thought of as the outcome of "a large number of chance causes in which no cause produces a predominating effect." When a cause does produce a predominating effect, it becomes an "assignable" cause. Thus, if we denote the predominating effect of any assignable cause as a signal, then the collective effects of the many common causes can be likened to background noise, and the job of separating the two types of variations is similar to separating signals from noise.

In separating signals from noise, you can make two mistakes. The first mistake occurs when you interpret noise as a signal (i.e., attribute common-cause variation to an assignable cause). The second mistake occurs when you miss a signal (i.e., attribute assignable-cause variation to common causes).

Both mistakes are costly. The trick is to minimize the losses caused by these mistakes. You can avoid ever making the first mistake if you treat all variation as noise. But, in doing this, your losses from the second mistake will increase. In a similar manner, you can avoid ever making the second mistake if you treat every value as a signal. But, in doing this, your losses from the first mistake will increase.

In our world, when using historical data, it's impossible to avoid both mistakes completely. So, given that both mistakes will be made occasionally, what can we do? Shewhart realized it's possible to regulate the frequencies of both mistakes to minimize economic loss. Subsequently, he developed a control chart with three-sigma limits. Three-sigma limits filter out nearly all probable noise (the common-cause variation) and isolate the potential signals (the assignable-cause variation).
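The trade-off between the two mistakes can be made concrete with a small calculation. The sketch below is only an illustration, not Shewhart's economic argument: it assumes the common-cause noise is standard normal and that an assignable cause shifts the process by three standard deviations. Both assumptions, and the function names `false_alarm_rate`, `miss_rate`, and `tradeoff_table`, are hypothetical choices made for this example.

```python
from statistics import NormalDist

# Assumption for illustration only: common-cause noise is standard normal.
STD_NORMAL = NormalDist()


def false_alarm_rate(k: float) -> float:
    """Chance that a pure-noise value falls outside limits at +/- k sigma
    (the first mistake: noise interpreted as a signal)."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(k))


def miss_rate(k: float, shift: float) -> float:
    """Chance that a single value from a process shifted by `shift` sigma
    still falls inside limits at +/- k sigma (the second mistake)."""
    return STD_NORMAL.cdf(k - shift) - STD_NORMAL.cdf(-k - shift)


def tradeoff_table(widths=(1.0, 2.0, 3.0, 4.0), shift=3.0):
    """Return (k, false alarm rate, miss rate) for each limit width k."""
    return [(k, false_alarm_rate(k), miss_rate(k, shift)) for k in widths]


if __name__ == "__main__":
    for k, fa, miss in tradeoff_table():
        print(f"{k:.0f}-sigma limits: false alarm {fa:.4f}, miss {miss:.4f}")
```

Under these assumptions, narrowing the limits lowers the chance of missing a signal but raises the rate of false alarms, and widening them does the reverse. Which width minimizes total loss depends on the costs attached to each mistake, which this sketch does not model.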

How is it possible that three-sigma limits filter out virtually all probable noise? While there are certain mathematical inequalities that guarantee most data sets will have at least 95 percent of the values within three standard deviations of the average, a better rule of practice is the Empirical Rule, which states that about 99 percent to 100 percent of the data will be located within three standard deviations of the average.
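The article does not name the mathematical inequalities it has in mind. As an assumption, the sketch below uses two well-known candidates: Chebyshev's inequality, which holds for any distribution with finite variance, and the Vysochanskij-Petunin inequality, which holds for unimodal distributions and gives a figure close to the 95 percent quoted above. The function names are hypothetical.

```python
import math


def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction within k standard deviations of the mean for ANY
    distribution with finite variance (Chebyshev): 1 - 1/k^2."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 - 1.0 / k**2


def unimodal_lower_bound(k: float) -> float:
    """Minimum fraction within k standard deviations of the mean for a
    unimodal distribution (Vysochanskij-Petunin): 1 - 4/(9 k^2).
    This form of the bound is valid only for k > sqrt(8/3)."""
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("bound in this form requires k > sqrt(8/3)")
    return 1.0 - 4.0 / (9.0 * k**2)


if __name__ == "__main__":
    print(f"Any distribution, k=3:      {chebyshev_lower_bound(3.0):.4f}")
    print(f"Unimodal distribution, k=3: {unimodal_lower_bound(3.0):.4f}")
```

At three standard deviations these worst-case guarantees are 8/9 (about 89 percent) and 77/81 (about 95 percent). The Empirical Rule is a statement about what typically happens in practice, so it sits well above either bound.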

Figure 1 displays six theoretical distributions to illustrate the Empirical Rule's appropriateness. It shows the area within three standard deviations of the mean. No matter how skewed or "heavy tailed" the distribution may be, virtually all of the area under the distribution curve will fall within three standard deviation units of the mean. When applied to homogeneous data sets, the Empirical Rule suggests that no matter how the data "behave," virtually all of the data will fall within three standard deviation units of the average. Because data that display statistical control are, by definition, reasonably homogeneous, the Empirical Rule explains why the control chart will yield very few instances of noise interpreted as a signal.
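The kind of comparison Figure 1 makes can be approximated with closed-form calculations. The four distributions below are chosen for this sketch and are not necessarily the six shown in the figure; the function names are likewise hypothetical. Each function returns the probability of landing within `k` standard deviations of the distribution's own mean.

```python
import math
from statistics import NormalDist


def coverage_normal(k: float = 3.0) -> float:
    """Standard normal: mean 0, standard deviation 1."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


def coverage_uniform(k: float = 3.0) -> float:
    """Uniform on [0, 1]: mean 0.5, standard deviation 1/sqrt(12).
    The interval is centered, so coverage is its width capped at 1."""
    half_width = k / math.sqrt(12.0)
    return min(1.0, 2.0 * half_width)


def coverage_exponential(k: float = 3.0) -> float:
    """Exponential with rate 1 (strongly skewed): mean 1, standard deviation 1.
    The CDF is 1 - exp(-x) for x >= 0; the lower limit is clipped at 0."""
    lower = max(0.0, 1.0 - k)
    upper = 1.0 + k
    return math.exp(-lower) - math.exp(-upper)


def coverage_laplace(k: float = 3.0) -> float:
    """Laplace with scale 1 (heavy tailed): mean 0, standard deviation sqrt(2).
    P(|X| <= t) = 1 - exp(-t)."""
    return 1.0 - math.exp(-k * math.sqrt(2.0))


if __name__ == "__main__":
    for name, fn in [
        ("normal", coverage_normal),
        ("uniform", coverage_uniform),
        ("exponential", coverage_exponential),
        ("Laplace", coverage_laplace),
    ]:
        print(f"{name:12s} within 3 sd: {fn():.4f}")
```

For these four examples the three-standard-deviation coverage ranges from roughly 98 percent (the skewed and heavy-tailed cases) to 100 percent (the uniform case), which is in line with the Empirical Rule's "virtually all."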

Figure 1 also shows that three-sigma limits will indeed filter out nearly all common-cause variation displayed by predictable processes.

Three-sigma limits allow you to detect the process changes that are large enough to be economically important, while filtering out almost all common-cause variation. These limits allow you to strike a balance between the losses from the first mistake (interpreting noise as a signal) and the losses from the second (attributing assignable-cause variation to common causes).