## SPC Toolkit

by Donald J. Wheeler

The control charts described in many current technical articles bear little, if any, resemblance to the control chart technique described in Walter Shewhart's writings. Part of this problem can be attributed to novices teaching neophytes, while part is due to the failure to read Shewhart's writings carefully. Therefore, to help the reader differentiate control chart myths from foundations, this column will focus on both. This month, I will discuss four myths about Shewhart's charts. Next month, I will discuss four foundations of Shewhart's charts.

Myth One: Data must be normally distributed before they can be placed on a control chart.

While the control chart constants were created under the assumption of normally distributed data, the control chart technique is essentially insensitive to this assumption. This insensitivity is what makes the control chart robust enough to work in the real world as a procedure for inductive inference. In August, this column showed the robustness of three-sigma limits with a graphic showing some very nonnormal curves.

The data don't have to be normally distributed before you can place them on a control chart. The computations are essentially unaffected by the degree of normality of the data. Conversely, the fact that the data display a reasonable degree of statistical control doesn't mean that they will follow a normal distribution. The normality of the data is neither a prerequisite nor a consequence of statistical control.
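As a rough illustration of this insensitivity, the following sketch computes how much of a distribution falls inside its own three-sigma limits. The three models (normal, uniform and exponential) are my own choices for illustration, not the curves shown in the August column, and the helper names are hypothetical.

```python
import math

# Closed-form cumulative distribution functions for three illustrative models.
# These models are assumptions chosen for the sketch, not taken from the column.
def normal_cdf(x):
    """Standard normal: mean 0, standard deviation 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def uniform_cdf(x):
    """Uniform on [0, 1]: mean 1/2, standard deviation sqrt(1/12)."""
    return min(1.0, max(0.0, x))

def exponential_cdf(x):
    """Exponential with rate 1: mean 1, standard deviation 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def coverage(cdf, mean, sd):
    """Fraction of the distribution lying within mean +/- 3 * sd."""
    return cdf(mean + 3.0 * sd) - cdf(mean - 3.0 * sd)

models = [
    ("normal", normal_cdf, 0.0, 1.0),
    ("uniform", uniform_cdf, 0.5, math.sqrt(1.0 / 12.0)),
    ("exponential", exponential_cdf, 1.0, 1.0),
]

for name, cdf, mean, sd in models:
    print(f"{name:12s} {coverage(cdf, mean, sd):.4f}")
# normal is about 0.9973, uniform is 1.0000, exponential is about 0.9817
```

Even the heavily skewed exponential model keeps roughly 98 percent of its area inside the limits, so the limits continue to separate routine variation from potential signals without any appeal to normality.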

Myth Two: Control charts work because of the central limit theorem.

The central limit theorem applies to subgroup averages (i.e., as the subgroup size increases, the histogram of the subgroup averages will, in the limit, become more "normal," regardless of how the individual measurements are distributed). Because many statistical techniques utilize the central limit theorem, it's only natural to assume that it's the basis of the control chart. However, this isn't the case. The central limit theorem describes the behavior of subgroup averages, but it doesn't describe the behavior of the measures of dispersion. Moreover, Shewhart's charts don't need the finesse of the central limit theorem: three-sigma limits filter out 99 percent to 100 percent of the probable noise, leaving only the potential signals outside the limits. Because of the conservative nature of the three-sigma limits, the central limit theorem is irrelevant to Shewhart's charts.

Undoubtedly, this myth has been one of the greatest barriers to the effective use of control charts with management and process-industry data. When data are obtained as one value per time period, it's logical to use subgroups with a size of one. However, if you believe this myth to be true, you'll feel compelled to average something to make use of the central limit theorem, and the rationality of the data analysis will be sacrificed to superstition.
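For readers who want to see what charting subgroups of size one looks like in practice, here is a minimal sketch of limits for an individuals and moving range (XmR) chart. The scaling constants 2.66 and 3.268 are the conventional ones for two-point moving ranges; they are not stated in this column. The function name and the monthly figures are hypothetical.

```python
from statistics import mean

def xmr_limits(values):
    """Limits for an individuals (X) chart and its moving range (mR) chart.

    Assumes the conventional XmR constants 2.66 and 3.268, which apply to
    moving ranges computed from successive pairs of values.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    # Moving ranges: absolute differences between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "center": center,
        "lower": center - 2.66 * mr_bar,   # lower limit for individual values
        "upper": center + 2.66 * mr_bar,   # upper limit for individual values
        "mr_bar": mr_bar,
        "mr_upper": 3.268 * mr_bar,        # upper limit for moving ranges
    }

# Hypothetical one-value-per-month data.
monthly = [10, 12, 11, 13, 12, 11, 10, 12]
limits = xmr_limits(monthly)
print({name: round(value, 3) for name, value in limits.items()})
```

No averaging is needed: each value is plotted as it arrives, and the dispersion is estimated from the point-to-point differences.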

Myth Three: Observations must be independent; data with autocorrelation are inappropriate for control charts.

Again, we have an artificial barrier based on theoretical assumptions, which ignores the nature of real data and the robustness of the control chart. All data derived from production processes will display some level of autocorrelation. Shewhart uses autocorrelated data in the control chart as early as page 20 of his first book. He writes that assignable causes of variation were found and removed, and then new data were collected. The new data showed that the process had improved.

Remember, the purpose of analysis is insight rather than numbers. The control chart isn't concerned with probability models. Rather, it's concerned with using data for making decisions in the real world. Control charts have worked with autocorrelated data for more than 60 years.

Myth Four: Data must be in control before you can plot them on a control chart.

This myth could only have come from computing limits incorrectly. Among the blunders made in the name of this myth are censoring data prior to charting them and using limits that aren't three-sigma limits. Needless to say, these and other manipulations are unnecessary. The purpose of Shewhart's charts is to detect lack of control. If a control chart can't detect lack of control, why use it?
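To make the point concrete, the sketch below compares two ways of computing limits for a made-up series whose level shifts halfway through. The column does not say which incorrect computation it has in mind; using the overall standard deviation of all the values is my assumption of one common example. The data, the function names and the conventional constant 2.66 are likewise not from the column.

```python
from statistics import mean, stdev

# Hypothetical data: the process level shifts from about 10 to about 20.
values = [10, 11, 9, 10, 11, 10, 9, 11, 10, 9,
          20, 21, 19, 20, 21, 20, 19, 21, 20, 19]

def moving_range_limits(data):
    """Three-sigma limits based on the average two-point moving range."""
    mr_bar = mean([abs(b - a) for a, b in zip(data, data[1:])])
    center = mean(data)
    return center - 2.66 * mr_bar, center + 2.66 * mr_bar

def global_sd_limits(data):
    """Limits based on the overall standard deviation (assumed blunder)."""
    center = mean(data)
    spread = stdev(data)
    return center - 3 * spread, center + 3 * spread

def points_outside(data, limits):
    """Values that fall outside the given (lower, upper) limits."""
    lower, upper = limits
    return [v for v in data if v < lower or v > upper]

flagged_by_moving_range = points_outside(values, moving_range_limits(values))
flagged_by_global_sd = points_outside(values, global_sd_limits(values))

print("moving-range limits flag", len(flagged_by_moving_range), "points")
print("overall-SD limits flag", len(flagged_by_global_sd), "points")
```

In this example the moving-range limits flag many points even though the limits were computed from the out-of-control data themselves, while the overall standard deviation is inflated by the shift and flags nothing. Limits computed the first way don't require the data to be in control beforehand.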