Published: Monday, September 9, 2019 - 12:03

The oldest myth about process behavior charts is the myth that they require “normally distributed data.” If you have ever heard this idea, or if you have ever taught this to others, then you need to read this article.

While this myth dates back to 1935, and while Walter Shewhart exposed this idea as a myth in 1938, it continually reappears in various forms even today. For example, a white paper put out by a software company recently called the process behavior chart a “normality control chart.” And a blurb for a workshop advertised the Western Electric zone tests as “depending upon a normal distribution.”

As I consider how these myths are perpetuated, I do not detect any malicious intent, just unconscious confusion. Nevertheless, those who continue to spread these myths fall into three groups. One group spreads these myths because of their special interests, another group spreads these myths because of their unexamined assumptions, and the third group spreads these myths because they were taught that the myths are true.

Software makes the difficult easy and the impossible possible. As part of their outreach software companies produce white papers and conduct seminars and tutorials. Inherent in these marketing efforts is an emphasis upon making the software seem indispensable to the user.

For example, if you are told that you have to qualify your data by checking to see if they are “normally distributed” then you will be interested in learning how the software might help you check for normality. As users come to depend upon the software to guide them through their analyses, all the bells and whistles begin to seem essential. Gradually all of these extra features, such as the checks for normality, the probability plots, and the histograms, turn into prerequisites for using a process behavior chart. Thus, as everyone tends to their own special interests, the myth that the data must be normally distributed prior to using a process behavior chart continues to get a new lease on life.

So, while specific training may be needed to fully benefit from a particular software package, and while it is reasonable and proper for software companies to provide this training, they are a poor resource for broader education in theory, technique, or practice. As a consequence of this software-based training, many users think of statistics as a grab-bag of techniques. Just enter your data, pick a procedure from the smorgasbord provided by the software, and read the output.

However, the first axiom of data analysis is that no data have any meaning apart from their context. When understanding does not guide the analysis, the extraneous can obscure the essence, and the analysis can go astray.

A corollary to this axiom is that the software will never know the context for your data. Every meaningful analysis requires an understanding of the source of the data, an understanding of the analysis technique, and appropriate subject-matter knowledge.

When someone does not recognize that statistical process control (SPC) is *fundamentally different* from the traditional techniques of statistical inference, they inevitably end up making unexamined assumptions that are incorrect. To understand why these assumptions are incorrect, we will have to illustrate the difference between SPC and statistics by returning to first principles.

The purpose of every statistical analysis is to separate the *potential signals* from the *probable noise*. To this end we have to find a way to filter out the probable noise. Since we can never filter out all of the noise, we have to settle for filtering out most of the noise. So let us denote that proportion of the probable noise that we filter out by P.

The traditional statistical approach begins by picking some value for P that is reasonably close to 1.00. (Commonly used values for P are 95 percent, 98 percent, and 99 percent.) Next, we transform our data in some way to get a test statistic *Y*. (*Y* might be a t-statistic, a chi-square statistic, an F-statistic, a proportion, or some other statistic of interest.) Next we identify the appropriate probability model, *f(y)*, to use for our test statistic, *Y*. Finally, we use the equation for the area under a curve to find the critical values, A and B, which correspond to the chosen value for P.

Once we know that the critical values A and B will filter out the proportion P of the probable noise, then we can compare our computed statistic with these critical values. If the statistic *Y* falls between A and B we conclude that our data contain no detectable signals. When our statistic *Y* falls outside the interval from A to B, we have evidence of a potential signal within our data. Once we have detected a signal, we can then estimate it and assess its practical importance.
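The fixed-coverage sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test statistic *Y* is assumed to follow a standard normal model, the leftover probability is split equally between the two tails, and the function names are hypothetical.

```python
from statistics import NormalDist

def fixed_coverage_limits(p, model=NormalDist()):
    """Return critical values (A, B) that cover the proportion p of `model`.

    Assumption for illustration: the leftover (1 - p) is split equally
    between the two tails of the probability model.
    """
    alpha = 1.0 - p
    a = model.inv_cdf(alpha / 2.0)
    b = model.inv_cdf(1.0 - alpha / 2.0)
    return a, b

def has_potential_signal(y, a, b):
    """A statistic outside the interval from A to B is a potential signal."""
    return y < a or y > b

# Step 1: fix P.  Step 2: pick the model.  Step 3: find A and B.
a, b = fixed_coverage_limits(0.95)
print(f"A = {a:.3f}, B = {b:.3f}")        # A = -1.960, B = 1.960

# Step 4: compare a computed statistic (hypothetical value) with A and B.
print(has_potential_signal(2.4, a, b))    # True
```

Note that the probability model has to be supplied before A and B can be found, which is why this approach always leads to the question, "What is the probability model?"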

This logical sequence for filtering out the noise is repeated over and over with different statistics and different analysis techniques. It describes the fundamental approach used by most statistical techniques. It has a proven track record when used with experimental studies, and it is so fundamental that it becomes automatic for those with statistical training to think in these terms: Fix the value for P, determine the probability model, and then find A and B. *But this is not the only approach to filtering out the noise.*

The statistical approach uses a fixed value for P which defines the coverage of the interval A to B. This coverage defines how much of the probable noise gets filtered out. Thus, the statistical approach can be said to use a *fixed-coverage filter*. Inherent in this approach is the necessity of defining a probability model, *f(y)*, to use in finding A and B. Hence, statisticians are by training and inclination prone to think in terms of “What is the probability model?”

While we can usually define a reasonable probability model for various well-known statistics, *Y*, we will never have enough data in practice to fully specify a probability model for the original data. To get around this problem Walter Shewhart chose to use a *fixed-width filter* rather than a fixed-coverage filter. Instead of beginning with a fixed value for P, he decided to fix the values for A and B instead. With appropriate values for A and B, regardless of what probability model might apply when the process is operated predictably, the value for P will always turn out to be reasonably close to 1.00.

In his search for how to define fixed values for A and B, Shewhart found symmetric, three-sigma limits to be sufficiently general to work as desired. As he observed, it is the potential signals that are of interest, not the noise, so in practice all that we need to know is that P is reasonably close to 1.00. *As long as P is close to 1.00 we will know that we are filtering out almost all of the noise, and the precise value of P will be moot.*

Thus, Shewhart’s approach using a fixed-width filter is exactly the opposite of the approach used by the fixed-coverage filter. The difference in these two approaches is absolutely fundamental. With Shewhart’s fixed-width approach there never was a fixed value for P, and there never will be. This is why those who attempt to assign a fixed-coverage value to a process behavior chart are simply exposing their unexamined assumptions.

Shewhart’s generic, fixed-width limits do not depend upon any specific probability model to work. There is no fixed P value for a point falling inside the limits. In fact, as I argued in last month’s column, the notion of computing a P value only makes sense *when the process is being operated predictably*.

However, in order to illustrate the complete generality of Shewhart’s approach we will consider the theoretical P values for each of the Western Electric zone tests with each of six different probability models.

Because of the inevitable gaps between theory and practice, theoretical probabilities are rarely meaningful beyond parts per thousand. (For more on this topic see “Invisible Probability Models,” *Quality Digest*, June 4, 2018.) However, in order to show the differences between the models, some of the following theoretical values are given to four decimal places.

A single point falling outside Shewhart’s three-sigma limits is taken as evidence of a potential signal of a process change. If we assume that we have a predictable process that is characterized, in turn, by each of the following probability models, then the theoretical probabilities of a false alarm and the P values shown would apply. Figure 4 shows the probabilities of a point falling beyond the three-sigma limits for each tail of each probability model, and the complement of these tail probabilities will be the P value for Rule One for that model.

Statistical procedures with a fixed coverage P in excess of 0.975 are said to be conservative. Figure 4 shows that Shewhart’s generic three-sigma, fixed-width limits will result in a conservative analysis regardless of what probability model we may use. Regardless of the shape of your histogram, Shewhart’s three-sigma limits will filter out anywhere from 98 percent to 100 percent of the probable noise. Thus, P remains reasonably close to 1.00 and Rule One false alarms remain rare.
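As a rough check on this claim, the coverage of fixed three-sigma limits can be computed directly for a few probability models. The three standardized models below (normal, uniform, and exponential) are my own choice for illustration and are not necessarily the six models of figure 4; each is scaled to have mean 0 and standard deviation 1.

```python
import math
from statistics import NormalDist

def normal_cdf(z):
    return NormalDist().cdf(z)

def uniform_cdf(z):
    # A uniform model with mean 0 and sd 1 spans -sqrt(3) to +sqrt(3).
    half_width = math.sqrt(3.0)
    return min(1.0, max(0.0, (z + half_width) / (2.0 * half_width)))

def exponential_cdf(z):
    # An exponential model with mean 1 and sd 1, standardized: x = z + 1.
    x = z + 1.0
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def three_sigma_coverage(cdf):
    """P for Rule One: the area between -3 and +3 standard deviations."""
    return cdf(3.0) - cdf(-3.0)

MODELS = {
    "normal": normal_cdf,
    "uniform": uniform_cdf,
    "exponential": exponential_cdf,
}

for name, cdf in MODELS.items():
    print(f"{name:12s} P = {three_sigma_coverage(cdf):.4f}")
# normal       P = 0.9973
# uniform      P = 1.0000
# exponential  P = 0.9817
```

The limits stay fixed at three sigma while P moves from model to model, which is the reverse of the fixed-coverage calculation.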

When at least two out of three successive values fall more than two sigma units above the central line, or when at least two out of three successive values fall more than two sigma units below the central line, this *run beyond two-sigma* may be taken as evidence of a process change. In figure 5 the probabilities of getting at least two out of three values beyond two-sigma are shown for each tail of each probability model. The complement of these tail probabilities will be equal to the P value for Rule Two for that model.

Regardless of the shape of your histogram, when your process is operated predictably, the chance that a run beyond two-sigma will be a false alarm is less than one-half percent. Thus, Detection Rule Two is not dependent upon having a normal distribution. While the P values vary, they all remain very close to 1.00.

When at least four out of five successive values fall more than one sigma unit above the central line, or when at least four out of five successive values fall more than one sigma unit below the central line, this *run beyond one-sigma* may be taken as evidence of a process change. In figure 6 the probabilities of getting at least four out of five values beyond one-sigma are shown for each tail of each probability model. The complement of these tail probabilities will be equal to the P value for Rule Three for that model.

Regardless of the shape of your histogram, when your process is operated predictably, the chance that a run beyond one-sigma will be a false alarm is less than one-half percent. Thus, Detection Rule Three is not dependent upon having a normal distribution. While the P values vary, they all remain reasonably close to 1.00.
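The tail probabilities for Detection Rules Two and Three follow from a binomial calculation once the chance of a single value falling beyond the relevant line is known. The sketch below assumes independent values, considers one fixed window of three (or five) successive values, and uses a normal model only as an example; for any other model, only the single-value tail probabilities change. The function name is hypothetical.

```python
from math import comb
from statistics import NormalDist

def at_least_k_of_n(p, k, n):
    """Probability that at least k of n independent values land in a
    region that each value enters with probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

z = NormalDist()
tail_beyond_two_sigma = 1.0 - z.cdf(2.0)   # one tail, normal model
tail_beyond_one_sigma = 1.0 - z.cdf(1.0)   # one tail, normal model

# Rule Two: at least two out of three beyond two sigma on one side.
rule_two = at_least_k_of_n(tail_beyond_two_sigma, 2, 3)

# Rule Three: at least four out of five beyond one sigma on one side.
rule_three = at_least_k_of_n(tail_beyond_one_sigma, 4, 5)

print(f"Rule Two, one tail:   {rule_two:.4f}")
print(f"Rule Three, one tail: {rule_three:.4f}")
```

The complement of the sum of the two one-tail values gives the corresponding P value for the model used.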

When eight successive values all fall on the same side of the central line this *run about the central line* may be taken as evidence of a process change.

Figure 7 shows the probabilities of getting eight successive values on either side of the mean for the first model and above the mean for each of the last five models. The complement of these run-of-eight probabilities will be equal to the P value for Rule Four for that model.

When your predictable process has a histogram with one tail that is less than two sigma in extent you will almost certainly be operating near a boundary condition which limits your process on that side. In such a case Detection Rule Four only makes sense when it is applied to the unbounded, long-tail side of the histogram. For this reason the bottom five probability models only give the false alarm probabilities for the upper tail. Once again, regardless of the shape of your histogram, when your process is operated predictably, the chance that a run about the central line on the unbounded side will be a false alarm is less than one percent. Thus, Detection Rule Four is not dependent upon having a normal distribution. While the P values vary, they all remain very close to 1.00.
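The run-of-eight probability is simply the chance of landing on one side of the mean raised to the eighth power. As a minimal sketch, assuming independent values and using a symmetric model and an exponential model as my own illustrative choices:

```python
import math

def run_probability(p_side, run_length=8):
    """Chance that `run_length` successive independent values all fall
    on a side of the mean that each value reaches with probability p_side."""
    return p_side ** run_length

# Symmetric model (for example, the normal): either side of the mean.
symmetric_either_side = 2 * run_probability(0.5)

# Exponential model with mean 1: P(X > mean) = exp(-1), upper side only.
exponential_above_mean = run_probability(math.exp(-1.0))

print(f"symmetric, either side:  {symmetric_either_side:.4f}")   # 0.0078
print(f"exponential, above mean: {exponential_above_mean:.4f}")  # 0.0003
```

Both values are below one percent, so P for Rule Four remains close to 1.00 in these two cases.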

It is natural for those who think in terms of having a fixed value for P to be concerned about which probability model to use. But Shewhart’s fixed-width filter is fundamentally different from the fixed-coverage filter used with traditional statistical techniques. Regardless of the shape of the histogram, Shewhart’s generic, symmetric, three-sigma limits and the Western Electric zone tests will work reliably to filter out virtually all of the probable noise so you can detect any potential signals.

The purpose of a process behavior chart is simple: To characterize a process as being operated predictably or unpredictably. The technique of creating and using a process behavior chart is equally simple: Collect data, plot chart, compute limits, plot additional data on chart, and look for assignable causes associated with any signals shown on the chart. For years all of this was successfully done with pencil and paper. The complexity began when software came along and we started adding all the bells and whistles.
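The pencil-and-paper arithmetic can be sketched for a chart of individual values. The article does not give a formula, so the sketch assumes the conventional XmR computation (limits at the average plus or minus 2.66 times the average moving range); the data values and function names are hypothetical.

```python
from statistics import mean

def xmr_limits(values):
    """Limits for individual values from a baseline series.

    Assumption: the conventional XmR scaling factor of 2.66 applied to
    the average two-point moving range.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    average_mr = mean(moving_ranges)
    return center - 2.66 * average_mr, center, center + 2.66 * average_mr

def rule_one_signals(values, lower, upper):
    """Indexes of the values that fall outside the limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Collect data and compute limits (hypothetical baseline values).
baseline = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]
lower, center, upper = xmr_limits(baseline)
print(f"limits: {lower:.2f} to {upper:.2f}, central line {center:.2f}")

# Plot additional data against the same limits and look for signals.
new_data = [12, 11, 19, 12]
print(rule_one_signals(new_data, lower, upper))   # [2]
```

Nothing in this computation asks for a probability model or a test of normality; the limits come directly from the data.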

You do not have to have a “normal distribution” to use these techniques. Never have, never will. You do not have to achieve some magic value for P in order for the process behavior chart to work. Never have, never will. And you do not have to qualify your data before using a process behavior chart. Never have, never will. Anyone who says anything different is either promoting some special interest, or else has not taken the time, or has not had the opportunity, to learn how Shewhart’s approach differs from the traditional statistical approach.

To paraphrase Shewhart, classical statistical techniques start with the assumption that a probability model exists, whereas a process behavior chart starts with the assumption that a probability model does *not* exist. Until you learn the difference, every time you open your mouth, you will be exposing your unexamined assumptions.

## Comments

## Normality necessity for Cpk calculation?

Hello all,

The Normality Myth article was great.

We learned that for SPC charts, normality is not necessary.

My question is that:

To what extent is it acceptable to use the usual formula (the normal-distribution formula, 3 sigma) for the Cpk calculation when the distribution is not normal or is unknown?

thanks in advance.

## Another great angle on this problem

Don always seems to find another angle; this one should, I hope, help convince some of those astute enough to understand. Personally, I was lucky enough to learn stats for analytic studies first via classes and practice in SPC (including some Wheeler seminars), and later learned the enumerative world as I got into DoE. The question of normality for SPC was never much of a question for me, and was settled for good once I read *Normality and the Process Behavior Chart*.

There is another aspect to the problem, though, and that is the fact that when you have time-ordered data, you cannot ignore the context of time. One of the things Shewhart did was to find a way to look for signals in time-ordered process data, recognizing that without stability, you cannot assume homogeneity. Without homogeneity, any distributional assumption is meaningless. This can very easily be shown (Don and Davis Balestracci have demonstrated this very well; I summarized Davis's argument in https://www.qualitydigest.com/inside/quality-insider-column/render-unto-enumerative-studies-073113.html back in 2013). In Davis's argument, he points out three different distributions that all test well for normality, but when you look at the time series, only one is stable.

So again, if you don't have homogeneity, you don't have any reason to assume any distribution. You can test for normality--it's just a calculation--but it would be meaningless if the time series is shifting or out of control. There is no distribution for data from an out-of-control time series. This aspect of the problem also makes testing for normality prior to examining the data in a chart a "cart-before-the-horse" exercise.

## 1935 and still going

Another great article. 84 years since Pearson thought he "fixed" what Shewhart overlooked. In 1931, on page 138 of *Economic Control of Quality of Manufactured Product*, Shewhart notes: "Pearson and his followers claim that these laws have been found to cover practically all cases coming to their attention." Was this a preemptive strike at Pearson's misunderstandings?

Any comments appreciated.

Thank you, Allen

## Normality Myth

Great article as per usual.

The distribution doesn't create the data!

The point is to try to make the best informed decisions on imperfect data. Fitting models doesn't change this, especially if the data does not display homogeneity, yet another key and foundational use of the process behaviour chart which is so often ignored.

Thank you once again Don!

## Normality Myth

I agree that it is important to avoid the false positives, but I've found that it is equally, if not more important to capture the true positives. That is, capturing off-spec product when it IS off spec. So, when the distribution isn't normal, readings of -2 or even -1.5 sigma are indeed significant and indicative of a change while the positive sigma values indicate acceptable material. Missing a process change can be critical, depending, of course, on the process. As a result, I've found it useful to identify the distribution and set appropriate limits based on the probability. Then to convert those probabilities to the A and B values for operators. I'm sure Dr. Shewhart used the same A and B logic because he recognized the limited ability of QC professionals to make the conversions that are now so easily obtained with our PCs. Thank you for opening up this window on a poorly understood element of SPC.

## Practical Limits to minimize false alarms in skewed data sets

If process data are truly independent and if one agrees to tolerate up to 2.5% of false positive signals, then one can ignore the underlying data distribution (symmetric or skewed) and be guided by 3 sigma limits as Dr. Wheeler suggests and advocates. However, there are situations where data distributions are inherently skewed and will inevitably entail a large and unacceptable number of false positive signals. Experience shows that these apparent outlying data can be considered as part of the common variation. An example would be microbial counts recorded in controlled rooms for production of sterile and non-sterile pharmaceuticals. In a regulated industry such as the pharmaceutical one, every apparent out-of-control point must be investigated and documented under a strict quality system. When you generate hundreds of data points, the "2.5%" can amount to considerable, annoying and costly futile investigations. Therefore, Richard Heller's approach to "identify the distribution and set appropriate limits based on the probability" is in my opinion understandable. To be practical, if the skewed data base is large enough and shows a recurrent and consistent pattern, one can try either to identify the data distribution, exactly or approximately, or to do empirical curve fitting, and this will result in a much smaller number of false positive signals. Shewhart used the terms "approximately" and “past experience” to describe how a controlled phenomenon may be expected to behave in the future. So, if the data modeling reflects adequately or approximately the past behavior of a process, then I judge it reasonable to set what I call “practical limits” for such a skewed data set, limits that will minimize false positive signals. Furthermore, “past experience” can also reveal that, for certain skewed data sets, control limits based on 4, 5, or 6 sigma can minimize false positive signals, and in this case these wider limits could also be viewed as another example of “practical limits”.

## Maybe I'm misunderstanding you here.

Control limits have nothing to do with whether the product is "off spec." Specifications are set by the customer. Just because your process has shifted in a positive direction, further within the specification, does not mean that this is acceptable. Quality means "on target with minimum variability," and the costs of your process being out of control are going to be passed on to your customers, who have to deal with incoming lots of product being different from previously-used product, regardless of whether the product satisfies specifications.

The point here is that you DON'T have to know the probability distribution for control charts to work. Unless you have a tonne of data about an extremely predictable process, you CAN'T know the distribution. Thus, you can't know the probabilities. OF COURSE we would like to avoid false negatives, just as we want to avoid false positives. But I'm sceptical that you have a statistical method for determining that a signal which looks exactly like noise is actually a signal, and that this method does not lead to more false positives than Shewhart's.

## Normality Myth

Your point is well taken.

First, I apologize for my mistake in talking about specifications instead of control limits. I meant to say that there were occasions under Dr. Wheeler's examples where we would be making the error of assuming the process was in control when it actually wasn't. And this would be an issue only if the control limits were based on assuming a normal distribution when the underlying one wasn't. Some operations can be assumed to be normal . . . length, temperature, time, etc. Others can't, such as chemical purity or impurities in a batch.

One useful tool for processes with less than a tonne of data that I've found is the Weibull distribution which can help predict the expected distribution. Again, I should qualify myself and note that it isn't a universal solution. However, in the long run, I believe it is important to recognize that we often do make predictions that can be off by a country mile. Dr. Wheeler's calculations taken out to the nth decimal place need to be taken with a grain of salt, not because his math is wrong, but because of the difference between the real world and the mathematical world.

With this being said, I enjoyed the article and the comments from you and all the others. I learn from each of these articles . . . something that I couldn't do when I was younger and (thought) I was so much smarter! Thank you.

## Thanks!

Thank you for the thoughtful response. It is greatly appreciated.

## Great comment Richard, I do

Great comment Richard, I do agree with what you are saying. Wheeler's points are tried and true and go all the way back to his seminal book with Chambers on "Understanding Statistical Process Control" written in the late 80's or early 90's. At the same time understanding the underlying distribution can be not only useful for identifying true positives on a process behavior chart, but also for understanding the physical nature of a process so you can further improve the process down the road.

## Bravo!

I'm often surprised by QIMacros users who think they have to check for normality and do a Box-Cox transformation to use a process behavior chart (i.e., control chart). I keep directing them to your wisdom on this topic. Thanks for another great article explaining it to the doubters.

Jay Arthur

## QI Macros

I tried several years ago to get them (at QI Macros) to at least make it an option...they would not. As a result, I cannot recommend that package to clients.

## The Normality Myth

Very well written. I appreciate the class I took with you back in 1997.

Thank you for taking the time to write this.

## Dr. Wheeler is a gem

I too took Dr. Wheeler's class in the 1990's. It forever changed my approach to numbers and statistics. The SPC Press training is pragmatic and easy to learn.

I just sent a new-hire to the class this week. Highly recommend over any other SPC training.