The first part of this article illustrated the kinds of problems that can happen when data from non-normal processes are plotted on traditional control charts, and when traditional process capability assessments are applied to these data. This second part will show what to do about these problems.
The data in the example were simulated from a gamma distribution with shape and scale parameters alpha = 2 and gamma = 2. The resulting random data fit a gamma distribution with alpha = 1.88 and gamma = 1.85. In practice, the correct distribution would not be known up front, but it can often be inferred from the nature of the process. Undesirable random arrivals like impurities, particles, and defects follow the Poisson distribution, of which the gamma distribution is the continuous-scale analogue. If the distribution is not normal and the critical-to-quality characteristic involves something undesirable, the gamma distribution is often a good choice. The extreme value distribution, on the other hand, is appropriate for materials that fail at their weakest points. See pages 77–78 in Kailash C. Kapur and Leonard R. Lamberson’s Reliability in Engineering Design (John Wiley & Sons, 1977) for corroboration. It is of course vital to perform tests for goodness of fit no matter which distribution is finally selected.
Figure 4 (editor's note: the first three figures appeared in the first part of this article) shows the histogram and the corresponding fitted gamma distribution. Note that, unlike the normal distribution, the gamma distribution does not allow values of less than zero for quality characteristics such as impurity levels. The quantile-quantile plot is equally encouraging.

Figure 4. Histogram, gamma distribution
StatGraphics finds PPU = 0.88 as shown in figure 5, and this indicates a nonconformance rate of 4,163 nonconformances per million opportunities: almost 100 times as many as promised under the assumption of normality.

Figure 5. Process performance index, correct distributional assumption
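The relationship between a nonconforming fraction and a PPU can be checked with a short calculation. The sketch below assumes that the index is the "equivalent" PPU, that is, the standard normal z-score with the same upper-tail area divided by three; this is an assumption about the definition, not a description of StatGraphics' internals, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def equivalent_ppu(fraction_above_usl):
    """PPU of a normal process with the same upper-tail area (z / 3)."""
    z = STD_NORMAL.inv_cdf(1.0 - fraction_above_usl)
    return z / 3.0

def ppm_above_usl(ppu):
    """Nonconformances per million implied by an equivalent PPU."""
    return (1.0 - STD_NORMAL.cdf(3.0 * ppu)) * 1e6

# Tail area quoted in the article: 4,163 nonconformances per million.
ppu = equivalent_ppu(4163 / 1e6)
print(f"equivalent PPU = {ppu:.2f}")
```

Under this assumed definition, the 4,163 per million figure and the reported PPU of 0.88 agree to two decimal places.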
This article has so far illustrated the pitfalls involved in the normality assumption; the next section will show what to do about processes that don’t comply with it.
The first step is to fit the appropriate distribution to the process data, and maximum likelihood estimation (MLE) is the standard approach for the gamma, Weibull, and many other distributions. The results consist of the shape, scale, and, where appropriate, threshold parameters of the distribution. The threshold is the minimum value the quality characteristic can have, and it is similar to the guarantee time in reliability statistics.
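As an illustration of the fitting step, here is a minimal sketch of gamma MLE using only the Python standard library. It assumes a threshold of zero and reports the second parameter as a rate (so the mean is shape/rate); the helper names `digamma` and `fit_gamma_mle`, the seed, and the sample size are hypothetical choices for the example. A statistical package would normally do this work, and nothing here is meant to reproduce a particular package's algorithm.

```python
import math
import random

def digamma(x):
    """Digamma function via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:                 # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series

def fit_gamma_mle(data):
    """Return (shape, rate) maximizing the gamma likelihood, threshold 0."""
    n = len(data)
    mean = sum(data) / n
    # Likelihood equation for the shape: ln(shape) - digamma(shape) = s
    s = math.log(mean) - sum(math.log(x) for x in data) / n
    lo, hi = 1e-6, 1e6
    for _ in range(200):           # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid               # left side still too large: shape too small
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    return shape, shape / mean     # the MLE of the rate is shape / mean

# Simulated data with shape 2 and rate 2 (random.gammavariate takes a scale).
random.seed(2024)
data = [random.gammavariate(2.0, 0.5) for _ in range(5000)]
shape, rate = fit_gamma_mle(data)
print(f"shape = {shape:.3f}, rate = {rate:.3f}")
```

As in the article's example, the fitted parameters land near, but not exactly on, the values used for the simulation.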
The second step, which must never be omitted, involves tests for goodness of fit such as the histogram, chi-square test, and quantile-quantile plot. Figure 3 in the first part of this article was an example of a very bad quantile-quantile plot that shows an inappropriate distributional fit. Figure 6 is an example of a good one, for the fit of the gamma distribution to the impurity data.

Figure 6. Quantile-quantile plot, gamma distribution
It is then straightforward to deploy control charts whose limits carry a realistic false-alarm risk: each limit has exactly a 0.135% chance of being exceeded if the process is in control. Figure 7 presents an X‑bar chart in which no subgroup is beyond the control limit, and in which it is not possible to record an impurity level of less than zero. The range chart also shows no points outside the control limit.


Figure 7. Control charts, gamma distribution limits
The next step is to deploy these charts to the shop floor in a form that production operators can use easily. In his novel Starship Troopers (Ace, 1987, originally published in 1959), Robert Heinlein says the following of military equipment, and the same principle applies to statistical tools for the shop floor: “If you load a mud foot down with a lot of gadgets that he has to watch, somebody a lot more simply equipped—say with a stone ax—will sneak up and bash his head in while he is trying to read a vernier.”
StatGraphics and Minitab are ideal for offline data analysis, but they are not particularly convenient for routine control charting. Once one of these programs has delivered the fitted parameters, though, a simple visual control can be deployed in a spreadsheet such as Excel. Excel’s built-in functions for the Weibull and gamma distributions (WEIBULL, GAMMADIST) will return the cumulative distribution for an individual measurement. If this exceeds 0.99865 (or one minus any other desired false-alarm risk), the cell can be formatted to turn red.
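The same red-cell logic can be sketched outside a spreadsheet. The example below computes the gamma cumulative distribution with a series and a continued fraction for the incomplete gamma function, then flags a measurement whose cumulative probability exceeds 0.99865. The names `gamma_cdf` and `exceeds_limit` are hypothetical, and the example assumes the article's fitted gamma = 1.85 is a rate; Excel's GAMMADIST takes a scale parameter instead, which would then be 1/1.85.

```python
import math

def gamma_cdf(x, shape, rate):
    """Cumulative gamma distribution (regularized incomplete gamma)."""
    if x <= 0.0:
        return 0.0
    a, z = shape, rate * x
    prefix = math.exp(a * math.log(z) - z - math.lgamma(a))
    if z < a + 1.0:
        # Power series for the lower tail.
        term = 1.0 / a
        total = term
        k = a
        while abs(term) > abs(total) * 1e-15:
            k += 1.0
            term *= z / k
            total += term
        return total * prefix
    # Continued fraction (modified Lentz method) for the upper tail.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - h * prefix

def exceeds_limit(x, shape, rate, quantile=0.99865):
    """True when the measurement lies beyond the chosen upper quantile."""
    return gamma_cdf(x, shape, rate) > quantile

SHAPE, RATE = 1.88, 1.85   # fitted values from the article; rate form assumed
for x in (0.5, 2.0, 6.0):
    print(x, round(gamma_cdf(x, SHAPE, RATE), 5), exceeds_limit(x, SHAPE, RATE))
```

Only the last of the three illustrative measurements is flagged; in a spreadsheet, the equivalent test would drive the conditional formatting.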
It is also possible to set up an X‑bar chart for the gamma distribution. The average of a subgroup of n measurements from a gamma distribution with parameters alpha and gamma is another gamma distribution with parameters n × alpha and n × gamma (this holds when gamma is expressed as a rate, so that the mean is alpha/gamma). If n is large enough, this gamma distribution begins to look like a normal distribution per the Central Limit Theorem. It is also possible, albeit with somewhat more difficulty (a Visual Basic for Applications program is required), to find quantiles of the sample range and set the 0.99865 quantile as the upper control limit. My experience is that Romberg integration is superior to Simpson’s Rule for the necessary numerical integrations. In 1948, mathematician Samuel Stanley Wilks noted in his article, “Order Statistics” (Bulletin of the American Mathematical Society, page 21), the following probability density function for the range of any distribution:
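$$f_R(r) = n(n-1)\int_{-\infty}^{\infty}\left[F(x+r)-F(x)\right]^{n-2} f(x)\,f(x+r)\,dx$$

Here f and F are the probability density and cumulative distribution functions of the individual measurements, n is the subgroup size, and r is the sample range.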
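A range-chart limit of this kind can be sketched numerically. The example below works with the cumulative form of the range distribution, P(R ≤ r) = n ∫ f(x) [F(x + r) − F(x)]^(n − 1) dx, and finds the 0.99865 quantile by bisection. It uses composite Simpson's Rule only to keep the sketch short (the article's author prefers Romberg integration), and it is demonstrated on an exponential distribution, a gamma distribution with shape 1, because the result can be checked against a closed form. The function names, integration bounds, and step count are hypothetical.

```python
import math

def range_cdf(r, n, pdf, cdf, lower, upper, steps=8000):
    """P(range <= r) for a subgroup of n, by composite Simpson's Rule."""
    if r <= 0.0:
        return 0.0
    h = (upper - lower) / steps          # steps must be even
    total = 0.0
    for i in range(steps + 1):
        x = lower + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * pdf(x) * (cdf(x + r) - cdf(x)) ** (n - 1)
    return n * total * h / 3.0

def range_quantile(p, n, pdf, cdf, lower, upper, tol=1e-9):
    """Smallest r with P(range <= r) >= p, found by bisection."""
    lo, hi = 0.0, upper - lower
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if range_cdf(mid, n, pdf, cdf, lower, upper) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Exponential distribution with rate 1, integrated over [0, 40].
def exp_pdf(x):
    return math.exp(-x)

def exp_cdf(x):
    return 1.0 - math.exp(-x)

n = 4
ucl = range_quantile(0.99865, n, exp_pdf, exp_cdf, 0.0, 40.0)
# Closed form for the exponential case: P(R <= r) = (1 - exp(-r)) ** (n - 1)
exact = -math.log(1.0 - 0.99865 ** (1.0 / (n - 1)))
print(f"numerical UCL = {ucl:.5f}, closed form = {exact:.5f}")
```

For a fitted gamma distribution, `exp_pdf` and `exp_cdf` would be replaced by the gamma density and cumulative distribution, with the upper integration bound chosen far enough into the tail.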
Since it is relatively easy to set up control charts that will work for highly non-normal distributions, why is this practice so infrequent? The key issues appear to be lack of awareness and failure to exploit modern computers.
The first problem is lack of awareness. Few, if any, statistical process control (SPC) textbooks and guides address the non-normality issue in any depth, although the Automotive Industry Action Group’s 2005 SPC manual, Statistical Process Control, Second Edition, devotes a couple of pages to non-normal process performance indexes.
The second problem, or rather opportunity, is that modern computational technology makes it practical to do jobs that would have been unthinkable in the days of Walter A. Shewhart and other statistical quality pioneers. SPC was developed during an era in which the most sophisticated computational tools consisted of slide rules and possibly electromechanical computers. The production of parts as opposed to control charts delivers profits and pays wages, and Heinlein’s admonition against reading a vernier when a stone ax will do the job was perfectly applicable.
As an example, the sample standard-deviation (s) chart is slightly superior to the range chart in terms of average run length when the process variation goes out of control. Computation of the range, however, requires the production worker to merely subtract the smallest measurement from the largest. R is the stone ax while s is the vernier, and R does the job almost as well. If the ease of computation means the operator can plot far more sample ranges than sample standard deviations in the same amount of time, R is actually better. The median chart, meanwhile, eliminates the need to calculate the sample average. Today, of course, spreadsheet functions like AVERAGE and STDEV will do everything automatically.
The traditional attribute charts (i.e., p, np, c, and u) rely on normal approximations to the binomial and Poisson distributions that work properly only when the expected count exceeds four, and possibly five or six. The count is generally of something undesirable like scrap or defects, so an expected count large enough for the normal approximation to work well is exactly what we do not want. Spreadsheet functions like BINOMDIST and POISSON have, at least in my opinion, made these traditional attribute charts obsolete. Even the traditional R and s charts rely on normal approximations to the distributions of the range and standard deviation, and the built-in CHIDIST function can calculate exact control limits for the s chart.
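To make the attribute-chart point concrete, here is a minimal sketch that compares the traditional c-chart limit with an exact Poisson limit for a small expected count. The function names and the expected count of 2 are hypothetical choices for the example; a spreadsheet would use POISSON for the same cumulative probabilities.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson count with the given mean."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return min(total, 1.0)

def exact_c_chart_ucl(mean, alpha=0.00135):
    """Smallest count k with P(X > k) <= alpha; counts above k signal."""
    k = 0
    while 1.0 - poisson_cdf(k, mean) > alpha:
        k += 1
    return k

mean = 2.0
shewhart_ucl = mean + 3.0 * math.sqrt(mean)      # traditional 3-sigma limit
largest_in_control = math.floor(shewhart_ucl)    # largest count not flagged
shewhart_risk = 1.0 - poisson_cdf(largest_in_control, mean)
exact_ucl = exact_c_chart_ucl(mean)
exact_risk = 1.0 - poisson_cdf(exact_ucl, mean)

print(f"3-sigma limit {shewhart_ucl:.2f}: false-alarm risk {shewhart_risk:.5f}")
print(f"exact limit {exact_ucl}: false-alarm risk {exact_risk:.5f}")
```

With an expected count of 2, the traditional limit signals at seven or more defects, and its false-alarm risk is several times the nominal 0.135%. The exact limit signals only above seven and keeps the risk below the nominal value.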
In summary, then, awareness of processes that follow non-normal distributions, in combination with off-the-shelf computational technology, makes it relatively easy for quality practitioners to deploy control charts and calculate process performance indexes in which production personnel and customers can have the utmost confidence.
Image credits: All the figures were generated from StatGraphics Centurion.
Comments
The question is Tchebychev vs Fitting a Distribution
The question is Tchebychev vs Fitting a Distribution, not normal versus non-normal. I am glad Mr. Levinson mentioned in this article the original assumptions made by Shewhart. My request is we characterize this "debate" as
1. Is it good enough to use Tchebychev and Shewhart's empirical methods for control limits
OR
2. Is distribution fitting necessary.
I should explain that during my master's degree training in Operations Research, distribution fitting was drilled into our heads. But, after exposure to SPC and reading Dr. Shewhart's and Deming's works, I believe that option 1 is sufficient, easier to implement, and easier to explain. My request, though, is to not make the assumption that Shewhart's version relied on Normality or the Central Limit Theorem.
Distribution followup..
I've read enough different authors' opinions to be curious about this subject. What is the statistical test that will tell us when one has successfully 'fit a distribution'? Don't we only have statistical tests that tell us when we have a LACK of fit? And if so, wouldn't a data set that failed to show a lack of fit for one probability distribution also potentially fail to show a lack of fit for many other probability distributions? If so, how do you know which one is the 'right' one?
Distribution fit tests
As with any null hypothesis, you can never PROVE that the selected distribution fits the data. You can only prove beyond a reasonable doubt (significance level) that the distribution is NOT a good fit. This is why I always like a scientific reason, e.g. "undesirable random arrivals" for the gamma distribution, to support selection of the distribution in the first place. If you have this scientific reason and cannot reject the null hypothesis that the distribution fits the data, it's a pretty good bet that it is the right distribution for the job.
Why not do it right?
This is an excellent article, making a valuable point. Why not do it right? The position of certain other authors and experts seems to be "Why not do it wrong?"
There is a conundrum that arises when you consider nonnormal control charts. How do you know if the process is stable without assuming a distribution, or how do you know what the distribution is without knowing if it is stable? The best resolution to this problem, in my opinion, is to understand the physics of the process well enough to select an appropriate distribution family without looking at data. Or, with a capability study, potentially outlying values could be individually investigated to see if they represent assignable causes or common causes. Once we know the process is stable, a distribution model can be fit to the data.
At the risk of annoying some readers of this blog, I would like to point out that my 2007 book "Six Sigma Distribution Modeling" describes capability metrics and control chart methods for many different distribution families.
Thanks to Mr. Levinson for making an important argument in a convincing way. Now sit back and watch the normalites argue the case for inappropriate methodology.
Andy Sleeper, Successful Statistics LLC
Assess stability before attempting to determine distribution
"How do you know if the process is stable without assuming a
distribution, or how do you know what the distribution is without
knowing if it is stable?"
This is always a good question to ask. If
the process is not stable, assessing the distribution of the measures will be
highly misleading. You might, for example, think you have a skewed
distribution, when you in fact simply had points where your process had
shifted during your data collection.
Let us assume you wish to plot an individuals or "X" chart.
Use the median as the centerline. You can still assess for runs and
trends without concern of the distribution. If skewness is present, and
your process is stable, you will see a density closer to centerline on
one side of the chart versus the other. Start with standard limits,
only as a ballpark. If you have a stable skewed distribution, you will
only see points outside the limits on one side of your chart. If you
have a pure leptokurtic distribution, you may see points outside both
limits, and if you have platykurtic distribution, you will see values
hugging the centerline. If you suspect instability (lack-of-control),
use the median moving range to generate the limits.
Then look at your histogram. If the stability assumption looks
reasonable, run your standard tests for normality (Anderson-Darling,
Lin-Mudholkar, Shapiro-Wilk, etc). If you reject the assumption of
normality, then try some different distribution fits. The Johnson and
Pearson families are often also good choices. Remember that
fitting a distribution does not mean you have a given distribution.
Sampling error exists. And that error can be quite large with small
sample sizes and when estimating the higher moments. If you select a
distribution, with a good fit, and the histogram looks reasonable, then
use the methods described in the article to generate the upper and lower
control limits, but use the median as the centerline. Your limits may
never be exact, but may be reasonably sufficient.
On another topic, I appreciated the mention in the article on the use of
exact binomial and Poisson control limits. I am amazed that this is
not more widely used. The normal approximation limits are a hold-over
from when we used to generate control charts with calculators and plot
on paper.