When the Bell Curve Doesn’t Fit, Part 2

Few SPC textbooks address the non-normality issue in depth

Levinson Productivity Systems

The first part of this article illustrated the kinds of problems that can happen when data from non-normal processes are plotted on traditional control charts, and when traditional process capability assessments are applied to these data. This second part will show what to do about these problems.

…

Comments

Why not do it right?

This is an excellent article, making a valuable point. Why not do it right? The position of certain other authors and experts seems to be "Why not do it wrong?"

There is a conundrum that arises when you consider nonnormal control charts. How do you know if the process is stable without assuming a distribution, or how do you know what the distribution is without knowing if it is stable? The best resolution to this problem, in my opinion, is to understand the physics of the process well enough to select an appropriate distribution family without looking at data. Or, with a capability study, potentially outlying values could be individually investigated to see if they represent assignable causes or common causes. Once we know the process is stable, a distribution model can be fit to the data.

At the risk of annoying some readers of this blog, I would like to point out that my 2007 book "Six Sigma Distribution Modeling" describes capability metrics and control chart methods for many different distribution families.

Thanks to Mr. Levinson for making an important argument in a convincing way. Now sit back and watch the normalites argue the case for inappropriate methodology.

Andy Sleeper, Successful Statistics LLC

Assess stability before attempting to determine distribution

"How do you know if the process is stable without assuming a distribution, or how do you know what the distribution is without knowing if it is stable?"

This is always a good question to ask. If the process is not stable, assessing the distribution of the measures will be highly misleading. You might, for example, think you have a skewed distribution when in fact you simply had points where your process shifted during your data collection.

Let's assume you wish to plot an individuals (X) chart. Use the median as the centerline. You can still assess runs and trends without concern for the distribution. If skewness is present and your process is stable, the points on one side of the centerline will sit more densely near it than the points on the other side. Start with standard limits, only as a ballpark. If you have a stable skewed distribution, you will see points outside the limits on only one side of your chart. If you have a purely leptokurtic distribution, you may see points outside both limits, and if you have a platykurtic distribution, you will see values hugging the centerline. If you suspect instability (lack of control), use the median moving range to generate the limits.
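A minimal Python sketch of that first pass follows. It assumes the conventional normal-theory bias constants for moving ranges of two points: 1.128 for the average moving range and 0.954 for the median moving range. Those constants, like the function names `individuals_chart`, `side_spread`, `longest_run` and `points_outside`, are illustrative assumptions and not taken from the article. The chart is centred on the median, and the helpers give rough numbers for the one-sided-density, run and out-of-limits checks described above.

```python
from statistics import median

# Assumed normal-theory constants for moving ranges of size 2:
#   average moving range: sigma ~= MR_bar / 1.128  ->  3 sigma ~= 2.660 * MR_bar
#   median moving range:  sigma ~= MR_med / 0.954  ->  3 sigma ~= 3.145 * MR_med
E2_AVERAGE = 3 / 1.128
E2_MEDIAN = 3 / 0.954


def individuals_chart(data, use_median_mr=False):
    """Return (LCL, centerline, UCL) for an X chart centred on the median.

    The default uses the average moving range ("standard" ballpark limits).
    The median moving range is less sensitive to a few large jumps, which
    is why it is the option to use when instability is suspected.
    """
    center = median(data)
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    if use_median_mr:
        half_width = E2_MEDIAN * median(moving_ranges)
    else:
        half_width = E2_AVERAGE * sum(moving_ranges) / len(moving_ranges)
    return center - half_width, center, center + half_width


def side_spread(data, center):
    """Mean distance from the centerline for points above and below it.

    For a stable right-skewed process, points below the median tend to sit
    closer to the centerline than points above it.
    """
    above = [x - center for x in data if x > center]
    below = [center - x for x in data if x < center]
    mean_above = sum(above) / len(above) if above else 0.0
    mean_below = sum(below) / len(below) if below else 0.0
    return mean_above, mean_below


def longest_run(data, center):
    """Length of the longest run of consecutive points on one side of center.

    A point lying exactly on the centerline breaks the run.
    """
    best = run = 0
    previous_side = 0
    for x in data:
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        previous_side = side
        best = max(best, run)
    return best


def points_outside(data, lcl, ucl):
    """Indices of points beyond either control limit."""
    return [i for i, x in enumerate(data) if x < lcl or x > ucl]
```

As a usage example, you might call `individuals_chart(data)` for the ballpark limits and `individuals_chart(data, use_median_mr=True)` when a shift is suspected. Then you can check whether `points_outside` reports indices on only one side of the chart.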

Then look at your histogram. If the stability assumption looks reasonable, run your standard tests for normality (Anderson-Darling, Lin-Mudholkar, Shapiro-Wilk, etc.). If you reject the assumption of normality, try some different distribution fits; fits from the Johnson or Pearson families are also often good choices. Remember that fitting a distribution does not mean you have that distribution. Sampling error exists, and it can be quite large with small sample sizes and when estimating the higher moments. If you select a distribution with a good fit and the histogram looks reasonable, use the methods described in the article to generate the upper and lower control limits, but use the median as the centerline. Your limits may never be exact, but they may be reasonably sufficient.
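The sketch below shows the last step for one family. It uses a lognormal model, chosen purely for illustration and not drawn from the article or the comment. The model is assumed to have already passed a goodness-of-fit check. The control limits are the fitted 0.135% and 99.865% quantiles, which are the same tail areas as ±3 sigma on a normal chart, and the centerline is the sample median. The function name is hypothetical. Only the Python standard library is used.

```python
from math import exp, log
from statistics import NormalDist, fmean, median, pstdev

# Tail areas matching +/-3 sigma on a normal chart (0.135% in each tail).
LOWER_TAIL = 0.00135
UPPER_TAIL = 0.99865


def lognormal_control_limits(data):
    """Quantile-based (LCL, centerline, UCL) from a fitted lognormal model.

    The lognormal family is an illustrative assumption and needs strictly
    positive data. Parameters are maximum-likelihood estimates on the log
    scale; the centerline is the sample median, not the fitted mean.
    """
    if any(x <= 0 for x in data):
        raise ValueError("lognormal model needs strictly positive data")
    logs = [log(x) for x in data]
    mu = fmean(logs)
    sigma = pstdev(logs, mu)  # MLE uses the divide-by-n (population) form
    fitted = NormalDist(mu, sigma)
    lcl = exp(fitted.inv_cdf(LOWER_TAIL))
    ucl = exp(fitted.inv_cdf(UPPER_TAIL))
    return lcl, median(data), ucl
```

The same pattern applies to any other family you settle on. You would swap in that family's fitted quantile function and keep the tail areas and the median centerline.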

On another topic, I appreciated the article's mention of exact binomial and Poisson control limits. I am amazed that they are not more widely used. The normal-approximation limits are a holdover from when we generated control charts with calculators and plotted them on paper.
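Exact limits are easy to compute today. The following is a minimal Python sketch under one stated convention, which is an assumption here: each limit is set so that the one-sided false-alarm probability is at most 0.135%, matching a 3-sigma normal chart. The helper names are hypothetical. A count strictly below `lo` or strictly above `hi` is treated as a signal. When `lo` is 0, no low signal is possible.

```python
from math import comb, exp

FALSE_ALARM_TAIL = 0.00135  # one-sided tail area of a 3-sigma normal limit


def poisson_pmf(mean):
    """Yield P(X = 0), P(X = 1), ... for a Poisson count (c or u chart).

    exp(-mean) underflows for very large means; this sketch ignores that case.
    """
    prob = exp(-mean)
    k = 0
    while True:
        yield prob
        k += 1
        prob *= mean / k


def binomial_pmf(n, p):
    """Yield P(X = 0) .. P(X = n) for a binomial count (np or p chart)."""
    for k in range(n + 1):
        yield comb(n, k) * p**k * (1 - p) ** (n - k)


def _quantile(pmf, target):
    """Smallest k whose cumulative probability reaches target."""
    total = 0.0
    k = 0
    for k, prob in enumerate(pmf):
        total += prob
        if total >= target:
            return k
    return k


def exact_limits(make_pmf):
    """Return (lo, hi); counts below lo or above hi are out of control."""
    lo = _quantile(make_pmf(), FALSE_ALARM_TAIL)
    hi = _quantile(make_pmf(), 1 - FALSE_ALARM_TAIL)
    return lo, hi


def poisson_limits(mean):
    return exact_limits(lambda: poisson_pmf(mean))


def binomial_limits(n, p):
    return exact_limits(lambda: binomial_pmf(n, p))
```

For example, `poisson_limits(4)` returns `(0, 11)`. The normal-approximation upper limit for the same mean is 4 + 3√4 = 10, which would flag a count of 11 that the exact Poisson model still treats as common cause.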

Distribution follow-up

I've read enough different authors' opinions to be curious about this subject. What is the statistical test that will tell us when one has successfully 'fit a distribution'? Don't we only have statistical tests that tell us when we have a LACK of fit? And if so, wouldn't a test that failed to detect a lack of fit for one probability distribution on a given data set also potentially fail to detect a lack of fit for many other probability distributions? If so, how do you know which one is the 'right' one?

Distribution fit tests

As with any null hypothesis, you can never PROVE that the selected distribution fits the data. You can only prove beyond a reasonable doubt (significance level) that the distribution is NOT a good fit. This is why I always like a scientific reason, e.g. "undesirable random arrivals" for the gamma distribution, to support selection of the distribution in the first place. If you have this scientific reason and cannot reject the null hypothesis that the distribution fits the data, it's a pretty good bet that it is the right distribution for the job.
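The one-sidedness of these tests is easy to see with a small calculation. The following Python sketch is illustrative only and is not the method of either commenter. It computes a Kolmogorov-Smirnov distance between the data and two candidate models, a normal and a lognormal, each fitted to the same sample. It compares each distance with the large-sample 5% critical value 1.36/√n. Because the parameters are estimated from the same data, that critical value is conservative: it rejects less often than the nominal 5%. The function names are hypothetical.

```python
from math import log
from statistics import NormalDist, fmean, pstdev


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fitted_normal_cdf(data):
    """CDF of a normal model with maximum-likelihood parameters."""
    return NormalDist(fmean(data), pstdev(data)).cdf


def fitted_lognormal_cdf(data):
    """CDF of a lognormal model fitted on the log scale (positive data only)."""
    logs = [log(x) for x in data]
    model = NormalDist(fmean(logs), pstdev(logs))
    return lambda x: model.cdf(log(x))


def approx_ks_critical(n):
    """Large-sample 5% critical value; conservative when parameters are fitted."""
    return 1.36 / n ** 0.5
```

With a small sample, `ks_statistic(sample, fitted_normal_cdf(sample))` and `ks_statistic(sample, fitted_lognormal_cdf(sample))` will often both fall below `approx_ks_critical(len(sample))`. Neither model is rejected in that case, and the test alone cannot choose between them. That is why a physical reason for the chosen family matters.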

The question is Tchebychev vs Fitting a Distribution

The question is Tchebychev vs. fitting a distribution, not normal versus non-normal. I am glad Mr. Levinson mentioned in this article the original assumptions made by Shewhart. My request is that we characterize this "debate" as:

1. Is it good enough to use Tchebychev and Shewhart's empirical methods for control limits?

2. Is distribution fitting necessary?

I should explain that during my master's degree training in Operations Research, distribution fitting was drilled into our heads. But after exposure to SPC and reading Dr. Shewhart's and Dr. Deming's works, I believe that option 1 is sufficient, easier to implement, and easier to explain. My request, though, is that we not assume Shewhart's version relied on normality or the Central Limit Theorem.
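For reference, Tchebychev's inequality holds for any distribution with finite mean μ and standard deviation σ. Below it is evaluated at the usual 3-sigma limits, with the normal-theory false-alarm rate shown alongside for comparison. Here Φ denotes the standard normal CDF.

```latex
P\left(|X-\mu| \ge k\sigma\right) \le \frac{1}{k^{2}}
\quad\Longrightarrow\quad
P\left(|X-\mu| \ge 3\sigma\right) \le \frac{1}{9} \approx 0.111,
\qquad \text{vs. } 2\left[1-\Phi(3)\right] \approx 0.0027 \text{ for a normal process.}
```

The bound makes no distributional assumption, but it is loose. The gap between the two figures is the trade-off behind the two questions above.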