## A Bell-Shaped Distribution Does Not Imply Only Common Cause Variation

### Random does not imply normal

Published: Monday, September 18, 2017 - 12:03

*Story update 9/26/2017: The words "distribution of" were inadvertently left out of the last sentence of the second paragraph.*

Some practitioners think that if data from a process have a “bell-shaped” histogram, then the system is experiencing only common cause variation (i.e., random variation). This is incorrect and reflects a fundamental misunderstanding about the relationship between distribution shape and the variation in a system. However, even knowledgeable people sometimes make this mistake.

For example, paraphrasing from a popular Six Sigma textbook, when most values fall in the middle and tail off in either direction, we have statistical evidence of common cause variation.^{1,2} This is an invalid statement, and the misunderstanding probably stems from the fact that if we were sampling means from a stable process, the central limit theorem would assure us that the distribution of sample means would be approximately normally distributed. However, even though the histogram of the subgroup means is bell-shaped, the process itself may still be non-normal or be experiencing special or systematic causes of variation (i.e., it may be out-of-control). To determine the correct status of the process, we must look at the control chart of the individual observations, not the distribution of subgroup means.

The fact that a “normal” distribution shape does not imply process stability is known as the Quetelet Fallacy and is documented in *The History of Statistics.*^{3} You may be surprised to learn that many educated people, including statisticians and engineers, are unaware of the fallacy or believe the underlying conjecture to be true, and that this belief has a long history. The first documented counterexample came from Sir Francis Galton’s famous sweet pea experiment of 1875, which exposed the Quetelet conjecture as false.^{4}

A proof is given below for the argument that a normal or bell-shaped histogram does not imply that the system is experiencing only common cause variation, and conversely a system experiencing only common cause variation will not necessarily have a normal distribution of observations.

**Theorem:** Normal does not imply Random, and Random does not imply Normal

**Proof:**

Part 1. The proof that “Random does not imply Normal” is obvious because you can generate random (i.e., common cause) distributions that are uniform, triangular, Weibull, Poisson, Cauchy, etc., and yes, even normal (see JMP or Minitab for examples). Also, Walter A. Shewhart’s figure 9 in his 1931 book, *Economic Control of Quality of Manufactured Product*, contains an example. It is the histogram of the modulus of rupture for Sitka spruce trees. The histogram is skewed, but Shewhart observes that it is at least approximately in a state of statistical control.^{5}
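As a rough illustration of Part 1, the sketch below draws independent values from three fixed distributions using Python's standard library (rather than JMP or Minitab). The seed, sample size, and the helper name `shape` are arbitrary choices made for this illustration. Because every draw comes from one unchanging distribution, each series contains only common cause variation, yet only one of them is bell-shaped.

```python
import random
import statistics

def shape(values):
    """Return (skewness, excess kurtosis) from standardized moments."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    skew = sum(x ** 3 for x in z) / len(z)
    excess_kurtosis = sum(x ** 4 for x in z) / len(z) - 3.0
    return skew, excess_kurtosis

rng = random.Random(2017)  # fixed seed (arbitrary) so the run is repeatable
n = 10_000                 # sample size chosen for illustration only

# Independent draws from a single, unchanging distribution:
# purely random (common cause) variation in every case.
samples = {
    "uniform": [rng.random() for _ in range(n)],
    "exponential": [rng.expovariate(1.0) for _ in range(n)],
    "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
}

shapes = {name: shape(values) for name, values in samples.items()}
for name, (skew, kurt) in shapes.items():
    print(f"{name:>12}: skewness = {skew:+.2f}, excess kurtosis = {kurt:+.2f}")
```

A normal distribution has skewness and excess kurtosis near zero. The uniform sample should show clearly negative excess kurtosis (a flat shape) and the exponential sample clearly positive skewness, even though all three series are equally random.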

Part 2. That “Normal does not imply Random” is shown by the counterexample given below. In this example, the histogram is bell-shaped, but the system is experiencing both special cause (in this case systematic) variation and common cause (i.e., random) variation. In the graph, the slope of the polynomial trend line characterizes special cause (systematic) variation, and common cause (random) variation is characterized by the spread of the points about the trend line.

**Example:**

**Clothing sales data** for spring, summer, and fall (× 1,000 units)

{1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9}

**Period 1:** May, June (six weeks, new marketing dialog)

**Period 2:** July, August (seven weeks, old marketing dialog)

**Period 3:** September, October (six weeks, new marketing dialog)
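The bell shape of the 19 sales values listed above can be checked with a quick tally. This is only a sketch in Python; the variable names are chosen for illustration.

```python
from collections import Counter

# Sales data from the example (x 1,000 units).
sales = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9]

counts = Counter(sales)
mean = sum(sales) / len(sales)

# Text histogram: one asterisk per observation.
for value in range(min(sales), max(sales) + 1):
    print(f"{value}: {'*' * counts[value]}")

# The histogram is symmetric if the count at (center - k)
# equals the count at (center + k) for every offset k.
center = 5
is_symmetric = all(counts[center - k] == counts[center + k] for k in range(1, 5))
print(f"n = {len(sales)}, mean = {mean}, symmetric about {center}: {is_symmetric}")
```

The counts rise from one observation at 1 to five observations at 5 and fall back to one at 9, so the histogram is symmetric and peaked in the middle. Nothing in this tally uses the order in which the sales occurred.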

The graph of the sales over time shows the effect of the marketing programs in the spring and fall. This change in performance was caused by systematic changes in the process (i.e., the marketing initiatives) and not just random variation.
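A histogram discards time order, so the same 19 values can come from very different process behavior. The sketch below builds an individuals (XmR) chart for two orderings of the data. Both orderings are hypothetical and chosen only for illustration: neither is the week-by-week sequence in the article's graph, which is not reproduced in the text. The natural process limits use the conventional 2.66 multiplier on the average moving range, and the function name `xmr_signals` is an assumption of this sketch.

```python
import random
from collections import Counter

def xmr_signals(series):
    """Return (lower limit, upper limit, points outside) for an individuals chart."""
    center = sum(series) / len(series)
    moving_ranges = [abs(b - a) for a, b in zip(series, series[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard scaling constant for natural process limits
    # based on the average two-point moving range.
    lower = center - 2.66 * mr_bar
    upper = center + 2.66 * mr_bar
    outside = [x for x in series if x < lower or x > upper]
    return lower, upper, outside

sales = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9]

# Hypothetical ordering 1: a steady upward trend (systematic variation).
trending = sorted(sales)

# Hypothetical ordering 2: the same values in a seeded random order.
shuffled = sales[:]
random.Random(1).shuffle(shuffled)

same_histogram = Counter(trending) == Counter(shuffled)
print(f"identical histograms: {same_histogram}")

results = {}
for label, series in (("trending", trending), ("shuffled", shuffled)):
    lower, upper, outside = xmr_signals(series)
    results[label] = (lower, upper, outside)
    print(f"{label}: limits = ({lower:.2f}, {upper:.2f}), "
          f"points outside = {len(outside)}")
```

In the trending order, successive values differ very little, so the limits are narrow and the values at both ends of the run fall outside them. The histogram is identical for both orderings, which is why it cannot distinguish a trending process from a stable one.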

Irrespective of the shape of the distribution, a good way to arrive at the correct conclusion regarding process stability is by looking at a control chart of the behavior of the individual observations from the process, or for highly skewed distributions, by using the F* test^{6} [Cruthis, 1993] or the Dixon and Massey z-test^{7} where z ~ N(0, 1) and is given by:

**References**

1. Eckes, G. *The Six Sigma Revolution.* New York: John Wiley & Sons, 2001, pg. 97.

2. Eckes, G. *Six Sigma for Everyone.* New York: John Wiley & Sons, 2003, pp. 72, 73.

3. Stigler, S. M. *The History of Statistics*. Cambridge, MA: The Belknap Press of Harvard University Press, 1986.

4. Wheeler, D. J. personal communications, 2016.

5. Shewhart, W. A. *Economic Control of Quality of Manufactured Product.* New York: D. Van Nostrand, 1931. (Republished in 1980 by the American Society for Quality Control, Milwaukee, WI.)

6. Cruthis, E. N. and S. E. Rigdon. “Comparing Two Estimates of Variance to Determine the Stability of a Process,” *Quality Engineering,* vol. 5, no. 1, 1993.

7. Dixon, W. J. and F. J. Massey. *Introduction to Statistical Analysis*. New York: McGraw-Hill, 1969.

## Comments

## Charts indicating stability

Interesting question; I believe that if both charts are free of signals, that does by definition imply a state of statistical control. It's highly likely in that case that the system (at least for that time period) was acted upon only by common cause factors. I have always used those criteria, tests and decisions as my operational definition of "a state of statistical control" or "stability."

## Still not sure...

I'm still not sure what you're saying here, John. Does not an in-control R or S chart imply the presence of only common cause variation?

I agree that to look at capability you have to examine the distribution of individuals, not averages, but I believe that is a different question.

## s-chart and stability

Rip,

A stable s-chart implies that the factors that control the variance are only experiencing common cause variation. However, the system itself could still be unstable if the factors that control location were experiencing special cause variation, i.e., the mean was unstable.

An interesting case is the x-chart where all the information about the process is in the chart (i.e., it is a sufficient statistic). Then if the x-chart is stable, does that imply that both the mean and variance are experiencing only common cause variation?

Regards,

John

## Correction

Sorry...I replied in the wrong place, and now I see I answered the wrong question. If the x-chart is stable, but the dispersion chart contains signals, that does not imply stability of the mean or the variance. The average dispersion is the basis for the limits in both charts; an out-of-control dispersion chart implies that the limits for the averages chart are inflated, and could thus be masking signals in that chart.

## Rip, Sorry, I had a typo in

Rip, sorry, I had a typo in the sentence: "To determine the correct status of the process, we must look at the control chart of the individual observations, not the DISTRIBUTION of subgroup means." That is, distribution shape does not imply common cause. Also, your statement about in-control R or S charts is correct.

## Typo fixed in text

## Steve and Rip, The Central

Steve and Rip,

The Central Limit Theorem tells us that the subgroup means will be approximately normally distributed, but just because this distribution is normal does not imply common cause variation of the system. The same is true for individual observations.

## Individual Charts

Thanks for the article. I do not agree with the statement: "To determine the correct status of the process, we must look at the control chart of the individual observations, not the subgroup means."

I don't see why the fact that the means from a STABLE process follow a normal distribution would disqualify the use of an averages chart for assessing process stability. Certainly, means from an unstable process do not necessarily follow a bell. And if we are sampling correctly (to capture common cause variation only), the xbar chart should certainly detect the instability.

A fundamental issue with individual charts is that they do NOT detect small process changes quickly. Conversely, charts of averages can detect small process changes if the appropriate sample size is chosen. Many of my clients need to detect much smaller process changes than can be detected quickly and reliably with an "I chart", so Xbar charts are much more useful. Where sampling is costly, a CUSUM chart on the individuals may be used.

## Great point

I used to teach ASQ Black Belt exam prep classes using the Black Belt Primers from QCI. They contained an example that stated that one could determine stability by looking at a histogram. I don't know whether they have corrected that or not.

I think I agree with Steve, but I'd have to know what you meant by "status" of the process. I don't know of any practical utility in running an XmR chart on data for an XBarR chart, unless you have a rule seven violation. If you've subgrouped well, and the XbarR chart shows a stable process, then you have a stable process. If you want to know the shape of the underlying data, then a discrete plot or a histogram will tell you that (but it still won't say anything about stability).