## Do You Have Leptokurtophobia?

### The abnormal need for normal distributions

Published: Wednesday, August 5, 2009 - 05:00

The symptoms of leptokurtophobia are (1) routinely asking if your data are normally distributed and (2) transforming your data to make them appear to be less leptokurtic and more “mound shaped.” If you have exhibited either of these symptoms then you need to read this article.

The origins of leptokurtophobia go back to the surge in statistical process control (SPC) training in the 1980s. Before this surge only two universities in the United States were teaching SPC, and only a handful of instructors had any experience with SPC. As a result many of the SPC instructors of the 1980s were, of necessity, neophytes, and many things that were taught at that time can only be classified as superstitious nonsense. One of these erroneous ideas was that you must have normally distributed data before you can put your data on a process behavior chart (also known as a control chart).

When he created the process behavior chart, Shewhart was looking for a way to separate the routine variation from the exceptional variation. Since the exceptional variation, by definition, dominates the routine variation, Shewhart figured that the easiest way to tell the difference would be to filter out the bulk of the routine variation. After looking at several different ways of doing this he found that three-sigma limits will cover all, or almost all, of the routine variation for virtually all types of data.

To show how three-sigma limits do this, figure 1 contains six different probability models for routine variation. These models range from the uniform distribution to the exponential distribution. (The last three models are leptokurtic.) Each of these models is standardized to have a mean of zero and a standard deviation parameter of 1.00. Figure 1 shows the three-sigma limits and the proportion of the area under each curve that falls within those limits.

There are four lessons that can be learned from figure 1.

**• The first lesson of figure 1 is that three-sigma limits will filter out virtually all of the routine variation regardless of the shape of the histogram.**

These six models are radically different, yet in spite of these differences, three-sigma limits cover 98 percent to 100 percent of the area under each curve.

**• The second lesson is that any data point that falls outside the three-sigma limits is a potential signal of a process change.**

Since it will be a rare event for routine variation to take you outside the three-sigma limits, it is more likely that any point that falls outside these limits is a signal of a process change.

**• The third lesson is that symmetric, three-sigma limits work with skewed data.**

Four of the six models shown are skewed. As we scan down the figure we see that no matter how skewed the model, no matter how heavy the tail becomes, the three-sigma limits are stretched at essentially the same rate as the tail. This means that the length of the elongated tail will effectively determine the three-sigma distance in each case, and that three-sigma limits will cover the bulk of the elongated tail no matter how skewed the data become.

“But that certainly makes the other limit look silly.” Yes, it does. Here we need to pause and think about those situations where we have skewed data. In most cases skewed data occur when the data pile up against a barrier or boundary condition. Whenever a boundary value falls within the computed limits, the boundary takes precedence over the computed limit, and we end up with a one-sided chart. When this happens the remaining limit covers the long tail and allows us to separate the routine variation from potential signals of deviation away from the boundary. Which is how symmetric, three-sigma limits can work with skewed data.

**• The fourth lesson is that any uncertainty in where we draw the three-sigma lines will not greatly affect the coverage of the limits.**

All of the curves are so flat by the time they reach the neighborhood of the three-sigma limits that any errors we may make when we estimate the limits will have, at most, a minimal effect upon how the chart works.
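The coverage figures quoted for figure 1 can be checked directly for three of the models named in the text. The following is a minimal sketch, not the author's computation: it assumes each model is standardized to a standard deviation of 1.00 as described above, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def uniform_coverage(k=3.0):
    """Share of a uniform model (sd = 1) inside mean +/- k sd."""
    # A uniform model with sd 1 extends sqrt(3) either side of its mean.
    half_width = math.sqrt(3.0)
    return min(k, half_width) / half_width


def normal_coverage(k=3.0):
    """Share of a normal model inside mean +/- k sd."""
    z = NormalDist()  # standard normal: mean 0, sd 1
    return z.cdf(k) - z.cdf(-k)


def exponential_coverage(k=3.0):
    """Share of an exponential model (mean 1, sd 1) inside mean +/- k sd."""
    # The model cannot go below zero, so a negative lower limit is
    # replaced by the boundary at zero.
    lower = max(0.0, 1.0 - k)
    upper = 1.0 + k
    # Exponential CDF with rate 1 is F(x) = 1 - exp(-x).
    return math.exp(-lower) - math.exp(-upper)


for name, coverage in [
    ("uniform", uniform_coverage()),
    ("normal", normal_coverage()),
    ("exponential", exponential_coverage()),
]:
    print(f"{name:12s} {coverage:.4f}")
```

This prints 1.0000, 0.9973, and 0.9817, which spans the 98 percent to 100 percent range described for the figure.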

The six probability models in figure 1 effectively summarize what was found when this author looked at more than 1,100 different probability models from seven commonly used families of models. These 1,143 models effectively covered all of the shape characterization plane, with 916 mound-shaped models, 182 J-shaped models, and 45 U-shaped models. Eleven hundred and twelve of these models (or 97.3 percent) had better than 97.5 percent of their area covered by symmetric three-sigma limits.

Thus, three-sigma limits work by brute force. They are sufficiently general to work with all types and shapes of histograms. They work with skewed data, and they work even when the limits are based on few data.

To illustrate this point, I used the exponential probability model from figure 1 to generate the 100 values shown in rows in the table in figure 2. The histogram for these values is shown in figure 3. Since such values should, by definition, display only routine variation, we would hope to find almost all of the observations within the limits in figure 4. We do. Hence, the process behavior chart will work as advertised even with skewed data.
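A reader who wants to repeat this kind of experiment could start from the sketch below. It is an illustration under stated assumptions, not a reproduction of figures 2 through 4: the values are freshly simulated rather than the author's 100 values, the seed is arbitrary, and the limits use the usual X chart computation (average two-point moving range scaled by 2.66), which the article does not spell out. The helper names `x_chart_limits` and `count_outside` are hypothetical.

```python
import random


def x_chart_limits(values):
    """Return (center, lower, upper) limits for an X chart."""
    center = sum(values) / len(values)
    # Moving ranges: absolute differences between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.660 is the customary factor that converts an average
    # two-point moving range into three-sigma limits.
    return center, center - 2.660 * mr_bar, center + 2.660 * mr_bar


def count_outside(values, lower, upper):
    """Count the values that fall outside the limits."""
    return sum(1 for v in values if v < lower or v > upper)


rng = random.Random(2009)  # arbitrary seed (assumption)
values = [rng.expovariate(1.0) for _ in range(100)]  # exponential, mean 1

center, lower, upper = x_chart_limits(values)
# Exponential data pile up against a boundary at zero, so the
# boundary takes precedence over a negative computed lower limit.
lower = max(lower, 0.0)

outside = count_outside(values, lower, upper)
print(f"center={center:.2f} lower={lower:.2f} upper={upper:.2f}")
print(f"{outside} of {len(values)} values fall outside the limits")
```

Changing the seed changes the count, so a single run is only one draw; repeating the run many times gives a better picture of how the chart behaves.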

Therefore, we do not have to pre-qualify our data before we place them on a process behavior chart. We do not need to check the data for normality, nor do we need to define a reference distribution prior to computing limits. Anyone who tells you anything to the contrary is simply trying to complicate your life unnecessarily.

### Transformations of the data

“But the software suggests transforming the data!” Such advice is simply another piece of confusion. The fallacy of transforming the data is as follows.

The first principle for understanding data is that no data have meaning apart from their context. Analysis begins with context, is driven by context, and ends with the results being interpreted in the context of the original data. This principle requires that there must always be a link between what you do with the data and the original context for the data. Any transformation of the data risks breaking this linkage.

If a transformation makes sense both in terms of the original data and the objectives of the analysis, then it will be okay to use that transformation. Transformations of this type might be things like the use of daily or weekly averages in place of hourly values, or the use of proportions or rates in place of counts to take into account the differing areas of opportunity in different time periods.

Only you as the user can determine when a transformation will make sense in the context of the data. (The software cannot do this because it will never know the context.) Moreover, since these sensible transformations will tend to be fairly simple in nature, they do not tend to distort the data.

A second class of transformations would be those that rescale the data in order to achieve certain statistical properties. (These are the only type of transformations that any software can suggest.) Here the objective is usually to make the data appear to be more “normally distributed” in order to have an “estimate of dispersion that is independent of the estimate of location.” Unfortunately, these transformations will tend to be very complex and nonlinear in nature, involving exponential, inverse exponential, or logarithmic functions. (And just what does the logarithm of the percentage of on-time shipments represent?) These nonlinear transformations will distort the data in two ways: at one end of the histogram, values that were originally far apart will now be close together; at the other end of the histogram, values that were originally close together will now be far apart.

To illustrate the effect of transformations to achieve statistical properties we will use the hot metal transit times shown in rows in the table in figure 5. These values are the times (to the nearest 5 minutes) between the phone call alerting the steel furnace that a load of hot metal was on the way and the actual arrival of that load at the steel furnace ladle house.

Given the skewed nature of the data in figure 6 some programs would suggest using a logarithmic transformation. Taking the natural logarithm of each of these transit times results in the histogram in figure 7. (The horizontal scales show both the original and transformed values.) Notice how the values on the left of figure 7 are spaced out while those on the right are crowded together. After the transformation the distance from 20 to 25 minutes is about the same size as the distance from 140 to 180 minutes. How could you begin to explain this to your boss?
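The spacing claim is simple arithmetic on natural logarithms and can be verified in a few lines. This is only a check of the two intervals named in the text; the function name `log_gap` is hypothetical.

```python
import math


def log_gap(a, b):
    """Width of the interval from a to b after a natural-log transformation."""
    return math.log(b) - math.log(a)


short_gap = log_gap(20, 25)    # 5 minutes wide on the original scale
long_gap = log_gap(140, 180)   # 40 minutes wide on the original scale

print(f"20 to 25 minutes:   {short_gap:.3f}")
print(f"140 to 180 minutes: {long_gap:.3f}")
```

The two widths come out as 0.223 and 0.251, so an interval eight times as wide on the original scale ends up only about 13 percent wider after the transformation.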

By itself, this distortion of the data is sufficient to call into question the practice of transforming the data to achieve statistical properties. However, the impact of these nonlinear transformations is not confined to the histograms.

Figure 8 shows the X Chart for the original, untransformed data of the table in figure 5. Eleven of the 141 transit times are above the upper limit, confirming the impression given by the histogram that these data come from a mixture of at least two different processes. Even after the steel furnace gets the phone call, they still have no idea when the hot metal will arrive at the ladle house.

However, if we transform the data before we put them on a process behavior chart we end up with figure 9. There we find no points outside the limits!

Clearly the logarithmic transformation has obliterated the signals. What good is a transformation that changes the message contained within the data? The transformation of the data to achieve statistical properties is simply a complex way of distorting both the data and the truth.

The results shown here are typical of what happens with nonlinear transformations of the original data. These transformations hide the signals contained within the data simply because they are based upon computations that *presume there are no signals within the data*.

To see how the computations do this, we need to pause to consider the nature of the formulas for common descriptive statistics. For a descriptive measure of location we usually use the average, which is simply based upon the sum of the data. However, once we leave the average behind, the formulas become much more complex. For a descriptive measure of dispersion we commonly use the global standard deviation statistic, which is a function of the *squared deviations from the average*. For descriptive measures of shape we commonly use the skewness and kurtosis statistics which, respectively, depend upon the *third* and *fourth* powers of the deviations of the data from the average. When we aggregate the data together in this manner and use the second, third, and fourth powers of the distance between each observation and the average value, we are implicitly assuming that these computations make sense. Whether they be measures of dispersion, or measures of skewness, or even measures of kurtosis, *any high-order descriptive statistic that is computed globally is implicitly based upon a very strong assumption that the data are homogeneous*.
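To make the role of the second, third, and fourth powers concrete, here is a minimal sketch of these global statistics. It assumes one common set of conventions (an n - 1 divisor for the standard deviation, simple moment ratios for skewness and kurtosis); software packages differ in the details. The two small data sets are made up purely for illustration and are not the transit times from figure 5.

```python
import math


def global_statistics(values):
    """Average, standard deviation, skewness, and kurtosis of all values."""
    n = len(values)
    average = sum(values) / n
    deviations = [v - average for v in values]
    # Second, third, and fourth central moments of the pooled data.
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    return {
        "average": average,
        "std_dev": math.sqrt(sum(d ** 2 for d in deviations) / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Hypothetical values: five readings from one process, then the same
# readings plus a single value that belongs to a different process.
homogeneous = [30, 35, 40, 45, 50]
mixed = homogeneous + [160]

for label, data in [("homogeneous", homogeneous), ("mixed", mixed)]:
    stats = global_statistics(data)
    summary = ", ".join(f"{key}={value:.2f}" for key, value in stats.items())
    print(f"{label}: {summary}")
```

Because the deviations are raised to higher powers, the one value from the second process dominates every statistic beyond the average, which is why these numbers describe the mixture rather than either process.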

When the data are not homogeneous it is not the *shape* of the histogram that is wrong, but the computation and use of the descriptive statistics that is erroneous. We do not need to distort the histogram to make the transformed values more homogeneous, but we need to stop and question what the lack of homogeneity means in the context of the original observations.

So how can we determine when a data set is homogeneous? *That is the purpose of the process behavior chart!* Transforming the data to achieve statistical properties prior to placing them on a process behavior chart is an example of getting everything backwards. It assumes that we need to make the data more homogeneous prior to checking them for homogeneity. Any recommendation regarding the transformation of the data prior to placing them on a process behavior chart reveals a fundamental lack of understanding about the purpose of process behavior charts.

Shewhart’s approach, with its generic three-sigma limits computed empirically from the data, does not even require the specification of a probability model. In fact, on page 54 of *Statistical Method from the Viewpoint of Quality Control*, Shewhart wrote *“… we are not concerned with the functional form of the universe* [i.e., the probability model], *but merely with the assumption that a universe exists.”* [Italics in the original.]

When you transform the data to achieve statistical properties you deceive both yourself and everyone else who is not sophisticated enough to catch you in your deception. When you check your data for normality prior to placing them on a process behavior chart you are practicing statistical voodoo. Transforming the data prior to using them on a process behavior chart is not only bad advice, it is also an outright mistake.

Whenever the teachers lack understanding, superstitious nonsense is inevitable. Until you learn to separate myth from fact you will be fair game for those who were taught the nonsense. And you may end up with leptokurtophobia without even knowing it.

## Comments

## False Alarm rates are not comparable for skewed distribution

The idea that the false alarm rates are comparable for the various distributions shown in figure 1 is not correct. Anyone who has ever made an Individuals control chart with highly skewed data knows that the false alarm rate can be quite high. The reason is that the data in the figure have been standardized based on the overall standard deviation, while an Individuals control chart typically uses the moving range as an estimate of variation to determine control limits.

With highly skewed data such as that shown in the bottom two distributions, the data is more bunched up and the average moving range is smaller than that of the normal distribution - therefore, although both may have the same overall standard deviation, the control limits for the skewed distribution will be tighter and therefore we expect a larger false alarm rate. A quick simulation using 10,000 data points that are normal and 10,000 which are skewed - both standardized to a mean of 0 and overall standard deviation of 1 - shows a false alarm rate of .0021 (0.21%) for the normal data and .026 (2.6%) for the skewed data. I don't think most practitioners want a 12-fold increase in their false alarm rate, especially given the resources that frequently go into finding out the "special cause" of an out-of-control point.

While I agree transformations may be overused or frequently used in cases in which they are unnecessary, I have to also side with those that have a good transformation and choose to use it when establishing control using an Individuals control chart and skewed data.
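The rates quoted in the comment above can be cross-checked without a simulation. The sketch below is an editorial illustration under assumptions the comment does not state: the skewed model is taken to be exponential, the limits are the usual X chart limits (average moving range scaled by 2.66), and the average moving range is replaced by its long-run expected value, so the results are long-run rates rather than what any one finite data set will show.

```python
import math
from statistics import NormalDist

SCALING = 2.660  # customary X chart factor for the average moving range

# Expected two-point moving range E|X - Y| for models with sd = 1.
EXPECTED_MR_NORMAL = 2.0 / math.sqrt(math.pi)
EXPECTED_MR_EXPONENTIAL = 1.0  # exponential with mean 1 and sd 1


def normal_false_alarm_rate():
    """Long-run share of normal values outside moving-range limits."""
    half_width = SCALING * EXPECTED_MR_NORMAL
    return 2.0 * NormalDist().cdf(-half_width)


def exponential_false_alarm_rate():
    """Long-run share of exponential values outside moving-range limits."""
    upper = 1.0 + SCALING * EXPECTED_MR_EXPONENTIAL
    lower = 1.0 - SCALING * EXPECTED_MR_EXPONENTIAL
    # Nothing can fall below a lower limit that is itself below zero.
    below = 0.0 if lower <= 0.0 else 1.0 - math.exp(-lower)
    return below + math.exp(-upper)


print(f"normal:      {normal_false_alarm_rate():.4f}")
print(f"exponential: {exponential_false_alarm_rate():.4f}")
```

Under these assumptions the long-run rates are about 0.0027 for the normal model and 0.0257 for the exponential model, close to the simulated figures in the comment.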

## Absolutely superb article

I have been following Dr. Wheeler for years, since the early days of SPC Ink. I am forever in his debt for 'saving' me from learning the superstitions that are so prevalent in our academia regarding process behavior charts...I must have 20 SPC Ink articles in a ragged old blue binder (sitting in front of me now), in addition to at least three of his books, on this amazing subject.

His recent publications on six-sigma, in which he systematically reveals the fallacy of the gigantic leap of faith upon which the whole concept originated, saved me once again from spending hard-to-come-by money on a 'belt' - and let me focus instead on the higher ROI PPI program that he and Ed Zunich so carefully designed.

Many of you reading this latest article probably have no idea how markedly important and useful it really is.

Dr. Wheeler, thanks for this valuable gift - for those who understand it as such, anyway!

- forever, A Student

## Great Article - is there an error?

I see an error when I run the numbers. I ran the 141 observations for the hot metal transit times and got 59.9 for the average, and putting these into the individuals chart in Minitab yields 3 points out of control, not 11 points.

But, Love the article! Great lesson here!

## Lepto...by any other name

Here's a related article from several years ago, with a different spelling for the same concept (and a more technical response): http://www.jmp.com/about/newsletters/jmpercable/pdf/15_summer_2004.pdf

## Leptokurtophobia

I, too, find a huge obsession among others (including colleagues) with checking for normality first and foremost or looking for transformations. If n is sufficiently large, CLT kicks in and it doesn't matter. If the application is SPC, then as Wheeler suggests, it generally doesn't matter unless n is 1 or 2 and the skew is quite significant. To me, what is most important is looking at the data to validate homogeneity (as Wheeler also suggests) - that will burn you more often than anything else. My approach is generally first to look at probability plots - not to validate normality but to look for tell-tale signs of outliers or several modes. After that, I create SPC and/or time series plots to validate homogeneity. It's been almost two decades since I really cared about normality. I will remember the name of the disease!