
Donald J. Wheeler  |  07/30/2012


What Is Leptokurtophobia?

And why does it matter?

Three years ago this month Quality Digest Daily published my column, “Do You Have Leptokurtophobia?” Based on the reaction to that column, it contained a message that was needed. In this column I would like to explain the symptoms of leptokurtophobia and the cure for this pandemic affliction.

Leptokurtosis is a Greek word that literally means “thin mound.” It was used to describe those probability models that have a central mound that is narrower than that of a normal distribution. In reality, due to the mathematics involved, a leptokurtic probability model is one that has heavier tails than the normal distribution. By a wide margin, most leptokurtic distributions are also skewed, and most skewed distributions will be leptokurtic.

The fear of leptokurtosis can be traced back to the surge in training in statistical process control (SPC) in the 1980s. Before this surge only two universities in the United States were teaching SPC, and only a handful of instructors had any experience with SPC. As a result of the surge, of necessity, many of the SPC instructors of the 1980s were neophytes, and many things that were taught at that time can only be classified as superstitious nonsense. One of these erroneous ideas was that you have to have “normally distributed data” before you can put your data on a process behavior chart (also known as a control chart). Over the years this simple but incorrect idea has grown and mutated into a prohibition on doing any statistical analysis without first testing the data for normality or defining a reference probability model for the data.

Therefore, you may have leptokurtophobia if you have an irrational fear of using non-normal data in your analysis. Symptoms include asking if your data are normally distributed, transforming your data to make them more “mound-shaped,” or fitting a probability model to your data as the first step in your analysis. This phobia was originally held in check by the complexity of the remedies, such as performing a nonlinear transformation or computing a lack-of-fit statistic. However, due to the availability of software that will perform these complex operations, today we find leptokurtophobia to be truly pandemic, with outbreaks occurring around the world. People are fitting probability models and transforming data with a few keystrokes, and as a result they are unknowingly suffering undesirable side-effects. Insidiously, although these side-effects have few symptoms, they tend to completely undermine your analysis and your predictions.

Let’s begin with the problem of fitting a probability model to your data. Figure 1 shows a histogram of the number of major hurricanes per year in the North Atlantic for 1940 through 2007. These 68 counts have an average of 2.59. Using this value as the mean value for a Poisson distribution, a lack-of-fit test will fail to find any detectable lack of fit. Therefore, we might well conclude that a Poisson probability model with a mean of 2.59 is a reasonable model to use. From this we might then characterize the likelihood of various numbers of major hurricanes in a given year. Specifically, the probability of getting seven or more major hurricanes in a single year is found to be 0.017. Thus, in 68 years we should expect to find about one year with seven or more major hurricanes.
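The tail probability quoted above can be checked directly from the Poisson formula. The following is a minimal sketch using only the Python standard library; the function name `poisson_tail` is an illustrative choice, while the mean of 2.59 and the 68 years are taken from the text.

```python
import math

def poisson_tail(mean, k):
    """Return P(X >= k) for a Poisson variable with the given mean."""
    # Sum the probabilities of 0..k-1 and subtract the total from 1.
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below

p_seven_or_more = poisson_tail(2.59, 7)
expected_years = 68 * p_seven_or_more

print(f"P(7 or more) = {p_seven_or_more:.3f}")
print(f"Expected years out of 68 = {expected_years:.2f}")
```

The result agrees with the 0.017 given above, which works out to roughly one such year in 68, provided the single Poisson model really does describe all 68 years.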

https://lh3.googleusercontent.com/-1fQz5EOgtaRyzH_JQzrJnsUg4lSwEVVctLAGgEckioyP52EMMLDgOLU7zeHC8DLAboIus5N-TUFrWcAW936pN1kF1caIK8foN5YeOZO4MUPgqvlqic
Figure 1: North Atlantic major hurricanes

However, NOAA researchers think that these data represent two different weather patterns. They call the change between these patterns the “multi-decadal tropical oscillation.” They break this time period of 1940 to 2007 into four segments. In the time period used here, the era of lower activity includes 1940 to 1947 and 1970 to 1994. The era of higher activity includes 1948 to 1969 and 1995 to 2007. The histograms for these two eras are seen in figure 2.

https://lh5.googleusercontent.com/Z38mtoycwxF-plX_tAJqdgWum9eR6b51rCEM0TLvY1onl5HSyrIaXx3OnUnhJ1SGZALaKtmu500QxXmJkorm8LRFsJNAwUnRot4bjqqhhL2SLhvAYjw
Figure 2: North Atlantic major hurricanes

During the era of low activity the average number of major hurricanes per year was 1.58. During the era of high activity this average more than doubled to 3.54 per year. So, which years would you say are characterized by the average of 2.59 major hurricanes per year? Clearly, this average does not apply to the era of low activity, and neither does it characterize the era of high activity. While your model based on figure 1 predicts one year with seven or more major hurricanes, the data show three years with seven or eight major hurricanes.
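One way to see why 2.59 describes neither era is to rebuild it from the two era averages. The sketch below uses the era boundaries and averages quoted in the text; the variable and function names are illustrative, and because the era averages are rounded the agreement is only approximate.

```python
# Era boundaries and era averages are taken from the text above.
low_eras = [(1940, 1947), (1970, 1994)]
high_eras = [(1948, 1969), (1995, 2007)]
low_mean, high_mean = 1.58, 3.54

def count_years(eras):
    """Count the years in a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in eras)

low_years = count_years(low_eras)
high_years = count_years(high_eras)
total_years = low_years + high_years

# The overall average is the year-weighted blend of the two era averages.
overall_mean = (low_years * low_mean + high_years * high_mean) / total_years
print(low_years, high_years, total_years, round(overall_mean, 2))
```

The overall figure is simply a blend of two different levels of activity, weighted by how many years fall in each era. It is not a level at which either era actually operated.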

Whenever you fit a model to your data you are assuming that those data are homogeneous. If they are not homogeneous, all of your statistics, all of your models, and all of your predictions are going to be wrong.

Well, if fitting a probability model is not the answer, what about transforming the data?

When you transform the data you are reshaping them to fit your preconceived notions. This is always a dangerous thing to do. Figure 3 shows the histogram of 141 hot metal transit times. These values are the times (to the nearest 5 minutes) between the call alerting the steel furnace that a load of hot metal was on the way and the actual arrival time of that load at the steel furnace ladle house. The average delivery time is 60 minutes. The standard deviation is 30 minutes. The skewness is 1.70, and the kurtosis is 6.0. (Anything above 3.0 is leptokurtic.) As they stand they form a very skewed and heavy-tailed histogram.
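For readers who want to see how shape statistics of this kind are obtained, here is a minimal sketch based on plain central moments. This is an assumption about the estimator: statistical packages often apply bias corrections, so their values can differ slightly. The `times` list is made up for illustration and is not the 141 transit times behind figure 3.

```python
def shape_statistics(values):
    """Return (skewness, kurtosis) computed from plain central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # a normal distribution has kurtosis 3.0
    return skewness, kurtosis

# Hypothetical times in minutes (NOT the article's data).
times = [40, 45, 50, 45, 40, 50, 45, 55, 50, 45, 150, 45, 50, 40, 45]
skewness, kurtosis = shape_statistics(times)

print(f"skewness = {skewness:.2f}, kurtosis = {kurtosis:.2f}")
print("leptokurtic" if kurtosis > 3.0 else "not leptokurtic")
```

A single long delivery in an otherwise steady series is enough to push the kurtosis well above 3.0. This is one reason heavy tails and skewness so often appear together in real data.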

https://lh3.googleusercontent.com/Of2shLrXDFlEVEdQ83LvKZylJUjXzekKaIenNh4cbMfKfRDUotAqxYpb77eBcnV3_wnA_AVMVKZo9UinwyG2tCT4XoY61q-2wD3S1hsfuVvi7JBzinc
Figure 3: Hot metal transit times

Some software packages would suggest a logarithmic transformation for these data. Taking the natural logarithm of each of these transit times results in the histogram in figure 4. There the horizontal scales show both the original and the transformed values. The logarithmic transformation has spaced out the values on the left and has crowded the values on the right together so that the overall shape of the histogram is much more “mound-shaped” than before. But is this an improvement? Now the “distance” from 20 minutes to 25 minutes is about the same size as the “distance” from 140 minutes to 180 minutes. How are you going to explain this to your boss? While the original histogram clearly showed two and possibly three humps, the transformed histogram blurs this important feature of the data.
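The distortion of distances is easy to verify. This short sketch compares the two gaps mentioned above on the natural-log scale; the helper name `log_distance` is illustrative.

```python
import math

def log_distance(a, b):
    """Distance between two positive values on the natural-log scale."""
    return abs(math.log(b) - math.log(a))

short_gap = log_distance(20, 25)    # 5 minutes on the original scale
long_gap = log_distance(140, 180)   # 40 minutes on the original scale

print(f"20 -> 25 minutes:   {short_gap:.3f} log units")
print(f"140 -> 180 minutes: {long_gap:.3f} log units")
```

Both gaps come out between 0.2 and 0.3 log units, even though one is 5 minutes and the other is 40 minutes on the clock.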

https://lh5.googleusercontent.com/EniYbVcVhWKgTua9UbJnYwNqmuYIviGYAG535W62gUhJoogZfncy6-IOvxR9o4Jd5kgSg83QISgZOxcYtUqCbhKk9Y2FoEeLqReZPlktkBKvsYPuQNw
Figure 4: Logarithms of the hot metal transit times

By itself, this distortion of the data should be sufficient to make you want to avoid the practice of transforming the data to achieve statistical properties. However, the impact of nonlinear transformations is not confined to the histograms.

One of the major reasons for analyzing data is to detect signals buried within those data. And when we go looking for signals, the premier technique will be the process behavior chart. Figure 5 shows the X chart for the original hot metal transit times. Eleven of the 141 transit times are above the upper limit, confirming the impression given by the histogram that these data come from a mixture of at least two different processes. Even after the steel furnace gets the phone call, they still do not have any idea about when the hot metal will arrive in the ladle house.

https://lh4.googleusercontent.com/VK3oKcvd2prFd-QJd5p2hf8booqRj_v-2FfW4I2MABhW3BA7pxHAdAIVS1tFjU_iGHhU_lsltTA3xAn0iUiWctowfoWXQGLoll3RTbLzndZhSgJT3jY
Figure 5: X chart for the hot metal transit times

However, if we use a nonlinear transform on the data prior to placing them on a process behavior chart, we end up with the X chart shown in figure 6. There we find no points outside the limits!

https://lh6.googleusercontent.com/lfrL9fBzFtG3Jo0aaLqvJxMt0IyqepKGN0h6RuGNM6DUHFJZXulfNKUIDS3pSapu0QV474QxZadvLL0v5XaxDPjB8wPz3tXZfUo4RK79A-XEJ1Lhh1o
Figure 6: X chart for the logarithms of the hot metal transit times

Clearly the logarithmic transformation has obliterated the signals. What good is a transformation that changes the message contained within the data? The transformation of the data to achieve statistical properties is simply a complex way of distorting both the data and the truth.

The results shown here are typical of what happens with nonlinear transformations of the original data. These transformations hide the signals contained within the data simply because they are based on computations that presume there are no signals within the data.
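The same effect can be reproduced on a small scale. The sketch below assumes the usual XmR computation for an X chart (average plus or minus 2.66 times the average moving range), which the article does not spell out. The series is made up for illustration and is not the hot metal data; the function names are likewise illustrative.

```python
import math

def xmr_limits(values):
    """Return (lower, center, upper) natural process limits for an X chart.

    Assumes the usual XmR rule: average +/- 2.66 * average moving range.
    """
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def points_outside(values):
    """Return the values that fall outside the X chart limits."""
    lower, _, upper = xmr_limits(values)
    return [x for x in values if x < lower or x > upper]

# Made-up series for illustration only (NOT the hot metal transit times).
series = [10, 20, 15, 30, 10, 25, 40, 15, 30, 10,
          100, 20, 35, 15, 25, 10, 30, 20, 40, 15]
logged = [math.log(x) for x in series]

signals_original = points_outside(series)
signals_logged = points_outside(logged)

print("original scale:", signals_original)
print("log scale:     ", signals_logged)
```

With this made-up series, the value of 100 falls above the upper limit on the original scale, while no point falls outside the limits once logarithms are taken. The transformation has not removed the unusual event from the process; it has only removed it from view.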

(For more on the hurricane data, see my columns for February 2009, “Probability Models Don’t Generate Your Data,” and March 2009, “No Data Have Meaning Without Context.” For more on the problems of transforming the data, see my column for August 2009 mentioned above, “Do You Have Leptokurtophobia?” For an explanation of how three-sigma limits work with nonnormal data, see my column for November 2010, “Are You Sure We Don’t Need Normally Distributed Data?”)

So, what should be the first question of data analysis? Should you try to accommodate to the shape of the histogram by fitting a probability model? Should you seek to reshape the histogram by using some nonlinear transformation? Or should you check the data for evidence of a lack of homogeneity? Since a lack of homogeneity will undermine the fitting of a probability model, and since it will invalidate the rationale for the transformation of the data, it is imperative that we begin by checking for possible nonhomogeneity.

So how can we determine when a data set is homogeneous? That is what the process behavior chart was created to do! This is why it is essential to begin any analysis by organizing your data in a logical manner and placing them on a process behavior chart. If you do not have the requisite homogeneity, anything else you might do will be flawed.

When you fit a probability model to your data you are making a strong assumption that the data are homogeneous. If they are not homogeneous, then your model, your analysis, and your predictions will all be wrong. When you transform the data to achieve statistical properties you deceive both yourself and everyone else who is not sophisticated enough to catch you in your deception. When you check your data for normality prior to placing them on a process behavior chart you are practicing statistical voodoo.

Whenever the teachers lack understanding, superstitious nonsense is inevitable. Until you learn to separate myth from fact you will be fair game for those who were taught the nonsense. And you may end up with leptokurtophobia without even knowing it.


About The Author


Donald J. Wheeler

Dr. Donald J. Wheeler is a Fellow of both the American Statistical Association and the American Society for Quality, and is the recipient of the 2010 Deming Medal. As the author of 25 books and hundreds of articles, he is one of the leading authorities on statistical process control and applied data analysis. Contact him at www.spcpress.com.  

Comments

Great Article Don

In the realm of using process behavior charts and assessing process capability, I stay clear of transforming data. Also, trying to find a best fit distribution model to a histogram that truly reflects more than one population present would be ‘silly’. That would be like sticking my head in a freezer and my feet in an oven at the same time and saying on average I feel great.

Over the years I’ve conducted many types of hypothesis tests ranging from simple to complex for designed experiments. In these cases, I’ve had good reason to ‘test for normality’ and at times to transform data. Every hypothesis test has a list of ‘underlying conditions’ which should be met in order to validate the analysis outcome. One of the common ‘requirements’ for using Parametric tests is that the populations being assessed can be modeled by the normal distribution; hence, the need to ‘test for normality’. If normality is ‘rejected’, then it may be appropriate to use an ‘equivalent’ Non-Parametric test depending upon several things; magnitude in departure from normality, sample size, and … . Non-Parametric tests tend to be ‘weaker’ than Parametric tests in detecting ‘significant differences’. In that case, it may be appropriate to use data transformation for all levels involved to validate the use of Parametric tests.

In summary, using the ‘right’ tool for the ‘right’ application at the ‘right’ time is key. Thanks again for the great article.

Regards, Bruce

My favorite piece on Kurtosis

For those of  you unfamiliar with Don's books, you should try to get a copy of Understanding Statistical Process Control or USPC, pp 326-327. He shows two distributions, each of which has a skewness of 0.0000 and kurtosis of 2.0000. One looks like a house, the other like an elephant...His conclusion? "...we may properly conclude that the 'shape parameters' of skewness and kurtosis cannot even discriminate between an elephant and a house!"

And people think statisticians aren't funny! 

Leptokurtosis

Thanks for your comment. The word kurtosis was chosen about 100 years ago by an Englishman. He was trying to find a descriptive term for shape to go with skewness. He did not choose well. Kurtosis does not exactly describe either the mound or the tails of a distribution. One article that addressed this issue found that it best described the "absence of shoulders" for a distribution. So if you feel the word does not quite fit the job it has been given, you are in good company.

The cause

Don's articles are always a great read.  However I suggest that the fear of leptokurtosis can be traced back to the surge of the Six Sigma scam in the 80's.  As with all scams, the driving force was money.  Six Sigma consultants and institutes made billions, while users suffered as a result of being conned by the nonsense. 

Despite Shewhart having demonstrated almost 100 years ago that there was nothing to fear from non-normal distributions, the masses swallowed the words of their heroes Smith and Harry. Smith claimed the way to improve quality was to broaden the specification limits, and Harry unbelievably "proved", based on stacks of discs, that every process drifts uncontrollably by +/- 1.5 sigma in the "long term" of 24 hours. The vultures providing "quality" calculation software circled and fed on the ignorant masses' frenzy to click buttons to draw normal distributions over everything. Millions of dollars were easy pickings, simply by conning gullible CEOs and Quality Managers that everything needed to be transformed to make it normal.

Greed has set quality backwards by 100 years.

re: the cause

I would agree that many of the six sigma trainers have perpetuated the use of transformations as they typically only teach what Dr. Deming called enumerative statistics. All mathematics. But I can't agree that they are the cause. I have known too many statisticians whose passion is the math and if the distribution isn't Normal they can't deal with it unless they transform the data. This is typically because most of the commonly taught/used tests of significance are based on an assumption of Normality. None of these statisticians have had more than a cursory 'knowledge' of six sigma and almost all were trained prior to Dr. Harry...it started a long time ago...

The Cause

Yes, in 1982, page 132, "Out of the Crisis", Deming draws attention to the "deceptive and misleading" teaching of statistics as applied to production.  While at that time, Leptokurtophobia may have been a disease, Six Sigma was to turn it into a global pandemic.

Transformations are often appropriate

If I am dealing with a process that is in control, but whose data is nonnormal, my inclination is always to fit the underlying distribution and then set appropriate control limits. Then accurate process performance indices can be computed; StatGraphics and Minitab can in fact do this for nonnormal distributions. AIAG's SPC manual sanctions this approach for nonnormal systems.

Figures 1 and 2 represent two "processes" and therefore a bimodal distribution, for which no SPC scheme is appropriate. Figure 3 looks like it has numerous outliers in its upper tail, and therefore represents a system with assignable causes. If so, even if the underlying distribution were normal, it would not be possible to do SPC based on this process history. This underscores the fact that, as you said, the first step is to check for nonhomogeneity.

It might be interesting to model the situation in Figure 3 with a beta distribution (model for activity completion times in project management) as well as a lognormal distribution, but without the numerous outliers--if they can all be identified.

Levinson on transformations

I think the point is not that transformation is wrong per se, but that transformations are not necessary or even useful for industrial improvement studies. Can data transformations help us understand something about the data before us? Yes. But what do they help us understand? That is the real point of this discussion. In Statistician lingo it’s about setting the correct frame. A transformation of non-Normal data can help us to calculate a Cpk/Ppk index. But the real question is: how useful is the index itself? What can it tell us about the process that a simple time series plot can’t? (Answer: nothing really; it just gives a precise number that quantifies one aspect of the process. Process capability indices yield no informative understanding to drive improvements.)

Transformations help us get mathematically precise answers to math questions when the data before us are non-Normal and/or non-homogeneous; they do not help us get actionable answers to physics questions. The industrial world is trying to solve physics problems, not math problems. It comes down to two things:

- Real processes can be stable and capable even though they are non-homogeneous and not Normally distributed.

- Solving real industrial problems requires analytic studies, not enumerative studies.

Real world processes are often not well behaved like the processes used to teach mathematical statistics. A homogeneous Normal distribution may be the “ideal” but it is not required to make quality product in a reliable and profitable manner. Unfortunately it is required for many (distributional) statistical tests of significance. Therefore we are taught that any distribution that is not Normal ‘must’ be transformed to use these common statistical tests. This approach has transformed over the years into an incorrect belief that any process that is not homogeneous and Normally distributed must be ‘bad’.

Fortunately, analytic studies rely on replication and probability instead of distributional statistics.

So the question isn’t whether or not one can transform the data or if there are situations where transforming the data might be helpful. Transformations are helpful in ENUMERATIVE studies and in limited ways in some complex modeling to assist in calculations. The point of the article is to say that transformations are not helpful and often misleading in ANALYTIC studies. Not transforming the data helps us see what is really going on. I've solved hundreds of complex real world problems and never performed a transformation; they are not necessary.

A time series or other multi-variate plot is almost always the most valuable analysis tool the analytic study has…if you plot the hurricane data in time series you can clearly see the patterns – there are no ‘outliers’. There is the non homogenous effect of factors that cause hurricanes changing…transforming this data hides the causal system and suppresses knowledge.  This data can be found at:  http://www.aoml.noaa.gov/hrd/tcfaq/E11.html

Levinson on Transformations

William, I do not know who taught you SPC, but you should ask for your money back since they did a particularly bad job. Virtually everything you say contains some sort of error. A process behavior chart is the universal technique for examining a data set for homogeneity. It does not impose any requirements upon the data. We do not have to fit a model to our data before we "do SPC." We do not need to verify the normality of the data prior to using a process behavior chart. We do not need to remove the outliers prior to placing the data on a chart. And as for fitting a lognormal, the uncertainty in the statistics means that you can fit a very wide spectrum of lognormals to these data, making a unique distribution impossible to justify.

Question on Wording

Don:

The article states "Leptokurtosis ... means 'thin mound' ... used to describe those probability models that have a central mound that is narrower than that of a normal distribution." If the "thin mound" refers to the probability graph, would it not be more precise to say it refers to "models that have a central mound that is wider or significantly less steep than that of a normal distribution"? If I'm wrong, please explain. Thx,

- HF

Leptokurtophobia and Assumptions

Re: Assumptions & data shaping - your article reminded me of a favorite quote from American geologist, T. C. Chamberlin:

“The fascinating impressiveness of rigorous mathematical analysis, with its atmosphere of precision and elegance, should not blind us to the defects of the premises that condition the whole process. There is, perhaps, no beguilement more insidious and dangerous than an elaborate and elegant mathematical process built upon unfortified premises.” (1899)

Such an elegant way to pop a balloon.

-lee c