What They Forgot to Tell You About the Normal Distribution

How the normal distribution has maximum uncertainty

There are two key aspects of the normal distribution that make it the central probability model in statistics. However, students seldom hear about these important aspects, and as a result they end up making many unnecessary mistakes. Read on to learn what it means when we say the normal distribution has maximum uncertainty.

The normal distribution has long been known to be the distribution with maximum entropy among all probability models having a given standard deviation, but like many things in statistics, this mathematical fact does not readily translate into understandable properties. Entropy is a measure of the uncertainty of a probability model that comes from information theory (those who are interested can find the definition of continuous entropy on Wikipedia). Therefore, maximum entropy is equivalent to maximum uncertainty. But just what does this mean?
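As a rough illustration of the maximum-entropy property, the sketch below compares the continuous (differential) entropy of four familiar models after each has been scaled to a standard deviation of 1. The choice of comparison models (uniform, Laplace, logistic) and the function names are assumptions made for this example; the entropy formulas are the standard closed forms, in nats.

```python
import math

def entropy_normal(sigma=1.0):
    # Differential entropy of N(mu, sigma^2): 0.5 * ln(2 * pi * e * sigma^2)
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def entropy_uniform(sigma=1.0):
    # A uniform of width w has SD w / sqrt(12) and entropy ln(w)
    width = sigma * math.sqrt(12)
    return math.log(width)

def entropy_laplace(sigma=1.0):
    # A Laplace with scale b has SD b * sqrt(2) and entropy 1 + ln(2b)
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

def entropy_logistic(sigma=1.0):
    # A logistic with scale s has SD s * pi / sqrt(3) and entropy ln(s) + 2
    s = sigma * math.sqrt(3) / math.pi
    return math.log(s) + 2

entropies = {
    "normal": entropy_normal(),
    "logistic": entropy_logistic(),
    "laplace": entropy_laplace(),
    "uniform": entropy_uniform(),
}

for name, h in sorted(entropies.items(), key=lambda item: -item[1]):
    print(f"{name:9s} entropy = {h:.4f} nats")
```

With the standard deviation held fixed, the normal should come out on top of this list, which is all that "maximum entropy" claims. Four models are of course not a proof, only a spot check.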

…

Comments

Postscript

This article used a split of 91 percent and 9 percent between the central portion and the outer tails of a distribution. Subsequent research shows that equally strong arguments can be made for other splits ranging from 88/12 to 92/8. Thus, if we define the cut-off for the outer tails anywhere between 1.55 sigma and 1.75 sigma we can still say that the outer tails of the normal distribution are as heavy as, or heavier than, the outer tails of any unimodal probability model.
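The link between the splits and the sigma cut-offs quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the standard normal model; the function name is hypothetical.

```python
import math

def normal_outer_tail_area(k):
    # Two-sided area of a normal distribution beyond k standard deviations:
    # P(|Z| > k) = erfc(k / sqrt(2))
    return math.erfc(k / math.sqrt(2))

for k in (1.55, 1.60, 1.65, 1.70, 1.75):
    tails = normal_outer_tail_area(k)
    print(f"cut-off {k:.2f} sigma: central {100 * (1 - tails):.1f}%, "
          f"outer tails {100 * tails:.1f}%")
```

The two ends of the range should give outer tails of roughly 12 percent and 8 percent, matching the 88/12 and 92/8 splits in the postscript.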

Re the postscript

Your statement in the postscript is too strong; exceptions are easy to find. You included a kind of caveat in the original article. I think that caveat is too weak: there is an infinite number of exceptions, so to make any claim about the relative preponderance of distributions that meet or fail the claim we'd need some probability distribution over the space of distributions considered. That aside, it was certainly good to have noted that the claim is not always true, but you can't drop the caveat in the postscript. [It might be instructive to show some of the exceptions. Among continuous symmetric unimodal distributions, the largest proportion outside k standard deviations from the mean is 4/(9k^2). For k = 1.70 that's about 15.4%; it's interesting that the normal gets up as high as it does for k in that region.]
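To put numbers on this comment, the sketch below evaluates the 4/(9k^2) bound next to the normal tail area at k = 1.70. As one candidate exception it also evaluates a Laplace (double exponential) model scaled to the same standard deviation. The Laplace is chosen here purely for illustration and is not an example named by the commenter. As far as I know, the bound in this form applies only for k above sqrt(8/3), about 1.63.

```python
import math

def unimodal_tail_bound(k):
    # Upper bound on P(|X - mean| > k * SD) quoted in the comment: 4 / (9 k^2)
    return 4.0 / (9.0 * k ** 2)

def normal_tail(k):
    # Two-sided normal tail area beyond k standard deviations
    return math.erfc(k / math.sqrt(2))

def laplace_tail(k):
    # Laplace with scale b has SD = b * sqrt(2), and P(|X| > x) = exp(-x / b),
    # so the area beyond k standard deviations is exp(-k * sqrt(2))
    return math.exp(-k * math.sqrt(2))

k = 1.70
print(f"bound 4/(9k^2) : {unimodal_tail_bound(k):.4f}")
print(f"normal tails   : {normal_tail(k):.4f}")
print(f"Laplace tails  : {laplace_tail(k):.4f}")
```

On this calculation the Laplace tails at 1.70 sigma come out slightly heavier than the normal tails, while both stay well below the bound. At 1.55 sigma the ordering reverses, so whether a given model counts as an exception depends on where the cut-off is placed.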

Great article!

At the risk of sounding like a teen-ager--O M G!! Fabulous article. It has really gotten me thinking about all of the stuff I learned in statistics and raised a lot of questions about the (standard) uses of other distributions. For example, should we ever use a Student's t test? Or a chi-square test? I think I know what you would say about some of them, and it is pretty much a repeat of what you have said here regarding the use of process performance charts, but I would really love to see more discussion of the implications of this concept. In fact, I am now wondering if, looking at the entire field of statistics, including the analysis of designed experiments (which has become such a large part of the Six Sigma methodology), we aren't making the wrong assumptions more often than not. Perhaps this is too esoteric a discussion for the Quality Digest audience, but it is definitely of interest to statistical practitioners everywhere. Am I completely overthinking this, or could the implications of this totally revamp the application of statistical methods?

T-tests, etc.

We need to make a distinction between fitting a distribution to the ORIGINAL DATA and using the known and established distributions that work with STATISTICS obtained from those data. Student's t-test is a very robust test that works without having to first check that the data are normally distributed. The F-ratios of ANOVA are robust when used as a test for means. The chi-square works with sums of squares of almost anything. So, the traditional techniques are built on sound theory, and they work well in practice. It is the sophistry of trying to identify a probability model for the original data that this article addresses. Hope this will help.

T distribution

Now, I'm somewhat notorious for missing horribly obvious things, but I thought that the t distribution basically started at 1 df with egregiously heavy tails and the more degrees of freedom you add, the closer it approximates the normal distribution. How is it that your 6 df t distribution has smaller tails than your normal distribution? When I run the calculations for a 6 df t distribution, I get a tail area of 14.9% at plus/minus 1.656 standard deviations. Am I missing something again?

Heavy tailed t-dists.

You might want to reread the paragraph about the t-distribution. The units of a t-distribution are SD(X), not SD(T). For 6 d.f. the variance of T is 6/(6 - 2) = 1.5, so SD(T) = SQRT(1.5). Thus, 1.656 SD(T) = 2.028. This is the source of your confusion. (Also, in figure 5 I started the curve with 3 d.f. because that is what you need to have a well-defined standard deviation.) Hope this will help.
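The two calculations in this exchange can be reproduced with a short sketch. It integrates the Student's t density numerically with Simpson's rule, using only the Python standard library; the function names and the step count are assumptions made for this example.

```python
import math

def t_pdf(x, df):
    # Density of Student's t with df degrees of freedom
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_tail(t, df, steps=2000):
    # P(|T| > t), from Simpson's rule on [0, t]; steps must be even
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

def normal_two_sided_tail(k):
    return math.erfc(k / math.sqrt(2))

df = 6
k = 1.656
sd_t = math.sqrt(df / (df - 2))                # SD(T) = sqrt(1.5) for 6 d.f.

tail_raw = t_two_sided_tail(k, df)             # cut-off at t = 1.656
tail_scaled = t_two_sided_tail(k * sd_t, df)   # cut-off at 1.656 SD(T) = 2.028
tail_normal = normal_two_sided_tail(k)         # normal at 1.656 sigma

print(f"t(6) beyond t = {k}: {tail_raw:.3f}")
print(f"t(6) beyond {k} SD(T) = {k * sd_t:.3f}: {tail_scaled:.3f}")
print(f"normal beyond {k} sigma: {tail_normal:.3f}")
```

The first figure should reproduce the 14.9% in the question, while the second, which measures the cut-off in standard deviations of T, should fall below the normal value.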

Overwhelming evidence

I always find Don's papers a fantastic read. His papers and excellent books should provide overwhelming evidence that Shewhart's approach was right. Yet teaching to the contrary continues in Six Sigma courses, in a fashion that Deming described as "seeing every day the devastating effects of incompetent teaching and faulty application" (p131 Out of the Crisis). Despite the good statistics from Don, Deming and Shewhart, the Asch Effect prevails, where almost the entire industry follows the ridiculous Six Sigma path, often even with an awareness of its fallacies. (Solomon Asch and Conformity Studies: http://psychology.about.com/od/classicpsychologystudies/p/conformity.htm )

Tail heaviness

Hi Don, I don't agree, and I don't believe it is generally agreed, that you can define "tail heaviness" as "probability outside a central range." Tail heaviness is commonly thought of as the potential to generate extreme observations. A counterexample where the probability concentration outside the central range goes to zero, yet the distribution is heavier- and heavier-tailed, in the sense of having the potential to produce extreme outliers, is given here: https://math.stackexchange.com/questions/167656/fat-tail-large-kurtosis…
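The kind of behavior described here can be sketched with a hypothetical family of models, which is not necessarily the example at the linked page: a mixture that is N(0, 1) with probability 1 - eps and N(0, (1/eps)^2) with probability eps. The mixture form, the parameter names, and the 1.70 sigma cut-off are all assumptions made for illustration.

```python
import math

def mixture_summary(eps, k=1.70):
    # Hypothetical model: N(0, 1) w.p. (1 - eps), N(0, s^2) w.p. eps, with s = 1/eps
    s = 1.0 / eps
    var = (1 - eps) + eps * s ** 2             # variance of the mixture
    cutoff = k * math.sqrt(var)                # k standard deviations of the mixture
    # Probability outside +/- cutoff, summed over the two components
    tail = ((1 - eps) * math.erfc(cutoff / math.sqrt(2))
            + eps * math.erfc(cutoff / (s * math.sqrt(2))))
    # Kurtosis: the fourth moment of N(0, s^2) is 3 * s^4
    kurtosis = 3 * ((1 - eps) + eps * s ** 4) / var ** 2
    return tail, kurtosis

results = [(eps, *mixture_summary(eps)) for eps in (0.1, 0.01, 0.001)]
for eps, tail, kurtosis in results:
    print(f"eps = {eps:<6} area beyond 1.70 sigma = {tail:.4f}  kurtosis = {kurtosis:.1f}")
```

As eps shrinks, the area beyond 1.70 standard deviations shrinks with it, while the kurtosis and the size of the rare extreme values grow. The two notions of "heavy tails" therefore move in opposite directions for this family.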