Tom Pyzdek
Published: Tuesday, March 29, 2011 - 05:30

Story update 3/29/2011: We corrected an error in the next to last sentence. "p < 0.05" was changed to "p > 0.05."

One of the exercises I assign to students in my training involves creating two histograms from normally distributed random numbers. The results often look similar to those shown in figure 1. When I ask students to comment on their histograms, I usually get comments about the averages, spread, and other statistical properties. However, that misses the point I’m trying to teach.

When we do Six Sigma, we usually spend a lot of time mining historical data from databases. Sometimes the sample sizes are large, and sometimes they can be quite small. In fact, even large samples can become small when we slice and dice them, drilling down through various categories and subcategories in search of critical-to-quality data.

Statistical software will often automatically fit a normal curve to histograms created from these data. It’s often tempting to use the fitted curves to make an eyeball judgment about the normality of the data. Sometimes this is a good idea, and sometimes it isn’t. If the sample size is small, the curve may not appear to fit the data very well simply because of small-sample variation. Witness the top histogram in figure 1, which shows a curve fitted to a sample of n = 20. The histogram looks like a poor fit, but the p-value of a normality test shows no significant departure from normality. So we’re probably safe assuming normality and acting accordingly.

Fig. 1: Large and small samples of normally distributed data

The lower curve is fitted to a sample of n = 500 data values. It appears to be a much better fit, and the p-value will back up this conclusion.

But what if the eyeballed curve fit and the p-value disagree? Sometimes the fit of the curve is “close enough,” but the p-value will tell you that the fit is awful. Take a look at figure 2.
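The classroom exercise can be tried with a short script. The sketch below is an illustration under stated assumptions, not the author's procedure or any particular package's implementation: it draws hypothetical normal samples of n = 20 and n = 500 (the mean of 10 and SD of 2 are arbitrary choices), prints crude text histograms, and computes an Anderson-Darling normality statistic using a commonly published small-sample adjustment and piecewise p-value approximation. The article does not say which normality test produced the figure 1 p-values; Anderson-Darling is used here only because it is the test named for figure 2. The function names `anderson_darling_normal` and `text_histogram` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(data):
    """Anderson-Darling test of normality, mean and SD estimated from the data.

    Returns (a2_star, p_value), where a2_star is the small-sample-adjusted
    statistic. The p-value uses a commonly published piecewise approximation;
    commercial packages may compute it somewhat differently (an assumption).
    """
    n = len(data)
    if n < 8:
        raise ValueError("need at least 8 observations for this approximation")
    xs = sorted(data)
    fitted = NormalDist(fmean(xs), stdev(xs))  # sample SD, n - 1 denominator
    eps = 1e-12  # keep CDF values strictly inside (0, 1) so log() stays finite
    u = [min(max(fitted.cdf(x), eps), 1 - eps) for x in xs]
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln u_i + ln(1 - u_{n+1-i})]
    total = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))
                for i in range(1, n + 1))
    a2 = -n - total / n
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a2_star >= 10:
        p = 0.0  # far outside the approximation's range: effectively zero
    elif a2_star >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_star + 0.0186 * a2_star ** 2)
    elif a2_star >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2_star - 1.38 * a2_star ** 2)
    elif a2_star >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_star - 59.938 * a2_star ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_star - 223.73 * a2_star ** 2)
    return a2_star, min(max(p, 0.0), 1.0)


def text_histogram(data, bins=8, max_width=40):
    """Print a crude text histogram and return the bin counts."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1  # max value -> last bin
    scale = max(1.0, max(counts) / max_width)  # shrink bars for large samples
    for k, c in enumerate(counts):
        print(f"{lo + k * width:8.2f} | {'*' * round(c / scale)}")
    return counts


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so the run is repeatable
    for n in (20, 500):
        # Hypothetical process: normal with mean 10 and SD 2.
        sample = [rng.gauss(10.0, 2.0) for _ in range(n)]
        print(f"n = {n}")
        text_histogram(sample)
        a2_star, p = anderson_darling_normal(sample)
        print(f"A-squared* = {a2_star:.3f}, p = {p:.3f}\n")
```

Rerunning with different seeds makes the article's point: the n = 20 histogram often looks ragged, yet for genuinely normal data a level-0.05 test should flag non-normality only about 5 percent of the time.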
The histogram suggests that the normal curve fits the data pretty well. There are many practical situations where you could use the normal distribution to make estimates, and your estimates would be just fine. These are data on the time it takes to complete technical support calls. If you assume normality and you estimate costs or make a decision about process acceptability, your decisions will be essentially correct.

Fig. 2: Decent fit but lousy p-value

However, the probability plot and Anderson-Darling goodness-of-fit statistic clearly show that the data are not normal and that the lack of fit is particularly poor in the tails (p < 0.005). A closer examination shows that even in the tail areas the discrepancies are fractions of a percent. For example, the normal distribution estimates that 99.9 percent of all calls will take less than 35 minutes to complete, while the data show about 99.5 percent. Chances are these differences are of little or no practical importance.

The point is that in the business world, we often need to make decisions and then get on to other, more urgent matters. The normal distribution is a handy device for getting quick estimates that are useful for such decisions. If your sample size is relatively large (say 200 or more), then you can go with the normality assumption if the fitted curve looks reasonably good. On the other hand, if you only have a small amount of data, you can still use the normality assumption even if the histogram fit looks lousy, provided the p-value of the goodness-of-fit statistic says the normal curve is OK, i.e., if p > 0.05. The normality assumption is so useful that it’s worth using as a default, even if you bend the rules a bit.
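The tail comparison and the closing rule of thumb can be sketched as follows. This is a hypothetical illustration: the article's call-time data are not reproduced here, so the sketch generates simulated, mildly right-skewed "call times" from a lognormal distribution (an assumption made only to get non-normal data), and the function names `tail_comparison` and `normality_assumption_ok` are invented for the example. The 35-minute limit, the sample size of 200, and the 0.05 cutoff come from the text; the p-value of 0.004 in the usage line is a made-up input echoing the p < 0.005 reported for figure 2.

```python
import random
from statistics import NormalDist, fmean, stdev


def tail_comparison(data, limit):
    """Share of values below `limit`: fitted-normal estimate vs. observed share."""
    fitted = NormalDist(fmean(data), stdev(data))  # normal curve fitted to the data
    predicted = fitted.cdf(limit)
    observed = sum(1 for x in data if x < limit) / len(data)
    return predicted, observed


def normality_assumption_ok(n, curve_looks_good, p_value,
                            large_n=200, alpha=0.05):
    """One reading of the article's rule of thumb (thresholds from the text).

    Large samples: accept normality if the fitted curve looks reasonably good,
    even when the goodness-of-fit p-value is tiny.
    Small samples: accept normality if p > alpha, even if the histogram looks poor.
    """
    if n >= large_n:
        return bool(curve_looks_good)
    return p_value > alpha


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical, mildly skewed "call times" in minutes (NOT the article's data).
    calls = [rng.lognormvariate(3.0, 0.25) for _ in range(1000)]
    predicted, observed = tail_comparison(calls, limit=35.0)
    print(f"under 35 min: normal model {predicted:.2%}, data {observed:.2%}")
    # Hypothetical inputs: large sample, curve looks fine, p-value very small.
    print(normality_assumption_ok(len(calls), curve_looks_good=True, p_value=0.004))
```

The "looks good" judgment is deliberately left as a flag supplied by a person, since that eyeball call is exactly what the article describes; only the p-value side is mechanical.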
Thomas Pyzdek’s career in business process improvement spans more than 50 years. He is the author of more than 50 copyrighted works, including The Six Sigma Handbook (McGraw-Hill, 2003). Through the Pyzdek Institute, he provides online certification and training in Six Sigma and Lean.

Histograms: When to Use Your Eyeballs, When Not
Looks can be deceiving
© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest" is a trademark owned by Quality Circle Institute, Inc.
Comments
Use Minitab normal prob graph with CI lines
Hi Tom,

I'm an MBB trying to build Transactional GB training material, but I need to use Minitab. For a very long time, I've used the normal probability graph to better show normal data, non-normal data, and outliers. With it, I can ask really simple questions, like: Are all the dots inside the blue lines (the CI)? Does the line in the middle look like it explains all the dots? You must look at all the dots outside the blue lines... Next, see what this picture looks like, and over here see the p and AD numbers. Baby steps. I have found this much easier for the students to get. Histograms have less formal visual rules. Also, take a look, break up the subgroups, and then, if they are different, see if there is any value. Maybe, maybe not. I have a few screenshots of an example if you are interested in a PPT show.

I've had issues with people using histograms, either novices or others with an agenda. So I think the normal graph is much simpler but more information-rich.

Thanks,
Frank