Histograms: When to Use Your Eyeballs, When Not

Looks can be deceiving

Tom Pyzdek
Tue, 03/29/2011 - 05:30

Story update 3/29/2011: We corrected an error in the next to last sentence. "p < 0.05" was changed to "p > 0.05."

One of the exercises I assign to students in my training involves creating two histograms from normally distributed random numbers. The results often look similar to those shown in figure 1. When I ask students to comment on their histograms, I usually get comments about the averages, spread, and other statistical properties. However, that misses the point I’m trying to teach.
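The exercise can be tried with a few lines of Python. This is only a minimal sketch: the mean of 100, standard deviation of 10, sample size of 25, bin range, and the function names are assumptions chosen for illustration, not values from the original exercise. Both samples are drawn from the same normal distribution, yet the text histograms will usually look noticeably different.

```python
import random

def draw_sample(n, mu=100.0, sigma=10.0, seed=None):
    """Return n pseudo-random draws from a normal distribution (assumed parameters)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def bin_counts(values, low, high, bins):
    """Count values per equal-width bin; values outside the range go to the end bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - low) // width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return counts

def print_histogram(values, low=60.0, high=140.0, bins=8):
    """Print one row of '#' marks per bin, labeled with the bin's lower edge."""
    width = (high - low) / bins
    for i, count in enumerate(bin_counts(values, low, high, bins)):
        print(f"{low + i * width:6.1f} | {'#' * count}")

if __name__ == "__main__":
    # Two samples from the SAME distribution; only the random seed differs.
    for seed in (1, 2):
        print(f"Sample {seed} (n = 25)")
        print_histogram(draw_sample(25, seed=seed))
```

Rerunning with other seeds, or with a larger n, gives a feel for how much of a histogram's shape is sampling noise.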

When we do Six Sigma, we usually spend a lot of time mining historical data from databases. Sometimes the sample sizes are large, and sometimes they can be quite small. In fact, even large sample sizes can become small when we slice-and-dice them, drilling down with various categories and subcategories in search of critical-to-quality data. Statistical software will often automatically fit a normal curve to histograms created from these data. It’s often tempting to use the fitted curves to make an eyeball judgment about the normality of the data. Sometimes this is a good idea, and sometimes it isn’t.
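To make "fitting a normal curve" concrete, here is a rough sketch of what such software is presumably doing: estimate the mean and standard deviation from the data, then compute how many observations the resulting normal curve would place in each histogram bin. The function names and the sample data are hypothetical, and real packages may fit and draw the curve differently.

```python
from statistics import NormalDist, mean, stdev

def observed_counts(values, edges):
    """Number of values falling in each half-open bin [lo, hi)."""
    return [sum(lo <= v < hi for v in values) for lo, hi in zip(edges, edges[1:])]

def fitted_normal_counts(values, edges):
    """Counts per bin implied by a normal curve with the sample mean and std. dev."""
    fit = NormalDist(mean(values), stdev(values))
    n = len(values)
    return [n * (fit.cdf(hi) - fit.cdf(lo)) for lo, hi in zip(edges, edges[1:])]

if __name__ == "__main__":
    # Hypothetical small data set, invented for illustration only.
    data = [9.1, 9.8, 10.0, 10.2, 10.3, 10.9, 11.4, 12.6]
    edges = [8, 9, 10, 11, 12, 13]
    obs = observed_counts(data, edges)
    fit = fitted_normal_counts(data, edges)
    for lo, o, f in zip(edges, obs, fit):
        print(f"bin starting at {lo:>2}: observed {o}, fitted curve {f:.2f}")
```

With only a handful of points per bin, the gaps between observed and fitted counts are hard to judge by eye, which is the situation the paragraph above describes.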

 …


Comments

Submitted by jclark6s on Tue, 03/29/2011 - 12:40

Histograms

I hope you mean P>=0.05 in your conclusion, not P<0.05.

Excellent article. I have been doing the same for years.

Submitted by Dr Burns on Mon, 04/11/2011 - 21:33

In reply to Histograms by jclark6s

Histograms

More Six Sigma-based nonsense. Who gives a damn if the data are normally distributed or not? Control charts don't need normal data. The purpose of the histogram is to gain insight into the process.

Submitted by William K. Gordon on Tue, 03/29/2011 - 12:48

P Value for Goodness of Fit

I might have missed the meaning of the following sentence in the last paragraph of your article: "On the other hand, if you only have a small amount of data, you can still use the normality assumption if the histogram fit looks lousy, providing the p-value of the goodness-of-fit statistic says the normal curve is OK, i.e., if p < 0.05." Did you intend to say that one could use the normality assumption if the p > 0.05?

"Anderson-Darling Normality Test: If the p-value is equal to or less than a specified alpha risk, there is evidence that the data does not follow a normal distribution" (Picar, p. 123). When the p-value is greater than the alpha value (in this case 0.05), the test gives no evidence against normality, so the normality assumption can be used.

Reference:

Picar, D. (Ed.). (2002). Graphical analysis. The Black Belt Memory Jogger: A Pocket Guide for Six Sigma Success. Salem, NH: Goal/QPC.
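The decision rule discussed here (keep the normality assumption when p > 0.05) can be sketched as follows. This is an illustrative approximation, not the procedure used by any particular statistics package: it computes the Anderson-Darling statistic with the mean and standard deviation estimated from the data, then approximates the p-value by simulation, as the share of simulated normal samples of the same size whose statistic is at least as large. The function names, the number of simulations, and the seed are all assumptions.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def anderson_darling(values):
    """A-squared statistic against a normal with estimated mean and std. dev."""
    xs = sorted(values)
    n = len(xs)
    fit = NormalDist(mean(xs), stdev(xs))
    eps = 1e-12  # keep the CDF away from 0 and 1 so log() stays finite
    cdf = [min(max(fit.cdf(x), eps), 1 - eps) for x in xs]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n

def simulated_p_value(values, sims=2000, seed=0):
    """Monte Carlo p-value: how often a truly normal sample looks at least this bad."""
    rng = random.Random(seed)
    n = len(values)
    observed = anderson_darling(values)
    hits = sum(
        anderson_darling([rng.gauss(0.0, 1.0) for _ in range(n)]) >= observed
        for _ in range(sims)
    )
    return (hits + 1) / (sims + 1)

if __name__ == "__main__":
    rng = random.Random(42)
    small_sample = [rng.gauss(50.0, 5.0) for _ in range(15)]  # hypothetical data
    p = simulated_p_value(small_sample)
    verdict = "no evidence against normality" if p > 0.05 else "normality is doubtful"
    print(f"A2 = {anderson_darling(small_sample):.3f}, p = {p:.3f}: {verdict}")
```

Simulating from a standard normal is enough because the statistic is unchanged when the data are shifted or rescaled. Note that a large p-value only means the test found no evidence against normality; it does not prove the data are normal.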

Submitted by Tom Pyzdek on Wed, 12/05/2018 - 10:43

In reply to P Value for Goodness of Fit by William K. Gordon

You are correct, William. The next to last sentence should read: 

 "On the other hand, if you only have a small amount of data, you can still use the normality assumption if the histogram fit looks lousy, providing the p-value of the goodness-of-fit statistic says the normal curve is okay, i.e., if p > 0.05."

In other words, don't trust your eyeball judgment regarding the fitted curve if the sample size is small. I'll see if I can get QD to correct this typo.

Thomas Pyzdek

www.pyzdekinstitute.com

Submitted by Rip Stauffer on Wed, 03/30/2011 - 11:37

Another Consideration

Great point about ignoring the curve fit by the software...we live in the real world. Our data come in histograms, not PDF/CDF curves. We don't get an infinite amount of noise-free data. My own belief is that we spend entirely too much time (in the Six Sigma world) worrying about normality; testing for it, torturing perfectly good and representative data sets through transformation, and doing other things that are often (as Don Wheeler says) "victories of computation over common sense." 

It's also worth mentioning that testing the data from any histogram for normality is futile until you have some reason to believe that the data are homogeneous, i.e., they come from one universe. When we are using data in Six Sigma, they usually come from a process, with the intent to work on the system; that means we are usually conducting an analytic study. The best test for homogeneity, then, will be a control chart. Davis Balestracci illustrated this very clearly in "Data Sanity" several years ago. Don Wheeler spent several chapters in "The Six Sigma Practitioner's Guide to Data Analysis" on this issue. If you want to see a quick summary of Balestracci's work, I have one at my blog, http://woodsidequality.blogspot.com.
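As a rough illustration of using a control chart as a homogeneity check, the sketch below computes natural process limits for an individuals (XmR) chart, using the conventional factor of 2.66 times the average moving range, and lists the points that fall outside them. The function names and the data are hypothetical, and this checks only the single "point outside the limits" rule, not the run rules a full analysis might apply.

```python
def xmr_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the usual scaling factor for individuals charts (3 / 1.128).
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def out_of_limits(values):
    """Indexes of points outside the natural process limits."""
    lcl, _, ucl = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

if __name__ == "__main__":
    # Hypothetical time-ordered measurements, invented for illustration.
    data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
    lcl, center, ucl = xmr_limits(data)
    print(f"limits: {lcl:.2f} to {ucl:.2f}, center {center:.2f}")
    print("points outside limits:", out_of_limits(data))
```

If any points are flagged, the data may come from more than one universe, and a normality test on the pooled histogram would say little about the process.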

Submitted by Bob Doering on Thu, 03/31/2011 - 06:06

Use of p value in distribution decisions

When it comes to determining if data is normally distributed, I prefer to do a distribution analysis to find the best-fit distribution, rather than just use the p-value on a normality test. One good explanation of the p-value's limitations for distribution decisions is available from Charles Annis' web page, http://www.statisticalengineering.com/goodness.htm. His note 1 is very educational:

"The Anderson-Darling test, does not tell you that you have a Normal density.  It only tells you when the data make it unlikely that you do not.  Engineers (and I'm one) hate this kind of statistical double-talk.   But the fact remains:  Any frequentist test is constructed to disprove something.  Just as a dry sidewalk is evidence that it didn't rain, a wet sidewalk might be caused by rain or by the sprinkler system.  So a wet sidewalk can't prove that it rained, while a not-wet one is evidence that it did not rain."

When I think of eyeballing as "good enough," it makes me think of Quality Level TCE: "That's Close Enough." It is in common usage, but it is often used when it should not be. The real issue is that the normal curve in the real world is one of the three most commonly occurring curves: normal, uniform, and skewed (Weibull or beta). Assuming one over the other without pondering which really makes sense is just as negligent as "over-thinking" the distribution. It is also key to understand that nearly all measured outputs are multi-modal, as described in the total variance equation.


© 2025 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
"Quality Digest" is a trademark owned by Quality Circle Institute Inc.
