{domain:"www.qualitydigest.com",server:"169.47.211.87"} Skip to main content

User account menu
Main navigation
  • Topics
    • Customer Care
    • FDA Compliance
    • Healthcare
    • Innovation
    • Lean
    • Management
    • Metrology
    • Operations
    • Risk Management
    • Six Sigma
    • Standards
    • Statistics
    • Supply Chain
    • Sustainability
    • Training
  • Videos/Webinars
    • All videos
    • Product Demos
    • Webinars
  • Advertise
    • Advertise
    • Submit B2B Press Release
    • Write for us
  • Metrology Hub
  • Training
  • Subscribe
  • Log in
Mobile Menu
  • Home
  • Topics
    • 3D Metrology-CMSC
    • Customer Care
    • FDA Compliance
    • Healthcare
    • Innovation
    • Lean
    • Management
    • Metrology
    • Operations
    • Risk Management
    • Six Sigma
    • Standards
    • Statistics
    • Supply Chain
    • Sustainability
    • Training
  • Login / Subscribe
  • More...
    • All Features
    • All News
    • All Videos
    • Contact
    • Training

Lessons From a Statistical Analysis Gone Wrong, Part 3

Keep asking questions

Eston Martz
Wed, 07/29/2015 - 13:15

If you’ve read the first two parts of this tale, you know it started when I published a post that involved transforming data for capability analysis. When an astute reader asked why Minitab didn’t seem to transform the data outside of the capability analysis, it revealed an oversight that invalidated the original analysis.
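
The original data and analysis are not reproduced here, but as a rough illustration of the kind of workflow the post described, the sketch below applies a Box-Cox-style transformation ahead of a normal-based capability calculation in Python. Everything in it is hypothetical: the winning-time values, the lower limit, and the choice of Box-Cox itself (one common option in Minitab's capability analysis) are placeholders, not the values or settings from the original post.

import numpy as np
from scipy import stats

# Hypothetical winning times in seconds -- placeholders, not the article's data.
times = np.array([122.2, 121.8, 123.0, 122.4, 121.6, 122.9,
                  121.4, 122.7, 123.3, 121.9, 122.5, 122.1])

# Box-Cox transform (requires positive data); returns transformed values and lambda.
transformed, lam = stats.boxcox(times)

# A hypothetical lower spec limit must be transformed the same way before comparing.
lsl = 119.4
lsl_t = (lsl**lam - 1) / lam if lam != 0 else np.log(lsl)

# Estimate the proportion of the fitted normal curve that falls below the limit.
mu, sigma = transformed.mean(), transformed.std(ddof=1)
p_below = stats.norm.cdf(lsl_t, loc=mu, scale=sigma)
print(f"lambda = {lam:.2f}, estimated P(time < LSL) = {p_below:.4f}")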

I removed the errant post. But to my surprise, John Borneman, the reader who helped me discover my error, continued looking at the original data. “I do have a day job, but I’m a data geek,” he explained to me. “Plus, doing this type of analysis ultimately helps me analyze data found in my real work!”

I want to share what Borneman did, because it’s a great example of how you can take an analysis that doesn’t work, ask a few more questions, and end up with an analysis that does work.

 …

Comments

Submitted by NT3327 on Wed, 07/29/2015 - 10:53

Or you can just watch it run

It's probably worth emphasizing that you're dealing with winning times. At around 20 horses per Kentucky Derby, every five years we have 100 or so horses that are slower than Secretariat.

How might non-winning times impact the analysis? Sham's second-place finish in 1973 is one of the fastest times in the Kentucky Derby; it just happened to go up against Secretariat that day (and yes, non-winning times are not generally available).

The Kentucky Derby is billed as the most exciting two minutes in sports. Secretariat became the first horse to finish in less than two minutes, and still holds the record time for each of the Triple Crown races. How many track and field or swimming records from 1973 hold today?

The slowest winning time (1970) in the Belmont Stakes was turned in by High Echelon in the mud. Still, it's interesting to see that the slowest and fastest winning times (over the current length) are separated by just three years. Special causes, anyone? ;) The range of winning times is 10 seconds, with Secretariat 2 seconds faster than the next-best winning time. It's worth noting that the Belmont Stakes was held on a different track (Aqueduct, I think) from 1963-1968.

Take a few minutes to find some clips of Secretariat winning the Triple Crown events. At times it looks as though his portion of the recording has been set to fast-forward.

NT3327

Submitted by pswaroop on Wed, 07/29/2015 - 11:13

Great learning

Eston, thanks for sharing your oversight and learnings with all of us. I went through all three posts and waited for the next day to learn more; this was a thriller!

Your point is well noted. I have faced this issue myself due to a system's inability to track decimal points for effort data in software development.

I'm a big fan and love your posts.
Thank you again.

Prashant
India

Submitted by Bill McNeese on Wed, 07/29/2015 - 15:50

Data

Hello, interesting post. You definitely need to look closely at everything. The issue can be seen in the first part of this series: there is a rounding issue that is evident in the individuals control chart, where rounding to the nearest second gives the chart a step-type appearance. That was clue 1. Plus, you are not dealing with homogeneous data; the first 11 points are above the average, so there is a special cause, and those points probably should not be included in the data set. It would be nice if you shared the original data somehow. I copied the Belmont winning times since 1929 from Wikipedia, which are rounded to the nearest 0.1 second. The first eight points are above the average on the individuals chart. Including that data gives a p-value of 0.16 for normality; removing those points gives a p-value of almost 0.5. I did not check the other two races. Thanks for the post. Bill
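
Bill's observations are easy to check in code. The sketch below is a generic individuals-chart calculation (center line plus and minus 2.66 times the average moving range) with an Anderson-Darling statistic for normality; the winning-time values are placeholders, not the Wikipedia data he describes, and the p-values he quotes come from Minitab's version of the test.

import numpy as np
from scipy import stats

# Hypothetical winning times in seconds -- placeholders, not Bill's Wikipedia data.
times = np.array([149.8, 150.2, 149.6, 148.9, 149.4, 150.0,
                  148.7, 149.1, 147.8, 148.3, 147.5, 146.9])

# Individuals (I) chart: limits at the mean +/- 2.66 times the average moving range.
moving_range = np.abs(np.diff(times))
center = times.mean()
ucl = center + 2.66 * moving_range.mean()
lcl = center - 2.66 * moving_range.mean()
print(f"center = {center:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")

# A long run of points on one side of the center line is a special-cause signal.
print("points above center:", np.where(times > center)[0])

# Anderson-Darling statistic for normality (scipy reports critical values, not a p-value).
ad = stats.anderson(times, dist="norm")
print("A-squared =", round(ad.statistic, 3), "critical values:", ad.critical_values)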

Submitted by kkbari on Tue, 08/04/2015 - 09:22

Lack of Discrimination

Here's another error. The times are not ordinal; they are still ratio data. Ordinal implies that the distances between values are not necessarily equal, and that is patently not the case here. A lack of discrimination does not change the scale of the data: 120 seconds is still twice as long as 60 seconds, no matter how many decimal places you have.

It is also important to point out that rounding the data does not meaningfully change its standard deviation; you can run a quick simulation in Excel to prove that out (a Python equivalent is sketched after this comment). Yes, the graphs look more “digital,” but that doesn't diminish the power of the data for almost any analysis you choose.

It took me a while to figure out why there was an LSL. It could have been clearer that the reason you ran a capability analysis was not to calculate a capability index but to integrate the area under the curve, which was also your rationale for transforming the data (and that is probably the only legitimate reason to transform when the data are significantly skewed).
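
The quick simulation mentioned above is easy to reproduce outside Excel. The Python sketch below draws an arbitrary set of simulated times and compares the standard deviation before and after rounding to the nearest whole second; the distribution and its parameters are purely illustrative.

import numpy as np

# Simulated times (seconds) from an arbitrary distribution -- illustration only.
rng = np.random.default_rng(1)
times = rng.normal(loc=122.0, scale=1.5, size=10_000)

# Compare the spread of the raw values with the spread after rounding to whole seconds.
sd_raw = times.std(ddof=1)
sd_rounded = np.round(times).std(ddof=1)
print(f"SD of raw data:     {sd_raw:.4f}")
print(f"SD of rounded data: {sd_rounded:.4f}")

With a one-second rounding interval and a spread of about 1.5 seconds, the two values should come out within a few percent of each other; the effect grows only when the rounding interval is large relative to the spread of the data.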
