
Data Snooping, Part 1

What pitfalls lurk within your database?

Donald J. Wheeler
Mon, 08/06/2018 - 12:03

Data mining is the foundation for the current fad of “big data.” Today’s software makes it possible to look for all kinds of relationships among the variables contained in a database. But owning a pick and shovel will not do you much good if you do not know the difference between gold and iron pyrite.


When you start rummaging around in a collection of existing data (a database) to discover whether you can use some variables to “predict” other variables, you are data snooping (known today as data mining). With today’s software we can go snooping in very large databases in an effort to extract useful relationships. However, for the sake of clarity, we will use a small data set and do our snooping with nothing more than bivariate linear regression. The issues illustrated here are the same regardless of the size of the data set and regardless of the techniques used to “model the data.”
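For readers who want to see the mechanics, here is a minimal sketch of the bivariate (one-predictor) least-squares fit that this kind of snooping relies on. It is an illustration only, not code from the article: the function name `fit_line` and its return layout are assumptions. It returns the intercept, the slope, and r², the share of the variation in Y that the fitted line accounts for.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared).

    Hypothetical helper for illustration; not taken from the article.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    r_squared = sxy ** 2 / (sxx * syy)  # share of Y's variation "explained" by x
    return b0, b1, r_squared
```

For example, `fit_line([1, 2, 3, 4], [2, 4, 6, 8])` returns `(0.0, 2.0, 1.0)`: a perfect straight line. A high r² says only that the line fits these particular points; it says nothing about whether the relationship will hold for new data.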

Data snooping

The data set consists of five weekly production variables from a chemical plant. Figure 1 shows the data for a baseline of eight weeks of production. We will treat Y as our response variable and see how well the other four variables predict the value of Y.
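Why a small baseline and several candidate predictors invite trouble can be sketched with a simulation. This is a hypothetical illustration, not Wheeler’s data or his method: it generates eight weeks of a response Y and four candidate predictors that are pure random noise, so none of them has any real relationship with Y. It then records the best r² a snooper would find among the four. Repeating this many times estimates how often chance alone produces an impressive-looking fit. The Gaussian noise, the 0.5 threshold, the seed, and the function names are all assumptions chosen for illustration.

```python
import random


def r_squared(x, y):
    """r^2 for the least-squares line of y on x (the squared correlation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def best_chance_r2(rng, n_weeks=8, n_predictors=4):
    """Largest r^2 found when Y is 'predicted' by predictors that are pure noise."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n_weeks)]
    best = 0.0
    for _ in range(n_predictors):
        # Each candidate predictor is independent of Y by construction.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_weeks)]
        best = max(best, r_squared(x, y))
    return best


rng = random.Random(2018)  # fixed seed so the run is repeatable
trials = [best_chance_r2(rng) for _ in range(10_000)]
share = sum(t >= 0.5 for t in trials) / len(trials)
print(f"Trials where some noise variable reached r^2 >= 0.5: {share:.1%}")
```

Because every predictor here is unrelated to Y by construction, each “strong” fit counted in `share` is an artifact of searching through several variables with only eight observations. The sketch uses only the standard library so it runs anywhere Python does.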

 …


© 2025 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute Inc.
