Can You Prove Anything With Statistics?
Maybe... using PARC

Davis Balestracci
Published: Monday, December 14, 2015 - 09:13

“It is impossible to tell how widespread data torturing is. Like other forms of torture, it leaves no incriminating marks when done skillfully. And like other forms of torture, it may be difficult to prove even when there is incriminating evidence.”
—J. L. Mills

When will academics, Six Sigma belts, and consultants wake up and realize that, despite their best efforts, most people in their audiences will not correctly use the statistics they’ve been taught—including many of the teachers themselves? Sometimes I wonder if they are exacting revenge on their captive audiences for being beaten up on the playground 25 years ago.

The world of clinical publications is a particular hotbed for inappropriate uses of statistics. Many people are guilty of looking for the most dramatic, positive findings in their data, and who can blame them? If study data are manipulated enough, they can be made to appear to prove whatever the investigator wants to prove. When this process goes beyond reasonable interpretation of the facts, it becomes data torturing.

Two types of torture

1. Opportunistic. This involves a) poring over data not collected specifically for the current purpose until an alleged statistically significant association is found between variables, and then b) devising a plausible hypothesis to fit the association.

One can easily find significant results where none exist simply by making multiple comparisons. Using the widely accepted p-value of 0.05 (i.e., a willingness to take a 5-percent risk of declaring something significant when it isn’t), more comparisons mean more opportunities for random events to be declared significant simply due to chance. For two tests, the probability that at least one “significant” difference could be declared purely by chance is 10 percent (1 – 0.95 × 0.95). For 20 tests, it is 64 percent (1 – 0.95^20).

If one is on a fishing expedition with such a data set—once again, I emphasize that this term applies only to data (usually a tabulation) that weren’t collected specifically for the current purpose—one should at least adjust the decision criteria to keep the overall risk at 5 percent. The per-comparison significance threshold therefore depends on the number of possible comparisons. There are several ways to do this, but one simple example (dividing 0.05 by the number of comparisons) says that the threshold to declare significance for each of two potential comparisons should be p < 0.025; similarly, for 20 comparisons, it would be p < 0.0025. The short sketch below works through this arithmetic.

Further, if the fishing expedition catches a boot, the fishermen should throw it back and not claim that they were fishing for boots. The honest investigator will limit the study to focused questions, all of which make sense in the given context—and which can then be tested subsequently with an appropriately designed study. The data torturer will act as if every positive result confirmed a major hypothesis. Unfortunately, when this type of data torturing is done well, it may be impossible for readers to tell that the positive association did not spring from an a priori hypothesis.
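To make that arithmetic concrete, here is a minimal sketch in Python (purely illustrative) of the chance of at least one spurious “significant” finding among k independent comparisons, and of the per-comparison threshold, 0.05/k, that holds the overall risk to about 5 percent:

```python
# Minimal sketch of the multiple-comparisons arithmetic above:
# family-wise risk of at least one chance "significant" finding among
# k independent comparisons at alpha = 0.05, and the per-comparison
# threshold (alpha / k) that keeps the overall risk near 5 percent.
alpha = 0.05

for k in (1, 2, 5, 10, 20):
    family_wise_risk = 1 - (1 - alpha) ** k    # 1 - 0.95^k
    per_comparison_threshold = alpha / k       # e.g., 0.025 for k = 2
    print(f"{k:2d} comparisons: chance of a spurious 'find' = "
          f"{family_wise_risk:.0%}; per-comparison threshold = "
          f"{per_comparison_threshold:.4f}")

# k = 2  -> 10% and 0.0250;  k = 20 -> 64% and 0.0025, as quoted above.
```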
2. Procrustean. You decide on the hypothesis to be proved, then make the data fit the hypothesis. This requires selective reporting, one of the most common forms being the intentional suppression of contradictory data. It is more difficult to carry out than opportunistic data torturing, but its results are often more believable if one starts with a popular hypothesis that appears to have been “proven.”

One should suspect data torturing whenever subjects are dropped without a clear reason, or when a large proportion of subjects are excluded for any reason. You should ask, “Is the rationale for the subgroup analyses convincing?” In the case of medicine, remember that two sexes, multiple age groups, and different clinical features such as stages of disease make it possible for the investigators to examine the data in many different ways. If a drug is reported as working only in women older than 60 years, the savvy reader should at least suspect a chance finding.

Do a PARC analysis, and you get...

The delightful applied-science statistician J. Stuart Hunter invented the acronym PARC to characterize a lot of what is being taught and practiced: “practical accumulated records compilation,” on which one does a “passive analysis by regressions and correlations.” Then, to get it published, one must do the “planning after the research is already completed.” With the current plethora of friendly computer packages designed to “delight” their customers, I have also coined the characterization “profound analysis relying on computers.”

Here is an enlightening quote by Walter A. Shewhart, from his classic book Economic Control of Quality of Manufactured Product (Martino Fine Books, 2015 reprint): “You go to your tailor for a suit of clothes, and the first thing that he does is to make some measurements; you go to your physician because you are ill, and the first thing he does is to make some measurements. The objects of making measurements in these two cases are different. They typify the two general objects of making measurements. They are:
• To obtain quantitative information [only]
• To obtain a causal explanation of observed phenomena.”

These are two entirely different purposes. For example, when I’m being fitted for a suit, I don’t expect my tailor to take my waist measurement and then say, “I need to know whether your mother has or had Type II diabetes.” The tailor doesn’t care about the genetic process that produced my body; he or she just measures it (once), then makes my suit.

I vividly remember a newspaper article that appeared when I lived in Minnesota more than 20 years ago, titled “Whites May Sway TV Ratings.” It read: “...[An] associate professor and Chicago-based economist reviewed TV ratings of 259 basketball games.... They attempted to factor out all other variables such as the win-loss records of teams and the times games were aired [my emphasis].... The economists concluded that every additional 10 minutes of playing time by a white player increases a team’s local ratings by, on average, 5,800 homes.” Hence, Minnesotans are bigots! What do you think?

Isn’t the objective of TV ratings solely to find out how many people watched a particular show (i.e., “making a suit”), period? Is the data-collecting agency trying to determine racial viewing patterns during basketball games (i.e., causal explanation)? Hardly. When “data for a suit” (i.e., most tabulated statistics) are used to make a causal inference, that’s asking for trouble.

This is why a lot of published research is, in essence, PARC spelled backwards—which was Hunter’s ultimate point. People are doing PARC analyses on data that are merely the “continuous recording of administrative procedures.”
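Hunter’s “passive analysis by regressions and correlations” is easy to demonstrate. Here is a minimal sketch of such a fishing expedition repeated many times over pure-noise records (it assumes numpy and scipy are available; the record count, number of predictors, and seed are arbitrary choices for illustration). Screening 20 happenstance predictors against one outcome at the usual p < 0.05, roughly two expeditions out of three “catch” a spurious association:

```python
# Sketch: a miniature "fishing expedition" over happenstance data, run
# many times.  Every column is pure noise, yet screening 20 predictors
# against one outcome at p < 0.05 frequently "finds" something.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_records, n_predictors, n_expeditions = 50, 20, 1000

caught_something = 0
for _ in range(n_expeditions):
    outcome = rng.normal(size=n_records)          # noise: nothing real here
    p_values = [stats.pearsonr(rng.normal(size=n_records), outcome)[1]
                for _ in range(n_predictors)]
    if min(p_values) < 0.05:                      # at least one false "find"
        caught_something += 1

# Roughly 64% of expeditions should catch something, matching 1 - 0.95^20.
print(f"Expeditions that caught a spurious association: "
      f"{caught_something / n_expeditions:.0%}")
```

Each of those spurious catches is exactly the kind of result that tempts a plausible-sounding hypothesis after the fact.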
Speaking of data torturing, when are teachers of statistics going to stop torturing their students as well?

About The Author

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or available as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built-in” approach, as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.