Davis Balestracci


What Do ‘Above’ and ‘Below’ Average Really Mean?

Not knowing can have very serious consequences

Published: Wednesday, June 14, 2017 - 11:03

My last column mentioned how doctors and hospitals are currently being victimized by draconian reactions to rankings, either interpreted literally or filtered through the results of some type of statistical analysis. Besides the potentially serious financial consequences of using rankings in the current craze of “pay for performance,” many hard-working people are stigmatized inappropriately through what has been called a “public blaming and shaming” strategy. Is it any wonder that many physicians are so angry these days?

A real example

Rankings with alleged helpful feedback to those allegedly needing it are also used as a cost-cutting measure to identify and motivate alleged poor performers. Many rankings are interpreted using an analysis that, given the statistics courses people have taken, feels intuitively appropriate but should actually be avoided at all costs.

In an effort to reduce unnecessary expensive prescriptions, a pharmacy administrator developed a proposal to monitor and compare individual physicians’ tendencies to prescribe the most expensive drug within a class. Data were obtained for each of a peer group of 51 physicians; specifically, the total number of prescriptions written and, of that number, how many were for the targeted drug.

Someone was kind enough to send me this proposal, which included the data, while begging not to be identified as the source. I quote it verbatim (adding my emphases).

Given the 51 physician results:

“1. Data will be tested for the normal distribution.

“2. If distribution is normal—physicians whose prescribing deviates greater than one or two standard deviations from the mean are identified as outliers.

“3. If distribution is not normal—examine distribution of data and establish an arbitrary cutoff point above which physicians should receive feedback (this cutoff point is subjective and variable based on the distribution of ratio data).”

For my own amusement, I tested the data for normality and it “passed” (p-value of 0.277, which is > 0.05). Yes, I said “for my own amusement” because this test is moot and inappropriate for percentage data like this (the number of prescriptions in the denominators ranged from 30 to 217). The computer will do anything you want.
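One way to see why the denominators matter, offered as a rough sketch and not as the analysis the author has in mind: if each physician's percentage is treated as a binomial proportion (an assumption), its expected sampling variation depends on how many prescriptions that physician wrote. The function name `binomial_se_percent` is hypothetical, and the 15.85 percent rate is only the mean implied by figures quoted later in the article (5.15 + 10.7).

```python
import math

def binomial_se_percent(rate_percent, n):
    """Standard error, in percentage points, of a binomial proportion over n trials."""
    p = rate_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

# Assumed common underlying rate; inferred from the article's figures, not stated in it.
ASSUMED_RATE = 15.85

for n in (30, 217):  # smallest and largest denominators reported in the article
    se = binomial_se_percent(ASSUMED_RATE, n)
    print(f"n = {n:3d}: standard error of about {se:.1f} percentage points")
```

Under these assumptions, the physician with 30 prescriptions has roughly 2.7 times the expected variation of the physician with 217, so a single yardstick applied to all 51 percentages treats unlike things alike.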

The scary issue here is the proposed ensuing analysis that will result from whether the data are deemed normally distributed or not. If data are normally distributed, doesn’t that mean there are no outliers? But suppose outliers are present—doesn’t this mean they’re atypical? In fact, wouldn’t their presence tend to inflate the traditional calculation of standard deviation? But wait, the data passed the normality test. It’s all so confusing!
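The inflation effect is easy to demonstrate. The sketch below uses made-up percentages, invented purely for illustration (they are not the 51 physicians' data), and compares the traditional standard deviation with and without one atypical value.

```python
import statistics

# Hypothetical percentages for illustration only; not the article's data.
typical = [12.0, 14.0, 15.0, 16.0, 17.0, 18.0, 13.0, 15.0]
with_outlier = typical + [60.0]  # add one atypical value

sd_typical = statistics.stdev(typical)
sd_with_outlier = statistics.stdev(with_outlier)

print(f"SD without the atypical value: {sd_typical:.1f}")
print(f"SD with the atypical value:    {sd_with_outlier:.1f}")

# The atypical value is judged against a mean and SD that it has itself inflated.
mean_all = statistics.mean(with_outlier)
for k in (1, 2, 3):
    cutoff = mean_all + k * sd_with_outlier
    print(f"mean + {k} SD = {cutoff:.1f}; 60.0 flagged: {60.0 > cutoff}")
```

In this toy case the standard deviation jumps from 2.0 to about 15.1, and the value of 60 no longer exceeds the three standard deviation line that it helped to widen.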

Yet that doesn’t seem to stop our quality police from lowering the “Gotcha!” threshold to two or even one standard deviation to find outliers. In my experience, I am shocked at the extent to which this has become common practice.

Returning to the protocol, even scarier is what’s proposed if the distribution is deemed not normal: establish an arbitrary cutoff point for either what the administrator feels performance should be, or the point that will expose a pre-determined arbitrary percentage (ending in “0” or “5,” of course) of alleged bad performers and/or reward a similar arbitrary percentage of good performers.

I’ll play his game. Because the data pass the normality test, the graph below shows the suggested analysis with one, two, and three standard deviation lines drawn in around the mean.

The standard deviation of the 51 numbers was 10.7.
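The positions of those lines can be approximated from figures in the text. The sketch below assumes a mean of about 15.85 percent, which is not stated anywhere in the article but follows from the lower one standard deviation limit of 5.15 percent quoted further on plus the standard deviation of 10.7. The helper `sd_lines` is hypothetical, and flooring the lower lines at zero is a choice made here, since a percentage cannot be negative.

```python
SD = 10.7                 # stated in the article
LOWER_ONE_SD = 5.15       # lower one-SD limit quoted in the article
MEAN = LOWER_ONE_SD + SD  # inferred, not stated: about 15.85 percent

def sd_lines(mean, sd, max_k=3):
    """Return {k: (lower, upper)} for mean +/- k*sd, flooring the lower line at 0 percent."""
    return {k: (max(0.0, mean - k * sd), mean + k * sd) for k in range(1, max_k + 1)}

for k, (lower, upper) in sd_lines(MEAN, SD).items():
    print(f"{k} SD: lower = {lower:.2f}%, upper = {upper:.2f}%")
```

Without the floor, the two and three standard deviation lower lines would fall below zero, which already hints that this yardstick fits percentage data poorly.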

And the consequences? Pick one…

Depending on the analyst’s mood and the standard deviation criterion subjectively selected, he or she could claim to statistically find one—or 10—“high utilizers” who would receive helpful feedback. Just curious: How does the analyst intend to deal with the 10 performances below the one standard deviation limit of 5.15 percent—and the three zeroes?

He or she could have just as easily decided that “less than 15 percent” should be a standard, resulting in 27 physician high utilizers who would receive feedback.

There is also the common alternative arbitrary strategy: Let’s go after... oops, I mean give feedback to... the—pick one—top quartile, top 10 percent, top 15 percent, top 20 percent.

Another option would be to set a tough stretch goal of “less than 10 percent,” with the following choices:
• Financially reward the 16 physicians below 10 percent? 
• Perhaps offer a bonus for those below the one standard deviation threshold of 5.15 percent?
• You could reward the bottom quartile (or 10 or 15 percent), which, along with the previous scheme, would no doubt cause displeasure among the doctors below 10 percent who didn’t get rewarded.
• Should everyone above 10 percent receive feedback? Should there be a financial penalty for a certain level above 10 percent? 

The high-utilizer feedback was a thick packet of professional journal articles considered the gold standard of evidence-based practices and rationale.

When I present this example and its proposed actions to a roomful of doctors, they erupt in laughter. When I ask what they do with such feedback, without fail, I see a beautifully synchronized collective pantomime of throwing things into the garbage.

For those of you in education, government, manufacturing, or administration, does this scenario resemble conversations you routinely experience in the meetings you attend? How much waste in time, money, and morale do analyses like this, and the meetings that result from them, cost you? “Unknown or unknowable?” (Does it matter?)

Much of this results from teaching people what Donald Wheeler calls “superstitious nonsense” in the guise of statistics (especially that relating to the normal distribution). Most such material is pretty much useless when it comes to application in a real-world, everyday environment and causes far more confusion and problems than it solves. 

Is it possible to change those conversations to make them more productive? More about that next time when I revisit these data.


About The Author

Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.


Whether or not you understand statistics.....

A statement in a previous Balestracci article sums up the situation:

"Whether or not you understand statistics, you are already using statistics!"

This is a real problem, with sometimes dire and even deadly consequences.

As early as 1916, Walter A. Shewhart began to think about and do something about "data sanity."

Great to see Davis Balestracci expand on Shewhart's work.  Is anybody listening?  We darned well better!

"In the summer of 1916, Walter worked with the Western Electric Co. in New York (an integral part of the Bell Telephone system). He was examined by the medical officers of the firm who all thought he was suffering from tuberculosis. All tests were negative, but he had small fluctuations of temperature which seven medical men accepted as convincing proof of tuberculosis. Walter then started taking temperatures of other people, and found that his own father had similar fluctuations although he had never been ill except only once in his life. Walter discovered that most people have temperature fluctuations; so that his own case could not possibly be considered as abnormal, and he decided not to worry any further about T.B. This clearly was an authentic early piece of work in Quality Control." 

Source: P. C. Mahalanobis, "Walter A. Shewhart and Statistical Quality Control in India," Sankhyā: The Indian Journal of Statistics (1933-1960), Vol. 9, No. 1 (Oct. 1948), pp. 51-60. Published by the Indian Statistical Institute.


Based on what you cite as the proposal, the pharmacy administrator demonstrates a profound lack of understanding of variation. (I suspect there wasn't much curiosity to understand the system that generated the data, either.) In the span of about 15 seconds I went through disbelief, anger, and disgust as I recognized the very real impact that decisions based on such superstitious nonsense have on people's lives.

You keep doing what you do, Davis. I am following in Dr. Wheeler's and your footsteps to debunk these methods, and help those open to listening with better analyses.

All the best, Shrikant Kalegaonkar (https://twitter.com/shrikale or https://shrikale.wordpress.com)

Distributed folk

Reminds me of the fact that in the general population, 50% of people are below average intelligence. Those who argue are usually in this group. (Median is approximately equal to mean for IQ. Yes, and it's pretty normally distributed.) I had an acquaintance once who was very excited that she scored 99% in an IQ test. Then we have those with an IQ between 0 and 25, who are considered idiots; those with IQs between 26 and 50 are considered imbeciles; and those with an IQ between 51 and 70 are considered morons. Does this mean it is better to be considered a moron than the village idiot?

Still, I would prefer my physician to be a genius rather than an idiot.