Davis Balestracci


What Do ‘Above’ and ‘Below’ Average Really Mean?

Not knowing can have very serious consequences

Published: Wednesday, June 14, 2017 - 12:03

My last column mentioned how doctors and hospitals are currently being victimized by draconian reactions to rankings, either interpreted literally or filtered through the results of some type of statistical analysis. Besides the potentially serious financial consequences of using rankings in the current craze of “pay for performance,” many hard-working people are stigmatized inappropriately through what has been called a “public blaming and shaming” strategy. Is it any wonder that many physicians are so angry these days?

A real example

Rankings with alleged helpful feedback to those allegedly needing it are also used as a cost-cutting measure to identify and motivate alleged poor performers. Many are analyzed and interpreted with a method that, given the statistics courses people have taken, feels intuitively appropriate but should actually be avoided at all costs.

In an effort to reduce unnecessary expensive prescriptions, a pharmacy administrator developed a proposal to monitor and compare individual physicians’ tendencies to prescribe the most expensive drug within a class. Data were obtained for each of a peer group of 51 physicians; specifically, the total number of prescriptions written and, of that number, how many were for the targeted drug.

Someone was kind enough to send me this proposal, which included the data, while begging not to be identified as the source. I quote it verbatim (adding my emphases).

Given the 51 physician results:

“1. Data will be tested for the normal distribution.

“2. If distribution is normal—physicians whose prescribing deviates greater than one or two standard deviations from the mean are identified as outliers.

“3. If distribution is not normal—examine distribution of data and establish an arbitrary cutoff point above which physicians should receive feedback (this cutoff point is subjective and variable based on the distribution of ratio data).”

For my own amusement, I tested the data for normality and it “passed” (p-value of 0.277, which is > 0.05). Yes, I said “for my own amusement” because this test is moot and inappropriate for percentage data like this (the number of prescriptions in the denominators ranged from 30 to 217). The computer will do anything you want.
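Why the denominators matter can be sketched with a little arithmetic. The snippet below is a minimal illustration, assuming a simple binomial model and a hypothetical common prescribing rate of 15.85 percent (chosen only because it equals the 5.15 percent one-standard-deviation lower limit quoted later in this column plus the 10.7 standard deviation). Neither the model nor the function name comes from the proposal. It compares the expected sampling variation of a percentage built on 30 prescriptions with one built on 217.

```python
import math

# ASSUMPTION: a common underlying rate, used purely for illustration.
ASSUMED_RATE_PERCENT = 15.85


def binomial_se_percent(rate_percent, n_prescriptions):
    """Standard error, in percentage points, of an observed percentage
    based on n_prescriptions, under a binomial model (an assumption)."""
    p = rate_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_prescriptions)


# The smallest and largest denominators mentioned in the column.
se_small = binomial_se_percent(ASSUMED_RATE_PERCENT, 30)
se_large = binomial_se_percent(ASSUMED_RATE_PERCENT, 217)

print(f"n = 30:  standard error about {se_small:.2f} percentage points")
print(f"n = 217: standard error about {se_large:.2f} percentage points")
print(f"ratio: {se_small / se_large:.2f}")
```

Under these assumptions, a physician with 30 prescriptions has a standard error more than two and a half times that of a physician with 217, so one common standard deviation cannot describe both.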

The scary issue here is the proposed ensuing analysis that will result from whether the data are deemed normally distributed or not. If data are normally distributed, doesn’t that mean there are no outliers? But suppose outliers are present—doesn’t this mean they’re atypical? In fact, wouldn’t their presence tend to inflate the traditional calculation of standard deviation? But wait, the data passed the normality test. It’s all so confusing!
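The inflation effect is easy to demonstrate. The sketch below uses made-up percentages (not the 51 physicians' data, which are not reproduced here) and Python's standard `statistics` module: ten ordinary values plus one extreme value. The variable names are hypothetical.

```python
import statistics

# HYPOTHETICAL percentages, invented for illustration only.
typical = [12, 14, 15, 16, 17, 18, 15, 13, 16, 14]
extreme = 45
combined = typical + [extreme]

sd_without = statistics.stdev(typical)   # sample SD of the ordinary values
sd_with = statistics.stdev(combined)     # sample SD once the extreme value is included

# Distance of the extreme value from each mean, in the matching SD units.
z_against_rest = (extreme - statistics.mean(typical)) / sd_without
z_in_combined = (extreme - statistics.mean(combined)) / sd_with

print(f"SD without the extreme value: {sd_without:.2f}")
print(f"SD with the extreme value:    {sd_with:.2f}")
print(f"Extreme value vs. the rest:   {z_against_rest:.1f} SDs out")
print(f"Extreme value in combined SD: {z_in_combined:.2f} SDs out")
```

In this invented example the single extreme value inflates the standard deviation roughly fivefold, and as a result it sits inside three standard deviations of the combined mean even though it is far from everything else.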

Yet that doesn’t seem to stop our quality police from lowering the “Gotcha!” threshold to two or even one standard deviation to find outliers. In my experience, I am shocked at the extent to which this has become common practice.

Returning to the protocol, even scarier is what’s proposed if the distribution is deemed not normal: establish an arbitrary cutoff point for either what the administrator feels performance should be, or the point that will expose a pre-determined arbitrary percentage (ending in “0” or “5,” of course) of alleged bad performers and/or reward a similar arbitrary percentage of good performers.

I’ll play his game. Because the data pass the normality test, the graph below shows the suggested analysis with one, two, and three standard deviation lines drawn in around the mean.

The standard deviation of the 51 numbers was 10.7.
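The one, two, and three standard deviation lines can be reconstructed from the figures quoted in this column. This is a small sketch, assuming the mean is 15.85 percent, which is inferred from the stated one-standard-deviation lower limit of 5.15 percent plus the 10.7 standard deviation; the function name is illustrative.

```python
SD = 10.7                 # standard deviation quoted in the column
LOWER_1SD = 5.15          # one-standard-deviation lower limit quoted in the column
MEAN = LOWER_1SD + SD     # INFERRED mean of the 51 percentages: 15.85


def sd_limits(mean, sd, k):
    """Return the (lower, upper) lines at k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd


limits = {k: sd_limits(MEAN, SD, k) for k in (1, 2, 3)}

for k, (lower, upper) in limits.items():
    note = "  (lower line is below zero)" if lower < 0 else ""
    print(f"{k} SD: {lower:.2f} to {upper:.2f} percent{note}")
```

The lower lines at two and three standard deviations come out negative, which no percentage can be.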

And the consequences? Pick one…

Depending on the analyst’s mood and the standard deviation criterion subjectively selected, he or she could claim to statistically find one—or 10—“high utilizers” who would receive helpful feedback. Just curious: How does the analyst intend to deal with the 10 performances below the one standard deviation limit of 5.15 percent—and the three zeroes?

He or she could have just as easily decided that “less than 15 percent” should be a standard, resulting in 27 physician high utilizers who would receive feedback.

There is also the common alternative arbitrary strategy: Let’s go after... oops, I mean give feedback to... the—pick one—top quartile, top 10 percent, top 15 percent, top 20 percent.

Another option would be to set a tough stretch goal of “less than 10 percent,” with the following choices:
• Financially reward the 16 physicians below 10 percent? 
• Perhaps offer a bonus for those below the one standard deviation threshold of 5.15 percent?
• You could reward the bottom quartile (or 10 or 15 percent), which, along with the previous scheme, would no doubt cause displeasure among the doctors below 10 percent who didn’t get rewarded.
• Should everyone above 10 percent receive feedback? Should there be a financial penalty for a certain level above 10 percent? 

The high-utilizer feedback was a thick packet of professional journal articles considered the gold standard of evidence-based practices and rationale.

When I present this example and its proposed actions to a roomful of doctors, they erupt in laughter. When I ask what they do with such feedback, without fail, I see a beautifully synchronized collective pantomime of throwing things into the garbage.

For those of you in education, government, manufacturing, or administration, is this scenario similar to conversations you routinely experience in the meetings you attend? How much do analyses and resulting meetings like this cost you in wasted time, money, and morale? “Unknown or unknowable?” (Does it matter?)

Much of this results from teaching people what Donald Wheeler calls “superstitious nonsense” in the guise of statistics (especially that relating to the normal distribution). Most such material is pretty much useless when it comes to application in a real-world, everyday environment and causes far more confusion and problems than it solves. 

Is it possible to change those conversations to make them more productive? More about that next time when I revisit these data.


About The Author


Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.



Based on what you cite as the proposal, the pharmacy administrator demonstrates a profound lack of understanding of variation. (I suspect there wasn't much curiosity to understand the system that generated the data, either.) In the span of about 15 seconds I went through disbelief, anger, and disgust as I recognized the very real impact decisions based on such superstitious nonsense have on people's lives.

You keep doing what you do, Davis. I am following in Dr. Wheeler's and your footsteps to debunk these methods, and help those open to listening with better analyses.

All the best, Shrikant Kalegaonkar (https://twitter.com/shrikale or https://shrikale.wordpress.com)

Distributed folk

Reminds me of the fact that in the general population, 50% of people are below average intelligence. Those who argue are usually in this group. (Median is approximately equal to mean for IQ. Yes, and it's pretty normally distributed.) I had an acquaintance once who was very excited that she scored 99% in an IQ test. Then we have those with an IQ between 0 and 25, who are considered idiots; those with an IQ between 26 and 50 are considered imbeciles; and those with an IQ between 51 and 70 are considered morons. Does this mean it is better to be considered a moron than the village idiot?

Still, I would prefer my physician to be a genius rather than an idiot.