Davis Balestracci


New Results Equal New Conversations

A data-sane alternative for percentage performance comparisons

Published: Monday, July 17, 2017 - 12:03

Recently I demonstrated a common incorrect technique for comparing percentage rate performances—based, of course, on the usual normal distribution nonsense. Let’s revisit those data with a superior alternative.

To quickly review the scenario: In an effort to reduce unnecessary expensive prescriptions, a pharmacy administrator developed a protocol to monitor and compare individual physicians’ tendencies to prescribe the most expensive drug within a class.

Data were obtained for each of a peer group of 51 physicians—the total number of prescriptions written and how many of them were for the target drug. During this time period, these 51 physicians had written 4,032 prescriptions, of which 596 were for the target drug—an overall rate of 14.8 percent.

The correct alternative: a p-chart analysis of means

The goal of analysis of means (ANOM) is to compare a group of physicians who have what should be similar practice types—a relatively homogeneous “system,” if you will. Each is compared to this system’s overall average. Variation is exposed, and a group conversation ensues to discuss the variation, then reduce the inappropriate and unintended variation.

For each individual physician’s performance, one calculates the common-cause limits of what would be expected due to statistical variation around the system’s 14.8 percent target-drug prescription rate. Given the appropriate statistical theory for percentage data based on counts (i.e., binomial), a standard deviation must be calculated separately for each physician because each wrote a different number of total prescriptions, in this case, ranging from 30 to 217.

The calculation for this situation’s p-chart ANOM is as follows (note its dependence on the system average). For physician i, who wrote n_i total prescriptions:

    standard deviation_i = √[ p̄ × (1 − p̄) / n_i ],  where p̄ = 0.148 (the system average)

The result of the square root is multiplied by three (for “three standard deviations”), then added to and subtracted from the overall system average to see whether the actual value for an individual physician falls within the range of this expected variation, given an assumed rate of 14.8 percent (“innocent until proven guilty,” the best strategy for dealing with physicians, in my experience).
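The limit calculation described above can be sketched in a few lines of Python. The four physicians and their prescription counts below are made-up illustrations (not the article’s actual 51-physician data); the binomial standard deviation and the ±3 limits follow the formula just described:

```python
import math

# Hypothetical data: physician -> (total prescriptions, target-drug prescriptions).
# Illustrative only; the article's actual 51 physicians are not reproduced here.
physicians = {"A": (30, 2), "B": (120, 35), "C": (217, 20), "D": (80, 12)}

total_rx = sum(n for n, _ in physicians.values())
total_target = sum(k for _, k in physicians.values())
p_bar = total_target / total_rx  # system average, analogous to the article's 14.8%

results = {}
for doc, (n_i, k_i) in physicians.items():
    # The binomial standard deviation uses the SYSTEM average p_bar
    # and each physician's own denominator n_i.
    sd_i = math.sqrt(p_bar * (1 - p_bar) / n_i)
    lower, upper = p_bar - 3 * sd_i, p_bar + 3 * sd_i
    p_i = k_i / n_i
    verdict = "high" if p_i > upper else "low" if p_i < lower else "common cause"
    results[doc] = (round(p_i, 3), round(lower, 3), round(upper, 3), verdict)

for doc, row in results.items():
    print(doc, row)
```

Note that a small denominator (physician A, with only 30 prescriptions) produces wide limits, so even a seemingly extreme percentage can be indistinguishable from the system average.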

Prior data analysis is shown directly below, and the p-chart ANOM is below that.

Note that three standard deviation limits, which many of you would consider conservative, are, in the case of the ANOM, comparable to approximately 1.5 standard deviations in the incorrect analysis. Why? Because the standard deviation is calculated correctly.

Another difference: The overall system value obtained from the summed numerators and denominators of the 51 physicians was 14.8 percent (596 / 4,032), which differs from the average of the 51 individual percentages (15.8 percent).
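A tiny made-up example (three physicians, numbers not from the article) makes the difference between the two summaries concrete: averaging the individual percentages gives a physician with only 10 prescriptions the same weight as one with 200, while the aggregated rate weights each prescription equally:

```python
# Illustrative (total prescriptions, target prescriptions) per physician.
data = [(10, 5), (100, 10), (200, 20)]

# Aggregated system rate: sum of numerators over sum of denominators.
aggregated = sum(k for _, k in data) / sum(n for n, _ in data)

# Naive mean of the individual percentages.
mean_of_rates = sum(k / n for n, k in data) / len(data)

print(round(aggregated, 3))     # 35/310
print(round(mean_of_rates, 3))  # (0.5 + 0.1 + 0.1) / 3
```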

In ANOM, anyone outside the (correctly calculated) unique common-cause band is a probable special cause; these physicians are truly “above average” or “below average.” Note that: 1) physicians 48 and 49 could still be indicative of a prescribing process at 14.8 percent because of the number of prescriptions they wrote; and 2) there are five below-average performances found by the three standard deviation criterion (there is not even a lower two standard deviation line in the incorrect analysis).

The incorrect analysis and its inappropriate declaration of normality, coupled with the standard deviation criterion subjectively selected, could claim to statistically find one or 11 upper outliers, using two or one standard deviations, respectively. The ANOM shows eight probable above-average outliers with a lot more certainty.

So, what should we conclude from our correctly plotted graph? Only that these outlier physicians have a different process for prescribing this particular drug than their colleagues, 36 of whom exhibit average behavior. These physicians between the red lines are indistinguishable from each other and the system average of 14.8 percent.

Some physicians’ outlier variation might be appropriate because of the type of patient they treat (“people” input to their process), while for others it may be inappropriate or unintended due to their “methods” of prescribing—but they don’t know it. Maybe collegial discussion (also considering the outliers who are below average?) using this graph as a starting point would be more productive than what has become known as “public blaming and shaming.”

I get very positive responses when presenting this approach to frontline physician groups in grand rounds: This makes sense to their scientific intuition. In fact, many have told me, “If results were presented to us like this, we’d take care of it ourselves.”

This gives them back the sense of control they lose when presented with arbitrary, incorrect, judgmental analyses that inappropriately threaten their sense of competence.

Similar analysis with the term ‘funnel plot’

For some reason, academic journals have taken a fancy to ordering results with the X-axis sorted from lowest denominator to highest, then labeling the result a “funnel plot” due to its resulting appearance.

Here is the ANOM above, presented as a funnel plot. The same doctor labels from above are shown on the X-axis. My personal preference remains sorting the proportions from lowest to highest, as above.
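Turning an ANOM into a funnel plot is just a re-sort of the x-axis by denominator; the limits themselves are unchanged. A sketch with hypothetical physician rows (not the article’s data) shows why the band takes on a funnel shape—the ±3 standard deviation limits narrow as the denominator grows:

```python
import math

# Hypothetical rows: (physician, total prescriptions, target prescriptions).
rows = [("d1", 30, 6), ("d2", 217, 30), ("d3", 80, 10), ("d4", 120, 22)]
p_bar = sum(k for _, _, k in rows) / sum(n for _, n, _ in rows)

# Funnel plot ordering: sort by denominator, smallest first.
rows.sort(key=lambda r: r[1])

# Recompute the same p-chart limits in that order; the band narrows with n.
funnel = [(doc, n,
           p_bar + 3 * math.sqrt(p_bar * (1 - p_bar) / n),   # upper limit
           p_bar - 3 * math.sqrt(p_bar * (1 - p_bar) / n))   # lower limit
          for doc, n, _ in rows]

for doc, n, upper, lower in funnel:
    print(doc, n, round(upper, 3), round(lower, 3))
```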

For people obsessed with standards

Integrating ANOM with standards thinking
• If there were a standard of 15 percent, the first order of business would be to study the eight above-average outlier physicians and see whether it is appropriate to bring them into the average red zone with their 36 colleagues.
• If the standard were 10 percent, there needs to be fundamental change in all physician behavior, but the discussion could begin by talking with the five physicians who are truly below average... if their outcomes are better or at least no different.

Consider: In the case of an arbitrary 10 percent standard, is such a level even appropriate? With the current focus on costs, shouldn’t linking these discussions with observed patient outcomes also be part of the equation? I would call this a much better analysis for determining where the process should be—and for which patients would be thankful.


About The Author


Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.