## New Results Equal New Conversations

### A data-sane alternative for percentage performance comparisons

Published: Monday, July 17, 2017 - 11:03

Recently I demonstrated a common *incorrect* technique for comparing percentage rate performances, based, of course, on the usual normal distribution nonsense. Let’s revisit those data with a superior alternative.

To quickly review the scenario: In an effort to reduce unnecessarily expensive prescriptions, a pharmacy administrator developed a protocol to monitor and compare individual physicians’ tendencies to prescribe the most expensive drug within a class.

For each physician in a peer group of 51, data were obtained on the total number of prescriptions written and how many of them were for the target drug. During this time period, the 51 physicians wrote 4,032 prescriptions, of which 596 were for the target drug: an overall rate of 14.8 percent.

### The correct alternative: a p-chart analysis of means

The goal of analysis of means (ANOM) is to compare a group of physicians who should have similar practice types: a relatively homogeneous “system,” if you will. Each is compared to this *system*’s overall average. Variation is exposed, and a group conversation follows to discuss it and then reduce the *inappropriate* and *unintended* variation.

For each individual physician’s performance, one calculates the common-cause limits of what would be expected due to statistical variation from the system’s 14.8 percent target-drug prescription rate. Given the appropriate statistical theory for percentage data based on counts (i.e., binomial), a standard deviation must be calculated separately for each physician because each wrote a different total number of prescriptions, in this case ranging from 30 to 217.

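The calculation for this situation’s p-chart ANOM is as follows (note its dependence on the system average, $\bar{p} = 0.148$; $n_i$ is the total number of prescriptions written by physician $i$):

$$\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n_i}} = \sqrt{\frac{0.148 \times 0.852}{n_i}}$$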

The result of the square root is multiplied by three (for “three standard deviations”), then added to and subtracted from the overall system average to see whether an individual physician’s actual value falls within this range of expected variation, given an assumed rate of 14.8 percent (“innocent until proven guilty,” which in my experience is the best strategy for dealing with physicians).
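Here is a minimal Python sketch of the per-physician limits just described. It assumes the standard binomial p-chart form, $\bar{p} \pm 3\sqrt{\bar{p}(1-\bar{p})/n_i}$. The function name `anom_limits` is hypothetical. Clipping the limits to the 0 to 100 percent range is a common convention, not something stated in the text. The counts 596 and 4,032 and the denominator range of 30 to 217 come from the article.

```python
import math

def anom_limits(n_i, p_bar, z=3.0):
    """Return (lower, upper) common-cause limits for one physician.

    n_i   -- total prescriptions written by this physician
    p_bar -- system rate (summed numerators / summed denominators)
    z     -- number of standard deviations (3 for the usual ANOM)
    """
    # Binomial standard deviation of a proportion based on n_i prescriptions
    sigma_i = math.sqrt(p_bar * (1.0 - p_bar) / n_i)
    # Assumed convention: keep the limits inside the 0 to 100 percent range
    lower = max(0.0, p_bar - z * sigma_i)
    upper = min(1.0, p_bar + z * sigma_i)
    return lower, upper

# System rate from the article: 596 target-drug prescriptions out of 4,032
p_bar = 596 / 4032

# Limits at the smallest and largest denominators in the peer group
for n in (30, 217):
    low, high = anom_limits(n, p_bar)
    print(f"n = {n:3d}: {low:.1%} to {high:.1%}")
```

The physician with 30 prescriptions gets a band of roughly 0 to 34 percent. The one with 217 gets roughly 7.6 to 22 percent. This is why a single standard deviation cannot serve the whole group.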

The prior (incorrect) analysis is shown directly below, and the p-chart ANOM is below that.

Note that in the ANOM, the three-standard-deviation limits, which many of you would consider conservative, are comparable to approximately 1.5 standard deviations of the incorrect analysis. Why? *Because the standard deviation is calculated correctly*.

Another difference: The overall system value, obtained by summing the 51 physicians’ numerators and denominators, was 14.8 percent (596 / 4,032). This differs from the average of the 51 individual percentages (15.8 percent), because a simple average gives a physician who wrote 30 prescriptions the same weight as one who wrote 217.
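The following Python sketch uses made-up counts for three physicians. These are hypothetical, not the article's data. They show how one low-volume, high-rate physician pulls the unweighted mean of percentages above the pooled system rate.

```python
# Hypothetical counts for three physicians (NOT the article's data):
# (target-drug prescriptions, total prescriptions written)
counts = [(10, 30), (20, 200), (30, 250)]

def pooled_rate(counts):
    """System rate: summed numerators divided by summed denominators."""
    total_target = sum(target for target, _ in counts)
    total_written = sum(total for _, total in counts)
    return total_target / total_written

def mean_of_rates(counts):
    """Unweighted average of the individual rates (every physician weighted equally)."""
    return sum(target / total for target, total in counts) / len(counts)

print(f"pooled rate:   {pooled_rate(counts):.1%}")   # 60 / 480
print(f"mean of rates: {mean_of_rates(counts):.1%}")
```

Here the pooled rate is 12.5 percent. The unweighted mean is about 18.4 percent, inflated by the 33 percent rate of the physician who wrote only 30 prescriptions.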

In ANOM, anyone outside his or her own (correctly calculated) common-cause band is a probable special cause; these physicians are truly “above average” or “below average.” Note that: 1) physicians 48 and 49 could still reflect a prescribing process at 14.8 percent because of the number of prescriptions they wrote; and 2) the three-standard-deviation criterion finds five below-average performances (the incorrect analysis does not even have a lower two-standard-deviation line).
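The decision rule above can be sketched in Python as follows. The helper `classify` is hypothetical, and the example physicians are invented. They show how the same 30 percent rate can sit inside the band for a physician with 30 prescriptions yet fall outside it for one with 200.

```python
import math

def classify(target, total, p_bar, z=3.0):
    """Label one physician against the system rate p_bar using binomial limits."""
    rate = target / total
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / total)
    if rate > p_bar + z * sigma:
        return "above average"
    if rate < p_bar - z * sigma:
        return "below average"
    return "average"

p_bar = 596 / 4032  # system rate from the article

# Hypothetical physicians (NOT the article's data): the same 30 percent rate
# based on very different numbers of prescriptions
print(classify(9, 30, p_bar))    # average: 30 prescriptions give a wide band
print(classify(60, 200, p_bar))  # above average: 200 prescriptions give a narrow band
```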

With its inappropriate assumption of normality and a subjectively selected standard deviation criterion, the incorrect analysis could claim to statistically find either one or 11 upper outliers, using two or one standard deviations, respectively. The ANOM shows eight probable above-average outliers with far more certainty.
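For contrast, here is a Python sketch of the kind of calculation the incorrect analysis relies on. The individual percentages are treated as one normal sample, and anything above the mean plus k standard deviations is flagged. The function name and the six rates are hypothetical. The prescription counts never enter the calculation, and the number of "outliers" depends entirely on the chosen k.

```python
import statistics

def naive_upper_outliers(rates, k):
    """The *incorrect* approach, for contrast only: treat the individual rates as
    one normal sample and flag any rate above mean + k * SD. The number of
    prescriptions behind each rate is ignored entirely."""
    cutoff = statistics.mean(rates) + k * statistics.stdev(rates)
    return [r for r in rates if r > cutoff]

# Hypothetical rates (NOT the article's data)
rates = [0.10, 0.10, 0.10, 0.10, 0.20, 0.30]

for k in (1, 2):
    print(k, naive_upper_outliers(rates, k))
```

With these rates, k = 1 flags the 30 percent physician and k = 2 flags no one. Nothing in the calculation says which verdict to believe.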

So, what should we conclude from our correctly plotted graph? Only that these outlier physicians have a different process for prescribing this particular drug than their colleagues, 36 of whom exhibit average behavior. *The physicians between the red lines are indistinguishable from each other and from the system average of 14.8 percent.*

Some physicians’ outlier variation might be appropriate because of the type of patient they treat (“people” input to their process), while for others it may be *inappropriate* or *unintended* due to their “methods” of prescribing, but they don’t know it. Perhaps a collegial discussion that uses this graph as a starting point (and also considers the below-average outliers) would be more productive than what has become known as “public blaming and shaming.”

I get very positive responses when presenting this approach to frontline physician groups in grand rounds: This makes sense to their scientific intuition. In fact, many have told me, “If results were presented to us like this, *we’d take care of it ourselves.*”

This gives them back the sense of control they lose when presented with arbitrary, incorrect, judgmental analyses that *inappropriately* threaten their sense of competence.

**Similar analysis with the term ‘funnel plot’**

For some reason, academic journals have taken a fancy to ordering results with the X-axis sorted from lowest denominator to highest, then labeling the result a “funnel plot” due to its resulting appearance.

Here is the ANOM above, presented as a funnel plot. The same doctor labels from above are shown on the X-axis. My personal preference remains sorting the proportions from lowest to highest, as above.
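In a funnel plot, only the X-axis ordering changes; the limits are the same. The Python sketch below uses a hypothetical `funnel_points` helper and invented counts, with the system rate pooled from those counts. It sorts physicians from lowest to highest denominator. Plotting the lower and upper columns against the totals traces the funnel.

```python
import math

def funnel_points(counts, z=3.0):
    """Sort physicians by number of prescriptions written and attach limits.

    counts -- list of (label, target, total) tuples
    Returns (label, total, rate, lower, upper) rows in ascending order of total.
    """
    # System rate pooled from the same counts
    p_bar = sum(c[1] for c in counts) / sum(c[2] for c in counts)
    rows = []
    for label, target, total in sorted(counts, key=lambda c: c[2]):
        half_width = z * math.sqrt(p_bar * (1.0 - p_bar) / total)
        rows.append((label, total, target / total,
                     max(0.0, p_bar - half_width),
                     min(1.0, p_bar + half_width)))
    return rows

# Hypothetical physicians (NOT the article's data)
counts = [("A", 30, 150), ("B", 6, 30), ("C", 25, 217), ("D", 12, 90)]

for label, total, rate, low, high in funnel_points(counts):
    print(f"{label}: n = {total:3d}, rate = {rate:.1%}, limits {low:.1%} to {high:.1%}")
```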

### For people obsessed with standards

**Integrating ANOM with standards thinking**

• If there were a standard of 15 percent, the first order of business would be to study the eight above-average outlier physicians and see whether it is appropriate to bring them into the average red zone with their 36 colleagues.

• If the standard were 10 percent, there would need to be fundamental change in *all* physician behavior, but the discussion could begin with the five physicians who are truly *below* average... if their outcomes are better or at least no different.

**Consider:** In the case of an *arbitrary* 10 percent standard, is such a level even appropriate? With the current focus on costs, shouldn’t linking these discussions with observed patient outcomes also be part of the equation? I would call this a much better analysis for determining where the process should be, and one for which patients would be thankful.