



© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.
Published: 09/30/2014
My last article demonstrated a common incorrect technique—based in “traditional” statistics—for comparing performances based on percentage rates. This article will use the same data to show what should be done instead.
To quickly review the scenario: In an effort to reduce unnecessary expensive prescriptions, a pharmacy administrator developed a proposal to monitor and compare individual physicians’ tendencies to prescribe the most expensive drug within a class. Data were obtained for a peer group of 51 physicians including the total number of prescriptions written, and, of that number, how many were for the target drug.
During this time period, these 51 physicians had written 4,032 prescriptions, of which 596 were for the target drug, an overall rate of 14.8 percent. The goal of this analysis of means (ANOM) is to compare a group of physicians who have what should be similar practice types, a relatively homogeneous system, if you will. Each is compared to their system’s overall average. Variation is exposed, a conversation follows to discuss that variation, and the inappropriate and unintended variation is then reduced.
For each individual physician’s performance, one calculates the common-cause limits of what would be expected due to statistical variation from the system’s 14.8 percent target-drug prescription rate. Based on the appropriate statistical theory for percentage data based on counts (i.e., binomial), a standard deviation must be calculated separately for each physician because each wrote a different number of total prescriptions.
The calculation for the p-chart ANOM is as follows:
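Written out in standard binomial notation, the calculation takes the form sketched here. The symbols are labels assumed for illustration: $\bar{p}$ is the overall system rate (596/4,032, or about 0.148) and $n_i$ is the total number of prescriptions written by physician $i$.

```latex
\sigma_i = \sqrt{\frac{\bar{p}\,(1 - \bar{p})}{n_i}},
\qquad
\bar{p} = \frac{596}{4032} \approx 0.148
```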
Once again, note its similarity to the u-chart calculation, as well as the philosophy of its use. They differ only in this calculation of the common-cause band.
As in the u-chart, this result of the square root is then multiplied by three (for three standard deviations), then added to and subtracted from the overall mean to see whether the actual value for an individual physician is in the range of this expected variation, given an assumed rate of 14.8 percent.
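As a minimal sketch of that arithmetic, the following Python computes the three standard deviation band for a physician, using the overall totals given above. The function name `anom_limits`, the prescription totals in the loop, and the choice to truncate the band at 0 and 1 are assumptions made for illustration; they are not the article's actual data or software.

```python
import math

# Overall system rate from the article's totals: 596 of 4,032 prescriptions.
P_BAR = 596 / 4032


def anom_limits(n_prescriptions, p_bar=P_BAR, k=3.0):
    """Return (lower, upper) common-cause limits for one physician.

    n_prescriptions is that physician's total number of prescriptions.
    Truncating the band to the interval [0, 1] is an assumption of this sketch.
    """
    # Binomial standard deviation of a proportion based on n_prescriptions.
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n_prescriptions)
    lower = max(0.0, p_bar - k * sigma)
    upper = min(1.0, p_bar + k * sigma)
    return lower, upper


# Hypothetical prescription totals, chosen only to show how the band narrows
# as the number of prescriptions grows.
for n in (25, 100, 400):
    low, high = anom_limits(n)
    print(f"n={n:4d}: {low:.1%} to {high:.1%}")
```

The band is widest for the physician with the fewest prescriptions, which is why each physician needs his or her own limits.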
The statistical wild-ass guess (SWAG) graph is shown below and the p-chart ANOM is below that.
Note that what many of you would consider conservative three standard deviation limits, calculated correctly, are in this case comparable to approximately 1.5 standard deviation limits in the incorrect analysis. Another difference: The overall system value obtained from the aggregated summed numerators and denominators of the 51 physicians was 14.8 percent, which differs from merely taking the average of the 51 percentages (15.8 percent).
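The gap between the two averages comes from weighting. A small sketch, using three hypothetical physicians (these counts are invented for illustration and are not from the 51-physician data set), shows how the pooled rate and the simple mean of percentages can diverge when prescription totals differ:

```python
# Hypothetical (target_prescriptions, total_prescriptions) pairs.
physicians = [(10, 20), (5, 100), (15, 200)]


def pooled_rate(data):
    """Sum the numerators and denominators, then divide once."""
    total_target = sum(target for target, _ in data)
    total_written = sum(total for _, total in data)
    return total_target / total_written


def mean_of_rates(data):
    """Average the individual percentages, giving each physician equal weight."""
    return sum(target / total for target, total in data) / len(data)


print(f"pooled rate:   {pooled_rate(physicians):.1%}")
print(f"mean of rates: {mean_of_rates(physicians):.1%}")
```

The low-volume physician with a high percentage pulls the simple mean upward, while the pooled rate weights each physician by the number of prescriptions written.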
In ANOM, anyone outside his or her correctly calculated, unique, common-cause band is a probable special cause; these physicians are truly “above average” or “below average.” Notice that physicians 48 and 49 could still be indicative of a prescribing process at 14.8 percent because of the number of prescriptions written.
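The decision rule can be sketched as a small classifier. The function name `classify` and the example counts are hypothetical; only the overall rate of 596/4,032 comes from the data described above. The first two examples share the same 30 percent rate, yet only the physician with the larger prescription total falls outside the band.

```python
import math

# Overall system rate from the article's totals.
P_BAR = 596 / 4032


def classify(target, total, p_bar=P_BAR, k=3.0):
    """Label one physician relative to his or her own common-cause band."""
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / total)
    rate = target / total
    if rate > p_bar + k * sigma:
        return "above"
    if rate < p_bar - k * sigma:
        return "below"
    return "within"


# Hypothetical physicians: (target-drug prescriptions, total prescriptions).
examples = [(3, 10), (60, 200), (4, 150)]
for target, total in examples:
    print(f"{target}/{total}: {classify(target, total)}")
```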
Using the previous SWAG analysis, depending on the analyst’s mood and the standard deviation criterion subjectively selected, he could claim to statistically find one or 11 upper outliers, using two or one standard deviations, respectively. The ANOM shows eight probable above-average outliers, with a lot more certainty than a SWAG.
So, what should we conclude from our correctly plotted graph? Only that these outlier physicians have a different process for prescribing this particular drug than their colleagues, 36 of whom exhibit average behavior—these physicians between the red lines are indistinguishable from each other and the system average of 14.8 percent. For some physicians, this outlier variation might be appropriate because of the type of patient they treat, and for others it may be inappropriate or unintended due to their methods of prescription, but they don’t know it. Maybe collegial discussion (including the outliers that are below average) using this graph as a starting point would be more productive than what has become known as “public blaming and shaming.”
And then there are people obsessed with “standards.”
I cringe as I think of these people creating and enforcing standards via the wild-ass guess (WAG) approach: setting a standard above which they “feel” no one should be, and giving feedback to those physicians who exceed it. In this case, if the standard is 15 percent, 27 physicians would get such feedback, 19 of them inappropriately. If the enforcers decide to get tough and set a 10 percent goal, then 35 or 36 physicians would get feedback. There is no realization that the current system seems “perfectly designed” to perform at 14.8 percent.
If the standard was 15 percent, the first order of business would be to study the eight above-average outlier physicians and see whether it is appropriate to bring them into the average red zone with their 36 colleagues. If the standard was 10 percent, there needs to be fundamental change in all physician behavior, but the discussion could begin by dialoguing with the six physicians who are truly below average.
And look what is lost in all these discussions: linking all this with observed patient outcomes for the individual physicians. Might this put the focus where it should be rather than on cost? I would call this a much better analysis for determining where the process should be—which I think would beat a WAG in the patients’ eyes.
Links:
[1] http://www.qualitydigest.com/inside/quality-insider-column/statistical-stratification-sorts.html