I chatted about u-charts for rates last time, and this column was going to be about p-charts for percentage data. These are the two major charts for dealing with count data, and both are useful for stratifying a stable period of process performance.


But something recently happened that saddens me and has become all too common in many organizations for which I have consulted. It reminded me of the need to warn you about a very common approach to (allegedly) stratify data—to find the “bad” performers. I have a wonderful data set using percentages on which, next time, I will demonstrate the proper analysis and interpretation via p-charts; but I am going to use it today to make a major point about *something to be avoided at all costs*.

I have been mentoring a very good data analyst for the past three years. Despite the support of the medical director, it has been pretty much an all-out war with the C-suite executives to implement “data sanity”—resistance, to put it mildly, has been fierce from the start. I received the following note from this analyst last week:

“I’m sorry to report that it appears control charts [of key indicators] are nearly dead.... As of last week, they have been pulled off all but one report....

“In other news [the organization for which he works] has moved towards lean Six Sigma. The first Black Belt ‘course’ is being offered right now—and I have been ‘drafted’ to teach the statistics portion.... I have been working on my slides over the past couple of weeks, and I must say that I don't understand where any of this is going to come in handy for quality directors.... I spent half an hour trying to find information on calculating the confidence interval for the correlation coefficient by hand. It involves the inverse hyperbolic tangent function.... I'm sure everyone will get that one, right?!?

“It all seems a little ridiculous to me.”
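For the record, the calculation that consumed my friend's half hour is Fisher's z-transformation: z = arctanh(r) is approximately normal with standard error 1/√(n − 3), so you build the interval on the z scale and transform back. A minimal sketch (the function name and example values below are mine, not from any course):

```python
import math

def corr_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation coefficient
    via Fisher's z-transformation (the inverse hyperbolic tangent)."""
    if n <= 3:
        raise ValueError("need n > 3 observations")
    z = math.atanh(r)                # Fisher z = arctanh(r)
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    lo, hi = z - z_crit * se, z + z_crit * se
    # transform the endpoints back to the correlation scale
    return math.tanh(lo), math.tanh(hi)

lo, hi = corr_ci(0.60, 50)
print(f"95% CI for r = 0.60, n = 50: ({lo:.3f}, {hi:.3f})")
```

Whether any quality director will ever need to do this by hand is, of course, exactly the point of his lament.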

It is courses like these that lead to consequences and analyses such as the ones I am about to describe—techniques inappropriately used to convert a wild-ass guess (WAG) into a statistical wild-ass guess (SWAG).

**I can't make this stuff up**

Published rankings with feedback are very often used as a cost-cutting measure to identify and motivate “those bad workers.” Some are even derived, er... uh... “statistically?”

In an effort to reduce unnecessary expensive prescriptions, a pharmacy administrator developed a proposal to monitor and compare individual physicians’ tendencies to prescribe the most expensive drug within a class. Data were obtained for each of a peer group of 51 physicians—the total number of prescriptions written and, of that number, how many were for the target drug.

Someone was kind enough to send me this proposal, while begging me not to be identified as the source. I quote it verbatim as it applies to results from the 51 physicians:

1. “Data will be tested for the normal distribution”

2. “If distribution is normal—physicians whose prescribing deviates greater than one or two standard deviations from the mean are identified as outliers”

3. “If distribution is not normal—examine distribution of data and establish an arbitrary cutoff point above which physicians should receive feedback (this cutoff point is subjective and variable based on the distribution of ratio data)”

For my own amusement, I tested the data for normality and it “passed” (p-value of 0.277). Yes, I said “for my own amusement” because this test is moot and inappropriate for percentage data (the number of prescriptions in the denominator ranged from 30 to 217), but *the computer will do anything you want*.

The scary issue here is the proposed ensuing “analysis” resulting from whether the data are normal. If data are normally distributed, doesn’t that mean that there are no outliers? But suppose outliers are present—doesn’t this mean that they are atypical? In fact, wouldn’t their presence tend to inflate the traditional calculation of standard deviation? But wait, the data passed the normality test... it's all so confusing!

Yet that doesn't seem to stop our quality police from lowering the “gotcha” threshold to two or even one standard deviation to find outliers (in my experience, a *very* common practice).

Returning to the protocol, even scarier is what's proposed if the distribution isn't normal: Establish an *arbitrary* cutoff point—a WAG for what the administrator feels it *should* be.

I'll play his game: Because the data pass the normality test, the graph below shows the suggested analysis with one, two, and three standard deviation lines drawn in around the mean. (The standard deviation of the 51 numbers was 10.7.)
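Since I can't reproduce the actual data set here, the little simulation below (hypothetical numbers, mine alone) shows why the choice of threshold matters so much: 51 physicians with the *identical* true 20-percent prescribing tendency, differing only in sampling noise from their varying denominators, still yield a healthy crop of “outliers” at one standard deviation, and almost none at three.

```python
import random
import statistics

random.seed(0)  # reproducible hypothetical data

# Hypothetical stand-in for the 51 physicians: denominators range from
# 30 to 217 prescriptions (as in the column), and every physician has
# the SAME true 20-percent tendency to pick the target drug. Any
# "outliers" flagged below are therefore pure sampling noise.
rates = []
for _ in range(51):
    n = random.randint(30, 217)
    hits = sum(random.random() < 0.20 for _ in range(n))
    rates.append(100.0 * hits / n)

mean = statistics.fmean(rates)
sd = statistics.stdev(rates)

# Count how many physicians get "caught" at each arbitrary threshold
counts = {}
for k in (1, 2, 3):
    counts[k] = sum(abs(r - mean) > k * sd for r in rates)
    print(f"mean ± {k} SD: {counts[k]} physicians flagged")
```

Change the seed and a different set of physicians gets flagged, which is precisely the point: the “signal” is manufactured by the cutoff, not by the data.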

**Get out the Ouija boards!**

Depending on the analyst’s mood and the standard deviation criterion subjectively selected, he or she could claim to statistically find one—or 10—upper outliers. (What about lower outliers?) Even worse, he or she could have just as easily used the WAG approach, decided that 15 percent was what the standard “should” be, and given feedback to the 27 physicians above 15 percent. Or maybe a “tougher” standard of 10 percent could be set, in which case 35 physicians would receive feedback, consisting of a wealth of educational material. Then there is the tried-and-true, “Let’s go after the top quartile (or 10%... or 15%... or 20%).” When I present this to a roomful of doctors, there is raucous laughter and a collective pantomime of people throwing things into the garbage when I ask what they do with such “helpful” feedback.

What's not so funny is that this and similar SWAGs are fast becoming “simple... obvious... and wrong” techniques in the current pay-for-performance craze in healthcare. Who knows? Maybe some of these schemes will even involve the inverse hyperbolic tangent function, so my friend’s training will not have gone to waste.

As my poor friend said, “It all seems a little ridiculous to me.”


## Feedback

QD wouldn't print what I'm thinking about those "executives". Aside from all the statistical machinations, is there anywhere in this scenario where someone says "Let's ask the Doctors why they make the decisions they do?" or "Is there a relationship between the cost of the medicine and its efficacy?"

It reminds me of my friend's anecdote of the President who asked why the Process Average wasn't 100%. Please tell me your correspondent can find neurons firing somewhere in his place of work.
