Davis Balestracci

Quality Insider

Statistical Stratification... of Sorts

SWAGs remain alive and well

Published: Thursday, September 18, 2014 - 15:19

I chatted about u-charts for rates last time, and this column was going to be about p-charts for percentage data. These are the two major charts for dealing with count data and are helpful for stratifying a stable section of process performance.
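As a quick refresher, here is a minimal sketch of how p-chart limits are computed for percentage data. The counts are placeholders invented purely for illustration, not data from any organization mentioned here; the key point is that each physician’s (or unit’s) limits depend on its own denominator.

```python
# Minimal p-chart sketch with hypothetical counts (not the article's data).
import numpy as np

x = np.array([12, 30, 9, 25, 18])     # hypothetical: target-drug prescriptions per physician
n = np.array([80, 217, 30, 150, 95])  # hypothetical: total prescriptions per physician

p = x / n                    # each physician's observed proportion
p_bar = x.sum() / n.sum()    # overall proportion: the center line

# Limits are physician-specific because the denominators n_i differ
sigma = np.sqrt(p_bar * (1 - p_bar) / n)
ucl = p_bar + 3 * sigma
lcl = np.clip(p_bar - 3 * sigma, 0, None)

outside = (p > ucl) | (p < lcl)  # only these points signal special causes
print(np.round(p, 3))
print(np.round(ucl, 3))
print(outside)
```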

But something recently happened that saddens me and has become all too common in many organizations for which I have consulted. It reminded me of the need to warn you about a very common approach to (allegedly) stratifying data—to find the “bad” performers. I have a wonderful data set using percentages on which, next time, I will demonstrate the proper analysis and interpretation via p-charts; but I am going to use it today to make a major point about something to be avoided at all costs.

I have been mentoring a very good data analyst for the past three years. Despite the support of the medical director, it has been pretty much an all-out war with the C-suite executives to implement “data sanity”—resistance, to put it mildly, has been fierce from the start. I received the following note from this analyst last week:

“I’m sorry to report that it appears control charts [of key indicators] are nearly dead.... As of last week, they have been pulled off all but one report....

“In other news [the organization for which he works] has moved towards lean Six Sigma. The first Black Belt ‘course’ is being offered right now—and I have been ‘drafted’ to teach the statistics portion.... I have been working on my slides over the past couple of weeks, and I must say that I don’t understand where any of this is going to come in handy for quality directors.... I spent half an hour trying to find information on calculating the confidence interval for the correlation coefficient by hand. It involves the inverse hyperbolic tangent function.... I’m sure everyone will get that one, right?!?

“It all seems a little ridiculous to me.”
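For the record, the calculation in question is presumably Fisher’s z-transformation, which is where the inverse hyperbolic tangent comes in. A minimal sketch of the by-hand recipe (my illustration, not his slides):

```python
# Fisher's z-transformation: a 95% confidence interval for a correlation
# coefficient r computed from n pairs. Illustration only.
import math

def corr_ci(r, n, z_crit=1.96):
    """95% CI for a correlation coefficient via Fisher's z-transform."""
    z = math.atanh(r)             # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)     # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

print(corr_ci(0.6, 50))  # r = 0.6 from 50 pairs gives roughly (0.39, 0.75)
```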

It is courses like these that lead to consequences and analyses such as the ones I am about to describe—techniques inappropriately used to convert a wild-ass guess (WAG) into a statistical wild-ass guess (SWAG).

I can’t make this stuff up

Published rankings with feedback are very often used as a cost-cutting measure to identify and motivate “those bad workers.” Some are even derived, er... uh... “statistically?”

In an effort to reduce unnecessarily expensive prescriptions, a pharmacy administrator developed a proposal to monitor and compare individual physicians’ tendencies to prescribe the most expensive drug within a class. Data were obtained for each physician in a peer group of 51—the total number of prescriptions written and, of that number, how many were for the target drug.

Someone was kind enough to send me this proposal, while begging me not to be identified as the source. I quote it verbatim as it applies to results from the 51 physicians:

1. “Data will be tested for the normal distribution”

2. “If distribution is normal—physicians whose prescribing deviates greater than one or two standard deviations from the mean are identified as outliers”

3. “If distribution is not normal—examine distribution of data and establish an arbitrary cutoff point above which physicians should receive feedback (this cutoff point is subjective and variable based on the distribution of ratio data)”

For my own amusement, I tested the data for normality and it “passed” (p-value of 0.277). Yes, I said “for my own amusement” because this test is moot and inappropriate for percentage data (the number of prescriptions in the denominator ranged from 30 to 217), but the computer will do anything you want.
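The proposal doesn’t say which normality test was used; for illustration only, here is how such a test typically runs (Shapiro-Wilk is a common default). The 51 percentages are randomly generated placeholders, not the actual data, and as just noted, the exercise is moot for percentage data in the first place.

```python
# Illustrative normality test; the protocol doesn't name one, so
# Shapiro-Wilk is assumed. The 51 percentages are random placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pcts = rng.normal(loc=18, scale=10.7, size=51)  # hypothetical physician percentages

stat, p_value = stats.shapiro(pcts)
print(f"W = {stat:.3f}, p = {p_value:.3f}")  # normal placeholders will typically "pass" (p > 0.05)
```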

The scary issue here is the “analysis” proposed to follow, depending on whether the data are normal. If data are normally distributed, doesn’t that mean that there are no outliers? But suppose outliers are present—doesn’t this mean that they are atypical? In fact, wouldn’t their presence tend to inflate the traditional calculation of standard deviation? But wait, the data passed the normality test... it’s all so confusing!

Yet that doesn’t seem to stop our quality police from lowering the “gotcha” threshold to two or even one standard deviation to find outliers (in my experience, a very common practice).

Returning to the protocol, even scarier is what’s proposed if the distribution isn’t normal: Establish an arbitrary cutoff point—a WAG for what the administrator feels it should be.

I’ll play his game: Because the data pass the normality test, the graph below shows the suggested analysis with one, two, and three standard deviation lines drawn in around the mean. (The standard deviation of the 51 numbers was 10.7.)
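Here is a sketch of how such a graph is constructed; the percentages are placeholders, and only the standard deviation of 10.7 comes from the actual data.

```python
# Sketch of the "suggested analysis": percentages with 1, 2, and 3
# standard deviation lines around the mean. Placeholder data; only the
# SD of 10.7 is taken from the article.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
pcts = rng.normal(loc=18, scale=10.7, size=51)  # hypothetical percentages

mean, sd = pcts.mean(), pcts.std(ddof=1)

plt.plot(pcts, "o")
plt.axhline(mean, color="black", label="mean")
for k in (1, 2, 3):
    plt.axhline(mean + k * sd, linestyle="--")
    plt.axhline(mean - k * sd, linestyle="--")
plt.xlabel("physician")
plt.ylabel("% target drug prescribed")
plt.legend()
plt.show()
```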

Get out the Ouija boards!

Depending on the analyst’s mood and the standard deviation criterion subjectively selected, he or she could claim to statistically find one—or 10—upper outliers. (What about lower outliers?) Even worse, he or she could have just as easily used the WAG approach, decided that 15 percent was what the standard “should” be, and given feedback to the 27 physicians above 15 percent. Or maybe a “tougher” standard of 10 percent could be set, in which case 35 physicians would receive feedback, consisting of a wealth of educational material. Then there is the tried-and-true, “Let’s go after the top quartile (or 10%... or 15%... or 20%).” When I present this to a roomful of doctors, there is raucous laughter and a collective pantomime of people throwing things into the garbage when I ask what they do with such “helpful” feedback.
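To make the arbitrariness concrete, here is a quick tally of how many physicians each of those criteria would flag, again with placeholder data:

```python
# Count the physicians "flagged" under each proposed criterion.
# Placeholder data; the point is how the count swings with the threshold.
import numpy as np

rng = np.random.default_rng(0)
pcts = rng.normal(loc=18, scale=10.7, size=51)  # hypothetical percentages
mean, sd = pcts.mean(), pcts.std(ddof=1)

criteria = {
    "> mean + 1 SD": pcts > mean + 1 * sd,
    "> mean + 2 SD": pcts > mean + 2 * sd,
    "> mean + 3 SD": pcts > mean + 3 * sd,
    "> 15% (WAG)": pcts > 15,
    "> 10% (WAG)": pcts > 10,
    "top quartile": pcts > np.percentile(pcts, 75),
}
for name, mask in criteria.items():
    print(f"{name:>14}: {mask.sum():2d} physicians flagged")
```

Same 51 numbers, six “standards,” six different lists of culprits.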

What’s not so funny is that this and similar SWAGs are fast becoming “simple... obvious... and wrong” techniques in the current pay-for-performance craze in healthcare. Who knows? Maybe some of these schemes will even involve the inverse hyperbolic tangent function, so my friend’s training will not have gone to waste.

As my poor friend said, “It all seems a little ridiculous to me.”


About The Author


Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.

Comments

Feedback

QD wouldn't print what I'm thinking about those "executives". Aside from all the statistical machinations, is there anywhere in this scenario where someone says "Let's ask the Doctors why they make the decisions they do?" or "Is there a relationship between the cost of the medicine and its efficacy?"

It reminds me of my friend's anecdote of the President who asked why the Process Average wasn't 100%. Please tell me your correspondent can find neurons firing somewhere in his place of work.