Davis Balestracci

Quality Insider

An Elegantly Simple but Counterintuitive Approach to Analysis

Without common theory, there will be variation in how people perceive and act on variation

Published: Monday, March 5, 2012 - 12:39

For those of us practicing improvement in a medical culture, presenting this “funny new statistical way” of doing things to a physician audience triggers a predictable objection: “This isn’t in line with rigorous, double-blind clinical trial research.” And your response should be, “True! Nor could it be, nor should it be.”

The statistical methods of clinical trial research make assumptions and control variation in ways that can’t be replicated in the unstable environment of the real world, which makes them ill-suited for improvement work. This is true of any work environment, not just health care.

Most basic academic statistics requirements are framed in a context of “estimation” and teach methods appropriate for research. Unfortunately, these have limited applicability in everyday work, which is based on process-oriented thinking (a concept foreign to most academics) and whose need is “prediction.” This affects data collection, the choice of statistical tools, and the validity of analyses.

Ultimately, a disarmingly and elegantly simple analysis will yield far more profound and productive questions than a typical, overly complicated (alleged) statistical analysis.

A medical scenario (not real data)

Suppose you have been getting an increasing number of vague anecdotes that it “feels” like the cardiac surgery mortality rate has been increasing of late. Not only that, but the organization is not making progress toward attaining a published national benchmark mortality rate of 3.5 percent. Three hospitals in your system do this type of surgery, and a tabular summary of their performance data for the last 30 months is shown below:

[Table: summary of the three hospitals’ monthly cardiac surgery mortality rates for the last 30 months]
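The original table isn’t reproduced here, but here is a minimal sketch of the kind of monthly summary the LSG would start from. The hospital names and every number below are hypothetical stand-ins, not the article’s data.

```python
# Hypothetical illustration of the kind of summary table described above;
# the hospital names and numbers are invented, not the article's data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 30 monthly cardiac-surgery mortality rates (%) for three hospitals
data = pd.DataFrame({
    "Hospital A": rng.normal(5.2, 1.0, 30),
    "Hospital B": rng.normal(5.0, 1.1, 30),
    "Hospital C": rng.normal(5.1, 0.9, 30),
}, index=pd.RangeIndex(1, 31, name="Month"))

# Descriptive summary of the last 30 months, one row per hospital
summary = data.agg(["count", "mean", "median", "std", "min", "max"]).T.round(2)
print(summary)
```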

You ask your eager local statistical “guru” (LSG) to analyze the data, and you receive a report that states (a code sketch of the report’s main statistical steps appears after it):

1. “Pictures are very important. A comparative histogram was done to compare the distributions of the mortality rates. At a first glance, there seem to be no differences.” (figure 1)

Figure 1: Histogram comparison of cardiac mortality performance

2. “The three data sets were then statistically tested for the assumption of normality. The resulting analysis showed that we can assume each to be normally distributed (p-values of 0.502, 0.372, and 0.234, respectively, all of which are > 0.05); however, we have to be cautious. Just because the data pass the test for normality does not necessarily mean that the data are normally distributed; only that, under the null hypothesis, the data cannot be proven to be non-normal.”

3. “Since the data can be assumed to be normally distributed, I proceeded with the analysis of variance (ANOVA) and generated the 95-percent confidence intervals” (figure 2):

Figure 2: One-way analysis of variance (ANOVA)

4. “The p-value of 0.850 is greater than 0.05. Therefore, we can reasonably conclude that there are no statistically significant differences among these hospitals’ cardiac mortality rates as further confirmed by the overlapping 95-percent confidence intervals.”

5. “Regarding comparison to the national benchmark of 3.5 percent, none of the hospitals are close to meeting it. There will need to be a systemwide intervention at all three hospitals. I recommend that we benchmark an established hospital and copy their best practices systemwide.”
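To make the report’s steps concrete, here is a minimal sketch of that sequence in Python with SciPy and Matplotlib: a comparative histogram, a normality check (the report doesn’t name the specific test, so Shapiro-Wilk is assumed), and a one-way ANOVA with 95-percent confidence intervals. The data are hypothetical stand-ins for the table above.

```python
# Minimal sketch of the LSG's analysis: comparative histogram, normality
# check, and one-way ANOVA with 95% confidence intervals.
# All numbers below are hypothetical stand-ins, not the article's data.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
hospitals = {
    "Hospital A": rng.normal(5.2, 1.0, 30),   # 30 monthly mortality rates (%)
    "Hospital B": rng.normal(5.0, 1.1, 30),
    "Hospital C": rng.normal(5.1, 0.9, 30),
}

# 1. Comparative histogram (the "figure 1" step)
fig, axes = plt.subplots(len(hospitals), 1, sharex=True, figsize=(6, 7))
for ax, (name, x) in zip(axes, hospitals.items()):
    ax.hist(x, bins=8, edgecolor="black")
    ax.set_ylabel(name)
axes[-1].set_xlabel("Monthly cardiac mortality rate (%)")
fig.suptitle("Comparative histogram of mortality rates")
fig.tight_layout()

# 2. Normality check (Shapiro-Wilk assumed; p > 0.05 only means
#    normality cannot be rejected, not that the data are normal)
for name, x in hospitals.items():
    w, p = stats.shapiro(x)
    print(f"{name}: W = {w:.3f}, p = {p:.3f}")

# 3. One-way ANOVA across the three hospitals (the "figure 2" step)
f_stat, p_value = stats.f_oneway(*hospitals.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

# 95% confidence interval for each hospital's mean mortality rate
for name, x in hospitals.items():
    half_width = stats.t.ppf(0.975, df=len(x) - 1) * stats.sem(x)
    print(f"{name}: mean = {x.mean():.2f}%, "
          f"95% CI = ({x.mean() - half_width:.2f}, {x.mean() + half_width:.2f})")

plt.show()
```

The point of what follows is not that these calculations are computed incorrectly; it is that the entire approach ignores the time order in which the data occurred.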

Has every piece of potential jargon been used? Mean, median, standard deviation, normal distribution, histogram, p-value, analysis of variance (ANOVA), 95-percent confidence interval, null hypothesis, statistical significance, F-test, degrees of freedom, and benchmark.

Do you realize that this LSG’s analysis is totally worthless?

Three routine questions

Here are three questions that should become a part of every improvement professional’s vocabulary whenever faced with a set of data for the first time:
1. How were these data defined and collected, and were they collected specifically for the current purpose?
2. Were the processes that produced these data stable?
3. After considering No. 1 and No. 2, were any analyses appropriate?

In the context of these mortality data

• How were these data collected?
The table was a descriptive statistical summary of the previous 30 months of cardiac mortality rates for the three hospitals. All three hospitals subscribed to and fed the same computerized data collection process, so at least the definitions are consistent.

• Were the systems that produced these data stable?
This might be a new question for you. There are two key concepts to any robust improvement process:
1. Everything is a process.
2. All processes occur over time.

Hence, all data have an implicit “time order” element that allows a necessary assessment of the stability of the process or system producing the data.

As an initial analysis, it is always a good idea to plot any data in their naturally occurring time order to formally assess the stability of the process. This was not done for this set of data. If the process is not stable, as you will see, many common statistical techniques are rendered invalid, which puts one at risk of taking inappropriate actions.

• Were the analyses appropriate, given the way the data were collected and the stability state of the systems?
“But the data passed the Normal distribution test. Isn’t that all you need to know to proceed with the standard statistical analysis?” you ask. Early in my career, I believed this.

And your LSG also concluded that there were no statistically significant differences amongst the hospitals’ mortality rates.

No difference?

Here are the three simple time plots for the individual hospitals. The individual median of each hospital’s 30 data points has been added as a reference line, making them run charts (figure 3):

Figure 3: Run charts of hospital cardiac mortality rates
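For completeness, here is a minimal sketch of how such run charts can be produced, again with hypothetical numbers standing in for the hospitals’ data: each hospital’s 30 monthly rates are plotted in their naturally occurring time order, with that hospital’s median drawn as a reference line.

```python
# Minimal run-chart sketch: each hospital's 30 monthly mortality rates
# plotted in time order with the hospital's median as a reference line.
# The data below are hypothetical, not the article's actual rates.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
months = np.arange(1, 31)
hospitals = {
    "Hospital A": rng.normal(5.2, 1.0, 30),
    "Hospital B": rng.normal(5.0, 1.1, 30),
    "Hospital C": rng.normal(5.1, 0.9, 30),
}

fig, axes = plt.subplots(len(hospitals), 1, sharex=True, figsize=(7, 8))
for ax, (name, x) in zip(axes, hospitals.items()):
    ax.plot(months, x, marker="o")
    ax.axhline(np.median(x), linestyle="--",
               label=f"median = {np.median(x):.2f}%")
    ax.set_ylabel(f"{name}\nmortality (%)")
    ax.legend(loc="upper right")
axes[-1].set_xlabel("Month (1–30)")
fig.suptitle("Run charts of monthly cardiac mortality rates")
fig.tight_layout()
plt.show()
```

The only “statistics” involved are plotting the dots in time order and drawing a median, yet this is the view that exposes whether each process is stable.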

No difference!

Note that just by “plotting the dots,” you have far more insight.

Won’t this allow you to ask more incisive questions, whose answers will lead to more productive system improvements?

Compare this to outputs typically encountered, such as bar graphs, pages of summary tables, and the “sophisticated” statistical analyses full of jargon. From your experience, what questions do people ask from those? Are they generally even helpful? Does anything change as a result?

Health care workers are very smart people. Unfortunately, they will, with the best of intentions, come up with theories and actions that could unwittingly harm a system. Or worse yet, they might do nothing because “there are no statistical differences” among the systems. Or they might decide, “We need more data.” Without common theory, there will be variation in how a roomful of people perceive and want to act on variation.

In my next column, I will share a potential new conversation that once again sheds light on my favorite answer to “What should we do?”: “It depends.”

About The Author

Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.