An Elegantly Simple but Counterintuitive Approach to Analysis
Without common theory, there will be variation in how people perceive and act on variation

Davis Balestracci
Published: Monday, March 5, 2012

For those of us practicing improvement in a medical culture, presenting this "funny new statistical way" of doing things to a physician audience triggers a predictable objection: "This isn't in line with rigorous, double-blind clinical trial research." And your response should be, "True! Nor could it be, nor should it be." Clinical trial statistical methods make assumptions and control variation in ways that can't be replicated in the unstable environment of the real world, making them less suitable for improvement. The same is true of any work environment.

Most basic academic statistics requirements are set in a context of "estimation" and teach methods appropriate for research. These, unfortunately, have limited applicability in everyday work, which is based on process-oriented thinking (a concept foreign to most academics) and whose need is "prediction." This affects data collection, the use of statistical tools, and the validity of analyses. Ultimately, a disarmingly and elegantly simple analysis will yield far more profound and productive questions than a typical, overly complicated (alleged) statistical analysis.

A medical scenario (not real data)

Suppose you have been getting an increasing number of vague anecdotes that it "feels" as though the cardiac surgery mortality rate has been increasing of late. Not only that, but the organization is not making progress toward attaining a published national benchmark of a 3.5-percent mortality rate. Three hospitals in your system perform this type of surgery, and the tabular summary of their performance data for the last 30 months is shown below.

You ask your eager local statistical "guru" (LSG) to analyze the data, and you receive a report that states:

1. "Pictures are very important. A comparative histogram was done to compare the distributions of the mortality rates. At first glance, there seem to be no differences." (figure 1)

Figure 1: Histogram comparison of cardiac mortality performance

2. "The three data sets were then statistically tested for the assumption of normality. The resulting analysis showed that we can assume each to be normally distributed (p-values of 0.502, 0.372, and 0.234, respectively, all of which are > 0.05); however, we have to be cautious. Just because the data pass the test for normality does not necessarily mean that the data are normally distributed; only that, under the null hypothesis, the data cannot be proven to be non-normal."

3. "Since the data can be assumed to be normally distributed, I proceeded with the analysis of variance (ANOVA) and generated the 95-percent confidence intervals" (figure 2):

Figure 2: One-way analysis of variance (ANOVA)

4. "The p-value of 0.850 is greater than 0.05. Therefore, we can reasonably conclude that there are no statistically significant differences among these hospitals' cardiac mortality rates, as further confirmed by the overlapping 95-percent confidence intervals."

5. "Regarding comparison to the national benchmark of 3.5 percent, none of the hospitals is close to meeting it. There will need to be a systemwide intervention at all three hospitals. I recommend that we benchmark an established hospital and copy its best practices systemwide."

Has all the potential jargon been used? The list includes mean, median, standard deviation, normal distribution, histogram, p-value, analysis of variance (ANOVA), 95-percent confidence interval, null hypothesis, statistical significance, F-test, degrees of freedom, and benchmark.
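For readers who want to see what such a report boils down to in practice, here is a minimal sketch in Python (using scipy) of the normality-test, ANOVA, and confidence-interval steps the LSG describes. The monthly rates and hospital labels are simulated placeholders, since the article's table is not reproduced here, so the printed numbers will not match the report's p-values.

```python
# A sketch of the LSG's procedure: normality tests, one-way ANOVA, and 95%
# confidence intervals. The rates below are simulated placeholders, not the
# article's table, so the printed p-values will not match the report's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
hospitals = {
    "Hospital A": rng.normal(5.0, 1.5, 30),
    "Hospital B": rng.normal(5.2, 1.4, 30),
    "Hospital C": rng.normal(4.9, 1.6, 30),
}  # 30 monthly mortality rates (%) per hospital, hypothetical

# Report item 2: test each hospital's 30 monthly rates for normality.
for name, rates in hospitals.items():
    print(f"{name}: Shapiro-Wilk p = {stats.shapiro(rates).pvalue:.3f}")

# Report items 3 and 4: one-way ANOVA across the three hospitals ...
f_stat, p_anova = stats.f_oneway(*hospitals.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.3f}")

# ... and a 95% confidence interval for each hospital's mean rate.
for name, rates in hospitals.items():
    low, high = stats.t.interval(0.95, df=len(rates) - 1,
                                 loc=rates.mean(), scale=stats.sem(rates))
    print(f"{name}: mean = {rates.mean():.2f}%, 95% CI = ({low:.2f}, {high:.2f})")
```

Notice that nothing in this sequence ever looks at the data in time order; that omission is the point of what follows.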
Do you realize that this LSG's analysis is totally worthless?
Three routine questions

Here are three questions that should become a part of every improvement professional's vocabulary whenever faced with a set of data for the first time:
1. How were these data defined and collected, and were they collected specifically for the current purpose?
2. Were the processes that produced these data stable?
3. After considering No. 1 and No. 2, were any analyses appropriate?

In the context of these mortality data
The table was a descriptive statistical summary of the 30 previous months of cardiac mortality rates for the three hospitals. These hospitals all subscribed to and fed into the same computerized data collection process, so at least the definitions are consistent.
Were the processes that produced these data stable? This might be a new question for you. There are two key concepts to any robust improvement process:
1. Everything is a process.
2. All processes occur over time.

Hence, all data have an implicit "time order" element that allows a necessary assessment of the stability of the process or system producing the data. It is always a good idea, as an initial analysis, to plot any data in their naturally occurring time order to assess the process stability formally. This was not done for this set of data, and without it, as you will see, many common statistical techniques can be rendered invalid, which puts one at risk of taking inappropriate actions.
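To make the point concrete, here is a small illustration with hypothetical numbers (not the article's data): the same 30 values, presented in two different time orders, give identical means, standard deviations, and normality p-values, even though one ordering climbs steadily.

```python
# The same 30 values in two different time orders: every summary statistic the
# LSG used is identical for both, because none of them "knows" about time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
values = rng.normal(5.0, 1.0, 30)   # hypothetical monthly mortality rates (%)

shuffled = rng.permutation(values)  # one possible month-to-month ordering
trending = np.sort(values)          # the same values, climbing steadily

for label, series in [("shuffled order", shuffled), ("trending order", trending)]:
    print(f"{label}: mean = {series.mean():.2f}, sd = {series.std(ddof=1):.2f}, "
          f"Shapiro-Wilk p = {stats.shapiro(series).pvalue:.3f}")
# Identical numbers either way; only a plot in time order shows the difference.
```

Every one of those summary statistics throws away the time order; only a plot of the data in the order they occurred can show whether the process producing them is stable.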
"But the data passed the Normal distribution test. Isn't that all you need to know to proceed with the standard statistical analysis?" you ask. Early in my career, I believed this.

No difference?

Your LSG also concluded that there were no statistically significant differences among the hospitals' mortality rates. Here are the three simple time plots for the individual hospitals. The individual median of each hospital's 30 data points has been added as a reference line, making them run charts (figure 3):

Figure 3: Run charts of hospital cardiac mortality rates
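As a companion to figure 3, here is a minimal sketch of how one such run chart might be drawn in Python with matplotlib. The 30 monthly rates are hypothetical stand-ins for one hospital's column of the table; that hospital's own median serves as the reference line.

```python
# A run chart for one hospital: monthly rates plotted in time order with the
# hospital's own median as the reference line. The 30 rates are hypothetical
# stand-ins for one column of the article's table.
import numpy as np
import matplotlib.pyplot as plt

rates = [4.8, 5.1, 3.9, 5.6, 4.4, 5.0, 6.1, 4.7, 5.3, 4.2,
         5.8, 4.9, 5.2, 4.6, 5.5, 4.1, 5.7, 5.0, 4.3, 5.4,
         6.0, 4.5, 5.9, 5.1, 4.8, 5.3, 6.2, 5.6, 5.8, 6.3]  # percent, months 1-30
months = range(1, len(rates) + 1)
median = np.median(rates)

plt.plot(months, rates, marker="o")          # "plot the dots" in time order
plt.axhline(median, linestyle="--", label=f"Median = {median:.2f}%")
plt.xlabel("Month")
plt.ylabel("Cardiac mortality rate (%)")
plt.title("Run chart: one hospital's monthly cardiac mortality rate")
plt.legend()
plt.show()
```

Repeating this for each hospital reproduces the layout of figure 3, and standard run chart rules (for example, unusually long runs on one side of the median) can then be applied to judge the stability of each process.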
No difference!

Note that just by "plotting the dots," you have far more insight. Won't this result in the ability to ask more incisive questions, whose answers will lead to more productive system improvements? Compare this to the outputs typically encountered, such as bar graphs, pages of summary tables, and "sophisticated" statistical analyses full of jargon. From your experience, what questions do people ask from those? Are they generally even helpful? Does anything change as a result?

Health care workers are very smart people. Unfortunately, they will, with the best of intentions, come up with theories and actions that could unwittingly harm a system. Or, worse yet, they might do nothing because "there are no statistical differences" among the systems. Or they might decide, "We need more data."

Without common theory, there will be variation in how a roomful of people perceive and want to act on variation. A potential new conversation will be shared in my next column, one that will once again shed light on my favorite answer to "What should we do?": "It depends."
About The Author
Davis Balestracci is a past chair of ASQ's statistics division. He has synthesized W. Edwards Deming's philosophy as Deming intended, as an approach to leadership, in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management.

It also integrates Balestracci's 20 years of studying organizational psychology into an "improvement as built in" approach, as opposed to most current "quality as bolt-on" programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.