



© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.
Published: 09/03/2019
As statistical methods become more embedded in everyday organizational quality improvement efforts, I find that a key concept is often woefully misunderstood, if it is even taught at all. W. Edwards Deming distinguished between two types of statistical study, which he called “enumerative” and “analytic.”
The key need in quality improvement is that statistics should relate to reality, which then lays the foundation for a theory of using statistics (analytic). Whether you realize it or not, virtually all college courses and many belt courses are taught from a population-based (enumerative) perspective, whose purpose is estimation.
In a real-world environment, this becomes questionable at best because everyday processes are usually not static populations. Deming was emphatic that the purpose of statistics in improvement is prediction; the question becomes, “What other knowledge beyond probability theory is needed to form a basis for action in the real world?”
Think of population-based statistics as studying a static pond, and a designed study going even further to create a custom-made pond like a swimming pool—a sanitized version of a pond, much easier to study and sample because of reduction of “nuisance” (i.e., everyday) variation.
Beyond design of the actual study circumstances, the statistical data processes now come into play: 1) measurement definition, 2) appropriate data collection (which includes the process of choosing the sample), so that 3) any statistical analysis is appropriate, and 4) correct interpretation of the analysis results.
In a research study, the variation of each of these processes should be (and usually is) tightly controlled. This makes the application of enumerative methods valid... for the specific sample of experimental units chosen for this specific study.
Inevitably, as results from a study are applied, the controlled setting of the study gives way to the uncontrolled setting of everyday work.
What was easy in a “swimming pool” environment now becomes much more complicated; the real world is more like whitewater rapids. Not only that, but uncontrolled variation manifests in the four statistical data processes as well.
“Random sample” has an entirely different meaning in a minimally controlled, semi-chaotic environment: a truly random sample is not possible. A good example is an everyday medical environment with patients flowing in and out. Except in rare cases, you cannot take repeated samples from the exact same population.
Analytic statistics are very concerned with where and how one should sample.
For example, we may take a group of patients who attend a particular clinic and suffer from arthritis. But the resulting sample is not necessarily a random sample of the patients who will be treated in the future at that same clinic. Still less is it a random sample of the patients who will be treated in any other clinic.
In fact, the patients who will be treated in the future will depend on choices that others have not yet made. And those choices will depend on the results of any study we are doing, and on studies by other people that may be carried out in the future. R. A. Fisher called it “a hypothetical infinite population”: one that does not yet exist and never will, i.e., an imaginary one.
There is an additional issue: how the effect of variation in one particular environment on a theoretical result compares with what could happen in yet another environment. The same is true for any benchmarking.
The late David Kerridge, one of the world’s leading Deming thinkers, wrote:
“Suppose that we compare two antibiotics in the treatment of some infection. We conclude that one did better in our tests. How does that help us?
“Suppose that all our testing was done in one hospital in New York in [2013]. But we may want to use the antibiotic in Africa in [2016]. It is quite possible that the best antibiotic in New York is not the same as the best in a refugee camp in Zaire. In New York the strains of bacteria may be different, and the problems of transport and storage really are different. If the antibiotic is freshly made and stored in efficient refrigerators, it may be excellent. It may not work at all if transported to a camp with poor storage facilities.
“And even if the same antibiotic works in both places, how long will it go on working? This will depend on how carefully it is used, and how quickly resistant strains of bacteria build up.
“This may seem an extreme case, and it is. But in every application of statistics, we have to decide how far we can trust results obtained at one time and under one set of circumstances as a guide to what will happen at some other time under new circumstances.”
Statistical theory, as it is stated in most textbooks (enumerative), simply analyzes what would happen if we took repeated, strictly random samples from the same population under circumstances in which nothing changes with time. Enumerative analyses or studies either naively assume no possible influence of outside variation or have the luxury of tightly controlling it as part of a study’s design.
Unfortunately, the potential influence of outside variation usually continues to be ignored even after the study. Yet when it is time to actually apply the result, the purpose of analytic statistics is to anticipate and formally study the manifestations of such outside, uncontrolled variation. Gathering this additional information inherently improves the situation: when you understand, rather than ignore, the sources of uncertainty, you understand how to reduce it.
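The contrast between a static population and a changing process can be made concrete with a small simulation. The following is a minimal sketch, not a method from this article: the function names, the normal model, and every number (a study mean of 50, a standard deviation of 5, a later shift of 3 units) are assumptions invented for illustration. It counts how often a study's 95 percent confidence interval contains the mean of the process to which the result is later applied.

```python
import random
import statistics


def ci_95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.fmean(sample)
    half_width = 1.96 * statistics.stdev(sample) / len(sample) ** 0.5
    return mean - half_width, mean + half_width


def coverage(future_mean, study_mean=50.0, sd=5.0, n=30, trials=2000, seed=1):
    """Fraction of study intervals that contain `future_mean`, the mean of the
    process the result is later applied to. All defaults are hypothetical."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each trial is one "study": a random sample from the study-time process.
        sample = [rng.gauss(study_mean, sd) for _ in range(n)]
        low, high = ci_95(sample)
        if low <= future_mean <= high:
            hits += 1
    return hits / trials


# Enumerative assumption: nothing changes between study and application.
static_coverage = coverage(future_mean=50.0)
# Hypothetical shift in conditions after the study (new place, new time).
shifted_coverage = coverage(future_mean=53.0)

print(f"static: {static_coverage:.2f}, shifted: {shifted_coverage:.2f}")
```

Under these assumptions the interval performs roughly as advertised only when nothing changes. Once conditions shift, the same interval remains a correct statement about the study-time process and a poor prediction for the new one.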
In the case of medicine, analytic statistics are totally different from the clinical trial mindset in which most physicians have been trained, and in which “tight control” is an understatement. Hospital-acquired infections illustrate this stark contrast. Let’s say that a well-designed enumerative study has produced a statistically significant result for an intervention said to eliminate them, i.e., apply the result, and you shouldn’t have them. With enumerative thinking, the post-application tendency would then be to treat any occurrence of an infection (undesirable variation) as a special cause. This is why people are drowning in root cause analyses. Such an analysis would be helpful in the case of an outbreak.
But in terms of everyday work, one usually has to take the view that the environment might be “perfectly designed” to have such infections, in which case a common-cause strategy would be warranted. It is only by “plotting the dots”—the basis of analytic statistics—that you will be able to distinguish between the two.
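One common way to “plot the dots” is an individuals (XmR) chart. This article does not prescribe a specific chart, so that choice is an assumption here. The sketch below computes natural process limits as the mean plus or minus 2.66 times the average moving range, the conventional constant for that chart. The monthly infection counts are invented for illustration, and clipping the lower limit at zero is a choice made because counts cannot be negative.

```python
import statistics


def xmr_limits(values):
    """Natural process limits for an individuals (XmR) chart.

    Limits are the mean +/- 2.66 times the average moving range of
    consecutive points. The lower limit is clipped at zero because the
    illustrative data are counts.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = statistics.fmean(values)
    spread = 2.66 * statistics.fmean(moving_ranges)
    return max(0.0, center - spread), center, center + spread


def special_cause_points(values):
    """Indices of points that fall outside the natural process limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly infection counts, invented for illustration only.
monthly_infections = [4, 6, 3, 5, 7, 4, 5, 6, 3, 5, 4, 6, 16, 5, 4, 6]

lower, center, upper = xmr_limits(monthly_infections)
signals = special_cause_points(monthly_infections)

print(f"limits: {lower:.1f} to {upper:.1f}, center {center:.1f}")
print(f"months signalling special cause (0-based index): {signals}")
```

In this made-up series only the month with 16 infections falls outside the limits and would justify a special-cause investigation. The months with 3 to 7 infections are the routine output of the system, so changing them calls for a common-cause strategy rather than a root cause analysis of each event.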
And even if one is careful enough to take this view and use the currently in-vogue technique of “rapid-cycle PDSA” (plan-do-study-act), everyday variation rears its ugly head again.
Besides the subsequent study of the variation of the actual application process, do people even consider studying the variation in each of the four data processes? If not, there is a very real danger: People could act on variation that is due solely to any or all of the four data processes. When these processes are ad hoc and not formally designed, their variation can overshadow and cloud the study of any variation, beneficial or otherwise, caused by the actual process being tested.
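A rough sketch of how data-process variation can swamp the process being tested follows. It assumes, purely for illustration, that the data processes add independent noise to the true values, so that observed variance is approximately process variance plus data-process variance. The standard deviations (2 units for the process, 0.5 and 4 units for the data processes) and all names are hypothetical.

```python
import random
import statistics

rng = random.Random(7)


def observe(true_values, data_process_sd):
    """Add hypothetical data-process noise (definition, collection, recording)
    to the true values of the process being tested."""
    return [v + rng.gauss(0.0, data_process_sd) for v in true_values]


def noise_share(observed, true_values):
    """Share of the observed variance that does not come from the process itself."""
    return 1.0 - statistics.variance(true_values) / statistics.variance(observed)


# Hypothetical process under test: mean 100, standard deviation 2.
true_values = [rng.gauss(100.0, 2.0) for _ in range(5000)]

tight = observe(true_values, data_process_sd=0.5)   # formally designed data processes
ad_hoc = observe(true_values, data_process_sd=4.0)  # ad hoc data processes

tight_share = noise_share(tight, true_values)
ad_hoc_share = noise_share(ad_hoc, true_values)

print(f"share of variance from data processes: "
      f"designed {tight_share:.0%}, ad hoc {ad_hoc_share:.0%}")
```

With these invented numbers, most of what an observer sees in the ad hoc case is variation from the data processes rather than from the process under study, which is the danger described above.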
In many applications I’ve observed, I barely see even cursory consideration of the environmental (cultural) effects of variation on the four data processes. The result? Vague plans leading to vague data and vague results. Wouldn’t it be easier to test the study result if the data processes had their variation minimized as part of the plan so as not to cloud the interpretation of the tested process application? This is a most nontrivial process.
“If I had to reduce my message to management to just a few words, I’d say it all has to do with reducing variation.”
—W. Edwards Deming