Predictable
Why has it taken so long to understand that processes need analytic methods, not enumerative ones?
Published: Wednesday, June 13, 2018  11:03
Quality is related to processes. A process is “a series of actions or steps taken in order to achieve a particular end.” It doesn’t matter whether the process is the handling of invoices, customers in a bank, the manufacture or assembly of parts, insurance claims, the sick passing through a hospital, or any one of thousands of other examples. A process involves movement and action in a sequential fashion.
Every quality professional is concerned about the improvement of processes. By making processes better, we get less waste, lower costs, and happier customers.

The image above depicts two opposed states: a dynamic, changing state and a static state. The lake is static, unchanging. We might take temperature measurements in the lake at different depths and come back tomorrow to find no difference. We might take samples of the lake water to compare with other lakes at a later date when we travelled to them.
By contrast, the stream is dynamic. It changes second to second. It is a combination of myriad chaotic processes that would take myriad Navier-Stokes equations to solve (that is, once the Millennium Prize had been won showing how to solve a Navier-Stokes equation). Measure the flow rate in different parts of the stream, and you would not be surprised to find constant changes.
The stream represents the changing and dynamic businesses with which we are all familiar and are all concerned with improving. W. Edwards Deming referred to the methods to study dynamic systems as “analytic.” He referred to the lake as being studied using “enumerative” methods.
Researchers in lake compositions, psychologists, and demographers all use enumerative methods. The first use of such methods was carried out by John Arbuthnot, who published the first statistical test in 1710. Pierre-Simon, Marquis de Laplace, pioneer of the Laplace transform, in 1812 issued his Théorie analytique des probabilités, which laid down many of the fundamentals of statistics. Such statistics are based on the normal distribution, derived by Carl Friedrich Gauss in 1809. During the ensuing centuries, statistical tests, or hypothesis tests, were devised by men such as K. B. Wilson, George Box, David Cox, Alexander Mood, Henry Mann, D. Ransom Whitney, William Kruskal, W. Allen Wallis, and Milton Friedman. There are lots of wonderful and interesting statistics. None include the element of time. They are not designed for process improvement.
Whilst enumerative methods proved powerful, they were unsuitable for dynamic, analytic situations. In 1944 Walter Shewhart made the brilliant observation that: “Classical statistics start with the assumption that a statistical universe exists, whereas [SPC] starts with the assumption that a statistical universe does not exist.” That is, while enumerative statistics usually rely on assumptions about the distribution of data, commonly the normal distribution, the prediction of the behavior of processes does not. We can never know the distribution for a changing process.
Shewhart’s discovery was that classical, enumerative statistics were inappropriate for process improvement. On this basis, he created the control chart for analytic studies.
This led Deming to state in 1986 that “The student should avoid passages in books that treat confidence intervals and tests of significance, as such calculations have no application in analytic problems in science and industry.... Analysis of variance, t-test, confidence intervals, and other statistical techniques taught in the books, however interesting, are inappropriate because they bury the information contained in the order of production.... a confidence interval has no operational meaning for prediction, hence provides no degree of belief in planning.”
During the 32 years since Deming’s illuminating statements, the message still hasn’t sunk in for most folk. Today, most process improvement courses focus on enumerative methods, that is, hypothesis testing. This is exactly what Deming warned against. Quality has regressed to the days before 1944. It is hardly surprising that a survey showed 80 percent of Six Sigma improvement projects fail (of those brave enough to admit failure).
Deming’s key word for process improvement is prediction. Businesses want to be able to be sure not only that their processes will be improved, but also that they will stay that way into the future. Shewhart derived his control charts with statistical knowledge, but based on economics. His charts indicate when a process is predictable and when special causes that disrupt stability are likely to exist and should be investigated. Most important, a control chart is not a probability chart. It does not give probabilities of a process being predictable or otherwise. It does not depend on any particular data distribution, in the way that enumerative methods do.
A preoccupation with enumerative methods has led many people to falsely believe data need to be normally distributed for control charting. Have a look at the data distributions below. It is difficult to imagine just what processes might produce them, but which of the processes below do you think could be charted with a control chart, without any data manipulation of any kind? That is, which can be control charted without pressing a button on ridiculously expensive statistical software to torture the data to make it confess?
The answer is all of them. It doesn’t matter what the histogram or the data distribution look like. Data from any distribution can be charted on a control chart. Furthermore, simple Xbar-R charts, manually drawn, as Shewhart did, produce just as good results as those from folk selling software to draw Xbar-S charts.
Importantly, all control charts have an averages and a ranges chart. Leptokurtophobes (those fearing non-normality) will be leaping for their Librium if they care to look at the distribution for ranges in the example below (subgroups of three, calculated for normally distributed data). Despite the normal distribution for the averages chart in this example, the ranges chart is far from normal. That is, the prediction of process behavior is not affected by either the averages or the ranges data being non-normal.
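The shape of the ranges distribution can be checked with a small simulation. The following is a minimal sketch, not the software used for the article's figure: it draws subgroups of three from a standard normal distribution (the seed, the number of subgroups, and the helper name `skewness` are all assumptions made for illustration) and compares the skewness of the subgroup averages with that of the subgroup ranges.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the average cubed z-score of the values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Assumed settings for illustration only: fixed seed, 10,000 subgroups of size 3.
random.seed(1)
subgroups = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10_000)]

averages = [statistics.fmean(group) for group in subgroups]
ranges = [max(group) - min(group) for group in subgroups]

skew_averages = skewness(averages)  # close to zero: roughly symmetric
skew_ranges = skewness(ranges)      # clearly positive: long right tail

print(f"skewness of subgroup averages: {skew_averages:.2f}")
print(f"skewness of subgroup ranges:   {skew_ranges:.2f}")
```

The averages should come out roughly symmetric while the ranges, which can never fall below zero, should show a marked right skew, even though every individual value was drawn from a normal distribution.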
We have created a system that can demonstrate the power of analytic methods and control charts to students. QSkills3D is an interactive 3D training product in quality. It has been built using what is known as a 3D gaming engine. This engine simulates real-world physics and behaviors. Using it, we can investigate and wonder about dynamic systems from our laptops.
QSkills3D includes interactive games, simulations, and exercises across the gamut of process improvement. One is a ship game based on a real-world, historical story of process improvement. The game can be used for histogram and control chart training, as well as explaining the meaning of “world class quality.” It does not use “defects (i.e., misses) per million opportunities (shots)” or “zero defects (misses).” These were tossed overboard into the briny 48 years ago by Genichi Taguchi, with Deming and Donald Wheeler in support. Sadly, many still cling desperately to this waterlogged jetsam of quality. Instead, it calculates an “on target with minimum variance” score for each student.
The ship game simulates a situation that occurred more than a century ago, on a heaving sea, with a yawing, pitching, rolling vessel, shooting at another ship. As you can imagine, the hit rate was terrible. For the game simulation, the data for a very experienced gunner (guess who) are plotted on the left below. As you might expect, the results are excellent and in control. We can depend on the gunner to produce consistent shooting in the future.
Now suppose we take the data and swap three pairs of values. Draw another control chart. It is using exactly the same data but with a different sequence. We get the result on the right, below. You might imagine this result as coming from a second gunner. He might appear to be a better gunner, but his control chart shows an out of control point. We cannot depend on his shooting. He is unpredictable.
Remember that the data for these two gunners are identical except for the order. No enumerative test would have been able to distinguish between these two performances. As Deming emphasized, hypothesis tests are inappropriate because they bury the sequence of the data.
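The effect of order alone can be reproduced with a few lines of code. This is a minimal sketch using made-up numbers, not the data from the ship game: the values in `first_gunner`, the two swaps, and the function names are assumptions for illustration. It computes individuals (XmR) chart limits as the mean plus or minus 2.66 times the average moving range, the usual scaling factor for that chart, and lists any points outside the limits.

```python
def xmr_limits(values):
    """Natural process limits for an individuals chart: mean +/- 2.66 * average moving range."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar


def signals(values):
    """Indexes of points falling outside the limits computed from the same series."""
    lower, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical miss distances for the first gunner (illustrative values only).
first_gunner = [3, 5, 3, 5, 3, 10, 5, 3, 5, 3]

# The "second gunner" is the same data with two pairs of values swapped.
second_gunner = list(first_gunner)
for i, j in [(1, 4), (7, 8)]:
    second_gunner[i], second_gunner[j] = second_gunner[j], second_gunner[i]

print("same values:", sorted(first_gunner) == sorted(second_gunner))
print("first gunner signals: ", signals(first_gunner))
print("second gunner signals:", signals(second_gunner))
```

Both series have the same mean and the same histogram, so any summary that ignores sequence treats them as one data set. The swaps shrink the point-to-point variation, the limits tighten, and the largest value now falls outside them.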
It is also important to look at the histogram for the data. The histograms are also identical, and the data are clearly nonsymmetrical. They are not normally distributed. Enumerative methods are commonly based on the assumption of normal data, but skewed data, such as these, are fine for control charting.
You should never draw a normal distribution over a histogram for an analytic process. It is meaningless and provides no benefit, and it can also hide the voice of the process speaking in the histogram. In this case, we can see that the gunner seems more likely to overshoot than undershoot. We might decide to collect more data and to investigate the causes.
Although we can eliminate special causes, such as wild shooting and inexperience, a process improvement requires a system change. This is what a clever admiral did in 1898 to achieve dramatically better quality. The ship game simulates the process improvement that he devised. Students play the game again with the process improvement, and control limits are adjusted.
It seems incredible that it took decades for such a simple yet brilliant idea to be adopted in the Navy. Yet how incredible is it that there is still such poor understanding of Shewhart’s and Deming’s simple yet brilliant control charts? Why has it taken so long for people to understand that processes need analytic methods, not enumerative ones?
Enumerative studies are suited to studying existing, fixed populations, where the population can be fully characterized. The analytic methods of Shewhart are used to study changing processes and to predict their future behavior.
Acknowledgement
I would like to thank Scott Hindle for his thoughtful contributions, comments, and discussion on this article.
Comments
enumerative and analytic methods
Dear Dr. Burns,
This is one of those powerful articles that enlighten anyone about the importance of using tools. But as just a beginner in stats, could you please shed light on what enumerative tools and analytic tools are? Is SPC the only analytic tool? Where should I start my study of these topics?
Thanks in advance for your help
The difference between analytic and enumerative studies
In the earliest reference of which I am aware, Deming proposed the differences between the two types of studies in his seminal work on curve-fitting, Some Theory of Sampling. You can still find Dover editions of that book.
Deming stated that the difference between the two types of studies is the intent. Essentially, enumerative studies deal with a population. You are sampling (collecting statistics) from the population to estimate its parameters, with the intention of taking action on some aspect of the population. The data are static: At least in principle you know every member of the population and can attribute any characteristics of those members to the moment in time when you drew the sample. It is like examining a snapshot with a lot of holes in it, using what you can glean from the information you have to fill in those holes and estimate what the whole picture would look like.
Analytic studies are intended for action on a cause system. They are not static, but dynamic. In an analytic study there is no population of interest. We are trying to sample from the past or present, and studying those data to try to extrapolate into the future. It's a little like watching a movie and looking at what's causing the current actions on the screen, and using that information to try to figure out what will happen next.
Here's where it gets tricky (example courtesy of David Kerridge): In Out of the Crisis, Deming used an example to illustrate operational definitions. It was a destructive testing example to test whether a blanket was 50% wool. In the operational definition, he outlines a test procedure where the analyst punches 10 holes one inch in diameter, centered by random numbers, then tests the content of that sample. If the sample proves to be greater than 50% wool (plus or minus 2% if I remember right), then the blanket can be considered to be "50% wool."
So, that test would be an enumerative study...the "population" is all the fabric in the blanket, the sample comprises the circles punched from the wool, and we are extrapolating from those circles to the remainder of the blanket. It would be appropriate, in this case, to compute at least a confidence interval around the average wool content from the sample.
However, suppose this test were conducted at some regular time interval...say, once or twice per day. If we use the numbers to run an averages and ranges chart (or an individuals and ranges chart) to get the average and control limits over time (to look for special causes or trends), we have used our enumerative study results to get data for an analytic study. If we were doing that, then we would just get the average wool content from each sample, and not run any sort of confidence interval or t-test.
Hopefully this helps, Mohamed. If you've read the rest of this discussion, you can see that there is (lamentably) little in the literature any more about this topic.
The strong argument is demonstrated very simply.
Good afternoon, Tony
Excellent article. The strong argument is demonstrated very simply.
I fully support your attitude toward analytical research with the help of the Shewhart control charts
Yours faithfully,
Sergey Grigoryev, DEMING.PRO
Thank you Sergey.
Chi-Square
With Dr. Wheeler's permission, I would like to post his wonderful email response to me regarding this article on chi-square analysis from the ASQ forum. The article is from a company attempting to sell enumerative tools for process improvement: goo.gl/ToUJu7. How many people buy such statistical software and start pressing buttons without any clue as to what they are doing, or any understanding of the difference between enumerative and analytic methods?
"
If you have proportions for three or more conditions, and if you are willing to assume that each of the areas of opportunity represents a single universe, then you may compare the conditions by using an approximate procedure known as the chi-square contingency test.
The assumption about each of the areas of opportunity representing a single universe is simply the generalized version of the binomial assumption that all of the items (in any one condition) have the same chance of possessing the attribute being counted.
For example, the counts of units failing the online test for each of three shifts are:
Day shift, 85 out of 955;
Swing shift, 46 out of 940;
Night shift, 39 out of 947.
This results in a 3 by 2 contingency table, having 2 degrees of freedom. The chi-square statistic is 22.3, and the 95th percentile is 5.99, so we can say the three shifts HAD different proportions of units that failed the online test.
But what does this mean? Should the three shifts be the same? Or is there some reason the day shift should have more failures?
This analysis does not consider whether or not the failure rates were constant throughout each shift.
If the failure rate is changing throughout a shift, what do the counts above represent?
If the difference between the shifts persists over time, then there might be a systematic reason for the difference. The analysis above merely assumes that the failure rates were constant and proceeds to draw a conclusion that the shifts WERE different. This is no guarantee that they will be different tomorrow.
Both the nature of the inference and the quality of the information provided by this analysis is fundamentally different from that provided by process behavior charts.
On those rare occasions when you can safely assume the data are homogeneous, the traditional approaches make sense. In all other cases let the user beware.
While the chi-square test can be extended to categorical data for three or more conditions, the same issues apply. If we have temporal order, put the data on XmR charts by category and condition. If we do not have temporal order, we have nothing more than an enumerative study which may or may not predict anything.
"
"
It is all in making the distinction between
What is?
and
Why it is?
and
Whether it will stay that way?
"
Always Thoughtful, Useful Wisdom from Dr. Burns!
This one is another keeper!
Thank you for your kind words
Thank you for your kind words Kay.
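For readers who want to check the arithmetic in Dr. Wheeler's chi-square example quoted earlier in this thread (85 of 955, 46 of 940, and 39 of 947 units failing), here is a minimal sketch of the Pearson chi-square calculation for a 3 by 2 table. The function name and the dictionary layout are assumptions for illustration; the counts are the ones quoted above.

```python
# Counts quoted in the example: (units failing, units tested) for each shift.
shifts = {"Day": (85, 955), "Swing": (46, 940), "Night": (39, 947)}


def chi_square_contingency(counts):
    """Pearson chi-square statistic for a k-by-2 table given as (failed, total) pairs."""
    total_failed = sum(failed for failed, _ in counts)
    total_units = sum(total for _, total in counts)
    p_fail = total_failed / total_units  # pooled failure proportion under the null

    statistic = 0.0
    for failed, total in counts:
        passed = total - failed
        expected_failed = total * p_fail
        expected_passed = total * (1 - p_fail)
        statistic += (failed - expected_failed) ** 2 / expected_failed
        statistic += (passed - expected_passed) ** 2 / expected_passed

    dof = (len(counts) - 1) * (2 - 1)  # (rows - 1) * (columns - 1)
    return statistic, dof


chi2, dof = chi_square_contingency(list(shifts.values()))
print(f"chi-square = {chi2:.1f} with {dof} degrees of freedom")
```

The statistic should agree with the 22.3 quoted above, well beyond the 5.99 critical value. Note what the calculation never uses: the order in which the units were produced within each shift. That is exactly the information the email says this analysis leaves out.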
Hypothesis Testing
Are you saying Quality professionals have no use for Hypothesis Testing? I understand Hypothesis Tests assume static states. What if I am trying to reduce variation in a molding process (process improvement) and want to compare products coming from two cavities. Of course to compare means from the 2 cavities the mean would have to be a useful statistic (implying a stable process characterized by a single distribution). I do not see why you are bashing the fact that quality professionals learn statistical methods for making valid comparisons while taking into account the inherent variability in the data.
No need for hypothesis tests?
I would say that you don't need hypothesis tests for comparing two processes or process streams. In my 35 years of solving hundreds if not thousands of complex (physics-based) problems, and certainly executing thousands of process characterizations (empirical models that characterize the design space of a set of inputs to critical outputs), capability studies, measurement systems analyses, and V&V test plans, I have never calculated a p value or performed any tests of statistical significance. If the study is designed appropriately, the visual evidence available in a properly designed graph will display all of the evidence you need. On the other hand, I've seen hundreds of statistical tests that resulted in a p value less than .05, and when I looked at the study design and graphed the result it was clear that the 'statistical significance' had no practical importance or was simply incorrect because the process itself was non-homogeneous. Many statisticians, not just Deming or Wheeler, have demonstrated this.
A great paper to introduce yourself to the enumerative vs analytic study question is Deming's seminal paper: “On Probability as a Basis for Action”, American Statistician, November 1975, Vol. 29, No. 4, pp. 146–152, available for free at: https://www.deming.org/media/pdf/145.pdf
A great paper to introduce yourself to critical questioning of the usefulness of the 'hypothesis test' is "The Null Ritual: What you always wanted to know about significance testing but were afraid to ask" by Gerd Gigerenzer, Stefan Krauss, and Oliver Vitouch. Published in: D. Kaplan (Ed.). (2004). The Sage handbook of quantitative methodology for the social sciences (pp. 391–408). Thousand Oaks, CA: Sage. © 2004 Sage Publications. Available for free at http://library.mpibberlin.mpg.de/ft/gg/gg_null_2004.pdf
Some other papers you might find interesting:
“The Insignificance of Statistical Significance Testing”, Johnson, Douglas H., Journal of Wildlife Management, Vol. 63, Issue 3, pp. 763–772, 1999 http://www.ecologia.ufrgs.br/~adrimelo/lm/apostilas/critic_to_pvalue.pdf
“The Case Against Statistical Significance Testing”, Carver, Ronald P., Harvard Educational Review, Vol. 48, Issue 3, pp. 378–399, 1978 http://healthyinfluence.com/wordpress/wpcontent/uploads/2015/04/CarverSSD1978.pdf
“What Statistical Significance Testing Is and What It Is Not”, Shaver, James P., Journal of Experimental Education, No. 61, pp. 293–316, 1993
Hypothesis testing
Hypothesis testing has its place, certainly in designed experiments, but in those you are dealing with experimental data (although recent literature reflected in some of the papers listed below suggests that hypothesis testing is not as universally useful as some would like us to believe). However, in recent years it has become commonplace for trainers to recommend using a t-test on before/after data to see whether an improvement action made a statistically significant difference in performance. This would be a waste of time at best, and could yield misleading results (as Dr. Burns and Dr. Wheeler point out above). If you had a stable baseline before your improvement, the process behavior chart will show you whether your change made a difference. If you made a difference, you induced an assignable cause and will see signals indicating it. You don't need a t-test.
Your best bet is to understand the difference between enumerative studies and analytic studies (if you are involved in process improvement, you are mostly involved in the latter), and use theory and methods appropriate to your study.
Other papers:
Moonesinghe, R., Khoury, M. J., & Janssens, A. C. J. (2007). Most published research findings are false—but a little replication goes a long way. PLoS medicine, 4(2), e28. doi:10.1371/journal.pmed.0040028
Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2. doi:10.1080/01973533.2015.1012991
Trafimow, D., & Earp, B. D. (2017). Null hypothesis significance testing and Type I error: The domain problem. New Ideas in Psychology, 45, 19–27. doi:10.1016/j.newideapsych.2017.01.002
Gelman, A. (2015). Working through some issues. Significance, 12(3), 33–35. doi:10.1111/j.17409713.2015.00828.x
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS medicine, 2(8), e124. doi:10.1371/journal.pmed.0020124
Nuzzo, R. (2014). Statistical errors. Nature, 506, 150–152.
Molding Cavity Variation
I would refer the reader to Wheeler and Chambers' Understanding Statistical Process Control for an excellent example of the use of control charts to solve the problem you have posed. Through the use of rational subgrouping, the manager of that process was able to look at hour-to-hour, cycle-to-cycle and cavity-to-cavity variation. Cavity-to-cavity variation turned out to be the greatest driver of variation. The manager used the charts (and the process knowledge of his molding machine operators) to reduce between-cavity variation, and then monitor it using a dynamic and very sensitive method.
Casting
Thanks Rip. Dr Wheeler also discusses such casting here: https://www.qualitydigest.com/inside/qualityinsidercolumn/060115ratio... and in "Advanced Topics in SPC" "Rational Subgrouping" pages 143 to 157. These books are essential for anyone serious about quality.
As you can see from Dr Wheeler's examples and the example in my article, hypothesis testing throws away key data that can be used to help analyse your process. There is absolutely nothing wrong with hypothesis testing when used correctly. Hypothesis testing is fine for static, non-process-related situations.
Enumerative studies were promoted in Six Sigma courses. However the enumerative tools of Six Sigma are inappropriate for analyzing processes. Six Sigma was developed by a psychologist. It is hardly surprising that it focuses on the enumerative tools that are far more appropriate for lab rats in a psychology laboratory, than for process improvement.
Absolutely
The problem, I think, is in the education system. It is very difficult to find a stats class, or a "business statistics" class that teaches even the tools of analytic studies (much less that there are different types of studies). Those that do often do it badly (e.g., they might teach you to construct a control chart using the mean and 3 standard deviations above and below the mean for control limits).
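To see why limits built from the overall standard deviation are a problem, consider a minimal sketch with made-up numbers (the data and function names are assumptions for illustration). The series contains an obvious upward shift halfway through. Limits set at the mean plus or minus three global standard deviations are inflated by that very shift, while XmR limits based on the average moving range (scaled by the usual 2.66) reflect the short-term, point-to-point variation.

```python
import statistics


def global_sigma_limits(values):
    """The construction criticized above: mean +/- 3 overall standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - 3 * sd, mean + 3 * sd


def xmr_limits(values):
    """Individuals chart limits: mean +/- 2.66 * average moving range."""
    mean = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar


def points_outside(values, limits):
    """Indexes of values falling outside the given (lower, upper) limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical measurements: the process level jumps after the sixth value.
data = [10, 11, 10, 11, 10, 11, 18, 19, 18, 19, 18, 19]

naive_signals = points_outside(data, global_sigma_limits(data))
xmr_signals = points_outside(data, xmr_limits(data))

print("global 3-sigma limits flag:", naive_signals)
print("XmR limits flag:           ", xmr_signals)
```

With these numbers the global limits flag nothing, because the shift itself has been absorbed into the standard deviation. The moving-range limits flag points on both sides of the jump, which is the signal a process behavior chart is meant to give.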
More on "static" processes
Further to the above points, what proportion of the many man-made processes display stable behaviour over time (i.e. static)?
I think a valid concern is the use of hypothesis tests without a prior check on whether the data being used in the test are “in control” or not.
When the data are not stable the use of the hypothesis tests is questionable.
When the data are stable the control chart can often be used to answer the question to be answered through a hypothesis test.
Great point, Scott
I don't know the proportion of man-made processes that are stable; you can't know until you look. You make a great point about stability and hypothesis tests, though. This was the reason Shewhart developed control charts: to test whether the data exhibited a reasonable degree of statistical control. If they do, then distributional assumptions could be made. Without the evidence of homogeneity we get from the charts, we have no evidence for any particular distributional model. If the process is stable, then you are correct; processes can be compared handily using those charts...no need for t-tests, z-tests, ANOVA.
Thanks, Rip
I had an example this afternoon.
Three subgroups of n=3: No signal on the range chart (or S chart). No signal on the average chart.
We ran the ANOVA and looked at the table of values: No signal. I asked my colleague which was easier to understand. What do you think the answer was?
My guess
Just a guess, but if your colleague was someone who grew up in the enumerative world, I'm guessing they thought they understood the high p-value in the ANOVA better. My other guess is that they don't really understand that p-value (and what it DOESN'T say). I've never understood why a simple visualization would not be preferable, but it might be as Wheeler points out, "Some people are afraid of simplicity, because complexity looks profound even if it's absurd."
Dr. Deming did a series on the difference between analytic and enumerative studies for Ford engineers in the early '80s. At one point he lamented that he could not get engineers to use two simple tools: a piece of paper and a pencil. ("They will be damned before they do"). He then drew a plot on the newsprint: Two high points followed by a low point, two high points followed by a low point, repeated a few times. He said, three shifts...two are high, the third is low. There is clearly a difference. No need for any advanced mathematics...just get to work finding out where the difference comes from!
Of course, one engineer asked, "But wouldn't you do a hypothesis test, just to verify the difference?"
Deming thundered, "Why? Why ruin it...waste time? You can see there's a difference, get to work!"
Thanks, Rip (again!)
My mistake: I did a poor job of saying the colleague was happier with the average and range chart, i.e. the easy to interpret picture over the table of statistics (the ANOVA table). Nonetheless, and I had an example today, the simplicity of the average and range chart somehow seems to suggest something should replace it.
Thanks for the response. Great. I guess Dr. Deming had a way of pulling such things off in a way that few others, or nobody, could do.
Really glad to hear it, Scott!
I'm glad to hear that your colleague "got it" right away. I'm just too jaded, sometimes...
On the other hand, you might be happy to hear that I have managed to get a university to let me start teaching using one of Wheeler's texts in a course about using data in decision-making, and in a basic business stats class. So there is some (if not glaring) light in the fight for analytic studies.
Love it
Love it Rip. What a wonderful reply!
Six Sigma: Prediction Failure
Six Sigma focusses on enumerative tools. M Harry, the creator of Six Sigma, was asked a similar question to that of this article "Could you explain the best way to predict the outcome of a process in the future?" Harry's answer showed complete ignorance of analytic methods and the meaning of a predictable process: "Reference the quality literature on statistical process control, also known as “SPC.” There are many excellent books on the subject. Process improvement and optimization is often accomplished by way of statistically designed experiments, or “DOE” as it is known."
A Six Sigma process with its +/-1.5 sigma shift is wildly out of control and is hence unpredictable. It cannot produce good quality product/service in the future, no matter where specification limits are set.
Further...
I would agree that most of the Six Sigma world has left its quality roots behind, chasing cost cutting and forgetting Deming, SPC, the Taguchi Loss Function and its implications (which concepts are what the whole idea is, after all, based on). I believe that one reason for the emphasis on enumerative study methods is because they are, unfortunately, what are taught in most statistics courses at the high school and university level. Hopefully, some of the research in the list of papers I provided above will shake the enumerative statistics world enough to effect some change.
There are very few textbooks outside the sort of "deep quality" realm that even discuss analytic or enumerative studies. To advance the notion of analytic studies, and to see it survive us, what is probably needed is some sort of analytic statistics society with enough academic credibility to publish its own journal.
As far as the 1.5-sigma shift goes, it doesn't exist...despite its presence in international standards as a "convention." In the real world, assuming some reasonable attempt at statistical process control, a process whose data display a sustained but undetected 1.5-sigma shift CANNOT exist. Platitudes such as "Shift Happens" are cute but do not reflect the real world.
Analytics Society
Rip, I suspect that you are on to something that might increase the credibility and visibility of analytic studies. There are many of us out there trying valiantly to change things. In my own organization I am continually fighting against the resurgence of enumerative studies and what I call "statistical alchemy" (FMEA/RPN, Cpk/Ppk, AIAG Gage R&R, etc.). Even though the organization has demonstrated great success with analytic studies, the seduction of not having to think (just take a pile of numbers and throw them into some software) is so powerful.
I will be presenting at the ASQ Lean and Six Sigma conference in March and the World Quality conference in May...fighting the good fight. Perhaps I'll see you at one of them?
Analytics Society
I'm just afraid that the window is closing. It probably would have taken people of the stature of Myron Tribus or David Kerridge to pull it off...