
Donald J. Wheeler

Six Sigma

Avoiding Statistical Jabberwocky

The last column in the debate between Donald Wheeler and Forrest Breyfogle

Published: Wednesday, October 7, 2009 - 05:00

This is the final column in the debate between Donald Wheeler and Forrest Breyfogle on whether or not to transform data prior to analysis. Because the debate started with Wheeler's article "Do You Have Leptokurtophobia?" we are letting him have the last word on the topic.

The articles following Wheeler's first story were:
Breyfogle: “Non-normal data: To Transform or Not to Transform”
Wheeler: “Transforming the Data Can Be Fatal to Your Analysis”
Breyfogle: “NOT Transforming the Data Can Be Fatal to Your Analysis”


In my August column, “Do You Have Leptokurtophobia?” I carefully explained how the process behavior chart does not require that you have “normally distributed data.” I explained how three-sigma limits effectively strike a balance between the economic consequences of getting false alarms and those of missing signals of economic importance. I also demonstrated this balance in two ways: first by showing how the chart for individual values avoided false alarms, even when used with synthetic data generated by an exponential distribution; and then by showing how the individuals chart also detected signals within a real data set, even when those data possessed a very lopsided histogram.
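Wheeler's robustness claim is easy to check empirically. The sketch below is illustrative only (the helper name and the exponential model are my assumptions, not Wheeler's original data): it places 10,000 exponential values on an individuals chart and counts points outside the three-sigma limits.

```python
import random

random.seed(0)  # reproducible illustration

def xmr_limits(data):
    """Natural process limits for an individuals (X) chart.

    Uses the average moving range, the within-subgroup measure of
    dispersion for an XmR chart: limits = mean +/- 2.66 * mean(mR).
    """
    mrs = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(mrs) / len(mrs)
    center = sum(data) / len(data)
    return center - 2.66 * mr_bar, center + 2.66 * mr_bar

# Synthetic data from an exponential distribution: heavily skewed,
# nothing like a normal distribution
data = [random.expovariate(1.0) for _ in range(10_000)]
lo, hi = xmr_limits(data)
rate = sum(x < lo or x > hi for x in data) / len(data)
print(f"false-alarm rate: {rate:.2%}")  # well under 5 percent
```

Even with this very lopsided model, the false-alarm rate stays in the low single digits, which is the point of the robustness argument above.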

On Aug. 24, Forrest Breyfogle responded with an article claiming that control charts require normally distributed data and that we should transform our data to make them “more normal” in certain situations. He presented no support for this claim that the charts require normal data; he simply stated it as an article of faith. He then used a very extreme probability model to generate some synthetic data and observed that the individuals chart had 3.3-percent false alarms. Of course, Breyfogle interpreted this as evidence that the chart did not work because he did not get the normal-theory false alarm rate of three per thousand. To understand Breyfogle’s example it is important to know that virtually all practical models for original data will have a skewness of three or less and a kurtosis of 12 or less. Breyfogle’s model had a skewness of six and a kurtosis of 114. Yet, in spite of this unusually extreme probability model, his model had a theoretical value of only 1.8 percent beyond the upper three-sigma limit. Both this theoretical result and Breyfogle’s observed result are consistent with the definition of what will happen with a robust procedure. (Robust procedures are those where conservative theoretical critical values will continue to deliver less than 5 percent false alarms when you change the probability model.) Therefore, in light of the extreme nature of Breyfogle’s probability model, the 3.3-percent false alarm rate is actually evidence of the robustness of the individuals chart, rather than evidence of a failure of the technique.
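The 1.8-percent figure can be reproduced under one assumption of mine: that the extreme model was a standard lognormal, whose skewness (about 6.2) and kurtosis (about 114) match the values quoted above. A short calculation of the tail beyond the theoretical upper three-sigma limit:

```python
import math

# Assumed model: standard lognormal (mu=0, sigma=1). This is a guess,
# chosen because its skewness and kurtosis match the figures in the text.
mu, sigma = 0.0, 1.0
mean = math.exp(mu + sigma**2 / 2)
var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
ucl = mean + 3 * math.sqrt(var)  # theoretical three-sigma upper limit

# For a lognormal, P(X > ucl) = P(Z > (ln(ucl) - mu) / sigma)
z = (math.log(ucl) - mu) / sigma
tail = 0.5 * math.erfc(z / math.sqrt(2))
print(f"P(X > UCL) = {tail:.1%}")  # about 1.8 percent
```

So even a model this extreme puts less than 2 percent of its probability beyond the upper three-sigma limit, consistent with the robustness definition above.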

In my September column, “Transforming the Data Can Be Fatal to Your Analysis,” I explained the shortcomings of Breyfogle’s paper and presented Shewhart’s conceptual foundation for the use of process behavior charts. There I showed that Shewhart was not concerned with achieving a particular risk of a false alarm, but rather was looking for a general, distribution-free approach that would have a reasonably small risk of a false alarm for all types of data. I also showed why a process behavior chart for the original data should always be the first step in any analysis of process related data.

On Sept. 16, Breyfogle responded to my September column, arguing that data have to be transformed prior to using them on a process behavior chart. In this paper he painted my response as being too narrow in scope, overlooking the fact that I have often made exactly the same points he was making about the kinds of action possible in practice. (For example, see “The Four Possibilities for Any Process,” Quality Digest, December 1997.) Before we consider the rest of Breyfogle’s paper we will need some background material. Most of what we will need may be found in my articles in this column during the past nine months, but first we shall begin with the “Four Foundations of Shewhart’s Charts,” Quality Digest, October 1996.

The first foundation of Shewhart’s charts is the generic use of three-sigma limits. This frees us from having to specify a probability model and determine specific critical values to achieve a particular alpha-level. As Shewhart noted, in practice, we really do not care what the alpha-level is as long as it is reasonably small, and three-sigma limits have been proven to provide reasonably small alpha-levels with all sorts of real world data.

The second foundation of Shewhart’s charts is the use of within-subgroup variation to compute the three-sigma limits. Global measures of dispersion are descriptive, but the foundation of all modern statistical analyses, from ANOVA and the Analysis of Means to process behavior charts, is the use of the within-subgroup variation as the filter for separating probable noise from potential signals.

The third foundation of Shewhart’s charts is the use of rational sampling and rational subgrouping. This is simply the flip side of using the within-subgroup variation. To filter out the noise we need to estimate the routine variation. This means that we will have to select and organize the data in such a way that the subgroups will be logically homogeneous. As Shewhart said, “Specify how the original data are to be taken [collected] and how they are to be broken up into subsamples [subgroups] upon the basis of human judgments about whether the conditions under which the data were taken were essentially the same or not.” Two numbers belong in the same subgroup only when they can be said to have been obtained under essentially the same conditions. Numbers that might have come from different conditions belong in different subgroups. Rational sampling and rational subgrouping are rational simply because they are the result of careful thought about the context for the data. When we ignore the principles of rational sampling and rational subgrouping we undermine the computations and are likely to end up with nonsense charts.

And the fourth foundation of Shewhart’s charts is the ability of the organization to use the knowledge gained from the charts. This topic is so broad that W. Edwards Deming developed his 14 Points for Management to address the many issues that occur here. However, one thing has been repeatedly demonstrated: Organizations that refuse to listen to the data will fail.

In my February column, “First, Look at the Data,” I demonstrated the importance of plotting the original data in time order on an XmR chart. There we considered the number of major hurricanes in the North Atlantic from 1940 to the present. When we plotted these original data on an individuals chart we found a distinct oscillation between quiet periods and active periods. These data consisted of small counts, were bounded on one side by zero, and had a histogram that looked like a ski slope. By simply listening to the story told by the original data themselves we learned something. Had we transformed these data we would have missed this important aspect of them.

In my March column, “Probability Models Don’t Generate Your Data,” I used the same hurricane data to demonstrate that even though you might find a probability model that provides a reasonable fit to your histogram, this does not mean that your data actually came from a single system. The erroneous idea that you can infer things from how well your data fit a particular probability model is known as the Quetelet Fallacy, after the Belgian astronomer who had this mistaken idea in 1840. Quetelet’s Fallacy was exposed by Sir Francis Galton in an 1875 paper that proved to be the foundation of modern statistical analysis. In this paper Galton demonstrated that a collection of completely different processes, having different outcomes, could still yield a histogram that looked like a histogram produced by a single process having consistent outcomes. For the past 134 years statisticians have known better than to read too much into the shape of the histogram. Unfortunately, each generation of students of statistics has some individuals who follow in the footsteps of Quetelet. Some of them even write articles about their profoundly erroneous insights.

In my April column, “No Data Have Meaning Without Context,” I demonstrated that you cannot create meaning with arithmetic. The meaning of any data set is derived from the context for those data. I showed that if you do not respect that context when you analyze your data then you are likely to end up with complete nonsense. Shuffling the data, or transforming the data, in ways that distort or destroy the context for those data may give you a pretty histogram, or a pretty running record, but it will not provide any insight into the process represented by the data.

In my May column, “All Outliers Are Evidence,” I demonstrated the fallacy of “removing the outliers” from your data. Once again, simply plotting the original data on an XmR chart provided the insight needed to understand the data. Moreover, this approach worked even though the original data had a kurtosis statistic of 12.3. No transformation was needed. No subjective decision had to be made about which nonlinear transformation to use. We simply plotted the data and got on with the interpretation of what was happening.

In my June and July columns, “When Can We Trust the Limits On a Process Behavior Chart?” and “Good Limits from Bad Data,” I demonstrated that the correct way of computing the limits is robust to outliers. These correct computations will always use the within-subgroup variation and are a direct consequence of Sir Francis Galton’s 1875 paper. But this approach imposes a requirement of rational subgrouping upon the use of process behavior charts. Where, you might ask, are the subgroups on an XmR chart? They are the pairings of successive values used to compute the moving ranges. The requirement of rational subgrouping means that these successive values must be logically comparable. In other words, we cannot mix apples and oranges together on an XmR chart and expect the chart to work the same as it would if we charted all apples. While the definition of what is logically comparable may depend upon the context of what we are trying to accomplish with the chart, we must still work to avoid irrational subgroupings.
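A small sketch (with made-up numbers of my own) shows how an irrational subgrouping inflates the moving ranges, and hence the limits, on an XmR chart:

```python
def avg_moving_range(x):
    """Average of the absolute differences between successive values."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)

# Two hypothetical, individually steady streams with different averages
apples  = [10.1, 9.9, 10.0, 10.2, 9.8] * 4
oranges = [20.0, 19.8, 20.3, 19.9, 20.1] * 4

# Rational subgrouping: chart each stream on its own
print(avg_moving_range(apples), avg_moving_range(oranges))

# Irrational subgrouping: alternate values from the two streams, so each
# successive pair straddles the ten-unit gap between the averages
mixed = [v for pair in zip(apples, oranges) for v in pair]
print(avg_moving_range(mixed))  # inflated, so the limits would be far too wide
```

The mixed series has an average moving range dozens of times larger than either stream alone, which is exactly how mixing apples and oranges undermines the computations.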

In my August column, “Do You Have Leptokurtophobia?” I once again showed how the use of a nonlinear transformation will inevitably distort the original data and obscure the signals contained within those data. These are facts of life, not matters of opinion. However, to be as clear as possible about this, consider figure 1, which reproduces part of a graph from an article in a scientific journal. There the distance from the zero point to the first arrow represents one million years, but the distance between the first arrow and the second arrow represents 13,699 units of one million years each.

Figure 1

When I saw this graph I was sure that there must be some mistake on the horizontal scale. One million years after the Big Bang should be a lot closer to the beginning than it looks on this scale. However, calculation of the times involved shows the scale of figure 1 to be correct, even though it defies all logic.

Figure 2

To understand the distortion created by the logarithmic scale of figure 1 consider the same points plotted on a linear scale in figure 2. Exponential and logarithmic scales are simply not intuitive. We do not think in these terms. As a result, when we use nonlinear transformations we cannot even begin to understand the profound distortions that we are creating. Yet, as we will see, Breyfogle wants to use nonlinear transformations of the original data simply for cosmetic purposes.
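The distortion can be quantified without a graph. On a linear axis the one-million-year mark sits a vanishing fraction of the way to the present; on a logarithmic axis starting at one year it sits more than halfway across. (The numbers below are round figures, not taken from the article in question.)

```python
import math

age_of_universe = 13.7e9  # years, roughly
first_mark = 1.0e6        # one million years after the Big Bang

# Linear axis: fraction of the way from the origin to the present
linear_frac = first_mark / age_of_universe

# Logarithmic axis running from 1 year to the present
log_frac = math.log10(first_mark) / math.log10(age_of_universe)

print(f"linear axis: {linear_frac:.5%} of the way across")  # about 0.007 percent
print(f"log axis:    {log_frac:.1%} of the way across")     # about 59 percent
```

The same instant in time lands near the origin on one axis and past the midpoint on the other, which is why log-scaled pictures defy intuition.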

So what about Breyfogle’s paper of Sept. 16? Among his many different points he finally gets down to an example at about one-third of the way through the article. The data used are said to be the “time to change from one product to another on a process line.” For this example he combines “six months of data from 14 process lines that involved three different types of changeouts, all from a single factory.” Clearly, such data are what I have called report-card data, and while we may place report-card data on a process behavior chart, the limitations of such data will impose certain limitations upon any report-card chart.

The first salient feature of a report-card chart is that the aggregation which is inherent in all report-card data will inevitably create a lot of noise. This noise will inflate the routine variation detected by the chart, which will inflate the limits and make report-card charts insensitive. This phenomenon is one reason that some authors, such as Lloyd Provost, do not like report-card charts. They claim that they can never detect any signals with such charts. While such report-card data are inherently a violation of the rational subgrouping requirement, they will sometimes work when the aggregated values are, indeed, logically comparable. Unfortunately, the noise that is part of the aggregation will also tend to make these charts appear to be reasonably predictable, even when the individual components of the aggregated time series are not predictable at all.

However, this is not Breyfogle’s problem. His chart shows nine out of 639 points, or 1.4 percent, above the upper limit. So what does this mean? Given that these data represent three different types of changeouts on 14 different lines, it would be a miracle if there were no signals. The grouping together of the different types of changeouts, which in all likelihood have different averages, is almost certainly an example of an irrational subgrouping, and the most plausible interpretation of the nine points above the upper limit is that they are trying to tell us about this problem with the organization of the data.

The second salient feature of report-card charts is that the aggregation will tend to obscure the context for each point. This aggregation, along with the time delay inherent in report-card data, will tend to make it very difficult to use report-card charts to identify specific assignable causes of exceptional variation when a point falls outside the limits. While improvements can be tracked on a report-card chart, it is a rare thing when the report-card chart can be used as the catalyst for process improvement. It is always much easier to identify assignable causes as the data are disaggregated. As the data become more narrowly focused and are plotted in real time they become more useful as a tool for process improvement. (I suspect that Breyfogle will take issue with that statement, but then, by his own admission, his clients have not been able to use SPC successfully. I have many clients that have been using SPC with great success for more than 20 years, and it is their experience that I am reporting here, not my opinion.)

So, the inherent noise of a report-card chart, along with the difficulty of identifying specific assignable causes that correspond to the few signals that do occur, will tend to limit their usefulness to merely tracking the business at a fairly high level, which is what Breyfogle seems to be suggesting. However, there is a twist to what Breyfogle is doing.

In the two sentences following the description of the conglomeration of data that he is using, Breyfogle states that the factory is “consistently managed” and that the “corporate leadership considers this process to be predictable enough.” These two statements are puzzling. According to the data given on the chart, these 639 changeouts averaged 11.54 hours each. If this plant is operating 24/7, then these changeouts consumed 12 percent of the operating time for the 14 lines in this plant. If this plant is operating 16/5, then these changeouts consumed more than 25 percent of the operating time for the plant. It seems to me that anything that reduces productivity by 12 to 25 percent deserves more than a glance from the “30,000-foot level.”
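The arithmetic behind these percentages can be verified directly, taking six months as roughly 26 weeks (my assumption for the calendar):

```python
# Check the changeout-time arithmetic from the paragraph above
total_changeout_hours = 639 * 11.54   # 639 changeouts at 11.54 hours each
lines = 14

hours_24_7 = lines * 26 * 7 * 24      # round-the-clock operation
hours_16_5 = lines * 26 * 5 * 16      # two shifts, five days a week

share_24_7 = total_changeout_hours / hours_24_7
share_16_5 = total_changeout_hours / hours_16_5
print(f"24/7: {share_24_7:.1%} of operating time")  # about 12 percent
print(f"16/5: {share_16_5:.1%} of operating time")  # about 25 percent
```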

The real problem with his approach is not that he has an irrational subgrouping (a mistake), and it is not that he has used a nonlinear transformation in an effort to correct his irrational subgrouping (another mistake), or that he has missed the message of this report-card chart about an opportunity for improving productivity (a third mistake). The real problem is seen in the last sentence in the following quote.

“The corporate leadership considers this process to be predictable enough, as it is run today, to manage a relatively small finished goods inventory. With this process knowledge, what is the optimal method to report the process behavior with a chart?”

Following this sentence, Breyfogle begins to compare and contrast the individuals chart of the original data with a chart for the transformed data. But before we get bogged down in the argument about transforming the data, notice the sequence of events in the quote above. Having just said that the executives do not want to find any signals, Breyfogle sets to work to get rid of all of the signals. He is shaping the data to fit a preconceived message. This is not data analysis, but rather data manipulation. When accountants do this we put them in jail.

No matter how you might try to dress it up, any attempt to transform the data as the first step in an analysis is an act that does not treat the data with integrity. That which is lost with a nonlinear transformation of the data can never be recovered by any subsequent analysis, no matter how elegant, profound, or complex that subsequent analysis may be. Even when the objectives and motivation of the subsequent analysis may be correct, appropriate, and well motivated, the distortion of the initial nonlinear transformation makes everything else moot. And when the motivation is to change the message, the use of a nonlinear transformation becomes just one more way to lie with statistics.

I have not tried to address many of the points raised by Breyfogle simply because, no matter what he says or does once he has used a nonlinear transformation to massage his data into saying what he wants it to say, nothing else that follows can ever make sense.



About The Author


Donald J. Wheeler

Dr. Wheeler is a fellow of both the American Statistical Association and the American Society for Quality who has taught more than 1,000 seminars in 17 countries on six continents. He welcomes your questions; you can contact him at djwheeler@spcpress.com.



Sorting out false alarms

By further sorting out false signals, operations need not be bothered with this 2-3% false alarm rate.
Every time a new data point shows up on the control chart, signals should be assessed by a cross-functional team (technical department, operations, engineering, etc.). This way false alarms are filtered and strongly reduced to below 1%.

It is better to live with some 1% false alarms than to have a >5% chance of rejecting real signals; in the end, like a boomerang, they will come back to you...

There are two types of error

There are two types of error: Alpha and Beta. Alpha error is treating common cause variation as if it were special cause and reacting to it. This leads to "tampering" with the system. Beta error is failing to respond to special cause variation, treating it as if it were common cause. YOU CANNOT MINIMIZE BOTH TYPES OF ERROR. As you minimize one error type, the other increases. A good analogy is our system of criminal justice. We want to minimize the chance of convicting innocent people, so the laws are written to protect them. As a result, the chance of not convicting guilty people is increased. You cannot minimize both.

The bottom line: A 2-3% false signal (Alpha error) risk is very acceptable.

S. Moore

Understanding of variation

Key point is that the conclusions drawn from the raw data x using SPC must be consistent with those from f(x) (BPC).

Question is whether it is possible to obtain the same information from the variation of x and the variation of f(x).

From the foregoing articles it can be concluded that by transforming the data to f(x) quite some information about the variation of the raw data x can be lost, so finally arriving at a consistent variation assessment between x and f(x) will be difficult if not impossible...

Also, just as correct sampling is crucial for statistics, correct rational subgrouping is of crucial (!!) importance for a correct SPC analysis of the raw data x. I fully agree with a comment below that in the literature more focus should be given to rules for correct rational subgrouping. SPC is impossible without a correct estimation of the background noise; with an incorrect noise estimation the SPC charts will yield wrong signals, as is the case with bad sampling in statistics.
Rational subgrouping is not an easy exercise, requiring quite some experience, excellent process knowledge, cross-functional input, and repeated in-depth discussions with the process improvement team.

It's an issue of X or Y. Are they both important?

Y=f(X); i.e., the output of a process is a function of its inputs and the process itself. Wheeler has been focusing on the SPC tracking of the X’s, while in my articles I have been focusing on the Y’s. In these articles I also provided a viable way to track so-called "report-card" data sets from high-level performance areas that aggregate many processes. To ignore this type of situation seems short-sighted relative to true business needs. Wheeler appears not to see the value in evaluating this type of data. I wonder if his errant conclusion that you will never see a signal in that type of data comes from opinion and hope without a lot of actual experience. Also, what about process capability statements for a process’s "Y", where common-cause variability is unsatisfactory relative to customer needs? I do not understand why Wheeler did not address this important point that I made in his articles. Wheeler is an SPC icon, but I am not talking about SPC; I am talking about BPC, or business performance charting. In my articles I used both randomly generated and real data sets, which were picked apart but still fundamentally describe how individuals charts on aggregated data can be a part of a business’s decision-making process. I have seen these tools identify clear business process signals when used on the aggregation of data – it works. These tools can be used to examine the business as a whole to identify where data drill-down can provide additional understanding for improvement efforts so that the business as a whole benefits; i.e., goes beyond Lean Six Sigma and the Balanced Scorecard.

Understanding Variation

In Dr. Wheeler's book "Understanding Variation: The Key to Managing Chaos" he takes a "typical" management report that looks for large percent differences (as indications of special causes) and re-analyzes these data in context using XmR charts. He demonstrates that by using this method the real message behind the data can be discovered, and how misguided traditional management reports are.
I don't think rolling up operational data onto a control chart can solve the inherent problem of mixing apples with oranges. Let's not present things as so easy for management that it loses credibility.



There is no doubt that the 'eye' is a powerful analysis tool for graphical pictures, in that very subtle changes can be discerned that would be lost in a table of numbers. The crux of the issue as I see it comes from trying to figure out how to convey to execs adequate discrimination between routine variation and alarm situations, irrespective of the transform question.

Breyfogle's data in his 'Figure 1' in the prior article does indeed contain a mix of change types, lines, as well as shifts, and raises the question of aggregation and transformation. But this is a very real situation. Somewhere along the line, these data have to be rolled up to the executives (in the proverbial report-card approach). What strategy can I use to state GOOD/BAD in as few slides as possible to execs?

--- Breyfogle proposed a single, aggregated chart to tell the exec that all is "Good/Bad" without handing over 50 charts as 'background' OR reducing everything to a 'Red or Green' light. Further, I think he was trying to keep the Red/Green lights from switching too often due to false alarms from normal variation. Breyfogle's main point (as I saw it) is this: suppose that AFTER all the detailed analysis of each of the lines/changes/shifts, all the charts indicate that indeed ALL the processes are stable and in control. Breyfogle suggests developing an aggregate chart to represent a single, rolled-up view of control. This simple behavior chart portrays the 1,000-word snapshot of the process and conveys a more complete picture. Hence, the aggregate is intended to show a better view of normal variation that many can relate to, rather than Red/Yellow/Green colors which provide no depth.

Unfortunately, the subtle differences due to different mfg lines/change types, etc., did cause an inadvertent 'lognormal' relationship and distort the chart. (Lognormal behavior can be expected when underlying processes with widely differing means and variances are merged.) So, Breyfogle chose to transform to convey stability, and hence help convey that indeed the variations are normal random fluctuations.

What you lose are the known problems: Transforms will distort the data; does the transformed chart of the merged data (now with the "appearance" of control) really mean that the processes are in control; how much of a change will be required to see that an action has to be taken (how sensitive to alarm conditions); are we missing problems; and subtle information is lost that is usually seen in the raw data. BUT he proposes that the DIR/VP exec has a short 'summary' that says that all is GOOD or BAD.

--- Wheeler proposes to look at the raw data and use a simple nonparametric 3-sigma rule for control. However, this has a known false-alarm rate of 2-3%. Should we tell operations that they will be responding to 2-3% non-issue alarms due to the '3 sigma' rule? Touting that <5% is robust may not answer the question. He just needs to build that into his cost model. Secondly, the question is one of strategy: should this 'rule' be applied to any chart of data, including an aggregated chart? (How does Wheeler propose we summarize to execs?) Again, would this rule also result in a 2-3% (potential) false alarm rate that becomes visible at the exec level? I wonder what they would say.

Wheeler does adequately point out all the shortcomings of the transform-and-chart approach. No doubt there. But I wonder if he would share (or link us to a prior article we missed) a strategy for how he would take this mess of detail charts and aggregate it into a good visual summary which told all "GOOD/BAD" without an inherent 2-3% risk. Perhaps Wheeler can clarify this aspect in a 'Comment' to this paper.

Excellent Discussion

I thoroughly enjoyed this back and forth discussion on what I consider essential statistics for manufacturing. I laughed, I learned, and I looked forward to the next article. Dr. Wheeler has done a fantastic job of raising the level of understanding of applied statistics to the work place. There are so many low-hanging fruits to be harvested in the manufacturing orchard by utilizing the simple and unconfounded methods he has taught over the years. The business world is complicated enough with its uncertainties. To be able to assess and address variation in a simple and cost-effective manner is key to surviving. Thanks for the great summary of past articles to help highlight the path through the trees.

IMHO this debate has brought

IMHO this debate has brought out the best in both experts...and has helped us all out. In particular, Wheeler's systematic summary of his various articles over the years is VERY helpful, not only as a handy reference, but also as a storyline of how they all meld together to help folks like us actually understand process behavior charts, and implement them.

A suggestion for another article.. when to use a run chart vs. when to use a process behavior chart? e.g., can we still use a run chart to analyze data that violate the rational subgrouping requirement of the process behavior chart? when can we not use either? ..and how about a table of common metrics that are analyzed, and which of those violate the rational subgrouping requirement?

i think a well defined article on that subject would enlighten many..

To transform or not

Thanks for the good articles. It only took me around 20 years into my career to realize the futility of transforming one's data. I'm glad a lot of my clients are patient.

I find the exchange useful

I find the exchange useful and enlightening. Calling two well-known professionals unprofessional is... well... unprofessional. Dr. Wheeler's track record on the practical use of SPC is well-proven. I have used his teachings for many years with great results. "The proof is in the pudding"!!!

Steven J. Moore
Dir. Quality Improvement Systems
Wausau Paper Corp.

Oh, stop already!

This whole back and forth has turned into a cat fight, and nothing more. While it was amusing for a while, it has grown tedious, and I'd like to read about something other than two supposed professionals acting very unprofessionally and refusing to hear the other person's view.