Published: Thursday, September 3, 2009  04:00
Following my article on Leptokurtophobics (“Do You Have Leptokurtophobia?”) it was almost inevitable that we should hear from one. We were fortunate to have someone as articulate as Forrest Breyfogle III to write the response. However, rather than offering a critique of the points raised in my original article, he chose to ignore the arguments against transforming the data and to simply repeat his mantra of “transform, transform, transform.” Thirty-five years ago I also thought that way, but now I know better, and out of respect for those who are interested in learning how to better analyze data, I feel the need to further explain why the transformation of data can be fatal to your analysis.
Starting on page 275 of Economic Control of Quality of Manufactured Product (American Society for Quality, 1980), author Walter A. Shewhart described two completely different approaches to working with data. The first of these approaches we will call the statistical approach, since it describes how we create a test statistic. This statistical approach consists of four steps:
1. Choose an appropriate probability model to use.
2. Choose some risk of a false alarm to use.
3. Find the exact critical values for the selected model that correspond to this risk of a false alarm, or else transform the selected model to match some known critical values.
4. Then use these critical values in your analysis.
While this can make sense when working with functions of the data (i.e., statistics) it does not work when applied to the original data themselves. As Shewhart points out, we will never have enough data to uniquely identify a specific probability model. Probability models are limiting functions for infinite sequences, and therefore, they can never be said to apply to any finite portion of that sequence. This is why any assumption of a probability model is just that—an unverifiable assumption. Shewhart goes on to say that even if we did assume a probability model, we still would not know the mean or the variance of that model.
So what are we to do when we try to analyze data? Shewhart suggests a different approach for the analysis of original data. Shewhart’s approach also consists of four steps: (1) Choose some generic critical values for which (2) the risk of a false alarm will be reasonably small (3) regardless of what probability model we might choose, and (4) use these generic critical values in our analysis. This approach changes what is fixed and what is allowed to vary. With the statistical approach the alpha-level is fixed, and the critical values vary to match the specific probability model. With Shewhart’s approach it is the critical values that are fixed and the alpha-level that is allowed to vary. This reversal of the statistical approach is what makes Shewhart’s approach so hard for those with statistical training to understand.
Statistical Approach
1. Choose an appropriate probability model.
2. Choose some risk of a false alarm.
3. Find exact critical values (or else transform data to match some known critical values).
4. Use these critical values in the analysis.

Shewhart’s Approach
1. Choose some generic critical values for which…
2. …the risk of a false alarm will be small…
3. …regardless of what probability model we might choose.
4. Use these generic critical values in the analysis.
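The contrast between the two approaches can be sketched numerically. The following is a minimal illustration, not something taken from Shewhart's book: the three models (standard normal, uniform on [0, 1], exponential with rate 1), the equal-tail split of alpha, and the function names are all assumptions chosen for the example. The first function fixes alpha and lets the critical values move with the model; the second fixes the limits at the mean plus or minus three standard deviations and lets alpha move.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()

# Illustrative models (an assumption of this sketch):
# name -> (mean, standard deviation, cdf, inverse cdf)
MODELS = {
    "normal(0,1)": (0.0, 1.0, _std_normal.cdf, _std_normal.inv_cdf),
    "uniform(0,1)": (
        0.5,
        math.sqrt(1.0 / 12.0),
        lambda x: min(max(x, 0.0), 1.0),
        lambda p: p,
    ),
    "exponential(1)": (
        1.0,
        1.0,
        lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0,
        lambda p: -math.log(1.0 - p),
    ),
}


def statistical_approach(model, alpha):
    """Fix alpha, then find the model's exact (equal-tail) critical values."""
    _, _, _, inv_cdf = MODELS[model]
    return inv_cdf(alpha / 2.0), inv_cdf(1.0 - alpha / 2.0)


def shewhart_approach(model, k=3.0):
    """Fix the limits at mean +/- k sd, then see what alpha the model implies."""
    mean, sd, cdf, _ = MODELS[model]
    lower, upper = mean - k * sd, mean + k * sd
    return cdf(lower) + (1.0 - cdf(upper))


if __name__ == "__main__":
    for name in MODELS:
        low, high = statistical_approach(name, alpha=0.0027)
        alpha = shewhart_approach(name)
        print(f"{name:15s} fixed alpha -> limits ({low:.4f}, {high:.4f}); "
              f"fixed 3-sigma -> alpha {alpha:.4f}")
```

In this sketch the three-sigma limits leave nothing outside for the uniform model and roughly 1.8 percent outside for the exponential model, while the fixed-alpha limits have to be recomputed for every model.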
Shewhart’s generic solution to the problem of how to analyze a stream of data was to use three-sigma limits. These limits have been used around the world for more than 70 years and they have been thoroughly proven. They strike a reasonable balance between the economic consequences of the twin errors of failing to detect signals and having false alarms. Three-sigma limits have been proven to work, and this is a fact of life, not a matter of opinion. To quote a note penciled into one of my books by W. Edwards Deming, “This means that even wide departures from normality will have virtually no effect upon the way the control chart functions.”
However, when someone does not appreciate the difference between the statistical approach and Shewhart’s approach it is almost inevitable that they will get lost trying to apply the statistical approach to original data.
In my earlier article, “Do You Have Leptokurtophobia?” I pointed out how three-sigma limits will filter out virtually all of the routine variation regardless of the shape of the histogram. I illustrated this with six specific examples ranging from the uniform to the exponential, and described a broader study encompassing more than 1,100 probability models where 97.3 percent of these models had better than 97.5-percent coverage at three-sigma limits. This kind of behavior is the very definition of robustness.
Note that there has never been any claim that the area outside the three-sigma limits will remain constant regardless of the probability model, just that the alpha-level will remain reasonably small. With any statistical procedure we will change the risk of a false alarm whenever we change the original probability model. A robust procedure is one where a conservative alpha-level (generally taken as anything under 5%) will remain conservative and a traditional alpha-level (under 10%) will remain traditional.
But Breyfogle wants to prove that the X Chart is “not robust” to nonnormal data. To this end he uses a lognormal probability model that has a skewness of 6.2 and a kurtosis of 113.9. (To understand just how extreme this model is, it might be helpful to know that virtually all reasonable models for original data will have a skewness of less than 2.5 and a kurtosis of less than 10. If you have a histogram of real data that has skewness and kurtosis statistics that fall outside of the range above, it is a virtual certainty that the underlying process is out-of-control.)
So, in his quest to show that the X Chart is not robust to nonnormal data, Breyfogle selected a very, very extreme probability model. This lognormal probability model has 98.19 percent of its area contained within the interval defined by the mean plus or minus three standard deviations.
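The quoted figures can be checked from the closed-form moments of a lognormal distribution. The sketch below assumes the model is a lognormal whose logarithm has mean 0 and standard deviation 1. The article gives only the skewness and kurtosis, so this parameter choice is an inference made because it reproduces those two numbers. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_summary(mu=0.0, sigma=1.0, k=3.0):
    """Moments of a lognormal model and its coverage within mean +/- k sd.

    mu and sigma are the mean and standard deviation of log(X).
    sigma=1 is an assumption of this sketch, not a value stated in the article.
    """
    w = math.exp(sigma ** 2)
    mean = math.exp(mu + sigma ** 2 / 2.0)
    sd = mean * math.sqrt(w - 1.0)
    skewness = (w + 2.0) * math.sqrt(w - 1.0)
    # Kurtosis on the scale where the normal distribution has kurtosis 3.
    kurtosis = w ** 4 + 2.0 * w ** 3 + 3.0 * w ** 2 - 3.0

    log_model = NormalDist(mu, sigma)

    def cdf(x):
        # A lognormal variable is never negative, so the cdf is 0 at or below 0.
        return log_model.cdf(math.log(x)) if x > 0 else 0.0

    lower, upper = mean - k * sd, mean + k * sd
    coverage = cdf(upper) - cdf(lower)
    return {
        "mean": mean,
        "sd": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "lower": lower,
        "upper": upper,
        "coverage": coverage,
    }


if __name__ == "__main__":
    for key, value in lognormal_summary().items():
        print(f"{key:9s} {value:10.4f}")
```

Under this assumption the lower three-sigma value is negative, so it lies below the boundary at zero and all of the uncovered area sits in the upper tail.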
However, rather than looking at this theoretical value, he generated 1,000 observations from this model and placed them on an X Chart. He found 3.3 percent of the values in his sample outside the limits on this chart. He then complains that this is not the three out of 1,000 that we expect when using a normal distribution. But where is the surprise in this? The alpha-level is supposed to vary. When we go from a kurtosis of three to a kurtosis of 114, we should expect an increase in the area outside the three-sigma limits. The fact that it changes so little is the real surprise here. Thus, the very behavior that Breyfogle cites as evidence of a lack of robustness is, in fact, a stunning demonstration of robustness: the XmR Chart still yields a conservative false alarm rate in spite of the extreme kurtosis.
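An experiment of this kind is easy to repeat. The sketch below is not Breyfogle's code or data. It assumes a lognormal model with log-scale mean 0 and standard deviation 1, an arbitrary random seed, and limits computed in the conventional XmR way (the average plus or minus 2.66 times the average moving range), since the article does not say how his limits were obtained. The fraction it reports will vary from sample to sample.

```python
import random
import statistics


def xmr_limits(values):
    """Natural process limits for an X chart based on the average moving range."""
    centre = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    # 2.66 is the usual XmR scaling factor (3 divided by 1.128).
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar


def fraction_outside(values):
    """Share of the observations falling outside the computed limits."""
    lower, _, upper = xmr_limits(values)
    outside = sum(1 for v in values if v < lower or v > upper)
    return outside / len(values)


# Assumed model and seed, for illustration only.
rng = random.Random(2009)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]
frac = fraction_outside(sample)

if __name__ == "__main__":
    lower, centre, upper = xmr_limits(sample)
    print(f"limits: {lower:.3f} to {upper:.3f}, central line {centre:.3f}")
    print(f"fraction outside limits: {frac:.3%}")
```

Because the data are lognormal, the computed lower limit will typically fall below zero, which is the one-sided situation where the boundary value takes precedence.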
Next Breyfogle notes that his observations on the X Chart are not symmetrically spread out around the central line in a “random scatter pattern.” Once again, where is the surprise? These pseudo-data are lognormal. As I noted in my earlier article, whenever we have skewed data there will be a boundary value on one side that will fall inside the computed three-sigma limits. When this happens, the boundary value takes precedence over the computed limit and we end up with a one-sided chart. The requirement that the values be symmetrically spread out around the central line in a “random scatter pattern” is an interpretative guideline for use with plots of residuals. This is a completely different type of analysis than the analysis of the original data.
When we fit a regression model to a set of data we are trying to explain most of the variation in the response variable. When we do this successfully, the variation that remains between the data and the fitted model is known as the residuals. When we look at these residuals, we would like to see a symmetric plot since any lack of symmetry would suggest that we have used the wrong model in our regression. Moreover, it is also appropriate to check these residuals for a detectable lack of normality since the residuals should be nothing but noise, and the classic model for noise is the bell-shaped curve. Unfortunately, we cannot, like Breyfogle, simply take analyses and guidelines that are appropriate for the residuals from a regression model and apply them to the original data. To do so is to demonstrate a misunderstanding of the concepts behind the techniques of statistical analysis.
Finally Breyfogle complains that, from an operational perspective, the original data give too many false alarms, and that we need to transform the data to eliminate these false alarms. However, the situation he describes is one where these pseudo-data represent years of production and are being looked at in a retrospective manner. If you are interested in looking for assignable causes you need to use the process behavior chart (control chart) in real time. In a retrospective use of the chart you are unlikely to ever look for any assignable causes, so where is the problem?
In any sequence of 1,000 real data, I would expect to find signals, and the points outside the limits are the place to start looking for the assignable causes behind those signals. While some of the points outside the limits may be false alarms, with 1,000 real data it is essentially inevitable that many of the points outside the limits will be signals. Occasional false alarms are a reasonable price to pay to avoid missing the signals that you need to know about. While a process behavior chart may be used for the one-time analysis of retrospective data, it is important to understand that the chart was created for use as a sequential analysis procedure, and that its retrospective use changes the way we interpret the chart. Here the emphasis is no longer upon using the individual points to identify assignable causes, but rather on the overall behavior displayed on the chart. (While we know that Breyfogle’s observations are synthetic, and contain no signals, his original X Chart, if it were based upon real data, would justify the judgment that the underlying process is subject to assignable causes. This justification would come from the extent to which several of the points exceed the upper limit, rather than being based upon how many points fall outside the limits. Once again, this is because reasonable models for original data do not typically have kurtosis values greater than 10.)
When he transforms his extremely skewed and leptokurtic data he gets a wonderful bell-shaped histogram, which should not be a surprise. He also changes the false-alarm rate from the 3.3 percent back to a normal theory value of one in 1,000. Again, no surprise, this is exactly what should happen. However, in practice, it is not the false-alarm rate that we are concerned with, but rather the ability to detect signals of process changes. And that is why I used real data in my article. There we saw that a nonlinear transformation may make the histogram look more bell-shaped, but in addition to distorting the original data, it also tends to hide all of the signals contained within those data.
The important fact about nonlinear transformations is not that they reduce the false-alarm rate, but rather that they obscure the signals of process change. Since, presumably, the purpose of analysis is discovery, this tendency of nonlinear transformations to obliterate the signals contained within the data makes their use on the original data completely inappropriate. Nowhere in Breyfogle’s article does he address this point.
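A toy illustration of how a logarithmic transformation can hide a signal follows. The series is constructed for this sketch and is hypothetical. It is not the real data from the earlier article, and the value 12 is deliberately planted to stand in for a process upset. The limits again assume the conventional XmR calculation (average plus or minus 2.66 times the average moving range). The example shows only that the effect can occur, not how often it does.

```python
import math
import statistics


def xmr_limits(values):
    """X chart limits: average +/- 2.66 times the average moving range."""
    centre = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    return centre - 2.66 * mr_bar, centre + 2.66 * mr_bar


def points_outside(values):
    """Zero-based positions of the values falling outside the limits."""
    lower, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical series: routine values between 1 and 4, with one planted upset (12).
raw = [1, 2, 1, 3, 2, 1, 4, 2, 1, 2, 3, 1, 2, 1, 12, 2, 1, 3, 2, 1]
logged = [math.log(v) for v in raw]

raw_signals = points_outside(raw)
log_signals = points_outside(logged)

if __name__ == "__main__":
    print("raw limits:", xmr_limits(raw), "signals at", raw_signals)
    print("log limits:", xmr_limits(logged), "signals at", log_signals)
```

On the original scale the planted value falls above the upper limit (about 8.5), so the chart flags it. After taking logarithms the same point, ln 12 (about 2.48), sits inside the upper limit (about 3.0), and the chart shows no signal at all.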
The first step in data analysis has nothing to do with what probability model is appropriate. Data are never generated by a probability model. Rather they are generated by a process or system that can change without warning. This is why the primary question of data analysis is concerned with whether or not the data are homogeneous. If the data are homogeneous, then it might make sense to select some probability model to represent the data. But if the data are not homogeneous, then no single probability model will ever be appropriate since the process is changing. The process behavior chart was deliberately created and intended for use in this initial step of analysis. It examines the data for evidence of a lack of homogeneity that might indicate changes in the process where no changes ought to have occurred. If you transform the data to make them appear to be “more normal,” you are likely to end up with a beautiful, but completely incorrect, analysis.
Remember Daniel Boorstin’s caution, “The greatest obstacle to discovery is not ignorance, but rather the illusion of knowledge.” Once you have been taught erroneous ideas, it is hard to change.
Comments
Check for excessive variation on transformed Xbar chart?
A lot of processes are skewed by nature, so they will yield some 2-3% false alarms on the individual Xbar charts, resulting in unnecessary searches for assignable causes and/or expensive firefighting actions.
So my question is whether we could use both approaches for process stability assessment, i.e.:
If the real distribution type is known to be nonnormal by physical understanding and enough data collection, would it not be wise to transform it to a normal distribution (Breyfogle) and then inspect the transformed Xbar chart for possible excessive variation using Wheeler's 4 lack of control detection rules?
I appreciate any reply.
Frank
Practical use of this discussion
Although I sometimes tried simple transformations, for example in short run SPC, even these simple transformations are very complicated for operators, and the shop floor is the place where we should apply control charts. So in practice I use a much easier approach to this discussion.
If we have too many false alarms and don't have the resources to work on them, isn't it much easier to simply put the control limits at 4 sigma for selected charts instead of transforming the data? Yes, with 4 sigma I lose some signals of assignable causes, but if I don't have the resources it is probably not economically feasible to set the limits at 3 sigma for all charts.
If I fix the limits for a selection of charts at 4 sigma I lose the possibility to see if the process is stable, but if I also show both Cpk and Ppk in the chart, the difference between the two simply shows me if the process is stable or not, giving the people at the production support level a quick method to assess stability.
Any comments are always welcome
Marc Schaeffers
www.datalyzer.com
mschaeff@iae.nl
Dear Mr. Wheeler,
this is a most interesting subject,
I had a look at the 2 individuals plots (Raw data and Log transformed) in Forrest Breyfogle's response article.
From point of view of stability the nontransformed individual plot looks horrible and no doubt according to this assessment stability actions are a definite must!! This statement is rejected by Forrest as he claims that data should be transformed first.
On the other hand the log transformed plot has a nice symmetric distribution around the average with only 1 point outside the 3-sigma limits, and indeed this looks attractive for assessing the process as being stable. However (!!) I think we forget that in order to really assess stability all individual points need to obey the 4 famous Lack of Control detection rules (*), right? So looking at Forrest's log normal individual plot one could easily see that quite a few data clusters meet the criteria of lack of control detection rules 2, 3 and 4, so also here it can be stated that this process is not stable!
So by strictly applying the Lack of Control Detection rules my final conclusion would be that both data plots yield equivalent information and Forrest's example points to an unstable process, right?
I think it would be fundamentally wrong if any mathematical transformation were able to hide process instability.
(*) Reference: "Understanding Statistical Process Control" p. 96
Interesting Discussion
From my perspective, I am looking for practical application here. Dr. Wheeler's approach achieves that for me. I may occasionally ask the wrong question, but more often than not the 3-sigma limits have served me well over the past 15 years. I hope that we don't lose sight of this pragmatic side.
Seeking to Understand
This debate is getting very interesting. The problem is to a large extent that they are not practicing Stephen Covey's 5th habit: "Seek first to understand and then to be understood." I can't fault either of them too much for that because even the great Deming and Juran owe a lot of their disagreement to this.
But to get to the point, I think both authors have valid points. Dr. Wheeler's is the more purist approach, I think, in that he has strictly limited his articles to the concept of process and process behavior.
Mr. Breyfogle, on the other hand, has brought in the concept of a specification limit. His intent being to predict the future, as he has often said in his books and talks. In addition, Mr. Breyfogle is intent on the elimination of the disdainable thing called "fire fighting". In reading his article, it seems one of the principles used is that a large number of "out of control" points would cause a lot of wasted resource trying to discover their causes and solutions. I think he correctly points out that transformation would eliminate that tendency. However, in immediately doing such a transformation, he would, as Dr. Wheeler correctly states, be throwing out a lot of the "signal" hidden in the data.
So what happens in the real world we live in? I don't believe any serious practitioner of SPC would call for an analysis of every out of control point should they, after plotting the chart, discover that an unexpectedly large number of them exist. Instead, I think he/she would investigate fundamental reasons why the data looks that way and take action dependent on what is discovered. This could even involve data transformation if that is warranted. But the transformation might be a final decision based on analysis of process behavior, and I think Dr. Wheeler would agree with that.
In bringing in the concept of spec limit (which Dr. Wheeler does not address), I think Mr. Breyfogle brings up a valid point if he is looking at specific process output data (in other words, data that might be important to the customer of a product). His intent, apparently, is to demonstrate to the management team the proportion of out-of-spec products that are being produced. This, he says, will drive the appropriate process improvement actions. These can be prioritized and turned into specific projects that will ultimately improve the fundamental process that is producing the defects. This is certainly a valid point, but is not even mentioned in Dr. Wheeler's articles.
Therefore, my conclusion would be that both authors are not really listening to each other. As a result, the debate rages on. I'm just waiting for the next article.
 Mike Harkins
Response to Mike Harkins
Mike, thank you for your comment. I could have written a whole book in response to Forrest's article, but I had to limit it in order to make the job feasible. I have read Forrest's books, and I think I am aware of his position.
Donald J. Wheeler, Ph.D.
Fellow American Statistical Association
Fellow American Society for Quality
Mike,
Dr Wheeler's approach is not just "purist". It is both practical and realistic. Mr. Breyfogle states "Let’s consider a hypothetical application" ... perhaps he couldn't find any real data to support his six sigma approach? Why attempt to base process management on rare special cases, rather than the majority of situations that people encounter?
It is hypothetically possible that we could be devastated by a meteorite impact tomorrow, but would you attempt to manage your life on the basis of this possibility? Would you spend thousands of dollars learning how to deal with a meteorite impact? You'd probably still get it wrong. That is exactly what Mr. Breyfogle is advocating ... spend thousands of dollars on his six sigma nonsense in order to be well armed, to deal poorly with special cases.
It is indeed sad that most of the world of quality has forgotten the real meaning behind Shewhart's control charts. Shewhart's approach is a revolution. It is fantastic that people like Dr Wheeler are attempting to bring back some sanity, as well as to show just how powerful Shewhart charts really are.
"Just because you can, doesn't mean you should"
While I appreciate Breyfogle's efforts to help organizations implement Six Sigma, I have always depended on, and will continue to depend on, Dr. Wheeler's guidance for successful application of SPC. Dr. Wheeler taught me that with data analysis, "just because you can, doesn't mean you should". Transformation of data falls in that group.