Transforming the Data Can Be Fatal to Your Analysis

If you think you really know your data, look again.

Following my article on Leptokurtophobics (Do You Have Leptokurtophobia?) it was almost inevitable that we should hear from one. We were fortunate to have someone as articulate as Forrest Breyfogle III to write the response. However, rather than offering a critique of the points raised in my original article, he chose to ignore the arguments against transforming the data and to simply repeat his mantra of “transform, transform, transform.” Thirty-five years ago I also thought that way, but now I know better, and out of respect for those who are interested in learning how to better analyze data, I feel the need to further explain why the transformation of data can be fatal to your analysis.

…

Comments

"Just because you can, doesn't mean you should"

While I appreciate Breyfogle's efforts to help organizations implement Six Sigma, I have always depended on, and will continue to depend on, Dr. Wheeler's guidance for successful application of SPC. Dr. Wheeler taught me that with data analysis, "just because you can, doesn't mean you should". Transformation of data falls in that group.

Seeking to Understand

This debate is getting very interesting. The problem is to a large extent that they are not practicing Stephen Covey's 5th habit: "Seek first to understand and then to be understood." I can't fault either of them too much for that because even the great Deming and Juran owe a lot of their disagreement to this.

But to get to the point, I think both authors have valid points. Dr. Wheeler's is the more purist approach, I think, in that he has strictly limited his articles to the concept of process and process behavior.

Mr. Breyfogle, on the other hand, has brought in the concept of a specification limit, his intent being to predict the future, as he has often said in his books and talks. In addition, Mr. Breyfogle is intent on the elimination of the disdainable thing called "fire fighting". In reading his article, it seems one of the principles used is that a large number of "out of control" points would cause a lot of wasted resources in trying to discover their causes and solutions. I think he correctly points out that transformation would eliminate that tendency. However, in immediately doing such a transformation, he would, as Dr. Wheeler correctly states, be throwing out a lot of the "signal" hidden in the data.

So what happens in the real world we live in? I don't believe any serious practitioner of SPC would call for an analysis of every out of control point should they, after plotting the chart, discover that an unexpectedly large number of them exist. Instead, I think he/she would investigate fundamental reasons why the data looks that way and take action dependent on what is discovered. This could even involve data transformation if that is warranted. But the transformation might be a final decision based on analysis of process behavior, and I think Dr. Wheeler would agree with that.

In bringing in the concept of spec limit (which Dr. Wheeler does not address), I think Mr. Breyfogle brings up a valid point if he is looking at specific process output data (in other words, data that might be important to the customer of a product). His intent, apparently, is to demonstrate to the management team the proportion of out-of-spec products that are being produced. This, he says, will drive the appropriate process improvement actions. These can be prioritized and turned into specific projects that will ultimately improve the fundamental process that is producing the defects. This is certainly a valid point, but is not even mentioned in Dr. Wheeler's articles.

Therefore, my conclusion would be that neither author is really listening to the other. As a result, the debate rages on. I'm just waiting for the next article.

- Mike Harkins

Mike, Dr Wheeler's approach

Mike,
Dr Wheeler's approach is not just "purist". It is both practical and realistic. Mr. Breyfogle states "Let’s consider a hypothetical application" ... perhaps he couldn't find any real data to support his six sigma approach? Why attempt to base process management on rare special cases, rather than the majority of situations that people encounter?

It is hypothetically possible that we could be devastated by a meteorite impact tomorrow, but would you attempt to manage your life on the basis of this possibility? Would you spend thousands of dollars learning how to deal with a meteorite impact? You'd probably still get it wrong. That is exactly what Mr. Breyfogle is advocating ... spend thousands of dollars on his six sigma nonsense in order to be well armed, to deal poorly with special cases.

It is indeed sad that most of the world of quality has forgotten the real meaning behind Shewhart's control charts. Shewhart's approach is a revolution. It is fantastic that people like Dr Wheeler are attempting to bring back some sanity, as well as to show just how powerful Shewhart charts really are.

Response to Mike Harkins

Mike, Thank you for your comment. I could have written a whole book in response to Forrest's article, but I had to limit it in order to make the job feasible. I have read Forrest's books, and I think I am aware of his position.
Donald J. Wheeler, Ph.D.
Fellow American Statistical Association
Fellow American Society for Quality

Interesting Discussion

From my perspective, I am looking for practical application here. Dr. Wheeler's approach achieves that for me. I may occasionally ask the wrong question, but more often than not the 3-sigma limits have served me well over the past 15 years. I hope that we don't lose sight of this pragmatic side.

Dear mr. Wheeler, this is a

Dear Mr. Wheeler,

This is a most interesting subject.

I had a look at the 2 individuals plots (Raw data and Log transformed) in Forrest Breyfogle's response article.

From the point of view of stability, the non-transformed individuals plot looks horrible, and according to this assessment stability actions are a definite must. Forrest rejects this conclusion, claiming that the data should be transformed first.
On the other hand, the log-transformed plot has a nice symmetric distribution around the average with only 1 point outside the 3-sigma limits, and this indeed makes it look attractive to assess the process as being stable. However, I think we forget that in order to really assess stability, all individual points need to obey the 4 famous Lack of Control detection rules (*), right? Looking at Forrest's log-transformed individuals plot, one can easily see that quite a few data clusters meet the criteria of lack-of-control detection rules 2, 3, and 4, so here too it can be stated that this process is not stable!

So by strictly applying the Lack of Control Detection rules, my final conclusion would be that both data plots yield equivalent information and Forrest's example points to an unstable process, right?

I think it would be fundamentally wrong if any mathematical transformation were able to hide process instability.

(*) Reference: "Understanding Statistical Process Control" p. 96
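
The rule check described in the comment above can be made concrete with a minimal Python sketch. It assumes that the four detection rules are the Western Electric zone rules (one point beyond 3 sigma; 2 of 3 successive points beyond 2 sigma on the same side; 4 of 5 beyond 1 sigma on the same side; 8 successive points on one side of the central line), and that sigma for the individuals chart is estimated as the average moving range divided by 1.128. The function names are hypothetical and the data from the article are not reproduced here.

```python
from statistics import mean

D2 = 1.128  # bias-correction constant for moving ranges of two points


def xmr_limits(values):
    """Return (center, sigma) for an individuals chart from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(values), mean(moving_ranges) / D2


def _count_beyond(z, i, window, threshold, side):
    """Count points among the `window` values ending at index i that lie more than
    `threshold` sigma units from center on one side (+1 above, -1 below)."""
    if i + 1 < window:
        return 0  # not enough history yet
    return sum(1 for x in z[i - window + 1 : i + 1] if side * x > threshold)


def detection_rules(values, center, sigma):
    """Return {rule number: indices at which that rule signals}."""
    z = [(v - center) / sigma for v in values]
    signals = {1: [], 2: [], 3: [], 4: []}
    for i, zi in enumerate(z):
        if abs(zi) > 3:
            signals[1].append(i)  # Rule 1: a point beyond 3 sigma
        for side in (1, -1):
            # Rules 2 and 3 are reported only at a point that is itself beyond
            # the zone boundary, so one cluster is not flagged repeatedly.
            if side * zi > 2 and _count_beyond(z, i, 3, 2, side) >= 2:
                signals[2].append(i)  # Rule 2: 2 of 3 beyond 2 sigma
            if side * zi > 1 and _count_beyond(z, i, 5, 1, side) >= 4:
                signals[3].append(i)  # Rule 3: 4 of 5 beyond 1 sigma
            if _count_beyond(z, i, 8, 0, side) == 8:
                signals[4].append(i)  # Rule 4: 8 in a row on one side
    return signals
```

Running `detection_rules(data, *xmr_limits(data))` once on the raw values and once on their logarithms would show which rules signal on each scale. The sketch assumes the data are not all identical, since a zero average moving range would make sigma zero.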

Practical use of this discussion

Although I have sometimes tried simple transformations, for example in short-run SPC, even these are very complicated for operators, and the shop floor is the place where we should apply control charts. So in practice I use a much easier approach.
If we have too many false alarms and don't have the resources to work on them, isn't it much easier to simply put the control limits at 4 sigma for selected charts instead of transforming the data? Yes, with 4 sigma I lose some signals of assignable causes, but if I don't have the resources, it is probably not economically feasible to set the limits at 3 sigma for all charts.
If I fix the limits for a selection of charts at 4 sigma, I lose the possibility to see if the process is stable, but if I also show both Cpk and Ppk in the chart, the difference between the two simply shows me whether the process is stable or not, giving the people at the production support level a quick method to assess stability.

Any comments are always welcome
Marc Schaeffers
www.datalyzer.com
mschaeff@iae.nl
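
The comparison suggested in the comment above might look like the following Python sketch. It assumes an individuals chart, with Cpk based on the short-term sigma (average moving range divided by 1.128) and Ppk based on the overall sample standard deviation. The function names and the specification limits passed in are hypothetical.

```python
from statistics import mean, stdev

D2 = 1.128  # bias-correction constant for moving ranges of two points


def within_sigma(values):
    """Short-term sigma estimated from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(moving_ranges) / D2


def chart_limits(values, k=3.0):
    """Lower and upper limits at k sigma; k=4 gives the wider limits discussed."""
    center = mean(values)
    spread = k * within_sigma(values)
    return center - spread, center + spread


def cpk_ppk(values, lsl, usl):
    """Return (Cpk, Ppk) for the given lower and upper specification limits."""
    center = mean(values)
    nearest = min(usl - center, center - lsl)  # distance to the closer spec limit
    cpk = nearest / (3 * within_sigma(values))  # uses short-term variation
    ppk = nearest / (3 * stdev(values))  # uses overall variation
    return cpk, ppk
```

When the process shifts or drifts, the overall standard deviation grows faster than the moving-range estimate, so a Ppk well below Cpk is one hint of instability. This is only a rough indicator and does not locate the signals in time the way a chart does.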

Check for excessive variation on transformed X-bar chart?

A lot of processes are skewed by nature, so they will yield some 2-3% false alarms on the individual X-bar charts, resulting in an unnecessary search for assignable causes and/or expensive firefighting actions.
So my question is whether we could use both approaches for process stability assessment, i.e.:
If the real distribution type is known to be non-normal by physical understanding and enough data collection, would it not be wise to transform it to a normal distribution (Breyfogle) and then inspect the transformed X-bar chart for possible excessive variation using Wheeler's 4 lack of control detection rules?

I appreciate any reply.
Frank
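
The false-alarm question raised in the comment above can be explored with a small simulation. The Python sketch below assumes a stable process whose values follow a lognormal distribution (the shape parameter of 1.0, the sample size, and the seed are arbitrary illustrative choices), and it counts only points outside the 3-sigma limits of an individuals chart, not the run rules. Because the simulated process is stable by construction, every point outside the limits is a false alarm.

```python
import math
import random
from statistics import mean

D2 = 1.128  # bias-correction constant for moving ranges of two points


def fraction_outside(values, k=3.0):
    """Fraction of points outside k-sigma limits computed from moving ranges."""
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = mean(moving_ranges) / D2
    lower, upper = center - k * sigma, center + k * sigma
    return sum(1 for v in values if v < lower or v > upper) / len(values)


def simulate(n=5000, seed=0):
    """Return the false-alarm fractions on the raw and on the log scale."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    raw = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    logged = [math.log(v) for v in raw]  # exactly normal on the log scale
    return fraction_outside(raw), fraction_outside(logged)
```

Calling `simulate()` gives the two fractions side by side. The simulation says nothing about how either chart responds when the process actually changes, which is the point at issue between the two authors.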