Non-normal data: To Transform or Not to Transform

Sometimes you need to transform non-normal data.

Bio

Smarter Solutions

Story update 8/24/2009: The original graphics for this story were missing key data due to errors in converting them. We have fixed the problem.

n “Do You Have Leptokurtophobia?” Don Wheeler stated, “‘But the software suggests transforming the data!’ Such advice is simply another piece of confusion. The fallacy of transforming the data is as follows:

“The first principle for understanding data is that no data have meaning apart from their context. Analysis begins with context, is driven by context, and ends with the results being interpreted in the context of the original data. This principle requires that there must always be a link between what you do with the data and the original context for the data. Any transformation of the data risks breaking this linkage. If a transformation makes sense both in terms of the original data and the objectives of the analysis, then it will be okay to use that transformation. Only you as the user can determine when a transformation will make sense in the context of the data. (The software cannot do this because it will never know the context.) Moreover, since these sensible transformations will tend to be fairly simple in nature, they do not tend to distort the data.”

…

Want to continue?

By logging in you agree to receive communication from Quality Digest. Privacy Policy.

Create a FREE account

Forgot My Password

Comments

After Transform Control Limits

UCL is 5.14 after Box Cox Transformation.
If this control limit will use for future process monitoring, it is appropriate to transform back to original context of the data, which is e^5.14=2.71828^5.14=170in?
Regards,
Chee Han

After Transform Control Limits

Chee, glad you brought this up. Since the primary purpose of the control chart is to assess process stability, there is no compelling reason to examine stable-process, control-limit values. For a continuous response, process capability (and its variability impact) would be quantified in regions of stability through examination of the probability plot, which would have data axes that are in the context of the original (untransformed) data.

Real world control charts

The crux of this discussion is Breyfogle's statement: "Let’s consider a hypothetical application." Why not a real situation with some real data ? Is real data so hard to find ?

Process management is about practical solutions to real world problems, not sitting in ivory towers creating rare and imaginary special cases. ( At this point I can hear some pathetic voice claiming that charting "tool wear", is what process management is all about. "Tool wear" is Mikel Harry's favorite for his ridiculous "drifts and shifts" )

Of course, data transforms do have a real purpose ... in convincing the masses that they need to buy special software and special training to use it. Creating unnecessary complexity and mystique in quality improvement, not only sells software and six sigma training courses but it puts more power in the hands of those jealously guarding their six sigma belts (and higher salaries). Deming fought against such elitism and taught how quality improvement is simple enough for everyone to use. Shewhart taught how simple, "standard" control charts are very effective in virtually all real world situations.

It is time for the ivory towers to be cut down and to get back to basics.