© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.

“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.

Published on *Quality Digest* (https://www.qualitydigest.com)

**Published:** 09/10/2015

It’s a cold winter’s night in northern New Hampshire. You go out to the woodshed to grab a couple more logs, but as you approach, you hear a rustling inside the shed. You’ve gotten close enough to know you have a critter in the woodpile. You run back inside, bolt the door, hunker down with your .30-06, and prepare for a cold, fireless night.

Analyzing data using common tools like F-tests, t-tests, transformations, and ANOVA methods is a lot like that scenario. They can tell you that you’ve got a critter in the woodshed, but they can’t tell you whether it’s a possum or a black bear. You need to take a look inside to figure this out. Limiting data analysis to the results that you get from the tools cited above is almost always going to lead to missed information and, often, to wrong decisions. Charting is the way to take a look inside your data.

In this article, I will explore data sets that illustrate this point. I’ve chosen two specific sets, but the truths are basic and apply to all data sets; there are many others I could have chosen as examples. Both data sets used here are real, not simulated. First I’ll look at the groups using classical methods, and then I’ll look again using control chart methods. I’ll leave the final decision about charting *your* data to you.

A major change was made on a process. Data were collected from parts made before the change and compared to a group made after the change. The question (as initially proposed) was, “Are the two groups the same?” Rather than going into the complexities of asking all of the questions needed to validate this question, or getting into confidence intervals and errors of the estimates, I’ll reword the question as, “With this data set, are there detectable differences between the two groups?” because this is probably what the asker really wanted to know.

To answer this question, I perform a test for the differences in standard deviations (or variances) and a test of the differences in averages (means). Excel quickly provided the statistics below, and it can also run both tests, but for these analyses I’ll use Minitab. I’ll conduct the test of the variances first because I need to know whether the variances differ significantly in order to choose the correct version of the test of the averages.

Both tests above indicate that I could expect to see differences in standard deviations this large, or larger, just due to sampling. This means I can assume equal variances for testing the difference in means.

The t-test indicates that given the number of samples, and the variation of the processes, the chance of seeing this big a difference in averages due to chance is nil.
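The two-step workflow described above can be sketched in a few lines of SciPy. This is a minimal illustration only: the measurements are placeholder values invented for the example (the article’s actual data aren’t reproduced in the text), and Levene’s test is used here as a common, robust stand-in for a test of equal variances.

```python
# Sketch of the comparison described above: test the variances first,
# then run the matching version of the two-sample t-test.
# The data are illustrative placeholders, not the article's measurements.
from scipy import stats

before = [10.2, 10.5, 9.8, 10.1, 10.4, 9.9, 10.3, 10.0]
after = [10.9, 11.2, 10.8, 11.0, 11.3, 10.7, 11.1, 10.9]

# Levene's test for equal variances (a robust alternative to the F-test)
lev_stat, lev_p = stats.levene(before, after)
equal_var = bool(lev_p > 0.05)  # fail to reject -> use the pooled t-test

# Two-sample t-test; equal_var selects the pooled vs. Welch's version
t_stat, t_p = stats.ttest_ind(before, after, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}, t-test p = {t_p:.5f}")
```

With these placeholder values the spreads are similar but the averages clearly differ, mirroring the article’s result: equal variances can be assumed, and the difference in means is highly significant.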

The answer to the question, “With this data set, are there detectable differences between the two groups?” is “Yes, there are detectable differences.” The variation is about the same, but there’s a significant difference in the averages.

The data describe a process that makes several parts at a time. Grouping by the time each set came off the machine would reduce the data to a few averages and standard deviations. Sometimes this is enough to give an indication if anything else is going on, but it would be sketchy: A lot could still be hiding in that woodshed. I’m going to use a moving range chart. I can still separate it into larger groups if I want to, but this method will allow me to see all of the data.
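Since the article’s charts themselves live in the linked images, here is a minimal plain-Python sketch of how the limits behind an individuals-and-moving-range (XmR) chart are computed, using the standard constants 2.66 (3/d2, with d2 = 1.128 for subgroups of two) and 3.268 (D4 for subgroups of two). The data are placeholders, not the article’s.

```python
# Sketch of XmR (individuals and moving range) chart limits.
def xmr_limits(data):
    """Return (center, lcl, ucl, mr_bar, mr_ucl) for an XmR chart."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(data) / len(data)
    # Standard XmR constants: 2.66 = 3/d2 (d2 = 1.128 for n = 2),
    # 3.268 = D4 for n = 2.
    return (center, center - 2.66 * mr_bar, center + 2.66 * mr_bar,
            mr_bar, 3.268 * mr_bar)

# Placeholder data for illustration
values = [10.1, 10.4, 9.9, 10.2, 11.0, 10.3, 9.8, 10.5]
center, lcl, ucl, mr_bar, mr_ucl = xmr_limits(values)
out_of_limits = [x for x in values if x < lcl or x > ucl]
```

Because every individual value is plotted against limits derived from the moment-to-moment variation, nothing gets averaged away; that is what lets the chart expose the drifts, freaks, and instability discussed below.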

The chart quickly reveals several things I didn’t get from the initial analysis:

• The range chart indicates that even though the total spread of the data might not be as great, the “before” process has more moment-to-moment variation.

• The measure of dispersion used in the classic tests of means and variances assumes that the data are homogeneous. It can’t differentiate between variation due to the process’s inherent capability and variation due to instability.

• Both processes show a lack of stability. Any attempts at estimating defect rates will be fallacious no matter how you transform the data. You may find a model that fits the data today, but it will be meaningless tomorrow. Again, these transformations count on a homogeneous data set.

• Both processes exhibit a lot of “freaks” that are outside of the normal model used for the control chart; however, nothing in the conventional methods even hints at this, especially if we just transform the data to make them disappear.

• The “after” process is drifting, inflating its total spread. This is visible on the average chart. Shifting of the average in the “after” group is confounding standard analysis and requires further investigation.

• Based on the range over which the “after” process drifted, it’s not unreasonable to assume that, if stabilized, the process could be adjusted to run at about the same average as the “before” data, meaning that the difference in averages may not really be a problem.

“These [samples] were taken and measured in sequential order,” reads the original note about these data from 1989. “No adjustments were made during sample collection.” To arrive at the data, 125 parts were taken and measured in the shortest time frame possible, a classic capability study if ever there was one. The data, as well as both types of capability analysis that are available in Minitab, follow:

Although the data don’t form a perfect bell curve and appear to be a little heavy on the low end, the P value on the Anderson-Darling probability chart indicates that I could expect to see a value this high about 12 times out of a hundred with normal data. Kurtosis and skew values are also relatively small. There’s no indication that these data are anything but normal, so I won’t transform the data. The Cp is 0.88 and the Cpk is 0.55 (according to two of the three methods): an unacceptable process, but a sound capability analysis. So the question is, “What is wrong with this?”

The short answer is, “Just about everything.” The basic assumption behind all of the above conclusions is flawed. The results above, computed from summary statistics, assume that the data are from a stable, consistent, homogeneous source. Both the control chart and the point plot tell us at a glance that this is not a valid assumption. This process is unstable. Any conclusions drawn from the summary statistics, as well as any transformations that I might have done, are meaningless if the data don’t all come from the same process. These parameters and tests assume that if a given value has a certain probability of occurring in one sampling, it will have the same probability of occurring in any other sampling. This is not true in an unstable process.
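For reference, the summary-statistic capability indices in question are computed as sketched below. The data and specification limits are placeholders (neither the 125 measurements nor the spec limits appear in the text), and the computation inherits exactly the homogeneity assumption criticized above: it reduces everything to a mean and a standard deviation.

```python
import statistics

def capability(data, lsl, usl):
    """Cp and Cpk from the overall (sample) standard deviation.
    Minitab's 'within' variant instead estimates sigma from the average
    moving range divided by d2 = 1.128; either way, the indices assume
    the data come from a single, stable, homogeneous process."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    cp = (usl - lsl) / (6 * sigma)          # spec width vs. process width
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # penalizes off-center mean
    return cp, cpk

# Placeholder data and spec limits -- not the article's capability study.
cp, cpk = capability([9.0, 10.0, 11.0], lsl=7.0, usl=13.0)
```

If the process is unstable, the sigma fed into these formulas describes no real process at all, which is why the chart has to come first.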

Although we are told to “always chart our data,” usually that occurs as an afterthought. Often, when charts are readily available, such as in the Minitab analysis above, they are only glanced at or even ignored. Charting the data (and looking at and understanding the chart) must be the first step in an analysis, not the last. Only by doing so can a meaningful decision be made about any additional analysis.

There are a lot more unstable processes out there than stable ones. When properly used, control charts can tell you so much more about your data than conventional statistical methods, including whether your (stable) process is actually skewed or not. During the last several years, I’ve heard less and less about the use of control charts, especially as an analytical tool. Designed experiments are sexy, but unless you’re using ANOM techniques, who gives a moment’s thought to within-treatment stability?

“The problem is not with the choice of the model or with the mathematics, but rather with the assumption that the data were homogeneous,” says Donald J. Wheeler in the *Quality Digest Daily* article, “Why We Keep Having 100-Year Floods.” “Anytime we compute a summary statistic, or fit a probability model, or do just about anything else in statistics, there is an implicit assumption that the data, on some level, are homogeneous. If this assumption of homogeneity is incorrect, then all of our computations, and all of our conclusions, are questionable.”

**Links:**

[1] http://www.urbandictionary.com/define.php?term=.30-06

[2] http://www.qualitydigest.com/IQedit/Images/Articles_and_Columns/2015/Sept_2015/ChartData/Before-vs-After-IMAGE.jpg

[3] http://www.qualitydigest.com/IQedit/Images/Articles_and_Columns/2015/Sept_2015/ChartData/DataSet2.jpg

[4] http://www.qualitydigest.com/IQedit/Images/Articles_and_Columns/2015/Sept_2015/ChartData/DataSet2-Composite.jpg

[5] http://www.qualitydigest.com/inside/quality-insider-column/why-we-keep-having-100-year-floods.html