Donald J. Wheeler  |  03/31/2009

No Data Have Meaning Without Context

How should you organize and plot your data?

As Davis Balestracci frequently emphasized in his column, “RealWorldSPC,” published in Quality Digest for four years, it is fundamental to understand the context of the data before you begin to do any computations. It is the background for your data that determines how you should organize the data, how you should analyze the data, and how you should interpret the results of your analysis. Once you ignore the context, you’re like a train that has gone off the track, with the inevitable result.

One day a company sent me some data that it had spent more than a month collecting. These data represented the results of an experimental study carried out using production batches. For each of 30 batches the company recorded all sorts of production information, along with the experimental conditions that applied to that batch. At the end of the production process it took 40 items from each batch and measured the property of interest. Thus, it had a total of 1,200 values: 40 values for each of the 30 batches.

With 40 values per batch, the company got busy and drew a histogram of the values for each batch. It also computed the average, the standard deviation statistic, and even the skewness and kurtosis statistics. In its attempt to make sense of this mountain of data, it created a summary sheet for each batch. These 30 summary sheets each contained the production and experimental information for each batch, along with the histogram and the set of descriptive statistics. Having gotten this far, the company had stalled out. There was still too much information to assimilate; hence it decided to call on my services.

As the company was preparing to send me the data, someone realized that I was likely to ask for a control chart, so the data were placed on an average and standard deviation chart using the 30 subgroups of size 40 defined by the 30 batches. This chart was stapled onto the stack of 30 summary sheets, and the entire collection was delivered to my office.
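For readers who want to see what an average and standard deviation chart involves, here is a minimal Python sketch using the conventional textbook formulas: the grand average plus or minus A3 times the average standard deviation for the average chart, and B3 and B4 times the average standard deviation for the standard deviation chart, with A3, B3, and B4 derived from the bias-correction constant c4. The function names and the list-of-lists input are assumptions made for illustration; this is not the company's computation or the author's own procedure. Notice that nothing in the limit computation depends on the order of the subgroups, which is how a chart can be computed correctly and still be organized in a way that ignores the context.

```python
import math
from statistics import fmean, stdev


def c4(n: int) -> float:
    """Bias-correction constant for the standard deviation statistic, subgroup size n."""
    # lgamma keeps the gamma-function ratio stable for larger subgroup sizes such as n = 40.
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def average_and_sd_chart(subgroups):
    """Compute the plotted points and limits for an average and standard deviation chart.

    subgroups: a list of lists, one inner list of measurements per subgroup
    (here, hypothetically, one per batch), all of the same size n >= 2.
    """
    n = len(subgroups[0])
    if n < 2 or any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size n >= 2")

    averages = [fmean(g) for g in subgroups]  # one point per subgroup on the average chart
    sds = [stdev(g) for g in subgroups]       # sample standard deviation (n - 1 divisor)
    grand_average = fmean(averages)
    average_sd = fmean(sds)

    k = c4(n)
    a3 = 3.0 / (k * math.sqrt(n))
    spread = 3.0 * math.sqrt(1.0 - k * k) / k
    b3 = max(0.0, 1.0 - spread)  # lower limit for the SD chart cannot go below zero
    b4 = 1.0 + spread

    return {
        "averages": averages,
        "sds": sds,
        # (lower limit, central line, upper limit)
        "average_limits": (grand_average - a3 * average_sd, grand_average,
                           grand_average + a3 * average_sd),
        "sd_limits": (b3 * average_sd, average_sd, b4 * average_sd),
    }
```

Shuffling the list of subgroups passed to `average_and_sd_chart` leaves every limit unchanged; only the sequence of the plotted points differs.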

Although the summary sheets contained all sorts of contextual information, none of that information had been used in creating the average and standard deviation chart. In fact, the order of the points on the chart made it clear that the chart had been done as an afterthought; the order of the points on the chart matched the order of the summary sheets in the stack, and the stack had been rather thoroughly shuffled prior to being sent to me. This chart is shown in figure 1, below.

Upon recognizing that the average and standard deviation chart had not been organized in any manner that respected the context for the data, I immediately rearranged the chart using the dates when each batch was compounded.
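Rearranging the chart by compounding date is just a sort on the date before plotting. The grand average and the average standard deviation do not depend on order, so the limits stay the same; what changes is the sequence along the horizontal axis, which is what lets a shift in the process show up as a shift on the chart. The sketch below assumes a hypothetical `Batch` record holding a batch identifier, the compounding date, and the measured values; the real summary sheets held many more fields.

```python
from dataclasses import dataclass
from datetime import date
from statistics import fmean


@dataclass
class Batch:
    # Hypothetical record layout, assumed for illustration only.
    batch_id: str
    compounded_on: date
    values: list[float]


def averages_in_time_order(batches):
    """Return (batch_id, compounding date, subgroup average), sorted by compounding date."""
    ordered = sorted(batches, key=lambda b: b.compounded_on)
    return [(b.batch_id, b.compounded_on, fmean(b.values)) for b in ordered]


def flag_outside(points, lower, upper):
    """Append True to each time-ordered point whose average falls outside [lower, upper]."""
    return [(bid, d, avg, not (lower <= avg <= upper)) for bid, d, avg in points]
```

Passing the average-chart limits computed from all the batches to `flag_outside` marks the points that signal, now listed in the order in which the batches were made.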

With the new ordering I hit the jackpot, as seen in figure 2, above. All of the lower values for the property of interest occurred before a certain date. All of the elevated values for the property of interest occurred after that date. This one change is the largest signal within these data. It did not line up with any of the experimental factors. Hence, regardless of whether the experimental factors have any influence upon the property of interest, there is at least one dominant factor that the company overlooked. If it continues to ignore this dominant factor, it will continue to have an unpredictable production process. Tweaking the factors it studied will not make this unknown, but dominant, factor go away.

February’s column was titled “First, Look at the Data.” This advice applies even when the data come from an experimental study; failing to follow it can result in a train wreck of an analysis. After discovering the presence of this dominant factor in the company’s data, I went to its plant, showed the people there my average chart for the data, and watched their jaws drop. Then I flew home and sent the company a bill for simply plotting the data in time order, something it should have done itself.


About The Author


Donald J. Wheeler

Dr. Donald J. Wheeler is a Fellow of both the American Statistical Association and the American Society for Quality, and is the recipient of the 2010 Deming Medal. As the author of 25 books and hundreds of articles, he is one of the leading authorities on statistical process control and applied data analysis. Find out more about Dr. Wheeler’s books at www.spcpress.com.

Dr. Wheeler welcomes your questions. You can contact him at djwheeler@spcpress.com.