Quality Insider

Published: Wednesday, July 31, 2013 - 12:22

In one recent online forum, a Six Sigma Black Belt asked a question about validating samples: how to ensure that samples, once taken, would reflect (i.e., represent) the population parameter. His purpose was to understand the baseline for a project. He said he had six months of data on cycle times for handling maintenance tickets.

Among the early suggestions was an assortment of options, including cross-validation using regression (comparing randomly selected subsets) and t-tests on small samples. Another person suggested testing the data for normality: “Then any sampling technique will do,” he claimed.

Someone suggested plotting the data on an XmR or XbarR chart. Someone else suggested simply taking the average and then using process maps and lean techniques to reduce the cycle time. This person asserted that “random sampling is all that is needed to have a representative sample—by definition.” He went on to suggest that stability doesn’t matter; with six months of data, you can just number the tickets from 1 to k and use a random number generator to select a sample. His justification? Classical statistical texts don’t require you to check for stability before taking a random sample.

Well, I guess I can’t argue with that last statement. I have numerous classical statistical texts, and most of them don’t mention statistical control at all. There's a reason for that: Classical statistical texts tend to contain a lot of information about enumerative studies. If they mention control charts or other tools for analytic studies, they usually don’t do it well. That’s a problem, because most managers taking business statistics courses end up studying enumerative studies that they’ll seldom use and not learning anything about analytic studies, which should be their bread and butter. David Schwinn wrote an excellent article about this back in September 2012.

This is a vitally important distinction. We, as quality practitioners, must learn to “Render unto enumerative studies the things that are enumerative studies’, and unto analytic studies the things that are analytic studies’.” Enumerative studies involve studying a population, to characterize it. In that case, sampling theory and sampling error tell you how good your sample is. If you do your random sampling well, you can calculate how representative your sample is likely to be.
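For an enumerative study, that calculation is straightforward to sketch. A minimal Python illustration (the population, sample size, and parameters below are invented for the example):

```python
import numpy as np

# Hypothetical fixed frame: an enumerative study has a concrete,
# enumerable population to sample from.
rng = np.random.default_rng(42)
population = rng.normal(loc=50.0, scale=5.0, size=10_000)

n = 100
sample = rng.choice(population, size=n, replace=False)  # simple random sample

# The standard error tells us how representative the sample mean
# of a fixed frame is likely to be.
se = sample.std(ddof=1) / np.sqrt(n)
print(sample.mean(), se)
```

The point is that this logic only makes sense when the frame is fixed; it says nothing about a process whose “population” is mostly in the future.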

In process improvement work, though, we usually work on processes (as the Black Belt questioner was doing). Process studies are analytic studies. We are studying the cause system for the purpose of acting on that cause system. In this case, we are not studying a population; we are studying rational subgroups to characterize the behavior of the process over time. There is no population, because most of the population of interest is in the future. For these studies, random sampling is usually irrelevant, because judgment is more important in determining a basis for rational subgrouping. Control limits provide the best representation of uncertainty.

So the point is, without knowing whether your daily cycle-time average is in control, you don’t know what the average daily cycle time is. Don Wheeler recently made that point in “Why We Keep Having 100-Year Floods.” As much as I liked that article, though, I always go back to a lesson I learned from Davis Balestracci’s original ASQ Statistics Division special publication, “Data Sanity.” I have adapted it here.

Consider three clinics; we’ll call them A, B, and C. We will use “Daily proportion of nurse line calls answered within two minutes” as the metric.

What can you say about the performance of the clinics, based on the histograms and data summaries?

**Figure 1:** Histogram and summary statistics for Clinic A

**Figure 2:** Histogram and summary statistics for Clinic B

**Figure 3:** Histogram and summary statistics for Clinic C

The summaries presented in the histograms above all show unimodal, fairly symmetrical, bell-shaped piles of data. The p-values for the Anderson-Darling tests for normality are all high, indicating no significant departures from a normal distribution. There are no apparent outliers. The mean percentage for each clinic is a little over 85 percent, and the standard deviations are all around 2.5 percent.
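The clinic data themselves aren’t reproduced here, so as a rough illustration, this sketch simulates 60 days of normal data with a similar mean and standard deviation and runs the Anderson-Darling test (note that scipy’s implementation reports a statistic and critical values rather than a p-value):

```python
import numpy as np
from scipy import stats

# Simulated stand-in for one clinic: 60 daily percentages,
# mean near 85 and standard deviation near 2.5, as in the summaries.
rng = np.random.default_rng(7)
data = rng.normal(loc=85.0, scale=2.5, size=60)

result = stats.anderson(data, dist='norm')
print(result.statistic)           # A-squared test statistic
print(result.critical_values)     # critical values for the levels below
print(result.significance_level)  # percent levels: 15, 10, 5, 2.5, 1
```

A statistic below the 5-percent critical value is the “no significant departure from normality” result the histograms suggest; as the rest of the article shows, that result alone says nothing about homogeneity.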

The histogram, though, is a snapshot. It only reveals how the data piled up at a particular point in time. The graphic, and its associated summary statistics, can only represent what’s happening at the clinics if the data are homogeneous. These data were gathered over time: What would a picture of the data over time reveal?

The control chart for Clinic A is in Figure 5. Although the histogram showed the same bell-shaped pattern and high p-value for the normality test, you can easily see that the histogram can’t represent the data for Clinic A; we caught it during an overall upward trend, and so a histogram of the next 60 days will no doubt look very different from the histogram of the first 60 days. Also of interest is the control chart for Clinic B (Figure 6).

**Figure 5:** Control chart for Clinic A

**Figure 6:** Control chart for Clinic B

In the case of Clinic A (where the data look unimodal, symmetrical, and bell-shaped in the histogram), the process has been trending over time. Can you calculate a mean for all the data? Of course you can; it’s just arithmetic. But what would it represent? According to one respondent, it represents “the actual mean of actual output for a specified period of time.” It is the mean of the output; there’s no denying that. If you’ve done the arithmetic correctly, you will get the average of the output. It’s of no use as a baseline, though, because there is no actual distribution of the data: the data are not independent and identically distributed (iid), they are not homogeneous, so the mean doesn’t characterize the process. The number you get is nowhere near anything that represents where the process is today, because the process metric has been steadily climbing for this specified period of time. There is no actual, true average. Stating that the mean of the pile of data represents the actual output is akin to claiming that you know where a hurricane is because you’ve done the arithmetic to average its latitude and longitude since it became a tropical storm off the coast of Africa... don’t worry about those high winds and heavy rains, Florida; on average, the storm is still halfway across the Atlantic!
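The hurricane arithmetic is easy to demonstrate. A sketch using a purely deterministic upward trend (slope and starting level invented for illustration):

```python
# 60 days of a metric climbing steadily: 0.2 units per day from a base of 80.
values = [80.0 + 0.2 * day for day in range(60)]

grand_mean = sum(values) / len(values)    # mean of the whole pile
recent_mean = sum(values[-10:]) / 10      # where the process actually is now

print(round(grand_mean, 2))   # 85.9
print(round(recent_mean, 2))  # 90.9
```

The grand mean lands halfway up the trend, five full units below the level the process has actually reached.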

This brings us to Clinic B (Figure 6). What we are actually seeing in Figure 6 is three different processes, the data for which just appear to stack up to a single, “normal” distribution. If we slice the chart at the shifts, as in Figure 7, we can see that there are three distinct time periods when the variation is in control—i.e., three distinct cause systems were at work. They show up as stable performance over three periods of time. For the first 25 days, the process was producing an average of 84.01, with an upper control limit (UCL) of 86.53 and a lower control limit (LCL) of 81.53. Then something happened, and the process shifted. For the next 15 days, the average output was 81.17, the UCL was 83.79, and the LCL was 78.54. Then the process mean shifted again, this time to 87.14, with a UCL of 90.79 and an LCL of 83.49.
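Limits like these come from the individuals (XmR) chart arithmetic: the segment mean, plus or minus 2.66 times the average moving range. A minimal sketch with invented numbers:

```python
def xmr_limits(data):
    """Return (LCL, mean, UCL) for an individuals (XmR) chart."""
    mean = sum(data) / len(data)
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)  # average moving range
    return mean - 2.66 * mr_bar, mean, mean + 2.66 * mr_bar

segment = [84.0, 85.0, 83.0, 86.0, 84.0]   # illustrative daily percentages
lcl, mean, ucl = xmr_limits(segment)
print(round(lcl, 2), round(mean, 2), round(ucl, 2))  # 79.08 84.4 89.72
```

In practice the limits would be computed from each stable segment separately, exactly as the sliced chart does.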

The summary for Clinic B showed that the calculated mean of *all* the data was 84.34. True, it is not far from the average for the first 25 days, but it would not reflect in any way what actually occurred during the next 15 days, or during the last 20 days. In fact, for the last 20 days, the mean has been outside the upper control limit of the data for the first 25. So, if you have the Clinic B case, and you ignore the time order, average all the data, claim that you have calculated “the true, actual average,” and try to use it as “a baseline to represent what actually occurred,” you would be incorrect.

**Figure 7:** Clinic B control chart, sliced at the process shifts

The only clinic with a stable process is Clinic C. Looking at Clinic C’s plot over time in Figure 8, we see a random pattern of variation within the control limits. We can now expect that the histogram will not change shape significantly over time, the parameters will all remain about the same, and so our assumptions about the distribution will be valid and useful.

**Figure 8:** Control chart for Clinic C

So, if we have data from a process over time, it should be clear that the process must be in control for sampling to be representative. If the process is in control, you don’t need a random sample to get the process parameters; you can estimate them from the control chart. You can make assumptions about the shape of the distribution. You can even test for lack of normality, if you must.
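For an in-control process, the conventional estimate of dispersion comes from the average moving range, sigma-hat = mR-bar / 1.128 (d2 for moving ranges of size 2), rather than from the global standard deviation. A small sketch with invented readings:

```python
data = [84.0, 85.0, 83.0, 86.0, 84.0, 85.0]  # illustrative in-control readings

moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)

sigma_hat = mr_bar / 1.128   # d2 = 1.128 for moving ranges of size 2
print(round(sigma_hat, 3))   # about 1.596
```

The 2.66 multiplier used for XmR limits is just 3/1.128, so the limits sit three of these sigma-hat units on either side of the mean.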

If the process is significantly shifting over time, a random sample across the shifts will not represent the process because you will have mixed a lot of things together. They will represent that “pile” of data, but that pile of data does not represent the output of the process; the pile is not homogeneous. A random sample from an aggregate of several smaller populations (each with different parameters) will give you estimates that do not represent any of the data. In the absence of a state of statistical control, you cannot assume anything about the distribution, and neither can you test for lack of fit. Determining homogeneity is one of the quests of SPC. As Walter Shewhart said in *Statistical Method From the Viewpoint of Quality Control* (Dover Publications, 1986), “We are not concerned with the functional form of the universe but merely with the assumption that a universe exists.” A state of statistical control indicates the existence of that universe.
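That inflation is easy to see with an idealized version of Clinic B: three perfectly flat segments at the levels described earlier. Each segment has zero within-segment variation, yet the pooled statistics suggest a wide, centered distribution that matches none of the three:

```python
import statistics

# Idealized Clinic B: three stable levels, no within-segment noise.
pile = [84.0] * 25 + [81.0] * 15 + [87.0] * 20

pooled_mean = statistics.mean(pile)
pooled_sd = statistics.pstdev(pile)

print(pooled_mean)           # 84.25 -- matches none of the three levels
print(round(pooled_sd, 2))   # about 2.28, created entirely by the shifts
```

Every day at every one of the three levels is "unusual" relative to the pooled summary, which describes no period the clinic actually lived through.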

You could indeed take all the data in a time series and draw random samples from it to get descriptive statistics that would characterize that pile of data. However, if the time series had not been stable throughout the time period, the descriptive statistics resulting from that analysis, although mathematically correct, would be utterly useless, representative of nothing. If I had the charts and knew that there had been sustained shifts or a long-running trend, why would I dump the data into one list and claim that sample statistics drawn from that list represent a characterization of the average cycle time or defective rate during the [entire] period covered by the data? Of what possible use would that statement be to anyone? You would end up with an average that does not represent anything real, and a grossly inflated standard deviation. You might as well calculate the average phone number (or test the data “for normality”).

Data from a time series can’t be “mixed rigorously” and then analyzed. The time series provides important contextual information that bears directly on the validity of the analysis.

If you have the time order for data that came in over time but you decide to ignore that important context and treat the data as a homogeneous, iid population, you would be discarding vital information, and your statistics would be knowingly misleading. If you then led someone to make a decision based on those statistics, a decision different from the one they would make if they had the context, it would constitute statistical malpractice.

## Comments

## Go down Sampling

Based on my work experience, I'm more familiar with Pierre Gy's sampling theories, criticized though they may be. In any case, I much appreciate a professional raising this issue: effective sampling is far more than just AQL tables and ISO 2859. Thank you.

## I like that one

Average phone number? Rip - good one!

But seriously I sometimes wonder - assuming competent managers and employees, shouldn't we be able to figure out most things just by looking at the time series? A group responsible for maintenance ticket turnover time or nurse response time should be able to understand when they are doing well and when not, and also why, by virtue of the fact that they work there. Insiders should know. Outsiders need fancy analysis.

## Maybe...

If they are using a time series, then they are ahead of the game. With a stable process, though, I still want the process behavior chart, so I can tell when assignable causes happen. It's not fancy analysis, just an XmR chart.