Published: Monday, January 2, 2012 - 15:56

In my February 1996 *Quality Digest* column I discussed an article out of *USA Today*. Since that article provides a great example of how we need to filter out the noise whenever we attempt to interpret data, I have updated it for my column today. “Teen Use Turns Upward” read the headline for a graph appearing in *USA Today* on June 21, 1994. The data in the graph were attributed to the Institute for Social Research at the University of Michigan and were labeled as the “percentage of high-school seniors who smoke daily.”

The portion of this graph covering the 10 years from 1984 to 1993 is shown in figure 1.

**Figure 1:** Percentage of high-school seniors who smoke daily, 1984–1993

Each point in figure 1 is the value found in an annual survey. The 1993 value of 19.0 percent was higher than the 1992 value of 17.3 percent. This was interpreted to mean that more teenagers are using tobacco now than in the past. But are they?

Before we can make sense of numbers such as these we need to know something about the limitations of data from surveys. First of all, values such as these are subject to variation. Two identical surveys carried out at the same time will rarely yield identical results. Among the sources of variation for survey data are differences in who is interviewed, differences in how they are interviewed, differences in how they respond, and differences in how their responses are reported. Second, when a survey is used from year to year, there are also the problems of different personnel being used to conduct the survey, and differences in how the questions are perceived in different years.

Therefore, no matter what is being measured, and no matter how carefully it is measured, the statistics will always vary. There will always be some variation in survey results from year to year. Even if nothing is changing, we can expect the value to go up about half the time, and we can also expect the value to go down about half the time. (Look at the past nine years—these values were greater than the previous value four times, and less than the previous value five times.)

So how do we ever detect a change using survey data? If we interpret each and every change in the percentage who smoke daily as a year-to-year difference, how do we know that we are not being misled by the variation from survey to survey? If we admit that there is variation between surveys, how then do we ever know when there has been a change from one year to another?

The answer is that we must first filter out the routine variation between surveys, and then look for year-to-year differences. The simplest way to do this is with a process behavior chart.

We begin with the yearly values. For the data on high-school students who smoked daily, the annual percentages reported for 1984 through 1993 were, respectively:

The average of these 10 values is 18.64 percent. This average is used as a central line, and the 10 values are plotted as a time series as shown in figure 2. This graph is the beginning of the chart for individual values (also known as an X-chart).

**Figure 2:** The yearly values plotted as a time series with their central line

Since the variation between one year’s value and the next will always include the routine variation between surveys, we use the year-to-year variation as our guide to how much uncertainty is inherent in the reported results. These year-to-year changes are measured by the differences between successive values (these differences are called moving ranges). The nine moving ranges for these data are:

The average moving range is 0.778 percent. We use this average moving range to compute limits for both the X-chart and the moving range chart (mR-chart). The limits for the X-chart are commonly known as natural process limits. They are placed symmetrically on either side of the central line. The distance between the central line and these limits is found by multiplying the average moving range by 2.660. This value of 2.660 is a constant that converts the raw statistic into the appropriate measure of dispersion. In this case, the natural process limits are computed to be 16.57 percent to 20.71 percent.
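The computation above can be sketched in a few lines of Python. Since the ten yearly values are not reproduced here, the series below is invented for illustration; only the formulas and the constant 2.660 come from the column.

```python
def xmr_limits(values):
    """Compute the X-chart central line and natural process limits.

    Central line = average of the values; the limits sit at
    average +/- 2.660 * (average moving range), as described above.
    """
    mean = sum(values) / len(values)
    # Moving ranges: absolute differences between successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    lower = mean - 2.660 * mr_bar
    upper = mean + 2.660 * mr_bar
    return mean, lower, upper

# Hypothetical yearly percentages (NOT the actual survey values):
data = [18.7, 18.6, 18.7, 18.6, 18.1, 18.9, 19.1, 17.2, 17.3, 19.0]
mean, lcl, ucl = xmr_limits(data)
print(f"central line {mean:.2f}, limits {lcl:.2f} to {ucl:.2f}")
```

With the actual ten survey values, the same routine would reproduce the figures in the column: a central line of 18.64 percent and limits of 16.57 and 20.71 percent.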

**Figure 3:** The X-chart with natural process limits

These limits make allowance for the routine variation. Before any single value can be said to represent a change in the use of tobacco by teenagers, it will have to either exceed the upper limit or fall below the lower limit. Since none of these values fall outside these limits, any statement about changes in the percentage of teens who smoke is questionable.

But wait—the change between the last two values, where the percentage jumped from 17.3 percent to 19.0 percent, is the biggest change during the past 10 years. Surely this should mean something. To see if this is the case, we place the moving ranges on a chart. The average moving range of 0.778 will be the central line for this chart, and the upper limit will be found by multiplying the average moving range by 3.27. This results in an upper range limit of 2.54 percent.
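The check on that jump uses only the figures already reported in the column:

```python
# Is the 1992-to-1993 jump a signal on the mR-chart?
mr_bar = 0.778              # average moving range from the data above
url = 3.27 * mr_bar         # upper range limit: 2.544, i.e., 2.54 percent
jump = abs(19.0 - 17.3)     # the moving range between the last two values
print(round(url, 2))        # 2.54
print(jump < url)           # True: the jump stays within routine variation
```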

**Figure 4:** The moving range chart (mR-chart)

The last value on this moving range chart shows the “jump” between 1992 and 1993. This moving range of 1.7 percent does not fall above the upper limit of the moving range chart. Thus, once again, the “jump” from 17.3 percent to 19.0 percent does not qualify as a clear-cut signal.

So, what can we say about the percentage of teenagers who smoke daily? Just this: The data presented show no evidence that the percentage of teenagers who smoke has increased. Neither is there any evidence that this percentage has decreased during the past 10 years. The only headline for these data that has any integrity is “No Change in Teen Use of Tobacco.” Anything else is just propaganda.

But what about teens who smoke while driving? The point of this column is that you should be careful about believing the headlines. However, since driving while distracted is currently a hot topic for news, I give you the headline from the Dec. 14, 2011, *USA Today*, “19 states see jump in traffic fatalities.” The subtitle was, “Masked by national decline for fifth year.” The lead paragraph in this story is, “Even as U.S. traffic fatalities dropped in 2010 for the fifth straight year, more than a third of the states saw their deaths increase.” Presumably 31 states reported a decline in traffic fatalities, but evidently the editors wanted to spin the story by talking about the 19 states where traffic fatalities went up. Of course, the implicit assumption behind this headline is that every change is a signal. Unfortunately, this is simply not true.

The first mistake of data analysis is to interpret noise as a signal. The second mistake is to fail to detect a signal that is present. When you interpret every change as a signal, then you will always be distracted by the noise that is present in all data, and you will become fair game for the propaganda artists. The result of all this is that the more you listen to the news these days, the less you know.

So how do you avoid being persuaded by propaganda? The first principle of data analysis is that no data have any meaning apart from their context. This automatically precludes interpreting isolated values or even limited comparisons between pairs of points. The second principle of data analysis is that while all data contain noise, only some data contain signals. If you do not know how to separate the probable noise from the potential signals, you are susceptible to being misled by the noise in the data. Others may use data to mislead you—or you may even mislead yourself. While I have written a whole book on these two principles, *Understanding Variation: The Key to Managing Chaos* (SPC Press, 2000), the short and simple answer is that process behavior charts are the simplest way to see the data in context and filter out the noise.

By the way, did you read the article about how the trade deficit soared last April? Oh, well, that’s another story—or is it?

## Comments

## Should be required reading

This article should be required reading for every journalist, journalism major, and blogger. Also every government employee, politician, lobbyist, and spin doctor. Actually, some of those folks probably understand these things and try to use them to their advantage (that would be the "damned lies" part).

## Detecting change in successive surveys

The article shows the use of I-MR charts to detect change. With survey data, what is the role of confidence intervals when deciding if there has been a change between successive survey averages?

To use the teen-smoking example, what if we had a 99% confidence interval of +/-0.5% for each year of survey data, and we move from 17.3% in '92 to 19.0% in '93? Should we be able to say we are 99% confident that the 1992 and 1993 smoking rates are different? Or, since the I-MR chart shows we cannot conclude there is a change, is our only conclusion that the difference from 1992 to 1993 is not due to survey error, just common-cause variation in the underlying process?

## Wood for my fire

Dear Donald,

Thanks for reminding me of this article. It certainly gives me wood for my fire (it will be forwarded to some people who should read it). I do a lot of work in marketing and sales, which is possibly the field where understanding variation is most crucial. I have witnessed companies "losing" millions by either not measuring at all or by acting on single events without understanding the context.

I would be interested in your own and your readers' experience in applying XmR charts to sales and marketing data.

Kind Regards

Francois van der Walt

## XmR charts on sales and marketing data

Hi Francois van der Walt,

I have used XmR charts on sales and marketing data, and I could email examples to you if you want (they are in Spanish). You may also visit Paul Selden's web page: www.paulselden.com. He is the author of several works in the field, including "Sales Process Engineering".

## 2.66 and 3.27

Hello Dr. Wheeler:

Do the constant values of 2.66 and 3.27 have names? I would like to better understand where these constant values come from.

Thank you, Dirk van Putten

## Constant Values 2.66 and 3.27

Dirk van Putten: 2.66 is 3 / d_2, where d_2 is a bias-correction constant that can be found in most textbooks. 3.27 is the D_4 constant.

The Wikipedia page on Individuals charts (https://en.wikipedia.org/wiki/Individuals_chart) gives a decent overview of the charts. Dr. Wheeler's books and other writings go into somewhat greater detail on the constants, as do other books on statistical process control (see, for example, the classic text by Montgomery, though I would caution that Montgomery's general approach to process behavior charts sometimes seems to be at odds with Dr. Wheeler's).
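For moving ranges of two successive values (subgroup size n = 2), the standard SPC tables give d_2 = 1.128 and D_4 = 3.267, which is where the rounded factors come from. A quick check:

```python
# Control chart constants for subgroup size n = 2, from standard SPC tables.
d2 = 1.128    # bias-correction constant for the range of two values
D4 = 3.267    # upper-range-limit constant for n = 2
print(round(3 / d2, 2))   # 2.66, the X-chart scaling factor
print(round(D4, 2))       # 3.27, the mR-chart scaling factor
```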

## Great Article

Hello Dr. Wheeler:

Great article, by the way. I followed all of the math and statistics and understood the general concept of noise vs. signal.

Thank you, Dirk

## Scaling Factors

Hello Dr. Wheeler:

I think I found my answer in your Book, "Twenty Things You Need To Know", Chapter 5, Where Do Scaling Factors Come From?

Thank you, Dirk