SPC Toolkit

by Donald J. Wheeler

Teens Who Smoke

"Teen Use Turns Upward" read the headline for a graph appearing in USA Today on June 21, 1994. The data in the graph were attributed to the Institute for Social Research at the University of Michigan and were labeled as the "percentage of high school seniors who smoke daily." The portion of this graph covering the past 10 years is shown in Figure 1.

Each point is the value found in an annual survey. The 1993 value of 19 percent was higher than the 1992 value of 17.3 percent. This was interpreted to mean that more teenagers are using tobacco now than in the past. But are they?

Before we can make sense of numbers like these, we need to know something about the limitations of survey data. First, values like these are subject to variation. Two identical surveys carried out at the same time will rarely yield identical results. Sources of variation in survey data include differences in who is interviewed, how they are interviewed, how they respond, and how their responses are reported. Second, when a survey is repeated from year to year, different personnel may conduct it, and respondents may perceive the questions differently from one year to the next.

Therefore, no matter what is being measured, and no matter how carefully it is measured, the statistics will always vary. Even if nothing changes, we can expect the value to go up about half the time and to go down about half the time.

So how do we ever detect a change using survey data? If we interpret each and every change in the percentage who smoke daily as a year-to-year difference, how do we know that we are not being misled by the study-to-study variation? If we admit that there is study-to-study variation, how then do we ever know when there has been a change from one year to another?

The answer is that we must first filter out the study-to-study variation, and then look for year-to-year differences. The simplest way to do this is with a control chart.

We begin with the yearly values. For the data on high school seniors who smoke daily, the annual percentages reported for 1984 through 1993 were, respectively: 18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0. The average of these 10 values is 18.64 percent.
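For readers who want to check the arithmetic, here is a minimal Python sketch (Python is our choice; the column itself uses no code) that reproduces the central line from the ten yearly values. The variable names are illustrative only.

```python
# Percentage of high school seniors who smoke daily, 1984 through 1993 (values from the text).
values = [18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0]

# The central line of the X-chart is the simple average of the yearly values.
central_line = sum(values) / len(values)
print(f"Central line: {central_line:.2f}%")  # Central line: 18.64%
```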

This average is used as a central line, and the 10 values are plotted as a time series as shown in Figure 2. This graph is the beginning of the control chart for individual values (also known as an X-chart).

Because the variation between one year's value and the next will always include the study-to-study variation, we use the year-to-year variation as our guide to how much uncertainty is inherent in the reported results. These year-to-year changes are measured by the absolute differences between successive values (these differences are called moving ranges). The nine moving ranges for these data are: 0.8, 0.9, 0.1, 0.5, 0.8, 0.3, 1.0, 0.9, 1.7. The average moving range is 0.778 percent. We use this average moving range to compute limits for the X-chart in Figure 2.
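Continuing in the same hypothetical Python sketch, the moving ranges are the absolute differences between each year's value and the previous year's value, and their average drives the limits. The data list is repeated so the snippet runs on its own.

```python
values = [18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0]

# Each moving range is the absolute difference between two successive yearly values,
# so ten values yield nine moving ranges.
moving_ranges = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
avg_moving_range = sum(moving_ranges) / len(moving_ranges)

print([round(mr, 1) for mr in moving_ranges])  # [0.8, 0.9, 0.1, 0.5, 0.8, 0.3, 1.0, 0.9, 1.7]
print(f"Average moving range: {avg_moving_range:.3f}%")  # Average moving range: 0.778%
```

Rounding to one decimal only tidies the display; the average is computed from the unrounded differences.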

The limits for our X-chart are commonly known as natural process limits. They are placed symmetrically on either side of the central line. The distance from the central line to each limit is found by multiplying the average moving range by 2.660. This scaling constant converts the average moving range into three-sigma units: it equals 3 divided by 1.128, the bias-correction factor for moving ranges of two values.

For these data, this distance is:
2.660 x 0.778% = 2.07%
Thus, the upper natural process limit is:
18.64% + 2.07%= 20.71%
The lower natural process limit is:
18.64% - 2.07% = 16.57%
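The limit calculation can be checked the same way. This sketch assumes the two-point moving range and the 2.660 constant given in the text; the name `E2` follows common SPC notation but is only a label here, and the other names are illustrative.

```python
values = [18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0]
central_line = sum(values) / len(values)
moving_ranges = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
avg_moving_range = sum(moving_ranges) / len(moving_ranges)

# Scaling constant for X-charts based on two-point moving ranges (3 / 1.128, rounded).
E2 = 2.660
distance = E2 * avg_moving_range

upper_limit = central_line + distance  # upper natural process limit
lower_limit = central_line - distance  # lower natural process limit

print(f"Distance: {distance:.2f}%")        # Distance: 2.07%
print(f"Upper limit: {upper_limit:.2f}%")  # Upper limit: 20.71%
print(f"Lower limit: {lower_limit:.2f}%")  # Lower limit: 16.57%
```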

These limits make allowance for routine variation. They are added to the graph to obtain the X-chart, shown in Figure 3.

Before a yearly value can be said to represent a change in the use of tobacco by teenagers, it will have to either exceed the upper limit or fall below the lower limit. Since none of these values fall outside these limits, any statement about changes in the percentage of teens who smoke is questionable.
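The rule just stated, that a yearly value must fall outside the limits before it counts as a signal, is easy to apply mechanically. In this sketch the limits are hard-coded at the rounded values computed above, and the variable names are hypothetical.

```python
years = list(range(1984, 1994))
values = [18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0]

# Natural process limits from the text, rounded to two decimals.
upper_limit = 20.71
lower_limit = 16.57

# A year is a potential signal only if its value lies beyond either limit.
signals = [(year, v) for year, v in zip(years, values) if v > upper_limit or v < lower_limit]

if signals:
    print("Points outside the limits:", signals)
else:
    print("No yearly value falls outside the natural process limits.")
```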

But wait: the change between the last two values, where the percentage jumped from 17.3 percent to 19 percent, represents the biggest change during the past 10 years. Surely this should mean something.

To see if this is the case, we can place the moving ranges on a control chart. The average moving range of 0.778 will be the central line, and the upper limit will be found by multiplying the average moving range by the constant value of 3.27. This results in an upper limit for the moving ranges of 2.54 (see Figure 4).

The last value on this moving range chart shows the "jump" between 1992 and 1993. This moving range of 1.7 percent does not fall above the upper limit of the moving range chart. Thus, once again, the "jump" from 17.3 percent to 19 percent does not qualify as a clear-cut signal.
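The moving range chart can be sketched in the same style. It uses the text's constant of 3.27 (often tabulated as 3.268 and conventionally called D4) and flags any moving range above the upper range limit. Labeling each moving range with the later of its two years is our own illustrative choice, as are the names.

```python
values = [18.8, 19.6, 18.7, 18.6, 18.1, 18.9, 19.2, 18.2, 17.3, 19.0]
moving_ranges = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
avg_moving_range = sum(moving_ranges) / len(moving_ranges)

# Upper range limit for two-point moving ranges, using the constant from the text.
D4 = 3.27
upper_range_limit = D4 * avg_moving_range

# Each moving range is labeled with the second year of its pair (1985 through 1993).
range_years = list(range(1985, 1994))
range_signals = [(year, round(mr, 1)) for year, mr in zip(range_years, moving_ranges)
                 if mr > upper_range_limit]

print(f"Upper range limit: {upper_range_limit:.2f}%")  # Upper range limit: 2.54%
print("Moving ranges above the limit:", range_signals)
```

With these data the list of flagged moving ranges is empty, including the 1.7-point jump for 1993.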

So, what can we say about the percentage of teenagers who smoke daily? Just this: There is no evidence that the percentage of teenagers who smoke has increased. Neither is there any evidence that this percentage has decreased in the past 10 years. The only headline for these data that has any integrity is, "No Change in Teen Use of Tobacco." Anything else is propaganda.

So how do you avoid being persuaded by propaganda? Start by realizing that while all data contain noise, only some data contain signals. If you don't know how to separate the probable noise from the potential signals, you are susceptible to being misled by the noise in the data. Others may use data to mislead you, or you may even mislead yourself. Shewhart's charts are the simplest way to separate signals from noise.

By the way, did you read the article about how the trade deficit soared last April? Oh, well, that's another story. Or is it?

About the author . . .

Donald J. Wheeler is an internationally known consulting statistician and the author of Understanding Variation: The Key to Managing Chaos and Understanding Statistical Process Control, Second Edition. © 1996 SPC Press Inc. Telephone (423) 584-5005.