SPC Toolkit

by Donald J. Wheeler


Analyzing Data
From the beginning of our education, we have all learned that "two plus two is equal to four." The very definiteness of this phrase summarizes the unequivocal nature of arithmetic. We use it to characterize whatever is inevitable, solid and beyond argument. It is the first article of our educational catechism.

This bit of arithmetic has been elevated to a cliché for the following reasons. During the years when we were learning our sums and our multiplication tables, we were also learning to spell and to write. This means that we had to learn about irregular spellings. We had to learn to use irregular verbs. And we had to learn to cope with many of the idiosyncrasies of language. In contrast to this, we learned that there are no irregular spellings in arithmetic. Whether you multiply three by two or multiply two by three, the result is always six. Addition, subtraction, multiplication and division contain no irony; they contain no hyperbole. The multiplication tables contain no sarcasm.

As a result, we receive a subliminal message: Numbers are concrete, regular and precise, but words are inconstant, vague and changing. The contrast between the regularity (and for some, the sterility) of mathematics and the complexity (and richness) of language leaves us all with an inherent belief that numbers possess some native objectivity that words do not possess. Hence, when we want to indicate a solid and dependable truth, we are prone to recall the first rule in the mathematical catechism: Two plus two is equal to four.

Because of this subliminal belief, we feel that we have some sort of control over those things we can measure. If we can express it in numbers, then we have made it objective, and we therefore know what we are dealing with. Moreover, because we routinely face so much uncertainty, this ability to quantify things is so reassuring, so comforting, that we gladly embrace measurements as being solid, real and easy to understand.

Hence, today we have gone beyond measuring the physical world. We have gone beyond the accounting of wealth. Now we are trying to measure everything. If we can quantify it, then we can deal with it "scientifically." So now we "measure" attitudes, we measure satisfaction, and we measure performance. And once we have measured these things, we feel that we know them objectively, definitively and concretely.

But, having obtained these measurements, how do you analyze them? Do the normal rules of arithmetic apply?

Unfortunately, none of our mathematical education has prepared us to analyze such measurements properly. Our very first lessons taught us that two numbers which are not the same are different. So when the numbers differ, we conclude that the things being measured must also differ. Yet two numbers can differ even when the things they describe do not, a fact that seems to have escaped the attention of almost everyone.

And when we think the things are different, we tend to rank them and publish a list. For example, a recent article in my local newspaper reported that Nashville and Knoxville were, respectively, the 25th and 27th "most violent cities in the country." This ranking was based on the number of crimes against persons reported to the FBI by local law enforcement agencies. But just what is entailed in such numbers? Is purse snatching a larceny (a crime against property) or a robbery (a crime against a person)? Is domestic violence reported as an assault or as disturbing the peace? These and other crimes are categorized and reported differently from one city to the next.

Finally, even if the crimes were categorized and reported the same way everywhere, would crime rates provide a fair comparison? The incorporated portion of Nashville includes all of Davidson County and consists of urban, suburban and rural areas. In contrast, only half the population of greater Knoxville lives within the city limits; the rest live in the unincorporated portions of Knox County. Therefore Knoxville's city limits contain a much higher proportion of urban environments than do Nashville's. If crime rates are higher in urban settings, then dividing the number of reported crimes by the city's population will artificially inflate Knoxville's rate compared to that of Nashville.
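To see how the mix of urban and non-urban residents inside city limits can move a per-capita rate all by itself, here is a minimal Python sketch. The urban and rural rates, the populations and the urban fractions are hypothetical numbers chosen for illustration, not figures for Nashville or Knoxville, and the sketch assumes both cities share exactly the same underlying rates for each kind of area.

```python
# Hypothetical figures for illustration only; they are NOT data for
# Nashville or Knoxville. Both cities are assumed to share these rates.
URBAN_RATE = 12.0  # reported crimes per 1,000 urban residents (assumed)
RURAL_RATE = 3.0   # reported crimes per 1,000 suburban/rural residents (assumed)


def per_capita_rate(population: int, urban_fraction: float) -> float:
    """Reported crimes per 1,000 residents for a city whose limits contain
    the given fraction of urban residents."""
    urban = population * urban_fraction
    other = population - urban
    crimes = (urban * URBAN_RATE + other * RURAL_RATE) / 1000
    # The published rate: total reported crimes divided by city population.
    return 1000 * crimes / population


# City A's limits take in a whole county; City B's hold mostly the urban core.
rate_a = per_capita_rate(population=500_000, urban_fraction=0.4)
rate_b = per_capita_rate(population=170_000, urban_fraction=0.9)
print(f"City A: {rate_a:.1f} per 1,000")  # City A: 6.6 per 1,000
print(f"City B: {rate_b:.1f} per 1,000")  # City B: 11.1 per 1,000
```

Although the two hypothetical cities have identical rates for each kind of area, the city whose limits hold mostly urban neighborhoods reports a rate roughly two-thirds higher. The difference comes from where the boundary is drawn, not from any difference in the places themselves.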

Considerations such as these can raise more than a reasonable doubt about the appropriateness of most of the published rankings we hear about every day. Many comparisons made by those who compile lists are virtually meaningless. The only thing that is worse than the compilation of such rankings is the use of these rankings for business decisions.
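A small simulation suggests how such a list can be produced by noise alone. The sketch below rests on assumptions made purely for illustration: 30 hypothetical cities, each with exactly the same expected count of 50 reported crimes per year, and yearly counts that vary as Poisson draws. The helper names `poisson` and `rank_identical_cities` are hypothetical, and the cities are ranked once for each of two simulated years.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson count with mean `lam` (Knuth's multiplication method,
    adequate for the modest means used here)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def rank_identical_cities(n_cities: int = 30, expected: float = 50.0,
                          seed: int = 0) -> dict[str, int]:
    """Hypothetical helper: every city has the SAME expected count, so any
    ordering in the ranking comes from noise alone. Rank 1 = highest count."""
    rng = random.Random(seed)
    counts = {f"City {i:02d}": poisson(expected, rng)
              for i in range(1, n_cities + 1)}
    # Sort by descending count; break ties by name so every city gets a rank.
    ordered = sorted(counts, key=lambda city: (-counts[city], city))
    return {city: rank for rank, city in enumerate(ordered, start=1)}


year_1 = rank_identical_cities(seed=1)
year_2 = rank_identical_cities(seed=2)
for city in ("City 01", "City 02", "City 03"):
    print(f"{city}: ranked {year_1[city]} in year 1, {year_2[city]} in year 2")
```

Nothing distinguishes these cities, yet the list still runs from 1 to 30, and a city's position can jump from one simulated year to the next.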

The problem here is not a problem of arithmetic. It is not a problem of not knowing how to manipulate numbers but rather of not knowing how to interpret them. All the arithmetic, all the algebra, all the geometry, all the trigonometry and all the calculus you have ever studied was taught in the world of pure numbers. This world is one where lines have no width, planes have no thickness and points have no dimensions at all. While things work out very nicely in this world of pure numbers, we do not live there.

Numbers are not exact in the world in which we live. They always contain variation. As noted above, there is variation in the way numbers are generated. There is variation in the way numbers are collected. There is variation in the way numbers are analyzed. And finally, even if none of the above existed, there would still be variation in the measurement process itself. Thus, without some understanding of all this variation, it is impossible to interpret the numbers of this world.

If a manufacturer applies two film coatings to a surface, and if each coating is two microns thick, will the combined thickness of the two coatings be exactly four microns? If we measure with sufficient care and precision, the combined thickness is virtually certain to differ from four microns. Thus, when we add one thing characterized by the value 2.0 to another thing characterized by the value 2.0, we end up with something that is only equal to four on the average.
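The coating example can be simulated directly. This sketch assumes that each coating's thickness varies normally around 2.0 microns with a standard deviation of 0.05 micron (an assumed figure, not one from the column) and that the two coatings vary independently. It then checks how often the combined thickness reads exactly 4.00 microns when measured to the nearest 0.01 micron.

```python
import math
import random
import statistics

NOMINAL = 2.0      # microns per coating, as in the example
SD = 0.05          # microns; ASSUMED coating-to-coating standard deviation
N = 10_000         # number of simulated two-coat surfaces

rng = random.Random(1996)
# Each surface gets two independent coatings, each varying around 2.0 microns.
totals = [rng.gauss(NOMINAL, SD) + rng.gauss(NOMINAL, SD) for _ in range(N)]

mean_total = statistics.mean(totals)
sd_total = statistics.stdev(totals)
# Share of totals that read exactly 4.00 at a resolution of 0.01 micron.
share_at_four = sum(1 for t in totals if round(t, 2) == 4.0) / N

print(f"average total:   {mean_total:.3f} microns")
print(f"spread of total: {sd_total:.3f} microns "
      f"(theory: {SD * math.sqrt(2):.3f})")
print(f"reads 4.00:      {share_at_four:.1%}")
```

Under these assumptions the average total sits very close to 4.0, the spread of the totals is larger than that of either coating (about 0.05 × √2 ≈ 0.071 micron), and only a small fraction of surfaces, on the order of 5 percent, reads exactly 4.00.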

What we see here is not a breakdown in the rules of arithmetic but a shift in what we are doing with numbers. Rather than working with pure numbers, we are now using numbers to characterize something in this world. When we do this, we encounter the problem of variation. In every measurement, and in every count, there is some element of variation. This variation comes both from the process of obtaining the number and from the characteristic being quantified. This variation tends to "fuzz" the numbers and undermine all simple attempts to analyze and interpret them.

So how, then, should we proceed? How can we use numbers? When we work with numbers in this world, we must first make allowances for the variation that is inherent in those numbers. This is exactly what Shewhart's charts do: they filter out the routine variation so that we can spot any exceptional values which may be present. (One way of doing this was described in this column last month.) This filtering, this separation of all numbers into "probable noise" and "potential signals" is at the very heart of making sense of data. While it is not good to miss a signal, it is equally bad to interpret noise as if it were a signal. The real trick is to strike an economic balance between these two mistakes, and this is exactly what Shewhart's charts do. They filter out virtually all of the probable noise, so that anything left over may be considered a potential signal.
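One common form of Shewhart's chart for individual values is the XmR chart (individual values and moving ranges). The sketch below assumes that form; the column does not say which chart last month's article used. It computes natural process limits as the average plus or minus 2.66 times the average moving range, and an upper range limit of 3.268 times the average moving range, the usual scaling constants for moving ranges of two successive values. The data values and the helper names are hypothetical.

```python
from statistics import mean
from typing import Sequence

# Standard scaling constants for an XmR chart built on moving ranges of two.
NPL_FACTOR = 2.66   # natural process limits: mean +/- 2.66 * average moving range
URL_FACTOR = 3.268  # upper range limit: 3.268 * average moving range


def xmr_limits(values: Sequence[float]) -> tuple[float, float, float, float]:
    """Return (lower NPL, center line, upper NPL, upper range limit)."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    avg_mr = mean(moving_ranges)
    return (center - NPL_FACTOR * avg_mr, center,
            center + NPL_FACTOR * avg_mr, URL_FACTOR * avg_mr)


def potential_signals(values: Sequence[float]) -> list[tuple[int, float]]:
    """Hypothetical helper: (index, value) pairs outside the natural process limits."""
    lower, _, upper, _ = xmr_limits(values)
    return [(i, x) for i, x in enumerate(values) if x < lower or x > upper]


# Hypothetical data: routine variation around 10 or 11, plus one unusual value.
data = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 25, 10]
lower, center, upper, url = xmr_limits(data)
print(f"limits: {lower:.2f} to {upper:.2f}, center {center:.2f}")
print(potential_signals(data))  # -> [(10, 25)]
```

Here only the value 25 falls outside the limits of roughly 1.5 to 21.8, so it is the single potential signal; everything else is treated as probable noise. In routine use the limits are often computed from a baseline period and then extended forward, and the moving ranges can be compared with the upper range limit in the same way.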

Whether or not you acknowledge variation, it is present in all of the numbers with which you deal each day.

If you choose to learn about variation, it will change the way you interpret all data. You will still detect those signals that are of economic importance, but you will not be derailed by noise.

If you choose to ignore variation, then for you, two plus two will still be equal to four, and you will continue to be misled by noise. You will also tend to reveal your choice by the way you talk and by the mistakes you make when you interpret data.

Two plus two is only equal to four on the average. The sooner you understand this, the sooner you can begin to use numbers effectively.

About the author . . .
Donald J. Wheeler is an internationally known consulting statistician and the author of Understanding Variation: The Key to Managing Chaos and Understanding Statistical Process Control, Second Edition.
© 1996 SPC Press