Donald J. Wheeler

Statistics

Numerical Jabberwocky

Numbers that gyre and gymble in the wabe

Published: Wednesday, April 1, 2015 - 16:40

The first axiom of data analysis is: “No data have meaning apart from their context.” Yet we all encounter measures that have been dreamed up without regard for either context or use. This column gives a couple of the more egregious examples I have encountered over the years.

Kentucky Higher Education Commission

The first example is a formula used by the Kentucky Higher Education Commission in the 1970s to allocate state funds to the various state-run colleges and universities. A former student of mine sent this formula along with the relevant data for all of the post-secondary state schools. This funding equation had five terms that had to be evaluated for each school. The first four terms were relatively complex expressions involving several variables that characterized the school, while the last term was the simple ratio of the school’s full-time equivalent head count to the total full-time equivalent head count for the state as a whole.

Because of the complexity of the first four terms I began by simply evaluating the formula for each school. Since I wanted to understand the formula, I evaluated each term separately. I listed these in a spreadsheet with the colleges down the left side, and the values for each of the five terms in the successive columns. As I did this I found that the first and third terms were always large and positive, while the second and fourth terms were always large and negative. Of course, in each case, the fifth term was a simple proportion.

As I worked with these values I began to notice a curious phenomenon. For each school the sum of the first four terms always came out to be zero! The first four terms had nothing to do with funding. The funds were prorated on the full-time equivalent head count, and the remainder of the equation was there simply to confuse the school administrators and to keep the legislators at bay.
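The term-by-term check described above can be sketched in a few lines of Python. This is only an illustration: the school names, term values, head counts, and budget below are hypothetical (the actual Kentucky data are not reproduced in this column), and the sketch assumes the five terms were simply added together.

```python
import math

# HYPOTHETICAL values for illustration only; these are not the Kentucky data.
# Each school has four "complex" terms plus a full-time equivalent (FTE) head count.
schools = {
    "School A": {"terms": [1250.0, -830.5, 412.25, -831.75], "fte": 12000},
    "School B": {"terms": [980.0, -1100.0, 640.5, -520.5], "fte": 8000},
    "School C": {"terms": [2210.75, -1500.25, 300.0, -1010.5], "fte": 20000},
}
BUDGET = 100_000_000  # hypothetical total state appropriation

total_fte = sum(s["fte"] for s in schools.values())

first_four_sums = {}
shares = {}
allocations = {}
for name, s in schools.items():
    # Evaluate the pieces separately, as in the spreadsheet described above.
    first_four_sums[name] = sum(s["terms"])
    shares[name] = s["fte"] / total_fte  # the fifth term
    # ASSUMPTION: the formula adds the five terms together.
    allocations[name] = BUDGET * (first_four_sums[name] + shares[name])

for name in schools:
    cancels = math.isclose(first_four_sums[name], 0.0, abs_tol=1e-9)
    print(f"{name}: first four terms cancel = {cancels}, "
          f"FTE share = {shares[name]:.3f}, allocation = {allocations[name]:,.0f}")
```

Laying the terms out side by side like this is what exposes the cancellation. Evaluating the formula only as a single total would hide it.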

U. S. Postal Service

In the 1990s the U. S. Postal Service used the Delivery Collection Efficiency Achievement (DCEA) measure to ensure that “the beatings will continue until quality improves.” The DCEA measure was a function of the four variables that I will call X, Y, Z, and H:

X = Total cased delivery volume for the operating unit. If all the mail processed by the unit in a single month was placed in a single stack, this value would be the height of that stack measured in feet.

Y = Delivery days in the month

Z = Possible deliveries. Different units serve different numbers of addresses. So the product of Y and Z will essentially define the total number of addresses served by the unit in a month.

H = Man-hours for the operating unit

Clearly, the volume X, divided by the man-hours H, would be a simple measure of productivity, but the DCEA is slightly more complex than this. In fact it begins with the ratio:

\[ \left( \frac{X}{Y \cdot Z} \right)^{0.35922} \]

If we interpret X to represent the delivery volume of the operating unit, and if we think of the product of Y and Z as the area of opportunity for deliveries, then perhaps it might make sense to divide X (production volume) by the product of Y and Z (number of addresses served) to get a measure of efficiency [volume per address]. However, when this ratio is raised to the 0.35922 power all vestiges of rationality are lost. Both the ratio and the attached measurement units become complete nonsense when raised to the 0.35922 power. How could you even begin to explain this value? Next the DCEA multiplies the nonsense ratio above by the following sum:

The first term of this sum is a very, very specific fraction (given down to the level of parts per million) of our “area of opportunity,” the product of Y and Z. The second term in this sum is a simple measure of productivity—volume divided by man-hours. The problem here is that there is no way that these two terms will have the same units. The first term will be a count of [total delivery addresses served] while the second will be a ratio of [feet per hour]. If you ignore the units attached to your values, then you too can end up adding apples and zebras to get gobbledygook.

Finally, the DCEA multiplies the product of the nonsense ratio and the gobbledygook sum by the obvious scaling factor of 5.61439! Any questions?
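One way to see the units problem is to change the units of X and watch what happens. The Python sketch below follows the verbal description of the DCEA given above. The exponent and scaling factor are the ones quoted in the text, but `ASSUMED_FRACTION` is a hypothetical stand-in for the six-decimal fraction, whose actual value is not given here, and the two operating units are made-up numbers.

```python
# Constants quoted in the text.
EXPONENT = 0.35922
SCALE = 5.61439
# HYPOTHETICAL stand-in for the six-decimal fraction; the real value is not given.
ASSUMED_FRACTION = 0.000123
INCHES_PER_FOOT = 12


def dcea(x, y, z, h, fraction=ASSUMED_FRACTION):
    """DCEA as described in the text, with an assumed fraction."""
    ratio = (x / (y * z)) ** EXPONENT        # volume per address, raised to a power
    mixed_sum = fraction * y * z + x / h     # addresses added to volume per hour
    return SCALE * ratio * mixed_sum


def productivity(x, h):
    """Simple volume per man-hour."""
    return x / h


# HYPOTHETICAL operating units: x in feet, y days, z possible deliveries, h man-hours.
units = {
    "Unit A": {"x": 3000.0, "y": 25, "z": 5000, "h": 4000.0},
    "Unit B": {"x": 6000.0, "y": 25, "z": 8000, "h": 9000.0},
}

dcea_factor = {}
productivity_factor = {}
for name, u in units.items():
    x_inches = u["x"] * INCHES_PER_FOOT
    # How much does each value change when X is expressed in inches instead of feet?
    dcea_factor[name] = dcea(x_inches, u["y"], u["z"], u["h"]) / dcea(**u)
    productivity_factor[name] = productivity(x_inches, u["h"]) / productivity(u["x"], u["h"])
    print(f"{name}: productivity changes by a factor of {productivity_factor[name]:.4f}, "
          f"DCEA by a factor of {dcea_factor[name]:.4f}")
```

For the simple productivity ratio, switching from feet to inches multiplies every unit's value by the same factor of 12, so comparisons between units are unaffected. For the DCEA as sketched here, the factor differs from one operating unit to the next, which is what adding quantities with different units does to a number.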

Thus, the DCEA is nothing more than numerical jabberwocky: complex nonsense intended to impress the numerically illiterate. Such meaningless numbers may be computed, but they cannot be called measures since they do not, in any real sense, measure anything. If you can’t articulate exactly what a value measures or counts, then you have left the realm of common sense.

Just because you can compute a piece of numerical jabberwocky does not justify its use. The only justification for using any number is that it makes sense in terms of its context. Statistical techniques cannot undo numerical jabberwocky. No matter how sophisticated your analysis, when you fail to respect the primary axiom of data analysis and end up using numerical jabberwocky in your computations, your results will be triumphs of computation over common sense.

As we saw in the last example, one of the easiest ways to create numerical jabberwocky is to take some measure, which makes sense on its own, and transform that measure in some nonlinear fashion. Logarithmic transformations, exponential transformations, square-root transformations, and trigonometric transformations can quickly transport you to a land of fantasy and make-believe. Once there, statistical techniques cannot restore that which was lost.
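A small Python sketch shows one way a nonlinear transformation can distort what the data say. The two groups below are hypothetical numbers chosen for illustration, and the square root is used only as a representative transformation.

```python
import math
from statistics import mean

# HYPOTHETICAL measurements for two groups, in the original units.
group_a = [0.0, 10.0]
group_b = [4.0, 4.0]

# Averages in the original units, where the values have a direct meaning.
raw_mean_a = mean(group_a)
raw_mean_b = mean(group_b)

# Averages after a square-root transformation of each value.
sqrt_mean_a = mean(math.sqrt(v) for v in group_a)
sqrt_mean_b = mean(math.sqrt(v) for v in group_b)

print(f"Original units:  A = {raw_mean_a:.3f}, B = {raw_mean_b:.3f}")
print(f"After sqrt:      A = {sqrt_mean_a:.3f}, B = {sqrt_mean_b:.3f}")
print("A larger than B in original units:", raw_mean_a > raw_mean_b)
print("A larger than B after sqrt:       ", sqrt_mean_a > sqrt_mean_b)
```

In the original units group A has the larger average, but after the transformation group B does. Any analysis performed on the transformed values answers a question about the transformed values, not about the original measurements.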

So, “Beware the Jabberwock, my son!”


About The Author


Donald J. Wheeler

Dr. Donald J. Wheeler is a Fellow of both the American Statistical Association and the American Society for Quality, and is the recipient of the 2010 Deming Medal. As the author of 25 books and hundreds of articles, he is one of the leading authorities on statistical process control and applied data analysis. Find out more about Dr. Wheeler’s books at www.spcpress.com.

Dr. Wheeler welcomes your questions. You can contact him at djwheeler@spcpress.com

Comments

Nice

A lovely, subtle attack on the Six Sigma crazies and their meaningless transformations.  Why do people want to buy Minitab and spend their hours playing with nonsense numbers instead of reading Shewhart?

Brillig ADB

There is light enough for those who wish to see, and darkness enough for those who are otherwise inclined.