
## Numerical Jabberwocky

### Numbers that gyre and gymble in the wabe

Published: Wednesday, April 1, 2015 - 15:40

The first axiom of data analysis is: “No data have meaning apart from their context.” Yet we all encounter measures that have been dreamed up without regard for either context or use. This column gives a couple of the more egregious examples I have encountered over the years.

### Kentucky Higher Education Commission

The first example is a formula used by the Kentucky Higher Education Commission in the 1970s to allocate state funds to the various state-run colleges and universities. A former student of mine sent this formula along with the relevant data for all of the post-secondary state schools. This funding equation had five terms that had to be evaluated for each school. The first four terms were relatively complex terms involving various variables to characterize the school, while the last term was the simple ratio of the school’s full-time equivalent head count to the total full-time equivalent head count for the state as a whole.

Because of the complexity of the first four terms, I began by simply evaluating the formula for each school. Since I wanted to understand the formula, I evaluated each term separately. I listed these in a spreadsheet with the colleges down the left side and the values for each of the five terms in successive columns. As I did this, I found that the first and third terms were always large and positive, while the second and fourth terms were always large and negative. Of course, in each case, the fifth term was a simple proportion.

As I worked with these values I began to notice a curious phenomenon. For each school the sum of the first four terms always came out to be zero! The first four terms had nothing to do with funding. The funds were prorated on the full-time equivalent head count, and the remainder of the equation was there simply to confuse the school administrators and to keep the legislators at bay.
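The term-by-term check described above can be sketched in a few lines of Python. This is only an illustration: the school names and term values below are invented so that the arithmetic is easy to follow, they are not the Kentucky data, and the helper names `first_four_sum` and `terms_cancel` are hypothetical.

```python
import math

# Hypothetical term values for illustration only; these are NOT the Kentucky data.
# Each row: school -> (term1, term2, term3, term4, fte_share)
terms_by_school = {
    "School A": (1250.0, -900.0, 650.0, -1000.0, 0.40),
    "School B": (800.0, -300.0, 200.0, -700.0, 0.35),
    "School C": (430.0, -130.0, 100.0, -400.0, 0.25),
}


def first_four_sum(terms):
    """Return the sum of the first four terms in one school's row."""
    return math.fsum(terms[:4])


def terms_cancel(table, tol=1e-9):
    """True if the first four terms sum to (approximately) zero for every school."""
    return all(
        math.isclose(first_four_sum(terms), 0.0, abs_tol=tol)
        for terms in table.values()
    )


for school, terms in terms_by_school.items():
    print(f"{school}: first four sum = {first_four_sum(terms):+.3f}, "
          f"fifth term = {terms[4]:.2f}")

print("First four terms cancel for every school:", terms_cancel(terms_by_school))
```

Laying the terms out one column at a time, as in the spreadsheet, is what makes the cancellation visible; evaluating only the final total for each school would hide it.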

### U. S. Postal Service

In the 1990s the U. S. Postal Service used the Delivery Collection Efficiency Achievement (DCEA) measure to ensure that “the beatings will continue until quality improves.” The DCEA measure was a function of four variables that I will call X, Y, Z, and H:

X = Total cased delivery volume for the operating unit. If all the mail processed by the unit in a single month was placed in a single stack, this value would be the height of that stack measured in feet.

Y = Delivery days in the month

Z = Possible deliveries. Different units serve different numbers of addresses. So the product of Y and Z will essentially define the total number of addresses served by the unit in a month.

H = Man-hours for the operating unit

Clearly, the volume X, divided by the man-hours H, would be a simple measure of productivity, but the DCEA is slightly more complex than this. In fact it begins with the ratio:

$$\left(\frac{X}{Y \, Z}\right)^{0.35922}$$

If we interpret X to represent the delivery volume of the operating unit, and if we think of the product of Y and Z as the area of opportunity for deliveries, then perhaps it might make sense to divide X (production volume) by the product of Y and Z (number of addresses served) to get a measure of efficiency [volume per address]. However, when this ratio is raised to the 0.35922 power all vestiges of rationality are lost. Both the ratio and the attached measurement units become complete nonsense when raised to the 0.35922 power. How could you even begin to explain this value? Next the DCEA multiplies the nonsense ratio above by the following sum:

The first term of this sum is a very, very specific fraction (given down to the level of parts per million) of our “area of opportunity,” the product of Y and Z. The second term in this sum is a simple measure of productivity—volume divided by man-hours. The problem here is that there is no way that these two terms will have the same units. The first term will be a count of [total delivery addresses served] while the second will be a ratio of [feet per hour]. If you ignore the units attached to your values, then you too can end up adding apples and zebras to get gobbledygook.
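One way to see the unit problem is to carry the units along with the numbers. The sketch below is a minimal, hypothetical `Quantity` class written for this illustration (it is not a real library), and the values for X, Y, Z, and H are made up. Y is treated as a pure count so that the product of Y and Z comes out as a number of addresses, following the reading given above.

```python
class Quantity:
    """A number tagged with unit exponents, e.g. {"ft": 1, "hour": -1} for feet per hour."""

    def __init__(self, value, units):
        self.value = float(value)
        # Drop zero exponents so that equal units compare equal.
        self.units = {u: p for u, p in units.items() if p != 0}

    def __add__(self, other):
        # Addition only makes sense when both values carry the same units.
        if self.units != other.units:
            raise TypeError(f"cannot add {self.units} to {other.units}")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        units = dict(self.units)
        for u, p in other.units.items():
            units[u] = units.get(u, 0) + p
        return Quantity(self.value * other.value, units)

    def __pow__(self, exponent):
        # Raising a value to a power raises its units to the same power.
        scaled = {u: p * exponent for u, p in self.units.items()}
        return Quantity(self.value ** exponent, scaled)

    def __truediv__(self, other):
        return self * other ** -1


# Made-up values, for illustration only.
X = Quantity(12_000, {"ft": 1})        # cased delivery volume, in feet
Y = Quantity(25, {})                   # delivery days, treated as a pure count
Z = Quantity(40_000, {"address": 1})   # possible deliveries
H = Quantity(9_000, {"hour": 1})       # man-hours

print((X / H).units)                    # feet per hour
print(((X / (Y * Z)) ** 0.35922).units) # fractional powers of feet and addresses

try:
    (Y * Z) + (X / H)                   # addresses plus feet per hour
except TypeError as err:
    print("Rejected:", err)
```

Multiplying the address count by a dimensionless constant would not change its units, so the addition is rejected no matter what fraction is applied to the first term.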

Finally, the DCEA multiplies the product of the nonsense ratio and the gobbledygook sum by the obvious scaling factor of 5.61439! Any questions?
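Assembled from the verbal description above, the overall structure of the DCEA appears to be the following. The symbol $c$ is a placeholder introduced here for the very specific fraction applied to the product of Y and Z; its value is not given in this column, so none is assumed.

$$\mathrm{DCEA} = 5.61439 \left(\frac{X}{Y \, Z}\right)^{0.35922} \left(c \, Y Z + \frac{X}{H}\right)$$

Tracking units through this expression, the first factor carries [feet per address] raised to the 0.35922 power, while the second factor attempts to add [addresses] to [feet per hour].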

Thus, the DCEA is nothing more than numerical jabberwocky: complex nonsense intended to impress the numerically illiterate. Such meaningless numbers may be computed, but they cannot be called measures since they do not, in any real sense, measure anything. If you can’t articulate exactly what a value measures or counts, then you have left the realm of common sense.

Just because you can compute a piece of numerical jabberwocky does not justify its use. The only justification for using any number is that it makes sense in terms of its context. Statistical techniques cannot undo numerical jabberwocky. No matter how sophisticated your analysis, when you fail to respect the primary axiom of data analysis and end up using numerical jabberwocky in your computations, your results will be triumphs of computation over common sense.

As we saw in the last example, one of the easiest ways to create numerical jabberwocky is to take some measure, which makes sense on its own, and transform that measure in some nonlinear fashion. Logarithmic transformations, exponential transformations, square-root transformations, and trigonometric transformations can quickly transport you to a land of fantasy and make-believe. Once there, statistical techniques cannot restore that which was lost.

So, “Beware the Jabberwock, my son!”

### Donald J. Wheeler

Dr. Wheeler is a fellow of both the American Statistical Association and the American Society for Quality who has taught more than 1,000 seminars in 17 countries on six continents. He welcomes your questions; you can contact him at djwheeler@spcpress.com.