Davis Balestracci

Quality Insider

Ah... Baseball

Kill the umpire!

Published: Thursday, April 23, 2015 - 15:43


Welcome to baseball season! I always do a baseball-themed article around this time, and I found my topic after stumbling on this article recently: How accurate are umpires when calling balls and strikes?

From what I understand, since 2008, home plate umpires have been electronically monitored every game and given immediate feedback on their accuracy—i.e., the number of actual balls they called as strikes, and vice versa.

Using the aggregated data from the 2008–2013 seasons, the author observed that wrong calls were made 15 percent of the time (the average of the two error rates combined), which, according to him, “is just too high.” He provided a table of the umpires whose inaccuracy rate was 15 percent or higher: 38 umpires out of approximately 80. (Tsk, tsk... that’s close to half of them.)

He also listed the top 10 most accurate umpires. Oops—I mean the 10 umpires who happened to have the lowest rates of wrong calls.

I was curious and found the data source. This site was unbelievable: you could slice and dice the data any way you wanted. I obtained the 2014 data for all umpires. They were displayed as the two figures below, using a common presentation: left-axis data as a bar graph and right-axis data as a line graph, with individual umpires along the horizontal axis:

By hovering my cursor over each “dot,” I obtained and entered the data on each umpire, converted the graphs above to p-charts, and added a third chart for the combined rates:


According to the first chart, umpires 7, 9, 11, 12, 14, 28, 75, and 82 had above-average mistake rates, and umpires 35, 38, 52, 71, and 78 had below-average mistake rates.

According to the second chart, umpires 5, 26, 55, and 85 had above-average mistake rates, and umpires 9, 28, 30, and 72 had below-average mistake rates. Note that no umpire was flagged as good on both or poor on both.
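For readers who want to reproduce charts like these, here is a minimal sketch of p-chart classification, assuming standard three-sigma limits around a pooled overall average. Because each umpire's denominator (the number of pitches on which he could make that particular error) differs, each umpire gets his own limits. The `Umpire` record, the `p_chart` function, and the example counts are hypothetical illustrations, not the 2014 data.

```python
import math
from dataclasses import dataclass


@dataclass
class Umpire:
    ump_id: int
    wrong: int   # wrong calls of one type (hypothetical count)
    total: int   # opportunities for that error type (hypothetical count)


def p_chart(umpires):
    """Classify each umpire against his own three-sigma p-chart limits.

    p_bar is pooled: total wrong calls / total opportunities.
    Limits for umpire i: p_bar +/- 3 * sqrt(p_bar * (1 - p_bar) / n_i).
    """
    p_bar = sum(u.wrong for u in umpires) / sum(u.total for u in umpires)
    flags = {}
    for u in umpires:
        p_i = u.wrong / u.total
        sigma = math.sqrt(p_bar * (1 - p_bar) / u.total)
        lower = max(0.0, p_bar - 3 * sigma)  # a rate cannot go below 0
        upper = min(1.0, p_bar + 3 * sigma)  # or above 1
        if p_i > upper:
            flags[u.ump_id] = "above average"
        elif p_i < lower:
            flags[u.ump_id] = "below average"
        else:
            flags[u.ump_id] = "cannot be distinguished from average"
    return p_bar, flags


# Hypothetical example: six umpires with 1,000 opportunities each
umpires = [Umpire(1, 150, 1000), Umpire(2, 150, 1000), Umpire(3, 150, 1000),
           Umpire(4, 150, 1000), Umpire(5, 200, 1000), Umpire(6, 100, 1000)]
p_bar, flags = p_chart(umpires)
print(p_bar, flags)  # umpire 5 is flagged above average, umpire 6 below
```

The limits tighten as the denominator grows (the standard error shrinks with the square root of n), which is why more data can move a borderline umpire outside the limits.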

I was curious about the overall wrong call rate, so I combined them:


In this chart, umpires 11, 12, 14, 33, 55, and 82 had above-average mistake rates (all but umpire 33 appeared previously, but none in both charts), and umpires 15, 35, 52, 71, and 88 had below-average mistake rates (umpires 35, 52, and 71 appeared previously, but not in both).
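The article does not say exactly how the two rates were combined. A p-chart needs a numerator and a denominator, so this sketch assumes the counts are pooled per umpire: total wrong calls over total called pitches. Pooling weights each error type by its number of opportunities, so it generally differs from a simple average of the two rates. The function name and the counts are hypothetical.

```python
def combined_counts(balls_called_strikes, actual_balls,
                    strikes_called_balls, actual_strikes):
    """Pool both error types for one umpire (hypothetical helper).

    Returns (total wrong calls, total called pitches), ready for a p-chart.
    """
    wrong = balls_called_strikes + strikes_called_balls
    total = actual_balls + actual_strikes
    return wrong, total


# Hypothetical umpire: 100 of 1,000 actual balls called strikes (10 percent),
# 100 of 500 actual strikes called balls (20 percent)
wrong, total = combined_counts(100, 1000, 100, 500)
pooled_rate = wrong / total                       # 200 / 1500, about 0.133
simple_average = (100 / 1000 + 100 / 500) / 2     # (0.10 + 0.20) / 2 = 0.15
print(f"pooled {pooled_rate:.3f} vs simple average {simple_average:.3f}")
```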

Of course, some umpires will always have the 10 lowest rates, but there is truly only a “top five” in accuracy (umpires 15, 35, 52, 71, and 88): the only ones whose combined rates fall below the lower limit.

How might these three p-charts change the current conversation?

The correlation is significant

One might wonder whether the two types of error are related, based on the theory that a “bad” umpire would have high rates of both and a “good” umpire low rates of both. What is the correlation between the two?

Correlation = –0.242 (p-value = 0.021, which is < 0.05: statistically significant... or is it?)
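The article does not say how this p-value was obtained. Assuming it comes from the usual test of a Pearson correlation against zero, the statistic below is compared with a t distribution with n - 2 degrees of freedom, where n is the number of umpires plotted. The test takes r at face value; nothing in it detects a single influential point.

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{df} = n - 2
$$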

Let’s clear things up with a scatter plot, with a trend line of course:


Many people don’t realize that a trend line is an implicit regression and that any regression has at least three diagnostics. The data point at the lower right happens to be a whopping outlier, which invalidates the analysis. After eliminating that point, the correlation of what remains is:

Correlation = –0.135 (p-value = 0.207, which is > 0.05)
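Plotting is the real check, but a leave-one-out pass is one quick numerical complement (a technique chosen here for illustration, not necessarily what the author did): recompute the correlation with each point removed in turn and see which single point moves it most. The function names and the six data points are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(xs, ys):
    """Recompute r with each point dropped; most influential point first."""
    r_all = pearson(xs, ys)
    shifts = []
    for i in range(len(xs)):
        r_without = pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        shifts.append((abs(r_without - r_all), i, r_without))
    shifts.sort(reverse=True)  # largest change in r first
    return r_all, shifts


# Hypothetical data: five unrelated points plus one far-off point at the lower right
xs = [1, 2, 3, 4, 5, 20]
ys = [3, 5, 4, 5, 3, 0]
r_all, shifts = leave_one_out(xs, ys)
print(f"r with all points: {r_all:.3f}")
print(f"dropping point {shifts[0][1]} moves r to {shifts[0][2]:.3f}")
```

Here one point manufactures a strong negative correlation out of five points that have none; the scatter plot shows the same thing at a glance.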

As Ellis Ott used to say, “First, you plot the data, then you plot the data, then you plot the data.”

Given a set of numbers...

• 10 percent will be the top 10 percent and a different 10 percent will be the bottom 10 percent.

• For any arbitrary percentage ending in 0 or 5 (call it X), X percent will be the top X percent, and a different X percent will be the bottom X percent.

• 10 people will be the top 10 and 10 different people will be the bottom 10.

Our ranking-obsessed society continues its quest to find the best and worst of everything. As I hope this has shown, there is no pre-set percentage of outliers—and there is also the possibility of no outliers!

I remember an illustration in one of Deming’s books where he took a figure similar to my p-charts and wrote on the chart about the performances between the common cause limits: “These cannot be ranked.” Based on the given data, they are indistinguishable from each other and from the overall average.

More data might shed further light: some umpires currently near either limit might then have a big enough denominator to be declared truly above or below average.

There will also be the poor person whose performance, as in the umpire analysis above, could go from, say, 15th best (No. 15) to 15th worst (No. 75) through no fault of his own and with no change in his performance, provided the others maintained their current “process” as well.
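The claim that ranks can swing with no change in performance is easy to check by simulation. This sketch assumes every umpire runs the identical process (a true 15 percent wrong-call rate) and calls a hypothetical 1,500 pitches per season; the umpire count, pitch count, and random seed are assumptions for illustration, not the real data. It simulates two seasons, ranks both, and counts how many umpires land outside three-sigma p-chart limits.

```python
import math
import random


def simulate_season(n_umpires, calls, p, rng):
    """Wrong-call counts for umpires who all share the same true error rate p."""
    return [sum(rng.random() < p for _ in range(calls)) for _ in range(n_umpires)]


def ranks(counts):
    """Rank 1 = fewest wrong calls; ties broken by umpire index."""
    order = sorted(range(len(counts)), key=lambda i: (counts[i], i))
    result = [0] * len(counts)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


rng = random.Random(2015)
N_UMPIRES, CALLS, P = 90, 1500, 0.15  # hypothetical values
season1 = simulate_season(N_UMPIRES, CALLS, P, rng)
season2 = simulate_season(N_UMPIRES, CALLS, P, rng)

# p-chart limits for season 1 (equal denominators, so one set of limits)
p_bar = sum(season1) / (N_UMPIRES * CALLS)
sigma = math.sqrt(p_bar * (1 - p_bar) / CALLS)
outside = [i for i, c in enumerate(season1) if abs(c / CALLS - p_bar) > 3 * sigma]

r1, r2 = ranks(season1), ranks(season2)
biggest_jump = max(abs(a - b) for a, b in zip(r1, r2))
print(f"umpires outside the limits: {len(outside)}")
print(f"largest rank change between seasons: {biggest_jump}")
```

Typically few or no simulated umpires fall outside the limits, yet the ranking still produces a "best" and a "worst," and individual ranks jump around from one season to the next.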

What does this lack of basic knowledge about variation cost our society?

How would analyses like these change conversations to make subsequent actions productive rather than the status quo of increasing confusion, conflict, complexity, and chaos?

The article’s author felt that a 15-percent wrong call rate was too high. Well, it’s what the current system is perfectly designed to get. He may not like it, and other people may not like it, but that’s what it is, and ranking to death won’t solve a thing. And (horrors!) note that half of the umpires were above average.

Until next time...


About The Author


Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.