Davis Balestracci
Published: Monday, April 11, 2016 - 17:35

In honor of baseball season, I’m going to apply some simple statistical thinking to my favorite sport in a two-part series today and tomorrow. I want anyone to be able to enjoy this, so I’ll mark any technical statistics as optional reading. For those of you interested only in the interpretations, I’ll offer the “bottom line” conclusions, many of which I think will surprise you.

Maybe you can’t do the math and don’t even want to, but you should at least realize the importance of understanding these types of analyses and have access to someone who can do them. In many similar daily situations you encounter, anything else would be data insanity.

With baseball’s innate tendency to explain common cause as special, opportunistic data torturing abounds. For example, “Could the Red Sox bullpen make a leap of an improvement?” a Nov. 6, 2015, article from The Boston Globe about my favorite team, the Boston Red Sox, had just one too many red flags. I smiled as I read it and couldn’t resist the urge to dig deeper to find the data to test the sportswriter’s statements.

There’s a wonderful site with just about any baseball statistic you could want if you search long enough. I went to its section on bullpens, which had three pages’ worth of stats, figuring that if they went through the trouble to compile them, they must somehow contribute to bullpen performance.

Here’s a quote from the Globe article:

“... After all, the Sox bullpen was by many measures the worst in the majors last year, with the absence of Koji Uehara and late struggles of Junichi Tazawa leaving the team bereft of the sort of strikeout-per-inning arms that have become a staple of the game.”

I took my best guess and used nine of the compiled stats. I got the 30-team rankings for each measure (1 to 30, best to worst) and summed them for an analysis. Any score from 9 to 270 (lower is better) is possible, with the calculated average being (9 × 15.5) = 139.5.
But one needs to determine how much common cause there is around that average.

Optional technicalities: I did a nonparametric analysis of variance (ANOVA) on the individual rankings, from which I obtained the variation to perform an analysis of means (ANOM) on the sum of the nine ranks for each team. Here is the result:

Figure 1: Analysis of means (ANOM) for sum of nine rankings

Very important for everyone: This ANOM type of analysis to expose special causes is woefully underutilized in most improvement work. Everyone needs to have a basic grasp of it. W. Edwards Deming often used this technique, invented by Ellis Ott, and was emphatic that any points between the two common cause limits (in this case, 71.7 and 207.3) could not be ranked. This is a concept that is initially difficult to wrap one’s arms around.

The 25 teams (or 26, depending on how one interprets Baltimore, which is team No. 3 on the horizontal axis in figure 1) between those two limits are indistinguishable from each other, and from the overall average. Some might think that Boston (team No. 4) is below average because its rank sum score (197) is greater than the average of 139.5. However, to be truly below average requires a score greater than 207. Based on this snapshot of data, Boston is not a special cause.

Bottom line: As Deming would say, these 30 teams form a “system,” and individual teams are either inside the system (common cause) or outside the system (special cause in either direction).

And then there’s the other half of the quote from the Globe article, trying to explain an alleged difference in strikeout rates during the absences of Uehara and Tazawa.

Optional technicalities: Figures 2 and 3 below are p-chart ANOMs (p = proportion/percentage) of bullpen performances. Figure 2 compares the Boston bullpen’s 2014 and 2015 strikeout rates (strikeouts/total outs).
I put in even less conservative criteria (5% and 1% significance limits) as well as the standard “3” (three standard deviations).

Figure 2: P-chart ANOM comparison of Red Sox strikeout rates

The graph in figure 3 compares Major League Baseball’s (MLB) bullpens’ total strikeout rates for 2014 and 2015 by combining the rates of all 30 bullpens (using the same criteria as figure 2):

Figure 3: P-chart ANOM of Major League Baseball (MLB) strikeout rates

Bottom line: I wouldn’t be a bit surprised if someone has said, “The Red Sox followed the overall trend of the major league bullpen strikeout rate for the 2015 season by being down slightly from 2014.” Sorry, just not true.

Using the same p-chart ANOM technique, how does Boston compare to the other 29 bullpens in terms of its individual strikeout rate for 2014 and 2015 (figures 4 and 5)?

Figure 4: P-chart ANOM of 2014 strikeout rates
Figure 5: P-chart ANOM of 2015 strikeout rates

Here’s another quote from the Globe article:

“Red Sox relievers finished with a 4.24 ERA last year. The league average bullpen had a 3.71 mark. What are the odds of bridging that divide of 0.53 earned runs per nine innings? Excellent, actually.”

I needed an estimate of the standard deviation to be able to perform an ANOM on the 2015 individual team bullpen ERAs.

Optional technicalities: I analyzed ERAs using the variation from the combined 2014/2015 data. An initial ANOVA showed no difference either by year or league. It also identified five outliers in terms of the difference between 2014 and 2015.

Bottom line: Boston was 3.33 in 2014 and 4.24 in 2015, for a difference of +0.91, which is common cause.

A standard deviation of 0.29, estimated from that combined 2014/2015 data, was then used for the ANOM comparing the 2015 ERAs of the 30 teams (figure 6):

Figure 6: ANOM comparing 2015 ERAs of bullpens

Actually, the Globe sportswriter came to the right conclusion in terms of the odds of “bridging that divide of 0.53” being excellent, but for the wrong reasons.
Based on the 2014–2015 data analysis and its calculated common cause of 0.29, a gap of 0.53 is well within what chance alone produces.

He then states, “The hallmark of bullpens is their inconsistency.” Once again, true, but for the wrong reasons. I hope I’ve shown that variation can be routinely “consistently inconsistent” within a predictable, but humanly unacceptable, range. Rather than accept this, the sportswriter then went on a fishing expedition into bullpen stats to “explain” his position further.

More about that tomorrow in part two, as well as a surprising conclusion from yet another, totally different ANOM method analyzing the 2015 ERAs.

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended, as an approach to leadership, in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs.
Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.
Data Torturing in the Baseball World, Part 1
Explaining anything as special cause
‘Worst in the majors’
1. Earned-run average (ERA—lower is better)
2. Blown save percentage (lower is better)
3. Walks per nine innings (minus intentional walks—lower is better)
4. Strikeouts per nine innings (higher is better)
5. Batting average against (lower is better)
6. Something called OPS (sum of on-base percentage and slugging percentage—lower is better)
7. Home runs given up per nine innings (lower is better)
8. Steal success rate (lower is better)
9. Pitches per plate appearance (lower is better)
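The rank-sum score built from these nine measures can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the decision limits (71.7 and 207.3) and Boston's rank sum (197) are copied from the article rather than recomputed from the underlying ANOVA.

```python
N_MEASURES = 9   # number of ranked statistics in the list above
N_TEAMS = 30     # each measure is ranked 1 (best) to 30 (worst)


def rank_sum_range(n_measures=N_MEASURES, n_teams=N_TEAMS):
    """Return (best possible, worst possible, expected) sum of ranks."""
    best = n_measures * 1            # ranked first on every measure
    worst = n_measures * n_teams     # ranked last on every measure
    mean_rank = (1 + n_teams) / 2    # average of the ranks 1..30 is 15.5
    return best, worst, n_measures * mean_rank


def classify(score, lower=71.7, upper=207.3):
    """Place a rank sum relative to the limits quoted in the article.

    The default limits are taken from the article as given; they are
    not derived here.
    """
    if score < lower:
        return "special cause (better than the system)"
    if score > upper:
        return "special cause (worse than the system)"
    return "common cause"


print(rank_sum_range())   # (9, 270, 139.5)
print(classify(197))      # Boston's rank sum: common cause
```

A score of 197 looks far from 139.5, yet it still falls inside the limits, which is the whole point of computing the limits before ranking anyone.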
• “Below average” bullpens: Atlanta (No. 2), Colorado (No. 9), Detroit (No. 10)
• “Above average” bullpens: Pittsburgh (No. 22) and maybe Baltimore (No. 3)
About those strikeout rates: p-chart ANOM
No difference in Boston bullpen’s 2014 and 2015 strikeout rates. The data lie between even the narrowest decision limits.
Similarly, the 2014 and 2015 data both lie within the narrowest limits—no year-to-year difference.
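For readers who want to see the mechanics behind a year-to-year comparison like this, here is a minimal sketch of p-chart style limits around a pooled rate. It is a simplification, not the author's calculation: it uses a plain multiplier `z` on the binomial standard error instead of ANOM critical values, and the strikeout and out counts are HYPOTHETICAL placeholders, since the article does not list the raw counts.

```python
import math


def p_chart_limits(events, trials, z=3.0):
    """Return the pooled proportion and (lower, upper) limits per subgroup.

    Limits are p_bar +/- z * sqrt(p_bar * (1 - p_bar) / n), the usual
    p-chart form. A true ANOM would replace z with an ANOM critical value.
    """
    p_bar = sum(events) / sum(trials)
    limits = []
    for n in trials:
        half_width = z * math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((p_bar - half_width, p_bar + half_width))
    return p_bar, limits


# HYPOTHETICAL counts for illustration only (not the Red Sox's real totals).
years = (2014, 2015)
strikeouts = [480, 450]
total_outs = [1500, 1500]

p_bar, limits = p_chart_limits(strikeouts, total_outs)
for year, k, n, (lo, hi) in zip(years, strikeouts, total_outs, limits):
    rate = k / n
    verdict = "common cause" if lo <= rate <= hi else "special cause"
    print(f"{year}: rate={rate:.3f} limits=({lo:.3f}, {hi:.3f}) -> {verdict}")
```

With counts of this size, a two-point swing in strikeout rate sits comfortably inside the limits, which is the kind of result the bottom lines above describe.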
• Boston (team No. 4) is between the limits both years, so it was average for both seasons—no difference.
• I didn’t realize how strikeout-dominant the New York Yankees (team No. 19) were in both 2014 and 2015.
• The L.A. Dodgers (team No. 14) were also truly above average for 2015.
• Detroit (team No. 10) and Minnesota (team No. 17) were truly below average in 2015 (Minnesota in 2014, as well).
‘What are the odds?’
Any difference between 2014 and 2015 that is greater than ~1.1 is considered significant. This occurred for five bullpens:
Team         2014    2015    Diff
Atlanta      3.31    4.69    +1.38
Houston      4.80    3.27    -1.53
Oakland      2.91    4.63    +1.72
San Diego    2.73    4.02    +1.29
Seattle      2.59    4.15    +1.56
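The screening rule is easy to reproduce. The sketch below recomputes each difference from the table and applies the ~1.1 cutoff; Boston's figures (3.33 in 2014 and the 4.24 quoted from the Globe for 2015) are added from the article text for comparison. The function name and the treatment of 1.1 as a hard threshold are illustrative assumptions.

```python
THRESHOLD = 1.1  # approximate cutoff quoted in the article

# team: (2014 ERA, 2015 ERA)
eras = {
    "Atlanta": (3.31, 4.69),
    "Houston": (4.80, 3.27),
    "Oakland": (2.91, 4.63),
    "San Diego": (2.73, 4.02),
    "Seattle": (2.59, 4.15),
    "Boston": (3.33, 4.24),  # from the article text, not from the table
}


def year_to_year(eras, threshold=THRESHOLD):
    """Return {team: (difference, flagged)} where flagged means |diff| > threshold."""
    result = {}
    for team, (era_2014, era_2015) in eras.items():
        diff = round(era_2015 - era_2014, 2)
        result[team] = (diff, abs(diff) > threshold)
    return result


for team, (diff, flagged) in year_to_year(eras).items():
    label = "special cause" if flagged else "common cause"
    print(f"{team:<10} {diff:+.2f}  {label}")
```

Note that the rule looks at the size of the change in either direction, so Houston's improvement is flagged just as the four deteriorations are.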
• Because the initial ANOVA showed no difference by either year or league, I also did a simpler, SPC-type analysis using only the individual year-to-year ranges of each team (i.e., the absolute value of the individual teams’ 2014–2015 difference).
• Using both the median and average of these ranges to detect outliers until they pretty much concurred, the conclusions matched those of the more formal ANOVA, both in terms of outlier criteria (difference > 1.1) and approximate standard deviation (~0.29).
• I also used a nonparametric box plot analysis of the actual individual team year-to-year differences (not absolute value), and it put the outlier threshold at roughly 1.1 to 1.2.
Three different techniques concluded that a difference greater than ~1.1 was significant, and a good-enough estimate of the standard deviation is 0.29.
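The article does not spell out the exact steps of the range-based screen, so the following is only a sketch of one common way to do it. It assumes the standard SPC constants for ranges of two values (3.268 and 1.128 for the average range, 3.865 and 0.954 for the median range) and uses HYPOTHETICAL ranges, because the full set of 30 team differences is not given.

```python
from statistics import mean, median

# Standard constants for ranges of two values (assumed to apply here).
LIMIT_FACTOR_MEAN = 3.268     # upper range limit = 3.268 * average range
LIMIT_FACTOR_MEDIAN = 3.865   # upper range limit = 3.865 * median range
SIGMA_DIVISOR_MEAN = 1.128    # sigma ~ average range / 1.128
SIGMA_DIVISOR_MEDIAN = 0.954  # sigma ~ median range / 0.954


def screen_ranges(ranges, center=mean, factor=LIMIT_FACTOR_MEAN):
    """Drop ranges above factor * center, recomputing until none exceed it.

    Returns the surviving ranges and the final upper range limit.
    """
    kept = list(ranges)
    while True:
        limit = factor * center(kept)
        inside = [r for r in kept if r <= limit]
        if len(inside) == len(kept):
            return kept, limit
        kept = inside


# HYPOTHETICAL absolute year-to-year ERA differences for ten teams.
ranges = [0.10, 0.25, 0.40, 0.15, 0.55, 0.30, 0.20, 0.45, 0.35, 1.60]

kept_mean, limit_mean = screen_ranges(ranges)
kept_median, limit_median = screen_ranges(ranges, median, LIMIT_FACTOR_MEDIAN)

sigma_from_mean = mean(kept_mean) / SIGMA_DIVISOR_MEAN
sigma_from_median = median(kept_median) / SIGMA_DIVISOR_MEDIAN

print(f"average-range screen: limit={limit_mean:.2f} sigma={sigma_from_mean:.2f}")
print(f"median-range screen:  limit={limit_median:.2f} sigma={sigma_from_median:.2f}")
```

For what it is worth, plugging the article's 0.29 into these constants gives 0.29 × 1.128 × 3.268 ≈ 1.07, close to the ~1.1 cutoff, though that agreement does not prove this is the calculation the author ran.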
• Special cause “high” ERAs: Atlanta (team No. 2), Colorado (team No. 9), Oakland (team No. 20)
• Special cause “low” ERAs: Kansas City (team No. 12), Pittsburgh (team No. 22), St. Louis (team No. 26)
• Boston (4.24): Above the average of 3.71, but not a special cause. The team was no different from the rest of the 24 teams between the limits (or from 3.71, for that matter).
Bottom line from both analyses
• There was no divide. Boston’s 4.24 was statistically indistinguishable from 3.71.
• Boston could easily go from 4.24 to as low as 3.14 (a difference of 1.1) just due to common cause, which wouldn’t necessarily indicate improvement.
• But he’s right that the odds are indeed excellent... just due to chance.
About The Author
Davis Balestracci
© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.
Comments
Thanks for your insights
Mr. Balestracci,
As a "stat geek", I have baseball to thank for my interest in statistical analysis. I also subscribed to SABR for a time and still keep up with them. I dislike listening to baseball broadcasters, former MLB players and managers, and so-called analysts who spout useless trivia to "bolster" their cases but who don't know the meaning of "statistical significance", as in your example.
In fact, I'd love to see you present your insights on the "MLB Network". Think about it.
Best wishes.