© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.
Published: 04/11/2016
In honor of baseball season, I’m going to apply some simple statistical thinking to my favorite sport in a two-part series today and tomorrow. I want anyone to be able to enjoy this, so I’ll mark any technical statistics as optional reading. For those of you interested only in the interpretations, I'll offer the “bottom line” conclusions, many of which I think will surprise you.
Maybe you can’t do the math and don’t even want to, but you should at least realize the importance of understanding these types of analyses and have access to someone who can do them. In many similar daily situations you encounter, anything else would be data insanity.
With baseball’s innate tendency to explain common cause as special, opportunistic data torturing abounds. For example, a Nov. 6, 2015, article from The Boston Globe about my favorite team, the Boston Red Sox, titled “Could the Red Sox bullpen make a leap of an improvement?” had just one too many red flags. I smiled as I read it and couldn’t resist the urge to dig deeper to find the data to test the sportswriter’s statements.
There’s a wonderful site with just about any baseball statistic you could want if you search long enough. I went to its section on bullpens, which had three pages’ worth of stats, figuring that if they went through the trouble to compile them, they must somehow contribute to bullpen performance.
Here’s a quote from the Globe article: “... After all, the Sox bullpen was by many measures the worst in the majors last year, with the absence of Koji Uehara and late struggles of Junichi Tazawa leaving the team bereft of the sort of strikeout-per-inning arms that have become a staple of the game.”
I took my best guess and used nine of the compiled stats:
1. Earned-run average (ERA—lower is better)
2. Blown save percentage (lower is better)
3. Walks per nine innings (minus intentional walks—lower is better)
4. Strikeouts per nine innings (higher is better)
5. Batting average against (lower is better)
6. Something called OPS (sum of on-base percentage and slugging percentage—lower is better)
7. Home runs given up per nine innings (lower is better)
8. Steal success rate (lower is better)
9. Pitches per plate appearance (lower is better)
I got the 30-team rankings for each measure (1 to 30, best to worst) and summed them for an analysis. Any score from 9 to 270 (lower is better) is possible, with the calculated average being (9 × 15.5) = 139.5. But one needs to determine how much common cause there is around that average.
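The arithmetic behind those three numbers can be sketched in a few lines. This is only an illustration of the range and average of a nine-rank sum; the function name `rank_sum_summary` is made up for the example.

```python
N_TEAMS = 30  # each statistic ranks the teams 1 (best) to 30 (worst)
N_STATS = 9   # number of bullpen statistics summed per team


def rank_sum_summary(n_teams, n_stats):
    """Return the best possible, worst possible, and average rank sum."""
    best = n_stats * 1                    # ranked 1st on every statistic
    worst = n_stats * n_teams             # ranked last on every statistic
    average = n_stats * (1 + n_teams) / 2  # mean rank is (1 + 30) / 2 = 15.5
    return best, worst, average


best, worst, average = rank_sum_summary(N_TEAMS, N_STATS)
print(best, worst, average)
```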
Optional technicalities: I did a nonparametric analysis of variance (ANOVA) on the individual rankings, from which I obtained the variation to perform an analysis of means (ANOM) on the sum of the nine ranks for each team. Here is the result:
Figure 1:
Very important for everyone: This ANOM type of analysis to expose special causes is woefully underutilized in most improvement work. Everyone needs to have a basic grasp of it. W. Edwards Deming often used this technique, invented by Ellis Ott, and was emphatic that any points between the two common cause limits (in this case, 71.7 and 207.3) could not be ranked. This is a concept that initially is difficult to wrap one’s arms around. The 25 teams (or 26, depending on how one interprets Baltimore, which is team No. 3 on the horizontal axis in figure 1) between those two limits are indistinguishable from each other—and the overall average.
Some might think that Boston (team No. 4) is below average because its rank sum score (197) is greater than the average of 139.5. However, to be truly below average requires a score greater than 207. Based on this snapshot of data, Boston is not a special cause.
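The decision rule described above amounts to comparing each team's rank sum with the two common cause limits. Here is a minimal sketch that takes the limits (71.7 and 207.3) as given from figure 1; it does not reproduce the nonparametric ANOVA that produced them, and the function name `classify` is only illustrative.

```python
LOWER_LIMIT = 71.7   # common cause limits quoted for figure 1
UPPER_LIMIT = 207.3


def classify(rank_sum, lower=LOWER_LIMIT, upper=UPPER_LIMIT):
    """Label a team's rank sum (lower sums mean better bullpens)."""
    if rank_sum > upper:
        return "special cause: below average"
    if rank_sum < lower:
        return "special cause: above average"
    # Anything between the limits cannot be ranked against the others.
    return "common cause: indistinguishable from average"


print(classify(197))  # Boston's rank sum
```

Boston's 197 falls between the limits, so the function reports common cause even though 197 is larger than 139.5.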
Bottom line
• “Below average” bullpens: Atlanta (No. 2), Colorado (No. 9), Detroit (No. 10)
• “Above average” bullpens: Pittsburgh (No. 22) and maybe Baltimore (No. 3)
As Deming would say, these 30 teams form a “system,” and individual teams are either inside the system (common cause) or outside the system (special cause in either direction).
And then there’s the other half of the quote from the Globe article trying to explain an alleged difference in strikeout rates during the absences of Uehara and Tazawa.
Optional technicalities: Figures 2 and 3 below are p-chart ANOMs (p = proportion/percentage) of bullpen performances. Figure 2 compares the Boston bullpen’s 2014 and 2015 strikeout rates (strikeouts/total outs). In addition to the standard “3” standard deviation limits, I put in even less conservative criteria (5% and 1% significance limits).
Figure 2:
Bottom line
No difference in Boston bullpen’s 2014 and 2015 strikeout rates. The data lie between even the narrowest decision limits.
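For readers who want to see the mechanics, a p-chart comparison of this kind can be sketched as follows. The strikeout and out counts below are hypothetical placeholders, not the actual Boston data, and the sketch uses only the plain "3" standard deviation limits; the 5% and 1% ANOM limits need critical values that are not given here.

```python
from math import sqrt


def p_chart_limits(events, trials, z=3.0):
    """Compare each group's proportion with limits around the pooled proportion."""
    p_bar = sum(events) / sum(trials)  # pooled proportion across all groups
    rows = []
    for e, n in zip(events, trials):
        half_width = z * sqrt(p_bar * (1 - p_bar) / n)
        p = e / n
        outside = not (p_bar - half_width <= p <= p_bar + half_width)
        rows.append((p, p_bar - half_width, p_bar + half_width, outside))
    return p_bar, rows


# HYPOTHETICAL counts for two seasons (strikeouts, total outs); illustration only.
strikeouts = [400, 380]
total_outs = [1500, 1500]

p_bar, rows = p_chart_limits(strikeouts, total_outs)
for p, lo, hi, outside in rows:
    print(f"rate={p:.3f} limits=({lo:.3f}, {hi:.3f}) special cause={outside}")
```

With these made-up counts both seasons stay inside the limits, which is the same kind of result reported for figure 2.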
The graph in figure 3 compares Major League Baseball's (MLB) bullpens’ total strikeout rates for 2014 and 2015 by combining the rates of all 30 bullpens (using the same criteria as figure 2):
Figure 3:
Bottom line
Similarly, the 2014 and 2015 data both lie within the narrowest limits—no year-to-year difference.
I wouldn’t be a bit surprised if someone has said, “The Red Sox followed the overall trend of the major league bullpen strikeout rate for the 2015 season by being down slightly from 2014.” Sorry, just not true.
Using the same p-chart ANOM technique, how does Boston compare to the other 29 bullpens in terms of its individual strikeout rate for 2014 and 2015 (figures 4 and 5)?
Figure 4:
Figure 5:
Bottom line
• Boston (team No. 4) is between the limits both years, so it was average for both seasons—no difference.
• I didn’t realize how strikeout-dominant the New York Yankees (team No. 19) were in both 2014 and 2015.
• The L.A. Dodgers (team No. 14) were also truly above average for 2015.
• Detroit (team No. 10) and Minnesota (team No. 17) were truly below average in 2015 (Minnesota in 2014, as well).
Here’s another quote from the Globe article: “Red Sox relievers finished with a 4.24 ERA last year. The league average bullpen had a 3.71 mark. What are the odds of bridging that divide of 0.53 earned runs per nine innings? Excellent, actually.”
I needed an estimate of the standard deviation to be able to perform an ANOM on the 2015 individual team bullpen ERAs.
Optional technicalities: I analyzed ERAs using the variation from the combined 2014/2015 data. An initial ANOVA showed no difference either by year or league. It also identified five outliers in terms of the difference between 2014 and 2015.
Bottom line
Any difference between 2014 and 2015 that is greater than ~1.1 is considered significant. This occurred for five bullpens:
Team | 2014 | 2015 | Diff |
Atlanta | 3.31 | 4.69 | +1.38 |
Houston | 4.80 | 3.27 | -1.53 |
Oakland | 2.91 | 4.63 | +1.72 |
San Diego | 2.73 | 4.02 | +1.29 |
Seattle | 2.59 | 4.15 | +1.56 |
Boston was 3.33 in 2014 and 4.24 in 2015, for a difference of +0.91: common cause.
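The check on year-to-year differences is simple enough to sketch directly from the numbers above. The ~1.1 threshold is taken as given; Boston's 2015 value uses the 4.24 quoted from the Globe, which matches the stated difference of +0.91.

```python
THRESHOLD = 1.1  # approximate significance threshold for a year-to-year change

# (2014 ERA, 2015 ERA) for the teams named in the article
bullpen_era = {
    "Atlanta": (3.31, 4.69),
    "Houston": (4.80, 3.27),
    "Oakland": (2.91, 4.63),
    "San Diego": (2.73, 4.02),
    "Seattle": (2.59, 4.15),
    "Boston": (3.33, 4.24),  # 4.24 as quoted from the Globe article
}

significant = []
for team, (era_2014, era_2015) in bullpen_era.items():
    diff = round(era_2015 - era_2014, 2)
    if abs(diff) > THRESHOLD:
        significant.append(team)
    print(f"{team:10s} {diff:+.2f}")

print("Significant changes:", significant)
```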
Optional technicalities
• Because the initial ANOVA showed no difference by either year or league, I also did a simpler, SPC-type analysis using only the individual year-to-year ranges of each team (i.e., the absolute value of the individual teams’ 2014–2015 difference).
• Using both the median and average of these ranges to detect outliers until they pretty much concurred, the conclusions matched those of the more formal ANOVA, both in terms of outlier criteria (difference > 1.1) and approximate standard deviation (~0.29).
• I also used a nonparametric box plot analysis of the actual individual team year-to-year differences (not absolute value), and it determined that an outlying range was greater than ~1.1–1.2.
Bottom line
Three different techniques concluded that a difference greater than ~1.1 was significant, and a good-enough estimate of the standard deviation is 0.29.
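The range-based approach mentioned in the optional technicalities can be sketched with the usual control chart constants for ranges of two values (1.128 and 3.268 for the average range, 0.954 and 3.865 for the median range). Whether these exact constants were used is an assumption, and the function names are illustrative, but they do tie the two quoted numbers together: a standard deviation of 0.29 implies an upper range limit of roughly 1.1.

```python
from statistics import mean, median

# Standard constants for ranges of two values (assumed, not stated in the article)
AVG_SIGMA_DIVISOR, AVG_LIMIT_FACTOR = 1.128, 3.268
MED_SIGMA_DIVISOR, MED_LIMIT_FACTOR = 0.954, 3.865


def sigma_from_ranges(ranges, use_median=False):
    """Estimate the standard deviation from two-point ranges."""
    if use_median:
        return median(ranges) / MED_SIGMA_DIVISOR
    return mean(ranges) / AVG_SIGMA_DIVISOR


def upper_range_limit(ranges, use_median=False):
    """Largest range expected from common cause alone."""
    if use_median:
        return MED_LIMIT_FACTOR * median(ranges)
    return AVG_LIMIT_FACTOR * mean(ranges)


# Consistency check using the article's own estimate of the standard deviation
SIGMA = 0.29
implied_average_range = SIGMA * AVG_SIGMA_DIVISOR
implied_limit = AVG_LIMIT_FACTOR * implied_average_range
print(round(implied_average_range, 2), round(implied_limit, 2))
```

In practice one would pass the 30 teams' absolute 2014–2015 differences to these functions, set aside any range above the limit, and recompute until the average-based and median-based answers roughly agree.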
This standard deviation of 0.29 was then used for the ANOM comparing the 2015 ERAs of the 30 teams (figure 6):
Figure 6:
• Special cause “high” ERAs: Atlanta (team No. 2), Colorado (team No. 9), Oakland (team No. 20)
• Special cause “low” ERAs: Kansas City (team No. 12), Pittsburgh (team No. 22), St. Louis (team No. 26)
• Boston (4.24): Above the average of 3.71, but not a special cause. The team was no different from the 23 other teams between the limits (or 3.71, for that matter).
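As a rough check on these conclusions, the sketch below puts limits around the 3.71 average using the 0.29 standard deviation. It assumes plain three-standard-deviation limits, which may differ slightly from the exact ANOM limits in figure 6, and it covers only the teams whose 2015 ERAs are quoted in this article.

```python
AVERAGE_ERA = 3.71
SIGMA = 0.29
Z = 3.0  # assumed multiplier; the exact ANOM factor is not stated

lower = AVERAGE_ERA - Z * SIGMA
upper = AVERAGE_ERA + Z * SIGMA

# 2015 bullpen ERAs quoted elsewhere in the article
era_2015 = {
    "Atlanta": 4.69,
    "Houston": 3.27,
    "Oakland": 4.63,
    "San Diego": 4.02,
    "Seattle": 4.15,
    "Boston": 4.24,
}

high = [team for team, era in era_2015.items() if era > upper]
low = [team for team, era in era_2015.items() if era < lower]

print(f"limits: {lower:.2f} to {upper:.2f}")
print("special cause high:", high)
print("special cause low:", low)
```

Under that assumption, Atlanta and Oakland land above the upper limit while Boston stays inside, in line with the lists above.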
Actually, the Globe sportswriter came to the right conclusion in terms of the odds of “bridging that divide of 0.53” being excellent—but for the wrong reasons. Based on the 2014–2015 data analysis and its calculated common cause of 0.29:
• There was no divide. Boston’s 4.24 was statistically indistinguishable from 3.71.
• Boston could easily go from 4.24 to as low as 3.14 (a difference of 1.1) just due to common cause, which wouldn’t necessarily indicate improvement.
• But he’s right that the odds are indeed excellent... just due to chance.
He then states, “The hallmark of bullpens is their inconsistency.” Once again, true, but for the wrong reasons. I hope I’ve shown that variation can be routinely “consistently inconsistent” within a predictable, but humanly unacceptable, range. Rather than accept this, the sportswriter then went on a fishing expedition into bullpen stats to “explain” his position further. More about that tomorrow in part two, as well as a surprising conclusion from yet another, totally different ANOM method analyzing the 2015 ERAs.