If you like baseball pitching statistics, then you’ve loved the month of June. On the first of the month, Johan Santana pitched the first no-hitter in Mets history. Then a week later, the Seattle Mariners used six different pitchers to do the same thing, which tied the Major League Baseball record for most pitchers used in a no-hitter. And finally, five days after that, Giants pitcher Matt Cain threw the 22nd perfect game in Major League history.
It doesn’t take a Six Sigma Black Belt to realize it’s been a crazy month. But as a stat nerd, the question I have is, how crazy has June really been? What are the odds of throwing a perfect game and a no-hitter? (Don’t worry; it doesn’t take a Six Sigma Black Belt to figure that out, either.) But before we start, we have an important question to answer: where should we start counting?
There have been 22 perfect games, with the first two both happening in 1880. But in 1880 pitchers threw underhand, it took eight balls to draw a walk, and a batter was not awarded first base if he was hit by a pitch. In other words, the odds of pitchers in 1880 throwing a perfect game were vastly different than today. To account for this, I’m going to start collecting data at 1900, since people seem to agree this is when the modern era of Major League Baseball began. Since 1900, there have been 20 perfect games and 235 no-hitters.
I went to baseball-reference.com and recorded the total number of games played, including any post-season games, for the last 113 seasons (I included all games played through June 13, 2012). I also recorded the league average for on-base percentage. If you want to follow along, you can get the data here.
Since 1900, there have been 181,921 Major League baseball games. But in each game, there are two pitchers. So to get the number of opportunities for a perfect game, we need to double that number. That means since 1900, there have been 363,842 opportunities for a perfect game. And only 20 of them have occurred. What are the odds?
Odds of throwing a perfect game = 20 / 363,842 = 0.000055 = approx. 1 in 18,192
Yeah, that’s pretty low. Giants fans who were in attendance at AT&T Park on the fateful night should consider themselves extremely lucky. What about Mets and Mariners fans? How lucky should they consider themselves?
Odds of throwing a no-hitter = 235 / 363,842 = 0.000646 = approx. 1 in 1,548
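The two observed rates above are simple ratios. A minimal Python sketch, using only the counts quoted in this post, reproduces them (the variable and function names are mine and are not part of the original data file):

```python
# Counts quoted in the post (games through June 13, 2012).
games = 181_921
opportunities = 2 * games      # two pitching opportunities per game
perfect_games = 20
no_hitters = 235

def one_in(events, trials):
    """Express an observed rate as '1 in N', rounded to the nearest whole number."""
    return round(trials / events)

perfect_rate = perfect_games / opportunities
no_hitter_rate = no_hitters / opportunities

print(f"perfect game: {perfect_rate:.6f} (about 1 in {one_in(perfect_games, opportunities):,})")
print(f"no-hitter:    {no_hitter_rate:.6f} (about 1 in {one_in(no_hitters, opportunities):,})")
```

These are observed frequencies, not model-based probabilities, so they say nothing yet about what we should have expected to see.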
That’s still quite lucky, but not nearly as lucky as seeing a perfect game. So, sorry Mets and Mariners fans, we’re going to focus on the perfect game from here on out. It’s just more interesting. Why? Well, you’ll see.
The odds above are just what we’ve observed in the last 113 years. But let’s stop for a minute and think about what we would expect. To pitch a perfect game, no runner can reach base. That means you have to get 27 batters out in a row. So the probability of throwing a perfect game is equal to the probability of getting 27 batters out in a row.
Remember when I said for each season I collected the league average for on-base percentage (OBP)? Well, OBP is the percent of the time that a batter reaches base (either by a hit, a walk, or getting hit by a pitch). That means the probability of getting a batter out is 1 minus the on-base percentage. I’ll have Minitab calculate the average OBP since 1900.

The average OBP for all of Major League Baseball during the last 113 years comes out to 0.32856. So the probability of a pitcher getting a batter out is:
1 – 0.32856 = 0.67144 = 67.1 percent
This means that during the past 113 years, batters have made an out 67.1 percent of the time. Now, this number isn’t constant because it changes slightly depending on the batter and the pitcher. But I can’t break down every plate appearance since 1900, so we’re going to stick with this number. Now let’s calculate the odds.
Odds of throwing a perfect game = 0.67144^27 = 0.00002134 = approx. 1 in 46,800
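The calculation above can be sketched in a few lines of Python. It rests on the same simplifying assumptions as the text: every batter has the league-average chance of making an out, and each plate appearance is independent of the others.

```python
avg_obp = 0.32856            # league-average OBP quoted in the post
p_out = 1 - avg_obp          # chance that a single batter makes an out
p_perfect = p_out ** 27      # 27 independent outs in a row

print(f"p_out     = {p_out:.5f}")
print(f"p_perfect = {p_perfect:.8f} (about 1 in {1 / p_perfect:,.0f})")
```

Because the out probability is raised to the 27th power, even a small change in `p_out` moves the final answer a lot.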
Again, the true odds depend on the pitcher and the team he’s pitching against. Some games will have odds slightly better, and some will have odds slightly worse. But they should even out, making our odds of 1 in 46,800 a good estimate for the average game. So using a probability of 0.00002134, how many perfect games would we expect to see in 363,842 opportunities?
Expected number of perfect games = 0.00002134 * 363,842 = 7.8 perfect games
So there have been more than twice as many perfect games as we would expect! But of course, the 7.8 number is just the average. Certainly we could get other outcomes. After all, if you flip a coin 100 times, you’re not always going to get 50 heads. We can use a probability distribution plot to visualize the other possibilities. We use a binomial distribution with 363,842 trials and an event probability of 0.00002134.

We see that any number of perfect games between 4 and 11 wouldn’t be that uncommon. But wait, there have been 20 perfect games. I don’t see any gray bars even close to 20. In fact, by using Minitab’s cumulative distribution function, the probability that we would see at least 20 perfect games since 1900 is 1 in 5,780. That’s very uncommon.
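For readers without Minitab, the same binomial figures can be approximated with the Python standard library. This is only a sketch of the calculation described above, not the Minitab procedure itself; the helper `binom_log_pmf` is my own name, and it works in logarithms so that the very large and very small factors do not overflow or underflow.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the binomial probability of exactly k events in n trials."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)

n = 363_842          # opportunities since 1900 (from the post)
p = 0.00002134       # modeled chance of a perfect game (from the post)

expected = n * p
pmf = [math.exp(binom_log_pmf(k, n, p)) for k in range(20)]  # k = 0..19

p_between_4_and_11 = sum(pmf[4:12])
p_at_least_20 = 1 - sum(pmf)

print(f"expected perfect games: {expected:.1f}")
print(f"P(4 <= X <= 11): {p_between_4_and_11:.3f}")
print(f"P(X >= 20): {p_at_least_20:.6f} (about 1 in {1 / p_at_least_20:,.0f})")
```

The tail probability is computed as one minus the probability of 19 or fewer perfect games, which is the same cumulative-distribution approach described in the text.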
Could we simply have been lucky? It could be. But think of it this way: Imagine we take the 181,921 games played since 1900, and say they are just one sample. Then we take another sample of 181,921 games. And then another. And another, until we have 5,780 samples (it would take more than 650,000 years). In just one of those samples, we would expect to have at least 20 perfect games. So are we just “lucky” enough to have that sample be the very first one we took? I’m thinking not.
Then something has to be wrong with the expected value, right? I guess so, but I’m not sure what it is. And then I found some numbers that really made my head spin. Let’s take the fact that there have been 20 perfect games and work backward:
• 20 perfect games / 363,842 opportunities = A probability of 0.000055 of getting 27 batters out in a row
• 0.000055^(1/27) = A probability of 69.5 percent of getting one batter out
• 1 – 0.695 = An average OBP of 0.305
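The backward calculation in the three steps above amounts to taking a 27th root. A short Python sketch, under the same assumption that all 27 outs are independent and equally likely (variable names are mine):

```python
perfect_games = 20
opportunities = 363_842

p_perfect_observed = perfect_games / opportunities   # observed rate of 27 straight outs
implied_p_out = p_perfect_observed ** (1 / 27)       # per-batter out probability that would produce it
implied_obp = 1 - implied_p_out

print(f"observed perfect-game rate: {p_perfect_observed:.6f}")
print(f"implied out probability:    {implied_p_out:.3f}")
print(f"implied league OBP:         {implied_obp:.3f}")
```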
In a league where there have been 20 perfect games in 363,842 opportunities, we would expect the average OBP of the league to be 0.305. Why did this make my head spin? Consider these stats:
• Batters who have faced Hall of Famer Nolan Ryan had an OBP of 0.307
• Batters who have faced Yankee Ace CC Sabathia have an average OBP of 0.306
• Batters who have faced the last 10 pitchers to throw a perfect game have an average OBP of 0.310
So in a league made up of nothing but clones of Nolan Ryan, CC Sabathia, and the last 10 pitchers to throw a perfect game (that includes Randy Johnson), you still wouldn’t have a league where the average batter gets out 69.5 percent of the time. Mind = Blown.
So what are the odds of a perfect game? Well, I can confidently say that they are low: somewhere between 1 in 46,800 (what the model predicts) and 1 in 18,192 (what has actually been observed). But for the life of me, I can’t figure out why these two numbers are so different. If anybody has any theories, I’d love to hear them. In the meantime, I’ll finish with some things that definitely have better odds of happening than a perfect game.
• Winning $400 on a Pirates or Phillies Pennsylvania Lottery scratch off ticket (1 in 12,000)
• Having a randomly picked clover be a four-leaf clover (1 in 10,000)
• Getting four of a kind in a five-card poker hand (1 in 4,164)
• Successfully navigating an asteroid field (1 in 3,720... at least according to C-3PO)
Comments
Perfect Game
This is an excellent article on one of several aspects of the game of baseball that every baseball fan has wondered about! You have proven what a remarkable feat and huge accomplishment a perfect game really is. It is the human factor that defies the odds, doubling what would be expected. Thank you for enlightening me!
What about the odds of hitting four home runs in one game? Why has no one hit five home runs in one game?
Examine your assumptions
When your statistics don't work out, it is often a good idea to examine your assumptions. So what assumptions did you make in order to do your analysis? I suspect that your findings regarding the on-base percentage for all batters against the pitchers who pitched perfect games suggest there are factors not included in your model.
1. Independence--You assumed that the likelihood of striking a batter out was independent of whether or not a pitcher struck out an earlier batter. I don't know how true this is, but I suspect that pitchers, like all of us, have good days and bad days. On a bad day a pitcher will probably strike out fewer batters than on a good day. The effect of this would be to flatten the curve, putting higher probabilities in the tails and less in the middle.
2. Psychology has no impact--Does a pitcher who enters the fifth inning with a perfect game going pitch differently than he would if he didn't have that perfect game going? Does the coach make different calls? Do the opposing batters start responding differently? I suspect the answer is yes. Can we measure it? Probably not.
3. The occurrence of perfect games follows a Binomial distribution--Perfect games are rare events, and often the occurrence of rare events is more appropriately modeled by a Poisson distribution. The rarer the events, the more closely they fit the Poisson distribution over the Binomial. What if we took our area of opportunity to be all of the games played in a year (not a constant, so we would have to calculate the number of perfect games per some (large) number of games)? You could do a u chart and first of all see if the rate of occurrence is stable or not. (I would be interested in seeing if there were clusters of occurrence that might indicate a special cause during certain periods of time.)
4. Another way to analyze rare events might be to look at mean time between failures (or in this case--occurrences). How long has it been since the last perfect game? Is this gap unusually long or unusually short?
In any event, rare events are much more difficult to model than frequent events, for any one of a number of reasons, not the least of which is that they are rare. It is very difficult to understand things that don't happen very often, and often they violate the assumptions made in performing ordinary statistics.
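The "good days and bad days" point in item 1 can be illustrated with a toy calculation. The sketch below is not a model of real pitchers. It assumes, purely for illustration, that half of all games are played with an out probability 0.05 above the league average and half with one 0.05 below, so the overall average is unchanged. Both the 0.05 spread and the 50/50 split are hypothetical values chosen for the example.

```python
league_p_out = 0.67144   # league-average out probability from the article
spread = 0.05            # hypothetical good-day / bad-day swing (assumption)

# Every game uses the same league-average out probability.
uniform = league_p_out ** 27

# Half the games are "good days", half are "bad days"; the average is still league_p_out.
good_day = (league_p_out + spread) ** 27
bad_day = (league_p_out - spread) ** 27
mixed = 0.5 * good_day + 0.5 * bad_day

print(f"same p_out every game:  {uniform:.8f}")
print(f"good/bad day mixture:   {mixed:.8f}")
print(f"ratio (mixed / uniform): {mixed / uniform:.2f}")
```

Because raising a number to the 27th power rewards high values far more than it punishes low ones, the mixture produces more perfect games than the single average does, even though the average out probability is identical. How much real game-to-game variation exists is a separate question that this sketch does not answer.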
Perfect Games