This is a continuation of my last column, which I’ve written to honor my late dad, who loved golf. As promised, let’s look at the final four-round scores from the Masters golf tournament for the 55 players who survived the cut. We’ll analyze them and then give the analysis a twist based on the ongoing enumerative vs. analytic conundrum.
Analyzing the four-round final scores with analysis of variance (ANOVA):
Source | DF | SS | MS | F | P |
Round | 3 | 51.382 | 17.127 | 2.83 | 0.040 |
Golfer | 54 | 420.800 | 7.793 | 1.29 | 0.116 (!) |
Error | 162 | 980.618 | 6.053 | | |
Total | 219 | 1452.800 | | | |
S = 2.46032
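As a quick arithmetic check on the table above, here is a minimal Python sketch that recomputes the mean squares, the F ratios, and S from the printed sums of squares and degrees of freedom. The variable names are my own, and the p-values are left out because they need an F-distribution routine that the standard library doesn't provide.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
anova = {
    "Round": {"df": 3, "ss": 51.382},
    "Golfer": {"df": 54, "ss": 420.800},
    "Error": {"df": 162, "ss": 980.618},
}

# Mean square = sum of squares / degrees of freedom.
ms = {name: row["ss"] / row["df"] for name, row in anova.items()}

# Each F ratio compares a factor's mean square with the error mean square.
f_round = ms["Round"] / ms["Error"]
f_golfer = ms["Golfer"] / ms["Error"]

# S is the square root of the error mean square (one round, one golfer).
s = math.sqrt(ms["Error"])

print(f"MS Round  = {ms['Round']:.3f}, F = {f_round:.2f}")
print(f"MS Golfer = {ms['Golfer']:.3f}, F = {f_golfer:.2f}")
print(f"MS Error  = {ms['Error']:.3f}, S = {s:.5f}")
```

The recomputed values agree with the table to the precision shown.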
It was interesting that “Round” showed significance (p < 0.05) and “Golfer” did not.
Hole placements on the greens are changed for every round, and some can significantly increase a hole’s difficulty. Sometimes the weather during a round can also be a factor. Curious, I did an analysis of means (ANOM) by round:
Because the p-value was somewhat marginal and the ANOM criteria are more conservative, ANOM flags none of the rounds (not that it matters).
Below is the ANOM for the Masters golf tournament final scores (range calculated by nonparametric box plot method: 271.5 to 299.5):
As you can see, Jordan Spieth, who was pretty much on fire the entire tournament, was a true champion, statistically different from the rest of the field.
Be careful about declaring ‘differences’
Applying the least significant difference criterion described in my last column to this sample, 1.96 × sqrt(2 × (4 × 6.053)) ≈ 13.6, scores that differ by 14 or less aren’t statistically different. However, because the p-value for “Golfer” (0.116) wasn’t significant, the criterion probably shouldn’t be used at all, which is the recommendation of most statistical textbooks.
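The least significant difference arithmetic can be spelled out step by step. This is only a sketch of the formula quoted above, with variable names of my own choosing; it assumes, as the formula does, that each final score is the sum of four rounds sharing the ANOVA error variance.

```python
import math

z = 1.96      # two-sided 5% normal critical value from the formula
mse = 6.053   # error mean square from the ANOVA table
rounds = 4    # each final score is the sum of four rounds

# Variance of one four-round total, then of the difference of two totals.
var_total = rounds * mse
var_diff = 2 * var_total

lsd = z * math.sqrt(var_diff)
print(f"Least significant difference = {lsd:.2f} strokes")
```

The result is about 13.6 strokes, which rounds up to the 14 used in the text.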
To get a more conservative figure, I used the “Studentized” range criterion for multiple comparisons from my trusty old copy of Statistical Methods. Unfortunately, its table goes up to only 20 comparisons. But even using that, one would declare a difference only if two final scores differed by more than 25, which, as you may note, encompasses the score range of golfers No. 2 (274) through No. 55 (297) and is in line with the p-value of 0.116, hinting at no differences.
Now, suppose you’re at a meeting with the purpose of ranking 55 employees from best to worst. You hand out a similar table and tell people that a difference of 14 “might” show a difference but, more conservatively, only a difference greater than 25 should be considered different. Can you envision the chaos resulting from the variation in how people would perceive and interpret the variation? I can easily imagine “little circles” being drawn and discussion about “above average” performers, “below average” performers, and “quartiles,” along with puzzlement as to who might be different from the person with the best score.
Can you see how the elegant simplicity of the ANOM would frame a different, perhaps more productive, conversation?
Applying the ANOM to this scenario of employee ranking, there is one “superstar” and 54 “average” employees: no one above or below average, no top-quartile, no second quartile, no third quartile... and no bottom quartile.
One could almost consider the results of golfers No. 2 through No. 55 a lottery. Using the resulting variation of this tournament as an example, the four-round scores of a golfer for two consecutive tournaments can differ, due to common cause, by as much as 18. Let’s apply this to golfer No. 2 (274). The resulting score of 292 would now place him 50th in the current pack due just to common cause, which in this case means a difference between a prize of $880,000 and $25,000! How do you think the golf world would treat this difference: as common or special cause?
(I hope you’ve concluded that it should be common cause.)
Uncomfortable analogy?
For those of you in healthcare who are slaves to the current patient-satisfaction survey nonsense, how are your survey-to-survey changes in rankings and percentiles treated? What light could ANOM shed?
How does this compare with the U.S. Open round three?
If these were all the data you had, what would this analysis allow you to predict about these golfers at the next major tournament, which happened to be the U.S. Open on June 20, 2015?
Recall that there are three types of statistics. For this example:
Descriptive statistics: What can I say about this specific golfer’s score?
Enumerative statistics: What can I say about this specific group of golfers’ scores?
Analytic statistics: What can I say about the process that produced this group of golfers’ scores?
This was an enumerative analysis. All of these analyses and conclusions have been based on this specific data set, and action was taken on this specific group.
Some of you might ask if this is a random sample. It is, of sorts: 55 elite golfers of varying ability participating in the 2015 Masters. But is it truly random? It’s hardly sampling with replacement.
How does it compare with the “sample” from the U.S. Open? Of these 55 golfers, 48 participated and 16 missed the cut (in line with my last column’s ANOM result of ~76% rate of making cut). That left 32 golfers from this group, who were joined by 43 other golfers who also made the cut. Care to predict?
For the 32 golfers who made the cut for both tournaments, let’s compare the results. Prepare to be surprised:
Masters rank | Golfer | U.S. Open rank | Rank difference | Open score minus Masters score |
1 | Jordan Spieth | 1 | 0 | 0 |
2 | Phil Mickelson | 64 | -62 | 19 |
2 | Justin Rose | 27 | -25 | 11 |
4 | Rory McIlroy | 9 | -5 | 4 |
5 | Hideki Matsuyama | 18 | -13 | 6 |
6 | Paul Casey | 39 | -33 | 7 |
6 | Dustin Johnson | 2 | 4 | -3 |
6 | Ian Poulter | 54 | -48 | 12 |
9 | Zach Johnson | 72 | -63* | 15 |
12 | Kevin Na | 46 | -34 | 6 |
17 | Sergio Garcia | 18 | -1 | 0 |
19 | Louis Oosthuizen | 2 | 17 | -8 |
19 | Henrik Stenson | 27 | -8 | 1 |
22 | Keegan Bradley | 27 | -5 | -1 |
22 | Angel Cabrera | 64 | -42 | 7 |
22 | Ernie Els | 54 | -32 | 5 |
22 | Patrick Reed | 14 | 8 | 4 |
28 | Jason Day | 9 | 19 | -7 |
28 | Morgan Hoffman | 27 | 1 | -2 |
28 | Webb Simpson | 46 | -18 | 1 |
33 | Chris Kirk | 75 | -42 | 13 |
33 | Brooks Koepka | 18 | 15 | -5 |
33 | Ryan Palmer | 52 | -19 | 2 |
38 | Charl Schwartzel | 7 | 31 | -11 |
38 | Adam Scott | 4 | 34 | -12 |
38 | John Senden | 14 | 24 | -7 |
38 | Cameron Tringale | 54 | -16 | 2 |
38 | Jimmy Walker | 58 | -20 | 3 |
46 | Matt Kuchar | 12 | 34** | -9 |
46 | Lee Westwood | 50 | -4 | -1 |
48 | Geoff Ogilvy | 18 | 30 | -8 |
49 | Jason Dufner | 18 | 31 | -9 |
* largest drop
** largest gain
For this group of 32, was there a significant difference between the U.S. Open scores and the Masters scores?
t-test of (U.S. Open score – Masters score):
Variable | N | Mean | StDev | T | P |
Diff | 32 | 0.843750 | 7.903080 | 0.60 | 0.550 (No) |
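For readers who want to see where the T value comes from, here is a minimal sketch of the one-sample (paired) t statistic computed from the summary figures printed above. The variable names are mine, and the p-value is omitted because it needs a t-distribution routine that the standard library doesn't provide.

```python
import math

# Summary statistics copied from the printed t-test output.
n = 32                # golfers who made the cut in both tournaments
mean_diff = 0.843750  # mean of (U.S. Open score - Masters score)
sd_diff = 7.903080    # standard deviation of those differences

# Standard error of the mean difference.
se = sd_diff / math.sqrt(n)

# t statistic for testing "mean difference = 0", with n - 1 degrees of freedom.
t = mean_diff / se
df = n - 1

print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}")
```

An average difference of less than one stroke, measured against a standard error of about 1.4 strokes, is why the test shows no significance.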
Since there’s no significance, let’s treat them as two “replicates” and take the range for each golfer:
Average range: 6.281 R max = 3.268 × 6.281 ~ 20
Median range: 6.0 R max = 3.865 × 6.0 ~ 23
Using the ANOVA standard deviation of 2.8: R max ~ 21
(Maximum observed difference between two scores was 19—Phil Mickelson.)
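The range limits above can be reproduced from the score differences in the table. The sketch below treats each golfer's two final scores as a subgroup of two, so the range is the absolute value of the difference. The first two limits use the multipliers quoted in the text. The third is only my assumed reconstruction of the "about 21" figure: it scales the per-round standard deviation of 2.8 up to a four-round total and applies the standard control-chart constants for subgroups of two (d2 = 1.128, D4 = 3.268).

```python
import math
from statistics import mean, median

# U.S. Open score minus Masters score, copied from the table in row order.
diffs = [0, 19, 11, 4, 6, 7, -3, 12, 15, 6, 0, -8, 1, -1, 7, 5,
         4, -7, -2, 1, 13, -5, 2, -11, -12, -7, 2, 3, -9, -1, -8, -9]

# With two scores per golfer, the range is the absolute difference.
ranges = [abs(d) for d in diffs]
avg_range = mean(ranges)
med_range = median(ranges)

# Upper range limits using the multipliers quoted in the text.
rmax_avg = 3.268 * avg_range
rmax_med = 3.865 * med_range

# ASSUMPTION: reconstruction of the limit based on the ANOVA standard deviation.
sigma_total = 2.8 * math.sqrt(4)          # std. dev. of a four-round total
rmax_sd = 3.268 * 1.128 * sigma_total     # D4 * d2 * sigma for subgroups of 2

print(f"Average range {avg_range:.3f} -> R max {rmax_avg:.1f}")
print(f"Median range  {med_range:.1f} -> R max {rmax_med:.1f}")
print(f"From sigma    {sigma_total:.1f} -> R max {rmax_sd:.1f}")
print(f"Largest observed range: {max(ranges)}")
```

All three limits land in the low twenties, and the largest observed range of 19 sits below each of them.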
If you read my last column, do you remember when I calculated that two tournament scores could differ by as much as 18? And, by the way, the U.S. Open standard deviation for an individual round, obtained from the ANOVA for all the golfers who played four rounds, was 2.80, compared with 2.46 for the Masters. For those of you who watched, the U.S. Open course was a very frustrating one, but the Masters course is no picnic, either.
For those of you who are interested, here is the ANOM for the U.S. Open final scores:
Note that, unlike in his unbelievable Masters performance, Jordan Spieth was not a “true” champion in this case. For those of you who watched the exciting conclusion (down to the last putt!), the last three or four holes saw three people interchanging the leader position seemingly at random. Everyone’s putting was (common cause) erratic, and Louis Oosthuizen came out of nowhere to almost win via the back door! No one was truly in “the zone.”
Bottom line: What can you predict from these two analyses?
Not much.
So, we now have two individual enumerative analyses. Suppose I did similar calculations of “significant differences” for the U.S. Open and passed those out with these data in addition to the Masters analysis and data, with the goal of ranking these golfers? You’d be there until Christmas.
Do you think this might apply to some of your organizational data? As Deming said, “Management is prediction.” How do you see through the heavy fog of common cause to make good management decisions?
Are you as amazed as I am at the amount of common cause, and at why, on any given day, any professional golfer is probably capable of winning any tournament or just as easily missing the cut? But in this golf example, despite the amount of common cause, one could ask, “Are some players more consistently at the top of the leader board regardless of the tournament and competition?” Looking at data over time raises the analytic statistics question: “What can I say about the process that produced both of these tournament results, both of these groups of golfers, and the individual golfers’ results?”
I can just picture my dad if I tried to explain this all to him. As if it were yesterday, I see that wrinkled brow and twinkle in his eye as he says, “Where did you come from?”
I miss you, Dad.