Davis Balestracci

Management

More Golf, Statistically

True champion or lottery winner?

Published: Monday, July 13, 2015 - 16:05


This is a continuation of my last column, which I’ve written to honor my late dad, who loved golf. As promised, let’s look at the final four-round scores from the Masters golf tournament for the 55 players who survived the cut. We’ll analyze them and then add a twist based on the ongoing enumerative vs. analytic conundrum.

Analyzing the four-round final scores with analysis of variance (ANOVA):     

Source    DF        SS        MS      F      P
Round      3    51.382    17.127   2.83  0.040
Golfer    54   420.800     7.793   1.29  0.116 (!)
Error    162   980.618     6.053
Total    219  1452.800

S = 2.46032
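
For readers who want to check the table, here is a minimal Python sketch that rebuilds the mean squares, F ratios, and S from the sums of squares and degrees of freedom shown above. The variable names are mine, and the p-values are not recomputed because the standard library has no F distribution.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss = {"Round": 51.382, "Golfer": 420.800, "Error": 980.618}
df = {"Round": 3, "Golfer": 54, "Error": 162}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# F ratio = factor mean square / error mean square.
f_round = ms["Round"] / ms["Error"]
f_golfer = ms["Golfer"] / ms["Error"]

# S is the square root of the error mean square: the standard deviation
# of a single round once round and golfer effects are removed.
s = math.sqrt(ms["Error"])

print(f"MS: Round {ms['Round']:.3f}, Golfer {ms['Golfer']:.3f}, Error {ms['Error']:.3f}")
print(f"F:  Round {f_round:.2f}, Golfer {f_golfer:.2f}")
print(f"S = {s:.5f}")
```

The error mean square (6.053) is the figure reused in the difference criteria later in the column.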

It was interesting that “Round” showed significance (p < 0.05) and “Golfer” did not.

Hole placements on the greens are changed for every round, and some can significantly increase a hole’s difficulty. Sometimes the weather during a round can also be a factor. Curious, I did an analysis of means (ANOM) by round:

Image

Given that the p-value was somewhat marginal, and that the ANOM criteria are more conservative, none of the rounds are flagged by ANOM (not that it matters).

Below is the ANOM for the Masters golf tournament final scores (range calculated by nonparametric box plot method: 271.5 to 299.5):

Image

As you can see, Jordan Spieth, who was pretty much on fire the entire tournament, was a true champion, statistically different from the rest of the field.
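
The quoted limits of 271.5 and 299.5 can be unpacked with a short sketch. It assumes the "nonparametric box plot method" means the usual rule of fences set 1.5 interquartile ranges beyond the quartiles; the column does not spell this out. Under that assumption, the limits imply quartiles of 282 and 289, which are derived here and not quoted in the column.

```python
# Quartiles implied by the quoted limits, ASSUMING the usual box plot rule
# (fences 1.5 interquartile ranges beyond the quartiles). Solving
#   q1 - 1.5 * (q3 - q1) = 271.5   and   q3 + 1.5 * (q3 - q1) = 299.5
# gives q1 = 282 and q3 = 289. These are derived values, not source data.
q1, q3 = 282.0, 289.0
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_flagged(score):
    """Return True when a final score falls outside the fences."""
    return score < lower_fence or score > upper_fence


# 274 and 297 are the No. 2 and No. 55 scores cited later in the column;
# 270 is Spieth's winning total from the tournament record.
for label, score in [("Spieth", 270), ("No. 2", 274), ("No. 55", 297)]:
    status = "flagged" if is_flagged(score) else "within limits"
    print(f"{label}: {score} is {status}")
```

Only the winning score falls outside the limits, which matches the single flagged golfer described above.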

Be careful about declaring ‘differences’

Applying the least significant difference criterion described in my last column to this sample, 1.96 × sqrt(2 × (4 × 6.053)) ≈ 13.6, scores that differ by 14 or less aren’t statistically different. However, because the p-value for “Golfer” (0.116) wasn’t significant, this criterion probably should not be used, which is the recommendation of most statistical textbooks.
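
The arithmetic behind that figure can be sketched in a few lines of Python. The reasoning in the comments (a four-round total has four times the single-round variance, and a difference of two totals doubles it) is my reading of the bracketed formula, and the variable names are illustrative.

```python
import math

mse = 6.053   # error mean square from the ANOVA table
rounds = 4    # each final score is the sum of four rounds
z = 1.96      # two-sided 5% normal multiplier used in the column

# Variance of one four-round total is rounds * mse; the difference of
# two independent totals has twice that variance.
sd_of_difference = math.sqrt(2 * rounds * mse)
lsd = z * sd_of_difference

print(f"SD of a difference between two totals: {sd_of_difference:.2f}")
print(f"Least significant difference: {lsd:.2f}")
```

Because golf scores are whole numbers, a criterion of about 13.6 means two totals have to be more than 14 strokes apart before this test would call them different.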

To get a more conservative figure, I used the “Studentized” range criterion from my trusty old Statistical Methods, by George Snedecor and William Cochran (first published in 1937) for multiple comparisons. Unfortunately, the table went to only 20 comparisons maximum. But even using that, one would declare a difference only if two final scores differed by more than 25, which, as you may note, encompasses the score range of golfers No. 2 (274) through No. 55 (297) and is in line with the p-value of 0.116, hinting at no differences.
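
The column does not quote the tabled Studentized range value, so the sketch below does not assert one. Assuming the criterion has the usual form of q times the standard deviation of a single four-round total, it backs out the q implied by the stated threshold of 25 and compares the threshold with the spread of scores from No. 2 to No. 55.

```python
import math

mse = 6.053      # error mean square from the ANOVA table
rounds = 4       # each final score is the sum of four rounds
threshold = 25   # difference quoted in the column

# Standard deviation of one golfer's four-round total.
sd_of_total = math.sqrt(rounds * mse)

# ASSUMPTION: threshold = q * sd_of_total, so q can be backed out.
implied_q = threshold / sd_of_total

# Spread between golfer No. 55 (297) and golfer No. 2 (274).
score_spread = 297 - 274

print(f"SD of a four-round total: {sd_of_total:.2f}")
print(f"Implied Studentized range multiplier: {implied_q:.2f}")
print(f"Spread of No. 2 through No. 55: {score_spread} (threshold {threshold})")
```

An implied multiplier of roughly 5 is in the neighborhood of published 5% Studentized range values for 20 means with large error degrees of freedom, so the stated threshold looks consistent with that reading.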

Now, suppose you’re at a meeting with the purpose of ranking 55 employees from best to worst. You hand out a similar table and tell people that a difference of 14 “might” show a difference but, more conservatively, only a difference greater than 25 should be considered different. Can you envision the chaos resulting from the variation in how people would perceive and interpret the variation? I can easily imagine “little circles” being drawn and discussion about “above average” performers, “below average” performers, and “quartiles,” along with puzzlement as to who might be different from the person with the best score.

Can you see how the elegant simplicity of the ANOM would frame a different, perhaps more productive, conversation?

Applying the ANOM to this scenario of employee ranking, there is one “superstar” and 54 “average” employees: no one above or below average, no top-quartile, no second quartile, no third quartile... and no bottom quartile.

One could almost consider the results of golfers No. 2 through No. 55 a lottery. Using the resulting variation of this tournament as an example, the four-round scores of a golfer for two consecutive tournaments can differ, due to common cause, by as much as 18. Let’s apply this to golfer No. 2 (274). The resulting score of 292 would now place him 50th in the current pack due just to common cause, which in this case means a difference between a prize of $880,000 and $25,000! How do you think the golf world would treat this difference: as common or special cause?

(I hope you’ve concluded that it should be common cause.)

Uncomfortable analogy? 

For those of you in healthcare who are slaves to the current patient-satisfaction survey nonsense, how are your survey-to-survey changes in rankings and percentiles treated? What light could ANOM shed?

How does this compare with the U.S. Open round three?

If these were all the data you had, what would this analysis allow you to predict about these golfers at the next major tournament, which happened to be the U.S. Open on June 20, 2015?

Recall that there are three types of statistics. For this example:

Descriptive statistics: What can I say about this specific golfer’s score?

Enumerative statistics: What can I say about this specific group of golfers’ scores?

Analytic statistics: What can I say about the process that produced these golfers’ scores?

This was an enumerative analysis. All of these analyses and conclusions have been based on this specific data set, and action was taken on this specific group.

Some of you might ask if this is a random sample. It is, of sorts: 55 elite golfers of varying ability participating in the 2015 Masters. But is it truly random? It’s hardly sampling with replacement.

How does it compare with the “sample” from the U.S. Open? Of these 55 golfers, 48 participated and 16 of them missed the cut (in line with my last column’s ANOM result of a ~76% rate of making the cut). That left 32 golfers from the Masters group, playing alongside 43 other golfers who also made the U.S. Open cut. Care to predict?

For the 32 golfers who made the cut for both tournaments, let’s compare the results. Prepare to be surprised:

Masters finish  Golfer            U.S. Open finish  Rank difference  Open score minus Masters score
1               Jordan Spieth     1                 0                0
2               Phil Mickelson    64                -62              19
2               Justin Rose       27                -25              11
4               Rory McIlroy      9                 -5               4
5               Hideki Matsuyama  18                -13              6

6               Paul Casey        39                -33              7
6               Dustin Johnson    2                 4                -3
6               Ian Poulter       54                -48              12
9               Zach Johnson      72                -63*             15
12              Kevin Na          46                -34              6

17              Sergio Garcia     18                -1               0
19              Louis Oosthuizen  2                 17               -8
19              Henrik Stenson    27                -8               1
22              Keegan Bradley    27                -5               -1
22              Angel Cabrera     64                -42              7

22              Ernie Els         54                -32              5
22              Patrick Reed      14                8                4
28              Jason Day         9                 19               -7
28              Morgan Hoffman    27                1                -2
28              Webb Simpson      46                -18              1

33              Chris Kirk        75                -42              13
33              Brooks Koepka     18                15               -5
33              Ryan Palmer       52                -19              2
38              Charl Schwartzel  7                 31               -11
38              Adam Scott        4                 34               -12

38              John Senden       14                24               -7
38              Cameron Tringale  54                -16              2
38              Jimmy Walker      58                -20              3
46              Matt Kuchar       12                34**             -9
46              Lee Westwood      50                -4               -1

48              Geoff Ogilvy      18                30               -8
49              Jason Dufner      18                31               -9

*  largest drop
** largest gain

For this group of 32, was there a significant difference between the U.S. Open scores and the Masters scores?

t-test of (U.S. Open score – Masters score):

Variable   N    Mean       StDev      T      p
Diff       32   0.843750   7.903080   0.60   0.550 (No)
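
As a quick check on the T column, here is a minimal sketch of the one-sample (paired) t statistic computed from the summary figures above. It uses only the reported N, mean, and standard deviation; the p-value is taken from the output above and is not recomputed, because the standard library has no t distribution.

```python
import math

# Summary figures copied from the t-test output above.
n = 32
mean_diff = 0.843750
sd_diff = 7.903080

# Standard error of the mean difference, then the t statistic
# for the null hypothesis that the true mean difference is zero.
standard_error = sd_diff / math.sqrt(n)
t_stat = mean_diff / standard_error

print(f"Standard error: {standard_error:.3f}")
print(f"t = {t_stat:.2f} on {n - 1} degrees of freedom")
```

A t statistic this close to zero says the average shift between the two tournaments is small compared with the golfer-to-golfer scatter in the differences.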

Since there’s no significance, let’s treat them as two “replicates” and take the range for each golfer:

Average range: 6.281, so R max = 3.268 × 6.281 ~ 20

Median range: 6.0, so R max = 3.865 × 6.0 ~ 23

Using the ANOVA standard deviation of 2.8: R max ~ 21

(Maximum observed difference between two scores was 19—Phil Mickelson.)
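
The range figures above can be rebuilt from the last column of the table. The sketch below does that, then adds one reconstruction of my own: the column does not show how the ANOVA standard deviation becomes an R max, so the function `r_max_from_round_sd` is an assumption that applies the standard range constants for pairs (d2 = 1.128, D4 = 3.268) to the standard deviation of a four-round total.

```python
import math
import statistics

# "Open score minus Masters score" for the 32 golfers, in table order.
score_diffs = [0, 19, 11, 4, 6, 7, -3, 12, 15, 6, 0, -8, 1, -1, 7, 5,
               4, -7, -2, 1, 13, -5, 2, -11, -12, -7, 2, 3, -9, -1, -8, -9]

# With two "replicates" per golfer, each range is the absolute difference.
ranges = [abs(d) for d in score_diffs]
average_range = statistics.mean(ranges)
median_range = statistics.median(ranges)

# Multipliers quoted in the column for ranges of two.
r_max_from_average = 3.268 * average_range
r_max_from_median = 3.865 * median_range


def r_max_from_round_sd(round_sd, rounds=4):
    """ASSUMED reconstruction: upper range limit for two four-round totals.

    The expected range of a pair is d2 * sigma, and the upper limit is
    D4 times that, where sigma is the standard deviation of one total.
    """
    d2, d4 = 1.128, 3.268
    sd_of_total = round_sd * math.sqrt(rounds)
    return d4 * d2 * sd_of_total


print(f"Average range {average_range:.2f}, R max {r_max_from_average:.1f}")
print(f"Median range {median_range:.1f}, R max {r_max_from_median:.1f}")
print(f"Largest observed range: {max(ranges)}")
print(f"R max from round SD 2.80: {r_max_from_round_sd(2.80):.1f}")
print(f"R max from round SD 2.46: {r_max_from_round_sd(2.46):.1f}")
```

Under that assumption, the U.S. Open standard deviation of 2.80 gives a limit of about 21, and the Masters value of 2.46 gives about 18, the common-cause difference used earlier for golfer No. 2.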

If you read my last column, do you remember when I calculated that two tournament scores could differ by as much as 18? And, by the way, the standard deviation for an individual round obtained from the ANOVA for all the golfers playing four rounds was 2.80, compared to 2.46 for the Masters. For those of you who watched, this was a very frustrating course, but the Masters course is no picnic, either.

For those of you who are interested, here is the ANOM for the U.S. Open final scores:

Image

Note that, unlike his unbelievable Masters performance, Jordan Spieth was not a “true” champion in this case. For those of you who watched the exciting conclusion (down to the last putt!), the last three or four holes saw three people interchanging the leader position seemingly at random. Everyone’s putting was (common cause) erratic, and Louis Oosthuizen came out of nowhere to almost win via the back door! No one was truly in “the zone.”

Bottom line: What can you predict from these two analyses? 

Not much.

So, we now have two individual enumerative analyses. Suppose I did similar calculations of “significant differences” for the U.S. Open and passed those out with these data in addition to the Masters analysis and data, with the goal of ranking these golfers? You’d be there until Christmas.

Do you think this might apply to some of your organizational data? As Deming said, “Management is prediction.” How do you see through the heavy fog of common cause to make good management decisions?

Are you as amazed as I am at the amount of common cause? Do you see why, on any given day, any professional golfer is probably capable of winning any tournament or just as easily missing the cut? But in this golf example, despite the amount of common cause, one could ask, “Are some players more consistently at the top of the leader board regardless of the tournament and competition?” Looking at data over time raises the analytic statistics question: “What can I say about the process that produced both of these tournament results, both of these groups of golfers, and the individual golfers’ results?”

I can just picture my dad if I tried to explain this all to him. As if it were yesterday, I see that wrinkled brow and twinkle in his eye as he says, “Where did you come from?”

I miss you, Dad.



About The Author


Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.