I’ve done some research on the relationship between various measures of golfer performance and overall performance (strokes per round; prize money), but have found it difficult to know exactly where to go with it. Here’s how I have approached it.
Overall performance, measured as strokes per round, has three components:
1. Shots off the tee. There is essentially no variation in this measure of performance: every golfer has one tee shot per hole, or 18 per round, so the standard deviation is almost zero.
2. Putts. There is some variation here, but less than one would expect. In 2007, according to data at PGATour.com, the average number of putts per round (averaged across golfers, so this is an average of averages) was 29.30, with a standard deviation of 0.52. The coefficient of variation was 1.77%.
3. All other shots. There is slightly more variation here. Again using 2007 data, the average was 23.98 “other” shots per round, with a standard deviation of 0.63 and a coefficient of variation of 2.62%.
Overall, golfers in 2007 averaged 71.28 strokes per round, with a standard deviation of 0.59 strokes per round, for a coefficient of variation of only 0.83%. In relative terms, then, overall performance was less variable than either of the two components that do vary (putts and other shots).
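The arithmetic in this list can be checked directly. The short Python sketch below uses only the rounded 2007 figures quoted above: it confirms that 18 tee shots plus average putts plus average other shots give the 71.28 average, recomputes the coefficients of variation, and uses the identity Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B) to back out the correlation between putts and other shots that the three standard deviations imply. It treats tee shots as a constant, as the list does; because the inputs are rounded, recomputed values can differ from the quoted ones in the second decimal place.

```python
# Rounded 2007 PGA Tour figures quoted above (per round, averaged across golfers).
PUTTS_MEAN, PUTTS_SD = 29.30, 0.52
OTHER_MEAN, OTHER_SD = 23.98, 0.63
STROKES_MEAN, STROKES_SD = 71.28, 0.59
TEE_SHOTS = 18  # treated as a constant: one tee shot per hole


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


def implied_correlation(sd_a: float, sd_b: float, sd_sum: float) -> float:
    """Corr(A, B) implied by Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B)."""
    cov = (sd_sum**2 - sd_a**2 - sd_b**2) / 2.0
    return cov / (sd_a * sd_b)


# The three components should add up to the overall average.
total_mean = TEE_SHOTS + PUTTS_MEAN + OTHER_MEAN

cv_putts = coefficient_of_variation(PUTTS_MEAN, PUTTS_SD)
cv_other = coefficient_of_variation(OTHER_MEAN, OTHER_SD)
cv_strokes = coefficient_of_variation(STROKES_MEAN, STROKES_SD)

# Strokes = 18 + Putts + Other; the constant adds no variance, so the spread
# of Strokes pins down how Putts and Other Shots must co-vary.
rho = implied_correlation(PUTTS_SD, OTHER_SD, STROKES_SD)

print(f"18 + putts + other = {total_mean:.2f}")
print(f"CV: putts {cv_putts:.2f}%, other {cv_other:.2f}%, strokes {cv_strokes:.2f}%")
print(f"implied corr(putts, other) = {rho:.2f}")
```

With these inputs the implied correlation comes out at roughly -0.49, in line with the correlation between Other Shots and Putts discussed below. A negative correlation of that size is what lets the standard deviation of total strokes (0.59) sit well below the roughly 0.82 that independent components would produce.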
The PGA reports a number of what it calls “skill statistics”; all of these are reported in Table 1. (Putts per Round is one of the “skill statistics”; “Other Shots” is Strokes per Round minus Putts per Round minus 18.) If our objective is to explain overall performance, as measured by Strokes per Round, then we have to select explanatory variables from among the available performance measures. For 2007, the PGA reported complete “skill statistics” data for 196 golfers.
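Because Other Shots is defined as Strokes per Round minus Putts per Round minus 18, Strokes is an exact linear function of Putts and Other Shots. The sketch below, written against made-up numbers for five hypothetical golfers (not PGA data), fits an ordinary least squares regression of strokes on putts and other shots; whatever values are plugged in, it should return an intercept of 18 and two slopes of 1, up to floating-point error. The helper names `solve` and `ols` are illustrative only, not part of any library.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Return [intercept, slope_1, ...]; regressors are centered for numerical stability."""
    n = len(y)
    means = [sum(col) / n for col in columns]
    y_bar = sum(y) / n
    dev = [[v - mu for v in col] for col, mu in zip(columns, means)]
    sxx = [[sum(p * q for p, q in zip(di, dj)) for dj in dev] for di in dev]
    sxy = [sum(p * (yi - y_bar) for p, yi in zip(di, y)) for di in dev]
    slopes = solve(sxx, sxy)
    return [y_bar - sum(s * mu for s, mu in zip(slopes, means))] + slopes


# Made-up per-round averages for five hypothetical golfers (NOT PGA data).
putts = [29.1, 29.8, 28.7, 30.0, 29.4]
other = [24.0, 23.5, 24.6, 23.9, 24.2]
# Other Shots is defined as Strokes - Putts - 18, so Strokes is an exact sum.
strokes = [18 + p + o for p, o in zip(putts, other)]

coef = ols([putts, other], strokes)
print(f"intercept {coef[0]:.4f}, putts {coef[1]:.4f}, other {coef[2]:.4f}")
# prints: intercept 18.0000, putts 1.0000, other 1.0000
```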
I believe it is inappropriate to use Putts per Round as an explanatory variable. Putts are themselves a component of Strokes per Round, so if we could control adequately for the other performance measures, the (expected) coefficient on Putts in a multiple regression would simply be 1: each additional putt would raise Strokes per Round by 1. What would be useful, however, would be to find explanatory factors for each component of Strokes per Round, that is, for Putts and for Other Shots. For Putts, the factors I would most like to have include:
* Average distance from the hole, once on the green
* “Conventional” vs. “long” putter
* Putting style (e.g., cross-handed vs. conventional)
About all that is currently available are bunker shots per round and sand save percentage. I would expect sand save percentage to be negatively correlated with putts per round, but I have no prior expectation about the effect of being in the bunker. In the data, Putts per Round is positively (and significantly) correlated with Drive Distance (+0.29) and with Greens in Regulation (+0.54: players who hit more greens in regulation take more putts). Putts per Round is negatively and significantly correlated with Sand Save percentage (-0.49), with Bunker Shots (-0.28), with the number of holes on which the player Scrambles (-0.56), and with the number of Scramble Saves (-0.63). With the exception of the correlation with Bunker Shots, I suspect all of these correlations are driven by distance to the hole once a player reaches the green. In short, I suspect that many of these correlations do not necessarily mean that a player who takes fewer putts also shoots a lower overall score.
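If distance to the hole on the green were available, one standard way to test the suspicion that it drives these correlations would be a first-order partial correlation: the association between two variables after removing the linear effect of a third from both. This technique is substituted here for illustration; it is not something the current PGA data allow. In the sketch, 0.54 is the Putts/GIR correlation quoted above, while the two correlations with distance to the hole are purely hypothetical placeholders.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear influence of z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))


R_PUTTS_GIR = 0.54  # quoted in the text
# HYPOTHETICAL: distance to the hole on the green is not among the PGA "skill statistics".
R_PUTTS_DIST_TO_HOLE = 0.6
R_GIR_DIST_TO_HOLE = 0.8

print(partial_correlation(R_PUTTS_GIR, R_PUTTS_DIST_TO_HOLE, R_GIR_DIST_TO_HOLE))
```

With those placeholder values the Putts/GIR correlation would shrink from 0.54 to about 0.125, which is the pattern one would expect if distance to the hole were doing most of the work.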
It may be possible, however, to use some of the other “skill statistics” to explain Other Shots per Round. Looking first at correlations, the number of Other Shots per Round is negatively correlated with both Drive Distance (-0.28) and the Percent of Drives in Fairways (-0.26). While neither correlation is large, both are significant at the 1% level. This suggests that players who drive (a) farther and (b) more accurately take fewer Other Shots, which is hardly a great surprise. Other Shots is positively correlated with Bunker Shots per Round (+0.42), and this correlation is both large and highly significant: players who are more often in the sand take more Other Shots, which is, again, hardly surprising. However, the range here is fairly small, from a high of 2.13 bunker shots per round (Paul Gaydos) to a low of 1.00 (Michael Sim). [Surprisingly, the correlation between Other Shots and Putts is also large (-0.49) and significant. I would guess that this means that players who take a lot of Other Shots wind up closer to the hole once they reach the green.]
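The significance claims can be checked with the usual t-test for a Pearson correlation, t = r·√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch assumes n = 196 (the golfers with complete 2007 data) and a two-sided 1% critical value of roughly 2.60 for 194 degrees of freedom; the correlations are the ones quoted in the paragraph above.

```python
import math

N_GOLFERS = 196         # golfers with complete 2007 "skill statistics"
CRITICAL_T_1PCT = 2.60  # approx. two-sided 1% critical value, 194 degrees of freedom


def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, given a sample Pearson correlation r from n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Correlations with Other Shots per Round quoted in the text.
quoted = {
    "Drive Distance": -0.28,
    "Fairways hit": -0.26,
    "Bunker Shots": 0.42,
    "Putts": -0.49,
}

for label, r in quoted.items():
    t = correlation_t_stat(r, N_GOLFERS)
    verdict = "significant" if abs(t) > CRITICAL_T_1PCT else "not significant"
    print(f"{label:15s} r = {r:+.2f}  t = {t:+.2f}  ({verdict} at 1%)")
```

Even the smallest of these, -0.26, gives |t| of about 3.75, comfortably past the 1% threshold; with 196 golfers, any correlation larger than roughly 0.18 in absolute value clears it.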
Suppose we move on to look at regression results. First, I looked at Putts per Round. Here, I actually do not like my options for explanatory variables, so I began with three: Distance (DIST), Percent of Drives in Fairway (FAIRWAY), and Percent of Greens in Regulation (GIR). (t-statistics in parentheses.)
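$$
\begin{aligned}
\text{PUTTS} &= \underset{(10.81)}{18.7} + \underset{(2.31)}{0.013}\,\text{DIST} + \underset{(0.79)}{0.008}\,\text{FAIRWAY} + \underset{(5.98)}{0.096}\,\text{GIR} \\
R^2 &= 0.321
\end{aligned}
$$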
So, longer drives and reaching the green in regulation lead to more putts. Well… Expanding the explanatory variables to include Bunkers Hit (BUNKERS), Percentage of Sand Saves (SANDSAVES), Number of Scrambles (SCRAMBLES), and Percentage of Scrambles Saved (SCRAMSAVES), I get:
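$$
\begin{aligned}
\text{PUTTS} &= \underset{(12.08)}{32.23} + \underset{(1.31)}{0.005}\,\text{DIST} + \underset{(2.89)}{0.019}\,\text{FAIRWAY} + \underset{(1.41)}{0.033}\,\text{GIR} - \underset{(-1.34)}{0.178}\,\text{BUNKERS} \\
&\quad - \underset{(-1.31)}{0.006}\,\text{SANDSAVES} - \underset{(-11.50)}{0.095}\,\text{SCRAMBLES} - \underset{(-2.13)}{0.304}\,\text{SCRAMSAVES} \\
R^2 &= 0.699
\end{aligned}
$$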
I really don’t like this. Both the coefficients and the significance levels seem to be extremely sensitive to the inclusion of additional variables: the t-statistic on GIR, for example, falls from 5.98 to 1.41, while FAIRWAY goes from insignificant (0.79) to significant (2.89). This reinforces my conclusion that what we really need for PUTTS is a different set of variables. These variables do not seem to capture the factors that affect putting, and a number of econometric problems seem to crop up when I use them.
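To see this sensitivity directly, it helps to be able to re-run a regression with and without particular variables. The sketch below is a minimal, dependency-free ordinary least squares routine that reports the coefficients, the t-statistics of the kind shown in parentheses above (each coefficient divided by its standard error, using the residual variance with n − k degrees of freedom), and R². The function names are hypothetical, and the usage lines run it on a tiny made-up dataset, not on the golf data.

```python
import math


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_with_t(columns, y):
    """Return (coefficients, t-statistics, R^2) for y regressed on an intercept plus columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtx_inv = invert(xtx)
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s2 = ssr / (n - k)  # residual variance with n - k degrees of freedom
    t_stats = [c / math.sqrt(s2 * xtx_inv[i][i]) for i, c in enumerate(coef)]
    return coef, t_stats, 1.0 - ssr / sst


# Tiny made-up example (not golf data): one regressor, five observations.
coef, t_stats, r2 = ols_with_t([[1, 2, 3, 4, 5]], [2, 4, 5, 4, 5])
print(coef, t_stats, r2)
```

On that toy dataset it should give an intercept of 2.2, a slope of 0.6 with a t-statistic of about 2.12, and R² = 0.6. Passing the PGA columns as lists (DIST, FAIRWAY, and GIR, then the expanded set) should give comparable output, assuming the regressions above were plain OLS with an intercept; for badly scaled columns, centering them first or using a numerical library would be more robust.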
So I moved on to look at OTHER strokes (those other than tee shots and putts), getting these results:
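$$
\begin{aligned}
\text{OTHER} &= \underset{(38.76)}{42.67} - \underset{(-2.76)}{0.009}\,\text{DIST} - \underset{(-0.73)}{0.004}\,\text{FAIRWAY} - \underset{(-23.17)}{0.223}\,\text{GIR} - \underset{(-0.20)}{0.021}\,\text{BUNKERS} \\
R^2 &= 0.861
\end{aligned}
$$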
Here, at least, some of the coefficients make sense. Longer drives, more drives in the fairway, and more greens in regulation all go with fewer OTHER strokes. (Well, the GIR result is obvious, so maybe I should exclude that variable…) But why on earth does hitting into bunkers, or more scrambling, lead to fewer other shots? So, excluding GIR, I get:
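$$
\begin{aligned}
\text{OTHER} &= \underset{(20.07)}{43.89} - \underset{(-10.02)}{0.052}\,\text{DIST} - \underset{(-7.95)}{0.071}\,\text{FAIRWAY} + \underset{(4.85)}{0.092}\,\text{BUNKERS} - \cdots \\
R^2 &= 0.432
\end{aligned}
$$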
Well, again, the coefficients aren’t all that stable (but, then, there’s a fair degree of multicollinearity between some of the explanatory variables). But they do make more sense. The explanatory power of the regression drops quite a bit (big duh! there). At least hitting traps is now associated with taking more shots.
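A common way to quantify the multicollinearity mentioned here is the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing regressor j on the other regressors; equivalently, VIF_j is the j-th diagonal element of the inverse of the regressors’ correlation matrix. A large VIF means that coefficient’s standard error is inflated, which is one reason coefficients can swing when variables are added or dropped (a rule of thumb often cited treats values above 10 as serious). The correlation matrix in the usage lines is invented for illustration; the actual correlations among DIST, FAIRWAY, and GIR are not reported here.

```python
def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def variance_inflation_factors(corr):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix of the regressors."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


# HYPOTHETICAL correlations among DIST, FAIRWAY, GIR (invented for illustration).
names = ["DIST", "FAIRWAY", "GIR"]
corr = [
    [1.0, -0.4, 0.3],
    [-0.4, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
for name, vif in zip(names, variance_inflation_factors(corr)):
    print(f"{name:8s} VIF = {vif:.2f}")
```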
I don’t like combining these sub-categories of shots into a single STROKES variable, because some of the effects work in opposite directions. Longer drives mean fewer OTHER shots but more PUTTS (presumably because you’re farther from the hole once you reach the green), for example. But, dutifully:
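$$
\begin{aligned}
\text{STROKES} &= \underset{(29.04)}{88.14} - \underset{(-0.65)}{0.003}\,\text{DIST} + \underset{(2.25)}{0.017}\,\text{FAIRWAY} - \underset{(-5.46)}{0.148}\,\text{GIR} - \underset{(-1.75)}{0.264}\,\text{BUNKERS} \\
&\quad - \underset{(-1.93)}{0.009}\,\text{SANDSAVES} - \underset{(-12.41)}{0.116}\,\text{SCRAMBLES} - \underset{(-0.037)}{0.006}\,\text{SCRAMSAVES} \\
R^2 &= 0.700
\end{aligned}
$$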
(Dropping SCRAMSAVES, the results are virtually identical: the coefficients change, if at all, only in the third decimal place, and R² is also unchanged.) The most interesting thing is that hitting the fairway off the tee is apparently not notably valuable for a really good golfer: holding the other variables constant, the FAIRWAY coefficient is small and, if anything, positive.
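Because STROKES = 18 + PUTTS + OTHER, and OLS coefficients are linear in the dependent variable, a STROKES regression on a given set of regressors is exactly the sum of the PUTTS and OTHER regressions on that same set (plus 18 on the intercept). That makes the offsetting effects mentioned earlier explicit: a DIST coefficient near zero for STROKES can be the sum of a positive PUTTS effect and a negative OTHER effect. The sketch checks the identity on made-up data for six hypothetical golfers; none of the values are PGA statistics, and `solve` and `ols` are illustrative helper names.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Return [intercept, slope_1, ...]; regressors are centered for numerical stability."""
    n = len(y)
    means = [sum(col) / n for col in columns]
    y_bar = sum(y) / n
    dev = [[v - mu for v in col] for col, mu in zip(columns, means)]
    sxx = [[sum(p * q for p, q in zip(di, dj)) for dj in dev] for di in dev]
    sxy = [sum(p * (yi - y_bar) for p, yi in zip(di, y)) for di in dev]
    slopes = solve(sxx, sxy)
    return [y_bar - sum(s * mu for s, mu in zip(slopes, means))] + slopes


# Made-up regressors and outcomes for six hypothetical golfers (NOT PGA data).
dist = [285.0, 292.0, 278.0, 301.0, 288.0, 295.0]
gir = [64.0, 67.5, 62.0, 66.0, 69.0, 63.5]
putts = [29.1, 29.6, 28.8, 29.9, 29.5, 29.0]
other = [24.2, 23.6, 24.9, 23.4, 23.1, 24.0]
strokes = [18 + p + o for p, o in zip(putts, other)]

b_putts = ols([dist, gir], putts)
b_other = ols([dist, gir], other)
b_strokes = ols([dist, gir], strokes)

for name, bp, bo, bs in zip(["intercept", "DIST", "GIR"], b_putts, b_other, b_strokes):
    # Apart from the 18 on the intercept, each STROKES coefficient equals PUTTS + OTHER.
    print(f"{name:9s} putts {bp:+.4f}  other {bo:+.4f}  strokes {bs:+.4f}")
```

Applied to the results above, and assuming both regressions were plain OLS on the same 196 golfers, the identity implies that an OTHER regression on the full set of regressors would have a DIST coefficient of about -0.003 - 0.005 = -0.008.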
I’m still somewhat at a loss as to what to make of all this, except that it’s apparent (to me, anyway) that the PGA’s “skill statistics” don’t help us all that much in analyzing player performance.