I've done some research on the relationship between various measures of golfer performance and overall performance (strokes per round; prize money), but have found it difficult to know exactly where to go with it. Here's how I have approached it.
Overall performance, measured as strokes per round, has three components:
1. Shots off the tee. There is essentially no variation in this measure of performance. Everyone has essentially one tee shot per hole, or 18 per round, and the standard deviation is almost zero.
2. Putts. There is some variation here, but less than one would expect. In 2007, according to data at PGATour.com, the average number of putts per round (averaged across golfers, so this is an average of averages) was 29.30, with a standard deviation of 0.52. The coefficient of variation was 1.77%.
3. All other shots. There's a little more variation here; again using 2007 data, the average was 23.98 "other" shots per round, with a standard deviation of 0.63 and a coefficient of variation of 2.62%.
Overall, golfers in 2007 averaged 71.28 strokes per round, with a standard deviation of 0.59 strokes per round, for a coefficient of variation of only 0.83%. Overall performance was, then, less variable in relative terms than either of the two components that vary at all (putts and other shots).
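The arithmetic behind these figures can be checked with a short sketch. It uses only the rounded means and standard deviations quoted above; the dictionary and function names are my own, not anything from PGATour.com. Because the inputs are rounded, the recomputed value for Other Shots comes out at about 2.63% rather than the 2.62% quoted, which is presumably a rounding difference.

```python
# Summary figures as quoted in the text (2007 season): (mean, standard deviation).
reported = {
    "putts": (29.30, 0.52),
    "other": (23.98, 0.63),
    "strokes": (71.28, 0.59),
}
TEE_SHOTS = 18  # one tee shot per hole, treated as constant


def coefficient_of_variation(mean, sd):
    """Return the standard deviation as a percentage of the mean."""
    return 100.0 * sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in reported.items()}

# The three components should add back up to strokes per round.
total = TEE_SHOTS + reported["putts"][0] + reported["other"][0]

for name, value in cv.items():
    print(f"{name:8s} CV = {value:.2f}%")
print(f"18 + putts + other = {total:.2f}")
```

The last line confirms that the decomposition is an accounting identity: 18 tee shots plus average putts plus average other shots reproduces the average strokes per round.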
The PGA reports a number of what it calls "skill statistics"; all of these are reported in Table 1. (Putts per Round shows up in the "skill statistics"; "Other Shots" is Strokes per Round, minus Putts per Round, minus 18.) If our objective is to explain overall performance, as measured by Strokes per Round, then we have to select explanatory variables from among the available performance measures. For 2007, the PGA reported all the "skill statistics" data for 196 golfers.
I believe it is inappropriate to use Putts per Round as an explanatory variable. If we could control adequately for other performance measures, then the (expected) coefficient on Putts (in a multiple regression) would be 1: each additional putt would raise Strokes per Round by 1. What would be useful, however, would be to find explanatory factors for the components of Strokes per Round, namely Putts and Other Shots.
Unfortunately, the PGA's "skill statistics" make it difficult to find explanatory variables for Putts per Round. Some factors that might matter:
* Average distance from the hole, once on the green
* "Conventional" vs. "long" putter
* Putting style (e.g., cross-handed vs. conventional)
About all that's currently available are the measures of bunker shots per round and of sand save percentage. I would expect that save percentage would be negatively correlated with putts per round, but I have no prior expectation about the effect of being in the bunker. However, the number of Putts per Round is positively (and significantly) correlated with Drive Distance (+0.29) and with Greens in Regulation (+0.54; players who hit more greens in regulation take more putts). Putts per Round is negatively and significantly correlated with Sand Save percentage (-0.49), with Bunker Shots (-0.28), with the number of holes on which the player Scrambles (-0.56), and with the number of Scramble Saves (-0.63). With the exception of the correlation with Bunker Shots, I suspect all these correlations are driven by distance to the hole, once a player makes the green. In short, I suspect many of these correlations do not necessarily mean that the player shoots a lower overall score.
It may be possible, however, to use some of the other "skill statistics" to look at Other Shots per Round. Looking first at correlations, we find that the number of Other Shots per Round is negatively correlated with Drive Distance (-0.28) and also negatively correlated with the Percent of Drives in Fairways (-0.26). While neither correlation is large, both are significant at the 1% level. This suggests that players who drive (a) farther and (b) more accurately take fewer Other Shots; this is hardly a great surprise. Other Shots is positively correlated with Bunker Shots per Round (+0.42). This correlation is both large and highly significant. Players who are more often in the sand take more Other Shots, which is, again, hardly surprising. However, the range here is fairly small, from a high of 2.13 bunkers per round (Paul Goydos) to a low of 1.00 (Michael Sim). [Surprisingly, the correlation between Other Shots and Putts is also large (-0.49) and significant. I would guess that this means that players who take a lot of Other Shots wind up closer to the hole, once they make the green.]
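The significance claims can be checked from the correlations alone, using the usual t test for a zero correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n = 196 golfers. The sketch below is only a check on the quoted figures. The function name is mine, and the critical value of roughly 2.6 (two-sided 1% level, about 194 degrees of freedom) is taken from standard t tables rather than from the source.

```python
import math

N_GOLFERS = 196  # golfers with complete "skill statistics" in 2007, per the text


def t_from_r(r, n):
    """t statistic for H0: correlation = 0, given sample correlation r and n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Correlations with Other Shots per Round, as quoted in the text.
correlations = {
    "Drive Distance": -0.28,
    "Percent of Drives in Fairways": -0.26,
    "Bunker Shots per Round": 0.42,
}

# Approximate two-sided 1% critical value for about 194 degrees of freedom
# (an assumption from standard t tables, not a figure from the source).
CRITICAL_1PCT = 2.60

t_stats = {name: t_from_r(r, N_GOLFERS) for name, r in correlations.items()}
for name, t in t_stats.items():
    verdict = "significant" if abs(t) > CRITICAL_1PCT else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at the 1% level)")
```

With 196 observations, even a correlation as modest as -0.26 yields a t statistic well above the cutoff, which is why correlations that are "not large" can still be highly significant.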
Suppose we move on to look at regression results. First, I looked at Putts per Round. Here, I actually do not like my options for explanatory variables, so I began with three: Drive Distance (DIST), Percent of Drives in Fairway (FAIRWAY), and Percent of Greens in Regulation (GIR). (t-statistics in parentheses.)
PUTTS = 18.7    + 0.013*DIST + 0.008*FAIRWAY + 0.096*GIR
        (10.81)   (2.31)       (0.79)          (5.98)
        R^2 = 0.321
So, longer drives and reaching the green in regulation lead to more putts. Well... Expanding the explanatory variables to include Bunkers Hit (BUNKERS), Percentage of Sand Saves (SANDSAVES), Number of Scrambles (SCRAMBLES), and Percentage of Scrambles Saved (SCRAMSAVES), I get:
PUTTS = 32.23   + 0.005*DIST + 0.019*FAIRWAY + 0.033*GIR - 0.178*BUNKERS
        (12.08)   (1.31)       (2.89)          (1.41)      (-1.34)
        - 0.006*SANDSAVES - 0.095*SCRAMBLES - 0.304*SCRAMSAVES
          (-1.31)           (-11.50)          (-2.13)
        R^2 = 0.699
I really don't like this. Both the coefficients and the significance levels seem to be extremely sensitive to the inclusion of additional variables. This reinforces my conclusion that what we really need for PUTTS is a different set of variables. These don't seem to measure the factors that affect putting very well, and a number of econometric problems crop up when I use them.
So I moved on to look at OTHER strokes (those other than tee shots and putts), getting these results:
OTHER = 42.67   - 0.009*DIST - 0.004*FAIRWAY - 0.223*GIR - 0.021*BUNKERS
        (38.76)   (-2.76)      (-0.73)         (-23.17)    (-0.20)
        - 0.024*SCRAMBLES
          (-4.35)
        R^2 = 0.861
Here, at least, some of the coefficients make sense. The longer your drives, the more often they are in the fairway, and the more frequently you reach the green in regulation, the fewer OTHER strokes you take. (Well, the GIR result is obvious, so maybe I should exclude that variable...) But why on earth does hitting into bunkers, or more scrambling, lead to fewer other shots? So, excluding GIR, I get:
OTHER = 43.89   - 0.052*DIST - 0.071*FAIRWAY + 0.092*BUNKERS
        (20.07)   (-10.02)     (-7.95)         (4.85)
        - 0.017*SCRAMBLES
          (-1.59)
        R^2 = 0.432
Well, again, the coefficients aren't all that stable (but, then, there's a fair degree of multicollinearity among some of the explanatory variables). But they do make more sense. The explanatory power of the regression drops quite a bit (big duh! there). At least hitting traps is now related to taking more shots.
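The way a coefficient can shift when a correlated regressor is dropped can be illustrated with made-up data. The sketch below is not a re-estimation of the PGA regressions: every number in it (the sample, the "true" coefficients, the link between DIST and GIR) is invented for illustration, and the helper names `solve` and `ols_slopes` are my own. It fits ordinary least squares with and without a GIR-like variable that is positively correlated with a DIST-like variable.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_slopes(y, columns):
    """OLS slope coefficients of y on the columns; demeaning stands in for the intercept."""
    def demean(v):
        mu = sum(v) / len(v)
        return [vi - mu for vi in v]

    yd = demean(y)
    xd = [demean(col) for col in columns]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in xd] for ci in xd]
    xty = [sum(p * q for p, q in zip(ci, yd)) for ci in xd]
    return solve(xtx, xty)


# Synthetic data (all values hypothetical): GIR rises with DIST, and OTHER
# depends weakly on DIST and strongly on GIR.
rng = random.Random(0)
n = 196
dist = [rng.gauss(290.0, 8.0) for _ in range(n)]
gir = [10.0 + 0.2 * d + rng.gauss(0.0, 2.0) for d in dist]
other = [40.5 - 0.01 * d - 0.2 * g + rng.gauss(0.0, 0.1) for d, g in zip(dist, gir)]

full = ols_slopes(other, [dist, gir])   # DIST and GIR both included
short = ols_slopes(other, [dist])       # GIR omitted

print(f"DIST coefficient with GIR:    {full[0]:.3f}")
print(f"DIST coefficient without GIR: {short[0]:.3f}")
```

In this constructed example the DIST coefficient becomes several times larger in magnitude once GIR is left out, because DIST then picks up part of the effect of the omitted variable. Whether the same mechanism accounts for the shift in the actual regressions cannot be settled without the underlying data.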
I don't like combining these sub-categories of shots into a single STROKES variable, because some of the effects work in opposite directions. Longer drives mean fewer OTHER shots but more PUTTS (presumably because you're farther from the hole once you make the green), for example. But, dutifully:
STROKES = 88.14   - 0.003*DIST + 0.017*FAIRWAY - 0.148*GIR - 0.264*BUNKERS
          (29.04)   (-0.65)      (2.25)          (-5.46)     (-1.75)
          - 0.009*SANDSAVES - 0.116*SCRAMBLES - 0.006*SCRAMSAVES
            (-1.93)           (-12.41)          (-0.037)
          R^2 = 0.700
(Dropping SCRAMSAVES, the results are virtually identical; the coefficients change, if at all, in the third decimal place, and the R^2 is also unchanged.) The most interesting thing is that being in the fairway on your drive is apparently not notably valuable, if you're a really good golfer.
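One consequence of the identity STROKES = 18 + PUTTS + OTHER is worth writing down. If PUTTS, OTHER, and STROKES were all regressed on the same set of explanatory variables, the STROKES coefficients would be exactly the sum of the PUTTS and OTHER coefficients, with 18 added to the intercept. The notation below is my own: X is the common regressor matrix with a constant in its first column, \iota is a column of ones, and e_1 is the unit vector that picks out the intercept.

```latex
% Assumption: PUTTS, OTHER, and STROKES are each regressed on the same
% regressor matrix X, whose first column is the constant.
\begin{align*}
\text{STROKES} &= 18\,\iota + \text{PUTTS} + \text{OTHER} \\
\hat{\beta}_{\text{STROKES}}
  &= (X'X)^{-1}X'\,(18\,\iota + \text{PUTTS} + \text{OTHER}) \\
  &= 18\,e_1 + \hat{\beta}_{\text{PUTTS}} + \hat{\beta}_{\text{OTHER}}
\end{align*}
```

The regressions reported here do not use identical variable sets (the first OTHER equation has five regressors, the expanded PUTTS and STROKES equations have seven), so the match can only be rough. Even so, DIST gives 0.005 - 0.009 = -0.004 against -0.003 in the STROKES equation, and SCRAMBLES gives -0.095 - 0.024 = -0.119 against -0.116. GIR is less close (0.033 - 0.223 = -0.190 against -0.148).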
I'm still somewhat at a loss as to what to make of all this, except that it's apparent (to me, anyway) that the PGA's "skill statistics" don't help us all that much in analyzing player performance.
Donald A. Coffin
Indiana University Northwest
March, 2008
Statistic | Mean | Standard Deviation | Coefficient of Variation
Strokes per Round | 71.28 | 0.59 | 0.83%