I am writing this after finally finishing Bill James’s (and Jim Henzler’s) Win Shares. One of the impressions that reading has left me with is of the seemingly constant attacks Bill makes in this book against Pete Palmer’s Linear Weights player evaluation system (partly tempered by his laudatory evaluation of Pete as a person, which I for one share). I believe these attacks to be at least partly misguided, in that, at least implicitly, Pete’s system is attempting to do something quite a bit different from what Bill’s system does. In this post, I attempt to explicate that difference.

I begin by contrasting what I see as the sometimes-implicit goals of three well-known indices for representing player performance: those used by Bill, Pete, and the Baseball Prospectus folks. (In so doing, I am not interested in examining the minutiae of how, for example, catcher defense is measured, but rather the general thrust of each.) In win shares, James’s goal is to propose an index intended to represent the contribution individual players make to the victories a team achieves in a given season. As such, he truly wants a measure in which zero win shares means no contribution to wins, 3 win shares means deserving of credit for one win (James assigns three win shares per team victory), etc. Thus in principle a player cannot have negative win shares, making this an attempt at a truly ratio level scale, and Bill explicitly distinguishes his proposal from a possible index of loss shares for that very reason. With their index “wins above replacement level” as part of their entire approach to analysis, the Prospectus goals are somewhat different from James’s. They wish to discuss team front office strategy in terms of the strategic value of acquiring and retaining specific players within the context of a given salary budget. As such, it makes all the sense in the world to use replacement level as a zero point, under the presumption that it is relatively simple to find an alternative to a player below replacement level at little cost, in so doing improving the team. Thus they have very good reason to be comfortable with an interval level scale, although one with very few players in the negatives. Linear Weights, in sharp contrast with both, wants to be able to compare players’ performance for a given year in an environment free of contexts such as team wins or team strategy.
What any multiple regression-type method does very well is lay out players on an interval level scale such that, if the scale difference between Player A and Player B is twice as great as the difference between Player B and Player C, then that reflects the differences in their actual productivity in a given year. With only this goal, the placement of a zero point becomes totally arbitrary; wherever zero is, the relative scale distance between players is the same. Palmer chose to place zero at average performance, such that there will be hordes of players with negative value.
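
To make the point about the arbitrary zero concrete, here is a minimal sketch. The three players and their ratings are invented for illustration, as are the helper names `shift_zero` and `gap_ratio`; nothing here comes from Palmer's actual figures.

```python
# Hypothetical runs-above-average ratings for three made-up players.
ratings = {"A": 30.0, "B": 10.0, "C": 0.0}

def shift_zero(ratings, offset):
    """Move the zero point by adding the same offset to every rating."""
    return {name: value + offset for name, value in ratings.items()}

def gap_ratio(r):
    """(A - B) / (B - C): the interval-scale comparison described above."""
    return (r["A"] - r["B"]) / (r["B"] - r["C"])

# Assumed offset: pretend zero now sits 20 runs below average.
shifted = shift_zero(ratings, 20.0)

# The gap between A and B stays twice the gap between B and C.
print(gap_ratio(ratings), gap_ratio(shifted))
# Ratios of the raw ratings, by contrast, depend on where zero sits.
print(ratings["A"] / ratings["B"], shifted["A"] / shifted["B"])
```

The differences survive the shift untouched, which is all an interval scale promises; only statements of the form "A is worth three times B" depend on where zero is placed.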

Where Bill errs is in his conviction that Pete shares his goal, which is clearly not the case (and surprising, in that Bill does explicitly accept the difference in goals between what he and someone working with replacement level are doing). To repeat, Bill wants to evaluate players in the context of team victories, and Pete wants to evaluate players independently of team. This distinction has resulted in radical differences in the construction of their indices, such that they can imply very different conclusions about players, which Bill rightly notes. Pete Rose is a case in point. In Linear Weights, Rose comes out as a good but far from great player. This makes sense within Palmer’s system, because for much of his career, Rose indeed was an average or even below-average performer among National League players at whatever position he occupied in a given season. However, the fact that Rose was about an average performer for many, many seasons means that he was well above replacement level for a long time and was responsible for quite a few wins in total, so Rose (dare I say) ends up smelling like a rose in either a win shares- or Prospectus-type evaluation method. (Incidentally, there is an analogous problem with these latter two systems: a player with any substantial career but with performance below replacement level will have value in win shares but not in a Prospectus rating; perhaps Luis Lopez?)

It appears that the majority of analysts feel that there is more intrinsic value in evaluating players in terms of something, be it actual team victories or replacement level, rather than the context-free Linear Weights environment, while still maintaining the ability to compare players adequately using interval- or ratio-scale measures. At least in principle, both wins above replacement level and win shares ought to be able to do this, thus trumping Linear Weights and fueling Bill’s claim that its time is past. But it seems to me that all is not lost here. In a multiple regression context, the slope values are sacrosanct; they are what measure the difference in performance among players. But the intercept is arbitrary. Palmer placed it at average performance, but I can see no reason why we can’t move it up, so to speak, so that zero is equivalent to some value with contextual meaning. Ted Turocy has done something of this sort, using a total player value of -2 or -3 as representative of replacement level.
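
A rough sketch of what moving the zero point does to career totals follows. It assumes the -2 figure can be read as wins per full season, and the two careers are invented; `career_value` is a hypothetical helper, not part of any published system.

```python
def career_value(season_values, replacement_level=0.0):
    """Sum season ratings measured from `replacement_level` rather than from average."""
    return sum(v - replacement_level for v in season_values)

# Invented careers, in wins above average per full season.
steady = [0.0] * 20   # exactly average for 20 seasons
peak = [4.0] * 5      # 4 wins above average for 5 seasons

REPLACEMENT = -2.0    # assumed: replacement level sits 2 wins below average

# Zero at average: the long, average career counts for nothing.
print(career_value(steady), career_value(peak))
# Zero at replacement level: the long career now comes out ahead.
print(career_value(steady, REPLACEMENT), career_value(peak, REPLACEMENT))
```

Within any one season the gap between two players is unchanged by the shift. Career totals reorder only because every additional season played now earns credit, which is the sense in which a long run of average seasons has real value.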

But why would we want to save Linear Weights in the first place? Here is why: the regression-type weights are based on the calculated run potential for various types of game events. As such, they get to the very basis of the game: the likelihood of getting on base, advancing around the bases, and scoring. So if we evaluate a batter partly on, for example, the number of doubles he hits, based on known probabilities that a double will on average lead to such-and-such number of runs scored, then we have a superb theoretical rationale for our evaluation, probably the best we could possibly have. Now, there are clearly problems with the run potential figures Pete uses; they are valid for a very specific era in baseball, and run potential figures for various base-out situations will vary markedly between periods in which games average ten runs scored for both teams and periods in which games average six. But those are the sort of minutiae I wish to stay away from in this argument. We want Linear Weights because, in principle, it is the best system we have. So, despite my conviction that we have more than enough player evaluation methods already and ought to spend our time on other things, I would invite fixes to Linear Weights that are more slow-and-clean than mine.
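
For readers who want to see the mechanics, the weighted sum at the heart of a linear-weights rating can be sketched as below. The run values in `RUN_VALUES` are illustrative placeholders assumed for this sketch, not Palmer's exact coefficients, and the batting line is invented.

```python
# Illustrative run values per event (assumed for this sketch; real values
# depend on the run-scoring environment of the era being measured).
RUN_VALUES = {"1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40, "BB": 0.33, "OUT": -0.25}

def batting_runs(events, run_values=RUN_VALUES):
    """Weighted sum: each event count times the runs that event is worth on average."""
    return sum(run_values[event] * count for event, count in events.items())

# Invented season line for a hypothetical batter.
season = {"1B": 100, "2B": 30, "3B": 5, "HR": 20, "BB": 60, "OUT": 400}
print(round(batting_runs(season), 1))

# A different era would simply supply its own table of run values.
low_scoring = dict(RUN_VALUES, HR=1.30)  # hypothetical adjustment
print(round(batting_runs(season, low_scoring), 1))
```

Passing the table of run values as a parameter is one way to handle the era problem noted above: the structure of the calculation stays fixed while the weights are re-estimated for each run environment.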

-- Charlie Pavitt, chazzq (at) udel (dot) edu