So I'm on my computer at home, looking at porn like any red- (or Reds-) blooded American, and I get an e-mail from slyde. "TheC suggested that we have a 'topic of the day' about different sabermetric type concepts...I'm emailing this group because I consider you to be the main group of people who really seem to understand the sabermetric concepts," it reads in part. Hey, I love to tell people how smart I am. Sign me up!
I have to admit, I'm somewhat of a late attendee to this party. A couple of my friends from Detroit had been playing Strat-O-Matic baseball for a very long time, and invited me to play back in '03. As these guys were old pros (Chris had been playing for 20+ years, for example) I knew I needed all of the help I could get. Using Yahoo! to search for info, I came across both Baseball Prospectus and the old Baseball Primer (now Baseball Think Factory) websites. Holy Frijoles!!! I found out quickly how much I didn't know about baseball, and how runs are scored (and games won). What I've learned has helped me overcome errors in drafting and managing (I think), to stay competitive against those guys. More importantly, it's allowed me to see the game (and how it's played) in a new light.
So anyway, slyde handed out essay assignments, and asked me to write about advanced hitting stats. Here goes...
In my mind, hitting is the easiest thing to get a grasp on, when considering these advanced metrics; pitchers depend a lot on their defenders ("Gold Glover" Derek Jeter turns lots of outs into hits), and defenders on pitchers. Also, you need to know all sorts of data on a batted ball to rank a defender, and much of that info is subjective (how hard it was hit, into which zone on the field, positioning). Hitting events are all binary, though: you're either out or on base.
First, the most basic thing in baseball is to outscore your opponent. You do that by, on the offensive side, scoring runs (I'll let others talk about pitching and defense). How do you score runs? By advancing bases without making outs. Any book you read about sabermetrics will include a statement about the sanctity of outs - you only get 27 of 'em; once you use up those 27 outs, you lose (extra-inning games, obviously, excluded).
The first stat we'll look at is Runs Created (RC). What RC formulae try to do is to put a number on a hitter's ability to create runs while not making outs. You noticed I used the plural there - there's more than one formula for RC. Sean Forman at Baseball Reference says in his glossary, "There are 24 different versions of RC depending on the stats you have" (which wildly underestimates it; every Strat player on the planet has his own version). He uses a very basic one that only accounts for hits, walks, total bases, and AB+BB; ESPN.com uses a more complicated equation that includes stolen bases (and caught stealing), sacrifices, and adjusts for intentional walks. Another important point: after you calculate RC, remember to divide it by outs, and multiply by 27 - RC/27 gives you a much better view of who the better player is because someone with more playing time can rack up more runs (compare Jose Reyes and Barry Bonds. They've created the same number of runs, but Reyes has used up 1/3 more outs to do it).
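If you'd rather see that as code than as a paragraph, here's a minimal sketch of the basic version (hits, walks, total bases, and AB+BB) plus the per-27-outs scaling. The two stat lines are invented for illustration, the function names are mine, and counting outs as AB minus hits is a simplifying assumption (it ignores sacrifices, double plays, and caught stealing).

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    opportunities = at_bats + walks
    if opportunities == 0:
        return 0.0
    return (hits + walks) * total_bases / opportunities


def rc_per_27(rc, outs):
    """Scale Runs Created to one game's worth of outs (27)."""
    if outs <= 0:
        raise ValueError("outs must be positive")
    return rc / outs * 27


# Hypothetical stat lines, invented for illustration only.
players = {
    "everyday guy": {"hits": 180, "walks": 20, "total_bases": 270, "at_bats": 580},
    "slugger": {"hits": 100, "walks": 100, "total_bases": 225, "at_bats": 400},
}

for name, line in players.items():
    rc = runs_created(**line)
    # Assumption: outs approximated as at-bats minus hits.
    outs = line["at_bats"] - line["hits"]
    print(f"{name}: RC = {rc:.1f}, outs = {outs}, RC/27 = {rc_per_27(rc, outs):.2f}")
```

Both made-up players come out at 90 runs created, but the first one burns a third more outs getting there, so his RC/27 is noticeably lower. That's the whole point of dividing by outs.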
But does it work? A little cut & paste into Excel says yes: Using ESPN's formula, last year the NL scored 3.3% fewer runs than predicted; this year, the league is running just over 4% under. In the NL, over the past season and a half, the simpler formula at Baseball Reference has been slightly more accurate, within 2.2% this year and 1.4% last. One explanation is that ESPN's adjustments may reflect a different scoring environment. Regardless, both formulae are pretty good predictors of run scoring.
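The check itself is nothing fancy. Here's a rough sketch of the comparison, assuming you've already summed the formula's predicted runs for the league; the totals below are made up to land on a 3.3% shortfall and are not the actual NL numbers.

```python
def pct_diff(actual_runs, predicted_runs):
    """Signed percentage by which actual runs differ from the prediction."""
    if predicted_runs == 0:
        raise ValueError("predicted_runs must be non-zero")
    return (actual_runs - predicted_runs) / predicted_runs * 100


# Hypothetical league totals, chosen only to illustrate the arithmetic.
predicted = 12000
actual = 11604

print(f"{pct_diff(actual, predicted):+.1f}%")  # -3.3%
```

A negative number means the league scored fewer runs than the formula predicted.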
At this point, you're asking yourself, "That's all fine and good, but isn't a home run better than 4 singles? Your formula doesn't seem to adjust for that." The answer, of course, is "Silly rabbit! Everyone knows that home runs kill rallies, while 4 singles is the very definition of a rally!" Seriously, it's obvious that the HR is the most valuable thing a hitter can do. We can prove it by looking at lots and lots of data, then seeing what happens. Retrosheet makes those lots of data available for us to analyze; they have play-by-play data for every game since '74 (except for '99, which they are finishing up), which lets someone with a database manager, a plan, and some patience see what happened after a single in '06, for instance. Using that data, we can create what are called Linear Weights (lwts). Lwts are the run values of events - how many runs score in an inning when something happens.
More precisely, we know that from '99-'02 (the data generally used today for lwts), there were .555 runs scored in an inning on average; we also know that in innings where a HR was hit, 1.964 runs were scored (how do we know these things? We know how to use the interweb and leech someone else's hard work). We subtract .555 from 1.964, and find that a HR has a value of 1.409 runs in that run environment. The values at the bottom of the page are the values for the most common types of events - note the value of a strikeout, compared to other outs. See, I haven't been lying the past 2 years.
(Disclaimer: Because you have to look at individual events, and compare them to a season's worth of stats, lwts are not a "real time" stat; they only have value after the season is complete.)
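For the spreadsheet-averse, the subtraction above can be sketched in a few lines. This follows the simplified description in this article (average runs in innings containing the event, minus average runs in all innings); it is not a full Retrosheet pipeline, and the four toy innings and their event codes are invented.

```python
from statistics import mean


def linear_weight(innings, event):
    """Mean runs in innings containing `event` minus mean runs over all innings.

    `innings` is a list of (runs_scored, events) pairs.
    """
    all_runs = [runs for runs, _ in innings]
    with_event = [runs for runs, events in innings if event in events]
    if not with_event:
        raise ValueError(f"no innings contain {event!r}")
    return mean(with_event) - mean(all_runs)


# The figures quoted in the text: 1.964 runs in HR innings, .555 overall.
print(round(1.964 - 0.555, 3))  # 1.409

# Hypothetical toy innings, invented for illustration only.
toy_innings = [
    (0, ["OUT", "OUT", "OUT"]),
    (2, ["1B", "HR", "OUT", "OUT", "OUT"]),
    (0, ["BB", "OUT", "OUT", "OUT"]),
    (1, ["HR", "OUT", "OUT", "OUT"]),
]
print(linear_weight(toy_innings, "HR"))  # 0.75
```

With real data you'd feed in every inning from the seasons you care about, which is exactly why this only works after the games have been played.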
As long as you can properly account for how important different events are, more complete equations (like lwts) are better:
-A walk is (almost) as good as a single (just like they told you in Little League).
-Extra base hits are better than either.
-Sacrifices aren't as bad as other outs, but they're a lot worse than a hit or walk.
-Stolen bases are good, but getting caught hurts more (in today's run environment, you need to be successful about 70% of the time to break even).
-Double plays are really bad (everyone is talking about Dunn's strikeouts again, but he's already grounded into more DPs than in any other year. That's not good.)
-When you get your hit (or make your out) is irrelevant; you're always trying to avoid an out.
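The break-even point for stolen bases in the list above falls right out of the run values. Here's a minimal sketch; the two run values are illustrative assumptions (a steal worth about +0.19 runs, a caught stealing about -0.44) and not numbers taken from this article's table, so plug in whatever your run environment says.

```python
def break_even_rate(sb_value, cs_value):
    """Success rate p at which p * sb_value + (1 - p) * cs_value equals zero."""
    if sb_value <= 0 or cs_value >= 0:
        raise ValueError("expected a positive SB value and a negative CS value")
    return -cs_value / (sb_value - cs_value)


# Assumed, illustrative run values; not from the article's table.
SB_VALUE = 0.19
CS_VALUE = -0.44

print(f"{break_even_rate(SB_VALUE, CS_VALUE):.1%}")  # 69.8%
```

Because getting caught costs more than twice what a steal gains, a runner has to succeed well over half the time just to avoid hurting his team.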
Once you've figured all of this stuff out, you have a more complete picture of who is having (or had) a better offensive year. When someone starts spouting off about Dunn's strikeouts, or his batting average, or his lack of sac flies (or savvy), you'll know better - the guy gets on base, hits for a ton of power, and steals bases only when it's smart to do so.
You should be careful, however, how you use the info. Unless you've compensated for it, a big RC or lwt number for a player in Cinci isn't as impressive as the same number put up by a Marlin; a catcher who puts up a big number is much more valuable than a 1B who does the same (though I'd argue that it was rare that Mike Piazza's offense made up for his defense).
That's where the third step in the chain - Value Over Replacement Player (VORP) - comes in. VORP will compare a hitter's stats to those of a "replacement-level" player, defined as the talent level that's freely available (essentially, the worst major-league talent), adjusted for park and position.
Again, the math is involved, but not complicated at all. Say you have a RF named "Gen Kriffey"; we know his stats (or can easily look them up); Baseball Prospectus has all of the stats we need, broken down by league and position, to help us find average and "replacement level". We can plug the league numbers into our RC formula, and compare them to Gen's to get his position adjustment; Baseball Reference will give you the park factor to adjust for. The only voodoo part is "replacement level"; the popular number is 80% of league average for a position (if that's true, how has Royce Clayton kept a job?). Again, you know your RC formula (which is all +,-,x,/); plug in the numbers for the average and for your player; subtract 80% of average from the player's value, and what's left is VORP.
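Here's a rough sketch of that recipe as described here: park-adjust the player's RC, work out what an average player at the position would create in the same number of outs, and subtract 80% of that. This is a simplification and not Baseball Prospectus's actual VORP calculation. Every number below is made up, the function name is mine, and the park factor is assumed to be expressed so that 1.00 is neutral (divide a 100-based factor by 100 first).

```python
def vorp_sketch(player_rc, player_outs, avg_rc_per_out,
                park_factor=1.0, replacement_share=0.80):
    """Simplified VORP-style value: park-adjusted RC minus replacement-level RC.

    Replacement level is taken as `replacement_share` of what an average
    player at the position would create in the same number of outs.
    """
    if park_factor <= 0:
        raise ValueError("park_factor must be positive")
    adjusted_rc = player_rc / park_factor          # crude park adjustment
    average_rc = avg_rc_per_out * player_outs      # position average, same outs
    return adjusted_rc - replacement_share * average_rc


# Hypothetical numbers for our RF "Gen Kriffey", invented for illustration.
value = vorp_sketch(
    player_rc=90.0,
    player_outs=300,
    avg_rc_per_out=0.18,   # assumed average RC per out for right fielders
    park_factor=1.02,      # assumed slightly hitter-friendly home park
)
print(f"{value:.1f} runs over replacement")
```

Comparing against the position average at the same number of outs is what keeps a part-timer from looking worse than a replacement just because he played less.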
Your eyes are glazing over, aren't they? You have no intention of downloading event histories, sorting, linking to Excel spreadsheets, and running long calculations. Here's the thing: Every one of these stats is freely available. ESPN and Baseball Reference have RC; VORP is in the free part of Baseball Prospectus. All you have to do is look (it's what I do; you don't really think I sit around and calculate this stuff, do you?). How do you know Griffey's hitting .292 - did you calculate it? No, someone else did; all you did was use someone else's work. That's all you have to do here, now that you have an idea of what the numbers mean, and where to find them.
And that's the great frustration that I, and many others, have with guys like Marty Brennaman. A minimal amount of effort on his part is all that's needed to see what some of this stuff means (like strikeouts not being worse than other outs), yet even though it's his job, he refuses. That's inexcusable.
I find it funny that "old school" guys seem threatened by these modern stats; they shouldn't be. No formula in the world will see Mark Prior's "perfect mechanics" (or his propensity for injury), or Josh Hamilton's flawed makeup, or how driven Prince Fielder is by hatred. You can't calculate a pitcher's "stuff", or a hitter's work ethic. The stats can only look at the past (though Nate Silver at Baseball Prospectus thinks he can look into the future; his results are varied). But anyone who calls themselves "professional" should know that batting average is not a good stat, and pitcher wins are completely useless. That, in a nutshell, is the difference between "saber" guys and "old school" guys - we admit our tools are limited.
To finish, I want to add something: Use these stats as a guide, not as an answer. I think that's an important thing to remember when you're reading all of these articles: All of the stats in the world may give you a more complete picture, but they'll never tell you the whole story about a player or his season. Someone like Sean Casey isn't normally going to create a lot of runs; his defense is above average, but not by much. But doesn't it make sense that just having him on the team might make those around him a little better? Don't you work better when you like the people you're working with? Just because you can't quantify "veteran presence" doesn't mean it doesn't ever exist.
All of the advanced stats articles I read think otherwise. But real baseball isn't Strat-O-Matic; baseball is played by real people who may or may not like their teammates, or be sick, or have other things on their minds, or any of the hundreds of things that affect how you and I approach our jobs every day.
Besides, I'd much rather sit in a box seat on an 80 degree afternoon for 3 hours drinking beer than run 1,000,000 simulated seasons on my computer any day.