Sabermetric Overview Series Part VII: Evaluating Pitching

More fun with numbers today, so dust off your scientific calculators.  I'm going to try to summarize the sabermetric approach to evaluating pitching.  I've had some trouble writing this because (1) I am not completely on board with the sabermetric approach, (2) there is considerable disagreement in the sabermetric community about pitching, and (3) the last two weeks at work have been BOHICA. Still, I do agree with the overarching message of the sabermetric view of pitching:  pitchers control a lot less about batted balls than traditionally thought.  

Evaluating pitching the old-school way
Wins/losses and ERA have been and continue to be the primary means of evaluating pitchers, at least if you're looking at Cy Young award voting. I'll spare you a long lecture on why W/L records are a poor measure (see chapter 2-1 of Baseball Between the Numbers if you care for the treatise). To summarize, a pitcher's record is heavily dependent on his offense, defense, bullpen, and blind luck. We can do better than that. I mean, what kind of idiot would reward a pitcher with a multi-million dollar contract solely due to a 15-10 record for one year?

While ERA is a significantly better measurement because it removes the pitcher's offense from the equation, it's still greatly limited by its reliance on factors outside of the pitcher's control. The pitcher's defense certainly impacts his ERA. So does his bullpen, if he leaves with runners on. ERA also subjectively separates "earned" and "unearned" runs, which further compromises its effectiveness. It also doesn't account for park factors (although baseball-reference's ERA+ does). The list goes on. This is why a pitcher's ERA varies greatly from year to year, much more so than a typical hitter's OPS. Just as an example, take a look at Roger Clemens and Junya. If you scan Clemens' career lines, you see several year-to-year ERA differences of over a run (even if you don't count his partial seasons). That's a huge variance. It means that Clemens was something like 33% to 50% more (or less) effective by ERA from year to year. Now look at Griffey's lines. His OBPs are generally within the .350-.410 range and his OPSs within .850-1.070. On a percentage basis, these discrepancies are much smaller than Clemens' ERA variances.
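To put rough numbers on that comparison, here is a minimal sketch of the percentage arithmetic. The ERA values of 2.00 and 3.00 are my own assumption, picked only to show what a one-run swing looks like in percentage terms; they are not Clemens' actual seasons. The OBP and OPS figures are just the edges of the ranges quoted above, so they overstate a typical year-to-year change.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


# Hypothetical one-run ERA swings (assumed values, not real seasons).
print(round(pct_change(3.00, 2.00), 1))  # -33.3: one run better, starting from 3.00
print(round(pct_change(2.00, 3.00), 1))  # 50.0: one run worse, starting from 2.00

# Full width of the OBP and OPS ranges quoted for Griffey.
print(round(pct_change(0.350, 0.410), 1))  # 17.1
print(round(pct_change(0.850, 1.070), 1))  # 25.9
```

Even the full width of the hitter's ranges is a smaller percentage swing than a single one-run jump in ERA.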

Nerds shed the light
So why the unpredictability in pitching performances? Simply put, it's because the pitcher has little control over what happens to batted balls. Pitchers control three primary events: Walks, Strikeouts, and Home Runs (we'll disregard minor events like HBPs for now). With anything else, the pitcher depends on his defense and luck. Having Adam Everett instead of Felipe Lopez as your SS can save a pitcher a significant number of hits over the course of a year (I believe defensive stats will be covered in a future diary). But luck is probably just as important. In some games (or years), the grounders have eyes. In others, the screaming line drives given up happen to be hit directly at your fielders.

This was the main thrust of a 2001 Voros McCracken article in Baseball Prospectus. In the article, McCracken (try not to judge the author by his last name) studies the results of batted balls and separates defense independent events (basically BB, HR and K) from defense dependent events (H, R, ER, W/L, etc.). He concludes that "hits allowed are not a particularly meaningful statistic in the evaluation of pitchers," and takes it a step further by declaring that "there is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play." Not exactly Galilean, but it created a fair amount of controversy in the baseball community (or at least the internet baseball community).

One way to test this is by examining the results of batted balls through the Batting Average on Balls In Play (BABIP). In a December 2001 article that first introduced me to this concept, Rob Neyer looked at Pedro Martinez's 1999 and 2000 seasons, which happen to be two of the best pitching seasons in the last twenty years. In 1999, Pedro had a 2.07 ERA in 213.3 IP. But in 2000, his ERA was an insane 1.74 in 217 IP (these are the 2nd and 7th best seasons by ERA+ of all time). He lowered his ERA even though in 1999 he struck out more batters (313 versus 284) and gave up fewer HRs (9 versus 17), though he did walk a few more hitters in 1999 (37 versus 32). Why? Because on the relatively rare occasions when batters did make contact against Pedro, they did much worse in 2000 (.236 BABIP) than in 1999 (.323). Whether it was better defense or blind luck, fewer batters reached base against Pedro in 2000 and consequently he allowed fewer runs. McCracken posited that there is no rhyme or reason to which pitchers have low or high BABIP, unlike defense independent components like BB or K. Pedro's BABIP might vary wildly from season to season, but his BB, K and HR rates will stay relatively stable.
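For anyone who wants to check BABIP numbers on their own, here is a minimal sketch of the calculation, assuming the commonly used formula (H - HR) / (AB - K - HR + SF). The two stat lines are made up for illustration and are not Pedro's actual totals. They share the same strikeouts and home runs and differ only in how many balls in play fell for hits.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int,
          sac_flies: int = 0) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical seasons: identical K and HR totals, 35 more hits in the second.
good_luck = babip(hits=150, home_runs=15, at_bats=750, strikeouts=200, sac_flies=5)
bad_luck = babip(hits=185, home_runs=15, at_bats=785, strikeouts=200, sac_flies=5)

print(f"{good_luck:.3f}")  # 0.250
print(f"{bad_luck:.3f}")   # 0.296
```

Home runs come out of both the numerator and the denominator because no fielder has a chance at them, which is why BABIP isolates the part of a pitcher's line that depends on defense and luck.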

As an alternative to ERA, McCracken devised a "DIPS ERA" that translates traditional ERA into a measurement of only defense independent factors. After subtracting defense independent events from total batters faced to get the number of balls in play, McCracken applies league-average rates to determine the number of singles, doubles, etc. given up. Others have refined this approach to come up with their own metrics, such as The Hardball Times' Fielding Independent Pitching (FIP). Baseball Prospectus also has its own Defense-adjusted ERA (DERA). Although its formula is secret (unlike FIP), I believe that DERA looks at hits and is less DIPS-ish than FIP. I'm not going to look at BPro's other pitching metrics because it's a subscriber-only service, but if you want a full description check out chapter 2-1 of Baseball Between the Numbers.
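Since FIP's formula is public, here is a minimal sketch of it in its commonly published form, (13*HR + 3*BB - 2*K) / IP plus a league constant. The constant of 3.20 is an assumption on my part (it is normally recalculated so that league-average FIP matches league-average ERA), some versions also count HBP alongside walks, and the stat line is hypothetical.

```python
# Assumed league constant; normally recalculated for each league and season
# so that league-average FIP lines up with league-average ERA.
FIP_CONSTANT = 3.20


def fip(home_runs: int, walks: int, strikeouts: int, innings: float,
        constant: float = FIP_CONSTANT) -> float:
    """FIP from defense independent events only; hits never enter the formula."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * home_runs + 3 * walks - 2 * strikeouts) / innings + constant


# Hypothetical pitcher: 20 HR, 50 BB, 180 K in 200 IP.
# Whether he allowed 170 hits or 220 hits, the estimate is the same.
print(round(fip(home_runs=20, walks=50, strikeouts=180, innings=200), 2))  # 3.45
```

Note that hits allowed is not even an argument to the function, which is the whole DIPS idea in one line.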

Rather than leaving this as just a series of statements and numbers, I'll throw in a real-life example that we're all familiar with - Bronson 2006 versus Bronson 2007.  Bronson 2006 was superb - 3.29 ERA in 240.7 IP, 14-11, and an All-Star berth.  He was third in the league in ERA+ (one reason I thought he was robbed in not getting any Cy Young votes).  But things obviously haven't gone as well this year.  In 153.7 IP, his ERA is more than a full run higher at 4.63, slightly better than league-average according to ERA+.  He's also 5-13.  

But is he really that much worse of a pitcher? No, I don't think so. His K/9 IP has gone down some (6.88 to 6.38), but not nearly enough to explain an ERA jump of that magnitude. His walks have gone slightly up (2.5 to 2.8 per game), but he's also given up far fewer home runs (31 last year, 16 so far in 2007). Not surprisingly, his FIP has barely moved from last year, from 4.14 to 4.23. Which means his ERA has been pushed up by (1) more batted balls not finding defenders, (2) hits and walks being less scattered than last year, and possibly (3) more inherited runners allowed to score by the bullpen (though I'm not sure how to check this without looking at each game individually). I think the answer lies mainly in (2). We can point to two outings this year (on 5/21 and 8/1) where he pitched only 2 and 1.7 innings but gave up 6 and 7 ER, respectively. Not coincidentally in my mind, both of those stinkers directly followed his two highest pitch count starts (129 and 123) of the year. Remove those two starts from the equation and his ERA decreases from 4.63 to 3.96! His 2006 ERA, in contrast, wasn't spoiled by any outings of less than 4.1 IP.
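If you want to check the math on removing those two starts, here is a minimal sketch. It assumes the ".7" in 153.7 and 1.7 means two thirds of an inning, and it backs the season earned-run total out of the quoted 4.63 ERA rather than taking it from a box score. The helper names are mine.

```python
from fractions import Fraction


def innings(ip_text: str) -> Fraction:
    """Convert innings-pitched text to exact thirds of an inning.

    This post writes partial innings as .3 and .7; box scores usually
    write .1 and .2. Both styles are accepted here.
    """
    whole, _, part = ip_text.partition(".")
    thirds = {"": 0, "0": 0, "1": 1, "3": 1, "2": 2, "7": 2}[part]
    return int(whole) + Fraction(thirds, 3)


def era(earned_runs: int, ip: Fraction) -> float:
    """Earned runs allowed per nine innings."""
    return float(9 * earned_runs / ip)


season_ip = innings("153.7")
# Earned runs implied by a 4.63 ERA over 153 2/3 innings (an inference).
season_er = round(4.63 * float(season_ip) / 9)

bad_ip = innings("2") + innings("1.7")  # the 5/21 and 8/1 starts
bad_er = 6 + 7

print(season_er)                                               # 79
print(round(era(season_er, season_ip), 2))                     # 4.63
print(round(era(season_er - bad_er, season_ip - bad_ip), 2))   # 3.96
```

That works out to 66 earned runs in an even 150 innings once the two stinkers are dropped, which matches the 3.96 figure.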

Questioning DIPS  
I'm on board with the general principle that most pitchers have little control over batted balls, but there are exceptions that DIPS cannot account for. Perhaps most obviously, some pitchers exhibit a real ability to induce ground balls, which are less likely to turn into hits (and especially XBH) than line drives and fly balls. Chien-Ming Wang is an extreme example. Last year he struck out only 76 batters in 218 IP (versus 52 BB and 21 HR). But he also induced grounders on 63% of batted balls, which helped him achieve a 3.63 ERA. Knuckleballers also defy DIPS, although you can probably count the active ones on one hand.
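To see how far Wang strays from what a DIPS-style metric expects, here is a rough sketch that runs his 2006 line from the paragraph above through the commonly published FIP formula. The 3.20 league constant is an assumption, so treat the result as a ballpark figure and not his official FIP.

```python
def per_nine(count: int, innings: float) -> float:
    """Rate of an event per nine innings pitched."""
    return 9 * count / innings


def fip_estimate(hr: int, bb: int, k: int, innings: float,
                 constant: float = 3.20) -> float:
    """(13*HR + 3*BB - 2*K) / IP + constant; the constant is assumed."""
    return (13 * hr + 3 * bb - 2 * k) / innings + constant


# Wang's 2006 line as quoted in the text.
K, BB, HR, IP, ERA = 76, 52, 21, 218.0, 3.63

k9 = per_nine(K, IP)
est = fip_estimate(HR, BB, K, IP)
gap = ERA - est

print(round(k9, 2))   # 3.14
print(round(est, 2))  # 4.47
print(round(gap, 2))  # -0.84
```

Under that assumed constant, his ERA beat the defense independent estimate by more than three quarters of a run, which is the kind of gap a 63% ground ball rate can help explain and a formula built on BB, K and HR cannot.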

Another disagreement I have with the sabermetric approach to pitching (though not necessarily DIPS-related) concerns the ability to pitch to the situation. In Baseball Between the Numbers, BPro's Keith Woolner and Dayn Perry declare that "pitchers don't exhibit significant control over when they give up hits and walks" (p. 53). Whether it's because I've seen Tom Glavine wiggle out of too many jams or because the preachings of Brennamania have seeped into my subconscious, I disagree. Oddly enough, today's BPro features an article that addresses pitching to the situation. Using data from enhanced Gameday, Dan Fox studied the velocity and movement of pitches according to the number of runners on base. Although the discrepancies aren't huge, Fox concludes that pitchers (some more than others) adjust to the situation by giving a little extra oomph on a fastball with runners on. Interestingly, curve balls tend to break slightly less on average with runners on, though that could be because a pitcher is less likely to throw something in the dirt in those situations. That article aside, I also have a problem with the theory that pitchers can't control the distribution of hits and walks, because if that were true you'd expect to see OBP and OPS allowed stay relatively stable regardless of the situation. But this isn't the case. This year's pitching staff, for example, has allowed a .760 OPS with no runners on but a .825 OPS with runners on. Similarly, the OPS allowed varies depending on the score: .795 when only one run separates the teams, but .766 when the margin is four or more runs.

That's it  
Congratulations if you've made it this far.  Looking forward to your comments.