Lies, Damned Lies, and Sabermetric Statistics - Pitching Edition

You're beyond Wins and Losses. Here's a primer on the battles among some of the major pitching metrics.

Something stinks, but it isn't Bronson's xFIP.
Thearon W. Henderson

Do you remember Red Reporter's sabermetric series from 2007? No? Well, we're doing an update, starting with the pitching. There's an alphabet soup of advanced pitching metrics. I'm not attempting to tackle all of them, but a survey of the major ones - and their underlying principles - can lead to some interesting conversations.

ERA+ vs. ERA-
As far as the triple crown rate metrics go, ERA is far better than its hitting equivalent - batting average. Rather than describing just one aspect of a player's contribution, ERA attempts to measure the very essence of a pitcher's job - the runs allowed that are his fault. But there are obvious problems with ERA: the subjective determination of which runs are "earned," the lack of any adjustment for park effects, and the difficulty of comparing across eras. That's where ERA+ comes in. ERA+ adjusts for the offensive environment (park and era), then scales the metric like OPS+ (so that 100 is average, and higher is better). Note that ERA+ tends to have a narrower spread than OPS+. Last year, Johnny Cueto led the league with a 151, while the top OPS+ was 171.

If you're content to use ERA, then ERA+ is easy and now fairly widespread. So why use anything else? According to this BtB article, ERA- (found on Fangraphs) is superior because it compares a pitcher to the league average. But isn't that what ERA+ does? Technically, no. ERA+ compares the league to the pitcher: it tells you how much higher (or lower) the league's ERA was than his.

Let's look at an example - Bronson Arroyo's 2012. His raw ERA of 3.74 gives him an ERA- of 93 and an ERA+ of 111. ERA- is also scaled to 100, but unlike ERA+ (and like ERA itself), lower is better. This means that Arroyo's adjusted ERA was 7% better than league average, according to Fangraphs. The 111 ERA+ means that the league average ERA (after adjusting for park) was 11% higher than Arroyo's. ERA+ essentially swaps the numerator and denominator of ERA-.
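To make the numerator/denominator swap concrete, here is a minimal Python sketch. It assumes both metrics are built from the same park- and era-adjusted league ERA, which is a simplification: Baseball-Reference and Fangraphs each apply their own park adjustments, which is presumably why Arroyo's published 111 and 93 are not exact mirror images of each other. The function names and the league baseline (backed out of the 111 ERA+ purely for illustration) are hypothetical, not anything either site publishes.

```python
def era_minus(era, adj_league_era):
    """Pitcher ERA as a percentage of the adjusted league ERA (lower is better)."""
    return 100.0 * era / adj_league_era


def era_plus(era, adj_league_era):
    """Adjusted league ERA as a percentage of the pitcher's ERA (higher is better)."""
    return 100.0 * adj_league_era / era


# Illustrative baseline only: the adjusted league ERA implied by a 3.74 ERA
# and a 111 ERA+ (3.74 * 1.11). This is an assumption, not a published figure.
arroyo_era = 3.74
assumed_league_era = 3.74 * 1.11

print(round(era_plus(arroyo_era, assumed_league_era)))   # 111
print(round(era_minus(arroyo_era, assumed_league_era)))  # 90
```

Under a single shared baseline, a 111 ERA+ pairs with an ERA- of about 90. In other words, the pitcher was roughly 10% better than league average, not 11%.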

Does it really matter? Maybe if you're a math purist, but I don't see the big whoop. When I see a 111 ERA+, I don't automatically think "hey, Bronson was 11% better than league average!" (which is wrong). I see a solidly above-average but not All-Star level performance. But if that isn't precise enough for you, ERA- is right in your wheelhouse.

Easily the most significant sabermetric principle is that pitchers control the fate of batted balls much less than previously thought. Voros McCracken's seminal 2001 BPro article dropped this bombshell:

There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.

And thus DIPs (defense independent pitching theory) was born.

DIPs was a revolutionary idea even if it was based on obvious realizations - that not all defenses are created equal, and that some days (or years), hitters simply hit them where they ain't. ERA might adjust for errors, but it does nothing to account for the difference between having a spry Paul Janish or the corpse of Edgar Renteria as your shortstop. Or the fact that Sam LeCure saved your ass by inducing a double play after you left the bases loaded.

FIP attempts to separate what a pitcher can control and what he purportedly cannot. In other words, it considers only those events where the defense has no effect - strikeouts, walks, homeruns, and HBPs (you'll see Ks, BBs, and HRs collectively referenced as the "three true outcomes" or "TTO"). It is scaled to ERA, meaning that the average FIP will be the same as the average ERA in a given year. The FIP link above is for Fangraphs' explanation; go here for another helpful primer if you're so inclined.
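For the curious, the commonly published form of FIP can be sketched in a few lines of Python. The 13/3/2 weights follow the standard Fangraphs formulation; the function names and every stat line below are made up for illustration. The constant is whatever makes league FIP equal league ERA in a given year, which is what "scaled to ERA" means.

```python
def fip_core(hr, bb, hbp, k, ip):
    """Unscaled FIP: weighted HR, BB + HBP, and K per inning pitched.

    ip is decimal innings (202 1/3 innings is 202.333..., not 202.1).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Constant that puts league-wide FIP on the same scale as league ERA."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr, bb, hbp, k, ip, constant):
    """FIP for one pitcher, given that season's constant."""
    return fip_core(hr, bb, hbp, k, ip) + constant


# Hypothetical league totals and a hypothetical pitcher, not real data.
constant = fip_constant(lg_era=4.00, lg_hr=5000, lg_bb=15000,
                        lg_hbp=1500, lg_k=36000, lg_ip=43000)

print(fip_core(hr=20, bb=50, hbp=5, k=150, ip=200))  # 0.625
print(fip(hr=20, bb=50, hbp=5, k=150, ip=200, constant=constant))
```

Hits on balls in play, errors, and stranded runners never appear in the inputs, which is the whole point.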

Like a lot of great ideas, McCracken's findings were generally accurate but somewhat overstated. For one, homeruns aren't completely within a pitcher's control. While some pitchers exhibit a consistent skill in allowing more (or fewer) homeruns per flyball, most pitchers do not, which means that fluctuations in HR/FB rates should be regressed to the mean (about 9-10% of flyballs) if we're trying to gauge a pitcher's true talent level. That's the idea behind xFIP (found on Fangraphs), a refinement of FIP that is calculated the same way but replaces homeruns allowed with a regressed estimate of how many should have been allowed.
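The xFIP tweak is small enough to show in code. This Python sketch assumes the same 13/3/2 weights used in the standard FIP formula and swaps actual homeruns for flyballs times a league HR/FB rate. The 10% default comes from the rough figure above; the function name, the 3.10 constant, and the stat line are hypothetical.

```python
def xfip(fly_balls, bb, hbp, k, ip, constant, lg_hr_per_fb=0.10):
    """FIP with actual homeruns replaced by an expected total.

    expected HR = flyballs allowed * league-average HR/FB rate, so a pitcher's
    own homerun total never enters the calculation.
    """
    expected_hr = fly_balls * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher: 250 flyballs, so 25 expected homeruns at a 10% rate,
# no matter whether he actually gave up 15 or 35.
print(round(xfip(fly_balls=250, bb=50, hbp=5, k=150, ip=200, constant=3.10), 2))  # 4.05
```

Two pitchers with identical strikeouts, walks, and flyballs get the same xFIP even if one of them had a homer-happy year, which is why it is treated as a read on talent and not a record of results.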

Fangraphs claims that xFIP is more predictive of future ERA than FIP, and especially more than ERA itself. And that's the real advantage to the DIPs-based stats. Other variations like BPro's SIERA take things a step further than xFIP by accounting for groundball rate or other ways pitchers affect the fate of batted balls. I'm sure I'm not the right person to ask which one is the best, though I doubt any one of them is a material improvement over the others.

bWAR v. fWAR
Go ahead and say it - it's a battle of the WARs. While position player differences between the two WARs are rarely significant, the two pitcher WARs are entirely different beasts. That's because of their philosophical differences - while fWAR attempts to separate team defense and batted ball luck from the evaluation, bWAR does not. In my mind, bWAR is better if you're looking at the actual on-field results, while fWAR is better for evaluating what should have happened given normal defensive support, strand rate, and other factors. Where you stand on bWAR v. fWAR probably depends on what you think of FIP v. ERA.

As an example, let's take a look again at Bronson. His comeback season in 2012 - 202 innings, 3.83 RA, 3.74 ERA, 111 ERA+ - netted 3.6 bWAR (second-best in his career). The Fangraphs metric was not as convinced of his proficiency, giving him 2.5 fWAR based on his FIP being over 4. Also contributing was a slightly higher than average strand rate (77%).

I'm inclined to give Arroyo credit for keeping hitters off balance with his slop coming from all angles and go with his bWAR as the fairer judgment. So maybe his fWAR for that year is a bit harsh. But you know what? Over his career, the WARs do not view Arroyo all that differently - 23.5 fWAR compared to 25.7 bWAR. This comes back to the DIPs principle that pitchers have limited control over what happens to batted balls. We might see significant differences in ERA and FIP over the course of small samples or even a few seasons. But if you get a large enough sample, a lot of this stuff evens out. Arroyo has pitched in front of bad defenses and good ones. In some years he strands more runners, in others he might allow more homeruns per flyball. His career FIP is about a third of a run higher than his ERA, but both systems agree that Arroyo's been an average-plus starting pitcher. If there's a metric that can say with reasonable certainty that he will continue to do as well for another few seasons, I'm sure the Reds would like to see it.