"You never know what you're going to get out of the bullpen."

It was something my father would often nervously say when the starting pitcher was being yanked for a reliever. There was wisdom there. While John Franco, Frank Williams, and Rob Murphy could usually be counted on, most relievers were subject to this truth: they might well come into a game sucking out loud and barf up the whole game. Even the best blew it sometimes.

In 1987 I read Jim Bouton's classic Ball Four for the first time. His game results from his 1969 MLB Seattle Pilots season seemed mostly random. The book included an appendix that showed how HE rated HIMSELF as a reliever, on a per-game basis. I thought he had a pretty good feel for how each outing should be categorized unto itself: EXCELLENT, GOOD, FAIR, or POOR. But these ratings were completely subjective, as he had no concrete rules for this rating system. The idea stuck with me for decades.


Then last autumn, I wondered how I could develop a useful rating system for relief appearances based on Bouton's concept of categorizing each appearance as a separate event. Having played fantasy baseball since 1989, I knew that WHIP (counted here as walks + hits + hit batsmen, per inning pitched) equated to runners allowed, which is arguably the best measure of a reliever. One or two bad outings a year might make an ERA look misleading; W-L record for a reliever is nearly useless; inherited runners score or are stranded perhaps mostly by luck; and the Saves stat has funky rules, combined with oddly conservative usage of one reliever by most teams. Following the findings of our Sabermetric brothers before us, we have learned that getting outs/not making outs is the most important aspect of the game. My conclusion was that looking at the runners allowed per appearance is pivotal to knowing a reliever's dependable usage and tendencies. It might be as useful as any other stat we have for rating relievers.

So I dove into the numbers. I made spreadsheets that detailed every one of the 8200 National League relief appearances from 2009 to make sure I had a good study sample size. Then I came up with a quantifiable definition of what made a relief appearance a quality one versus a fair one versus one that probably blew the whole game. I started drawing some hard lines in a few places, and I think I've come up with a good measuring stick:

REG (Reliever Effective Game):
A WHIP that is LOWER THAN 2.000

FAIR appearances:
A WHIP that is anywhere from 2.000 to 3.000

BaRF (Bullpen Relief Failure):
A WHIP that is HIGHER THAN 3.000

Here was my highly-subjective reasoning for these standards:

If you're a reliever, your job is to come in and effectively get outs. If you pitch 1 inning and allow only 1 baserunner, it's hard to argue that you didn't do your job. Worst case scenario: 1 run scores. If you pitch more than an inning, you may allow more than 1 runner to reach per inning, as long as overall your WHIP is below 2.000 for that appearance.

If you allow 2 or 3 baserunners per inning, you may escape most innings without allowing a run. You might not look good, but overall, you've likely done your job.

If you allow more baserunners than outs recorded, you've really blown it as a reliever. If you pitch 1 inning, that means you've allowed at least 4 baserunners, and have likely started a rally or allowed multiple runs to score.

These categories seem good to me with one exception: one-batter appearances. These categories work well once you've pitched to at least 2 batters, but we need to make an adjustment here, as outings where you pitch to only one batter and allow him to reach do not fit tidily into mathematical formulas: dividing by zero rips holes in the space-time continuum, you know. If you pitch to only one batter, and you walk him or allow a base hit, you haven't exactly screwed up the game, so we'll give you a FAIR rating. If you face one batter and retire him, though, then you HAVE done your job, and done it perfectly. So my standards for one-batter appearances:

Retire the batter, and you get a REG. Allowing the only batter you face to reach gets you a FAIR, not a BaRF. Face more than one batter, though, and the standard rules apply.
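Here's a minimal sketch of those rules in Python, in case you want to rate appearances yourself. The function name `classify_appearance` and its parameters are made up for illustration. One assumption to flag: the rules above don't say what happens when a reliever faces two or more batters and records no outs, so this sketch calls that a BaRF, on the theory that the WHIP is effectively infinite.

```python
from fractions import Fraction


def classify_appearance(outs, baserunners, batters_faced):
    """Label one relief appearance as 'REG', 'FAIR', or 'BaRF'.

    outs          -- outs recorded (3 per inning)
    baserunners   -- walks + hits + hit batsmen allowed
    batters_faced -- total batters the reliever pitched to
    """
    # One-batter exception: retire him for a REG, otherwise a FAIR.
    if batters_faced == 1:
        return "REG" if baserunners == 0 else "FAIR"

    # Two or more batters and no outs: WHIP would divide by zero.
    # Treating this as a BaRF is an assumption, not a stated rule.
    if outs == 0:
        return "BaRF"

    # Per-appearance WHIP: baserunners per inning (3 outs = 1 inning).
    whip = Fraction(baserunners * 3, outs)
    if whip < 2:
        return "REG"
    if whip <= 3:
        return "FAIR"
    return "BaRF"


print(classify_appearance(outs=3, baserunners=1, batters_faced=4))  # REG
print(classify_appearance(outs=3, baserunners=4, batters_faced=7))  # BaRF
print(classify_appearance(outs=0, baserunners=1, batters_faced=1))  # FAIR
```

Using exact fractions keeps the 2.000 and 3.000 cutoffs from being nudged by floating-point rounding on partial innings.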

If you can see a player's season percentage of REG or BaRF games, there's some good information in there. So...

% of relief appearances with a REG is your REP (Reliever Effectiveness Percentage)

% of relief appearances with a BaRF is your BaRF (Bullpen Relief Failure Percentage)
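As a rough sketch, the two season percentages can be computed from a list of per-appearance labels like this. The name `season_rates` and the sample season are hypothetical, not taken from my spreadsheets, and rounding to one decimal place is just a choice to match how the numbers are shown here.

```python
from collections import Counter


def season_rates(labels):
    """Return (REP, BaRF percentage) for a list of appearance labels.

    labels -- one of 'REG', 'FAIR', or 'BaRF' per relief appearance
    """
    total = len(labels)
    if total == 0:
        raise ValueError("no relief appearances to rate")

    counts = Counter(labels)
    rep = 100 * counts["REG"] / total        # share of effective games
    barf_pct = 100 * counts["BaRF"] / total  # share of failures
    return round(rep, 1), round(barf_pct, 1)


# Hypothetical season: 6 REGs, 3 FAIRs, and 1 BaRF in 10 appearances.
sample = ["REG"] * 6 + ["FAIR"] * 3 + ["BaRF"]
print(season_rates(sample))  # (60.0, 10.0)
```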

National League average for a pitcher with at least 20 relief appearances (a number I pulled out of my ass) was:

REG = 62.9%
BaRF = 14.2%

I find that the most useful way to look at these numbers is by percentage or ratio. Here is a bar graph that shows the Reds 2009 bullpen.



League average is listed at the top, and the Reds pen did quite well last year. Looking at the size of a reliever's REG and BaRF could prove a useful tool for managers in determining when best to use which reliever. You might want to use your highest REP guy in a situation, but you also might be gambling on a lot of BaRFs if that same often-successful reliever has a tendency to be a feast-or-famine talent (not unlike Badroyo vs. Goodroyo).

Of note is that just under my 20-game qualifying threshold lies a man who pitched 19 games and led the league in bullpen suckitude. That man was the Reds' own Mike Lincoln (not a shocker), who not only had the NL's 16th-worst REP at 47.4% (>15 G), he had the league's worst BaRF: 36.8%. So if you called in Lincoln from the pen, you could expect him to spark your opponents' rally more than 1 out of 3 times. Ouch.

Here are the leaders in these categories:



As you can see, when you look at relievers with at least 20 appearances, you get a range in REP from 87.0% to 36.4%, and a BaRF range of 0.0% to 35.7%, with league averages again being REG 62.9% and BaRF 14.2%.

Of note is one of the best relievers in the NL last year, Josh Fogg. Yes, THAT Josh Fogg. He was rocked in his sole start in 2009, but as a reliever he came in 23 times to record 20 REGs and only 1 BaRF. That's solid performance.

Even more impressive was Brad Thompson's '09 campaign, in which he pitched in 24 games and recorded 19 REGs without a single BaRF outing.

Another way you may find these stats useful is as a ratio of REGs to BaRFs. Here are the league leaders in REG/BaRF ratio:



While there has recently been a rating system based on appearances using WAR, calculating it is bulky, whereas any single appearance is easy to figure using my system.
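The REG/BaRF ratio mentioned above is just as easy to figure. Here's a small sketch using the Fogg and Thompson counts from this post. The function name `reg_barf_ratio` is made up, and returning infinity for a reliever with zero BaRFs is an assumption on my part; there may be a better way to rank those guys.

```python
import math


def reg_barf_ratio(regs, barfs):
    """Return REGs per BaRF for a season.

    A reliever with no BaRFs has no finite ratio, so this sketch
    returns infinity for him (an assumption, not a settled rule).
    """
    if barfs == 0:
        return math.inf
    return regs / barfs


print(reg_barf_ratio(20, 1))  # Josh Fogg: 20.0
print(reg_barf_ratio(19, 0))  # Brad Thompson: inf
```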

So whatcha think? Too sloppy? Not useful? Groundbreaking? Let's talk about it, and use it to our evil advantage in our everyday sabermetric pursuits.

What REP and BaRF research would you most like to see next? Vote below to help steer my research.

Go, Reds! They're my favorite team!

