I was supposed to have this posted for FVA last Friday, but work got in the way ("Some major glitch in the accounting. A lot of money missing." (bp)). Luckily, today I could finally get around to chopping away at it.
Most of us learned in kindergarten that in any right triangle, the area of the square whose side is the hypotenuse is equal to the sum of the areas of the squares whose sides are the two legs. It's been ingrained in our minds since we first read Winnie the Pooh Studies Euclidean Geometry. And now, hardly a day goes by when you don't find yourself doodling new ways to lay out an algebraic proof for the Pythagorean theorem on a napkin at lunchtime by yourself. Am I right?
Baseball and Pythagoras
Let's face it though, the hypotenuse of a right triangle has very little to do with baseball. But Pythagoras is such a cool name that you often find yourself wishing you could use it in everyday baseball conversation, don't you? Well, enter Bill James to the rescue. While obviously living in his mother's basement in the late 1970s and early 1980s, James noticed that you can get a pretty good idea of what a team's record should be just by knowing its runs scored and runs allowed for the season. All you need is a simple formula that looks a lot like the formula Mr. Ruford T. Pythagoras came up with oh so many years ago:

$$\text{Winning Percentage} = \frac{\text{Runs Scored}^2}{\text{Runs Scored}^2 + \text{Runs Allowed}^2}$$
Because James's formula resembled the equation of the Pythagorean Theorem so much, he decided to name his new equation the Pythagorean Winning Percentage (or Expectation or Record or Expected Record, depending on who you ask). And finally the world was able to identify which team was better without actually having to play the games. It was a great leap forward for spreadsheet enthusiasts around the globe.
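For the spreadsheet enthusiasts who'd rather see it in code, here is a minimal Python sketch of the formula: runs scored squared, divided by runs scored squared plus runs allowed squared. The function name `pythagorean_win_pct` and the optional `exponent` parameter are my own labels for illustration, not anything from James; the default of 2 is the plain squared version.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^x / (RS^x + RA^x), with x = 2 by default."""
    if runs_scored < 0 or runs_allowed < 0 or runs_scored + runs_allowed == 0:
        raise ValueError("runs must be non-negative and not both zero")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A hypothetical team that scores 800 runs and allows 700
print(round(pythagorean_win_pct(800, 700), 3))  # 0.566
```

A team that scores exactly as many runs as it allows comes out at .500, which is about as much sanity-checking as this formula needs.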
Now I know what you are thinking. You're thinking, "Slyde, you are full of shit. We already know a team's win-loss record because they actually played the games you giant nerd-wad. Why should I take a step back and look at something that isn't a precise replication of that record that happened in reality, not your matrix created alterna-reality, dork?" First of all, you could probably tone down the name calling a bit. I may be a computer...er...on a computer, but I do have feelings. Secondly, it's a very good question.
The real benefit of knowing a team's Pythagorean winning percentage is that it sets a realistic expectation for what a typical team would do given the number of runs it scores and the number of runs it surrenders. More to the point, it can indicate whether a team is underperforming or overperforming its expectations and is therefore likely to improve or decline in the future.
A good example of this is the 2006 Reds, who spent two-thirds of the season with a winning record despite being outscored by their opponents. After 81 games, they were 7 games over .500 at 44-37 (and in first place!). However, they had scored 410 runs while surrendering 413, giving them a Pythagorean record of 40-41. Given what the Pythagorean record tells us about run differentials, it should have been obvious that the Reds were due to falter as the season wore on. Unless a team with a negative run differential can turn it around, the likelihood of it maintaining a winning record is slim. Remember the Captain Obvious phrase that pays: "Teams that give up more runs than they score just don't win that often."
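Here's a quick sketch that reproduces the 2006 Reds' 40-41 Pythagorean record from their 410 runs scored and 413 allowed over 81 games. The helper `pythagorean_record` is a hypothetical name of my own; it just multiplies the squared-formula winning percentage by the number of games played.

```python
def pythagorean_record(runs_scored, runs_allowed, games, exponent=2.0):
    """Return (expected_wins, expected_losses) over the given number of games."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    expected_wins = games * rs / (rs + ra)
    return expected_wins, games - expected_wins


# 2006 Reds after 81 games: 410 runs scored, 413 allowed (actual record: 44-37)
wins, losses = pythagorean_record(410, 413, 81)
print(f"{wins:.1f}-{losses:.1f}")  # 40.2-40.8

# Round wins to whole games and give the remainder to losses
whole_wins = round(wins)
print(f"{whole_wins}-{81 - whole_wins}")  # 40-41
```

Rounding only the wins and deriving the losses from them keeps the record adding up to the games actually played.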
Runs on a balance sheet
But predicting the future isn't all that the Pythagorean record is used for. In fact, given that run differentials can fluctuate throughout the year, it can be a very imprecise tool if you just use it based on raw numbers (i.e. you don't adjust for key injuries and inopportune slumps, etc.). What I like to use the Pythagorean record for is preparing for next season. The insight that a Pythagorean Record formula gives us is the interaction between runs scored and runs allowed and how a change in one or the other will likely affect a team's record.
How about an example? What I like to do is hold either runs scored or runs allowed fixed and see what changes would be needed on the other side to reach a certain level of performance. For instance, if the Reds' offense performs at the same level next season and scores about 750 runs, how much will the pitching have to improve in order to get close to 90 wins? Using the Pythagorean formula, we can estimate that the pitching staff will need to give up about 675 runs, or around 4.2 a game. So far this season, they have surrendered about 5.1 runs a game, which works out to about 826 over a full season, meaning they would need to shave off about 150 runs to get down to 675. That's a lot of runs.
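To freeze runs scored and solve for runs allowed, you can flip the formula around: if W/G = RS² / (RS² + RA²), then RA = RS × sqrt((G − W) / W). Below is a minimal sketch of that algebra (the function name `runs_allowed_needed` is hypothetical), assuming a 162-game season and the plain squared exponent.

```python
import math


def runs_allowed_needed(runs_scored, target_wins, games=162):
    """Runs allowed that make the squared Pythagorean formula hit target_wins.

    Solving W/G = RS^2 / (RS^2 + RA^2) for RA gives RA = RS * sqrt((G - W) / W).
    """
    if not 0 < target_wins < games:
        raise ValueError("target_wins must be between 0 and games (exclusive)")
    return runs_scored * math.sqrt((games - target_wins) / target_wins)


needed = runs_allowed_needed(750, 90)   # 750-run offense aiming for 90 wins
current_pace = 5.1 * 162                # 5.1 runs allowed per game over a full season
print(round(needed))                    # 671
print(round(needed / 162, 2))           # 4.14
print(round(current_pace - needed))     # 155
```

The unrounded answer lands at about 671 runs (roughly 4.1 a game), so "about 675" and "about 150 runs" are reasonable ballpark figures.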
Now, since there are several young players in the lineup, we probably wouldn't expect the offense to stagnate completely. Say improvements by Encarnacion, Phillips, and Hamilton manage to offset any dips by other players and even push the Reds up to an 800-run team (this is all purely hypothetical). The Pythagorean formula then tells us that the pitching and defense would need to surrender around 720 runs, just a 105-run improvement.
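Because the squared formula only cares about the ratio of runs scored to runs allowed, the runs-allowed target scales in lockstep with runs scored: an 800-run offense can afford 800/750 times as many runs allowed as a 750-run one. A small sketch of that check, reusing a hypothetical `runs_allowed_needed` helper and the 5.1-runs-per-game pace from above:

```python
import math


def runs_allowed_needed(runs_scored, target_wins, games=162):
    """Runs allowed needed for target_wins under the squared Pythagorean formula."""
    if not 0 < target_wins < games:
        raise ValueError("target_wins must be between 0 and games (exclusive)")
    return runs_scored * math.sqrt((games - target_wins) / target_wins)


season_pace = 5.1 * 162  # roughly 826 runs allowed at this season's rate
for runs_scored in (750, 800):
    needed = runs_allowed_needed(runs_scored, 90)
    # RA/RS is identical for both offenses: sqrt(72/90), about 0.894
    print(runs_scored, round(needed), round(needed / runs_scored, 4), round(season_pace - needed))
# 750 671 0.8944 155
# 800 716 0.8944 111
```

The unrounded target for the 800-run team is about 716 runs; the "around 720" and "105-run" figures are consistent with scaling the rounded 675 by 800/750.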
The thing you might notice here is that as you move into a higher scoring bracket, it takes a larger run differential to reach the same record. In other words, a 100-run differential doesn't always mean the same thing: a team that scores 750 runs and surrenders 650 is likely to win more games than a team that scores 950 and surrenders 850. I suspect this is why so many people believe in the pitching-and-defense model that Krivsky preaches. And yes, it might be an effective way to build a team in the long run, but the problem from the Reds' perspective is that they were on the other end of the spectrum when Krivsky took over. For him to get his dream squad, it would require pretty much a full overhaul of the roster. Rather than doing that, he might be better off trying to build a successful team around some of the quality players he already has.
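To put numbers on the 750/650 versus 950/850 comparison, here is a short sketch using a hypothetical `expected_wins` helper with the squared formula over 162 games:

```python
def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from the Pythagorean formula over a season of `games` games."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Two teams with the same +100 run differential in different scoring environments
for rs, ra in [(750, 650), (950, 850)]:
    print(f"{rs} scored, {ra} allowed (+{rs - ra}): {expected_wins(rs, ra):.1f} wins")
# 750 scored, 650 allowed (+100): 92.5 wins
# 950 scored, 850 allowed (+100): 90.0 wins
```

The same +100 is worth about two and a half more wins to the lower-scoring team, because the formula runs on the ratio of runs (750/650 is about 1.15, while 950/850 is about 1.12), not on the raw difference.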
So it's all about luck, right?
Alright, enough editorializing about the Reds. This is supposed to be about Pythagoras. The examples I gave above assumed that a team's record would be right in line with its Pythagorean expectation. As you've probably guessed, that is not always the case. While the majority of teams finish within a game or two of their projected record, from time to time a team will land as many as 10 games above or below its expected record (this season's DBacks are currently 9 games over their Expected Record, while the Giants are 8 below theirs). Often these differences are simply chalked up to luck, but some sabermetricians are starting to show that there may be more to it.
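Measuring how far a team sits above or below its expected record is just actual wins minus Pythagorean wins. Here's a minimal sketch using the 2006 Reds' first-half numbers from earlier (44-37 with 410 scored and 413 allowed). The name `pythag_luck` is hypothetical; a positive result means the team is outperforming its Pythagorean record.

```python
def pythag_luck(wins, losses, runs_scored, runs_allowed, exponent=2.0):
    """Actual wins minus Pythagorean expected wins (positive = outperforming)."""
    games = wins + losses
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return wins - games * rs / (rs + ra)


# 2006 Reds after 81 games
print(f"{pythag_luck(44, 37, 410, 413):+.1f}")  # +3.8
```

Whether a number like that is luck or something else is exactly the question the theories below try to answer.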
David Gassko at The Hardball Times had an article in May that looked at a few different theories and found that:
- A balanced lineup can increase your chances of outperforming your Pythag record. In this case, balance does not mean being able to do the little things. Rather it means that all of your offensive production does not come from just 2 or 3 players.
- Properly using a bullpen can drastically change your ability to outperform your Pythagorean record. Using your best pitchers in high-leverage situations can make a large difference in whether or not you win more games than expected.
- An experienced manager can make a difference as well. The more games a manager has managed, the more likely his team is to outperform expectations. Part of me thinks this may be selection bias, since managers of failing teams don't tend to last very long, but the effect of managerial experience is considerably smaller than the effect of a properly managed bullpen.
One factor that Gassko looked at that didn't seem to affect the expected record much was a team's dependence on home runs. In other words, scoring more of its runs via the long ball does not significantly change a team's chances of outperforming its expected record.
Advanced math stuff that I don't understand
One last thought about the Pythagorean Percentage. It turns out that squaring runs scored and runs allowed may not be the most accurate way to estimate a team's expected record. For one thing, not all runs are worth the same. A run scored in an environment where teams score only 3 runs a game is worth much more than a run scored in a league where teams average 8 runs a game. Adjustments must be made. Along come the math nerds to help us out. If you would like to determine the proper exponent to use when judging a team's expected record, simply use the following formula, where r is runs scored and ra is runs allowed for the league, and g is games played:

$$\text{exponent} = \left(\frac{r + ra}{g}\right)^{0.287}$$

This will give you a number somewhere around 1.87 in today's run environment. It won't make a huge difference in terms of the overall Pythagorean record, but it will be a little more accurate, and accuracy is important if you are trying to out-nerd somebody.
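For anyone out-nerding at home, here is a sketch of the run-environment exponent ((r + ra)/g)^0.287 (a version commonly known as Pythagenpat), plugged back into the Pythagorean formula. The function names and the 9-runs-per-game environment are hypothetical inputs chosen for illustration, not real league data.

```python
def run_environment_exponent(runs, runs_allowed, games):
    """Exponent ((r + ra) / g) ** 0.287 based on total runs per game."""
    return ((runs + runs_allowed) / games) ** 0.287


def pythagorean_win_pct(runs_scored, runs_allowed, exponent):
    """Pythagorean winning percentage with an arbitrary exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical environment: 9 total runs per game (4.5 per side over 162 games)
x = run_environment_exponent(4.5 * 162, 4.5 * 162, 162)
print(round(x, 2))  # 1.88

# The 750-scored, 650-allowed team: plain squared version vs. the adjusted exponent
for exp in (2.0, x):
    print(round(162 * pythagorean_win_pct(750, 650, exp), 1))
```

For this team the two estimates differ by less than a win, which squares with the point that the adjustment refines the record rather than overhauling it.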