Updated Hot Zone Graphs: Now with More Artificial Coloring!
I've spent a lot of time over the last couple of weeks in my labs working on the "hot zone" graphs to hopefully make them easier to read and more accurate. I'm not sure if I've succeeded, which is why I need your help. I'd like some feedback from you on how effective these graphs are. Do they make sense? What could help make them easier to read? Are they worth the effort? What is wrong with my methodology (which will be explained after the jump)?
Below you'll see an example of the new format with a graph for Joey Votto. After the jump are links to several other players as well as some notes on how I generate these graphs. Again, any feedback you can provide would be greatly appreciated.
RHB: Keppinger | Pujols | Justin Upton | Vlad Guerrero
LHB: Dunn | Giles | Howard | Ichiro | Votto
Pitch Type: Phillips vs. RHP Slider | Bruce vs. RHP Fastball
Each graph represents a different aspect of how a hitter performs at the plate:
- The top left graph is the hitter's rate of swinging per pitch in a zone.
- The top right graph is the rate of contact per swing in a zone - foul balls count as contact.
- The bottom left graph is based on the slugging rate per at-bat in a zone. In this case, strikeouts where the third strike was in that zone count as an AB.
- The bottom right graph is a differentiation between zones that have a higher fly ball rate versus zones that have a higher ground ball rate for that hitter.
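As a rough illustration of the first three definitions, here is a minimal sketch of how the per-zone rates could be computed. The record layout (the `swung`, `contact`, `ends_ab`, and `bases` keys) and the function name are hypothetical and chosen only for this example; they are not the format actually used to build the graphs.

```python
def zone_rates(pitches):
    """Return (swing_rate, contact_rate, slugging) for the pitches in one zone.

    Each pitch is a dict with hypothetical keys:
      swung   - True if the batter swung
      contact - True if the swing touched the ball (foul balls included)
      ends_ab - True if the pitch ended an at-bat (a ball in play that
                counts as an AB, or strike three located in this zone)
      bases   - total bases gained on the pitch (0 for outs and strikeouts)
    A rate is None when its denominator is zero.
    """
    n_pitches = len(pitches)
    swings = sum(1 for p in pitches if p["swung"])
    contacts = sum(1 for p in pitches if p["swung"] and p["contact"])
    at_bats = sum(1 for p in pitches if p["ends_ab"])
    total_bases = sum(p["bases"] for p in pitches if p["ends_ab"])

    swing_rate = swings / n_pitches if n_pitches else None    # swings per pitch
    contact_rate = contacts / swings if swings else None      # contact per swing
    slugging = total_bases / at_bats if at_bats else None     # bases per AB
    return swing_rate, contact_rate, slugging


# Made-up pitches in one zone: a take, a foul, a swinging strike three, a double.
sample = [
    {"swung": False, "contact": False, "ends_ab": False, "bases": 0},
    {"swung": True, "contact": True, "ends_ab": False, "bases": 0},
    {"swung": True, "contact": False, "ends_ab": True, "bases": 0},
    {"swung": True, "contact": True, "ends_ab": True, "bases": 2},
]
print(zone_rates(sample))
```

In this toy sample the foul ball counts toward contact but not toward at-bats, and the strikeout counts as an AB with zero bases, which matches the definitions above.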
For each graph besides the batted ball graph, the colors of the graph are relative to how the rest of the league performs in that zone. The comparison is only made to hitters of the same handedness who faced the same handed pitchers and the same pitch type (in cases where we are only looking at a single pitch type). This means that if we are only looking at how a right-handed batter fared on sliders from RHP, we are only comparing to how other RHB did against sliders from RHP.
Batted ball data is only relative to the hitter being examined because I thought it was more informative scouting-wise to know if a hitter tended to hit more flyballs than groundballs from a zone, not whether he did it more than the rest of the league.
Some technical details on the steps I use to generate the graphs. Feel free to skip this section. If you do read it, please critique my methods if something doesn't make sense. On a lot of this stuff I had to feel my way through it:
- Each pitch that has location data is assigned to 1 of 99 zones based on that location data. All of the zones touch the strike zone except for the outermost zones, which represent all pitches that are more than a radius of the ball outside of the zone.
- Rates for the league are calculated thusly:
- All data is aggregated at the individual player level per zone, pitcher handedness, and pitch type. In order to smooth the data, the 8 zones that surround each given zone are averaged into that zone as well.
- All hitters that meet the minimum requirement for a rate in a zone are used to create percentile buckets for that zone. The minimum requirements are:
swing rate: 20 pitches
contact rate: 10 swings
power rate: 10 AB
These values are admittedly arbitrary, but when I'm looking at all pitch types, they give me a pretty large sample to use to create buckets. If somebody with more statistical background can teach me how to find the proper significant value so that I know I am getting a good cross-section of data, please speak up. I'm willing to do whatever it takes to make these more accurate.
- When I look at specific pitch types, the amount of data in each zone becomes screwy, in some cases to the point where I don't have enough of a sample to generate percentiles. In those cases, I calculate the average rate for that zone across all instances meeting the batter, pitcher, and pitch type criteria. I then set the average as the 50th percentile rate (I know this isn't right, but I'm compensating), and then generate buckets on either side of that at intervals that are +/- 10% of the average. So, if I have an average rate that is 62%, my intervals for my "percentile" buckets would be:
37% 43% 50% 56% 62% 68% 74% 81% 87%
I'll admit that I don't like this, but it's the best way that I could come up with to give me a spread of rates that are based somewhat on reality. For what it's worth, this process generates rates that are similar to the process of just creating percentiles, except in some zones where the data is not normally distributed. Again, I'm open to suggestions on how to do this more correctly.
- Once I have all of my league rate buckets, I compare the individual hitter who we are graphing to the league rates and assign each zone to a bucket. That bucket determines the hue and the shade of the color that we see on the graph.
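The smoothing, fallback-bucket, and bucket-assignment steps in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind the graphs: the function names are hypothetical, the zones are treated as a plain rectangular grid (the real 99-zone layout is not reproduced), and smoothing is read as an unweighted average of a zone's rate with its neighbors' rates. Pooling the underlying counts instead would be another reasonable reading.

```python
from bisect import bisect_right


def smooth(grid):
    """Average each zone with its (up to eight) neighbors on a rectangular grid.

    Assumption: an unweighted mean of rates; edge zones simply have fewer neighbors.
    """
    rows, cols = len(grid), len(grid[0])
    smoothed = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            row.append(sum(block) / len(block))
        smoothed.append(row)
    return smoothed


def fallback_edges(average):
    """Nine pseudo-percentile cut points at 60%, 70%, ..., 140% of the zone average.

    The average itself sits in the middle, standing in for the 50th percentile.
    """
    return [average * (10 + k) / 10 for k in range(-4, 5)]


def bucket_index(rate, edges):
    """Count the cut points at or below `rate` (0 = lowest bucket, len(edges) = highest)."""
    return bisect_right(edges, rate)


edges = fallback_edges(0.62)
print([round(100 * e) for e in edges])  # [37, 43, 50, 56, 62, 68, 74, 81, 87]
print(bucket_index(0.70, edges))        # 6
```

With an average of 62%, the cut points round to the same nine values listed above. A hitter at 70% in that zone clears six of them, so he would land in the seventh of ten buckets under this reading.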
The biggest problem that I am having is with the fact that the data is not normally distributed. So, if I change my minimum criteria, I can dramatically change the percentile buckets. For instance, if I lower the contact rate criteria to 5 swings, I end up with zones that have a 40th percentile of 0% contact. So, a lot of my time was spent trying to compensate for that in a realistic manner. I'm not sure if I've done that, but hopefully through your feedback we can find out.
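To make the sensitivity concrete, here is a small sketch of how lowering the contact-rate minimum to 5 (counted here in swings, the unit used for the contact-rate requirement) can drag a low percentile down to 0%. The hitter counts are made up for illustration and are not taken from the actual data, and the nearest-rank percentile is just one common definition, used here as an assumption.

```python
def contact_rates(hitters, min_swings):
    """Contact rate for every (contacts, swings) pair that meets the minimum."""
    return [contacts / swings for contacts, swings in hitters if swings >= min_swings]


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the value at position ceil(pct/100 * n), 1-based."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling of pct * n / 100
    return ordered[rank - 1]


# Hypothetical (contacts, swings) totals for ten hitters in a single zone.
hitters = [
    (0, 5), (0, 5), (0, 6), (0, 5), (1, 5),
    (2, 6), (8, 10), (9, 12), (10, 14), (12, 15),
]

strict = contact_rates(hitters, min_swings=10)  # only the four larger samples
loose = contact_rates(hitters, min_swings=5)    # all ten hitters

print(nearest_rank_percentile(strict, 40))  # 0.75
print(nearest_rank_percentile(loose, 40))   # 0.0
```

The hitters with only five or six swings can post very few distinct rates, and 0% is one of the most common, so admitting them piles up values at zero and shifts every bucket below the median.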
Comments
i have no idea about the technical hobbledy hoy going on
but the graphs do make a ton more sense than the old ones. and i liked the old ones. these have much less going on and are simpler to comprehend. great stuff Slyde.
My millions are unconventional!
by Charlie Scrabbles on Mar 23, 2009 10:42 AM EDT
Love it.
Easy to understand, and does a great job of showing important trends. Great job, Slyde.
"Sometimes I listen for Griffey’s infectious laugh or Dunn’s humor and wit. But they’re gone." - Dusty
Bahaha.
I love Guerrero’s swingrate graph. The other thing I was looking at is all the ‘take’ zones for Keppinger – does he need to be swinging more? Are there stats for swinging strikes and called strikes somewhere?
As for a suggestion
the colors of the graph are relative to how the rest of the league performs in that zone.
I think this needs to go right next to the graphs somewhere. (Because that was the first question that popped into my mind when looking at them?) But these are really great, Slyde.
Everybody's a jerk. You. Me. This jerk.
I agree on your second point
so far it appears that is the first thing that people misunderstand on these. Perhaps a note at the bottom of the image with the catcher perspective text?
on your swinging strikes vs. called strikes, the best I can say is the Z-swing% on FanGraphs. Kepp (59%) is actually fairly low on the list in terms of the amount of swings at pitches in the zone. The weird thing is that for players with a minimum of 200 PA, Luis Castillo had the lowest rate of in zone swings last year at 46%, and he was 3rd on the list for most contact, just in front of Keppinger. Strange to me that two very high contact guys take so many pitches. I wonder if they would have more power but lower contact rates if they would swing the bat more frequently.
"How big IS your magic wand?"
Wow..
Yeah, I like these a bit better than the circle graphs from an understanding point of view. I am amazed at the difference in swing rates between potential hall-of-fame hitters with Pooholes and Vlad. Vlad must never walk.
Education is what you get from reading the directions. Experience is what you get from not reading them.
just 6% of his non-IBB plate appearances were walks
in fact, nearly a third of his walks were intentional.
"How big IS your magic wand?"
I love the graphs
and congratulations on your new psychedelic utility belt!
by ChillyCheezItz on Mar 23, 2009 11:51 AM EDT
My head hurts!!!
But these are great. Really interesting trends show up, and most of them reinforce what I see with my own eyes when I watch these guys hit.
Plus the colors are pretty!
Bruce struggled to make contact in general
I think his problem is recognition. It seemed like he swung at everything last year. I think if he can learn to focus his strike zone more effectively, he’ll make more contact. From what he’s been saying this Spring, that is his goal too.
"How big IS your magic wand?"
From the desk of Captain Obvious...
… it comes as no surprise that Dunn has plenty of purple in his swing rate graph.
Unfortunately he has too much purple in the strike zone…
I just want to say...
that I like the colors. Nice job.
We want to build long period of time. I didn’t come here for the shot run.
Michael Kay
always says lefties love the low and inside ball. That would seem to be true for Votto, judging from the contact graph.
All Things Bubba: Because how can you not love a baseball player named Bubba?
nice work
It helps if the hitter thinks you're a little crazy. - Nolan
by Trei Brundrett on Mar 24, 2009 10:25 PM EDT
Me likey.
I was going to ask if the darker the color, the more pronounced the effect…and perhaps that can be made clearer.
But once I went to the Vlad pages, all became clear. I guess you pitch Vlad away, on the outside edge, and don’t let it drop too low.
Or you can walk him. Not a bad idea.
It's all fun and games until someone gets herpes. - Fox 4 News
