Red Reporter: An SB Nation Community







Sabermetric Overview Series Part V: Correlation


Our Sabermetric Overview Series has lost some momentum (which doesn't really exist in baseball, but never mind), so I thought I'd kick-start it with a big diary about the C-word.

Baseball statheads love saying the word correlation. Sabermetricians toss around "correlation" like Joe Morgan does "consistency" (in terms of frequency, not accuracy). But what does it mean? Let's get to the bottom of it and lay the groundwork (read: ruin the surprises) for further installments in this series.

! A Note: This is about as math-y as the new baseball thinking gets. I mean it's scary 'rithmetic--we're talking weird stuff like "linear regression" and "covariance". If you want to do cool original baseball research these days, you have to know how to do this stuff. Fortunately you don't need to know much math to understand the concepts. Take me. I have no idea how any of this stuff works and I'm writing a Goddamn primer on it. That's why I'm carefully citing all the charts and figures here. I didn't do any of this work (with one exception). I'm just cribbing it from people who kindly posted their efforts on the internet. My prediction: this diary will have enough weird concepts to freak out mathophobes while having plenty of incorrect information to piss off the people who actually know about this stuff. And away we go!

Correlation (or the correlation coefficient) describes the strength and direction of a relationship between two variables. In our case the variables are going to be baseball statistics--strikeouts and runs scored, or BA w/RISP '06 and BA w/RISP '07. It could be any two sets of data ("Things", for our purposes). Statisticians have many different ways to measure the relationship between the two Things. The most common is the Pearson product-moment correlation, named for some guy called Pearson (or maybe some other guy called Galton). Here's what the guts of it look like:

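$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$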

Don't worry if you don't understand this! Nobody does. Scientists have been studying it for years and so far they have only concluded that math is hard.

The important thing is that you get a number, designated r and called the correlation coefficient, that describes the relationship. Let's talk real world and use visuals.

Here's a chart of boys' heights vs their age. As you can see there's some positive correlation. As the boys get older they generally get taller. The dots kind of go from the bottom left to the upper right. Sometimes statisticians will draw a "best fit line" through the data to give you the idea. Now here's a chart of boys' heights vs the month in which they were born.

It's just a big blob! There's no correlation, and why would there be? What month a boy is born in has no bearing on how tall he'll be. Remember, when you see a big blob there's no correlation (these charts courtesy of Science Buddies.org: Free Science Fair Project Ideas, Answers and Tools for Serious Students [1]).

The correlation coefficient, r, is expressed as a number between -1 and 1. If there's no correlation, it's 0. If r is between 0 and -1 (say -0.25), there's an inverse, or negative, correlation: as one Thing goes up, the other tends to go down. If it's between 0 and 1 (say 0.58), there's a positive correlation. The closer r gets to 1 or -1, the stronger the correlation.
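If the formula reads like hieroglyphics, here's the same recipe as a minimal Python sketch (`pearson_r` is our own helper name, not a library function). The numerator measures how the two Things move together around their averages; the denominator scales that by how much each one varies on its own, which is what pins r between -1 and 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how the two Things move together (co-deviation from their means)
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much each Thing varies on its own
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return co_dev / (spread_x * spread_y)

# Perfectly positive: r = 1 (up to floating-point rounding)
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
# Perfectly negative: r = -1 (up to floating-point rounding)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
# No straight-line relationship: r = 0
print(pearson_r([1, 2, 3, 4], [5, 1, 1, 5]))
```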

Now let's talk baseball (finally). There are many nifty things we can do with this. One is seeing which stats correlate best with scoring or preventing runs, so we know how best to judge players. Here are the correlation coefficients between run scoring and various offensive measures (courtesy of Dan Fox at The Hardball Times [2]):

Stat   r with runs scored
BB     0.590
HR     0.719
AVG    0.843
OBP    0.910
SLG    0.913
OPS    0.955
RC     0.964

RC is Runs Created, a fancy saberstat. As you can see, all of these measures have a positive correlation with run scoring, and the more advanced measures paint a fuller picture. Some data--say GIDP--would have a negative correlation with run scoring, and their numbers would be below zero. SPOILER ALERT: strikeouts are right around zero when it comes to scoring runs.

Another useful way to use this tool is to check for correlations between a statistic in one year and that same statistic the next year. If a player has a skill, like Adam Dunn's home run power, it will strongly correlate year-to-year. If a player's stats are the result of normal statistical fluctuation (luck, to use a loaded term), they won't correlate year-to-year--think Bronson Arroyo's home run power. Here's how some basic batting stats correlated (r-squared) from 2005 to 2006 (courtesy of David Appelman at FanGraphs [3]):

Stat   r-squared, 2005 to 2006
AVG    .12
OBP    .36
OPS    .36
SLG    .38

Batting average correlates only a third as well as the others; it really fluctuates. Remember in Bull Durham when Kevin Costner makes that speech about how, if he could just get one extra flukey hit a week, he'd hit .300 instead of .250 and be in the show? He's talking about batting average's low year-to-year correlation, even if he doesn't know it. If Crash's GM were savvy, he'd look past the batting average (which swings with luck on balls in play) and see that Crash's slugging and on-base skills were the same. Maybe in September, Crash. SPOILER ALERT: no stat designed to measure clutchness--hitting with RISP, Late Inning Pressure Situations, October accomplishments--correlates from one year to the next. This is why stats geeks say clutch is not a skill.
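Crash's arithmetic holds up, assuming the figures from the speech: roughly 500 at-bats over a season of about 25 weeks. One extra hit a week is 25 hits, or 50 points of batting average; one extra hit a month over a six-month season would only be worth about a dozen points. The variable names are just for illustration.

```python
# Crash Davis's "one extra hit a week" arithmetic from Bull Durham.
# Assumed figures: 500 at-bats and a season of about 25 weeks (about six months).
at_bats = 500
weeks_in_season = 25
months_in_season = 6
base_avg = 0.250

base_hits = round(base_avg * at_bats)          # hits needed for a .250 average
weekly_avg = (base_hits + weeks_in_season) / at_bats    # one extra bloop per week
monthly_avg = (base_hits + months_in_season) / at_bats  # one extra bloop per month

print(f"{base_hits} hits -> {base_avg:.3f}")
print(f"one extra hit a week  -> {weekly_avg:.3f}")
print(f"one extra hit a month -> {monthly_avg:.3f}")
```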

Turning to pitching stats, here's a graph showing the correlation of one year's ERA with the next year's (courtesy of JC Bradbury at The Hardball Times [4]):

As you can see from the best-fit line, the blob tilts slightly in the right direction. The r-squared is .13, about the same as batting average's: better than nothing, but we can do a lot better. Here's the same thing with strikeout rate:

Oh yeah! Now there's some correlation. Point-six-one, baby! So we just learned that if we're trying to predict a pitcher's future performance, we'd be much better off looking at his K-rate than his ERA. SPOILER ALERT: stats that are independent of the pitcher's defense--K-rate, BB-rate and HR-rate--are the basis of a nerdy way to evaluate pitchers called DIPS (Defense-Independent Pitching Statistics).

If you'd like to do this wizardry yourself you easily can in Excel (here comes the original research of this piece). Just click on the paste function button, under Function Category select Statistical, under Function Name select Pearson (Galton?), then in the popup highlight one column or row of data for Array1 and another set of data for Array2. The result is the correlation between your two Things. Square it for the r-squared. Now I can tell you that in the Joe Morgan Red Reporter Fantasy League, the correlation between a team's standing and its number of roster moves is a whopping .73!


These Things are closely correlated...

Update [2007-8-7 9:57:58 by Red Menace]: I forgot to mention the important maxim that correlation does not equal causation (technically, as Gray points out, one should say correlation does not imply causation, because sometimes correlation does equal causation). Or, to be properly anal: empirically observed covariation is a necessary but not sufficient condition for causality. Example: there's a strong correlation between ice cream sales and drownings. If one forgot that correlation does not imply causation, one would say ice cream causes drownings. In fact both are driven by a third factor: hot weather. For a baseball example, there's a slight positive correlation between strikeouts and run scoring, but you shouldn't tell all your hitters to try to strike out more. Both strikeouts and run scoring seem to be the result of going deep into counts while waiting for a pitch to drive.

Comments

Outstanding
That wasn't scary at all.  Very smartly done.  Where did you go to school again?
Quick! Somebody make a Cincinnati loves Ken Griffey Jr. too! video

by TheC on Aug 7, 2007 8:00 AM EDT

mmmm, numbers

Good job, Menace.  This article has an r-squared of .92 with my understanding of correlation.

I'm a numbers freak, numbers freak. I'm numbers freaky, ow.

by Slyde on Aug 7, 2007 8:34 AM EDT

Nice writeup of this...
As someone studying econometrics like mad for one of my two PhD comprehensive exams this month, I can't help but make a few comments.

There are some assumptions made in deriving this correlation coefficient.  Most importantly, the only relationship we're looking at is a linear one.  If the relationship between two things isn't linear, this will only pick up some component of it, and not the "true" relationship.  Like, say you were looking at age and batting average.  I would assume that batting average tends to increase with age for some age range, then decrease.  This is not a linear relationship.  Reducing the relationship to a linear one will affect your conclusions about it.

Also, this measure assumes that both variables are normally distributed, i.e. if you were to plot the values of one, they would generate a shape like a bell curve.  This can be something of a problem if the distribution is actually pretty heavily skewed towards one side, or truncated at some value.  I'm sure there are some decent examples in baseball, but I'm not coming up with any at the moment.

And finally, always, always remember that correlation does not imply causation.  Even if two factors are perfectly correlated, you cannot conclude that one of them causes the other. Red Menace suggested nothing of the sort, but I feel that it probably needs to be mentioned anyway.

by Gray on Aug 7, 2007 9:48 AM EDT

I forgot to mention
but I'm adding C / C.
Reeeddd Men-ace

by Red Menace on Aug 7, 2007 9:52 AM EDT

Forgetting C/C may be the reason
that Wayne picked Pete mariachi over you for the Red's manager's job RM. Although it is apparent after this post that should have gotten the job.
They say it is all over in the blink of an eye. Don't Blink.

by Madville on Aug 7, 2007 12:19 PM EDT

RE-TYPE
Forgetting C/C may be the reason

that Wayne picked Pete Mariachi over you for the Reds' manager's job RM. Although it is apparent after this post that you (RM) should have gotten the job.

They say it is all over in the blink of an eye. Don't Blink.

by Madville on Aug 7, 2007 12:20 PM EDT

YIKES
you lost me at

Hope Springs Eternal! Go Reds

by Caleb on Aug 7, 2007 10:11 AM EDT

It's not that difficult.
You've just got to assign "extraordinarily quick 150 lb fireplug frame" for "r".

Menace, I enjoyed this diary.  I was a little bit hungry and this hit the spot.  Filled me right up. Dessert?  Oh, thank you but I am stuffed.  No really, I couldn't eat another bite.  I think I need to go lie down.  Where's your restroom?

Rose's talk was "weird" - something that "might have been appropriate for a Kiwanis Club, but not for kids."

by Fat Vegas Alan on Aug 7, 2007 5:04 PM EDT

my favorite part was when
you blew up that helicopter by ramping a car into it.  that was sooooo tough and cool.

wait, that wasn't you?  oh, nevermind...

you know who the reds need? lebron james.

by Charlie Scrabbles on Aug 7, 2007 11:10 AM EDT

Hey Menace
Thanks for this post.  Next, you should tackle the other "C" word ("clutch"), which is what I thought originally this diary was about when I first saw it.

Could you show the scatter-plots for OPS vs. runs scored and strikeouts vs. runs scored?  Also, could you show a scatter-plot for strikeouts by pitchers versus runs allowed?  I just thought it would be good to show why statisticians think strikeouts for hitters are not that bad, while strikeouts for pitchers are good.  That's the intuitive roadblock for a lot of people who bash Dunn for his strikeouts.

Also, as Gray mentioned, relationships are not always linear.  Is there any non-linear relationship between strikeouts and runs scored?

It's also interesting that OPS actually does correlate better with runs scored than does either OBP or SLG, even though OBP correlates with SLG to a certain extent, and you might get some effects of multicollinearity.

Please exercise Adam Dunn's 2008 option.

by Paul Householder on Aug 7, 2007 11:28 AM EDT

thanks
I'll check for the scatter-plots you're looking for. I don't have any handy because like I said I'm just pulling these from articles I've found. Maybe we need a whole diary on strikeouts, although I don't look forward to that one.

I changed correlation to the c-word at the last minute in a fit of ribaldry, but you're right. Clutch is the word (is the word... is the word...).

Reeeddd Men-ace

by Red Menace on Aug 7, 2007 12:26 PM EDT

How does momentum not exist?
"Always root for the winner. That way you won't be disappointed." -Tug McGraw

by Zach K on Aug 7, 2007 11:31 AM EDT

When there is a lot of inertia.
Mass is a measure of inertia.

An object at rest will remain at rest unless acted upon by an external and unbalanced force.

Please exercise Adam Dunn's 2008 option.

by Paul Householder on Aug 7, 2007 12:04 PM EDT

When involved in a faith vs. reason treatise
the only stuff that exists is what one either believes to exist or can empirically prove to exist.
I mean, after all: correlation does not imply causation. And we all know what that means in the discussion of whether CLUTCH is a verb or a noun.
Momentum exists, I saw it in the dictionary.
They say it is all over in the blink of an eye. Don't Blink.

by Madville on Aug 7, 2007 12:07 PM EDT

big mo
I was speaking kind of freely. There's the old saying "momentum is only as good as the next night's starting pitcher" (screw numbers, old baseball aphorisms are where you find the truth). Players certainly get locked in, but I'm pretty sure it's been shown that there's little carry-over effect during the course of a game or a season for a team. Baseball is composed of so many separate events, unlike 'flowing' sports like soccer or basketball.
Reeeddd Men-ace

by Red Menace on Aug 7, 2007 12:30 PM EDT

you are right to point out that
a baseball game is composed of fragmented events, often either disjointed or linked by how time and action are related (causality). In soccer or basketball (esp. soccer, because there are no 'time outs'), being in the flow zone is crucial to utilizing momentum positively.
They say it is all over in the blink of an eye. Don't Blink.

by Madville on Aug 7, 2007 1:20 PM EDT

Ok I agree with that
I thought you meant for all sports!
"Always root for the winner. That way you won't be disappointed." -Tug McGraw

by Zach K on Aug 7, 2007 2:34 PM EDT

Yes, although...
What people will often interpret as "momentum" within a game is really just a pitcher getting tired or just plain sucking.

Example:  Most teams the Reds play have a lot of momentum in the 8th inning, it seems.

Another Example:  Most teams the Reds play have seemed to gain momentum every time Gary McJeffsky pitches.  They have less momentum when Coutlangus pitches.

Please exercise Adam Dunn's 2008 option.

by Paul Householder on Aug 7, 2007 2:37 PM EDT

Yeah I agree with that haha
When I played football or wrestled, momentum would change all the time. For example, there was some crazy stat our coaches brought in, like whoever got the first takedown in wrestling wins the match 84% of the time, or something like that. The reason? You build momentum after you take your man down first!
"Always root for the winner. That way you won't be disappointed." -Tug McGraw

by Zach K on Aug 7, 2007 2:41 PM EDT

Or
that the better wrestler or team of wrestlers often gets the first take down and usually wins the match/meet.
"Two Dunn's enter, but only one Dunn will leave...unless neither do because they decide to play cards, drink beer, golf, and fish."--SlydeFrog

by Man Mountain on Aug 7, 2007 6:24 PM EDT

This reminds me...
..of the whole "team that wins the first two games of a seven game series wins the series X% of the time."

Very often the team that wins the first two games of a series is winning those games because they are simply the better team. So if the series were to be played out to a "best of 61" it would still be likely that the team that won the first two games would win the series.

Also (in every sport except baseball) the championship series usually begins at the home of the team with the better record (which is usually an indicator that they are the better team) so if a team goes up 2-0 before taking the series on the road they are, as the saying goes- "doing what they are supposed to be doing" but they are also doing what they might have been expected to do regardless of where the series began.

Rose's talk was "weird" - something that "might have been appropriate for a Kiwanis Club, but not for kids."

by Fat Vegas Alan on Aug 7, 2007 6:45 PM EDT
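Fat Vegas Alan's point can be checked with a little probability. The sketch below assumes every game is an independent coin flip that one team wins with probability p (no momentum at all); `series_win_prob` is a hypothetical helper, not from any library. Even with evenly matched teams (p = 0.5), a team up 2-0 wins a best-of-seven 81.25% of the time, purely because it needs only two more wins before the other team gets four.

```python
from functools import lru_cache

def series_win_prob(p, wins=0, losses=0, needed=4):
    """Chance of winning a best-of-seven series from the current score,
    assuming every game is an independent coin flip won with probability p
    (i.e., no momentum at all)."""
    @lru_cache(maxsize=None)
    def from_score(w, l):
        if w == needed:
            return 1.0
        if l == needed:
            return 0.0
        # Win the next game or lose it, then play on from the new score.
        return p * from_score(w + 1, l) + (1 - p) * from_score(w, l + 1)
    return from_score(wins, losses)

for p in (0.5, 0.6):
    print(f"p = {p}: win series from 0-0 = {series_win_prob(p):.3f}, "
          f"from 2-0 = {series_win_prob(p, wins=2):.3f}")
```

Making one team better (p = 0.6) raises both numbers, which is the "better team" effect layered on top of the head start.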

this reminds me...
of why I've tried to forget all the things a high school or youth sports coach has ever told me. Except how to pop a bra strap in no time flat.
"Two Dunn's enter, but only one Dunn will leave...unless neither do because they decide to play cards, drink beer, golf, and fish."--SlydeFrog

by Man Mountain on Aug 7, 2007 6:51 PM EDT

This reminds me...
the sports columnist at Salon (who is excellent, by the way, though I haven't read Salon in ages) used to complain about ESPN statistics which would often flash up during hockey games: the team that scores first wins X% of the time.  Well, sure, because the team that scores first is now up by a goal.

He ran the numbers and actually concluded that if you want to go with this argument, you want to be the team that scores...oh, I can't remember which.  Maybe second or third.  But you know, "Score second!" doesn't seem as good of a motivator...

by Gray on Aug 7, 2007 7:28 PM EDT

This reminds me..
..that coincidentally, just earlier today, I linked to the Salon sports columnist.

Crazy, dude.

Rose's talk was "weird" - something that "might have been appropriate for a Kiwanis Club, but not for kids."

by Fat Vegas Alan on Aug 7, 2007 8:28 PM EDT

Crazy.
I took this opportunity to search through his old articles.  I liked this, from a later article that discussed this in the context of ESPN's World Cup coverage:
But the message for ESPN is that it's showing the wrong graphic.

If a goal is scored and ESPN flashes a graphic saying, "Teams that have scored first are 22-3-3," I, the typical American sports fan who doesn't care about soccer, will think, "Well, there's about a four-in-five chance that this baby's over. I believe I will turn off the TV, kick my dog, curse some foreigners and play with my assault rifle."

But if that graphic said, "Teams that have scored second are 17-2-3," I'm going to want to stick around to see which team can come up with that all-important tally. Better for me, better for ESPN and way better for the dog.

by Gray on Aug 7, 2007 8:47 PM EDT

This reminds me...
I knew guys in high school who were convinced that the key to winning in FIFA on PS1 was to score the third goal.
Reeeddd Men-ace

by Red Menace on Aug 7, 2007 10:08 PM EDT

This reminds me...
Rose's talk was "weird" - something that "might have been appropriate for a Kiwanis Club, but not for kids."

by Fat Vegas Alan on Aug 7, 2007 10:53 PM EDT

Oh yeah?
Well, you seem to think about .. your mom! ..often.

Ha!

Rose's talk was "weird" - something that "might have been appropriate for a Kiwanis Club, but not for kids."

by Fat Vegas Alan on Aug 7, 2007 11:44 PM EDT

You could say that
if you are in a tough match, every point matters, so getting the early 2 points is a big boost. However, I was not very good on top, so guys would always escape and get one point. That's why I never went to state haha
"Always root for the winner. That way you won't be disappointed." -Tug McGraw

by Zach K on Aug 7, 2007 8:25 PM EDT

Momentum....
"How does momentum not exist?"

This is the limitation of the quantitative approach.  Periodically, it assumes, or those who believe in the central theorem assume, that if you cannot measure something, it lacks validity.  This is why the use of the scientific method outside of the hard or real sciences has limitations, even in the world of baseball (I see it in my discipline every day [much more extensively than in baseball]).  Of course, for those who played, there is such a thing as momentum, but it is oftentimes a state of mind or psychological.  It is the same phenomenon as those who say there is no such thing as clubhouse chemistry (strange to give such a subjective concept a scientific name).  Well, Albert Belle and Dave Kingman never won a WS, but on the other hand look at the self-hating, clubhouse poison of the Yankees in the late '70s.  It cuts both ways, and anytime you hear someone in baseball (like in economics, politics, or anything) say, "on the other hand," you know you have a problem.

by tonywf on Aug 8, 2007 9:22 AM EDT

it doesn't really
put me any closer to a concrete understanding of correlation...but I like to think of it in terms of variance (covariance?) and standard deviation. It at least feels like I understand the root of the concept.

Nicely done, in any case.

Everybody's a jerk. You. Me. This jerk.

by andromache on Aug 7, 2007 1:35 PM EDT

Well...
When looking at two variables, x and y, the r squared (which is actually what he's referring to in the examples up there) is the fraction of the variance in y that is accounted for by a linear fit of x to y.  That is, you have some model, y = ax + b + error.  You find a and b.  The fraction of the variance in y explained by that model is the r squared, or coefficient of determination.  The square root of that, carrying the sign of the slope a, is r, the sample correlation coefficient.

by Gray on Aug 7, 2007 1:41 PM EDT

You really should have handled this one
I think my biggest mistake of omission was only focusing on linear relationships (because bell curves are fun).
Reeeddd Men-ace

by Red Menace on Aug 7, 2007 1:54 PM EDT

No, I'm sure I would have botched it...
Plus, I never use Excel!  And if it can't be expressed as a linear relationship, it's not worth studying...or at least it's going to  be hard to study in Excel.  :-)

by Gray on Aug 7, 2007 2:32 PM EDT

Evaluating pitchers
First, good article.  Thanks for taking the time to create and post this diary.

Second, my question is in regards to pitcher analysis.  Are any of the batting metrics useful in evaluating/projecting pitchers?  For example, batting average against, OPS against, RC against, etc.  A pitcher making 30 starts can face about 800 batters in a season.

Also, seems that the correlation of pitching stats may vary between starters and relievers.  It seems that ERA is a better indicator of a starting pitcher's ability, but I wouldn't use ERA to judge a reliever.  The WHIP stat has been one I've looked at a lot for relievers.  For example, look at Gary Majewski's "great" 2005 season.  A WHIP of 1.48.  That's a lot of baserunners for a reliever to allow.  Throw in the 7 batters he hit with a pitch, and it bumps to 1.55 runners per inning.

by omnired on Aug 7, 2007 1:40 PM EDT
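For anyone who hasn't computed it, WHIP is walks plus hits divided by innings pitched, and omnired's tweak adds hit batters to the numerator. The pitching line below is hypothetical, not Majewski's real 2005 numbers, and `whip` is an illustrative helper name.

```python
def whip(hits, walks, innings, hit_by_pitch=0):
    """Walks plus hits per inning pitched; optionally count hit batters too.

    `innings` must be a true decimal: a box-score line of "85.1" means
    85 1/3 innings, so convert it (85 + 1/3) before calling.
    """
    return (hits + walks + hit_by_pitch) / innings

# Hypothetical reliever line: 60 innings, 70 hits, 20 walks, 5 hit batters.
print(f"{whip(70, 20, 60):.2f}")                  # standard WHIP
print(f"{whip(70, 20, 60, hit_by_pitch=5):.2f}")  # counting hit batters too
```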

pitchers
This is not exactly what you're asking about, but if you follow link #4 above you can see a scatter-plot of BABIP for pitchers. It's only .06, which confirms a central tenet of DIPS, that pitchers have little control over what happens once a ball is hit (there is still controversy surrounding this in the stats community).

I don't have any data handy right now, but I imagine BA against isn't that consistent year to year for the same reason as for hitters.

Reeeddd Men-ace

by Red Menace on Aug 7, 2007 1:59 PM EDT

Correlation question
I've always thought that reaching due to an error should count in a player's OBP.  I think that a shortstop or third baseman is more likely to Felipe-Lopez a play with Hunter Pence flying down the line vs. Griffey trotting down the line.

Is there any correlation between the number of times a player reaches due to an error and speed?  

Maybe there's a correlation between how hard a player hits the ball and how often he reaches on an error.

Just a thought.

The face of a child says it all, especially the mouth part

by JJ on Aug 7, 2007 5:08 PM EDT

not sure
The problem there would be defining speed. You need a set of data points. You could use stolen bases, but we all know veteran players can often steal bases on savvy more than raw speed. I'm pretty sure there's some sabery way researchers define speed for studies like this (involving triples), but it's not written on the back of my hand.

A not very scientific way to get started would be to look at the reach-on-error leaders and see what type of players they are. I agree there would probably be more Reyes types than Howard types.

Reeeddd Men-ace

by Red Menace on Aug 7, 2007 6:25 PM EDT

Speed and power
and it looks like it helps to be scrappy too.  Here are the leaders in Reached on Errors from this season and the previous five seasons.

2007
-----------
Derek Jeter      15
Placido Polanco   9
Shane Victorino   9
Brandon Phillips  9
Julio Lugo        9
Jose Guillen      8
Travis Hafner     8
David DeJesus     8
Randy Winn        8
Johnny Damon      8
Jose Reyes        8
Ryan Theriot      8

2006
-----------
Kenji Johjima    13
Juan Pierre      12
Carlos Beltran   12
Ichiro Suzuki    11
Josh Willingham  11
Brandon Inge     11
Clint Barmes     11
Adrian Beltre    11
Jay Payton       10
Brian Anderson   10
Adam Everett     10
Orlando Hudson   10

2005
-----------
Jason Kendall    15
Freddy Sanchez   13
Jose Reyes       12
Derek Jeter      11
Jose Guillen     11
Jack Wilson      11
Grady Sizemore   11
Carlos Beltran   11
Chone Figgins    10
Garrett Atkins   10
Johnny Damon     10
Adrian Beltre    10

2004
-----------
Miguel Tejada    16
Ichiro Suzuki    15
Albert Pujols    14
Derek Jeter      13
Juan Pierre      13
Alex Rodriguez   12
Luis Castillo    12
Brian Roberts    12
Chipper Jones    11
Mark Loretta     11
Carl Crawford    11
Angel Berroa     11

2003
-----------
Ty Wigginton     15
Aaron Boone      13
Craig Biggio     13
Miguel Tejada    12
Cristian Guzman  12
Ken Harvey       11
Dave Roberts     11
Marquis Grissom  11
Joe Randa        11
Ichiro Suzuki    10
Casey Blake      10
Juan Pierre      10

2002
-----------
Sammy Sosa       13
Shea Hillenbrand 13
Jeff Kent        13
Craig Biggio     13
Rondell White    13
Junior Spivey    12
Vinny Castilla   12
Michael Young    11
Aaron Boone      11
Jacque Jones     11
Jeff Cirillo     11
Randy Winn       11
I'm a numbers freak, numbers freak. I'm numbers freaky, ow.

by Slyde on Aug 7, 2007 8:36 PM EDT

Ahem... Park adjusted?
Rose's talk was "weird" - something that "might have been appropriate for a Kiwanis Club, but not for kids."

by Fat Vegas Alan on Aug 7, 2007 8:41 PM EDT

I don't get it
I'm a numbers freak, numbers freak. I'm numbers freaky, ow.

by Slyde on Aug 7, 2007 8:44 PM EDT

The Joy of Stats............
I am impressed.  This is an excellent article and very good job of explaining the statistical components.  It is not that hard, once you get used to it (and stats is always more fun when you are rationalizing variables on something you take a personal interest in).  It is a lot more boring coding that junk on SPSS in the social sciences.

by tonywf on Aug 8, 2007 9:09 AM EDT

Correlation vs. Causation
The traditional "maxim" about correlation not meaning causation is a frustration to me as a scientist and educator.

It's true that you can get better evidence about causation by doing an experiment.  But just because correlations between two variables can be caused by a third causal factor doesn't mean that they always are.  Therefore, one can test hypotheses using correlational evidence.  If that weren't true, we wouldn't be able to do any hypothesis testing in baseball.  Period.  And clearly that's not the case, even though all we have to work with are correlational data.

The key issue is one of timing.  If I create a correlation matrix of 20 different variables and I find that one or two of the variables are correlated, I can't then say that one causes the other.  All I've done is make the observation that one variable is related to the other.

However, if I make an a priori hypothesis that variable A causes change in variable B, and then I do a study and find a correlation between the two variables, I can certainly say that I've supported my hypothesis that variable A causes change in variable B.  Yes, an experiment would be better evidence, but experiments aren't always logistically possible (e.g. unless we work for a team, we can't actually manipulate how teams execute game strategy in some sort of controlled manner).

The key point is that experiments are not the only way that one can test a causal hypothesis.  Yes, one must be tentative about findings from correlational studies because of the possibility of confounds from other variables.  But then again, because an experiment can never be perfectly controlled, one has to be tentative about findings from any experimental study as well...
-j

by JinAZ on Aug 10, 2007 12:11 PM EDT

Welcome to the SB Nation blog about Cincinnati Reds.
