Friday, April 3, 2009

Correlation Does Not Imply Causation. And...?

Since I started grad school four years ago, I've noticed that the general public is much more attuned to the idea that correlation does not always imply causation. Of course, the indoctrination is not complete just yet, and there are plenty of instances where an association is mistaken for something more, but the fact that people are becoming better consumers of statistics is gratifying. I attribute this to the spate of popular press economics and statistics books/blogs in the last few years (though I might be in danger of confusing correlation and causation myself by saying this!).

The standards in empirical research reflect how seriously people are taking this motto: finding a clever instrumental variable or even experimental variation is no longer good enough. Papers without extensive "robustness" checks and falsification tests have less credibility now than they would have even five years ago. This, like the trend in the general public, is a good development.

However, with these positives come some more troubling tendencies. Specifically, I have a beef with the overuse of the causation-correlation dictum. Now, anybody can bring down a paper simply by saying "correlation does not imply causation" without having to provide a reason why this might be the case. For example, I am working on a paper looking at the long-run causal effects of birth year exposure to clean water and sanitation efforts (I'll post a link to this paper in a month or so when a good draft is ready). I have a plausible identification strategy, and also include all sorts of controls, trends and falsification checks in my analysis to further establish causality. My results check out.

However, someone recently told me that I should be concerned about omitted variables. When I pressed her on what these might be, she wasn't sure but commented that "there are always omitted factors."

Clearly, this isn't helpful. It's really easy to look or sound clever by pointing out that correlation does not imply causation: it is technically a true statement! But I think people who make this claim should talk about how it applies to the analysis at hand (i.e., have some kind of model or story that makes more explicit the nature of the potential biases and where they come from). Otherwise, the statement by itself is pretty uninformative and does little to advance our knowledge.
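To make "a model or story" concrete, here is a minimal sketch of the textbook omitted-variable case, simulated in Python. Everything in it is an assumption for illustration and has nothing to do with my water and sanitation paper: the names `simulate` and `ols_slope`, the parameter values, and the data-generating process (an outcome y that depends on x and on an unobserved z, where z is also correlated with x). The point is that once the omitted factor is named, the bias has a sign and a size, roughly gamma * cov(x, z) / var(x), which is something a critic and an author can actually argue about.

```python
import random


def ols_slope(x, y):
    """Slope from a one-regressor OLS of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=20000, beta=1.0, gamma=2.0, delta=0.5, seed=0):
    """Hypothetical data: y = beta*x + gamma*z + noise, x = delta*z + noise.

    z is the omitted variable. Returns the slope from regressing y on x
    alone, and the slope the omitted-variable-bias formula predicts.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [delta * zi + rng.gauss(0, 1) for zi in z]
    y = [beta * xi + gamma * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

    short_slope = ols_slope(x, y)  # regression that leaves z out
    # Bias formula: true beta plus gamma * cov(x, z) / var(x).
    predicted_slope = beta + gamma * ols_slope(x, z)
    return short_slope, predicted_slope


if __name__ == "__main__":
    short_slope, predicted_slope = simulate()
    print(f"slope omitting z:         {short_slope:.3f}")
    print(f"slope predicted by formula: {predicted_slope:.3f}")
```

With these made-up parameters the regression that omits z overstates the true effect of 1.0 by about 0.8. A critique of this form ("z raises y and is positively correlated with x, so your estimate is biased upward") can be checked, bounded, or ruled out; "there are always omitted factors" cannot.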

2 comments:

Jeremy said...

I really like this post. There's some great discussion in the new Angrist and Pischke book about the difference between saying a coefficient is significant conditional on the Xs and saying it is significant conditional on some assumptions, and about understanding more generally what is being estimated and what assumptions this entails. They say that "correlation does not imply causation" is the simple remark that the naive observer will make. Over the last few days at MPSA I heard some discussants go on at length about omitted variables, overcontrolling, measurement error, etc. My comment was just to include some fixed effects, get some polynomial time trends interacted with the time-variant characteristics, and think of a good falsification test that will help make the models more convincing. Otherwise the conversation on "you could have omitted variables, etc. etc." will never end.

Anonymous said...

I concur; a very good point indeed, Atheen.