Correlation is not causation

Friday, October 20, 2017

We often calculate correlation during exploratory data analysis (EDA) to check how strongly two variables are related to one another. It is tempting to assume that one variable causes the other, but treating correlation as proof of causation is a logical fallacy known as cum hoc, ergo propter hoc[1].
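The Pearson correlation coefficient, the usual default during EDA, is the covariance of two variables divided by the product of their standard deviations. It ranges from -1 to 1. Below is a minimal pure-Python sketch. The function name `pearson_r` is our own. In practice a library routine such as pandas' `DataFrame.corr()` would typically do this job. Note that even a perfect score of 1 only says the two variables move together linearly. It says nothing about which one, if either, causes the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
```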

For example,

As ice cream sales increase, the rate of drowning deaths increases. Therefore, ice cream causes drowning.

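In the above example, a third variable, the season, plays a significant role. Ice cream sells at a much higher rate in the hot summer months than in winter. People are also more likely to go swimming in summer, so more of them are exposed to the risk of drowning. Hot weather drives both quantities up, and neither causes the other. A hidden variable like this is called a confounding variable. Hence, the conclusion in the statement above is false.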
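The Python sketch below makes the confounder argument concrete using synthetic data. Purely for illustration, it assumes that temperature drives both ice cream sales and drownings linearly. It also assumes the two outcomes have no direct effect on each other, and all numbers are made up. The code compares the raw correlation with a partial correlation. The partial correlation is computed by removing the linear effect of temperature from both variables and then correlating what is left.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after removing its least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


random.seed(0)
# Hypothetical daily temperatures (deg C): the hidden confounder.
temperature = [random.uniform(0, 35) for _ in range(1000)]
# Both synthetic outcomes depend on temperature only, never on each other.
ice_cream_sales = [100 + 10 * t + random.gauss(0, 20) for t in temperature]
drownings = [5 + 0.2 * t + random.gauss(0, 1) for t in temperature]

raw_r = pearson_r(ice_cream_sales, drownings)
# Partial correlation: correlate what remains after accounting for temperature.
partial_r = pearson_r(residuals(ice_cream_sales, temperature),
                      residuals(drownings, temperature))

print(f"raw r = {raw_r:.2f}, r controlling for temperature = {partial_r:.2f}")
```

With this setup, the raw correlation should come out strong and the partial correlation should sit close to zero. Keep in mind that partial correlation can only adjust for confounders you have actually measured. It cannot rule out ones you have not thought of.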

Have a look at another example[2].

The correlation between the divorce rate in Maine and per capita consumption of margarine tempts us to believe that eating margarine causes divorce, which is incorrect.
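Correlations like this one often appear when two unrelated quantities happen to drift in the same direction over the same period. The sketch below builds exactly that situation from two synthetic series. Their noise is independent, and the only thing they share is a downward trend; both the series and their parameters are hypothetical. One common check is to correlate the period-over-period changes (first differences) instead of the levels, because this strips out a shared trend. It is a heuristic, not proof of anything.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def first_differences(values):
    """Period-over-period changes: values[i] - values[i - 1]."""
    return [b - a for a, b in zip(values, values[1:])]


random.seed(1)
steps = range(200)
# Two hypothetical, causally unrelated series that merely share a downward drift.
series_a = [50 - 0.05 * t + random.gauss(0, 1) for t in steps]
series_b = [80 - 0.08 * t + random.gauss(0, 1) for t in steps]

level_r = pearson_r(series_a, series_b)
change_r = pearson_r(first_differences(series_a), first_differences(series_b))
print(f"r of levels = {level_r:.2f}, r of changes = {change_r:.2f}")
```

The levels should be strongly correlated while the changes should not. That pattern is what you would expect if the shared trend, rather than any causal link, explains the match.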

On the other hand, Edward Tufte argues that the statement "correlation is not causation" is incomplete[3]. According to him, the shortest true statement that can be made about causality and correlation is:

Correlation is not causation but it sure is a hint. — Tufte

[Image: xkcd comic "Correlation"]

Resources:
[1] Cum hoc, ergo propter hoc: Latin for "with this, therefore because of this"
[2] Spurious correlations
[3] "Correlation does not imply causation", Wikipedia
