If correlation does not imply causation, then what does?

Ice cream does not cause increased murder rates!

Whether it be in your high school statistics class or in regular conversation, everyone at some point has heard the famous phrase, “correlation does not imply causation.” Economists, biologists, and the media alike cite it to acknowledge potentially spurious results; after all, with so much data becoming regularly available, seemingly arbitrary things could easily trend up or down together without actually being directly related. In other words, some events of interest may be correlated without any causal direction between them. For instance, we could measure the amount of ice cream sold over the summer and would find that as sales skyrocket, evidently so do murders [1]. Naively, we may think that increased ice cream sales induce homicidal tendencies. But this is obviously not the case, and many researchers have actually shown that it is the warmer weather that influences both murder rates and ice cream sales [2]. However, this does raise an important question: if correlation cannot imply causation, then what does?

While there may not always be a clear-cut answer, the value of a definitive causal claim is plain: if we know the reason behind some event, then we may have the power to change the outcome. For instance, if it were true that a delicious frozen dessert caused murders, then we could stop selling it and save a lot of people. Belaboring a point about ice cream and summer killings may seem silly, but when we apply the same idea to, say, pharmaceuticals or policies, we quickly see the importance of distinguishing between correlation and causation. We would want to make sure that our drugs truly save lives and that legislation actually brings about real change. Conversely, if we knew that a medicine or policy caused harm, we could save lives, money, or both by getting rid of it or reforming it.

So, how do we normally assess causation?

The gold standard for establishing causality is the randomized controlled trial, where we randomly assign people to either a treatment or a control group. The treatment group receives some sort of intervention, like a new medicine, and the control group does not. Because the groups were assigned randomly, we can attribute any sufficiently large difference in outcomes between them to the intervention. More specifically, random assignment balances hidden variables across the two groups on average, so those variables cannot systematically explain the difference. For example, let’s say we want to measure the effect of a new treatment for back pain. As shown in Panel A, we would first recruit people with back pain and then split them randomly into two subgroups, where half would receive the treatment and the other half would receive a placebo. We could then measure the amount of back pain that each group experiences on average and compare the two groups. If the treatment group experiences much lower pain, then we can claim that our medicine is successful at alleviating pain. If both groups look the same, then we have no evidence that our new drug works.

A. Randomized controlled trial: The gold standard for causality
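To make the role of randomization concrete, here is a minimal simulation sketch of the back pain trial. Everything in it is an assumption made up for illustration: the pain scale, the hidden "severity" variable, the true effect of -2.0 points, and the function name `simulate_trial`. The point is only that a coin-flip assignment spreads the hidden variable evenly across both groups, so the difference in average pain lands close to the true effect.

```python
import random
import statistics


def simulate_trial(n_people=2000, true_effect=-2.0, seed=0):
    """Simulate a randomized trial of a hypothetical back-pain drug.

    Returns the difference in mean pain (treatment minus control).
    All numbers are invented for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_people):
        # Hidden variable the experimenters never measure.
        severity = rng.gauss(6.0, 1.5)
        # Random assignment: a coin flip that ignores severity entirely.
        gets_drug = rng.random() < 0.5
        # Observed pain = hidden severity + drug effect (if any) + noise.
        pain = severity + (true_effect if gets_drug else 0.0) + rng.gauss(0.0, 1.0)
        (treated if gets_drug else control).append(pain)
    return statistics.mean(treated) - statistics.mean(control)


estimate = simulate_trial()
print(f"Estimated effect on pain score: {estimate:.2f}")
```

Setting `true_effect=0.0` gives an estimate near zero, which corresponds to the case where both groups look the same.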

Often, though, randomized trials are infeasible or unethical. To see why, let’s return to our starting example of ice cream and homicides. Even if we had reason to test whether ice cream sales caused murders, it would be impossible. We would have to randomly assign a group of people to buy ice cream and then track whether they became murder victims. No one in their right mind would ever agree to participate, and such a study would never be approved by any institution. Similarly, if we want to measure the effect of, say, having insurance from Medicaid on survival outcomes, it may be unethical to give insurance to only a portion of people and put the other group’s lives on the line (although, interestingly, there have been studies doing very closely related things! [3]).

What else can we do then?

Luckily, researchers have developed a workaround: studying natural experiments, or situations where the intervention we aim to study is administered naturally and outside the control of experimenters. Natural experiments could be anything from policy changes to game shows and have been shown to be very useful in estimating causal effects without raising the same ethical concerns [4].


One such method is called difference-in-difference, a widely used approach to measure the effect of an intervention, such as a policy change like raising the minimum wage, on an outcome like employment [5]. The idea is to track the outcome before and after the intervention in places where it occurred and in places where it did not. We first record the difference between the two sets of places before the intervention, and then compare that original “difference” to the “difference” between those same places afterward. Importantly, we assume that had the intervention never happened, each location would have trended similarly and the difference between them would have stayed relatively constant. Let’s reconsider our ice cream and murders example. Imagine a hypothetical scenario where we ban ice cream sales in the summer, but only in California. We can then measure the murder rates in California and in other states before and after the ban and compare the two differences, as shown for our experiment in Panel B. If the difference after the ban is significantly different from the difference before it (as in the solid magenta line), then we can claim that it was actually the ban on ice cream sales that reduced murder rates. On the other hand, if we do not see a difference in differences (as in the dashed magenta line, which is what we expect given what we know about ice cream), then we cannot claim causality for ice cream.

B. Difference-in-difference method for an ice cream ban in California
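The arithmetic behind the comparison above can be sketched in a few lines. The murder rates below are invented numbers (per 100,000 people) chosen only to illustrate the two scenarios in Panel B, and `diff_in_diff` is a hypothetical helper name, not part of any library.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Return the change in the treated-minus-control gap after the intervention."""
    gap_before = treated_before - control_before  # original "difference"
    gap_after = treated_after - control_after     # "difference" after the ban
    return gap_after - gap_before


# Scenario 1: murder rates rise over the summer by the same amount everywhere,
# so the gap between California and the other states does not change.
no_effect = diff_in_diff(
    treated_before=5.0, treated_after=5.5,   # California (ban)
    control_before=4.0, control_after=4.5,   # other states (no ban)
)
print(no_effect)  # 0.0

# Scenario 2: rates rise in the other states but fall in California,
# so the gap shrinks by one murder per 100,000 people.
ban_works = diff_in_diff(
    treated_before=5.0, treated_after=4.5,
    control_before=4.0, control_after=4.5,
)
print(ban_works)  # -1.0
```

The estimate is only meaningful under the assumption stated in the text: without the ban, California and the other states would have followed the same trend. A real analysis would also need to judge whether the estimated difference is larger than what chance alone could produce.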

Regression discontinuity

Some interventions, though, are administered based on threshold values and require a different type of method to assess causality. As a final example, consider another fake ice cream ban, which now applies only to adults aged 35 and older. In this case, we can employ an approach known as the regression discontinuity design, where we examine the murder rates right at the threshold of 35 years and see whether the rate differs between people just below it and people just above it. Presumably, someone who is 34 years and 11 months old is not that different from someone who is 35 years and 1 month old, so if we find a sharp decrease, or “discontinuity,” in murder rates right at that boundary, then we can say that ice cream sales were in fact causing higher murder rates. On the other hand, if we do not see such a sharp change, then we cannot claim causality. Panel C shows these two possibilities in a graph: the solid black line, “Actual discontinuity,” represents the case where an ice cream ban does reduce murder rates, and the lighter gray line, “No discontinuity,” represents the expected outcome, where an ice cream ban has no effect on murder rates whatsoever.

C. Regression discontinuity method for the case of an ice cream ban at 35 years old
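As a rough sketch of how such a discontinuity could be estimated, the code below fits one straight line to observations just below the age threshold and another to observations just above it, then measures the gap between the two lines at exactly 35. The data are synthetic, and the function names, the five-year window (`bandwidth`), and the size of the simulated jump are all assumptions chosen for illustration. Real analyses choose the window and the model far more carefully.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rd_estimate(ages, rates, cutoff=35.0, bandwidth=5.0):
    """Estimate the jump in the outcome at the cutoff using a line on each side."""
    # Center ages on the cutoff so each intercept is the fitted value at age 35.
    left = [(a - cutoff, r) for a, r in zip(ages, rates)
            if cutoff - bandwidth <= a < cutoff]
    right = [(a - cutoff, r) for a, r in zip(ages, rates)
             if cutoff <= a <= cutoff + bandwidth]
    _, left_at_cutoff = fit_line(*zip(*left))
    _, right_at_cutoff = fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


def simulate(jump, n=4000, seed=1):
    """Synthetic data: a smooth age trend plus an optional jump at age 35."""
    rng = random.Random(seed)
    ages = [rng.uniform(25.0, 45.0) for _ in range(n)]
    rates = [10.0 - 0.1 * (a - 35.0) + (jump if a >= 35.0 else 0.0)
             + rng.gauss(0.0, 0.5) for a in ages]
    return ages, rates


print(f"Ban has an effect: {rd_estimate(*simulate(jump=-1.5)):.2f}")
print(f"Ban has no effect: {rd_estimate(*simulate(jump=0.0)):.2f}")
```

The first estimate should come out close to the simulated jump of -1.5, and the second close to zero, mirroring the “Actual discontinuity” and “No discontinuity” cases in Panel C.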

These two non-experimental methods are often used across research in epidemiology, statistics, and political science, and understanding them is critical to making causal claims. Most of the time, we do not care about ice cream, but rather want to understand how our health care coverage, medical treatments, or local policy affects health or livelihood. Using these techniques gets us one step closer to answering those questions without performing questionable experiments, and ultimately shows when, under crucial assumptions, correlation may in fact imply causation.


[1] https://www.buzzfeednews.com/article/kjh2110/the-10-most-bizarre-correlations

[2] https://www.nytimes.com/2018/09/21/upshot/a-rise-in-murder-lets-talk-about-the-weather.html

[3] http://dx.doi.org/10.1056/NEJMsa1212321

[4] https://www.annualreviews.org/doi/full/10.1146/annurev-publhealth-031816-044327

[5] https://www.nber.org/papers/w4509

Sameer Sundrani is a junior at Stanford University. He is majoring in Biomedical Computation and is passionate about medical innovation and health policy.