Spurious Correlations And Other Statistical Tricks

I know this has been making the rounds on the blogosphere, 'this' being the subject of spurious correlations, in which two entirely unrelated statistics appear to be correlated, with the trends of one seeming to somehow affect the other.

We've seen this before over the years, where politicians, activists, and/or media use these correlations to 'prove' some point or another and then use that correlation to justify the need to “Do Something!” (I generally find that the furor generated by such folks to “Do” that “Something” is inversely proportional to its actual importance.) As has been stated again and again, correlation does not imply causation, meaning that just because two sets of data appear to track each other does not automatically mean that one causes the other. In other words, tracking together does not make one set of data the cause and the other the effect.

This concept is illustrated by the charts at the link above, which show correlations between pairs of unrelated data series that could be used to imply all kinds of tenuous links, such as the divorce rate in Maine being affected by the per capita consumption of margarine in the US, the import of oil from Norway tracking the number of drivers killed in a collision with a train, or the number of honey-producing bee colonies in the US tying in with the number of juvenile arrests for possession of marijuana in the US.
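To see how little it takes to get a correlation like these, here is a minimal sketch in Python. The two series and their names (widgets_sold, library_fines) are numbers I invented for illustration; they have nothing in common except that both drift upward over ten periods.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, made up for this example only.
widgets_sold = [10, 12, 13, 15, 18, 19, 22, 24, 25, 28]
library_fines = [3.1, 3.0, 3.4, 3.9, 4.1, 4.6, 4.8, 5.3, 5.2, 5.9]

r = pearson(widgets_sold, library_fines)
print(f"r = {r:.2f}")  # close to 1, despite no connection between the two
```

Any two series that simply trend in the same direction will score high on this measure, which is why a high correlation by itself tells you so little.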

Another bit of statistical legerdemain is to take two correlating statistics and arrange them so that cause and effect are reversed.

One of the most recent and most blatant uses of this statistical trick is the link between global average temperatures and CO2 levels. When AGW proponents try to prove their case, they show the upward slope of CO2 concentrations in the atmosphere 'matching' the upward slope of temperatures. They point to the graph and proclaim “See, the increasing CO2 levels are causing a rise in temperature!” and demand the world do something to stop it. But what these same folks don't show you are the beginning points of both sets of data, where the downward slopes of both datasets level out and then start their upward swing. Those points are the references one should use to determine cause and effect. If the AGW proponents included those points in their graphs, much of their argument would be blown out of the water, because the graphs would show that temperatures started rising well before the CO2 levels did. But because their charts show at best a couple of hundred years of data and ignore the reference points, they give the false impression that effect is really cause. The charts don't show the offset between the two reference points, an offset that implies rising temperatures are the cause of rising CO2 levels and not the other way around.
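The comparison of reference points I'm describing can be sketched in a few lines of Python. This is only an illustration under my own assumptions: series_a and series_b are invented, perfectly smooth numbers rather than real measurements, and turning_point is a hypothetical helper that just finds where a fall-then-rise series bottoms out. Real data is noisy and would need smoothing before anything like this could be applied.

```python
def turning_point(series):
    """Index of the lowest value, i.e. where a fall-then-rise series bottoms out."""
    return min(range(len(series)), key=lambda i: series[i])

# Hypothetical series, invented for illustration; not real measurements.
series_a = [9, 7, 5, 4, 5, 7, 9, 11, 13, 15]
series_b = [8, 7, 6, 5, 4, 3, 4, 6, 8, 10]

# Positive lag means series_b starts its upswing after series_a does.
lag = turning_point(series_b) - turning_point(series_a)
print(f"series_b turns {lag} steps after series_a")
```

A chart that only shows the last few points of both series would show two rising lines and nothing about which one turned first.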

Another little statistical trick sometimes used to justify some 'cause' is to take two sets of correlating data and claim that the first caused the second when in fact both were caused by a third factor totally ignored by those trying to create a certain impression. A perfect example was the much-hated and much-maligned National Maximum Speed Limit imposed by the government at the behest of a bunch of busybodies who decided they were going to save our lives even though it wasn't necessary.

The proponents of the NMSL took two statistics, the number of traffic fatalities and the average speed limits, and 'proved' that the lower speed limits imposed during the Arab Oil Embargo back in 1973/1974 led to lower traffic fatalities in the US. But what they conveniently ignored was the number of passenger miles traveled during that time, which had greatly decreased because of the scarcity of fuel. The traffic fatality rate (the number of fatalities per millions of passenger miles) showed no statistically significant change, meaning the real reason for the drop in traffic fatalities was the decrease in the number of miles traveled. The 10% drop in fatalities correlated directly with the 10% decrease in the number of miles traveled, which in turn meant there was no change in the fatality rate.
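The arithmetic is simple enough to check for yourself. In this Python sketch the baseline figures are round numbers I made up purely to match the 10% drops described above; they are not the actual fatality or mileage counts.

```python
# Hypothetical baseline figures, chosen only to illustrate the 10% drops.
fatalities_before = 50_000
miles_before = 1_000_000  # in millions of passenger miles

fatalities_after = fatalities_before * 90 // 100  # 10% fewer fatalities
miles_after = miles_before * 90 // 100            # 10% fewer miles traveled

# Fatality rate: fatalities per million passenger miles.
rate_before = fatalities_before / miles_before
rate_after = fatalities_after / miles_after

print(f"before: {rate_before:.3f}  after: {rate_after:.3f}")
```

The raw fatality count falls by 5,000, but the rate is the same before and after, which is the whole point: the count dropped because the miles did.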

So the next time someone tells you that statistics show that X correlates with Y, and therefore X must be causing Y, take it with one heck of a huge grain of salt, because they're probably lying to you... and maybe to themselves.