17 January 2011

The geography of mental health

Over at The Atlantic there is an article mapping the geography of gun deaths. The authors go one step further and produce the list below of factors that correlate with deaths from gun violence; I have added a big red box to it for you.

What's going on here? Didn't I make a big fuss about there being a real correlation between mental illness and violent crime? Yes, and I am still correct. First of all, they're looking at gun deaths, not violent crime. More importantly, the article is the barroom equivalent of blindfolding yourself and spinning around while throwing darts wildly, in the hope that a few hit the board.

Part of the problem is that we have no idea which data sets Florida and Mellander, the article's authors, used (they do not say). This matters especially for mental illness, because it is very unevenly "distributed" in this country. I put distributed in quotes because the existing data sets all purport to describe this distribution, yet they come up with wildly different results. For example, the SAMHSA study uses real field interviews and comes up with this map of "Serious Psychological Distress in Past Year among Persons Aged 18 or Older":

However, if you use the CDC's NHIS HRQOL data set instead, the prevalence changes dramatically (the CDC doesn't publish any pretty maps of mental distress, although the survey does ask questions about it). To avoid cluttering up this post, here is a link to a CSV I made comparing the 2005-06 SAMHSA SPD results (the data behind the map above) with the 2005-09 HRQOL results for measure #7 ("Percentage with 14 or more mentally unhealthy days (Frequent Mental Distress)"). The R² between the two is a paltry 0.07.
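For anyone who wants to run this kind of comparison themselves, here is a minimal sketch of how an R² between two state-level measures might be computed. It assumes the R² is simply the squared Pearson correlation (which equals the R² of a one-predictor least-squares fit), and it assumes a CSV with hypothetical column names `state`, `spd_pct` and `fmd_pct`. The layout of the actual CSV isn't described here, and the sample rows are made-up placeholders, not survey data.

```python
import csv
import io
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    For a single predictor this equals the R^2 of an ordinary
    least-squares line fitted through the points.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


def load_columns(csv_text, col_a, col_b):
    """Read two numeric columns from CSV text (one row per state)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    a, b = [], []
    for row in reader:
        a.append(float(row[col_a]))
        b.append(float(row[col_b]))
    return a, b


# Hypothetical layout and placeholder values, NOT the real survey figures.
SAMPLE_CSV = """state,spd_pct,fmd_pct
AA,10.0,9.0
BB,12.0,11.5
CC,11.0,10.0
DD,13.0,12.0
"""

spd, fmd = load_columns(SAMPLE_CSV, "spd_pct", "fmd_pct")
print(f"R^2 = {r_squared(spd, fmd):.2f}")
```

Note that squaring the correlation throws away its sign, so a strong negative relationship also produces an R² near 1.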

The details of that comparison aren't very important, but they do support my point: the multitude of studies on mental illness in this country produce widely divergent results. In large part this is due to the highly variable quality of surveillance and identification of mental illness or psychiatric distress. Below is a map of the states colored according to their gross state product (GSP). It correlates reasonably well with the DHHS SAMHSA data but terribly with the CDC HRQOL data. It isn't hard to speculate that richer states have more resources to spend on surveilling and identifying people in psychiatric distress, but the speculation is pointless because GSP correlates only somewhat with one particular data set. Compare it against the other data set and the correlation vanishes, even though both purport to measure frequent psychological distress (albeit with different methods). (The R² between a state's SAMHSA rank and its 2005-09 average GSP ranking is 0.24; the R² between a state's HRQOL rank and its 2005-09 average GSP ranking is 0.06.)
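The R² figures in that last parenthetical are computed on state rankings rather than raw values. Here is a minimal sketch of one way to do that, assuming "ranking" means ranking states on each measure (giving tied states the average of their ranks) and then squaring the Pearson correlation of the ranks, which is the same as squaring Spearman's rank correlation. The exact procedure isn't spelled out in this post, and the `ranks` and `rank_r_squared` helpers are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def rank_r_squared(xs, ys):
    """Squared Pearson correlation of the ranks (Spearman's rho squared)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("every value tied; rank correlation is undefined")
    return (sxy * sxy) / (sxx * syy)
```

A monotonic but non-linear relationship still scores a perfect 1.0 on ranks: `rank_r_squared([1, 2, 3, 4], [1, 8, 27, 64])` returns `1.0`. Working on ranks makes the comparison insensitive to the very different scales of GSP (dollars) and distress prevalence (percent), at the cost of discarding how far apart the states actually are.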

Can I explain the wild divergence between those two R² values? No, and I doubt anyone can. This is the same problem that plagues "voodoo correlations" in brain imaging studies: grab enough data points and you are bound to find some correlations.
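The dart-throwing problem is easy to demonstrate. The sketch below is a hypothetical simulation, not anything from the article: it correlates one column of pure noise (standing in for 50 states' gun deaths) with many other columns of pure noise and counts how many clear an R² of 0.078, which is roughly the cutoff for p < 0.05 with 50 states (|r| of about 0.28). By construction none of these relationships is real, yet around 5 percent of the predictors should clear the bar.

```python
import random
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def count_spurious_hits(n_states=50, n_predictors=200, threshold=0.078, seed=0):
    """Count pure-noise predictors whose R^2 with a pure-noise outcome
    reaches `threshold`. Every "finding" counted here is spurious."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_states)]
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0.0, 1.0) for _ in range(n_states)]
        if r_squared(predictor, outcome) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 200 random predictors 'correlate' with random gun deaths")
```

Raising `n_predictors` raises the expected number of "discoveries" roughly in proportion, which is exactly why an analysis that screens many variables without saying where its data came from deserves suspicion.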

One ought to be wary of correlations when the data sets are not rigorously selected, and one ought to be downright suspicious of correlative revelations that do not disclose their data sets.

