The census data includes 108 ancestries. I ran the regression on all of them against the same Google Trends values. Here are the 20 highest R^2 values:
This showed lots of ancestries from UK countries at the top, and I realized that this may be because people can report multiple ancestries.
From an overview of these datasets:
The ACS asks each respondent to write their ancestry or ethnic origin, and records up to two ancestries per person (the first two ancestries written by the respondent).
The table for People Reporting Single Ancestry (B04004) shows data for those who reported only one ancestry, while People Reporting Multiple Ancestry (B04005) shows data for those who reported more than one ancestry.
People Reporting Ancestry (B04006) shows data for those who reported any ancestry, regardless of whether it was the only ancestry or part of multiple ancestries they reported.
Note: this means that values in B04005 and B04006 will not necessarily add up to match totals, because one person may be represented under more than one ancestry.
I (and, I think, Murph) used B04006, which counts up to two ancestries per person. So, for example, the correlations for Scottish and English might both be high because the same people reported both ancestries.
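A toy illustration of the double counting, using made-up respondents and a hypothetical helper `tally`: a B04006-style count adds every recorded ancestry (up to two per person), while a B04004-style count keeps only people who reported exactly one. This is only a sketch of the counting rule quoted above, not how the Census Bureau builds the tables.

```python
from collections import Counter

# Made-up respondents: each lists the ancestries they wrote down.
respondents = [
    ["Scottish", "English"],
    ["Scottish", "English"],
    ["English"],
    ["Irish", "German"],
    ["German"],
]

def tally(respondents, single_only=False):
    """Count ancestry reports (hypothetical helper, not a Census API)."""
    counts = Counter()
    for ancestries in respondents:
        recorded = ancestries[:2]  # the ACS records at most two ancestries
        if single_only and len(recorded) != 1:
            continue  # B04004-style: skip people reporting more than one
        counts.update(recorded)
    return counts

any_ancestry = tally(respondents)                # like B04006
single_only = tally(respondents, single_only=True)  # like B04004
print(any_ancestry)
print(single_only)
print(sum(any_ancestry.values()), "ancestry entries from", len(respondents), "people")
```

In the B04006-style count, the same two people push up both Scottish and English, which is how two ancestries can end up correlated with the same outcome for one shared reason.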
I tried again with B04004, which only includes people who reported one ancestry, to avoid double counting and allow better comparison between ancestries.
In this case, the correlation with English is pretty much gone (the sample size is also much smaller for this dataset, so the ancestry values may be less precise):
There are still some large correlations in this analysis, and British is still near the top at #9, though it is less significant, with R^2 = 0.26 and p = 0.0018.
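For reference, assuming these are ordinary least-squares fits with one predictor and a two-sided test (the usual default, though not stated here), the p-value depends only on R^2 and the number of states n in the fit, through the t-statistic for the slope, where T_{n-2} is a Student's t variable with n - 2 degrees of freedom:

```latex
R^2 = r^2, \qquad
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad
p = 2\,\Pr\!\left(T_{n-2} \ge |t|\right)
```

So the same R^2 gives a larger p-value when fewer states have usable data.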
The top 5 correlations with Google Searches for ME/CFS are Northern European, European, Swedish, Icelander, and New Zealander.
Here for example is the plot of the top correlation, Northern European ancestry vs Google Trends for ME/CFS:
(Note that only 35 states were included as the rest had missing data.)
However, it looks like the correlation might be mainly driven by just the five states in the upper right. If I exclude those, the correlation is much less apparent:
