A crumb of a clue on epidemiology

Murph

Senior Member (Voting Rights)
Inspired by @Simon M's bimodal distribution paper I wanted to share a tiny germ of an idea. It is a modest little write-up I did of a very modest little clue about epidemiology.

Basically I eyeballed an apparent link between search-term usage for ME/CFS and England and Norway, then tested whether a correlation with English heritage would hold across US states. It did. Which suggests, just maybe, that ME/CFS could be more common in people with English heritage?



Some people on Reddit insisted the existence of this post meant my brain was made of Swiss cheese. Extra holey type. And that was despite me expending about 60% of the words in the post trying to explain that I actually know this is not definitive.

There's a frustrating dynamic in play when a person shares a clue they themselves place low probability on. People come back to me and say, "But Murph, this isn't 100% proof like you seem to think?" And I say, welllllll, no. But they can't provide proof that it contains 0% signal either, and so the back and forth of trying to disprove it risks turning into an unbecoming muddle. I apologise in advance for that.

That said, I know the history of HIV/AIDS is that following the epidemiology helped them solve it. I wonder if we could also do more on epidemiology. Please consider this as just an opening gambit, a feint, a prompt.
 
First thought is, does 'English' really exist?

People of white European heritage from Britain, Ireland, northwestern Europe and Scandinavia are genetically quite similar. Invasions, wars, natural population mixing etc—apparently it's hard to identify where they're from with any certainty.

The same kind of mixing will probably have been repeated in the white European populations who migrated to north America, Australia and New Zealand.

If I've got that right, we probably can't get much more accurate than 'northern European'.
 
Very interesting!

I tried to see if I could replicate it.

Data I used:
This is looking at the association between the Google Trends value for ME/CFS as an indicator of search popularity in a state, and the proportion of the state's population reporting English ancestry.

You got R^2=0.52. I got R^2=0.31. It looks like the difference may be due to me using newer Google Trends data or a slightly different search term. Vermont is #1 in mine, but #4 in your analysis. But it's still very significant.
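
For anyone who wants to try the same kind of check, here is a minimal sketch of a one-predictor regression and its R^2 in plain Python. The numbers are made up for five hypothetical states and are not the data used above; the function and variable names are just illustrative. With a single predictor, R^2 is the same whichever variable is treated as the outcome.

```python
def simple_regression(x, y):
    """Ordinary least-squares fit of y on one predictor x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R^2 is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up values for five hypothetical states, NOT the real data.
trends_value = [40, 55, 50, 70, 90]             # Google Trends score (0-100)
english_share = [0.05, 0.08, 0.07, 0.12, 0.15]  # share reporting English ancestry

slope, intercept, r2 = simple_regression(trends_value, english_share)
print(f"slope={slope:.5f}  intercept={intercept:.4f}  R^2={r2:.3f}")
```

Swapping in a different Trends export or search term only changes the input lists, which is one plausible reason two people can get different R^2 values from the "same" analysis.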

1774240337097.png

The census data includes 108 ancestries. I ran the regression on all of them against the same Google Trends values. Here are the top 20 highest R^2 values:
1774240791839.png

Scottish is first with R^2 = 0.49. English is sixth.

Edit: Replaced with bigger images above, and added all ancestries below:
Rank | Ancestry | Coefficient | Standard Error | R-squared | P-value
1 | Scottish | 0.000544997677162216 | 7.93E-05 | 0.490864 | 1.04E-08
2 | Northern European | 7.16E-05 | 1.12E-05 | 0.454003 | 5.99E-08
3 | Celtic | 7.62E-06 | 1.23E-06 | 0.440837 | 1.09E-07
4 | British | 9.93E-05 | 1.86E-05 | 0.367329 | 2.44E-06
5 | Canadian | 9.42E-05 | 1.84E-05 | 0.349550 | 4.92E-06
6 | English | 0.00227962673704033 | 0.000484619262902239 | 0.311093 | 2.12E-05
7 | Welsh | 0.000128466462985969 | 2.83E-05 | 0.296186 | 3.66E-05
8 | French Canadian | 0.00079101691310557 | 0.000181156827337378 | 0.280112 | 6.52E-05
9 | European | 0.000299537883368862 | 7.35E-05 | 0.253123 | 0.000167968967975688
10 | African | -0.000126947526732886 | 3.16E-05 | 0.247698 | 0.000202469202877451
11 | Irish | 0.0014240353647135 | 0.000382653900011046 | 0.220358 | 0.000510791251471821
12 | Other groups | -0.00503077654987111 | 0.00139039098931443 | 0.210845 | 0.000700710815353213
13 | French (except Basque) | 0.00102179307238771 | 0.000283887928666538 | 0.209101 | 0.000742275722392602
14 | Swedish | 0.000462247018347937 | 0.000132587425510729 | 0.198753 | 0.00104305167393203
15 | Scandinavian | 0.000162096507060004 | 4.67E-05 | 0.197625 | 0.00108224951170263
16 | Latvian | 6.08E-06 | 2.06E-06 | 0.151484 | 0.00475997725533723
17 | Finnish | 9.88E-05 | 3.40E-05 | 0.146718 | 0.00553220766775087
18 | Australian | 4.73E-06 | 1.71E-06 | 0.135682 | 0.00782435026505045
19 | Swiss | 6.34E-05 | 2.30E-05 | 0.134411 | 0.00814188957003827
20 | New Zealander | 2.08E-06 | 7.86E-07 | 0.125202 | 0.0108571360959814
21 | Danish | 0.000206549358620718 | 8.06E-05 | 0.118157 | 0.0135226027907983
22 | Jordanian | -5.45E-06 | 2.13E-06 | 0.117349 | 0.0138668116293763
23 | Austrian | 2.44E-05 | 9.84E-06 | 0.111674 | 0.0165440524925274
24 | Palestinian | -8.71E-06 | 3.92E-06 | 0.091583 | 0.0308887870162486
25 | Estonian | 1.63E-06 | 7.54E-07 | 0.086700 | 0.0359582204625218
26 | Egyptian | -1.93E-05 | 9.58E-06 | 0.076529 | 0.0493963099404976
27 | Lithuanian | 3.88E-05 | 1.92E-05 | 0.076405 | 0.0495889271190848
28 | U.S. Virgin Islander | -1.53E-06 | 7.79E-07 | 0.072626 | 0.0558249838954956
29 | Bahamian | -4.84E-06 | 2.62E-06 | 0.065074 | 0.0708244783172688
30 | Eastern European | 4.22E-05 | 2.35E-05 | 0.062053 | 0.077940273261879
31 | Slavic | 4.95E-06 | 2.75E-06 | 0.061725 | 0.0787570425513355
32 | Arab | -8.79E-06 | 4.96E-06 | 0.060317 | 0.0823634815839269
33 | Russian | 8.93E-05 | 5.14E-05 | 0.058106 | 0.0883801529465509
34 | Icelander | 1.18E-05 | 7.02E-06 | 0.054264 | 0.0999600930781502
35 | Yugoslavian | 1.00E-05 | 6.01E-06 | 0.053462 | 0.102573621693047
36 | West Indian | -1.51E-05 | 9.14E-06 | 0.052770 | 0.104887512991284
37 | Basque | 1.25E-05 | 8.07E-06 | 0.046500 | 0.128577235891286
38 | Greek | 4.59E-05 | 2.97E-05 | 0.046499 | 0.128583658018922
39 | Scotch-Irish | 7.50E-05 | 4.87E-05 | 0.046209 | 0.129807107320796
40 | Belizean | -2.25E-06 | 1.49E-06 | 0.044346 | 0.137998617280404
41 | Nigerian | -3.48E-05 | 2.34E-05 | 0.043155 | 0.143528926959175
42 | Ukrainian | 3.83E-05 | 2.65E-05 | 0.040899 | 0.154677806266161
43 | West Indian (except Hispanic groups): | -0.000166905529855753 | 0.000119946788850223 | 0.038013 | 0.170360292444778
44 | Moroccan | -6.11E-06 | 4.39E-06 | 0.038008 | 0.170389700892927
45 | American | -0.000441940617702167 | 0.000321118402845826 | 0.037216 | 0.175000162803091
46 | Armenian | 1.93E-05 | 1.49E-05 | 0.032961 | 0.202298799131889
47 | Norwegian | 0.000587769796582029 | 0.000460807013873216 | 0.032136 | 0.208138378420421
48 | Jamaican | -6.57E-05 | 5.23E-05 | 0.031225 | 0.214812465921621
49 | Other Arab | -1.17E-05 | 9.38E-06 | 0.030824 | 0.217827927509659
50 | Arab: | -5.61E-05 | 4.54E-05 | 0.030239 | 0.22231486927353
51 | Cypriot | -4.95E-07 | 4.05E-07 | 0.029630 | 0.227102930923734
52 | Trinidadian and Tobagonian | -1.29E-05 | 1.06E-05 | 0.029424 | 0.228750379930393
53 | Other West Indian | -3.93E-07 | 3.27E-07 | 0.028722 | 0.234473285545592
54 | Ghanaian | -1.01E-05 | 8.47E-06 | 0.028330 | 0.237739509167253
55 | Haitian | -5.88E-05 | 4.97E-05 | 0.027742 | 0.242744519493416
56 | Guyanese | -1.51E-05 | 1.41E-05 | 0.022852 | 0.28964502995771
57 | Zimbabwean | -5.04E-07 | 4.74E-07 | 0.022593 | 0.292430062597306
58 | Maltese | -2.32E-06 | 2.19E-06 | 0.022360 | 0.294960339738259
59 | British West Indian | -6.19E-06 | 5.93E-06 | 0.021771 | 0.301484651227002
60 | Ugandan | 1.47E-06 | 1.47E-06 | 0.020184 | 0.319986427774549
61 | Subsaharan African: | -0.000113892124201461 | 0.000117025089367813 | 0.018964 | 0.335219986647394
62 | Bermudan | -5.21E-07 | 5.56E-07 | 0.017637 | 0.352874188832797
63 | Somali | 1.90E-05 | 2.04E-05 | 0.017481 | 0.355038513299539
64 | Macedonian | -2.68E-06 | 2.92E-06 | 0.016898 | 0.363256351507785
65 | Assyrian/Chaldean/Syriac | -6.34E-06 | 7.31E-06 | 0.015112 | 0.390120086141124
66 | Cape Verdean | 2.77E-05 | 3.41E-05 | 0.013345 | 0.419528577245975
67 | Unclassified or not reported | -0.000419905427569309 | 0.00051657979327499 | 0.013305 | 0.420232143608361
68 | Alsatian | -2.06E-07 | 2.64E-07 | 0.012243 | 0.43952993625448
69 | Italian | 0.000346064909974786 | 0.000457044110832716 | 0.011565 | 0.452567486585777
70 | Cajun | -1.26E-05 | 1.75E-05 | 0.010480 | 0.474698962193852
71 | Israeli | -2.83E-06 | 3.98E-06 | 0.010209 | 0.480509790005687
72 | Portuguese | 0.000101521088859791 | 0.00014880200668198 | 0.009410 | 0.498289625362481
73 | Serbian | -3.05E-06 | 5.11E-06 | 0.007218 | 0.553348902261824
74 | Turkish | -4.35E-06 | 7.42E-06 | 0.006977 | 0.56006976506054
75 | Dutch | 5.65E-05 | 9.84E-05 | 0.006667 | 0.56895850827549
76 | German | 0.000615901354154133 | 0.00108885357512936 | 0.006487 | 0.57421835502114
77 | Barbadian | -1.86E-06 | 3.34E-06 | 0.006311 | 0.579487464851524
78 | Sudanese | 3.21E-06 | 5.97E-06 | 0.005868 | 0.593144466632387
79 | Czechoslovakian | 2.94E-06 | 5.57E-06 | 0.005664 | 0.599673148649729
80 | Other Subsaharan African | 8.10E-06 | 1.68E-05 | 0.004750 | 0.630847811108338
81 | Slovak | -1.19E-05 | 2.55E-05 | 0.004429 | 0.642630722585144
82 | Sierra Leonean | -1.28E-06 | 2.76E-06 | 0.004392 | 0.644025063490686
83 | Hungarian | -1.35E-05 | 3.08E-05 | 0.003928 | 0.662183702128348
84 | Iraqi | -3.07E-06 | 7.38E-06 | 0.003510 | 0.679649182405239
85 | Kenyan | 1.83E-06 | 4.50E-06 | 0.003377 | 0.685433337239352
86 | Lebanese | 6.48E-06 | 1.59E-05 | 0.003368 | 0.685847049430261
87 | Belgian | 5.71E-06 | 1.42E-05 | 0.003275 | 0.689966730754611
88 | German Russian | 2.75E-06 | 7.60E-06 | 0.002677 | 0.71843559650475
89 | Romanian | 3.01E-06 | 8.69E-06 | 0.002447 | 0.730304122880722
90 | Polish | 7.61E-05 | 0.000227220399139297 | 0.002283 | 0.739188147611896
91 | Pennsylvania German | -4.87E-06 | 1.47E-05 | 0.002244 | 0.741326957199527
92 | Albanian | -3.14E-06 | 1.03E-05 | 0.001904 | 0.761072313248967
93 | Dutch West Indian | -9.95E-07 | 3.27E-06 | 0.001883 | 0.762405665483838
94 | Iranian | 3.68E-06 | 1.35E-05 | 0.001523 | 0.785703522018689
95 | Bulgarian | 5.71E-07 | 2.89E-06 | 0.000793 | 0.84444091884464
96 | Liberian | 2.17E-06 | 1.12E-05 | 0.000773 | 0.846393172545874
97 | Czech | 1.47E-05 | 7.56E-05 | 0.000770 | 0.846761875068168
98 | Brazilian | 5.66E-06 | 2.97E-05 | 0.000741 | 0.84963956239547
99 | Croatian | 1.72E-06 | 9.26E-06 | 0.000703 | 0.853430949989788
100 | Syrian | -8.38E-07 | 5.02E-06 | 0.000569 | 0.868027847075342
101 | South African | 2.83E-07 | 1.82E-06 | 0.000497 | 0.876562908845692
102 | Afghan | -8.77E-07 | 6.36E-06 | 0.000389 | 0.890795423904812
103 | Carpatho Rusyn | -5.94E-08 | 4.37E-07 | 0.000378 | 0.892353515530601
104 | Ethiopian | -3.88E-06 | 3.05E-05 | 0.000330 | 0.899312925343401
105 | Slovene | 8.90E-07 | 7.50E-06 | 0.000287 | 0.90604431573313
106 | Soviet Union | 2.76E-08 | 3.42E-07 | 0.000133 | 0.935912872501683
107 | Senegalese | -1.10E-07 | 1.87E-06 | 7.04E-05 | 0.953401362724816
108 | Luxembourger | 6.91E-09 | 4.00E-06 | 6.10E-08 | 0.998627069715394

Edit 2: I updated both tables to include the coefficients. The association with African ancestry is a negative association.
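
As a rough illustration of the "run it on every ancestry and sort by R^2" step, here is a small sketch in plain Python. The three ancestry columns and the Trends values are invented for five hypothetical states; the names `r_squared` and `rank_by_r_squared` are not from any library, and this is not necessarily how the table above was produced.

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # A column with no variation cannot explain anything.
        return 0.0
    return sxy ** 2 / (sxx * syy)


def rank_by_r_squared(columns, outcome):
    """Return (name, R^2) pairs sorted from highest to lowest R^2."""
    results = [(name, r_squared(values, outcome)) for name, values in columns.items()]
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Invented numbers for five hypothetical states, NOT census data.
trends = [10, 20, 30, 40, 50]
ancestry_columns = {
    "Ancestry A": [1, 2, 3, 4, 5],
    "Ancestry B": [5, 3, 4, 1, 2],
    "Ancestry C": [2, 2, 2, 2, 2],
}

ranking = rank_by_r_squared(ancestry_columns, trends)
for rank, (name, value) in enumerate(ranking, start=1):
    print(rank, name, round(value, 3))
```

R^2 on its own hides the direction of the association, which is why reporting the sign of the coefficient alongside it (as in Edit 2) is useful.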
 
It’s either genetic or to do with vitamin D.

Asians and blacks get it less than whites. Tropical countries have it less too.

I’m one of the few unfortunate Asians.
 
I quickly did the same experiment for Ebola and HIV/AIDS and the results weren't as bad as I initially would have thought but there seems to be a ton of noise that is probably impossible to properly filter out, unless one has some smart ideas, but I thought it was quite interesting nonetheless.

I suppose one could try filtering out some peaks to see if that gives a clearer picture, on the theory that peaks are more likely to reflect things like hearing something on the radio or TV than people googling their own symptoms. There were some large documentaries in Germany, and Germany seems to have a lot of peaks (though I haven't compared it to other countries). Funnily, there's also a peak in 2004, and the continuous upward trend after 2019 might be more related to media attention than to the true ME/CFS rate following Covid. Similarly, Czechia has one single Ebola search peak in 2005, which means it's a top 3 country in the last 12 months for searches. I would also suspect that search engine optimisation could create quite drastic changes: for example, a person googling symptoms similar to those of ME/CFS in South Africa might land on an HIV/AIDS website and then start googling HIV/AIDS, whilst in Norway they might land on a Norwegian ME/CFS website or similar.
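
One crude way to damp media-driven peaks, sketched below in plain Python, is to cap each point at a multiple of the median of its neighbours. The window size, the cap ratio and the weekly numbers are all arbitrary assumptions for illustration, and nothing here can tell a media spike apart from a real rise in interest.

```python
from statistics import median


def clip_spikes(series, window=5, max_ratio=2.0):
    """Cap each value at max_ratio times the median of its surrounding window."""
    half = window // 2
    cleaned = []
    for i, value in enumerate(series):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        cap = max_ratio * median(series[lo:hi])
        cleaned.append(min(value, cap))
    return cleaned


# Invented weekly search-interest values with one documentary-style spike.
weekly_interest = [10, 12, 11, 95, 13, 12, 14, 13]
cleaned = clip_spikes(weekly_interest)
print(cleaned)
```

A long-running upward trend, like the one after 2019, would pass straight through a filter like this, since the local median rises along with it.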
 
Interestingly there doesn't seem to be an uptick surrounding the XMRV saga, or at least I can't see it, and if I've got the dates right, I guess that never led to much googling? There's a big uptick in August last year, maybe related to DecodeME? A fun tool to play around with @Murph !
 
I think MS is well known to be more common in white northern Europeans than in people whose ancestry is closer to the Equator. Maybe ME/CFS is similar in that respect. Can you run a correlation between searches for ME/CFS and searches for MS?

For some reason Mississippi has extremely high searches for MS, making the regression not very reliable:
1774272068465.png

Without Mississippi:
1774272332486.png

So not a very strong correlation between MS searches and ME/CFS searches.
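
A sketch of the with-and-without comparison, in plain Python, for anyone wanting to check how much one state can move a correlation. The state labels and values are invented, and `correlation_without` is just an illustrative helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


def correlation_without(states, x, y, excluded):
    """Pearson r after dropping the states named in `excluded`."""
    keep = [i for i, state in enumerate(states) if state not in excluded]
    return pearson_r([x[i] for i in keep], [y[i] for i in keep])


# Invented values: four ordinary states plus one with inflated "MS" searches.
states = ["State A", "State B", "State C", "State D", "Outlier"]
ms_searches = [10, 20, 30, 40, 100]
mecfs_searches = [12, 18, 33, 41, 5]

r_all = pearson_r(ms_searches, mecfs_searches)
r_trimmed = correlation_without(states, ms_searches, mecfs_searches, {"Outlier"})
print(f"all states: r={r_all:.2f}   without outlier: r={r_trimmed:.2f}")
```

In this toy data a single extreme point flips the sign of the correlation, which is why it is worth checking what is driving an outlier before deciding whether to drop it.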

Here is the regression predicting proportion with English ancestry from MS searches, and a table with the top 20 ancestries:
1774272609909.png


Results for MS are not as significant as with ME/CFS. Irish is the highest positive association. Australian is a negative association.

Edit: Oh ha. Mississippi's searches for MS are extremely high because the state abbreviation for Mississippi is MS.
 
Does this mean that concerns about underrepresentation of ethnic minorities in the UK are misplaced? Is there simply a much smaller percentage affected?


Is this the same in Long Covid (ME-type)?
 
Interesting theory. I think it would need to be explored directly with genetic data, where you can do an admixture model for ancestry, since Americans of European descent who aren't 2nd or 3rd generation are likely to just be guessing based on their last name (Smith being the most common, especially in Utah, where many Mormons adopted the name from Joseph Smith). All of Us probably has diagnostic labels that you could cross-reference with ancestry, though it would be even more likely to be diluted with "chronic fatigue" cases.

Utah's searches seem to be 100% driven by the Bateman Horne Center being located there.
 
I was wondering if it might be that states with larger populations with UK ancestry have a larger English speaking population, so might be more likely to search for "ME/CFS".

I found census data for "Language spoken at home" for each state: https://data.census.gov/table/ACSST1Y2024.S1601?q=language&g=010XX00US$0400000&tp=false

There isn't a very strong correlation between the ME/CFS search trends and speaking English at home:
1774286675823.png

And when controlling for the proportion of the state that speaks English at home, the association with proportion that have English ancestry is still very significant (p=0.000133)
1774286943845.png
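
For readers unfamiliar with "controlling for" a variable, here is one way to sketch it in plain Python: remove the part of both the outcome and the predictor that the control variable explains, then regress what is left over. By the Frisch-Waugh-Lovell theorem this gives the same coefficient as putting both predictors in one multiple regression. The variable names and numbers are invented, and this is not necessarily the procedure used for the result above.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def partial_slope(outcome, predictor, control):
    """Coefficient of `predictor` on `outcome`, holding `control` fixed."""
    outcome_left = residuals(control, outcome)
    predictor_left = residuals(control, predictor)
    slope, _ = fit_line(predictor_left, outcome_left)
    return slope


# Invented data built so that outcome = 2*predictor + 3*control + 1 exactly.
english_ancestry = [1, 2, 3, 4, 5]   # hypothetical predictor
english_at_home = [2, 1, 4, 3, 6]    # hypothetical control
trends = [2 * p + 3 * c + 1 for p, c in zip(english_ancestry, english_at_home)]

adjusted = partial_slope(trends, english_ancestry, english_at_home)
print(round(adjusted, 6))
```

If the adjusted coefficient stays close to the unadjusted one, the control variable is not what is driving the association, which is the pattern described above for English spoken at home.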
 
Would that necessarily hold? My cousin's wife speaks Polish most of the time, but she often uses English search terms because large parts of the internet are heavily dominated by content written in English.
 
Definitely doesn't necessarily hold.

But I thought it was possible that people who speak another language at home might either be less likely to have heard the term ME/CFS, or less likely to search for it in English.

So I wanted to rule that out as an explanation for states with higher British ancestry populations searching for ME/CFS more often. And it indeed seems that that is not the explanation for it.
 