A crumb of a clue on epidemiology

Among the top correlations are the very confusing "dog food recall", "food recall", and "cross stitch pattern".
I gotta say this is very funny.

I am intrigued by the correlation with ADHD, but the even higher correlation with “dog food recall” should probably keep me humble.

That dog food thing has got to be a fluke I guess? But what a fun story it’d be if it turned out to reflect something real.

(And btw thanks for all the effort and for sharing all these results.)
 
I was talking about the British - ME/CFS correlation with my brother, and he said something like "aren't those basically the states that have the most white people?" He suggested the correlation might just reflect white people being diagnosed at different rates from other races due to cultural differences.

So to double check, I controlled for the proportion of each state that is white when predicting ME/CFS searches ("Topic" version) from either English proportion or "migraine aura" searches.

Previously, when I correlated ME/CFS searches against all the variables from Correlates of State Policy, one of the largest negative correlations was "nonwhite_mcent", which is "Percentage of voter registrants that are nonwhite". So I tried that as a covariate, and I also tried the percentage of a state that is white from the DP05 table from the US census.

And still, English ancestry and migraine searches are significant predictors of ME/CFS searches even when controlling for either of those white proportion variables.
Predictor: English proportion
Covariate: Nonwhite percentage from Correlates of State Policy
Outcome variable: ME/CFS searches
1776035718287.png

Predictor: English proportion
Covariate: White percentage from DP05
Outcome variable: ME/CFS searches
1776035748587.png

Predictor: "migraine aura" searches
Covariate: Nonwhite percentage from Correlates of State Policy
Outcome variable: ME/CFS searches
1776036136378.png

Predictor: "migraine aura" searches
Covariate: White percentage from DP05
Outcome variable: ME/CFS searches
1776036154229.png

If I include all of English proportion, "migraine aura" searches, and white proportion as covariates, English and migraine are still both significant predictors of ME/CFS searches:
1776036317565.png

So it seems that the white proportion of a state does not explain either English proportion or migraine searches correlating with ME/CFS searches.
 
In colloquial language terms, is it worth trying:
knackered
shattered
puggled
deid

I’m sure the results are massively influenced by English being the lingua franca, as well as by English-speaking countries being generally populous, with good access to decent internet and healthcare (apart from America, but white privilege probably means a lot of white people earn enough to buy insurance).
 
Can we maybe get an interim Discussion section, @forestglip? What do you think might be implied from the things you are seeing, and to what degree of probability? What, if you saw it, would make the conclusions stronger? What would falsify some of these premises?
There seem to be a few main possibilities that all seem plausible:
  1. People with British or Irish ancestry have a higher genetic predisposition to ME/CFS.
  2. States with higher British or Irish ancestry have higher levels of an environmental risk factor for ME/CFS (e.g. infections).
  3. People with British or Irish ancestry tend to be diagnosed with ME/CFS at higher rates due to cultural differences.
  4. Populations in states with higher British or Irish ancestry tend to use the term "fatigue" while other states use other terms (something like how some state populations say "soda" and some say "pop"). (Very similar to possibility 3.)
  5. Some other confounder we haven't thought of explains the association.
I really don't know if we can say with any confidence which of these is most likely. I think we have good reason to try to explore the genetic possibility through other avenues, but can't be very certain just based on what we've seen so far.

I haven't thought of how, but I think it'd be good to further explore the possibility of difference in terminology used in different states to describe "fatigue" or "tiredness".

Seeing similar correlations of ancestry with ME/CFS searches in other countries would be interesting, though I'm not sure how much it would help narrow down if it is genetic or something else. If there was no correlation in a different country, that'd probably make an environmental/cultural factor much more likely in the USA.

I'm not really sure what the "optimal" way to test the genetic possibility would be, given unlimited resources. Maybe a population study, similar to Leonard Jason's studies, and in which the researchers diagnose people as part of the study. They would also ask people their ancestry, or better yet, identify everyone's ancestry through genome sequencing, and see if the risk of ME/CFS is higher in those with British or Irish ancestry. If there are cultural/language differences, I think it's possible this might still bias who gets diagnosed in the study, so the researchers would need to be very careful with diagnosis. If there was no difference in incidence/prevalence in such a study, that would probably suggest that it's not genetic. If there was a difference, there'd still be the question of whether it's due to environmental or genetic factors.

Maybe it could be tested with the DecodeME data. Even though everyone has very similar ancestry in that study, there will probably still be some differences, so maybe they could test if those people with greater similarity to a reference British population have higher risk. I have no idea how feasible that is.

I guess there's also the question of what the implications would be if there was a higher genetic propensity in those with British or Irish ancestry. It's not obvious to me how knowing this would help with identifying causes. Maybe prioritizing studying these populations? I guess we lucked out that DecodeME was done in the UK, in case there actually is a higher genetic risk in these ancestries.



* Also, I think I have been using the wrong term, since Irish ancestry seems to also be correlated. I was using "British" because of these populations being from the "British Isles", but maybe "British or Irish" would be better.
 
I haven't thought of how, but I think it'd be good to further explore the possibility of difference in terminology used in different states to describe "fatigue" or "tiredness".
I did a couple of little tests of the idea that more formal terminology could explain the phenomenon: I tested trends for heart attack vs cardiac infarction, and for short sighted vs myopia. Neither matched the ME/CFS fatigue-vs-tired pattern. Not an exhaustive test of the idea.
 
I wonder what would happen if you compared the results for people reporting British/Irish ancestry with those reporting Scandinavian, and those reporting German?

Genetic markers associated with Scandinavian heritage seem to be retained at high levels in British/Irish people, possibly due to the fact that their ancestors lived on islands. So theoretically, British/Irish ought to be more similar to Scandinavian than German.

It holds true if you believe the results of those ancestry DNA companies—though they never say quite how they can tell the difference or how reliable it is.
 
Those companies constantly update their information as well. I’ve had several updates and at one time was showing some German ancestry which then disappeared!

For me it’s not that serious to do the “where am I from” DNA test because I knew it already, it was confirmed by the DNA test, and I’m not especially bothered if I’m 2% German or 12% English and 33% Scottish and it changes to 5% English and no German and 40% Scottish…

I can see how some might get obsessed with these small changes in mapping if they’ve based their identity on the test results.
 
I wonder what would happen if you compared the results for people reporting British/Irish ancestry with those reporting Scandinavian, and those reporting German?
I'm not sure if this is what you mean, but I previously tested all ancestries' correlations with ME/CFS search interest: https://www.s4me.info/threads/a-crumb-of-a-clue-on-epidemiology.49455/post-683712

It seems that proportion reporting Scandinavian, Swedish, Norwegian, or German are all a lot less correlated to ME/CFS searches than proportion reporting British or Irish is. Scandinavian seems to have a small correlation without controlling for any confounders, but not with controlling. Norwegian and German seem to possibly have small negative correlations when controlling for confounders.

"Northern European" has a high positive correlation though.
 
One thing I was thinking about is that all the British and Irish ancestries probably tend to be in the same states, so, for example, is Welsh correlated with ME/CFS searches just because Scottish is, with the same states having high populations of both?

I tried doing a regression with all of English, Scottish, Welsh, and Irish at the same time to see if any are significant when controlling for the others:
Screenshot from 2026-04-13 09-30-59.png

They're all a lot less significant, maybe partly because of high multicollinearity (high correlation between predictors), though Scottish and Irish squeeze just under p<0.05.

I also tried this at the "Metro" level (groups of counties), as I did previously just for Scottish, in case the larger number of regions allows for detecting associations even if there is multicollinearity.
Screenshot from 2026-04-13 09-42-47.png

Interestingly, in this case, all four are significantly correlated with ME/CFS searches even when controlling for the other three ancestries - except English is negatively correlated, while Scottish, Welsh, and Irish are positively correlated.

Edit: Take this with a grain of salt, since I'm not yet well-versed in linear regression diagnostics, and the diagnostics in the summary for the metro model seem to indicate non-normal residuals, which might be making the results less reliable.
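One common residual-normality diagnostic in regression summaries is the Jarque-Bera statistic, which combines the skewness and kurtosis of the residuals. I am assuming here that this is the kind of diagnostic meant above; the post does not say which one flagged the problem. A minimal standard-library sketch, with made-up residuals:

```python
from math import exp
from statistics import fmean


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value, skewness, kurtosis)."""
    n = len(residuals)
    mean = fmean(residuals)
    centered = [r - mean for r in residuals]
    m2 = fmean([c ** 2 for c in centered])  # population variance
    m3 = fmean([c ** 3 for c in centered])
    m4 = fmean([c ** 4 for c in centered])
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                     # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    p_value = exp(-jb / 2.0)
    return jb, p_value, skew, kurt


# Hypothetical residuals with a long right tail, NOT from the metro model.
example = [-1.2, -0.8, -0.5, -0.4, -0.1, 0.0, 0.2, 0.3, 0.6, 4.5]
jb, p, skew, kurt = jarque_bera(example)
print(f"JB = {jb:.2f}, p = {p:.4f}, skew = {skew:.2f}, kurtosis = {kurt:.2f}")
```

The chi-square approximation is only asymptotic, so the p-value is rough for small samples. Non-normal residuals mainly affect the reliability of p-values and confidence intervals rather than the coefficient estimates themselves.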
 
It seems that proportion reporting Scandinavian, Swedish, Norwegian, or German are all a lot less correlated to ME/CFS searches than proportion reporting British or Irish is.

Yes, it is odd. The reason I asked is that DNA markers sometimes show low proportions of Irish ancestry in families who're Irish-born at least as far back as all eight great-grandparents—as many as two thirds of their markers can be Scandinavian. That sort of picture doesn't seem to be uncommon, but perhaps the markers don't have an influence on health (or more likely, don't mean anything at all).

Anyway, if I had to put a bet on your results, I'd still plump for the outcome being influenced more by cultural factors than genetic ones. Online searches are a cultural activity and could be influenced by all kinds of filters and biases.
 
Should we be thinking more about migraines? Autoimmune disease is often discussed as also having a similar sex bias to ME/CFS. But migraine fits that picture as well, and does not really seem to be in the same category as autoimmunity.

The large scale correlation search with other trends scores that I did highlighted "migraine aura" as being highly correlated to "chronic fatigue syndrome" searches at the metro level. Here is the relationship plotted for both the metro and state levels, showing that where searches are higher for CFS, they are also higher for migraine aura.


Since sex is an obvious confounder here, I tried controlling for sex, and also for all of the confounders I looked at previously at the same time (at the state level). Controlling for these doesn't make the relationship disappear.




Study I found from a quick search:

Sex and gender differences in migraines: a narrative review (Neurological Sciences, 2022)


Edit: Fixed plot that wasn't showing up.
For migraine, the very high correlation I saw with "chronic fatigue syndrome" was with the search term "migraine aura". "Migraine symptoms" is also high, just not as high.

In terms of sex bias, Wikipedia says that migraine without aura is where there is a significantly larger risk in females, while there is little difference for migraine with aura. So the clue about sex bias supporting the connection between the search terms might not be as strong as I thought.
A population-based study in Denmark suggests the sex difference in attack frequency is largely due to higher rates of migraine without aura (11% in females and 3.59% in males). In comparison, sex differences were not significant in migraine with aura (1.72% in females and 1.58% in males).
 
They're all a lot less significant, maybe partly because of high multicollinearity (high correlation between predictors), though Scottish and Irish squeeze just under p<0.05.
Yes, that would be my guess. The standard errors getting inflated is a warning sign. Did you also confirm linearity between the independent and dependent variables?

Lasso regression with all the available ethnicities might be the best way forward (cv.glmnet in R is what I usually use)
 
Did you also confirm linearity between the independent and dependent variables?
Just visually looking at fitted values vs residuals, with a univariate model for each of the four variables, there don't seem to be any huge departures from linearity. There are some outliers that might be skewing the slope.
1776108212103.png

Maybe not the cleanest line for Welsh. Here's the raw Welsh proportion plotted against the ME/CFS search interest:
1776108458008.png

Lasso regression with all the available ethnicities might be the best way forward (cv.glmnet in R is what I usually use)
I don't know a lot about this, but my understanding is Lasso regression wouldn't necessarily help tease apart which specific ancestries are or are not responsible for ME/CFS search interest, but rather help create a model that doesn't have a huge number of parameters. So even if Scottish and Irish both have a true underlying correlation with searches, it might drop one if they both provide more or less the same information about searches due to being collinear. Seems useful for building a predictive model, if that's what we wanted, but I'm not sure how much it can tell us about the relationships.
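To make the point about collinear predictors concrete, here is a small pure-Python coordinate-descent Lasso. It is a sketch for illustration, not a replacement for cv.glmnet: there is no cross-validation, and predictors are only centred, whereas glmnet also standardises them by default. It minimises (1/2n) * ||y - Xb||^2 + lam * ||b||_1, which as far as I know matches glmnet's Gaussian objective with alpha = 1. The data are made up.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(columns, y, lam, sweeps=200):
    """Coordinate-descent Lasso on centred data.

    columns: list of predictor columns (each a list of floats, non-constant).
    Returns one coefficient per column; the intercept is handled by centring.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual for beta = 0
    cols = [[v - sum(c) / n for v in c] for c in columns]
    beta = [0.0] * len(cols)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            norm = sum(v * v for v in col) / n
            # Covariance of this column with the residual that excludes
            # the column's own current contribution.
            rho = sum(v * r for v, r in zip(col, resid)) / n + norm * beta[j]
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta:
                resid = [r - delta * v for r, v in zip(resid, col)]
                beta[j] = new
    return beta


# Hypothetical, strongly correlated predictors, NOT the real data.
scottish = [1.0, 2.1, 1.4, 2.4, 1.9, 3.0, 2.0, 3.3]
irish = [8.0, 10.0, 9.5, 12.0, 9.0, 14.0, 11.0, 13.0]
mecfs_index = [40.0, 55.0, 45.0, 62.0, 52.0, 70.0, 56.0, 78.0]

for lam in (0.0, 0.5, 2.0):
    coefs = lasso([scottish, irish], mecfs_index, lam)
    print(f"lam = {lam}: Scottish = {coefs[0]:.3f}, Irish = {coefs[1]:.3f}")
```

In the extreme case where two columns are identical, the first one visited takes all the weight and the second stays at zero, even though both carry the same information. That is the sense in which a zeroed coefficient says little about which ancestry "really" matters.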
 