A crumb of a clue on epidemiology

The catch-all term Google Trends is showing me now is "CFS/ME", and the subscript the browser shows under that term is no longer "Topic"; it now says "Disability".
Yeah, mine has always said "Disability". In a video about Trends, they seem to imply that any word under the term other than "Search term" means a Topic.

Interesting that your R2 is higher.

Just to clarify, the trends page you're using says "CFS/ME" and not "Myalgic encephalomyelitis/chronic fatigue syndrome"? Can you link to that page?

And you're doing a correlation against the percentages from the World Population Review website?
 
Will get you that information in a bit.
In the meantime, I've also found that the Australian census has extremely detailed ancestry and place-of-birth data, combined with some health data. It won't be definitive because ME/CFS is not listed, but I will pre-specify the hypothesis I will set out to test:

There is a stronger association between English heritage and "other health conditions" than between other heritages and "other health conditions", with extra weight placed on the association among females.

Associations with other health conditions (e.g. stroke, diabetes, mental health) will serve as a baseline.

And after doing any coldly rational Bonferroni adjustments, I will probably also do some highly motivated data mining!
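For reference, a Bonferroni adjustment multiplies each p-value by the number of tests m (capping at 1), which is equivalent to testing each one at α/m. A minimal Python sketch; the p-values are invented placeholders, not census results:

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply each by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values, alpha=0.05):
    """True/False per test after Bonferroni adjustment at family-wise level alpha."""
    return [p_adj < alpha for p_adj in bonferroni(p_values)]


# Hypothetical p-values for, say, four heritage groups vs "other health conditions".
example = [0.004, 0.03, 0.2, 0.8]
print(bonferroni(example))
print(significant(example))
```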
 
I have enjoyed drilling into the Australian census data: it's telling me a lot about the health risks of Englishness! But not much about ME/CFS.

At first I thought the "other health issue" category correlated quite well with English heritage...

But then I checked and it probably wasn't even as strong as the association with arthritis.

So then I got a bit more systematic and looked at what was really most associated with English heritage, controlling for age. It turns out to be early-onset arthritis, and also early-onset mental health issues. (English heritage is strongly negatively correlated with kidney disease, apparently.)
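The post doesn't say exactly how age was controlled for. One standard approach is a first-order partial correlation, which removes the linear effect of age from both variables before correlating them. A minimal Python sketch under that assumption; `partial_corr` and the toy data are illustrative, not the actual census analysis:

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Correlation of x and y with the linear effect of z (e.g. age) removed
    from both: (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data: both variables rise with age, but their age-independent parts are
# uncorrelated, so the raw correlation is positive while the partial is ~0.
age = [1, 2, 3, 4]
heritage = [2, 1, 2, 5]
condition = [0, 5, 0, 5]
print(pearson(heritage, condition), partial_corr(heritage, condition, age))
```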

It is fascinating, and it certainly seems to support the idea that people from certain backgrounds can be more or less prone to certain diseases, but it doesn't shed light on ME/CFS directly, nor does it say why. So I think it is worth leaving it there, unless anyone has any hypotheses they would like me to dig into? Feel free to ask!
 
I think what could be interesting is looking for the best correlations of ME/CFS searches with other searches. For example, if the states that search most for ME/CFS also search most for "mono".

There's no publicly available API for Google Trends, though, so data would have to be downloaded one at a time through the browser. The idea I had was to start by looking at the correlation of ME/CFS searches with searches for 10 random words. For the most correlated search term, find 10 concepts related to that search term, with something like a thesaurus, and test correlation with those to see if any are even better. And keep iterating.

But I think without being able to automate doing that for thousands of search terms, it might be too slow to be fruitful. But maybe still worth testing with some hand selected terms.
I managed to do this, using gtrendsR, as suggested by Murph. I decided to have an AI code basically the whole script, and I am pleasantly surprised by how well it works.

Essentially, the script is testing how well search interest in "chronic fatigue syndrome" correlates to search interest for other terms. I chose the specific search term "chronic fatigue syndrome" (without quotes) because I worry the ME/CFS Topic includes unrelated terms, and searches for the specific term "ME/CFS" are much less common than for "chronic fatigue syndrome" (henceforth referred to as CFS).

I decided to use the scores for metro regions instead of states because there are more of them (~210 vs 51), which makes the statistics more precise and allows comparing a larger variety of regions.
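To put a rough number on "more precise": a standard textbook approximation is that Fisher's transform of a correlation, z = artanh(r), has a standard error of about 1/√(n−3). With ~210 metros versus 51 states, that gives:

```latex
\mathrm{SE}(z) \approx \frac{1}{\sqrt{n-3}}:\qquad
\frac{1}{\sqrt{210-3}} \approx 0.070 \ \text{(metros)}, \qquad
\frac{1}{\sqrt{51-3}} \approx 0.144 \ \text{(states)}
```

So, all else being equal, confidence intervals on metro-level correlations are roughly half as wide as on state-level ones.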

The algorithm used is as follows:
  1. I have a huge list of words as "seeds". The script starts with a seed word, say "elephant", and then tests how well search interest correlates between "elephant" and CFS.
  2. If this initial correlation passes a lenient threshold of p<0.01, then the script identifies 10 related search terms. Google Trends helpfully provides related terms, which are accessible in the gtrendsR results.
    • For example, for "elephant", here are the first 3 related terms it returns: "white elephant", "baby elephant", "drunk elephant".
  3. Then the script tests correlation of CFS with each of those terms, and identifies the most significant correlation with CFS out of these 10 terms (let's say "baby elephant"), and if that term is also more significant than the "parent" term ("elephant"), then 10 more related terms are retrieved that are related to this new term.
    • For example, related terms for "baby elephant" include: "baby elephant baby shower", "baby elephants", "baby elephant walk"
  4. Then the script tests correlation of CFS with these 10 new terms, and on and on, until none of the 10 new terms is more significant than its parent term.
  5. Then the script goes on to the next seed word and starts all over. The correlation statistics for each term are saved to a file.
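The steps above can be sketched in Python roughly as follows. This is not the actual script (which was written in R with gtrendsR); `score` and `related` are hypothetical stand-ins for the download-and-correlate step and the related-queries lookup, and the toy p-values come from the "kidney infection" log in this post:

```python
from typing import Callable, Dict, List, Optional


def explore_seed(
    seed: str,
    score: Callable[[str], Optional[float]],
    related: Callable[[str], List[str]],
    threshold: float = 0.01,
    max_children: int = 10,
) -> Dict[str, Optional[float]]:
    """Greedy descent from one seed term.

    `score(term)` returns the p-value of the term's correlation with the
    target term, or None when there is too little data. `related(term)`
    returns related search terms. Both are hypothetical stand-ins.
    """
    results: Dict[str, Optional[float]] = {}
    parent_p = score(seed)
    results[seed] = parent_p
    # Only expand seeds that pass the lenient threshold.
    if parent_p is None or parent_p >= threshold:
        return results

    parent = seed
    while True:
        # Score up to `max_children` related terms of the current parent.
        scored = []
        for child in related(parent)[:max_children]:
            p = score(child)
            results[child] = p
            if p is not None:
                scored.append((p, child))
        if not scored:
            break  # no valid children (e.g. too little data)
        best_p, best = min(scored)
        if best_p >= parent_p:
            break  # best child did not beat its parent
        parent, parent_p = best, best_p  # descend one level
    return results


# Toy data taken from the "kidney infection" log (subset of children).
P = {
    "kidney infection": 2.3e-03,
    "kidney infection pain": 3.1e-09,
    "kidney infection back pain": 1.7e-09,
    "uti": 4e-01,
    "kidney infection back pain location": None,
}
REL = {
    "kidney infection": ["kidney infection pain", "kidney infection back pain", "uti"],
    "kidney infection back pain": ["kidney infection back pain location"],
}
print(explore_seed("kidney infection", P.get, lambda t: REL.get(t, [])))
```

Because each descent step requires a strictly smaller p-value, the loop cannot revisit a term and always terminates.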
Here, for example, is the log for two seed words (at this point, the seed list was specifically a list of common diseases):
Code:
=== Seed 2/201: 'uveitis' ===
  [depth 0] 'uveitis'
    r=0.565  p=5.5e-19  n=209
  [depth 1] Scoring 10 children of 'uveitis'
    'uveitis eye'
      r=0.483  p=2.2e-12  n=188
    'anterior uveitis'
      r=0.371  p=1.8e-06  n=157
    'uveitis symptoms'
      r=0.463  p=1.3e-09  n=155
    'uveitis treatment'
      r=0.307  p=3.6e-04  n=131
    'what is uveitis'
      r=0.401  p=1.5e-06  n=135
    'uveitis causes'
      r=0.365  p=8.9e-05  n=110
    'uveitis dogs'
      r=0.324  p=1.8e-03  n=90
    'iritis uveitis'
      r=0.033  p=7.6e-01  n=91
    'uveitis pain'
      r=0.060  p=6.3e-01  n=68
    'uveitis in dogs'
      r=-0.043  p=7.4e-01  n=62
    Best child p=2.2e-12 did not beat parent p=5.5e-19, stopping
 
=== Seed 12/201: 'kidney infection' ===
  [depth 0] 'kidney infection'
    r=0.210  p=2.3e-03  n=209
  [depth 1] Scoring 10 children of 'kidney infection'
    'kidney infection symptoms'
      r=0.334  p=7.5e-07  n=209
    'kidney infection pain'
      r=0.395  p=3.1e-09  n=209
    'uti'
      r=0.058  p=4e-01  n=209
    'uti kidney infection'
      r=0.269  p=8.5e-05  n=208
    'symptoms of kidney infection'
      r=0.325  p=1.7e-06  n=208
    'kidney infection back pain'
      r=0.402  p=1.7e-09  n=208
    'back pain'
      r=0.269  p=8.3e-05  n=209
    'kidney stones'
      r=0.315  p=3.5e-06  n=209
    'kidney infection signs'
      r=0.363  p=7.2e-08  n=208
    'signs of kidney infection'
      r=0.384  p=1.1e-08  n=207
    -> Descending into 'kidney infection back pain' (p=1.7e-09)
  [depth 2] Scoring 1 children of 'kidney infection back pain'
    'kidney infection back pain location'
      r=NA  p=NA  n=0
    No valid children at depth 2, stopping

For "uveitis", none of the 10 related terms was more significantly correlated than "uveitis" to CFS, so it stopped there. For "kidney infection", the related term "kidney infection back pain" was even more significant, so it identified the related terms to this new term, of which there was only one, and for that term, there was too little data to run the correlation test.

It takes about 2 to 5 seconds per term, and if it retrieves too many results too quickly, it starts returning errors.
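One common way to cope with such errors is to retry with exponential backoff. A generic Python sketch; `fetch` is a hypothetical zero-argument function wrapping whatever call downloads one term's data, and the delays are guesses rather than values from the actual script:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying on any exception with exponentially growing,
    slightly jittered waits (about 5 s, 10 s, 20 s, ... by default)."""
    for attempt in range(max_tries):
        try:
            return fetch()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise RuntimeError("max_tries must be at least 1")
```

Passing `sleep` in as a parameter keeps the function easy to test without real waiting.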

So far, I've tested the correlation of CFS with around 4800 other terms. I've used some seed words that are totally random words, some that are diseases, and some that are body parts.

Here are the top 50 most significant correlations so far. (I'll attach a file with the full results.)
| Term | R | R² | P value | Number of metro areas tested | Search depth | Parent term |
|---|---|---|---|---|---|---|
| chronic fatigue syndrome | 0.982 | 0.965 | 2.61E-152 | 209 | 2 | fibromyalgia syndrome |
| fibromyalgia syndrome | 0.771 | 0.595 | 2.67E-41 | 203 | 1 | fibromyalgia |
| chronic pain | 0.746 | 0.556 | 2.46E-38 | 209 | 3 | chronic fatigue syndrome |
| migraine aura | 0.741 | 0.549 | 1.82E-37 | 208 | 1 | migraine |
| dog food recall | 0.729 | 0.532 | 5.98E-36 | 209 | 1 | recall |
| ocular migraine | 0.728 | 0.530 | 9.25E-36 | 209 | 1 | retinal migraine |
| chronic fatigue symptoms | 0.730 | 0.532 | 7.54E-35 | 202 | 3 | chronic fatigue syndrome |
| plantar warts | 0.720 | 0.518 | 1.22E-34 | 209 | 2 | warts |
| Raynaud's | 0.716 | 0.513 | 3.62E-34 | 209 | 0 | |
| raynaud's | 0.716 | 0.513 | 3.62E-34 | 209 | 0 | |
| food recall | 0.713 | 0.508 | 9.85E-34 | 209 | 1 | recall |
| rotator cuff injury | 0.712 | 0.506 | 2.04E-33 | 208 | 2 | rotator cuff |
| cross stitch pattern | 0.709 | 0.503 | 3.21E-33 | 209 | 2 | cross stitch |
| tendonitis treatment | 0.711 | 0.505 | 3.56E-33 | 207 | 1 | tendonitis |
| adult adhd | 0.708 | 0.501 | 4.28E-33 | 209 | 2 | adhd test |
| celiac disease symptoms | 0.708 | 0.501 | 4.82E-33 | 209 | 2 | celiac disease |
| hip arthritis | 0.706 | 0.498 | 7.97E-33 | 209 | 1 | arthritis |
| celiac disease | 0.705 | 0.497 | 1.01E-32 | 209 | 1 | disease |
| kidney disease symptoms | 0.704 | 0.496 | 1.30E-32 | 209 | 1 | chronic kidney disease |
| ms multiple sclerosis | 0.702 | 0.492 | 2.63E-32 | 209 | 1 | multiple sclerosis |
| symptoms lactose intolerance | 0.700 | 0.489 | 4.73E-32 | 209 | 1 | lactose intolerance |
| lewy body dementia | 0.699 | 0.489 | 5.10E-32 | 209 | 1 | dementia |
| pneumonia vaccine | 0.692 | 0.479 | 3.99E-31 | 209 | 1 | pneumonia |
| assisted suicide | 0.692 | 0.478 | 4.62E-31 | 209 | 1 | assisted |
| trigeminal neuralgia | 0.688 | 0.473 | 1.25E-30 | 209 | 0 | |
| raynauds | 0.687 | 0.472 | 1.99E-30 | 208 | 1 | raynaud's |
| elbow tendonitis | 0.685 | 0.469 | 4.10E-30 | 208 | 1 | tendonitis |
| migraine symptoms | 0.682 | 0.466 | 5.48E-30 | 209 | 2 | ocular migraine |
| fibromyalgia pain | 0.682 | 0.465 | 5.99E-30 | 209 | 1 | fibromyalgia |
| food allergies | 0.681 | 0.464 | 7.44E-30 | 209 | 1 | food allergy |
| cross stitch patterns | 0.681 | 0.464 | 7.74E-30 | 209 | 2 | cross stitch |
| onion recipes | 0.681 | 0.464 | 8.17E-30 | 209 | 1 | onion |
| disease | 0.681 | 0.464 | 8.18E-30 | 209 | 0 | |
| liver failure | 0.679 | 0.460 | 1.49E-29 | 209 | 1 | liver |
| autoimmune disease | 0.678 | 0.460 | 1.79E-29 | 209 | 1 | disease |
| free cross stitch | 0.677 | 0.458 | 3.22E-29 | 208 | 2 | cross stitch |
| celiac disease test | 0.678 | 0.460 | 5.67E-29 | 205 | 2 | celiac disease |
| dog treat recipes | 0.679 | 0.461 | 6.41E-29 | 204 | 2 | dog treat |
| big toe joint | 0.673 | 0.453 | 1.59E-28 | 206 | 1 | big toe |
| knit blanket | 0.669 | 0.447 | 1.97E-28 | 209 | 1 | blanket |
| shoulder rotator cuff | 0.668 | 0.446 | 2.52E-28 | 209 | 2 | rotator cuff |
| bursitis hip | 0.665 | 0.442 | 4.63E-28 | 209 | 1 | bursitis |
| posts | 0.665 | 0.442 | 4.85E-28 | 209 | 0 | |
| winter scenes | 0.671 | 0.451 | 8.28E-28 | 202 | 1 | scenes |
| thumb joint | 0.662 | 0.438 | 9.98E-28 | 209 | 4 | thumb pain |
| spleen pain | 0.662 | 0.438 | 1.09E-27 | 209 | 1 | spleen |
| bursitis | 0.659 | 0.434 | 2.01E-27 | 209 | 0 | |
| fatigue symptoms | 0.659 | 0.434 | 2.29E-27 | 209 | 3 | chronic fatigue syndrome |
| symptoms of celiac disease | 0.662 | 0.438 | 3.58E-27 | 205 | 2 | celiac disease |
| spleen symptoms | 0.657 | 0.432 | 4.51E-27 | 208 | 1 | spleen |
Search depth of 0 indicates that this was a seed term. 1 means it was a related term of the seed word, and so on.
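For reference, the R² column is simply r², and for a Pearson correlation the usual p-value comes from a t statistic with n-2 degrees of freedom. This assumes the script used the standard test (which is what R's `cor.test` does by default); the worked numbers use the "fibromyalgia syndrome" row (r = 0.771, n = 203):

```latex
R^2 = r^2 = 0.771^2 \approx 0.594, \qquad
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.771\sqrt{201}}{\sqrt{1-0.771^2}} \approx 17.2
```

The table's R² of 0.595 differs from 0.594 only because r is rounded to three decimals.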

Notably, the correlation of CFS with itself is not 1. This is because the search score data isn't the same every time, as previously discussed. But reassuringly, searches for fibromyalgia are most significantly correlated with CFS, which makes sense.

Among the top correlations are the very confusing "dog food recall", "food recall", and "cross stitch pattern".

Some of the diseases highly correlated in search interest to CFS are "ocular migraine", "plantar warts", "Raynaud's", and "tendonitis".

It is interesting that multiple sclerosis is highly correlated to CFS here, but was not when I tested previously. I think that is because I previously tested the ME/CFS "Topic" vs the multiple sclerosis "Topic". We know that the Multiple Sclerosis Topic includes the short abbreviation "MS", since Mississippi (abbreviation MS) has extremely high scores, so it's possible that searches for "MS" make the Topic scores less true to searches specifically about multiple sclerosis.

Here are plots for a couple of the most significant terms. (Note that the data in the plots may not be identical to that used in the initial correlations because they are based on re-downloaded data.)
[Images: plots for two of the most significant terms]

Some potential other things to try:
  • Use states or countries instead of metros.
  • Set target term to "chronic fatigue" or the ME/CFS Topic.
  • Test correlations using Spearman's rho instead of Pearson.
I uploaded the script to GitHub if anyone else wants to try experimenting with it.

Now just need to figure out what ME/CFS, migraines, and cross stitching have in common...

Edit: The time span used for all trends data was 2004-01-01 to 2026-03-24.
 

The correlation of English ancestry with early-onset arthritis in Murph's Australian census analysis, and the many diseases highly correlated with ME/CFS in my US Google Trends analysis, made me wonder whether places with more people of British ancestry have higher rates of many diseases, whether for genetic or diagnostic reasons, with ME/CFS being just one among many.

I tested correlation of search interest in various diseases by state against English and Scottish ancestry, as we did before for ME/CFS and ancestry. I picked a few diseases at random, some diseases that were highly correlated to ME/CFS searches above, and arthritis ("early onset arthritis" doesn't have enough data in Google Trends).
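A minimal sketch of this kind of state-level correlation, assuming two dicts keyed by state abbreviation (the numbers are placeholders, not real Trends scores or census figures). States missing from either source are dropped before correlating:

```python
import math
from typing import Dict, Tuple


def pearson_by_state(scores: Dict[str, float], ancestry: Dict[str, float]) -> Tuple[float, int]:
    """Pearson r between two state-keyed datasets, using only states present
    in both. Returns (r, number of states used)."""
    states = sorted(scores.keys() & ancestry.keys())
    x = [scores[s] for s in states]
    y = [ancestry[s] for s in states]
    n = len(states)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy), n


# Placeholder values for illustration only.
trends = {"ME": 100, "VT": 90, "NH": 85, "TX": 40, "CA": 35}
english = {"ME": 20.0, "VT": 18.0, "NH": 17.5, "TX": 8.0, "CA": 7.0, "HI": 5.0}
r, n = pearson_by_state(trends, english)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}, states = {n}")
```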

And yet, ME/CFS search interest still has the top correlation out of all these diseases, with both ancestries. It's possible there are other diseases that might do even better, so if anyone has any suggestions, I can test them.


* Capitalized disease names are Google Trends "Topics" that include related terms, and lower case names are just for the specific search term.

Edit: Updated to include chronic fatigue, chronic pain, and fibromyalgia. "chronic fatigue" is a higher correlation than ME/CFS for English ancestry.

Edit: We can see here how much the Google Trends numbers change when re-downloading the same data at a different time. R^2 for the same Scottish ancestry data correlated against "chronic fatigue" was previously 0.60, but here, with newly downloaded chronic fatigue data, it was 0.46.
 
Type 1 narcolepsy?

Relatively low correlations of Scottish or English ancestry with these four search terms:

[Images: correlations of Scottish and English ancestry with the four search terms]

Note that the data for "type 1 narcolepsy" and "type 2 narcolepsy" might not be very reliable, because there looks to be very little search volume for these, and unlike most searches, states only have one of three values for these two searches: 100, <1, and No Data. I replaced <1 with 0 in the analysis.
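A small sketch of that cleaning step, assuming the downloaded state scores arrive as strings such as "100", "<1", or an empty/"No Data" cell (the exact CSV layout is an assumption). It also includes a hypothetical check for terms whose scores take only a handful of distinct values, like the two narcolepsy searches:

```python
from typing import Iterable, Optional


def parse_trends_score(cell: str) -> Optional[int]:
    """Convert one Google Trends score cell to a number.

    "<1" becomes 0 (as in the analysis above); a blank or "No Data" cell
    becomes None so the state can be excluded rather than treated as zero.
    """
    cell = cell.strip()
    if cell == "" or cell.lower() == "no data":
        return None
    if cell == "<1":
        return 0
    return int(cell)


def few_distinct_values(cells: Iterable[str], min_distinct: int = 4) -> bool:
    """Flag a term whose parsed scores take fewer than `min_distinct` distinct
    values (e.g. only 100 and <1); the cutoff is an arbitrary choice."""
    values = {v for v in map(parse_trends_score, cells) if v is not None}
    return len(values) < min_distinct
```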

Here for example is "type 2 narcolepsy" vs Scottish ancestry:
[Image: "type 2 narcolepsy" vs Scottish ancestry]
 
I tested the above suggestions, alongside ME/CFS, "tired", "always tired", and "fatigue" as well, for English, Scottish, and Irish ancestry.

I sorted by Pearson, but also tested Spearman to see whether large outliers are masking an association. That seems to be the case for the Multiple sclerosis Topic: Mississippi is an outlier there, and with Spearman, MS becomes the highest correlation with Irish ancestry.
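Spearman's rho is just Pearson's r computed on ranks (tied values share the average of their ranks), which is why a single extreme state matters much less. A self-contained sketch with made-up numbers, where one outlier flips the sign of Pearson's r but not of Spearman's rho:

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: y rises with x except for one extreme outlier at the low end.
ancestry = list(range(1, 11))
searches = [100, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(pearson(ancestry, searches), spearman(ancestry, searches))
```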

These are all newly downloaded Trends scores, so they may not match exactly to previous results.

Also note that there is very little variety in search scores for the "haemochromatosis" spelling (the only scores are either 100, 50, or <1), so that one is probably not very reliable.

And again, the terms that are capitalized are "Topics", so they contain multiple related terms.

So at least some of these seem to produce the expected strong correlations, especially the Irish/hemochromatosis relationship.

It's interesting that "chronic fatigue" has a large correlation with English and Scottish ancestry, but "tired" and "always tired" do not. Maybe this suggests it is more about the language used in states with higher British ancestry?
 
Something weird seems to be going on with West Virginia (WV). I keep seeing WV listed as the number one ranked state by search interest, or at least highly ranked, for various diseases.

To examine this more systematically, I downloaded the Google Trends state scores for the 28 common conditions listed under A, B, and C on the NHS website. And I also downloaded the first 30 words listed in a file of common English words, like "about", "business", and "click".

Here is WV's ranking plotted for search interest for each of the common words. WV is in red, and dots at the top are ranked higher (more searches for this term). The ranking of WV looks more or less random for these common words.
[Image: WV's rank among states for each of the 30 common words]

On the other hand, this is WV's search interest among all states for 28 common health conditions:
[Image: WV's rank among states for each of the 28 common health conditions]
Is West Virginia a particularly unhealthy state? Why are they searching so much for all sorts of diseases?



* Note: For terms where WV was tied with another state, the order of the tied states in the plot is arbitrary. (E.g. If WV was tied with 2 other states for rank 1, then it could be at position 1, 2, or 3.)
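Because of ties, WV's position for a given term is really a range of possible ranks. A sketch of how that range can be computed from one term's state scores (made-up values; `rank_range` is a hypothetical helper, not part of any library):

```python
from typing import Dict, Tuple


def rank_range(scores: Dict[str, float], state: str) -> Tuple[int, int]:
    """Best and worst possible rank (1 = highest search interest) for
    `state`, given that states with equal scores can appear in any order."""
    mine = scores[state]
    higher = sum(1 for v in scores.values() if v > mine)
    tied = sum(1 for v in scores.values() if v == mine)  # includes `state`
    return higher + 1, higher + tied


example = {"WV": 100, "KY": 100, "AR": 100, "ME": 92, "CA": 40}
print(rank_range(example, "WV"))
```

For the case in the note (WV tied with two other states for first place), this gives (1, 3).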

Trends scores are based on the time span 2004-01-01 to 2026-03-24. For example, "adhd" scores were downloaded from this URL: https://trends.google.com/trends/explore?date=2004-01-01 2026-03-24&geo=US&q=adhd
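The same URL pattern can be generated for other terms. A sketch using Python's standard `urllib.parse.urlencode` with `quote_via=quote`, which percent-encodes spaces (including the one in the date range) as `%20`; `trends_explore_url` is a hypothetical helper name:

```python
from urllib.parse import quote, urlencode

BASE = "https://trends.google.com/trends/explore"


def trends_explore_url(term: str, geo: str = "US",
                       start: str = "2004-01-01", end: str = "2026-03-24") -> str:
    """Build a Google Trends explore URL for one search term, following the
    pattern of the adhd example. Spaces and slashes are percent-encoded."""
    params = {"date": f"{start} {end}", "geo": geo, "q": term}
    return f"{BASE}?{urlencode(params, quote_via=quote)}"


print(trends_explore_url("adhd"))
```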
 
Google AI says West Virginia is very distinct in relation to health outcomes and health provision compared to other states, with high per capita expenditure but some of the worst outcomes of all fifty states. It has the highest levels of obesity and drug/alcohol abuse and health provision deserts in its extensive rural areas. See https://www.wboy.com/news/west-virg...lso ranked,deaths per 100,000 state residents).

[Drastically edited to comply with our use of AI rules]
 
Thanks! I should have done a quick search because just seeing the search results for "west virginia unhealthy" makes it very clear this probably is actually related to their health.

I thought it might be an issue with Google Trends, but I guess this instead provides more support for the idea that Trends scores at least somewhat track prevalence.

For "ME/CFS" (specific search term), WV ranks 22, and for "chronic fatigue syndrome", rank is 4.
 
Just a quick check, are these trends consistent for post exertional malaise instead of ME/CFS?

Yes, it looks comparable.

The Post-exertional malaise topic (capitalized above) has a large correlation, but who knows what terms this encompasses. The specific term "post exertional malaise" is lower, but still high.

Honestly, the idea that the correlation of ME/CFS searches with British ancestry just reflects the type of language used in states with higher British ancestry seems like a strong possibility. I don't see why a population suffering from high rates of "chronic fatigue" (based on high search interest) wouldn't also be searching at similarly higher rates for "always tired" or "tired", as these are much more common words that people would probably search first.

Edit: Here's "post exertional malaise" vs Scottish:
[Image: "post exertional malaise" vs Scottish ancestry]
 