A crumb of a clue on epidemiology

Yes, on Google Trends, you can pick Metro or City for the subregion option.

Metro does make it more fine-grained into 210 areas, but the problem is that it becomes a lot harder to try to correlate it with other variables, since I think it'd be hard to find various stats like average income or ancestry subdivided in this way.

There's also the City option, but it looks like only 10 cities in the US have enough data to show.
I’m just coming out with random stuff now, isn’t a lot of Canada Scottish as well?
 

Yeah, it looks to be higher than in the US:

Scottish Canadians
13.9% of the total Canadian population (2016)

Scottish Americans
8,422,613 (3.6%) Scottish alone or in combination

Also higher when considering the more general British ancestry:

British Canadians
32.5% of the total Canadian population (2016)

British Americans
18.4% of the total US population

If British ancestry were a risk factor for ME/CFS, then presumably we'd see a larger prevalence in Canada. I don't know if we have any good studies on that.

But if we are considering the Google Trends data for the past 22 years, the search interest in Canada for ME/CFS is only barely higher than in the US (scores are 18 vs. 16), which seems to go against the idea of British propensity for ME/CFS.
 
One slightly frustrating thing about Google Trends is that the data does not appear to be consistent if downloaded on different days, even if representing the same time span and search term.

The 22-year data for ME/CFS I was using was downloaded on 2026-03-24, and represents the time span of 2004-01-01 to 2026-03-24. I re-downloaded the data for the same time span today, and it is not the same. Of course, I probably shouldn't have included the present date of March 24 within the range of the time span for the original download, as the day wasn't done yet, but the additional few hours shouldn't meaningfully change the results that represent 22 years of data.

Others have commented about the inconsistency elsewhere, with the explanation given that the Trends data is not based on all Google searches, but is instead based on a relatively small sample. On a different day, the search interest for a topic could have been recalculated with a different sample, changing the results.

Thankfully, the values don't change by a huge amount. Here I have plotted the data I used for the previous analyses based on the 22 years of ME/CFS trends data, against the data I have re-downloaded today based on the same time span.

1775235204677.png

It's highly correlated, so shouldn't change results too much, but I wanted to note this to avoid confusion in case anyone follows the links to the Trends data I provided and sees that what it shows doesn't exactly match what I described in my posts.
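For anyone wanting to run the same kind of check on their own downloads, here is a minimal sketch of comparing two state-level exports. The state values below are made up for illustration (they are not the real Trends numbers), and `compare_downloads` is just a hypothetical helper name.

```python
import pandas as pd

# Hypothetical search-interest scores for a handful of states.
# These are NOT the real Google Trends values.
first_download = pd.Series(
    {"Utah": 100, "Vermont": 92, "Oregon": 88, "Texas": 41, "Florida": 47}
)
second_download = pd.Series(
    {"Utah": 100, "Vermont": 89, "Oregon": 90, "Texas": 43, "Florida": 45}
)


def compare_downloads(a, b):
    """Summarise how closely two downloads of the same query agree."""
    # Keep only states present in both downloads, in the same order.
    a, b = a.align(b, join="inner")
    return {
        "pearson_r": a.corr(b),
        # Spearman is the Pearson correlation of the ranks.
        "spearman_r": a.rank().corr(b.rank()),
        "max_abs_diff": (a - b).abs().max(),
    }


summary = compare_downloads(first_download, second_download)
print(summary)
```

If the rank correlation between downloads stays high, rank-based analyses such as Spearman should be fairly robust to the resampling.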
 
Moving right along in our data mining operation...

CDC Wonder is a website which provides access to several interesting public health datasets. I downloaded age-standardized rates of causes of death grouped by state from the Multiple Cause of Death 1999 - 2020 dataset.

This dataset gives the rate of a cause of death in a state whether it was the single "underlying" cause of death, or if it was one of up to 20 additional contributing causes of death listed on the death certificate.

After filtering to only causes which all states had data for, I was left with 598 causes.

I tested Spearman correlation between each cause of death and the Google Trends ME/CFS data (using the same dataset I was using previously: 2004/01/01-2026/03/24), both with and without covariates added for potential confounders.

I added a few more covariates for this analysis that I didn't use previously: proportion of state that has never smoked, sex distribution, healthcare access (proportion who have been to the doctor in the past year) and age. Sources for all covariates:
  • Sex ratio
    • DP05, Estimate!!SEX AND AGE!!Total population!!Sex ratio (males per 100 females)
  • Age
    • DP05, Estimate!!SEX AND AGE!!Total population!!Median age (years)
  • Education
    • DP02, Percent!!EDUCATIONAL ATTAINMENT!!Population 25 years and over!!Bachelor's degree or higher
  • Internet access
    • DP02, Percent!!COMPUTERS AND INTERNET USE!!Total households!!With a broadband Internet subscription
  • Language spoken at home
    • DP02, Percent!!LANGUAGE SPOKEN AT HOME!!Population 5 years and over!!English only
  • Income
    • S1903, Estimate!!Median income (dollars)!!HOUSEHOLD INCOME BY RACE AND HISPANIC OR LATINO ORIGIN OF HOUSEHOLDER!!Households
  • Rurality
    • P2, proportion calculated from (!!Total:!!Rural)/(!!Total: )
  • Healthcare access
    • BRFSS, About how long has it been since you last visited a doctor for a routine checkup? Within the past year
  • Smoking
    • BRFSS, Smoker Status - Never smoked
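As a rough illustration of what a Spearman correlation "with covariates" can mean, here is a sketch of one common approach: rank-transform everything, regress the covariate ranks out of both variables, and correlate the residuals (a partial rank correlation). Treat it as an assumption that this mirrors the analysis in this post; the data is synthetic and `partial_spearman` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def partial_spearman(x, y, covariates):
    """Spearman correlation between x and y after removing the part of each
    that is linearly explained by the rank-transformed covariates."""
    data = pd.concat([x.rename("x"), y.rename("y"), covariates], axis=1).dropna()
    ranked = data.rank()  # average ranks are used for ties
    design = np.column_stack(
        [np.ones(len(ranked)), ranked[list(covariates.columns)].to_numpy()]
    )

    def residuals(target):
        # Least-squares fit of the target ranks on the covariate ranks.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return target - design @ coef

    rx = residuals(ranked["x"].to_numpy())
    ry = residuals(ranked["y"].to_numpy())
    return float(np.corrcoef(rx, ry)[0, 1])


# Synthetic illustration (not real state data): one confounder drives both variables.
rng = np.random.default_rng(0)
confounder = pd.Series(rng.normal(size=200))
searches = confounder + 0.3 * pd.Series(rng.normal(size=200))
death_rate = confounder + 0.3 * pd.Series(rng.normal(size=200))

raw_rho = searches.rank().corr(death_rate.rank())
adjusted_rho = partial_spearman(
    searches, death_rate, confounder.to_frame("confounder")
)
print(f"raw rho = {raw_rho:.2f}, partial rho = {adjusted_rho:.2f}")
```

In this toy example the raw correlation is high purely because of the shared confounder, and it largely disappears once the confounder is adjusted for. The sketch returns only the coefficient; a p-value would need the degrees of freedom reduced by the number of covariates.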

These are the 15 most significant Spearman correlations for cause of death vs. ME/CFS searches in a state, ranked by p-value controlling for covariates:
| # | Cause of death | Spearman R (Univariate) | P value (Univariate) | Spearman R (with covariates) | P value (with covariates) | P value (with covariates, Bonferroni) | P value (with covariates, FDR) |
|---|---|---|---|---|---|---|---|
| 1 | Muscular dystrophy (G71.0) | 0.534 | 5.4E-05 | 0.606 | 2.1E-05 | 0.0125 | 0.0125 |
| 2 | Alcohol, unspecified (T51.9) | 0.508 | 1.4E-04 | 0.579 | 5.9E-05 | 0.0352 | 0.0176 |
| 3 | Poisoning by and exposure to other and unspecified drugs, medicaments and biological substances, undetermined intent (Y14) | 0.387 | 5.0E-03 | 0.558 | 1.2E-04 | 0.0735 | 0.0245 |
| 4 | Perforation of intestine (nontraumatic) (K63.1) | 0.496 | 2.1E-04 | 0.521 | 4.0E-04 | 0.2397 | 0.0599 |
| 5 | Rheumatic heart disease, unspecified (I09.9) | 0.513 | 1.2E-04 | 0.496 | 8.3E-04 | 0.4938 | 0.0854 |
| 6 | Malignant melanoma of skin, unspecified - Malignant neoplasms (C43.9) | 0.616 | 1.5E-06 | 0.495 | 8.6E-04 | 0.5123 | 0.0854 |
| 7 | Other and unspecified narcotics (T40.6) | 0.314 | 2.5E-02 | 0.488 | 1.0E-03 | 0.6181 | 0.0883 |
| 8 | Crohn disease, unspecified (K50.9) | 0.596 | 3.9E-06 | 0.479 | 1.3E-03 | 0.8000 | 0.1000 |
| 9 | Vascular disorder of intestine, unspecified (K55.9) | 0.400 | 3.6E-03 | 0.474 | 1.5E-03 | 0.9046 | 0.1005 |
| 10 | Methadone (T40.3) | 0.496 | 2.1E-04 | 0.467 | 1.8E-03 | 1 | 0.1084 |
| 11 | Motor neuron disease (G12.2) | 0.654 | 1.9E-07 | 0.463 | 2.0E-03 | 1 | 0.1084 |
| 12 | Sequelae of complications of surgical and medical care, not elsewhere classified (T98.3) | 0.437 | 1.4E-03 | 0.456 | 2.4E-03 | 1 | 0.1101 |
| 13 | Acute vascular disorders of intestine (K55.0) | 0.237 | 9.5E-02 | 0.456 | 2.4E-03 | 1 | 0.1101 |
| 14 | Other synthetic narcotics (T40.4) | 0.278 | 4.8E-02 | 0.448 | 3.0E-03 | 1 | 0.1190 |
| 15 | Sequelae of surgical and medical procedures as the cause of abnormal reaction of the patient, or of later complication, without mention of misadventure at the time of the procedure (Y88.3) | 0.408 | 2.9E-03 | 0.444 | 3.3E-03 | 1 | 0.1190 |

Only two causes were significant after strict Bonferroni correction with a 0.05 threshold: muscular dystrophy and alcohol. Poisoning (Y14) was also below the 0.05 threshold after FDR correction.
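For reference, here is a small sketch of how the two corrections work. The FDR function uses the Benjamini-Hochberg procedure, which is an assumption about which FDR method was used here. With 598 tests, the Bonferroni value for the top hit is roughly 2.1E-05 × 598 ≈ 0.0126, in line with the 0.0125 in the table given that the displayed p-value is rounded.

```python
import numpy as np


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the i-th smallest p-value
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value down
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


# Made-up p-values for illustration only
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
print(bonferroni(example_p))
print(benjamini_hochberg(example_p))
```

Note that the smallest p-value gets the same adjusted value under both methods unless a later value pulls it down, which is why the first row of the table shows identical Bonferroni and FDR values.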

Alcohol is interesting because it aligns with the results of the previous analysis where we found high correlations with drugs meant for reducing alcohol cravings.

Here is a map showing rates for muscular dystrophy, which had the largest correlation. It doesn't show the distinct grouping of high values seen in both upper corners for both ME/CFS searches and British ancestry.
33Rvq-muscular-dystrophy-g71.0-age-adjusted-rate-.png

And a plot of ME/CFS search trends vs. muscular dystrophy rate (note the stats in the corner are univariate Pearson stats):
1775249001619.png

Edit: I attached a spreadsheet with correlation results for all causes.
 

Fascinating discussion.
I tried to look for cases of different drugs correlated in the same direction and which treat the same disease:
Interesting to see losartan (and the other "sartans") in the negative correlation. I think they may have some effect in lowering or preventing increases in TGF-B? Which if I have understood correctly is one of the few somewhat consistent ME/CFS immune findings.

This almost certainly has nothing to do with losartan being "protective" against ME/CFS, and is possibly related to HBP being common in people with diabetes, which as noted earlier is more prevalent in the US South.

But since we are discussing crumbs of clues...
 
As a sensitivity check to see how much the changes in Google Trends data downloaded on different days would affect the correlations, I redid the above cause of death correlation analysis, but with the re-downloaded Google Trends ME/CFS data for the same time span (2004/01/01 - 2026/03/24).

These are, again, correlations of state search interest for ME/CFS with rates for state-wide cause of death.

There were some shifts in results, but it's still very similar. Muscular dystrophy became less significant, but death due to alcohol was still Bonferroni significant.

| # | Cause of death | Spearman R (Univariate) | P value (Univariate) | Spearman R (with covariates) | P value (with covariates) | P value (with covariates, Bonferroni) | P value (with covariates, FDR) |
|---|---|---|---|---|---|---|---|
| 1 | Alcohol, unspecified (T51.9) | 0.538 | 4.6E-05 | 0.607 | 2.0E-05 | 0.012 | 0.012 |
| 2 | Sequelae of surgical and medical procedures as the cause of abnormal reaction of the patient, or of later complication, without mention of misadventure at the time of the procedure (Y88.3) | 0.428 | 1.7E-03 | 0.535 | 2.6E-04 | 0.156 | 0.042 |
| 3 | Sequelae of complications of surgical and medical care, not elsewhere classified (T98.3) | 0.453 | 8.6E-04 | 0.535 | 2.6E-04 | 0.158 | 0.042 |
| 4 | Rheumatic heart disease, unspecified (I09.9) | 0.549 | 3.0E-05 | 0.533 | 2.8E-04 | 0.168 | 0.042 |
| 5 | Muscular dystrophy (G71.0) | 0.441 | 1.2E-03 | 0.522 | 3.9E-04 | 0.236 | 0.043 |
| 6 | Intentional self-poisoning by and exposure to other and unspecified drugs, medicaments and biological substances (X64) | 0.477 | 4.0E-04 | 0.514 | 5.0E-04 | 0.296 | 0.043 |
| 7 | Vascular disorder of intestine, unspecified (K55.9) | 0.394 | 4.2E-03 | 0.513 | 5.2E-04 | 0.309 | 0.043 |
| 8 | Other and unspecified narcotics (T40.6) | 0.375 | 6.7E-03 | 0.507 | 6.1E-04 | 0.364 | 0.043 |
| 9 | Motor neuron disease (G12.2) | 0.659 | 1.5E-07 | 0.501 | 7.2E-04 | 0.431 | 0.043 |
| 10 | Other synthetic narcotics (T40.4) | 0.329 | 1.8E-02 | 0.501 | 7.3E-04 | 0.434 | 0.043 |
| 11 | Heroin (T40.1) | 0.399 | 3.8E-03 | 0.497 | 8.0E-04 | 0.481 | 0.044 |
| 12 | Methadone (T40.3) | 0.509 | 1.3E-04 | 0.484 | 1.2E-03 | 0.704 | 0.056 |
| 13 | Poisoning by and exposure to other and unspecified drugs, medicaments and biological substances, undetermined intent (Y14) | 0.320 | 2.2E-02 | 0.482 | 1.2E-03 | 0.727 | 0.056 |
| 14 | Embolism and thrombosis of unspecified vein (I82.9) | 0.439 | 1.3E-03 | 0.462 | 2.1E-03 | 1 | 0.075 |
| 15 | Crohn disease, unspecified (K50.9) | 0.548 | 3.1E-05 | 0.462 | 2.1E-03 | 1 | 0.075 |

Edit: Also note that since these correlations are based on any of multiple causes from a death certificate, some of the different high correlations, such as "intentional self-poisoning" and "poisoning", could be largely describing the same deaths.
 
It's kind of difficult to see the actual pattern among states on the Google Trends map for ME/CFS, so I made a map with much more contrast.

IFnn7-google-trends-myalgic-encephalomyelitis-chronic-fatigue-syndrome-1-1-04-3-24-26-.png

The two upper corners of the country seem to have the highest scores for ME/CFS searches.
Intriguing...I found another variable (with the help of some brainstorming with AI) that clusters in the two upper corners: higher physical activity.

- https://www.worldlifeexpectancy.com/explore/usa/physical-activity-regular/map
1775355787655.png

The data from that map is at this link. I did a quick linear regression with the trends data, without any covariates, and R^2 is 0.21. So that's still well short of the R^2 of 0.35-0.5 we were getting with Scottish/English ancestry correlated with trends. But maybe there's something here, and maybe a better physical activity metric would correlate better.

It could theoretically make sense. More physical activity could increase risk of someone with underlying risk of ME/CFS getting their first PEM.

This was with 2023 physical activity data using the variable "Percent of adults who achieve at least 150 minutes a week of moderate-intensity aerobic physical activity or 75 minutes a week of vigorous-intensity aerobic activity".

I also first tried the regression with 2015 data using a slightly different variable that doesn't include the vigorous activity part, and the result was pretty much the same with R2=0.22.
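In case it helps anyone reproduce this kind of number, here is a minimal sketch of getting R^2 from a one-predictor linear regression. The values are invented for illustration and are not the actual activity or Trends data.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of an ordinary least-squares line fitted to y ~ x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)  # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical values: percent of adults meeting the activity guideline,
# and a made-up search-interest score, for eight imaginary states.
activity = [45.0, 50.0, 52.0, 55.0, 58.0, 60.0, 62.0, 65.0]
search_interest = [40.0, 35.0, 60.0, 42.0, 70.0, 50.0, 55.0, 80.0]

r2 = r_squared(activity, search_interest)
pearson_r = np.corrcoef(activity, search_interest)[0, 1]
print(f"R^2 = {r2:.3f}, Pearson r = {pearson_r:.3f}")
```

With a single predictor, R^2 is just the square of the Pearson correlation, so an R^2 of 0.21 corresponds to |r| of about 0.46.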
 
It occurs to me that the p-values I've reported for the correlations are probably not totally valid, since the observational units being tested here (states) are not really independent from each other. For example, states near each other will tend to be more similar to each other than to far away states for various metrics, which would skew p-values down.

But we can focus on the magnitude of the correlations to at least probe possible connections to ME/CFS searches.

That correlation with British ancestry is so interesting to me because it is remarkably high for being based on a hunch. I really want to see if it's possible to identify why they're correlated. Whether it's genetics, awareness, or something else, I feel like there should be some way to figure it out. My best idea at this point is testing correlations with lots of other variables to try to find something even more strongly correlated than ancestry.
 
Where are the best known ME/CFS and LC research groups, clinics and specialists?

Has someone already checked that? I've been following the thread from the start but can't remember.
I appreciate we might have to compile a list of people/institutions/businesses and corresponding locations first.
 
This analysis is a good one, but it's also an idea I was trying to overcome at the very start.

It's easy to say Norway and the UK google me/cfs because they have higher awareness and more clinicians, and stop trying to untangle things there.

But we need to iterate our logic - awareness, researchers and clinicians may be more likely to spring up where prevalence is higher.

This is why I'm disinclined to definitively conclude Utah's high rate of me/cfs googling is due only to the location of the Bateman Horne centre. Like, yes, obviously the centre affects the results, but it's important not to stop the train of thought there.
 
I see and I don't disagree.

I was actually wondering if the centres, doctors and researchers were significantly contributing to the searches. So, if you have something like BHC, you have tens of employees who are googling ME/CFS stuff daily. They might be interested in e.g. cholesterol levels but to get better results would probably google "cholesterol ME/CFS" rather than just "cholesterol".
 
That is a good point and the smaller the geography the more impact this could have. e.g. maybe wouldn't swing NY but could affect Utah!
 
Is it too much of a stretch to think about the DecodeME genetic regions? I know they’re not specific but are any of them correlated with diseases that are high among say, Northern European or British?

Also, how recent/strong would the Scots/English ancestry need to be? 3 generations? Six?
 
Tested the hypothesis that perhaps liver damage might predispose a person to mecfs; noting the correlation with alcohol addiction medications seen by FG above.

But I found no correlation between 2023 alcohol consumption per capita and mecfs searches.
That's surprising, since the correlation seemed large for both prescription of alcohol addiction drugs as well as deaths due to alcohol from a totally different source. Not only that, but I also ran correlations against around 2800 more variables from a third source, the Correlates of State Policy dataset, which I hadn't posted yet. And wouldn't you know it, an alcohol-related variable was number one.

First, looking at CDC Wonder for deaths data again, just to make sure I didn't mess anything up. This one's easier to check since it's just the raw data from the site, as opposed to the Medicare prescription data which I calculated using two variables.

I think maybe it didn't make sense for me to use the age-adjusted death rate, since we want the actual rate of deaths due to alcohol if we're comparing to the actual rate of Google searches. Anyway, the result seems about the same either way.

Here are the steps I used to get the crude rate per 100,000 for "T51.9 (Alcohol, unspecified)" as a "Multiple cause of death":
  1. Go to: https://wonder.cdc.gov/mcd-icd10.html
  2. Press I Agree at the bottom of the page.
  3. Change the values in the first section ("1. Organize table layout") under Group Results By to State for the first dropdown and Multiple Cause of Death for the second dropdown.
  4. Check Age Adjusted Rate if you want that as well, but Crude rate should be fine, and this is provided by default.
  5. In section 7 ("Select multiple cause of death"), use the tool to select "T51.9 (Alcohol, unspecified)" whether by using the Browse tab or the Search tab, then with it highlighted, click "Move Items Over".
  6. In section 8 ("Other options")
    1. Check "Export Results" to directly download the file, or uncheck to first see the data
    2. Uncheck "Show Totals"
    3. Set Precision to 9 decimal places.
  7. Press Send.
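Once exported, the file can be loaded along these lines. This is only a sketch: it assumes the export is tab-delimited with a leading "Notes" column, footer rows that have no state, and non-numeric flags such as "Suppressed" or "Unreliable" in the rate column. The column names and sample rows below are made up for illustration and should be checked against the actual download.

```python
import io

import pandas as pd

# Hypothetical excerpt imitating the export layout; the numbers are invented.
SAMPLE_EXPORT = (
    '"Notes"\t"State"\t"State Code"\t"Deaths"\t"Population"\t"Crude Rate"\n'
    '\t"Alabama"\t"01"\t120\t4900000\t2.4\n'
    '\t"Alaska"\t"02"\tSuppressed\t730000\tSuppressed\n'
    '\t"Arizona"\t"04"\t15\t7200000\tUnreliable\n'
    '"---"\n'
    '"Dataset: Multiple Cause of Death, 1999-2020"\n'
)


def load_crude_rates(source):
    """Return a Series of crude rates per 100,000, indexed by state."""
    df = pd.read_csv(source, sep="\t", dtype=str)
    # Footer/notes rows have no state, so drop them.
    df = df[df["State"].notna()]
    # Flags like "Suppressed" or "Unreliable" become NaN.
    rates = pd.to_numeric(df["Crude Rate"], errors="coerce")
    return rates.set_axis(df["State"]).rename("crude_rate")


crude_rates = load_crude_rates(io.StringIO(SAMPLE_EXPORT))
print(crude_rates)
```

States with suppressed or unreliable rates end up as missing values, so they would need to be dropped (or handled some other way) before correlating with the Trends data.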

And here is the plot of deaths due to alcohol vs. ME/CFS search interest:

It's a weird looking plot since it looks like it diverges into two lines, but there's a moderate relationship there with R2=0.16.

Ok, so now the third dataset I tested, Correlates of State Policy. I downloaded the 3000 variable dataset from this link, then ran this code to filter, for each variable, to the most recent year that had data for at least 45 states and at least 2 unique values. This resulted in 2795 variables.
Python:
from mecfs_trends.settings import DATA_PATH  # project-local module that defines DATA_PATH
import pandas as pd

df = pd.read_csv(DATA_PATH / "correlates_state_policy/raw/correlates.csv")
df = df.set_index("state")

# Identifier columns that are not variables to test
id_cols = ["st", "stateno", "state_fips", "state_icpsr", "year"]
cols = [c for c in df.columns if c not in id_cols]

# One sub-frame per year, ordered from most recent to oldest
year_groups = {
    year: group[cols]
    for year, group in sorted(df.groupby("year"), key=lambda kv: kv[0], reverse=True)
}

# For each variable, keep the most recent year that has data for at least
# 45 states and at least 2 unique values
series_list = []
for col in cols:
    for year, year_df in year_groups.items():
        s = pd.to_numeric(year_df[col], errors="coerce")
        if s.notna().sum() >= 45 and s.nunique() >= 2:
            series_list.append(s.rename(f"{col}_{year}"))
            break

result = pd.concat(series_list, axis=1)
result.to_csv(DATA_PATH / "correlates_state_policy/cleaned/correlates_best_year.csv")
result  # display the filtered table (notebook)

Here are the top 20 correlations based on Pearson, no covariates:

The names of the variables have to be looked up in their Codebook. The year after the variable name was added by me, indicating which year of the variable I used.
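For anyone wanting to reproduce the ranking step, here is a sketch of correlating every column of a filtered table against the state-level search interest and sorting by absolute Pearson r. The data below is synthetic, and the variable names and `rank_by_pearson` helper are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_by_pearson(variables, target, top_n=20):
    """Pearson r of each column with the target, sorted by absolute value."""
    # corrwith aligns on the index (states) and ignores missing values pairwise
    r = variables.corrwith(target).dropna()
    order = r.abs().sort_values(ascending=False).index
    return r.reindex(order).head(top_n)


# Synthetic stand-ins for 50 states; none of this is real data.
states = [f"state_{i:02d}" for i in range(50)]
rng = np.random.default_rng(1)
trends = pd.Series(np.arange(50, dtype=float), index=states)
variables = pd.DataFrame(
    {
        "strong_pos_2015": 2.0 * trends + 1.0,
        "strong_neg_2010": -trends
        + pd.Series(rng.normal(scale=5.0, size=50), index=states),
        "noise_2019": pd.Series(rng.normal(size=50), index=states),
    }
)

top = rank_by_pearson(variables, trends)
print(top)
```

Sorting by absolute value keeps strong negative correlations near the top of the list alongside the positive ones.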

So the highest correlation is calcdist from 2015. In the codebook, the definition given is "Sum of 6 alcohol distribution variables = cbret + cbwhol + cwret + csret + cwwhol + cswhol". So this is a sum of these variables:
cbret: Exclusive state control of retail sales of some types of beer. 0 = no, 0.5 = only some very high-alcohol beers (>12% ABV), 1 = yes, 5 = total prohibition; +1.5 = near-absolute on-premises prohibition, +0.5 = on-premises limitation (restaurants, bars, private clubs)

cbwhol: Exclusive state control of wholesale sales of some types of beer. 0 = no, 0.5 = only some very high-alcohol beers (>12% ABV/W), 1 = yes; +1.5 = near-absolute on-premises prohibition, +0.5 = on-premises limitation (restaurants, bars, private clubs)

cwret: Exclusive state control of retail sales of some types of wine. 0 = no, 1 = yes; +1.5 = near-absolute on-premises prohibition, +0.5 = on-premises limitation (restaurants, bars, private clubs)

csret: Exclusive state control of retail sale of some types of spirits. 0 = no, 1 = yes; +1.5 = near-absolute on-premises prohibition, +0.5 = on-premises limitation (restaurants, bars, private clubs)

cwwhol: Exclusive state control of wholesale sale of some types of wine. 0 = no, 1 = yes; +1.5 = near-absolute on-premises prohibition, +0.5 = on-premises limitation (restaurants, bars, private clubs)

cswhol: Exclusive state control of wholesale sale of some types of spirits. 0 = no, 1 = yes; +1.5 = near-absolute on-premises prohibition, +0.5 = on-premises limitation (restaurants, bars, private clubs)
I guess higher values mean more government control over alcohol sales, and the higher this is, the higher the searches for ME/CFS. I suppose if there are more deaths due to alcohol, there would be more government regulation. Here is the plot of that calcdist variable vs ME/CFS search interest:
1775515226977.png

The second highest is cwret, which is specifically government retail wine restrictions. And the third highest is ruse, which is defined as "Past month use of any illicit drug (percentages)", which also seems to be a similar topic to alcohol.

I'll see if I can find more datasets related to alcohol to test. Which one did you use for alcohol consumption @Murph? Maybe the variables I found to be correlated highlight more problematic alcohol use (needing treatment, dying, government regulation), while alcohol consumption per capita might include lots of non-problematic use.

Edit: Attached all results from Correlates of State Policy correlations.

Also, I think there is an issue with the variables that have "racep" in the name, from the CSP dataset. The definitions say they are about test scores for children of a given race, but the values seem strangely similar to the proportion of a state that is that race.

For example, r4_racep_white_perc is defined as: "This variable captures fourth grade reading scores among Whites at the state-year level."

Here is the 2014 data for that variable:
1775568527977.png

The ordering of states and values seem close, though not identical, to the proportion of each state that is white: https://en.wikipedia.org/wiki/List_of_U.S._states_by_non-Hispanic_white_population
 

Liver is a major immune-metabolic-endocrine organ and it could make sense that a compromised liver would make a person more prone to acquiring longlasting effects from an infection.

Certainly @mariovitali's AI analyses implicate the liver. And, n=1, my own student lifestyle immediately prior to coming down with me/cfs was not one of temperance, propriety and moderation. Rather the opposite. Furthermore, at the risk of drawing on stereotypes, a common thread between the Scandinavians and the English might be a fondness for a drop (although the Germans probably drink too).

A counterpoint would be that mecfs hits adolescents, who are presumably not drinking and have no liver function risk factors.

I admit I was expecting a correlation when I set off to download that alcohol consumption data; perhaps that dataset is no good? There's a risk of double jeopardy, though, if you go looking for a different dataset every time you don't get the info you expect - we have to be driven by evidence in the end!
 
Is it too much of a stretch to think about the DecodeME genetic regions? I know they’re not specific but are any of them correlated with diseases that are high among say, Northern European or British?
I was wondering about the genetics too. One way I was thinking of going about it was, if we assume the DecodeME regions cause ME/CFS, and if we assume being British increases risk of having ME/CFS, then we might expect that one or more of the DecodeME variants might be found more often in British people than people of other ancestries, as an explanation for the increased risk for people with British ancestry.

Here are the top DecodeME loci. The A1FREQ column shows the frequency of the variant in the cohort, so basically this is the frequency in British people.
1775519749334.png

I grabbed allele frequencies for other ancestries from the linked gnomAD pages for each of the 8 variants (sorted from most to least significantly associated with ME/CFS):
| Variant | Effect for ME/CFS | United Kingdom Allele Frequency (A1FREQ from DecodeME) | Admixed American Allele Frequency | African/African American Allele Frequency |
|---|---|---|---|---|
| 20:48914387:T:TA | Increase risk | 0.634 | 0.6957 | 0.8852 |
| 17:52183006:C:T | Increase risk | 0.330 | 0.2401 | 0.06611 |
| 6:26239176:A:G | Increase risk | 0.261 | 0.3147 | 0.05821 |
| 15:54866724:A:G | Increase risk | 0.312 | 0.2493 | 0.5059 |
| 1:173846152:T:C | Protective | 0.325 | 0.3689 | 0.7528 |
| 6:97984426:C:CA | Protective | 0.546 | 0.6495 | 0.5220 |
| 13:53194927:GT:G | Increase risk | 0.287 | 0.1635 | 0.04824 |
| 12:118202773:C(T^13):C | Increase risk | 0.139 | 0.1855 | 0.1672 |

I chose two other ancestries to compare to, African since there seemed to be a negative correlation with African ancestry in earlier analyses which might suggest they have very low risk, and American since a high proportion of the US population reported this ancestry, but there was no correlation of American ancestry with ME/CFS searches.

So if the ancestry effect is due to one of these variants, we should expect that the frequency would be higher than in other ancestries for a variant that increases risk of ME/CFS, and lower for a variant that is protective.

I made the text green for the three variants that seem to follow this pattern. For example, the chr17 variant (near the CA10 gene) increases risk of ME/CFS and has a frequency of 33% in the UK, 24% in Americans, and 7% in Africans/African Americans.
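To make the pattern check explicit, here is a small sketch that applies the rule to the frequencies in the table. The rule as coded (UK frequency higher than both comparison groups for risk variants, lower than both for protective ones) is my reading of the pattern described above; it picks out three variants, but whether they are exactly the ones highlighted in green is an assumption.

```python
# (variant, effect, UK frequency, Admixed American, African/African American)
# Values copied from the table above.
variants = [
    ("20:48914387:T:TA", "risk", 0.634, 0.6957, 0.8852),
    ("17:52183006:C:T", "risk", 0.330, 0.2401, 0.06611),
    ("6:26239176:A:G", "risk", 0.261, 0.3147, 0.05821),
    ("15:54866724:A:G", "risk", 0.312, 0.2493, 0.5059),
    ("1:173846152:T:C", "protective", 0.325, 0.3689, 0.7528),
    ("6:97984426:C:CA", "protective", 0.546, 0.6495, 0.5220),
    ("13:53194927:GT:G", "risk", 0.287, 0.1635, 0.04824),
    ("12:118202773:C(T^13):C", "risk", 0.139, 0.1855, 0.1672),
]


def follows_pattern(effect, uk_freq, other_freqs):
    """True if the UK frequency is more extreme than every comparison group
    in the direction that would raise ME/CFS risk."""
    if effect == "risk":
        return all(uk_freq > other for other in other_freqs)
    return all(uk_freq < other for other in other_freqs)


matching = [
    name
    for name, effect, uk, *others in variants
    if follows_pattern(effect, uk, others)
]
print(matching)
```

Adding more gnomAD ancestry groups as extra columns would make the rule stricter, since the UK frequency would then have to be more extreme than every one of them.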

There are also other ancestries to compare to in gnomAD, so this could use a deeper look (for the chr17 variant, only one other ancestry shows a higher frequency than that in DecodeME: Amish).

I don't think there's any smoking gun here, but maybe something to think about. CA10 seems to be one of the more robust findings from DecodeME, since the same variant is found to increase risk of chronic pain, so maybe it plays a big part in ME/CFS. And maybe this pattern of higher frequency in British ancestry combined with the British ancestry/Google searches correlation might further corroborate that.
 
Didn’t DecodeME only use people with European ancestry (or something like that)?

Too foggy to think it all through, but it feels like “group X” not being in the study might make interpreting “this variant is associated with ME/CFS and group X has it less often” more complicated.
 
One way I was thinking of going about it was, if we assume the DecodeME regions cause ME/CFS, and if we assume being British increases risk of having ME/CFS, then we might expect that one or more of the DecodeME variants might be found more often in British people than people of other ancestries, as an explanation for the increased risk for people with British ancestry.
That's unfortunately a bit too confounded by ancestry to begin with--it's a pretty common phenomenon that sets of associated variants or polygenic risk scores translate poorly across groups with different ancestry, even if they all have the same disease. You'll probably find that GWAS from different ancestry cohorts point to similar pathways overall, but each set of hits will be ancestry-specific to a degree because the respective risk variants will be interacting with a different frequency distribution of hundreds of other SNPs to produce the disease state.

So the hits from DecodeME should be interpreted as "variants associated with risk of developing ME/CFS among people with a British ancestry".
 