Preprint Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeMe Collaboration

With the help of @forestglip, I've finally managed to run linkage disequilibrium score regression (LDSC) on the DecodeME results. The original package is written in the now-outdated Python 2, which caused all sorts of errors, so I used the Python package GWASLab, which provides a wrapper function to run the code.
LDSC in gwaslab - GWASLab

As the reference for LD structure we used the European LD scores from 1000 Genomes, which can be downloaded here:

The output looks like this:
h2_obs      h2_se       Lambda_gc   Mean_chi2   Intercept   Intercept_se  Ratio      Ratio_se  Catagories
0.04058038  0.00291692  1.09947605  1.14169279  0.91415933  0.00766967    Ratio < 0  NA        NA

The LDSC paper from 2015 suggests that for binary traits (having ME/CFS or not) h^2 is on the observed scale.
... This relationship holds for meta-analyses, and also for ascertained studies of binary phenotypes, in which case h2 is on the observed scale.
LD Score regression distinguishes confounding from polygenicity in genome-wide association studies - PubMed

So to transform it to the liability scale as reported in the DecodeME paper we have to use this formula.
h^2_liability = h^2_obs * K^2 * (1 - K)^2 / (P * (1 - P) * z^2)
Heritability 201: Types of heritability and how we estimate it — Neale lab

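Where K is the population prevalence, P is the proportion of cases in your GWAS sample, and z is the height of the standard normal density at the liability threshold that corresponds to K. In R this becomes: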
observed_to_liability <- function(h2_obs, K, P) {
  # h2_obs: observed-scale heritability
  # K: population prevalence
  # P: proportion of cases in GWAS sample

  # Calculate threshold corresponding to prevalence
  t <- qnorm(1 - K)

  # Height of standard normal distribution at that threshold
  z <- dnorm(t)

  conversion_factor <- (K * (1 - K))^2 / (P * (1 - P) * z^2)

  h2_liability <- h2_obs * conversion_factor

  return(h2_liability)
}
h2_liab <- observed_to_liability(h2_obs = 0.0405, K = 0.0065, P = 15579/(259909+15579))

In the DecodeME paper they report an h^2 of 0.095. The ME/CFS prevalence that would convert our observed h^2 of 0.0405 to this number is 0.65%. In other words, it seems that the DecodeME paper assumed a prevalence of 0.65% when calculating the heritability.
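The back-calculation of the prevalence can be reproduced numerically. Below is a minimal Python sketch (Python because the LDSC run itself went through GWASLab), assuming the same conversion formula as the R function above. The helper `implied_prevalence` and its bisection bounds are illustrative choices, not something taken from the DecodeME paper.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, K, P):
    """Convert observed-scale h2 to the liability scale (same formula as the R function)."""
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1 - K)  # liability threshold for population prevalence K
    z = std_normal.pdf(t)          # height of the standard normal density at t
    return h2_obs * (K * (1 - K)) ** 2 / (P * (1 - P) * z ** 2)


def implied_prevalence(h2_obs, h2_liab_target, P, lo=1e-4, hi=0.4, tol=1e-9):
    """Find K by bisection; assumes the conversion increases with K on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if observed_to_liability(h2_obs, mid, P) < h2_liab_target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Case and control counts as used in the R call above
P = 15579 / (259909 + 15579)
K = implied_prevalence(h2_obs=0.0405, h2_liab_target=0.095, P=P)
print(f"implied prevalence: {K:.2%}")
```

With the numbers from this post, the result should land close to the 0.65% mentioned above.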

I also tried to calculate the LDSC using only SNPs that had a MAF > 0.05, and using LD data from the UK Biobank, but the results were similar (h^2 = 0.0402 and 0.0405 respectively on the observed scale). If you upload the DecodeME data to BigaGWAS it also gives the same result of h^2 = 0.0405.
 
The intercept of LDSC is often used as a measure of stratification effects or confounding bias. It should be close to 1. If it is substantially higher, it would suggest that population differences between groups are inflating the test statistics. The good news is that this isn't the case in DecodeME!

The LDSC intercept, however, was 0.914, which is substantially smaller than 1. I'm not sure what this means. Perhaps it's because only half of the measured SNPs could be used for imputation, so that the LD in the sample was underestimated? Or perhaps it indicates that the principal components took away more than just population differences but also some real effects of the illness?
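To put a number on "substantially smaller than 1", here is a small Python sketch using the values from the output table above. It relies on the standard LDSC definition of the ratio, (intercept - 1) / (mean chi^2 - 1). The variable names are only illustrative.

```python
# Values copied from the LDSC output table in this post
mean_chi2 = 1.14169279
intercept = 0.91415933
intercept_se = 0.00766967

# LDSC ratio: share of the mean chi-square inflation attributed to the intercept
ratio = (intercept - 1) / (mean_chi2 - 1)

# How many standard errors the intercept lies from the expected value of 1
z_from_one = (intercept - 1) / intercept_se

print(f"ratio = {ratio:.3f}")
print(f"intercept is {z_from_one:.1f} standard errors away from 1")
```

The ratio comes out negative, which matches the "Ratio < 0" entry in the table, and the intercept is roughly 11 standard errors below 1, so the deviation is unlikely to be noise. The original ldsc software annotates a negative ratio as usually indicating genomic-control correction, but this snippet cannot tell whether that applies here.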

I would be interested in hearing if these figures are correct and, if so, what the low intercept might mean @Chris Ponting
 
Some weirder findings are:
  • Never eats dairy products
I remembered that some people with irritable bowel syndrome avoid dairy. So I think the genetic correlation to not eating dairy is related to the genetic correlation to IBS and/or 43% of the DecodeME cohort having IBS.

Prevalence and Presentation of Lactose Intolerance and Effects on Dairy Product Intake in Healthy Subjects and Patients With Irritable Bowel Syndrome, Clinical Gastroenterology and Hepatology, 2013
Methods
Sixty patients diagnosed with D-IBS at the Sir Run Run Shaw Hospital, Hangzhou, China and 60 controls were given hydrogen breath tests to detect malabsorption and intolerance after administration of 10, 20, and 40 g lactose in random order 7–14 days apart; participants and researchers were blinded to the dose. We assessed associations between the results and self-reported lactose intolerance (LI).

Results
Malabsorption of 40 g lactose was observed in 93% of controls and 92% of patients with D-IBS.

Fewer controls than patients with D-IBS were intolerant to 10 g lactose (3% vs 18%; odds ratio [OR], 6.51; 95% confidence interval [CI], 1.38–30.8; P = .008), 20 g lactose (22% vs 47%; OR, 3.16; 95% CI, 1.43–7.02; P = .004), and 40 g lactose (68% vs 85%; OR, 2.63; 95% CI, 1.08–6.42; P = .03). H2 excretion was associated with symptom score (P = .001).

Patients with D-IBS self-reported LI more frequently than controls (63% vs 22%; OR, 6.25; 95% CI, 2.78–14.0; P < .001) and ate fewer dairy products (P = .040).

However, self-reported LI did not correlate with results from hydrogen breath tests.

Diet in subjects with irritable bowel syndrome: A cross-sectional study in the general population, BMC Gastroenterology, 2012
Methods
The cross-sectional, population-based study was conducted in Norway in 2001. Out of 11078 invited subjects, 4621 completed a survey about abdominal complaints and intake of common food items. IBS and IBS subgroups were classified according to Rome II criteria.

Results
IBS was diagnosed in 388 subjects (8.4%) and, of these, 26.5% had constipation-predominant IBS (C-IBS), 44.8% alternating IBS (A-IBS), and 28.6% diarrhoea-predominant IBS (D-IBS).

Low intake of dairy products (portions/day) (Odds Ratio 0.85 [CI 0.78 to 0.93], p = 0.001) and high intake of water (100 ml/day) (1.08 [1.02 to 1.15], p = 0.002), tea (1.05 [1.01 to 1.10], p = 0.019) and carbonated beverages (1.07 [1.01 to 1.14], p = 0.023) were associated with IBS.

A lower intake of dairy products and a higher intake of alcohol and carbonated beverages were associated with D-IBS and a higher intake of water and tea was associated with A-IBS. [...]
 
Yeah. The first thing that happened when I came to a doctor with my weird stomach symptoms (which a year and a half later were diagnosed as ME/CFS) was that I was told to try not eating dairy.
 
Honestly 99% of the overlapping list calculated by Me/cfs science could IMO be attributed to misdiagnosis or clinical constructs which are vague and overlap.
In the DecodeME sample, they made quite an effort with the questionnaires + self-reported clinical diagnosis to ensure patients had ME/CFS. So I don't think it's likely that misdiagnosis would affect the results so much as to create spurious relationships of this magnitude.

Another option is that the categories about depression or anxiety include patients with ME/CFS who were misdiagnosed or had this as a comorbidity. But as you say these clinical constructs are so vague and broad that I think ME/CFS patients would only form a very small subgroup. So that wouldn't explain the correlation either.
 
Yes. That’s what I meant by misdiagnosis: not necessarily in the DecodeME sample, but in the anxiety/depression/IBS samples, since those seemed pretty loose.
 
For example more than 90% of points have an observed -log10 p-value < 4.
In fact, 90% of points would be expected to be below a -log10 p-value of 1.

If looking at the x-axis in the QQ-plot (the expected p-value for each point if it was a null distribution), 90% of points are left of 1 (p>0.1), 99% left of 2 (p>0.01), 99.9% left of 3 (p>0.001), and so on.

I calculated that in reality (the y-axis) there was a slight deviation where 88.6% of points were below -log10 p of 1.
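The expected fractions follow directly from p-values being uniformly distributed under the null: the share of points with -log10(p) below x is 1 - 10^(-x). A minimal Python sketch of that arithmetic (the function name is just for illustration; 0.886 is the observed figure quoted above):

```python
def expected_fraction_below(neg_log10_p):
    """Under the null, p ~ Uniform(0, 1), so P(-log10 p < x) = 1 - 10**(-x)."""
    return 1 - 10 ** (-neg_log10_p)


# Expected share of points left of each x-axis value in the QQ-plot
for x in (1, 2, 3, 4):
    print(f"-log10 p < {x}: {expected_fraction_below(x):.4%} expected")

# Observed share below -log10 p = 1, as quoted in the post above
observed_below_1 = 0.886
shortfall = expected_fraction_below(1) - observed_below_1
print(f"shortfall at x = 1: {shortfall:.1%}")
```

So about 1.4 percentage points more SNPs than expected have p < 0.1, which is the slight deviation mentioned above.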
 
Coming back to OLFM4 and NEGR1, two genes which had significant SNPs close to them in a big depression GWAS:
Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression | Nature Genetics

DecodeME also has SNPs near these genes that came close to reaching the significance threshold of 5*10^-8. The blue dotted line indicates the lead SNP found in the depression GWAS. The DecodeME data has a similar signal, but it's slightly different (as the coloc analysis of DecodeME already showed for OLFM4).

1757863666963.png



1757863673279.png
 
But unfortunately, it looks like you have to request access to get the GWAS data on depression.
It looks like you can download summary stats on the GWAS Atlas website.

I'm not sure if it has data from exactly the same study you linked, but it has several for depression. For example, there's a depression trait which looks like it has NEGR1 as significant. Clicking the link next to "File" gives a 625 MB file with these columns:
chr rsid pos A2 A1 AlleleFreq ImputationAccuracy Beta StandardError P
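For anyone who wants to pull a single region out of such a file, here is a rough Python sketch. It assumes the file is whitespace-delimited with the header shown above and that chromosomes are written as plain numbers; I haven't checked either against the real download. The demo rows are synthetic and `region_hits` is a made-up helper name. The genome build of the file would also need checking before comparing positions with DecodeME.

```python
import io


def region_hits(handle, chrom, start, end, p_max=1.0):
    """Yield rows within chrom:start-end that have P <= p_max.

    Assumes a whitespace-delimited file whose first line is the header.
    """
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        row = dict(zip(header, fields))
        if (row["chr"] == str(chrom)
                and start <= int(row["pos"]) <= end
                and float(row["P"]) <= p_max):
            yield row


# Synthetic example rows (NOT real data), using the column layout from the file
demo = io.StringIO(
    "chr rsid pos A2 A1 AlleleFreq ImputationAccuracy Beta StandardError P\n"
    "1 rs_fake1 1000 A G 0.30 0.99 0.020 0.004 1e-7\n"
    "1 rs_fake2 5000 C T 0.10 0.98 0.001 0.005 0.8\n"
    "2 rs_fake3 1500 G A 0.45 0.97 0.015 0.004 2e-4\n"
)
hits = list(region_hits(demo, chrom=1, start=500, end=2000, p_max=1e-5))
for row in hits:
    print(row["rsid"], row["pos"], row["P"])
```

For the real file you would pass an opened file handle instead of the `io.StringIO` demo, and real coordinates for the window around the gene of interest.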
 
I thought it might be useful to extend this to more loci than the top 8. Supplementary table 3 has the top 25 loci.
I did something similar by looking at SNPs that had a p-value below 5*10^-7 but that didn't appear in the 8 regions that DecodeME already highlighted.

So they were just below the threshold of 5*10^-8 for statistical significance. But because this threshold is a bit rough and arbitrary, it might be useful to look at the signals just below it. Here's what I got:

ID                 p-value   Odds ratio  Frequency  Closest genes
1:69696474:A:G     2.06e-7   1.09        0.18       LRRC7
1:73126414:C:CA    1.19e-7   1.07        0.49       NEGR1, LRRIQ3
1:91028158:C:T     1.89e-7   1.07        0.44       ZNF64, BARHL2
6:4336259:T:C      2.90e-7   0.92        0.19       (unclear)
11:16217844:C:G    1.08e-7   1.12        0.10       SOX6
12:123924955:G:A   2.43e-7   1.07        0.32       CCDC92, DNAH10 (unclear)
17:11325637:G:C    8.25e-8   0.93        0.51       SHISA6
18:53232948:C:T    2.48e-7   1.07        0.53       DCC
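The selection described above (p below 5*10^-7, outside the already highlighted regions) can be sketched in a few lines of Python. This is only an illustration: the three real IDs and p-values are taken from the table, but the excluded region and the two extra variants are hypothetical placeholders, since the coordinates of the 8 DecodeME regions aren't listed in this post. It also doesn't do any clumping to pick one lead SNP per region.

```python
SUGGESTIVE = 5e-7  # cutoff used in the post above


def parse_variant_id(variant_id):
    """Split an ID like '1:69696474:A:G' into (chromosome, position)."""
    chrom, pos, _ref, _alt = variant_id.split(":")
    return chrom, int(pos)


def suggestive_hits(variants, excluded_regions):
    """Keep variants with p < SUGGESTIVE that fall outside all excluded regions.

    variants: iterable of (variant_id, p_value)
    excluded_regions: list of (chrom, start, end)
    """
    kept = []
    for variant_id, p in variants:
        chrom, pos = parse_variant_id(variant_id)
        excluded = any(c == chrom and s <= pos <= e
                       for c, s, e in excluded_regions)
        if p < SUGGESTIVE and not excluded:
            kept.append((variant_id, p))
    return sorted(kept, key=lambda item: item[1])  # smallest p-value first


variants = [
    ("17:11325637:G:C", 8.25e-8),  # from the table above
    ("11:16217844:C:G", 1.08e-7),  # from the table above
    ("1:73126414:C:CA", 1.19e-7),  # from the table above
    ("20:1000000:A:G", 1e-9),      # hypothetical: inside an excluded region
    ("3:2000000:C:T", 0.01),       # hypothetical: not significant
]
excluded_regions = [("20", 500_000, 1_500_000)]  # hypothetical placeholder

result = suggestive_hits(variants, excluded_regions)
for variant_id, p in result:
    print(variant_id, p)
```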

The graphs below show protein-coding genes close to the SNP with the lowest p-value in the new region. The blue dashed line shows the location of that top SNP. I started with a window of 1Mb but in some cases (Chromosomes 12 and 17) I narrowed it down if there were many genes in the region.

CHROMOSOME 1
1757873972429.png

1757873986935.png

1757874011392.png

CHROMOSOME 6
1757874027509.png

CHROMOSOME 11
1757874171806.png

CHROMOSOME 12
1757874194684.png

CHROMOSOME 17
1757874205013.png

CHROMOSOME 18
1757874223702.png
 
Highlighting the gene cards of some of the closest genes that have little competition or that have the top SNP inside it (Forestglip already posted about many of these before).

LRRC7
Predicted to enable protein kinase binding activity. Predicted to be involved in several processes, including establishment or maintenance of epithelial cell apical/basal polarity; positive regulation of neuron projection development; and protein localization to membrane. Located in several cellular components, including centrosome; cytosol; and nucleoplasm. Implicated in cocaine dependence.
LRRC7 Gene - GeneCards | LRRC7 Protein | LRRC7 Antibody

SOX6
This gene encodes a member of the D subfamily of sex determining region y-related transcription factors that are characterized by a conserved DNA-binding domain termed the high mobility group box and by their ability to bind the minor groove of DNA. The encoded protein is a transcriptional activator that is required for normal development of the central nervous system, chondrogenesis and maintenance of cardiac and skeletal muscle cells. The encoded protein interacts with other family members to cooperatively activate gene expression. Alternative splicing results in multiple transcript variants.
SOX6 Gene - GeneCards | SOX6 Protein | SOX6 Antibody

CCDC92 (UNCLEAR)
Enables identical protein binding activity. Predicted to be involved in innate immune response and regulation of defense response to virus. Located in centriole; centrosome; and nucleoplasm.
CCDC92 Gene - GeneCards | CCD92 Protein | CCD92 Antibody

SHISA6
Predicted to enable ionotropic glutamate receptor binding activity. Predicted to be involved in several processes, including excitatory chemical synaptic transmission; modulation of chemical synaptic transmission; and negative regulation of canonical Wnt signaling pathway. Predicted to be located in asymmetric, glutamatergic, excitatory synapse. Predicted to be part of AMPA glutamate receptor complex. Predicted to be active in dendritic spine membrane; postsynaptic density; and postsynaptic membrane.
SHISA6 Gene - GeneCards | SHSA6 Protein | SHSA6 Antibody

DCC
This gene encodes a netrin 1 receptor. The transmembrane protein is a member of the immunoglobulin superfamily of cell adhesion molecules, and mediates axon guidance of neuronal growth cones towards sources of netrin 1 ligand. The cytoplasmic tail interacts with the tyrosine kinases Src and focal adhesion kinase (FAK, also known as PTK2) to mediate axon attraction. The protein partially localizes to lipid rafts, and induces apoptosis in the absence of ligand. The protein functions as a tumor suppressor, and is frequently mutated or downregulated in colorectal cancer and esophageal carcinoma.
DCC Gene - GeneCards | DCC Protein | DCC Antibody
 
As the DecodeME preprint highlights, DCC has repeatedly been associated with chronic pain, including in this GWAS of Analgesic Use
 
I see quite some similarity with this GWAS on multisite chronic pain.
Genome-wide association study of multisite chronic pain in UK Biobank | PLOS Genetics

Heritability is around 10%, MAGMA only points to brain regions, DCC, CA10 and SOX6 are among the significant hits, and the genetic correlation with depression is around 0.5. They conclude:
We identified 76 independent genome-wide significant SNPs associated with MCP across 39 loci. The genes of interest had diverse functions, but many were implicated in nervous-system development, neural connectivity and neurogenesis
A lot of the loci are different from those in DecodeME though. The NEGR1 and OLFM4 loci show more similarity to depression GWAS findings.
 