Preprint Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeMe Collaboration

Blog: DecodeME: the biggest ME/CFS study ever


@ME/CFS Science Blog, spotted a typo: "Rare SNPs might show langer and clearer effects"
Thanks for this piece. Really great! I will definitely come back to this more than once, whenever I've forgotten what had been talked about. What I didn't quite catch in the text is why some of your dots in the graphs are grey and why some are black? I might have missed it in the piece, but if not, I think it could be useful to be included somewhere.

I would think there's a tremendous number of different analyses that can be done to see how closely related ME/CFS is to a different illness in some sense. My impression is that, whilst LDSC gives an indication of the genetic correlation between illnesses, the main reason not to take these data too seriously is not so much that the way diagnoses were recorded can alter the results (which is how I understood the text), but rather that comparing the genetic basis of two illnesses in this way may or may not be appropriate depending on the context. I would imagine that there are illnesses that are entirely different in symptoms and presentation but that share variants in the same region (or, as you mentioned in the text, it matters where the gene is expressed), and illnesses that are similar in symptoms, presentation and underlying biology but that don't share variants in the same region. In short your results may reflect the following: Pleiotropy (shared genes), Shared risk factors, or correlated measurement artifacts without reflecting anything genuinely connecting the mechanisms of illnesses. Of course you know these things much better than me, I just thought the justification to not take the LDSC too seriously was maybe a bit too short for my taste in the text?

I haven't had a closer look, but as I understand it LDSC estimates global correlation rather than anything else. Now it seems to me that this possible error in finding an HLA-association that doesn't exist suggests that there is room for imputation error in lots of different places (presumably only the significant findings were triple checked to hold water?). I find it possible that this additionally means that LDSC can from time to time pick up common noise rather than signal, as long as something is systematically causing such issues?
 
Thanks @EndME
What I didn't quite catch in the text is why some of your dots in the graphs are grey and why some are black?
They are all the same color (gray) but I lowered the opacity so that if you see black ones it means there are multiple dots in a similar place overlapping each other.
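As a rough illustration of why overlapping semi-transparent dots end up looking black: under standard alpha compositing, n identical dots stacked on top of each other have a combined opacity of 1 - (1 - alpha)^n. The sketch below assumes an opacity of 0.3 purely for illustration; the value actually used in the blog's graphs is not stated.

```python
def stacked_opacity(alpha: float, n: int) -> float:
    """Combined opacity of n identical dots drawn on top of each other.

    Each dot lets a fraction (1 - alpha) of the background through,
    so n stacked dots let through (1 - alpha) ** n.
    """
    return 1.0 - (1.0 - alpha) ** n


ASSUMED_ALPHA = 0.3  # hypothetical opacity, not taken from the blog

for n_dots in (1, 2, 5, 10):
    combined = stacked_opacity(ASSUMED_ALPHA, n_dots)
    print(f"{n_dots:2d} overlapping dots -> opacity {combined:.2f}")
```

With ten overlapping dots the combined opacity is already above 0.97, which is why dense regions read as black even though every dot is the same gray.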
Pleiotropy (shared genes), Shared risk factors, or correlated measurement artifacts without reflecting anything genuinely connecting the mechanisms of illnesses.
I think that having the same SNP signals suggests the same genes as risk factors, which points to similar biological mechanisms. So this points to a true connection, although there is some ambiguity at each of the steps (similar SNP signals might point to different genes, the same genes may have different functions, etc.).
I just thought the justification to not take the LDSC too seriously was maybe a bit too short for my taste in the text?
It's mainly because the correlation differed a lot depending on which trait name you use in the UK Biobank database. There were, for example, multiple entries for schizophrenia, and the correlation differed enormously between them. The LDSC data also came from us, S4ME members, not the DecodeME researchers or other experts, which is another reason not to put too much weight on it yet.

[Attached image: 1760024566283.png]

Now it seems to me that this possible error in finding an HLA-association that doesn't exist suggests that there is room for imputation error in lots of different places
The HLA region is known to be particularly hard to read because of polymorphism (there are many different and similar versions of genes). So it is quite a different situation compared to the rest of the genome.

I don't think that potential errors elsewhere would affect the overall LDSC because it tests millions of SNP locations. SNPs that suggested a problem with imputation were all filtered out.
 
I thought the pre-print suggested there was no overlap with genes associated with anxiety or depression. I’ve not managed to keep up with discussions. Is that now considered to be inaccurate?
It depends on how you look at it. The 8 significant SNP signals were not seen before in the same pattern in depression or anxiety. There were however genes such as OLFM4 that are implicated in both depression and ME/CFS, even though the SNP pattern around it is different.

In addition, the genetic data is much more than what sticks out above that 5*10^-8 threshold. A high correlation means that ME/CFS and depression show signals in the same genomic regions even if these did not reach significance. An example is NEGR1 which has been implicated in depression GWAS. If we look at this region in ME/CFS, then there is a signal close to this gene with a p-value around 5*10^-7.
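To make the threshold comparison concrete, here is a small sketch that labels a p-value relative to the 5*10^-8 cutoff and converts it to the -log10 scale used on the y-axis of a Manhattan plot. The function name and the 1e-11 example are hypothetical; only the 5*10^-8 threshold and the roughly 5*10^-7 value near NEGR1 come from the post above.

```python
import math

GENOME_WIDE = 5e-8  # significance threshold quoted in the post above


def describe_signal(p_value: float, threshold: float = GENOME_WIDE) -> str:
    """Label a GWAS p-value relative to the genome-wide threshold."""
    height = -math.log10(p_value)  # y-axis value on a Manhattan plot
    if p_value < threshold:
        return f"genome-wide significant (-log10 p = {height:.1f})"
    fold = p_value / threshold
    return f"sub-threshold, p is {fold:.0f}x the cutoff (-log10 p = {height:.1f})"


print(describe_signal(5e-7))   # approximate p-value mentioned near NEGR1
print(describe_signal(1e-11))  # hypothetical strong hit, for contrast
```

A signal at 5*10^-7 sits about one unit below the threshold line on the -log10 scale, so it is visible on the plot without counting as one of the significant hits.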
 
Is it possible people with depression are more prone to ME? Or that a percentage of depression cases are actually people with the prodromal form of ME, whether or not it ever turns into the real thing?
 
If the local SNP patterns are different, as they seem to be, my reading is that it is likely that variations in the same gene or group of genes are relevant both to depression and to ME/CFS but for different reasons. For the CA10 gene and chronic pain it looks more as if the link is for the same or closely related reason.
 
Is it possible people with depression are more prone to ME? Or that a percentage of depression cases are actually people with the prodromal form of ME, whether or not it ever turns into the real thing?
A high score in LDSC simply means that two groups on average look more similar throughout their whole genome than one would expect by chance. That is the case here for ME/CFS and a whole range of other things, but it is also the case for IBD and a whole range of things, schizophrenia and a whole range of things, SLE and a whole range of things and so forth, without it meaning that schizophrenia is prodromal SLE.

Based on LDSC, a person whose dad has SLE might be statistically more likely to develop RA, but not in any way that matters.

I also find it a bit hard to make sense of the second statement. According to many studies, people with ME/CFS are remarkably undepressed, so if people with depression are supposed to have prodromal ME/CFS, does getting ME/CFS resolve the depression?

One of the highest LDSC correlations among the above traits is with the UK Biobank code for chronic fatigue syndrome. My understanding is that we've pretty much agreed that this UK Biobank code doesn't really stand for ME/CFS, that the people in DecodeME differ from these people in a meaningful way, and that the GWAS results for it are pretty much different. Still, there is some correlation, and it is the strongest one found above, so any other connections cannot be stronger than one we already consider somewhat loose.
 
Here's the social media summary for it:


1) We’ve just published our second instalment on the DecodeME results, this time zooming in on the genes associated with ME/CFS.

2) The clearest signals point to genes such as CA10, SHISA6, SOX6, LRRC7, and DCC, which are involved in neuronal development and communication in the brain.

3) There are also gene candidates that point to the immune system such as OLFM4, RABGAP1L, BTN2A2, and TAOK3. These point to e.g. the innate immune system and regulation of T-cells. Unfortunately, they lie in regions stacked with genes and are therefore more uncertain.

4) The locus on chromosome 20 provided by far the strongest signal in DecodeME. The three closest genes (ARFGEF2, CSE1L, and STAU1) are involved in intracellular trafficking and transport.

5) A bit more speculative, but some other genes are related to autophagy, the process that degrades and recycles parts of a cell. FBXL4, for example, is involved in mitophagy (the clearing up of mitochondria) and caught the eye of Australian ME/CFS researchers.

6) The most consistent pattern however points to neuronal development and communication in the brain. This aligns with a previous genetic study by the Stanford group of Mark Snyder that focused on rare variants and loss of function.
https://www.medrxiv.org/content/10.1101/2025.04.15.25325899v1

7) In the blog we also go deeper into the reliability of the results and assess if the DNA differences could be due to ancestry, selection bias or other confounding factors.

8) We also used a different approach to explore genes linked to ME/CFS. In contrast to the DecodeME preprint, we didn’t focus on matching gene expression data but instead used a simpler approach based on proximity and genes per locus.

9) Instead of focusing solely on the 8 hits, we also looked just below the statistical significance threshold to spot more signals about what the pathology of ME/CFS might be.

10) We also publish (very zoomed out) graphs of these regions so that you can see what the signal looks like and which protein-coding genes are nearby.
 
Second blog article on the DecodeME results, this time focusing on genes related to ME/CFS.
Thanks for the blog!

We do not know which gene(s) the DNA signal points to, and the process of figuring this out is called ‘fine-mapping’.
I think fine mapping refers to identifying the causal variant out of all the significant variants, not identifying the gene that the variant affects. I think that would be 'gene prioritization':

- https://royalsocietypublishing.org/doi/10.1098/rsob.190221
3. Gene prioritization using GWAS traits
Traditional fine-mapping approaches focus on identifying the causal variants that affect a trait of interest. While very important, knowing which variants are causal does not identify the downstream effects of the variant on the trait. One way to gain such insights is by identifying the genes that are affected by each GWAS locus.

---

Although the region has multiple candidate genes, it’s quite likely that ARFGEF2, CSE1L, and STAU1 are involved in ME/CFS pathology because the signal around them is so strong. The gene-based test of MAGMA, a tool that helps you estimate which genes are relevant, highlighted all three of them.
The one below on chromosome 1 is likely to have more than one signal.
I don't think we can say things about loci likely being related to multiple genes with certainty. For chromosome 20, the locus being very significant doesn't indicate that multiple genes are involved. It may just be that ARFGEF2 or another gene just has a very strong effect.

MAGMA is basically just looking at the significance of variants only within the bounds of a gene. If a variant in one gene is very significant, and also is in LD with variants in another gene, both genes will be significant in MAGMA.

For some reason, LocusZoom doesn't show LD for the chr20 locus. But it does for the chr1 locus (last locus in the first image here), which shows that the second long RABGAP1L region is in moderate LD with the main variants above DARS2. So even if the only interesting causal variant was a variant above DARS2, we'd likely see the same pattern as we do.
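The LD point can be sketched numerically. Under a simplified single-causal-variant model, a neighbouring variant that is in LD with the causal one (correlation r) is expected to show a z-score of about r times the causal z-score. The code below is only that textbook approximation, not MAGMA itself, and the causal z-score of 8 is a hypothetical value chosen for illustration.

```python
import math

GENOME_WIDE = 5e-8


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def expected_neighbour_z(z_causal: float, r: float) -> float:
    """Expected z-score of a non-causal variant in LD (correlation r)
    with a single causal variant. Simplified model, ignores noise."""
    return r * z_causal


Z_CAUSAL = 8.0  # hypothetical z-score of the one truly causal variant

for r_squared in (0.9, 0.4, 0.1):
    z = expected_neighbour_z(Z_CAUSAL, math.sqrt(r_squared))
    p = two_sided_p(z)
    flag = "significant" if p < GENOME_WIDE else "not significant"
    print(f"r^2 = {r_squared:.1f}: expected z = {z:.2f}, p = {p:.1e} ({flag})")
```

In this toy setting a variant in strong LD with the causal one is itself expected to pass the genome-wide threshold, so a gene containing it would light up in a gene-based test even if that gene plays no role. At lower r^2 the borrowed signal fades, which is the kind of pattern one would look at when asking whether a locus holds one signal or two.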

---

Typo:
A bit caveat is that the genes highlighted above are involved in other functions

Change to "as similar as possible"?
They included only British participants with European ancestry so that they were as similar as the controls in the UK Biobank.

Add space.
Likely candidates are TAOK3, SUDS3, andPEBP1.
 
3) There are also gene candidates that point to the immune system such as OLFM4, RABGAP1L, BTN2A2, and TAOK3. These point to e.g. the innate immune system and regulation of T-cells. Unfortunately, they lie in regions stacked with genes and are therefore more uncertain.
Besides SequenceME, is it possible/practical for someone to do a study focussing on all or some of these genes? My understanding is that you don't need anywhere near as many participants for statistical significance when you are looking at a specific gene rather than at the genome as a whole. It would be really useful to have the uncertainty surrounding these genes cleared up.
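The multiple-testing side of this can be shown with a simple Bonferroni calculation. The genome-wide threshold of 5*10^-8 is conventionally explained as 0.05 divided by roughly one million independent tests; a study restricted to a handful of pre-specified genes would divide by a far smaller number. The sketch below covers only the significance threshold and assumes one test per gene; the sample size actually needed also depends on effect sizes, which are not modelled here.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Conventional rationale for the genome-wide cutoff: ~1 million independent tests.
genome_wide = bonferroni_threshold(1_000_000)

# The four immune-related candidates named in the quoted point,
# treated here as one test per gene (an assumption for illustration).
candidate_genes = ["OLFM4", "RABGAP1L", "BTN2A2", "TAOK3"]
targeted = bonferroni_threshold(len(candidate_genes))

print(f"genome-wide threshold: {genome_wide:.1e}")
print(f"targeted threshold for {len(candidate_genes)} genes: {targeted:.4f}")
```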

P.S. What I've managed to read of the blog was very interesting @ME/CFS Science Blog!
 
I think fine mapping refers to identifying the causal variant out of all the significant variants, not identifying the gene that the variant affects. I think that would be 'gene prioritization':
Thanks. I misunderstood fine-mapping as a broad term encompassing gene prioritization (some tools and papers give that impression), but will update the text.
I don't think we can say things about loci likely being related to multiple genes with certainty. For chromosome 20, the locus being very significant doesn't indicate that multiple genes are involved. It may just be that ARFGEF2 or another gene just has a very strong effect.

MAGMA is basically just looking at the significance of variants only within the bounds of a gene. If a variant in one gene is very significant, and also is in LD with variants in another gene, both genes will be significant in MAGMA.
Agree. If you click on one of the other SNPs with a low p-value it does show the LD. I didn't mean to imply that a strong signal must mean all three are involved (although this is possible). The main issue is that I didn't want to overlook the locus on chromosome 20 for being too dense with genes, because the signal seems quite concentrated around those three genes. I think it's likely that one or more of them are linked to ME/CFS.

For some reason, LocusZoom doesn't show LD for the chr20 locus. But it does for the chr1 locus (last locus in the first image here), which shows that the second long RABGAP1L region is in moderate LD with the main variants above DARS2. So even if the only interesting causal variant was a variant above DARS2, we'd likely see the same pattern as we do.
That's quite likely, but for some of the points to the right of the dragon-like pattern above RABGAP1L the LD was < 0.4 with the top SNP, so it could be two signals as well. Will change the text from: "The one below on chromosome 1 is likely to have more than one signal." to: "The one below on chromosome 1 might have more than one signal." Either way, RABGAP1L wasn't among the most likely genes in the approach we used because of the many other candidates nearby.

Thanks for the useful comments, corrections and feedback - much appreciated.
 
The main issue is that I didn't want to overlook the locus on chromosome 20 for being too dense with genes, because the signal seems quite concentrated around those three genes. Think it's likely that one or more of them are linked to ME/CFS.
I'm just concerned that there's no other way to read the sentence when it uses "and" than that it's suggesting they are all involved:
it’s quite likely that ARFGEF2, CSE1L, and STAU1 are involved in ME/CFS pathology
Maybe "or" instead?
 
Second blog article on the DecodeME results, this time focusing on genes related to ME/CFS.
Thanks again @ME/CFS Science Blog. Your blogs are particularly helpful for me when I’m not able to keep up with threads. I wish I was able to contribute more, but your summaries also take the pressure off me feeling that I need to follow everything on here in order to keep up with developments.

One suggestion: if you don’t already do so (I’ve not checked) perhaps you should indicate on a blog when you’ve made changes, and specify any substantive changes.

The link to genes associated with intelligence is interesting. I need to read the blog again (so apologies if this is covered) but I’m wondering how confident we can be that this is not due to selection bias from self-referral.
 
A few years back I had a conversation with Robert Souhami, who most UK physicians have revered as one of the sharpest and most down to earth and common sensical teachers of his time. I grew up to believe that if you could not convince Bob that something was valid you needed to start again. Interestingly, I failed to convince him that my rituximab study design was valid and I proved him wrong. But the next time I met him the first thing he said was 'I was wrong.'

Bob asked me why there should be a category of ME/CFS - what justified separating off this group of patients? He could not see any reason to do so. So I wrote a Qeios article on the Concept of ME/CFS to try to answer him. I was arguing a case, which I think DecodeME now makes cast iron. There is a distinct biological category. If the sharpest minds in medicine can be persuaded of that, there is some hope that it will trickle down.
Apologies if this has been answered, but did Robert Souhami respond to your Qeios article? Do you think you’ve convinced him?

Are you minded to add anything about DecodeME to your Qeios article?
 
There's also this blog by Paolo Maccalini on the DecodeME results, focusing on the FUMA SNP2GENE analysis, which forestglip explored earlier in this thread.
 
Oh that's great. Good to see another person got neurons/excitatory neurons in the cell type enrichment analysis.
 
The link to genes associated with intelligence is interesting. I need to read the blog again (so apologies if this is covered) but I’m wondering how confident we can be that this is not due to selection bias from self-referral.
Genetic links to "intelligence" always struck me as a hollow concept anyway. Really, it's a genetic association for doing slightly better on a handful of tests where you match patterns or pick words out of a list. The links between those types of tests and anything else people would associate with "intelligence"--good decision making, creative problem solving, professional success, interest in research, etc.--have always come across as incredibly dubious to me. Not least because social factors so heavily skew both performance on those tests and any of those other indicators of "intelligence". Is a gene actually associated with the nebulous concept of "intelligence", or with the closed-off social strata that have better access to schooling and more time on their hands to participate in research, or perhaps with the lack of various health conditions that would make someone less focused during a long battery of cognitive tests?

There may well be some confounding and self-selection with that particular finding, but more likely explained by those other factors rather than any concept of "intelligence."
 
with the closed-off social strata that have better access to schooling and more time on their hands to participate in research, or perhaps with the lack of various health conditions that would make someone less focused during a long battery of cognitive tests?
Not to mention access to healthier food, less pollution, less overwork, more time to rest and exercise, and faster, better-quality and more comprehensive medical care. All that probably improves the health of the average person.

And I’d wager healthy people do better in these kinds of “intelligence tests”. I mean I’m sure there’s a study showing that the scores are far worse when people have the flu or whatever.
 