Preprint Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeMe Collaboration

Dumb question time: what’s the difference between a candidate gene, like the ones on this list, and the 8 ‘headline’ genes, in very simple terms? Thanks
In simple terms, the headline gene is the strongest of several candidates for each genetic signal (though more than one gene covered by a genetic signal may play a part in the illness).

This is from the DecodeME blog:
>
DecodeME identifies top genes
DecodeME has started searching for the treasure, and has identified likely genes using several methods. These methods looked at genes within the genetic signals – the peaks on the Manhattan plot.

The most powerful method searched those genes to find ones whose activity levels (the amount of protein they make) are known to go up or down depending on what form a particular variant within the ME/CFS genetic signal takes. This means that the activity of those genes is related to someone having ME/CFS – a strong clue that those genes might be a cause of ME/CFS.
<

The second paragraph talks about a method called eQTL, which I will try to explain here in a bit more detail.

We start with the genetic signal for ME/CFS. Basically, candidate genes are nearby, but which are playing a part in causing ME?

One way of finding out is to see if gene activity changes in people with ME.

Let's take a genetic signal for ME/CFS that has a particular known variant. We want to know if a nearby gene behaves differently in people who have that variant. If there is no difference, the gene probably isn't doing anything relevant.

There is a large public database (GTEx) that has such data: it shows the expression of each gene in people with different variants. So, the method is to find the variant associated with ME, then look to see if nearby genes show different levels of activity in people with that variant.

Here is an example (not from the study) for the gene PLG, in this case showing its mRNA expression level in the brain's frontal cortex. The known variant has either the letter C or the letter T. We have two copies of each set of DNA (one from each parent), so each person has a 'genotype' of CC, CT or TT.

As you can see, the gene is less active (X axis) where people have CC and more when they are TT, with CT somewhere in between.

[Image: plot of PLG mRNA expression in the frontal cortex by genotype (CC, CT, TT)]

Note that we don't know the DNA sequence of the gene itself - just its activity level and how that changes with the genotype.
And it does NOT show that the variant itself causes the change in gene expression. More likely it is another, unseen, variant affecting both the gene and ME status.
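To make the eQTL idea concrete, here is a minimal sketch in Python. It is not the DecodeME or GTEx pipeline, and all the data are synthetic: each simulated person gets 0, 1 or 2 copies of the T allele, and expression is made to rise with each copy. The names `eqtl_slope`, `T_FREQ` and `TRUE_EFFECT` are made up for this illustration. The slope of expression against the number of T copies is the kind of quantity an eQTL lookup reports.

```python
import random
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (copies of T: 0, 1 or 2)."""
    mean_x, mean_y = mean(dosages), mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Synthetic data only: both numbers below are assumptions chosen for illustration.
random.seed(0)
T_FREQ = 0.4       # assumed frequency of the T allele
TRUE_EFFECT = 0.5  # assumed rise in expression per copy of T

# Each person inherits two alleles, so the dosage is the number of T copies.
dosages = [sum(random.random() < T_FREQ for _ in range(2)) for _ in range(500)]
expression = [TRUE_EFFECT * d + random.gauss(0, 1) for d in dosages]

slope = eqtl_slope(dosages, expression)
by_genotype = {
    genotype: mean(e for d, e in zip(dosages, expression) if d == copies)
    for copies, genotype in enumerate(("CC", "CT", "TT"))
}

print(f"estimated change in expression per T allele: {slope:.2f}")
for genotype, average in by_genotype.items():
    print(f"{genotype}: mean expression {average:.2f}")
```

A slope near zero would correspond to the "no difference" case described above, where the gene probably isn't doing anything relevant. As in the real data, a non-zero slope shows an association with genotype, not that this particular variant is the cause.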

Back to DecodeME. If gene activity correlates with ME status (through the variant) then it is an interesting gene. I think 43 genes were identified this way, and 29 were classified as priority genes (presumably, taking into account other factors as well).
 
As others have already mentioned, I don't think lack of replication tells us anything useful.

Perhaps lack of replication is even a good thing. According to DecodeME, the effects of these genes on ME/CFS status are small, if anything (as is to be expected), and we know that something like the UK Biobank is completely unreliable at detecting what DecodeME considers to be ME/CFS (some of the other cohorts, like EHR cohort R-1, are probably just entirely useless). That means that if you had replicated your results, despite small effect sizes, in people who on average might not have ME/CFS, that to me would make it much more likely that the genes you've picked out actually have little to do with ME/CFS and are instead just confounders for ill health.
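The dilution part of this argument can be put into numbers. The sketch below is a simplified model of my own, not anything from the preprint: it assumes that only a fraction of a cohort's labelled cases truly have ME/CFS, and that the mislabelled ones carry the variant at the same rate as controls. The allele frequencies (0.33 and 0.31) and the function names are illustrative assumptions.

```python
def odds_ratio(p_cases, p_controls):
    """Allelic odds ratio from the allele frequencies in cases and controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


def observed_odds_ratio(p_true_cases, p_controls, true_case_fraction):
    """Odds ratio seen when only part of the labelled case group has the disease.

    Assumption: mislabelled cases carry the allele at the control frequency.
    """
    p_labelled = (true_case_fraction * p_true_cases
                  + (1 - true_case_fraction) * p_controls)
    return odds_ratio(p_labelled, p_controls)


# Illustrative frequencies only, roughly the size of effect discussed in this thread.
P_TRUE_CASES, P_CONTROLS = 0.33, 0.31

for fraction in (1.0, 0.5, 0.2):
    diluted = observed_odds_ratio(P_TRUE_CASES, P_CONTROLS, fraction)
    print(f"{fraction:.0%} true cases -> observed odds ratio {diluted:.3f}")
```

Under these assumptions a small true effect shrinks towards an odds ratio of 1 as the case definition gets looser, so a genuinely ME/CFS-specific signal would be expected to be hard to replicate in a loosely defined cohort.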

However, I see a different thing. We recently discussed a study that in my eyes had results of a very similar flavour: Genome-wide study of somatic symptom and related disorders identifies novel genomic loci and map genetic architecture. People were largely of the opinion that those results meant nothing because the diagnostic criteria used in the study likely meant nothing, so the study would likely find all sorts of confounders that are just generally linked to ill health, rather than having anything to do with a specific set of symptoms. I fail to see how things are entirely different here. Here is a hypothetical example to illustrate my point: Suppose ME/CFS is "a real disease" that however does not have any genetic links. Then to me I would find it equally likely that DecodeME would be able to pick up those confounders that don't have anything to do with ME/CFS because these types of studies can simply end up that way, despite the authors of DecodeME trying their best to avoid such scenarios. I can appreciate that some might say, "the things picked up here aren't linked to things like anxiety or depression", but I don't think this circumvents the confounders argument, and I think the same argument might also apply to the above FND study. That being said, whilst I find it extremely likely that some of the genes found significant are irrelevant, I do find it somewhat unlikely that all of them are.
 
Asking a «dumb» question:

Does a hypothesis of what ME/CFS is have to include all of the genes, or is it enough to find a way to make just some of them work together?
I think it is entirely unlikely that all of the genes that have been picked up are relevant. I think you'd only want to construct a story around a few but nobody has any idea which ones those would be (and it could even be some that are insignificant here).
 
Suppose ME/CFS is "a real disease" that however does not have any genetic links. Then to me I would find it equally likely that DecodeME would be able to pick up those confounders that don't have anything to do with ME/CFS
I'm not entirely sure about that. Certainly heritability was modest but non-zero (0.096, I think). Given that these are well defined individuals, I'm not sure if that number is also prone to the same confounders.

However, there will be subgroup analysis at some point, of comorbid conditions. And that might throw some light on this.

And I'm not sure how you couldn't use the argument with other chronic illnesses with relatively low heritability – which is most of them.

But maybe I haven't understood your argument?
 
To the gene masters: We now have an idea of the effect sizes of some genes on ME/CFS status based on the UK population.

Has anybody looked at the spread of these genes across populations to see whether studies in certain areas make more or less sense? Is that something feasible and does it make sense?
 
Presumably, full genome analysis including rarer variants could not only find other genes not looked at yet, but those genes could have a higher contribution/effect size?

A few rarer variants working in conjunction with these more common ones could give clearer information on pathways involved. But also, it seems with a disease with a sub 1% prevalence that gene variations with lower prevalence could be involved (and show stronger associations). There’s no reason they must, but it seems a possibility. To the genetics experts, does that make sense?
 
I'm not entirely sure about that. Certainly heritability was modest but non-zero (0.096, I think). Given that these are well defined individuals, I'm not sure if that number is also prone to the same confounders.

However, there will be subgroup analysis at some point, of comorbid conditions. And that might throw some light on this.

And I'm not sure how you couldn't use the argument with other chronic illnesses with relatively low heritability – which is most of them.

But maybe I haven't understood your argument?
Good question. I think the argument largely always applies, but maybe it isn't always sensible if you can hope that there are some more specific things that make the argument meaningless. (If an illness involves a red spot on the forehead, then a GWAS on people who have this illness might pick up some noise unrelated to the red spot, but it would also pick up the red spot if that has a genetic signal. If it doesn't have one, the GWAS can't pick it up, but you might think it had, on the basis of that unrelated noise.) And like you said, comorbidities might be one of those things.

I don't think that all confounders would necessarily have to be related to how well-defined the individuals are. Some might be more related to getting diagnosed, participating in a study, or something else that might be outside the control of the authors. I'm not quite sure if I'm being ridiculous or not, but presumably it's possible that certain genes might make it more likely for you to be a participant in DecodeME without having anything to do with ME/CFS, even if the authors tried their best to rule out such things (I think @Hutan might, for example, argue that female sex could possibly be one such thing).
 
Back to DecodeME. If gene activity correlates with ME status (through the variant) then it is an interesting gene. I think 43 genes were identified this way, and 29 were classified as priority genes (presumably, taking into account other factors as well).
But, the "candidate genes" seem to be something different to what is an interesting gene.

Some of the candidate genes in the supplementary document don't have eQTL information; they are the ones called Tier 2 candidate genes.
 
To the gene masters: We now have an idea of the effect sizes of some genes on ME/CFS status based on the UK population.

Has anybody looked at the spread of these genes across populations to see whether studies in certain areas make more or less sense? Is that something feasible and does it make sense?
You want to know if it would make sense to study ME/CFS in certain geographic regions?

My understanding is that it would not make sense. ME/CFS is not a disease caused by mutations in a single gene. The variants identified in DecodeME are very common in the population and their effect on the disease is tiny. If some remote village existed where an unusually large number of people had ME/CFS, we would discover that by looking at diagnostic rates by geographic region, not through a GWAS.

It's better to spend the resources on studying other things, like the whole genome.
 
@Simon, @Andy

What are the plans for publishing this preprint? Do you plan to get it published before other analyses (like that of the X and Y chromosomes) are done? If so, do you have any idea how long that might take?

There's this on the DecodeME website FAQs:
'Our analysis is ongoing, and once complete, we will send all our findings for peer review and publication.'
 
I believe the FAQ answers most of your questions. Timescale for all of that is currently estimated at 6 months.
 
To get a feel of the effect size, I've made an overview of the prevalence of these SNPs in patients versus controls. I couldn't find this in the paper or supplementary material so I've tried to calculate them using R (trying out different guesses until the combined prevalence and odds ratio match - hopefully someone will check if they get the same result!).

It shows that the effect size comes down to a difference of 1-2 percentage points in the prevalence of these SNPs between ME/CFS patients and controls.

| SNP ID | Combined prevalence | Odds ratio | Prevalence ME/CFS | Prevalence controls |
|---|---|---|---|---|
| chr1q25.1 | 0.325 | 0.927 | 0.3095 | 0.3259 |
| chr6p22.2 | 0.261 | 1.086 | 0.2763 | 0.2601 |
| chr6q16.1 | 0.546 | 0.934 | 0.5300 | 0.5470 |
| chr12q24.23 | 0.139 | 1.100 | 0.1501 | 0.1383 |
| chr13q14.3 | 0.287 | 1.077 | 0.3015 | 0.2861 |
| chr15q21.3 | 0.312 | 1.082 | 0.3282 | 0.3110 |
| chr17q22 | 0.330 | 1.084 | 0.3470 | 0.3290 |
| chr20q13.13 | 0.634 | 1.095 | 0.6536 | 0.6328 |
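For anyone who wants to check the numbers in the table, here is one way the calculation could be done. This is a Python sketch of the same "guess until it matches" idea, not the R code that was actually used. It searches for the control frequency at which the odds ratio and the pooled frequency both match. The function name `split_frequency` is made up, and the case and control counts (15,579 and 259,909) are my assumption, taken from my reading of the preprint rather than from this thread, so please check them against the paper.

```python
def split_frequency(combined, odds_ratio, n_cases, n_controls, tol=1e-10):
    """Return (case frequency, control frequency) matching a pooled frequency and an OR."""
    case_share = n_cases / (n_cases + n_controls)

    def cases_freq(p_controls):
        # Convert the control frequency to odds, apply the OR, convert back.
        odds = odds_ratio * p_controls / (1 - p_controls)
        return odds / (1 + odds)

    # The pooled frequency rises steadily with the control frequency,
    # so a simple bisection search finds the matching value.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        pooled = case_share * cases_freq(mid) + (1 - case_share) * mid
        if pooled < combined:
            lo = mid
        else:
            hi = mid
    p_controls = (lo + hi) / 2
    return cases_freq(p_controls), p_controls


# Assumed sample sizes (not stated in this thread; verify against the preprint).
N_CASES, N_CONTROLS = 15_579, 259_909

# First row of the table: chr1q25.1, combined prevalence 0.325, odds ratio 0.927.
p_cases, p_controls = split_frequency(0.325, 0.927, N_CASES, N_CONTROLS)
print(f"chr1q25.1: cases {p_cases:.4f}, controls {p_controls:.4f}")
```

Because controls vastly outnumber cases under these assumed counts, the combined prevalence sits very close to the control prevalence, and the case prevalence is what moves with the odds ratio.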

Here are those 8 SNPs from the summary stats for GWAS 1:

[Image: table of the 8 SNPs from the GWAS 1 summary statistics]

I'm confused about how the frequency in cases and controls separately for several of these can both be higher than the frequency combined.
 
You want to know if it would make sense to study ME/CFS in certain geographic regions?

My understanding is that it would not make sense. ME/CFS is not a disease caused by mutations in a single gene. The variants identified in DecodeME are very common in the population and their effect on the disease is tiny. If some remote village existed where an unusually large number of people had ME/CFS, we would discover that by looking at diagnostic rates by geographic region, not through a GWAS.

It's better to spend the resources on studying other things, like the whole genome.
More specifically, I was thinking that possibly the next ones to do a suitable GWAS could, for example, be the German-speaking countries (which possibly have some of the higher diagnostic rates, but that may be related to clinicians rather than having something to do with ME/CFS). I was wondering what kind of predictions you'd be able to make on necessary sample sizes on the basis of these results, but I suppose that won't have anything to do with gene spread across populations if there are no significant differences in these rather common variations.
 
Did they test the remaining 10,000 (known) virally-triggered ME/CFS samples and still find the association in these samples without the other 5,000? I think you would have to know that?
I think Figure 2 from the preprint answers your question @Ariel.
[Image: Figure 2 from the preprint]
They did a number of GWAS. One was called GWAS-Infection, which was just the people reporting an infectious trigger for onset. You can see that some of the genes found in other analyses were significant in that group of people.
An additional locus, OLFM4, was genome-wide significant in a GWAS of cases reporting an infection prior to their ME/CFS symptoms.

They also looked at people not reporting an infectious trigger, and didn't find any significant genes. But the problem is that the number of people in the group was a lot smaller, so it gets harder for results to be significant. The analysis of just men had the same problem. It doesn't mean that the signals aren't in those groups.
 
The failed replications are disappointing, but I suspect that, besides overly broad case definitions for ME/CFS in the other databases, the sample size might also have been too small.
Here's an overview of the replication cohorts' sample sizes (taken from the supplementary material):

| Replication | Cohort | Cases | Controls |
|---|---|---|---|
| R1 | Lifelines | 3,440 | 17,080 |
| R1 | UK Biobank | 10,327 | 195,103 |
| R2 | Estonian Biobank | 1,926 | 195,103 |
| R2 | FinnGen | 283 | 463,029 |
| R2 | Michigan Genomics Initiative | 3,926 | 57,247 |
| R2 | Mass General Brigham Biobank | 114 | 51,055 |
| R2 | Million Veteran Program | 4,948 | 617,301 |
| R2 | Genes and Health | 3,053 | 49,755 |
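To get a rough feel for what these sample sizes mean, here is a back-of-the-envelope power calculation in Python. It is a sketch under stated assumptions, not the analysis from the preprint: it uses a normal approximation for an allelic test, an assumed odds ratio of 1.08 and an assumed allele frequency of 0.3 (both in the range discussed in this thread), and a nominal two-sided threshold of 0.05. It also assumes every labelled case truly has ME/CFS, which is exactly what is in doubt for these cohorts. The function name `replication_power` is made up.

```python
from math import log, sqrt
from statistics import NormalDist


def replication_power(n_cases, n_controls, odds_ratio, allele_freq, alpha=0.05):
    """Approximate power of a two-sided allelic test to detect a given odds ratio."""
    pq = allele_freq * (1 - allele_freq)
    # Approximate standard error of log(OR); each person contributes two alleles.
    se = sqrt(1 / (2 * n_cases * pq) + 1 / (2 * n_controls * pq))
    z_effect = abs(log(odds_ratio)) / se
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Assumed effect size and allele frequency (illustrative, not from the preprint).
ASSUMED_OR, ASSUMED_FREQ = 1.08, 0.3

# Two cohorts from the table above: (cases, controls).
cohorts = {"UK Biobank": (10_327, 195_103), "FinnGen": (283, 463_029)}
for name, (cases, controls) in cohorts.items():
    power = replication_power(cases, controls, ASSUMED_OR, ASSUMED_FREQ)
    print(f"{name}: approximate power {power:.2f}")
```

In this approximation the number of cases dominates: a huge control group does little for a cohort with only a few hundred cases. For the larger cohorts, case definition is likely to matter more than raw size.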
 