Preprint: Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeME Collaboration

It does need to be taken with a slight pinch of salt, but it’s also important to understand that “AI” isn’t a static entity. The model out this week has very low error rates on health benchmarks, and AI on its own has lower diagnostic error rates than human doctors now (interestingly, lower than human doctor plus AI, which should really be the best combination). Let me know if you’d like me to find the references.

Context is important though, and in my prompt I had asked it what was hyperbole, so that it wasn’t unrealistically optimistic. There can definitely be a tendency to guess what the user wants and be sycophantic, and here it might have overcorrected.
Of course. It all depends what we count as “low error rates on health”. I can see that making sense for identifying objective findings on scans or whatever. But in ME/CFS, where we barely know anything, I dunno if it means much.
 
The model out this week has very low error rates on health benchmarks, and AI on its own has lower diagnostic error rates than human doctors now (interestingly, lower than human doctor plus AI, which should really be the best combination). Let me know if you’d like me to find the references.
I like that it's good at summarising information. I'm not sure how good it is at creating hypotheses. It's useful for sure, but maybe not for all tasks.

I can believe that it might be better than doctors at diagnosing illness, but this is presumably in a controlled setting for which it has been optimised.
 
Of course. It all depends what we count as “low error rates on health”. I can see that making sense for identifying objective findings on scans or whatever. But in ME/CFS, where we barely know anything, I dunno if it means much.

I’m basing it on hearing about these.
[attached image: 6894e49ca3a1be9d17beb575_reliability.png]

https://substack-post-media.s3.amazonaws.com/public/images/f30d08a4-c69c-43d4-a614-acd2c35ab12c_6008x2966.heic


We dismiss the potential of AI for ME at our peril, I think. It contains enormous volumes of medical information about the human body, more than human specialists can hold, and can potentially spot connections we are missing. This will include false positives, but it’s worth investigating suggestions if there’s a true positive within the bundle of suggestions.
 
I’m basing it on hearing about these.
[attached image: 6894e49ca3a1be9d17beb575_reliability.png]

https://substack-post-media.s3.amazonaws.com/public/images/f30d08a4-c69c-43d4-a614-acd2c35ab12c_6008x2966.heic


We dismiss the potential of AI for ME at our peril, I think. It contains enormous volumes of medical information about the human body, more than human specialists can hold, and can potentially spot connections we are missing. This will include false positives, but it’s worth investigating suggestions if there’s a true positive within the bundle of suggestions.
But here it makes sense for objective findings. But we need to be quite careful. Who decides what’s good “management reasoning”? Depending on who is testing the AI, they could count BPS things as the “right answer”.

So it’s always worth thinking more critically about these “AI tests”.

I’m not one to think LLMs are worth dismissing (as others have seen, given my other opinions on the forum). But I think we should be very careful with these tests of LLMs, because who decides what is truth, and which answer is better than another? It can become very murky with things like ME.

And of course the AI absorbs the language we use. Talk about “CFS” or “CFS/ME” and it will give you BPS views; talk about “ME” and it will give you biomed ones.
 
I think that even if you weren't included in this particular analysis that has been written up as a preprint, your data is likely to be included in some other analyses in the future.
@Ariel, spotted this here:
The results reported in the latest preprint are based only on people with European ancestries, but the DecodeME team say that analysis of more diverse DNA data is ongoing.
 
Thank you. I have been worrying about this all day. I hope we get more info at some point. I wonder what kind of analysis they are doing.

(Also I still don't get what is meant by European ancestry vs "more diverse DNA" in this context as don't we all have DNA from all kinds of regions? What is the cut off?)
There’s more information in the Data Analysis Plan, which may help: check the section of the PDF on Ancestry.
 
Re: ancestry.

From the FAQs,

"Why did you only use DNA from participants of European ancestry in this initial analysis?

This is because we had to closely match the ancestry of the study samples with those of the control samples from the UK Biobank, which were largely of European ancestry. This was to be sure that the differences we are identifying are more likely to be because of ME/CFS, and not because of differences in ancestry. An ongoing analysis uses all study samples from all ancestries."

From the supplementary methods,

"We inferred major group ancestries (African, American, East Asian, European, and South Asian) by PCA projection of DecodeME samples onto the 1000 Genome reference population samples (15) using bigsnpr (17) (Fig. S8). We completed a second round of PCA on the European ancestry cohort to calculate relatedness and principal components for downstream analyses. We ran the implementation of KING used in PLINK 2.0, on the subset of variants (‘in_Relatedness’) used by UKB that were present in DecodeME’s data (68,453 SNPs). This was because not subsetting the SNPs used would have severely inflated relatedness estimates (14)."

To translate, the ancestry was calculated from your DNA data, not in any other way.
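For anyone curious what "PCA projection onto reference samples" means in practice, here is a toy numpy sketch of the idea. This is not the actual bigsnpr/1000 Genomes pipeline; the two "populations", the genotypes, and all the numbers here are simulated purely to illustrate the projection-and-assign step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy reference panel: two "populations" with different allele frequencies
# (a stand-in for the 1000 Genomes reference; real data has far more SNPs).
n_snps = 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
ref = np.vstack([
    rng.binomial(2, freq_a, size=(50, n_snps)).astype(float),
    rng.binomial(2, freq_b, size=(50, n_snps)).astype(float),
])
labels = np.array(["A"] * 50 + ["B"] * 50)

# PCA on the reference: centre/scale genotypes, then SVD.
mean, std = ref.mean(0), ref.std(0) + 1e-9
Z = (ref - mean) / std
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
ref_pcs = Z @ Vt[:2].T  # reference samples in the top-2 PC space

# Project study samples onto the SAME loadings (the key step: no refit).
study = rng.binomial(2, freq_a, size=(10, n_snps)).astype(float)
study_pcs = ((study - mean) / std) @ Vt[:2].T

# Assign each study sample to the nearest reference-population centroid.
centroids = {p: ref_pcs[labels == p].mean(0) for p in ("A", "B")}
calls = [min(centroids, key=lambda p: np.linalg.norm(s - centroids[p]))
         for s in study_pcs]
print(calls)  # study samples were simulated from population A's frequencies
```

The point is that ancestry labels fall out of where your genotypes land in the principal-component space of the reference panel, which is why "European ancestry" here is a statistical cluster assignment rather than anything self-reported.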
 
So the study found significant locations in the DNA, not necessarily significant specific genes. The trouble is figuring out which gene associated with a given location is the troublemaker in ME/CFS.

I understand that this is the case but I’ve been struggling to understand why.

From the preprint:

“Linking GWAS variants to causal genes that may provide biological insights and medical applications remains a challenge for the field (43).”

43. Qi T, Song L, Guo Y, Chen C, Yang J. From genetic associations to genes: methods, applications, and challenges. Trends Genet. 2024 Aug;40(8):642–67.

After reading various other things it still wasn’t clear to me, so I asked ChatGPT for a simple explanation:

ChatGPT:
1. Most GWAS hits are not in genes

• Around 90% of GWAS-associated SNPs fall in noncoding regions (DNA that doesn’t directly code for proteins).

• These regions often act as regulatory elements (enhancers, silencers, etc.) that can control genes far away — sometimes hundreds of thousands of base pairs away, even skipping over nearby genes.

So the nearest gene to a SNP is often not the one it regulates.

2. Linkage disequilibrium (LD) blurs the picture

• GWAS doesn’t pinpoint the exact causal variant — it identifies a block of variants that are inherited together due to LD.

• Any variant in that block could be the causal one, and they may each point to different potential genes.

It’s like knowing a crime happened in a group of five roommates — but not knowing which roommate did it.

3. Regulatory wiring is complex

• Gene regulation is 3D, not just linear along the DNA strand.

• DNA folds and loops, bringing faraway enhancers into contact with gene promoters.

• This means a SNP located near one gene in the genome’s “street map” might actually physically interact with a totally different gene in 3D space.

4. Cell-type and context specificity

• The effect of a variant often depends on which cell type or biological condition you look at.

• A SNP might regulate a gene in neurons but not in liver cells — so if you only look in the wrong cell type, you might see no signal.

5. Multiple genes can be affected

• Sometimes a single regulatory variant influences several genes at once.

• This makes “the” causal gene ambiguous — it could be one, several, or even all of them contributing to the trait.

In short:

Linking GWAS SNPs to genes is tricky because:

• Most hits are in noncoding regulatory DNA

• The causal variant is hidden among many linked ones

• Regulatory elements can act over long genomic distances in 3D

• Effects depend on cell type and context

• Multiple genes may be involved

That makes sense to me but is it accurate? Is there anything that anyone would correct or add?

Sorry for blocking up this thread with entry level questions but I wasn’t sure where else to ask, and I’m guessing that I may not be the only person reading this thread that doesn’t understand this aspect of GWAS.
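The linkage disequilibrium point (point 2 above) is easy to see numerically. Here is a toy Python sketch with simulated genotypes, not real data: two SNPs inherited together on the same haplotype block are almost perfectly correlated, so an association signal cannot tell them apart, while an unlinked SNP shows near-zero correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # simulated individuals (one haplotype each, for simplicity)

# snp1 and snp2 sit in the same LD block: alleles usually inherited together,
# with a small fraction separated by historical recombination/mutation.
snp1 = rng.binomial(1, 0.3, n)
flip = rng.random(n) < 0.05
snp2 = np.where(flip, 1 - snp1, snp1)

# snp3 is on an independent genetic background (unlinked).
snp3 = rng.binomial(1, 0.3, n)

def r2(a, b):
    """Squared correlation between allele counts, the usual LD measure."""
    return np.corrcoef(a, b)[0, 1] ** 2

print(f"r² (linked pair):   {r2(snp1, snp2):.2f}")  # high, near 1
print(f"r² (unlinked pair): {r2(snp1, snp3):.2f}")  # near 0
```

Because the linked pair is so highly correlated, whichever of the two is actually causal, both will show nearly the same association p-value, which is exactly the "five roommates" problem in the explanation above.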
 
That makes sense to me but is it accurate? Is there anything that anyone would correct or add?
I didn't know that stuff either and I'm really grateful that you posted it!
 
I don't know how accurate those two AI statements above are (1, 2), but they seem to me clear and concise, which is always a big plus, if for no other reason than that makes it much easier to spot the limitations and flaws.

It demonstrates what I have long thought is the main potential benefit of AI (at least in its current form and level) which is efficiency gains. How long would it have taken a human or even a team of humans to produce that?

It also shows that the prompts AI is given are very important (e.g. ME/CFS v. ME). That is where the human skill component of the equation matters.
 
But here it makes sense for objective findings. But we need to be quite careful.
Yes, of course we need to be careful, just like we wouldn’t blindly accept something a human medic or scientist said; I think most pwME are beyond that from our experience. But we could miss useful insights by being too cynical.

The training is from all sorts of sources, but it also directly includes what we’ve been writing about for years. I’d be shocked if this forum isn’t in the training data. I know something like 27,000 words from my blog were in the GPT-3.5 model. It does also know about BPS papers. It’s a couple of years since it said anything like that to me as if it were valid knowledge, but we should be aware that it’s in there.

Anyway this is probably a different discussion thread.
 
That makes sense to me but is it accurate? Is there anything that anyone would correct or add?
That seems pretty much right, but I'm no expert, so I wanted to find a more reputable source to back it up (these correspond to ChatGPT's points 1 and 2):

Prioritization of causal genes from genome-wide association studies by Bayesian data integration across loci (2025, PLOS Computational Biology)
Understanding which gene in a GWAS locus is responsible for the causal effect is a current challenge [1].

The challenge arises for two reasons.

First, SNPs identified by a GWAS are statistical associations, not causal mechanisms. Linkage disequilibrium creates large blocks of correlated SNPs or haplotypes. Methods that predict functional consequences of variants are helpful [2], but often statistical measures are insufficient to distinguish which SNPs in a block are responsible for a causal effect.

Second, even among causal variants, only a small fraction occur in protein-coding regions, and a small fraction of these cause amino acid changes that provide strong evidence implicating a particular gene. At the majority of loci, the causal variants occur in intergenic regions thought to regulate the expression of nearby genes, but without direct evidence from GWAS of which gene’s regulation is affected.
 
So isn't the argument simply: if you want large genetic studies of ME/CFS with recruitment based on diagnosis, you have to have clinicians dedicated to ME/CFS spread across the country? Instead of focusing on a possible shortcoming, should the response not be: yes, now is the time to give us a GWAS where the diagnosis is made by dedicated clinicians who know what they are doing! We'd all happily sign up for that, please get the wheels moving now.
Some sort of... plan? That would... deliver better outcomes, uh? It would involve targets and work to address specific shortcomings such as this? There's an idea there, if only there were some... people... organized in some fashion who could... you know... think of something here? Ah well.
 
Going through the candidate genes PDF, SLC9C2 is mentioned, which I don’t think we’ve had much discussion of.

SLC9C2 (Tier 1)
• Protein: Sodium/hydrogen exchanger 11 (NHE-11). UniProt. GeneCards. The allele that increases the risk of ME/CFS is associated with decreasing SLC9C2 gene expression.
• Molecular function: Predicted to enable potassium:proton antiporter activity and sodium:proton antiporter activity. Exhibits testis-/sperm-restricted expression in humans (24) although some expression occurs in the thalamus and midbrain, and in the choroid plexus.
• Cellular function: Unclear, but it is localised to the acrosomal region of the head in mature sperm cells.
Which reminded me of discussion in the Zhang paper, starting with this from @chillier

As an aside, there are a couple of genes that seem to have high expression in spermatids (again from Human Protein Atlas single cell, with whatever problems that may or may not have). S100PBP and AKAP1 from PrecisionLife have very high spermatid expression specificity. ADCY10 from Zhang et al as well, for instance. Is there something that neuron function and spermatozoa have in common?

There was a fair bit of speculation about shared ion channels, results in PrecisionLife and if this was significant or not. I’m not sure what to make of it all but perhaps an interesting thread to pull on some more?
 
@richie, this is a GPT-5 answer to your question:


This is the analysis of the whole paper by the new GPT-5 (hallucination is meant to be very low for health now with this model); is there anything new here?





This is the view of a customGPT I created:

4. Pathway-to-patient map (compressed view)

| Pathway | Genes | Potential trial class | Low-risk now |
| --- | --- | --- | --- |
| Pain-synapse | CA10 | NR2B NMDA antagonists, PSD95 disruptors | PEM-aware pacing, sensory load control |
| NETosis / neutrophil tone | OLFM4 | Low-dose colchicine, DNase, PAD4 inhibitors | Oral hygiene, prompt infection care |
| Autophagy / ER-phagy | KLHL20, CCPG1 | ULK1 modulators, ER stress chaperones | Regular meals, avoid extreme fasting |
| Oxidative / membrane repair | PRDX6, PEBP1 | Lipid replacement therapy, antioxidants | Omega-3, choline, vitamin-C foods |
| Viral sensing | RABGAP1L, BTN2A2, ZNFX1 | TLR modulators, γδ T-cell agonists | Infection avoidance, vaccination as eligible |
Thanks. I'm pretty familiar with the paper (which I had in advance so that I could write the blog). I'd say the ChatGPT-5 summary is astonishingly good. The custom GPT one was more mixed, but had some good stuff.

I liked that they picked up subtle things: at least one of them spotted the paper saying the HLA results didn't quite add up and that the authors are going to do a new analysis, and that replication didn't really work but there were problems with case definition variations in the replication cohorts.

ADDED: sorry, the stuff about pain and drug trials was junk, not sure which version of ChatGPT produced that. It's a very interesting clue, but no more than that at this stage.

Also, I think they missed that the genes identified are not nailed on, and need more legwork and, ultimately, experimental work.

Even so, it's still very impressive.

On AI generally: I had an extraordinarily helpful answer from the chatbot on the Visible app yesterday. Normally it spews junk, which you have to get through to connect to a real person. But it nailed a really tricky question; I suspect that was also ChatGPT-5 or similar.

I feel borderline redundant already. Don't fancy my chances after ChatGPT 6.

I wonder how the chatbot will deal with debate in contentious areas?
 
That makes sense to me but is it accurate? Is there anything that anyone would correct or add?
Looks good to me. And the DecodeME team are well aware of these issues (and spell out the generic limitations of GWAS in the paper), which is one reason the analysis took so long.

They have more analyses in the pipeline. I think fine mapping is one of these, like they did for the genetic signal for ME/CFS vs depression. This should further narrow down the candidate list of genes. I don't know if the summary statistics would allow other groups to do this kind of analysis too (but the depression analysis used summary stats from depression genetic research).
 
EDIT: these probably reflect artefacts that were excluded in the main analysis?

I'm looking at the DecodeME summary data, namely the file: gwas_1.regenie.gz taken from here: https://osf.io/rgqs3/files/osfstorage

When I arrange by lowest p-value, however, I get 8,320 rows with a p-value lower than 5 × 10^-8. Some go all the way down to 2.44 × 10^-62.
It looks like what we want is in the qced.var.gz file. It just has a list of SNPs, which I assume are those that passed QC. When I filter the main summary stats file (gwas_1.regenie.gz) to only include the SNPs in this list, then it looks like it matches the reported data:

[attached screenshot: Screenshot from 2025-08-09 08-00-27.png]
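The filtering step described above can be sketched in pandas. This is a toy illustration rather than the real files: the column names (ID, LOG10P) and the QC-list format are assumptions based on typical regenie output, so check the actual headers in gwas_1.regenie.gz and qced.var.gz before relying on it.

```python
import math
import pandas as pd

# Toy stand-ins for the real files; values are invented for illustration.
gwas = pd.DataFrame({
    "ID":     ["rs1", "rs2", "rs3", "rs4"],
    "LOG10P": [8.5, 62.0, 3.2, 9.1],  # regenie reports -log10(p), not p
})
qc_pass = pd.Series(["rs1", "rs3", "rs4"], name="ID")  # SNPs that passed QC

# Keep only QC-passing SNPs, then apply the genome-wide significance
# threshold p < 5e-8, i.e. -log10(p) > -log10(5e-8) ≈ 7.30.
threshold = -math.log10(5e-8)
hits = gwas[gwas["ID"].isin(qc_pass) & (gwas["LOG10P"] > threshold)]
print(hits["ID"].tolist())  # rs2 is dropped by QC, rs3 by the threshold
```

With real files you would read them with `pd.read_csv(..., sep=r"\s+", compression="gzip")` first; the key step is the same `isin` filter against the QC'd variant list before applying the significance threshold.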
 