Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

forestglip · May 16, 2025

Thanks @jnmaciuch. So maybe not quite as exciting as what I thought was happening which did seem too good to be true: lots and lots of synaptic genes coming up independently. As far as I understand your description, it's something more like, if the model sees some variants in, say, both NLGN1 and DLGAP3, then it says, might as well invite all the rest of the synaptic genes to the party.

On another note, I wonder if there's any reason to be concerned about a point brought up by @mariovitali's AI:

External validation is underpowered
The Cornell cohort (36 cases, 21 controls) gives an AUROC of 0.670, but the confidence interval is artificially tight because the authors repeat the same split 500 times; the effective N is still 57, so real-world performance uncertainty is larger than portrayed.

I don't know much about the machine learning world, but most or all of the reported confidence intervals do seem very small, for example AUROC: 0.670 ± 0.003. Maybe with a test set this small, it's not actually very unlikely to get an AUROC of 0.670 just by chance.
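For a sense of scale, the sampling uncertainty of an AUROC on a test set this size can be roughed out with the Hanley–McNeil (1982) approximation. This is a sketch that assumes the 36 cases and 21 controls quoted above are the whole validation set and that the approximation roughly holds. It says nothing about how the paper itself computed its ± 0.003.

```python
import math


def hanley_mcneil_se(auc: float, n_cases: int, n_controls: int) -> float:
    """Approximate standard error of an AUROC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)             # P(two random cases both outrank one control)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)  # P(one case outranks two random controls)
    var = (auc * (1.0 - auc)
           + (n_cases - 1) * (q1 - auc ** 2)
           + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)
    return math.sqrt(var)


# Cornell validation cohort as quoted above: 36 cases, 21 controls
se = hanley_mcneil_se(0.670, n_cases=36, n_controls=21)
lo, hi = 0.670 - 1.96 * se, 0.670 + 1.96 * se
print(f"SE ~ {se:.3f}, 95% CI ~ ({lo:.2f}, {hi:.2f})")
```

With these inputs the standard error comes out at about 0.07, giving an approximate 95% interval of roughly 0.53 to 0.81. That is far wider than ± 0.003, though under this approximation it still stays above 0.5.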

jnmaciuch · May 16, 2025

forestglip said:
Thanks @jnmaciuch. So maybe not quite as exciting as what I thought was happening which did seem too good to be true: lots and lots of synaptic genes coming up independently. As far as I understand your description, it's something more like, if the model sees some variants in, say, both NLGN1 and DLGAP3, then it says, might as well invite all the rest of the synaptic genes to the party.

On another note, I wonder if there's any reason to be concerned about a point brought up by @mariovitali's AI:

I don't know much about the machine learning world, but most or all of the reported confidence intervals do seem very small, for example AUROC: 0.670 ± 0.003. Maybe with a test set this small, it's not actually very unlikely to get an AUROC of 0.670 just by chance.

Yeah it is an artificially small confidence interval, but in my opinion you don’t particularly need one for validating on an independent test cohort. And even increasing the confidence interval by an order of magnitude wouldn’t put the AUC in the range of being no better than random assignment (0.5).

I think the bigger problem is what EndME already mentioned—we just don’t know which of the genes are recapitulated in the test cohort. But if we’re interpreting results simply based on possibility of involved pathways, not their predictive capacity for identifying ME/CFS vs. control, I think we can afford to be more lax.

Others may have different feelings on these points.

forestglip · May 16, 2025

jnmaciuch said:
Yeah it is an artificially small confidence interval, but in my opinion you don’t particularly need one for validating on an independent test cohort. And even increasing the confidence interval by an order of magnitude wouldn’t put the AUC in the range of being no better than random assignment (0.5).

I think the bigger problem is what EndME already mentioned—we just don’t know which of the genes are recapitulated in the test cohort. But if we’re interpreting results simply based on possibility of involved pathways, not their predictive capacity for identifying ME/CFS vs. control, I think we can afford to be more lax.

Others may have different feelings on these points.

Sure, I don't know how large a CI would be expected if they had only run it once. I was just worried that it would actually be much bigger and cross 0.5, that the genes might be more or less random, and that they happened to get 0.670 by chance on the test set. I'm thinking more about this after seeing that there looks to be pretty much nothing special about any of these genes when looking at the LoF rare variant genes for CFS in the Biobank. Though I'm still just learning the basics of genetic studies, so I could definitely be wrong in this interpretation.
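One way to frame "getting 0.670 by chance" is to ask how often a model with no real signal would reach that AUROC on 36 cases and 21 controls. The sketch below assumes independent, purely random scores and uses both the Mann–Whitney normal approximation and a small seeded simulation. It ignores other sources of optimism, such as analysis choices made after seeing the test set.

```python
import math
import random


def auroc(case_scores, control_scores):
    """AUROC as the fraction of case/control pairs ranked correctly (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(case_scores) * len(control_scores))


n_cases, n_controls, observed = 36, 21, 0.670

# Normal approximation to the null distribution of AUROC (Mann-Whitney U): mean 0.5
null_sd = math.sqrt((n_cases + n_controls + 1) / (12 * n_cases * n_controls))
z = (observed - 0.5) / null_sd
p_normal = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided tail probability

# Monte Carlo: a "model" whose scores are pure noise, seeded for reproducibility
rng = random.Random(0)
trials = 2000
hits = sum(
    auroc([rng.random() for _ in range(n_cases)],
          [rng.random() for _ in range(n_controls)]) >= observed
    for _ in range(trials)
)
p_sim = hits / trials
print(f"null SD ~ {null_sd:.3f}, one-sided p ~ {p_normal:.3f} (normal), {p_sim:.3f} (simulated)")
```

Under these assumptions a useless model reaches 0.670 or better only about 2% of the time (one-sided). That fits jnmaciuch's view that the result is unlikely to be pure chance, but the margin is much less comfortable than ± 0.003 suggests.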

Still, depression being the top hit, out of the 4000 other phenotypes it could have been, seems unlikely to be a coincidence. So it does make me think there are some real genes there at least, though maybe not ones specific to an ME/CFS mechanism.

Jonathan Edwards · May 16, 2025

I am in the frustrating position of knowing more about the DecodeME results than everyone else here except Andy (and Brian Hughes) and trying hard to make a judgment on these Zhang data as if I didn't. I continue to be impressed that they pulled out things that are very plausible and cannot really have turned up just because of rogue software, but which on their own do have uncertainties attached. The HLA-C finding seems to be isolated, so presumably it is relying on its own weighting, and I think it must be bona fide, although there are some general caveats about MHC data.

There is a certain irony about this paper coming out first, especially when DecodeME has had so much to cope with in terms of delays through no fault of the Edinburgh team. But I think it remains the case that data from DecodeME will bring a degree of confidence to the field that we don't have here. And it is not as if this is even the first study. The Bergen group have already provided data, some of which I think will turn out to hold up, but which again had question marks at the time.

One thing that interests me is that rare variants are likely to be deleterious, whereas SNPs in GWAS tag common variants that may be mostly neutral, or may even represent alleles that have hung around because they have some advantages along with downsides. So this study may be picking up not so much genes for signal paths that drive the process as genes for target tissues that cannot cope well if they are running on the basis of heterozygosity for a functional gene. That might be a wrong analysis, but I suspect that this approach and the standard GWAS approach may end up giving rather different but complementary information. And it will be the DecodeME data that, being based on a less contrived analysis of a very large, well-defined group, will provide the bedrock for going back and re-assessing other data for consistency and complementarity.

forestglip · May 16, 2025

Jonathan Edwards said:
I am in the frustrating position of knowing more about the DecodeME results than everyone else here except Andy (and Brian Hughes) and trying hard to make a judgment on these Zhang data as if I didn't. I continue to be impressed that they pulled out things that are very plausible and cannot really have turned up just because of rogue software but which on their own do have uncertainties attached.

Thanks for giving some reassurance at least.

rapidboson · May 16, 2025

Could any of the findings in this paper (and DecodeME) point towards potential symptomatic treatments through drug repurposing while curative treatments are in the making?

hotblack · May 16, 2025

What a tease!

It seems there are at least useful results from DecodeME then, which there was never a guarantee of. That in itself feels like positive news. If the data can help start to tie things together into a bigger picture, even better.

JemPD · May 16, 2025

I am waaay too ill to read the thread & wouldn't understand most of it if I did. I'm not really taking it in; it's all just a blur in the main.

But there were moments & certain snippets that caught my eye that suggest to me that perhaps we might be near to some kind of breakthrough??? But I keep seeing mention of Decode.... 'when we get the Decode results' type comments.

False hope is more catastrophic for my mental health than no hope at all. I honestly don't think I'd survive if I get my hopes up that something's going to change & then it doesn't. Again.

So I need to prepare myself, & so I want to ask... what happens if Decode finds nothing? If it is essentially negative?

Would that knock all these theories being discussed here on the head?

Jonathan Edwards · May 16, 2025

@JemPD. I apologise for the confusion but events have turned out a bit complicated.

I have said that whatever DecodeME finds I suspect we will have more faith in the results as a sort of yardstick or gold standard than we are going to for this study because the method is more transparent and the numbers are much bigger. I don't actually know what DecodeME will show in detail - exactly where it may be positive or negative - but the project has been successfully completed.

Nevertheless, even having said that, I do not believe that this Zhang study is yet another piece of dubious data that will not hold up. I don't see how this could be an artefact. And for that reason I personally think that you can take it that yes, this is some sort of breakthrough. If DecodeME comes up with no genes like this I will still think they are real - and that maybe the two methods pick out different areas for technical reasons. So I am saying forget about DecodeME for the moment because there is nothing in the public domain. Focus on this study and take it as something almost certainly very important. It is pointing to processes at the molecular level likely to be central to the biological process underlying the way the disease affects people in real life. It may be pointing to nerve cells but in no way is it pointing to psychology or 'effort preference'. It is pure bio- and no babble.

forestglip · May 16, 2025

I now see that this Zhang model is more telling us about clusters of genes with related function as opposed to the specific genes picked out. And I've been learning about GSEA the past few days. So I wondered, if I did a GSEA on the rare variants for chronic fatigue syndrome in the Genebass BioBank data, would the concepts we're talking about here, like synapses, be most important there?

So I did a preranked GSEA, using -log10(SKAT-O p value) from the Genebass page for CFS as the ranking metric. I used two different gene set collections: hallmark gene sets and GO cellular component. I uploaded the GSEA reports, as well as the leading edge analysis for the cellular components, here: https://glittery-tarsier-413b2d.netlify.app/gsea/ (For example, click "Hallmark gene sets", then "Detailed enrichment results in html format" to see the most enriched gene sets in the hallmark collection.)
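For anyone wanting to try this kind of preranked run, the ranking step amounts to converting each gene's p-value to -log10(p) and writing a two-column, tab-separated .rnk file (gene, score) sorted from most to least significant. This is a sketch that assumes a plain dict mapping gene to SKAT-O p-value. The gene names and values are made up, and the Genebass download format is not reproduced here. Note that -log10(p) is unsigned, so the ranking reflects strength of association only, not direction of effect.

```python
import math
import os
import tempfile


def make_rnk(gene_pvalues: dict[str, float], path: str) -> list[tuple[str, float]]:
    """Rank genes by -log10(p), most significant first, and write a .rnk file."""
    ranked = sorted(
        # p == 0 would give an infinite score, so such genes are skipped here
        ((gene, -math.log10(p)) for gene, p in gene_pvalues.items() if p > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    with open(path, "w") as fh:
        for gene, score in ranked:
            fh.write(f"{gene}\t{score:.6g}\n")  # tab-separated: gene, score
    return ranked


# Hypothetical gene symbols and p-values, for illustration only
example = {"GENE_A": 1e-4, "GENE_B": 0.2, "GENE_C": 0.003, "GENE_D": 0.9}
path = os.path.join(tempfile.gettempdir(), "cfs_skato_example.rnk")
ranked = make_rnk(example, path)
for gene, score in ranked:
    print(gene, round(score, 2))
```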

The hallmark collection doesn't include any gene sets specifically about synapses, but I've seen @Jonathan Edwards talking about TGF beta a lot, and that gene set was ranked 2nd out of 50.

But looking at the cellular component report, there seem to be a lot of neuron-related components near the top. Hopefully someone who has done GSEA for longer than a few days (@jnmaciuch?) can look at this and say if there's anything actually interesting here.

Edit: Reuploaded the hallmark report after rerunning it, when I realized a setting had been set incorrectly. The results are almost identical; just a few gene sets farther down switched places. The first 10 are still in the same order.

jnmaciuch · May 16, 2025

forestglip said:
But looking at the cellular component report, there seem to be a lot of neuron-related components near the top. Hopefully someone who has done GSEA for longer than a few days (@jnmaciuch?) can look at this and say if there's anything actually interesting here.

Looking through, I would say the synapse-related findings are quite weak. CC pathways are quite small to begin with, so I imagine it would be difficult to get much signal from them in a Genebass dataset. Few pathways pass the FDR cutoff, and the enrichment is driven by only 3-4 genes, which aren't very highly ranked to begin with.
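To illustrate why testing many gene sets makes an FDR cutoff hard to pass, here is the Benjamini–Hochberg adjustment. This is a swapped-in, simpler procedure: GSEA's own FDR q-values are estimated from permutations of normalized enrichment scores, not with BH, so treat it only as intuition. The p-values in the example are made up.

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p down, keeping q monotone: q_(r) = min(p_(r) * m / r, q_(r+1))
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
print([round(x, 4) for x in q])
```

Even a nominal p of 0.01 becomes q = 0.04 with only four tests; with the many more sets in a GO cellular component collection, the penalty grows accordingly.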

The hallmark TGFb hit is a little more interesting. Despite also not passing the FDR cutoff, there's reason to suspect TGFb a priori and treat it as a separate hypothesis (I probably wouldn't argue this in a paper having already seen the list of results, but for an informal assessment like this it's fine).

The actual leading edge genes are the interesting part. Often when I run GSEA with hallmark, the top hits are ones that are kind of tangential to the actual name of the pathway: genes that have been observed to be related to, e.g., TGFb signaling in experiments, but whose exact relationship to TGFb is a bit vague. In these TGFb top hits, however, they are all direct binding partners or known downstream signaling components of TGFb.
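For those following along with GSEA, the "leading edge" comes from the running-sum enrichment score. Walk down the ranked list, step up at genes in the set (weighted by their score) and down at genes outside it, and take the maximum deviation from zero. The leading edge is the set members encountered up to that peak. The sketch below follows the weighted running sum described for GSEA (weight = 1) but omits the permutation-based normalization and significance testing; the example genes and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score and leading-edge genes for one gene set.

    ranked_genes and scores must already be sorted from highest to lowest score.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n, n_hits = len(ranked_genes), sum(in_set)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must overlap the list without covering all of it")
    # Hits are weighted by |score|**weight; misses all get the same penalty.
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a score of zero")
    miss_step = 1.0 / (n - n_hits)

    running, best, best_idx = 0.0, 0.0, 0
    for i, (s, hit) in enumerate(zip(scores, in_set)):
        running += abs(s) ** weight / hit_norm if hit else -miss_step
        if abs(running) > abs(best):
            best, best_idx = running, i

    # Leading edge: set members at or above the peak (or at/below it if ES < 0).
    idx = range(0, best_idx + 1) if best >= 0 else range(best_idx, n)
    leading_edge = [ranked_genes[i] for i in idx if in_set[i]]
    return best, leading_edge


genes = ["A", "B", "C", "D", "E", "F"]
scores = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5]
es, leading = enrichment_score(genes, scores, {"A", "C"})
print(es, leading)  # 0.75 ['A', 'C']
```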

As others have noted earlier in this discussion, these results are definitely going to be skewed by the fact that all Genebass phenotype data is going to be self report of CFS matched to a diagnostic code by a random nurse. So not definitive, certainly less confidence in these findings than the Zhang study, but still potentially interesting.

forestglip · May 16, 2025

jnmaciuch said:
Looking through, I would say the synapse-related findings are quite weak. CC pathways are quite small to begin with, so I imagine it would be difficult to get much signal from them in a Genebass dataset. Few pathways pass the FDR cutoff, and the enrichment is driven by only 3-4 genes, which aren't very highly ranked to begin with.

The hallmark TGFb hit is a little more interesting. Despite also not passing the FDR cutoff, there's reason to suspect TGFb a priori and treat it as a separate hypothesis (I probably wouldn't argue this in a paper having already seen the list of results, but for an informal assessment like this it's fine).

The actual leading edge genes are the interesting part. Often when I run GSEA with hallmark, the top hits are ones that are kind of tangential to the actual name of the pathway: genes that have been observed to be related to, e.g., TGFb signaling in experiments, but whose exact relationship to TGFb is a bit vague. In these TGFb top hits, however, they are all direct binding partners or known downstream signaling components of TGFb.

As others have noted earlier in this discussion, these results are definitely going to be skewed by the fact that all Genebass phenotype data is going to be self report of CFS matched to a diagnostic code by a random nurse. So not definitive, certainly less confidence in these findings than the Zhang study, but still potentially interesting.

Interesting, thanks! I'll take your word for it if you say the neuron related scores aren't particularly interesting. But when I look at the number of neuron components, it seems like a lot near the top? I see that six of the top ten components are related to neurons (at least based on the names). And on the leading edge analysis, it shows that four of these components have no overlapping leading edge genes:

So 4 different neuron related components are among the most highly enriched, more or less independently of each other in terms of specific genes, right?
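Checking whether enriched sets are independent in terms of genes is just a matter of intersecting their leading-edge lists pairwise. This is a sketch with hypothetical component names and gene labels standing in for the real report.

```python
from itertools import combinations


def pairwise_overlaps(leading_edges: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return the shared leading-edge genes for every pair of gene sets."""
    return {
        (a, b): leading_edges[a] & leading_edges[b]
        for a, b in combinations(sorted(leading_edges), 2)
    }


# Hypothetical component names and genes, for illustration only
edges = {
    "component_1": {"G1", "G2"},
    "component_2": {"G3", "G4", "G5"},
    "component_3": {"G6"},
    "component_4": {"G2", "G7"},
}
shared = pairwise_overlaps(edges)
all_independent = all(not genes for genes in shared.values())
for pair, genes in shared.items():
    if genes:
        print(pair, sorted(genes))
```

Sets with no shared leading-edge genes are not being driven by the same few genes, but that alone does not rule out chance.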

jnmaciuch · May 16, 2025

forestglip said:
So 4 different neuron related components are among the most highly enriched, more or less independently of each other in terms of specific genes, right?

It is an interesting tidbit, it just has so little strength that I can’t confidently say it isn’t due to pure chance and the relative overrepresentation of synapse-related pathways in the database (even if they don’t contain the same genes). Viewed in light of the Zhang et al findings, it’s interesting, certainly. Another piece of evidence towards the same corner. But just a very weak one at that.
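The overrepresentation point can be made concrete with a hypergeometric tail: if some fraction of the collection is neuron-related, how surprising is it for 6 of the top 10 sets to be neuron-related? The collection sizes below are invented for illustration. Treating the top 10 as a random draw also ignores the correlation between overlapping GO sets, so this gives only a rough intuition.

```python
from math import comb


def hypergeom_tail(k: int, draws: int, successes: int, population: int) -> float:
    """P(X >= k): chance that at least k of `draws` sets picked at random from
    `population` sets fall among the `successes` neuron-related ones."""
    total = comb(population, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(draws, successes) + 1)
    ) / total


# Hypothetical collection: 1000 sets, with either 100 or 300 of them neuron-related
for neuron_sets in (100, 300):
    p = hypergeom_tail(k=6, draws=10, successes=neuron_sets, population=1000)
    print(neuron_sets, f"{p:.2g}")
```

The same 6-of-10 result is far less surprising when neuron-related sets make up a larger share of the collection, which is the overrepresentation caveat raised above.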

forestglip · May 16, 2025

jnmaciuch said:
It is an interesting tidbit, it just has so little strength that I can’t confidently say it isn’t due to pure chance and the relative overrepresentation of synapse-related pathways in the database (even if they don’t contain the same genes). Viewed in light of the Zhang et al findings, it’s interesting, certainly. Another piece of evidence towards the same corner. But just a very weak one at that.

Cool, I appreciate the insights!

Perrier · May 17, 2025

Jonathan Edwards said:
I am not aware of ME/CFS involving swollen joints. I have never heard of it before and if it was a real association I think I would. People have swollen joints for all sorts of reasons quite commonly so I suspect it is a coincidence. Of course there are swollen joints in Lyme arthropathy but I don't see that as having anything to do with ME/CFS. Otherwise the features of ME/CFS are much the same for everyone I suspect. Latent HSV1, EBV and varicella/zoster give completely different pictures.

Dr. Edwards, not wishing to change the discussion, but curious:
what about fibromyalgia tender points being painful? A good number of patients seem to have this too. Dr. Moreau in Montreal said that with his test he could determine who had this additional symptom and who did not.
Secondly, what about autism? Not sure where I read this, it was some time ago, but a number of people with ME indicated that they had mild autism (earlier in the thread, with regard to Gulf War, the gene connected to autism was mentioned, if I recall). Thank you.

Jonathan Edwards · May 17, 2025

Perrier said:
Dr. Edwards, not wishing to change the discussion, but curious: what about fibromyalgia tender points being painful?

The general view amongst rheumatologists in recent years is that tender points are not specific to fibromyalgia. They are points that are often tender or painful in normal healthy people. Before we had fibromyalgia we had something called fibrositis, which sort of morphed into FM. The commonest site for 'fibrositis' was at the inner border of the shoulder blade. But I have often had pain and tenderness there too, and so does almost anyone I ask. Another point is the tennis elbow spot, which is a bit tender in everyone.

Having painful points might indicate some slightly different slant to the process, but I would like to see replication of any claims that someone could relate it to lab findings.

Jonathan Edwards · May 17, 2025

Perrier said:
Secondly, what about autism. Not sure where I read this, it was some time ago, but a number of people with ME indicated that they had mild autism

I have recently had a number of reports from people both off and on the forum that made me think hard about a relation to the autism spectrum. I would not be surprised if there were one. Zhang pulled out a gene linked to autism and a number of other genes linked to similar sorts of problems. I think we are going to be discussing that a lot more as time goes by.

Sasha · May 17, 2025

Jonathan Edwards said:
The general view amongst rheumatologists in recent years is that tender points are not specific to fibromyalgia. They are points that are often tender or painful in normal healthy people. Before we had fibromyalgia we had something called fibrositis, which sort of morphed into FM. The commonest site for 'fibrositis' was at the inner border of the shoulder blade. But I have often had pain and tenderness there too, and so does almost anyone I ask. Another point is the tennis elbow spot, which is a bit tender in everyone.

Having painful points might indicate some slightly different slant to the process, but I would like to see replication of any claims that someone could relate it to lab findings.

I went to see a rheumatologist about another matter (he was very interested to hear that you were interested in ME). He promptly rammed his thumbs into my joints and surprisingly to me, it made some of them hurt. They'd never given me any trouble, but he diagnosed me with fibromyalgia (which I disagreed with). Maybe if you prod anybody hard enough, their joints will hurt.

wastwater · May 17, 2025

Itaconate transporter SLC13A3
There are a few papers describing its involvement in different diseases

https://m.youtube.com/shorts/qJW6JQ6iFPU

Itaconate shunt update
Earlier videos mention gene SOX3

https://www.genecards.org/cgi-bin/carddisp.pl?gene=SOX3

FOXC1 and PAX5/6 of interest

Braganca · May 17, 2025

Jonathan Edwards said:
We already have strong and clear biomedical causal evidence from Zhang I would say. It is just that it isn't in an easily interpretable form. With regard to DecodeME, I assume that as soon as the preprint goes up someone will start a thread (within half an hour) and it will be at the top of the recent threads and posts page for about a month.

Could someone say, in really basic, brief layman's terms, what has been found and what the theory is? My cognitive function is so bad I can't remember the complex descriptions, and I would love to tell friends.
