Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Ok, I've run GSEA on the Zhang genes ranked by attention scores with the hallmark and canonical pathways collections:
This is the Wikipedia entry for NOTCH signalling pathway. No idea what it means.
https://en.wikipedia.org/wiki/Notch_signaling_pathway
The Notch signaling pathway is important for cell-cell communication, which involves gene regulation mechanisms that control multiple cell differentiation processes during embryonic and adult life.
It also goes on to state that Notch signalling has a role in neuronal function and development (among many other roles).

I checked my WGS and am heterozygous for a missense mutation in NOTCH1, the highest ranked gene on your list. ClinVar lists the variant as benign, and CADD = 22.5, which seems high.
I also have a 2nd heterozygous intron variant listed as significant on NOTCH1.
 
This is the Wikipedia entry for NOTCH signalling pathway. No idea what it means.
It just refers to all the signaling that happens as a result of the NOTCH protein binding to one of its ligands, which is a pretty broad category. It’s involved in a lot of biological processes, most notably in early embryonic development and growth, which is why mutations in it are highly associated with cancer. In later life it’s very important for neural plasticity, angiogenesis (new blood vessels), basically any situation where you need new cells to grow.

[Edit: this is a very well characterized pathway so it would not surprise me that the protein-protein interactions are strong in STRING]
 
Final thing for now because I am exhausted. I ran GSEA with the cellular component collection since I already ran the same one on the Genebass data.

Link to list of enriched component gene sets in Genebass

I decided not to use collapsePathways here. Since I am comparing if any gene sets match between the two cohorts, I don't want any pathways to be removed that might potentially be in the other dataset.

So here's the top 30 out of 469 gene sets:
[Attached image: cc_nocollapse_30.png]

And in text form:

The only ones in these top 30 that just might be enriched in the Genebass analysis as well (rank and FDR from the Genebass values):

Rank | Gene Set | unadjusted p | FDR
5 | GOCC_ESCRT_COMPLEX | 0.004 | 0.237
9 | GOCC_U5_SNRNP | 0.033 | 0.446
21 | GOCC_EXCITATORY_SYNAPSE | 0.003 | 0.517
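
For anyone wanting to repeat the cross-cohort check, here is a minimal sketch of the comparison described above: take the top gene sets from one analysis and look up their p-values in the other. The function name, the 0.05 cutoff and the two HYPOTHETICAL_* entries are assumptions for illustration only; the three GOCC rows use the Genebass values from the table.

```python
def shared_candidates(top_sets, other_results, p_cutoff=0.05):
    """Return (name, p, fdr) for gene sets in top_sets whose unadjusted p
    in other_results is below p_cutoff, sorted by unadjusted p."""
    hits = [
        (name, other_results[name][0], other_results[name][1])
        for name in top_sets
        if name in other_results and other_results[name][0] < p_cutoff
    ]
    return sorted(hits, key=lambda row: row[1])


# Genebass results as (unadjusted p, FDR). The first three rows are from the
# table above; HYPOTHETICAL_SET_A is made up to show a non-match.
genebass = {
    "GOCC_ESCRT_COMPLEX": (0.004, 0.237),
    "GOCC_U5_SNRNP": (0.033, 0.446),
    "GOCC_EXCITATORY_SYNAPSE": (0.003, 0.517),
    "HYPOTHETICAL_SET_A": (0.60, 0.95),
}

# Top gene sets from the attention-score analysis (names only).
# HYPOTHETICAL_SET_B stands in for a set absent from the other dataset.
zhang_top = [
    "GOCC_ESCRT_COMPLEX",
    "GOCC_U5_SNRNP",
    "GOCC_EXCITATORY_SYNAPSE",
    "HYPOTHETICAL_SET_A",
    "HYPOTHETICAL_SET_B",
]

for name, p, fdr in shared_candidates(zhang_top, genebass):
    print(f"{name}\tp={p}\tFDR={fdr}")
```

Filtering on the unadjusted p is deliberately lenient here, which matches the "just might be enriched" wording; none of the three would pass on FDR.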

Edit: Fixed rankings in the final 3 listed genes.

Edit: Added unadjusted p to final list.
 
I checked my WGS and am heterozygous for a missense mutation in NOTCH1, the highest ranked gene on your list. ClinVar lists the variant as benign, and CADD = 22.5, which seems high.
I thought it might be interesting, but when I go to Genebass and filter by any genes with "NOTCH" in the name, none of the six NOTCH genes are significant at all.
 
Finally working through this paper thread armed with a clear brain, some time better spent on paper writing and most importantly a cuppa

The hint of biological confirmation in the paper is what is particularly interesting - they looked at some proteomics data (ME/CFS and controls). Of the 9 proteins mentioned in the M9 gene module, 4 proteins had been measured in the proteomic study. And two of the four were lower in the ME/CFS sample.


The proteomics data didn't confirm the other three gene modules that this study identified from the genetic work, although perhaps it just didn't measure the right proteins for those modules. Or the protein differences aren't found in blood, e.g. are only found locally in the tissues, or they get degraded quickly.


Here's a video on the proteasome that I found helpful.

We have some threads that make mention of it too - see the tag.
Intracellular infections can disrupt the function of the proteasome.


Proteasome subunits consistently come up as dysregulated in our work in ME cell lines or primary cells from different tissues. We have not yet looked at specific overlaps in subunits or directions between tissues, or at functional assays, but there's something here even if subtle. However, it may just be that something as closely married to a central process such as protein translation would be expected to be affected by probably any flavour of aberrant or disrupted homeostasis. There are also a lot of these subunits, so coming up by chance is also possible (I can do the stats to determine the likelihood of this, but it's been less important than other things I am working on).
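
One way the "by chance" question above could be checked is a hypergeometric tail probability: given how many genes were tested, how many of them are proteasome subunits, and how many came up as dysregulated, how likely is it to see at least the observed number of subunits in the dysregulated list? This is only a sketch and not the poster's analysis. Every number in the example call is a made-up placeholder, and the function name is hypothetical.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the category of interest."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Placeholder numbers only (assumptions, not data from any study):
# 20000 genes tested, 45 annotated as proteasome subunits,
# 300 dysregulated genes, 4 of them proteasome subunits.
p_chance = hypergeom_tail(k=4, N=20000, K=45, n=300)
print(f"P(at least 4 subunits by chance) = {p_chance:.4g}")
```

The result depends heavily on the background (N) chosen, so the set of genes actually measured in the experiment should be used rather than the whole genome.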

I don’t know what the biological relevance would be of encoding more of a certain subunit but not others if those extra subunits don’t form functional complexes on their own

For me the upstream hint is more interesting - what is the mechanism of regulation of the affected gene? That gives you a target to work back against. In terms of downstream function, if one was convinced it was worth pursuing, it is directly assayable.

separate note, HDAC1 coming up is interesting https://pmc.ncbi.nlm.nih.gov/articles/PMC6787670/

aaand I am out of time for now
 
HDAC1 also in Integrative Multi-Omics Framework for Causal Gene Discovery in Long COVID (2025, Preprint: MedRxiv)

The gene expression profile showed downregulation of HDAC1, SRC, and TP53, along with upregulation of NDUFA6, genes associated with metabolic regulation, immune response, and cellular stress pathways.

 
Not sure about this. Sometimes it seems to me to have an excitatory effect, at least in as much as it is difficult to calm down.

Indeed, but then that is true for depressive illness it seems. It is all a bit handwaving, especially when alcohol is supposed to be a cerebral depressant and yet it makes people sing rude songs and toboggan down the Main Street.
 
Thanks for the responses, @forestglip and @jnmaciuch.

My concern about AUC wasn't anything to do with diagnosis (it's too low to be useful), but as a way to demonstrate the biological validity of the findings. The authors say:
This result highlights the superiority of HEAL2 in generalizability over HEAL, indicating that HEAL2 captured biology-relevant genetic factors contributing to ME/CFS.
My italics above.

In this case, they are stressing the generalisability re the independent test. But, while I can see the argument on generalisability, I don't really see why HEAL2 wouldn't also perform better in cross-validation, even if not to the same extent, if it is better at picking up biologically relevant genes.

My bigger concern is that the test cohort is so small at 36 cases/21 controls, and I wouldn't want to hang my hat on that when it comes to evidence of biological relevance.
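
To put a rough number on the small-cohort worry above, here is a sketch using the Hanley and McNeil (1982) approximation for the standard error of an AUC. The cohort sizes (36 cases, 21 controls) are from the paper as quoted here; the AUC of 0.65 is a placeholder assumption, not the paper's reported value, and the function name is made up.

```python
from math import sqrt


def auc_standard_error(auc, n_cases, n_controls):
    """Hanley-McNeil approximation to the standard error of an AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_cases - 1) * (q1 - auc**2)
        + (n_controls - 1) * (q2 - auc**2)
    ) / (n_cases * n_controls)
    return sqrt(variance)


assumed_auc = 0.65  # placeholder value for illustration only
se = auc_standard_error(assumed_auc, n_cases=36, n_controls=21)
low, high = assumed_auc - 1.96 * se, assumed_auc + 1.96 * se
print(f"SE = {se:.3f}, approximate 95% CI = {low:.2f} to {high:.2f}")
```

With a cohort this size the interval comes out wide whatever AUC is plugged in, which is the point: a modest AUC on 57 people is hard to distinguish from chance.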

A solution would be to use UKB for a test cohort. @Hutan suggested the authors probably tried out UKB data and found little, but we know from the recent Samms/Ponting cohort quality paper that the four cohorts available all have issues. However, 95 cases appear in all 4 cohorts. They are probably reliable diagnoses, giving a bigger cohort than Cornell, with the likelihood of a much larger control group (and one matched very well since there are 500k in the UKB). Defining a decent cohort of around 500 should not be too difficult.

That could give greater confidence in all the interesting, more specific results.
 
@Hutan suggested the authors probably tried out UKB data and found little
@forestglip subsequently did the comparative analysis of the identified rare variants on a UK Biobank group labelled CFS (around 2000 people) and found there was little commonality.

But yes, a high quality ME/CFS group and an analysis that focussed on pathways rather than individual variants could be good.
 
@forestglip subsequently did the comparative analysis of the identified rare variants on a UK Biobank group labelled CFS (around 2000 people) and found there was little commonality.
Sorry, I haven’t been keeping up. Interesting analysis. But 28% of that group reported good or excellent health, and there were other issues. Though I don’t think it’s so bad a cohort, because it’s not simply “self-reported”. People were asked if they had a serious illness or disability diagnosed by a doctor. If they did, a nurse interviewing them later asked which illness, without prompting. We don’t know how the nurse recorded the replies, but there were a large number of options to choose from, including chronic fatigue syndrome (but not chronic fatigue or ME), and the nurse could also enter other options in a text box. So it’s quite different from someone checking a CFS tick box.
 
My bigger concern is that the test cohort is so small at 36 cases/21 controls, and I wouldn't want to hang my hat on that when it comes to evidence of biological relevance.
A solution would be to use UKB for a test cohort. @Hutan suggested the authors probably tried out UKB data and found little, but we know from the recent Samms/Ponting cohort quality paper that the four cohorts available all have issues. However, 95 cases appear in all 4 cohorts. They are probably reliable diagnoses, giving a bigger cohort than Cornell, with the likelihood of a much larger control group (and one matched very well since there are 500k in the UKB). Defining a decent cohort of around 500 should not be too difficult.

That could give greater confidence in all the interesting, more specific results.
Hi @Andy, I'm aware that PrecisionLife has been granted access to the DecodeME data. Are you aware of any requests to use the HEAL2 algorithm, which was recently used in the Zhang et al. paper, on the DecodeME data?

Hi. The public details that DecodeME has on projects that have had access approved can be seen here, https://www.decodeme.org.uk/approved-studies/. For confidentiality reasons we wouldn't want to reveal any more information than that is available there.

ETA: Added missing "than".

If no one has already, should someone be contacting Zhang et al. to attempt to replicate using the DecodeME database, and feeding back to them any concerns about design and interpretability discussed here before they do so?

Edit: I forgot that DecodeME is a GWAS and that Zhang et al. look at WGS data... o_O
 
Indeed, but then that is true for depressive illness it seems. It is all a bit handwaving, especially when alcohol is supposed to be a cerebral depressant and yet it makes people sing rude songs and toboggan down the Main Street.
Some of the symptoms for us seem to be the body trying to respond to whatever underlying problem exists. At least that’s what it sometimes feels like. That could explain the paradoxical nature?

I also had a period of quite severe depression in my late teens (at least partially triggered by mefloquine) and that had periods of sleeplessness for days, or more wired, even borderline psychosis and plenty of anxiety, etc. So inhibitory and excitatory can often go hand in hand.

This experience has helped me see the significant differences between depression and ME/CFS but it would be interesting if there’s an underlying predisposition to both.
 
Some of the symptoms for us seem to be the body trying to respond to whatever underlying problem exists. At least that’s what it sometimes feels like. That could explain the paradoxical nature?

I also had a period of quite severe depression in my late teens (at least partially triggered by mefloquine) and that had periods of sleeplessness for days, or more wired, even borderline psychosis and plenty of anxiety, etc. So inhibitory and excitatory can often go hand in hand.

This experience has helped me see the significant differences between depression and ME/CFS but it would be interesting if there’s an underlying predisposition to both.

Like you, I seemed to have an episode of depression some years prior to my actual ME diagnosis, although my experience was nothing but inhibitory. Interestingly, my father had a bipolar diagnosis.

I do worry a bit how that particular finding might be misused by the BPS lot.
 
My bigger concern is that the test cohort is so small at 36 cases/21 controls, and I wouldn't want to hang my hat on that when it comes to evidence of biological relevance.
I have the same concern about the test set being so small.

@forestglip subsequently did the comparative analysis of the identified rare variants on a UK Biobank group labelled CFS (around 2000 people) and found there was little commonality.
I wouldn't say their model definitely wouldn't replicate on a UK Biobank cohort based on that. The comparison they and I did on the Biobank is much less sophisticated than if they had actually used their model for classification.

Edit: But issues with participant selection in the Biobank might still water down the already fairly low AUC.
 
So I did some preprocessing to extract all genes from the links @forestglip gave, identified which genes are duplicated and fed them to the information retrieval system I have been using.

The results from my information retrieval system show that we are looking at the following concepts:

Methyltransferases, DNA methylation, protein stability and protein interaction, post-translational modifications, proteasome, histone methyltransferases (a subset of methyltransferases), histone deacetylase

Plugging in only the genes that had multiple entries leads to the above results plus ATP hydrolysis, which makes it to the ranking list, as well as retinoic acid, STAT3 and G-actin.

Interestingly, @TamaraRC's hypothesis discusses the BHMT gene, which is a methyltransferase (tagging @DMissa). The thread of this hypothesis can be found here: https://www.s4me.info/threads/a-sys...nergic-imbalance-in-me-cfs-2025-carnac.44116/

Just in case, I contacted @TamaraRC to see whether issues in any other methyltransferases could lead to problems according to her hypothesis (mentioning this in case anyone else wishes to comment)

Please note: I have not yet verified the gene extractions manually; I will try to do so ASAP.
 
So I did some preprocessing to extract all genes from the links @forestglip gave, identified which genes are duplicated and fed them to the information retrieval system I have been using.
Just want to be sure I understand. You noted down all genes that are included in any of the top gene sets? And identified if they're duplicated in what way?
 
It's still an interesting question - does an infection have lasting implications on sperm health (especially in men who go on to develop ME/CFS)? And, if so, could that tell us something about what might be happening to neurons?

Sperm are cells with a big demand for energy, and, unlike neurons, they are easy to get hold of. That might make for relatively easy investigations. I don't think much is known about sperm from men with ME/CFS.

There are reports of decreased sperm motility many months after infections like Zika. e.g.
Potential effect of Zika virus infection on human male fertility?
Stumbled on this and thought of your question, @Hutan . There seem to be a bunch of studies on COVID-19 and sperm.
In one study, impairments to sperm count, semen volume, motility, sperm morphology and sperm concentration were reported in individuals with long COVID compared with control individuals, and were correlated with elevated levels of cytokines and the presence of caspase 8, caspase 9 and caspase 3 in seminal fluid [132]. [Quoted in https://www.nature.com/articles/s41579-022-00846-2 . Reference 132: https://rep.bioscientifica.com/view/journals/rep/161/3/REP-20-0382.xml ]
 
Just want to be sure I understand. You noted down all genes that are included in any of the top gene sets? And identified if they're duplicated in what way?

OK, I began by taking the first set of genes, s1, by following this link: https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOCC_SWI_SNF_COMPLEX and clicking on "(show 30 source identifiers mapped to 30 genes)". This is the first set of genes that was extracted.

Then I did the same for s2....s29, which is 30 sets in total.

I then created a list L1 containing the unique names of these genes and a second list L2 which contained only genes that appeared more than once across the 30 sets. Then I submitted L1 and L2 to the information retrieval system.
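
The L1/L2 construction described above can be sketched in a few lines. This is an illustration rather than the script actually used: the function name is made up, and the gene set names and gene symbols below are toy placeholders, not the real MSigDB memberships.

```python
from collections import Counter


def unique_and_repeated(gene_sets):
    """Return (L1, L2): all unique genes across the sets, and the genes
    that appear in more than one set."""
    # set(genes) ensures a gene listed twice within one set is counted once.
    counts = Counter(gene for genes in gene_sets.values() for gene in set(genes))
    l1 = sorted(counts)
    l2 = sorted(gene for gene, n_sets in counts.items() if n_sets > 1)
    return l1, l2


# Toy input (placeholder names, not real gene set contents).
toy_sets = {
    "SET_1": ["GENE_A", "GENE_B", "GENE_C"],
    "SET_2": ["GENE_B", "GENE_D"],
    "SET_3": ["GENE_A", "GENE_B", "GENE_E"],
}

L1, L2 = unique_and_repeated(toy_sets)
print("L1:", L1)
print("L2:", L2)
```

Counting set membership rather than raw occurrences means L2 reflects genes shared between gene sets, which is what "more than one appearance across the sets" seems to intend.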
 
OK, I began by taking the first set of genes, s1, by following this link: https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOCC_SWI_SNF_COMPLEX and clicking on "(show 30 source identifiers mapped to 30 genes)". This is the first set of genes that was extracted.

Then I did the same for s2....s29, which is 30 sets in total.

I then created a list L1 containing the unique names of these genes and a second list L2 which contained only genes that appeared more than once across the 30 sets. Then I submitted L1 and L2 to the information retrieval system.
Oh, I don't think this will tell us much. Most of the genes in those gene sets were not important, and the gene sets themselves might not be the best groupings of the genes that were important. Potentially, the overall gene sets/pathways themselves might provide clues, but I wouldn't do anything with specific genes in them, especially if the attention scores for those genes weren't high at all in the model.

And replication across gene sets isn't useful. It just means a gene is involved in more than one pathway, and we aren't sure which, if any, is the important one in ME/CFS.
 