Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

wigglethemouse · May 21, 2025

forestglip said:
Ok, I've run GSEA on the Zhang genes ranked by attention scores with the hallmark and canonical pathways collections:

This is the Wikipedia entry for NOTCH signalling pathway. No idea what it means.
https://en.wikipedia.org/wiki/Notch_signaling_pathway

The Notch signaling pathway is important for cell-cell communication, which involves gene regulation mechanisms that control multiple cell differentiation processes during embryonic and adult life.

It also goes on to state that Notch signalling has a role in neuronal function and development (among many other roles).

I checked my WGS and am heterozygous for a missense mutation in NOTCH1, the highest-ranked gene on your list. ClinVar lists the variant as benign, and CADD = 22.5, which seems high.
I also have a second heterozygous intron variant in NOTCH1 that is listed as significant.
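
For context on the CADD figure: assuming the 22.5 is a PHRED-scaled CADD score (the scale on which 10 corresponds to the top 10% of possible variants, 20 to the top 1% and 30 to the top 0.1%), a rough conversion to a rank fraction can be sketched as below. The function name is made up for illustration.

```python
def cadd_phred_to_top_fraction(phred: float) -> float:
    """Fraction of all possible variants ranked at least this deleterious.

    Assumes a PHRED-scaled CADD score: phred = -10 * log10(rank fraction).
    """
    return 10 ** (-phred / 10)


top_fraction = cadd_phred_to_top_fraction(22.5)
print(f"CADD 22.5 -> top {top_fraction:.2%} of possible variants")
```

On that assumption, 22.5 sits in roughly the top 0.6% of possible variants. That is a ranking of predicted deleteriousness rather than a clinical classification, so it can coexist with a benign ClinVar entry.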

jnmaciuch · May 21, 2025

wigglethemouse said:
This is the Wikipedia entry for NOTCH signalling pathway. No idea what it means.

It just refers to all the signaling that happens as a result of the NOTCH protein binding to one of its ligands, which is a pretty broad category. It’s involved in a lot of biological processes, most notably in early embryonic development and growth, which is why mutations in it are highly associated with cancer. In later life it’s very important for neural plasticity, angiogenesis (new blood vessels), basically any situation where you need new cells to grow.

[Edit: this is a very well characterized pathway so it would not surprise me that the protein-protein interactions are strong in STRING]

forestglip · May 21, 2025

Final thing for now because I am exhausted. I ran GSEA with the cellular component collection since I already ran the same one on the Genebass data.

Link to list of enriched component gene sets in Genebass

I decided not to use collapsePathways here. Since I am comparing if any gene sets match between the two cohorts, I don't want any pathways to be removed that might potentially be in the other dataset.

So here's the top 30 out of 469 gene sets:

And in text form:

GOCC_SWI_SNF_COMPLEX
GOCC_SWI_SNF_SUPERFAMILY_TYPE_COMPLEX
GOCC_MLL1_2_COMPLEX
GOCC_HISTONE_METHYLTRANSFERASE_COMPLEX
GOCC_INO80_COMPLEX
GOCC_NBAF_COMPLEX
GOCC_PRECATALYTIC_SPLICEOSOME
GOCC_U2_TYPE_SPLICEOSOMAL_COMPLEX
GOCC_SPLICEOSOMAL_TRI_SNRNP_COMPLEX
GOCC_TRAPP_COMPLEX
GOCC_METHYLTRANSFERASE_COMPLEX
GOCC_ATPASE_COMPLEX
GOCC_INO80_TYPE_COMPLEX
GOCC_SMALL_NUCLEAR_RIBONUCLEOPROTEIN_COMPLEX
GOCC_U2_SNRNP
GOCC_PRESPLICEOSOME
GOCC_ESCRT_COMPLEX
GOCC_U1_SNRNP
GOCC_SM_LIKE_PROTEIN_FAMILY_COMPLEX
GOCC_PROTEASOME_CORE_COMPLEX
GOCC_PROTEASOME_COMPLEX
GOCC_PROTEIN_PHOSPHATASE_TYPE_2A_COMPLEX
GOCC_U2_TYPE_CATALYTIC_STEP_2_SPLICEOSOME
GOCC_INTEGRIN_COMPLEX
GOCC_TRICARBOXYLIC_ACID_CYCLE_HETEROMERIC_ENZYME_COMPLEX
GOCC_EXCITATORY_SYNAPSE
GOCC_DENDRITIC_SHAFT
GOCC_RNA_CAP_BINDING_COMPLEX
GOCC_CATALYTIC_STEP_2_SPLICEOSOME
GOCC_U5_SNRNP

The only ones in these top 30 that just might be enriched in the Genebass analysis as well (rank and FDR from the Genebass values):

Rank | Gene Set | Unadjusted p | FDR
5 | GOCC_ESCRT_COMPLEX | 0.004 | 0.237
9 | GOCC_U5_SNRNP | 0.033 | 0.446
21 | GOCC_EXCITATORY_SYNAPSE | 0.003 | 0.517

Edit: Fixed rankings in the final 3 listed gene sets.

Edit: Added unadjusted p to final list.
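
The cross-cohort check described above (which of the top Zhang gene sets also show some signal in Genebass) can be sketched roughly as follows. This is only an illustration: the `shared_sets` helper and the `max_p` cutoff of 0.05 are assumptions, the Zhang list is a subset of the 30 above, and the Genebass dictionary holds only the three rows from the table rather than the full results.

```python
# A few of the top gene sets from the Zhang attention-score GSEA (subset of the list above).
zhang_top = [
    "GOCC_SWI_SNF_COMPLEX",
    "GOCC_ESCRT_COMPLEX",
    "GOCC_EXCITATORY_SYNAPSE",
    "GOCC_U5_SNRNP",
]

# Genebass results keyed by gene set: (unadjusted p, FDR).
# Only the three rows from the table above; a real results table would be much longer.
genebass = {
    "GOCC_ESCRT_COMPLEX": (0.004, 0.237),
    "GOCC_U5_SNRNP": (0.033, 0.446),
    "GOCC_EXCITATORY_SYNAPSE": (0.003, 0.517),
}


def shared_sets(top_sets, other_results, max_p=0.05):
    """Return (name, p, fdr) for sets in top_sets whose unadjusted p in
    other_results is below max_p, sorted by FDR (lowest first)."""
    hits = [
        (name, *other_results[name])
        for name in top_sets
        if name in other_results and other_results[name][0] < max_p
    ]
    return sorted(hits, key=lambda row: row[2])


for name, p, fdr in shared_sets(zhang_top, genebass):
    print(f"{name}\tp={p}\tFDR={fdr}")
```

Filtering on the unadjusted p keeps the comparison deliberately loose, which fits the "just might be enriched" framing. None of the three FDR values would pass a conventional threshold.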

forestglip · May 21, 2025

wigglethemouse said:
I checked my WGS and am heterozygous for a missense mutation in NOTCH1, the highest-ranked gene on your list. ClinVar lists the variant as benign, and CADD = 22.5, which seems high.

I thought it might be interesting, but when I go to Genebass and filter by any genes with "NOTCH" in the name, none of the six NOTCH genes are significant at all.

DMissa · May 21, 2025

Finally working through this paper thread armed with a clear brain, some time better spent on paper writing and most importantly a cuppa

Hutan said:
The hint of biological confirmation in the paper is what is particularly interesting - they looked at some proteomics data (ME/CFS and controls). Of the 9 proteins mentioned in the M9 gene module, 4 proteins had been measured in the proteomic study. And two of the four were lower in the ME/CFS sample.

The proteomics data didn't confirm the other three gene modules that this study identified from the genetic work, although perhaps it just didn't measure the right proteins for those modules. Or the protein differences aren't found in blood, e.g. are only found locally in the tissues, or they get degraded quickly.

Here's a video on the proteasome that I found helpful.

We have some threads that make mention of it too - see the tag.
Intracellular infections can disrupt the function of the proteasome.

proteasome subunits consistently come up as dysregulated in our work in ME cell lines or primary cells from different tissues, have not looked yet at specific overlaps in subunits or directions between tissues, or at functional assays yet, but there's something here even if subtle. However it may just be the case that something as closely married to a central process such as protein translation would be expected to be affected by probably any flavour of aberrant or disrupted homeostasis. There are also a lot of these subunits, so coming up by chance is also possible (I can do the stats to determine the likelihood of this, but it's been less important than other things I am working on)

jnmaciuch said:
I don’t know what the biological relevance would be of encoding more of a certain subunit but not others if those extra subunits don’t form functional complexes on their own

for me the upstream hint is more interesting - what is the mechanism of regulation of the affected gene? that gives you a target to work back against. In terms of downstream function, if one was convinced it was worth pursuing it is directly assayable

separate note, HDAC1 coming up is interesting https://pmc.ncbi.nlm.nih.gov/articles/PMC6787670/

aaand I am out of time for now

SNT Gatchaman · May 21, 2025

HDAC1 also in Integrative Multi-Omics Framework for Causal Gene Discovery in Long COVID (2025, Preprint: MedRxiv)

The gene expression profile showed downregulation of HDAC1, SRC, and TP53, along with upregulation of NDUFA6, genes associated with metabolic regulation, immune response, and cellular stress pathways.

DNA methylation signatures of functional somatic syndromes: Systematic review (2023, Psychosomatic Medicine)

A common epigenetic mechanism across different cellular origins underlies systemic immune dysregulation in an idiopathic autism mouse model (2022, Nature Molecular Psychiatry)

Increased HDAC in association with decreased plasma cortisol in older adults with chronic fatigue syndrome (2011, Brain, Behavior, and Immunity)

Jonathan Edwards · May 21, 2025

Sean said:
Not sure about this. Sometimes it seems to me to have an excitatory effect, at least in as much as it is difficult to calm down.

Indeed, but then that is true for depressive illness it seems. It is all a bit handwaving, especially when alcohol is supposed to be a cerebral depressant and yet it makes people sing rude songs and toboggan down the Main Street.

Simon M · May 21, 2025

Thanks for the responses, @forestglip and @jnmaciuch.

My concern about AUC wasn't anything to do with diagnosis (it's too low to be useful), but as a way to demonstrate the biological validity of the findings. The authors say:

This result highlights the superiority of HEAL2 in generalizability over HEAL, indicating that HEAL2 captured biology-relevant genetic factors contributing to ME/CFS.

My italics above.

In this case, they are stressing the generalisability re the independent test. But, while I can see the argument on generalisability, I don't really see why HEAL2 wouldn't also perform better in cross-validation, even if not to the same extent, if it is better at picking up biologically relevant genes.

My bigger concern is that the test cohort is so small at 36 cases/21 controls, and I wouldn't want to hang my hat on that when it comes to evidence of biological relevance.

A solution would be to use UKB for a test cohort. @Hutan suggested the authors probably tried out UKB data and found little, but we know from the recent Samms/Ponting cohort quality paper that the four cohorts available all have issues. However, 95 cases appear in all 4 cohorts, and these are probably reliable diagnoses, giving a bigger cohort than Cornell, with the likelihood of a much larger control group (and one matched very well since there are 500k in the UKB). Defining a decent cohort of around 500 should not be too difficult.

That could give greater confidence in all the interesting, more specific results.
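
To put a rough number on the sample-size concern, here is a minimal sketch using the Hanley-McNeil approximation for the standard error of an AUC. The AUC of 0.65 is a hypothetical placeholder, not the value reported in the preprint, and the 500/500 comparison cohort is likewise just an illustrative size.

```python
import math


def auc_standard_error(auc: float, n_cases: int, n_controls: int) -> float:
    """Hanley-McNeil (1982) approximation to the standard error of an AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_controls)
    return math.sqrt(variance)


ASSUMED_AUC = 0.65  # hypothetical value, NOT taken from the preprint

# (label, cases, controls): the first row is the test set size quoted above,
# the second is an illustrative larger cohort.
for label, cases, controls in [("36 cases / 21 controls", 36, 21),
                               ("500 cases / 500 controls", 500, 500)]:
    se = auc_standard_error(ASSUMED_AUC, cases, controls)
    low, high = ASSUMED_AUC - 1.96 * se, ASSUMED_AUC + 1.96 * se
    print(f"{label}: SE={se:.3f}, approx. 95% CI=({low:.2f}, {high:.2f})")
```

Under these assumptions the interval for the small test set is several times wider than for the larger cohort, which is the sense in which a 57-person test set is thin evidence for biological relevance.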

Hutan · May 21, 2025

Simon M said:
@Hutan suggested the authors probably tried out UKB data and found little

@forestglip subsequently did the comparative analysis of the identified rare variants on a UK Biobank group labelled CFS (around 2000 people) and found there was little commonality.

But yes, a high quality ME/CFS group and an analysis that focussed on pathways rather than individual variants could be good.

Braganca · May 21, 2025

Curious.. Have any of the authors been invited to this thread, or will some of you update them on what you are finding/ thinking in comments on their preprint?

Simon M · May 21, 2025

Hutan said:
@forestglip subsequently did the comparative analysis of the identified rare variants on a UK Biobank group labelled CFS (around 2000 people) and found there was little commonality.

Sorry, I haven’t been keeping up. Interesting analysis. But 28% of that group reported good or excellent health, and there were other issues. Though I don’t think it’s so bad a cohort, because it’s not simply “self-reported”. People were asked if they had a serious illness or disability diagnosed by a doctor. If they did, a nurse interviewing them later asked which illness, without prompting. We don’t know how the nurse recorded the replies, but there were a large number of options to choose from, including chronic fatigue syndrome (but not chronic fatigue or ME), and the nurse could also enter other options in a text box. So it’s quite different from someone checking a CFS tick box.

Sasha · May 21, 2025

Simon M said:
My bigger concern is that the test cohort is so small at 36 cases/21 controls, and I wouldn't want to hang my hat on that when it comes to evidence of biological relevance.
A solution would be to use UKB for a test cohort. @Hutan suggested the authors probably tried out UKB data and found little, but we know from the recent Samms/Ponting cohort quality paper that the four cohorts available all have issues. However, 95 cases appear in all 4 cohorts, and these are probably reliable diagnoses, giving a bigger cohort than Cornell, with the likelihood of a much larger control group (and one matched very well since there are 500k in the UKB). Defining a decent cohort of around 500 should not be too difficult.

That could give greater confidence in all the interesting, more specific results.

EndME said:
Hi @Andy, I'm aware that precisionLife has been granted access to the DecodeME data. Are you aware of any requests to use the HEAL 2 algorithm, that was recently used in the Zhang et al paper, on the DecodeME data?

Andy said:
Hi. The public details that DecodeME has on projects that have had access approved can be seen here, https://www.decodeme.org.uk/approved-studies/. For confidentiality reasons we wouldn't want to reveal any more information than that is available there.

ETA: Added missing "than".

If no one has already, should someone be contacting Zhang et al. to attempt to replicate using the DecodeME database, and feeding back to them any concerns about design and interpretability discussed here before they do so?

Edit: I forgot that DecodeME is a GWAS and that Zhang et al. look at WGS data...

hotblack · May 21, 2025

Jonathan Edwards said:
Indeed, but then that is true for depressive illness it seems. It is all a bit handwaving, especially when alcohol is supposed to be a cerebral depressant and yet it makes people sing rude songs and toboggan down the Main Street.

Some of the symptoms for us seem to be the body trying to respond to whatever underlying problem exists. At least that’s what it sometimes feels like. That could explain the paradoxical nature?

I also had a period of quite severe depression in my late teens (at least partially triggered by mefloquine) and that had periods of sleeplessness for days, or more wired, even borderline psychosis and plenty of anxiety, etc. So inhibitory and excitatory can often go hand in hand.

This experience has helped me see the significant differences between depression and ME/CFS but it would be interesting if there’s an underlying predisposition to both.

forestglip · May 21, 2025

Simon M said:
My bigger concern is that the test cohort is so small at 36 cases/21 controls, and I wouldn't want to hang my hat on that when it comes to evidence of biological relevance.

I have the same concern about the test set being so small.

Hutan said:
@forestglip subsequently did the comparative analysis of the identified rare variants on a UK Biobank group labelled CFS (around 2000 people) and found there was little commonality.

I wouldn't say their model definitely wouldn't replicate on a UK Biobank cohort based on that. The comparison they and I did on the Biobank is much less sophisticated than if they had actually used their model for classification.

Edit: But issues with participant selection in the Biobank might still water down the already fairly low AUC.

mariovitali · May 21, 2025

So I did some preprocessing to extract all genes from the links @forestglip gave, identified which genes are duplicated, and fed them to the information retrieval system I have been using.

The results from my information retrieval system show that we are looking at the following concepts:

Methyltransferases, DNA methylation, protein stability and protein interaction, post-translational modifications, proteasome, histone methyltransferases (a subset of methyltransferases), histone deacetylase

Plugging in the genes that had multiple entries leads to the above results plus ATP hydrolysis, which makes it onto the ranking list, as well as retinoic acid, STAT3 and G-actin.

Interestingly, @TamaraRC's hypothesis discusses the BHMT gene, which is a methyltransferase (tagging @DMissa). The thread on this hypothesis can be found here: https://www.s4me.info/threads/a-sys...nergic-imbalance-in-me-cfs-2025-carnac.44116/

Just in case, I contacted @TamaraRC to see whether issues in any other methyltransferases could lead to problems according to her hypothesis (mentioning this in case anyone else wishes to comment)

Please note: I have not verified the gene extractions manually so far; I will try to do so ASAP.

forestglip · May 21, 2025

mariovitali said:
So I did some preprocessing to extract all genes from the links @forestglip gave, identified which genes are duplicated, and fed them to the information retrieval system I have been using.

Just want to be sure I understand. You noted down all genes that are included in any of the top gene sets? And in what way did you identify them as duplicated?

Evergreen · May 21, 2025

Hutan said:
It's still an interesting question - does an infection have lasting implications on sperm health (especially in men who go on to develop ME/CFS)? And, if so, could that tell us something about what might be happening to neurons?

Sperm are cells with a big demand for energy, and, unlike neurons, they are easy to get hold of. That might make for relatively easy investigations. I don't think much is known about sperm from men with ME/CFS.

There are reports of decreased sperm motility many months after infections like Zika. e.g.
Potential effect of Zika virus infection on human male fertility?

Stumbled on this and thought of your question, @Hutan . There seem to be a bunch of studies on COVID-19 and sperm.

In one study, impairments to sperm count, semen volume, motility, sperm morphology and sperm concentration were reported in individuals with long COVID compared with control individuals, and were correlated with elevated levels of cytokines and the presence of caspase 8, caspase 9 and caspase 3 in seminal fluid [132]. [Quoted in https://www.nature.com/articles/s41579-022-00846-2. Reference 132: https://rep.bioscientifica.com/view/journals/rep/161/3/REP-20-0382.xml]

mariovitali · May 21, 2025

forestglip said:
Just want to be sure I understand. You noted down all genes that are included in any of the top gene sets? And identified if they're duplicated in what way?

OK, I began by taking the first set of genes, s1, by following this link: https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOCC_SWI_SNF_COMPLEX and clicking on "(show 30 source identifiers mapped to 30 genes)". This is the first set of genes that were extracted.

Then I did the same for s2 ... s30, for a total of 30 sets.

I then created a list L1 containing the unique names of these genes and a second list L2 containing only genes that appeared more than once across the 30 sets. Then I submitted L1 and L2 to the information retrieval system.
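
To make the procedure concrete, the L1/L2 construction might look something like the sketch below. The gene set names and gene symbols are placeholders, not the actual MSigDB contents, and this is not the code that was actually run.

```python
from collections import Counter

# Hypothetical toy input: gene set name -> member genes (placeholder symbols).
gene_sets = {
    "SET_A": ["GENE1", "GENE2", "GENE3"],
    "SET_B": ["GENE2", "GENE4"],
    "SET_C": ["GENE2", "GENE3", "GENE5"],
}

# Count how many sets each gene appears in.
# set(members) guards against a gene being listed twice within one set.
appearances = Counter(
    gene for members in gene_sets.values() for gene in set(members)
)

# L1: every unique gene across all sets.
l1 = sorted(appearances)

# L2: only genes that appear in more than one set.
l2 = sorted(gene for gene, count in appearances.items() if count > 1)

print("L1:", l1)
print("L2:", l2)
```

Note that L2 only records membership in several curated sets. It carries no information about how highly the model ranked each gene.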

forestglip · May 21, 2025

mariovitali said:
OK, I began by taking the first set of genes, s1, by following this link: https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOCC_SWI_SNF_COMPLEX and clicking on "(show 30 source identifiers mapped to 30 genes)". This is the first set of genes that were extracted.

Then I did the same for s2 ... s30, for a total of 30 sets.

I then created a list L1 containing the unique names of these genes and a second list L2 containing only genes that appeared more than once across the 30 sets. Then I submitted L1 and L2 to the information retrieval system.

Oh, I don't think this will tell us much. Most of the genes in those gene sets were not important, and the gene sets themselves might not be the best groupings of the genes that were important. Potentially, the overall gene sets/pathways themselves might provide clues, but I wouldn't do anything with specific genes in them, especially if the attention scores for those genes weren't high at all in the model.

And replication across gene sets isn't useful. It just means a gene is involved in more than one pathway, and we aren't sure which, if any, is the important one in ME/CFS.

mariovitali · May 21, 2025

forestglip said:
Oh, I don't think this will tell us much. Most of the genes in those gene sets were not important, and the gene sets themselves might not be the best groupings of the genes that were important. Potentially, the overall gene sets/pathways themselves might provide clues, but I wouldn't do anything with specific genes in them, especially if the attention scores for those genes weren't high at all in the model.

And replication across gene sets isn't useful. It just means a gene is involved in more than one pathway, and we aren't sure which, if any, is the important one in ME/CFS.

OK makes sense. If you have any set of genes that appear to be important please tag me so I can have a look at them.
