Preprint Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis, 2025, Zhang+

Can we make a table of the proteins and functions for these and maybe check further down for sister genes where they show up?

DNMT3A. DNA Methyltransferase 3 Alpha. Epigenetic DNA methylation, gene expression control
ADCY10. Adenylyl Cyclase 10. Formation of cAMP
PPP2R2A. Protein Phosphatase 2 Regulatory Subunit Balpha. Cell cycle
NLGN2. Neuroligin 2. Synapse formation
LEP. Leptin. Weight control/appetite
SYNGAP1. Synaptic Ras GTPase Activating Protein 1. Synapses, MAP kinase signalling
AHCYL2. Adenosylhomocysteinase Like 2. Brain signalling
NLGN1. Neuroligin 1. Synapse formation
DLGAP4. DLG Associated Protein 4. Synapses
HDAC1. Histone Deacetylase 1. Regulation of gene transcription
AMPD2. Adenosine Monophosphate Deaminase 2
AHCYL1. Adenosylhomocysteinase Like 1. Anti-inflammatory cytokine production
SHARPIN. SHANK Associated RH Domain Interactor. Signalling in auto inflammation. Synapses
NME2. NME/NM23 Nucleoside Diphosphate Kinase 2. DNA transcription. Risk factor for EBV-associated lymphoma
NME1-NME2. NME1-NME2 Readthrough. DNA transcription
CACNA2D3. Calcium Voltage-Gated Channel Auxiliary Subunit Alpha2delta 3. Linked to brain development and autism. TCR signalling
NME3. NME/NM23 Nucleoside Diphosphate Kinase 3. Goes with NME1
ZC3H13. Zinc Finger CCCH-Type Containing 13. RNA splicing
CAMK2A. Calcium/Calmodulin Dependent Protein Kinase II Alpha. Synapses on dendrites in brain
PIK3CA. Phosphatidylinositol-4,5-Bisphosphate 3-Kinase Catalytic Subunit Alpha. Insulin responses, brain development
MAX. MYC Associated Factor X. DNA transcription
HLA-C. HLA-C (MHC I). CD8 T cell receptor and NK cell receptor recognition events
ACE. Angiotensin I Converting Enzyme

We should keep a wide purview. While many of these genes are flagged as neural and related to synaptic function, they can have important non-canonical roles. The potential link with autism development is fascinating, but autism spectrum disorder has other features beyond the neurodevelopmental, e.g. gastrointestinal dysfunction. So while the effect of these genes on synapse formation and maintenance would be important in the primary neurodevelopmental abnormalities we observe, the very common comorbid problems with gut function might be due more to epithelial tight junctions and barrier integrity than to the gut's neural connections.

Tons of us have OI. Does anyone know what synapses would have to be messed up for OI to occur? Are any OI-type genes showing up in the Zhang results?

We should also consider that genes identified in OI might relate more directly to vascular (esp. endothelial cell) function than to neuronal synapses. E.g. NLGN1 and NLGN2 are expressed in vascular endothelial cells. There's also potentially an endocrine aspect, as NLGN2 is also expressed in pancreatic beta cells (insulin secretion), so maybe there's a link between the suggested GIP secretion (GIP is a splanchnic vasodilator) and the post-prandial increase in POTS/OI symptoms.

Neuroligin 1 Induces Blood Vessel Maturation by Cooperating with the α6 Integrin (2014, Journal of Biological Chemistry)

Modulation of Angiopoietin 2 release from endothelial cells and angiogenesis by the synaptic protein Neuroligin 2 (2018, Biochemical and Biophysical Research Communications)

Altered Pancreatic Islet Function and Morphology in Mice Lacking the Beta-Cell Surface Protein Neuroligin-2 (2013, PLOS ONE)

Worsening Postural Tachycardia Syndrome Is Associated With Increased Glucose-Dependent Insulinotropic Polypeptide Secretion (2022, Hypertension)
 
After looking at the paper again, I realized I should have done my GSEA using p-values, not attention scores. I don't fully understand their method for interpreting the model, but p-values are the metric they used to choose the top 115 genes that they say are the most important. It didn't change the results much, since the p-value rankings and attention score rankings are pretty similar. (One exception is HLA-C, which is ranked 22 using p-values and 2077 using attention scores.) Again, the results are very similar on the Genebass CFS data, but the run with p-values can be seen on the last link here.
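For anyone who wants to check how similar two rankings are, here is a minimal sketch of a Spearman rank correlation plus a list of the most discordant genes. The gene names and scores are made-up toy values, not numbers from the Zhang paper, and the simple formula used assumes there are no tied scores.

```python
def ranks(values, descending=False):
    """Return 1-based ranks; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(rank_a, rank_b):
    """Spearman correlation from two rank lists (valid when there are no ties)."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical toy data, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
p_values = [0.001, 0.02, 0.0005, 0.3, 0.04]
attention = [0.90, 0.60, 0.10, 0.05, 0.50]

p_rank = ranks(p_values)                      # smallest p-value gets rank 1
att_rank = ranks(attention, descending=True)  # largest attention gets rank 1
rho = spearman(p_rank, att_rank)

# Genes sorted by how far apart their two ranks are (largest gap first).
discordant = sorted(
    zip(genes, p_rank, att_rank),
    key=lambda item: abs(item[1] - item[2]),
    reverse=True,
)
print(f"Spearman rho = {rho:.2f}")
print("Most discordant gene:", discordant[0])
```

A gene like HLA-C, with very different ranks under the two metrics, would show up at the top of the `discordant` list.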

But I thought it might also be useful to just see the top-ranked clusters using the Zhang HEAL2 list of genes. Basically, this is an alternative to their method of taking their top-ranked 115 genes and seeing which pathways those specific genes are enriched in, without considering ranking. Instead, I uploaded all of the genes with their -log10 p-values to STRING, so that it looks at all of them and weights them by the metric that is apparently useful for identifying the most important genes.
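As a rough sketch of that preparation step, this converts p-values to -log10 scores and writes one "gene, tab, score" line per gene, sorted from most to least significant. The two-column tab-separated layout is my assumption about what a ranked-list upload expects, and the gene names and p-values are made up.

```python
import math


def neg_log10(p, floor=1e-300):
    """Return -log10(p), flooring p so that p == 0 does not raise an error."""
    return -math.log10(max(p, floor))


def to_ranked_lines(gene_pvalues):
    """Turn {gene: p-value} into 'gene<TAB>score' lines, highest score first."""
    scored = [(gene, neg_log10(p)) for gene, p in gene_pvalues.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [f"{gene}\t{score:.4f}" for gene, score in scored]


# Hypothetical toy input, for illustration only.
toy = {"GENE_A": 0.001, "GENE_B": 0.05, "GENE_C": 1e-6}
lines = to_ranked_lines(toy)
print("\n".join(lines))
```

Because the transform is monotonic, the ordering is the same as sorting by ascending p-value; the -log10 scale just makes the low p-values the large scores, so they land at the "top of input".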

Here is the link to the enrichment analysis with many different gene set collections. (The page is pretty slow to load and interact with.) I merged items with similarity greater than 0.5, and filtered by FDR < 0.001 and enrichment score > 1.0. ("Top of input" for direction is what's interesting, since it relates to genes with low p-values.) Filters can be changed near the bottom of the page.
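If someone exports the enrichment table and wants to apply the same thresholds offline, a minimal sketch could look like this. The field names (`fdr`, `enrichment_score`, `direction`) and the rows are hypothetical, not the actual STRING export format, and the similarity-based merging step is not reproduced.

```python
def passes_filters(row, max_fdr=0.001, min_score=1.0, direction="top of input"):
    """Keep a term only if it meets the FDR, score, and direction thresholds."""
    return (
        row["fdr"] < max_fdr
        and row["enrichment_score"] > min_score
        and row["direction"] == direction
    )


# Hypothetical toy rows, for illustration only.
toy_rows = [
    {"term": "TERM_1", "fdr": 0.0002, "enrichment_score": 1.8, "direction": "top of input"},
    {"term": "TERM_2", "fdr": 0.0078, "enrichment_score": 2.1, "direction": "top of input"},
    {"term": "TERM_3", "fdr": 0.0001, "enrichment_score": 0.7, "direction": "top of input"},
    {"term": "TERM_4", "fdr": 0.0003, "enrichment_score": 1.5, "direction": "bottom of input"},
]

kept = [row["term"] for row in toy_rows if passes_filters(row)]
relaxed = [row["term"] for row in toy_rows if passes_filters(row, max_fdr=0.01)]
print("Strict filter:", kept)
print("Relaxed FDR:", relaxed)
```

Loosening `max_fdr` is the kind of change that lets additional terms through, as with a less conservative FDR on the page itself.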

Here are the top 10 STRING local clusters from its protein-protein interaction network, which I think should be most comparable to the enrichment analysis they did that found synaptic function and proteasomes.

[Image: Local Network Cluster (STRING)]

There are some other interesting gene sets in other collections, though. For example, in DISEASES, there are the following four that match my filters. This includes COVID and two diseases that may be related to sex differences. If I filter with a less conservative FDR, then autistic disorder is the highest enriched disease, with FDR = 0.0078.

[Image: Disease-gene Associations (DISEASES)]

For the "Tissue Expression (TISSUES)" collection, the two gene sets are "autonomic nervous system" and "brain ventricle". For "Human Phenotype (Monarch)", the top ranked phenotype is "myeloid leukemia".

[Image: Tissue Expression (TISSUES)]

[Image: Human Phenotype (Monarch)]
 
Just finished a quick read of this paper. Overall, this seems promising. I look forward to seeing if this paper’s findings match those of analyses performed by other groups on different data. And the discussion so far in this thread has been insightful. I wish I had the bioinformatics background to contribute.

I do have a bit of background in deep learning research. Some thoughts from that perspective:

  1. In the deep learning literature, when you present a complex model, you are typically asked by reviewers to perform an ablation study. This involves systematically removing different components of your system and reporting how this removal affects the system’s performance. Ablation studies reveal which components of the system are most important. I would be interested in seeing an ablation study for HEAL2. I appreciate that the authors compare HEAL1 to HEAL2, but I would like to see finer granularity. This could involve ablating the various input-preprocessing steps, as well as numerous components of the neural architecture. I expect this would shed useful light on how the model is making its predictions.
  2. Much of the practical value of this paper comes from the list of “ME/CFS risk genes” that it generates. These risk genes are a consequence of attention scores. The use of attention scores for model explanation is reasonable and fairly standard, though not without some controversy (See here and here). It would be interesting to see whether other model explanation techniques (for example, integrated gradients) produce concordant lists of risk genes. If so, this would increase confidence in the results.
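To make the second point concrete, here is a minimal sketch of integrated gradients on a toy logistic model. This is not HEAL2 or its architecture; the weights, inputs, gene names, and all-zeros baseline are made-up assumptions chosen only to show how the attribution is computed and how it yields a gene ranking that could be compared against an attention-based one.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Toy logistic model: probability from a weighted sum of the inputs."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(weights, bias, x):
    """Analytic gradient of the prediction with respect to each input."""
    p = predict(weights, bias, x)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(weights, bias, x, baseline, steps=200):
    """Midpoint Riemann approximation of integrated gradients along a straight path."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        for i, g in enumerate(gradient(weights, bias, point)):
            totals[i] += g
    return [(xi - b0) * total / steps for xi, b0, total in zip(x, baseline, totals)]


# Hypothetical toy model and input, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C"]
weights = [2.0, -1.0, 0.1]
bias = -0.5
x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(weights, bias, x, baseline)
ig_ranking = sorted(genes, key=lambda g: abs(attributions[genes.index(g)]), reverse=True)

# Completeness check: attributions should sum to f(x) - f(baseline).
gap = sum(attributions) - (predict(weights, bias, x) - predict(weights, bias, baseline))
print("Ranking by |attribution|:", ig_ranking)
print(f"Completeness gap: {gap:.2e}")
```

The completeness check is a useful sanity test for any integrated-gradients implementation. For a real model, the resulting ranking could then be compared with the attention-based ranking, for example by rank correlation or top-k overlap.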
 