Sasha
Senior Member (Voting Rights)
But isn't whole genome sequencing fishing for rare SNPs?

I think big numbers is probably more important for SNP fishing, although I know less about the rare gene approach.
Excellent, thanks @forestglip!

The authors have sent me Supplementary Table 2 and allowed me to share it here.
Thanks. I think you might get better results if you don't mention ME/CFS and just ask it if there are any patterns in these genes that were found to be abnormal. Otherwise it will try to connect it to popular memes in ME/CFS research such as inflammation, mitochondrial dysfunction, etc.

FWIW I ran the top 20 genes through Gemma3:27b on my laptop. The prompt was "I'm going to give you a list of 20 genes. Find interactions to explain mechanisms that produce the disease ME/CFS."
One striking theme is the prevalence of genes involved in intracellular signaling and synaptic function, including NLGN1, NLGN2, DLG2, DLGAP1–4, SYNGAP1, GRM1, CAMK2A, and HOMER2. These genes are closely associated with postsynaptic density and play essential roles in neurodevelopment and synaptic plasticity. Many of them have been implicated in neurodevelopmental disorders such as autism spectrum disorder and intellectual disability, particularly CHD8 and SYNGAP1.
Another major cluster involves oncogenic and cell cycle regulatory pathways, notably genes like DNMT3A, DNMT3B, PIK3CA, NRAS, NOTCH1, RET, CDC6, CDC23, BUB3, HDAC1, E2F6, and SMARCD3. These are commonly mutated or dysregulated in cancers, including hematologic malignancies and solid tumors. Their abnormalities suggest either a proliferative or epigenetic disruption in affected tissues.
There is also a group tied to metabolic regulation and mitochondrial function, including NAMPT, AK2, AK3, PANK1, PANK2, PANK3, GALT, COASY, CA2, and PPCDC. These genes regulate energy metabolism, cofactor biosynthesis, and redox states, hinting at altered metabolic homeostasis, which can be relevant both in cancer and neurodegenerative conditions.
Genes such as INS, LEP, and ADCY10 point toward involvement in endocrine signaling and homeostasis, possibly implicating insulin signaling, glucose metabolism, and neuroendocrine integration.
A fourth group involves immune function and inflammatory response, with entries like IL12A, HLA-C, STAM2, and NFATC3. These suggest dysregulation in immune signaling pathways, which may overlap with inflammation-driven oncogenesis or autoimmune phenomena.
Finally, there’s a surprising overrepresentation of proteasome components and protein degradation machinery, such as PSMB3, PSMB4, PSMB5, PSMD7, PSMC3, and PSMC5. This supports a pattern of disrupted protein turnover or stress responses, which again can tie into either neurodegenerative diseases or cancer biology.
No one study anywhere in science stands alone. Even if it's not as robust as a study with 20,000 people, it still adds to the weight of the evidence, and seeing replications between DecodeME and this study would strengthen the findings. Even a giant study will likely have some meaningless findings come up due to chance. Seeing the same findings in different populations using different methodologies decreases the likelihood of that being the case for those genes, and helps prioritize research directions.

If a small, deep-learning study spits out lots of statistically significant genetic associations, do we take them seriously or not? If we have to wait for better information, is this information not reliable? And if it isn't, what's the point of doing such studies?
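The value of replication can be put in rough numbers: if two independent studies each flagged genes purely by chance, the expected overlap would be tiny, so seeing the same genes in both is informative. A toy calculation (gene and hit counts are invented for illustration, not taken from either study):

```python
import random

# Suppose two independent studies each flag 100 genes purely by chance
# out of ~20,000 protein-coding genes.
N_GENES = 20_000
FLAGGED = 100

# Each of study A's 100 genes has a 100/20000 chance of also being
# flagged by study B, so the expected chance overlap is tiny.
expected_overlap = FLAGGED * FLAGGED / N_GENES  # 0.5 genes

# Quick simulation to confirm the expectation.
random.seed(0)
trials = 2_000
total = 0
for _ in range(trials):
    a = set(random.sample(range(N_GENES), FLAGGED))
    b = set(random.sample(range(N_GENES), FLAGGED))
    total += len(a & b)
print(expected_overlap, total / trials)
```

So even a handful of genes replicating between DecodeME and this study would be far more overlap than chance predicts.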
But until we know that’s the case, I don’t think we should take it on trust. I gather the AUC for the replication cohort wasn’t very impressive here, which doesn’t inspire confidence.

That’s key, isn’t it? These are newer methods to find the signals in the data, and it is hoped they will find things more efficiently.
It was nearly the same as their AUC on their training cohort, which is impressive in itself. I would lose confidence if it was a high AUC in training and near 0.5 in test.

But until we know that’s the case, I don’t think we should take it on trust. I gather the AUC for the replication cohort wasn’t very impressive here, which doesn’t inspire confidence.
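For reference, ROC AUC is just the probability that a randomly chosen case outranks a randomly chosen control (ties counting half), so the train-versus-test comparison can be sketched in a few lines. The scores below are invented; the point is only that a model which has merely memorized its training data shows a large gap between the two numbers, while similar numbers suggest the signal generalizes:

```python
def auc(pos_scores, neg_scores):
    """Probability a random positive outscores a random negative
    (ties count 0.5) -- the rank-based definition of ROC AUC."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores: similar separation in training
# and in held-out (replication) data.
train_auc = auc([0.9, 0.8, 0.45, 0.7], [0.4, 0.3, 0.5, 0.2])
test_auc  = auc([0.8, 0.6, 0.7, 0.4],  [0.5, 0.3, 0.4, 0.2])
print(train_auc, test_auc)  # 0.9375 0.90625
```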
Not necessarily. I think the point of a rare variant analysis is to reaffirm that loss of function in those pathways is in fact critical for disease pathology.

If we will only have confidence in these results if they're confirmed by different forms of analysis that we can be confident in, I don't see what information these results are adding. Isn't this the very definition of confirmation bias?
I think it’ll be great when these findings are published because they will finally give us a fairly solid reference point. At the moment, it’s not that hard to find study results to support most theories. These reference points will make much of the literature more interpretable.

and what will hopefully come out of DecodeME.
Interesting point. Not speaking from expertise or experience, but I would think these networks are sufficiently complex so that it still means quite a lot if multiple genes from a pathway are highlighted.

An important point is that the HEAL2 algorithm leverages the STRING database to incorporate protein-protein interactions in its attention mechanism. What this means for interpretation is that multiple genes that are all part of the same network are likely to score higher cumulatively. Therefore, having more genes in the same pathway should not necessarily be taken as evidence of the pathway’s relevance above other pathways.
On the other hand: the AUC is measured against current diagnostic practices of ME/CFS, which may not be very precise anyway in terms of pathology. Suppose only a small subgroup has pathology involving synaptic function; then the maximum AUC score would be quite low.

I gather the AUC for the replication cohort wasn’t very impressive here, which doesn’t inspire confidence.
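The ceiling argument can be made concrete: if only a fraction q of diagnosed cases carries a detectable pathology and the rest are indistinguishable from controls, even a perfect score tops out at AUC = q*1 + (1-q)*0.5 = 0.5 + q/2. A quick simulation with an assumed q = 0.3 (all numbers invented):

```python
import random

random.seed(1)
q = 0.3       # assumed fraction of cases with the detectable pathology
n = 20_000

# Controls score uniformly at random; the "true" subgroup of cases scores
# above every control; the remaining cases look exactly like controls.
controls = [random.random() for _ in range(n)]
cases = [1.5 if random.random() < q else random.random() for _ in range(n)]

# Estimate rank-based AUC by sampling random (case, control) pairs.
pairs = 200_000
wins = sum(1 for _ in range(pairs)
           if random.choice(cases) > random.choice(controls))
auc_est = wins / pairs
print(auc_est)   # close to 0.5 + q/2 = 0.65
```

So a modest replication AUC could reflect an imprecise case definition rather than the absence of any real signal.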
But isn't whole genome sequencing fishing for rare SNPs?
They are quite complex, but they're highly biased by known interactions in the literature. If nobody thought it was interesting to check if protein A binds to protein B, it’s not going to end up in the database even if it’s very relevant to the disease. And the algorithm has a cutoff for number of interactions, so it’s going to be heavily biased by what has been extensively studied already. It’s the same across all biology: something that has already been well characterized continues to get more attention simply because it has already been well characterized.

Interesting point. Not speaking from expertise or experience but I would think these networks are sufficiently complex so that it still means quite a lot if multiple genes from a pathway are highlighted.
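The database-bias worry can be illustrated with a toy score-propagation sketch. This uses one simple smoothing rule, not HEAL2's actual attention mechanism, and the genes, edges, and weight alpha are all invented: two pathways have identical per-gene evidence, but the densely annotated one accumulates a higher cumulative score simply because more of its interactions happen to be in the database.

```python
# Two invented pathways with identical per-gene base scores; pathway A is
# "well studied" (all pairwise interactions annotated), pathway B is not.
base = {g: 1.0 for g in ["A1", "A2", "A3", "B1", "B2", "B3"]}
edges = [("A1", "A2"), ("A1", "A3"), ("A2", "A3"),  # dense annotation
         ("B1", "B2")]                              # sparse annotation

neighbors = {g: set() for g in base}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

alpha = 0.5  # assumed propagation weight
smoothed = {g: base[g] + alpha * sum(base[m] for m in neighbors[g])
            for g in base}

score_a = sum(smoothed[g] for g in ["A1", "A2", "A3"])
score_b = sum(smoothed[g] for g in ["B1", "B2", "B3"])
print(score_a, score_b)  # 6.0 4.0 -- A outranks B on annotation alone
```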
https://www.actionforme.org.uk/sequenceme-first-of-a-kind-genetic-study/ Dec 2024

But I’d be surprised if they got funding for as many as 17,000.
The partners are working together to secure funding for a study which will analyse the entire genetic code of up to 17,000 people with Myalgic Encephalomyelitis (ME) in a bid to uncover the genetic causes of the illness.
Over 17,000 participants who donated saliva samples to DecodeME have consented to further analysis, and the SequenceME partners will seek to analyse them all.
This quarter, the study partners concluded a successful pilot phase by completing any-length sequencing of ten individual samples from the DecodeME library, demonstrating the high accuracy and scalability of the study method. The next phase, involving sequencing of 10,000 participants, requires £7 million in funding.
Thanks again @forestglip for prompting them to provide access.

FYI, the medRxiv page is updated with all three supplementary tables.
If we will only have confidence in these results if they're confirmed by different forms of analysis that we can be confident in, I don't see what information these results are adding. Isn't this the very definition of confirmation bias?
That analogy only holds up if we can be certain that we're getting a true signal from both sources that's merely degraded by noise. I think we can take a standard GWAS or WGS analysis as providing a true signal plus noise, but do we know that about this machine-learning technique? Could it be simply rubbish plus noise?

The confidence comes from the combination. Think of it like two people listening to a piece of music, one through the wall of a concert hall and the other on a really crummy radio with interference. The first person says, 'I am pretty sure it's Beethoven from some of the harmonies, but I really can't tell which one.' The other person says, 'There is definitely singing as well as orchestra, so it is either Das Lied von der Erde or the St Matthew Passion or Beethoven's 9th.'
So it's Beethoven's 9th.
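The analogy amounts to intersecting two independent sets of candidates: neither listener can identify the work alone, but combined they pin it down. A minimal sketch of that combination-of-evidence point (candidate lists invented to match the analogy):

```python
# Listener 1 (through the wall): sure of the composer, not the work.
listener1 = {"Beethoven's 9th", "Beethoven's 5th", "Beethoven's 3rd"}

# Listener 2 (crummy radio): hears choir + orchestra, composer unknown.
listener2 = {"Das Lied von der Erde", "St Matthew Passion",
             "Beethoven's 9th"}

# Neither source is conclusive alone; the intersection is.
print(listener1 & listener2)  # {"Beethoven's 9th"}
```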
My layman’s understanding is that the machine learning model doesn’t add info. The algorithm it applies might be rubbish, but it doesn’t change the underlying data.

That analogy only holds up if we can be certain that we're getting a true signal from both sources that's merely degraded by noise. I think we can take a standard GWAS or WGS analysis as providing a true signal plus noise, but do we know that about this machine-learning technique? Could it be simply rubbish plus noise?