Preprint: Identification of Novel Reproducible Combinatorial Genetic Risk Factors for [ME] in [DecodeME Cohort] and Commonalities with [LC], 2025, Sardell et al.

I suppose their combinatorial analysis can be useful for getting new or clearer findings, but in this case it seems to have made things more complicated and muddled.

Their disease signatures map to 2,311 genes, while humans only have approximately 20,000-25,000 protein-coding genes.
 
I am still working on connecting the list of genes from this paper with previous research efforts. Some new findings:


NLGN1: Appears in the Snyder study (HEAL2).

CYP7B1: This gene's node appears in the network analysis (2017), towards the center bottom:

[Image: network_clean.jpeg]


CH25H: Also identified by previous work (see also what I mention about ubiquitination in Snyder et al., below):

[Image: Screenshot 2025-12-05 at 15.25.01.png]

Source: https://www.healthrising.org/blog/2023/10/21/ai-driven-chronic-fatigue-syndrome-clues/

UGGT1: A gene directly linked to N-linked glycosylation. I believe we will be seeing more of N-linked glycosylation in the (hopefully near) future.

I also believe that Glutamate excitotoxicity is something that needs to be looked at for sure.
 
So I used the DAVID tool, entered the core genes identified by the study, and am providing the results of a pathway analysis:
First, the KEGG pathway analysis. Of particular interest are the Glutamatergic synapse and GABAergic synapse entries. cc @ME/CFS Science Blog


[Image: KEGG Pathway.png]
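For anyone curious what a pathway enrichment test like DAVID's does under the hood, here is a minimal sketch of a hypergeometric over-representation test. All numbers below are invented for illustration; DAVID itself uses a modified Fisher's exact test (the EASE score), so this is only the basic idea:

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from a background of N,
    K of which are annotated to the pathway."""
    return sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Hypothetical counts: 20,000 background genes, 115 annotated to
# "Glutamatergic synapse", 259 genes in our input list, 8 of which
# fall in the pathway (expected by chance: about 1.5).
p_value = hypergeom_sf(8, 20000, 115, 259)
print(f"enrichment p-value: {p_value:.3g}")
```

If the input list hit the pathway no more often than a random draw of the same size would, this p-value would be large; a small value flags the pathway as over-represented.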
Next, Reactome:

Of interest: NR1H3 (= LXRa), tagging @MelbME; transmission across chemical synapses; NR1H2 (= the LXRb receptor); netrin-mediated repulsion signals; and O-linked glycosylation.


[Image: reactome pathway.png]


Finally, here is a functional annotation set from DAVID:


(Cell?) membrane appears at the top. Note the polar residues (which I believe relate directly to amino acids) and my personal "favorite", also identified in the paper by Xiao et al.: L-asparagine (a key part of N-linked glycosylation) and the potential role of glycoproteins in general.

[Image: functional_annotations_DAVID.png]

EDIT: Removed entries related to FXR, which are not part of the shown results.
 
(Paywall)

AI Summary:

Chronic fatigue syndrome seems to have a very strong genetic element

The largest study so far into the genetics of chronic fatigue syndrome, or myalgic encephalomyelitis, has implicated 259 genes – six times more than those identified just four months ago


The largest genetic study to date suggests that chronic fatigue syndrome, also known as myalgic encephalomyelitis (ME/CFS), has a strong genetic component. Researchers identified links to more than 250 genes, six times more than were reported just four months earlier. The findings may help explain why some people develop ME/CFS after infection while others do not, and may support future treatment development.

ME/CFS is a chronic and often disabling condition. A core symptom is post-exertional malaise, in which even small amounts of activity lead to prolonged exhaustion. Although infections often trigger the illness, its underlying causes remain unclear.

The study analysed genomic data from over 10,500 people diagnosed with ME/CFS, comparing it with data from people without the condition in the UK Biobank. Instead of examining genetic variants (single nucleotide polymorphisms, or SNPs) one at a time, the researchers looked at groups of interacting variants. They identified more than 22,000 such groups associated with ME/CFS risk, and found that having more of these groups increased a person's likelihood of developing the condition.

The variants were mapped to 2,311 genes, of which 259 showed the strongest and most common links to ME/CFS. This represents a substantial increase compared with earlier studies and supports previously identified genomic regions.

The researchers also compared the genetic findings with those from long covid studies. About 42 per cent of genes linked to long covid overlapped with those linked to ME/CFS, suggesting the two conditions partially overlap genetically, though differences in analysis limit firm conclusions.
 
How are people feeling about this study now the dust has settled?

From my perspective it seems like we are no closer to understanding PL's methodology.

I am interested in the (non Ampligen) repurposing opportunities, but more so in what these data can show us combined with other genetic data. Obviously it's all very far over my head. But do we think this is a useful study, or one that muddies the waters somewhat?
 
From my perspective it seems like we are no closer to understanding PL's methodology.
It seems to me that the important parts are laid out in their papers, just that it's somewhat of a complicated process. I don't have the energy to try to go through it and parse it, but maybe this summary of their method I had claude.ai write will be helpful.

This is from giving the AI the methods from the thread paper and from their 2022 paper.

Edit: Had an outline from Claude, but I think it made a mistake in describing the validation method, so here is ChatGPT's outline instead:
Step 1: Prepare the data (standard methods)
- Collect genetic data from people with ME and from healthy controls
- Remove low-quality samples and genetic markers
- Limit analysis to people with similar ancestry to avoid false signals
- Split the data into separate groups:
  - Discovery (to find signals)
  - Refinement (to check them)
  - Test (to confirm results)

Step 2: Search for genetic patterns (custom / proprietary)
- Look for small groups of genetic variants that tend to appear together in people with ME
- These groups can contain 1, 2, 3, or more variants
- The search is guided by rules that focus on patterns common in cases but rare in controls
- Only patterns with strong statistics and seen in enough people are kept
- This step uses a custom algorithm owned by PrecisionLife
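The core idea of step 2, minus PrecisionLife's proprietary search heuristics, can be shown with a toy check of whether one candidate variant combination is over-represented in cases. The combination, counts, and thresholds here are all invented for illustration:

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    probability of a table at least as case-enriched as observed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    return sum(
        comb(col1, i) * comb(total - col1, row1 - i)
        for i in range(a, min(row1, col1) + 1)
    ) / comb(total, row1)

# Toy counts: carriers of a hypothetical 3-SNP combination.
cases_with, cases_without = 180, 820        # 1,000 cases
controls_with, controls_without = 90, 910   # 1,000 controls

p = fisher_right_tail(cases_with, cases_without,
                      controls_with, controls_without)
print(f"one-sided Fisher p: {p:.3g}")
```

The real method has to do this kind of test over an enormous number of candidate combinations, which is why the search heuristics (which combinations to even evaluate) are the proprietary part.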

Step 3: Check against random data (standard idea, custom implementation)
- Randomly shuffle who is labeled as “case” or “control”
- Repeat the same pattern-finding process many times
- See how often strong-looking patterns appear by chance
- Remove real-data patterns that look similar to those commonly found in random data
- Patterns are judged by strength and frequency, not by exact genetic makeup
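Step 3 is a standard permutation test. A bare-bones version, with made-up carrier data for a single candidate combination, looks like this:

```python
import random

random.seed(0)

# Toy data: 1 = carries the candidate variant combination, 0 = does not.
cases = [1] * 30 + [0] * 70
controls = [1] * 12 + [0] * 88
observed_diff = sum(cases) / len(cases) - sum(controls) / len(controls)

# Shuffle the case/control labels many times and count how often a
# difference at least this large appears by chance alone.
pooled = cases + controls
n_perm = 2000
n_extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    fake_cases, fake_controls = pooled[:100], pooled[100:]
    diff = sum(fake_cases) / 100 - sum(fake_controls) / 100
    if diff >= observed_diff:
        n_extreme += 1

p_perm = (n_extreme + 1) / (n_perm + 1)
print(f"permutation p-value: {p_perm:.3g}")
```

A pattern that looks just as strong in the shuffled data as in the real data would get a large p-value here and be discarded.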

Step 4: Initial disease signatures
- The remaining patterns are called “disease signatures”
- These are still only candidates and not yet trusted

Step 5: Test patterns in new groups (mostly standard methods)
- Check whether each signature also appears in a different group of people with ME
- Remove patterns that do not repeat
- Remove genetic variants that do not show consistent effects
- Remove patterns that do not add new information
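The filtering logic in step 5 amounts to something like the following sketch; the signature names, carrier frequencies, and threshold are all invented:

```python
# Carrier frequency of each candidate signature among cases,
# in the discovery cohort and in an independent replication cohort.
discovery = {"sigA": 0.18, "sigB": 0.02, "sigC": 0.11}
replication = {"sigA": 0.15, "sigB": 0.00, "sigC": 0.09}
min_freq = 0.05  # arbitrary cutoff for this example

# Keep only signatures that appear often enough in BOTH cohorts.
validated = [s for s, f in discovery.items()
             if f >= min_freq and replication.get(s, 0) >= min_freq]
print(validated)
```

Here sigB fails to replicate and is dropped; the real pipeline applies additional checks (consistent effect direction, non-redundancy) on top of this.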

Step 6: Final disease signatures
- Patterns that pass all previous checks
- These appear consistently across multiple independent groups

Step 7: Group related patterns (custom / proprietary)
- Combine overlapping patterns into networks
- Identify key genetic variants that appear in many patterns
- Measure how strongly each network is linked to ME
- This grouping logic is part of PrecisionLife’s platform

Step 8: Link genes and biology (standard methods)
- Match genetic variants to nearby genes
- Use public databases to learn what those genes do
- Look for shared biological processes (immune system, nerves, metabolism, etc.)

Step 9: Test the overall genetic signal (standard methods)
- Count how many final patterns each person has
- Test whether people with more patterns are more likely to have ME
- Confirm this in a group of people never used earlier
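Step 9 reduces to a burden test: count signatures per person and ask whether higher counts go with case status. A toy version with simulated counts (the distributions are made up, chosen so that cases carry more signatures):

```python
import random
from statistics import mean

random.seed(1)

# Simulated counts of final disease signatures carried per person.
case_counts = [random.randint(2, 12) for _ in range(500)]
control_counts = [random.randint(0, 8) for _ in range(500)]

# Simple check: are people with above-median burden more often cases?
all_counts = case_counts + control_counts
threshold = sorted(all_counts)[len(all_counts) // 2]
high_case = sum(c > threshold for c in case_counts)
high_control = sum(c > threshold for c in control_counts)

print(f"mean burden: cases {mean(case_counts):.1f}, "
      f"controls {mean(control_counts):.1f}")
print(f"above-median burden: {high_case} cases vs {high_control} controls")
```

The actual paper fits a proper statistical model rather than a median split, but the question being asked is the same.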

Step 10: Compare with other studies (standard methods)
- Compare results with traditional genetic studies (GWAS)
- Check overlap with genes linked to related conditions like long COVID
- Use results to suggest possible biological explanations

Edit: And this might still not be easy to understand. My goal was to make sure people don't think it's all just a black box, or that we have to trust that whatever they're doing behind the scenes is right. There might be some secret parts, like how they select which combinations to even test since it's impossible to test all of them, but that is more like a preparation step for the actual analysis that is described.
 
It seems to me that the important parts are laid out in their papers, just that it's somewhat of a complicated process. I don't have the energy to try to go through it and parse it, but maybe this summary of their method I had claude.ai write will be helpful.

This is from giving the AI the methods from the thread paper and from their 2022 paper.

Edit: Had an outline from Claude, but I think it made a mistake in describing the validation method, so here is ChatGPT's outline instead:


Edit: And this might still not be easy to understand. My goal was to make sure people don't think it's all just a black box, or that we have to trust that whatever they're doing behind the scenes is right. There might be some secret parts, like how they select which combinations to even test since it's impossible to test all of them, but that is more like a preparation step for the actual analysis that is described.
thank you, this is helpful
 
How are people feeling about this study now the dust has settled?

From my perspective it seems like we are no closer to understanding PLs methadology.

I am interested in the (non Ampligen) repurposing opportunities, but more so in what these data can show us combined with other genetic data. Obviously it's all very far over my head. But do we think this is a useful study, or one that muddies the waters somewhat?
My perspective as a student who has been studying bioinformatics for a couple years is that new methods for slicing and dicing various types of big data come out every month—some of them end up significantly outperforming other methods and becoming the new standard tool, but most of them end up forgotten. A few of them try to make money out of their tool and those usually aren’t the ones that become widely adopted.

Like @forestglip says, this method isn't a black box or magic. It's comparable to a clever thesis project for someone doing a bioinformatics PhD.

If we were in a situation where we had several potentially efficacious treatments for ME/CFS and a lot of heterogeneity in responders/non-responders, I can see how a tool like this could eventually become useful if it was appropriately standardized and trained on different populations. At present, you can consider it one more study alongside DecodeME, Zhang et al. and others. If this method points in the same direction as other studies, it makes you more confident that the finding was not just an artifact of the algorithm's particular method of slicing and dicing. But I am not sure it gives us anything over and above what DecodeME already provided.
 