Preprint Identification of Novel Reproducible Combinatorial Genetic Risk Factors for [ME] in [DecodeME Cohort] and Commonalities with [LC], 2025, Sardell+

SNT Gatchaman

Identification of Novel Reproducible Combinatorial Genetic Risk Factors for Myalgic Encephalomyelitis in the DecodeME Patient Cohort and Commonalities with Long COVID
Jason Sardell; Sayoni Das; Matthew Pearson; Dmitry Kolobkov; Andrzej Malinowski; Leanne Fullwood; Marianna Sanna; Helen Baxter; Kelly McLellan; Michael Natt; Daphne Lamirel; Sonya Chowdhury; Amy Rochlin; Mark Strivens; Steve Gardner

BACKGROUND
Myalgic encephalomyelitis (also known as ME/CFS or simply ME) has severely impacted the lives of tens of millions of people globally, but the disease currently has no accurate diagnostic tools or effective treatments. Identifying the biological causes of ME has proven challenging due to its wide range of symptoms and affected organs, and the lack of reproducible genetic associations across ME populations. This has prolonged misunderstanding, lack of awareness, and denial of the disease, further harming patients.

METHODS
We used the PrecisionLife combinatorial analytics platform to identify disease signatures (i.e., combinations of 1-4 SNP-genotypes) that are significantly enriched in two cohorts of ME participants from DecodeME relative to controls from UK Biobank (UKB). We tested whether the number of these signatures possessed by an individual is significantly associated with increased prevalence of ME in a third disjoint cohort of DecodeME participants. We characterized a number of drug repurposing opportunities for a set of candidate core genes whose disease signatures had the strongest association with ME and which were linked to different mechanisms. We then tested gene overlap between the ME signatures identified and previous studies in long COVID, using two independent approaches to explore these shared genetic commonalities.

RESULTS
We identified 22,411 reproducible disease signatures, comprising combinations of 7,555 unique SNPs, that are consistently associated with increased prevalence of ME in three disjoint patient cohorts. The count of reproducible signatures was significantly associated with increased prevalence of ME (p = 4×10⁻²¹), and participants with a top 10% signature count had an odds ratio of disease 1.64 times greater than participants with a bottom 10% signature count, confirming that these genetic signatures increase susceptibility for developing ME. These disease signatures map to 2,311 genes. We identified substantial overlap between the genes found by this combinatorial analysis and previous studies. We found that the 259 candidate core genes most strongly associated with ME are enriched in disease mechanisms including neurological dysregulation, inflammation, cellular stress responses and calcium signaling. We demonstrated that 76 out of 180 genes previously linked to long COVID in UKB and the US All of Us cohorts are also significantly associated with ME in the DecodeME cohort. These findings allowed identification of many existing and novel repurposing opportunities, including candidates linked to several genes with shared etiology for long COVID.

CONCLUSION
These findings provide further evidence that ME is a complex multisystemic condition where the risk of developing the disease has a very clear genetic and biological basis. They give a substantially deeper level of insight into the genetic risk factors and mechanisms involved in ME. The discovery of so many multiply reproducible genetic associations implies that ME is highly polygenic, which has important consequences for its future study and the delivery of clinical care to patients. The striking overlap in genes and mechanisms between long COVID and ME (76 / 180 long COVID genes tested) suggests the potential for development of novel or repurposed drug therapies that could be used to successfully treat either condition. However, although they share significant genetic commonalities, long COVID and ME appear to be best considered as partially overlapping but different diseases.

Web | DOI | PDF | Preprint: MedRxiv | Open Access
 
Good to see this out!

I don't see a data set or table of genes they found, aside from the replicated DecodeME genes. BTN2A2 was found here too, and CA10, OLFM4, SUDS3, DCC, TRIM38 and quite a few others. That's good news!

Can we say replicated when it's the same cohort/dataset but different methodology? Obviously it's not the same as a study replicating this data in a different cohort, but I'm just wondering about the use of terminology.

What do we think of the two drug targets they claim to have found? They suggest ampligen for one and a psoriasis drug for the other.

@Jonathan Edwards was this paper the reason for your cryptic statement about drug targets possibly emerging soon but not before Thanksgiving?

They say more analysis is ongoing to confirm specific things, so that's interesting.

I did a search for HLA and found nothing.

Interested to see what everyone else makes of it when people have the capacity.
 
I’m glad they are publishing some data, but what’s up with this introduction? Is there nobody in the group who is able to cut through all the babble and stick to what we actually know?
Myalgic encephalomyelitis (also known as ME/CFS or simply ME) is a complex, chronic disease characterized by post-exertional malaise (PEM, sometimes referred to as post-exertional neuroimmune exhaustion PENE (Carruthers et al. 2011) ––
First of all, it’s ME/CFS, not ME or Myalgic Encephalomyelitis. And it’s not any more complex than any other disease. It’s just not very well understood.

Second, PENE is not PEM. PEM describes a temporal pattern of symptoms; PENE describes an assumed pathology.
in which symptoms disproportionately worsen, or arise, following minimal physical or mental exertion relative to pre-sickness), as well as neurological components (e.g., unrefreshing sleep, pain, neurocognitive impairment, sensory disturbance), evidence of and cognitive impairment immune/gastro-intestinal and/or genitourinary impairment, and of impairments to energy metabolism/ion transport.
I’m not sure we can say that «impairment to energy metabolism/ion transport» has been established in ME/CFS.
Patients may experience a wide spectrum of other symptoms and comorbidities affecting multiple body systems, including dysautonomia, orthostatic intolerance and postural tachycardia, fibromyalgia, IBS, clinical depression, mast cell disease, and connective tissue differences.
Dysautonomia is a meaningless description. Depression is not a part of ME/CFS, and there is no evidence of a higher prevalence. Mast cell disease is unevidenced, same with connective tissue differences.

———

I fear that they are harming their own credibility by including these claims, and by extension, the patient group as a whole.
 
Am I missing something or do they never state how many participants were in the test cohort? Also, did they not do anything with the test cohort? I can’t find anywhere in the text that they mention the results of validating on the test cohort besides just mentioning that they had one.

Considering that this method would be extremely prone to overfitting, test set validation is basically the most important part…

[Edit: it is there, I just missed it scrolling through]
 
I fear that they are harming their own credibility by including these claims, and by extension, the patient group as a whole
Agree. It’s disappointing to see and I fear that aspect is likely down to the two charities involved (edit: the 25% ME Group use the term PENE and their materials are cited). Language more in keeping with DecodeME would be better. Hopefully they can be persuaded to change this.
I don't see a data set or table of genes they found
Looks like there’s more in the extended data tables in the supplementary materials. Will need someone to go through and extract and discuss what’s relevant, sorry not up to it atm.
 
Also, did they not do anything with the test cohort?
Is it this:
In order to evaluate the predictive power of the double-refined disease signatures, we used the DecodeME Test dataset (cohorts D+H) to test if the count of double-refined signatures is associated with increased odds of ME.
We observed a highly significant correlation between the signature count score and ME in the Test dataset (OR = 1.23 per standard deviation increase, p = 4×10⁻²¹) when including sex and the top 10 genetic principal components as confounders in the logistic regression. The disease odds ratio for individuals with a top 10% signature count score relative to individuals with a bottom 10% signature count score was 1.64 (Figure 7), while the odds ratio for the top 5% vs. bottom 5% was 1.89.
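To make the top-versus-bottom comparison concrete, here is a minimal Python sketch of an unadjusted odds ratio between the highest and lowest 10% of a score. It is not the paper's method: the quoted analysis used logistic regression with sex and the top 10 genetic principal components as covariates, which this sketch leaves out. The function name `decile_odds_ratio` and the toy data are made up for illustration.

```python
def decile_odds_ratio(scores, is_case, fraction=0.10):
    """Unadjusted odds ratio of being a case: top vs bottom `fraction` of scores."""
    # Rank individuals by score, lowest first.
    ranked = sorted(zip(scores, is_case), key=lambda pair: pair[0])
    k = int(len(ranked) * fraction)
    if k == 0:
        raise ValueError("too few individuals for this fraction")
    bottom, top = ranked[:k], ranked[-k:]

    def odds(group):
        cases = sum(1 for _, case in group if case)
        controls = len(group) - cases
        if cases == 0 or controls == 0:
            raise ValueError("odds undefined: group has no cases or no controls")
        return cases / controls

    return odds(top) / odds(bottom)


# Toy data (hypothetical): 40 people with scores 0..39, five of them cases.
scores = list(range(40))
case_ids = {0, 10, 20, 38, 39}
is_case = [i in case_ids for i in range(40)]

toy_or = decile_odds_ratio(scores, is_case)
print(round(toy_or, 2))
```

In the toy data the bottom four people contain one case and the top four contain two, so the odds are 1/3 and 2/2 respectively. The paper's 1.64 comes from an adjusted comparison on the real test set, so it is not directly comparable to a raw ratio like this one.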


Am I missing something or do they never state how many participants were in the test cohort?
I think in figure 1 it says the test cohort has 3,579 cases and 113,735 controls.
 
Comma-separated list of the 259 candidate core genes. Nice to see ABCA1 and ABCC6:

AAMDC, ABCA1, ABCC6, ABHD12, ACOX3, ACTL8, ACTR3C, ADAMTS2, ADPRH, ADTRP, AFF1, AFF3, AKAP2, AKAP6, ALDOB, ANGPT1, ANKS1B, ANO3, ANO4, ARHGAP8, ASB2, ASB3, ASXL3, ATCAY, ATXN1, BAG6, BCCIP, BNC2, BRF1, BTBD2, BTBD7, BTBD9, C18orf63, C1orf87, C20orf173, C3, C4orf45, C5orf47, CACNA1A, CACNA1D, CBFA2T3, CCDC148, CCDC149, CCDC171, CCDC85A, CD22, CD82, CD8B, CDH12, CDH13, CELF2, CEP19, CH25H, CHCHD6, CHL1, CKAP4, CNTN4, COL19A1, COL4A4, COLEC12, COX17, CRYBG2, CSE1L, CSMD1, CTNNA2, CYB5RL, CYFIP1, CYP7B1, DAB1, DCC, DDAH1, DDR1, DENND2A, DHX32, DISP2, DLGAP2, DMAC1, DNAH11, DNAJA4, DNAJC25, DOCK2, DPP3, DPP6, EEPD1, EFCAB5, EHMT1, EPHB1, F13A1, FAM172A, FARP1, FBXO7, FHIT, FNTB, FOCAD, FRAS1, FREM3, FRMD4A, FTO, FUT8, GABBR1, GABRB1, GALNT18, GCNT1, GINS1, GNL3, GPSM2, GRIA1, GRIK1, GRK4, GRM7, HACD1, HIST1H2BE, HIST1H2BF, HS3ST4, IGF1R, IGF2BP3, INPP4B, KANSL1, KAZN, KCNIP4, KCNJ16, KLHDC4, KSR1, LAIR1, LOXL2, LPA, LPP, LRMDA, LRRC74A, LYPD5, MAML2, MAX, MDGA2, MED13L, MED25, MICB, MMAB, MOV10, MRPL37, MSI2, MTX2, MUC16, MYT1L, NBEAL2, NCKAP5, NCOR2, NDST3, NECTIN1, NEDD9, NFIA, NOP9, NOS1AP, NTN1, NTRK3, OR10A6, OR5AC2, OR5V1, PARD3B, PARM1, PARS2, PAX5, PCSK6, PDE1C, PDIA3, PEBP4, PIGX, PKM, PLCB1, PLD5, POPDC2, PPM1N, PPP1R36, PSMB9, PTPRD, PTPRG, PUS10, RAI14, RAP1GAP2, RASGEF1B, RASGRF2, RB1CC1, RBFOX1, RERE, RERGL, RGS7, RHBDD2, RHBDL3, RNF150, RYR2, RYR3, SAMD5, SASH1, SCAMP3, SEC23IP, SEZ6L, SFMBT2, SFTA2, SGSM2, SH3PXD2A, SH3RF3, SLC17A2, SLC25A24, SLC28A1, SLC35F3, SLC39A12, SLC5A10, SLCO2A1, SMARCA2, SNX29, SPOCK1, SPOCK3, SPTLC3, SRGAP1, ST6GAL1, STAB1, STIM2, STOX2, SUB1, SULT1C3, SYTL3, TACC1, TAP1, TENM2, THSD7A, TLR3, TM7SF3, TMEM132C, TMEM260, TMEM63C, TMTC1, TOX3, TPH2, TPX2, TRIM26, TRIM31, TSKU, TTC39C, TTLL11, TUBB, UGGT1, UNC93A, UROS, USP45, USP47, UTRN, VIT, XYLT1, YWHAB, ZC3H3, ZC3H7A, ZIC4, ZMAT4, ZNF282, ZNF283, ZNF385B, ZNF423, ZPLD1, ZSCAN5C, ZZEF1

Related to ABCA1 and ABCC6 we have the following:


ABCC6

https://www.s4me.info/threads/abcc6-and-pathogenic-snps.14251/#post-246766

and ABCA1

https://www.s4me.info/threads/presentation-at-euromene-london-uk.5760/
 
The beginning of the sections about Rintatolimod/Ampligen:
Toll-like receptor 3 (TLR3) is a key component of the innate immune system that acts as a critical sensor for viral double‐stranded RNA in several cell types that are key to host antiviral defense (Vercammen, Staal and Beyaert 2008). Beyond its role in detecting exogenous viral RNA, TLR3 also senses endogenous RNA released by damaged, necrotic, or stressed cells, thereby modulating inflammatory responses (Cavassani et al. 2008). Dysregulated TLR3 signaling can lead to chronic inflammation and tissue damage, exacerbating conditions such as autoimmune diseases, chronic viral infections, and cancer (Mohammad Hosseini et al. 2015; Hsieh et al. 2025)

Rintatolimod is a synthetic double-stranded RNA molecule that acts as a selective agonist of TLR3. On binding to TLR3, rintatolimod activates the MyD88 independent TRIF signaling pathway, leading to the production of interferons and other antiviral proteins without triggering excessive systemic inflammation associated with other dsRNA molecules (Mitchell 2016). It has been investigated in several Phase II/III clinical trials with ME patients, where it has shown statistically significant improvements in primary endpoint using exercise tolerance and some secondary endpoints when compared to placebo (Strayer et al. 2012; Mitchell 2016; Strayer, Young and Mitchell 2020).

I’m beginning to understand why most journals use numbers for references and not APA!

The references for Rintatolimod are:

Strayer DR, Carter WA, Stouch BC et al. A double-blind, placebo-controlled, randomized, clinical trial of the TLR-3 agonist rintatolimod in severe cases of chronic fatigue syndrome. PLoS One 2012;7:e31334.
Thread

Mitchell WM. Efficacy of rintatolimod in the treatment of chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME). Expert Rev Clin Pharmacol 2016;9:755–70.
(No thread, it’s just a narrative piece)

Strayer DR, Young D, Mitchell WM. Effect of disease duration in a randomized Phase III trial of rintatolimod, an immune modulator for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. PLoS One 2020;15:e0240403.
Thread
 

PRESS RELEASE

Groundbreaking myalgic encephalomyelitis study identifies over 250 core genes, shared biology with long COVID, and dozens of drug repurposing opportunities

The study reinforces that ME is a complex multisystemic condition with a clear genetic basis and lays the foundation for future clinical trials that could be faster to recruit and more likely to succeed


OXFORD, UK – 4 December 2025 – PrecisionLife today announced new findings from the most detailed genetic analysis of myalgic encephalomyelitis (ME, also known as ME/CFS) ever conducted, revealing more than 250 core genes associated with the disease, including 76 genes linked with long COVID, and uncovering dozens of drug repurposing opportunities supported by genetic biomarker tests, offering potential for faster and lower-risk routes to developing targeted treatments.

The study, now available as a pre-print and submitted for peer review, applied PrecisionLife's AI-led combinatorial analytics platform to analyze genomic data from two DecodeME cohorts together with UK Biobank to confirm reproducibility of results across three independent datasets. The analysis identified 7,555 genetic variants (including the 8 identified by the recent DecodeME GWAS study) that were consistently associated with increased disease risk in three different populations.

These results confirm that ME is a deeply polygenic and biologically heterogeneous condition with at least four major disease mechanisms implicated by genetic signals: neurological dysregulation, inflammation, cellular stress response, and calcium signaling.

The findings have important implications for the future of ME research and treatment. They reinforce the need for a stratified approach, with genetic evidence pointing to multiple biological subgroups within the disease. This means that future clinical trials are likely to be more successful when they target specific patient subtypes rather than treating ME as a single, uniform condition. This also aligns closely with the lived experience of many people in the ME community, who have long recognized the diversity of symptoms and disease patterns.

The study also demonstrated a strong genetic overlap between ME and long COVID, with 76 of 180 genes previously linked to long COVID also significantly associated with ME in the DecodeME dataset. This indicates that ME and long COVID are overlapping but different conditions, where their shared biological pathways offer promising potential for developing drug therapies that could successfully treat patients with either condition.

To support global research efforts, PrecisionLife has published the full list of SNPs and genes identified in this analysis, enabling academic groups, clinicians, and biopharma researchers to accelerate drug repurposing studies, target discovery, and development of new mechanism-based therapies.

ME and long COVID together affect an estimated 400 million people worldwide, contributing more than $1 trillion annually to healthcare costs and lost economic productivity. The lack of diagnostic tools, effective treatments or biological clarity on its causes has contributed to decades of unmet need for patients, and prolonged underinvestment in development of research and healthcare pathways.

Dr Steve Gardner, CEO of PrecisionLife:
“These results reinforce that ME has a clear biological and genetic basis and is a complex multisystemic disease. ME is highly polygenic and heterogeneous, so no single drug will help everyone. Stratifying patients by the mechanisms that are driving their disease will be essential for predicting who will benefit from which therapies and for developing accurate diagnostic tests. We’re beginning to have this level of insight, and we hope that in the future the genetic biomarkers we’ve identified for existing and new drug repurposing candidates could help make trials with collaborators worldwide more successful.”

Sonya Chowdhury, CEO of Action for ME:
“These findings offer further hope to people with ME around the world. For decades, people affected by ME have lacked recognition, access to proper diagnosis and effective treatments. PrecisionLife’s results represent a major step forward in understanding the biology of the disease and provide real opportunities for targeted therapies to move into clinical testing. We are proud that DecodeME has helped pave the way for this progress, and we will continue to champion research that delivers meaningful benefits for the community.”

Prof Chris Ponting, Chair of Medical Bioinformatics at the Institute of Genetics and Cancer, University of Edinburgh, and investigator on the DecodeME study:
“DecodeME was designed to reveal the complex genetics of ME by providing a dataset of the scale and quality required for robust discovery. PrecisionLife has shown how making such datasets available can quickly generate new insights into ME disease biology. This is an exciting outcome of making consented DecodeME data available to research partners and we look forward to enabling further future collaborations.”

Helen Baxter, Patient Advocate & Patient and Public Involvement Representative:
“These results greatly enhance our understanding of the biology of ME and present opportunities for drug repurposing which affords hope to the millions of people living with ME and long COVID around the world.”

The research forms part of the LOCOME (LOng COvid and Myalgic Encephalomyelitis) program, led by PrecisionLife and funded in part by Innovate UK. The LOCOME program was delivered in collaboration with Action for ME and the University of Edinburgh.

The LOCOME partners are especially grateful to the tens of thousands of people with ME who contributed their data to DecodeME, often whilst experiencing pain, fatigue, and brain fog.

This work also uses data provided by patients and collected by the NHS as part of their care and support, for which we are also grateful. The partners also wish to thank the patient and public involvement (PPI) representatives on LOCOME, whose insight and lived experience helped shape the delivery and direction of this project.

We hope the findings published today will encourage biopharma, clinical researchers, and academic groups to work together to advance repurposing candidates into targeted clinical trials, supported by mechanistic biomarkers that can identify those patients most likely to respond.

Read the full study paper here: precisionlife.com/locome-preprint

 
My own understanding:

They could only use a subset of the 14,767 DecodeME cases. In DecodeME the 14,767 participants were processed in 4 batches, and here they could only use the first 3 for time reasons, which made up 12,531 cases. After quality control they could only use 10,569 of those cases (I'm not quite sure how that works, but much of it seems to relate to imputation errors) and only roughly 50% of the DecodeME SNPs. Those 10,569 cases were split into 3 groups: one set of 3,405 cases (F), one set of 3,585 cases (G) and one test set of 3,579 cases (H). The sex ratios in (G) and (H) are the same, but that is not true for (F).

They then ran 2 analyses with these sets of cases, Analysis 1 and Analysis 2. In each analysis they combined the case sets with sets of controls from the UK Biobank, which are sets (B), (C) and (D). In Analysis 1 they used (F)+(B) in the discovery process, then had a first refinement process involving only UKB cases and controls, then a second refinement process using (G)+(C). Analysis 2 was the same process reversed: they used (G)+(B) for discovery, then the same group of UKB people in the first refinement process, and groups (F)+(C) in the second refinement process. Finally they used a test set consisting of (H) and (D).
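For anyone who finds the cohort letters hard to follow, the design as described in this post can be written down as a small Python sketch. It only encodes my reading above, which may be wrong, and the `"UKB"` label for the UK Biobank-only refinement step is a placeholder of mine rather than the paper's notation.

```python
# Case counts per DecodeME cohort, as read from the post above.
DECODEME_CASES = {"F": 3405, "G": 3585, "H": 3579}

# Each stage lists the cohorts it uses. "UKB" stands for the UK Biobank-only
# first refinement; the label is a placeholder, not the paper's notation.
ANALYSES = {
    "analysis_1": {
        "discovery": ["F", "B"],
        "refinement_1": ["UKB"],
        "refinement_2": ["G", "C"],
    },
    "analysis_2": {
        "discovery": ["G", "B"],
        "refinement_1": ["UKB"],
        "refinement_2": ["F", "C"],
    },
}
TEST_SET = ["H", "D"]


def cohorts_used_in_training(analyses):
    """Collect every cohort label that appears in discovery or refinement."""
    return {
        cohort
        for stages in analyses.values()
        for cohorts in stages.values()
        for cohort in cohorts
    }


total_cases = sum(DECODEME_CASES.values())
leaked = cohorts_used_in_training(ANALYSES) & set(TEST_SET)
print(total_cases)  # 10569
print(leaked)       # set()
```

The two checks confirm that the three case sets add up to the 10,569 cases that passed quality control, and that under this reading neither (H) nor (D) is touched before the final test.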

Since (F) and (G) don't have the same sex ratio, that would mean the sex ratio between controls and DecodeME cases also differed at different steps during discovery and refinement, whilst sex ratios in the test set were matched.

The basic idea seems to be that in each refinement step you throw away things that weaken the signal from the previous analysis step, or add things that in hindsight make for a strong signal. The exact methodology behind the different cohorts and steps is hard for me to understand. For example: how do the results look if you use 3 DecodeME cohorts instead of the UKB cohort in refinement step 1? What if you use smaller sample sizes and more refinement steps, or larger sample sizes and fewer refinements? Is the UKB cohort necessary for refinement?

They combined the results of Analysis 1 and Analysis 2 to get something called ‘double-refined’ disease signatures, which essentially combines 2 sets of analyses of 3 independent cohorts. But it isn't clear to me from the paper how this combination works. Do they just combine all results or do something different?

Did I get something wrong up until here?

In the end they get 22,411 double-refined signatures. They call these reproducible. This seems to be based on the following: you look at the test cohort and count how many of these double-refined signatures are present in patients and controls. You call this the signature count score. Here each signature is assumed to have the same effect on disease risk and to be independent of the others, neither of which is the case. But it is used as a proxy for something along these lines: if it turns out that on average patients have more of these signatures than controls in some sense, then you have reasons to be optimistic. They performed a logistic regression using something they called a combinatorial risk score (I'm not quite sure how that is constructed). They also calculated the odds of ME/CFS for each decile of signature count score.
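As I understand it, the signature count score could be sketched roughly like this in Python. The abstract describes a signature as a combination of 1-4 SNP-genotypes; everything else here is an assumption for illustration: the SNP names are invented, genotypes are encoded as 0/1/2, and a person "has" a signature only if they carry every SNP-genotype in it. The paper's actual scoring may differ.

```python
def signature_count(genotypes, signatures):
    """Count signatures whose SNP-genotype pairs are all carried by one person."""
    return sum(
        all(genotypes.get(snp) == genotype for snp, genotype in signature)
        for signature in signatures
    )


# Hypothetical signatures, each a combination of 1-4 (SNP, genotype) pairs.
signatures = [
    [("rsA", 1)],
    [("rsA", 1), ("rsB", 2)],
    [("rsC", 0), ("rsD", 1), ("rsE", 2)],
]

# One hypothetical person's genotypes (0/1/2 encoding is an assumption).
person = {"rsA": 1, "rsB": 2, "rsC": 0, "rsD": 0, "rsE": 2}

score = signature_count(person, signatures)
print(score)  # 2
```

The third signature does not count because the person's genotype at `rsD` does not match. Every matching signature adds exactly 1, which is the equal-weight assumption mentioned above.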

 