Preprint Actively Protective Combinatorial Analysis: a Scalable Novel Method for Detecting Variants that Contribute to Reduced Disease…, 2025, Sardell+

SNT Gatchaman

Actively Protective Combinatorial Analysis: a Scalable Novel Method for Detecting Variants that Contribute to Reduced Disease Prevalence in High-Risk Individuals
Jason Sardell; Sayoni Das; Krystyna Taylor; Colin Stubberfield; Andy Malinowski; Mark Strivens; Steve Gardner

We present a novel method for routinely identifying disease resilience associations that offers powerful insights for the discovery of a new class of disease protective targets. We show how this can be used to identify mechanisms in the background of normal cellular biology that work to slow or stop progression of complex, chronic diseases.

Actively protective combinatorial analysis identifies combinations of features that contribute to reducing risk of disease in individuals who remain healthy even though their genomic profile suggests that they have high risk of developing disease. These protective signatures can potentially be used to identify novel drug targets, pharmacogenomic and/or therapeutic mRNA opportunities and to better stratify patients by overall disease risk and mechanistic subtype.

We describe the method and illustrate how it offers increased power for detecting disease-associated genetic variants relative to traditional methods. We exemplify this by identifying individuals who remain healthy despite possessing several disease signatures associated with increased risk of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) or amyotrophic lateral sclerosis (ALS). We then identify combinations of SNP-genotypes significantly associated with reduced disease prevalence in these high-risk protected cohorts.

We discuss how actively protective combinatorial analysis generates novel insights into the genetic drivers of established disease biology and detects gene-disease associations missed by standard statistical approaches such as meta-GWAS. The results support the mechanism of action hypotheses identified in our original causative disease analyses. They also illustrate the potential for development of precision medicine approaches that can increase healthspan by reducing the progression of disease.
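The two-step idea in the abstract (first pick out high-risk individuals, then look for genotype combinations over-represented among those who stay healthy) can be roughly sketched as below. This is only an illustration, not the authors' algorithm: the cutoff `min_signatures`, the record fields, the SNP labels and the use of a one-sided Fisher exact test are all assumptions made for the example, and the cohort is made up.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for a 2x2 table.

    Rows: healthy / affected. Columns: carrier / non-carrier.
    Returns P(healthy carriers >= a) with all margins held fixed.
    """
    healthy = a + b
    carriers = a + c
    n = a + b + c + d
    upper = min(healthy, carriers)
    favourable = sum(
        comb(carriers, k) * comb(n - carriers, healthy - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, healthy)


def protective_test(people, combo, min_signatures=3):
    """Test whether `combo` is over-represented among healthy high-risk people."""
    # Step 1: keep only people carrying many disease signatures (assumed cutoff).
    high_risk = [p for p in people if p["n_signatures"] >= min_signatures]
    # Step 2: cross-tabulate health status against carrying the whole combination.
    table = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for p in high_risk:
        table[(p["healthy"], combo <= p["genotypes"])] += 1
    return fisher_one_sided(table[(True, True)], table[(True, False)],
                            table[(False, True)], table[(False, False)])


# Toy, made-up cohort (hypothetical SNP labels, not real data).
combo = frozenset({"snpA:GG", "snpB:AT"})
people = (
    [{"n_signatures": 4, "healthy": True, "genotypes": {"snpA:GG", "snpB:AT"}}] * 5
    + [{"n_signatures": 4, "healthy": False, "genotypes": {"snpA:GG"}}] * 5
    + [{"n_signatures": 0, "healthy": True, "genotypes": set()}] * 20
)
p_value = protective_test(people, combo)
print(f"one-sided p = {p_value:.4f}")
```

The low-risk individuals are dropped before any testing, so the comparison is only between high-risk people who stayed healthy and high-risk people who did not. A real analysis would also need to search over many combinations and correct for the number of tests.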


Link | PDF (Preprint: MedRxiv) [Open Access]
 
We exemplify this by identifying individuals who remain healthy despite possessing several disease signatures associated with increased risk of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS)
There's a disease signature? I'm intrigued. The rest of the paper looks really interesting, but for now I've only looked at the part about this.
In ME/CFS, there have been no replicated genetic associations found in any GWAS study. Our standard combinatorial analysis identified 84 high-order combinations of SNP-genotypes (“disease signatures”) comprised of 199 SNPs mapping to 14 genes that were significantly enriched in a cohort of self-reported ME/CFS patients from UK Biobank(28). We also replicated several of these gene-disease associations in other UK Biobank ME/CFS cohorts as well as a combinatorial analysis of long COVID patients(13). A SNP associated with one of these genes was also among 30 candidates tested by an independent statistical association analysis and was the only one that showed replicated association with ME/CFS(29). Notably its association with disease remained significant even after multiple test correction even though that study failed to incorporate the higher-order combinatorial dynamics associated with the gene-disease relationship.

We showed in our previous publication(19) that these causative ME/CFS disease signatures further stratified into 15 clusters (“communities”) representing shared cases and potential mechanisms of action relevant to ME/CFS patient subgroups. To check the clinical relevance of our findings, the phenotypic presentations of the mechanistically stratified subgroups of patients were previously compared to confirm that they do in fact present with symptoms consistent with the mechanism of action hypothesis for the underlying genes (Figure 2). This mechanistic patient stratification captures both linear and non-linear effects on disease biology and enables evaluation of the risk of specific symptoms and likelihood of therapy response. The underlying genotypic disease signatures are highly predictive of disease in UK Biobank (average OR=3.7 with p-values from 10^-10 to 10^-72), which is crucial for moving from identifying disease risks to identifying actively protective mechanisms.
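For anyone unfamiliar with the odds ratio quoted above, here is a minimal sketch of how an OR and an approximate 95% confidence interval are computed from a 2x2 table of signature carriers. The counts are invented for illustration and are not taken from the paper, and the function names are hypothetical.

```python
import math


def odds_ratio(case_with, case_without, ctrl_with, ctrl_without):
    """Odds of carrying the signature in cases divided by the odds in controls."""
    return (case_with * ctrl_without) / (case_without * ctrl_with)


def wald_ci(case_with, case_without, ctrl_with, ctrl_without, z=1.96):
    """Approximate 95% interval computed on the log-odds scale (Wald method)."""
    log_or = math.log(odds_ratio(case_with, case_without, ctrl_with, ctrl_without))
    se = math.sqrt(1 / case_with + 1 / case_without + 1 / ctrl_with + 1 / ctrl_without)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Made-up counts: 30 of 100 cases and 10 of 100 controls carry the signature.
counts = (30, 70, 10, 90)
or_value = odds_ratio(*counts)
low, high = wald_ci(*counts)
print(f"OR = {or_value:.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

An OR above 1 means the signature is more common in cases than in controls; an interval that excludes 1 points in the same direction as a small p-value.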

It seems to say a gene was significant in three studies: the two listed below, plus a third that I can't identify because the citation number looks wrong. Does anyone know which gene they're talking about?

28. Genetic Risk Factors for ME/CFS Identified using Combinatorial Analysis, 2022, Das et al
29. Genetic variants associated with chronic fatigue syndrome predict population-level fatigue severity and actigraphic measurements, 2024, Liu et al.
 
Not sure if this is the right third study, but it's another genetic study:
Genetic Risk Factors for Severe and Fatigue Dominant Long COVID and Commonalities with ME/CFS Identified by Combinatorial Analysis, 2023, Taylor et al

Quickly looking at the three studies, it looks like ATP9A was highlighted in all of them. Both the 2022 and 2024 studies used UK Biobank data, though. Were they the same individuals?

From the 2024 study:
the gene ATP9A, which encodes an ATPase phospholipid transporter. ATP9A has been shown to be involved in glucose metabolism.

Edit: Maybe it's UK Biobank in all three? Not sure.
 
Epigenome-wide meta-analysis of PTSD across 10 military and civilian cohorts identifies novel methylation loci, 2019, Smith et al (Preprint)
Evaluation of the 50 CpG sites in 42 genes or non-coding RNAs associated with PTSD (FDR<.2; Additional File 1) revealed that several were located in genes previously implicated in psychiatric disorders or pharmacologic treatment response (e.g. AGBL1, ATP9A, CUX1, FLJ46321 aka SPATA31D1, GRIN3A, GOT2, HOXA3, NEUROD2, and SYNJ1).


Using the Coriell Personalized Medicine Collaborative Data to conduct a genome-wide association study of sleep duration, 2015, Scheinfeldt et al (American Journal of Medical Genetics)
We also identified a region on chromosome 20 that contains four genes with suggestive relationships to sleep (Fig. 1). Rs2256551, the SNP we identified in the GWAS is located in the intronic region of ATPase, class II, type 9A (ATP9A), which is upstream to spalt-like transcription factor 4 (SALL4). SALL4 mutations cause Duane radial ray syndrome (Okihiro syndrome), which is associated with narcolepsy [Butterworth and Shneerson, 2014]. ATP9A is also located 1.2 Mb upstream of teashirt zinc finger homeobox 2 (TSHZ2), a gene that was implicated in sleep duration in a GWAS conducted by Gottlieb et al. [2007] but did not reach genome-wide significance in the original analysis. Finally, ATP9A lies downstream of nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 2 (NFATC2), which is associated with narcolepsy [Shimada et al., 2010]. Therefore, it appears that this region may contain multiple loci involved in sleep and sleep-related disorders.


SA18 - ANALYSIS OF GENETIC VARIANTS IN MEXICAN CHILDREN WITH AUTISM SPECTRUM DISORDER: AN IMMUNGENOMIC APPROACH, 2019, Morales et al (European Neuropsychopharmacology)
As we expected, immune system genes are involved in ASD. Further, we found some genes associated with mental diseases. Some SNPs are close to immune genes as TNFRSF19 (P=4×10^-4), ATG16L1 (P=2×10^-4) within genes as MLIP (P=4×10^-5), NUBP1 (P=4×10^-4) and ATP9A (P=3×10^-4).


Cross-Disorder Genome-Wide Analyses Suggest a Complex Genetic Relationship Between Tourette’s Syndrome and OCD, 2014, Yu et al (The American Journal of Psychiatry)
ATP9A was one of the highlighted genes, but none of the genes in the paper passed their threshold for significance.

Exome sequencing in bipolar disorder reveals shared risk gene AKAP11 with schizophrenia, 2022, Palmer et al (Nature Genetics) (Now published, but I took the quote from the preprint because it's paywalled.)
The combined analysis in BD and schizophrenia cases reveal one exome-wide significant gene, AKAP11 (P = 2.83 × 10^-9), and one suggestive gene, ATP9A (P = 5.36 × 10^-6).


Homozygosity Haplotype and Whole-Exome Sequencing Analysis to Identify Potentially Functional Rare Variants Involved in Multiple Sclerosis among Sardinian Families, 2021, Fazia et al (Current Issues in Molecular Biology)
Here, we aimed to contribute to the understanding of the genetic basis of MS by investigating potentially functional rare variants. To this end, we analyzed thirteen multiplex Sardinian families with Immunochip genotyping data. For five families, Whole Exome Sequencing (WES) data were also available. Firstly, we performed a non-parametric Homozygosity Haplotype analysis for identifying the Region from Common Ancestor (RCA). Then, on these potential disease-linked RCA, we searched for the presence of rare variants shared by the affected individuals by analyzing WES data. We found: [...] (ii) a variant (50245517 A > C) in the splicing region on exon 16 of ATP9A; [...]


OMIM: Neurodevelopmental disorder with poor growth and behavioral abnormalities (NEDGBA)
In 3 patients from 2 unrelated consanguineous families with NEDGBA, Vogt et al. (2022) identified homozygous loss-of-function mutations in the ATP9A gene [...]

In 2 sisters, born of consanguineous Pakistani parents (family 1) with NEDGBA, Mattioli et al. (2021) identified a homozygous splice site mutation in the ATP9A gene [...]

In 3 children from 2 unrelated families with NEDGBA, Meng et al. (2023) identified homozygous or compound heterozygous nonsense mutations in the ATP9A gene [...]
 