Multi-System Genetic Architecture of Hypermobile Ehlers–Danlos Syndrome: Integrating [ML] with Subject-Level Genomic Analysis, 2026, Shirvani+

SNT Gatchaman

Multi-System Genetic Architecture of Hypermobile Ehlers–Danlos Syndrome: Integrating Machine Learning with Subject-Level Genomic Analysis
Shirvani, Arash; Shirvani, Purusha; Holick, Michael F

BACKGROUND/OBJECTIVES
Hypermobile Ehlers–Danlos syndrome (hEDS) remains genetically unexplained despite decades of clinical investigation, with the molecular basis undefined for the vast majority of cases. This study employs integrated machine learning approaches with rigorous subject-level statistical methods to decode the genetic architecture underlying hEDS.

METHODS
We analyzed 35,923 rare genetic variants (gnomAD MAF < 0.2) across 116 subjects from 43 families (86 hEDS patients diagnosed per 2017 international criteria; 30 unaffected intrafamilial controls) using whole-exome sequencing. Machine learning analysis employed Random Forest feature selection, deep neural networks, and ensemble methods with subject-stratified cross-validation to prevent data leakage. Statistical association testing used subject-level Fisher's exact tests with Bonferroni correction (α = 3.77 × 10⁻⁶ for 13,281 genes). Sensitivity analyses assessed robustness to family structure.
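
A minimal sketch of what leakage-safe splitting can look like in a family-based cohort. It assumes the unit that must never straddle a train/test split is the family rather than only the subject (the abstract says "subject-stratified"; grouping relatives together is a stricter, hypothetical variant). The helper names and toy IDs are illustrative, not the authors' code; scikit-learn's GroupKFold implements the same idea.

```python
import random


def family_grouped_folds(subject_family, k=5, seed=0):
    """Assign whole families to k folds (hypothetical helper).

    subject_family maps subject ID -> family ID. Every member of a family
    ends up in the same fold, so relatives never sit on both sides of a
    train/test split.
    """
    families = sorted(set(subject_family.values()))
    random.Random(seed).shuffle(families)
    # Deal the shuffled families round-robin into folds.
    fold_of_family = {fam: i % k for i, fam in enumerate(families)}
    folds = [[] for _ in range(k)]
    for subject, fam in sorted(subject_family.items()):
        folds[fold_of_family[fam]].append(subject)
    return folds


def train_test_splits(folds):
    """Yield (train, test) subject lists, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


if __name__ == "__main__":
    # Toy cohort: 12 subjects spread over 4 hypothetical families.
    toy = {f"S{i:02d}": f"F{i % 4}" for i in range(12)}
    for train, test in train_test_splits(family_grouped_folds(toy, k=4)):
        print(len(train), "train /", len(test), "test")
```

This sketch does not balance cases and controls across folds; a real pipeline may also want that.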

RESULTS
Subject-level analysis identified statistically significant enrichment in variants associated with three major biological systems: (1) collagen biosynthesis pathway variants (present in 63% of hEDS subjects vs. 17% of controls, Fisher's p = 1.06 × 10⁻⁵, OR = 8.4), predominantly affecting COL5A1, COL18A1, COL17A1, and post-translational modification enzymes; (2) HLA/adaptive immune axis variants (74% of hEDS vs. 30% of controls, p = 2.23 × 10⁻⁵, OR = 6.8), involving HLA-B, HLA-A, HLA-C, and TAP transporters; (3) mitochondrial respiratory chain variants (34% of hEDS vs. 7% of controls, p = 2.29 × 10⁻³, OR = 7.1), with striking 4.2-fold enrichment in pediatric fracture cases (52% vs. 21%, p = 0.021, 95% CI: 1.2–14.6). These associations require independent validation and functional studies to determine their mechanistic relevance. Genome-wide analysis identified seven genes achieving Bonferroni significance (p < 3.77 × 10⁻⁶), all encoding structural/cytoskeletal proteins. Machine learning models with proper subject-stratified cross-validation achieved 80% accuracy (95% CI: 73–86%, sensitivity = 82%, specificity = 77%).
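
The three headline comparisons can be re-derived as 2×2 tables. The counts below are an assumption: they are back-calculated from the quoted percentages and group sizes (86 hEDS, 30 controls) and happen to reproduce the quoted odds ratios, but the paper's exact tables may differ, so the p-values this prints need not match. It is a plain-Python sketch of the standard two-sided Fisher's exact test (the same "tables no more probable than the observed one" rule that scipy.stats.fisher_exact uses); because the controls are relatives, the independence assumption behind these p-values does not strictly hold, as the authors acknowledge.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]
    (rows: hEDS, controls; columns: carrier, non-carrier)."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]].

    Conditional on the margins, a follows a hypergeometric distribution;
    the p-value sums the probabilities of every table that is no more
    likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small relative tolerance so ties are not lost to floating-point error.
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Counts back-calculated from the quoted percentages (86 hEDS, 30 controls);
# these are an assumption, not figures reported in the abstract.
TABLES = {
    "collagen": (54, 32, 5, 25),
    "HLA/immune": (64, 22, 9, 21),
    "mitochondrial": (29, 57, 2, 28),
}

if __name__ == "__main__":
    for name, table in TABLES.items():
        print(f"{name}: OR = {odds_ratio(*table):.1f}, "
              f"p = {fisher_exact_two_sided(*table):.2e}")
```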

CONCLUSIONS
Our findings suggest that hEDS may involve genetic variation across multiple biological systems beyond classical collagen pathways. These hypothesis-generating associations require validation in independent cohorts and functional studies before mechanistic or clinical conclusions can be drawn.

Genes | Open Access
 
the Ehlers–Danlos Syndrome Clinical Research Program and the Ehlers–Danlos Syndrome Translational Genomics Research Laboratory at Boston University School of Medicine were established. Our program represents one of the largest comprehensive EDS research initiatives in the United States, combining clinical expertise in diagnosing and managing EDS patients with cutting-edge genomic technologies and computational approaches. Through systematic clinical phenotyping and genomic analysis of affected families, we aim to uncover the genetic architecture underlying hEDS and translate these findings into improved diagnostic and therapeutic strategies.

Machine learning (ML) approaches represent a paradigm shift in how we analyze complex genetic data, particularly for conditions with suspected polygenic architecture and substantial genetic heterogeneity. Unlike traditional statistical methods that typically examine one variant (or gene) at a time, ML algorithms can simultaneously consider thousands of variants and identify complex, non-linear relationships between genetic features and phenotypes.
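
To make the "non-linear relationships" point concrete, here is an entirely hypothetical toy cohort (not from the paper) with two made-up genes where a subject is a case exactly when they carry a variant in one gene but not both. A one-gene-at-a-time comparison sees identical carrier rates in cases and controls, while a rule that looks at both genes jointly classifies everyone correctly. Tree-based models such as Random Forests can in principle pick up this kind of interaction; in real data such patterns are typically far noisier and much harder to detect.

```python
from itertools import product

# Hypothetical toy cohort: 25 subjects for each combination of carrier status
# at two made-up genes, GENE_1 and GENE_2. A subject is a case when they carry
# a variant in exactly one of the two genes (an XOR-style interaction).
SUBJECTS = [
    (g1, g2, g1 != g2)
    for g1, g2 in product((0, 1), repeat=2)
    for _ in range(25)
]


def carrier_rate(subjects, gene_index, is_case):
    """Fraction of cases (or controls) carrying a variant in one gene."""
    group = [s for s in subjects if s[2] == is_case]
    return sum(s[gene_index] for s in group) / len(group)


def joint_rule_accuracy(subjects):
    """Accuracy of the rule 'case iff exactly one gene carries a variant'."""
    return sum((s[0] != s[1]) == s[2] for s in subjects) / len(subjects)


if __name__ == "__main__":
    for idx in (0, 1):
        print(f"GENE_{idx + 1}: cases {carrier_rate(SUBJECTS, idx, True):.2f}, "
              f"controls {carrier_rate(SUBJECTS, idx, False):.2f}")
    print("joint rule accuracy:", joint_rule_accuracy(SUBJECTS))
```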

The persistent genetic mystery of hEDS, combined with the potential of ML to uncover complex genetic associations, creates a compelling opportunity for discovery. By applying integrated machine learning approaches to a well-characterized hEDS cohort, we hypothesized that we could identify previously unrecognized genetic variants and patterns associated with disease.

Specifically, this study aimed to (1) analyze genome-wide genetic variation in a cohort of hEDS patients and unaffected family controls using multiple ML algorithms, (2) identify statistically significant variants and genes associated with hEDS phenotypes, (3) employ proper cross-validation strategies to ensure that identified associations represent genuine biological signals rather than artifacts of data analysis, and (4) provide a foundation for future functional validation and precision medicine approaches in hEDS.

Acknowledged Limitations: We recognize that intrafamilial controls do not fully satisfy independence assumptions of Fisher’s exact tests. This limitation is discussed in Section 4.6 and results should be interpreted with appropriate caution pending replication with unrelated controls.
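
One rough way to put numbers on the independence problem is the Kish design effect from survey sampling, DEFF = 1 + (m − 1)ρ, where m is the mean family size and ρ the within-family correlation (ICC). This is a back-of-envelope heuristic, not the authors' sensitivity analysis, and the ICC values below are hypothetical because the within-family correlation of carrier status is not reported.

```python
def design_effect(mean_cluster_size, icc):
    """Kish design effect for clustered data: 1 + (m - 1) * rho."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_subjects, n_families, icc):
    """Approximate number of independent subjects the cohort is worth."""
    mean_family_size = n_subjects / n_families
    return n_subjects / design_effect(mean_family_size, icc)


if __name__ == "__main__":
    # 116 subjects in 43 families (from the abstract); the ICC values are
    # hypothetical, since within-family correlation of carrier status is unknown.
    for icc in (0.0, 0.25, 0.5, 1.0):
        n_eff = effective_sample_size(116, 43, icc)
        print(f"ICC = {icc:.2f}: effective n = {n_eff:.1f}")
```

At ICC = 1 each family counts as a single subject, so the 116 subjects would carry the information of only 43.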
 
They go on to discuss the findings, framed very carefully as exploratory and hypothesis-generating.

Our findings reveal three categories of genetic enrichment, with each observed in distinct proportions of patients. We emphasize that these represent statistical associations generating hypotheses for future mechanistic studies, not established pathogenic mechanisms. First, HLA/adaptive immune gene variants showed the highest prevalence (74% of patients in our cohort), representing a notable statistical enrichment. The enrichment of HLA-B, HLA-A, HLA-C, and HLA-DQA1 variants, along with TAP transporter genes involved in antigen processing, raises the hypothesis that immune-related genetic variation may contribute to hEDS susceptibility.

Collagen pathway variants were observed in 63% of patients in our cohort, a statistically significant enrichment compared to controls. The enrichment of COL5A1, COL18A1, and COL17A1 along with modification enzyme genes PLOD1-3 is consistent with a role for collagen-related genes, though the absence of identifiable collagen variants in 37% of our hEDS cohort suggests that additional genetic factors may contribute to disease susceptibility. These observations require replication in independent cohorts before conclusions about the relative importance of different genetic pathways can be drawn.
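
"Subject-level" counting means each person is counted at most once per pathway, however many qualifying variants they carry, and the pathway categories are not mutually exclusive (74% + 63% + 34% = 171%, so many patients necessarily fall into more than one). A minimal sketch, assuming per-subject sets of genes with a qualifying rare variant; the gene sets use only genes named in the discussion (TAP1 and TAP2 standing in for "TAP transporters") and are illustrative, not the paper's full pathway definitions.

```python
# Illustrative gene sets taken from genes named in the discussion; these are
# not the paper's full pathway definitions.
PATHWAYS = {
    "collagen": {"COL5A1", "COL18A1", "COL17A1", "PLOD1", "PLOD2", "PLOD3"},
    "hla_immune": {"HLA-A", "HLA-B", "HLA-C", "HLA-DQA1", "TAP1", "TAP2"},
}


def pathway_carriers(subject_genes, pathways=PATHWAYS):
    """Map each pathway to the set of subjects with >= 1 qualifying variant in it.

    subject_genes maps subject ID -> set of genes carrying a qualifying rare
    variant. Each subject is counted at most once per pathway, however many
    variants they carry.
    """
    return {
        name: {s for s, genes in subject_genes.items() if genes & gene_set}
        for name, gene_set in pathways.items()
    }


def carrier_fraction(carriers, group):
    """Fraction of subjects in group who are carriers."""
    group = set(group)
    return len(carriers & group) / len(group)


if __name__ == "__main__":
    # Hypothetical subjects: P* are hEDS, C* are controls.
    toy = {
        "P1": {"COL5A1", "HLA-B"},
        "P2": {"COL5A1", "COL18A1"},  # two collagen hits, counted once
        "P3": {"SYNE1"},
        "C1": {"HLA-C"},
    }
    carriers = pathway_carriers(toy)
    cases = {"P1", "P2", "P3"}
    for name, who in carriers.items():
        print(name, round(carrier_fraction(who, cases), 2))
```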

Third, mitochondrial respiratory chain gene variants were enriched in 34% of hEDS patients overall, with 4.2-fold higher prevalence in the pediatric fracture subset (52% vs. 21%). The observed enrichment across Complex I, III, IV, and V genes raises the hypothesis that mitochondrial function may be relevant to hEDS, particularly in patients with skeletal fragility. However, we emphasize that genetic variant enrichment does not establish functional mitochondrial dysfunction.
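
The "4.2-fold" figure is worth unpacking. A prevalence of 52% vs. 21% means the variants are about 2.5 times as common in the fracture subset, whereas the odds ratio implied by those rounded percentages is about 4.1; so 4.2 is most plausibly an odds ratio, with the small gap likely due to rounding of the percentages. A quick check (the helper names are illustrative):

```python
def prevalence_ratio(p_exposed, p_reference):
    """Ratio of two proportions (how many times more common)."""
    return p_exposed / p_reference


def odds_ratio_from_proportions(p_exposed, p_reference):
    """Odds ratio implied by two proportions."""
    def odds(p):
        return p / (1 - p)

    return odds(p_exposed) / odds(p_reference)


if __name__ == "__main__":
    # Pediatric fracture subset: 52% vs. 21% carry mitochondrial variants.
    print(f"prevalence ratio: {prevalence_ratio(0.52, 0.21):.2f}")
    print(f"odds ratio:       {odds_ratio_from_proportions(0.52, 0.21):.2f}")
```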

We emphasize that the biological pathways discussed represent statistical enrichments of genetic variants, not validated functional mechanisms. The terms “immune dysregulation”, “collagen dysfunction”, and “mitochondrial impairment” describe the biological systems in which enriched variants are annotated, not confirmed mechanistic contributions to hEDS pathogenesis. Our study provides genetic associations that prioritize hypotheses for such functional studies but does not itself provide functional evidence.

It is essential, however, to clearly distinguish between two fundamentally different categories of information presented here. The first category comprises our empirical statistical findings, namely the observed enrichment of variants in specific genes among hEDS patients compared to controls, which are subject to the methodological limitations we have discussed. The second category consists of known biological functions of these genes as established in the prior literature, independent of our study, which we cite to provide context for why the enriched genes might represent biologically plausible candidates worthy of further investigation.

Ideally, rescue experiments showing that correction of the variant reverses the phenotypic effects would provide the most compelling evidence for causality. Until such rigorous functional validation is performed, all biological interpretations offered in this discussion should be understood as hypothesis generating frameworks based on established gene functions rather than demonstrated pathogenic mechanisms operative in hEDS.

All seven genome-wide significant genes (p < 3.77 × 10⁻⁶), FLG-AS1, PCDHGA1, SYNE1, RELN, OBSCN, HSPG2, and KRT74, encode proteins annotated with structural or cytoskeletal functions. These proteins are involved in nuclear envelope integrity (SYNE1), extracellular matrix organization (HSPG2), cell adhesion (PCDHGA1), and cytoskeletal architecture (OBSCN, NEB). While this pattern is intriguing and suggests a hypothesis that mechanical tissue properties may be relevant to hEDS, we emphasize that statistical enrichment does not establish functional impairment of these proteins in our patients. Direct experimental validation is required.
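
The genome-wide threshold is the usual Bonferroni correction over genes: 0.05 / 13,281 ≈ 3.76 × 10⁻⁶, matching the quoted 3.77 × 10⁻⁶ to within rounding. A minimal sketch; the p-values are hypothetical, and the point it illustrates is that the divisor must be the number of genes actually tested, not the number of hits under consideration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling family-wise error at alpha."""
    return alpha / n_tests


def bonferroni_hits(gene_pvalues, n_tests, alpha=0.05):
    """Genes whose p-value falls below alpha / n_tests.

    n_tests must be the number of genes actually tested (13,281 here),
    not just the number of genes passed in.
    """
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(gene for gene, p in gene_pvalues.items() if p < threshold)


if __name__ == "__main__":
    print(f"threshold: {bonferroni_threshold(0.05, 13_281):.3e}")
    # Hypothetical p-values for made-up genes, for illustration only.
    toy = {"GENE_A": 1e-7, "GENE_B": 5e-6, "GENE_C": 0.01}
    print(bonferroni_hits(toy, n_tests=13_281))
```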

We explicitly avoid claiming that our findings establish mechanisms, identify causal variants, or support clinical applications. All such interpretations require independent replication as a prerequisite.

This exploratory study provides the first comprehensive computational genetics analysis of hEDS employing rigorous subject-level statistical methods and proper machine learning cross-validation strategies. Our findings identify statistical enrichments across multiple gene categories, such as structural proteins, HLA/immune genes, and mitochondrial genes, generating the hypothesis that hEDS genetic architecture may extend beyond classical collagen pathways.
 
I am not sure why they just used family controls. Presumably it avoids population ethnic variation problems at least to some extent, but it is hard to know how these figures would compare with the general population.

The hEDS probands seem to have lots of rare variants - more than one each - so presumably they are accepting that this is a polygenic situation (i.e. not EDS per se). Some of the collagen gene variants might effectively be monogenic disease determiners but that cannot be true of all the gene variants.

I have not looked at the criteria used, but I wonder whether they include symptoms as well as just hypermobility.
That might mean that probands have gained a diagnosis for symptoms that are in fact due to some immunological disorder unrelated to any hEDS structural status. I think using the starting point of 'hEDS' is as problematic as one would expect it to be here.

It is conceivable that some hypermobility relates to, e.g., HLA through some immune mechanism that might also be relevant to ME/CFS, but to pin that down I think the terminology needs to be clarified. (For instance, HLA-B27 is associated with spinal stiffening in ankylosing spondylitis, so it is conceivable that there are immune processes associated with looseness, although that would be much more difficult to understand.)
 
I'd say it's definitely getting increasingly difficult for 'hEDS' sceptics to maintain their position. No one ever claimed there will be easy answers. Clustering them 'somehow' is important and necessary for non-monogenetic diseases to come up with a 'polygenetic score'. This can then potentially be a sensible diagnostic marker/tool and also acts as a starting point for pathomechanism research. I like their cautious wording!
 
Would it be better, in terms of genetics, to select the sample by taking a large group of undifferentiated people (say, adolescents in local schools) and testing them for joint hypermobility, regardless of whether they have any other symptoms or conditions? You would then have a sample you could split into hypermobile and controls, with the hypermobile group further subdivided according to whether they have symptoms that may be related to collagen problems.
 

That’s essentially what has already happened with the subgrouping of “hEDS” vs “hypermobility.”

My impression is that many 'sceptics' simply haven’t actually read the hEDS diagnostic criteria. They often confuse the formal criteria with what some people describe as “comorbidities” in hEDS, which are largely not relevant for the diagnosis.

'hEDS' is essentially already a 'hypermobility plus'; whether or not it is largely an arbitrary distinction doesn't matter all that much as long as you get large enough sample sizes to learn more about its potential genetics.
 
Yes, I get that, but if you don't compare genetics of hypermobility with and without hEDS additional symptoms, how will you know whether the genetics is different for these 2 groups?

If they turn out to be the same, that would suggest something else non genetic triggers hypermobile individuals to have the additional symptoms.

If they turn out to be different, that could be very useful in narrowing down what predisposes hypermobile people to have hEDS.

Has any study been done to separate the 2 different hypermobile groups genetically?
 
But if this is a polygenic phenomenon, then there will be no 'syndrome', because the morbidities and prognoses will be different for every individual case. hEDS will turn out to have been an arbitrary and spurious category. Even hypermobility, as per Beighton, was always known to be a spurious category because there are so many different patterns, several of them related to specific races and some but not others related to sex.

The whole point of these categories is to generalise prognosis from one lot of patients to a new one. If we are dealing with a vast array of overlapping factors, there is no reason to think prognosis can be extrapolated beyond the carriage of any single one of them.
 
If they turn out to be different, that could be very useful in narrowing down what predisposes hypermobile people to have hEDS.

As originally defined, the hypermobile type of EDS was just extreme hypermobility, so there is nothing more to predispose to. It was defined as not being multisystem. Rodney Graham, Anne Gabell and I wrote a paper purporting to smash this claim by showing that they had mitral valve prolapse. There were two problems, though. One was that our data did not actually show that; someone seemed to have played around with them after I collected them. The other was that, of course, if we were studying people with mitral valve problems, then they didn't have H-type EDS by definition anyway. Someone was trying to prove a self-contradiction. And so it has gone on for another 45 years.
 
Genome-wide analysis identified seven genes achieving Bonferroni significance (p < 3.77 × 10⁻⁶), all encoding structural/cytoskeletal proteins.
To my mind, the most straightforward interpretation of this is that they have found genes that contribute to the hypermobility in hEDS, and that all of the other hits are artefacts from the various other symptoms that are required by the criteria.

It does not demonstrate that those criteria are valid or sensible.
 
I'd say it's definitely getting increasingly difficult for 'hEDS' sceptics to maintain their position. No one ever claimed there will be easy answers. Clustering them 'somehow' is important and necessary for non-monogenetic diseases to come up with a 'polygenetic score'. This can then potentially be a sensible diagnostic marker/tool and also acts as a starting point for pathomechanism research. I like their cautious wording!
I think there could be a difference between who meets criteria for inclusion in studies on hEDS and who gets diagnosed with hEDS in practice. My sense is there is more skepticism of the latter? I could be wrong.
 

For me the problem is confusing true monogenic EDS with a pure H presentation - for which some (one per person) genes have been found but not many - and the modern idea of 'hEDS', which is just hypermobility as far as I can see. I cannot see any reason to classify people as hEDS. Hypermobility is OK, but even that is pretty useless because there are so many different types.
 