Genetic Risk Factors for Severe and Fatigue Dominant Long COVID and Commonalities with ME/CFS Identified by Combinatorial Analysis, 2023, Taylor et al

Wyva

Abstract

Background Long COVID is a debilitating chronic condition that has affected over 100 million people globally. It is characterized by a diverse array of symptoms, including fatigue, cognitive dysfunction and respiratory problems. Studies have so far largely failed to identify genetic associations, the mechanisms behind the disease, or any common pathophysiology with other conditions such as ME/CFS that present with similar symptoms.

Methods We used a combinatorial analysis approach to identify combinations of genetic variants significantly associated with the development of long COVID and to examine the biological mechanisms underpinning its various symptoms. We compared two subpopulations of long COVID patients from Sano Genetics’ Long COVID GOLD study cohort, focusing on patients with severe or fatigue dominant phenotypes. We evaluated the genetic signatures previously identified in an ME/CFS population against this long COVID population to understand similarities with other fatigue disorders that may be triggered by a prior viral infection. Finally, we also compared the output of this long COVID analysis against known genetic associations in other chronic diseases, including a range of metabolic and neurological disorders, to understand the overlap of pathophysiological mechanisms.

Results Combinatorial analysis identified 73 genes that were highly associated with at least one of the long COVID populations included in this analysis. Of these, 9 genes have prior associations with acute COVID-19, and 14 were differentially expressed in a transcriptomic analysis of long COVID patients. A pathway enrichment analysis revealed that the biological pathways most significantly associated with the 73 long COVID genes were mainly aligned with neurological and cardiometabolic diseases.

Expanded genotype analysis suggests that specific SNX9 genotypes are a significant contributor to the risk of or protection against severe long COVID infection, but that the gene-disease relationship is context dependent and mediated by interactions with KLF15 and RYR3.

Comparison of the genes uniquely associated with the Severe and Fatigue Dominant long COVID patients revealed significant differences between the pathways enriched in each subgroup. The genes unique to Severe long COVID patients were associated with immune pathways such as myeloid differentiation and macrophage foam cells. Genes unique to the Fatigue Dominant subgroup were enriched in metabolic pathways such as MAPK/JNK signaling. We also identified overlap in the genes associated with Fatigue Dominant long COVID and ME/CFS, including several involved in circadian rhythm regulation and insulin regulation. Overall, 39 SNPs associated in this study with long COVID can be linked to 9 genes identified in a recent combinatorial analysis of ME/CFS patients from UK Biobank.

Among the 73 genes associated with long COVID, 42 are potentially tractable for novel drug discovery approaches, with 13 of these already targeted by drugs in clinical development pipelines. From this analysis for example, we identified TLR4 antagonists as repurposing candidates with potential to protect against long term cognitive impairment pathology caused by SARS-CoV-2. We are currently evaluating the repurposing potential of these drug targets for use in treating long COVID and/or ME/CFS.

Conclusion This study demonstrates the power of combinatorial analytics for stratifying heterogeneous populations in complex diseases that do not have simple monogenic etiologies. These results build upon the genetic findings from combinatorial analyses of severe acute COVID-19 patients and an ME/CFS population and we expect that access to additional independent, larger patient datasets will further improve the disease insights and validate potential treatment options in long COVID.

Preprint
Open access: https://www.medrxiv.org/content/10.1101/2023.07.13.23292611v1

Edit: Now published, see post #17
 
Discussion in the paper about the overlap between ME/CFS and long covid

We found that the CLOCK gene is significantly associated with Fatigue Dominant long COVID and ME/CFS. CLOCK (Circadian Locomotor Output Cycles Kaput) is an important regulator of circadian rhythm, disruptions of which have been associated with pain, insomnia, insulin resistance, immunological function and impaired mitochondrial function [77, 78, 79, 80, 81]. Interestingly, one of the most common variants identified in ~86% of the long COVID Fatigue Dominant population mapped to the gene NLGN1. NLGN1 is also transcriptionally activated by CLOCK in the forebrain [82], which could indicate multiple genetic contributions to dysregulated circadian rhythm in long COVID.

Of the remaining 4 genes common between long COVID and ME/CFS, we identified 3 common variants in the genes ATP9A, INSR and SLC15A4 in both Severe and Fatigue Dominant cohorts (Table 7).

SLC15A4 encodes a transmembrane transporter that has previously been associated with inflammatory autoimmune diseases such as systemic lupus erythematosus from genome-wide association studies [83, 84]. However, SLC15A4 also plays a key role in mitochondrial function, with knock down of the gene resulting in impaired autophagy and mitochondrial membrane potential under cell stress [85].

We also hypothesized that the genetic variants in ATP9A and INSR both contribute to dysregulated insulin signaling in subgroups of ME/CFS patients. Type 2 diabetes-related signaling pathways and insulin resistance were also a key theme within the genes associated with long COVID, and 11 of the gene targets identified in this analysis have prior associations with type 2 diabetes in the OpenTargets database (Supplementary Table 12). Metabolic dysfunction and type 2 diabetes may increase risk of developing severe acute COVID-19 [86] and epidemiological studies have demonstrated that there is an increased risk of developing diabetes post COVID-19 compared against controls who had not been infected with SARS-CoV-2 [87]. Furthermore, increased incidence of insulin resistance and glycemic dysregulation was observed in patients 2 months post COVID-19 and in long COVID patients [31, 88].
 
"Eligible participants (n = 1,996)"
Is this a meaningful cohort for this type of study?
Because it’s a combinatorial approach, rather than the traditional single-SNP GWAS approach, yes, this is a meaningful cohort. I haven’t read the paper and I’m not sure how clear-cut the findings are. I’d like to see the results replicated/validated in independent cohorts. They say they’re hoping to do this with long Covid, and, as they said in their ME paper, they are also in discussion with DecodeME.
 
Just catching up on the threads on the PrecisionLife ME paper and @Simon M's blog post - really interesting. They find 14 possible genes in the ME/CFS study and see 9 of those pop up here in long covid? Seems like a pretty big overlap on the surface, but I feel a bit nervous about interpreting their results given their unusual and black box methodology.

I'm curious about how the statistics work out with finding significant disease signatures here when the number of different things they are measuring is so high. In a GWAS like DecodeME with 1 million or so SNPs you obviously need to be extremely careful with multiple testing correction as you're doing 1 million tests, which is why such a low p value of 5x10^-8 is generally accepted as the required threshold for significance.
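For reference, the conventional genome-wide threshold of 5x10^-8 is roughly what a Bonferroni correction gives for about a million independent tests at a family-wise error rate of 0.05. A minimal sketch of that arithmetic follows; the names `ALPHA`, `N_TESTS` and `bonferroni_threshold` are illustrative only, and the one-million figure is the round number used in the post rather than anything from the paper.

```python
ALPHA = 0.05          # conventional family-wise error rate (assumption)
N_TESTS = 1_000_000   # round number of independent SNP tests used in the post


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"per-test threshold: {threshold:.0e}")
```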

Since they're using a combinatorial approach, it seems to me the number of effective features/comparisons they are making reaches stratospheric levels. Assuming they're looking for combinations of 3, 4, and 5 SNPs as disease signatures from 1 million SNPs: there are 1.67e+17 possible combinations of 3 SNPs, 4.17e+22 possible combinations of 4 SNPs, and 8.33e+27 possible combinations of 5 SNPs. This is too big a space to computationally compare every single possible disease signature, so I suppose this is why they adopt this random-walk-style methodology of connecting the dots to explore the space, which would also be a good way of finding non-linear relationships, as has been mentioned.
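The combination counts above can be checked with a few lines of Python. This is only a sketch of the counting argument, assuming 1 million SNPs and unordered combinations; `N_SNPS` and `n_signatures` are made-up names, and it says nothing about how PrecisionLife actually enumerates signatures.

```python
import math

N_SNPS = 1_000_000  # assumed number of genotyped SNPs, as in the post


def n_signatures(n_snps: int, k: int) -> int:
    """Number of unordered k-SNP combinations that can be drawn from n_snps SNPs."""
    return math.comb(n_snps, k)


counts = {k: n_signatures(N_SNPS, k) for k in (3, 4, 5)}
for k, count in counts.items():
    # Exact integers are huge, so print them in scientific notation.
    print(f"combinations of {k} SNPs: {count:.2e}")
```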

Looks like they use Benjamini-Hochberg for multiple testing correction. I don't know if this is sufficient in such an extreme circumstance, especially where their random-walk-style method would also be enriching for areas with a high 'signal' (real or noise) in the first place.
 
article
PrecisionLife identifies first detailed genetic risk factors for long Covid

PrecisionLife, a leading computational biology company driving precision medicine in complex chronic diseases, has announced the results of its long COVID study, providing the first detailed genetic insights into the condition and its commonalities with other diseases, including myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). This analysis opens the door for new precision diagnostic and therapeutic approaches to address the massive unmet medical need affecting the lives of millions of patients.

Long COVID is a debilitating chronic condition that has affected over 100 million people globally. It is characterized by diverse symptoms, including fatigue, cognitive dysfunction, and respiratory problems. Despite considerable global research, studies have so far failed to identify the detailed genetic risk factors, the mechanisms behind the disease, or any common pathophysiology with other conditions that present with similar symptoms, such as ME/CFS.

PrecisionLife used its unique combinatorial analytics approach to compare subpopulations of long COVID patients from Sano Genetics’ long COVID GOLD study. The analysis identified 73 genes that were highly associated with severe and fatigue dominant forms of the disease. Of these, 9 genes have prior associations to acute COVID-19 and 14 were differentially expressed in a transcriptomic analysis of long COVID patients.

To understand similarities with other post-viral, fatigue and other complex disorders, PrecisionLife compared the results of its long COVID analysis against known genetic associations across a range of over 170 neurological, cardiovascular, gastrointestinal, autoimmune, and metabolic diseases. This cross-disease analysis highlighted long COVID risk genes that were also implicated in a wide range of diseases and found that 9 genes associated with long COVID were also found in a recent combinatorial analysis of ME/CFS patients.


This study demonstrates the power of combinatorial analytics for reproducibly stratifying heterogeneous populations in complex diseases, offering new approaches for accurate diagnosis and the development of precision-targeted treatments for patients.

full article
https://www.bioindustry.org/news-li...iled-genetic-risk-factors-for-long-covid.html
 
They outline the methods in a bit more detail in the ME paper and I think I have a slightly better handle on how they're doing the statistics. It seems like they test each disease signature with a Fisher's exact test against the overall population (I guess the biobank). This test basically asks: given the known frequency of this disease signature in the population, are we seeing it more often than we would expect in this subsetted population (the ME cohort)? They then modify the disease signatures so as to maximise these Fisher's exact test scores.
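To make the enrichment question concrete, here is a minimal sketch of a one-sided Fisher's exact test on a 2x2 table, written with the standard library only. It is my reading of the description above, not PrecisionLife's code; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p value for enrichment of a signature in cases.

    a: cases carrying the signature      b: cases without it
    c: controls carrying the signature   d: controls without it
    """
    n_cases = a + b
    n_carriers = a + c
    total = a + b + c + d
    # Hypergeometric tail: probability of seeing a or more carriers among the
    # cases if carriers were spread at random across cases and controls.
    k_max = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, k) * comb(total - n_carriers, n_cases - k)
        for k in range(a, k_max + 1)
    )
    return tail / comb(total, n_cases)


# Hypothetical counts: 40 of 400 cases vs 50 of 1,000 controls carry the signature.
p_value = fisher_exact_greater(40, 360, 50, 950)
print(f"one-sided p value: {p_value:.3g}")
```

A small p value here only says the signature is more common in cases than chance would predict for that one table; it does not account for how many signatures were searched.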

Then they generate a kind of null set of results to compare these scores to. They take all the data and randomly label everything as ME or healthy, then they subset out their new 'fake' ME data and do the whole fisher's exact testing as above again. They repeat this step many times to create 'results' where we know there's no real signal.
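The label-shuffling step can be sketched as below. This is a simplification under stated assumptions: it keeps one signature fixed and only recounts carriers among the shuffled 'cases', whereas the method as described re-runs the whole signature search on each shuffled dataset. All names (`case_carriers`, `permutation_null`) and the toy data are hypothetical.

```python
import random


def case_carriers(carrier: list[bool], is_case: list[bool]) -> int:
    """Count individuals who carry the signature and are labelled as cases."""
    return sum(1 for has_sig, case in zip(carrier, is_case) if has_sig and case)


def permutation_null(
    carrier: list[bool], is_case: list[bool], n_perm: int = 1000, seed: int = 0
) -> list[int]:
    """Null distribution of the carrier count under randomly reassigned labels."""
    rng = random.Random(seed)
    labels = list(is_case)  # copy so the real labels are left untouched
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the number of 'cases' fixed, breaks any real link
        null.append(case_carriers(carrier, labels))
    return null


# Toy data: 40 people, 10 carriers, 10 cases, 6 of the cases are carriers.
carrier = [True] * 10 + [False] * 30
is_case = [True] * 6 + [False] * 4 + [True] * 4 + [False] * 26
observed = case_carriers(carrier, is_case)
null = permutation_null(carrier, is_case, n_perm=200, seed=1)
print(observed, max(null))
```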

They then compare properties such as the prevalence of a disease signature against this null distribution to get a p value, by asking something like: what is the chance of getting a result (disease signature prevalence) as high as this or higher in the null distribution? Then they adjust the p values with Benjamini-Hochberg.
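A rough sketch of those last two steps: an empirical p value taken from a null distribution, followed by the Benjamini-Hochberg adjustment. The add-one correction in `empirical_p` is my own assumption (a common convention so the p value is never exactly zero), and both function names are illustrative rather than anything from the paper.

```python
def empirical_p(observed: float, null: list[float]) -> float:
    """Share of null results at least as high as the observed one (add-one corrected)."""
    exceed = sum(1 for value in null if value >= observed)
    return (exceed + 1) / (len(null) + 1)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Signatures whose adjusted p value falls under the chosen false discovery rate would then be kept; the open question raised earlier in the thread is whether the set of tests being corrected for reflects the size of the space that was searched.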

In other words, they try as hard as they can to find disease signatures in the ME data, then try as hard as they can to find disease signatures in random data - and take the disease signatures that 'out-perform' those found in random data to be valid.
 
Thanks for such a helpful explanation.

I agree that the concern is they use a black box method. I think the idea with DecodeME is to take a sample, randomly split it into two cohorts, and then see if they can replicate the results from the test cohort in the replication one.

I think that would assuage at least some concerns about the unseen method.
 
Things like this also make it sound like the technique has potential:

MECFS Research Review blog said:
PrecisionLife made the first genetic analysis of Covid, which ran on just 725 patients from the UK Biobank. They found 68 genes of interest and reported that 48 have since been associated with Covid in published papers from other groups.

Would be cool to see it done on GWAS data from other diseases where much more is known about the biology, to see if it provides insights beyond what the GWAS is capable of alone and also correlates with what is known from experimental work.
 

Thanks for the analysis chiller. I noticed this in the transcript/recording of the first NIH Roadmap* - i.e. "[DecodeME] prepared to present some of the preliminary data (from the first 4,500 patients) at the webinar in [1st] November**":

"DR. WHITTEMORE: One other plug I'll make for a future webinar in this series is the Genetic Susceptibility Genomics webinar. Oved Amitay, from Solve ME/CFS Initiative, and I have had several very interesting conversations with people from Precision Life and other groups doing genetic studies that the data actually doesn't exist now yet but is being analyzed and will be presented at that November 1st webinar. And because I do believe that there is not one underlying cause of ME/CFS, but there may be different causes or different underlying pathologies that all lead to the symptomatology we see in ME/CFS. And so, as many of you know, I'm sure, there's a very large genetic study that's being supported in the UK, where they're recruiting 20,000 individuals to do genetic GWAS genomic studies. And they're going to be -- what they've shared with us is that they'll be prepared to present some of the preliminary data from the first 4,500 patients at the webinar in November. So, I think taken from that perspective as well, to really look at the underlying potential genetic variability that we're seeing will be critically important as well."

*link here - https://www.s4me.info/threads/usa-n...ent-26-october-2023.18724/page-18#post-496936
https://event.roseliassociates.com/me-cfs-research-roadmap/recordings/
https://event.roseliassociates.com/...Nervous-System_Open-Session-Webinar_final.pdf
**Genomics/Genetic Susceptibilities—November 1, 2023, 11:00AM ET - https://www.s4me.info/threads/usa-n...ent-26-october-2023.18724/page-18#post-499315
 
what they've shared with us is that they'll be prepared to present some of the preliminary data from the first 4,500 patients at the webinar in November.
Not publicly they won't. In the public webinar Chris plans to talk about the study in general and provide an overview of our published analysis of questionnaire answers. In the private session after the webinar has finished, he will talk in more detail, which will include any initial results that we might have by then.
 
Merged thread

Genetic risk factors for severe and fatigue dominant long COVID and commonalities with ME/CFS identified by combinatorial analysis


https://translational-medicine.biomedcentral.com/articles/10.1186/s12967-023-04588-4
 
This is from the same team at PrecisionLife that used the same approach in their ME/CFS paper last year (thread, my blog).

They are using small cohorts, even allowing for the power of their combinatorial analysis.

First, they defined Long Covid cases based on symptoms at 3 months (which is quite early). They then analysed two subgroups of the long covid cases:

Severe long covid (cases = 459 [72% F] and controls = 864). The authors focused on severity as they felt this group were most likely to have a prolonged illness.

Fatigue dominant long covid (cases = 477 [74% F] and controls = 909). This group was selected to allow a comparison with ME/CFS.

Details below:
Severe long COVID cohort
The Severe long COVID cohort (n = 1,323 where cases = 459 and controls = 864) was selected using the difference in scores reported pre- and post-acute COVID-19 for three long COVID symptom groups—namely, respiratory, fatigue and mental health. Severe cases were defined as those with a ‘Total Change’ score for these symptoms greater than or equal to the upper quartile of the distribution. The controls in this study were defined as samples with a ‘Total Change’ score greater than or equal to 0 but below the median of the distribution.

Fatigue dominant long COVID cohort
The Fatigue Dominant cohort (n = 1,386 where cases = 477 and controls = 909) was selected using only a subset of symptoms relating to fatigue in the scores (‘Fatigue Change') reported for pre- and post-acute COVID-19 symptoms (see Additional file 5: Table S1). The controls in this study were defined as samples with a ‘Fatigue Change’ score greater than or equal to 0 but below the median of the distribution.
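As an illustration of the Severe cohort definition quoted above, the sketch below splits participants into cases (score at or above the upper quartile) and controls (score from 0 up to, but not including, the median). It is only a reading of the text: the function name, the participant IDs and the scores are invented, and the quartile method (Python's default 'exclusive' interpolation) is an assumption, since the paper excerpt does not say how the quartile was computed.

```python
import statistics


def split_cohort(total_change: dict[str, float]) -> tuple[set[str], set[str]]:
    """Return (cases, controls) from per-participant 'Total Change' scores."""
    scores = list(total_change.values())
    median = statistics.median(scores)
    upper_quartile = statistics.quantiles(scores, n=4)[2]  # third cut point = Q3
    cases = {pid for pid, s in total_change.items() if s >= upper_quartile}
    controls = {pid for pid, s in total_change.items() if 0 <= s < median}
    return cases, controls


# Invented scores for eight hypothetical participants.
scores = {"p1": 0, "p2": 1, "p3": 2, "p4": 3, "p5": 4, "p6": 5, "p7": 6, "p8": -1}
cases, controls = split_cohort(scores)
print(sorted(cases), sorted(controls))
```

Participants between the median and the upper quartile, and anyone with a negative change score, end up in neither group, which is why the case and control counts add up to fewer than the 1,996 eligible participants.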

The characteristics of the two cohorts are shown in Fig. 2 and described in Table 1, Fig. 1, Fig. 2 and Additional file 5: Figure S4.
Controls all had long covid (mostly with a positive test) but showed minimal change in severity (severe cohort) or in fatigue (fatigue dominant cohort), as shown below. This strikes me as a good way to select controls.

PL long covid.jpg
 
If I understand correctly, they identified 199 SNPs in the ME/CFS study, of which 24 were also associated with long COVID in the Severe cohort and 27 in the Fatigue Dominant cohort (and 12 in both cohorts).
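If the 12 shared SNPs are counted within both the 24 and the 27 (an assumption on my part), the number of distinct SNPs comes out at 39, which would match the "39 SNPs" figure in the abstract. A tiny sketch of that inclusion-exclusion count, with an illustrative function name:

```python
def union_size(n_a: int, n_b: int, n_both: int) -> int:
    """Size of the union of two sets, given each set's size and their overlap."""
    return n_a + n_b - n_both


# 24 ME/CFS SNPs seen in the Severe cohort, 27 in the Fatigue Dominant cohort, 12 in both.
distinct_snps = union_size(24, 27, 12)
print(distinct_snps)  # 39
```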
 
Merged thread
Article: The first major set of genetic associations found in long COVID

PrecisionLife’s Dr Sayoni Das, a computational biologist who leads the research and development of bioinformatics pipelines that generate biological insights from PrecisionLife’s core technology and support drug discovery programmes, details a new study. Using combinatorial analysis, genetic variants associated with long COVID have been identified and, furthermore, it has been found that TLR4 antagonists may be potential repurposing candidates for long COVID treatment.


Why has it been challenging to identify genetic risk factors for long COVID?
There is an extensive array of symptoms associated with long COVID, with the most common being fatigue and post-exertional malaise, cognitive dysfunction, mood disturbances and respiratory problems. This is likely indicative of the heterogeneous nature of the disorder, and it is this complexity and diversity of clinical presentation and effects across multiple organ systems, that has made efforts to identify genetic risk factors using traditional genomic analysis approaches extremely challenging.

https://www.drugtargetreview.com/article/113093/the-first-major-set-of-genetic-associations-found-in-long-covid/#:~:text=The genes unique to Severe,JNK signalling and cellular respiration.
 