Preprint Long COVID Risk Loci Implicated from Genome-Wide Association Studies of COVID-19 Susceptibility and Hospitalization, 2025, Cheng

forestglip

Preprint posted on MedRxiv, now posted with new title on SSRN, see post #4

Integrative Genome-Wide Association Studies of COVID-19 Susceptibility and Hospitalization Reveal Risk Loci for Long COVID

Zhongshan Cheng

Abstract
Long COVID presents a significant public health challenge, characterized by over 200 reported symptoms spanning multiple organ systems. Despite its complexity, genome-wide association studies (GWAS) offer a pathway to uncovering genetic risk factors, though progress has been hindered by the disorder's symptom heterogeneity and the limited power of available datasets. Recent long COVID GWASs have highlighted key genetic associations, such as variants close to FOXP4, ABO, and HLA-DQA1, while underscoring the limitations posed by small sample sizes, restricted diversity, and misclassification biases.

Here, an integrative analysis using COVID-19 GWAS data from the Host Genetics Initiative (HGI, release 7) was performed by leveraging proxy phenotypes to overcome the challenges of limited sample size or heterogeneity of long COVID cohorts, resulting in 62 independent single nucleotide polymorphisms (SNPs) prioritized as potentially associated with long COVID. These SNPs were categorized into three groups:

(1) severe COVID-19-specific SNPs, such as SNPs mapped to two well-known loci involved in SARS-CoV2 entry (ACE2 [rs190509934] and TMPRSS2 [rs12329760]), as well as variants of DPP9 (rs7251000), FOXP4 (rs12660421) and HLA-DQA1 (rs17219281), exhibiting associations with severe COVID-19 but displaying weaker signals in non-hospitalized COVID-19 cases;

(2) SNPs associated with both severe and mild COVID-19, including the SNP close to ABO (rs505922), representing a catalog of SNPs that predispose to both acute COVID-19 and chronic long COVID;

and (3) non-hospitalization-specific SNPs, such as variants in KCTD16 (rs62401842) and WASF3 (rs56143829), highlighting genetic contributors specific to mild COVID-19 cases that might also contribute to long COVID.

Further transcriptome-wide association studies (TWASs) across 48 GTEx tissues, leveraging GWAS data on COVID-19 hospitalization, susceptibility, and long COVID from the HGI consortium, revealed distinct tissue-specific patterns of association. Compared to the acute COVID-19 phenotypes, long COVID exhibited weaker association signals across heart, brain, and muscle-related tissues, as determined by correlations between gene expression of adjacent genes of candidate SNPs (43 out of 62 SNPs) and different COVID phenotypes. Notable TWAS hits included DPP9 (rs7251000), CCR1 (rs17078348), and THBS3 (rs41264915). Phenome-wide TWASs also identified additional significant associations with long COVID related phenotypes, such as HLA-DQA1 (rs17219281), HLA-A (rs9260038), and HLA-C (rs1634761) associated with immune-related diseases, GSDMB (rs9916158) associated with asthma, FOXP4 (rs12660421) linked with sleep duration, highlighting their potential roles in long COVID pathophysiology.

Therefore, the current integrative approach offers a scalable framework for long COVID research by maximizing the statistical power of existing large-scale COVID-19 GWASs and provides novel insights into the genetic underpinnings of long COVID.
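
The three-way grouping described in the abstract can be pictured with a short Python sketch. The abstract does not state the exact criteria, so the single P-value threshold, the function name, and the example values below are all assumptions for illustration rather than the author's method.

```python
def classify_snp(p_severe, p_mild, threshold=1e-5):
    """Assign a SNP to one of the three groups named in the abstract.

    p_severe:  association P-value in hospitalized (severe) COVID-19
    p_mild:    association P-value in non-hospitalized (mild) COVID-19
    threshold: hypothetical cutoff; the preprint's actual criteria may differ
    """
    severe = p_severe < threshold
    mild = p_mild < threshold
    if severe and mild:
        return "severe and mild"
    if severe:
        return "severe-specific"
    if mild:
        return "non-hospitalization-specific"
    return "not prioritized"


# Made-up P-values keyed by placeholder SNP names (not real results)
examples = {
    "snp_a": (2e-12, 0.30),
    "snp_b": (4e-9, 7e-8),
    "snp_c": (0.45, 3e-6),
}
groups = {name: classify_snp(ps, pm) for name, (ps, pm) in examples.items()}
for name, group in groups.items():
    print(name, "->", group)
```

A single cutoff is the simplest possible rule; a real analysis would more likely also compare effect sizes between the two phenotypes.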

Link | PDF (Preprint: MedRxiv) [Open Access]
 
To identify potential long COVID SNPs specific to non-hospitalized COVID-19 cases […] identified 20 independent SNPs potentially associated with long COVID that specifically emerged from mild (non-hospitalized) COVID-19 cases

All 20 SNPs showed suggestive genome-wide significant differences in effect sizes between HGI-B1 and HGI-B2 (P<6E-5). Although none met the genome-wide significance threshold (P<5E-8), notable findings were observed.

rs62401842 (KCTD16) exhibited the strongest association in HGI-B1 (P=8.53E-7), while showing no association in HGI-B2 (P=0.38) and a significant difference between HGI-B1 and HGI-B2 (P=3.18E-7). Gene KCTD16, encoding potassium channel tetramerization domain-containing 16 [19], is highly expressed in brain tissues according to the GTEx database.

In addition, eight SNPs[…] were mapped to genes with brain-related functions, including rs9799354 (NLGN1), rs112842080 (CPLX2), rs61858037 (NRG3), rs61939166 (KIF21A), rs367777 (NAV3), rs56143829 (WASF3), rs11454577 (AKAP6), and rs6049828 (SYNDIG1). Among these, WASF3 has been implicated in mitochondrial dysfunction and may mediate exercise intolerance in myalgic encephalomyelitis/chronic fatigue syndrome.

Additionally, among the 20 candidate SNPs identified, only three (rs62401842 [KCTD16], rs9799354 [NLGN1], and rs112842080 [CPLX2]) exhibit male-biased associations with COVID-19 hospitalization at nominal significance levels. Furthermore, six candidate SNPs, including rs62401842 (KCTD16), rs9799354 (NLGN1), rs112842080 (CPLX2), rs56143829 (WASF3), rs4737438 (PENK), and rs6049828 (SYNDIG1), demonstrate sex-biased association patterns within a genomic window ranging from 500kb to 1000kb. However, the limited sample sizes in sex-stratified COVID-19 GWAS preclude any candidate SNPs from achieving statistical significance after correction for multiple testing in analyses of sex-biased associations with COVID-19 hospitalization.

In summary, while no genome-wide significant SNPs specific to non-hospitalized COVID-19 cases were identified, these 20 SNPs provide suggestive evidence of association with long COVID in mild COVID-19.
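
As a rough illustration of the comparison described in the excerpt above, the following minimal Python sketch tests whether a SNP's effect size differs between two sets of GWAS summary statistics. It assumes a simple two-sided z-test on independent estimates, which may not be the method the preprint used. The function name and the input numbers are made up for illustration; only the two thresholds (P<5E-8 and P<6E-5) come from the quoted text.

```python
import math

GENOME_WIDE = 5e-8   # genome-wide significance threshold quoted in the excerpt
SUGGESTIVE = 6e-5    # threshold quoted for the HGI-B1 vs HGI-B2 difference


def effect_difference_test(beta1, se1, beta2, se2):
    """Two-sided z-test for a difference between two effect sizes.

    Assumption: the two estimates are independent. If the two GWASs
    share samples, this test will be miscalibrated.
    """
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical log-odds-ratio estimates and standard errors (not from the paper)
z, p = effect_difference_test(beta1=0.20, se1=0.04, beta2=-0.02, se2=0.02)

if p < GENOME_WIDE:
    label = "genome-wide significant difference"
elif p < SUGGESTIVE:
    label = "suggestive difference"
else:
    label = "no evidence of a difference"

print(f"z = {z:.2f}, p = {p:.2e}: {label}")
```

With these made-up inputs the difference clears the suggestive threshold but not the genome-wide one, which is the same pattern the excerpt reports for the 20 candidate SNPs.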
 
Merged thread

Long COVID Risk Loci Implicated from Genome-Wide Association Studies of COVID-19 Susceptibility and Hospitalization


Abstract:
Long COVID presents a significant public health challenge, characterized by over 200 reported symptoms spanning multiple organ systems, and genome-wide association studies (GWAS) have been hindered by the disorder's symptom heterogeneity and the limited power of available datasets. To overcome the challenges of limited sample size or heterogeneity of long COVID cohorts, an integrative analysis using COVID-19 GWAS data from the Host Genetics Initiative (HGI, release 7) is performed by leveraging proxy phenotypes, resulting in 62 independent single nucleotide polymorphisms (SNPs) prioritized as potentially associated with long COVID.

These SNPs are categorized into three groups: (1) severe COVID-19-specific SNPs, such as SNPs mapped to two well-known loci involved in SARS-CoV2 entry (ACE2 [rs190509934] and TMPRSS2 [rs12329760]), as well as variants of DPP9 (rs7251000), FOXP4 (rs12660421) and HLA-DQA1 (rs17219281), exhibiting associations with severe COVID-19 but displaying weaker signals in non-hospitalized COVID-19 cases; (2) SNPs associated with both severe and mild COVID-19, including the SNP close to ABO (rs505922), representing a catalog of SNPs that predispose to both acute COVID-19 and long COVID; and (3) non-hospitalization-specific SNPs, such as variants in KCTD16 (rs62401842) and WASF3 (rs56143829), highlighting genetic contributors specific to mild COVID-19 cases that might also contribute to long COVID.

Further transcriptome-wide association studies (TWASs) across 48 GTEx tissues reveal that long COVID exhibits weaker association signals across heart, brain, and muscle-related tissues, as determined by correlations between gene expression of adjacent genes of candidate SNPs (43 out of 62 SNPs) and different COVID phenotypes, with notable TWAS hits including DPP9 (rs7251000), CCR1 (rs17078348), and THBS3 (rs41264915). Phenome-wide TWASs also link HLA-DQA1 (rs17219281), HLA-A (rs9260038), and HLA-C (rs1634761), GSDMB (rs9916158) and FOXP4 (rs12660421) with long COVID. Taken together, the current integrative approach offers a scalable framework for long COVID research by maximizing the statistical power of existing large-scale COVID-19 GWASs and provides novel insights into the genetic underpinnings of long COVID.

Link (SSRN preprint, March 2025)
 
I was trying to work out what this paper is about; there's some sort of wall so I have only seen the abstract.

It looks like one author has undertaken an analysis of the data from the Host Genetics Initiative.

The COVID-19 host genetics initiative brings together the human genetics community to generate, share, and analyze data to learn the genetic determinants of COVID-19 susceptibility, severity, and outcomes. Such discoveries could help to generate hypotheses for drug repurposing, identify individuals at unusually high or low risk, and contribute to global knowledge of the biology of SARS-CoV-2 infection and disease.

Nothing is written in stone other than we must all act together and with no personal gain or ownership of results – just rapid and immediate dissemination of the maximum possible data and information that can be responsibly released.

Aims
The COVID-19 host genetics initiative is a bottom-up collaborative effort that has three main goals:
  1. Provide an environment to foster the sharing of resources to facilitate COVID-19 host genetics research (e.g. protocols, questionnaires).
  2. Organize analytical activities across studies to identify genetic determinants of COVID-19 susceptibility and severity.
  3. Provide a platform to share the results from such activities, as well as the individual-level data where possible, to benefit the broader scientific community.

The Host Genetics Initiative is an association of organisations undertaking genetics work related to Covid-19, with a commitment to sharing data. Organisations like the UK Biobank are partners; there are a lot of partners from a lot of countries. I don't see DecodeME listed as a partner. @Andy, has the DecodeME team considered being involved, at least some time in the future when data can be made available?
 
About Zhong-Shan Cheng:
https://www.stjude.org/directory/c/zhongshan-cheng.html
Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis
Dr. Zhongshan Cheng specializes in human genetics and cancer genomics. He created a bioinformatics pipeline, Post-GWAS Explorer for Functional Indels and SNPs (PExFInS), to integrate genetic variants with expression quantitative trait loci (eQTLs) data and human genome functional annotation data from public databases.

He's been working on addiction biology. I'm not sure what has prompted him to look at Long Covid data, or how much he knows about Long Covid and ME/CFS research, but, from the little I can see, it's good to have him contributing.

Integrative Genome-Wide Association Studies of COVID-19 Susceptibility and Hospitalization Reveal Risk Loci for Long COVID
Is that preprint paper an earlier version of this one?
 
I don't see DecodeME listed as a partner. @Andy, has the DecodeME team considered being involved, at least some time in the future when data can be made available?
First I've heard of it, so no, we haven't considered being involved. However, we are an ME/CFS study, with our cohort being almost exclusively pre-Covid, so we might not fit with them.
 
Potassium Channel Tetramerization Domain-Containing Protein 16
"Predicted to enable G protein-coupled neurotransmitter receptor activity involved in regulation of postsynaptic membrane potential and ... presynaptic membrane potential."

Not sure how relevant this is, but, in case:
This gene is linked to epilepsy and Pelizaeus-Merzbacher Disease (PMD).
PMD is part of a group of leukodystrophies affecting the nervous system's white matter and myelin insulation.
Female carriers may also exhibit PMD features. The null syndrome is a mild form of PMD with demyelinating peripheral neuropathy.
 
Latest preprint 14 June 2025

Integrative Genome-Wide Association Studies of COVID-19 Susceptibility and Hospitalization Reveal Risk Loci for Long COVID

Zhongshan Cheng

[Line breaks added]


Abstract
Long COVID presents a significant public health challenge, characterized by over 200 reported symptoms spanning multiple organ systems, and genome-wide association studies (GWAS) have been hindered by the disorder's symptom heterogeneity and the limited power of available datasets.

To overcome the challenges of limited sample size or heterogeneity of long COVID cohorts, a proxy-based, hypothesis-generating strategy is conducted to prioritize candidate risk loci that may contribute to long COVID by analyzing GWAS summary statistics for COVID-19 susceptibility and hospitalization, as well as long COVID from the COVID-19 Host Genetics Initiative (HGI, Release 7), resulting in 62 candidate loci represented by independent single nucleotide polymorphisms (SNPs). These SNPs are categorized into three groups based on their association with acute-phase COVID-19:

(1) severe COVID-19-specific SNPs, such as SNPs mapped to two well-known loci involved in SARS-CoV2 entry (ACE2 [rs190509934] and TMPRSS2 [rs12329760]), as well as variants of DPP9 (rs7251000), FOXP4 (rs12660421) and HLA-DQA1 (rs17219281), exhibiting associations with severe COVID-19 but displaying weaker signals in non-hospitalized COVID-19 cases;

(2) SNPs associated with both severe and mild COVID-19, including the SNP close to ABO (rs505922);

and (3) non-hospitalization-specific SNPs, such as variants in KCTD16 (rs62401842) and WASF3 (rs56143829), highlighting genetic contributors specific to mild COVID-19 cases.

These candidate SNPs are investigated with recently published long COVID GWAS data from HGI, revealing most of these candidate SNPs display much weaker association with long COVID.

Further transcriptome-wide association studies (TWASs) for GWASs of COVID-19 hospitalization/susceptibility/long COVID across 48 GTEx tissues demonstrate that genes adjacent to these candidate SNPs (43 out of 62 SNPs) exhibit relatively weaker association signals in long COVID across heart, brain, and muscle-related tissues, particularly for notable TWAS hits, DPP9 (rs7251000), CCR1 (rs17078348), and THBS3 (rs41264915) that display much weaker association in long COVID compared to acute-phase COVID-19.

Phenome-wide TWASs also link HLA-DQA1 (rs17219281), HLA-A (rs9260038), and HLA-C (rs1634761), GSDMB (rs9916158) and FOXP4 (rs12660421) with other phenotypes closely relevant to long COVID.

These results provide new insights into long COVID by leveraging acute-phase COVID-19.

Link | PDF (Preprint: MedRxiv) [Open Access]
 