Multi-ancestry GWAS of [LC] identifies immune-related loci and etiological links to [CFS], fibromyalgia and depression, 2024, Chaudhary et al

Nightsong

Senior Member (Voting Rights)
Multi-ancestry GWAS of Long COVID identifies immune-related loci and etiological links to chronic fatigue syndrome, fibromyalgia and depression

Abstract
The etiology of Long COVID is poorly understood despite its estimated global burden of 65 million cases. There exists a paucity of genetic studies that can shed light on potential mechanisms leading to Long COVID. Using consented and genotyped data from 23andMe adult research participants, we conducted the largest multi-ancestry meta-analysis of genome-wide association studies of Long COVID across European (42,899 cases, 94,721 controls), Latinx (8,631 cases, 20,351 controls), and African-American (2,234 cases, 5,596 controls) genetic ancestry groups. GWAS of Long COVID identified three genome-wide significant loci (HLA-DQA1 and HLA-DQB, ABO, BPTF:KPAN2:C17orf58). Functional analysis of these genes points to underlying immune and thrombo-inflammatory mechanisms. We present evidence of shared genetic architecture (genetic correlation p-value < 0.001) of Long COVID with thirteen phenotypes of similar symptomatology or pathophysiology. We identified potential causal roles from liability to chronic fatigue (Mendelian randomization OR=1.59, 95% CI[1.51,1.66]), fibromyalgia (OR=1.54, 95% CI[1.49,1.60]), and depression (OR=1.53, 95% CI[1.46,1.61]) with Long COVID, which replicated in the COVID-19 Host Genetics Initiative data, and which are unlikely to originate from collider bias. These findings can help identify populations vulnerable to Long COVID and inform future therapeutic approaches.

Link | PDF (23andMe research preprint, October 2024, open access)
 
Genome-wide significant loci: The top GWAS significant hit (rs9273363; A/C with C being the effect allele, OR = 1.06, 95% CI: 1.04, 1.08, p = 3.79 × 10^-11) was located in chromosome 6 in the intergenic region spanning HLA-DQA1 and HLA-DQB1 (Supplemental Figure 1). The effect sizes for rs9273363 were similar across all three ancestries (p-value for heterogeneity across ancestries = 0.75).
Further analysis of specific HLA alleles and Long COVID showed that HLA-DRB1*11:04 (OR = 1.18, 95% CI: 1.11, 1.24, p = 7.0 × 10^-9), HLA-C*07:01 (OR = 0.94, 95% CI: 0.92, 0.96, p = 1.14 × 10^-8), HLA-B*08:01 (OR = 0.93, 95% CI: 0.91, 0.95, p = 1.5 × 10^-8), and HLA-DQA1*03:01 (OR = 0.95, 95% CI: 0.93, 0.97, p = 4.1 × 10^-8) were significantly associated with Long COVID. The effect sizes were similar across three ancestries (Supplementary Figure 2).
The second most significant variant was observed at chromosome 9 in the ABO gene (rs644234, G/T with T being the effect allele, OR = 0.95, 95% CI: 0.94, 0.97, p = 3.66 × 10^-9, p-value for heterogeneity across ancestries = 0.56) (Supplementary Figure 3). Specifically, rs644234 is in linkage disequilibrium (LD) with a frame-shift variant (rs8176719) within the ABO gene (European r^2 = 0.97, African-American r^2 = 0.44, Latinx r^2 = 0.90). rs8176719 is one of the main variants determining the ABO blood group and has been previously linked to COVID-19 susceptibility and severity (Bugert et al. 2012; Severe Covid-19 GWAS Group et al. 2020; Shelton et al. 2021).
Another association was observed with rs2080090 (A/T with T being the effect allele; OR = 0.95, 95% CI: 0.93, 0.97, p = 1.36 × 10^-8; p-value for heterogeneity across ancestries = 0.69) located in vicinity of the BPTF gene on chromosome 17 (Supplementary Figure 4). There are at least 3 plausible causal genes in the locus; BPTF, KPAN2, and C17orf58. We observed eQTL signals in lymphoblast cells for a variant in LD (rs12601921, r^2 = 0.98, p = 4.31 × 10^-89) with rs2080090 for the BPTF gene (Supplementary Table 2a). rs2080090 was also in LD with a coding variant, rs7502307 (r^2 = 0.82), located at C17orf58 (Supplementary Table 2b).
In European ancestry participants, the top associated SNP in this locus (rs78794747, A/T with T being the effect allele) was mapped to the intergenic region of C17orf58 and KPAN2. rs78794747 is in high LD with multiple variants associated with gene expression in CD14 monocytes, frontal cortex, and islets of Langerhans cells (Supplementary Table 2c).
The C17orf58 gene has been associated with posterior myocardial infarction (Norland et al. 2019). KPAN2 encodes nuclear transport factor importin α1, which is associated with viral suppression (Miyamoto et al. 2021). Previous studies have shown that an accessory protein encoded by the SARS-CoV2 genome, ORF6, binds to importin α1, enhancing viral propagation by inhibiting interferon type 1 signaling (International HIV Controllers Study et al. 2010; Yuen et al. 2020; Miyamoto et al. 2021). Ancestry-specific GWAS hits are described further in Supplemental Results and Supplementary Table 3a-3c
We investigated the potential causal role of three phenotypes (chronic fatigue, fibromyalgia, and depression) with Long COVID through Mendelian randomization (MR). We did not consider chronic pain due to its nonspecific definition in the 23andMe database. The genetic instrument for chronic fatigue included 186 SNPs (mean F-statistic per SNP = 43.17), for fibromyalgia included 349 SNPs (mean F-statistic per SNP=45.04), and for depression included 696 SNPs (mean F-statistic per SNP=53.8).
We found strong evidence of a potential causal effect to each of these three conditions on Long COVID (chronic fatigue: OR=1.59 (95%CI: 1.51, 1.66), fibromyalgia: OR=1.54 (95%CI: 1.49, 1.60), and depression: OR=1.53 (95%CI: 1.46, 1.61); estimates from IVW-MR). These effects persisted when employing robust MR approaches including weighted median and MR Egger (Figure 4, Supplementary Figure 9, Supplementary Table 9a). Steiger filtered and outlier filtered estimates (Supplementary Table 9a) were similar to the respective IVW estimates for chronic fatigue and fibromyalgia. However, for depression, the Steiger filtered estimates were closer to the null than IVW estimates.
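For anyone unfamiliar with the IVW-MR estimate mentioned above, here is a minimal sketch of how an inverse-variance weighted estimate is formed from per-SNP summary statistics. The SNP effect sizes below are made up purely for illustration (they are not from the paper), the function name `ivw_mr` is hypothetical, and this is the simple fixed-effect form that ignores uncertainty in the SNP-exposure effects. The authors' actual pipeline is not described in the excerpt and may well differ.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate (illustrative sketch).

    Each SNP gives a Wald ratio beta_outcome / beta_exposure; the ratios are
    averaged with weights beta_exposure^2 / se_outcome^2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    return estimate, std_error


# Hypothetical summary statistics for four SNPs (log-odds scale), NOT real data.
bx = [0.10, 0.08, 0.12, 0.05]      # SNP effect on the exposure (e.g. chronic fatigue)
by = [0.048, 0.035, 0.061, 0.020]  # SNP effect on the outcome (e.g. Long COVID)
se_y = [0.010, 0.012, 0.015, 0.010]

est, se = ivw_mr(bx, by, se_y)
odds_ratio = math.exp(est)
ci_low, ci_high = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
print(f"IVW log-OR = {est:.3f} (SE {se:.3f})")
print(f"OR = {odds_ratio:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

The weighted median and MR-Egger approaches named in the excerpt relax different assumptions of this estimator and are not shown here.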
 
Those odds ratios don't look very flash. e.g. the top GWAS hit had an odds ratio of 1.06.
I'm very ready to believe that there is a lot of noise in the grouping of people into 'Long covid' and 'not Long Covid', but, even so, given the likely large number of 'significant loci', is an odds ratio of 1.06 impressive?

What sort of odds ratios are found for the same sort of study in other diseases?
 
Here's a GWAS for MS that found a genome-wide association with an odds ratio of 1.25. The gene that association was in is the target of an MS drug, so presumably there was a real biological association.
The first GWAS in MS subjects was published in 2007 and was performed by the International Multiple Sclerosis Genetics Consortium (IMSGC), using 931 family trios and a replication cohort of 2322 MS subjects and 2987 controls (International Multiple Sclerosis Genetics Consortium et al. 2007). This first GWAS identified the first-ever genome-wide association in MS, outside the MHC region, rs12722489, which lies in the first intron of IL2RA (odds ratio [OR] = 1.25, p-value = 2.96 × 10^-8). The IL2RA gene encodes the α chain of the interleukin-2 receptor, which plays a role in several immune-related pathways (Liao et al. 2011). Interestingly, IL2RA is the target of the MS-approved drug daclizumab (Bielekova et al. 2004).

What exactly does the odds ratio mean in this context? A person with MS in the sampled population was 1.25 times more likely to have a particular allele or SNP in a gene than a person without MS in the sampled population?
 
A previous Long Covid GWAS (with only 6,450 cases) highlighted FOXP4, with one SNP having an OR of 1.63 [1.40-1.89], but this wasn't replicated in this 23andMe GWAS (p=0.57). They say that the previous GWAS found the same effect sizes for the genes they identified but that these were not picked up as significant because of the smaller sample size.
 
In the second part of the paper they looked at whether several conditions including chronic fatigue increased the risk of Long Covid.

Unfortunately, they used the term 'chronic fatigue/myalgic encephalitis' which does not give a lot of confidence that they defined ME/CFS accurately. Table 1 shows that 1.5% of the controls had it - did anyone find how they defined it?
 
Yes it’s an audio summary, sorry if that wasn’t clear. I can easily post the text too if people prefer? (quoted below)

So I don’t completely derail discussions of the papers maybe people can post feedback to this thread
https://www.s4me.info/threads/enhan...h-technology-feedback-and-ideas-wanted.40207/

This scientific paper investigates the genetic basis of Long COVID and its relationship to other chronic conditions.

The study used data from the 23andMe research cohort, which includes genetic and health information from over 2 million adult participants. The researchers focused on three ancestry groups: European, African American, and Latinx.

Key findings:

1. Genes linked to Long COVID: The study identified three genetic locations, called loci, associated with Long COVID. These loci include genes related to the immune system (HLA-DQA1–HLA-DQB), blood type (ABO), and a novel region containing genes BPTF, KPAN2, and C17orf58.

2. Shared genetic architecture: The study found strong genetic correlations between Long COVID and 13 other health conditions, most notably chronic fatigue syndrome, fibromyalgia, and depression. This means individuals genetically predisposed to these conditions may also be at higher risk for Long COVID.

3. Causal relationship: The researchers used a statistical technique called Mendelian Randomization (MR) to explore potential causal relationships. The results suggest that genetic liability to chronic fatigue syndrome, fibromyalgia, and depression increases the risk of developing Long COVID. This causal relationship was further supported by the increased risk of COVID-19 hospitalization observed for individuals with a genetic predisposition to these chronic conditions.

Methods used:

GWAS: The study used Genome-Wide Association Studies, or GWAS, to scan the entire genome for genetic variants linked to Long COVID.
Meta-analysis: Results from the three ancestry groups were combined in a meta-analysis to increase statistical power and identify consistent findings across ancestries.
Mendelian Randomization: This method was used to infer potential causal relationships between genetic liability to chronic conditions and the risk of developing Long COVID.

Limitations:

Self-reported data: The study relied on participants' self-reported diagnosis of Long COVID and other health conditions, which could introduce misclassification bias.
Selection bias: Participants in the 23andMe cohort are self-selected, which may not be representative of the broader population.
Collider bias: This type of bias can occur in studies of COVID-19 due to the shared risk factor of exposure to the virus. However, the researchers addressed this concern and found evidence to alleviate selection bias concerns.

In conclusion, this study identified several genetic factors associated with Long COVID and provided compelling evidence for a causal relationship between genetic liability to chronic conditions like chronic fatigue syndrome, fibromyalgia, and depression and an increased risk of developing Long COVID. These findings advance our understanding of the biological mechanisms underlying Long COVID and suggest potential avenues for risk stratification and treatment strategies.
 
Yes, this is likely an extremely broad self-defined LC definition.

It may be worthy of note that HLA-DQA1 was once potentially implicated in CFS; there was a small (n=49) genotyping study - defined by the old CDC/Fukuda criteria - from 2005 (J Clin Pathol 2005;58:860–863). The result did not survive full correction for multiple comparisons but this is worth mentioning:
The overall frequency distribution of the HLA-DQA1 and HLA-DQB1 alleles in the CFS group was not significantly different from the controls (table 1). However, examination of adjusted residuals indicated that there were differences in the frequency of the HLA-DQA1*01 and HLA-DQB1*06 alleles between the patients and controls. Analysis by 2 x 2 contingency tables revealed an increased frequency of HLA-DQA1*01 in patients with CFS (51.0% v 35%; OR, 1.93; 95% CI, 1.2 to 3.3; p = 0.008). HLA-DQB1*06 was also increased in the patients with CFS (30.2% v 20.0%; OR, 1.73; 95% CI, 0.96 to 3.1), although this was on the borderline of significance (p = 0.052).
The HLA-DRB1, HLA-DQA1, and HLA-DQB1 associations need to be treated with caution because they have not been corrected for multiple comparisons. If corrected for all 22 possible comparisons then significance was lost. However, a less conservative approach of correcting for multiple comparisons at each multiallelic locus (at HLA-DRB1, HLA-DQA1, and HLA-DQB1 separately), suggests a possible association of HLA-DQA1*01 (p = 0.04) with CFS.
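To make the multiple-comparisons point concrete, here is a small sketch of a Bonferroni correction applied to the reported HLA-DQA1*01 p-value. The 22 comparisons come from the quoted text. The figure of 5 comparisons at the HLA-DQA1 locus is only my assumption, chosen because 0.008 × 5 matches the quoted p = 0.04; the excerpt does not state the per-locus count or confirm that Bonferroni was the method used.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


p_raw = 0.008        # HLA-DQA1*01, from the quoted 2x2 analysis
n_all_tests = 22     # "all 22 possible comparisons" in the quoted text
n_locus_tests = 5    # ASSUMPTION: hypothetical per-locus count, not stated in the excerpt

p_all = bonferroni(p_raw, n_all_tests)
p_locus = bonferroni(p_raw, n_locus_tests)

print(f"Corrected for all comparisons: p = {p_all:.3f}")   # 0.176, no longer < 0.05
print(f"Corrected within the locus:    p = {p_locus:.2f}")  # 0.04 under the assumed count
```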
 
I can appreciate the good that can come from GWAS research. Validation leaps to the front. Possible therapeutics as well.

I worry, I think, even more about the potential for harm.

What if we find people that get fibro or ME/CFS or LC or late stage Lyme etc, all share common gene traits that people who rarely if ever get those diseases, do not?

Would that breed resentment and prejudice?

Would we be blamed for higher health care costs? Higher insurance premiums? Higher taxes? Could we be DENIED health care or be forced to pay much higher premium/taxes?

Would health politics assume a new vitriol? Normal people vs those - like us - with inferior or corrupted genes? Would life for us devolve into a Darwinian dystopia where eugenics becomes typical dinner table discussion?

I suppose I read one too many Asimov novels back in the 60's.
 
No, not at all. Here ACEs (Adverse Childhood Experiences) are a thing, even though the research base is a bit dodgy.
It was being used as a predictive tool in cases - an application so far removed from its initial questionnaire basis - truly disturbing.
A little humanity would surely let you appreciate that a childhood full of multiple crap experiences can impact some, and perhaps there are more supportive routes that could be adopted, but then humanity can be sadly lacking, particularly for already marginalized groups.
 
Eugenics never went anywhere. So yeah absolutely, there would be a new different type of prejudice to deal with.
 
From 23andMe blog:

Genetic Associations with Long COVID

"In the largest genetic study of its kind to date, 23andMe scientists have identified variants associated with an increased risk of developing Long COVID and also established a genetic link between the potentially debilitating condition and other chronic conditions such as depression, chronic fatigue syndrome, and fibromyalgia.

Published in the preprint server for health scientists known as MedRxiv, the study suggests that genetic differences in how the body’s immune system recognizes and responds to the virus likely influence the chances of developing Long COVID. Among the strongest genetic associations was in the region of the HLA-DRB1 gene, which plays a critical role in the body’s immune response."

---

"In addition to the HLA finding, 23andMe scientists also focused on the role of the ABO gene, which determines a person’s blood type. The role of the ABO gene has been previously identified by 23andMe scientists and others for its role in the severity of COVID-19 in some individuals. In this study, the scientists suggest that blood type or ABO variation may also influence the likelihood of developing Long COVID.

The ABO blood group also has a role in immune response as well as influences the factors that affect the coagulation of blood. This, in turn, may explain why it is associated with both acute COVID-19 and Long COVID cases. The blood types determined by the ABO gene are also linked to blood clotting and inflammation, both hallmarks of Long COVID.

The scientists working on this study also identified several new associations in or near the genes BPTF, C17orf58, and KPNA2. Researchers believe these genes’ role in susceptibility to Long COVID is also related to the defense system against viruses."

---

"This study is part of a series of studies being conducted by 23andMe scientists since the beginning of the pandemic in 2020, and made possible by 23andMe customers who consented to participate in research."

Link
 
Would that breed resentment and prejudice?

No. There are loads of diseases like that already and nobody takes any notice of the genes.

The one gene allele that might have got near problematic is HLA-B27, which greatly increases the risk of ankylosing spondylitis. However, it was subsequently found that it makes you almost immune from AIDS or at least staves it off for years. Moreover, the real problem arose for people for whom the diagnosis of ank spond was questionable and a B27 test was done to 'confirm'. The insurance weighting was on the ank spond. We learnt to guide patients through the issues raised. In the end we hardly ever did B27 tests because they didn't alter care.
 
What exactly does the odds ratio mean in this context? A person with MS in the sampled population was 1.25 times more likely to have a particular allele or SNP in a gene than a person without MS in the sampled population?

I read a little about this, and it's not as straightforward as I had thought.

There are two similar stats: relative risk (RR) and odds ratio (OR).
  • RR deals with change of risk - risk being (# people with disease)/(total people).
  • OR deals with change in odds - odds being (# with disease)/(# without disease).
As an example, assume a study looked at a random cohort of 200 people.
  • 20/100 people without a SNP have MS (20% risk)
  • 25/100 with the SNP have MS (25% risk)
Relative risk would be calculated using the ratio of risk of having MS in each group: 0.25/0.2. RR would be 1.25. The person is 25% more likely to have MS if they have the SNP.

RR is pretty easy to understand. Odds ratio is less intuitive:
  • Using the above example numbers, odds of people without the SNP having MS is 20/80 = 0.25. It's the ratio of people with to without MS.
  • Odds of people with the SNP having MS is 25/75 = 0.333
The odds ratio is these two odds divided: 0.333/0.25 = 1.33.

Meaning the ratio of MS to not MS is 33% higher in a group of people that have the SNP.
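Here is a quick sketch that works through the same example cohort (20/100 without the SNP, 25/100 with it) in code. The function names are just ones I picked for illustration.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of risks, where risk = cases / total."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of odds, where odds = cases / non-cases."""
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return odds_exposed / odds_unexposed


# Example cohort: 25/100 with the SNP have MS, 20/100 without the SNP have MS.
rr = relative_risk(25, 100, 20, 100)
or_ = odds_ratio(25, 100, 20, 100)

print(f"RR = {rr:.2f}")   # 1.25
print(f"OR = {or_:.2f}")  # 1.33
```

The only difference between the two is the denominator: total people for risk, people without the disease for odds.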

---

Given an RR of 1.25, assume the prevalence of MS in people without the SNP in the real world is 100/1,000,000. In people with the SNP, it'd be 125/1,000,000.

Given an OR of 1.25 and the same prevalence without the SNP, the prevalence with the SNP is 124.997/1,000,000.

They are very similar if real world prevalence is very low, but they trend apart as prevalence increases or RR/OR increases. Assume prevalence of MS without SNP is 100/1000.

RR=1.25: Prevalence with SNP is 125/1000.

OR=1.25: Prevalence with SNP is 122/1000.
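The conversions above can be sketched as follows: go from baseline prevalence to odds, scale the odds by the OR, then convert back to a prevalence. The function names are my own, and the baseline prevalences are the illustrative ones used above, not real MS figures.

```python
def prevalence_from_rr(baseline_prevalence, rr):
    """Prevalence in the exposed group implied by a relative risk."""
    return rr * baseline_prevalence


def prevalence_from_or(baseline_prevalence, odds_ratio):
    """Prevalence in the exposed group implied by an odds ratio."""
    baseline_odds = baseline_prevalence / (1.0 - baseline_prevalence)
    exposed_odds = odds_ratio * baseline_odds
    return exposed_odds / (1.0 + exposed_odds)


for baseline, per in [(100 / 1_000_000, 1_000_000), (100 / 1000, 1000)]:
    via_rr = prevalence_from_rr(baseline, 1.25) * per
    via_or = prevalence_from_or(baseline, 1.25) * per
    print(f"baseline {baseline * per:.0f}/{per}: RR -> {via_rr:.3f}, OR -> {via_or:.3f}")
```

At a rare baseline the two results are almost identical. At 10% baseline prevalence the OR-based figure is about 122 per 1000 versus 125 per 1000 for the RR.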

The following chart shows how different the two metrics become when the prevalence (or incidence) is very high:
[Attached chart: RR versus OR as prevalence increases (upload_2024-10-13_23-37-49.png)]

---

When to use the odds ratio or the relative risk?
The relative risk (RR) and the odds ratio (OR) are the two most widely used measures of association in epidemiology. The direct computation of relative risks is feasible if meaningful prevalences or incidences are available. Cross-sectional data may serve to calculate relative risks from prevalences. Cohort study designs allow for the direct calculation of relative risks from incidences. The situation is more complicated for case-control studies. If meaningful prevalences or incidences are not available, the OR provides a valid effect measure: It describes the ratio of disease odds given exposure status, or alternatively the ratio of exposure odds given the disease status. Computationally, both approaches lead to the same result. The OR for a given exposure is routinely obtained within logistic models while controlling for confounders. The availability of this approach in standard statistical software largely explains the popularity of this measure. However, it does not have as intuitive an interpretation as the relative risk. This is where problems start: OR’s are often interpreted as if they were equivalent to relative risks while ignoring their meaning as a ratio of odds. It is for instance common to describe an OR of “2” in terms of a “twofold risk” of developing a disease given exposure. This inaccuracy entails potentially serious problems because the OR always overestimates the RR. This can easily be deduced from the mathematical formulas as depicted in Table 1 because of the way the denominators differ.
 
Our top finding links the HLA region, in particular the HLA-DRB1*11:04 variant, to developing Long COVID.

The top GWAS significant hit (rs9273363; A/C with C being the effect allele, OR = 1.06, 95% CI: 1.04, 1.08, p = 3.79 × 10^-11) was located in chromosome 6 in the intergenic region spanning HLA-DQA1 and HLA-DQB1.

In Immunometabolic changes and potential biomarkers in CFS peripheral immune cells revealed by single-cell RNA sequencing (2024, Journal of Translational Medicine) —

Sun et al said:
In comparison to the Con group, the ME/CFS group exhibited a significant upregulation of ligand-receptor pairs, including APP-CD74, HLA-DRB5-CD4, HLA-DMA-CD4, HLA-DMB-CD4, HLA-DPA1-CD4, HLA-DPB1-CD4, HLA-DQA1-CD4, HLA-DQA2-CD4, HLA-DQB1-CD4, HLA-DRA-CD4, HLA-DRB1-CD4, IL16-CD4, and RETN-CAP1.

HLA-DQB1 was also highlighted in Fine mapping of the major histocompatibility complex MHC in myalgic encephalomyelitis/chronic fatigue syndrome ME/CFS suggests involvement of both HLA class I and class II loci (2021, Brain, Behavior, and Immunity) —

the primary association signal in the HLA class II region was located within the HLA-DQ gene region, most likely due to HLA-DQB1, particularly the amino acid position 57 (aspartic acid/alanine) in the peptide binding groove, or an intergenic SNP upstream of HLA-DQB1
 