Rare variant contribution to human disease in 281,104 UK Biobank exomes, 2021, Wang et al

Discussion in 'Other health news and research' started by Andy, Aug 19, 2021.

  1. Andy

    Andy Committee Member

    Messages:
    22,422
    Location:
    Hampshire, UK
    Abstract

    Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variation to common disease remains relatively unexplored. The UK Biobank (UKB) contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the impact of rare variation on a broad collection of traits [1,2]. Here, we studied the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UKB participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UKB participants of African, East Asian, or South Asian ancestry. Together, our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).

    Open access, https://www.nature.com/articles/s41586-021-03855-y
     
    Ash, Amw66, johnnydme and 10 others like this.
  3. alktipping

    alktipping Senior Member (Voting Rights)

    Messages:
    1,261
    Did they check to see which genes may have been switched on or off as a comparison?
     
    Ash, Lindberg and Amw66 like this.
  4. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    3,731
    Location:
    Belgium
    Interesting. Thanks to the person who made the DecodeME thread.

    I think the data for CFS are visible here: https://azphewas.com/phenotypeView/...605c/870fa4a7-d96e-43dd-b35d-de8a49a6941c/glr

    There were 1,232 cases and 195,965 controls. None of the genes tested for rare variants reached statistical significance. If I understand correctly, the significance threshold used was a p-value of 10^-8.7 (to correct for the many, many comparisons), while the lowest p-value found was 1.78 × 10^-6.
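    As a rough check on those two numbers, here is a small sketch. It assumes the threshold really is 10^-8.7 as read off the portal; the variable names are just illustrative.

```python
import math

# Figures quoted in the post above; the threshold is the poster's reading of the portal.
threshold_p = 10 ** -8.7   # significance threshold, roughly 2e-9
lowest_p = 1.78e-6         # smallest p-value reported for the CFS phenotype

# How many times larger the best p-value is than the threshold.
fold_above_threshold = lowest_p / threshold_p

# The same gap expressed in orders of magnitude (-log10 units).
log10_gap = math.log10(lowest_p) - math.log10(threshold_p)

print(f"threshold p-value: {threshold_p:.3g}")
print(f"lowest p-value is about {fold_above_threshold:.0f} times the threshold")
print(f"gap: about {log10_gap:.2f} orders of magnitude")
```

    So the best hit is nearly three orders of magnitude short of the threshold, if I have the numbers right.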

    Here are the results for (EDIT: rare mutations of) IDO2, which plays a role in the metabolic trap hypothesis. There aren't more of these rare mutations in CFS patients than in controls.
    upload_2021-8-20_10-32-36.png

    My main question would be: does anyone know how one can estimate the power of such a study? 1,232 cases is a lot in a normal study with only a few outcomes, but I assume that for this kind of analysis it is very little? What would be the smallest effect size that it could pick up?
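    One crude way to get a feel for the power question is a normal-approximation power calculation for comparing carrier frequencies between cases and controls. This is only a sketch under stated assumptions: the study itself does not use this test, the 0.1% carrier frequency in controls is a made-up illustrative value, and the normal approximation is poor when only a handful of carriers are expected among cases. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N_CASES = 1232          # cases, as quoted above
N_CONTROLS = 195_965    # controls, as quoted above
ALPHA = 10 ** -8.7      # significance threshold, as quoted above


def approx_power(odds_ratio, control_carrier_freq,
                 n_cases=N_CASES, n_controls=N_CONTROLS, alpha=ALPHA):
    """Approximate power of a two-sided two-proportion z-test.

    Carriers = people with at least one qualifying rare variant in a gene.
    The contribution of the opposite rejection tail is ignored.
    """
    std_normal = NormalDist()
    p0 = control_carrier_freq
    # Carrier frequency in cases implied by the odds ratio.
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)
    # Standard error under the null (pooled) and under the alternative.
    pooled = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    se_alt = sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    # Critical z-value for the two-sided threshold.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    return std_normal.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


ASSUMED_CARRIER_FREQ = 0.001  # assumption for illustration only
for odds_ratio in (2, 5, 10, 20, 50):
    power = approx_power(odds_ratio, ASSUMED_CARRIER_FREQ)
    print(f"OR {odds_ratio:>2}: approximate power {power:.3f}")
```

    Changing `ASSUMED_CARRIER_FREQ` shows how strongly the detectable effect size depends on how rare the variants are, which is the main thing I can't pin down from the portal.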
     
    Last edited: Aug 20, 2021
  5. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    3,731
    Location:
    Belgium
    I'm not sure what the IDO trap hypothesis predicts. I remember that Phair came to IDO2 because he thought it had to be a common mutation because such a large proportion of the population was affected in ME epidemics.

    Does anyone know how relevant this data on rare mutations from the biobank is to the trap hypothesis?
     
    alktipping, Michelle and Barry like this.
  6. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    3,731
    Location:
    Belgium
    The hypothesis paper says:

    "Outbreaks or epidemics of a noncontagious disease raise the possibility that genetic predisposition to ME/CFS is very common in the population and that the disease has low penetrance only because the initiating triggers are multifactorial, and those pathogenic combinations of triggers are, themselves, rare. Outbreaks are then explained by a geographically localized combination of factors superimposed on a genetic predisposition that is common in the population. Thus, it is the existence of ME/CFS outbreaks that pointed to the potential importance of common damaging mutations"
    The IDO Metabolic Trap Hypothesis for the Etiology of ME/CFS (nih.gov)

    It also gives an overview of 5 IDO mutations, both common and rare. I haven't figured out which ones were tested in the biobank study; I assume the N257K variant?

    upload_2021-8-20_10-54-36.png
     

    Simon M, alktipping and Michelle like this.
  8. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,386
    https://www.omf.ngo/the-ido-metabolic-trap/
     
    Simon M and alktipping like this.
  9. alex3619

    alex3619 Senior Member (Voting Rights)

    Messages:
    2,200
    This could be answered, but someone has to go look. Current claims, based on presumably severe patients, lead to over 98% having an IDO2 mutation. There are potential confounds there arising from the reliance on severe patients. A large dataset should help, but I would expect the prevalence of an IDO2 mutation to go down in large cohorts of mixed severity even if it's the culprit.
     
    alktipping, Michelle and J.G like this.
  10. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    3,731
    Location:
    Belgium
    On Twitter, Chris Ponting clarified that the data included both the rare and the non-rare mutations of IDO2.

    https://twitter.com/user/status/1429183955954278401


    https://twitter.com/user/status/1429751450096312327
     
    FMMM1, alktipping, Milo and 4 others like this.
  12. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    928
    Location:
    UK
    Whoops, this somehow grew into a blog. Apologies.

    Like the DecodeME study, this exome study of UK Biobank data compares DNA between people with a condition (in this case ME/CFS) and healthy controls. Significant differences point to a cause of the condition. But the two studies go about this process in very different ways and I thought it would be helpful to explain how the studies differ.

    Not least because DecodeME will be storing DNA samples so that it can do its own exome study when costs fall in future and it becomes economic to do so.

    In simple terms, the difference is this.

    DecodeME is a genome-wide association study focusing on DNA variants that most commonly differ between people. DecodeME will be looking at 600,000 locations across the whole genome. This broad-brush approach is affordable and has helped point to causes in numerous diseases. However, it is blind to what is happening in most of the 3 billion DNA letters of our genome.

    By contrast, the newer exome approach is a bit more like panning for gold. It goes to the most promising areas and looks in minute detail, hoping to find a few nuggets. And it is very expensive. Rather than looking at the DNA letter (or letters) at 600,000 locations, it looks at DNA sequences that code for proteins. These only make up 1.5% of the total human genome but that still amounts to 45 million DNA letters.

    The reason for focusing on the protein-coding regions is that DNA differences here are most likely to have a big impact.

    Proteins are the body's doing molecules, from antibodies, to neurotransmitters that help send nerve impulses, to immune signalling molecules and receptors, to the proteins responsible for muscle contraction. Even a small change in the protein sequence can break the molecule, or at least make it less effective and lead to a big biological effect.

    For instance, some DNA differences in protein-coding regions cause protein manufacture to halt prematurely, sometimes when it has barely started. This often means that no useful protein is made.

    It's also fairly obvious how a rare DNA difference can contribute to disease because it directly affects the protein sequence. Again, this is different from DecodeME and similar GWAS studies.

    A GWAS shows which, if any, DNA differences are significant in an illness. But it won't directly reveal the biology behind the difference. In almost all cases, the DNA difference itself plays no direct role: instead it's acting as a "tag" for the nearby, unmeasured DNA difference that is important. But even that DNA difference rarely affects the protein sequence. Almost all DNA differences identified by GWAS simply affect the amount of protein produced, leading to a slight increase or slight decrease in the amount of protein (or of a regulatory RNA transcript produced by genes).

    In short, these rare protein-coding differences might only affect a few people but they have a big effect on those individuals (and offer a clearer biological clue). By contrast, the DNA differences identified by GWAS, which only look for common differences, tend to have a small and indirect effect, and require further detective work to identify the underlying biology.

    I know, it's not exactly simple, but here's a recap:
    • GWAS like DecodeME make a broad-brush sweep of the human genome by zooming in on several hundred thousand locations where DNA most commonly differs between people, and in the process ignore differences in the vast majority of the genome. The differences they find are small and subtle and require quite a lot of further analysis to link to biological mechanisms.
    • Looking for rare variants in the exome is a more bespoke process and delivers clearer biological clues to what causes an illness. But it's a much more expensive approach than a GWAS. However, it should be possible to take this approach in the future for ME/CFS, and DecodeME will be collecting the samples to enable this.
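
    To put the scale figures from this post side by side, here is a quick back-of-envelope sketch. It uses only the round numbers quoted above (3 billion letters, 1.5%, 600,000 locations); the variable names are illustrative.

```python
# Round figures quoted in the post; all are approximations.
GENOME_LETTERS = 3_000_000_000   # DNA letters in the human genome
EXOME_PER_MILLE = 15             # protein-coding share: 1.5% = 15 per 1000
GWAS_LOCATIONS = 600_000         # locations DecodeME will look at

# Integer arithmetic avoids floating-point rounding in the product.
exome_letters = GENOME_LETTERS * EXOME_PER_MILLE // 1000

# How many exome letters are read for each location a GWAS looks at.
letters_per_gwas_location = exome_letters // GWAS_LOCATIONS

print(f"exome size: {exome_letters:,} letters")
print(f"exome letters per GWAS location: {letters_per_gwas_location}")
```

    So on these round numbers an exome study reads on the order of 75 letters for every one location a GWAS checks.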
     
    paolo, CRG, FMMM1 and 10 others like this.
  13. lunarainbows

    lunarainbows Senior Member (Voting Rights)

    Messages:
    2,828
    Thanks for your post - it was interesting. How much more expensive is it per person, @Simon M? I know DecodeME was given 5 million pounds to do its GWAS study, but how much more would they need to raise in order to do whole exome studies (and how much would costs need to reduce in order for it to actually become feasible)? I ask because you mentioned that the team would like to do whole exome studies in the future.
     
  14. alex3619

    alex3619 Senior Member (Voting Rights)

    Messages:
    2,200
    I think this is an unjustified assumption. Genes coding for regulatory RNA are possibly even more important. However, I do think we should assess all possibilities until we find some probable causes or contributory factors, in addition to any we are still looking at.
     
    alktipping likes this.
  15. Mithriel

    Mithriel Senior Member (Voting Rights)

    Messages:
    2,816
    This is a very strange statement. Outbreaks and epidemics are caused by contagious organisms. It would need a lot of well produced evidence to show that some are caused by a geographically localized combination of factors superimposed on a common genetic predisposition. It may happen like that sometimes but there is no evidence strong enough to show that a microbe is not the cause in most cases.

    Many infections have uncommon consequences in some of the people infected like measles encephalitis and herpes encephalitis. Polio is the poster child. The polioviruses were not isolated until they realised that it caused a respiratory or gut infection in most people with only a relative few getting polio.

    Why some people get more severe consequences of common infections has not been studied as much as it should because the introduction of vaccines and antibiotics made infections seem like something conquered and unimportant. The medical professionals who knew about infections have died off and the present ones do not have the experience or training.

    Before CFS and the psychologists, ME was thought to be a long-term consequence of a common infection, with the original infection possibly being mild or subclinical, pretty much like Long Covid. They were trying to work out why people developed ME just before all the research stopped. It may be a certain genetic profile but an infection is still the most likely cause of an epidemic or outbreak.
     
  16. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    928
    Location:
    UK
    Thanks. I think DecodeME got £3.2 million funding. I would need to check but actually, I think DecodeME might be doing whole genome sequencing and I think that costs about $1000 per person, so about $10 million for a study of 10,000 people. I'm not sure on the numbers. But I think the timescale is medium-term, perhaps within the next five years. I'm sorry I can't be more precise than that.
     
    Missense, FMMM1, alktipping and 4 others like this.
  17. Jacob Richter

    Jacob Richter Established Member (Voting Rights)

    Messages:
    67
    I hope this question isn't too far off topic, but in today's Financial Times there's a detailed article on Long Covid (link below) which says halfway through "There may be a genetic predisposition that determines who is most likely to suffer [an immune response triggering Long Covid], so researchers are conducting large genome-wide association studies that try to locate genes that patients have in common" - does anyone know what studies the author is referring to? I'd like to track their progress. See here:

    Long Covid: why do some people have symptoms months after infection?
    Researchers say more than 100m suffer ill effects for at least 12 weeks
    https://www.ft.com/content/ed89cad2-6f82-44f0-b01d-c4490e4a7372
     
    ukxmrv, Ash, Simon M and 6 others like this.
  18. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    928
    Location:
    UK
    I asked Chris Ponting, who said he thought they were referring to: PHOSP-COVID (hospitalised individuals), REACT-LONG COVID (whole genome sequencing for a couple of thousand people) or Sano Genetics (a few thousand, whole genome sequencing).
     
    FMMM1, Trish, Ash and 1 other person like this.
  19. Jacob Richter

    Jacob Richter Established Member (Voting Rights)

    Messages:
    67
    Thank you.
     
