A gene-based association method for mapping traits using reference transcriptome data, 2015, Gamazon et al

forestglip

A gene-based association method for mapping traits using reference transcriptome data

Gamazon, Eric R; Wheeler, Heather E; Shah, Kaanan P; Mozaffari, Sahar V; Aquino-Michaels, Keston; Carroll, Robert J; Eyler, Anne E; Denny, Joshua C; Nicolae, Dan L; Cox, Nancy J; Im, Hae Kyung

Abstract
Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood.

We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual’s genetic profile and correlates the “imputed” gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype.

The genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome datasets. PrediXcan enjoys the benefits of gene-based approaches such as reduced multiple testing burden and a principled approach to the design of follow-up experiments.

Our results demonstrate that PrediXcan can detect known and novel genes associated with disease traits and provide insights into the mechanism of these associations.

Web | DOI | PMC | PDF | Nature Genetics | Open Access on PMC
 
The results reported from doing a transcriptome-wide association study (TWAS) based on depression GWAS data made me interested in learning more about TWAS.

The abstract above is from the paper that introduced the first form of TWAS, PrediXcan. The first paper below introduced S-PrediXcan, a version of TWAS that works from GWAS summary statistics instead of individual-level genotype data. The second paper below reviews various TWAS methods.



Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics
Barbeira, Alvaro N.; Dickinson, Scott P.; Bonazzola, Rodrigo; Zheng, Jiamao; Wheeler, Heather E.; Torres, Jason M.; Torstenson, Eric S.; Shah, Kaanan P.; Garcia, Tzintzuni; Edwards, Todd L.; Stahl, Eli A.; Huckins, Laura M.; Aguet, François; Ardlie, Kristin G.; Cummings, Beryl B.; Gelfand, Ellen T.; Getz, Gad; Hadley, Kane; Handsaker, Robert E.; Huang, Katherine H.; Kashin, Seva; Karczewski, Konrad J.; Lek, Monkol; Li, Xiao; MacArthur, Daniel G.; Nedzel, Jared L.; Nguyen, Duyen T.; Noble, Michael S.; Segrè, Ayellet V.; Trowbridge, Casandra A.; Tukiainen, Taru; Abell, Nathan S.; Balliu, Brunilda; Barshir, Ruth; Basha, Omer; Battle, Alexis; Bogu, Gireesh K.; Brown, Andrew; Brown, Christopher D.; Castel, Stephane E.; Chen, Lin S.; Chiang, Colby; Conrad, Donald F.; Damani, Farhan N.; Davis, Joe R.; Delaneau, Olivier; Dermitzakis, Emmanouil T.; Engelhardt, Barbara E.; Eskin, Eleazar; Ferreira, Pedro G.; Frésard, Laure; Gamazon, Eric R.; Garrido-Martín, Diego; Gewirtz, Ariel D. H.; Gliner, Genna; Gloudemans, Michael J.; Guigo, Roderic; Hall, Ira M.; Han, Buhm; He, Yuan; Hormozdiari, Farhad; Howald, Cedric; Jo, Brian; Kang, Eun Yong; Kim, Yungil; Kim-Hellmuth, Sarah; Lappalainen, Tuuli; Li, Gen; Li, Xin; Liu, Boxiang; Mangul, Serghei; McCarthy, Mark I.; McDowell, Ian C.; Mohammadi, Pejman; Monlong, Jean; Montgomery, Stephen B.; Muñoz-Aguirre, Manuel; Ndungu, Anne W.; Nobel, Andrew B.; Oliva, Meritxell; Ongen, Halit; Palowitch, John J.; Panousis, Nikolaos; Papasaikas, Panagiotis; Park, YoSon; Parsana, Princy; Payne, Anthony J.; Peterson, Christine B.; Quan, Jie; Reverter, Ferran; Sabatti, Chiara; Saha, Ashis; Sammeth, Michael; Scott, Alexandra J.; Shabalin, Andrey A.; Sodaei, Reza; Stephens, Matthew; Stranger, Barbara E.; Strober, Benjamin J.; Sul, Jae Hoon; Tsang, Emily K.; Urbut, Sarah; van de Bunt, Martijn; Wang, Gao; Wen, Xiaoquan; Wright, Fred A.; Xi, Hualin S.; Yeger-Lotem, Esti; Zappala, Zachary; Zaugg, Judith B.; Zhou, Yi-Hui; Akey, Joshua M.; Bates, Daniel; Chan, Joanne; Chen, Lin S.; Claussnitzer, Melina; Demanelis, Kathryn; Diegel, Morgan; Doherty, Jennifer A.; Feinberg, Andrew P.; Fernando, Marian S.; Halow, Jessica; Hansen, Kasper D.; Haugen, Eric; Hickey, Peter F.; Hou, Lei; Jasmine, Farzana; Jian, Ruiqi; Jiang, Lihua; Johnson, Audra; Kaul, Rajinder; Kellis, Manolis; Kibriya, Muhammad G.; Lee, Kristen; Li, Jin Billy; Li, Qin; Li, Xiao; Lin, Jessica; Lin, Shin; Linder, Sandra; Linke, Caroline; Liu, Yaping; Maurano, Matthew T.; Molinie, Benoit; Montgomery, Stephen B.; Nelson, Jemma; Neri, Fidencio J.; Oliva, Meritxell; Park, Yongjin; Pierce, Brandon L.; Rinaldi, Nicola J.; Rizzardi, Lindsay F.; Sandstrom, Richard; Skol, Andrew; Smith, Kevin S.; Snyder, Michael P.; Stamatoyannopoulos, John; Stranger, Barbara E.; Tang, Hua; Tsang, Emily K.; Wang, Li; Wang, Meng; Van Wittenberghe, Nicholas; Wu, Fan; Zhang, Rui; Nierras, Concepcion R.; Branton, Philip A.; Carithers, Latarsha J.; Guan, Ping; Moore, Helen M.; Rao, Abhi; Vaught, Jimmie B.; Gould, Sarah E.; Lockart, Nicole C.; Martin, Casey; Struewing, Jeffery P.; Volpi, Simona; Addington, Anjene M.; Koester, Susan E.; Little, A. Roger; Brigham, Lori E.; Hasz, Richard; Hunter, Marcus; Johns, Christopher; Johnson, Mark; Kopen, Gene; Leinweber, William F.; Lonsdale, John T.; McDonald, Alisa; Mestichelli, Bernadette; Myer, Kevin; Roe, Brian; Salvatore, Michael; Shad, Saboor; Thomas, Jeffrey A.; Walters, Gary; Washington, Michael; Wheeler, Joseph; Bridge, Jason; Foster, Barbara A.; Gillard, Bryan M.; Karasik, Ellen; Kumar, Rachna; Miklos, Mark; Moser, Michael T.; Jewell, Scott D.; Montroy, Robert G.; Rohrer, Daniel C.; Valley, Dana R.; Davis, David A.; Mash, Deborah C.; Undale, Anita H.; Smith, Anna M.; Tabor, David E.; Roche, Nancy V.; McLean, Jeffrey A.; Vatanian, Negin; Robinson, Karna L.; Sobin, Leslie; Barcus, Mary E.; Valentino, Kimberly M.; Qi, Liqun; Hunter, Steven; Hariharan, Pushpa; Singh, Shilpi; Um, Ki Sung; Matose, Takunda; Tomaszewski, Maria M.; Barker, Laura K.; Mosavel, Maghboeba; Siminoff, Laura A.; Traino, Heather M.; Flicek, Paul; Juettemann, Thomas; Ruffier, Magali; Sheppard, Dan; Taylor, Kieron; Trevanion, Stephen J.; Zerbino, Daniel R.; Craft, Brian; Goldman, Mary; Haeussler, Maximilian; Kent, W. James; Lee, Christopher M.; Paten, Benedict; Rosenbloom, Kate R.; Vivian, John; Zhu, Jingchun; Nicolae, Dan L.; Cox, Nancy J.; Im, Hae Kyung
Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets.

We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown.

Most of the associations are tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms.

Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.
Web | DOI | PMC | PDF | Nature Communications | Open Access on PMC | 2018



Transcriptome‐Wide Association Studies (TWAS): Methodologies, Applications, and Challenges
Evans, Patrick; Nagai, Taylor; Konkashbaev, Anuar; Zhou, Dan; Knapik, Ela W.; Gamazon, Eric R.
Transcriptome‐wide association study (TWAS) methodologies aim to identify genetic effects on phenotypes through the mediation of gene transcription.

In TWAS, in silico models of gene expression are trained as functions of genetic variants and then applied to genome‐wide association study (GWAS) data. This post‐GWAS analysis identifies gene‐trait associations with high interpretability, enabling follow‐up functional genomics studies and the development of genetics‐anchored resources.

We provide an overview of commonly used TWAS approaches, their advantages and limitations, and some widely used applications.
Web | DOI | PMC | PDF | Current Protocols | Open Access on PMC | 2024
 
The aim of TWAS is to help identify how the genetic variation found in a GWAS acts through gene expression, which could point to the genes and tissues that may causally influence the disease.

The basic steps of TWAS
1. First, a prediction model is created using large reference datasets that combine genetic and expression data from the same individuals (such as the GTEx database). The model is trained to predict the level of a given gene's expression from an individual's pattern of genetic variants (generally limited to variants near the gene, but it can be extended to distant variants).

(The training of the model only needs to be done once, and multiple GWAS can then use the same model.)
2. A GWAS is performed on a trait of interest. At this point, we have genetic information about the individuals in the GWAS, but not expression information.

3. The TWAS prediction model from step 1 is applied to the subjects in the GWAS study, and this produces data for predicted gene expression in these new individuals. One can then test to see if there is an association between predicted expression of a given gene in a given tissue and the trait under study.
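To make the three steps more concrete, here is a minimal Python sketch on simulated data. It assumes step 1 has already been done and represents the trained model for one gene in one tissue as a small dictionary of weights; the variant IDs (`rs_a` etc.), the weights, the cohort, and the trait are all hypothetical. Predicted expression is a weighted sum of allele dosages, and the association test is a plain linear regression of the trait on predicted expression, which is a simplification of what real TWAS software does (no covariates, no case/control modeling).

```python
import math
import random

# Step 1 (assumed already done): a trained model for one gene in one tissue.
# The variant IDs and weights are hypothetical; real ones would come from
# reference data such as GTEx.
WEIGHTS = {"rs_a": 0.8, "rs_b": -0.3, "rs_c": 0.1}


def predict_expression(genotypes, weights):
    """Predicted (genetically regulated) expression: a weighted sum of allele
    dosages (0, 1 or 2 copies of the counted allele). Variants missing from
    `genotypes` are treated as dosage 0 in this sketch."""
    return sum(w * genotypes.get(snp, 0) for snp, w in weights.items())


def regress(x, y):
    """Ordinary least squares of y on x; returns (slope, t statistic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def simulate_twas(n=500, effect=0.5, seed=1):
    rng = random.Random(seed)
    # Step 2: a simulated GWAS cohort with genotypes but no expression data.
    # Dosages are drawn uniformly from {0, 1, 2}, which ignores allele frequencies.
    cohort = [{snp: rng.choice([0, 1, 2]) for snp in WEIGHTS} for _ in range(n)]
    # Step 3: impute expression from genotype, then test it against the trait.
    grex = [predict_expression(g, WEIGHTS) for g in cohort]
    # Simulated trait that partly depends on the genetically regulated expression.
    trait = [effect * e + rng.gauss(0, 1) for e in grex]
    return regress(grex, trait)


slope, t = simulate_twas()
print(f"slope = {slope:.2f}, t = {t:.1f}")
```

With the simulated effect of 0.5, the estimated slope should land near 0.5 with a large t statistic. Setting `effect=0.0` makes the trait unrelated to predicted expression, and the t statistic should then usually be small.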



In essence, TWAS tests whether cases and controls differ in expression of a gene without actually measuring expression in these individuals (which would be practically impossible for some tissues, such as the brain). A further benefit is that the predicted expression being tested reflects only genetic, not environmental, regulation of gene expression, which limits the confounding by environment that affects ordinary case-control studies of measured expression.

Also, if only variants near a gene are used to predict its expression, this limits the possibility of reverse causation (i.e., variants cause a trait, which in turn causes changes in expression). An association between variants and the expression of a nearby gene is likely to reflect relatively direct genetic regulation of expression, rather than roundabout pathways that influence expression through the trait. Still, TWAS doesn't prove causation since, for example, variants could theoretically affect both expression and the trait through separate pathways. It just produces candidate genes for further validation.

Several TWAS methods have been developed, such as S-PrediXcan, described in the second paper above, which tests the association between predicted gene expression and a phenotype using only GWAS summary statistics instead of individual-level genetic data. FUSION is another software package for performing TWAS from summary statistics.
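The core S-PrediXcan idea can be written down compactly. As I understand the Barbeira et al. paper, the gene-level z-score is approximated as Z_g ≈ Σ_l w_l (σ_l / σ_g) Z_l, where w_l are the prediction model's weights, Z_l are the GWAS z-scores (effect size divided by standard error) for the same variants, σ_l is the standard deviation of each variant's dosage, and σ_g is the standard deviation of predicted expression, computed from a reference-panel covariance matrix Γ as σ_g² = wᵀΓw. Below is a minimal Python sketch of that formula only. The function name and inputs are my own, it assumes the weights and z-scores refer to the same effect allele, and it is not the official S-PrediXcan software.

```python
import math


def s_predixcan_z(weights, gwas_z, cov):
    """Approximate a gene-level TWAS z-score from GWAS summary statistics.

    weights: model weights w_l for the gene's variants
    gwas_z:  GWAS z-scores (beta / se) for the same variants and effect alleles
    cov:     dosage covariance matrix of those variants from a reference panel
    """
    n = len(weights)
    # Variance of predicted expression: sigma_g^2 = w' * cov * w
    var_g = sum(weights[i] * cov[i][j] * weights[j]
                for i in range(n) for j in range(n))
    sigma_g = math.sqrt(var_g)
    # Sum of w_l * (sigma_l / sigma_g) * z_l, with sigma_l^2 = cov[l][l]
    return sum(w * math.sqrt(cov[l][l]) / sigma_g * z
               for l, (w, z) in enumerate(zip(weights, gwas_z)))


# Hypothetical gene with two variants in moderate LD (all numbers made up).
z_gene = s_predixcan_z(weights=[0.4, 0.2],
                       gwas_z=[3.1, 2.5],
                       cov=[[0.42, 0.15], [0.15, 0.30]])
print(f"gene z-score: {z_gene:.2f}")
```

A quick sanity check on the formula: with a single variant, the gene's z-score is just that variant's GWAS z-score (with the sign flipped if the weight is negative).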

There are limitations. One is linkage disequilibrium (LD): a gene may appear to be differentially expressed only because the variants in its prediction model are in LD with variants that regulate a different, truly causal gene. Also, it may be difficult to determine which specific tissue is relevant to the trait if the gene's expression is similarly genetically regulated in multiple tissues.

The review linked above provides some examples of when TWAS results have stood up to validation through other methods:
Gusev et al. validated gene-level associations with schizophrenia using data on physical chromatin interactions during brain development (Gusev et al., 2018).
Applying CRISPR/Cas9 gene editing at the TWAS locus 5q13.2 in CD34+ hematopoietic and progenitor cells, Yao et al. identified the causal gene in the locus for neutrophil count (Yao et al., 2020).
Unlu et al. (Unlu et al., 2019) showed that GRIK5 contributes to the polygenic liability to develop eye diseases in humans through its GReX [genetically regulated gene expression], which was further mechanistically investigated via depletion of its ortholog in zebrafish.
GReX analysis of COVID-19 severity led to the inclusion of the repurposing candidate baricitinib in a large clinical trial (Pairo-Castineira et al., 2021, 2023). The drug is now the first FDA-approved immunomodulatory treatment for COVID-19 after clinical trials showed therapeutic benefit (Rubin, 2022; Kalil et al., 2021; RECOVERY Collaborative Group, 2022).
 
The above explanation might have had some hard-to-understand parts, so I think it might be useful to describe TWAS with a really simple toy example. Hopefully, this gives people a feel for what it is, in case we start seeing ME/CFS papers using it. To keep it simple, I'll focus on one gene, one tissue, and one variant.

Simple Explanation
Imagine we are studying ME/CFS and are interested in whether the gene VAL plays an important role in the heart. Specifically, we're interested in knowing if the genetics of people with ME/CFS causes them to have significantly different levels of VAL in the heart compared to healthy people. If this turned out to be the case, it would increase our confidence in VAL as a candidate causal gene, potentially helping to pinpoint underlying pathways of interest.

So for the first step of the TWAS, we go to the trusty GTEx database, a big project that measured both the genetic code and gene expression across many tissues in around 1,000 people. They were trying to figure out which specific changes in the DNA were associated with changes in the amount of expression of a given gene in healthy people (or at least in "non-diseased tissues"). Gene expression is the process of genes (specific sections of DNA) being transcribed from the DNA to make mRNA (small molecules that are like copies of those parts of the DNA). mRNA can then be used by the body as a template to make specific proteins, which are used in basically every function of the body. If more of a gene is expressed, there will likely be more of the associated protein in the body as well.

Since we're interested in expression of VAL in the heart, we examine all the people in the GTEx database to see if we can find a pattern linking the genetic changes around the gene VAL to the amount of VAL that is expressed in the heart. Let's say we find that one specific variant seems to change how much VAL is produced in the heart: if a person has the letter A at a specific place in their DNA, they tend to produce low amounts of VAL, and if someone has the letter T at the same location, they tend to produce high amounts of VAL.

That's our TWAS model. Based on what we saw in GTEx, our model says that people with an A at a certain DNA position will have low VAL, and people with a T will have high VAL.

Now we go back to the ME/CFS and healthy participants in our own study. We sequence everyone's DNA so we know whether each person has an A or a T at the location of interest. Now we can use our model to predict whether the genetics of each person would tend to cause them to have low or high VAL in their heart. If participant 1 has an A, the model says "low VAL". If participant 2 has a T, the model says "high VAL". And so on.

And then we compare the ME/CFS and healthy groups to see if the predicted VAL expression is significantly different between groups.

That's the basic concept of TWAS.
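For anyone who finds code easier to follow, here is the VAL toy example as a short Python sketch. Everything in it is made up: the reference measurements, the participants, and the simple "average expression per allele" model. The group comparison uses Welch's t statistic just to show the idea; a real TWAS would use a regression with covariates and report proper p-values.

```python
import statistics

# Step 1: "train" the model on reference data. Each pair is (allele at the
# variant, measured VAL expression in heart). All values are made up.
reference = [("A", 2.1), ("A", 1.8), ("A", 2.4), ("T", 5.0), ("T", 5.6), ("T", 4.9)]


def train_allele_model(data):
    """Model = average measured expression for each allele."""
    by_allele = {}
    for allele, expression in data:
        by_allele.setdefault(allele, []).append(expression)
    return {allele: statistics.mean(values) for allele, values in by_allele.items()}


def welch_t(x, y):
    """Welch's t statistic for a difference in means between two groups."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / (vx / len(x) + vy / len(y)) ** 0.5


model = train_allele_model(reference)

# Step 2: genotype our own (hypothetical) participants at the same position.
me_cfs = ["A", "A", "T", "A", "A", "A"]
healthy = ["T", "T", "A", "T", "T", "A"]

# Step 3: predict VAL from genotype alone, then compare the groups.
pred_me_cfs = [model[allele] for allele in me_cfs]
pred_healthy = [model[allele] for allele in healthy]
print("model:", model)
print("t =", round(welch_t(pred_me_cfs, pred_healthy), 2))
```

Here the ME/CFS group mostly carries A, so its predicted VAL is lower and the t statistic comes out negative.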

Extra Discussion
In reality, the TWAS model is more complicated than a single variant effect. Since expression of a given gene can potentially be influenced by many variants, machine learning is used to create a model that takes all the variants around a gene into account in order to make as good a prediction as possible about the amount of gene expression any given person will have.
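As an illustration of the "machine learning" part: PrediXcan's models were fit with elastic net regression, which combines lasso and ridge penalties so that many variants get a weight of exactly zero and the remaining weights are shrunk toward zero. Below is a minimal, from-scratch Python sketch of elastic net fit by coordinate descent on simulated data in which only two of ten variants (indices 2 and 7, chosen arbitrarily) truly affect expression. The penalty strength `lam`, the mixing parameter `alpha`, and the simulated data are my own assumptions; real pipelines choose the penalty by cross-validation and use established libraries rather than code like this.

```python
import random


def soft_threshold(x, t):
    """Shrink x toward zero by t, returning 0 if |x| <= t (the lasso step)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=100):
    """Cyclic coordinate descent for the glmnet-style elastic net objective
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2).
    X: rows of centered allele dosages (one row per person); y: centered expression."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, starting from b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # monomorphic variant: nothing to learn
            # Covariance of variant j with the residual, adding back its own fit.
            rho = sum(row[j] * (r + row[j] * b[j]) for row, r in zip(X, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                for i, row in enumerate(X):
                    resid[i] -= row[j] * delta
                b[j] = new
    return b


def simulate_and_fit(n=300, p=10, seed=0):
    rng = random.Random(seed)
    true_w = [0.0] * p
    true_w[2], true_w[7] = 0.8, -0.5  # hypothetical causal variants
    G = [[rng.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
    expr = [sum(w * g for w, g in zip(true_w, row)) + rng.gauss(0, 0.5) for row in G]
    # Center genotypes and expression so no intercept is needed.
    means = [sum(row[j] for row in G) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in G]
    y_mean = sum(expr) / n
    y = [e - y_mean for e in expr]
    return elastic_net(X, y)


weights = simulate_and_fit()
print([round(w, 2) for w in weights])
```

The fitted weights for variants 2 and 7 should come out near their simulated values (slightly shrunk toward zero), while the other eight weights should be zero or close to it.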

And a TWAS study will look at more than one gene and more than one tissue. It might test, separately, thousands of different genes in each of dozens of different tissues. If the predicted gene expression for any of these thousands of comparisons shows a difference between ME/CFS and controls that stays significant after accounting for the number of tests, then that's a potential lead on a gene that might increase someone's risk of ME/CFS if more or less of that gene is turned into mRNA.
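Testing thousands of genes in dozens of tissues means many chances for a fluke, so a "very significant" result has to clear a much stricter bar than p < 0.05. One simple (and conservative) way to set that bar is a Bonferroni correction: divide 0.05 by the number of tests. The Python sketch below shows the arithmetic. The gene count is a made-up round number (the 14 tissues match the depression study discussed below), and the gene and tissue names in the toy example are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def significant_hits(pvalues, alpha=0.05):
    """Return the (gene, tissue) pairs whose p-value passes the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(pvalues))
    return sorted(key for key, p in pvalues.items() if p < cutoff)


# Hypothetical scan size: the gene count is made up for illustration.
n_genes, n_tissues = 15_000, 14
cutoff = bonferroni_threshold(0.05, n_genes * n_tissues)
print(f"{n_genes * n_tissues} tests -> p < {cutoff:.1e}")

# Toy p-values for four hypothetical gene/tissue pairs (cutoff = 0.05 / 4).
toy = {("GENE1", "heart"): 1e-3, ("GENE2", "brain"): 0.2,
       ("GENE3", "heart"): 1e-4, ("GENE4", "liver"): 0.04}
print(significant_hits(toy))
```

Because predicted expression for the same gene is often correlated across tissues, Bonferroni over every gene-tissue pair is stricter than necessary; it is used here only because the arithmetic is simple. For comparison, the conventional GWAS threshold of 5 × 10⁻⁸ corresponds to roughly a million independent variant tests, which illustrates the reduced multiple testing burden mentioned in the PrediXcan abstract.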

The depression study I mentioned previously is a striking example of a finding that can come out of a TWAS. They looked at the genetic code of many (>350,000) people with depression and a similarly large number of healthy people, then used a TWAS model to predict expression of each of many genes in many tissues in these participants. One of the most significant differences between the two groups was that, in the depression group, the model predicted significantly lower expression of the dopamine receptor D2 gene (DRD2) in the nucleus accumbens (NAc) region of the brain.

Neural activity is based on neurons activating and then signaling other neurons to activate, in highly complicated patterns across networks of billions of neurons. Neurotransmitters are the chemical signals that neurons use to communicate with each other. Dopamine is one type of neurotransmitter. DRD2 is a receptor on the "receiving" neuron that allows it to sense the dopamine signal from the previous neuron. In the case of DRD2 specifically, Wikipedia seems to indicate that it is an inhibitory receptor, meaning that when dopamine binds to it, the neuron is less likely to activate and pass a signal on to the next neuron. The NAc is one of the regions known to be highly involved in processing rewarding stimuli and in the feeling of motivation.

Therefore, the finding from the depression study suggests that people with depression have genetics that cause neurons in the NAc to have fewer DRD2 receptors, making them less sensitive to the inhibitory signal of dopamine and therefore more likely to be overactive. On the surface, the direction of effect seems unintuitive: wouldn't we expect under-active, not over-active, neurons in the reward part of the brain to lead to depression? There might be some complicated neuroscience that makes sense of it. Nevertheless, dopamine and the NAc were both already considered highly involved in motivation and reward.

The researchers didn't go in testing only the specific genes they thought would be interesting, such as DRD2. They tested thousands of genes in each of 14 different tissues. And out of the thousands of possibilities, DRD2 in the nucleus accumbens turned out to be one of the most significant predicted differences.

So this technique seems interesting to me as a translational tool to help get us from genetic variants to gene targets.
 