Genetics: CA10

Whole genome sequencing would still be looking at SNPs. The main difference is that GWAS arrays genotype a finite number of locations in the genome and the rest is inferred (imputed), whereas whole genome sequencing (ideally) captures everything. But whole genome analysis would have similar limitations in trying to infer which SNPs are affecting which genes.
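To make the “the rest is inferred” part concrete, here is a toy sketch of imputation. It assumes a tiny, made-up reference panel (`REFERENCE_PANEL`) and a hypothetical `impute_haplotype` helper that just copies the untyped alleles from whichever reference haplotype best matches the typed positions. Real imputation software uses statistical (HMM-based) models over large panels, so treat this only as an illustration of the idea.

```python
from typing import List, Optional

# Hypothetical reference panel: each haplotype lists alleles (0/1) at the same
# ordered positions. Real panels contain thousands of haplotypes.
REFERENCE_PANEL: List[List[int]] = [
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
]


def impute_haplotype(observed: List[Optional[int]], panel: List[List[int]]) -> List[int]:
    """Fill untyped positions (None) by copying them from the reference haplotype
    that agrees with the most typed positions. Toy stand-in for real imputation."""
    typed = [i for i, allele in enumerate(observed) if allele is not None]

    def agreement(ref: List[int]) -> int:
        # Count how many array-typed positions this reference haplotype matches.
        return sum(ref[i] == observed[i] for i in typed)

    best = max(panel, key=agreement)  # on ties, the first haplotype wins
    return [best[i] if allele is None else allele for i, allele in enumerate(observed)]


# The "array" typed positions 0, 2 and 4; positions 1 and 3 are inferred.
print(impute_haplotype([0, None, 1, None, 0], REFERENCE_PANEL))
```

The inferred alleles are only as good as the correlation between typed and untyped positions in the reference panel, which is why imputation leans on linkage disequilibrium.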

It’s a complicated subject! I’ve had quite a bit of GWAS exposure and am still learning a lot from this paper and discussion.
Thanks, very useful! So you still have the issues @forestglip outlined above?

I thought I read somewhere that SNPs were defined as variants where an allele is present in more than 1% of the population? And that this is what determined which locations were put on the arrays, because they’re common variations. Although I’ve always been confused about the ‘why these locations’ question for this limited sequencing versus whole genome sequencing.
 
Thanks, very useful! So you still have the issues @forestglip outlined above?
More or less! It’s a little better since you’re not relying on linkage disequilibrium (LD) for imputation (which might obscure some things), but LD would still make it hard to identify the actual causal variant, and there would still be the problem of knowing which gene(s) the variant actually affects.
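A small sketch of why LD gets in the way of finding the causal variant: the usual LD measure r² is computed below from phased haplotypes coded 0/1 (the haplotype data and the `ld_r_squared` name are made up for illustration). If a causal SNP and a neighbouring SNP always travel together (r² = 1), an association test sees exactly the same signal at both, so the data alone can’t say which one matters.

```python
from typing import List


def ld_r_squared(snp_a: List[int], snp_b: List[int]) -> float:
    """r^2 between two biallelic SNPs, from phased haplotypes coded 0/1."""
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n  # haplotypes carrying both
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical haplotypes for eight chromosomes.
causal = [1, 1, 0, 0, 0, 0, 1, 0]
perfect_tag = [1, 1, 0, 0, 0, 0, 1, 0]  # always inherited with the causal allele
partial_tag = [1, 1, 0, 0, 0, 0, 0, 0]  # usually, but not always, inherited with it

print(ld_r_squared(causal, perfect_tag))
print(ld_r_squared(causal, partial_tag))
```

In real data the correlation is rarely exactly 1, but whole blocks of SNPs with high r² can still show near-identical association signals, which is why fine-mapping is a separate step.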

I thought I read somewhere that SNPs were defined as variants where an allele is present in more than 1% of the population? And that this is what determined which locations were put on the arrays, because they’re common variations. Although I’ve always been confused about the ‘why these locations’ question for this limited sequencing versus whole genome sequencing.
“SNP” just refers to the single nucleotide difference in the genome itself. But GWAS usually restrict the analysis to SNPs with a minor allele frequency above 1% to focus on the locations most likely to be fruitful, since the methodology already limits how many locations you can assess. If 99.9% of the population has the same allele, and the disease doesn’t show Mendelian inheritance or other signs of a strong genetic component, it’s less likely that a very rare allele is strongly driving disease. But that doesn’t exclude the possibility, which is why whole genome studies are still done.
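Roughly what the 1% cutoff looks like in practice, as a sketch: genotypes are assumed to be coded as 0, 1 or 2 copies of the alternate allele per person, and the variant IDs and counts are invented. Taking the minor allele frequency (the smaller of the two allele frequencies) means it doesn’t matter which allele happens to be labelled “alternate”.

```python
from typing import Dict, List


def minor_allele_frequency(genotypes: List[int]) -> float:
    """Genotypes coded as 0, 1 or 2 copies of the alternate allele per person."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))  # two alleles per person
    return min(alt_freq, 1 - alt_freq)


def filter_common(variants: Dict[str, List[int]], threshold: float = 0.01) -> List[str]:
    """Keep only variants whose minor allele frequency exceeds the threshold."""
    return [vid for vid, g in variants.items() if minor_allele_frequency(g) > threshold]


# Hypothetical cohort of 100 people.
cohort = {
    "variant_common": [1] * 20 + [0] * 80,   # MAF 0.10
    "variant_rare": [1] + [0] * 99,          # MAF 0.005, dropped
    "variant_flipped": [2] * 90 + [1] * 10,  # alt allele is the major one; MAF 0.05
}
print(filter_common(cohort))
```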

It’s really just a strategy for getting the most out of a technology that has limited capacity but can be run cheaply on many more people, unlike whole genome sequencing (though that’s getting cheaper). It also limits how much multiple testing correction you have to do. You could theoretically use other strategies to pare down the list; allele frequency is just the most common choice.
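On the multiple testing point, a rough sketch using a plain Bonferroni correction (one of several possible corrections; the test counts below are illustrative, not from the paper). The per-test p-value threshold shrinks in proportion to the number of tests, so every extra variant you test makes each individual hit harder to call. The familiar genome-wide significance threshold of 5e-8 corresponds to roughly a million independent tests at alpha = 0.05.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Illustrative test counts: a smaller array, ~1M independent tests, a larger panel.
for n_tests in (500_000, 1_000_000, 10_000_000):
    print(f"{n_tests:>10,} tests -> p < {bonferroni_threshold(0.05, n_tests):.0e}")
```

In practice the effective number of independent tests is lower than the raw variant count because of LD, which is part of why the field settled on a fixed threshold rather than dividing by every variant tested.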
 