Genetics: CA10

Whole genome sequencing would still be looking at SNPs. The main difference is that GWAS arrays genotype a fixed set of locations in the genome and the rest is inferred (imputed), whereas whole genome sequencing (ideally) captures everything. But whole genome analysis would have similar limitations when trying to infer which SNPs are impacting which genes.

It’s a complicated subject! I’ve had quite a bit of GWAS exposure and am still learning a lot from this paper and discussion.
Thanks, very useful! So you still have the issues @forestglip outlined above?

I thought I read somewhere that SNPs were defined as when an allele was present in greater than 1% of the population? And so it was that which defined what locations were on the arrays used, because they’re common variations. Although I’ve always been confused about the ‘why these locations’ question for this limited sequencing versus whole genome sequencing.
 
More or less! It’s a little better since you’re not assuming linkage disequilibrium for the purposes of imputation (which might obscure some things), but LD would still cause problems with identifying the actual causal variant and there would still be issues with knowing what gene(s) the mutation actually affects.

“SNP” just refers to the actual single nucleotide difference in the genome. But GWAS usually restrict themselves to SNPs with >1% allele frequency to focus on the locations that are most likely to be fruitful, since the methodology already limits how many locations you can assess. If 99.9% of the population has the same allele, and the disease doesn’t have Mendelian inheritance or other indications of a strong genetic component, it’s less likely that an allele with very low occurrence in the population is going to strongly drive disease. But it doesn’t exclude the possibility, which is why whole genome studies are still done.

It’s really just a strategy for getting the most out of a technology that has limited capacity but can be run cheaply on a lot more people, unlike whole genome sequencing (though that’s getting cheaper). It also limits the amount of multiple testing correction you have to do. You could theoretically use other strategies to pare down the list; allele frequency is just the most common choice.
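
To make those two points a bit more concrete (filtering by allele frequency, and how the number of tests drives the significance threshold), here's a minimal Python sketch. The variant IDs and frequencies are made up, and `maf_filter` and `bonferroni_threshold` are hypothetical helper names for illustration, not functions from any GWAS package. It also assumes simple biallelic sites and a plain Bonferroni correction, which real pipelines may not use as-is.

```python
# Toy illustration only: variant IDs and allele frequencies are invented.

def minor_allele_frequency(alt_freq):
    """Return the frequency of the less common allele at a biallelic site."""
    return min(alt_freq, 1.0 - alt_freq)


def maf_filter(variants, min_maf=0.01):
    """Keep variants whose minor allele frequency is at least min_maf."""
    return {
        variant_id: alt_freq
        for variant_id, alt_freq in variants.items()
        if minor_allele_frequency(alt_freq) >= min_maf
    }


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Alternate-allele frequencies for four hypothetical variants.
variants = {
    "var_common": 0.30,
    "var_borderline": 0.01,
    "var_rare": 0.001,
    "var_flipped": 0.995,  # the alternate allele is the major allele here
}

kept = maf_filter(variants)
print(sorted(kept))

# Fewer variants tested means a less strict per-test cutoff.
print(bonferroni_threshold(0.05, 1_000_000))
print(bonferroni_threshold(0.05, 10_000_000))
```

With one million tests, the Bonferroni cutoff comes out at roughly the conventional genome-wide significance threshold of 5e-8. Testing ten times as many variants makes the cutoff ten times stricter, which is part of why paring down the list matters.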
 
I thought I read somewhere that SNPs were defined as when an allele was present in greater than 1% of the population?
To add to what jnmaciuch explained, I think you might be referring to a common convention. SNVs (single nucleotide variants) are used to describe any places where a single nucleotide/letter is changed in the DNA. SNP (single nucleotide polymorphism) can refer to an SNV that is present in at least 1% of the population, but that's not a hard rule.

Wikipedia
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP /snɪp/; plural SNPs /snɪps/) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more),[1] many publications[2][3][4] do not apply such a frequency threshold.
 
Ah, thanks for pointing that out, I should have clarified to avoid confusing anyone. There’s a more colloquial use of “SNP” that’s really just synonymous with the actual mutation locus, which is what most biologists mean outside of specific technical genetic contexts. That gets confusing pretty fast sometimes.
 
I think the wrong paper is cited for the matching locus with multisite chronic pain:
Shared associations with other traits
Three out of our eight ME/CFS-associated intervals had previously been associated to depression (chr1q25.1, chr13q14.3 and chr20q13.13) (64,65), and one locus to pain (chr17q22) (41) phenotypes. Where these studies provided full summary statistics, we used coloc (32) to investigate the level of support for these genetic signals and our ME/CFS results being underpinned by the same causal variant.

41. Harlow CE, Uzochukwu E, Fernando HA, Mordaunt CE, Hughey JM, Eicher JD, et al. GWAS of Extended Prescription Analgesic Use Identifies Novel Genetic Loci in Chronic Pain [Internet]. 2024 [cited 2025 Jul 24]. Available from: https://www.medrxiv.org/content/10.1101/2024.12.02.24318312v1

The above is a GWAS of a different definition of pain. But the paper they cited itself cites what I think they meant to cite here:

30. Johnston KJA, Ward J, Ray PR, Adams MJ, McIntosh AM, Smith BH, et al. Sex-stratified genome-wide association study of multisite chronic pain in UK Biobank. PLoS Genet. 2021;17(4):e1009428.
 
Looking at that paper, in table 1, they give the sex-stratified results. Here's the locus that I think is what matches with DecodeME (note the position is based on GRCh37 unlike DecodeME, so needs to be converted):

1754893620303.png

Here's the zoomed-in Manhattan plot of the chromosome 17 "tower" from DecodeME. The red dot is the lead variant from the pain paper (just marking the position; the significance shown is from DecodeME), so it looks like these papers did find the same significant area:
1754893763556.png

Edit: Also matches the position of the highest grey dot (grey dots are from the pain study, green dots from DecodeME) from the DecodeME paper:
1754895213935.png

Edit 2: Actually, I'm not sure if it's the same data. It's the same position, but the pain paper table's p-value doesn't match the significance of the variant in the DecodeME plot of the pain lead variant.

Edit 3: I'm guessing they might be using newer UK Biobank data with more participants for the plot above, as opposed to the exact same data as the study.
 
So in the pain paper, it looks like it was significant in females, but not in males or combined. In DecodeME this locus was genome-wide significant in females and combined, with p ≈ 0.01 in males.

They also suggest that this locus might be associated with something other than CA10, something called snoZ178.

Edit: Is that snoZ178 thing in the pain paper an error? I don't really know what it is or if I'm understanding what I'm looking at, but on the database page for it, it looks like it's only been identified in rice.
 
I think the wrong paper is cited for the matching locus with multisite chronic pain:
My apologies. This is indeed the wrong citation. We'll fix for the next version.
 
They also suggest that this locus might be associated with something other than CA10, something called snoZ178.

Edit: Is that snoZ178 thing in the pain paper an error? I don't really know what it is or if I'm understanding what I'm looking at, but on the database page for it, it looks like it's only been identified in rice.
Oh, snoZ178 actually is/was a gene in humans that looks like it's closer to the DecodeME locus than CA10: LocusZoom

But the Ensembl website says it was retired, which I think might mean that it was predicted to be there, but then that turned out not to be the case. So I'm guessing it's not important.
 
The website just says it was reassigned to "ENSG00000252109.1" (everything after a "." in Ensembl IDs being a version identifier), and the old ID was retired. I think it's a real gene; it's just a small non-coding RNA that hasn't been functionally characterized. snoRNAs are known to have regulatory functions on gene transcription/translation. It's sometimes possible to predict which genes a snoRNA regulates because snoRNAs have a region that complements the target RNA (although many do not have a known match and might exert regulatory effects without sequence complementarity).

If this SNP was one of the ones with eQTL data, it might be that a mutation in this snoRNA is known to affect levels of CA10. But it's also possible that snoZ178 affects more genes beyond CA10, in which case the link to CA10 is more of a guess than anything.
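
In case the versioned ID format is unfamiliar, here's a small Python sketch of splitting an Ensembl ID into its stable part and its version suffix. `split_ensembl_id` is a made-up helper name, not part of any Ensembl tool, and it assumes a well-formed ID with at most one "." followed by an integer.

```python
def split_ensembl_id(ensembl_id):
    """Split an Ensembl ID into (stable_id, version).

    The version is returned as an int, or None when the ID has no "." suffix.
    Assumes a well-formed ID; malformed suffixes are not handled here.
    """
    stable_id, separator, version = ensembl_id.partition(".")
    return stable_id, (int(version) if separator else None)


# The versioned ID mentioned above, and the same ID without a version.
print(split_ensembl_id("ENSG00000252109.1"))
print(split_ensembl_id("ENSG00000252109"))
```

When searching different Ensembl releases it's usually the stable part that you want to look up, since the version number can differ between releases.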
 
The website just says it was reassigned to "ENSG00000252109.1" (everything after a "." in Ensembl IDs being a version identifier), and the old ID was retired.
Oh, when you click on that reassigned identifier, it goes to a page on an "Archive" Ensembl website, which I assumed was like an archive for genes that are no longer active. I couldn't find that new identifier on the regular Ensembl.
 
Wasn't kept in GRCh38, yes. Unfortunately, Ensembl doesn't really have detailed annotation for why certain genes get dropped in the newest release. Sometimes it's because the gene mapping is suspect, sometimes it's for some other logistical reason. I think that happens a lot to snoRNAs and miRNAs in particular, just because of the sheer number of them. But that was the reasoning for creating the Archive: the current version is curated with the best intentions, but shouldn't be considered the be-all and end-all.
 