Genetics: Chromosome 17 CA10

I've been wondering: if CA10 is involved in ME/CFS (or pain disorders), could it (or whatever mechanistic process the CA10 gene finding represents) be directly modulated with drugs to stop whatever signals are causing PEM, pain, etc.? Or is it more of a pointer to the general pathology?
 
Inspired by some AlphaGenome stuff I’ve been learning about, I’m interested in looking at promoters and enhancers for some of these candidate genes.

Enhancers and promoters are gene-regulatory elements. They are stretches of DNA that help in both eukaryotic and prokaryotic transcription. The promoters are known to initiate transcription, and the enhancers increase the level of transcription.
Source

If you look at the GeneCards page for CA10, the genomics section has GeneHancer info on these. There’s lots of interesting stuff in there.

For instance pick the top one, a promoter/enhancer with a high score (it has a little star by it to show this too)
Expand the info and see the location of it is chr17:52158438-52159092
You can zoom in to that location on the DecodeME LocusZoom data and see lots of the hits tie in with this area
The Ensembl info has which tissues these are active in too
Looking at details of all of them a number match with the changes on DecodeME
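
The manual check described above (does a hit fall inside a GeneHancer element?) can be sketched in a few lines. This is only a rough illustration: the region string is the one quoted above, but the two SNP positions and their names are made up for the example, and the helper names are my own.

```python
import re

def parse_region(text):
    """Parse a region string such as 'chr17:52158438-52159092'."""
    m = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", text.replace(",", ""))
    if m is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return chrom, start, end

def in_region(chrom, pos, region):
    """True if a 1-based position lies inside the region (ends inclusive)."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end

# Region quoted from the GeneHancer entry above.
element = parse_region("chr17:52158438-52159092")

# Hypothetical positions, NOT real DecodeME hits.
example_hits = {
    "example_inside": ("chr17", 52158700),
    "example_outside": ("chr17", 52200000),
}
overlaps = {name: in_region(c, p, element) for name, (c, p) in example_hits.items()}
print(overlaps)
```

In practice the positions would come from the summary statistics rather than being typed in by hand.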

If I’m interpreting this right it says the changes people with ME/CFS are more likely to have are in these promoters and enhancers in those tissues? So affect expression of CA10 in those tissues? Is that right?

Sorry if this is covering known territory, it’s all new to me. Given the location, at the start of the transcription site, it may be obvious to people that these are promoters?

I wonder if it’s worth digging through all the GeneHancer info for all the genes, especially as some enhancers can be a long way away from the transcription site for the gene.

The EPDnew info will look familiar to anyone who has been looking at AlphaGenome outputs too… I have lots more to learn here.
 
@hotblack you might like to explore some of the tracks on UCSC genome browser--it compiles a lot of this information visually so you can do some exploring.

Here I have it centered on the top DecodeME SNP in the region:

You can "highlight" a region by dragging a box over a region and clicking "add highlight" on the box that comes up, that way you can keep track of where a SNP overlaps with features on other tracks

These tracks have a lot of promoter/enhancer/regulatory region info you can play around with (click "show" and then "refresh" to add them).

The GENCODE track under "Genes and Gene Predictions" will mark known genes with their exons and introns

If you want to highlight a lot of SNPs at once there are ways you can add a custom track in a BED file format, though it'll take more effort to figure out
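
A minimal sketch of building such a custom track, assuming you have a list of (chromosome, 1-based position, name) tuples. BED uses 0-based, half-open coordinates, so a single-base SNP at position p becomes start p-1, end p. The positions and names below are placeholders, not real SNPs, and the function name is my own.

```python
def snps_to_bed(snps, track_name="my_snps"):
    """Return BED text (with a UCSC track line) for single-base SNPs.

    snps: iterable of (chrom, one_based_position, name) tuples.
    """
    lines = [f'track name="{track_name}" description="SNPs of interest" visibility=pack']
    for chrom, pos, name in sorted(snps, key=lambda s: (s[0], s[1])):
        # BED start is 0-based and the end is exclusive.
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{name}")
    return "\n".join(lines) + "\n"

# Placeholder entries for illustration only.
example_snps = [
    ("chr17", 52200000, "snp_b"),
    ("chr17", 52158700, "snp_a"),
]
bed_text = snps_to_bed(example_snps)
print(bed_text)
```

The resulting text can be pasted into the browser's custom track box or saved as a .bed file and uploaded.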
 
For instance pick the top one, a promoter/enhancer with a high score (it has a little star by it to show this too)
Expand the info and see the location of it is chr17:52158438-52159092
You can zoom in to that location on the DecodeME LocusZoom data and see lots of the hits tie in with this area

This is more or less exactly what my genetics friends at UCL did when they gave me a presentation of why they thought it was worth picking CA10 for a basic biology project.
 
If you start zoomed in on a SNP, you can also hit the "highlight" button down here which will highlight everything in your viewer
Thanks for the tip!
This is more or less exactly what my genetics friends at UCL did when they gave me a presentation of why they thought it was worth picking CA10 for a basic biology project.
Good to know I’m not talking nonsense! And nice to start to understand it a bit more.
 
I made a script to pull out some data to make it easy to compare with LocusZoom, then looked at the top 15 elements and manually deleted those which weren’t clearly/very significant hits. What remains seems to be mainly the promoter locations.

Gene: CA10 - carbonic anhydrase 10
Location: chr17:51630313-52160017

Found 33 GeneHancer elements:
[GH17J052158] *Promoter/Enhancer | Score: 237 | chr17:52158674-52158830
Sources: ENCODE(Z-Lab),EPDnew
External: ENSR17_9QNZ6, ENSR17_9QNZ9, ENSR17_5G98JK, CA10_2
TFBSs: EZH2, ASH2L, RNF2, RBBP5, ZFX, CTCF, RAD21, SMC3, KDM5A, ZEB1, TCF12, ZNF263, EGR1, REST

[GH17J052146] *Enhancer | Score: 192 | chr17:52146767-52150085
Sources: ENCODE(Z-Lab),FANTOM5
External: ENSR17_9QNT2, ENSR17_9QNT6, ENSR17_83JDPW
TFBSs: ATF7, RUNX3, EP300, SPI1, FOS, FOS, GABPA, POLR2A, MAX, ZBTB33, YY1, REST, SP1, RXRA, MYC, STAT3, HNF4A, JUND, FOXA2, FOXA1, ATF3, FOS, EGR1

[GH17J052160] Promoter | Score: 149 | chr17:52158121-52158181
Sources: EPDnew
External: CA10_1
TFBSs: EZH2, SUZ12, ASH2L, MXI1, RNF2, GATA2, RBBP5

[GH17J052159] Promoter | Score: 94 | chr17:52159997-52160057
Sources: EPDnew
External: CA10_3
TFBSs: EZH2, SUZ12

The scores and stars (a star marks an ‘Elite’ element) are GeneHancer info. Links to LocusZoom and to the EPDnew or Ensembl info are included, and TFBSs are the Transcription Factor Binding Sites.
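
For anyone who wants to work with this output programmatically, here is a rough sketch of parsing the element header lines back into records. It assumes the layout is exactly as in the excerpt above ("[ID] *Type | Score: N | chrom:start-end"); the function names and the query position are my own and purely illustrative.

```python
import re

# Header layout assumed from the excerpt above.
HEADER = re.compile(
    r"\[(?P<id>GH\w+)\]\s+(?P<elite>\*?)(?P<kind>[\w/]+)\s+\|\s+Score:\s+(?P<score>\d+)"
    r"\s+\|\s+(?P<chrom>chr\w+):(?P<start>\d+)-(?P<end>\d+)"
)

def parse_elements(report):
    """Collect one dict per element header line; other lines are ignored."""
    elements = []
    for line in report.splitlines():
        m = HEADER.match(line.strip())
        if m:
            elements.append({
                "id": m["id"],
                "kind": m["kind"],
                "elite": m["elite"] == "*",
                "score": int(m["score"]),
                "chrom": m["chrom"],
                "start": int(m["start"]),
                "end": int(m["end"]),
            })
    return elements

def elements_containing(elements, chrom, pos):
    """IDs of elements whose interval contains a 1-based position."""
    return [e["id"] for e in elements
            if e["chrom"] == chrom and e["start"] <= pos <= e["end"]]

REPORT = """\
[GH17J052158] *Promoter/Enhancer | Score: 237 | chr17:52158674-52158830
Sources: ENCODE(Z-Lab),EPDnew
[GH17J052146] *Enhancer | Score: 192 | chr17:52146767-52150085
[GH17J052160] Promoter | Score: 149 | chr17:52158121-52158181
[GH17J052159] Promoter | Score: 94 | chr17:52159997-52160057
"""

elements = parse_elements(REPORT)
elite_ids = [e["id"] for e in elements if e["elite"]]
# 52158700 is an arbitrary query position chosen for the example.
print(elite_ids, elements_containing(elements, "chr17", 52158700))
```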

Does anyone know what the logic is for the circles/triangle and colouring on LocusZoom? The latter seems to change dynamically as you move around so maybe it’s just highlighting most significant in the current view?
 
Wasn't kept in GRCh38, yes--unfortunately Ensembl doesn't really have detailed annotation for why certain genes get dropped in the newest release. Sometimes it's because the gene mapping is suspect, sometimes it's for some other logistical reason. I think that happens a lot to snoRNAs and miRNAs in particular just because of the sheer number of them. But that was the reasoning for creating the Archive--the current version is curated with the best intentions, but shouldn't be considered the end all be all.
Ensembl pipelines are very complex and it would be impossible to provide explanations for all changes between the releases. Sometimes it's possible to figure out or speculate the reasons by looking into intermediate outputs that people who ran the pipelines have but even that is quite challenging. The sheer amount of data and analyses that go into each release... It takes 3-4 or more months and something like 80-100 people working full-time to get a release out. The documentation on the website could be better, though, but it still wouldn't be enough to determine the exact reason behind each change.

I see snoZ178 was last present in Ensembl release 75 on a lower quality assembly which was 12 years ago. Assemblies, in this case GRCh37 and GRCh38, are imported into Ensembl, so if the sequence annotated as snoZ178 on GRCh37 was not present in the better quality assembly (GRCh38), it wouldn't have been annotated. If it is in GRCh38, it's possible that it wasn't predicted due to not passing some threshold somewhere or due to changes in the annotation pipeline. If we think snoZ178 or anything missing from Ensembl might be important, it's possible to contact Ensembl and the relevant team will hopefully have a look at it. There is a team which does manual annotation on the human genome.


I googled "snoZ178" and one of the results was https://humanpaingeneticsdb.ca/ where I found this:

[
{
"Loci ": "CA10; snoZ178",
"Publication Loci ": "CA10; snoZ178",
"Variants ": "rs11079993",
"allele 1 ": "G",
"allele 2 ": "T",
"direction ": "down",
"Phenotype ": "Other Clinical Pain",
"PMID ": "PMID:33830993",
"comments ": "Significantly associated with multisite chronic pain in female"
},
{
"Loci ": "snoZ178",
"Publication Loci ": "snoZ178",
"Variants ": "rs967823",
"allele 1 ": "G",
"allele 2 ": "A",
"direction ": "no direction reported",
"Phenotype ": "Pain",
"PMID ": "PMID:37844115",
"comments ": "Significantly associated with pain"
}
]
The variants are in the current Ensembl release: rs11079993, rs967823. When I clicked on "Phenotype data", an additional piece of info (after a table of associations) said "This variant has not been mapped to any Ensembl genes." for both of them.


I found snoZ178 in the current release of Ensembl Plants in Oryza meridionalis and Oryza longistaminata but the annotation is based on Rfam and a different team is in charge of plants annotation.

I don't know if any of this is relevant any more and I see the conversation has moved on, so apologies if irrelevant.
 
Wasn’t sure where to put this, but here’s the info from my script (mentioned above) for all the DecodeME candidate genes, ready to paste into posts and check. I’ll work through seeing if there’s anything interesting but if others want to join in too, the more the merrier.

There’s the same data in csv format and some of the raw track data from the APIs too, but the txt files are the formatted ones with nice BBCode URLs.
 


Sorry this may be going off topic and getting fragmented, but I’ve been through all of these files to check for mentions of other genes in original candidate gene list to answer the question:

Which genes in the DecodeME candidate gene list are mentioned in potential binding sites for other genes in the candidate list?

I made a script to do this after manually checking a few and finding lots of mentions of SOX6. It looks like that one does stand out:

ABT1: found in 2 other gene files: HMGN4, ZNF322
ANKRD45: found in 1 other gene files: KLHL20
BTN3A3: found in 1 other gene files: BTN2A2
CCDC92: found in 2 other gene files: DNAH10, ZNF664
CSE1L: found in 1 other gene files: ARFGEF2
DARS2: found in 2 other gene files: KLHL20, ZBTB37
DDX27: found in 2 other gene files: STAU1, ZNFX1
KLHL20: found in 2 other gene files: ANKRD45, SLC9C2
PEBP1: found in 2 other gene files: TAOK3, VSIG10
PRDX6: found in 2 other gene files: SLC9C2, TNFSF4
SERPINC1: found in 2 other gene files: RC3H1, ZBTB37
SLC9C2: found in 1 other gene files: ANKRD45
SOX6: found in 48 other gene files: ABT1, ANKRD45, ARFGEF2, B4GALT5, BTN2A2, BTN3A3, CCDC92, CCPG1, CDK5RAP1, CSE1L, DARS2, DDX27, DNAH10, DNAJC1, ECI2, FBXL4, H4C8, HFE, HMGN4, HTT, KLHL20, LRRC7, MLLT10, MMS22L, MRPL39, PEBP1, PLCL1, POU3F2, PRDX6, PTGIS, RABGAP1L, RC3H1, SERPINC1, SLC2A14, SLC9C2, STAU1, SUDS3, TAOK3, TNFSF4, TRIM38, VRK2, VSIG10, ZBTB37, ZNF311, ZNF322, ZNF644, ZNF664, ZNFX1
STAU1: found in 2 other gene files: DDX27, ZNFX1
SUDS3: found in 1 other gene files: TAOK3
TAOK3: found in 1 other gene files: SUDS3
TNFSF4: found in 2 other gene files: PRDX6, SLC9C2
VSIG10: found in 2 other gene files: PEBP1, TAOK3
ZNF664: found in 2 other gene files: CCDC92, DNAH10
ZNFX1: found in 1 other gene files: DDX27

TFBS matches only:

SOX6: found in 48 other gene files: ABT1, ANKRD45, ARFGEF2, B4GALT5, BTN2A2, BTN3A3, CCDC92, CCPG1, CDK5RAP1, CSE1L, DARS2, DDX27, DNAH10, DNAJC1, ECI2, FBXL4, H4C8, HFE, HMGN4, HTT, KLHL20, LRRC7, MLLT10, MMS22L, MRPL39, PEBP1, PLCL1, POU3F2, PRDX6, PTGIS, RABGAP1L, RC3H1, SERPINC1, SLC2A14, SLC9C2, STAU1, SUDS3, TAOK3, TNFSF4, TRIM38, VRK2, VSIG10, ZBTB37, ZNF311, ZNF322, ZNF644, ZNF664, ZNFX1

So now to manually check if any or how many of these locations for promoters/enhancers and matching TFBSs show up on LocusZoom… this may take some time

Edit: I think most of those matches are textually correct but contextually wrong. They weren’t TFBSs but other mentions of genes in the data; sometimes grep isn’t the right tool for the job! I’ve updated the results and only SOX6 seems relevant, though there’s still a lot of checking to do. I may stop now or rope in some help. The report with LocusZoom links and a webpage is attached here and also on the SOX6 thread, as that now seems the most appropriate place. Sorry for the cross-posting, moderators; I wasn’t sure where any of this was going as I have been exploring…
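
To illustrate the grep problem, here is a small sketch that only counts a gene as a match when it appears as a whole name on a "TFBSs:" line, compared with a plain substring search. It assumes the per-gene files look like the CA10 excerpt posted earlier in the thread; the function names are mine and this is not the actual script.

```python
def tfbs_names(report_text):
    """Set of names listed on 'TFBSs:' lines, split on commas."""
    names = set()
    for line in report_text.splitlines():
        line = line.strip()
        if line.startswith("TFBSs:"):
            entries = line[len("TFBSs:"):].split(",")
            names.update(e.strip() for e in entries if e.strip())
    return names

def naive_mention(report_text, gene):
    """What a plain grep-style substring search would report."""
    return gene in report_text

# Excerpt copied from the CA10 output earlier in the thread.
CA10_REPORT = """\
[GH17J052158] *Promoter/Enhancer | Score: 237 | chr17:52158674-52158830
Sources: ENCODE(Z-Lab),EPDnew
External: ENSR17_9QNZ6, ENSR17_9QNZ9, ENSR17_5G98JK, CA10_2
TFBSs: EZH2, ASH2L, RNF2, RBBP5, ZFX, CTCF, RAD21, SMC3, KDM5A, ZEB1, TCF12, ZNF263, EGR1, REST
"""

names = tfbs_names(CA10_REPORT)
for gene in ("EZH2", "CA10", "ZNF26"):
    print(gene, "substring:", naive_mention(CA10_REPORT, gene), "TFBS:", gene in names)
```

"CA10" is found by the substring search (via the "CA10_2" external ID) and "ZNF26" via "ZNF263", but neither is a listed TFBS, which is the kind of false positive described in the edit above.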
 


I searched the top SNP (rs34626694) at the CA10 locus with the GWAS Atlas PheWAS search. The trait most significantly associated with this SNP is "Ease of getting up in the morning", which would make sense as being related to ME/CFS.

This trait was tested in the following study, and was based on UK BioBank data (n = 385,949): Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways (2019, Nature Genetics)

I downloaded the summary stats to see how well they match up with DecodeME. Plotted together after liftover of the "getting up" data to GRCh38, the two seem fairly similar:

I also tried to test for a shared variant with the coloc software. I got a 94.6% posterior probability of the traits sharing a causal variant.

I think I'm using it right, but am still not 100% sure, so would appreciate any experts weighing in. Here's the code I used for colocalization:
Code:
library(coloc)
library(dplyr)
library(data.table)

region <- list(chr = 17, start_pos = 52147538, end_pos = 52264626) # CA10 locus

decodeme_region <- fread("../../Data/gwas_1.filtered.gz",
                         select = c("CHROM", "GENPOS", "ID", "ALLELE0", "ALLELE1",
                                    "A1FREQ", "BETA", "SE", "LOG10P")
) %>%
  filter(CHROM == region$chr, GENPOS >= region$start_pos, GENPOS <= region$end_pos) %>%
  mutate(
    MAF = pmin(A1FREQ, 1 - A1FREQ)
  )

gettingup_region <- fread(
  "~/Projects/science/diseases/sleep/Jansen 2019/Ease of getting up/Data/Jansen_2019_Gettingup_GRCh38_liftover.tsv.gz",
  select = c("SNP", "CHR", "BP", "A1", "A2",
             "MAF", "OR", "SE", "P")
) %>%
  filter(CHR == region$chr, BP >= region$start_pos, BP <= region$end_pos) %>%
  rename(BETA = OR) #  OR column appears to actually be BETA, as some values are negative

merged <- inner_join(
  decodeme_region %>%
    select(GENPOS, ALLELE0, ALLELE1, BETA, SE, MAF),
  gettingup_region %>%
    select(SNP, BP, A1, A2, BETA, SE, MAF),
  by = join_by(GENPOS == BP),
  suffix = c("_decodeme", "_gettingup")
) %>%
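  mutate(
    # Alleles line up directly between the two datasets
    alleles_match = (ALLELE0 == A2 &
                       ALLELE1 == A1),

    # Alleles are swapped, so the sign of the "getting up" beta must be flipped
    alleles_flipped = (ALLELE0 == A1 &
                         ALLELE1 == A2),

    # Anything else (e.g. a different variant at the same position) is dropped below;
    # strand flips are not handled here
    alleles_ok = alleles_match | alleles_flipped,

    BETA_gettingup = if_else(alleles_flipped,
                             -BETA_gettingup,
                             BETA_gettingup),
  ) %>%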
  filter(alleles_ok) %>%
  select(SNP, GENPOS,
         BETA_decodeme,
         BETA_gettingup,
         MAF_decodeme,
         MAF_gettingup,
         SE_decodeme,
         SE_gettingup,
  )


dataset1 <- list(
  beta = merged$BETA_decodeme,
  varbeta = merged$SE_decodeme^2,
  snp = merged$SNP,
  position = merged$GENPOS,
  type = "cc"
)

dataset2 <- list(
  beta = merged$BETA_gettingup,
  varbeta = merged$SE_gettingup^2,
  snp = merged$SNP,
  position = merged$GENPOS,
  type = "quant",
  N = 384689,
  MAF = merged$MAF_gettingup
)

check_dataset(dataset1)
check_dataset(dataset2)

plot_dataset(dataset1, main = "DecodeME")
plot_dataset(dataset2, main = "Ease of Getting up")
plot_datasets(dataset1, dataset2)

coloc_result <- coloc.abf(
  dataset1 = dataset1,
  dataset2 = dataset2
)

print(format(round(coloc_result$summary, 3), scientific = FALSE))

head(coloc_result$results[order(coloc_result$results$SNP.PP.H4, decreasing = TRUE), ])

So it seems there is a shared variant associated with ME/CFS, multisite chronic pain, and ease of getting up in the morning.
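
For readers wondering what the colocalization test is doing with the betas and variances, here is a simplified sketch of the approximate Bayes factor calculation as I understand it. It is not the coloc package's code. The prior standard deviations (0.2 for case-control, 0.15 for a quantitative trait with unit variance) and the prior probabilities p1, p2, p12 are my understanding of the package defaults, and all the effect sizes are toy values made up for the example.

```python
import math

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_diff_exp(a, b):
    """log(exp(a) - exp(b)) for a >= b; -inf if equal to float precision."""
    ratio = math.exp(b - a)
    if ratio >= 1.0:
        return -math.inf
    return a + math.log1p(-ratio)

def log_abf(beta, se, prior_sd):
    """Wakefield-style log approximate Bayes factor for one SNP."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4]; needs at least 2 SNPs."""
    both = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum_exp(labf1), log_sum_exp(labf2), log_sum_exp(both)
    log_h = [
        0.0,                                                      # H0: no association
        math.log(p1) + s1,                                        # H1: trait 1 only
        math.log(p2) + s2,                                        # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12), # H3: two different variants
        math.log(p12) + s12,                                      # H4: one shared variant
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]

def toy_labfs(peak_index, prior_sd, n_snps=5, beta=0.08, se=0.01):
    """Toy region: one SNP with a made-up effect, the rest with zero effect."""
    return [log_abf(beta if i == peak_index else 0.0, se, prior_sd) for i in range(n_snps)]

# Same peak SNP in both traits vs. different peak SNPs (toy data only).
same = coloc_posteriors(toy_labfs(2, 0.2), toy_labfs(2, 0.15))
different = coloc_posteriors(toy_labfs(1, 0.2), toy_labfs(3, 0.15))
print([round(p, 3) for p in same])
print([round(p, 3) for p in different])
```

With the toy inputs, nearly all the posterior weight lands on H4 when the two traits peak at the same SNP and on H3 when they peak at different SNPs, which is the distinction the 94.6% figure above is about.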

* I downloaded the summary stats for "Getting up in the morning" trait from here. (Look for link under the title of the above paper.)
 
Jonathan found another study mentioning this gene:
There seems to be another more recent study of 'coathanger pain' again using the Biobank that came up with CA10:


Yiwen Tao, Qi Pan, Tengda Cai et al. A genome-wide association study identifies novel genetic variants associated with neck or shoulder pain in the UK biobank (N = 430,193)

Pain Rep 2025 Apr 18;10(3):e1267.
doi: 10.1097/PR9.0000000000001267. eCollection 2025 Jun.
I haven't tried downloading it to compare directly, but here is the paper's plot of the locus:

Here is that shoulder/neck pain lead variant (rs9889282) highlighted in the DecodeME locus:
So I'm not sure, but maybe the variant is shared with this trait as well.

Edit: Fixed a rounding issue with the x-axis positions in the second plot, where multiple spots were labeled 52.2.
 
Here are the top 20 most significant traits associated with the lead DecodeME SNP at the CA10 locus (rs34626694) on GWAS Atlas. Reaction time is also fairly significant.
Study | Trait | P-value | Sample size
30804565 | Ease of getting up in the morning | 2.01E-09 | 385949
31427789 | Getting up in morning | 3.47E-09 | 385494
31427789 | Overall health rating | 9.40E-09 | 384850
BioRxiv: https://doi.org/10.1101/261081 | Ever smoker | 3.14E-08 | 518633
29844566 | Reaction time | 6.97E-08 | 330069
31427789 | Time spent watching television (TV) | 1.86E-07 | 365236
30696823 | Chronotype | 5.90E-07 | 449732
27864402 | Self-rated health | 8.16E-07 | 111749
31427789 | Pain type(s) experienced in last month: Neck or shoulder pain | 1.609E-06 | 385698
31427789 | Reaction time test - Mean time to correctly identify matches | 1.731E-06 | 383748
31427789 | Number of self-reported non-cancer illnesses | 1.853E-06 | 386581
30804565 | Morningness | 2.321E-06 | 345552
31427789 | Medication for pain relief, constipation, heartburn: Paracetamol | 2.41E-06 | 382089
31427789 | Number of treatments/medications taken | 2.792E-06 | 386581
31427789 | Morning/evening person (chronotype) | 2.917E-06 | 345148
31427789 | Frequency of tiredness / lethargy in last 2 weeks | 2.977E-06 | 375053
30643251 | Ever smoked regularly | 5.72E-06 | 262990
29899525 | Strenuous sports or other exercises | 1E-05 | 350492
31427789 | Wheeze or whistling in the chest in last year | 1.305E-05 | 379150
31427789 | Types of physical activity in last 4 weeks: Other exercises (eg: swimming, cycling, keep fit, bowling) | 1.679E-05 | 384450
 