Genetics: Chromosome 17 CA10

hotblack · Nov 12, 2025

That’s very positive news! Thanks for doing this @Jonathan Edwards
And good probing questions @voner

Jonathan Edwards · Nov 12, 2025

voner said:
what did you learn that can be shared with us?

That some other people think this is really interesting!

Kitty · Nov 12, 2025

Jonathan Edwards said:
That some other people think this is really interesting!

Which is really good news!

V.R.T. · Jan 9, 2026

I've been wondering: if CA10 is involved in ME/CFS (or pain disorders), could it (or whatever mechanistic process the CA10 gene finding represents) be directly modulated with drugs in order to stop whatever signals are causing PEM/pain etc.? Or is it more of a pointer to the general pathology?

Jonathan Edwards · Jan 9, 2026

V.R.T. said:
Or is it more of a pointer to the general pathology

I suspect that.

hotblack · Feb 11, 2026

Inspired by some AlphaGenome stuff I’ve been learning more about, I’m interested in looking at promoters and enhancers for some of these candidate genes.

Enhancers and promoters are gene-regulatory elements. They are stretches of DNA that help in both eukaryotic and prokaryotic transcription. The promoters are known to initiate transcription, and the enhancers increase the level of transcription

Source

If you look at the genecard for CA10 in the genomics section you can see GeneHancer info on these. There’s lots of interesting stuff in there.

For instance pick the top one, a promoter/enhancer with a high score (it has a little star by it to show this too)
Expand the info and see the location of it is chr17:52158438-52159092
You can zoom in to that location on the DecodeME LocusZoom data and see lots of the hits tie in with this area
The Ensembl info has which tissues these are active in too
Looking at details of all of them a number match with the changes on DecodeME
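
The check described in the steps above (does a GWAS hit fall inside a regulatory element?) can be sketched in a few lines of Python. This is only an illustration: the interval is the one quoted above, but the SNP names and positions are made-up placeholders rather than DecodeME results, and `parse_region` and `snps_in_region` are hypothetical helper names.

```python
import re

def parse_region(region):
    """Split a 'chr17:52158438-52159092' style string into (chrom, start, end)."""
    match = re.fullmatch(r"(chr[0-9A-Za-z]+):(\d+)-(\d+)", region.replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)

def snps_in_region(snps, region):
    """Return the names of SNPs whose position lies inside the region (inclusive)."""
    chrom, start, end = parse_region(region)
    return [name for name, (snp_chrom, pos) in snps.items()
            if snp_chrom == chrom and start <= pos <= end]

# Interval quoted in the post for the top CA10 promoter/enhancer.
element = "chr17:52158438-52159092"

# Placeholder SNPs: names and positions are invented for illustration only.
example_snps = {
    "snp_inside": ("chr17", 52158700),
    "snp_outside": ("chr17", 52170000),
    "snp_other_chrom": ("chr1", 52158700),
}

hits = snps_in_region(example_snps, element)
print(hits)
```

An overlap like this only says a variant sits in an annotated element; it does not by itself show that the variant changes expression.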

If I’m interpreting this right it says the changes people with ME/CFS are more likely to have are in these promoters and enhancers in those tissues? So affect expression of CA10 in those tissues? Is that right?

Sorry if this is covering known territory, it’s all new to me. Given the location, at the start of the transcription site, it may be obvious to people that these are promoters?

I wonder if it’s worth digging through all the GeneHancer info for all the genes, especially as some enhancers can be a long way away from the transcription site for the gene.

The EPDnew info will look familiar to anyone who has been looking at AlphaGenome outputs too… I have lots more to learn here.

hotblack · Feb 11, 2026

Short version: given the location of the variations seen in the DecodeME data, it looks more likely that in people with ME/CFS the CA10 protein is being produced in the wrong amounts in certain tissues

jnmaciuch · Feb 11, 2026

@hotblack you might like to explore some of the tracks on UCSC genome browser--it compiles a lot of this information visually so you can do some exploring.

Here I have it centered on the top DecodeME SNP in the region:

https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr17%3A52127451%2D52238561&hgsid=3676089975_BEkeA67B2JkKWyS6cMkL3z6tsVek

You can "highlight" a region by dragging a box over a region and clicking "add highlight" on the box that comes up, that way you can keep track of where a SNP overlaps with features on other tracks

These tracks have a lot of promoter/enhancer/regulatory region info you can play around with (click "show" and then "refresh" to add them).

The GENCODE track under "Genes and Gene Predictions" will mark known genes with their exons and introns

If you want to highlight a lot of SNPs at once there are ways you can add a custom track in a BED file format, though it'll take more effort to figure out

Genome Browser FAQ
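
For the custom track idea above, here is a minimal sketch of writing SNP positions out as BED text. It assumes the usual BED convention of a 0-based start and an exclusive end, so a SNP at 1-based position p becomes the interval p-1 to p. The SNP names and positions are invented placeholders, and `snps_to_bed` is a hypothetical helper, not part of any UCSC tool.

```python
def snps_to_bed(snps, track_name="example_snps"):
    """Build BED text (with a UCSC-style track line) from (name, chrom, position) tuples.

    Positions are taken to be 1-based, as in most GWAS summary statistics.
    """
    lines = [f'track name="{track_name}" description="Example SNP positions"']
    for name, chrom, pos in sorted(snps, key=lambda snp: (snp[1], snp[2])):
        # BED columns: chrom, chromStart (0-based), chromEnd (exclusive), name
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{name}")
    return "\n".join(lines) + "\n"

# Placeholder SNPs, not real DecodeME hits.
example = [
    ("example_snp_2", "chr17", 52200000),
    ("example_snp_1", "chr17", 52158700),
]

bed_text = snps_to_bed(example)
print(bed_text)
```

The resulting text can be pasted into the browser's custom track form or saved to a file and uploaded.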

hotblack · Feb 11, 2026

jnmaciuch said:
@hotblack you might like to explore some of the tracks on UCSC genome browser--it compiles a lot of this information visually so you can do some exploring.

Thanks! The highlighting doesn’t seem very touchscreen friendly unfortunately. But the rest seems to work, I’ll have an explore.

jnmaciuch · Feb 11, 2026

hotblack said:
Thanks! The highlighting doesn’t seem very touchscreen friendly unfortunately. But the rest seems to work, I’ll have an explore.

If you start zoomed in on a SNP, you can also hit the "highlight" button down here which will highlight everything in your viewer

Jonathan Edwards · Feb 11, 2026

hotblack said:
For instance pick the top one, a promoter/enhancer with a high score (it has a little star by it to show this too)
Expand the info and see the location of it is chr17:52158438-52159092
You can zoom in to that location on the DecodeME LocusZoom data and see lots of the hits tie in with this area

This is more or less exactly what my genetics friends at UCL did when they gave me a presentation of why they thought it was worth picking CA10 for a basic biology project.

hotblack · Feb 12, 2026

jnmaciuch said:
If you start zoomed in on a SNP, you can also hit the "highlight" button down here which will highlight everything in your viewer

Thanks for the tip!

Jonathan Edwards said:
This is more or less exactly what my genetics friends at UCL did when they gave me a presentation of why they thought it was worth picking CA10 for a basic biology project.

Good to know I’m not talking nonsense! And nice to start to understand it a bit more.

hotblack · Feb 12, 2026

I made a script to pull out some data to make it easy to compare with LocusZoom, then looked at the top 15 and manually went through deleting those which weren’t clearly/very significant hits. It seems to be mainly the promoter locations.

Gene: CA10 - carbonic anhydrase 10
Location: chr17:51630313-52160017

Found 33 GeneHancer elements:
[GH17J052158] *Promoter/Enhancer | Score: 237 | chr17:52158674-52158830
Sources: ENCODE(Z-Lab),EPDnew
External: ENSR17_9QNZ6, ENSR17_9QNZ9, ENSR17_5G98JK, CA10_2
TFBSs: EZH2, ASH2L, RNF2, RBBP5, ZFX, CTCF, RAD21, SMC3, KDM5A, ZEB1, TCF12, ZNF263, EGR1, REST

[GH17J052146] *Enhancer | Score: 192 | chr17:52146767-52150085
Sources: ENCODE(Z-Lab),FANTOM5
External: ENSR17_9QNT2, ENSR17_9QNT6, ENSR17_83JDPW
TFBSs: ATF7, RUNX3, EP300, SPI1, FOS, FOS, GABPA, POLR2A, MAX, ZBTB33, YY1, REST, SP1, RXRA, MYC, STAT3, HNF4A, JUND, FOXA2, FOXA1, ATF3, FOS, EGR1

[GH17J052160] Promoter | Score: 149 | chr17:52158121-52158181
Sources: EPDnew
External: CA10_1
TFBSs: EZH2, SUZ12, ASH2L, MXI1, RNF2, GATA2, RBBP5

[GH17J052159] Promoter | Score: 94 | chr17:52159997-52160057
Sources: EPDnew
External: CA10_3
TFBSs: EZH2, SUZ12

The scores and stars (a star means an ‘Elite’ element) are GeneHancer info. Links to LocusZoom and to the EPD or Ensembl info are included, and TFBSs are the Transcription Factor Binding Sites.
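
For anyone wanting to work with the listing above programmatically, here is a rough Python sketch that turns that text layout back into records. It is not the script that produced the output; the regular expression and the `parse_elements` name are assumptions based only on the lines shown, using two of the CA10 entries as sample input.

```python
import re

# Matches header lines such as:
# [GH17J052158] *Promoter/Enhancer | Score: 237 | chr17:52158674-52158830
HEADER = re.compile(
    r"\[(?P<id>GH\w+)\]\s+(?P<elite>\*?)(?P<kind>[^|]+?)\s*\|\s*"
    r"Score:\s*(?P<score>\d+)\s*\|\s*(?P<chrom>chr\w+):(?P<start>\d+)-(?P<end>\d+)"
)

def parse_elements(text):
    """Parse the listing into dicts; TFBSs lines attach to the preceding element."""
    elements = []
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            elements.append({
                "id": header["id"],
                "elite": header["elite"] == "*",
                "kind": header["kind"],
                "score": int(header["score"]),
                "chrom": header["chrom"],
                "start": int(header["start"]),
                "end": int(header["end"]),
                "tfbs": [],
            })
        elif line.startswith("TFBSs:") and elements:
            names = line[len("TFBSs:"):].split(",")
            elements[-1]["tfbs"] = [n.strip() for n in names if n.strip()]
    return elements

sample = """\
[GH17J052158] *Promoter/Enhancer | Score: 237 | chr17:52158674-52158830
Sources: ENCODE(Z-Lab),EPDnew
TFBSs: EZH2, ASH2L, RNF2, RBBP5, ZFX, CTCF
[GH17J052159] Promoter | Score: 94 | chr17:52159997-52160057
Sources: EPDnew
TFBSs: EZH2, SUZ12
"""

elements = parse_elements(sample)
elite_ids = [e["id"] for e in elements if e["elite"]]
print(elite_ids)
```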

Does anyone know what the logic is for the circles/triangle and colouring on LocusZoom? The latter seems to change dynamically as you move around so maybe it’s just highlighting most significant in the current view?

Felis Catus · Feb 12, 2026

jnmaciuch said:
Wasn't kept in GRCh38, yes--unfortunately Ensembl doesn't really have detailed annotation for why certain genes get dropped in the newest release. Sometimes it's because the gene mapping is suspect, sometimes it's for some other logistical reason. I think that happens a lot to snoRNAs and miRNAs in particular just because of the sheer number of them. But that was the reasoning for creating the Archive--the current version is curated with the best intentions, but shouldn't be considered the end all be all.

Ensembl pipelines are very complex and it would be impossible to provide explanations for all changes between the releases. Sometimes it's possible to figure out or speculate the reasons by looking into intermediate outputs that people who ran the pipelines have but even that is quite challenging. The sheer amount of data and analyses that go into each release... It takes 3-4 or more months and something like 80-100 people working full-time to get a release out. The documentation on the website could be better, though, but it still wouldn't be enough to determine the exact reason behind each change.

I see snoZ178 was last present in Ensembl release 75 on a lower quality assembly which was 12 years ago. Assemblies, in this case GRCh37 and GRCh38, are imported into Ensembl, so if the sequence annotated as snoZ178 on GRCh37 was not present in the better quality assembly (GRCh38), it wouldn't have been annotated. If it is in GRCh38, it's possible that it wasn't predicted due to not passing some threshold somewhere or due to changes in the annotation pipeline. If we think snoZ178 or anything missing from Ensembl might be important, it's possible to contact Ensembl and the relevant team will hopefully have a look at it. There is a team which does manual annotation on the human genome.

I googled "snoZ178" and one of the results was https://humanpaingeneticsdb.ca/ where I found this:

[
{
"Loci ": "CA10; snoZ178",
"Publication Loci ": "CA10; snoZ178",
"Variants ": "rs11079993",
"allele 1 ": "G",
"allele 2 ": "T",
"direction ": "down",
"Phenotype ": "Other Clinical Pain",
"PMID ": "PMID:33830993",
"comments ": "Significantly associated with multisite chronic pain in female"
},
{
"Loci ": "snoZ178",
"Publication Loci ": "snoZ178",
"Variants ": "rs967823",
"allele 1 ": "G",
"allele 2 ": "A",
"direction ": "no direction reported",
"Phenotype ": "Pain",
"PMID ": "PMID:37844115",
"comments ": "Significantly associated with pain"
}
]
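
One practical note if reusing that JSON: the keys carry a trailing space ("Loci " rather than "Loci"), so a plain lookup by the clean name fails. A small Python sketch of normalising them, using a trimmed copy of the two records above (`load_records` is a hypothetical helper name):

```python
import json

# Trimmed copy of the two records above, keeping the trailing spaces in the keys.
raw = """[
  {"Loci ": "CA10; snoZ178", "Variants ": "rs11079993", "Phenotype ": "Other Clinical Pain"},
  {"Loci ": "snoZ178", "Variants ": "rs967823", "Phenotype ": "Pain"}
]"""

def load_records(text):
    """Parse the JSON and strip stray whitespace from every key."""
    return [{key.strip(): value for key, value in record.items()}
            for record in json.loads(text)]

records = load_records(raw)
variants = [record["Variants"] for record in records]
print(variants)
```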

The variants are in the current Ensembl release: rs11079993, rs967823. When I clicked on "Phenotype data", an additional piece of info (after a table of associations) said "This variant has not been mapped to any Ensembl genes." for both of them.

I found snoZ178 in the current release of Ensembl Plants in Oryza meridionalis and Oryza longistaminata but the annotation is based on Rfam and a different team is in charge of plants annotation.

I don't know if any of this is relevant any more and I see the conversation has moved on, so apologies if irrelevant.

hotblack · Feb 13, 2026

Wasn’t sure where to put this, but here’s the info from my script (mentioned above) for all the DecodeME candidate genes, ready to paste into posts and check. I’ll work through seeing if there’s anything interesting but if others want to join in too, the more the merrier.

There’s the same data in csv format and some of the raw track data from the APIs too, but the txt files are the formatted ones with nice BBCode URLs.

hotblack · Feb 14, 2026

Sorry, this may be going off topic and getting fragmented, but I’ve been through all of these files to check for mentions of other genes in the original candidate gene list, to answer the question:

Which genes in the DecodeME candidate gene list are mentioned in potential binding sites for other genes in the candidate list?

I made a script to do this after manually checking a few and finding lots of mentions for SOX6. It looks like that one does stand out.

ABT1: found in 2 other gene files: HMGN4, ZNF322
ANKRD45: found in 1 other gene files: KLHL20
BTN3A3: found in 1 other gene files: BTN2A2
CCDC92: found in 2 other gene files: DNAH10, ZNF664
CSE1L: found in 1 other gene files: ARFGEF2
DARS2: found in 2 other gene files: KLHL20, ZBTB37
DDX27: found in 2 other gene files: STAU1, ZNFX1
KLHL20: found in 2 other gene files: ANKRD45, SLC9C2
PEBP1: found in 2 other gene files: TAOK3, VSIG10
PRDX6: found in 2 other gene files: SLC9C2, TNFSF4
SERPINC1: found in 2 other gene files: RC3H1, ZBTB37
SLC9C2: found in 1 other gene files: ANKRD45
SOX6: found in 48 other gene files: ABT1, ANKRD45, ARFGEF2, B4GALT5, BTN2A2, BTN3A3, CCDC92, CCPG1, CDK5RAP1, CSE1L, DARS2, DDX27, DNAH10, DNAJC1, ECI2, FBXL4, H4C8, HFE, HMGN4, HTT, KLHL20, LRRC7, MLLT10, MMS22L, MRPL39, PEBP1, PLCL1, POU3F2, PRDX6, PTGIS, RABGAP1L, RC3H1, SERPINC1, SLC2A14, SLC9C2, STAU1, SUDS3, TAOK3, TNFSF4, TRIM38, VRK2, VSIG10, ZBTB37, ZNF311, ZNF322, ZNF644, ZNF664, ZNFX1
STAU1: found in 2 other gene files: DDX27, ZNFX1
SUDS3: found in 1 other gene files: TAOK3
TAOK3: found in 1 other gene files: SUDS3
TNFSF4: found in 2 other gene files: PRDX6, SLC9C2
VSIG10: found in 2 other gene files: PEBP1, TAOK3
ZNF664: found in 2 other gene files: CCDC92, DNAH10
ZNFX1: found in 1 other gene files: DDX27

TFBS matches only:

SOX6: found in 48 other gene files: ABT1, ANKRD45, ARFGEF2, B4GALT5, BTN2A2, BTN3A3, CCDC92, CCPG1, CDK5RAP1, CSE1L, DARS2, DDX27, DNAH10, DNAJC1, ECI2, FBXL4, H4C8, HFE, HMGN4, HTT, KLHL20, LRRC7, MLLT10, MMS22L, MRPL39, PEBP1, PLCL1, POU3F2, PRDX6, PTGIS, RABGAP1L, RC3H1, SERPINC1, SLC2A14, SLC9C2, STAU1, SUDS3, TAOK3, TNFSF4, TRIM38, VRK2, VSIG10, ZBTB37, ZNF311, ZNF322, ZNF644, ZNF664, ZNFX1

So now to manually check if any or how many of these locations for promoters/enhancers and matching TFBSs show up on LocusZoom… this may take some time

Edit: I think most of those matches are textually correct but contextually wrong: they weren’t TFBSs but other mentions of genes in the data. Sometimes grep isn’t the right tool for the job! I’ve updated, and it is only SOX6 which seems relevant, with still a lot of checking to do. I may stop now or rope in some help. Report with links to LocusZoom URLs attached and webpage here, but also on the SOX6 thread as that now seems most appropriate. Sorry for the cross-posting, moderators; I wasn’t sure where any of this was going as I have been exploring…
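
To illustrate the grep problem described in the edit, here is a small Python sketch comparing a plain text search with a search restricted to the TFBSs lines. The report text is an invented example in the same layout as the per-gene files (the element ID, coordinates and names in it are placeholders), and both function names are hypothetical.

```python
def grep_style_hits(report, genes):
    """Naive search: a gene counts as a hit if its name appears anywhere in the text."""
    return sorted(gene for gene in genes if gene in report)

def tfbs_hits(report, genes):
    """Stricter search: only exact names listed on 'TFBSs:' lines count."""
    listed = set()
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("TFBSs:"):
            listed.update(name.strip() for name in line[len("TFBSs:"):].split(","))
    return sorted(set(genes) & listed)

# Invented report: PEBP1 appears only as part of an external ID, SOX6 as a TFBS.
report = """\
[GH00X000001] Enhancer | Score: 10 | chr1:100-200
External: PEBP1_2
TFBSs: SOX6, CTCF
"""

candidates = ["SOX6", "PEBP1", "TAOK3"]
print(grep_style_hits(report, candidates))
print(tfbs_hits(report, candidates))
```

The naive search reports both PEBP1 and SOX6, while the field-aware search keeps only SOX6, which is the kind of difference the edit describes.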

forestglip · Apr 21, 2026

I searched the top SNP (rs34626694) at the CA10 locus with the GWAS Atlas PheWAS search. The trait most significantly associated with this SNP is "Ease of getting up in the morning", which would make sense as being related to ME/CFS.

This trait was tested in the following study, and was based on UK BioBank data (n = 385,949): Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways (2019, Nature Genetics)

I downloaded the summary stats to see how well they match up with DecodeME. Plotted together after liftover of the "getting up" data to GRCh38, it seems fairly similar:

I also tried to test for a shared variant with the coloc software. I got a 94.6% posterior probability of the traits sharing a causal variant.

I think I'm using it right, but am still not 100% sure, so would appreciate any experts weighing in. Here's the code I used for colocalization:

Code:

library(coloc)
library(dplyr)
library(data.table)

region <- list(chr = 17, start_pos = 52147538, end_pos = 52264626) # CA10 locus

decodeme_region <- fread("../../Data/gwas_1.filtered.gz",
                         select = c("CHROM", "GENPOS", "ID", "ALLELE0", "ALLELE1",
                                    "A1FREQ", "BETA", "SE", "LOG10P")
) %>%
  filter(CHROM == region$chr, GENPOS >= region$start_pos, GENPOS <= region$end_pos) %>%
  mutate(
    MAF = pmin(A1FREQ, 1 - A1FREQ)
  )

gettingup_region <- fread(
  "~/Projects/science/diseases/sleep/Jansen 2019/Ease of getting up/Data/Jansen_2019_Gettingup_GRCh38_liftover.tsv.gz",
  select = c("SNP", "CHR", "BP", "A1", "A2",
             "MAF", "OR", "SE", "P")
) %>%
  filter(CHR == region$chr, BP >= region$start_pos, BP <= region$end_pos) %>%
  rename(BETA = OR) #  OR column appears to actually be BETA, as some values are negative

merged <- inner_join(
  decodeme_region %>%
    select(GENPOS, ALLELE0, ALLELE1, BETA, SE, MAF),
  gettingup_region %>%
    select(SNP, BP, A1, A2, BETA, SE, MAF),
  by = join_by(GENPOS == BP),
  suffix = c("_decodeme", "_gettingup")
) %>%
  mutate(
    alleles_match = (ALLELE0 == A2 &
                       ALLELE1 == A1),
 
    alleles_flipped = (ALLELE0 == A1 &
                         ALLELE1 == A2),
 
    alleles_ok = alleles_match | alleles_flipped,
 
    BETA_gettingup = if_else(alleles_flipped,
                             -BETA_gettingup,
                             BETA_gettingup),
  ) %>%
  filter(alleles_ok) %>%
  select(SNP, GENPOS,
         BETA_decodeme,
         BETA_gettingup,
         MAF_decodeme,
         MAF_gettingup,
         SE_decodeme,
         SE_gettingup,
  )


dataset1 <- list(
  beta = merged$BETA_decodeme,
  varbeta = merged$SE_decodeme^2,
  snp = merged$SNP,
  position = merged$GENPOS,
  type = "cc"
)

dataset2 <- list(
  beta = merged$BETA_gettingup,
  varbeta = merged$SE_gettingup^2,
  snp = merged$SNP,
  position = merged$GENPOS,
  type = "quant",
  N = 384689,
  MAF = merged$MAF_gettingup
)

check_dataset(dataset1)
check_dataset(dataset2)

plot_dataset(dataset1, main = "DecodeME")
plot_dataset(dataset2, main = "Ease of Getting up")
plot_datasets(dataset1, dataset2)

coloc_result <- coloc.abf(
  dataset1 = dataset1,
  dataset2 = dataset2
)

print(format(round(coloc_result$summary, 3), scientific = FALSE))

head(coloc_result$results[order(coloc_result$results$SNP.PP.H4, decreasing = TRUE), ])

So it seems there is a shared variant associated with ME/CFS, multisite chronic pain, and ease of getting up in the morning.

* I downloaded the summary stats for "Getting up in the morning" trait from here. (Look for link under the title of the above paper.)
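
For readers wondering what sits behind the posterior probability quoted above, here is a rough Python sketch of the approximate Bayes factor calculation as I understand it: Wakefield's approximate Bayes factor per SNP, then combined across SNPs into five hypotheses (H0 no association, H1 trait 1 only, H2 trait 2 only, H3 two distinct causal variants, H4 one shared variant). The prior values (p1 = p2 = 1e-4, p12 = 1e-5) and the prior standard deviations (0.2 for case-control, 0.15 for a standardised quantitative trait) are what I believe the coloc defaults to be, so treat them as assumptions. The effect sizes are toy numbers, not the real data, and this is not a substitute for the package.

```python
import math

def log_sum(values):
    """log(sum(exp(v))) computed stably."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_diff(a, b):
    """log(exp(a) - exp(b)); -inf when the difference is not positive."""
    if a <= b:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def log_abf(beta, varbeta, prior_sd):
    """Wakefield's approximate log Bayes factor for one SNP."""
    z = beta / math.sqrt(varbeta)
    r = prior_sd ** 2 / (prior_sd ** 2 + varbeta)
    return 0.5 * (math.log(1 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log ABFs of two traits."""
    both = [a + b for a, b in zip(labf1, labf2)]
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + log_sum(labf1),
        "H2": math.log(p2) + log_sum(labf2),
        "H3": math.log(p1) + math.log(p2)
              + log_diff(log_sum(labf1) + log_sum(labf2), log_sum(both)),
        "H4": math.log(p12) + log_sum(both),
    }
    total = log_sum(list(log_h.values()))
    return {name: math.exp(value - total) for name, value in log_h.items()}

# Toy region of five SNPs, each with standard error 0.1 (variance 0.01).
trait1 = [log_abf(b, 0.01, 0.2) for b in [0.0, 0.0, 0.6, 0.0, 0.0]]
trait2_same = [log_abf(b, 0.01, 0.15) for b in [0.0, 0.0, 0.6, 0.0, 0.0]]
trait2_other = [log_abf(b, 0.01, 0.15) for b in [0.0, 0.0, 0.0, 0.0, 0.6]]

same_peak = coloc_posteriors(trait1, trait2_same)
different_peak = coloc_posteriors(trait1, trait2_other)
print({name: round(value, 4) for name, value in same_peak.items()})
print({name: round(value, 4) for name, value in different_peak.items()})
```

When both toy traits peak at the same SNP, nearly all the posterior weight lands on H4; when they peak at different SNPs it shifts to H3. The sketch assumes a single causal variant per trait, and the result is sensitive to the choice of priors and region.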

forestglip · Apr 21, 2026

Jonathan found another study mentioning this gene:

Jonathan Edwards said:
There seems to be another more recent study of 'coathanger pain' again using the Biobank that came up with CA10:

A genome-wide association study identifies novel genetic variants associated with neck or shoulder pain in the UK biobank (N = 430,193) - PubMed

In summary, this study has identified novel genetic variants for neck or shoulder pain. Sex-stratified GWAS also suggested that sex played a role in the occurrence of the phenotype.

pubmed.ncbi.nlm.nih.gov

Yiwen Tao, Qi Pan, Tengda Cai et al. A genome-wide association study identifies novel genetic variants associated with neck or shoulder pain in the UK biobank (N = 430,193)
Pain Rep 2025 Apr 18;10(3):e1267.
doi: 10.1097/PR9.0000000000001267. eCollection 2025 Jun.

I haven't tried downloading it to compare directly, but here is the paper's plot of the locus:

Here is that shoulder/neck pain lead variant (rs9889282) highlighted in the DecodeME locus:

So I'm not sure, but maybe the variant is shared with this trait as well.

Edit: Fixed a rounding issue with the x-axis positions in the second plot, where multiple spots were labeled 52.2.

forestglip · Apr 21, 2026

Here are the top 20 most significant traits associated with the lead DecodeME SNP at the CA10 locus (rs34626694) on GWAS Atlas. Reaction time is also fairly significant.

Study	Trait	P-value	Sample size
30804565	Ease of getting up in the morning	2.01E-09	385949
31427789	Getting up in morning	3.47E-09	385494
31427789	Overall health rating	9.40E-09	384850
BioRxiv: https://doi.org/10.1101/261081	Ever smoker	3.14E-08	518633
29844566	Reaction time	6.97E-08	330069
31427789	Time spent watching television (TV)	1.86E-07	365236
30696823	Chronotype	5.90E-07	449732
27864402	Self-rated health	8.16E-07	111749
31427789	Pain type(s) experienced in last month: Neck or shoulder pain	0.000001609	385698
31427789	Reaction time test - Mean time to correctly identify matches	0.000001731	383748
31427789	Number of self-reported non-cancer illnesses	0.000001853	386581
30804565	Morningness	0.000002321	345552
31427789	Medication for pain relief, constipation, heartburn: Paracetamol	0.00000241	382089
31427789	Number of treatments/medications taken	0.000002792	386581
31427789	Morning/evening person (chronotype)	0.000002917	345148
31427789	Frequency of tiredness / lethargy in last 2 weeks	0.000002977	375053
30643251	Ever smoked regulary	0.00000572	262990
29899525	Strenuous sports or other exercises	0.00001	350492
31427789	Wheeze or whistling in the chest in last year	0.00001305	379150
31427789	Types of physical activity in last 4 weeks: Other exercises (eg: swimming, cycling, keep fit, bowling)	0.00001679	384450
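
As a quick way of reading the table, here is a small Python sketch that checks which of the listed p-values pass the conventional genome-wide significance threshold of 5e-8. Choosing that cut-off is my assumption; a PheWAS across many traits may call for a different correction. Only the first few rows are copied in, with p-values kept as the strings shown above to show that both notations parse the same way.

```python
GENOME_WIDE = 5e-8  # conventional GWAS threshold, used here as an assumption

# (trait, p-value as written in the table); a subset of the rows above.
rows = [
    ("Ease of getting up in the morning", "2.01E-09"),
    ("Getting up in morning", "3.47E-09"),
    ("Overall health rating", "9.40E-09"),
    ("Ever smoker", "3.14E-08"),
    ("Reaction time", "6.97E-08"),
    ("Pain type(s) experienced in last month: Neck or shoulder pain", "0.000001609"),
]

# float() accepts both scientific and plain decimal notation.
significant = [trait for trait, p in rows if float(p) < GENOME_WIDE]
print(significant)
```

On that cut-off the first four traits pass, and reaction time falls just short.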

Hutan · Apr 21, 2026

That's an amazing match with the 'Ease of getting up in the morning' trait. Very impressive investigation FG.

forestglip said:
I haven't tried downloading it to compare directly, but here is the paper's plot of the locus:

I'm not understanding why the x axis in the neck shoulder pain chart is different. Can you help me understand?

