Genetics: SOX6

hotblack

I didn’t see a thread about SOX6, apologies if I missed it.

SOX6 seems interesting because it is not only a peak in LocusZoom itself but is also listed in GeneHancer data as a transcription factor binding site in promoters and enhancers for many (well, most) of the other genes in the DecodeME candidate gene list (including other transcription factors).
 
Question for the more knowledgeable: if a transcription factor itself is a hit, and so are multiple genes whose promoters and enhancers contain binding sites for that transcription factor (as seems to be the case here), would that cascade and magnify any effect?

I don’t know enough of the biology, or whether this is rare or common (it just stood out compared to other transcription factors in the list), but it seems like the sort of thing where a few small changes could quickly have an outsized impact.
 
Genecards info

The NCBI summary
This gene encodes a member of the D subfamily of sex determining region y-related transcription factors that are characterized by a conserved DNA-binding domain termed the high mobility group box and by their ability to bind the minor groove of DNA. The encoded protein is a transcriptional activator that is required for normal development of the central nervous system, chondrogenesis and maintenance of cardiac and skeletal muscle cells. The encoded protein interacts with other family members to cooperatively activate gene expression. Alternative splicing results in multiple transcript variants.

And from the UniProt summary
Transcription factor that plays a key role in several developmental processes, including neurogenesis, chondrocytes differentiation and cartilage formation (Probable). Specifically binds the 5'-AACAAT-3' DNA motif present in enhancers and super-enhancers and promotes expression of genes important for chondrogenesis

There are various studies mentioned on OMIM, mainly in mice though: oligodendrocyte development in mouse spinal cord, selective expression in distinct subpopulations of mouse embryonic and adult midbrain dopamine (mDA) neurons, and a role in the differentiation of cortical interneurons and of dopaminergic neurons in the substantia nigra. Also regulation of glucose-stimulated insulin secretion by reducing transcription of genes for insulin and ATP production in mitochondria, and roles in cartilage formation.
 
The relation to sex determination sounds interesting in relation to the sex ratio issue.
If a transcription factor itself is a hit, and so are multiple genes whose promoters and enhancers contain binding sites for that transcription factor (as seems to be the case here), would that cascade and magnify any effect?

I am out of my depth here too but I suspect not - that the links are all ways of weighting the same relevant pathway a bit in favour of the pathological process in any given individual.
 
I am out of my depth here too but I suspect not - that the links are all ways of weighting the same relevant pathway a bit in favour of the pathological process in any given individual.
I guess so. They would also all need to be pointing in the same direction to magnify. With all the permutations, I can see it being just as likely that a change in one cancels out a change in another.

It does though seem significant that SOX6 shows up in the binding sites of potential promoters and enhancers for so many of the other genes (48 out of the 59) while other transcription factors or regulatory genes either don’t at all or are only in one or two. There could be many reasons for or implications of this though I suppose. Maybe a thread for someone to pull on though.
 
A webpage (because it was too long for a post) with details from GeneHancer of all promoters and enhancers for DecodeME candidate genes which mention SOX6 as a transcription factor binding site. Included are links to the locus in the DecodeME data and to GeneHancer sources with info on tissues etc.

Uncurated so will include some which are not statistically significant in the DecodeME data, but a lot do seem notable and tbh I’m too fried to check now… also not sure what the best threshold would be.
 
It does though seem significant that SOX6 shows up in the binding sites of potential promoters and enhancers for so many of the other genes (48 out of the 59) while other transcription factors or regulatory genes either don’t at all or are only in one or two. There could be many reasons for or implications of this though I suppose. Maybe a thread for someone to pull on though.
Could be an interesting clue. The one thing to check is just whether SOX6 always comes up if you have a GWAS skewed for genes highly expressed in the brain or something like that. If it's easy to do with your existing code and you feel up to it, it would be worthwhile to see if you get the same SOX6 pattern looking at GWAS for something like PTSD or schizophrenia
 
The one thing to check is just whether SOX6 always comes up if you have a GWAS skewed for genes highly expressed in the brain or something like that. If it's easy to do with your existing code and you feel up to it, it would be worthwhile to see if you get the same SOX6 pattern looking at GWAS for something like PTSD or schizophrenia
Funny you should mention that… :) Good suggestion on the conditions thanks, I was thinking of comparing to a random selection or another GWAS set when I’m up to it, but hadn’t thought of looking at more brain related conditions, makes sense!
Question for the more knowledgeable: if a transcription factor itself is a hit, and so are multiple genes whose promoters and enhancers contain binding sites for that transcription factor (as seems to be the case here), would that cascade and magnify any effect?
Any thoughts on this? It sort of feels intuitively like it could, and I’ve been searching and found out about feed-forward loops, but I don’t entirely understand them and am not sure if they’re relevant.
 
Good suggestion on the conditions thanks, I was thinking of comparing to a random selection or another GWAS set when I’m up to it, but hadn’t thought of looking at more brain related conditions, makes sense!
Would be good to compare to other non-brain-dominant GWAS too as an additional control. We’d just want to make sure at least one or two comparison GWAS have a tissue distribution similar to DecodeME’s, since it was so starkly brain-dominated. Otherwise we might wrongly assume it’s an ME/CFS-specific feature when it’s just a proxy for tissue-specific enrichment.

Any thoughts on this? It sort of feels intuitively like it could, and I’ve been searching and found out about feed-forward loops, but I don’t entirely understand them and am not sure if they’re relevant.
It’s a bit abstract so I couldn’t tell you off the top of my head. I agree with Jonathan that, if we could determine that the pattern we’re seeing here is somewhat ME/CFS-specific, it would just point to the relevance of the overall pathway. Having two “hits” in the pathway might make someone more susceptible to developing ME/CFS than someone who just has one, but finding evidence of that doesn’t really tell us anything additionally useful about the biology of ME/CFS—it would just confirm that the pathway is important.
 
Short version:
It looks like this may not be as significant as I initially hoped. SOX6 seems to be everywhere!

Longer details:
Maybe this is because it is a common transcription factor, or perhaps there is a bias in the computed binding-site data? Either way, it may still be useful information. Seeing variations on LocusZoom match these sites, and the high proportion of DecodeME candidate genes linked to SOX6 (48 of 58), still seems interesting, but judging by some other studies it is not uncommon for SOX6 to pop up a lot.

My scripts needed updating to be more flexible. They should be now, and hopefully I haven’t broken anything. I will share them soon too.
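Until the real scripts are shared, here is a minimal Python sketch of the kind of counting involved. It is not the actual script: the function name `binding_site_report`, the in-memory data layout (gene symbol mapped to one set of transcription factor symbols per promoter or enhancer), and the toy gene names are all assumptions made for illustration.

```python
from collections import defaultdict


def binding_site_report(tfbs_by_gene, candidates):
    """For each candidate symbol, list the OTHER genes whose regulatory
    elements name it as a transcription factor binding site."""
    matches = defaultdict(set)
    for target, elements in tfbs_by_gene.items():
        for tfbs in elements:              # one set of TF symbols per element
            for tf in tfbs:
                # Count only candidate genes, and skip a gene's own file.
                if tf in candidates and tf != target:
                    matches[tf].add(target)
    return {tf: sorted(genes) for tf, genes in matches.items()}


# Invented toy data (not real GeneHancer content).
toy = {
    "GENE_A": [{"SOX6", "CTCF"}, {"SOX6"}],
    "GENE_B": [{"CTCF"}],
    "GENE_C": [{"SOX6", "GENE_A"}],
    "SOX6": [{"SOX6"}],
}

report = binding_site_report(toy, candidates=set(toy))
for tf, genes in sorted(report.items()):
    print(f"{tf}: found in {len(genes)} other gene files: {', '.join(genes)}")
```

Excluding a gene's own file from its count is one way to keep the totals consistent; a transcription factor that is not itself in the candidate list (CTCF in the toy data) is ignored.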

Something I’ve used for gene sets before is this: Curated Gene-Disease Association Evidence Scores 2025
You can download JSON files of gene sets and then process them with something like jq to get a newline- or comma-separated list of gene symbols, for example:
jq -r '.associations[].gene.symbol' Schizophrenia.json
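For anyone without jq, roughly the same extraction can be sketched in Python. This assumes only the JSON layout implied by the jq filter above (a top-level `associations` list whose items each have `gene.symbol`); the function name `gene_symbols` and the two-gene example string are made up for illustration.

```python
import json


def gene_symbols(gene_set_json):
    """Return the gene symbols from a gene-set JSON string, in file order."""
    data = json.loads(gene_set_json)
    return [assoc["gene"]["symbol"] for assoc in data["associations"]]


# Tiny hand-written example in the assumed layout, not a real download.
example = '{"associations": [{"gene": {"symbol": "COMT"}}, {"gene": {"symbol": "AKT1"}}]}'

symbols = gene_symbols(example)
print("\n".join(symbols))   # newline-separated, like jq -r
print(",".join(symbols))    # or comma-separated
```

For a downloaded file, the string would come from something like `open("Schizophrenia.json").read()`.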

Then there’s the GWAS Catalog, which has more data and options. For this I looked for publications with 20-50 associations which included SOX6 and European populations, to give a decent comparison. Processing the tab-separated file is more of a pain, as there are often multiple mapped genes and the delimiters seem inconsistent (either , or - and I’m not sure why, possibly because of mapping from rsIDs/SNPs?). Anyway, on to the results.
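A hedged sketch of one way to handle the mixed delimiters in Python. As far as I understand the catalog's column notes, a comma separates genes that overlap the variant and a spaced hyphen separates the upstream and downstream genes of an intergenic variant, but treat that as an assumption. The function name `split_mapped_genes` is hypothetical. The key point is to split on a hyphen only when it has whitespace on both sides, so that symbols containing hyphens (such as MAPT-IT1) survive intact.

```python
import re

# Split on commas or semicolons, or on a hyphen surrounded by whitespace.
# A bare hyphen inside a symbol (e.g. MAPT-IT1) is deliberately not a separator.
_SEPARATORS = re.compile(r"\s*[,;]\s*|\s+-\s+")


def split_mapped_genes(field):
    """Split one mapped-gene field into unique symbols, keeping first-seen order."""
    symbols = []
    for part in _SEPARATORS.split(field.strip()):
        if part and part not in symbols:
            symbols.append(part)
    return symbols


print(split_mapped_genes("MAPT, MAPT-IT1"))
print(split_mapped_genes("ASTN2 - SP4"))
```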

Results:

Schizophrenia Curated gene set
No SOX6 in the main candidate list, but SOX6 does appear in binding sites for 12 of the 17 genes
Loaded 1 candidate genes.
Analyzing 16 regulatory element files... (Mode: TFBSs Column Only)

--- Gene Binding Site Report ---
Search Mode: TFBSs Column Only
Format: Binding Gene -> Matches

SOX6: found in 12 other gene files: ABCA13, AKT1, C4A, COMT, DGCR2, DGCR8, NOS1AP, RTN4R, SYN2, TOP3B, YWHAE, ZDHHC8

Multi-site chronic pain study (which includes SOX6)
23 out of 33 matches on SOX6
Loaded 34 candidate genes.
Analyzing 33 regulatory element files... (Mode: TFBSs Column Only)

--- Gene Binding Site Report ---
Search Mode: TFBSs Column Only
Format: Binding Gene -> Matches

NMT1: found in 5 other gene files: ECM1, FAF1, GMPPB, MLN, PRC1
SOX6: found in 23 other gene files: ASTN2, CEP120, CTNNA2, ECM1, EXD3, FAF1, FAM120A, GMPPB, KCND3, KNDC1, MAML3, MLLT10, MLN, MON1A, MON1B, NMT1, NUMB, PRC1, SDK1, SLC39A8, SP4, STAG1, UTRN

PTSD and GAD (which includes SOX6)
20 out of 51
Loaded 52 candidate genes.
Analyzing 48 regulatory element files... (Mode: TFBSs Column Only)

--- Gene Binding Site Report ---
Search Mode: TFBSs Column Only
Format: Binding Gene -> Matches

SOX6: found in 20 other gene files: ACTN1, BIN3, CLEC18B, EGR3, FAM120AOS, FBXL17, FES, GNGT1, KCNB2, LINC01023, LINC02770, MAD1L1, MAPT, MAPT-IT1, NOS1, OR5AZ1P, OR5BA1P, SP4, TCF4, TERF1
(My numbers may not seem to add up: not all genes ended up with regulatory files containing site info, and I excluded SOX6 from the totals if it was in the starting set. Maybe I should pick a more consistent total for clarity. I have a feeling my numbers may be off somewhere, but it’s close enough for this exercise.)
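One rough way to ask whether the DecodeME proportion is actually higher than in the comparison sets is a one-sided Fisher's exact test on the counts quoted in this thread. The sketch below is an illustration only, not part of the original analysis: the function name `fisher_one_sided` is made up, the denominators carry the caveat above about inconsistent totals, and the test ignores tissue distribution and any dependence between genes.

```python
from math import comb


def fisher_one_sided(hits_a, total_a, hits_b, total_b):
    """One-sided Fisher's exact test: probability of seeing at least hits_a
    hits in set A if hits were spread across A and B at random."""
    hits = hits_a + hits_b
    total = total_a + total_b
    denom = comb(total, total_a)
    upper = min(hits, total_a)
    tail = sum(
        comb(hits, k) * comb(total - hits, total_a - k)
        for k in range(hits_a, upper + 1)
    )
    return tail / denom


# Counts as quoted in this thread (SOX6 hits, genes considered).
decodeme = (48, 58)
comparisons = {
    "Schizophrenia curated set": (12, 17),
    "Multi-site chronic pain": (23, 33),
    "PTSD and GAD": (20, 51),
}

p_values = {
    name: fisher_one_sided(*decodeme, *counts)
    for name, counts in comparisons.items()
}
for name, p in p_values.items():
    print(f"DecodeME vs {name}: one-sided p = {p:.3g}")
```

A small p value here would only say the proportions differ between two gene lists, not why, so the point about matching tissue distribution in the comparison GWAS still applies.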

There are some others which may be useful to look at: more PTSD, lupus and tea consumption (erm…)
 
Great work @hotblack, it’s really good to have those comparisons. We can still consider SOX6 potentially relevant to ME/CFS on the basis of it being a hit, and it’s possible SOX6-regulated genes are slightly more enriched in ME/CFS than they are in other conditions.

I suspect that it comes up so frequently because of its importance in developmental biology—suites of genes active in specific tissues/systems will often be under the control of a small set of TFs so cells can make those genomic regions accessible all together during cell differentiation.
 
It's very unlikely a developmental gene is going to be involved in the etiology of ME/CFS because you would see a much higher manifestation in early development (including at birth). The distribution of manifestation of this disorder peaks in middle age, so I would not count on this gene too much. The p-value also falls short of the GWAS significance cutoff, meaning it's not significantly associated with ME/CFS.
 
It might relate to non-canonical functions outside neurodevelopment.

Emerging roles of Sox6 in the renal and cardiovascular system (2020) —

We and others have shown that renin promoter possesses the binding site for Sox6, and along with other transcription modulators regulate renin expression. Using a loss of function mouse model, in which Sox6 is specifically knockout in renin expressing cells, Sox6 was shown to regulate renin expression and JG recruitment in response to sodium depletion and dehydration. These studies show that Sox6 is one of the main contributors of renin regulation. Renin is the rate-limiting enzyme in renin angiotensin aldosterone system (RAAS).

Sox6 as a new modulator of renin expression in the kidney (2020) —

our findings indicate that Sox6 has a previously undefined role in modulating renin expression in response to Na and volume deprivation. Given this critical function, Sox6 might also be a therapeutic target for the treatment of hypertension.
 
It's very unlikely a developmental gene is going to be involved in the etiology of ME/CFS because you would see a much higher manifestation in early development (including at birth).

We are talking about a gene that contributes to neurodevelopment, but by no means does it only do that. And the argument about early manifestation in a sense applies (misleadingly) to all genes, since they are all in use from birth (pretty much). Fair skin genes give you fair skin at birth but maybe melanoma at 79 years old. Quite a lot of monogenic neurological disorders that involve genes contributing to neurodevelopment typically present in the middle years.

The p-value also falls short of the GWAS significance cutoff, meaning it's not significantly associated with ME/CFS.

The p value cutoff in DecodeME is purely an artefact of the number of SNPs interrogated. Statistical significance is no guide to biological significance here. There will be large numbers of type 2 errors. They will be type 2 errors for relatively small differences in risk for common SNPs, but that might be because common SNP variants around the gene don't have big effects on it. This particular gene may well not turn out to be informative, but not for these reasons, I think.
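To make the point about the cutoff concrete, here is the arithmetic as a tiny Python sketch. It assumes the conventional Bonferroni-style reasoning behind the usual genome-wide threshold (a 0.05 error rate spread over roughly one million independent common-variant tests); whether DecodeME used exactly this cutoff is an assumption here, not something stated in the thread.

```python
alpha = 0.05                     # overall false-positive rate tolerated
independent_tests = 1_000_000    # conventional rough count, an assumption

threshold = alpha / independent_tests
print(f"{threshold:.0e}")        # 5e-08

# The same arithmetic with fewer tests gives a far more lenient cutoff,
# which is why the threshold reflects study design rather than biology.
candidate_only = alpha / 60      # e.g. testing ~60 pre-chosen genes
print(f"{candidate_only:.1e}")
```

A variant with a real but small effect can easily sit above the first threshold and below the second, which is one way the type 2 errors mentioned above arise.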
 