ME/CFS Bioinformatics Repository

tralfamadorian97

Established Member (Voting Rights)
Since the DecodeME preprint was released, I have been teaching myself bioinformatics by analyzing the summary statistics from DecodeME and other GWAS. I’ve been publishing my code on GitHub here, and documenting my results here.

While I don’t have dramatic headline results, I still thought that this work would be of interest to Science4ME forum members, because there are a few analyses that supplement recent forum discussions. For example, I ran gene-level H-MAGMA on the DecodeME summary statistics.
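For context on what H-MAGMA changes relative to plain MAGMA: as I understand the method, SNPs in exons and promoters are assigned to their gene as usual. Intronic and intergenic SNPs are instead assigned to the genes whose promoters they physically contact in brain Hi-C data. The sketch below illustrates only that annotation idea. The input formats and the function name `annotate_snps` are hypothetical, and the sketch does not separate exonic from non-coding SNPs the way H-MAGMA does. In practice, H-MAGMA distributes precomputed annotation files that are passed to MAGMA directly.

```python
from collections import defaultdict


def annotate_snps(snps, gene_intervals, hic_links):
    """Simplified, hypothetical sketch of H-MAGMA-style SNP-to-gene annotation.

    snps:           dict snp_id -> (chrom, pos)
    gene_intervals: dict gene -> list of (chrom, start, end) exon/promoter intervals
    hic_links:      list of (chrom, start, end, gene) regions that contact the gene's promoter
    Returns dict gene -> set of SNP ids.
    """
    annotation = defaultdict(set)
    for snp_id, (chrom, pos) in snps.items():
        # Direct assignment: the SNP lies inside an exon or promoter of the gene.
        for gene, intervals in gene_intervals.items():
            if any(c == chrom and s <= pos <= e for c, s, e in intervals):
                annotation[gene].add(snp_id)
        # Chromatin-interaction assignment: the SNP lies in a region linked to a promoter.
        # (Simplification: real H-MAGMA applies this only to intronic/intergenic SNPs.)
        for c, s, e, gene in hic_links:
            if c == chrom and s <= pos <= e:
                annotation[gene].add(snp_id)
    return dict(annotation)
```

The resulting gene-to-SNP sets play the role of MAGMA's gene annotation. The downstream gene-level test is the same as in ordinary MAGMA; only the SNP-to-gene mapping differs.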

If anyone wants to contribute, I will happily accept fixes or additions to the documentation or codebase. For minor changes, you can just create a GitHub pull request. For major additions, it is probably better to first create a GitHub issue with a brief proposal, which we can discuss.
@forestglip already found the repo and has made some very helpful contributions.
 
It's been incredible watching this project being developed. I stumbled across it a few months ago, and it was immediately obvious tralfamadorian97 is remarkably motivated, organized, and intelligent.

There are a whole lot of different results and even lessons about bioinformatics tools in the documentation which I found very interesting to explore.
 
Impressive, @tralfamadorian97, thanks.

Do check out Paolo Maccallini's meta-analysis which found some stronger results than DecodeME alone:

I'm particularly interested in the cell type eccentric medium spiny neuron, which was significant in the meta-analysis:

We've also been trying to use tools such as FLAMES to help identify the causal genes, but haven't really managed to make it work. Perhaps you might be able to do it?
 
Do check out Paolo Maccallini's meta-analysis which found some stronger results than DecodeME alone:
Yes, I did see Paolo's paper. Impressive. I'll read it in more detail when I get a chance.


We've also been trying to use tools such as FLAMES to help identify the causal genes, but haven't really managed to make it work. Perhaps you might be able to do it?

I've created a GitHub issue to track this here. This might take a while, but at the moment I don't see any insurmountable barriers to running it.
 
I'm particularly interested in the cell type eccentric medium spiny neuron, which was significant in the meta-analysis:
See this page for the results of MAGMA using DecodeME sumstats on a brain cell-type dataset, like Paolo did. The same dataset as one of the two Paolo used actually: Siletti 2023. And the most significant finding was eccentric MSNs.
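Roughly speaking, a MAGMA cell-type analysis of this kind is a gene-property regression. Gene-level Z-scores from the GWAS are regressed on each cell type's expression specificity, with average expression as a covariate, and a positive coefficient suggests that genes specific to that cell type carry more association signal. The sketch below is a simplified ordinary-least-squares version for illustration only. Unlike MAGMA, it ignores correlations between nearby genes and MAGMA's default covariates (such as gene size), so its statistics will not match MAGMA's output. `gene_property_test` is a hypothetical name.

```python
import numpy as np


def gene_property_test(gene_z, specificity, avg_expr):
    """Simplified OLS gene-property test (illustration only, not MAGMA itself).

    gene_z:      per-gene association Z-scores from a gene-level analysis
    specificity: per-gene specificity for one cell type
    avg_expr:    per-gene average expression across all cell types (covariate)
    Returns (coefficient, t-statistic) for the specificity term.
    """
    gene_z = np.asarray(gene_z, dtype=float)
    n = gene_z.size
    # Design matrix: intercept, cell-type specificity, average-expression covariate.
    X = np.column_stack([np.ones(n), specificity, avg_expr])
    beta, *_ = np.linalg.lstsq(X, gene_z, rcond=None)
    resid = gene_z - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    # Standard error of the specificity coefficient from the OLS covariance matrix.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], beta[1] / np.sqrt(cov[1, 1])
```

Cell-type analyses like this are typically tested one-sided (positive coefficient only), because the question is whether genes specific to a cell type are *more* associated, not less.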

I don't know the details of the cell-type data, but I think this analysis might differ slightly from Paolo's: his results are broken down by specific brain region, while this one seems more focused on cell types in general. Maybe tralfamadorian can clarify. The eMSN result looks more significant in this analysis.
 
@tralfamadorian97 If I remember correctly, I got the PoPS analysis running. I can put in a pull request for you to check, if you want.


Fine-mapping is far more difficult than it seems. Using SuSiE-R is one thing, but if I remember correctly you need a large file, possibly many GB, called a linkage disequilibrium (LD) matrix… It’s been a while. Unfortunately, I’ve been in a bit of a downward trend since the beginning of the year, so I just haven’t had the clarity of mind to learn something so complex.
 
See this page for the results of MAGMA using DecodeME sumstats on a brain cell-type dataset, like Paolo did. The same dataset as one of the two Paolo used actually: Siletti 2023. And the most significant finding was eccentric MSNs.
Thanks, I see this as confirmation that the data really points to eMSN and that it wasn't a fluke or error from the meta-analysis Paolo did.
 
See this page for the results of MAGMA using DecodeME sumstats on a brain cell-type dataset, like Paolo did.
I think it's worth looking into these other cell types as well. There also seems to be a link to splatter cells, which are also poorly understood.

I also have a question, @tralfamadorian97: does the signal for Amygdala excitatory (Cluster419) point to the amygdala's intercalated cells? I ask because I read that they have the same developmental origin as eMSNs, with some saying they "represent a ventral extension of the dorsal striatum."

 

Fine-mapping is far more difficult than it seems. Using SuSiE-R is one thing, but if I remember correctly you need a large file, possibly many GB, called a linkage disequilibrium (LD) matrix… It’s been a while. Unfortunately, I’ve been in a bit of a downward trend since the beginning of the year, so I just haven’t had the clarity of mind to learn something so complex.
Yes, I was able to run SUSIE-R. Some example results are here. I did have to download the linkage disequilibrium (LD) matrices, which, as you say, are quite large. I got the LD matrices from here.

Overall, I found that SUSIE produced rather diffuse credible sets. That is, it returned about 50-100 possible causal variants, and could not narrow things down beyond this. I believe this is just a sample-size issue: I think you often need >50k cases to get narrow credible sets.
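To make "diffuse credible sets" concrete: for each single effect, SuSiE produces a vector of posterior probabilities over the variants in the region. A 95% credible set is, roughly, the smallest group of top-ranked variants whose probabilities add up to at least 0.95. When the signal is concentrated, the set is small. When many variants in strong LD look equally plausible, the probability is spread thin and the set grows. The sketch below illustrates just that step. `credible_set` is a hypothetical helper. susieR also drops sets with low "purity" (a low minimum absolute correlation among their members), and that filter is omitted here.

```python
import numpy as np


def credible_set(alpha, coverage=0.95):
    """Indices of the smallest set of variants whose posterior probabilities
    (one SuSiE single-effect vector) sum to at least `coverage`."""
    alpha = np.asarray(alpha, dtype=float)
    order = np.argsort(alpha)[::-1]          # most probable variants first
    cumulative = np.cumsum(alpha[order])
    # First position where the running total reaches the coverage threshold.
    n_keep = int(np.searchsorted(cumulative, coverage)) + 1
    return order[:n_keep].tolist()
```

For example, a single effect with probabilities `[0.9, 0.06, 0.02, 0.02]` gives a two-variant set. By contrast, 50 variants at 0.02 each need 48 variants to reach 95% coverage, which is the kind of diffuse result described above.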

Nevertheless, these diffuse credible sets may still be useful. When I get around to it, my plan is to feed them into FLAMES.

Hope the downward trend improves, @ChronicallyOverIt !
 
I also have a question, @tralfamadorian97: does the signal for Amygdala excitatory (Cluster419) point to the amygdala's intercalated cells? I ask because I read that they have the same developmental origin as eMSNs, with some saying they "represent a ventral extension of the dorsal striatum."
Interesting question! Unfortunately, I don't know enough neuroscience to give a good answer. I do see that eccentric medium spiny neurons are labeled as originating mostly from the Amygdala. Their top three regions are: Amygdala: 75.9%, Cerebral cortex: 14.6%, Thalamus: 5.4%. It might be helpful to go back to the original Siletti paper and its supplementary material to understand more.

The HBA reference data I used for that MAGMA plot with the eccentric medium spiny neurons was prepared by the authors of this paper. They preprocessed the raw Siletti 2023 scRNA-seq data to produce a matrix suitable for consumption by MAGMA, as described in their GitHub repo.
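I don't know exactly which preprocessing steps that repo applies. A common way to turn single-cell data into a MAGMA-ready matrix is to average each gene's expression within each cell-type cluster, then divide by the gene's total across clusters, which gives a per-gene specificity score between 0 and 1. The sketch below shows only that generic approach and assumes an already-normalized genes-by-cells table. `specificity_matrix` is a hypothetical name, and the real pipeline may differ, for example in normalization or gene filtering.

```python
import numpy as np
import pandas as pd


def specificity_matrix(expr: pd.DataFrame, cell_types) -> pd.DataFrame:
    """expr: genes x cells table of normalized expression.
    cell_types: one cluster label per cell (same order as expr's columns).
    Returns genes x clusters specificity, each row summing to 1."""
    # Mean expression of every gene within each cluster (genes x clusters).
    means = expr.T.groupby(np.asarray(cell_types)).mean().T
    # Drop genes with no expression anywhere (specificity would be undefined).
    means = means[means.sum(axis=1) > 0]
    # Share of each gene's total mean expression that falls in each cluster.
    return means.div(means.sum(axis=1), axis=0)
```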
 
I thought they were from the striatum in the basal ganglia. I can’t remember where I read that but it seemed to be part of the definition.
The Human Protein Atlas at least reiterates high expression in the amygdala, though I think they may be basing it on the same dataset used in the DecodeME analysis:
The Eccentric medium spiny neurons include cells detected in the forebrain regions. This cluster is especially prominent in amygdala, basal ganglia and hypothalamus. As shown in Table 1, 114 genes show elevated expression in Eccentric medium spiny neuron compared to other brain cell clusters. Neurons of this cluster are inhibitory neurons, expressing both GAD1 and GAD2, where GAD2 is classified as group enriched in this cluster along with other interneuronal containing cell clusters. The dopamine receptor D1 (DRD1) shows enriched expression in this cluster, and tyrosine hydroxylase (TH) shows elevated expression.
 
The Human Protein Atlas at least reiterates high expression in the amygdala, though I think they may be basing it on the same dataset used in the DecodeME analysis:
Ah thanks.

I had gotten the basal ganglia from here. It doesn’t contradict the amygdala finding, but it does make it seem that the basal ganglia are their main location:
A type of central nervous neuron comprising more than 95% of the neurons in basal ganglia input structures, such as the caudate nucleus, putamen, nucleus accumbens and striatal districts in the olfactory tubercle. The cell body has a diameter in a range between 15 and 18 μm and gives rise to three to five primary dendrites that are aspiny proximally, but densely spiny beginning at about the first branch point.

Medium Spiny Neuron — Encyclopedia of Neuroscience (Springer)

I assumed eMSNs are a subtype of MSNs and would also follow this distribution, but maybe I’m wrong.
 
Yes, I was able to run SUSIE-R. Some example results are here. I did have to download the linkage disequilibrium (LD) matrices, which, as you say, are quite large. I got the LD matrices from here.

Overall, I found that SUSIE produced rather diffuse credible sets. That is, it returned about 50-100 possible causal variants, and could not narrow things down beyond this. I believe this is just a sample-size issue: I think you often need >50k cases to get narrow credible sets.

Nevertheless, these diffuse credible sets may still be useful. When I get around to it, my plan is to feed them into FLAMES.

Hope the downward trend improves, @ChronicallyOverIt !
It’s been a while, but I think UK Biobank has LD matrices. I don’t think they are public, though, and they are also in the range of petabytes. That would probably increase the accuracy though…
 
It’s been a while, but I think UK Biobank has LD matrices. I don’t think they are public, though, and they are also in the range of petabytes. That would probably increase the accuracy though…
The matrices I used were actually generated from the UK Biobank by the Broad Institute. While the underlying individual-level UKBB data is not public, the generated LD matrices are public. There are more than 2000 matrices, each one for a different region of the genome. In total, they are several terabytes. Luckily, I only needed to download the matrices for the specific regions of the genome I wanted to fine-map with SUSIE. This was a few gigabytes, and so was manageable.
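Selecting which regional LD matrices to download is essentially an interval-containment problem: pick the precomputed window(s) that fully contain the locus being fine-mapped. The sketch below assumes each window is described by a (chromosome, start, end) tuple. That representation, the example coordinates, and the helper name `windows_covering` are all hypothetical and are not taken from the Broad release's actual file naming. When windows overlap, preferring the one in which the locus sits most centrally is one reasonable tie-breaker, since it leaves the most flanking LD on both sides.

```python
from typing import List, Tuple

# Hypothetical window description: (chromosome, start, end), inclusive coordinates.
Window = Tuple[str, int, int]


def windows_covering(windows: List[Window], chrom: str, start: int, end: int) -> List[Window]:
    """Return the windows that fully contain [start, end] on `chrom`,
    ordered so the window where the locus is most central comes first."""
    hits = [w for w in windows if w[0] == chrom and w[1] <= start and end <= w[2]]
    locus_mid = (start + end) / 2
    # Distance between each window's midpoint and the locus midpoint.
    return sorted(hits, key=lambda w: abs((w[1] + w[2]) / 2 - locus_mid))
```

Only the windows returned for each locus of interest would then need to be downloaded, rather than the whole multi-terabyte set.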
 
@tralfamadorian97 If I remember correctly, I got the PoPS analysis running. I can put in a pull request for you to check, if you want.

Thanks for the offer! I'm trying to ensure that all the code in the repo runs through the task system as described here. The goal of the task system is to support reproducibility and iteration. The idea is that a user can just run a few lines of Python to reproduce any analysis. All auxiliary downloads are automatically performed.
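For anyone unfamiliar with this kind of design, the general pattern is a dependency graph of steps, where each step caches its output and is only re-run when that output is missing. The sketch below is a hypothetical, heavily simplified illustration of that pattern. `Task` and `materialize` are invented for this example and are not the repo's actual Task classes or API, and a "download" step here would simply be a task whose `run` function fetches and writes a file.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """Hypothetical minimal task: a named step with dependencies and one cached output file."""
    name: str
    # run(dependency_outputs, output_path) must write this task's output to output_path.
    run: Callable[[Dict[str, Path], Path], object]
    deps: List[Task] = field(default_factory=list)

    def output_path(self, cache_dir: Path) -> Path:
        return cache_dir / f"{self.name}.out"


def materialize(task: Task, cache_dir: Path, log: Optional[List[str]] = None) -> Path:
    """Build a task's dependencies first, then the task itself, skipping cached outputs."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    out = task.output_path(cache_dir)
    if out.exists():
        return out  # already computed (or downloaded) on an earlier run
    dep_outputs = {dep.name: materialize(dep, cache_dir, log) for dep in task.deps}
    task.run(dep_outputs, out)
    if log is not None:
        log.append(task.name)  # record which tasks actually executed
    return out
```

Calling `materialize` on a final analysis step first builds any missing upstream outputs, such as downloads. A second call does nothing, because everything is already cached. This is what makes "a few lines of Python to reproduce any analysis" possible.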



I've sketched an outline of how I was planning to add PoPS via the Task system here. If you are interested in doing any of those steps, let me know. However, I probably wouldn't recommend this as a first contribution, since it requires implementing new Task classes. Instead, if you are interested, I would suggest first getting a feel for how things work by using existing Task classes to analyze a new dataset.
 