For those with strength, courage and coding skills:
I've noticed that the Dutch authors of MAGMA have a new method called FLAME. It combines multiple approaches to finding the effector gene within a significant GWAS locus using a machine-learning framework.
Looks interesting. I tried to see if I could do anything, but it's too much stuff I don't know how to do, like the part about creating credible set files.
In other news, based on a suggestion by @hotblack, I tried to use the UK BioBank reference panel for FUMA instead of the 1000 Genomes reference as I did before. It looks like that was the main reason my results were somewhat different from the paper's results.
Now the tissue enrichment is almost identical (first chart is DecodeME). In my previous post, the highest -log10 p-value was around 7, and now it's around 8.5, like in the study. There are still two pairs of tissues that swapped positions, so it's not exactly the same, but the p-values are now all very close to the study's values.
Here are the updated top ten gene sets:
Links to descriptions for these:
The first mention of synapse (GOCC_SYNAPTIC_MEMBRANE) moved down to rank 31 (out of 17,006 gene sets).
I also reran the cell-type analysis, testing the same brain region datasets as last time. Even more cell types are significant now!
It's something like a three-step process. First, it shows all the cell types with significant enrichment of the DecodeME genes:
Then it removes redundant cell types within a dataset if multiple cell types from that dataset are very similar to each other:
Finally, it looks for redundant cell types between different datasets. I don't really know how to interpret this, but if anyone wants to have a go, the FUMA tutorial describes this analysis:
It looks like it now includes neurons from two new areas of the cortex (and one from before is gone), GABAergic neurons from the cerebellum, neurons from white matter, neurons from cerebral nuclei, and many specific cell types (mostly subtypes of excitatory neurons, plus one subtype of oligodendrocyte) from the primary motor cortex.
But I think the last image shows that many of these cell types are highly correlated with each other, so a lot of uninteresting neurons might just be showing up because they're so similar to a cell type of interest, not because they all play a part.
Edit: I probably wouldn't put too much stock in the top gene sets. When I plot all the p-values, they look like basically the uniform distribution you'd expect if almost none of the gene sets were true effects. There might be some real enriched gene sets in there, but there are probably too many false positives to tell which ones they are. Nothing is significant even with the less strict FDR correction, so it makes sense that they didn't report any significant gene sets.
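For anyone who wants to try the same check on their own output, here's a rough sketch. It assumes the "less strict FDR" is the Benjamini-Hochberg procedure and that you've already pulled the gene-set p-values into a plain list. The function names and the simulated p-values are made up for illustration; this isn't FUMA's or MAGMA's own code or output.

```python
import random


def benjamini_hochberg(pvals, alpha=0.05):
    """Return booleans: True where Benjamini-Hochberg rejects at FDR `alpha`."""
    m = len(pvals)
    # Indices sorted by p-value, smallest first.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at or below max_k.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            rejected[i] = True
    return rejected


def histogram(pvals, bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        # min() keeps p == 1.0 in the last bin instead of overflowing.
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical data: 17,006 uniform p-values, i.e. every gene set is null.
    null_p = [random.random() for _ in range(17006)]
    # Under the null, each of the 10 bins should hold roughly 1,700 values.
    print("bin counts:", histogram(null_p))
    print("BH rejections:", sum(benjamini_hochberg(null_p)))
    print("Bonferroni rejections:", sum(p <= 0.05 / len(null_p) for p in null_p))
```

If the real histogram looks flat like the simulated one, with no pile-up in the lowest bin, that's the pattern I was describing: the smallest p-values are about what you'd get by chance across 17,006 tests.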
