Using MAGMA on ME/CFS genetic data

forestglip

Moderator
Staff member
Posts moved from: Eccentric medium spiny neuron (eMSN)


I ran FUMA cell type analysis on all of the datasets generated from the Siletti 2023 brain study.

Data
  • Cell types dissected from 105 brain regions
  • 3172 total cell type/region combinations were tested with MAGMA using DecodeME GWAS-1 data.
    • 1135 level 1 cell types (General cell types, like "neuron" or "astrocyte")
    • 2037 level 2 cell types (More specific cell types, like "Upper_layer_intratelencephalic" or "Eccentric_medium_spiny_neuron")
Results
  • 24 cell types from the 3172 tests were significant after Bonferroni correction at P.adj<.05.
    • All of these were "neuron" from different areas of the brain.
  • No level 2 cell types were significant after Bonferroni correction
    • The level 2 cell types with the lowest p-values are "Upper_layer_intratelencephalic" from several regions of the brain.

DatasetDissectionCell_typeNGENESBETABETA_STDSEPP.adj.pdsP.adj
33_Siletti_CerebralCortex.MFG.A46_Human_2022_level1Cerebral cortex (Cx) - Middle frontal gyrus (MFG) - A46neuron166490.029850.0620670.00609174.84E-074.84E-060.0015342964
45_Siletti_CerebralNuclei.CEN_Human_2022_level1Amygdaloid complex (AMY) - Central nuclear group - CENneuron168560.0288320.0622750.00610371.17E-061.29E-050.0037055304
18_Siletti_CerebralCortex.SCG.A25_Human_2022_level1Cerebral cortex (Cx) - Subcallosal Gyrus (SCG) - Subgenual (subcallosal) division of MFC - A25neuron168230.0267290.0578930.00575561.72E-061.72E-050.0054663076
110_Siletti_Thalamus.LNC.VA_Human_2022_level1Thalamus (THM) - Lateral nuclear complex (LNC) - ventral anterior nucleus of thalamus - VAneuron163130.0234750.0546520.00516582.78E-063.05E-050.0088064236
12_Siletti_CerebralCortex.STG_Human_2022_level1Cerebral cortex (Cx) - Superior Temporal Gyrus - STGneuron166990.0250610.051490.00552082.84E-063.13E-050.0090154584
28_Siletti_CerebralCortex.AG.LEC_Human_2022_level1Cerebral cortex (Cx) - Anterior parahippocampal gyrus (AG) - Lateral entorhinal cortex - LECneuron168110.0252930.0564140.00563563.62E-063.62E-050.0114791508
53_Siletti_CerebralNuclei.GP.Gpi_Human_2022_level1Basal nuclei (BN) - Globus pallidus (GP) - Internal segment of globus pallidus - Gpineuron163420.0417670.0530060.00936574.14E-064.14E-050.0131244672
30_Siletti_CerebralCortex.ACC_Human_2022_level1Cerebral cortex (Cx) - Anterior cingulate cortex - ACCneuron167760.0253080.05620.00567684.16E-064.16E-050.0132056704
15_Siletti_CerebralCortex.Ig_Human_2022_level1Cerebral cortex (Cx) - Short insular gyri - Granular insular cortex - Igneuron167740.0264810.0569710.00601875.45E-065.45E-050.0173029428
31_Siletti_CerebralCortex.FuGt.TF_Human_2022_level1Cerebral cortex (Cx) - Occipitotemporal (fusiform) gyrus, temporal part (FuGt) - Temporal area TFneuron166620.0259820.0536190.00590655.48E-065.48E-050.0173768504

DatasetDissectionCell_typeNGENESBETABETA_STDSEPP.adj.pdsP.adj
82_Siletti_Midbrain.SC_Human_2022_level2Midbrain (M) - Superior colliculus and nearby nuclei - SCUpper_layer_intratelencephalic166190.0343770.0670730.00919689.31E-050.0021416680.295363952
81_Siletti_Midbrain.IC_Human_2022_level2Midbrain (M) - Inferior colliculus and nearby nuclei - ICUpper_layer_intratelencephalic167050.0232460.0542820.00644680.000156130.003278730.49524436
84_Siletti_Midbrain.PAG-DR_Human_2022_level2Midbrain (M) - Periaqueductal gray and Dorsal raphe nucleus - PAG-DRUpper_layer_intratelencephalic163660.0345050.0608230.009750.000201430.004230030.63893596
30_Siletti_CerebralCortex.ACC_Human_2022_level2Cerebral cortex (Cx) - Anterior cingulate cortex - ACCUpper_layer_intratelencephalic167760.0344490.081630.0100480.000304420.005783980.96562024
18_Siletti_CerebralCortex.SCG.A25_Human_2022_level2Cerebral cortex (Cx) - Subcallosal Gyrus (SCG) - Subgenual (subcallosal) division of MFC - A25Upper_layer_intratelencephalic168230.0367540.0844330.0108060.000336190.00672381
26_Siletti_CerebralCortex.LiG.V1C_Human_2022_level2Cerebral cortex (Cx) - Lingual gyrus (LiG) - Primary Visual Cortex - V1CUpper_layer_intratelencephalic165800.0343390.0717810.010570.000580710.009291361
12_Siletti_CerebralCortex.STG_Human_2022_level2Cerebral cortex (Cx) - Superior Temporal Gyrus - STGUpper_layer_intratelencephalic166990.0303580.0654240.00954710.000738330.01476661
31_Siletti_CerebralCortex.FuGt.TF_Human_2022_level2Cerebral cortex (Cx) - Occipitotemporal (fusiform) gyrus, temporal part (FuGt) - Temporal area TFUpper_layer_intratelencephalic166620.0326320.071130.0103580.000816310.014693581
83_Siletti_Midbrain.PTR_Human_2022_level2Midbrain (M) - Pretectal region - PTRUpper_layer_intratelencephalic166920.01920.0445680.00611020.000839830.018476261
15_Siletti_CerebralCortex.Ig_Human_2022_level2Cerebral cortex (Cx) - Short insular gyri - Granular insular cortex - IgUpper_layer_intratelencephalic167740.0325120.0729030.0103730.000862960.015533281

DatasetDissectionCell_typeNGENESBETABETA_STDSEPP.adj.pdsP.adj
38_Siletti_CerebralCortex.PRG.A35-A36_Human_2022_level2Cerebral cortex (Cx) - Perirhinal gyrus (PRG) - A35-A36Eccentric_medium_spiny_neuron167320.0308490.0655710.011420.00345710.06568491

Dataset: Dataset name
Dissection: Dissection associated with dataset name, from https://github.com/tanyaphung/scrnaseq_viewer/blob/main/resources/fumacelltype_datasets_master.csv
Cell_type: Cell type name
NGENES (v1.07)/OBS_GENES (v1.06): Number of genes used in the analysis.
BETA: Effect size
BETA_STD: Standardised effect size
SE: Standard error
P: P-value
P.adj.pds: Adjusted P-value per dataset
P.adj: Adjusted P-value across dataset

I believe the gene sets that tralfamadorian97 used are clusters defining cell types, but they aren't based on a specific location of the brain. There were 461 clusters in tralfamadorian's analysis, but 3172 tests in this region-specific analysis. So maybe when eMSNs from many parts of the brain were combined into a small number of clusters, it made the values less noisy and thus more significant.

Edit: I attached the results of the analysis.
 

Attachments

Last edited:
Another factor that may explain some of the discrepancies is how the RNAseq matrices are transformed.

A key step in applying MAGMA via RNAseq data is transforming the RNAseq matrix [whose (i,j) entry tells use how much RNA for gene i was detected in cell type j] into a specificity matrix [whose (i,j) entry tells us how specific gene i is for cell type j]. There are competing approaches, and different approaches can produce different MAGMA results.

- I used the specificity matrix generated by Duncan et al. for their paper. In the Duncan paper (see section "Gene Expression Data"), they describe producing the specificity matrix by log-transforming the RNSAseq matrix, then row-normalizing.

- I believe FUMA may use a different transformation. According to their documentation, FUMA log-transforms and adds an auxiliary average column as a control, but does not row-normalize.

To test this, we would want to fix the version of the Siletti dataset and compare different specificity matrix preparation strategies.
 
- I used the specificity matrix generated by Duncan et al. for their paper. In the Duncan paper (see section "Gene Expression Data"), they describe producing the specificity matrix by log-transforming the RNSAseq matrix, then row-normalizing.

- I believe FUMA may use a different transformation. According to their documentation, FUMA log-transforms and adds an auxiliary average column as a control, but does not row-normalize.
I'm going to have to read the Watanabe paper again to be sure I understand, but I wanted to get my thoughts out about this.

So I think I've been thinking about the MAGMA ranking problem wrong. I don't think p-values are comparable across datasets, or really even within a dataset, because of these techniques of specificity or controlling for the average. I think the goal of these analyses is just to identify reliable and unique MAGMA associations, but without any implication that you can compare the strengths of the associations. And I think this may partly explain why the boring "neuron" is so strongly significant, even more so than any of the more interesting specific cell types.

If I understand the procedure correctly: when the regression is done between expression of genes in a cell type and gene scores from GWAS, it is controlled for average expression of the genes in all cell types in the dataset. If we imagine that there is a real association with neurons but not astrocytes, this means if your dataset only has, say, 5 types of neuron, then you'd be controlling for neurons while testing for an association with neurons, so the p-value will not be very significant. If your dataset has one type of neuron and four types of astrocytes, then you'd be controlling mainly for astrocyte expression, which will not fit the GWAS scores well, and thus the neuron assocation can still be significant.

Going back to the actual FUMA analysis I did, here are the cell types included in the dataset that the most significant neuron finding came from:
From 33_Siletti_CerebralCortex.MFG.A46_Human_2022_level1:

* neuron
oligodendrocyte_precursor_cell
astrocyte
oligodendrocyte
pericyte
central_nervous_system_macrophage
endothelial_cell
fibroblast
vascular_associated_smooth_muscle_cell
leukocyte
Quite a wide assortment, where only one cell type out of ten is a neuron. The average expression across all these cell types will be nothing like neuron expression, and neuron can be very significant.

On the other hand, here are the cell types found in the dataset that the most significant specific neuron type (Upper_layer_intratelencephalic) came from:
From 82_Siletti_Midbrain.SC_Human_2022_level2:

* Upper_layer_intratelencephalic
Miscellaneous
* Thalamic_excitatory
* CGE_interneuron
* Deep_layer_intratelencephalic
* LAMP5_LHX6_and_Chandelier
* MGE_interneuron
Astrocyte
* Midbrain_derived_inhibitory
Oligodendrocyte
Oligodendrocyte_precursor
* Deep_layer_corticothalamic_and_6b
* Splatter
* Cerebellar_inhibitory
Committed_oligodendrocyte_precursor
* Upper_rhombic_lip
Ependymal
* Amygdala_excitatory
* Deep_layer_near_projecting
Microglia
Vascular
Choroid_plexus
Fibroblast
Assuming I correctly labeled all the neuron cell types in this dataset with a star, then 13/23 cell types in this dataset are neurons. The average expression that is controlled for will be much more "neuron-like" than in the other dataset, and will weaken the assocation with any given neuron cell type.

Even other level 2 datasets differ among each other. "10_Siletti_CerebralCortex.PaO.A43_Human_2022_level2" contains 18 total cell types, unlike 23 for the above dataset, and this will affect the average expression that is being controlled for.

All this is to say, it seems to me that if we want to zoom in on the potentially most interesting cell types, we don't want any of this controlling for average expression or specificity. Just a measure of the association of a cell type's expression with GWAS scores. This seems closer to making ranking of cell types actually mean something. I think MAGMA outputs a standardized beta, which might work for comparing between cell types. There may be an issue of if a cell type has outlier genes with extremely high expression, which I think might skew the standardized beta down, making a cell type look less interesting than it should look. In which case, maybe the expression values could be converted to ranks to make it robust?
 
I'm not able to comment on the new thread that discusses MAGMA (It says: "You have insufficient privileges to reply here.")
Using MAGMA on ME/CFS genetic data | Science for ME EDIT: Fixed now!

But having downloaded the FUMA processed files here, I think Forestglip is right.

FUMA calculated gene expression for each of the 31 supercluster cell types per dissection and then added a covariate to the MAGMA regression with the average gene expression per dissection. So the genes that stand out for a supercluster like 'eccentric medium spiny neuron' are relative to the brain region of the dissection, not the entire brain.

The approach by Duncan et al. 2025 likely used the 461 cell clusters and calculated relative expression of genes in each cluster compared to the entire Siletti dataset. I think this is the more interesting analysis for DecodeME.
 
Last edited:
Forestglip's FUMA cell type analysis of Siletti datasets level 2 has 2,037 results, of which 52 are for eMSN, exactly the same as for Paolo.
So those are the 31 superclusters measured in each dissection separately.

It's strange that eMSN stand out much more in Paolo's analysis but I think this is due to the addition of the MVP dataset. If I take the most significant result from Paolo's analysis, I got (my bolding):

Dataset "47_Siletti_CerebralNuclei.Cla_Human_2022_level2"
Cell_type "Eccentric_medium_spiny_neuron"
BETA "0.045468"
SE "0.01053"
P "7.9263e-06"
P_bon "0.01650256"
P_bh "0.01650256"
BETA_DME_1 "0.026689"
SE_DME_1 "0.010881"
P_DME_1 "0.0070937"
P_bon_DME_1 "1"
P_bh_DME_1 "0.3197161"
BETA_MVP "0.022589"
SE_MVP "0.010388"
P_MVP "0.014844"
P_bon_MVP "1"
P_bh_MVP "0.7023911"
P_HEAL2 "0.1347193"
P_bon_HEAL2 "1"
P_bh_HEAL2 "0.9212313"
GeneRatio "11/72"
BgRatio "893/8444"
FoldEnrichment "1.444631"
genes "ENSG00000119772/ENSG00000221914/...

This corresponds to forestglips analysis:
region "Siletti_CerebralNuclei.Cla"
cell_type "Eccentric_medium_spiny_neuron"
beta "0.025722"
beta_std "0.053982"
p "0.0070827"

There are slight differences but the p-value Paolo found for DecodeME alone (p = 0.007) is the same as what forestglip found in the file 'magma_celltype_47_Siletti_CerebralNuclei.Cla_Human_2022_level2.gsa.out'

The large MVP dataset also had a low p-value (0.0148) for eMSN, so that's probably why Paolo's meta-analysis had stronger results for eMSN.
 
The approach by Duncan et al. 2025 likely used the 461 cell clusters and calculated relative expression of genes in each cluster compared to the entire Siletti dataset. I think this is the more interesting analysis for DecodeME.
I agree that it probably more clearly highlights the more important cell types, but I think there are still issues if controlling for average expression across clusters.

We can imagine that eMSNs and glutamatergic neurons are both equally important. If 50 of the 461 clusters are similar to glutamatergic neurons but only 10 other clusters are similar to eMSNs, then the regression will condition out more of the signal for glutamatergic neurons, making it less significant just because of what kind of other clusters there are.

I don't know that that specific scenario is how it happened. Maybe it's the opposite, where eMSNs should be even more significant. The point is that the rankings are somewhat arbitrary, and I don't think this technique of controlling for average expression is optimal for identifying where the signal is strongest.
 
Was kind of scratching my head over why these analyses are controlling for ave expression.
I think it's because we kind of expect genes expressed in many tissues to significantly correlate to MAGMA scores, even if the tissues aren't actually important.

If, for example, the brain is important, then we might expect a gene highly expressed in brain, maybe like a serotonin receptor, to have a high GWAS p-value. But serotonin receptors are highly expressed in other places, like the small intestine. If there are enough similarities like this, then the small intestine will show up as significantly associated with GWAS scores.

So MAGMA controls for average expression of genes in all tissues, so that any given tissue is only significant if it associates with GWAS scores based on unique gene expression patterns.

Maybe that's good for if you don't want a false positive of small intestine. But it seems to me that if you instead don't add that covariate of average expression, then even if small intestine is spuriously significant, brain will still be more significant and have a larger effect size, suggesting it's a more promising possibility.
 
I suspect it's because without a covariate, pretty much all the signal will be for genes that are common in every cell, making it harder to notice the signals for genes that have more specific functions. It could be that it then picks up cells that don't have much specificity and mainly have genes involved in normal cell function.
 
For what it's worth, I made these plots of the MAGMA analyses that have been done using the FUMA version of the Siletti brain atlas and ME/CFS data.

This one is the data that forestglip posted using DecodeME. I ordered the results according to the 31 superclusters.

1780252938755.png

It looks similar to the DecodeME-only data that Paolo reported. The upper_layer_intratelencephalic ones get the lowest p-values.

1780253044159.png


In the MVP dataset, it was mostly the (eccentric) MSN that stood out:
1780253145560.png


Which gave this result for the meta-analysis:

1780253170095.png


But from what we understand now, these results are likely less useful (won't post them on the eMSN thread) because the p-values are based on relative gene expression per dissection. So you not only get a significant p-value by having the right genes, but also if the other cells in the dissection have the wrong genes. That might favour a cell type that is relatively rare and distinct in each area like eMSN.

So the analysis is mainly useful when you already know in the relevant brain region and are curious which cell type is involved there. It's less useful for scanning the entire brain looking for which cell type best matches the GWAS data. It still tells something, but the Duncan et al. approach (gene expression relative to the entire dataset) is probably more informative for this question.
 
Ok, I think I see an issue with the proposed approach of not doing the average control. I tried running the MAGMA analysis using @tralfamadorian97's project on the DecodeME data with just the main body tissues, but without that covariate.

With controlling for average expression across all GTEx body tissues, the p value for the most significant tissue, frontal cortex BA9, is 5.72e-08.
newplot(1).png
Without controlling for average expression, p value for this tissue is much less significant, at 1.58e-05, and fewer regions overall pass the Bonferroni threshold.
newplot(2).png

So not controlling for other tissues does result in a lot of noise. In the case of these few dozen tissues, this isn't a big deal, since we can still see that the frontal cortex passes strict multiple test correction. But when testing thousands of cell types, we need as much statistical power as possible in order to get significant findings.

However, I predict that the most significant cell type, if not controlling for average expression, will be at least a little more significant than the frontal cortex above, since it will be more specific, so probably p < 1e-5. If we test 2000 different cell types, the Bonferroni significance threshold will be p < 2.5e-5, so the strongest findings would hopefully squeeze under that.
 
Last edited:
I just looked at the preprocessed cell type data from FUMA and noticed that there was one more Siletti dataset that was not included in my analysis for some reason:

"36_Siletti_CerebralCortex.Pro_Human_2022_level2" ("Cerebral cortex (Cx) - Cuneus, rostral part - Area Prostriata - Pro")

I went back to the FUMA interface to see if that dataset was not an option on the website, but it is there, available for selecting. I then checked Paolo's supplementary table 5, and the dataset is missing from there too. So it seems like they may have realized it was missing within the past day and added it as an option.

It's probably not that important, but just noting this in case someone is confused about why one dataset is missing in the result files.

Edit: Odd, now a different dataset seems to be missing from the FUMA cell type interface. The following menu should include a level 2 version of the one that is checked, called "29_Siletti_CerebralCortex.V2_Human_2022_level2".

1780287859628.png

I'll try to let them know.

Edit: They fixed it.
 
Last edited:
I finally managed to run MAGMA. My biggest stumbling block before was not being sure how to correctly add rsIDs to the variants, but I found out that the GWASLab software makes it easy.

First, I did MAGMA on GTEx tissues, while controlling for average expression of genes in all tissues, same as DecodeME and tralfamadorian97. Mine looks similar but is a little less significant. DecodeME's most significant tissue was at around log10p=8. tralfamadorian's most significant was at around 7. And mine is at around 6.
There are two main reasons for this. The first, which applies to both my and tralfamadorian's analyses, is likely that we used 1000 Genomes European LD reference, as opposed to UK BioBank's LD, since the latter is not available to the public.

The reason mine is even less significant than tralfamadorian's is because I kept all the data in GRCh38 coordinates, while their analysis first did liftover to GRCh37. The specific cause seems to be a file which provides the locations of all the genes, which is required for MAGMA. For some reason, the gene boundaries for the earlier assembly seem different in a way that makes the tissues more significant.

I was able to achieve around the same significance level as tralfamadorian if I lifted over to GRCh37. I figured it might just be luck that it's more significant in that assembly, so I decided to just keep the data in the newer assembly. It's possible I'm wrong about the reason for the difference.

Here is the same analysis, but without controlling for average expression. Many brain tissues still pass the Bonferroni threshold, but not as strongly.


Then I tested MAGMA on a single midbrain dataset of cell types from the Human Brain Atlas (Siletti 2023). I controlled for average expression of all cell types in this dataset, as FUMA does.
For comparison, I tested the same dataset using FUMA's cell type MAGMA interface, using the 1000G European LD reference. The results are similar but not identical. This may be partly because FUMA required that the data be lifted over to GRCh37 for their analysis.

1780453253669.png

Finally, I tested all of the cell types from the Human Brain Atlas, but without controlling for average expression. These are the top 30 most significant:
Here are the top 40 in a more readable format:
Cell typeDissectionBETABETA_STDSEP
Medium_spiny_neuronHypothalamus (HTH) - supraoptic region of HTH - HTHso (anterior hypothalamic nucleus, AHN) - tuberal region of HTH - HTHtub (ventromedial and dorsomedial hypothalamic nucleic nuclei, VMH, DMH)0.0192980.0456310.00324221.35E-09
Upper_layer_intratelencephalicHypothalamus (HTH) - supraoptic region of HTH - HTHso (anterior hypothalamic nucleus, AHN) - tuberal region of HTH - HTHtub (ventromedial and dorsomedial hypothalamic nucleic nuclei, VMH, DMH)0.0178730.0450950.00305392.47E-09
Upper_layer_intratelencephalicMidbrain (M) - Periaqueductal gray and Dorsal raphe nucleus - PAG-DR0.0268880.0485490.00460382.66E-09
Upper_layer_intratelencephalicThalamus (THM) - intralaminar nuclear complex (ILN) - posterior group of intralaminar nuclei (PILN) - centromedian and parafasicular nuclei - CM and Pf0.0204680.0468270.00351122.84E-09
Eccentric_medium_spiny_neuronCerebral cortex (Cx) - Perirhinal gyrus (PRG) - A35-A360.0217670.0469760.00373512.87E-09
Upper_layer_intratelencephalicBasal forebrain (BF) - substantia innominata and nearby nuclei - SI0.0202880.0460690.00351774.11E-09
Medium_spiny_neuronHead of hippocampus (HiH) - CA10.0187840.0450760.00326234.35E-09
Upper_layer_intratelencephalicClaustrum - Cla0.0198910.0455120.00345524.38E-09
Eccentric_medium_spiny_neuronBasal nuclei (BN) - Putamen - Pu0.0200060.045570.00347634.43E-09
Eccentric_medium_spiny_neuronThalamus (THM) - Lateral nuclear complex (LNC) - ventral anterior nucleus of thalamus - VA0.0195880.0453890.00340434.45E-09
Amygdala_excitatoryMidbrain (M) - Substantia Nigra - SN0.0195760.0456540.00341074.84E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Rostral gyrus (RoG) - Dorsal division of MFC - A320.0213520.0454560.00372835.22E-09
Medium_spiny_neuronAmygdaloid complex (AMY) - basolateral nuclear group (BLN) - basolateral nucleus (basal nucleus) - BL0.0213950.0457750.0037475.77E-09
Upper_layer_intratelencephalicBasal forebrain (BF) - septal nuclei - SEP0.0193060.0447990.00338636.07E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Posterior intermediate orbital gyrus (POrG) - Caudal division of OFCi - A130.0210830.045280.00369876.10E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Anterior cingulate cortex - ACC0.0188210.0446560.00330446.26E-09
Upper_layer_intratelencephalicAmygdaloid complex (AMY) - basolateral nuclear group (BLN) - basomedial nucleus (accessory basal nucleus) - BM0.0192690.0452490.00338616.46E-09
Eccentric_medium_spiny_neuronBasal nuclei (BN) - Body of the Caudate - CaB0.0193130.0450090.00339396.46E-09
Upper_layer_intratelencephalicHead of hippocampus (HiH) - Tail of Hippocampus (HiT) - Subicular cortex - Sub0.0232660.0462370.00408966.51E-09
Upper_layer_intratelencephalicBasal nuclei (BN) - Globus pallidus (GP) - External segment of globus pallidus - Gpe0.0203130.0448520.00357526.81E-09
Deep_layer_intratelencephalicClaustrum - Cla0.0189840.0446880.00334246.87E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Subcallosal Gyrus (SCG) - Subgenual (subcallosal) division of MFC - A250.0195330.0451540.00344036.96E-09
Upper_layer_intratelencephalicMidbrain (M) - Superior colliculus and nearby nuclei - SC0.0229910.0458350.00405917.54E-09
Upper_layer_intratelencephalicExtended amygdala (EXA) - Bed nucleus of stria terminalis and nearby - BNST0.0199330.0452210.00352167.71E-09
Upper_layer_intratelencephalicPaleocortex (PalCx) - Piriform cortex - Pir0.020080.0447910.00355358.13E-09
Eccentric_medium_spiny_neuronBasal nuclei (BN) - Globus pallidus (GP) - External segment of globus pallidus - Gpe0.0193430.0447520.00342528.31E-09
Upper_layer_intratelencephalicPaleocortex (PalCx) - Anterior Olfactory Nucleus - AON0.0187550.0443840.00332148.33E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Long insular gyri (LIG) - Dysgranular insular cortex - Idg0.018930.0445320.00335648.67E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Supramarginal gyrus (SMG) - A400.0201020.0447290.00356518.73E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Caudal cingulate gyrus (CgGC) - A230.0219890.0450160.00390048.78E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Temporal pole (TP) - Temporopolar area - A380.0187940.044560.00333518.92E-09
Upper_layer_intratelencephalicCerebral cortex (Cx) - Frontal agranular insular cortex - FI0.0186420.0443980.0033098.98E-09
Eccentric_medium_spiny_neuronBasal forebrain (BF) - septal nuclei - SEP0.0217210.0451020.0038639.57E-09
Deep_layer_intratelencephalicAmygdaloid complex (AMY) - Central nuclear group - CEN0.0189250.0446170.00336659.63E-09
Deep_layer_intratelencephalicBasal nuclei (BN) - Putamen - Pu0.0189380.0442740.00338011.07E-08
Eccentric_medium_spiny_neuronAmygdaloid complex (AMY) - Basolateral nuclear group (BLN) - lateral nucleus - La0.0218110.0457330.00389731.11E-08
Upper_layer_intratelencephalicCerebral cortex (Cx) - Inferior frontal gyrus (IFG) - Ventrolateral prefrontal cortex - A44-A450.0196170.0441060.00350861.15E-08
Amygdala_excitatoryCerebral cortex (Cx) - Posterior parahippocampal gyrus (PPH) - TH-TL0.0189330.0437870.00338641.15E-08
Medium_spiny_neuronAmygdaloid complex (AMY) - Basolateral nuclear group (BLN) - lateral nucleus - La0.021630.0457230.00387061.17E-08

We can see that, for example, the "Upper layer intratelencephalic" cell that was most significant in the previous average-control test, but with log10p of only 3.3, here has a much more significant log10p=8.1. (It's the 24th row.)

What we see is still fairly similar to tralfamadorian's analysis of 461 cell clusters. eMSNs are still near the top here. Amygdala excitatory is very significant here as well. Instead of deep-layer intratelencephalic, this analysis mostly identified upper-layer intratelencephalic, though.

I haven't tested which of these cell types represent independent or shared signals, but in tralfamadorian's analysis, the eMSNs, amygdala excitatory cells, and deep-layer intratelencephalic cell were found to represent three different independent signals.

The main reason I did this wasn't to get lower p-values. Instead, I think this puts all the cell types on a more even playing field, where we can be more confident about the findings with the lowest p-values.

It may be useful to do an analysis controlling each cell type for average body expression of each gene (same control as the GTEx bulk tissue analysis), as opposed to a separate average control for each dataset. This may still remove some potential confounding from housekeeping genes, but would be consistent across cell types.

I attached the full results for the final large cell-type analysis as a spreadsheet. And I wrote up the procedure I used and put it on GitHub.

Edit: Replaced spreadsheet with one that includes the dataset titles.
 

Attachments

Last edited:
I tested all of the cell types from the Human Brain Atlas, but without controlling for average expression.
Great, thanks. So the results are largely comparably when using absolute gene expression? Think it was worth checking so that we can be more confident about the results.

How did you avoid adding the dissection average gene expression as covariate in MAGMA: is there a option for that, that you have to add or not?
 
So the results are largely comparably when using absolute gene expression? Think it was worth checking so that we can be more confident about the results.
Yes, it seems to be pretty similar. Not really pointing to a specific region, but several analyses (Paolo's, tralfamadorian's, and mine) are coming up with similar cell types like eMSNs, excitatory neurons, and intratelencephalic neurons. No big surprises in my analysis.

How did you avoid adding the dissection average gene expression as covariate in MAGMA: is there a option for that, that you have to add or not?
It's not an option in FUMA to not control for this, so I had to do it all on my own computer. The MAGMA command for the actual regression against gene expression is:
Code:
magma --gene-results [MAGMA_GENE_SCORES_FILE] --gene-covar [GENE_EXPRESSION_FILE] --model direction-covar=greater condition-hide=Average --out [OUTPUT_PREFIX]

The gene expression file is just a big table where rows are genes and the columns are tissues or cell types. By default, there's no controlling for anything (except built-in technical confounders like gene length).

What FUMA did is averaged the gene expression across all tissues for each gene, and put it in a new column called "Average". Then in the MAGMA command above, you indicate that you want to control for this column with condition-hide=Average. If you leave that part of the command off, it doesn't control for it.
 
For what it's worth, I made these plots of the MAGMA analyses that have been done using the FUMA version of the Siletti brain atlas and ME/CFS data.

This one is the data that forestglip posted using DecodeME. I ordered the results according to the 31 superclusters.

1780252938755.png
If it's not too difficult to do one more, I'd be curious how the new data looks in one of these pretty plots.
 
Back
Top Bottom