Using MAGMA on ME/CFS genetic data

forestglip · 2026-05-30T22:17:10+0100

Posts moved from: Eccentric medium spiny neuron (eMSN)

I ran FUMA cell type analysis on all of the datasets generated from the Siletti 2023 brain study.

Data

Cell types dissected from 105 brain regions
3172 total cell type/region combinations were tested with MAGMA using DecodeME GWAS-1 data.
- 1135 level 1 cell types (General cell types, like "neuron" or "astrocyte")
- 2037 level 2 cell types (More specific cell types, like "Upper_layer_intratelencephalic" or "Eccentric_medium_spiny_neuron")

Results

24 cell types from the 3172 tests were significant after Bonferroni correction at P.adj<.05.
- All of these were "neuron" from different areas of the brain.
No level 2 cell types were significant after Bonferroni correction
- The level 2 cell types with the lowest p-values are "Upper_layer_intratelencephalic" from several regions of the brain.

Dataset	Dissection	Cell_type	NGENES	BETA	BETA_STD	SE	P	P.adj.pds	P.adj
33_Siletti_CerebralCortex.MFG.A46_Human_2022_level1	Cerebral cortex (Cx) - Middle frontal gyrus (MFG) - A46	neuron	16649	0.02985	0.062067	0.0060917	4.84E-07	4.84E-06	0.0015342964
45_Siletti_CerebralNuclei.CEN_Human_2022_level1	Amygdaloid complex (AMY) - Central nuclear group - CEN	neuron	16856	0.028832	0.062275	0.0061037	1.17E-06	1.29E-05	0.0037055304
18_Siletti_CerebralCortex.SCG.A25_Human_2022_level1	Cerebral cortex (Cx) - Subcallosal Gyrus (SCG) - Subgenual (subcallosal) division of MFC - A25	neuron	16823	0.026729	0.057893	0.0057556	1.72E-06	1.72E-05	0.0054663076
110_Siletti_Thalamus.LNC.VA_Human_2022_level1	Thalamus (THM) - Lateral nuclear complex (LNC) - ventral anterior nucleus of thalamus - VA	neuron	16313	0.023475	0.054652	0.0051658	2.78E-06	3.05E-05	0.0088064236
12_Siletti_CerebralCortex.STG_Human_2022_level1	Cerebral cortex (Cx) - Superior Temporal Gyrus - STG	neuron	16699	0.025061	0.05149	0.0055208	2.84E-06	3.13E-05	0.0090154584
28_Siletti_CerebralCortex.AG.LEC_Human_2022_level1	Cerebral cortex (Cx) - Anterior parahippocampal gyrus (AG) - Lateral entorhinal cortex - LEC	neuron	16811	0.025293	0.056414	0.0056356	3.62E-06	3.62E-05	0.0114791508
53_Siletti_CerebralNuclei.GP.Gpi_Human_2022_level1	Basal nuclei (BN) - Globus pallidus (GP) - Internal segment of globus pallidus - Gpi	neuron	16342	0.041767	0.053006	0.0093657	4.14E-06	4.14E-05	0.0131244672
30_Siletti_CerebralCortex.ACC_Human_2022_level1	Cerebral cortex (Cx) - Anterior cingulate cortex - ACC	neuron	16776	0.025308	0.0562	0.0056768	4.16E-06	4.16E-05	0.0132056704
15_Siletti_CerebralCortex.Ig_Human_2022_level1	Cerebral cortex (Cx) - Short insular gyri - Granular insular cortex - Ig	neuron	16774	0.026481	0.056971	0.0060187	5.45E-06	5.45E-05	0.0173029428
31_Siletti_CerebralCortex.FuGt.TF_Human_2022_level1	Cerebral cortex (Cx) - Occipitotemporal (fusiform) gyrus, temporal part (FuGt) - Temporal area TF	neuron	16662	0.025982	0.053619	0.0059065	5.48E-06	5.48E-05	0.0173768504

Dataset	Dissection	Cell_type	NGENES	BETA	BETA_STD	SE	P	P.adj.pds	P.adj
82_Siletti_Midbrain.SC_Human_2022_level2	Midbrain (M) - Superior colliculus and nearby nuclei - SC	Upper_layer_intratelencephalic	16619	0.034377	0.067073	0.0091968	9.31E-05	0.002141668	0.295363952
81_Siletti_Midbrain.IC_Human_2022_level2	Midbrain (M) - Inferior colliculus and nearby nuclei - IC	Upper_layer_intratelencephalic	16705	0.023246	0.054282	0.0064468	0.00015613	0.00327873	0.49524436
84_Siletti_Midbrain.PAG-DR_Human_2022_level2	Midbrain (M) - Periaqueductal gray and Dorsal raphe nucleus - PAG-DR	Upper_layer_intratelencephalic	16366	0.034505	0.060823	0.00975	0.00020143	0.00423003	0.63893596
30_Siletti_CerebralCortex.ACC_Human_2022_level2	Cerebral cortex (Cx) - Anterior cingulate cortex - ACC	Upper_layer_intratelencephalic	16776	0.034449	0.08163	0.010048	0.00030442	0.00578398	0.96562024
18_Siletti_CerebralCortex.SCG.A25_Human_2022_level2	Cerebral cortex (Cx) - Subcallosal Gyrus (SCG) - Subgenual (subcallosal) division of MFC - A25	Upper_layer_intratelencephalic	16823	0.036754	0.084433	0.010806	0.00033619	0.0067238	1
26_Siletti_CerebralCortex.LiG.V1C_Human_2022_level2	Cerebral cortex (Cx) - Lingual gyrus (LiG) - Primary Visual Cortex - V1C	Upper_layer_intratelencephalic	16580	0.034339	0.071781	0.01057	0.00058071	0.00929136	1
12_Siletti_CerebralCortex.STG_Human_2022_level2	Cerebral cortex (Cx) - Superior Temporal Gyrus - STG	Upper_layer_intratelencephalic	16699	0.030358	0.065424	0.0095471	0.00073833	0.0147666	1
31_Siletti_CerebralCortex.FuGt.TF_Human_2022_level2	Cerebral cortex (Cx) - Occipitotemporal (fusiform) gyrus, temporal part (FuGt) - Temporal area TF	Upper_layer_intratelencephalic	16662	0.032632	0.07113	0.010358	0.00081631	0.01469358	1
83_Siletti_Midbrain.PTR_Human_2022_level2	Midbrain (M) - Pretectal region - PTR	Upper_layer_intratelencephalic	16692	0.0192	0.044568	0.0061102	0.00083983	0.01847626	1
15_Siletti_CerebralCortex.Ig_Human_2022_level2	Cerebral cortex (Cx) - Short insular gyri - Granular insular cortex - Ig	Upper_layer_intratelencephalic	16774	0.032512	0.072903	0.010373	0.00086296	0.01553328	1

Dataset	Dissection	Cell_type	NGENES	BETA	BETA_STD	SE	P	P.adj.pds	P.adj
38_Siletti_CerebralCortex.PRG.A35-A36_Human_2022_level2	Cerebral cortex (Cx) - Perirhinal gyrus (PRG) - A35-A36	Eccentric_medium_spiny_neuron	16732	0.030849	0.065571	0.01142	0.0034571	0.0656849	1

Dataset: Dataset name
Dissection: Dissection associated with dataset name, from https://github.com/tanyaphung/scrnaseq_viewer/blob/main/resources/fumacelltype_datasets_master.csv
Cell_type: Cell type name
NGENES (v1.07)/OBS_GENES (v1.06): Number of genes used in the analysis.
BETA: Effect size
BETA_STD: Standardised effect size
SE: Standard error
P: P-value
P.adj.pds: Adjusted P-value per dataset
P.adj: Adjusted P-value across dataset

I believe the gene sets that tralfamadorian97 used are clusters defining cell types, but they aren't based on a specific location of the brain. There were 461 clusters in tralfamadorian's analysis, but 3172 tests in this region-specific analysis. So maybe when eMSNs from many parts of the brain were combined into a small number of clusters, it made the values less noisy and thus more significant.

Edit: I attached the results of the analysis.

tralfamadorian97 · 2026-05-31T00:44:01+0100

Another factor that may explain some of the discrepancies is how the RNAseq matrices are transformed.

A key step in applying MAGMA via RNAseq data is transforming the RNAseq matrix [whose (i,j) entry tells use how much RNA for gene i was detected in cell type j] into a specificity matrix [whose (i,j) entry tells us how specific gene i is for cell type j]. There are competing approaches, and different approaches can produce different MAGMA results.

- I used the specificity matrix generated by Duncan et al. for their paper. In the Duncan paper (see section "Gene Expression Data"), they describe producing the specificity matrix by log-transforming the RNSAseq matrix, then row-normalizing.

- I believe FUMA may use a different transformation. According to their documentation, FUMA log-transforms and adds an auxiliary average column as a control, but does not row-normalize.

To test this, we would want to fix the version of the Siletti dataset and compare different specificity matrix preparation strategies.

forestglip · 2026-05-31T05:30:21+0100

tralfamadorian97 said:
- I used the specificity matrix generated by Duncan et al. for their paper. In the Duncan paper (see section "Gene Expression Data"), they describe producing the specificity matrix by log-transforming the RNSAseq matrix, then row-normalizing.

- I believe FUMA may use a different transformation. According to their documentation, FUMA log-transforms and adds an auxiliary average column as a control, but does not row-normalize.

I'm going to have to read the Watanabe paper again to be sure I understand, but I wanted to get my thoughts out about this.

So I think I've been thinking about the MAGMA ranking problem wrong. I don't think p-values are comparable across datasets, or really even within a dataset, because of these techniques of specificity or controlling for the average. I think the goal of these analyses is just to identify reliable and unique MAGMA associations, but without any implication that you can compare the strengths of the associations. And I think this may partly explain why the boring "neuron" is so strongly significant, even more so than any of the more interesting specific cell types.

If I understand the procedure correctly: when the regression is done between expression of genes in a cell type and gene scores from GWAS, it is controlled for average expression of the genes in all cell types in the dataset. If we imagine that there is a real association with neurons but not astrocytes, this means if your dataset only has, say, 5 types of neuron, then you'd be controlling for neurons while testing for an association with neurons, so the p-value will not be very significant. If your dataset has one type of neuron and four types of astrocytes, then you'd be controlling mainly for astrocyte expression, which will not fit the GWAS scores well, and thus the neuron assocation can still be significant.

Going back to the actual FUMA analysis I did, here are the cell types included in the dataset that the most significant neuron finding came from:

From 33_Siletti_CerebralCortex.MFG.A46_Human_2022_level1:

* neuron
oligodendrocyte_precursor_cell
astrocyte
oligodendrocyte
pericyte
central_nervous_system_macrophage
endothelial_cell
fibroblast
vascular_associated_smooth_muscle_cell
leukocyte

Quite a wide assortment, where only one cell type out of ten is a neuron. The average expression across all these cell types will be nothing like neuron expression, and neuron can be very significant.

On the other hand, here are the cell types found in the dataset that the most significant specific neuron type (Upper_layer_intratelencephalic) came from:

From 82_Siletti_Midbrain.SC_Human_2022_level2:

* Upper_layer_intratelencephalic
Miscellaneous
* Thalamic_excitatory
* CGE_interneuron
* Deep_layer_intratelencephalic
* LAMP5_LHX6_and_Chandelier
* MGE_interneuron
Astrocyte
* Midbrain_derived_inhibitory
Oligodendrocyte
Oligodendrocyte_precursor
* Deep_layer_corticothalamic_and_6b
* Splatter
* Cerebellar_inhibitory
Committed_oligodendrocyte_precursor
* Upper_rhombic_lip
Ependymal
* Amygdala_excitatory
* Deep_layer_near_projecting
Microglia
Vascular
Choroid_plexus
Fibroblast

Assuming I correctly labeled all the neuron cell types in this dataset with a star, then 13/23 cell types in this dataset are neurons. The average expression that is controlled for will be much more "neuron-like" than in the other dataset, and will weaken the assocation with any given neuron cell type.

Even other level 2 datasets differ among each other. "10_Siletti_CerebralCortex.PaO.A43_Human_2022_level2" contains 18 total cell types, unlike 23 for the above dataset, and this will affect the average expression that is being controlled for.

All this is to say, it seems to me that if we want to zoom in on the potentially most interesting cell types, we don't want any of this controlling for average expression or specificity. Just a measure of the association of a cell type's expression with GWAS scores. This seems closer to making ranking of cell types actually mean something. I think MAGMA outputs a standardized beta, which might work for comparing between cell types. There may be an issue of if a cell type has outlier genes with extremely high expression, which I think might skew the standardized beta down, making a cell type look less interesting than it should look. In which case, maybe the expression values could be converted to ranks to make it robust?

ME/CFS Science Blog · 2026-05-31T14:19:29+0100

~~I'm not able to comment on the new thread that discusses MAGMA (It says: "You have insufficient privileges to reply here.")~~
~~Using MAGMA on ME/CFS genetic data | Science for ME~~ EDIT: Fixed now!

But having downloaded the FUMA processed files here, I think Forestglip is right.

Functional Mapping and Annotation of Genome-wide association studies

FUMA calculated gene expression for each of the 31 supercluster cell types per dissection and then added a covariate to the MAGMA regression with the average gene expression per dissection. So the genes that stand out for a supercluster like 'eccentric medium spiny neuron' are relative to the brain region of the dissection, not the entire brain.

Functional Mapping and Annotation of Genome-wide association studies

The approach by Duncan et al. 2025 likely used the 461 cell clusters and calculated relative expression of genes in each cluster compared to the entire Siletti dataset. I think this is the more interesting analysis for DecodeME.

ME/CFS Science Blog · 2026-05-31T14:29:35+0100

Forestglip's FUMA cell type analysis of Siletti datasets level 2 has 2,037 results, of which 52 are for eMSN, exactly the same as for Paolo.
So those are the 31 superclusters measured in each dissection separately.

It's strange that eMSN stand out much more in Paolo's analysis but I think this is due to the addition of the MVP dataset. If I take the most significant result from Paolo's analysis, I got (my bolding):

Dataset "47_Siletti_CerebralNuclei.Cla_Human_2022_level2"
Cell_type "Eccentric_medium_spiny_neuron"
BETA "0.045468"
SE "0.01053"
P "7.9263e-06"
P_bon "0.01650256"
P_bh "0.01650256"
BETA_DME_1 "0.026689"
SE_DME_1 "0.010881"
P_DME_1 "0.0070937"
P_bon_DME_1 "1"
P_bh_DME_1 "0.3197161"
BETA_MVP "0.022589"
SE_MVP "0.010388"
P_MVP "0.014844"
P_bon_MVP "1"
P_bh_MVP "0.7023911"
P_HEAL2 "0.1347193"
P_bon_HEAL2 "1"
P_bh_HEAL2 "0.9212313"
GeneRatio "11/72"
BgRatio "893/8444"
FoldEnrichment "1.444631"
genes "ENSG00000119772/ENSG00000221914/...

This corresponds to forestglips analysis:
region "Siletti_CerebralNuclei.Cla"
cell_type "Eccentric_medium_spiny_neuron"
beta "0.025722"
beta_std "0.053982"
p "0.0070827"

There are slight differences but the p-value Paolo found for DecodeME alone (p = 0.007) is the same as what forestglip found in the file 'magma_celltype_47_Siletti_CerebralNuclei.Cla_Human_2022_level2.gsa.out'

The large MVP dataset also had a low p-value (0.0148) for eMSN, so that's probably why Paolo's meta-analysis had stronger results for eMSN.

forestglip · 2026-05-31T14:41:27+0100

ME/CFS Science Blog said:
The approach by Duncan et al. 2025 likely used the 461 cell clusters and calculated relative expression of genes in each cluster compared to the entire Siletti dataset. I think this is the more interesting analysis for DecodeME.

I agree that it probably more clearly highlights the more important cell types, but I think there are still issues if controlling for average expression across clusters.

We can imagine that eMSNs and glutamatergic neurons are both equally important. If 50 of the 461 clusters are similar to glutamatergic neurons but only 10 other clusters are similar to eMSNs, then the regression will condition out more of the signal for glutamatergic neurons, making it less significant just because of what kind of other clusters there are.

I don't know that that specific scenario is how it happened. Maybe it's the opposite, where eMSNs should be even more significant. The point is that the rankings are somewhat arbitrary, and I don't think this technique of controlling for average expression is optimal for identifying where the signal is strongest.

ScoutB · 2026-05-31T16:09:01+0100

forestglip said:
The point is that the rankings are somewhat arbitrary, and I don't think this technique of controlling for average expression is optimal for identifying where the signal is strongest.

Have been too foggy to follow closely but agree with this. Was kind of scratching my head over why these analyses are controlling for ave expression.

forestglip · 2026-05-31T16:25:30+0100

ScoutB said:
Was kind of scratching my head over why these analyses are controlling for ave expression.

I think it's because we kind of expect genes expressed in many tissues to significantly correlate to MAGMA scores, even if the tissues aren't actually important.

If, for example, the brain is important, then we might expect a gene highly expressed in brain, maybe like a serotonin receptor, to have a high GWAS p-value. But serotonin receptors are highly expressed in other places, like the small intestine. If there are enough similarities like this, then the small intestine will show up as significantly associated with GWAS scores.

So MAGMA controls for average expression of genes in all tissues, so that any given tissue is only significant if it associates with GWAS scores based on unique gene expression patterns.

Maybe that's good for if you don't want a false positive of small intestine. But it seems to me that if you instead don't add that covariate of average expression, then even if small intestine is spuriously significant, brain will still be more significant and have a larger effect size, suggesting it's a more promising possibility.

ME/CFS Science Blog · 2026-05-31T18:20:59+0100

I suspect it's because without a covariate, pretty much all the signal will be for genes that are common in every cell, making it harder to notice the signals for genes that have more specific functions. It could be that it then picks up cells that don't have much specificity and mainly have genes involved in normal cell function.

Using MAGMA on ME/CFS genetic data

forestglip

Moderator

Attachments

tralfamadorian97

Established Member (Voting Rights)

forestglip

Moderator

ME/CFS Science Blog

Senior Member (Voting Rights)

ME/CFS Science Blog

Senior Member (Voting Rights)

forestglip

Moderator

ScoutB

Senior Member (Voting Rights)

forestglip

Moderator

ME/CFS Science Blog

Senior Member (Voting Rights)