Preprint Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome, 2025, DecodeMe Collaboration

Looks interesting. I tried to see if I could do anything, but it's too much stuff I don't know how to do, like the part about creating credible set files.
I tried looking into this, and it seems you can use FINEMAP to create the source file needed for credible sets. Then I ran into the problem of FINEMAP needing an LD matrix... which seems to come from a large file of people of European descent? This is all very far outside my wheelhouse. I'm surprised how fragmented all this software is; whoever said these bioinformatics pipelines are all over the place wasn't kidding.
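For anyone else puzzling over what a "credible set" actually is, here is a minimal sketch. It assumes you already have per-SNP posterior inclusion probabilities (PIPs) from a fine-mapping tool, and that there is a single causal variant at the locus so the PIPs sum to roughly 1. The SNP names and numbers are made up, and this does not reproduce FINEMAP's own file formats.

```python
def credible_set(pips, coverage=0.95):
    """Return the smallest list of SNPs whose PIPs sum to at least `coverage`.

    `pips` maps SNP id -> posterior inclusion probability (hypothetical input).
    """
    # Rank SNPs from most to least probable
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for snp, pip in ranked:
        chosen.append(snp)
        total += pip
        if total >= coverage:
            break
    return chosen


# Made-up PIPs for one locus (illustration only, not DecodeME data)
example_pips = {"rsA": 0.60, "rsB": 0.30, "rsC": 0.06, "rsD": 0.04}
example_set = credible_set(example_pips)
print(example_set)
```

The hard part in practice is getting the PIPs, which is where the LD matrix comes in; the set construction itself is just this ranking and cumulative sum.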
 
I tried looking into this, and it seems you can use FINEMAP to create the source file needed for credible sets. Then I ran into the problem of FINEMAP needing an LD matrix... which seems to come from a large file of people of European descent? This is all very far outside my wheelhouse. I'm surprised how fragmented all this software is; whoever said these bioinformatics pipelines are all over the place wasn't kidding.
Yeah, I don't know where to start to find the right files.

It's all so interesting. It feels like there's so much hidden treasure in this data file of DNA, and all these free tools across the internet to analyze it. I'm just very lacking in the experience and energy departments, so most of it is frustratingly out of my reach, and I have to just wait for the smarter folks to give us more gems.
 
I am not going to be contributing for a couple of days. Basically Sonya is right. The study shows that MECFS picks out a real biological problem (or a cluster). The results are pretty much what we saw in the last advisory board. There are immune genes and nerve genes but there are also some unexpected things, which is good. There should be some mention of MHC but this turned out to be complicated and puzzling. I think it will prove relevant.

My understanding is that the main mitochondria linked gene wouldn't explain "feeble mitochondria". If anything maybe the reverse, but I wonder if it may show that the metabolic clues we have had make sense in an unexpected way.

No doubt when I am on dry land you will have sorted it all out.
Can we say this for sure, though?

I understand that you can identify significantly different SNPs for an enormous variety of different groupings, e.g. socioeconomic status and even political views, because there are always nonrandom patterns that determine who ends up in which category.
 
Can we say this for sure, though?

I understand that you can identify significantly different SNPs for an enormous variety of different groupings, e.g. socioeconomic status and even political views, because there are always nonrandom patterns that determine who ends up in which category.

Yes, I think we can. If there are non-random patterns of gene variants that cause you to be in a group, that group represents the outcome of a real common biological process or cluster of processes. So low socioeconomic status is the real result of a real, partially genetically determined, set of processes.

It may not seem much of a step forward but up until now a major proportion of the medical profession (along with the public) have taken the view that 'ME' (most have not even heard of ME/CFS) is an entirely bogus category arbitrarily allocated to people by themselves or others, like 'loser'. The results show that this isn't the case. The ME/CFS category defines the real adverse result of some genes (and other things) just as low economic status does.

I think the results will tell us much more than that but until we have firmer evidence about exactly which genes are involved, which will hopefully come from rare allele studies, there are a lot of uncertainties.
 
Yes, I think we can. If there are non-random patterns of gene variants that cause you to be in a group, that group represents the outcome of a real common biological process or cluster of processes. So low socioeconomic status is the real result of a real, partially genetically determined, set of processes.
Hmm, at a surface level, yes, but the implications could be very different. Imagine that you had a particular ethnic group that got marginalised as a result of some historical military invasion (as has happened all the time throughout history). Members of that ethnic group might be more likely to appear in low SES groupings even centuries later, simply because it takes many generations for social mobility to completely eradicate such effects. So any significant differences could be telling us less about the "genetic cause" of low SES and more about the pathways to low SES in that particular historic timeline. It's still a causal factor, but one that would have very different implications - and a failure to consider this possibility could lead to real injustice.

To get back to MECFS, what if those with MECFS are non-random with respect to something not causally related to their disease? Some of the obvious sources of variability are already controlled for - like ethnicity and education level. But others might be unknown. We always need to keep in mind that there is a long pathway that leads to membership of the group of interest, and it includes things like access to medical care, medical attitudes and propensities, patient persistence, maybe also social status of the patient and family support? And probably dozens of other things I haven't thought of. If any of these variables are associated with even a slightly unusual genome, then this might be what we're seeing.

Obviously, I hope that's not the case, but I think it's a question that still needs to be asked.
 
particular ethnic group that got marginalised

Agreed, the causation could go back further, but realistically for ME/CFS this seems unlikely, and race is specifically dealt with in the control procedure.
To get back to MECFS, what if those with MECFS are non-random with respect to something not causally related to their disease?

The basic point of a genetic study like this is that they must be. The genes are antecedent in every possible causal network that can lead to the disease state.

includes things like access to medical care, medical attitudes and propensities, patient persistence, maybe also social status of the patient and family support? And probably dozens of other things I haven't thought of. If any of these variables are associated with even a slightly unusual genome, then this might be what we're seeing.

And these are all real biological processes. So yes, the claim takes 'biological' very wide, but I have always tried to point out that it does that. All of this is biology. I am not claiming anything about which subdiscipline it might fall under.

A few years back I had a conversation with Robert Souhami, whom most UK physicians have revered as one of the sharpest and most down to earth and common sensical teachers of his time. I grew up to believe that if you could not convince Bob that something was valid you needed to start again. Interestingly, I failed to convince him that my rituximab study design was valid and I proved him wrong. But the next time I met him the first thing he said was 'I was wrong.'

Bob asked me why there should be a category of ME/CFS - what justified separating off this group of patients? He could not see any reason to do so. So I wrote a Qeios article on the Concept of ME/CFS to try to answer him. I was arguing a case, which I think DecodeME now makes cast iron. There is a distinct biological category. If the sharpest minds in medicine can be persuaded of that, there is some hope that it will trickle down.
 
Some results from running MAGMA locally.

Both gene-based and gene-set analyses were run using the above method on each of the DecodeME GWAS subgroups, against the full Molecular Signatures Database Human Collections release 2025.1 (MSigDB 2025.1.Hs).

Please bear in mind this method differs from the one used in the paper and may not have been performed correctly by me!

That said, the results are largely similar and it seemed interesting to share. This may help spur some discussion, or indeed prompt someone wiser to point out errors and what the right way to do all this is!

Stats stuff, with thanks to @forestglip for advice and answering my questions:

P-values shown have not been corrected. For the gene-based analysis, all genes that pass the significance threshold are shown (p < 2.69e-6, which is 0.05/18544, i.e. 0.05 divided by the number of genes tested).
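To make the threshold explicit, here is a small sketch of the Bonferroni cut-off and how it is applied. The three real rows are copied from the DecodeME_gwas1 report in this post; `GENE_X` is a made-up entry added only to show a gene that fails the cut-off.

```python
N_GENES_TESTED = 18544  # number of genes tested, as stated in this post
ALPHA = 0.05

# Bonferroni threshold: divide alpha by the number of tests
threshold = ALPHA / N_GENES_TESTED
print(f"threshold = {threshold:.3e}")

# (symbol, uncorrected p) pairs; the first three come from the gwas1 report
rows = [
    ("LRRC7", 9.4488e-10),
    ("STAU1", 6.6127e-09),
    ("H4C8", 2.2675e-06),
    ("GENE_X", 3.0e-06),  # hypothetical gene, illustration only
]

significant = [symbol for symbol, p in rows if p < threshold]
print(significant)
```

Equivalently, you could multiply each p-value by 18544 and compare against 0.05; the genes selected are the same.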

For the gene-set analysis just the top 10 are shown as, tbh, I’m not sure how to interpret what’s going on here or how to go about correction.
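On the correction question, two common options are Bonferroni (multiply each p-value by the number of sets tested) and Benjamini-Hochberg false discovery rate adjustment. Below is a rough sketch of both, run on toy p-values. Note the assumption: either method needs the p-values, or at least the count, for all gene sets tested, not just the top 10 shown here, and I am not claiming this is what the paper did.

```python
def bonferroni(pvalues, n_tests):
    """Bonferroni-adjusted p-values, capped at 1."""
    return [min(1.0, p * n_tests) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (not taken from the MAGMA output)
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_bonf = bonferroni(toy_p, n_tests=len(toy_p))
toy_bh = benjamini_hochberg(toy_p)
print(toy_bonf)
print(toy_bh)
```

With tens of thousands of gene sets, Bonferroni is very strict, which is one reason FDR-style adjustment is often preferred for gene-set results.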

Code:
Report for: DecodeME_gwas1
  GENE  CHR     START      STOP  NSNPS  NPARAM      N  ZSTAT          P  SYMBOL                                       LONGNAME
 57554    1  69568773  70148192   1602      70 275488 6.0070 9.4488e-10   LRRC7               leucine rich repeat containing 7
  6780   20  49113339  49188370    151      16 275488 5.6831 6.6127e-09   STAU1  staufen double-stranded RNA binding protein 1
  1434   20  49046246  49096960    118      11 275488 5.4403 2.6592e-08   CSE1L                  chromosome segregation 1 like
 84614    1 173868082 173887458     56      11 275488 5.0893 1.7970e-07  ZBTB37       zinc finger and BTB domain containing 37
 51347   12 118149801 118372945    447      23 275488 4.9869 3.0685e-07   TAOK3                                   TAO kinase 3
 55157    1 173824645 173858544     37       7 275488 4.9706 3.3379e-07   DARS2      aspartyl-tRNA synthetase 2, mitochondrial
 10564   20  48921721  49036693    264      16 275488 4.8197 7.1905e-07 ARFGEF2       ARF guanine nucleotide exchange factor 2
144348   12 123973215 124015439     75       9 275488 4.7416 1.0604e-06  ZNF664                        zinc finger protein 664
282890    6  28994785  29005628     32       8 275488 4.7366 1.0866e-06  ZNF311                        zinc finger protein 311
 80212   12 123936409 123972985     85       8 275488 4.6644 1.5474e-06  CCDC92               coiled-coil domain containing 92
  5334    2 197804602 198149884    636      33 275488 4.6596 1.5842e-06   PLCL1              phospholipase C like 1 (inactive)
   777    1 181483311 181806784    790      59 275488 4.6019 2.0933e-06 CACNA1E calcium voltage-gated channel subunit alpha1 E
 64426   12 118373189 118418035    120      10 275488 4.5935 2.1797e-06   SUDS3       SIN3A corepressor complex component SDS3
  8365    6  26285126  26285499      3       1 275488 4.5852 2.2675e-06    H4C8                         H4 clustered histone 8


Report for: DecodeME_gwas1_female
 GENE  CHR     START      STOP  NSNPS  NPARAM      N  ZSTAT          P  SYMBOL                                       LONGNAME
57554    1  69568773  70148192   1602      70 231782 5.7130 5.5514e-09   LRRC7               leucine rich repeat containing 7
 6780   20  49113339  49188370    151      16 231782 5.3587 4.1922e-08   STAU1  staufen double-stranded RNA binding protein 1
 1434   20  49046246  49096960    118      11 231782 4.9493 3.7245e-07   CSE1L                  chromosome segregation 1 like
  777    1 181483311 181806784    790      59 231782 4.9162 4.4116e-07 CACNA1E calcium voltage-gated channel subunit alpha1 E
 1630   18  52340172  53535903   4538     109 231782 4.7619 9.5870e-07     DCC                          DCC netrin 1 receptor
 8365    6  26285126  26285499      3       1 231782 4.7012 1.2930e-06    H4C8                         H4 clustered histone 8
51347   12 118149801 118372945    447      23 231782 4.5879 2.2386e-06   TAOK3                                   TAO kinase 3
11055    7  49850322  50093264    573      19 231782 4.5733 2.4010e-06    ZPBP                 zona pellucida binding protein


Report for: DecodeME_gwas1_infectious_onset
 GENE  CHR    START     STOP  NSNPS  NPARAM      N  ZSTAT          P SYMBOL                         LONGNAME
57554    1 69568773 70148192   1602      70 269647 4.7630 9.5345e-07  LRRC7 leucine rich repeat containing 7
 8365    6 26285126 26285499      3       1 269647 4.5709 2.4281e-06   H4C8           H4 clustered histone 8
 

Report for: DecodeME_gwas1_male
 GENE  CHR     START      STOP  NSNPS  NPARAM     N  ZSTAT          P SYMBOL             LONGNAME
 9140    5 115828196 115841851     22       7 43706 4.5824 2.2982e-06  ATG12 autophagy related 12
 

Report for: DecodeME_gwas1_non_infectious_onset
 GENE  CHR     START      STOP  NSNPS  NPARAM      N  ZSTAT          P SYMBOL                                 LONGNAME
84614    1 173868082 173887458     56      11 265750 4.5824 2.2979e-06 ZBTB37 zinc finger and BTB domain containing 37


Report for: DecodeME_gwas2
  GENE  CHR     START      STOP  NSNPS  NPARAM      N  ZSTAT          P  SYMBOL                                      LONGNAME
 57554    1  69568773  70148192   1602      70 171369 5.7492 4.4826e-09   LRRC7              leucine rich repeat containing 7
  6780   20  49113339  49188370    150      16 171369 5.5782 1.2148e-08   STAU1 staufen double-stranded RNA binding protein 1
  1434   20  49046246  49096960    118      11 171369 5.2914 6.0691e-08   CSE1L                 chromosome segregation 1 like
 84614    1 173868082 173887458     56      11 171369 5.2698 6.8281e-08  ZBTB37      zinc finger and BTB domain containing 37
 55157    1 173824645 173858544     37       7 171369 5.2501 7.5993e-08   DARS2     aspartyl-tRNA synthetase 2, mitochondrial
 51347   12 118149801 118372945    447      23 171369 5.2086 9.5150e-08   TAOK3                                  TAO kinase 3
  5334    2 197804602 198149884    636      33 171369 4.8967 4.8735e-07   PLCL1             phospholipase C like 1 (inactive)
 10564   20  48921721  49036693    263      17 171369 4.8428 6.4025e-07 ARFGEF2      ARF guanine nucleotide exchange factor 2
144348   12 123973215 124015439     75       9 171369 4.6874 1.3836e-06  ZNF664                       zinc finger protein 664
 64426   12 118373189 118418035    120      10 171369 4.6485 1.6714e-06   SUDS3      SIN3A corepressor complex component SDS3
 80212   12 123936409 123972985     85       8 171369 4.6303 1.8253e-06  CCDC92              coiled-coil domain containing 92


Code:
Report for: DecodeME_gwas1
                        VARIABLE TYPE  NGENES     BETA  BETA_STD       SE          P                                               FULL_NAME
                         MIR9903  SET      50 0.556150  0.028833 0.136930 2.4477e-05                                                 MIR9903
 ZHONG_PFC_MAJOR_TYPES_EXCITA...  SET       8 1.401300  0.029091 0.346320 2.6147e-05                 ZHONG_PFC_MAJOR_TYPES_EXCITATORY_NEURON
 GOBP_PRESYNAPTIC_MEMBRANE_AS...  SET       6 1.744600  0.031369 0.431450 2.6435e-05                      GOBP_PRESYNAPTIC_MEMBRANE_ASSEMBLY
                          DBP_Q6  SET     240 0.219560  0.024810 0.057967 7.6312e-05                                                  DBP_Q6
                      HP_DYSURIA  SET      16 0.728930  0.021397 0.194350 8.8464e-05                                              HP_DYSURIA
STARK_PREFRONTAL_CORTEX_22Q1...2  SET     188 0.238920  0.023928 0.064449 1.0516e-04               STARK_PREFRONTAL_CORTEX_22Q11_DELETION_UP
 HP_ABNORMALITY_OF_COORDINATI...  SET    1065 0.097939  0.022782 0.026466 1.0792e-04                          HP_ABNORMALITY_OF_COORDINATION
     HP_DEVELOPMENTAL_REGRESSION  SET     388 0.157540  0.022543 0.043083 1.2813e-04                             HP_DEVELOPMENTAL_REGRESSION
                   HP_SYNKINESIS  SET      32 0.603630  0.025047 0.166230 1.4147e-04                                           HP_SYNKINESIS
 GOBP_MATURE_CONVENTIONAL_DEN...  SET       6 1.429200  0.025698 0.395590 1.5185e-04 GOBP_MATURE_CONVENTIONAL_DENDRITIC_CELL_DIFFERENTIATION
 
 
Report for: DecodeME_gwas1_female
                        VARIABLE TYPE  NGENES    BETA  BETA_STD       SE          P                                         FULL_NAME
                          DBP_Q6  SET     240 0.26574  0.030028 0.057724 2.0908e-06                                            DBP_Q6
                         MIR4480  SET      52 0.45995  0.024316 0.110130 1.4889e-05                                           MIR4480
 GOBP_SYNAPTIC_MEMBRANE_ADHES...  SET      29 0.73791  0.029151 0.180820 2.2525e-05                   GOBP_SYNAPTIC_MEMBRANE_ADHESION
     HP_DEVELOPMENTAL_REGRESSION  SET     388 0.17206  0.024621 0.042907 3.0473e-05                       HP_DEVELOPMENTAL_REGRESSION
                  HP_DYSPAREUNIA  SET      36 0.60231  0.026506 0.151260 3.4314e-05                                    HP_DYSPAREUNIA
 HP_ABNORMALITY_OF_COORDINATI...  SET    1065 0.10251  0.023846 0.026359 5.0507e-05                    HP_ABNORMALITY_OF_COORDINATION
          GOCC_SYNAPTIC_MEMBRANE  SET     420 0.17092  0.025424 0.045170 7.7422e-05                            GOCC_SYNAPTIC_MEMBRANE
STARK_PREFRONTAL_CORTEX_22Q1...1  SET     492 0.13361  0.021468 0.035838 9.6703e-05         STARK_PREFRONTAL_CORTEX_22Q11_DELETION_DN
STARK_PREFRONTAL_CORTEX_22Q1...2  SET     188 0.23811  0.023847 0.064191 1.0419e-04         STARK_PREFRONTAL_CORTEX_22Q11_DELETION_UP
GSE15930_STIM_VS_STIM_AND_IF...3  SET     196 0.22068  0.022562 0.059976 1.1721e-04 GSE15930_STIM_VS_STIM_AND_IFNAB_48H_CD8_T_CELL_DN


Report for: DecodeME_gwas1_infectious_onset
                        VARIABLE TYPE  NGENES     BETA  BETA_STD       SE          P                                                       FULL_NAME
GSE2770_UNTREATED_VS_TGFB_AN...4  SET     191 0.258430  0.026086 0.058798 5.5652e-06 GSE2770_UNTREATED_VS_TGFB_AND_IL12_TREATED_ACT_CD4_TCELL_48H_UP
       GOBP_CHROMATIN_REMODELING  SET     605 0.128340  0.022794 0.033476 6.3324e-05                                       GOBP_CHROMATIN_REMODELING
                         MEF2_04  SET      24 0.704410  0.025319 0.184580 6.7969e-05                                                         MEF2_04
              MEF2C_TARGET_GENES  SET    1438 0.088202  0.023585 0.023155 6.9936e-05                                              MEF2C_TARGET_GENES
     GOBP_CHROMATIN_ORGANIZATION  SET     770 0.113720  0.022682 0.030284 8.6885e-05                                     GOBP_CHROMATIN_ORGANIZATION
STARK_PREFRONTAL_CORTEX_22Q1...2  SET     188 0.237170  0.023754 0.063413 9.2245e-05                       STARK_PREFRONTAL_CORTEX_22Q11_DELETION_UP
REACTOME_REGULATION_OF_ENDOG...1  SET     118 0.283620  0.022547 0.075861 9.2780e-05                 REACTOME_REGULATION_OF_ENDOGENOUS_RETROELEMENTS
      GOCC_GLUTAMATERGIC_SYNAPSE  SET     555 0.140730  0.023974 0.037763 9.7305e-05                                      GOCC_GLUTAMATERGIC_SYNAPSE
KEGG_MEDICUS_ENV_FACTOR_NNK_...1  SET       9 0.951140  0.020944 0.260400 1.3017e-04 KEGG_MEDICUS_ENV_FACTOR_NNK_NNN_TO_CHRNA7_E2F_SIGNALING_PATHWAY
            GOBP_PHOSPHORYLATION  SET     769 0.112880  0.022499 0.031717 1.8673e-04                                            GOBP_PHOSPHORYLATION


Report for: DecodeME_gwas1_male
                        VARIABLE TYPE  NGENES    BETA  BETA_STD       SE          P                                              FULL_NAME
 GOMF_TRNA_METHYLTRANSFERASE_...  SET      29 0.49012  0.019364 0.131840 1.0095e-04                   GOMF_TRNA_METHYLTRANSFERASE_ACTIVITY
               DLX4_TARGET_GENES  SET     750 0.10968  0.021604 0.030818 1.8665e-04                                      DLX4_TARGET_GENES
       REACTOME_MITOTIC_PROPHASE  SET     117 0.25306  0.020034 0.074654 3.5062e-04                              REACTOME_MITOTIC_PROPHASE
 GOBP_REGULATION_OF_RENAL_SOD...  SET       7 1.09010  0.021173 0.322480 3.6267e-04              GOBP_REGULATION_OF_RENAL_SODIUM_EXCRETION
                LIU_LIVER_CANCER  SET      31 0.42574  0.017390 0.126040 3.6601e-04                                       LIU_LIVER_CANCER
               WP_FANCONI_ANEMIA  SET      46 0.39158  0.019476 0.115960 3.6741e-04                                      WP_FANCONI_ANEMIA
              GOMF_SIRNA_BINDING  SET       8 0.97241  0.020190 0.289950 3.9956e-04                                     GOMF_SIRNA_BINDING
     GOMF_REGULATORY_RNA_BINDING  SET      46 0.37487  0.018645 0.112830 4.4717e-04                            GOMF_REGULATORY_RNA_BINDING
GSE46242_CTRL_VS_EGR2_DELETE...2  SET     185 0.19927  0.019801 0.060194 4.6672e-04 GSE46242_CTRL_VS_EGR2_DELETED_ANERGIC_TH1_CD4_TCELL_UP
 REACTOME_NUCLEAR_PORE_COMPLE...  SET      36 0.47123  0.020740 0.142490 4.7215e-04          REACTOME_NUCLEAR_PORE_COMPLEX_NPC_DISASSEMBLY


Report for: DecodeME_gwas1_non_infectious_onset
                       VARIABLE TYPE  NGENES    BETA  BETA_STD       SE          P                                                  FULL_NAME
                        MIR9903  SET      50 0.60707  0.031473 0.133530 2.7476e-06                                                    MIR9903
         HP_FOCAL_ONSET_SEIZURE  SET     313 0.20895  0.026910 0.047363 5.1624e-06                                     HP_FOCAL_ONSET_SEIZURE
HP_POOR_FINE_MOTOR_COORDINAT...  SET      58 0.48252  0.026937 0.109420 5.2018e-06                            HP_POOR_FINE_MOTOR_COORDINATION
HP_ABNORMALITY_OF_CENTRAL_NE...  SET     552 0.15350  0.026081 0.035164 6.3880e-06 HP_ABNORMALITY_OF_CENTRAL_NERVOUS_SYSTEM_ELECTROPHYSIOLOGY
  HP_INTERICTAL_EEG_ABNORMALITY  SET     297 0.20463  0.025683 0.047856 9.5650e-06                              HP_INTERICTAL_EEG_ABNORMALITY
     HP_POOR_MOTOR_COORDINATION  SET      73 0.38030  0.023809 0.092948 2.1521e-05                                 HP_POOR_MOTOR_COORDINATION
HP_ABNORMAL_NERVOUS_SYSTEM_E...  SET     647 0.13231  0.024273 0.032529 2.3880e-05               HP_ABNORMAL_NERVOUS_SYSTEM_ELECTROPHYSIOLOGY
              HP_HYPSARRHYTHMIA  SET     167 0.25787  0.024356 0.064118 2.9001e-05                                          HP_HYPSARRHYTHMIA
HP_ABNORMALITY_OF_COORDINATI...  SET    1065 0.10208  0.023745 0.025789 3.7923e-05                             HP_ABNORMALITY_OF_COORDINATION
                        chr2q23  SET      21 1.00500  0.033794 0.262830 6.5937e-05                                                    chr2q23


Report for: DecodeME_gwas2
                        VARIABLE TYPE  NGENES    BETA  BETA_STD       SE          P                                               FULL_NAME
STARK_PREFRONTAL_CORTEX_22Q1...2  SET     188 0.25711  0.025751 0.064992 3.8249e-05               STARK_PREFRONTAL_CORTEX_22Q11_DELETION_UP
 ZHONG_PFC_MAJOR_TYPES_EXCITA...  SET       8 1.35160  0.028062 0.346210 4.7470e-05                 ZHONG_PFC_MAJOR_TYPES_EXCITATORY_NEURON
 GOBP_MATURE_CONVENTIONAL_DEN...  SET       6 1.54120  0.027712 0.395260 4.8434e-05 GOBP_MATURE_CONVENTIONAL_DENDRITIC_CELL_DIFFERENTIATION
                         MIR9903  SET      50 0.49895  0.025868 0.133260 9.0817e-05                                                 MIR9903
               GOCC_NEURON_SPINE  SET     162 0.26418  0.024578 0.072303 1.2960e-04                                       GOCC_NEURON_SPINE
     HP_DEVELOPMENTAL_REGRESSION  SET     388 0.15686  0.022447 0.042983 1.3183e-04                             HP_DEVELOPMENTAL_REGRESSION
                      HP_DYSURIA  SET      16 0.70071  0.020569 0.194130 1.5385e-04                                              HP_DYSURIA
 GOBP_PRESYNAPTIC_MEMBRANE_AS...  SET       6 1.53610  0.027621 0.430690 1.8127e-04                      GOBP_PRESYNAPTIC_MEMBRANE_ASSEMBLY
                 HP_HOARSE_VOICE  SET     102 0.30284  0.022393 0.085229 1.9075e-04                                         HP_HOARSE_VOICE
                      MIR520F_3P  SET     193 0.22014  0.022336 0.061976 1.9165e-04                                              MIR520F_3P
 
There is one result from the gene-set analysis I thought may be worth looking into.

For the group DecodeME_gwas1_infectious_onset the top result was GSE2770_UNTREATED_VS_TGFB_AND_IL12_TREATED_ACT_CD4_TCELL_48H_UP (link to the MSigDB page for this set).

This is a set which comprises "Genes up-regulated in CD4 T cells: untreated (0h) versus activated by anti-CD3 and anti-CD28 and then stimulated by TGFB1 and IL-12 (48h)."

I've posted the paper which describes this here

 
I understand that you can identify significantly different SNPs for an enormous variety of different groupings, e.g. socioeconomic status and even political views, because there are always nonrandom patterns that determine who ends up in which category.

We always need to keep in mind that there is a long pathway that leads to membership of the group of interest, and it includes things like access to medical care, medical attitudes and propensities, patient persistence, maybe also social status of the patient and family support? And probably dozens of other things I haven't thought of. If any of these variables are associated with even a slightly unusual genome, then this might be what we're seeing.

Hi Woolie, great to see you again - though I may have missed you previously.

That's true - here's one at the top of the tree on Google

At the same time, research studies are always strongly skewed to higher SES, amongst many things. Yet for 15 years, GWAS have been finding biological differences linked to disease that either explain symptoms or chime with existing known mechanisms. The question is, are the differences that could be attributed to ME/CFS strong enough to explain anything like 8 loci and p< 5 x 10^-8?

I don't know if the team looked at things like SES - but it should be easy to do. I don't know if anyone here can look at e.g. known SES loci/genes vs ME/CFS?

(There seems to be no limit to what some people here can do, it is so impressive. I can't help wondering if any other part of the ME/CFS research community is making as much use of the data, though Chris did say in the webinar that there have been 42 downloads of the summary stats.)

Imagine that you had a particular ethnic group that got marginalised as a result of some historical military invasion (as has happened all the time throughout history). Members of that ethnic group might be more likely to appear in low SES groupings even centuries later, simply because it takes many generations for social mobility to completely eradicate such effects.
DecodeME restricted the initial analysis to those with European ancestry, and within that spent a lot of effort controlling for population ancestry, which can create such effects (immune ones are the strongest, especially on HLA genes, as particular common variants have proved protective or risky for historical pandemics). It helps that UK Biobank is so big, making it much easier to find controls well matched by ancestry. So such an effect is unlikely. I think @ME/CFS Science Blog also posted a graph showing minimal such effects on more common variants and larger effects on less common ones, though they said the more common ones were more prominent in the findings.
 
Obviously, I hope that's not the case, but I think it's a question that still needs to be asked.
I think it's very unlikely.

Ancestry was controlled for by first selecting participants of similar European ancestry to the UK Biobank controls and then by adding the first 20 principal components as covariates in the regression analysis. These PCs no longer showed any clear pattern distinguishing patients from controls, as seen in Supplementary Fig. 1.

There was also no strong sign of inflation in the QQ-plot. If there was some systematic difference between patients and controls, we would expect p-values to deviate from a uniform distribution across the board, rather than only at the very low end as we see now. This suggests the GWAS is picking up a more subtle signal than selection bias or ancestry differences.

Next there are the hits found. Two loci had only 1 protein coding gene so we have strong reasons to suspect these are involved: CA10 for chromosome 17 and OLFM4 for chromosome 13.

CA10 has been found in a chronic pain GWAS
This gene encodes a protein that belongs to the carbonic anhydrase family of zinc metalloenzymes, which catalyze the reversible hydration of carbon dioxide in various biological processes. The protein encoded by this gene is an acatalytic member of the alpha-carbonic anhydrase subgroup, and it is thought to play a role in the central nervous system, especially in brain development. Multiple transcript variants encoding the same protein have been found for this gene. [provided by RefSeq, Jul 2008]
CA10 Gene - GeneCards | CAH10 Protein | CAH10 Antibody

OLFM4 has been found to be a biomarker for the severity of infectious diseases
This gene was originally cloned from human myeloblasts and found to be selectively expressed in inflamed colonic epithelium. This gene encodes a member of the olfactomedin family. The encoded protein is an antiapoptotic factor that promotes tumor growth and is an extracellular matrix glycoprotein that facilitates cell adhesion. [provided by RefSeq, Mar 2011]
OLFM4 Gene - GeneCards | OLFM4 Protein | OLFM4 Antibody

These hits seem to point to the underlying pathology of ME/CFS and not some confounding factor.

The MAGMA analysis (which takes all SNP-influenced genes into account) strongly points to the brain, more so than many other GWAS that forestglip posted here.

I think it's quite unlikely that this is all a coincidence or driven by something other than ME/CFS.
 
I think it's very unlikely.

As do I, but in a sense I think that is icing on the cake. The first step is showing that ME/CFS is a biologically meaningful diagnostic grouping - which was the point originally referred back to.

The BPS people have pointed out that we have had genetic studies reporting risk-associated alleles in the past so this is nothing new. But these were studies of small samples that would need replication and as far as I am aware none of them have replicated. There may still be replication issues with DecodeME but my reading is that it is much harder to argue that data from a large, well documented study is due to chance.

Nevertheless, I am interested in the fact that previous studies have picked out HLA-DQ and HLA-C (I don't recall anything else much) and I am still unclear as to whether all these results are in fact compatible, even if confusing at present.
 
In other news, based on a suggestion by @hotblack, I tried to use the UK BioBank reference panel for FUMA instead of the 1000 Genomes reference as I did before. It looks like that was the main reason my results were somewhat different from the paper's results.
I’ve been thinking more about this and it would be really useful if we could get hold of that UKB reference data to use with standalone MAGMA and see if it helps align our results. I wonder if it’s worth asking them?

It’s not here
It’s mentioned here but not available
There is a MAGMA ‘all’ here but it’s huge and no mention of UKB
 
Hi Woolie, great to see you again - though I may have missed you previously.

That's true - here's one at the top of the tree on Google

At the same time, research studies are always strongly skewed to higher SES, amongst many things. Yet for 15 years, GWAS have been finding biological differences linked to disease that either explain symptoms or chime with existing known mechanisms. The question is, are the differences that could be attributed to ME/CFS strong enough to explain anything like 8 loci and p< 5 x 10^-8?

I don't know if the team looked at things like SES - but it should be easy to do. I don't know if anyone here can look at e.g. known SES loci/genes vs ME/CFS?

(There seems to be no limit to what some people here can do, it is so impressive. I can't help wondering if any other part of the ME/CFS research community is making as much use of the data, though Chris did say in the webinar that there have been 42 downloads of the summary stats.)


DecodeME restricted the initial analysis to those with European ancestry, and within that spent a lot of effort controlling for population ancestry, which can create such effects (immune ones are the strongest, especially on HLA genes, as particular common variants have proved protective or risky for historical pandemics). It helps that UK Biobank is so big, making it much easier to find controls well matched by ancestry. So such an effect is unlikely. I think @ME/CFS Science Blog also posted a graph showing minimal such effects on more common variants and larger effects on less common ones, though they said the more common ones were more prominent in the findings.
Thanks for the reply, @Simon M!

I do see that the proof of the pudding will be in the eating, as it were. As an exercise that can generate new research questions, this study is amazing!

I also get your point that we could identify SNPs that could be linked to certain possible confounding variables, and look at those. Nice idea.

I just think we should take care when making the "biological proof" argument based just on these data, because it's a bit fragile.
 
Just a guess, but I think the UK BioBank data might not be open access like 1000 Genomes is. Maybe worth checking though.
Yes, it looks like that’s probably the case. The eligibility criteria seem to make that clear, and I can see them being able to use the data in a publicly accessible tool like FUMA but not being able to share it. A shame, as it limits our ability to remove variables that may be causing differences in results.
 
Some more thoughts on the local MAGMA results above. The difference in the gene based analysis is worth looking at too. It would be useful to understand why (discussing with @forestglip we wondered if it was the difference in SNP conversion methods and assemblies used, or perhaps the reference panels, but we don’t really know) and if this tells us anything.

The following were in the official DecodeME results but not the local MAGMA results
  • DNAH10OS (couldn’t find anywhere in my results, maybe lost in translation?)

And these were in the local MAGMA results (in groups mentioned) but not the official DecodeME ones (apart from in table S3 where mentioned)
  • ATG12 (gwas1_male only)
  • CACNA1E (both gwas1 and gwas1_female) (also in table S3)
  • DCC (gwas1_female only) (also in table S3)
  • PLCL1 (gwas_1 and gwas_2) (also in table S3)
  • ZPBP (gwas1_female only)

While these were in both

Do the similarities mean we can be more certain of those results? Should we discount the differences in the local MAGMA analysis? (That half were also in supplemental table S3 from the paper is perhaps reassuring.) Or are both results telling us something? What do the subgroup findings tell us?

I’d like to better understand if/what is wrong in my method, and accounting for differences seems a good place to start. It may be we can recreate things with FUMA and get the full output to compare results more closely (maybe differences are subtle, with some things falling just inside or outside the p-value thresholds). If anyone who knows these tools has any thoughts please do chip in. Now it’s all set up, it should at least be trivial to change variables and rerun.

Edit: noticed a mistake, HIST1H4H is an alias for H4C8 so it is in both sets!
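A minimal sketch of how a comparison like this could be made less error-prone, assuming both gene lists are available as plain lists of symbols. The alias table here is hypothetical and only holds the one HIST1H4H/H4C8 pair from this post; a real run would need a full alias table from a gene nomenclature resource. The two input lists are small illustrative subsets, not the full results.

```python
# Hypothetical alias table: older symbol -> current symbol.
# Only the HIST1H4H -> H4C8 pair comes from this thread.
ALIASES = {"HIST1H4H": "H4C8"}


def normalise(symbols, aliases=ALIASES):
    """Upper-case each symbol and replace known aliases with the current name."""
    cleaned = (s.strip().upper() for s in symbols)
    return {aliases.get(s, s) for s in cleaned}


def compare_gene_sets(official, local, aliases=ALIASES):
    """Split two gene lists into official-only, local-only and shared genes."""
    official_n = normalise(official, aliases)
    local_n = normalise(local, aliases)
    return {
        "official_only": sorted(official_n - local_n),
        "local_only": sorted(local_n - official_n),
        "both": sorted(official_n & local_n),
    }


# Illustrative subsets only, not the complete result lists.
official = ["DNAH10OS", "HIST1H4H"]
local = ["H4C8", "ATG12", "ZPBP"]
result = compare_gene_sets(official, local)
print(result)
```

Normalising aliases before taking set differences means a gene reported under two names ends up in the shared group rather than in both "only" groups.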
 
Some more thoughts on the local MAGMA results above. The difference in the gene based analysis is worth looking at too. It would be useful to understand why (discussing with @forestglip we wondered if it was the difference in SNP conversion methods and assemblies used, or perhaps the reference panels, but we don’t really know) and if this tells us anything.
I see a few reasons:

- Different reference panel (1000G vs UK BioBank).
- Liftover to another assembly only loses 0.3% of variants, while our method of mapping positions to rsids lost 6% (though it's unclear how many more are lost later in both methods when MAGMA refers to the LD reference panels).
- I don't think we mapped rsids correctly. Most are probably correct, but different rsids can refer to the same position with different allele changes, so if we only map by position we might get an rsid that refers to a different allele change at that position.
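To illustrate the last point, here is a rough sketch of matching on chromosome, position and the allele pair rather than position alone. The reference table, its column names (`POS`, `REF`, `ALT`, `RSID`) and the rsids in it are made up for the example; the summary stats columns follow the regenie names used elsewhere in this thread. It ignores strand flips, so it is a starting point, not a complete solution.

```python
import pandas as pd


def allele_key(df, col_a, col_b):
    """Build an order-insensitive key such as 'A/G' from two allele columns."""
    return ["/".join(sorted((a.upper(), b.upper())))
            for a, b in zip(df[col_a], df[col_b])]


def add_rsids(sumstats, reference):
    """Left-join rsids onto summary stats by chromosome, position and alleles."""
    s = sumstats.copy()
    r = reference.rename(columns={"POS": "GENPOS"}).copy()
    s["_alleles"] = allele_key(s, "ALLELE0", "ALLELE1")
    r["_alleles"] = allele_key(r, "REF", "ALT")
    merged = s.merge(
        r[["CHROM", "GENPOS", "_alleles", "RSID"]],
        on=["CHROM", "GENPOS", "_alleles"],
        how="left",
    )
    return merged.drop(columns=["_alleles"])


# Toy data: two reference entries share position 1000 but differ in alleles.
sumstats = pd.DataFrame({
    "CHROM": [1, 1],
    "GENPOS": [1000, 2000],
    "ALLELE0": ["A", "C"],
    "ALLELE1": ["G", "T"],
})
reference = pd.DataFrame({
    "CHROM": [1, 1, 1],
    "POS": [1000, 1000, 2000],
    "REF": ["A", "A", "C"],
    "ALT": ["T", "G", "G"],
    "RSID": ["rs_fake1", "rs_fake2", "rs_fake3"],
})
annotated = add_rsids(sumstats, reference)
print(annotated)
```

With a position-only join, the first variant would match two rsids and the second would pick up an rsid for a different allele change; with the allele key, the first gets only the matching entry and the second is left unannotated.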

I'm looking into a tool called bcftools annotate for being able to add rsids to the data. Some links about it being used for this: 1, 2, 3, 4, 5.

I think the main steps are:
1. Download the huge (28 GB) dbSNP reference file from this link (I think GCF_000001405.40 is a synonym for GRCh38, while the other is for GRCh37) and the index for it (the tbi file).
2. Convert the summary stats file to VCF format.
3. I think use the bcftools sort command to sort the summary stats.
4. Zip and index the summary stats.
5. Run bcftools annotate using the syntax from the links above.

I haven't tried it yet, that's just my understanding so far.
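As a rough idea of what step 2 might look like, here is an untested-against-bcftools sketch that writes a minimal sites-only VCF from a summary stats table. It assumes regenie-style columns, and it assumes ALLELE0 is the reference allele and ALLELE1 the alternate, which would need checking. The function name is my own. bcftools may also want `##contig` header lines, and the chromosome names would have to match whatever naming the dbSNP file uses, so a renaming step may be needed.

```python
import io

import pandas as pd

# Minimal header: file format line plus the eight fixed VCF columns.
VCF_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def sumstats_to_vcf(sumstats, handle):
    """Write a position-sorted, sites-only VCF to an open text handle.

    Assumption: ALLELE0 is REF and ALLELE1 is ALT.
    """
    ordered = sumstats.sort_values(["CHROM", "GENPOS"])
    handle.write(VCF_HEADER)
    for row in ordered.itertuples(index=False):
        fields = [
            str(row.CHROM), str(row.GENPOS), str(row.ID),
            row.ALLELE0, row.ALLELE1,
            ".", ".", ".",  # QUAL, FILTER and INFO left empty
        ]
        handle.write("\t".join(fields) + "\n")


# Toy input, deliberately out of order to show the sort.
df = pd.DataFrame({
    "CHROM": [1, 1],
    "GENPOS": [2000, 1000],
    "ID": ["1:2000:C:T", "1:1000:A:G"],
    "ALLELE0": ["C", "A"],
    "ALLELE1": ["T", "G"],
})
buffer = io.StringIO()
sumstats_to_vcf(df, buffer)
vcf_lines = buffer.getvalue().splitlines()
print("\n".join(vcf_lines))
```

For the real file, the handle would be an opened output file rather than a `StringIO`, and the result would still need sorting, compressing and indexing as in steps 3 and 4.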

I'm not really that worried about replicating MAGMA locally. If you want to do that, I think you'd be better off doing liftover instead of mapping to rsids. I tried the pyliftover library mentioned previously, and it wasn't too hard to use, though I'm not sure whether using a position ID instead of an rsid will work with MAGMA and the 1000G reference panel without some changes. Maybe that's not possible, I'm not sure. Or maybe just using the better rsid mapping technique would be enough to make it match. I mainly want to add rsids so I can upload to the other post-GWAS analysis tools I mentioned previously.

If you're interested in pyliftover, here's a script I had Claude write. The input file is the same format as the original summary stats file, just filtered for qced variants.
Run this to create the filtered file:
Bash:
awk 'FNR==NR {ids[$1]++; next} FNR==1 || ($3 in ids)' gwas_qced.var <(zcat gwas_1.regenie.gz) | gzip > filtered_regenie_file.txt.gz

Then run the following script to liftover:
Python:
import pandas as pd
from pyliftover import LiftOver

def liftover_gwas_sumstats_chunked(input_file, output_file, chunk_size=50000):
    """
    Convert GWAS summary statistics from GRCh38 to GRCh37/hg19 using chunked processing
 
    Args:
        input_file: Path to input summary statistics file
        output_file: Path to output lifted-over file
        chunk_size: Number of rows to process at once (default: 50000)
    """
 
    # Initialize liftover (GRCh38/hg38 to GRCh37/hg19)
    print("Initializing liftover...")
    lo = LiftOver('hg38', 'hg19')
 
    # Process file in chunks
    first_chunk = True
    processed_count = 0
    successful_conversions = 0
    failed_conversions = 0
 
    print(f"Processing in chunks of {chunk_size}...")
 
    # Read and process file in chunks
    chunk_reader = pd.read_csv(input_file, sep=' ', chunksize=chunk_size)
 
    for chunk_num, chunk in enumerate(chunk_reader):
        print(f"Processing chunk {chunk_num + 1} (rows {processed_count + 1} to {processed_count + len(chunk)})...")
 
        # Process this chunk
        processed_chunk = process_chunk(chunk, lo)
 
        # Count successful conversions in this chunk
        chunk_success = len(processed_chunk)
        chunk_failed = len(chunk) - chunk_success
 
        successful_conversions += chunk_success
        failed_conversions += chunk_failed
 
        print(f"  Chunk {chunk_num + 1}: {chunk_success} successful, {chunk_failed} failed")
 
        # Write to output file
        if first_chunk:
            # Write with header for first chunk
            processed_chunk.to_csv(output_file, sep='\t', index=False, mode='w')
            first_chunk = False
        else:
            # Append without header for subsequent chunks
            processed_chunk.to_csv(output_file, sep='\t', index=False, mode='a', header=False)
 
        processed_count += len(chunk)
 
        print(f"  Progress: {processed_count} variants processed")
 
    print(f"\nConversion summary:")
    print(f"Total variants processed: {processed_count}")
    print(f"Successfully converted: {successful_conversions}")
    print(f"Failed conversions: {failed_conversions}")
    print(f"Success rate: {successful_conversions/processed_count*100:.2f}%")
    print(f"Output saved to: {output_file}")
 
    return successful_conversions, failed_conversions

def process_chunk(chunk, liftover_obj):
    """
    Process a single chunk of data
 
    Args:
        chunk: DataFrame chunk to process
        liftover_obj: LiftOver object
 
    Returns:
        DataFrame with successfully converted variants
    """
 
    # Create lists to store results
    new_chromosomes = []
    new_positions = []
    conversion_success = []
 
    # Process each variant in the chunk
    for idx, row in chunk.iterrows():
        chrom = str(row['CHROM'])
        pos = int(row['GENPOS'])
 
        # Add 'chr' prefix if not present
        if not chrom.startswith('chr'):
            chrom = 'chr' + chrom
 
        # Convert coordinate (subtract 1 because pyliftover is 0-based)
        result = liftover_obj.convert_coordinate(chrom, pos - 1)
 
        if result and len(result) > 0:
            # Successful conversion (add 1 back to convert to 1-based)
            new_chrom = result[0][0].replace('chr', '')  # Remove 'chr' prefix
            new_pos = result[0][1] + 1
            new_chromosomes.append(new_chrom)
            new_positions.append(new_pos)
            conversion_success.append(True)
        else:
            # Failed conversion
            new_chromosomes.append(None)
            new_positions.append(None)
            conversion_success.append(False)
 
    # Add new coordinate columns
    chunk['CHROM_hg19'] = new_chromosomes
    chunk['GENPOS_hg19'] = new_positions
    chunk['liftover_success'] = conversion_success
 
    # Filter to only successfully converted variants
    chunk_success = chunk[chunk['liftover_success']].copy()

    if len(chunk_success) > 0:
        # Update the CHROM and GENPOS columns to the new coordinates
        chunk_success['CHROM'] = chunk_success['CHROM_hg19']
        chunk_success['GENPOS'] = chunk_success['GENPOS_hg19'].astype(int)

        # Update the ID column to reflect new coordinates
        chunk_success['ID'] = (chunk_success['CHROM'].astype(str) + ':' +
                              chunk_success['GENPOS'].astype(str) + ':' +
                              chunk_success['ALLELE0'].astype(str) + ':' +
                              chunk_success['ALLELE1'].astype(str))

    # Drop helper columns even when nothing converted, so every chunk
    # is written with the same set of columns
    chunk_success = chunk_success.drop(columns=['CHROM_hg19', 'GENPOS_hg19', 'liftover_success'])

    return chunk_success

if __name__ == "__main__":
    input_file = "filtered_regenie_file.txt.gz"
    output_file = "decodeme_grch37.tsv"
 
    success, failed = liftover_gwas_sumstats_chunked(input_file, output_file)
 
It might not help with the practical problems (hard to know because I don't understand any of it!) but there is a catalogue of GWAS studies at the European Bioinformatics Institute.

Thought I'd link it in case it hadn't already come up.
There's also this gene atlas, which provides lots of data from previous GWAS, including Manhattan and QQ plots for comparison. Unfortunately, it looks like it hasn't been updated since 2019.
 