Preprint Cluster analysis of ME/CFS symptoms in DecodeME reveals two subgroups and a link to onset type, 2026, St-Jean et al

forestglip

Cluster analysis of ME/CFS symptoms in DecodeME reveals two subgroups and a link to onset type

St-Jean, Christa; Dibble, Joshua James; Ponting, Chris P; Prigge, Regina

Background
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a debilitating, often infection-triggered illness with no cure and no effective treatment. Marked symptom heterogeneity hampers diagnosis, disease management, and trial design. Using phenotype data from the world's largest ME/CFS cohort, this study aimed to identify groups of patients with similar symptom profiles using cluster analysis, to assess the association between cluster membership and onset type, and to explore genetic associations with cluster membership.

Methods
This study included 19,019 DecodeME participants, ages 16 and over, with ME/CFS in the UK, from 2022-2024. We performed a k-modes cluster analysis of individuals based on similar symptoms. Cluster metrics identified the optimal number of clusters, which were characterised and compared.

A sex-stratified subgroup analysis explored differences between clusters among males and females. The association between ME/CFS onset type (infectious, non-infectious, or unknown) and cluster membership was assessed with logistic regression models, adjusting for sex, age, deprivation, and ethnicity. Genetic associations with cluster membership were assessed using a genome-wide association study.

Results
We identified two clusters in our study population: a high symptom burden cluster (HSBC; 57% of participants) and a lower symptom burden cluster (LSBC; 43%).

The HSBC was characterised by higher prevalence of symptoms across all domains, more comorbidities, and greater illness severity. Individuals with infectious and unknown onset had 1.24 times (95% CI: 1.15-1.35) and 1.30 times (95% CI: 1.18-1.43) higher adjusted odds of HSBC membership relative to non-infectious onset, respectively. A similar pattern was observed in the sex-stratified analyses, although it showed an overall higher symptom prevalence for females and a higher proportion of females in the HSBC compared to males.

No genetic variant was significantly associated with cluster membership.

Conclusions
This large-scale cluster analysis of DecodeME symptom data reinforces that ME/CFS is a heterogeneous condition with clinical subtypes. The identification of symptom-based phenotypes, along with sex-based differences in symptom burden and cluster characteristics, highlights the importance of incorporating symptom burden and sex in future research, clinical decision-making, and public health strategies. Tailoring future interventions to these subgroups could enhance patient management and improve outcomes.

Web | DOI | PDF | medRxiv | Preprint
 
Happy to see such an analysis, well done!
This large-scale cluster analysis of DecodeME symptom data reinforces that ME/CFS is a heterogeneous condition with clinical subtypes.
I'm not really sure what the authors mean here. Of course different people have different severities, and those with higher severities will have more symptoms and be more likely to be depressed, and some groups of people may have more of some symptoms than others, but does that automatically make something a clinical subtype if the illness processes needn't be too distinct? It seems that this is how other illnesses are characterized in the literature as well, but does it really mean much more than an illness having a large spread of symptoms?

A general possible trend between infectious origin and illness patterns seems interesting and this has also come up elsewhere (as also cited in the paper). I'm very happy that the authors carried out this analysis, this is really good, but does the lack of association of genetic signal and cluster membership not rather possibly point in the other direction to what the authors are writing here, rather than support it?
 
Happy to see subgroups created out of DecodeME.
From briefly reading this, it seems it's just two subgroups, high symptom burden and low symptom burden, both with no genetic association. This is not the overhyped subgrouping you will find on Twitter. Of interest, infectious onset was more likely to put you in the high symptom burden group, along with being female.
 
I'm not really sure what the authors mean here. Of course different people have different severities, and those with higher severities will have more symptoms and be more likely to be depressed, and some groups of people may have more of some symptoms than others, but does that automatically make something a clinical subtype? Is this how one speaks of other illnesses that have the above patterns as well?

A general possible trend between infectious origin and illness patterns seems interesting and this has also come up elsewhere (as also cited in the paper). I'm very happy that the authors carried out this analysis, this is really good, but does the lack of association of genetic signal and cluster membership not rather possibly point in the other direction to what the authors are writing here, rather than support it?
People with infectious origin have 1.24 times higher odds of being in the high symptom group. With such a big difference, I think they are saying that this itself is possibly a way to distinguish a subtype, but this is in the discussion section, not the conclusion:
The observed association between infectious or unknown onset and membership in the HSBC suggests that these onset types could delineate clinically and potentially biologically distinct ME/CFS subtypes
Then in the conclusion:
The observed associations between symptom burden, sex, and onset type, and particularly the heightened symptom severity linked to infectious and unknown onset, suggest the existence of clinically meaningful subtypes, potentially reflecting distinct biological mechanisms.
 
People with infectious origin have 1.24 times higher odds of being in the high symptom group. With such a big difference, I think they are saying that this itself is possibly a way to distinguish a subtype, but this is in the discussion section, not the conclusion:

Then in the conclusion:
Yes, but a clinical subtype can just be a cluster of symptoms as far as I understand it. I agree that the general trend between infectious origin and outcomes is the interesting part and deserves further investigation, especially as it might point to something biological.

At the same time the adjusted odds ratio of landing in HSBC is bigger for the unknown group here than for the infectious group so things become trickier.
 
I am sceptical of linking anything with onset type because there seems to be a lot of us who just don't know whether an infection was involved or not. It's often guesswork. You might say, 'well, you're in the unknown group then', but others with a very similar history might have chosen the infectious or non-infectious group. It's very subjective. I had a hard time answering that question.

At the same time the adjusted odds ratio of landing in HSBC is bigger for the unknown group here than for the infectious group so things become trickier.
Yes, it might have been a more convincing story if the odds ratio of HSBC was (significantly) higher in the infectious-onset group.
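
For anyone wanting to check how far apart the two adjusted odds ratios really are, here is a rough sketch using only the figures quoted in the abstract. It recovers each standard error from the reported 95% CI on the log scale and then compares the two log odds ratios. One loud assumption: it treats the two estimates as independent, which they are not, since both are measured against the same non-infectious reference group. The true standard error of the difference is probably somewhat smaller, so this is a back-of-the-envelope check and not the authors' analysis.

```python
import math

def log_or_and_se(or_point, ci_low, ci_high, z_crit=1.96):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    log_or = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return log_or, se

# Adjusted odds ratios of HSBC membership quoted in the abstract
# (reference category: non-infectious onset).
infectious = log_or_and_se(1.24, 1.15, 1.35)
unknown = log_or_and_se(1.30, 1.18, 1.43)

diff = unknown[0] - infectious[0]

# ASSUMPTION: the two estimates are treated as independent. They share a
# reference group, so this ignores their covariance and is only approximate.
se_diff = math.sqrt(infectious[1] ** 2 + unknown[1] ** 2)

z = diff / se_diff
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"difference in log OR = {diff:.3f}, z = {z:.2f}, two-sided p = {p_two_sided:.2f}")
```

Under that simplification the z-score comes out well below 2, which fits the overlapping confidence intervals: the data quoted here do not clearly separate the unknown-onset group from the infectious-onset group.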
 
I don't see this as anything to get excited about. Are people who need several boxes of tissues during a cold and people who just have a few sniffles different clinical subtypes? Even if the answer is "yes", does that mean that it's a breakthrough in understanding the common cold, or a breakthrough in treatments? Unfortunately, there are career rewards for publishing even useless studies.

Also, the clustering might just be an artifact of data collection, such as how people think about the questions depending on how severe their symptoms are.
 
No genetic variant was significantly associated with cluster membership.

The observed associations between symptom burden, sex, and onset type, and particularly the heightened symptom severity linked to infectious and unknown onset, suggest the existence of clinically meaningful subtypes, potentially reflecting distinct biological mechanisms.
Wouldn't we expect to see different gene variants for different biological mechanisms (modulo statistical power of course)?

Maybe this is even good evidence against biological distinct subtypes?
 
Given that the data was collected at a single time point, and many of us have had great differences in severity over our years with ME/CFS, surely all they are showing is that when we are sicker we have more symptoms on the list.

So any individual moves between the two clusters as their illness severity changes. If that is the case, then there won't be a genetic difference between the clusters.
 
@EndME I wholeheartedly agree with your skepticism of the subtype narrative, both in the context of this study as well as more broadly.

Strictly speaking, "clinical subtypes" simply means that different patients cluster differently based on symptoms. Definitionally, it does not suggest that there are different disease mechanisms or underlying biology behind those subtypes.

And yet, that's all I seem to see and hear. We explain away every failed trial with the notion that we included multiple subtypes with different disease mechanisms, and so of course the treatment wouldn't work across all of them. Every research organization stresses the importance of subtyping, and yet there is zero evidence that distinct subtypes actually exist beyond symptom presentation.

If two people contract the flu and one is sick for a few days while the other is severely ill and ends up with pneumonia, are those two different subtypes of the virus influenza? No, of course not, and yet that's what this study seems to be saying.
 
For the GWAS between the two clusters, they used 2401 participants in the lower symptom burden cluster (LSBC) and 3264 in the high symptom burden cluster (HSBC). So it's a much smaller sample than DecodeME.
For this analysis, we retained only participants who were included in DecodeME GWAS-1 and had matching questionnaire data, leading to a dataset with 15,328 participants. Of these, 2,401 participants were allocated to the LSBC, and 3,264 to the HSBC.

There were still some variants that were somewhat close to significance.
No variant reached genome-wide significance at the standard threshold of p < 5 x 10^-8. At a less stringent threshold of p < 8 x 10^-7 three associations were identified (Figure 3, Supplemental Table 3), among which approximately one is expected to be a false positive discovery (20).
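
One way to see where "approximately one false positive" could come from is the sketch below. It assumes roughly one million independent common-variant tests, which is the conventional basis for the 5 x 10^-8 threshold (0.05 divided by 1,000,000). Whether the paper's reference (20) uses exactly this reasoning is an assumption on my part.

```python
# ASSUMPTION: about one million independent tests across the genome, the usual
# Bonferroni basis for the 5e-8 genome-wide threshold. Not taken from the paper.
N_INDEPENDENT_TESTS = 1_000_000

def expected_false_positives(p_threshold, n_tests=N_INDEPENDENT_TESTS):
    """Expected number of null tests falling below the threshold by chance."""
    return p_threshold * n_tests

for threshold in (5e-8, 8e-7):
    print(f"p < {threshold:g}: about {expected_false_positives(threshold):.2f} expected by chance")
```

With that assumption, the relaxed threshold lets through a bit under one chance hit on average, so one of the three sub-threshold associations being spurious is about what you would expect.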

Supplemental Table 3. Three associations with sub-genome wide significance (p < 8 x 10^-7) of association to HSBC status.
Chromosome: position (GRCh38) alleles | Variant tested, rsID | Nearest protein-coding gene(s) | p-value, effect size (b), allele frequency (AF) | Gene function annotation
15: 60,884,050 C/G | rs341388 | RORA | p = 2.0 x 10^-7, b = +0.219, AF = 0.341 | Chronobiology, inflammation, type 2 innate lymphoid cell development
7: 150,346,134 T/A | rs883139 | RARRES2 | p = 5.8 x 10^-7, b = -0.333, AF = 0.097 | Adipokine, immune response, antimicrobial, anti-inflammation
2: 234,081,189 A/G | rs60995367 | SPP2 / TRPM8 | p = 7.59 x 10^-7, b = -0.499, AF = 0.038 | SPP2 (bone metabolism; liver); TRPM8 (temperature regulation, pain sensing, migraine)

I wasn't able to follow all the details, but it seems that the clusters are basically low vs. high severity:
Severe and very severe illness were both around three times more common in the HSBC than in the LSBC (severe: 18% vs 6%, p < 0.001; very severe: 1.1% vs 0.3%, p < 0.001). The HSBC also had more individuals reporting worsening illness (22.7% vs 11.4%, p < 0.001) and fewer reporting improvement (1.2% vs 4.3%, p < 0.001). Comorbidities were generally more prevalent in the HSBC, while cancer was the only comorbidity more prevalent in the LSBC (p < 0.001).

So this GWAS seems kind of like a "dose-response" study for severity, and it would kind of be validation of a variant if it showed up both here and in DecodeME. Unfortunately, none of the above three variants look very significant in the main DecodeME GWAS:

15: 60,884,050 C/G
p = 0.077
[LocusZoom plot, image not shown]

7: 150,346,134 T/A
p = 0.079
[LocusZoom plot, image not shown]

2: 234,081,189 A/G
p = 0.78
[LocusZoom plot, image not shown]

Edit: Added links to LocusZoom plots.
 
Yes, it might still be interesting if certain symptom clusters could be tested against variants that came up significant in DecodeME, for example whether "pain clusters" show differences in CA10 to other clusters.
 
An interesting study.

I don't have the energy to read the paper, but I'd like to know more about the definition of symptom burden/symptom severity. Was this simply symptom count? I'm pretty sure that they didn't ask participants about the severity of individual symptoms.

And I think there was a correlation between symptom burden and overall illness severity category. Exactly what data do they show on that? It would be interesting to see a correlation analysis between category of illness severity and mean symptom burden (symptom count?) for each of those categories.

Only if someone has already come across that information or has it ready. Thanks.
 
I really appreciate how this paper was written. It’s so much easier to read and follow than most of what we see here. Kudos to the team!

They used binary variables for the symptoms, which I assume were yes or no:
This analysis is based on 67 binary symptom variables, ascertained through the DecodeME study baseline questionnaire (9) which was developed from the CCC and IOM/NAM diagnostic criteria by individuals with lived experience of ME/CFS (3).

I can’t find any data in the main text about the threshold for being classified as having a high or low symptom burden. Perhaps something to include explicitly in the final version, if I have not just overlooked it? If there is no threshold, maybe include the median and mean number of symptoms with SD ranges?
Characterisation of the clusters indicated that the primary distinction between subtypes was overall symptom burden, separating participants into a high symptom burden cluster (HSBC) and a lower symptom burden cluster (LSBC). (Supplemental Table 2).

It’s interesting that we now have data showing that the more severe patients on average have a larger range of symptoms than the moderate or mild patients - assuming the data is representative and my calculations are correct:
The HSBC contained more individuals (n = 10,849, 57%) than the LSBC (n = 8,170, 43%), and a higher proportion of females (89% vs 80% in the LSBC, p < 0.001).
Severe and very severe illness were both around three times more common in the HSBC than in the LSBC (severe: 18% vs 6%, p < 0.001; very severe: 1.1% vs 0.3%, p < 0.001).
Percentage of patients in the HSBC per severity (and ratio (HS/LS)):

Mild/moderate: 53 % (1.15)
Severe: 80 % (3.98)
Very severe: 83 % (4.86)

Granted, there are only about 2500 severe and 150 very severe patients.
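
The figures above can be reproduced from the cluster sizes and percentages quoted from the paper. A minimal sketch, assuming that "mild/moderate" is simply everyone not reported as severe or very severe (the paper's exact category split is not quoted in this thread):

```python
# Cluster sizes and severity percentages as quoted from the paper.
N_HSBC, N_LSBC = 10_849, 8_170

# (percent of HSBC, percent of LSBC)
pct = {
    "severe": (18.0, 6.0),
    "very severe": (1.1, 0.3),
}
# ASSUMPTION: mild/moderate is the remainder after severe and very severe.
pct["mild/moderate"] = (100 - 18.0 - 1.1, 100 - 6.0 - 0.3)

def hsbc_share_and_ratio(pct_hsbc, pct_lsbc):
    """Return (share of this severity level that sits in the HSBC, HS/LS count ratio)."""
    n_h = N_HSBC * pct_hsbc / 100
    n_l = N_LSBC * pct_lsbc / 100
    return n_h / (n_h + n_l), n_h / n_l

for label, (h, l) in pct.items():
    share, ratio = hsbc_share_and_ratio(h, l)
    print(f"{label}: {share:.0%} in HSBC, HS/LS ratio {ratio:.2f}")
```

Because the inputs are already rounded percentages, the last digit of each ratio should not be over-read. The very severe ratio, for instance, comes out a hair above the 4.86 listed, which is a rounding-level difference.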

It is also interesting that the differences are there for almost all symptoms. Does that mean that there aren’t really any symptoms that are exclusive to the HSBC?
In this analysis, 58 out of the 67 symptoms, all of which are more prevalent in the HSBC, have moderate-to-large effects on cluster separation, based on Cohen’s h ≥ 0.5 (Table 1).
(…)
While nearly all symptoms were less prevalent in the LSBC, reflecting a generally milder and less multisystem pattern, this was not an inevitable outcome given the known heterogeneity of ME/CFS symptoms. That such a distinction emerged underscores the potential value of clustering approaches in elucidating population-level symptom patterns.
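
For readers unfamiliar with it, Cohen's h is a standard effect size for the difference between two proportions, computed on arcsine-transformed values; by Cohen's usual benchmarks 0.2 is small, 0.5 medium and 0.8 large. A minimal sketch follows. The per-symptom prevalences are not quoted in this thread, so the example plugs in the severe-illness proportions (18% vs 6%) purely as an illustration.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Illustration only: proportion with severe illness in the HSBC vs the LSBC.
h_severe = cohens_h(0.18, 0.06)
print(f"Cohen's h for severe illness (18% vs 6%): {h_severe:.2f}")
```

On that scale, a threefold difference in a fairly rare category still lands below 0.5, which gives some sense of how large the prevalence gaps must be for the 58 symptoms that do clear the threshold.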

The finding that younger patients were overrepresented in the HSBC might indicate that the people in the first age peak are more likely to have a worse course.

Maybe it’s because they are more likely to have more abnormalities in the processes that are involved in the disease (and therefore also more likely to by chance get it earlier in life)?

One small criticism at the end:
Please don’t use the phrase «multidisciplinary care». This is a dogwhistle for «psychosomatic and rehab care», which we do not need. I know that’s not what you meant and the phrase is ambiguous, but it might be something to note for the final version.
Although curative treatment remains the ideal goal, no effective interventions currently exist for ME/CFS (41). In the meantime, multidisciplinary care focused on symptom management is essential. This study reinforces the importance of ensuring that such care is widely accessible and responsive to complex clinical needs. Symptom management may include advice on pacing, comorbidity screening, sensory modulation, social and psychological support, and pharmacologic treatment to alleviate specific symptoms (42).
Also, is there any way to say «sensory modulation» without opening the door to increasing the sensory burden, which is what the BPS folks want to do?
 
If two people contract the flu and one is sick for a few days while the other is severely ill and ends up with pneumonia, are those two different subtypes of the virus influenza? No, of course not, and yet that's what this study seems to be saying.
No, that’s not quite it. The study actually shows two key things:

First, they separated the participants into two groups based entirely on symptom burden. They found that those in the higher-burden group had about 1.24 times higher adjusted odds of having had an infectious trigger at the start of their ME/CFS, rather than a non-infectious one, compared to those with lower symptom burden.

Second, and perhaps most interestingly, they found that these two groups do not differ genetically. This means human genes are not the reason some people end up with a much higher symptom burden.

To us patients, this might seem trivial. We already know firsthand that the main cause of severe deterioration is overexertion relative to our individual baseline. But researchers have a habit of making things more complicated because they have to prove everything objectively. In the end, I guess it’s still a good thing we have them doing this work.
:)
 
And yet, that's all I seem to see and hear. We explain away every failed trial with the notion that we included multiple subtypes with different disease mechanisms, and so of course the treatment wouldn't work across all of them. Every research organization stresses the importance of subtyping, and yet there is zero evidence that distinct subtypes actually exist beyond symptom presentation.
Who’s «we»? I don’t recognise those arguments from discussions here so I’m assuming you’re referring to other groups?
If two people contract the flu and one is sick for a few days while the other is severely ill and ends up with pneumonia, are those two different subtypes of the virus influenza? No, of course not, and yet that's what this study seems to be saying.
No, this is like comparing the people that ended up with pneumonia and seeing if there are differences there.

The initial DecodeME study looked at your example: are there differences between the people that develop ME/CFS and the people that have not gotten it?
Second, and perhaps most interestingly, they found that these two groups do not differ genetically. This means human genes are not the reason some people end up with a much higher symptom burden.
That goes beyond the evidence due to the limited sample sizes when using subgroups. We know from other genetic studies that reaching a critical mass is crucial to be able to find the relevant genes. We need larger cohorts to say anything more definitively.
 