Preprint Comparing DNA Methylation Landscapes in Peripheral Blood from [ME/CFS] and Long COVID Patients, 2025, Peppercorn et al

Nightsong

Senior Member (Voting Rights)
Abstract
Post-viral conditions, Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Long COVID (LC), share >95% of their symptoms, but the connection between disturbances in their underlying molecular biology is unclear. This study investigates DNA methylation patterns in peripheral blood mononuclear cells (PBMC) from patients with ME/CFS, LC, and healthy controls (HC). Reduced Representation Bisulphite Sequencing (RRBS) was applied to the DNA of age- and sex-matched cohorts: ME/CFS (n=5), LC (n=5) and HC (n=5).

The global DNA methylomes of the three cohorts were similar and spread equally across all chromosomes, except the sex chromosomes, but there were distinct minor changes in the exons of the disease cohorts towards more hypermethylation. A principal component analysis (PCA) analysing significant methylation changes (p<0.05) separated the ME/CFS, LC and HC cohorts into three distinct clusters. Analysis with a limit of >10% methylation difference and at P<0.05 identified 214 Differentially Methylated Fragments (DMF) in ME/CFS, and 429 in LC compared to HC. Of these 118 DMFs were common to both cohorts. Those in promoters and exons were mainly hypermethylated, with a minority hypomethylated.

There were rarer examples with either no change in methylation in ME/CFS but a change in LC, or a methylation change in ME/CFS but in the opposite direction in LC. The differential methylation in a number of fragments was significantly greater in the LC cohort than in the ME/CFS cohort. Our data reveal a generally shared epigenetic makeup between ME/CFS and LC but with specific distinct changes. Differences between the two cohorts likely reflect the stage of the disease from onset (LC 1 year vs ME/CFS 12 years) but specific changes imposed by the SARS-COV-2 virus in the case of the LC patients cannot be discounted.

These findings provide a foundation for further studies with larger cohorts at the same disease stage and for functional analyses to establish clinical relevance.

Link (MDPI Preprint, May 2025, open access)
 
It is an enormously impressive separation. It's so good, it's got me wondering what the flaw in the paper is.

Each group of five people is really tightly clustered together, and there's a lot of separation between the ME/CFS group and the LC group, even though it sounds as though the LC group is pretty much post-Covid-19 ME/CFS. There is quite a range of people in each group (from young to old, 4/5 females) (although each trio is well matched across the three groups).

Figure 1: PCA plot illustrating three distinct clusters representing HC, ME/CFS, and LC based on 3,363 DMFs common to all cohorts filtered by P < 0.05, without considering methylation difference
I think the PCA chart probably has a problem we have seen before - it is a chart of only the differentially methylated fragments that are statistically different between the groups. And so it finds that the groups are different.

So, there were 73239 DNA fragments that were found in all of the participants. But the PCA is run only on the fragments that were statistically different between one group and another. I think that is wrong, if it is used to suggest that the groups look very different to each other. I think we could probably expect that, by chance, in any group of 15 people there would be some characteristics that could separate a group of 5 of them from the rest.

I expect that we could randomly group the 15 participants into three groups, and find several thousands of differences in methylation that could produce a similar PCA chart.
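
A minimal simulation sketch of that expectation, under stated assumptions: the data are pure Gaussian noise (no real group effect), a one-way ANOVA F test stands in for whatever test the paper used, 20,000 features are used instead of 73,239 for speed, and `separation` is a made-up summary (mean between-group distance over mean within-group distance in the PC1/PC2 plane). None of this is the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

n_per_group, n_features = 5, 20000           # assumption: fewer features than the paper, for speed
groups = np.repeat([0, 1, 2], n_per_group)   # arbitrary labels, no real group effect
X = rng.normal(size=(groups.size, n_features))  # pure-noise stand-in for methylation values


def anova_f(X, groups):
    """One-way ANOVA F statistic for every column of X."""
    labels = np.unique(groups)
    grand = X.mean(axis=0)
    ss_between = sum(
        (groups == g).sum() * (X[groups == g].mean(axis=0) - grand) ** 2 for g in labels
    )
    ss_within = sum(
        ((X[groups == g] - X[groups == g].mean(axis=0)) ** 2).sum(axis=0) for g in labels
    )
    df_between, df_within = len(labels) - 1, len(groups) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)


def pca_scores(X, k=2):
    """Sample coordinates on the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def separation(scores, groups):
    """Mean between-group distance divided by mean within-group distance."""
    d = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=-1)
    same = groups[:, None] == groups[None, :]
    off_diag = ~np.eye(len(groups), dtype=bool)
    return d[~same].mean() / d[same & off_diag].mean()


F_CRIT = 3.885  # approximate critical F for p < 0.05 with (2, 12) degrees of freedom
selected = anova_f(X, groups) > F_CRIT

sep_all = separation(pca_scores(X), groups)
sep_selected = separation(pca_scores(X[:, selected]), groups)

print(f"features passing p < 0.05 by chance: {selected.sum()} of {n_features}")
print(f"separation, PCA on all features:      {sep_all:.2f}")
print(f"separation, PCA on selected features: {sep_selected:.2f}")
```

In this sketch roughly 5% of the noise features pass the filter, and a PCA restricted to them separates the arbitrary groups far more cleanly than a PCA on everything. That is the circularity: the filter, not biology, produces the clusters.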

I understand that the researchers wouldn't want to do a PCA with all 73239 DNA fragments, as any differences between the groups would probably get swamped by noise. But, neither should they do a PCA that only uses characteristics that they have already found separates the groups. It's cherry picking. There might be some real findings in there, but I don't think the PCA is the tool to show them.

I don't blame people seeing charts like that and then getting upset that the clear differences between ME/CFS and healthy people that they think have been shown in the literature are being ignored. I don't think researchers should be doing this.
 
To identify the functional pathways associated with the DMFs in LC and ME/CFS, pathway enrichment analysis was performed using Metascape [32]. This analysis revealed several shared pathways between the two conditions (Figure S1A & S1B), suggesting common biological mechanisms.

Notably, Response to Wounding (GO:0009611) emerged as a key pathway involved in tissue repair and inflammatory processes, aligning with the immune dysregulation and chronic inflammatory responses observed in both conditions.

Additionally, Regulation of System Process (GO:0044057) was significantly enriched, highlighting disruptions in physiological processes such as circulation and metabolism that may contribute to cardiovascular and neurological dysfunctions.

Cellular Response to Cytokine Stimulus/Acid Chemical (GO:0071345/GO:0071229) was identified, which likely reflects the important role in immune signaling and inflammation, both of which are known to be altered in ME/CFS and LC.

Furthermore, Regulation of Small GTPase-Mediated Signal Transduction (GO:0051057/GO:0051056) was enriched, implicating intracellular signaling pathways that modulate immune responses, cell migration, and tissue repair.

Growth Regulation (GO:0040007/GO:0040008) was a prominent pathway, suggesting that aberrant growth signaling could contribute to impaired tissue regeneration in both conditions.

Other than these similarities, the functional pathway analysis showed blood vessel morphogenesis, muscle organ development, AGE RAGE pathway, Neutrophil degranulation as other notable pathways in LC, whereas thyroid hormone production, leukocyte differentiation, negative regulation of T cell receptor pathway, heart development and blood circulation were highlighted in ME/CFS.
 
I think it might be more useful to separate out the various sorts of PBMC, and then do this analysis?


Discussion said:
The hierarchical clustering of the methylation changes in the 118 DMFs shared between LC and ME/CFS demonstrated a high degree of correlation (Pearson R = 0.88), indicating substantial overlap between the two conditions.
The researchers seem to have made another similar error here. They have selected DMFs where both the LC and ME/CFS groups had a similar result. Then they say that because the methylation changes look similar, there is a substantial overlap between the two conditions.
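
A rough illustration of why a high correlation among pre-selected shared fragments says little, as a sketch under assumptions: noise-only data, five samples per group, both disease groups compared against the same controls, and a hypothetical cut-off `THRESHOLD` on the mean difference in place of the paper's >10% and P<0.05 criteria.

```python
import random

random.seed(1)
N_FRAGMENTS, N_PER_GROUP = 20000, 5
THRESHOLD = 1.0  # hypothetical cut-off on the mean difference (noise SD is 1)


def group_mean():
    """Mean of one noise-only group of N_PER_GROUP samples."""
    return sum(random.gauss(0, 1) for _ in range(N_PER_GROUP)) / N_PER_GROUP


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


diff_a, diff_b = [], []
for _ in range(N_FRAGMENTS):
    hc, a, b = group_mean(), group_mean(), group_mean()
    diff_a.append(a - hc)  # stand-in for "ME/CFS vs HC"
    diff_b.append(b - hc)  # stand-in for "LC vs HC", using the same controls

# Keep only fragments that pass the cut-off in BOTH comparisons.
shared = [(x, y) for x, y in zip(diff_a, diff_b)
          if abs(x) > THRESHOLD and abs(y) > THRESHOLD]

r_all = pearson(diff_a, diff_b)
r_shared = pearson([x for x, _ in shared], [y for _, y in shared])

print(f"fragments passing in both comparisons: {len(shared)}")
print(f"Pearson r, all fragments:    {r_all:.2f}")
print(f"Pearson r, shared fragments: {r_shared:.2f}")
```

Because both differences subtract the same control mean, they are correlated at about 0.5 even with no biology at all, and keeping only fragments that pass in both comparisons pushes the correlation much higher. This does not show the paper's R = 0.88 is an artefact, only that the figure cannot be read as evidence of overlap without an appropriate null comparison.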

Discussion said:
However, in contrast the two matched older ME/CFS and LC patients are tightly clustered within the other patients of their cohorts, suggesting the changes in the DNA methylome caused by the disease are more influential than those reflecting age.
I don't see how they can say that. They picked out the fragments that supported the LC / ME/CFS / healthy control grouping (fragments where the methylation level was more than 10% different between a disease group and the healthy controls). So, of course those fragments support the separation into those groups, rather than other groupings, e.g. by age.
 
[Attached: two screenshots from the paper]

It was a good preliminary investigation and they have found interesting things. I just think the report presentation and analysis really lets it down.

Take that first fragment in the table - it is associated with CCDC130, and the LC and ME/CFS groups both have more methylation than the healthy controls:
Wikipedia said:
There are several proteins listed that interact with CCDC130, including EEF1A1, NINL, TRAF2, ZBTB16, ZNF165, and ZNF24. EEF1A1 is a eukaryotic elongation factor that is involved in the binding of aminoacyl-tRNA to the A-site of ribosomes during translation.[20] NINL is a ninein-like protein that is involved in microtubule organization and has calcium ion binding activity.[20]TRAF2, tumor necrosis factor (TNF) receptor associated factor 2, is part of some E3 ubiquitin ligase complexes and is involved in ubiquitinating proteins so they can get degraded by the proteasome.[20] ZBTB16, zinc finger and BTB domain-containing protein 16, is also part of the E3 ubiquitin ligase complex and is most likely involved in substrate recognition.
PB-CD8+ T cells had the highest relative CCDC130 expression
Many websites also say that it is involved in the cell's response to viral infection, but there is no specific information on this nor any elaboration.
 
So, there were 73239 DNA fragments that were found in all of the participants. But the PCA is run only on the fragments that were statistically different between one group and another. I think that is wrong, if it is used to suggest that the groups look very different to each other. I think we could probably expect that, by chance, in any group of 15 people there would be some characteristics that could separate a group of 5 of them from the rest.

I expect that we could randomly group the 15 participants into three groups, and find several thousands of differences in methylation that could produce a similar PCA chart.

I don't think researchers should be doing this.

Yeah exactly, it looks like a negative finding to me. If they've done 73239 tests, which is how it seems, they have a huge multiple testing problem, and it looks like they probably haven't corrected for it.

By chance, with this many tests you would expect 0.05 × 73239 ≈ 3662 false positive results. In the paper it looks as though they find 3363 fragments that are differentially methylated at p<0.05. I think it's all noise, and that PCA result is just a reflection of the number of tests they've done. If there is a methylation pattern associated with ME, this kind of analysis wouldn't be able to find it because the number of tests is far greater than the sample size.
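
The arithmetic, plus what a standard correction would do to null data, can be sketched as follows. Assumptions: every null hypothesis is true, so p-values are uniform on [0, 1], and Benjamini-Hochberg is used as one common false discovery rate correction. The paper does not say which correction, if any, was applied.

```python
import random

N_TESTS, ALPHA = 73239, 0.05
expected_false_positives = ALPHA * N_TESTS  # about 3662 if every null hypothesis is true


def benjamini_hochberg(pvalues, q=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Track the largest rank whose p-value sits under its threshold.
        if pvalues[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])


random.seed(0)
null_p = [random.random() for _ in range(N_TESTS)]  # assumption: no true effects at all
raw_hits = sum(p < ALPHA for p in null_p)
bh_hits = len(benjamini_hochberg(null_p))

print(f"expected false positives at p < {ALPHA}: {expected_false_positives:.0f}")
print(f"simulated uncorrected hits on pure noise: {raw_hits}")
print(f"hits surviving Benjamini-Hochberg:        {bh_hits}")
```

The uncorrected count on pure noise lands in the same range as the 3363 fragments reported, while the corrected count is typically zero or close to it.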
 
Yes.
That said, the second fragment on that table is associated with IRF2BPL - interferon regulatory factor 2 binding protein, that was found to be less methylated than in the healthy controls:
AI said:
IRF2BPL stands for "interferon regulatory factor 2 binding protein like" and is a gene that, when mutated, can cause neurological problems. The IRF2BPL gene encodes a protein involved in the proteasome-mediated ubiquitin-dependent degradation of target proteins, and is expressed in various tissues including the brain.

The protein encoded by IRF2BPL is thought to be involved in neuronal development and homeostasis, and may also have a role in other cellular processes

Like the first gene mentioned in that table, it fits with some of the ideas we have been tossing around elsewhere.
 
Is it surprising to find effects with such minuscule numbers of people - five in each group? Does this mean that if there's an effect, it's massive? I'm surprised they thought the study worth doing on only five each in the first place.
 
I think the PCA chart probably has a problem we have seen before - it is a chart of only the differentially methylated fragments that are statistically different between the groups. And so it finds that the groups are different.
I think there is a discrepancy between what the text says and what the figure legend says. I initially had the same concern as you that they might be pre-selecting the features that would make the findings look best, so I checked the text and saw this:

To begin the evaluation of the methylation landscape, the fragments were first sub-selected with a strict criterion as being present in all five patients of each of the three cohorts, and 73,239 fragments were then available for further analysis (Figure 1). A Principal Component Analysis (PCA) separated the cohorts into three distinct clusters (Figure 2). This indicated that both the ME/CFS and LC cohorts showed differential methylation within these common fragments compared with the healthy controls. The separation of the ME/CFS and LC cohorts into individual clusters (Figure 2) suggested that there might be differences between the extent of the methylation change at specific sites between the two cohorts, or changes at specific sites in only one of the disease cohorts. For this analysis a significance of P<0.05 was imposed for the methylation change, but no limits on the degree of methylation change. For this analysis, 3363 fragments met that criterion.
This certainly suggested to me that all 70K sites were used for this PCA, which assuaged my concerns. However, I didn’t check the figure legend, which seems to tell an entirely different story. That is highly concerning as that particular detail changes the whole narrative of that figure.

I understand that the researchers wouldn't want to do a PCA with all 73239 DNA fragments, as any differences between the groups would probably get swamped by noise.

That wouldn’t necessarily be a concern as the plot only shows PC1 and PC2. PCA starts with the axis of greatest variation and then proceeds orthogonally up to the pre-specified number of components, so you’d expect the first two to carry the most information regardless of how many sites were considered. The main issue would be if the largest variation across the samples was something other than your feature of interest (e.g. sex or age or some other confounding factor), in which case that would dominate the first PCs. Typically studies will show the PCA on all considered sites precisely to show what is driving variation in the cohort and whether it is a confounder.
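
A small sketch of the confounder scenario, with invented numbers: one male per group of five (matching the 4/5 female split mentioned earlier in the thread), a strong hypothetical sex effect on 500 features, and a weak hypothetical disease effect on 50 features. The effect sizes are assumptions chosen only to make the point visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 15, 2000
group = np.repeat([0, 1, 2], 5)      # HC, ME/CFS, LC style labels
sex = np.tile([1, 0, 0, 0, 0], 3)    # one male per group of five
disease = (group > 0).astype(float)  # 0 for controls, 1 for either patient group

X = rng.normal(size=(n_samples, n_features))
X[:, :500] += 3.0 * sex[:, None]         # hypothetical strong confounder effect
X[:, 500:550] += 0.5 * disease[:, None]  # hypothetical weak disease effect

# PCA on ALL features: centre each feature, then take the SVD.
Xc = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * S[:2]            # sample coordinates on PC1 and PC2
explained = S**2 / np.sum(S**2)      # fraction of variance per component

r_sex = abs(np.corrcoef(scores[:, 0], sex.astype(float))[0, 1])
r_disease = abs(np.corrcoef(scores[:, 0], disease)[0, 1])

print(f"variance explained by PC1: {explained[0]:.2f}")
print(f"|corr(PC1, sex)|:     {r_sex:.2f}")
print(f"|corr(PC1, disease)|: {r_disease:.2f}")
```

Here PC1 tracks sex rather than disease status, which is the kind of thing an all-sites PCA is meant to reveal before any feature selection is done.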

So either they did exactly what I would expect from a good analysis and simply made a mistake on the figure legend (which is a bit concerning but can be addressed in good faith), or they’re being quite dishonest by not also showing the PCA on all sites.

[Edit: I will reach out to the authors and let them know about the discrepancy so hopefully it can be fixed in the published manuscript. In the meantime, I’ll give them some benefit of the doubt since the rest of the study seems to be honest to my eye.]
 
Is it surprising to find effects with such minuscule numbers of people - five in each group? Does this mean that if there's an effect, it's massive? I'm surprised they thought the study worth doing on only five each in the first place.
I haven’t done a DNA methylation analysis like this paper but I do have comparable experience from ATAC-seq, which looks at open chromatin (another epigenetic feature that uses similar protocols for sequencing). That study used 3 replicates on old vs. young mice. Even using genetically identical mice raised in the same environment, there was significant variation. The number of sites that passed initial QC was comparable to this study, which is interesting since with humans you expect massively more variation. Much of that may be chalked up to a difference in methodology, though; I can’t exactly say what you’d expect from BSS.

I should have clarified in my first comment that it was a rather curiously good separation, as @Hutan caught on to; I just didn’t have time then to dig into the details. It’s not completely unexpected, as you’d certainly hope to see a stark difference between healthy and disease states, but it is also quite impressive for 5 human replicates each if you take it at face value. I suspect some of the “cleanness” is driven by their stringent selection of sites across replicates right at the beginning, which means that they are only choosing sites they are highly confident in (good thing), but also only sites that have a strong signal in the first place. Further replication would absolutely be necessary, which the authors themselves acknowledge.
 
Yes.
That said, the second fragment on that table is associated with IRF2BPL - interferon regulatory factor 2 binding protein, that was found to be less methylated than in the healthy controls:


Like the first gene mentioned in that table, it fits with some of the ideas we have been tossing around elsewhere.
Yes that piqued my interest for the same reason, though I’m definitely trying not to get too excited when this is a very preliminary study.
 
Is it surprising to find effects with such minuscule numbers of people - five in each group? Does this mean that if there's an effect, it's massive? I'm surprised they thought the study worth doing on only five each in the first place.
All you would need is for a disease-associated change to be present in nearly all of the people in the ME/CFS and Long Covid groups, and not present in the healthy controls for it to show up in the list. So, not massive, just (almost) ubiquitous in people with the disease. That is possible.

But, I can't see how they could separate a finding like that from statistical noise when they have so many fragments and so few samples. Surely some of the people involved would have known that before the lab work was done. But, perhaps they thought it was still worth doing, in order to develop the necessary skills, and to make a stronger case for getting funds for a better sized study. And I would agree.

It's the presentation of the results that is the problem.
 