Urine Metabolomics Exposes Anomalous Recovery after Maximal Exertion in Female ME/CFS Patients 2023, Glass, Hanson et al

I have suspected for maybe two and a half decades that our disturbed sleep might be associated with altered circadian patterns of many metabolites. This was my concern in the late 90s when early morning urinary metabolites were first being looked at. Change the sampling time, or have patients with altered circadian patterns, and you might get different results. We cannot yet be sure that our results will be stable over the course of a sleep-wake cycle. This needs to be established, or an optimal time of day or point in the sleep-wake cycle needs to be determined.

[Edited to correct how long it's been.]
 
This got a lot of attention 2-3 years ago when it came out.

Do we have any news on a replication attempt or similar?
A screen like this will probably never get sufficient funding for a replication attempt. It’s just seen as a hypothesis-generating exercise, so once it has been done, no funding body sees the point in doing it again (not even the very rare private funds specifically for replication).
 
That's crazy! Surely the lack of expected change seen here could have diagnostic purposes? And theorising around it would be much easier if people could be confident in it being replicable.

The way people talk about capital-S Science compared to the reality of how scientific research works in the real world is so disappointing.
 
I'll be honest, the chances of this figure reflecting a real biological phenomenon of no change after exercise are so small that I would dismiss it outright.

Screenshot 2025-12-23 at 10.30.57 AM.png
What this graph means is that either the intragroup variability for every single metabolite was so wild that there was no way for any data point to reach significance, or there was a problem in the statistical analysis (either human error while running the code, an inappropriate model for the data, [edit: or incorrectly calculating the q value]). Even running the same sample in two different batches won't give you a graph like this.

The Hanson lab didn't run these samples; they were sent off to a company, so there might have been no way to ask for the samples to be rerun to confirm whether a technical issue in the sample processing made things go funny.

So to your point, it would be ideal to get funding for another study to redo this and see what the actual changes were, but given that there actually have been other urinalysis studies done in ME/CFS and no smoking gun, I think a funding body wouldn't see the point.
 
definitionally for a q-value by BH you should have a number of false positives determined by your false discovery rate above the dotted line.
I don't think this is correct. For example, if all the findings are actually null, which would result in a uniform distribution of p-values between 0 and 1, there's a good chance nothing will pass the FDR threshold.

I verified this with some quick code, which output a minimum q-value of 0.31 when the p-values aren't skewed toward low values at all:
Python:
In [1]: import numpy as np
   ...: from scipy.stats import false_discovery_control
   ...:
   ...: ps = np.random.uniform(0, 1, 10000)
   ...:
   ...: fdr_ps = false_discovery_control(ps, method='bh')
   ...:
   ...: np.min(fdr_ps)
Out[1]: 0.31231890557870123

There still might be an issue somewhere, but it's theoretically possible with regard to q-value.
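
A single random draw only shows that this can happen. As a rough sketch of how often it happens, the simulation below repeats the all-null screen many times and counts how often anything ends up with q <= 0.05. It uses a hand-written BH step and only the standard library rather than SciPy; the function names, the 1000 features, and the 300 repeats are illustrative assumptions, and the p-values are assumed independent. Under those assumptions BH should flag at least one feature in only about 5% of all-null screens, so an empty result is the usual outcome when there is truly nothing there.

```python
import random


def bh_min_q(pvals):
    """Smallest Benjamini-Hochberg adjusted p-value (q-value) in a list."""
    m = len(pvals)
    # The adjusted value of the i-th smallest p is the minimum over j >= i of
    # p_(j) * m / j, so the overall minimum is the smallest p_(j) * m / j.
    smallest = min(p * m / rank for rank, p in enumerate(sorted(pvals), start=1))
    return min(1.0, smallest)


def fraction_with_any_hit(n_trials=300, n_features=1000, alpha=0.05, seed=0):
    """Fraction of all-null screens in which at least one q-value is <= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Every feature is null, so its p-value is uniform on [0, 1).
        pvals = [rng.random() for _ in range(n_features)]
        if bh_min_q(pvals) <= alpha:
            hits += 1
    return hits / n_trials


frac = fraction_with_any_hit()
print(f"all-null screens with at least one q <= 0.05: {frac:.3f}")
```

This only covers the case where every feature is null and independent; correlated metabolites or real effects would change the picture.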
 
Yes, sorry for not clarifying. This was on the assumption that you’re nearly always going to have some low p-value false positives in a screen like this if you include enough features. It’s numerically possible to end up with a situation where absolutely nothing passes the threshold after correction, but something strange is usually happening to get that in this biological context.

[Edit: and I’m probably adding to the confusion by using my terminology loosely here]
 
Actually, thanks @forestglip for prompting me to go back and check this. Looking at some of my old datasets, I do see a scenario where I got nothing past the q-value threshold in a mouse study with very small n. So I take back my original comment.

I think what tends to happen in human cohorts of disease with decent n is that when you measure enough things, you can virtually guarantee “real” differences in a subset of metabolites simply due to confounders that you couldn’t possibly account for with the study design. Less activity even when you use sedentary controls, higher proportion of sick people being prescribed psychiatric medications, more people with dietary restrictions in the sick group, etc.

Even in the context of pre vs post exercise within ME/CFS, there are going to be differences just from day-to-day biological variation. Usually it’s a small enough confounding effect that it’ll wash out anyway with stringent p-value correction, but it does mean that you can expect the raw p-values to skew low before correction, if that makes sense. So that’s one reason why this data set off alarm bells. It's less a statistical argument than an interpretive one.
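
A toy sketch of the "skew low" point, not a model of the actual study: assume each feature's pre/post test statistic is a z-score shifted by a small amount (the 0.7 shift, the 1000 features, and the function names are all made-up assumptions). The raw p-values then fall below 0.05 more often than the 5% expected under a pure null, while far fewer features, possibly none, keep q < 0.05 after BH correction.

```python
import random
from statistics import NormalDist


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def simulate_screen(n_features=1000, shift=0.7, seed=1):
    """Two-sided p-values for z-scores drawn from N(shift, 1) (assumed model)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pvals = []
    for _ in range(n_features):
        z = rng.gauss(shift, 1.0)
        pvals.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return pvals


pvals = simulate_screen()
qvals = bh_adjust(pvals)
raw_hits = sum(p < 0.05 for p in pvals)
bh_hits = sum(q < 0.05 for q in qvals)
print(f"raw p < 0.05: {raw_hits} of {len(pvals)}; q < 0.05 after BH: {bh_hits}")
```

How many features survive correction depends heavily on the assumed shift and on n, so the numbers here are only illustrative.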
 
Yeah, I wouldn't expect there to be actually zero real difference between days because I expect exercise to do things to anyone, so would expect some skew in the raw p-values, though maybe not so much it passes correction.

But I agree it seems very strange to have so many very significant metabolites in one group and not a single significant metabolite in the other. I would assume, and this could be wrong, that those 255 significant metabolites in controls are parts of multiple unrelated pathways that get changed due to exercise or other confounders in various ways, so it'd be surprising for not one of these pathways to change between days in another group.

Edit: I mean, I don't know much about the physiology of exercise, but maybe this could be real? Maybe all these metabolites represent downstream consequences of one specific process that happens due to exercise in healthy people, and this is what is specifically not working in ME/CFS, or maybe is delayed? I hope we can get an attempt at replication, in one form or another.
 
What could be plausible is that a bunch of metabolites are induced less in ME/CFS relative to how much they are induced in healthy people (and the converse, metabolites that are downregulated less than they would be in healthy people). That would also mean that we’d probably see some signs of abnormal metabolism somewhere, as various systems reflect insufficient adaptation to increased strain.

But no, the idea of absolutely nothing changing after maximal exercise is unbelievably improbable to me when you can expect to have even a few significant differences in a screen like this just from sampling the same people at timepoints a few hours apart.
 