Urine Metabolomics Exposes Anomalous Recovery after Maximal Exertion in Female ME/CFS Patients (Glass, Hanson et al., 2023)

I think it is incompatible with the data just showing much more variance in ME/CFS than in the control group
I think they considered whether it might be due to increased variance, and found no general difference in variance between groups. Specifically for changes due to exercise, this would be the Delta plot in S3A below.

The plots you posted above might show increased variance in ME/CFS, but this is just a small selection of compounds, chosen in part because of their significance in the control group, which could itself be due in part to low variance.

Since the following indicates that the variance doesn't differ much on the whole, you might expect to see a similar number of significant findings in ME/CFS if the groups really behaved similarly.
The fact that there are metabolite changes in the controls and not in the patients is not due to increased variation in metabolite levels in the ME/CFS patients compared to the controls. There is no trend toward higher standard deviation in the ME/CFS group when comparing the standard deviations for ME/CFS to controls for each compound (Supplementary Figure S3).

[Attached: Supplementary Figure S3]
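For illustration, this is roughly the comparison the authors describe (a minimal sketch on simulated stand-in data, not their code or data):

Code:
library(tidyverse)

# Simulated stand-ins: log-transformed abundances, subjects in rows,
# the same 1154 metabolites in columns (made-up values)
set.seed(1)
mat_mecfs <- matrix(rnorm(10 * 1154), nrow = 10)
mat_hc    <- matrix(rnorm(8 * 1154), nrow = 8)

# One SD per metabolite per group, as in Supplementary Figure S3
sd_df <- tibble(metabolite = seq_len(ncol(mat_mecfs)),
                MECFS = apply(mat_mecfs, 2, sd),
                HC    = apply(mat_hc, 2, sd))

# "No trend toward higher SD in ME/CFS" would show up here as points
# scattered evenly around the identity line
ggplot(sd_df, aes(x = HC, y = MECFS)) +
  geom_point(alpha = 0.3) +
  geom_abline(slope = 1, intercept = 0)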
 
I think they considered whether it might be due to increased variance, and found no general difference in variance between groups. Specifically for changes due to exercise, this would be the Delta plot in S3A below.
Variance was perhaps the wrong word to use; what would matter is directional variability between timepoints. In the control plots shown there is similar spread, but the values also all trend upward, which drives a mean difference despite similar SD within the before and after data points. Does that make sense?

[Edit: so what’s different between controls and ME/CFS is a lack of consistent directional change as I said before—which doesn’t present the same implications as failure to adapt or no change]
 
In the control plots shown there is similar spread, but the values also all trend upward, which drives a mean difference despite similar SD within the before and after data points. Does that make sense?
Yes, but that's what the delta column is showing. It's not variance of the raw values of compounds; it's variance of how much a compound changed from before to after exercise.

If control compounds tend to trend upward a similar amount, while ME/CFS compounds tend to go up or down in a more spread out manner, then delta for controls for a given compound would have a lower SD in the plot above.
 
Yes, but that's what the delta column is showing. It's not variance of the raw values of compounds; it's variance of how much a compound changed from before to after exercise.

If control compounds tend to trend upward a similar amount, while ME/CFS compounds tend to go up or down in a more spread out manner, then delta for controls for a given compound would have a lower SD in the plot above.
Ah okay I missed delta there, thanks for pointing it out. So really just an interpretational issue for what no significant mean change within group could mean biologically
 
So really just an interpretational issue for what no significant mean change within group could mean biologically
My baseline interpretation is that we shouldn't interpret a lack of significance as evidence of a lack of effect.

Though what makes me a bit hesitant to discount it on the basis of one group randomly not reaching significance is the stark difference between groups, with hundreds of compounds significant between days in controls and zero in ME/CFS. I'm not sure seeing so many differences is something we'd expect when the variance between groups is similar, and there are even more ME/CFS patients than controls (10 vs. 8), so all else being equal we would tend to see more significant compounds in ME/CFS, not fewer.
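Just for intuition on the sample-size point, a quick power calculation with made-up numbers (treating a within-group day 1 vs. day 2 comparison as a paired t-test; the effect size is arbitrary):

Code:
# Hypothetical effect: a 0.25 mean shift on the log2 scale with SD 0.25
# across subjects. All else equal, n = 10 gives more power than n = 8.
power.t.test(n = 10, delta = 0.25, sd = 0.25, type = "paired")$power
power.t.test(n = 8,  delta = 0.25, sd = 0.25, type = "paired")$power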

My intuition says this is likely an artifact of some kind. But it's possible there's something there.

Edit: Though maybe these hundreds of compounds are highly correlated with each other, and would tend to all be significant or all be non-significant together, in which case I'd go back to primarily thinking that we shouldn't use lack of significance as good evidence of lack of effect.
 
My intuition says this is likely an artifact of some kind. But it's possible there's something there.
As far as I remember, this wasn't a finding that was mirrored in the plasma studies from the same participants (I will try to go back and double-check later today if I have time). Which makes me think it might have something to do with osmolarity in urine samples. I remember some folks on the forum talking about needing to urinate much more when other symptoms are worse.
 
Which makes me think it might have something to do with osmolarity in urine samples.
They may have considered that too:
Additionally, these changes are detected despite normalizing to urine osmolality, which is trending toward an increase in the controls from baseline to post-exercise (Supplementary Figure S1). In order to increase significantly after normalization, levels of a metabolite have to increase even higher than any increase in overall urine concentration.
[Attached: Supplementary Figure S1]
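To spell out the normalization point with made-up numbers: if overall urine concentration rises by the same proportion as a metabolite, the increase disappears after normalization, so a metabolite has to outpace the osmolality change to register as increased.

Code:
# Toy example (made-up values) for one subject, before and after exercise
pre  <- c(metabolite = 100, osmolality = 500)
post <- c(metabolite = 130, osmolality = 650)

# Normalized levels: a 30% raw increase is flattened to no change
# because osmolality also rose 30%
pre["metabolite"] / pre["osmolality"]    # 0.2
post["metabolite"] / post["osmolality"]  # 0.2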
 
They may have considered that too:
No, I know; I mean a difference that is obscured by correcting for osmolarity (which means there wouldn't be a good way to detect changes anyway if there was a big between-group osmolarity difference; I'll try to check that from their public data as well).

[edit: I've had similar issues trying to find transcriptomic differences between infected and uninfected cells when viruses inhibit transcription of host cells; you have to be really careful with normalization to avoid just flattening differences]
 
Quick skim of the plasma CPET metabolomics study: definitely many significant metabolites between D1 and D2 in ME/CFS. Also, this caught my eye:
The most striking observation is the lack of significant differences in the controls (no P < 0.05). This suggests that the consequences of exercise on day 1 and day 2 are similar for the control cohort.
referring to day 1 vs. day 2 pathway analysis in the female cohort

And this indicates that there isn't just a simple kidney problem leading to buildup of metabolites in the blood and failure to excrete in the urine in ME/CFS [edit: because if so, you would expect metabolite retention in the blood compared to controls]:
At P < 0.05 and D1PRE, 75% of metabolites were lower in female participants while 67% were lower at D2POST.

 
Tukey's post-hoc from emmeans also assumes independence of samples. Someone correct me if I'm wrong, but I thought it wouldn't be appropriate for comparing timepoints in the same group (though unfortunately it is commonly used for that purpose).
 
Tukey's post-hoc from emmeans also assumes independence of samples. Someone correct me if I'm wrong, but I thought it wouldn't be appropriate for comparing timepoints in the same group (though unfortunately it is commonly used for that purpose).
This part?:
There was a trend toward increased osmolality in the controls 24 h post-exercise (p < 0.1, linear mixed effects model, followed by pairwise comparisons with Tukey’s posthoc test).
This seems to suggest that it's okay to do this: https://repub.github.io/DLC_statistical_guides/docs/R/repeated-measures-ANOVA.html

Maybe emmeans accounts for the non-independence if it's a mixed effects model somehow. Whether it's actually good for this, I don't know. There are some stackexchangers with differing viewpoints about it, though I don't really understand it: https://stats.stackexchange.com/questions/430539/pairwise-comparisons-via-emmeans
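For reference, a minimal sketch of the kind of pipeline being discussed (simulated data; I'm guessing at the structure, this is not the paper's code): a mixed model with a random intercept per subject, then pairwise timepoint contrasts from emmeans, where pairs() defaults to Tukey adjustment.

Code:
library(lme4)
library(emmeans)

# Simulated long-format data: one metabolite, two timepoints per subject
set.seed(42)
df <- data.frame(subject = factor(rep(1:10, each = 2)),
                 day     = factor(rep(c("D1", "D2"), times = 10)))
subj_eff <- rnorm(10, sd = 0.5)  # subject-level baseline differences
df$value <- subj_eff[df$subject] +
  ifelse(df$day == "D2", 0.25, 0) +
  rnorm(20, sd = 0.25)

# The random intercept per subject is what's supposed to absorb the
# non-independence of repeated measures
fit <- lmer(value ~ day + (1 | subject), data = df)

# Contrasts on the model's estimated marginal means, not on raw data points
pairs(emmeans(fit, ~ day))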
 
This part?:
That part too, but they don't specify the test used for just the volcano plot comparisons, which makes me think they just used the default everywhere. Pretty sure that's Tukey's when calling pairs() from emmeans, but they also specify using BH correction, so maybe something else was done for the tests outside of osmolarity and not specified? If they did not do Tukey's, then it would have been the t statistic, which also assumes independence.
Maybe emmeans accounts for the non-independence if it's a mixed effects model somehow. Whether it's actually good for this, I don't know. There are some stackexchangers with differing viewpoints about it, though I don't really understand it:
yeah it's... complicated. TL;DR: when you're doing emmeans, you're not actually running a statistical test on the data points but on a model built with all the data points. So your comparison is asking "all other things being equal, what is the difference in means predicted by the model between group X vs. Y? Is that difference significant considering the standard error?" They do have a random effect per subject, which normally would account for it mathematically, but as that last comment states, it's not so simple with this model structure (where the standard errors are for the model itself). Plus there's also bleed happening even before the modeling, because the features were median-scaled (I'm assuming across all values for the experiment).

I'm confused about the github example, since I was explicitly told not to do that for time series data... maybe there's just a lot of unresolved internal debate and half of all biostatisticians think it's fine. All in all it might make only small differences in the final results; it matters somewhat less when you have multiple strong trends that would show up regardless of how you arrive at the p-value. But we're dealing with the opposite case here, which is probably why I'm getting hung up on it.
 
If control compounds tend to trend upward a similar amount, while ME/CFS compounds tend to go up or down in a more spread out manner, then delta for controls for a given compound would have a lower SD in the plot above.
Sorry, something just occurred to me: that's not necessarily true.

This is what the plot is showing:
Shown are the standard deviations for the 1154 compounds analyzed, after missing values were imputed and log transformation was applied. The "delta" dataset is the log2 fold change of post-exercise / baseline for each subject.
It's plotting the distribution, across metabolites, of the standard deviations of the per-subject logFCs. It doesn't actually address the issue I was talking about.
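To make that concrete, this is roughly what I understand the delta panel of S3A to be computing (simulated stand-in data, not theirs):

Code:
library(tidyverse)

# Simulated stand-in: per-subject log2FC (post/baseline) for each metabolite
set.seed(7)
deltas <- matrix(rnorm(10 * 1154, mean = 0, sd = 0.25), nrow = 10)

# One SD per metabolite, computed across subjects...
sd_per_metabolite <- apply(deltas, 2, sd)

# ...and the plot shows the distribution of those SDs. Nothing here
# captures whether the per-subject changes agree in direction
ggplot(tibble(sd = sd_per_metabolite), aes(x = sd)) +
  geom_histogram(bins = 40)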

Let's say we had 10 participants in each group. For the ME/CFS group, I simulate a normal distribution of 10 log2FCs centered around 0 (representing no mean difference). For the control group, I simulate a distribution centered around 0.25 (representing a positive mean difference). Same SD for both.

[Plot: histograms of the simulated deltas for each group]

Then I pull two sets of day 1 values from the same normal distribution and use the randomly generated deltas to calculate day 2 values for the controls and cases, respectively. That ends up with a plot very similar to the ones in Figs 6 and 7:
[Plot: simulated day 1 to day 2 values per subject, with group means marked]

So same SD in the logFCs, but HC ends up with a significant mean difference and ME/CFS doesn't. You could rerun the same simulation with different SDs and end up with a very similar supplemental plot. But if the ME/CFS group has more directional variability in logFCs per metabolite across the board, you'll still end up with no significant differences between time points.

Code:
library(tidyverse)
library(magrittr)

set.seed(12)

# Generate normal distributions of day1 values (similar range as the
# scaled and log-transformed values from Fig 6)
HC <- data.frame(day1 = rnorm(10, mean = 0, sd = 0.25),
                 group = "HC")
MECFS <- data.frame(day1 = rnorm(10, mean = 0, sd = 0.25),
                    group = "MECFS")

df <- rbind(HC, MECFS)

# Add sample identifier
df %<>% mutate(sampleID = paste0("P", 1:nrow(df)))

# Generate a normal distribution of delta values (log2FC) with a mean delta of 0.25
# Representing a metabolite where most samples had an increase
delta_HC <- rnorm(10, mean = 0.25, sd = 0.25)

# Generate a normal distribution of delta values (log2FC) with a mean delta of 0
# Representing a metabolite where the average difference was 0
delta_MECFS <- rnorm(10, mean = 0, sd = 0.25)

# Add to data frame
df$delta <- c(delta_HC, delta_MECFS)

# Plot distribution of deltas
ggplot(df,
       aes(x = delta,
           fill = group)) +
  geom_histogram(bins = 5) +
  facet_wrap(vars(group), ncol = 1)

# Generate day2 values (data is on log scale so just need to add FC)
df %<>% mutate(day2 = day1 + delta)

# Pivot longer to plot
df %<>% pivot_longer(cols = contains("day"),
                     names_to = "day")

# Create data frame for mean points
mean_df <- df %>%
  group_by(group, day) %>%
  summarize(value = mean(value), .groups = "drop")

# Plot per-subject trajectories with group means overlaid
ggplot(df,
       aes(x = day,
           y = value)) +
  geom_line(aes(group = sampleID,
                color = group)) +
  geom_point(data = mean_df,
             aes(color = group,
                 shape = group),
             size = 3)
 
So same SD in the logFCs, but HC ends up with a significant mean difference and ME/CFS doesn't.
I'm a bit confused about what you're trying to show. HC ends up with a significant difference because they do have a difference, while ME/CFS doesn't because they don't. Your plot has equal SDs for one metabolite, and a true difference between groups, so the plot seems like what we'd expect.

What you seemed to be arguing before was that the ME/CFS group wasn't significant because of high variability in deltas compared to HC, leading to low statistical power.

The authors considered this and showed that, at least on average over all metabolites, the variability in how much a metabolite changed was the same between groups.

Maybe it would have been better to check the delta variability of the specific metabolites that were significant in controls, to see if they were more variable in ME/CFS. Actually, yes, that makes more sense than looking at the distribution of all deltas' SDs, I think. I'm not sure if that's what you were getting at, but I agree, the distribution of SDs of deltas isn't the last word.
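Concretely, that check could look something like this (a sketch on simulated stand-in data; the real version would use the study's per-subject deltas and its list of control-significant metabolites):

Code:
library(tidyverse)

# Simulated stand-ins: per-subject deltas for 200 metabolites, plus a
# made-up placeholder for the "significant in controls" hit list
set.seed(3)
delta_long <- expand_grid(metabolite = paste0("m", 1:200),
                          group      = c("HC", "MECFS"),
                          subject    = 1:10) %>%
  mutate(delta = rnorm(n(), sd = 0.25))
sig_in_hc <- paste0("m", 1:50)

# For each control-significant metabolite, compare delta SDs across groups
delta_long %>%
  filter(metabolite %in% sig_in_hc) %>%
  group_by(metabolite, group) %>%
  summarize(sd_delta = sd(delta), .groups = "drop") %>%
  pivot_wider(names_from = group, values_from = sd_delta) %>%
  mutate(ratio = MECFS / HC)  # ratios > 1 would mean noisier deltas in ME/CFS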
 
What you seemed to be arguing before was that the ME/CFS group wasn't significant because of high variability in deltas compared to HC, leading to low statistical power.
It would be because in the ME/CFS group you have more “evenness” between people whose levels increase and people whose levels decrease between timepoints, basically negating the mean difference. So variability in direction rather than variance is what I’ve been trying to describe; that’s what I’m trying to show by simulating a normal distribution of log2FCs centered at 0. It would be overall variability that leads to all the metabolites ending up like this, as opposed to some proportion of metabolites with more consistent directionality leading to a mean difference (and some proportion of significant hits). I’m not sure if there’s a specific term for what I’m referring to, so it’s hard to describe.
 
Maybe it would have been better to check the delta variability of the specific metabolites that were significant in controls, to see if they were more variable in ME/CFS.
It could be an additional issue, but the main problem would be the degree of “cancel-out-ness”, the centeredness around 0. It doesn’t need to be a larger spread so long as things cancel out.

TLDR it is an intra-group variability issue, but not one that necessarily translates to variance (and not one that is being assessed by their supplemental analysis).

[Edit: sorry, it’s hard to put things into words while dropping in to sketch out my thoughts on this in the midst of a big project; I'm probably not doing myself a favor by trying to comment atm. The relevant assessment is whether the ME/CFS changes are actually purely random or whether the slopes do reflect a change, just a very directionally inconsistent one between participants. I’m trying to show that the supplemental assessment doesn’t answer that question, but that’s the crucial question for being able to say “no change”, even weakly]
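One way to put a number on that “cancel-out-ness” (a sketch, not something from the paper): for each metabolite, test how lopsided the signs of the per-subject deltas are, which ignores magnitude entirely.

Code:
# Two made-up metabolites with similar-sized per-subject deltas:
consistent <- c(0.3, 0.2, 0.4, 0.1, 0.3, 0.2, 0.5, 0.1, 0.2, 0.3)
cancelling <- c(0.3, -0.2, 0.4, -0.1, 0.3, -0.2, 0.5, -0.1, 0.2, -0.3)

# Sign test: is the number of increases unusual under a 50/50 null?
binom.test(sum(consistent > 0), length(consistent))$p.value  # small
binom.test(sum(cancelling > 0), length(cancelling))$p.value  # 1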
 
I think I understand, but it seems to me that this is just what we expect to see all the time.

I'll refer to this:
For instance, we could imagine some variable that has a huge amount of variability day to day in most people, like step count. If we do an intervention that should have no effect on that, like a sugar pill, there will still be large changes both up and down in different participants the next day. But they'll essentially be random changes that are non-significant when testing association with the intervention, so it might be a similar situation of a researcher interpreting that as a "lack of change in steps after placebo".

If one group gets an intervention with a true effect, while the other gets the placebo, then the plot of change in steps would look pretty similar to the plots from the study. The code you wrote just seems like the same thing as well: what you'd see when there's an intervention with a true effect that increases the mean in one group and not the other.

The "cancelling out" of opposite directions is essentially the null hypothesis we are testing for.
 
The "cancelling out" of opposite directions is essentially the null hypothesis we are testing for
For individual metabolites, yes. Though the question of “no change after exertion” is about all the comparisons, relying on the assumption that no-significant-metabolites is actually what you would expect to see if the only thing driving change between time points is random fluctuation. We’ve been assuming that’s automatically true, but that’s actually a new null hypothesis, albeit one that statistics isn’t particularly well poised to answer (I agree with earlier points about lack of significant change being weak evidence).
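As a side note on that new null hypothesis: if “significant” here meant raw p < 0.05 and the 1154 tests were independent (neither is likely, given the BH correction and the correlation point raised earlier), seeing zero hits would itself be surprising:

Code:
# Expected raw p < 0.05 hits among 1154 independent true nulls
1154 * 0.05                           # ~58
# Probability of zero such hits under independence
pbinom(0, size = 1154, prob = 0.05)   # vanishingly small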

So the better question you can pose is whether there is something happening that would drive more directional variability than you would expect just by chance, causing some homeostatic switches to get flipped on and off under exertion with less consistency than in controls, if you will. The supplemental SD plot would answer that question only if the magnitude of changes was wilder in ME/CFS than in controls.

But I don’t think that’s how we would expect that alternate explanation to manifest anyway, both because the values themselves have been constrained by extensive normalization/scaling and also because even dysregulated homeostasis doesn’t allow things to fluctuate that far outside of a certain range. Sort of like a pendulum with a short string, where the healthy controls got a small push and all ended up in one direction, but ME/CFS got way bigger pushes and you’re seeing snapshots of where the wildly swinging pendulums ended up at one moment in time. So you don’t want to check the SD of deltas (that’s what my code demonstrates); you want to check how things compare overall against truly random fluctuation in a healthy system (I’d have to do more coding).
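A sketch of the kind of check I mean (extending the earlier simulation to many metabolites; all values made up): same spread of deltas in both groups, but directionally consistent pushes in “HC” and cancelling ones in “ME/CFS”, then count how many metabolites come out significant in each. Note a per-metabolite test can’t distinguish truly random fluctuation from real but directionally inconsistent changes, which is exactly the problem.

Code:
set.seed(99)
n_subj <- 10
n_met  <- 500

# Deltas have the same SD in both scenarios; only the mean (i.e., the
# directional consistency across subjects) differs
count_hits <- function(mean_delta) {
  p <- replicate(n_met, {
    delta <- rnorm(n_subj, mean = mean_delta, sd = 0.25)
    t.test(delta)$p.value  # one-sample test on deltas = paired test
  })
  sum(p < 0.05)
}

count_hits(0.25)  # "HC-like": consistent direction, hundreds of hits
count_hits(0)     # "ME/CFS-like": changes cancel out, ~chance-level hits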
 