Cardiopulmonary and metabolic responses during a 2-day CPET in [ME/CFS]: translating reduced oxygen consumption [...], Keller et al, 2024

Murph · Jul 12, 2024

So, this was an attempt to replicate two-day CPET with a really big sample and the differences persisted but don't look very big.

Is that a fair summary? Is that why most of this thread is discussion of the inclusion criteria? People trying to explain why the differences don't look big?

I tried to read this paper myself but holy heck, is it long. and dense.

Another question: are these the same patients as in Hanson's huge metabolomics studies? She did two-day exercise challenges there too.

forestglip · Jul 12, 2024

Murph said:
So, this was an attempt to replicate two-day CPET with a really big sample and the differences persisted but don't look very big.

Is that a fair summary? Is that why most of this thread is discussion of the inclusion criteria? People trying to explain why the differences don't look big?

I tried to read this paper myself but holy heck, is it long. and dense.

Another question: are these the same patients as in Hanson's huge metabolomics studies? She did two-day exercise challenges there too.

I think the inclusion criteria line of thought was mostly just speculation by me. I wouldn't put too much stock in it. But yes, it was an attempt to explain why the differences seem smaller with deconditioned controls. Still, even though they are smaller, this is yet another study showing significant differences between groups, even in the (somewhat controversial) fitness-matched pairing.

Side question: Have any of the controlled 2-day CPET studies shown greater reductions in the control group for VO2 or work at VAT, even non-significantly greater, or have pwME universally shown the greater reduction (including non-significant ones)?

SNT Gatchaman · Jul 12, 2024

Murph said:
I tried to read this paper myself but holy heck, is it long. and dense.

It would have been much better served as two papers, I think.

SNT Gatchaman said:
Well that paper was a game of two halves. I wished they'd stopped at the limitations section instead of wildly speculating on autonomic dysregulation and treatments thereof.

ME/CFS Science Blog · Aug 4, 2024

Just wanted to highlight that this paper, in contrast to previous exercise studies in ME/CFS, found no evidence of chronotropic incompetence, i.e. the inability to increase HR during the exercise test. They used 3 measures for this (%predicted HRmax, %HRRadjusted, and CTIpeak) and none seem to differ between patients and controls.

forestglip · Aug 29, 2024

The raw data is now on mapmecfs.org.

ME/CFS Science Blog · Aug 29, 2024

Already had a quick peek and it seems that there is quite a lot of overlap between the two groups if you plot the difference between day 1 and day 2. Here's for example the workload at the ventilatory threshold for the total sample, which in the past showed the biggest differences.

(EDIT: The previous plot I posted had the same data but each datapoint was accidentally shown multiple times.)

Perhaps this is not the best statistical approach but if I do a t-test on the difference between day1 and day2, I get a p-value of 0.17 and a standardized effect size of 0.12.
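For anyone who wants to try the same kind of comparison, here is a minimal sketch of a t-test on the day 2 minus day 1 changes, with a standardized effect size. It assumes a plain Student's t-test and Cohen's d with a pooled standard deviation, which may not be exactly what was used for the numbers above. The function name and the example values are made up for illustration and are not the study data.

```python
import numpy as np
from scipy import stats

def compare_day_changes(me_d1, me_d2, hc_d1, hc_d2):
    """Independent t-test and Cohen's d on the day 2 minus day 1 changes."""
    me_change = np.asarray(me_d2, dtype=float) - np.asarray(me_d1, dtype=float)
    hc_change = np.asarray(hc_d2, dtype=float) - np.asarray(hc_d1, dtype=float)

    # Student's t-test on the change scores (equal variances assumed)
    t_stat, p_value = stats.ttest_ind(me_change, hc_change)

    # Cohen's d using the pooled standard deviation of the change scores
    n_me, n_hc = len(me_change), len(hc_change)
    pooled_var = (
        (n_me - 1) * me_change.var(ddof=1) + (n_hc - 1) * hc_change.var(ddof=1)
    ) / (n_me + n_hc - 2)
    cohens_d = (me_change.mean() - hc_change.mean()) / np.sqrt(pooled_var)
    return t_stat, p_value, cohens_d

# Made-up workloads (watts) at the ventilatory threshold, NOT the study data
me_day1 = [50, 60, 55, 70, 65]
me_day2 = [45, 58, 50, 72, 60]
hc_day1 = [80, 75, 90, 85, 70]
hc_day2 = [78, 76, 88, 86, 69]

t_stat, p_value, cohens_d = compare_day_changes(me_day1, me_day2, hc_day1, hc_day2)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d:.2f}")
```

A negative d here means the ME/CFS group's change was more negative (a bigger drop) than the controls' change.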

Plan to take a closer look in the coming days. Kudos to the authors for putting the data online.

forestglip · Aug 29, 2024

I wanted to visualize what the differences in the matched pairs were individually, to see how many people in the ME group had a larger difference in the metric than their matched control. If pretty much all pwME had a larger drop in VO2 or another metric than their matched counterpart, that'd be pretty impressive and promising as a biomarker. Now, I did this really quickly and may have made some mistakes, but I manually checked a couple points and they matched the raw data. On first glance, it doesn't look like it's that cut and dry.

Each x value is a matched pair. So x=1 will have two dots, one for the HC (blue) and one for the MECFS (red) in the first VO2peak matched pair. The y value is the difference in the chosen metric at VAT between the two CPETs. I ordered by the HC's difference just to make it a little easier to look at, but this order isn't otherwise meaningful.

Lots of MECFS (red dots) both below and above their corresponding blue dot in all metrics that the study showed were significant for pwME but not for HC.
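The "how many pwME dropped more than their matched control" question can also be answered with a simple count instead of by eye. This is a minimal sketch with made-up numbers; the `VO2_diff` column name is hypothetical, while `matched_pair` and `phenotype` follow the column names used in the script further down.

```python
import pandas as pd

# Made-up example: one row per participant, NOT the study data
pairs = pd.DataFrame({
    "matched_pair": [1, 1, 2, 2, 3, 3, 4, 4],
    "phenotype": ["HC", "MECFS"] * 4,
    "VO2_diff": [-0.5, -2.0, 1.0, 0.5, -1.5, -1.0, 0.2, -3.0],
})

# One row per pair, one column per phenotype
wide = pairs.pivot(index="matched_pair", columns="phenotype", values="VO2_diff")

# A more negative difference means a larger drop on day 2
me_dropped_more = wide["MECFS"] < wide["HC"]
n_pairs = len(wide)
n_me_worse = int(me_dropped_more.sum())
print(f"{n_me_worse} of {n_pairs} pairs: ME/CFS participant dropped more than matched control")
```

If the metric were a clean biomarker, this count would be close to the total number of pairs; a count near half would look like chance.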

Edit: Here are the graphs using percentage difference between days instead of absolute difference.

Also, for 23.8% of ME/CFS patients (using full cohort), their VO2 at max increased on the second day. For 33.3% their VO2 at VAT increased.

Here is the Python code, which can be run in Jupyter, in case anyone wants to verify that the code used to make these graphs is correct:

Code:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data from a TSV file
file_path = 'cpet_clinical_data.tsv'
df = pd.read_csv(file_path, sep='\t')

df_matched = df.dropna(subset=['matched_pair']).copy()

metrics = [
    'DBP',        # Diastolic Blood Pressure
    'HR',         # Heart Rate
    #'OUES',       # Oxygen Uptake Efficiency Slope
    'PETCO2',     # End-Tidal CO2
    'PETO2',      # End-Tidal O2
    'PP',         # Pulse Pressure
    'RER',        # Respiratory Exchange Ratio
    'RPE',        # Rating of Perceived Exertion
    'RPM',        # Pedal RPM
    'RPP',        # Rate Pressure Product
    'RR',         # Respiratory Rate
    'SBP',        # Systolic Blood Pressure
    'VCO2',       # Carbon Dioxide Production
    'VO2',        # Oxygen Consumption
    'VO2_HR',     # Oxygen Consumption per Heart Rate
    'VO2_t',      # Oxygen consumption (ml/min)
    'Ve_BTPS',    # Ventilation (BTPS)
    'Ve_VCO2',    # Ventilatory Equivalent for CO2
    'Ve_VO2',     # Ventilatory Equivalent for O2
    'Vt_BTPS_L',  # Tidal Volume (BTPS in Liters)
    'wkld',       # Workload
]

# Convert all metric columns to numeric, coercing errors
for metric in metrics:
    df_matched[metric] = pd.to_numeric(df_matched[metric], errors='coerce')

# Split the data by time points
time_points = ['AT', 'max', 'rest']

# Initialize an empty list to store differences
diff_data = []

for time_point in time_points:
    # Filter data for the current time point
    df_tp = df_matched[df_matched['Time_Point'] == time_point]
 
    # Split the data into Day 1 and Day 2
    day1 = df_tp[df_tp['Study_Visit'] == 'D1']
    day2 = df_tp[df_tp['Study_Visit'] == 'D2']
 
    # Merge day1 and day2 on ParticipantID
    merged = pd.merge(day1, day2, on=['ParticipantID', 'matched_pair', 'sex', 'Time_Point', 'race', 'phenotype'], suffixes=('_D1', '_D2'))
 
    # Calculate the percentage difference for each metric
    for metric in metrics:
        merged[metric + '_pct_diff'] = ((merged[metric + '_D2'] - merged[metric + '_D1']) / merged[metric + '_D1']) * 100
 
    # Keep only relevant columns and add time point information
    diff_columns = ['ParticipantID', 'matched_pair', 'sex', 'Time_Point', 'phenotype'] + [metric + '_pct_diff' for metric in metrics]
    merged['Time_Point'] = time_point
    diff_data.append(merged[diff_columns])

# Combine the differences for all time points
cpet_diff = pd.concat(diff_data)


# Define the metric you want to plot
metric_to_plot = 'PETCO2'  # Change this to the metric you want to plot

timepoint_to_plot = 'max' # Change this to the time point you want to plot [max, AT, rest]

# Filter the data for the chosen time point
timepoint_data = cpet_diff[cpet_diff['Time_Point'] == timepoint_to_plot].copy()

# Separate 'HC' phenotype data and sort it by the selected metric
hc_data = timepoint_data[timepoint_data['phenotype'] == 'HC']
hc_data_sorted = hc_data.sort_values(by=metric_to_plot + '_pct_diff')

# Create a mapping from matched_pair to its sorted index
matched_pair_order = {mp: i for i, mp in enumerate(hc_data_sorted['matched_pair'])}

# Reorder the 'matched_pair' column based on the sorted indices
timepoint_data['matched_pair_order'] = timepoint_data['matched_pair'].map(matched_pair_order)

# Set up the plot
plt.figure(figsize=(12, 8))

# Create a scatter plot for the chosen time point, ordered by matched_pair_order
sns.scatterplot(
    data=timepoint_data,
    x='matched_pair_order',
    y=metric_to_plot + '_pct_diff',
    hue='phenotype',
    palette={'MECFS': 'red', 'HC': 'blue'},
    s=100
)

# Customizing the plot
plt.xlabel('Matched Pair')
plt.ylabel(f'Percentage Difference in {metric_to_plot}')
plt.title(f'Percentage Difference in {metric_to_plot} for Time Point: {timepoint_to_plot}')


# Rotate x-axis labels to prevent overlap
plt.xticks(ticks=range(len(matched_pair_order)), labels=[int(mp) for mp in matched_pair_order.keys()], rotation=45, ha='right')

plt.legend(title='Phenotype', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True)
plt.tight_layout()

plt.savefig(f'{metric_to_plot}_{timepoint_to_plot}_pct_diff_plot.png', bbox_inches='tight')

# Show the plot
plt.show()



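#####
##### The following just computes the percentage of ME/CFS participants whose VO2 at max increased.
#####

# Filter the original df for ME/CFS participants at the 'max' time point
mecfs_at_data = df[(df['phenotype'] == 'MECFS') & (df['Time_Point'] == 'max')].copy()

# Make sure VO2 is numeric so the day 1 vs. day 2 comparison below is not done on strings
mecfs_at_data['VO2'] = pd.to_numeric(mecfs_at_data['VO2'], errors='coerce')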

# Split the data into Day 1 (D1) and Day 2 (D2)
day1 = mecfs_at_data[mecfs_at_data['Study_Visit'] == 'D1']
day2 = mecfs_at_data[mecfs_at_data['Study_Visit'] == 'D2']

# Merge Day 1 and Day 2 data on ParticipantID
merged_mecfs = pd.merge(day1, day2, on=['ParticipantID', 'Time_Point'], suffixes=('_D1', '_D2'))

# Calculate the percentage of ME/CFS participants where VO2 increased from D1 to D2
total_mecfs = merged_mecfs.shape[0]
mecfs_increased = merged_mecfs[merged_mecfs['VO2_D2'] > merged_mecfs['VO2_D1']].shape[0]

# Calculate the percentage
if total_mecfs > 0:
    percent_increased = (mecfs_increased / total_mecfs) * 100
else:
    percent_increased = 0

# Print the result
print(f"Percentage of ME/CFS participants with increased VO2 at max from D1 to D2: {percent_increased:.2f}%")

ME/CFS Science Blog · Aug 29, 2024

Interesting graphs @forestglip, the red ME/CFS dots seem scattered around the blue HC ones without a clear pattern.

Has anyone been able to replicate their calculation of effect sizes?

For example, for VO2 (ml.kg−1.min−1) at maximal exercise, they report mean values for the ME/CFS group of 20.8 on day 1 and 19.7 on day 2. They report an effect size for this of 0.33.

But when I tried to calculate this I got 0.218 using different methods described here:
https://real-statistics.com/students-t-distribution/paired-sample-t-test/cohens-d-paired-samples/

EDIT: or 0.448 when dividing the mean of the differences by the sd of the differences as described here:
https://stats.stackexchange.com/questions/598615/effect-size-for-paired-t-test
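Part of the spread in these numbers may simply come from different definitions of Cohen's d for paired data. As a minimal sketch, here are two common conventions side by side: d_z (mean of the differences over the SD of the differences) and d_av (mean difference over the average of the two days' SDs). Which convention, if either, produced the 0.33 in the paper is not something this sketch can settle. The function name and the values are made up for illustration.

```python
import numpy as np

def paired_effect_sizes(day1, day2):
    """Return two common paired-sample effect sizes: (d_z, d_av)."""
    day1 = np.asarray(day1, dtype=float)
    day2 = np.asarray(day2, dtype=float)
    diff = day1 - day2

    # d_z: mean of the differences divided by the SD of the differences
    d_z = diff.mean() / diff.std(ddof=1)

    # d_av: mean difference divided by the average of the two days' SDs
    d_av = diff.mean() / ((day1.std(ddof=1) + day2.std(ddof=1)) / 2)
    return d_z, d_av

# Made-up VO2 values (ml/kg/min), NOT the study data
day1 = [20.0, 25.0, 18.0, 30.0, 22.0]
day2 = [19.0, 24.5, 16.0, 29.0, 21.5]

d_z, d_av = paired_effect_sizes(day1, day2)
print(f"d_z = {d_z:.3f}, d_av = {d_av:.3f}")
```

When day 1 and day 2 values are highly correlated within each person, the SD of the differences is much smaller than the SD of either day, so d_z comes out larger than d_av for the same data.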

ME/CFS Science Blog · Aug 29, 2024

ME/CFS Skeptic said:
EDIT: or simply 0.448 when dividing the mean of the differences by the sd of the differences as described here:
https://stats.stackexchange.com/questions/598615/effect-size-for-paired-t-test

I've tried to use JASP, the statistical program Keller et al. used, and it gave the same result as the method explained in the quote above.

If I calculate the mean and std I get the same results as in the paper, so I don't think I've made an error in data extraction. Can anyone explain the difference?

ME/CFS Science Blog · Aug 29, 2024

The authors wrote:

Three indices of exertion were measured during CPET to assess for maximum effort; (1) respiratory exchange ratio (RER)≥1.10, (2) attainment of heart rate greater than or equal to 85% of age-predicted maximum heart rate, and (3) RPE≥17/20

I see that Heart Rate (HR) is included as a variable in the data, but not the percentage of age-predicted maximum heart rate. Did anyone find something about this in the paper or data, or how they might have calculated this?

ME/CFS Science Blog · Aug 29, 2024

Someone also pointed out to me that the effect sizes reported for VO2 (ml/min) are sometimes quite different from those for VO2 (ml.kg−1.min−1).

For example, in Table 3, for the matched pairs at the anaerobic threshold, ME/CFS patients had an effect size of 0.16 for VO2 (ml/min) but an effect size of 0.21 for VO2 (ml.kg−1.min−1), an increase of more than 30% that is unlikely to be due to rounding error. These are the same measures but the latter is standardised for weight. In the dataset provided, the weight entered for each participant is the same on day 1 as on day 2, so this cannot explain the difference.

forestglip · Aug 29, 2024

ME/CFS Skeptic said:
Someone also pointed out to me that the effect sizes reported for VO2 (ml/min) are sometimes quite different from those for VO2 (ml.kg−1.min−1).

For example, in Table 3, for the matched pairs at the anaerobic threshold, ME/CFS patients had an effect size of 0.16 for VO2 (ml/min) but an effect size of 0.21 for VO2 (ml.kg−1.min−1), an increase of more than 30% that is unlikely to be due to rounding error. These are the same measures but the latter is standardised for weight. In the dataset provided, the weight entered for each participant is the same on day 1 as on day 2, so this cannot explain the difference.

Are the reported mean metric values correct? Looking at Table 3 in the study at the four values that represent the full cohort of controls for those two metrics you mentioned on both days: 12.7, 11.8 for standardized to weight and 960.0, 934.5 for without weight. The percentage decrease for the standardized to weight metric is 7.1% and non-standardized is 2.7%. These percentages should be identical, right?
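The arithmetic behind those two percentages, as a small sketch using only the four Table 3 means quoted above (the function name is just for illustration):

```python
def percent_decrease(day1_mean, day2_mean):
    """Percentage drop from day 1 to day 2, relative to day 1."""
    return (day1_mean - day2_mean) / day1_mean * 100

# Reported full-cohort control means at the anaerobic threshold (Table 3, as quoted above)
vo2_per_kg = percent_decrease(12.7, 11.8)    # ml/kg/min
vo2_abs = percent_decrease(960.0, 934.5)     # ml/min

print(f"per kg: {vo2_per_kg:.1f}%, absolute: {vo2_abs:.1f}%")
```

One caveat: with the same weight on both days, each individual's percentage change is identical in the two units, but the percentage change of the group means only has to be approximately equal, because heavier participants contribute more to the ml/min mean. A gap this large would still be unexpected.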

Nightsong · Aug 29, 2024

age-predicted maximum heart rate

Haven't been following this thread but it was my understanding that age-predicted maximum HR is usually calculated based on standard formulae (often just 220 - age) - below are some relevant snippets from exercise physiology/testing texts (left: Wasserman's Principles of Exercise Testing and Interpretation; right: ACSM's guidelines for Exercise Testing):

Wasserman_Principles_Exercise_Testing_and_Interpretation__PeakHR.jpg
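As a minimal sketch of how the percentage could be derived from the HR and age columns, assuming the simple 220 - age formula (the thread does not establish which formula the authors actually used), checked against the three effort criteria quoted earlier. The function names and the example participant are made up.

```python
def age_predicted_max_hr(age_years):
    """Common rule-of-thumb estimate: 220 minus age (an assumption here)."""
    return 220 - age_years

def effort_criteria(age_years, peak_hr, peak_rer, peak_rpe):
    """Check each of the three maximal-effort indices separately."""
    pct_predicted = peak_hr / age_predicted_max_hr(age_years) * 100
    return {
        "pct_predicted_hr": pct_predicted,
        "rer_met": peak_rer >= 1.10,       # RER >= 1.10
        "hr_met": pct_predicted >= 85,     # HR >= 85% of age-predicted maximum
        "rpe_met": peak_rpe >= 17,         # RPE >= 17/20
    }

# Made-up participant, NOT from the study data
result = effort_criteria(age_years=40, peak_hr=160, peak_rer=1.15, peak_rpe=16)
print(result)
```

Other published formulas exist and would shift the percentage, so results near the 85% cutoff depend on that choice.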

forestglip · Aug 29, 2024

forestglip said:
Are the reported mean metric values correct? Looking at Table 3 in the study at the four values that represent the full cohort of controls for those two metrics you mentioned on both days: 12.7, 11.8 for standardized to weight and 960.0, 934.5 for without weight. The percentage decrease for the standardized to weight metric is 7.1% and non-standardized is 2.7%. These percentages should be identical, right?

I just checked all means for those two metrics (VO2 and VO2/kg). The only one that didn't match up with my calculation was full cohort of controls on day 1. I got 12.18 instead of 12.7. This shouldn't affect the effect sizes you were talking about, though, @ME/CFS Skeptic.

forestglip · Aug 30, 2024

I made some charts to see if there is any obvious indicator that decreases in CPET metrics correlate to deconditioning/fitness. I charted using VO2peak as the deconditioning metric, as well as percentage of 24 hours lying or sitting, number of hours in bed during 24 hr day, and BMI. Greater drops in performance on VO2 or workload are lower on the chart.

VO2peak on day 1 vs. change in VO2 at VAT

VO2peak on day 1 vs. change in workload at VAT

Hours in bed vs. change in VO2 at VAT

Hours in bed vs. change in workload at VAT

Time reclined vs. change in VO2 at VAT

Time reclined vs. change in workload at VAT

BMI vs. change in VO2 at VAT

BMI vs. change in workload at VAT

Correlations between these variables for the full cohort of MECFS and HC:
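A correlation table like this can be produced in a few lines. This is a minimal sketch assuming Spearman rank correlation, which may not be the method used for the posted table. Only `q_reclined` is a column name mentioned in this thread; the other column names and all values are hypothetical.

```python
import pandas as pd

# Made-up values for six participants, NOT the study data.
# Only q_reclined is a column name mentioned in the thread; the rest are hypothetical.
fitness = pd.DataFrame({
    "VO2peak_D1": [18.0, 22.5, 30.1, 25.4, 15.2, 27.8],
    "q_reclined": [80, 60, 20, 40, 90, 30],
    "hours_in_bed": [10.0, 9.0, 7.5, 8.0, 12.0, 8.5],
    "BMI": [27.0, 24.5, 22.0, 23.5, 30.0, 21.5],
    "VO2_AT_pct_diff": [-12.0, 3.0, 5.0, -4.0, -20.0, 8.0],
})

# Spearman works on ranks, so it is less sensitive to outliers than Pearson
corr = fitness.corr(method="spearman")
print(corr["VO2_AT_pct_diff"].round(2))
```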

Edit: I'm not completely sure what the "percentage of 24 hours lying or sitting" (q_reclined) metric means. How can it be 0% for so many people if humans normally require sleep? If it doesn't include time spent sleeping, how can it be 100% of the day in others? Maybe it's percentage of waking hours, not percentage of 24 hours.

forestglip · Aug 31, 2024

I checked a couple other potential fitness metrics (VO2 peak normalized to weight on day 1 and VO2 at VAT on day 1) against the difference metrics, but nothing stood out to me there either.

I don't see anything here that makes me think deconditioning is associated with a decrease on second day CPET. (Edit: Well, within this one study. Maybe compared to non-deconditioned control groups from other 2-day CPET studies, the small mean effect in the controls here might provide some evidence of that.) (Edit2: Maybe a small correlation here on some of the deconditioning vs. difference metrics, but with a high p/q value, so should be looked at with a larger, more varied sample.)

Though I also don't see good evidence here that 2-day CPET is useful for classifying these two groups of ME/CFS and sedentary controls.

The fact that so many people in both groups had increases in VO2 at VAT all the way up to 20% higher on day 2 (and a few even higher) makes me think there is a lot of natural day-to-day variation, and it would probably take more than a 20% decrease to have good specificity.
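To make the specificity point concrete, here is a minimal sketch of how a cutoff such as a 20% decrease could be scored against both groups. The function name and the percentage changes are made up for illustration and are not the study data.

```python
import numpy as np

def threshold_performance(me_pct_change, hc_pct_change, cutoff=-20.0):
    """Sensitivity and specificity of 'change below cutoff' as a test for ME/CFS."""
    me = np.asarray(me_pct_change, dtype=float)
    hc = np.asarray(hc_pct_change, dtype=float)

    # Sensitivity: share of ME/CFS participants whose drop exceeds the cutoff
    sensitivity = np.mean(me < cutoff)
    # Specificity: share of controls whose change does NOT exceed the cutoff
    specificity = np.mean(hc >= cutoff)
    return sensitivity, specificity

# Made-up percentage changes in VO2 at VAT (day 2 vs. day 1), NOT the study data
me_change = [-35, -25, -10, -5, 0, 5, 12, -22, -8, 18]
hc_change = [-21, -12, -6, -3, 0, 4, 9, 15, 20, -1]

sens, spec = threshold_performance(me_change, hc_change)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Moving the cutoff further from zero trades sensitivity for specificity, so a cutoff strict enough to exclude most controls may also miss most patients.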

Looking at the charts above, it does look like most of the people who decreased more than 20% are ME/CFS (looks like about 11/13 people). Maybe these are the most severe. I'll have to check how Bell Activity Score correlates with the difference in days.

---

Maybe a cohort that includes patients mild through severe would show a clearer effect, though that would be difficult to do for obvious reasons.

Would also be good to have maybe at least a week of actigraphy data to get more accurate data on deconditioning in HC and disease severity in ME/CFS.

Also, as I think I said elsewhere, maybe the second test should be done after 48 or 72 hours, instead of 24, to be sure most patients are experiencing PEM.

EndME · Aug 31, 2024

The easiest thing for such studies to do would, in my eyes, be to ask patients "are you currently experiencing PEM?" during the first and second CPET and have them rate it on a scale of 1-10 (the scale is a bit arbitrary and someone probably has a better idea, but the point would be to get some notion of where the patients were during the 1st vs 2nd test). I'm not sure why people aren't doing this.

Wouldn't this be the most straightforward and obvious thing to do?

ME/CFS Science Blog · Aug 31, 2024

EndME said:
Wouldn't this be the most straightforward and obvious thing to do?

I think the problem might be that people would interpret PEM very differently.

ME/CFS Science Blog · Aug 31, 2024

forestglip said:
The fact that so many people in both groups had increases in VO2 at VAT all the way up to 20% higher on day 2 (and a few even higher) makes me think there is a lot of natural day to day variation, and it would probably take at least more than a 20% decrease to have good specificity.

Yes, I suspect the data refutes rather than validates previous findings on 2-day CPET.

Haven't been able to analyse everything but it looks like the authors did two types of significance tests:

1. They looked at changes over time (from day 1 to day 2) in each group separately and calculated an effect size for this. Then if the effect was significant in the ME/CFS group but not, or less so, in the control group, they highlighted this in the paper and abstract.
2. In tables 2 and 3 they also tested between-group differences. The legends of tables 2 and 3, for example, say: 'a p ≤ 0.05, aa p ≤ 0.01 between groups for CPET-1, b p ≤ 0.05, bb p ≤ 0.01 between groups for CPET-2.' But it seems that these test the difference for each day separately rather than the difference between days.

So in my view the test that really matters is one that compares the differences (from day 1 to day 2) between groups (ME/CFS versus controls). I don't think they have done this, and an independent t-test of these differences showed a really small effect that was not significant for VO2 and workload.

I'm not sure if a t-test of the differences would be the best approach. One alternative would be an ANCOVA of day 2 CPET values that controls for day 1 CPET as a covariate.
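A minimal sketch of that ANCOVA idea, written as an ordinary least-squares regression of the day 2 value on the day 1 value plus a 0/1 group indicator. It uses only NumPy and SciPy; the function name and data are made up, and a dedicated statistics package would normally be the more convenient choice.

```python
import numpy as np
from scipy import stats

def ancova_group_effect(day1, day2, is_patient):
    """Regress day 2 on day 1 and a 0/1 group indicator; return (coef, t, p) for group."""
    day1 = np.asarray(day1, dtype=float)
    day2 = np.asarray(day2, dtype=float)
    group = np.asarray(is_patient, dtype=float)

    # Design matrix: intercept, day 1 value (covariate), group indicator
    X = np.column_stack([np.ones_like(day1), day1, group])
    beta, _, _, _ = np.linalg.lstsq(X, day2, rcond=None)

    # Standard error of the group coefficient from the residual variance
    residuals = day2 - X @ beta
    dof = len(day2) - X.shape[1]
    sigma2 = residuals @ residuals / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    se_group = np.sqrt(cov_beta[2, 2])

    t_stat = beta[2] / se_group
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return beta[2], t_stat, p_value

# Made-up workloads (watts), NOT the study data; first five are patients
day1 = [50, 60, 55, 70, 65, 80, 75, 90, 85, 70]
day2 = [43, 56, 48, 70, 58, 78, 76, 88, 86, 69]
is_patient = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

group_coef, t_stat, p_value = ancova_group_effect(day1, day2, is_patient)
print(f"adjusted group difference on day 2 = {group_coef:.2f}, p = {p_value:.3f}")
```

The group coefficient is the estimated day 2 difference between patients and controls who started from the same day 1 value, which is the comparison of interest here.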

EndME · Aug 31, 2024

ME/CFS Skeptic said:
The problem might be that people would interpret PEM very differently.

Sure and there will be problems related to "cognitive function/cognitive PEM" that for example wouldn't be picked up, but at least for each individual there should be some consistency right? That is to say "are you feeling like you are experiencing more PEM on the first test vs second test" should tell us what this person is feeling to some degree. It could also help us understand whether the whole undertaking of going there, possibly by plane etc had a similarly exhausting effect as the 1st CPET or not.

I'm no statistician but surely there would be a way to analyse the data in a somewhat meaningful way? Comparing the average decline in pwME vs HC in workload at the ventilatory threshold alongside the average of "experience of PEM on 1st day vs 2nd day" seems unsuitable if people are interpreting PEM very differently, since group averages would just arbitrarily wash out such effects. But an analysis looking at 1 pwME vs 1 HC that takes into account the "experience of PEM on 1st day vs 2nd day" could possibly be sensible.

It could also be helpful to know that we're getting useful data on HCs. Because unless you are using deconditioned controls, where muscle pain on the second day should be expected, you would expect HCs to feel good on both attempts, right?

ME/CFS Skeptic said:
I think the problem might be that people would interpret PEM very differently.

On a more philosophical note, I am wondering how sensible this argument is. These people are participating in this procedure essentially because they have described themselves as people who experience PEM in general (the problem here might be that they have described experiencing PEM in general rather than PEM following a CPET, which this procedure hopes to measure in some related form) and because the procedure is supposed to measure exactly that (more precisely, it measures the effects of physical exercise in the hope of capturing something that is related to PEM). If the interpretation of what people are experiencing during the 2 different rounds is so different that asking a question about their experience is uninterpretable or doesn't result in usable data, it appears to me one has a dilemma: "we believe this measures something related to PEM because this is what the people have said" vs "we can't ask them whether it measures something in relation to PEM because we cannot rely on their different interpretations of the PEM experience". How sensible is the procedure in the first place?

Essentially I’m not able to see how both statements below could make sense at the same time:
Person says he experiences PEM in day to day life-> Hope to measure effects of experiencing PEM via CPETs
but also
Cannot ask whether person experiences PEM at CPETs because interpretation of PEM is different for everyone
