Cardiopulmonary and metabolic responses during a 2-day CPET in [ME/CFS]: translating reduced oxygen consumption [...], Keller et al, 2024

The fact that so many people in both groups had increases in VO2 at VAT of up to 20% on day 2 (and a few even higher) makes me think there is a lot of natural day-to-day variation, and it would probably take a decrease of more than 20% to have good specificity.
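To make the specificity point concrete, here is a minimal sketch with made-up numbers (the lists `hc` and `mecfs` are hypothetical percent changes, not values from the study's dataset). It only shows how a cutoff such as "a decrease of more than 20%" trades specificity against sensitivity when controls also vary a lot from day to day.

```python
def specificity(control_pct_changes, cutoff_pct):
    """Share of controls NOT flagged by 'decrease of more than cutoff_pct'."""
    flagged = [c for c in control_pct_changes if c < -cutoff_pct]
    return 1 - len(flagged) / len(control_pct_changes)


def sensitivity(patient_pct_changes, cutoff_pct):
    """Share of patients flagged by 'decrease of more than cutoff_pct'."""
    flagged = [c for c in patient_pct_changes if c < -cutoff_pct]
    return len(flagged) / len(patient_pct_changes)


# Hypothetical day 1 -> day 2 percent changes in VO2 at VAT (invented values)
hc = [-22.0, -12.0, -5.0, -3.0, 0.0, 4.0, 8.0, 12.0, 18.0, 21.0]
mecfs = [-35.0, -25.0, -15.0, -8.0, -2.0, 1.0, 6.0, 10.0, 15.0, 20.0]

for cutoff in (10, 20, 30):
    print(cutoff, round(specificity(hc, cutoff), 2), round(sensitivity(mecfs, cutoff), 2))
```

With real data the two lists would be the per-participant percent changes in each group; the stricter the cutoff, the fewer controls are flagged, but the fewer patients too.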

It would be good for someone with a background in exercise physiology to weigh in on this. "Cardiopulmonary Exercise Test Methodology for Assessing Exertion Intolerance in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome" states:
The 2-day CPET methodology is useful for assessing impaired recovery because CPET measures are readily reproduced in both healthy and diseased populations. Therefore, a failure to reproduce CPET measures on a subsequent test, despite peak effort on both tests, indicates a derangement of homeostasis.

Sure and there will be problems related to "cognitive function/cognitive PEM" that for example wouldn't be picked up, but at least for each individual there should be some consistency right? That is to say "are you feeling like you are experiencing more PEM on the first test vs second test" should tell us what this person is feeling to some degree. It could also help us understand whether the whole undertaking of going there, possibly by plane etc had a similarly exhausting effect as the 1st CPET or not.

I'm no statistician, but surely there is a way to analyse the data in a somewhat meaningful way? Comparing the average decline in workload at the ventilatory threshold in pwME vs HC against the average "experience of PEM on 1st day vs 2nd day" seems unsuitable if people interpret PEM very differently, because group averages would just arbitrarily average out such effects. But an analysis that compares 1 pwME with 1 HC while taking into account the "experience of PEM on 1st day vs 2nd day" could possibly be sensible.

It could also be helpful to know that we're getting useful data on HCs. Unless you are using deconditioned controls, where muscle pain on the second day should be expected, you would expect HCs to feel good on both attempts, right?



On a more philosophical note, I am wondering how sensible this argument is. These people are participating in this procedure essentially because they have described themselves as people who experience PEM in general (the problem here might be that they have described experiencing PEM in general, rather than PEM following a CPET, which this procedure hopes to measure in some related form) and because the procedure is supposed to measure exactly that (more precisely, it measures the effects of physical exercise in the hope of capturing something related to PEM). If people interpret what they experience during the 2 rounds so differently that asking about their experience is uninterpretable or doesn't result in usable data, it appears to me one has a dilemma: "we believe this measures something related to PEM because this is what the people have said" vs "we can't ask them whether it measures something related to PEM because we cannot rely on their different interpretations of the PEM experience". How sensible is the procedure in the first place?

Essentially I’m not able to see how both statements below could make sense at the same time:
Person says he experiences PEM in day-to-day life -> hope to measure effects of experiencing PEM via CPETs
but also
Cannot ask whether person experiences PEM at CPETs because interpretation of PEM is different for everyone

I have had a look at “Cardiopulmonary Exercise Test Methodology for Assessing Exertion Intolerance in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome” https://www.frontiersin.org/journals/pediatrics/articles/10.3389/fped.2018.00242/full which is a guideline for performing 2-day CPETs in ME/CFS by van Ness (the person who first published on this subject in ME/CFS) and others. The guideline is very much focused on “this test provokes PEM” and the authors state that “CPET also elicits a robust post-exertional symptom flare (termed, post-exertional malaise)”.

Unfortunately I didn't find any information on how this had been ensured or was somehow quantified (I may have simply missed it).

Other than that, the authors do stress the importance of patients being at their “usual rested levels” before the 1st CPET procedure and give ways to ensure it. If always adhered to, this would reduce my worries about people being more exhausted than usual going into the first test.

To ascertain the magnitude of change in CPET2 due to CPET1, it is critical that the ME/CFS patient begin the test in a baseline state representative of the patient's well-rested capacity. Characteristics unique to ME/CFS patients require special pre-test preparations that should be addressed beginning as early as 2–3 weeks prior to a scheduled 2-day CPET. The objective is to minimize pre-fatigue and PEM in a patient who is preparing to travel in order to complete the 2-day CPET.

Pre-test Considerations

Factors such as travel to the test site, immediate pre-test (day of or even day before) paperwork that taxes cognitive function, and prolonged time in a common waiting area, even if seated, can all contribute to pre-test fatigue. Fatigue and PEM are exacerbated by physical, cognitive and emotional stressors (1), so every effort should be made to reduce such stressors where possible. Likewise, many ME/CFS patients experience hypersensitivity to light, noise, temperature, odors, and/or chemicals, so it is helpful to minimize environmental stimuli and maintain a generally low level of activity in the waiting area and testing environment.


Pretest directions/instructions should be in writing and given to the patient at least 1–2 weeks prior to arrival at a clinic. Included in these materials should be a clearly written pre-test checklist to assure that the patient adheres to pre-test preparation instructions (e.g., alcohol, caffeine, exercise and food restrictions prior to CPET, appropriate attire, etc.). Directions to the facility should include availability of disabled parking close to the building, and clear directions to the elevator or other lift assist as needed. Stairs (up and down) and long walks to the clinic should be avoided if possible as this will pre-fatigue the patient. It is reasonable to ask the patient prior to arrival if wheelchair assistance is indicated. Likewise, it is essential that the patient understands the importance of not becoming fatigued prior to the test, and plans travel to the test site with that in mind. When the test site is more than 1 h away, if feasible the patient should be encouraged to arrive the day before the scheduled test and spend the night locally. For some patients, 2 days of rest following air travel to a clinic may be necessary. It is essential that patients understand they should not drive a motor vehicle away from the clinic following either CPET, and plan accordingly. These recommendations may limit patient accessibility to testing, but should be considered to optimize quality of CPET data and patient safety.


Pre-test Forms/Questionnaires

Forms and questionnaires should be sent to the patient at least 2–3 weeks prior to a scheduled test. Completion of forms can be cognitively taxing for a person with ME/CFS and contribute to PEM, so sufficient time should be allowed for completion and return of forms to the clinic. In a clinic environment where a physician is present only part-time, prior arrangements are necessary to provide medical supervision during the 2-day CPET when testing a patient that meets criteria for high risk (7, 45). Similarly, sufficient time is necessary for the patient's physician to complete and return the referral form prior to testing the ME/CFS patient. Information provided to the patient should include explicit pretest instructions. Patients who experience cognitive impairment may be unable to process and respond quickly to copious or complex information, so providing simple, easily understood documentation helps improve adherence to pretest instructions. Paperwork that should be sent to the patient 2–3 weeks prior to a scheduled test may include the following:


Test-Day Considerations

-Seek to minimize time in the waiting area prior to preparations for a CPET. A place to recline or semi-recline is helpful for a waiting patient, or when reviewing or clarifying pretest paperwork and procedures with the patient.

-Provide water throughout testing, and following CPET2, electrolyte replacement beverages can be helpful. Many ME/CFS patients have orthostatic intolerance so maintaining hydration with fluid and electrolytes (e.g., coconut water, sport drink) following CPET2 is helpful for expediting recovery. There are a number of anecdotal reports of plasma volume or salt loading reducing recovery time. Patients may consider arranging with their physician for a prescription of 1 L of IV normal saline infusion following completion of the 2-day CPET. However, if possible, there should be no intervention between the two CPETs.
 

Later in the paper they say:
Test-retest measures of oxygen consumption and work that correspond with VAT are stable over time with the same test modality, and vary within about 7–12% in both healthy individuals (5, 6, 18, 28) and a number of other pathological conditions (21, 22).
 
Looking at the charts above, it does look like most of the people who decreased more than 20% are ME/CFS (looks like about 11/13 people). Maybe these are the most severe. I'll have to check how Bell Activity Score correlates with the difference in days.

Not a lot of correlation using Bell scale either. Lower scores indicate more severe disability.

BAS vs change in VO2 at VAT:
bas_vo2diff.png

BAS vs change in workload at VAT:
bas__wkld_diff.png

Correlations (only MECFS):
upload_2024-8-31_10-7-35.png

Correlations (full cohort):
upload_2024-8-31_10-10-39.png
 
I've tried to calculate the differences between CPET1 and CPET2 for each group and then compared them using a t-test. Here's what I got for the peak values:

Most of the effect sizes are quite small and none will be statistically significant if one were to correct for multiple tests.
upload_2024-9-8_12-28-40.png

Here are the values at the ventilatory threshold. Here the differences seem even smaller, especially for Work (wkld) and VO2.

upload_2024-9-8_12-31-29.png
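On the remark about correcting for multiple tests: a minimal sketch of two common corrections (Bonferroni and Holm), written in plain Python. The p-values in `raw_p` are hypothetical and only illustrate how quickly a nominal p of 0.03 stops being significant once several outcomes are tested.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on
        adj = min(1.0, (m - rank) * p_values[i])
        # Adjusted p-values must not decrease along the sorted order
        running_max = max(running_max, adj)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for five outcomes (not the thread's results)
raw_p = [0.03, 0.08, 0.20, 0.45, 0.60]
print([round(p, 2) for p in bonferroni(raw_p)])
print([round(p, 2) for p in holm(raw_p)])
```

With around twenty metrics per time point, as in the tables above, the smallest raw p-value would have to be well below 0.005 to survive either correction at the 0.05 level.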
 
Should this maybe be using percent decrease to account for different size bodies?

I didn't even really look at day 2 peak when I was analyzing because if they used scores even when people quit before hitting max, it's no longer a very objective measurement.
 
Yes good point. I still have to look at these criteria so the values I reported above used all the data and will probably be quite different once I restrict the analysis to those who met the required thresholds.
 
Had a look but it seems that surprisingly only 10 participants did not meet the maximum effort criteria which are described as follows:
Three indices of exertion were measured during CPET to assess for maximum effort;
(1) respiratory exchange ratio (RER)≥1.10,
(2) attainment of heart rate greater than or equal to 85% of age-predicted maximum heart rate, and
(3) RPE≥17/20.

[...]
Attainment of two of three criteria is considered acceptable to determine that maximum effort was achieved in healthy individuals.
Here's how I implemented this in my code (using Python) - hopefully somebody can check and try to replicate.
Code:
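# Peak HR as a fraction of age-predicted maximum HR (220 - age)
df_original['HR_predicted'] = df_original['HR'] / (220 - df_original['age'])
# Keep only the peak-effort rows
df = df_original[df_original['Time_Point'] == 'max']

# Maximum effort requires at least two of the three criteria
criteria = (
    (df['RER'] >= 1.10).astype(int) +
    (df['HR_predicted'] >= 0.85).astype(int) +
    (df['RPE'] >= 17).astype(int)
) >= 2

# Participants with at least one peak row that fails the two-of-three rule
excluded_participants = df[~criteria]['ParticipantID'].unique()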
The results look like this. These are the 10 excluded participants, of which 8 were ME/CFS patients:
upload_2024-9-8_22-14-7.png

And here's an overview of how many participants did not reach one of the three criteria:

upload_2024-9-8_22-14-38.png

Because these were only 10 participants, excluding them did not have a large effect on the effect size:

upload_2024-9-8_22-58-17.png

The authors probably excluded the 2 HC and the 2 MECFS that did not reach the criteria on day 1 (thus not excluding the 6 MECFS patients who did not meet the criteria only on day 2). I've compared those results with those where all 10 participants who did not meet the threshold were excluded:

upload_2024-9-8_22-58-24.png

So I think, all in all, these are minor differences that do not really matter in this paper. Strange that they did not mention this in the paper.
 
Just checking the excluded participants, I got a couple things different. Number of observations for below .85 of HR predicted = 88. That was 38 MECFS and 22 HC. This was probably just missing PI-057 on D2, the only participant to have no HR value for day 2.

PI-043 was excluded for day 1, not day 2.

Exclusions for day 1 and day 2:
Screenshot from 2024-09-08 19-51-40.png Screenshot from 2024-09-08 19-51-55.png

Code I used:
Code:
import pandas as pd

cpet_data = pd.read_csv('cpet_clinical_data.tsv', sep='\t')

# Only max timepoint
cpet_data_max = cpet_data[cpet_data['Time_Point'] == 'max'].copy()

# Convert to numbers
cpet_data_max['HR'] = pd.to_numeric(cpet_data_max['HR'], errors='coerce')

# Get age-predicted max HR
cpet_data_max['Predicted_HR'] = 220 - cpet_data_max['age']

# Create columns for satisfying criteria
cpet_data_max['RER_include'] = cpet_data_max['RER'] >= 1.10
cpet_data_max['HR_include'] = cpet_data_max['HR']/cpet_data_max['Predicted_HR'] >= 0.85
cpet_data_max['RPE_include'] = cpet_data_max['RPE'] >= 17

# Create column for whether they satisfied at least two criteria
cpet_data_max['at_least_two_true'] = (cpet_data_max[['RER_include', 'HR_include', 'RPE_include']].sum(axis=1) >= 2)

# Dataframe for all rows that did not satisfy at least two criteria
exclusions_df = cpet_data_max[~cpet_data_max['at_least_two_true']][['ParticipantID', 'phenotype', 'Study_Visit', 'Time_Point', 'RER_include', 'HR_include', 'RPE_include']]

print(exclusions_df[exclusions_df['Study_Visit'] == 'D1'])
print(exclusions_df[exclusions_df['Study_Visit'] == 'D2'])
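On the PI-057 discrepancy: a small sketch of one way a missing HR value can change such counts. This is not a claim about what either script actually did. The rows in `toy` are invented and only the column names follow the data file used in the thread. In pandas, any comparison against NaN evaluates to False, so "not at or above 0.85" and "below 0.85" give different counts when a value is missing.

```python
import pandas as pd

# Invented rows; participant 'C' has no usable HR value
toy = pd.DataFrame({
    'ParticipantID': ['A', 'B', 'C'],
    'HR': ['170', '150', 'missing'],
    'age': [40, 40, 40],
    'RER': [1.15, 1.05, 1.12],
    'RPE': [18, 16, 19],
})
toy['HR'] = pd.to_numeric(toy['HR'], errors='coerce')  # 'missing' becomes NaN

ratio = toy['HR'] / (220 - toy['age'])
hr_ok = ratio >= 0.85          # NaN >= 0.85 is False
rer_ok = toy['RER'] >= 1.10
rpe_ok = toy['RPE'] >= 17

# Two ways of counting "below 85% of predicted HR" that disagree on NaN
print('not >= 0.85:', int((~hr_ok).sum()))      # includes the missing value
print('< 0.85:     ', int((ratio < 0.85).sum()))  # silently drops it

toy['hr_missing'] = toy['HR'].isna()
toy['criteria_met'] = hr_ok.astype(int) + rer_ok.astype(int) + rpe_ok.astype(int)
toy['max_effort'] = toy['criteria_met'] >= 2
print(toy[['ParticipantID', 'hr_missing', 'criteria_met', 'max_effort']])
```

Flagging missing values explicitly (the `hr_missing` column here) makes it easier to decide whether such a participant should be judged on the remaining two criteria or left out.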
 
I get the same Cohen's D, but slightly different p values from you. Most are the same, but for example, for wkld, I got .09 instead of .08.

For max:
upload_2024-9-8_21-51-10.png

It's kind of a mess, but if you want to look at my code:

Code:
from numpy import var, mean, sqrt
from scipy import stats
from pandas import Series
import pandas as pd

def cohend(d1: Series, d2: Series) -> float:

    # calculate the size of samples
    n1, n2 = len(d1), len(d2)

    # calculate the variance of the samples
    s1, s2 = var(d1, ddof=1), var(d2, ddof=1)

    # calculate the pooled standard deviation
    s = sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))

    # calculate the means of the samples
    u1, u2 = mean(d1), mean(d2)

    # return the effect size
    return (u1 - u2) / s

cpet_data = pd.read_csv('cpet_clinical_data.tsv', sep='\t')

metrics = [
    'DBP',        # Diastolic Blood Pressure
    'HR',         # Heart Rate
    #'OUES',       # Oxygen Uptake Efficiency Slope
    'PETCO2',     # End-Tidal CO2
    'PETO2',      # End-Tidal O2
    'PP',         # Pulse Pressure
    'RER',        # Respiratory Exchange Ratio
    'RPE',        # Rating of Perceived Exertion
    'RPM',        # Pedal RPM
    'RPP',        # Rate Pressure Product
    'RR',         # Respiratory Rate
    'SBP',        # Systolic Blood Pressure
    'VCO2',       # Carbon Dioxide Production
    'VO2',        # Oxygen Consumption
    'VO2_HR',     # Oxygen Consumption per Heart Rate
    'VO2_t',      # Oxygen consumption (ml/min)
    'Ve_BTPS',    # Ventilation (BTPS)
    'Ve_VCO2',    # Ventilatory Equivalent for CO2
    'Ve_VO2',     # Ventilatory Equivalent for O2
    'Vt_BTPS_L',  # Tidal Volume (BTPS in Liters)
    'wkld',        # Workload
    'time_sec'
]

# Convert all metric columns to numeric, coercing errors
for metric in metrics:
    cpet_data[metric] = pd.to_numeric(cpet_data[metric], errors='coerce')

# Split the data by time points
time_points = ['AT', 'max', 'rest']

# Initialize an empty list to store differences
diff_data = []

for time_point in time_points:
    # Filter data for the current time point
    df_tp = cpet_data[cpet_data['Time_Point'] == time_point]
 
    # Split the data into Day 1 and Day 2
    day1 = df_tp[df_tp['Study_Visit'] == 'D1']
    day2 = df_tp[df_tp['Study_Visit'] == 'D2']
 
    # Merge day1 and day2 on ParticipantID
    merged = pd.merge(day1, day2, on=['ParticipantID', 'matched_pair', 'sex', 'Time_Point', 'race', 'phenotype'], suffixes=('_D1', '_D2'))
 
    # Calculate the absolute difference for each metric
    for metric in metrics:
        merged[metric + '_abs_diff'] = merged[metric + '_D2'] - merged[metric + '_D1']

    # Calculate the percentage difference for each metric
    for metric in metrics:
        merged[metric + '_pct_diff'] = ((merged[metric + '_D2'] - merged[metric + '_D1']) / merged[metric + '_D1']) * 100
 
 
    # Store the merged frame (all columns kept) with its time point label
    merged['Time_Point'] = time_point
    diff_data.append(merged)

# Combine the differences for all time points
cpet_diff = pd.concat(diff_data)



 
# List of absolute difference metrics
abs_diff_metrics = [metric + '_abs_diff' for metric in metrics]

# Dictionary to store DataFrames for each time point
time_point_dfs = {}

# Calculate differences, Cohen's d, and p-values for each time point
for time_point in time_points:
    # Filter data for the current time point
    tp_data = cpet_diff[cpet_diff['Time_Point'] == time_point]
 
    # Initialize lists to store results
    results = []
 
    for metric in abs_diff_metrics:
        # Get data for each group
        mecfs_data = tp_data[tp_data['phenotype'] == 'MECFS'][metric].dropna()
        hc_data = tp_data[tp_data['phenotype'] == 'HC'][metric].dropna()
 
        # Calculate means
        mecfs_mean = mecfs_data.mean()
        hc_mean = hc_data.mean()
        mean_diff = mecfs_mean - hc_mean
 
        cohens_d = cohend(mecfs_data, hc_data)
 
        # Perform t-test
        t_stat, p_value = stats.ttest_ind(mecfs_data, hc_data)
 
        # Store results
        results.append({
            'Metric': metric,
            'Mean_Difference': mean_diff,
            'Cohens_d': cohens_d,
            'p_value': p_value
        })
 
    # Convert results to DataFrame
    df = pd.DataFrame(results)
 
    # Sort by p-value
    df = df.sort_values('p_value', ascending=True)
 
    # Store the DataFrame in the dictionary
    time_point_dfs[time_point] = df

at_diff = time_point_dfs['AT']
max_diff = time_point_dfs['max']
rest_diff = time_point_dfs['rest']

Edit: I'm confused about VO2 vs VO2_t. The code book says they are "Oxygen consumption (L/min)" and "Oxygen consumption (ml/min)", so shouldn't one just be 1000 times the other? Is VO2_t something else?
 
This was probably just missing PI-057 on D2, the only participant to have no HR value for day 2.
Yes that was the difference. I forgot to include him because he had no valid data for HR.

PI-043 was excluded for day 1, not day 2.
Yes I got the same result but made an error in writing it down in my table/overview.

I get the same Cohen's D, but slightly different p values from you
The different p-values might be due to me using:

t_value, p_value = stats.ttest_ind(difference_MECFS, difference_HC, equal_var=False)

Setting equal_var=False gives Welch's t-test, which is often considered a better default (it is the default in R) because it does not assume that the variance is the same in both groups. But I don't think it makes a big difference.
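For anyone wondering why Cohen's d can match while the p-values differ slightly, here is a minimal plain-Python sketch of the two test statistics. The function names and the sample values are made up for illustration. The pooled (Student) version uses n1 + n2 - 2 degrees of freedom, while Welch's version uses the Welch-Satterthwaite approximation, which is never larger.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)
    se2 = v1 / n1 + v2 / n2
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical day 2 minus day 1 differences for two groups of unequal size
group_a = [-8.0, -3.0, -1.0, 0.0, 2.0, 4.0]
group_b = [-1.0, 0.0, 0.0, 1.0]

print(student_t(group_a, group_b))
print(welch_t(group_a, group_b))
```

When both groups have the same size the two t statistics coincide and only the degrees of freedom differ, which is consistent with p-values that differ only in the second decimal.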

Thanks so much for checking and replicating. This is really helpful. I think we got pretty much the same results.

I plan to look at the correlations with the Bell scale today, will check if I get the same results as you.
 
Can confirm changing equal_var to False makes the p values the same.

I just wanted to check percent difference too, but not much more impressive. Workload at AT drops to last place. For max, I didn't remove the people who don't meet the criteria above.

Percent decrease at max:
upload_2024-9-9_8-26-13.png

Percent decrease at AT:
upload_2024-9-9_8-27-9.png

For the correlations of Bell score, I used Pearson, though that might not have been best as there are a couple of outliers. Unfortunately, I didn't do absolute differences for this analysis, and don't have time right now, but here is Spearman, with the same features from before:

MECFS only:
upload_2024-9-9_8-56-40.png

Full cohort:
upload_2024-9-9_8-56-52.png
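As a side note on Pearson vs Spearman with outliers, here is a small plain-Python sketch. The `bell` and `change` values are invented, with the last point acting as an outlier. Spearman is simply Pearson computed on ranks (ties get the average of their ranks), so a single extreme value can only move it as far as its rank allows.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Bell scores and percent changes; the last change is an outlier
bell = [20, 30, 30, 40, 50, 60]
change = [-5.0, 2.0, -3.0, 1.0, -2.0, -60.0]

print('Pearson: ', round(pearson(bell, change), 3))
print('Spearman:', round(spearman(bell, change), 3))
```

If both coefficients are small and non-significant on the real data, as reported here, the choice between them matters little for the conclusion.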
 
@ME/CFS Skeptic Do you know what VO2_t is? It looks like it's a different ratio to VO2 for every person. For example VO2_t/VO2 for PI-002 is 75.00 at all timepoints and days. 100.91 for PI-003.

Edit: Oh, VO2_t is ml/min. And VO2 is that but normalized to weight, using kilograms. The codebook just doesn't have the correct label.
 
I just wanted to check percent difference too, but not much more impressive.
Thanks, got the same values as you.

EDIT: this is an error see: https://www.s4me.info/threads/cardi...on-keller-et-al-2024.39219/page-4#post-552976
I noticed that these average percentage changes are quite a bit smaller than what you get if you calculate the percentage change from the reported means. That is what we did for previous 2-day CPET studies, because those summary statistics were all that was reported.

So take for example Work at AT for the MECFS group: it goes from 51.2 on day 1 to 46.4 on day 2. In percentage terms that is a change of 9.4%. Based on the means you would expect that MECFS patients decrease by an average of 9.4%.

But if you calculate the percentage change for each participant and then take the mean, the change is only 0.45%. That is a surprisingly big difference. I suppose it means that larger changes were seen in those who had larger baseline scores?

For the correlations of Bell score, I used Pearson, though that might not have been best as there are a couple of outliers. Unfortunately, I didn't do absolute differences for this analysis, and don't have time right now, but here is Spearman, with the same features from before:
Got the same results as you. I don't know which one is more appropriate (Spearman or Pearson), but both correlations are really small and non-significant, so I suppose the message is clear enough.
 
@ME/CFS Skeptic Do you know what VO2_t is? It looks like it's a different ratio to VO2 for every person. For example VO2_t/VO2 for PI-002 is 75.00 at all timepoints and days. 100.91 for PI-003.
I think that VO2 is VO2 divided by the weight of the participant (so ml kg−1 min−1) while VO2_t is just the VO2 (ml/min), probably an error in the codebook.

There's still something about these two values that doesn't add up, because they should result in the exact same effect sizes but they often don't, with some small differences. I do not have an explanation for this because it seems that they used the same weight for each participant on day 1 and day 2.
 
Thanks for checking that. And yes, I was confused about the effect sizes, and my brain is kind of short circuiting trying to visualize it so I might be wrong, but I think effect sizes can be different for absolute difference, while for percentage difference they should be the same, which they are.
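A quick way to check that intuition is a toy calculation. The weights and VO2_t values below are invented and `cohens_d` mirrors the pooled-SD formula used earlier in the thread. Dividing each participant's values by their own weight cancels out of a percent change, but it rescales every absolute difference by a different factor, so the group comparison can shift.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d with pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled)


# Hypothetical participants: (weight in kg, VO2_t day 1, VO2_t day 2) in ml/min
mecfs = [(60, 900, 810), (80, 1200, 1140), (100, 1500, 1200)]
hc = [(55, 1100, 1045), (75, 1500, 1500), (95, 1900, 1710)]


def diffs(group, per_kg, percent):
    """Day 2 minus day 1, optionally weight-normalised and/or in percent."""
    out = []
    for weight, d1, d2 in group:
        scale = weight if per_kg else 1
        d1, d2 = d1 / scale, d2 / scale
        out.append((d2 - d1) / d1 * 100 if percent else d2 - d1)
    return out


d_abs_total = cohens_d(diffs(mecfs, False, False), diffs(hc, False, False))
d_abs_per_kg = cohens_d(diffs(mecfs, True, False), diffs(hc, True, False))
d_pct_total = cohens_d(diffs(mecfs, False, True), diffs(hc, False, True))
d_pct_per_kg = cohens_d(diffs(mecfs, True, True), diffs(hc, True, True))

print('absolute:', round(d_abs_total, 3), round(d_abs_per_kg, 3))
print('percent: ', round(d_pct_total, 3), round(d_pct_per_kg, 3))
```

So small differences between the VO2 and VO2_t effect sizes for absolute changes are expected even when the same weight is used on both days.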

But if you calculate the percentage change for each participant and then take the mean, the change is only 0.45%. That is a surprisingly big difference. I suppose it means that larger changes were seen in those who had larger baseline scores?

That is a large difference. Regression to the mean, maybe?
 
I've now recalculated with the correct comparison of ME/CFS patients but it is still the same large difference:

This calculation first takes the means, then expresses the change in means as a percentage
(day2_MECFS.mean() - day1_MECFS.mean()) / day1_MECFS.mean() * 100
Result: 9.4%

This one takes the percentage change for each participant first, then takes the mean
((day2_MECFS - day1_MECFS) / day1_MECFS).mean() * 100
Result: 0.08%
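The gap between the two calculations can be reproduced with a toy example. The `day1` and `day2` values below are invented, not taken from the dataset. The percentage change of the means is the same thing as an average of the individual percentage changes weighted by each participant's day 1 value, so it is dominated by people with high baseline values, whereas the plain mean of individual changes counts everyone equally.

```python
from statistics import mean

# Hypothetical workload at AT in watts for five participants
day1 = [20, 30, 40, 100, 120]
day2 = [22, 33, 44, 80, 90]

# 1) Take the means first, then express the change as a percentage
change_of_means = (mean(day2) - mean(day1)) / mean(day1) * 100

# 2) Take each participant's percentage change first, then average
pct_changes = [(b - a) / a * 100 for a, b in zip(day1, day2)]
mean_of_changes = mean(pct_changes)

# 1) is equivalent to averaging the individual changes weighted by day 1 value
weighted_mean_of_changes = (
    sum(a * p for a, p in zip(day1, pct_changes)) / sum(day1)
)

print(round(change_of_means, 2))
print(round(mean_of_changes, 2))
print(round(weighted_mean_of_changes, 2))
```

In this made-up case the low-baseline participants improve slightly and the high-baseline ones decline, which is the kind of pattern that would produce a large change in means alongside a near-zero mean of individual changes.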
 
Anyway, this is a bit beside the point.

I plan to write a blog post about this because what the data show is quite different from what the paper reports and focuses on.

- I think the data show that there is no significant effect for any of the outcomes, whether you look at AT or max values, matched pairs or not, excluding patients who failed to meet the maximum value criteria or not.

- VO2 and workload differences do not correlate with severity as measured with the Bell Scale (they call it bell activity scale but it is really a measure of disability rather than activity).

- As the authors point out, the differences in the ME/CFS group are often slightly larger than in the control group but this was not statistically significant and possibly due to chance. If there is an effect it will likely be a very small one, one that can only be detected reliably with even larger sample sizes.

- Surprisingly, the largest effect sizes were seen at peak values, not at the ventilatory threshold. This could be due to the criteria used to determine peak effort, which have been questioned in the literature. Some argue for using lactate measurements instead of relying on %predicted HR.

- The large overlap between ME/CFS patients and controls means that 2day CPET is not a useful measurement for ME/CFS disease activity or PEM.
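To put the "even larger sample sizes" point into numbers, here is a rough sketch using the standard normal-approximation formula for a two-sided, two-sample comparison. The effect sizes passed in are hypothetical examples, not the values from the tables above, and `n_per_group` is an illustrative helper rather than anything the authors used.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect Cohen's d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for d in (0.2, 0.3, 0.5):
    print(d, n_per_group(d))
```

Under these assumptions a small effect of d = 0.2 would need several hundred participants per group, and an exact t-test power calculation gives slightly higher numbers than this approximation.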
 