Properties of measurements obtained during cardiopulmonary exercise testing in individuals with ME/CFS, 2020, Davenport et al

Full title: Properties of measurements obtained during cardiopulmonary exercise testing in individuals with myalgic encephalomyelitis/chronic fatigue syndrome
Background: Diminished cardiopulmonary exercise test (CPET) performance indicates the physiological basis for reduced capacity for activities of daily living and work. Thus, it may be a biomarker for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS).

Objective: To determine statistical properties of cardiac, pulmonary, and metabolic measurements obtained during CPET in people with ME/CFS.

Methods: Fifty-one females with ME/CFS and 10 sedentary females with similar age and body mass received cardiac, pulmonary, and metabolic measurements during 2 CPETs separated by 24 hours. Two-way analysis of variance and effect size calculations (Cohen’s d) were used to assess the magnitude and statistical significance of differences in measurements between groups. Reliability of CPET measurements was estimated using intraclass correlation coefficients (formula 2,1; ICC2,1). Responsiveness of CPET measurements was assessed using minimum detectable change outside the 95% confidence interval (MDC95) and coefficients of variation (CoV).

Results: CPET measurements demonstrated moderate to high reliability for individuals with ME/CFS. Comparing subjects with ME/CFS and control subjects yielded moderate to large effect sizes on all CPET measurements. MDC95 for all individuals with ME/CFS generally exceeded control subjects and CoVs for CPET measurements were comparable between groups.

Conclusions: CPET measurements demonstrate adequate responsiveness and reproducibility for research and clinical applications.
Paywall, https://content.iospress.com/articles/work/wor203170
Sci hub, https://sci-hub.tw/10.3233/WOR-203170
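For anyone wanting a feel for the statistics named in the Methods (Cohen's d, ICC(2,1), MDC95, CoV), here is a minimal Python sketch computing them on made-up paired data. All numbers are invented for illustration, not the study's data, and the SEM formula used is one common version; the paper may have used a different one.

```python
import numpy as np

# Made-up Test 1 / Test 2 values (rows = subjects, cols = tests).
# Purely illustrative numbers, NOT the study's data.
mecfs = np.array([[25.1, 23.8], [31.4, 30.9], [22.0, 20.5], [28.3, 27.7],
                  [19.9, 18.6], [26.5, 25.0], [30.2, 29.4], [24.8, 23.9]])
controls = np.array([[33.0, 33.8], [29.5, 30.1], [36.2, 35.9], [31.8, 32.6],
                     [28.4, 29.0], [34.1, 34.7], [30.9, 31.5], [32.6, 33.0]])

def icc21(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n, k = data.shape
    grand = data.mean()
    rm, cm = data.mean(axis=1), data.mean(axis=0)
    ms_rows = k * ((rm - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((cm - grand) ** 2).sum() / (k - 1)
    ms_err = (((data - rm[:, None] - cm[None, :] + grand) ** 2).sum()
              / ((n - 1) * (k - 1)))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                                 + k * (ms_cols - ms_err) / n)

def cohens_d(a, b):
    """Cohen's d between two independent groups (pooled SD)."""
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

icc = icc21(mecfs)
sem = mecfs.std(ddof=1) * np.sqrt(1 - icc)            # one common SEM formulation
mdc95 = 1.96 * np.sqrt(2) * sem                       # minimum detectable change
cov = mecfs.std(axis=0, ddof=1) / mecfs.mean(axis=0)  # CoV per test

print(f"ICC(2,1)={icc:.3f}  MDC95={mdc95:.2f}  CoV={np.round(cov, 3)}")
print(f"Cohen's d (controls vs ME/CFS, Test 1): "
      f"{cohens_d(controls[:, 0], mecfs[:, 0]):.2f}")
```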
 
This paper looks as though it was missed by many. From the abstract it seems important for confirming that there are real differences in CPET performance.

The Sci Hub link isn't working. @Snow Leopard and others with an interest in exercise physiology - what did you think? What measurements looked most useful?
 
Thanks @mango.
I can't see the figures on the charts and I've run out of brain power. But the differences from one test to the next aren't looking enormous.
Although there was a general decrease in CPET measurements obtained at peak exertion and VAT for subjects with ME/CFS, only workload demonstrated a significant group * test effect (p < 0.01).

So, looking at ventilatory anaerobic threshold:
- VO2 increased in controls from Test 1 to Test 2, and decreased in ME/CFS participants.
- Workload increased slightly in controls from Test 1 to Test 2 (58.0 to 63.5 W); in ME/CFS participants it decreased from 49.5 to 44.1 W.
- Heart rate increased in controls, while it decreased in ME/CFS.
- Systolic BP increased in controls and decreased in ME/CFS.

So the findings do seem to support the idea of a drop in performance at ventilatory threshold in ME/CFS.

But I found the paper's focus on test-re-test reliability a bit strange.
CPET measurements in this study largely demonstrated at least moderate 24-hour test-retest reliability in individuals with ME/CFS.
Despite these remaining unanswered questions, CPET measurements of cardiac, pulmonary, and metabolic characteristics appear to have sufficient test-retest reliability to be considered as a clinical evaluation and as an endpoint in future clinical trials.

Surely the point of the study was to prove that things change on re-test in ME/CFS? As far as I can see, they didn't make the people with ME/CFS do two sets of two CPETs.

I don't know why there wasn't a graph of the percentage changes from test 1 to test 2 by individual for each parameter.
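A chart like that would be straightforward to produce if the per-subject data were available. Here's a minimal matplotlib sketch of the idea, with invented values:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-subject values for one parameter (e.g. VAT workload, watts)
test1 = np.array([52.0, 47.0, 60.0, 44.0, 55.0, 49.0, 58.0, 46.0])
test2 = np.array([45.0, 48.0, 51.0, 40.0, 50.0, 43.0, 57.0, 39.0])

pct_change = 100 * (test2 - test1) / test1
order = np.argsort(pct_change)  # sort individuals for readability

plt.bar(range(len(pct_change)), pct_change[order])
plt.axhline(0, color="black", linewidth=0.8)
plt.xlabel("Individual (sorted by change)")
plt.ylabel("% change, Test 1 to Test 2")
plt.title("Per-individual change in VAT workload (illustrative data)")
plt.show()
```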
 
Non sci-hub link: https://workwellfoundation.org/wp-c...ned-during-CPET-in-individuals-with-MECFS.pdf

@Snow Leopard and others with an interest in exercise physiology - what did you think? What measurements looked most useful?

I could have sworn I had commented on this already, but I can't find another thread!

There is clearly an error in the manuscript, with Figure 2(B) not agreeing with the data in Table 1.
I have asked Todd Davenport to clarify, roughly once per month since July 2nd, and he keeps saying, oh, we're busy, we'll get around to looking into it, etc. I am not impressed; if my manuscript had a major error like this, I'd want to correct it ASAP.

Since I do not know if the data is correct, I cannot comment.

If any readers have a relationship with any of the authors of this manuscript, please kindly ask them to fix it!
 
Thanks @Snow Leopard, it certainly helped being able to see the figures. I see they are in fact of differences between the Day 1 test and the Day 2 test, so I had that wrong in my previous comment. I see what you mean about the charts not matching the table. No wonder I was thoroughly confused in my brain-fogged state yesterday.

Figure 1b shows peak workload. It looks as though mean workload didn't change much for the sedentary group and increased for the ME/CFS group.

[Screenshot: Figure 1b, peak workload]

And yet, in Table 1, the mean peak workload went up for controls and down for ME/CFS.

[Screenshot: Table 1, peak workload]




And for workload at ventilatory threshold, in Figure 2b the workload drops in the controls and also drops in the ME/CFS people. (It's still hard to see the chart, but all the figures on the y axis are negatives there, with 0.0 above each band.)

[Screenshot: Figure 2b, workload at ventilatory threshold]

And yet here are the results in Table 1 (columns: Controls Tests 1 and 2, then ME/CFS Tests 1 and 2):
[Screenshot: Table 1, workload at ventilatory threshold]

The table says workload at ventilatory threshold went up for controls and down for ME/CFS.

So, it's not even as though it's just a problem of (Test1 - Test2) vs (Test2 - Test1) i.e. the positive and negatives are inverted. The data in the charts bears no relationship to the data in the table.

The x axis data looks wrong too. If you look at Figure 2b, the mean workload at ventilatory threshold is around 80 for ME/CFS, but in Table 1 it's more like 50.

I have absolutely no idea what to make of all that, other than that this paper is not the one to pull out to try to make the case for a consistent and abnormal decline in work rate from Day 1 CPET to Day 2 CPET.

I agree with @Snow Leopard, this paper needs some attention. It really harms Workwell's credibility.
 
Some comments

Strange study design

Leaving aside the issue of some table data not matching the figures, the study has a weird design.

The group has spent the last decade focusing on the difference in ME/CFS patients between maximum exercise tests on day two compared with day one. Now, this study aims to show that day one results can be reproduced on day 2.

Unsurprisingly, it wasn't a huge success. At peak exertion, controls demonstrated very high test-retest reliability (ICC values 0.834-0.990) while ME/CFS subjects' retest reliability was only moderate to high (0.631-0.871). At ventilatory (anaerobic) threshold, all measurements showed moderate to high test-retest reliability for patients and controls.

After demonstrating that there is a fall-off in performance on day two and a huge symptom flare, if you really want to demonstrate the reliability/reproducibility of the test, you would need to give patients the chance to completely recover from the first test before taking the second. Or, as the authors put it in their discussion section, "Future studies… in individuals with ME/CFS might give consideration to allowing for complete recovery to baseline symptomatic status".

I just wonder if the authors designed and conducted the study with one aim in mind, and perhaps the journal said they wanted a different kind of study. I just can't make sense of it otherwise.

But study findings do show a fall-off at Day 2 ventilatory threshold.

Perhaps we should just look at the study as a conventional day one versus day two exercise test comparing patients and sedentary controls. Sedentary was defined as exercising to the point of perspiration once a week or less, which is better than a lot of studies, but I would expect a lot of the population is more sedentary than that.

Assuming that the table is correct, the key finding is VAT workload
Comparing the change between day one and day two for patients versus controls, the only significant difference was for workload at ventilatory threshold (p < 0.001). I think this confirms previous findings, but perhaps with a lower p value due to the larger sample size (51 patients).

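For anyone wanting to run that kind of group × test analysis on their own data, a mixed ANOVA (group as the between-subject factor, test day as the within-subject factor) is the standard tool. A sketch using the pingouin package with simulated data (all numbers invented, not the study's):

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(42)
rows = []
for i in range(30):
    group = "ME/CFS" if i < 20 else "control"
    base = rng.normal(50 if group == "ME/CFS" else 60, 10)   # Test 1 workload
    change = rng.normal(-5 if group == "ME/CFS" else 2, 4)   # Test 2 shift
    rows += [{"id": i, "group": group, "test": "T1", "workload": base},
             {"id": i, "group": group, "test": "T2", "workload": base + change}]
df = pd.DataFrame(rows)

# The 'Interaction' row tests whether the Test 1 -> Test 2 change
# differs between groups (the group x test effect)
aov = pg.mixed_anova(data=df, dv="workload", within="test",
                     subject="id", between="group")
print(aov[["Source", "F", "p-unc"]])
```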

Even so, based on standard deviations, the effect size for patients versus controls is moderate at best (probably a bit over 0.5). That still doesn't compute with the enormous difference in energy levels experienced in PEM, which can be triggered by exertion vastly lower than exercising to exhaustion as in the study protocol. The effect size for fatigue and other symptoms is much, much bigger than that (due to migraines I can't dig out the papers, but I'm sure someone could find the data easily).
 

Could the weird design be due to the fact that you can't prove a negative, i.e. you must test a positive? In this case they try to test the (+ve) hypothesis that day 2 performance will be the same as day 1.

If I'm talking c--p here then feel free to point it out!
 
Could the weird design be due to the fact that you can't prove a negative, i.e. you must test a positive? In this case they try to test the (+ve) hypothesis that day 2 performance will be the same as day 1.

The null hypothesis is that there would be no difference in performance between the two days.

This contrasts with a hypothesis of difference: either that performance would increase or decrease on the second day. (This is a two-tailed hypothesis; if you only want to test for decreased performance, it would be a one-tailed hypothesis.)
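In code terms, the two-tailed versus one-tailed distinction is just the `alternative` argument of a paired t-test. A scipy sketch with made-up values:

```python
import numpy as np
from scipy import stats

# Made-up Test 1 / Test 2 values for the same eight subjects
test1 = np.array([52.0, 47.0, 60.0, 44.0, 55.0, 49.0, 58.0, 46.0])
test2 = np.array([45.0, 48.0, 51.0, 40.0, 50.0, 43.0, 57.0, 39.0])

# Null hypothesis: no difference between days (two-tailed test)
t_two, p_two = stats.ttest_rel(test1, test2, alternative="two-sided")

# One-tailed: only testing whether performance DECREASED on day 2
t_one, p_one = stats.ttest_rel(test1, test2, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```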

The authors state:
One long-standing tenet of CPET measurements is they have high intertest reliability [4, 5, 7, 8, 10, 27, 32–37], which is supported by findings from sedentary individuals in this study. These data suggest that CPET measurements ordinarily have a low error variance.

The authors discuss whether certain statistical measures of test-retest reliability are relevant and suggest the standard method for calculation of "minimum detectable change" (MDC) is flawed as a result:

Although challenging for CPET reliability studies in ME/CFS, the observation of measurement deviation between days actually may be clinically important when using a methodology like CPET that is known to ordinarily demonstrate a low error variance. This raises the importance of absolute reliability measures, such as CoV, which are less sensitive to heterogeneity in test performance than relative reliability measures, such as ICC. In this study, CoV were generally comparable between groups on Test #1 and Test #2, without a clear pattern that within-groups heterogeneity in performance affected one group more than the other.
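For reference, one common formulation (the paper's exact method may differ) ties the MDC95 to the ICC via the standard error of measurement (SEM):

```latex
\mathrm{SEM} = SD\,\sqrt{1-\mathrm{ICC}}, \qquad
\mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}
```

On this formulation, within-group heterogeneity feeds into the SD term, so the MDC can be inflated by a heterogeneous sample even when day-to-day measurement error is small, which is roughly the argument being made in the quote above.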

The "strange design" comments relate to the fact that all prior studies have found a decrease in performance on the second day (workload at ventilatory threshold), whereas this has not been found in control participants (most were healthy, but there were also controls with other diseases). The question is: is the level of change simply normal variation, or does it reflect pathology that should be detected? Hence determining a "minimum detectable change" would favour a different study design. Instead of comparing patients to controls on the 2-day CPET, patient performance on the 2-day CPET should be compared to longer time frames where the patients have had time to recover. This would limit the within-group heterogeneity biases (due to the limited sample size) that limited the relevance of the statistical analysis of MDC in this study.

edit - additional quote from the authors:

Perhaps an ideal study design for reliability analyses would involve application of a test in a large number of subjects that are not expected to change in clinical presentation. However, people with ME/CFS are known to vary symptomatically, functionally, and physiologically between days on a two-day CPET paradigm [13, 14, 16, 17, 28, 39, 40]. Future studies regarding the reliability of CPET measurements in individuals with ME/CFS might give consideration to allowing for complete recovery to baseline symptomatic status, or could involve individuals with another fatiguing health condition that has stable reproducibility with respect to CPET measures. In addition, larger prospective datasets from future multicenter studies involving standardized CPET methodology [15] and a priori sample size calculations [41] may be compared against the findings of this study.



Lastly (and unfortunately), based on the rest of the statements in the discussion section, I don't believe the authors of this study really understand the relationship between effort perception, afferent feedback, cortical motor drive, ventilatory drive and metabolic factors. Hence they're unlikely to build upon this finding in a meaningful way until they dig into this more deeply.
 

Thanks @Snow Leopard, maybe I need a cup of tea and a clear head before I re-read this!
 
Thanks @Snow Leopard, maybe I need a cup of tea and a clear head before I re-read this!

But wait, there is more:

Since Davenport et al. was attempting to replicate the "Minimum detectable change"/"Smallest real difference" statistical analysis as proposed in the following study, I provide some further analysis to cast doubt on the appropriateness of the method:

https://www.s4me.info/threads/an-analysis-of-2‐day-cardiopulmonary-exercise-testing-to-assess-unexplained-fatigue-in-gwi-2020-falvo-et-al.16729/#post-302233

It makes no sense to apply the MDC/SRD analysis to the ME/CFS group, and the control sample size was once again too small (N=10) for it to be statistically meaningful.
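On the sample size point, the confidence interval around an ICC estimated from N=10 is very wide, which is easy to demonstrate. A sketch using pingouin with simulated control data (numbers invented):

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(7)
n = 10  # only ten controls, as in the study

# Simulated Test 1 / Test 2 scores: a stable trait plus measurement noise
trait = rng.normal(60, 10, n)
df = pd.DataFrame({
    "subject": np.tile(np.arange(n), 2),
    "test": np.repeat(["T1", "T2"], n),
    "score": np.concatenate([trait + rng.normal(0, 3, n),
                             trait + rng.normal(0, 3, n)]),
})

icc = pg.intraclass_corr(data=df, targets="subject",
                         raters="test", ratings="score")
# The 'ICC2' row corresponds to ICC(2,1); note how wide CI95% is at this N
print(icc.loc[icc["Type"] == "ICC2", ["Type", "ICC", "CI95%"]])
```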
 
The data in this paper (Davenport et al. 2020) is the same as the data reported by Snell et al. 2013.
Discriminative validity of metabolic and workload measurements for identifying people with chronic fatigue syndrome - PubMed (nih.gov)

This was acknowledged in this 2022 meta-analysis by Franklin et al.
Review - Repeated maximal exercise tests of peak oxygen consumption in people with ME/CFS: a systematic review and meta-analysis, Franklin & Graham, 2022 | Science for ME (s4me.info)

[Screenshot: comparison table from the Franklin & Graham 2022 meta-analysis]


The data matches almost perfectly except for a major difference for workload at the ventilatory threshold.

- Snell 2013 reported a difference from 49.51 W to 22.2 W in the ME/CFS group, which is an enormous effect size (Cohen's d of 2.2).
- Davenport 2020 reported a difference from 49.5 W to 44.1 W, which seems more realistic.

So perhaps the 22.2 was an error of some sort. Franklin said he asked the research team but that they were unable to clarify, which I find concerning.
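As a quick sanity check (my own arithmetic, not from either paper): a mean drop from 49.51 W to 22.2 W together with d = 2.2 implies a pooled SD of about

```latex
SD_{\text{pooled}} \approx \frac{49.51 - 22.2}{2.2} \approx 12.4\ \text{W}
```

so the huge d is simply a consequence of that very large mean drop; the real question is whether the 22.2 W figure itself is correct.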

It's also unfortunate that the Davenport et al. paper does not make it clear that this is the same data they reported previously.
 