Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019 and new date December 2024

Barry · Oct 4, 2019

Esther12 said:
If we could be confident that their 'fatigue' measures were measuring fatigue then combining their fatigue measure would presumably be okay

Except that as @Lucibee points out, they seem to combine absolute measures with delta measures across whole groups of people, with little heed to getting the aggregation right (cannot recall the full details now).

I think it goes like this by way of example (please correct me if I've got this wrong Lucibee):

Take an individual's weight, then later their change in weight.
Aggregate the two to arrive at their new weight.
Do this to lots of people within a group.
Run statistical analysis on group data.
I believe this is valid.

The invalid case:

Take an individual's weight, then later their change in weight.
Do this to lots of people within a group.
Aggregate the group's weights.
Aggregate the group's changes in weights.
Run statistical analysis on group data.

Is this how it goes Lucibee?

Lucibee · Oct 4, 2019

Barry said:
Is this how it goes Lucibee?

If by "take an individual's weight" you mean, "ask individual how much they think they weigh" then sort of.

They didn't look at individual change in 'weight' though. All data seems to be change from baseline to endpoint in the group means. The "mean difference" they refer to is the average of the differences in group means (mean difference in means).

eta: the individual patient data review would have allowed them to look at individual changes and to aggregate them by group, but not this review.
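A small sketch may help make the distinction concrete. The scores below are invented for illustration and are not taken from any trial. The sketch shows that the difference in group means equals the mean of the individual changes, while the spread of change scores can be very different from the spread of endpoint scores, which is one reason mixing the two kinds of measure in a standardised analysis needs care.

```python
from statistics import mean, stdev

# Hypothetical self-reported scores for five people (invented numbers).
baseline = [20, 25, 30, 22, 28]
endpoint = [18, 22, 29, 19, 27]

# Individual change scores, the kind of thing an IPD analysis could use.
changes = [e - b for b, e in zip(baseline, endpoint)]

# Group-level view: difference between the two group means.
diff_of_means = mean(endpoint) - mean(baseline)
mean_of_changes = mean(changes)

# The spread of endpoint scores and of change scores is not the same thing.
sd_endpoint = stdev(endpoint)
sd_changes = stdev(changes)

print("difference of group means:", diff_of_means)
print("mean of individual changes:", mean_of_changes)
print("SD of endpoint scores:", round(sd_endpoint, 2))
print("SD of change scores:", round(sd_changes, 2))
```

With these made-up numbers the two means agree, but the SD of the change scores is much smaller than the SD of the endpoint scores, so dividing by one or the other would give very different standardised values.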

Esther12 · Oct 4, 2019

The mention of IPD made me wonder about these bits of the new review, seeing as the IPD protocol has been withdrawn:

A project aimed at undertaking IPD analyses of the studies included in this review has started, and should shed new light on the aggregate level analyses presented here.

Larun 2014
Larun L, Odgaard-Jensen J, Brurberg KG, Chalder T, Dybwad M, Moss-Morris RE, et al. Exercise therapy for chronic fatigue syndrome (individual patient data). Cochrane Database of Systematic Reviews 2014, Issue 4. [DOI: 10.1002/14651858.CD011040]

Dolphin · Oct 4, 2019

Esther12 said:
The mention of IPD made me wonder about these bits of the new review, seeing as the IPD protocol has been withdrawn:

I just checked the 2017 text and the same wording was in it:

A project aimed at undertaking IPD analyses of the trials included in the present review has been initiated, and when the IPD analyses are presented, they are likely to shed some new light on the aggregate level analyses presented in the current systematic review.

so they may simply have forgotten to remove it.

Esther12 · Oct 4, 2019

Snow Leopard said:
It is notable that the main conclusion which was a sticking point for David Tovey, namely downgrading the evidence from "probably" to "may" and from "moderate" to "low-moderate" has not made it into the revised article. (See the FOI correspondence on 29th of May)

I suggest this is a point of contention that we can leverage.

This is also an indication that the new person is even worse than the last.

NelliePledge · Oct 4, 2019

Esther12 said:
This is also an indication that the new person is even worse than the last.

Or simply that it is a lot easier to be frank when you are leaving a job than when you have just taken it on.

MSEsperanza · Oct 5, 2019

Esther12 said:
eg: Talking about PACE not being adequately controlled to account for the biases likely to afflict their primary outcomes, and it therefore being questionable whether it should be classed as an RCT, is one thing.

But just saying it's not an RCT, even though participants were randomized to four groups, is something that goes against the assumptions of many researchers and so could be interpreted as showing criticism of PACE is unreasonable or ill-informed. When there are so many researchers who view poor quality work as acceptable and assume ME/CFS patient criticism of trials like PACE is driven by our ideological opposition to psychologically informed treatments, it's worth trying to avoid any potential misunderstandings.

Thank you for explaining. I agree it's worth trying to avoid misunderstandings.

I rather shouldn't post when I don't feel up to it, but just leave two more thoughts here today:

I think there is no need to be afraid of the (reasonable/ high?) standards we apply to research or other scientific work if we apply them to all scientific work, not only the psychologizationers'. It seems to me we have a problem though if some people involved in otherwise valuable ME advocacy don't apply the same standards to all scientific work or their own research. This is something I would like to discuss more (perhaps not in public), but am not able to contribute at the moment and the near future. Having said this, I very much appreciate @Michiel Tack's proposal to MEAction and think this is a very good start for a discussion.

Regarding the RCT definitions: Just thought whether it might be redundant to call a randomized trial an RCT just due to having comparison groups without being adequately controlled since randomization already implies that there have to be at least two groups. Put it another way, applying the loose definition, the 'R' in 'RCT' already contains the 'C'. Happy to be corrected if I'm wrong.

Esther12 said:
edit: The comment I made about 'exaggerated criticism' was not about your post, but just my fears.

It's also somehow about my post then I think. ;-) That's fine though. I just try to understand.

Once again, apologies for just popping in. I'm not able to read the revised review and follow the discussion here. Anyway, it's good to see others who understand it better discussing it here.

[Edited for clarity.]

ME/CFS Science Blog · Oct 5, 2019

I thought it might be useful to get an overview of the major changes compared to the 2017 version. I prefer focusing on the main comparison of exercise therapy versus a passive control condition. I’ll update this forum post if anyone notices other changes so that we’ll maintain an overview of all the major changes.

1) Different description of CFS
Myalgic encephalomyelitis (ME) makes its way into the abstract. The description of CFS has been changed from a common, debilitating and serious health problem characterized by medically unexplained fatigue to “a serious disorder characterised by persistent postexertional fatigue and substantial symptoms related to cognitive, immune and autonomous dysfunction.”

2) Diagnostic criteria
The amended review makes clear that the results only apply to patients selected with the Fukuda or Oxford criteria. The conclusion in the abstract now reads: “All studies were conducted with outpatients diagnosed with 1994 criteria of the Centers for Disease Control and Prevention or the Oxford criteria, or both. Patients diagnosed using other criteria may experience different effects.”

3) Standardised mean differences (SMD)
The 2017 version focused on mean differences (MD), where all the results that use the same version of a questionnaire are pooled together. The problem with this approach is that you don’t get an overview of all the results for one outcome (say fatigue) if different questionnaires were used. And that’s of course what interests readers the most: the result for all fatigue outcomes taken together. That requires pooling results from different questionnaires for the same outcome into what is called a standardised mean difference (SMD). In the old version, SMDs were only reported in the sensitivity analysis. The late Robert Courtney pointed out that this was not according to the protocol (Edmonds et al. 2004) and that it allowed the authors to present their results more favorably. One example: the effect on fatigue at follow-up was not statistically significant when expressed as an SMD, but because the review focused on MDs for separate versions of the Chalder Fatigue Scale, this was not easily visible.

4) Recalculation to the 33-point Chalder Fatigue Scale
The downside of an SMD is that it is difficult to interpret because the results no longer relate to an actual questionnaire. Cochrane has therefore asked the authors to recalculate the size of SMD results for all fatigue outcomes into an MD for the 33-point version of the Chalder Fatigue Scale, which is now the most commonly used version. So first all fatigue results were pooled together and then it was calculated how large that effect would be on the 33-point version of the Chalder Fatigue Scale. The SMD for fatigue was -0.66, suggesting a moderate effect size. But when re-expressed on the 33-point Chalder Fatigue Scale, this corresponded to a 3.4-point reduction, which seems rather small.
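As a rough sketch of the arithmetic behind this re-expression: an SMD is converted back to scale points by multiplying it by a standard deviation (MD = SMD × SD). The post does not say which SD the authors used, so the value below is back-calculated from the two quoted figures and is an assumption, not a number from the review. The function name is made up for illustration.

```python
def smd_to_md(smd: float, sd: float) -> float:
    """Re-express a standardised mean difference in raw scale points."""
    return smd * sd

smd_fatigue = -0.66   # pooled SMD for fatigue quoted in the post
md_reported = -3.4    # re-expressed reduction on the 33-point scale, as quoted

# Assumption: SD back-calculated from the two quoted figures,
# not taken from the review itself.
implied_sd = md_reported / smd_fatigue

print("implied SD (points):", round(implied_sd, 2))
print("re-expressed MD (points):", round(smd_to_md(smd_fatigue, implied_sd), 1))
```

Under that assumption the implied SD is a little over 5 points, which gives a feel for how a "moderate" SMD can translate into a small-looking change on a 33-point scale.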

5) Minimal Important Differences (MID)
To estimate whether a 3.4-point reduction on the Chalder Fatigue Scale is clinically significant, the authors searched for minimal important differences (MID). They found no study on CFS that did this, but a paper on lupus reported a threshold of around 2.3 points on the Chalder Fatigue Scale. According to the authors, this indicates that the change caused by exercise therapy was clinically significant. They estimated MIDs for other outcome measures as well.

6) Standardised language reflecting the GRADE assessment system
In the old review, the authors did not use a consistent method to describe the strength of evidence. They made statements that reflect their own impression of the evidence such as “encouraging evidence suggests that exercise therapy can contribute to alleviation of some symptoms of CFS” or “Patients with CFS may generally benefit […] following exercise therapy” or “We think the evidence suggests that exercise therapy might be an effective and safe intervention” or “seven studies consistently showed a reduction in fatigue following exercise therapy at end of treatment”. The new wording is standardized and reflects quality scores of the GRADE assessment system. The word ‘probably’ reflects moderate-quality evidence, ‘may’ reflects low-quality evidence and ‘uncertain’ reflects very low quality evidence. In general, this means that the results are more carefully worded to reflect the underlying evidence. One example: In the 2017 version the word ‘uncertain’ was used once, in the amended version it is used 76 times.

7) Evidence on adverse events becomes 'uncertain'
One of the most notable consequences of consistently using the GRADE assessment system is how the evidence on adverse events is presented. The new version restricts itself to cautious statements such as “we are uncertain about the risk of serious adverse reactions because the certainty of the evidence is very low.” The previous version did recognize that sparse data made it difficult to draw conclusions, but it also made strong statements such as “no evidence suggests that exercise therapy may worsen outcomes” or “few serious adverse reactions were reported” or “exercise therapy did not worsen symptoms for people with CFS.” In their conclusion the authors wrote: “We think the evidence suggests that exercise therapy might be an […] safe intervention.” These statements have now been deleted or reworded.

8) Uncertain results at follow-up
Another notable change is the evidence on the long-term follow-up for outcomes such as fatigue and physical function. The analysis of the data shows that at this measurement point the improvements were no longer statistically significant. As the late Robert Courtney pointed out, this was not mentioned in the abstract or explained in the main text. The old abstract confusingly wrote that “study authors reported a positive effect of exercise therapy at end of treatment with respect to […] physical function […] and self-perceived changes in overall health.” It was not made clear that this ‘positive effect’ was not statistically significant when data were pooled together. The results for fatigue at follow-up were not mentioned in the abstract. The new abstract makes clear that for each outcome except for sleep the results at follow-up are uncertain because the certainty of the evidence is very low.

9) Elaboration of the summary of findings tables
The results for fatigue and physical function at follow-up are now presented in the summary of findings tables, which wasn’t the case in the previous version. Instead of mentioning whether a measurement was taken post-treatment or at follow-up, the summary tables now give the exact time point or interval of outcome assessments. Overall, these summary of findings tables have become more elaborate and also present the results for comparison 2 (exercise therapy versus psychological treatment), comparison 3 (exercise therapy versus adaptive pacing therapy) and comparison 4 (exercise therapy versus antidepressants).

10) Probably
The authors have rated the results for fatigue post-treatment as moderate quality, which is reflected in the wording “exercise therapy probably has a positive effect on fatigue.” The old version also rated the evidence for post-treatment fatigue as ‘moderate-quality’ but it used a different phrasing. The conclusion read: “Patients with CFS may […] feel less fatigued following exercise therapy.” The word ‘probably’ wasn’t used.

11) High risk of performance and detection bias highlighted
The amended abstract makes clear that the studies in the review have a high risk of bias for certain domains. It reads: “Most studies had a low risk of selection bias. All had a high risk of performance and detection bias.” The old version was more ambiguous and wrote: “Risk of bias varied across studies, but within each study, little variation was found in the risk of bias across our primary and secondary outcome measures.” In the Discussion section the old version even claimed that “risk of bias across studies was relatively low.”

12) The 11-point version of the Chalder Fatigue Scale for the FINE Trial
The authors have now used the 11-point version of the Chalder Fatigue Scale for the FINE trial (Wearden et al. 2010) instead of the 33-point version, which was not published in the peer-reviewed literature. This has caused a change in the SMD for the FINE Trial from -0.43 to -0.27. The overall SMD for fatigue, however, changed only slightly because of this: instead of -0.68 [-1.02, -0.35] it now reads -0.66 [-1.01, -0.31].

13) More sensitivity analyses
The amended review has more sensitivity analyses. These are extra analyses made to see if the results remain the same if something is interpreted differently or if some studies are left out of the analysis. The old version tested, for example, how excluding the study by Powell et al. 2001 influenced the results, because this study reported much larger improvements than other studies. The new version also tests how exclusion of the PACE and FINE trials influences the results for key outcomes such as fatigue and physical function. The amended review also has sensitivity analyses for outcomes of sleep and self-perceived changes in overall health, which were not reported in the old version.

14) Two additional studies mentioned: GETSET and Marques et al.
The authors noted that since they performed their systematic search of the literature in May 2014, two more randomized trials have been published that are relevant and could be included in future updates. These have also reported positive findings for GET:

Marques M, De Gucht V, Leal I, Maes S. Effects of a self-regulation based physical activity program (the "4-STEPS") for unexplained chronic fatigue: a randomized controlled trial. International Journal of Behavioral Medicine 2015;2:187-96. [DOI: 10.1007/s12529-014-9432-4]

Clarke LV, Pesola F, Thomas JM, Vergara-Williamson M, Beynon M, White PG. Guided graded exercise self-help plus specialist medical care versus specialist medical care alone for chronic fatigue syndrome (GETSET): a pragmatic randomised controlled trial. Lancet 2017;390(10092):363-73. [DOI: 10.1016/S0140-6736(16)32589-2]

15) Extra feedback and comments
Extra feedback has been submitted. According to Richard Gardner, the statement that there is no evidence that exercise therapy may worsen outcomes may be misleading, as no conclusion could be drawn about the drop-out rates. Adrienne Wooding noted that the Cochrane review erroneously places ME/CFS in its mental health category. Mark Vink referred to his reanalysis and critique of the Cochrane review, which indicates that objective outcomes generally do not show improvements following exercise therapy.

16) Minor, non-important changes to the text
If one puts the old and amended texts next to each other, one will notice that some sections have been rewritten, shortened or reformatted. In my view, these are not important changes to the analysis. Instead, they seem more like clarifications, explanations of the changes made or shortening of the text because it had otherwise become too long. I have therefore chosen not to specify these minor changes in detail because the overview would then be much more complicated. If anyone does see important changes to the text that I have overlooked, please let me know, so that this overview can be updated.

EDIT: the text has been changed. The changes made to the Cochrane review are not an update (which would include a new search and data from new studies) but an amendment.

Sly Saint · Oct 5, 2019

unfortunately all that comes up on a google search now is
"
Exercise as treatment for adults with chronic fatigue syndrome ...

https://www.cochrane.org › DEPRESSN_exercise-treatment-adults-chronic...
3 days ago - Authors' conclusions: Exercise therapy probably has a positive effect on fatigue in adults with CFS compared to usual care or passive therapies. The evidence regarding adverse effects is uncertain. ... Existing treatment strategies primarily aim to relieve symptoms and improve function."

Simon M · Oct 5, 2019

Thanks very much, @Michiel Tack.

Standard mean differences/minimal important differences

I don't think there is a real issue with describing the effect on fatigue as measured by the Chalder Fatigue Scale as "moderate", or the 3.4 point reduction as a "minimal important difference".

The key thing here is the effect size, which is basically the standardised mean difference (i.e. mean difference expressed in standard deviation units). The guy who came up with this idea is Cohen, and he originally suggested that an SMD of 0.2 was small, 0.5 was moderate and 0.8 was large. (He also said researchers need to work out what constituted small, moderate and large in their own fields, but nobody in any field ever took any notice).

So it is pretty routine to describe an effect of 0.6 as moderate.

Also 0.5 SMD is routinely used as a "minimum important difference" in clinical studies. There is nothing unusual in its use here.

The key thing to notice is that 0.5 SMD is actually pretty unimpressive, as you can see from the diagram below (two normal distributions with a mean difference of 0.64 SD)

reference
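For readers who cannot see the diagram, here is a minimal sketch of the same idea using Python's standard library. It assumes two normal distributions with equal spread, separated by 0.64 SD; the group names and the choice of summary numbers are illustrative assumptions rather than anything taken from the review.

```python
from statistics import NormalDist

d = 0.64  # mean difference in SD units, as in the diagram described above

# Two unit-variance normal curves separated by d (an idealised assumption).
group_a = NormalDist(mu=0.0, sigma=1.0)
group_b = NormalDist(mu=d, sigma=1.0)

# Shared area under the two density curves (0 = none, 1 = identical).
overlap = group_a.overlap(group_b)

# Chance that a randomly drawn score from group_b exceeds one from group_a.
prob_superiority = NormalDist().cdf(d / 2 ** 0.5)

print("overlap of the two curves:", round(overlap, 2))
print("probability of superiority:", round(prob_superiority, 2))
```

Under these assumptions roughly three quarters of the area of the two curves is shared, which is why an effect of this size looks modest when drawn.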

The problem is not the interpretation of the effect size, it’s whether or not the questionnaire-score effect size reflects real-world improvements or simply response bias in an unblinded trial.

Michiel Tack said:
6) Standardised language reflecting the GRADE assessment system

That was very interesting, and their habit of relying on their own arbitrary interpretations had gone further in 2017 than I had realised.

Michiel Tack said:
11) High risk of performance and detection bias highlighted
The updated abstract makes clear that the studies in the review have a high risk of bias for certain domains. It reads: “Most studies had a low risk of selection bias. All had a high risk of performance and detection bias.” The old version was more ambiguous and wrote: “Risk of bias varied across studies, but within each study, little variation was found in the risk of bias across our primary and secondary outcome measures.” In the Discussion section the old version even claimed that “risk of bias across studies was relatively low.”

Just to clarify, performance bias covers things like the different randomised groups receiving different levels of attention, which could influence outcomes. Detection bias is particularly applicable to the use of subjective outcomes in non-blinded trials (according to Cochrane).

I think that @Jonathan Edwards's point that the level of bias for a trial is greater than the highest individual bias is particularly relevant here. I think RoB2 covered this too, using the weakest link in the chain argument, you said. According to either of those approaches, all the trials should be categorised as having a high risk of bias.

Dolphin · Oct 5, 2019

We therefore identified research literature to help quantify minimal important differences (MID) for important outcome measures. For fatigue, one study among people with systemic lupus erythematosus (Goligher 2008), reported a threshold around 2.3 points for a minimally important change on the 33-point Chalder Fatigue Scale, an effect size that corresponds to an SMD of about 0.36 (Goligher 2008).
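The two figures in that passage imply a standard deviation, since points = SMD × SD. The sketch below back-calculates it and then expresses the 3.4-point reduction mentioned earlier in the thread in the same units. This is only arithmetic on quoted numbers; the thread does not state which SD the review authors actually used, so the result should be read as an assumption-laden illustration.

```python
mid_points = 2.3   # threshold in scale points, as quoted
mid_smd = 0.36     # corresponding SMD, as quoted

# Assumption: SD implied by the two quoted figures (points = SMD * SD).
implied_sd = mid_points / mid_smd

def points_to_smd(points: float, sd: float) -> float:
    """Express a raw-score difference in SD units."""
    return points / sd

# The 3.4-point reduction discussed earlier in the thread, in the same units.
effect_smd = points_to_smd(3.4, implied_sd)

print("implied SD (points):", round(implied_sd, 2))
print("3.4 points as an SMD at that SD:", round(effect_smd, 2))
```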

I'm nearly positive that at one stage, in the other place, I or someone else found a UK CFS paper where the researchers said that, based on their clinical experience, they thought a clinically useful difference (or some similar term like minimal important difference) for the Chalder fatigue scale should be 3 or 4 in CFS; it was definitely bigger than the 2 that was used in the PACE Trial. I have a vague recollection it might have involved Crawley or Chalder herself.

I have been doing a number of searches for the last while, but have been unable to uncover it.

Andy · Oct 5, 2019

Dolphin said:
I have been doing a number of searches for the last while, but have been unable to uncover it.

This it?

Measuring fatigue in clinical and community settings, Cella & Chalder
https://www.sciencedirect.com/science/article/abs/pii/S0022399909004176
https://sci-hub.se/10.1016/j.jpsychores.2009.10.007

Dolphin · Oct 5, 2019

Andy said:
This it?

Measuring fatigue in clinical and community settings, Cella & Chalder
https://www.sciencedirect.com/science/article/abs/pii/S0022399909004176
https://sci-hub.se/10.1016/j.jpsychores.2009.10.007

Thanks. That is an important paper on the Chalder fatigue scale, but isn't the one I'm thinking of.

ME/CFS Science Blog · Oct 5, 2019

Dolphin said:
I have been doing a number of searches for the last while, but have been unable to uncover it.

While I was searching I found this one by Crawley on the "minimally clinically important difference of the SF-36 physical function subscale for paediatric CFS/ME".
https://hqlo.biomedcentral.com/track/pdf/10.1186/s12955-018-1028-2

Dolphin · Oct 5, 2019

Michiel Tack said:
While I was searching I found this one by Crawley on the "minimally clinically important difference of the SF-36 physical function subscale for paediatric CFS/ME".
https://hqlo.biomedcentral.com/track/pdf/10.1186/s12955-018-1028-2

Thanks for that. That is from 2018 so wouldn't be what I'm thinking about, but is still interesting.

Esther12 · Oct 5, 2019

NelliePledge said:
Or simply that it is a lot easier to be frank when you are leaving a job than when you have just taken it on.

If so, that would be a pretty grim sign for the state of Cochrane too. Only when approaching retirement can they consider nearly doing the decent thing, but failing to follow through.

dave30th · Oct 5, 2019

Snow Leopard said:
It is notable that the main conclusion which was a sticking point for David Tovey, namely downgrading the evidence from "probably" to "may" and from "moderate" to "low-moderate" has not made it into the revised article. (See the FOI correspondence on 29th of May)

Yes, this seemed to be a settled agreement. I'm surprised it didn't survive past May.

ME/CFS Science Blog · Oct 5, 2019

Dolphin said:
That is from 2018 so wouldn't be what I'm thinking about, but is still interesting.

Could this be it:

Sabes-Figuera et al. (2012). Cost-effectiveness of counselling, graded-exercise and usual care for chronic fatigue: evidence from a randomised trial in primary care. BMC Health Serv Res. 2012 Aug 20;12:264. doi: 10.1186/1472-6963-12-264.

It says:

The primary clinical outcome was the Chalder fatigue scale [9], which consists of 13 items assessed using Likert scales (0,1,2,3) producing a total score ranging between 0 and 33. For the purpose of the economic evaluation we calculated the amount of clinically significant change by the six-month follow-up given that use of services data was not available for the twelve months follow-up. This was obtained by dividing the actual change in the Chalder fatigue scale total score by four. Thus a value of one in the change in fatigue outcome corresponds to a difference of four in the original Chalder fatigue scale, assuming that a change of that magnitude was clinically significant (CSI) [6]

So they argued that a change of 4 was clinically significant, which is higher than the 3.4-point effect reported by Larun et al.

Barry · Oct 5, 2019

Is this one too new (2015)?

https://sci-hub.tw/10.3109/03009742.2014.988173

Dolphin · Oct 5, 2019

Michiel Tack said:
Could this be it:

Sabes-Figuera et al. (2012). Cost-effectiveness of counselling, graded-exercise and usual care for chronic fatigue: evidence from a randomised trial in primary care. BMC Health Serv Res. 2012 Aug 20;12:264. doi: 10.1186/1472-6963-12-264.

It says:

So they argued that a change of 4 was clinically significant, which is higher than the 3.4-point effect reported by Larun et al.

Thanks. That's the one. I previously posted about it here:
https://forums.phoenixrising.me/threads/pace-trial-and-pace-trial-protocol.3928/post-341566

I don't think that this point has been made before:

(I've copied it from this post: http://forums.phoenixrising.me/inde...ronic-fatigue-sabes-figura.22388/#post-341565 )

This is from this paper:

Cost-effectiveness of counselling, graded-exercise and usual care for chronic fatigue: evidence from a randomised trial in primary care.

BMC Health Serv Res. 2012 Aug 20;12:264. doi: 10.1186/1472-6963-12-264.
Sabes-Figuera R, McCrone P, Hurley M, King M, Donaldson AN, Ridsdale L.
Free at: http://www.biomedcentral.com/1472-6963/12/264

The most interesting thing in this paper to me doesn't in fact really relate to this trial specifically, but more to trials in general that use the Chalder fatigue scale, particularly when they use a threshold to signify significant or important changes. For example, the PACE Trial claimed 2 points would be sufficient for a clinically useful difference with large percentages in all groups (incl. the SMC no therapy group) achieving this.

This paper says:

Outcomes

Assessments were made at baseline with follow-up at six and twelve months. The primary clinical outcome was the Chalder fatigue scale [9], which consists of 13 items assessed using Likert scales (0,1,2,3) producing a total score ranging between 0 and 33. For the purpose of the economic evaluation we calculated the amount of clinically significant change by the six-month follow-up given that use of services data was not available for the twelve months follow-up. This was obtained by dividing the actual change in the Chalder fatigue scale total score by four. Thus a value of one in the change in fatigue outcome corresponds to a difference of four in the original Chalder fatigue scale, assuming that a change of that magnitude was clinically significant (CSI)[6]. This was based in a consensus reached by clinicians in a previous trial [10].

6. McCrone P, Ridsdale L, Darbishire L, Seed P: Cost-effectiveness of cognitive behavioural therapy, graded exercise and usual care for patients with chronic fatigue in primary care. Psychol Med 2004, 34(6):991-999.

10. Ridsdale L, Godfrey E, Seed P: Chronic Fatigue in general practice: authors reply. Br J Gen Pract 2001, 51:317-318.

Here's the relevant part of their letter (reference 10):

Because the Chalder fatigue scale6 is relatively new, there is no published definition of equivalence. The researchers in this trial include several of those involved in developing and testing the instrument. Our consensus view was that a difference of less than four, using a Likert scale, is not important. We found that the apparent advantage six months after therapy of CBT over counselling was only 1.04 points with a 95% confidence interval from -1.7 to 3.7. Arriving at this estimate was always the main aim of the trial. Jones et al7 (on whom Underwood and Eldridge rely) state ‘If every point within this range (i.e. the confidence interval) corresponds to a difference of no clinical importance then the treatments may be considered to be equivalent.’ We conclude that the treatments are clinically equivalent.

A clinician that was concerned about differences of two or three points could legitimately claim that the question is still open.
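The equivalence reasoning in the letter can be sketched as a simple check: treatments are treated as clinically equivalent if the whole confidence interval lies inside the range regarded as unimportant. The function name and the symmetric margin are illustrative assumptions; the interval and the margins of 4, 3 and 2 points come from the quoted text.

```python
def ci_within_margin(ci_low: float, ci_high: float, margin: float) -> bool:
    """True if the whole confidence interval lies strictly inside (-margin, +margin)."""
    return -margin < ci_low and ci_high < margin

# 95% confidence interval for the CBT versus counselling difference, as quoted.
ci_low, ci_high = -1.7, 3.7

for margin in (4.0, 3.0, 2.0):
    verdict = "equivalent" if ci_within_margin(ci_low, ci_high, margin) else "still open"
    print(f"margin of {margin} points: {verdict}")
```

With a margin of four points the interval fits inside the range, while with a margin of two or three points it does not, which matches the letter's remark that a clinician concerned about smaller differences could regard the question as still open.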
