Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019

Discussion in 'Psychosomatic research - ME/CFS and Long Covid' started by MEMarge, Oct 2, 2019.

  1. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
    Except that as @Lucibee points out, they seem to combine absolute measures with delta measures across whole groups of people, with little heed to getting the aggregation right (cannot recall the full details now).

    I think it goes like this by way of example (please correct me if I've got this wrong Lucibee):
    • Take an individual's weight, then later their change in weight.
    • Aggregate the two to arrive at their new weight.
    • Do this to lots of people within a group.
    • Run statistical analysis on group data.
    • I believe this is valid.
    The invalid case:
    • Take an individual's weight, then later their change in weight.
    • Do this to lots of people within a group.
    • Aggregate the group's weights.
    • Aggregate the group's changes in weights.
    • Run statistical analysis on group data.
    Is this how it goes Lucibee?
     
    Sean, rvallee, Dolphin and 2 others like this.
  2. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    If by "take an individual's weight" you mean, "ask individual how much they think they weigh" then sort of.

    They didn't look at individual change in 'weight' though. All data seems to be change from baseline to endpoint in the group means. The "mean difference" they refer to is the average of the differences in group means (mean difference in means).

    eta: the individual patient data review would have allowed them to look at individual changes and to aggregate them by group, but not this review.
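
    A minimal sketch of the distinction being discussed, using invented scores (not data from any trial in the review). It assumes the same people are measured at both time points. The average change comes out the same whether you aggregate individual changes or subtract group means, but the spread of individual changes can only be computed when individual-level data are available.

```python
from statistics import mean, stdev

# Hypothetical scores for five people (invented for illustration only).
baseline = [28, 25, 30, 22, 27]
endpoint = [24, 26, 25, 21, 20]

# Route 1: compute each person's change first, then aggregate.
changes = [e - b for b, e in zip(baseline, endpoint)]
mean_of_changes = mean(changes)
sd_of_changes = stdev(changes)  # needs individual-level data

# Route 2: aggregate each time point first, then subtract the group means.
change_in_means = mean(endpoint) - mean(baseline)

print("mean of individual changes:", round(mean_of_changes, 2))
print("change in group means:     ", round(change_in_means, 2))
print("SD of individual changes:  ", round(sd_of_changes, 2))
```

    Both routes give a mean change of about -3.2 here. What route 2 cannot give you is the SD of the changes, which is why a review working only from published group means has to rely on endpoint SDs or on assumptions.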
     
    Last edited: Oct 4, 2019
    JohnTheJack, Sean, rvallee and 3 others like this.
  3. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    The mention of IPD made me wonder about these bits of the new review, seeing as the IPD protocol has been withdrawn:

     
    rvallee likes this.
  4. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,791
    I just checked the 2017 text and the same wording was in it:
    so they may simply have forgotten to remove it.
     
  5. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    This is also an indication that the new person is even worse than the last.
     
  6. NelliePledge

    NelliePledge Moderator Staff Member

    Messages:
    14,837
    Location:
    UK West Midlands
    Or simply that it is a lot easier to be frank when you are leaving a job than when you have just taken it on.
     
  7. MSEsperanza

    MSEsperanza Senior Member (Voting Rights)

    Messages:
    2,947
    Location:
    betwixt and between
    Thank you for explaining. I agree it's worth trying to avoid misunderstandings.

    I probably shouldn't post when I don't feel up to it, but I'll just leave two more thoughts here today:

    I think there is no need to be afraid of the (reasonable/ high?) standards we apply to research or other scientific work if we apply them to all scientific work, not only the psychologizationers'. It seems to me we have a problem though if some people involved in otherwise valuable ME advocacy don't apply the same standards to all scientific work or their own research. This is something I would like to discuss more (perhaps not in public), but am not able to contribute at the moment and the near future. Having said this, I very much appreciate @Michiel Tack's proposal to MEAction and think this is a very good start for a discussion.

    Regarding the RCT definitions: I just wondered whether it might be redundant to call a randomized trial an RCT merely because it has comparison groups, without being adequately controlled, since randomization already implies that there have to be at least two groups. Put another way, applying the loose definition, the 'R' in 'RCT' already contains the 'C'. Happy to be corrected if I'm wrong.
    It's also somehow about my post then I think. ;-) That's fine though. I just try to understand.

    Once again, apologies for just popping in. I'm not able to read the revised review and follow the discussion here. Anyway, it's good to see others who understand this better discussing it here.

    [Edited for clarity.]
     
    Last edited: Oct 5, 2019
  8. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    I thought it might be useful to get an overview of the major changes compared to the 2017 version. I prefer focusing on the main comparison of exercise therapy versus a passive control condition. I’ll update this forum post if anyone notices other changes so that we’ll maintain an overview of all the major changes.

    1) Different description of CFS
    Myalgic encephalomyelitis (ME) makes its way into the abstract. The description of CFS has been changed from a common, debilitating and serious health problem characterized by medically unexplained fatigue to “a serious disorder characterised by persistent postexertional fatigue and substantial symptoms related to cognitive, immune and autonomous dysfunction.”

    2) Diagnostic criteria
    The amended review makes clear that the results only apply to patients selected with the Fukuda or Oxford criteria. The conclusion in the abstract now reads: “All studies were conducted with outpatients diagnosed with 1994 criteria of the Centers for Disease Control and Prevention or the Oxford criteria, or both. Patients diagnosed using other criteria may experience different effects.”

    3) Standardized mean differences (SMD)
    The 2017 version focused on mean differences (MD), where all the results that use the same version of a questionnaire are pooled together. The problem with this approach is that you don’t get an overview of all the results for one outcome (say fatigue) if different questionnaires were used. And that’s of course what interests readers the most: the result for all fatigue outcomes taken together. That requires pooling results from different questionnaires for the same outcome into what is called a standardized mean difference (SMD). In the old version, SMDs were only reported in the sensitivity analysis. The late Robert Courtney pointed out that this was not according to the protocol (Edmonds et al. 2004) and that it allowed the authors to present their results more favorably. One example: the effect on fatigue at follow-up was not statistically significant when expressed as an SMD, but by focusing on MDs for separate versions of the Chalder Fatigue Scale this was not easily visible in the review.
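
    To illustrate why an SMD allows pooling across questionnaires, here is a rough sketch with invented trial summaries (the means, SDs and sample sizes are hypothetical, not taken from the review). It uses the plain pooled-SD form (Cohen's d); meta-analysis software often applies a small-sample correction on top of this, which is left out here.

```python
from math import sqrt

def smd(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)

# Hypothetical trial using a 33-point fatigue scale (numbers invented).
trial_33 = smd(mean_t=20.0, sd_t=6.0, n_t=50, mean_c=23.0, sd_c=6.0, n_c=50)

# Hypothetical trial using an 11-point fatigue scale (numbers invented).
trial_11 = smd(mean_t=6.0, sd_t=2.0, n_t=50, mean_c=7.0, sd_c=2.0, n_c=50)

# The raw mean differences (-3 and -1 points) are not comparable,
# but both correspond to the same effect in SD units.
print(round(trial_33, 2), round(trial_11, 2))
```

    In this made-up case the two raw differences look very different, yet both trials show an effect of half a standard deviation, which is what makes them poolable.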

    4) Recalculation to the 33-point Chalder Fatigue Scale
    The downside of an SMD is that it is difficult to interpret because the results no longer relate to an actual questionnaire. Cochrane has therefore asked the authors to recalculate the size of SMD results for all fatigue outcomes into an MD for the 33 point version of the Chalder Fatigue Scale, which is now the most commonly used version. So first all fatigue results were pooled together and then it was calculated how large that effect would be on the 33-point version of the Chalder Fatigue Scale. The SMD for fatigue was -0.66 suggesting a moderate effect size. But when reexpressed on the Chalder Fatigue Scale, this corresponded to a 3.4 point reduction on the Chalder Fatigue Scale, which seems rather small.
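
    The re-expression step is just a multiplication: MD = SMD × SD of the chosen scale. The sketch below back-calculates the SD implied by the two figures quoted above. That implied SD is my own arithmetic, not a number taken from the review, so treat it as an assumption.

```python
smd_fatigue = -0.66   # pooled SMD for fatigue, as quoted above
md_chalder = -3.4     # the same effect re-expressed on the 33-point scale

# MD = SMD * SD, so the SD implied by these two figures is MD / SMD.
implied_sd = md_chalder / smd_fatigue

def smd_to_md(smd, sd):
    """Re-express an SMD on a concrete scale with standard deviation sd."""
    return smd * sd

print("implied SD on the 33-point scale:", round(implied_sd, 2))
print("round trip back to MD:", round(smd_to_md(smd_fatigue, implied_sd), 2))
```

    The implied SD is a little over 5 points, which shows why a "moderate" SMD can translate into what looks like a small change on a 33-point scale.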

    5) Minimal Important Differences (MID)
    To estimate whether a 3.4 point reduction on the Chalder Fatigue Scale is clinically significant, the authors searched for minimal important differences (MID). They found no study on CFS that did this, but a paper on lupus reported a threshold of around 2.3 points on the Chalder Fatigue Scale. According to the authors, this indicates that the change caused by exercise therapy was clinically significant. They estimated MIDs for other outcome measures as well.

    6) Standardised language reflecting the GRADE assessment system
    In the old review, the authors did not use a consistent method to describe the strength of evidence. They made statements that reflect their own impression of the evidence such as “encouraging evidence suggests that exercise therapy can contribute to alleviation of some symptoms of CFS” or “Patients with CFS may generally benefit […] following exercise therapy” or “We think the evidence suggests that exercise therapy might be an effective and safe intervention” or “seven studies consistently showed a reduction in fatigue following exercise therapy at end of treatment”. The new wording is standardized and reflects quality scores of the GRADE assessment system. The word ‘probably’ reflects moderate-quality evidence, ‘may’ reflects low-quality evidence and ‘uncertain’ reflects very low quality evidence. In general, this means that the results are more carefully worded to reflect the underlying evidence. One example: In the 2017 version the word ‘uncertain’ was used once, in the amended version it is used 76 times.

    7) Evidence on adverse events becomes 'uncertain'
    One of the most notable changes resulting from consistent use of the GRADE assessment system is how the evidence on adverse events is presented. The new version restricts itself to cautious statements such as “we are uncertain about the risk of serious adverse reactions because the certainty of the evidence is very low.” The previous version did recognize that sparse data made it difficult to draw conclusions, but it also made strong statements such as “no evidence suggests that exercise therapy may worsen outcomes” or “few serious adverse reactions were reported” or “exercise therapy did not worsen symptoms for people with CFS.” In their conclusion the authors wrote: “We think the evidence suggests that exercise therapy might be an […] safe intervention.” These statements have now been deleted or reworded.

    8) Uncertain results at follow-up
    Another notable change is the evidence on the long-term follow-up for outcomes such as fatigue and physical function. The analysis of the data shows that at this measurement point the improvements were no longer statistically significant. As the late Robert Courtney pointed out, this was not mentioned in the abstract or explained in the main text. The old abstract confusingly wrote that “study authors reported a positive effect of exercise therapy at end of treatment with respect to […] physical function […] and self-perceived changes in overall health.” It was not made clear that this ‘positive effect’ was not statistically significant when data were pooled together. The results for fatigue at follow-up were not mentioned in the abstract. The new abstract makes clear that for each outcome except for sleep the results at follow-up are uncertain because the certainty of the evidence is very low.

    9) Elaboration of the summary of findings tables
    The results for fatigue and physical function at follow-up are now presented in the summary of findings tables, which wasn’t the case in the previous version. Instead of mentioning whether a measurement was taken post-treatment or at follow-up, the summary tables now give the exact time point or interval of outcome assessments. Overall, these summary of findings tables have become more elaborate and also present the results for comparison 2 (exercise therapy versus psychological treatment), comparison 3 (exercise therapy versus adaptive pacing therapy) and comparison 4 (exercise therapy versus antidepressants).

    10) Probably
    The authors have rated the results for fatigue post-treatment as moderate quality, which is reflected in the wording “exercise therapy probably has a positive effect on fatigue.” The old version also rated the evidence for post-treatment fatigue as ‘moderate-quality’ but it used a different phrasing. The conclusion read: “Patients with CFS may […] feel less fatigued following exercise therapy.” The word ‘probably’ wasn’t used.

    11) High risk of performance and detection bias highlighted
    The amended abstract makes clear that the studies in the review have a high risk of bias for certain domains. It reads: “Most studies had a low risk of selection bias. All had a high risk of performance and detection bias.” The old version was more ambiguous and wrote: “Risk of bias varied across studies, but within each study, little variation was found in the risk of bias across our primary and secondary outcome measures.” In the Discussion section the old version even claimed that “risk of bias across studies was relatively low.”

    12) The 11-point version of the Chalder Fatigue Scale for the FINE Trial
    The authors have now used the 11-point version of the Chalder Fatigue Scale for the FINE trial (Wearden et al. 2010) instead of the 33-point version, which was not published in the peer-reviewed literature. This has caused a change in the SMD for the FINE Trial from -0.43 to -0.27. The overall SMD for fatigue, however, changed only a little because of this: instead of -0.68 [-1.02, -0.35] it now reads -0.66 [-1.01, -0.31].

    13) More sensitivity analyses
    The amended review has more sensitivity analyses. These are extra analyses made to see if the results remain the same if something is interpreted differently or if some studies are left out of the analysis. The old version tested, for example, how excluding the study by Powell et al. 2001 influenced the results, because this study reported much larger improvements than other studies. The new version also tests how exclusion of the PACE and FINE trials influences the results for key outcomes such as fatigue and physical function. The amended review also has sensitivity analyses for outcomes of sleep and self-perceived changes in overall health, which were not reported in the old version.

    14) Two additional studies mentioned: GETSET and Marques et al.
    The authors noted that since they performed their systematic search of the literature in May 2014, two more randomized trials have been published that are relevant and could be included in future updates. These have also reported positive findings for GET:

    Marques M, De Gucht V, Leal I, Maes S. Effects of a self-regulation based physical activity program (the "4-STEPS") for unexplained chronic fatigue: a randomized controlled trial. International Journal of Behavioral Medicine 2015;2:187-96. [DOI: 10.1007/s12529-014-9432-4]

    Clarke LV, Pesola F, Thomas JM, Vergara-Williamson M, Beynon M, White PG. Guided graded exercise self-help plus specialist medical care versus specialist medical care alone for chronic fatigue syndrome (GETSET): a pragmatic randomised controlled trial. Lancet 2017;390(10092):363-73. [DOI: 10.1016/S0140-6736(16)32589-2]

    15) Extra feedback and comments
    Extra feedback has been submitted. According to Richard Gardner, the statement that there is no evidence that exercise therapy may worsen outcomes may be misleading, as no conclusion could be drawn about the drop-out rates. Adrienne Wooding noted that the Cochrane review erroneously places ME/CFS in its mental health category. Mark Vink referred to his reanalysis and critique of the Cochrane review, which indicates that objective outcomes generally do not show improvements following exercise therapy.

    16) Minor, non-important changes to the text
    If one puts the old and amended texts next to each other, one will notice that some sections have been rewritten, shortened or reformatted. In my view, these are not important changes to the analysis. Instead, they seem more like clarifications, explanations of the changes made or shortening of the text because it had otherwise become too long. I have therefore chosen not to specify these minor changes in detail because the overview would then be much more complicated. If anyone does see important changes to the text that I have overlooked, please let me know, so that this overview can be updated.

    EDIT: the text has been changed. The changes made to the Cochrane review are not an update (which would include a new search and data from new studies) but an amendment.
     
    Last edited: Oct 9, 2019
  9. Sly Saint

    Sly Saint Senior Member (Voting Rights)

    Messages:
    9,920
    Location:
    UK
    unfortunately all that comes up on a google search now is
    "
    Exercise as treatment for adults with chronic fatigue syndrome ...

    https://www.cochrane.org › DEPRESSN_exercise-treatment-adults-chronic...

    3 days ago - Authors' conclusions: Exercise therapy probably has a positive effect on fatigue in adults with CFS compared to usual care or passive therapies. The evidence regarding adverse effects is uncertain. ... Existing treatment strategies primarily aim to relieve symptoms and improve function."
     
    alktipping, Hutan, Mark Vink and 10 others like this.
  10. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    995
    Location:
    UK
    Thanks very much, @Michiel Tack.


    Standard mean differences/minimal important differences


    I don't think there is a real issue with describing the effect on fatigue as measured by the Chalder Fatigue Scale as "moderate", or the 3.4 point reduction as a "minimal important difference".

    The key thing here is the effect size, which is basically the standardised mean difference (i.e. mean difference expressed in standard deviation units). The guy who came up with this idea is Cohen, and he originally suggested that an SMD of 0.2 was small, 0.5 was moderate and 0.8 was large. (He also said researchers need to work out what constituted small, moderate and large in their own fields, but nobody in any field ever took any notice).

    So it is pretty routine to describe an effect of 0.6 as moderate.

    Also 0.5 SMD is routinely used as a "minimum important difference" in clinical studies. There is nothing unusual in its use here.

    The key thing to notice is that a 0.5 SMD is actually pretty unimpressive, as you can see from the diagram below (two normal distributions with a mean difference of 0.64 SD).

    cohen-d 0,64.png

    reference


    https://www.youtube.com/watch?v=tTgouKMz-eI
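
    For anyone who can't see the image, the same point can be put in numbers. This is a rough sketch that assumes both groups are normally distributed with equal SDs, which real questionnaire scores may not be. It computes how much two such distributions overlap at d = 0.64, and the chance that a randomly picked person from one group scores better than a randomly picked person from the other.

```python
from math import sqrt
from statistics import NormalDist

d = 0.64  # mean difference in SD units, as in the diagram

# Two unit-variance normal distributions whose means differ by d.
control = NormalDist(mu=0.0, sigma=1.0)
treated = NormalDist(mu=d, sigma=1.0)

# Shared area under the two curves (1.0 would mean identical groups).
overlap = control.overlap(treated)

# Probability that a random "treated" score beats a random "control" score.
# The difference of two independent unit normals has SD sqrt(2).
prob_superiority = NormalDist().cdf(d / sqrt(2))

print("overlap:", round(overlap, 2))
print("probability of superiority:", round(prob_superiority, 2))
```

    Under these assumptions roughly three quarters of the two distributions overlap, and the probability of superiority is only about two in three, against one in two for no effect at all.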




    The problem is not the interpretation of the effect size, it’s whether or not the questionnaire-score effect size reflects real-world improvements or simply response bias in an unblinded trial.

    That was very interesting, and their habit of relying on their own arbitrary interpretations had gone further in 2017 than I had realised.
    Just to clarify, performance bias covers things like the different randomised groups receiving different levels of attention, which could influence outcomes. Detection bias is particularly applicable to the use of subjective outcomes in non-blinded trials (according to Cochrane).

    I think that @Jonathan Edwards's point that the level of bias for a trial is greater than the highest individual bias is particularly relevant here. I think RoB2 covered this too, using the weakest-link-in-the-chain argument, as you said. According to either of those approaches, all the trials should be categorised as having a high risk of bias.
     
    Last edited: Oct 5, 2019
  11. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,791
    I'm nearly positive that at one stage, in the other place, I or someone else found a UK CFS paper where the researchers said that, based on their clinical experience, they thought a clinically useful difference (or some similar term like minimal important difference) for the Chalder fatigue scale should be 3 or 4 in CFS; it was definitely bigger than the 2 that was used in the PACE Trial. I have a vague recollection it might have involved Crawley or Chalder herself.

    I have been doing a number of searches for the last while, but have been unable to uncover it.
     
  12. Andy

    Andy Committee Member

    Messages:
    23,032
    Location:
    Hampshire, UK
  13. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,791
    alktipping, JohnTheJack and Andy like this.
  14. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    While I was searching I found this one by Crawley on the "minimally clinically important difference of the SF-36 physical function subscale for paediatric CFS/ME".
    https://hqlo.biomedcentral.com/track/pdf/10.1186/s12955-018-1028-2
     
  15. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,791
    Thanks for that. That is from 2018 so wouldn't be what I'm thinking about, but is still interesting.
     
    alktipping, JohnTheJack and Andy like this.
  16. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    If so, that would be a pretty grim sign for the state of Cochrane too. Only when approaching retirement can they consider nearly doing the decent thing, but failing to follow through.
     
    alktipping, Chezboo, MEMarge and 3 others like this.
  17. dave30th

    dave30th Senior Member (Voting Rights)

    Messages:
    2,447
    yes, this seemed to be a settled agreement. I'm surprised it didn't survive past May.
     
  18. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    Could this be it:
    It says:
    So they argued that a change of 4 was clinically significant, which is higher than the 3.4-point effect size reported by Larun et al.
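
    To put the thresholds mentioned in this thread side by side, here is a trivial sketch. The labels are my own shorthand for the sources discussed above, and the comparison simply treats each threshold as a cut-off on the 33-point Chalder Fatigue Scale, which is a simplification.

```python
effect = 3.4  # pooled reduction on the 33-point scale, as quoted from Larun et al.

# Thresholds mentioned in this thread (labels are shorthand, not citations).
thresholds = {
    "PACE trial (2 points)": 2.0,
    "lupus paper (about 2.3 points)": 2.3,
    "paper discussed above (4 points)": 4.0,
}

# Does the pooled effect reach each threshold?
verdicts = {name: effect >= cutoff for name, cutoff in thresholds.items()}

for name, reached in verdicts.items():
    print(f"{name}: {'reached' if reached else 'not reached'}")
```

    Whether the result counts as clinically significant therefore depends entirely on which threshold is picked.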
     
    Last edited: Oct 5, 2019
    Hutan, MEMarge, JohnTheJack and 2 others like this.
  19. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
  20. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,791
    Thanks. That's the one. I previously posted about it here:
    https://forums.phoenixrising.me/threads/pace-trial-and-pace-trial-protocol.3928/post-341566

     
