Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019

Discussion in 'Psychosomatic research - ME/CFS and Long Covid' started by MEMarge, Oct 2, 2019.

  1. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    As their source they refer to:
    Which reads:
    Ok that's it. I don't trust a word of what Larun et al. or Cochrane say anymore ....

    I really thought they would have checked that the right figures were used, not only those that favour the results for exercise therapy.


    EDIT: The last sentence has been edited for clarification.
     
    Last edited: Oct 5, 2019
    Mithriel, Hutan, LadyBirb and 11 others like this.
  2. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    To clarify what I mean: I did a quick PubMed search using the terms: (Chalder Fatigue Scale) AND (minimally important difference OR clinically significant). It gives only 10 results, including the relevant paper by Sabes-Figuera et al. and the paper on lupus which the authors used in the updated Cochrane review (Goligher et al. 2008). This gives me the impression that the authors should have been able to find the Sabes-Figuera study, even with a limited search.

    The PACE trial (White et al. 2011) says:
    So Larun et al. should probably have said that a clinically significant difference is estimated at 2 - 4 points on the Chalder Fatigue Scale for patients with CFS, based on previous studies.
     
    Last edited: Oct 5, 2019
    LadyBirb, JohnTheJack, Barry and 6 others like this.
  3. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,792
    This letter was published in the Lancet on what was done in the PACE Trial:

    https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60689-2/fulltext
     
    Last edited: Oct 5, 2019
  4. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    An SMD of 0.64 seems pretty large compared to a 3.4-point difference on a 33-point scale.

    Just a thought: Is it possible that the SMD was inflated because the standard deviation in the studies was low? Some trials used the 11-point version of the Chalder Fatigue Scale, which has ceiling effects. So perhaps most participants in these trials had near-maximum scores with very little variation. So even a small change would look big with so little background noise.
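    To make that concrete, here is a rough back-of-the-envelope sketch in Python (only the 3.4-point difference and the 0.64 SMD come from the discussion above; the alternative SDs are made up for illustration):

    # Back-of-the-envelope: for a fixed raw change, the SMD is just the change
    # divided by the pooled SD, so a compressed SD inflates the SMD.
    raw_difference = 3.4                      # points on the 33-point (Likert-scored) Chalder scale

    implied_sd = raw_difference / 0.64
    print(f"Pooled SD implied by an SMD of 0.64: {implied_sd:.1f} points")   # ~5.3

    for sd in (8.0, 5.3, 3.0):                # hypothetical pooled SDs
        print(f"SD = {sd:>4} -> SMD = {raw_difference / sd:.2f}")

    So the question is really whether an SD of roughly 5 points is plausible for these samples, or whether ceiling effects pushed it lower.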
     
    Last edited: Oct 5, 2019
    alktipping, Hutan, MEMarge and 5 others like this.
  5. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    To clarify for others:

    Larun et al. set the minimal important difference (MID) for SF-36 physical function at 7 points, citing two studies, one on patients with rheumatoid arthritis and one on patients with heart disease:
    • Ward MM, Guthrie LC, Alba MI. Clinically important changes in short form 36 health survey scales for use in rheumatoid arthritis clinical trials: the impact of low responsiveness. Arthritis Care & Research 2014;66:1783-9.
    • Wyrwich KW, Metz SM, Kroenke K, Tierney WM, Babu AN, Wolinsky FD. Triangulating patient and clinician perspectives on clinically important differences in health-related quality of life among patients with heart disease. Health Services Research 2007;42:2257-74.
    The results of the Cochrane review showed that post-treatment, mean physical functioning scores in the exercise group were 13.10 points higher. So they argue that this is a clinically significant effect.

    The threshold of 7 points as the MID, however, seems quite low. The letter by Jane Giakoumakis argued for an MID of 12 points, based on half the standard deviation of the SF-36 measured in the general population. I've found this study on patients with idiopathic pulmonary fibrosis which estimated the MID for SF-36 physical function at 13.9 points.

    Since SF-36 physical function is a frequently used measure, I suspect there will be more of these MID estimates. It would be useful to get a broad overview to see what the range is and whether Larun et al.'s choice of 7 points was adequate or not.
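    For what it's worth, here is a minimal sketch of the half-SD arithmetic behind the Giakoumakis figure (a general-population SD of about 24 points is implied by her 12-point MID; the 7-point and 13.9-point thresholds and the 13.10-point difference are the numbers quoted above):

    # Half-SD rule of thumb: MID ≈ 0.5 × SD of the measure in a reference population.
    sf36_pf_sd_general_population = 24.0      # implied by the 12-point MID in the letter
    print(f"Half-SD MID estimate: {0.5 * sf36_pf_sd_general_population:.0f} points")

    review_difference = 13.10                 # mean difference reported post-treatment
    for source, mid in (("Larun et al.", 7.0), ("half-SD letter", 12.0), ("IPF study", 13.9)):
        print(f"MID per {source:>14}: {mid:>5} points; "
              f"13.10-point difference exceeds it? {review_difference >= mid}")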
     
    alktipping, Chezboo, Simon M and 9 others like this.
  6. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    995
    Location:
    UK
    The minimum score on the Chalder is effectively 11, making it a 23-point scale. A 3.4-point difference strikes me as credible for 0.5 SD, given that 0.5 SD isn't actually a very big difference (see the graph in my previous post).

    You're right that a ceiling effect would constrain the baseline standard deviation used to calculate SMD. But there is a counter argument that people on the ceiling could "improve" without showing a fall in their score, since they are already off the scale.

    Given also that, as you point out, other estimates of a minimally important difference are in the range 2-4, I am not sure the threshold used for MID is an obvious flaw in the review. Certainly not compared with all the other problems.

    (As @Dolphin points out in that letter to the Lancet, issues of an artificial baseline standard deviation are probably more important for the SF-36 physical function scale, but, surprisingly, exercise doesn't seem to have improved physical function according to this review)
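    A quick simulation of the ceiling point, with entirely made-up numbers (nothing here is trial data), just to show the mechanics: if "true" fatigue sits partly above the scale maximum, the recorded SD shrinks, and the same raw change looks bigger as an SMD.

    import random, statistics

    random.seed(0)
    CEILING = 33                               # maximum Likert-scored CFQ score

    # Hypothetical "true" fatigue for a trial sample, partly above the ceiling
    true_scores = [random.gauss(31, 4) for _ in range(1000)]
    observed = [min(score, CEILING) for score in true_scores]   # scale can't record > 33

    sd_true = statistics.pstdev(true_scores)
    sd_observed = statistics.pstdev(observed)
    print(f"SD of 'true' fatigue: {sd_true:.1f}; SD of recorded scores: {sd_observed:.1f}")
    print(f"A 3.4-point change as an SMD: {3.4 / sd_true:.2f} (true SD) "
          f"vs {3.4 / sd_observed:.2f} (recorded SD)")

    Which cuts both ways, as above: the same censoring also hides genuine change in people stuck at the ceiling.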
     
    alktipping, JohnTheJack, Sean and 4 others like this.
  7. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    Do you mean that some patients had such severe fatigue that they still would have the maximum score even after an improvement in fatigue?

    That could be, but I suspect that in the trials changes on the CFQ were mostly determined by response bias, placebo effects and other non-clinical effects (for example, not being willing to admit that 12 weeks of therapy was pointless) rather than actual changes in health. So people fill in the questionnaire a little 'better'.

    In that case bias causes a small but consistent change on the questionnaire, which looks moderate when expressed as an SMD because patients pretty much all have the same score (near the maximum), causing little background noise.

    Just a thought. I should really look at the data more closely to see if it makes sense...
     
  8. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,792
    We have both the FINE and PACE Trial individual scores (though not by item), if anyone ever wants to look a bit more at individual patient data.

    ETA: Also, I found Fulcher’s PhD somewhere, probably on ETHOS, which also has similar data.
     
  9. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,659
    Location:
    Canada
    Weird. I do not consider any change in the CFQ to be clinically significant. None at all. At best it's a secondary measure and a very poor one over a secondary dimension of this disease. It is not a measure of anything other than the researchers' own misunderstanding of the problem and promotion of their personal beliefs above reality. It's as meaningless an argument as which precise skullcap sizes or brow distance are evidence of genius or criminality.

    Objective measures or bust. If we're "healthy", that means a normal life with no limitations whatsoever, enough with this "recovery is hard to define" crapfest. Anything else is just arguing over the precise fabric of the shoes of the angels dancing on a hairpin and its adherent properties on the metal pinhead. What a waste of everything, meaningless conversations over made-up nonsense.

    Not to rain on the discussion here; sadly, because this nonsense is imposed on us like a meteor crashing down, we do have to discuss it. But holy crap is this all dumb and disastrous. People (not you here, the idiots trying to make CFQ a relevant thing) shouting over which preferred arbitrary measurement on an imaginary scale means something about a subset that isn't even meaningful by itself. Fools asking the wrong questions and wasting millions of lives arguing which imaginary answer is more meaningful than other imaginary answers, or the precise cutoff at which this imaginary answer means one thing or another.

    At least when people were shouting and fighting each other in the Middle Ages over which fictitious spirit or demon was responsible for something, they had ignorance to blame.
     
  10. BruceInOz

    BruceInOz Senior Member (Voting Rights)

    Messages:
    414
    Location:
    Tasmania
    I get the point about using the general population rather than the presumably narrower patient group. But we've seen that the distribution of the SF-36 is definitely not a normal distribution. Is it possible to derive something meaningful from the standard deviation of a non-normal distribution?
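    To illustrate why I'm doubtful, here is a rough sketch with made-up numbers, loosely shaped like the bimodal SF-36 distributions discussed earlier in the thread: when the population is two well-separated clusters, the overall SD mostly measures the gap between the clusters, so half of it doesn't describe typical variation within either group.

    import random, statistics

    random.seed(1)
    # Hypothetical bimodal population: a high-scoring cluster and a low-scoring cluster
    high_cluster = [random.gauss(90, 8) for _ in range(6000)]
    low_cluster = [random.gauss(20, 10) for _ in range(4000)]
    combined = high_cluster + low_cluster

    print(f"SD within high cluster: {statistics.pstdev(high_cluster):.1f}")
    print(f"SD within low cluster:  {statistics.pstdev(low_cluster):.1f}")
    print(f"SD of combined sample:  {statistics.pstdev(combined):.1f}")
    print(f"Half-SD 'MID' from the combined SD: {0.5 * statistics.pstdev(combined):.1f}")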
     
    Hutan, MEMarge, 2kidswithME and 8 others like this.
  11. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    Thanks for all the discussion on this.

    I remember when I first read about the concept of a 'clinically significant difference'/MID/etc., and it seemed like a potentially useful way of assessing patient views on the value of changes in questionnaire scores in nonblinded trials, once they had been told about the various problems with bias. Then I read how these concepts were actually defined in papers, and it seemed like it was often just another way of making it seem that researchers were doing more valuable work than they truly were.

    I find it difficult to believe that many patients would consider a 3.4-point change to be important in trials of the sort assessed in this review. But who knows? It seems no-one bothered to ask us.

    edit: And if patients are asked to indicate the MID on a questionnaire with a huge range of items of different importance, then that doesn't lead to an MID score; it just shows that the exact items identified indicate an MID. If different items lead to the same score, that does not mean they are also viewed as an important difference.
     
    Last edited: Oct 6, 2019
    alktipping, Hutan, MEMarge and 6 others like this.
  12. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,175
    Location:
    London, UK
    I had never heard of this MID before getting into ME studies, and I had spent years working on trials and acting as an expert witness on trials in the law courts. As far as I can see it has no relevance to clinical importance. I used a slide of the 3.4-point change in PACE for my NICE presentation, with the full 33 points on the Y axis. It just looks pathetic. None of this pseudo-statistics has any bearing on reality.
     
  13. MSEsperanza

    MSEsperanza Senior Member (Voting Rights)

    Messages:
    2,947
    Location:
    betwixt and between
    Again, I stop by with a trivial point. I'm sure this has been pointed out already, but still this limitation of the review is not sufficiently addressed in the abstract, and not at all in the conclusion.

    Not specifying the features of patients who did not meet these selection criteria for the sample (i.e. those not able to participate in exercise therapy) seems to me a grave omission.

    (Hope this is understandable & also why I think this matters, not able to explain at the moment.)

    Edited to add:
    About the exercise that participants had to be able to participate in:
     
    Last edited: Oct 6, 2019
  14. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    The following changes were proposed but rejected:

    1) Objective outcomes
    Tom Kindlon and Robert Courtney noted that, with the exception of health resource use, Larun et al. have not reported on objective outcomes. The randomized trials included in the review had data on outcomes such as exercise testing, a fitness test, the six-minute walking test, employment status and disability payments. Objective outcomes tend to be less influenced by bias due to a lack of blinding. The analysis by Vink & Vink-Niese showed that, with some exceptions, objective outcomes generally have not significantly improved following exercise therapy. Back in 2015, the authors responded that “the protocol for this review did not include objective measurements.” But they did seem to agree that objective measures should be carefully considered in an update. No extra objective outcomes were reported in the amended review.

    2) Compliance:
    Kindlon also asked about data on compliance: information on whether the trial participants really followed the exercise therapy as prescribed. He wrote: “it would be interesting if you could obtain some unpublished data from activity logs, records from heart-rate monitors, and other records to help build up a picture of what exercise was actually performed and the level of compliance.” Again, the authors seemed to agree that this is an important point that should be considered in an update of the review. No information is provided on compliance in the 2019 amendment.

    3) Selective reporting in the PACE trial
    Tom Kindlon and Robert Courtney both argued that the PACE trial should not be rated as low risk of bias for selective reporting. They referred to the Cochrane tool for assessing risk of bias (RoB 1), where the low risk of bias was explained as “The study protocol is available and all of the study’s pre-specified (primary and secondary) outcomes that are of interest in the review have been reported in the pre-specified way.” Kindlon and Courtney argued that this was not the case for the PACE trial and that therefore the trial should not be rated as low risk of bias. Their comments were supported by Cochrane editor Nuala Livingstone during an internal audit of Courtney’s complaint to Cochrane. In their 2015 response, Larun et al. acknowledged that changes were made to planned analysis specified in the protocol of the PACE trial but argued that “these changes were drawn up before the analysis commenced and before examining any outcome data.” In the 2019 amendment all risk of bias judgements have remained the same, including the low risk of bias on selective reporting for the PACE trial. The authors justify this as follows: “The protocol and the statistical analysis plan were not formally published prior to recruitment of participants, and some readers, therefore, claim the study should be viewed as being a post hoc study. The study authors oppose this, and have published a minute from a Trial Steering Committee (TSC) meeting stating that any changes made to the analysis since the original protocol was agreed by TSC and signed off before the analysis commenced.”

    4) Proposal to analyze the excluded data from Jason et al.
    For the outcome of physical function at follow-up, the study by Jason et al. was excluded because of large baseline differences: the exercise group had much lower physical function scores (39) than the relaxation group (54). Kindlon noted that “It would be good if other methods could be investigated (e.g. using baseline levels as covariates) to analyse such data.” The authors responded that this would make the analysis very complicated and that this can be more easily addressed in a review based on individual patient data. The 2019 amendment does not use an alternative method to include the results of Jason et al. on physical function at follow-up.

    5) Downgrading fatigue post-treatment to low-quality evidence
    From a publicly released email exchange, we know that the previous Cochrane Editor-in-Chief David Tovey strongly objected to the results for fatigue post-treatment being rated as moderate quality. He wrote: “the conclusion that this is moderate certainty evidence seems indefensible to me.” Tovey argued that it could be further downgraded for inconsistency (because of considerable heterogeneity, reflected by an I² of 80%) or imprecision (because the confidence interval of the effect crosses the line of no longer being clinically significant). The authors, represented by officials of the Norwegian Institute of Public Health (NIPH), argued that heterogeneity was mostly due to the study by Powell et al.: when it was removed, the heterogeneity became acceptable while the effect size remained moderate. Regarding imprecision, they argued that GRADE only advises downgrading when the confidence interval crosses the line of no effect, not the line of a clinically significant effect. In the email correspondence, the authors did seem to agree that these were both borderline cases and open to interpretation. They therefore proposed the following compromise, as explained by Atle Fretheim from the NIPH: “I proposed a compromise: We simply grade the evidence for this outcome as Low-moderate. The authors have accepted to use the term ‘may’ (usually indicating low certainty evidence) when describing the certainty of the evidence, rather than the term ‘probably’ (usually indicating moderate certainty). They have also accepted not to use any categorization of the effect size.” An alternative solution proposed was to use the term “low to moderate quality evidence”. The 2019 amendment, however, uses the words “probably” and “moderate-certainty evidence”.
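    For readers unfamiliar with the I² statistic mentioned above, here is a minimal sketch of how it is computed (the effect sizes and standard errors below are invented for illustration, not the review's data); it also shows how dropping a single outlying study can collapse the heterogeneity, which is the shape of the argument made about Powell et al.:

    # Minimal I² calculation (invented SMDs and standard errors, NOT the review's data)
    def i_squared(effects, standard_errors):
        """Cochran's Q and I² relative to a fixed-effect pooled estimate."""
        weights = [1 / se ** 2 for se in standard_errors]
        pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
        q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
        df = len(effects) - 1
        return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    effects = [0.3, 0.4, 0.5, 0.35, 0.45, 1.6]        # last study is the outlier
    ses = [0.15, 0.15, 0.2, 0.15, 0.2, 0.2]

    print(f"I² with all studies:         {i_squared(effects, ses):.0f}%")
    print(f"I² with the outlier removed: {i_squared(effects[:-1], ses[:-1]):.0f}%")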

    EDIT: The changes made to the Cochrane review are not an update (which would include a new literature search and new studies) but an amendment. This has now been changed in the text.
     
    Last edited: Oct 9, 2019
  15. Hoopoe

    Hoopoe Senior Member (Voting Rights)

    Messages:
    5,424
    It is of course a coincidence that all 5 proposals would have made GET look worse.
     
    alktipping, Annamaria, Sean and 10 others like this.
  16. Tom Kindlon

    Tom Kindlon Senior Member (Voting Rights)

    Messages:
    2,254
  17. Mark Vink

    Mark Vink Established Member (Voting Rights)

    Messages:
    81
    Dolphin, can you please upload the FINE trial individual scores, or do you have a link to these scores for me? Thank you.
     
    Annamaria and MEMarge like this.
  18. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,792
  19. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,659
    Location:
    Canada
    Excellent!

    I have not seen any justification for those from Cochrane in the review and commentary. Did I miss them? Every single one of those points is damning. Combined they frankly amount to malpractice, "justified" or not. Simply describing the problem is not a proper justification. Neither is "this would be too hard".

    This simply does not amount to serious work. Except they are all competent professionals, which suggests a much worse reason to produce something so ridiculously bad, especially knowing all the scrutiny it would be subjected to and how much is already documented.
     
  20. large donner

    large donner Guest

    Messages:
    1,214
    None of this is going to change without legal action!!

    The Lancet won't change, the BMJ won't and Cochrane won't either. I doubt NICE will throw themselves under the bus either, especially with Cochrane now claiming there is evidence of efficacy.

    This needs to go to the Supreme Court. Simple as that.
     
    alktipping, Annamaria and Medfeb like this.
