Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019 and new date December 2024

Discussion in 'Psychosomatic research - ME/CFS and Long Covid' started by MEMarge, Oct 2, 2019.

  1. Seven

    Seven Senior Member (Voting Rights)

    Messages:
    186
    What are the tools at our disposal? Obviously talking will not cut it.
    1) Can the new lady (Karla) be reported to whatever body issued her professional license (let's say a medical board), to have that license removed or at least reviewed?
    2) Can the paper be taken to court, or to any other legal venue, to put pressure on them?

    Writing letters and waiting on their goodwill is not going to work. So what can be done to apply pressure through a different route, whether legal or through professional licensing?
     
    alktipping likes this.
  2. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    1,005
    Location:
    UK
    You are right that Cochrane has a very poor track record of actually listening to patients and doing anything differently. And I suspect the new approach could easily end up being effectively a whitewash.

    Yet the move to involve patients has potential, but will only come to anything useful if patients who understand the flaws with the review become actively involved from the off. We can't risk another situation like PACE, where people who were supposed to be representing the interests of patients didn't know what they were doing, and we know how that ended.

    Editor in chief statement

    If the protocol focuses on objective, real-world outcomes then I think that'll make a huge difference to the next review. Though I think it's important that patients also have an active role in the review itself, and in the protocol really, not simply sitting on the protocol advisory committee.


    So I am arguing for active involvement from patient-experts, which I think means mostly patients who are active here. Which is very easy for me to say, because I don't have the energy to contribute very much at all to the process.
     
    alktipping, JemPD, Hutan and 7 others like this.
  3. large donner

    large donner Guest

    Messages:
    1,214
    All this from Cochrane about ....

    ....is just fucking bullshit to be honest.

    This is like BMW having a customer advisory group after their cars keep going on fire and asking them to have input into cars that go on fire.

    If they can't see the BS on their own, why are they running a scientific journal?
    None of that should be necessary. Just do the effing science properly, you idiots.

    That's YOUR bloody job.
     
    Last edited: Oct 10, 2019
    alktipping, JemPD and Annamaria like this.
  4. Cheshire

    Cheshire Senior Member (Voting Rights)

    Messages:
    4,675
    Sorry if this is a bit off topic, I can't remember and can't find the answer. Is the PACE trial included in the Cochrane review?
     
    JohnTheJack likes this.
  5. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,083
    Location:
    Belgium
    I thought it might be useful to go through the main arguments against the review, to see if these make sense and if they would impact the results. I’m numbering the arguments to make it easier to discuss them.

    1) Pooling different fatigue questionnaires
    I think it might be true that it’s problematic to combine the data of different fatigue questionnaires when they aren’t measuring the same thing. One questionnaire might be measuring fatigue severity, the other whether patients are experiencing fatigue symptoms more than usual or when they were last well. That can be a problem, but I don’t think this will impact the conclusion of this review because all of the fatigue outcomes used some version of the Chalder Fatigue Questionnaire. The only exception is the American study by Jason et al. (2007) which presented data at follow-up, not post-treatment. It was a small study (n = 25) and the only one that didn’t find a reduction of fatigue (it had a positive SMD). So excluding it wouldn’t impact the strength of the conclusion.

    2) Problems with the Chalder Fatigue Scale
    That brings us to the following point. If we focus on the main comparison of exercise therapy versus a passive control, all outcomes present low or very low quality of evidence. The only exception is fatigue-post-treatment which was considered moderate-quality. As fatigue was a primary outcome in the review, the authors focus on this outcome in their conclusion. They write: “Exercise therapy probably has a positive effect on fatigue in adults with CFS compared to usual care or passive therapies.”

    Yet, all measurements of fatigue post-treatment used some version of the Chalder Fatigue Scale, a scale which has serious problems with ceiling effects and interpretability. Nowhere in the review are the problems with the Chalder Fatigue Scale discussed or mentioned by the authors, even though this could have significantly impacted the results. For example: in the large (n = 114) study by Powell et al. (2001), patients had a fatigue score of 10.25 out of 11 points at baseline. In another large (n = 85) study, the FINE trial, patients had a score of 10.45 out of 11 points for fatigue at baseline. It isn’t hard to see that 'clinical effects' can only move in one direction in such cases. Given that the effects for fatigue post-treatment are small, it seems that this problem could have significantly impacted the conclusion. I think one can make a strong case that the authors should have brought up the problems with the Chalder Fatigue Scale, for example in the discussion section.
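
To put rough numbers on the ceiling problem, here is a minimal sketch using only the two baseline means quoted above and the 11-point maximum. The function name is purely illustrative, and the calculation says nothing about how individual patients were spread around those means.

```python
def headroom(baseline_mean: float, scale_max: float) -> float:
    """Average number of points a score can still rise before hitting the scale maximum."""
    return scale_max - baseline_mean


SCALE_MAX = 11  # maximum score quoted in the post

# Baseline fatigue means quoted in the post
baselines = {"Powell et al. (2001)": 10.25, "FINE": 10.45}

for trial, baseline in baselines.items():
    room_to_worsen = headroom(baseline, SCALE_MAX)
    print(f"{trial}: {room_to_worsen:.2f} points of room to worsen, "
          f"{baseline:.2f} points of room to improve")
```

With less than one point of room on the "worse" side, any deterioration is largely invisible on this scale, while improvement has almost the whole range available.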

    3) Selective reporting in the PACE trial
    I’m no expert on the PACE trial but I think it’s pretty clear that it should be rated high risk of bias for selective reporting. The authors used the Risk of bias tool from the previous Cochrane handbook (Higgins et al. 2011). Chapter 8 explains that Criteria for the judgement of ‘High risk’ of bias for selective reporting are:

    “Any one of the following:
    • Not all of the study’s pre-specified primary outcomes have been reported;
    • One or more primary outcomes is reported using measurements, analysis methods or subsets of the data (e.g. subscales) that were not pre-specified;
    • One or more reported primary outcomes were not pre-specified (unless clear justification for their reporting is provided, such as an unexpected adverse effect);
    • One or more outcomes of interest in the review are reported incompletely so that they cannot be entered in a meta-analysis;
    • The study report fails to include results for a key outcome that would be expected to have been reported for such a study.”
    The PACE trial meets most of these criteria. I think the strongest case is that the primary outcome of overall improvers as defined in the protocol was never reported in any of the PACE trial papers. Larun et al. argued that the changes to the PACE protocol and the way outcomes were reported, were made before examining any outcome data. But in an unblinded trial researchers probably get a clue of the direction and size of the effects without looking at the data. If the results look smaller than expected, they can change their analysis accordingly to make these small changes look more significant. The PACE authors have also never reported the data for the fitness test (there's only a graph in the mediation analysis paper). For a trial that assesses exercise therapy, this seems like a key outcome, one which can be expected to be reported more clearly.

    So I think there’s a strong case for this argument. I’m not sure however if it would significantly alter the results. The Cochrane risk of bias tool is just a way of standardizing assessment of risk of bias and making sure it is done in a transparent way. A different score doesn’t automatically lead to a change in the quality of evidence or the phrasing used to draw a conclusion. These are based on the GRADE assessment system. I suspect that high risk of bias due to selective reporting in one of the trials would not suffice for downgrading the quality of evidence.
     
  6. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,083
    Location:
    Belgium
    Yes
     
    JohnTheJack, Simon M, Dolphin and 2 others like this.
  7. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,848
    Location:
    Canada
    Just shows how confused the whole thing is. According to Cochrane's language guide, low evidence uses may while moderate evidence uses probably. Here he uses may while arguing for moderate evidence of "benefit", whatever that benefit may be is not at all described in any meaningful way. Still stuck at basic vocabulary problems. In addition to whatever is meant by "improvement" or "recovery".

    And it's completely indefensible to have the word uncertain used over 70 times and find it reasonable to argue this is moderate evidence, or even evidence of anything at all. In addition to the massive uncertainty of who this advice may apply to, how to differentiate and not properly accounting for severe patients who were entirely excluded from the papers used.

    Complete mess. As expected. Literally no one is happy about this, no doubt, while this is basically taking a dump on us, so so much for Cochrane being about patients first.
     
    alktipping, Anna H, Annamaria and 4 others like this.
  8. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,848
    Location:
    Canada
    This is absurd considering those trials are only possible for patients who have the capacity to participate in the first place, heavily skewed towards the less disabled. The ceiling effect seems so severe it essentially maxes out for at least half of ME patients, if not more. Didn't realize how badly before. This is completely disqualifying for a mere questionnaire that doesn't even "measure", if the word is even appropriate here, much that is relevant to ME.
     
    alktipping, Anna H, Annamaria and 5 others like this.
  9. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    FINE was designed to allow house-bound patients to participate, though it ended up also allowing those who were too healthy to be included in PACE to enter too, so I'm a bit surprised their average scores were so close to the maximum.
     
    alktipping, Annamaria, ukxmrv and 4 others like this.
  10. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
    Reminds me of how it apparently was before the railways in Britain, when the time was only roughly the same across the country. Towns and villages went by their own church clock, and it didn't really matter there was no universal agreement of any standard time. Once the railways arrived, with the need for timetables, and the need for them to arrive at a destination after leaving their starting point ( :) ) it became necessary to agree a standard.

    With something like fatigue it is going to be immensely difficult to home in on any kind of standard, because how do you quantify in any standard way - you cannot. Which is why it is all such a lot of cock.

    As a thought experiment, you could imagine taking your calibrated fatigue zapper to someone, and saying that is what level 5 fatigue feels like - remember that when you answer the questions! But there is no way to get any sort of calibrated readings, so you just get readings that mean one thing to one participant in one study, and another elsewhere.
     
  11. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,083
    Location:
    Belgium
    4) GRADE: do the Oxford criteria constitute indirect evidence?
    The most recent version of the GRADE handbook can be found here (https://gdt.gradepro.org/app/handbook/handbook.html). In contrast to the Cochrane risk of bias tool, GRADE doesn’t assess individual trials but outcomes such as fatigue for all the trials included in the review. There are basically 4 ratings for assessing quality of evidence: High, moderate, low or very low. Evidence from randomized controlled trials (RCT’s) starts out having high quality but there are 5 reasons why it can be downgraded. For each of the 5 reasons the evidence can be downgraded by one or two levels.

    Risk of bias (Limitations in study design or execution) is one of the factors that can reduce the quality of evidence. Except for adverse reactions, all outcomes in the Cochrane review were downgraded with one level due to a lack of blinding. In an email of 27 May 2019, Tovey said that some thought that fatigue (post-treatment) should also be downgraded for selective outcome reporting. I’m not sure but this probably refers to problems with the PACE trial.

    Publication bias is another possible reason for downgrading. It refers to a situation where studies that report a positive treatment effect are more often published (and used in reviews) than studies that found null results. Researchers can test this by looking at the effect sizes and sample size of studies. Larger studies tend to have less extreme results so one would expect that they are somewhere in the middle of a scatter plot. If all the small studies have larger effect sizes than the large studies, that would suggest there are some small studies with small or negative effect sizes missing. Unfortunately this could not be tested because the review on GET only includes 8 RCT’s. Cochrane advised to only do a scatter plot when there are at least 10 RCT’s because when there are fewer studies the power of the tests is too low to distinguish chance from real asymmetry.

    I personally suspect that publication bias is an issue in the GET/CBT literature. In summary, it seems that the small trials indicated that there was an effect but this could not be replicated by the larger studies (PACE and FINE). This conclusion is somewhat obscured by the lack of trials and the outlier of Powell et al. (2001) which found much larger effects than the other studies. There is however a 2015 review by Marques et al. that looked at CFS trials for all “behavioral interventions with a graded physical activity component”. They wrote that they had “found some indication of publication bias.” But that isn’t relevant to the Cochrane review, where no conclusion about publication bias could be made.

    Indirectness of evidence is another factor for downgrading the quality of evidence with GRADE. Sometimes you don’t have any data on what you’re interested in, but there are high-quality RCT’s available for a population group or intervention that is very similar. In that case, the evidence can be used but it should be rated down for indirectness. Research done on animals, for example, would normally be downgraded with two levels. What also frequently happens is that the evidence is based on surrogate outcomes. So instead of diabetic symptoms, for example, the studies measured glucose levels. That’s another reason for downgrading for indirectness.

    Perhaps one could make the case that the Oxford criteria do not select ME/CFS as it is currently defined and that the evidence presented by Larun et al. should therefore be downgraded, for pretty much all outcomes. Just to give an example: 88% (post-treatment) to 93% (follow-up) of patients in the exercise group providing outcomes for the primary outcome of fatigue were selected using the Oxford criteria.

    I think downgrading for indirectness is an option for policymakers such as NICE (which also adopts GRADE). The new NICE guideline committee will probably give a short description of ME/CFS as it did last time and this will, I suspect, differ significantly from how the Oxford criteria define CFS. So if they would like to give recommendations for ME/CFS patients based on the GET trials that used the Oxford criteria, it would be reasonable to downgrade for indirectness. After all, there’s an NIH report that said that “continuing to use the Oxford definition may impair progress and cause harm.”

    Regarding the Cochrane review, I think that the authors could at least have explained the problem better. They basically said: ‘the included trials used diagnostic criteria X and Y and patients diagnosed with other criteria may experience other effects.’ That’s probably true for all reviews so the added sentence is pretty much meaningless. What they should have said is that the diagnostic criteria in their review do not require post-exertional malaise or a marked worsening of symptoms following exertion while all the other and more recent criteria do because it is seen as a hallmark symptom of ME/CFS. This is particularly relevant in a review that wants to assess the efficacy and safety of exercise therapy.
     
  12. large donner

    large donner Guest

    Messages:
    1,214
    I just can't fathom how claims that would not pass an Advertising Standards Authority check, and would result in withdrawal of the claim or even a fine, can sit in a scientific journal, a supposedly more rigorous and influential medium, with zero consequences.

    It's even more pathetic when one tries to make some sense of the claim and the levels of doublespeak that Cochrane go to.

    If the Cochrane report was an advert on TV would it even get past the ASA?

    Would the ASA just leave the main conclusion in place and keep running the Ad happy in the knowledge that at some point in the future the people who made the Ad would consult with their consumers to make a new commercial?

     
    Last edited: Oct 10, 2019
  13. Trish

    Trish Moderator Staff Member

    Messages:
    55,962
    Location:
    UK
    As someone who prefers honesty and simplicity, I would like to see Cochrane dump their tortuous scoring processes and recognise that any trial that is unblinded and has subjective outcome measures is a non starter, and should not even make it into the review process.
     
  14. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,848
    Location:
    Canada
    And had a null result. I don't understand why it's even in there. Likely because of how it was used as a base to generate the GP training and it would look bad. Anyway it's as uninterpretable as the rest.
     
  15. JemPD

    JemPD Senior Member (Voting Rights)

    Messages:
    4,509
    @Michiel Tack you are doing a fantastic job with all the summarising, it's beyond me, thank you so much for your efforts.


    Amen. Thank you LD for articulating my feelings on the matter so succinctly.

    Sometimes your turn of phrase really makes me laugh Barry, thanks for that:rofl:
    I've been feeling so despairing about all this that it was much needed light relief.
    It really is all truly such a lot of cock.
     
  16. large donner

    large donner Guest

    Messages:
    1,214
    More rubbish!

    They had over three years to do all of that and then after they decided to retract they bottled it and allowed the writers of the review off the hook and republished the same warped conclusion.

    If they were serious about the above statement they would have involved the patient groups etc before they republished the latest non retracted crap that they know damn well is utter crap.
     
  17. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    Thanks again for your comments @Michiel Tack :

    re Indirectness of evidence - does this also relate to the fact that we do not have a direct measure of fatigue?
     
  18. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,083
    Location:
    Belgium
    I don't know: what would be a direct outcome of fatigue in your view?

    Most examples are actually the other way around: it's about objective outcomes that are used as a proxy for how patients are doing/feeling. For example exercise capacity or the number of red blood cells as a proxy for quality of life or bone density as a proxy for the number of fractures etc.
     
  19. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    I don't think we have a direct outcome for fatigue. That absence doesn't make the Chalder Fatigue Questionnaire any more valid though! I'm sure I was recently reading something about the importance of showing a questionnaire was a valid measure of what it was supposed to be measuring, and wondered if that was GRADE related, but I've forgotten the details now.
     
  20. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,083
    Location:
    Belgium
    5) Fatigue post-treatment should be rated as low instead of moderate quality evidence
    The other two factors GRADE uses to downgrade quality of evidence are inconsistency and imprecision. I would like to look at these more closely because they are at the heart of David Tovey’s argument that the outcome for fatigue post-treatment should be rated low instead of moderate quality. In an email of 27 May, Tovey wrote: “I can see three possible reasons for a downgrade: lack of blinding/subjective outcomes, imprecision, and inconsistency, so the conclusion that this is moderate certainty evidence seems indefensible to me, and as we know, I am not alone in this.”

    Inconsistency refers to an unexplained heterogeneity of results. The GRADE handbook writes: “Criteria to determine whether to downgrade for inconsistency can be applied when results are from more than one study and include:
    • Wide variance of point estimates across studies (note: direction of effect is not a criterion for inconsistency)
    • Minimal or no overlap of confidence intervals (CI), which suggests variation is more than what one would expect by chance alone
    • Statistical criteria, including tests of heterogeneity which test the null hypothesis that all studies have the same underlying magnitude of effect, have a low p value (p<0.05) indicating to reject the null hypothesis.”
    It also refers to the I2 statistic, which is a measure for heterogeneity. I2 of 30-60% is seen as moderate, 50-90% substantial while 75-100% is seen as ‘considerable heterogeneity’.
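
For anyone wondering where an I2 figure comes from, here is a minimal sketch of the usual calculation from Cochran's Q with inverse-variance weights. The effect sizes and standard errors at the bottom are made up for illustration and are not the review's data; the function name is my own.

```python
def i_squared(effects, std_errors):
    """Return I2 (in percent) from study effect sizes and their standard errors.

    Uses fixed-effect inverse-variance weights to compute Cochran's Q,
    then I2 = max(0, (Q - df) / Q) * 100.
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # identical effects: no heterogeneity to explain
    return max(0.0, (q - df) / q) * 100


# Hypothetical SMDs and standard errors, NOT taken from the review
example_effects = [-0.3, -0.4, -0.5, -1.5]
example_ses = [0.15, 0.20, 0.25, 0.20]
print(f"I2 = {i_squared(example_effects, example_ses):.0f}%")
```

The point of the sketch is that a single study sitting far from the rest can push Q, and therefore I2, up sharply even when the other studies agree with each other.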

    The graph below gives the info about the analysis of fatigue post-treatment.
    [Image: graph of the fatigue post-treatment analysis]
    As one can see, I2 is 80% which indicates considerable heterogeneity. The test for heterogeneity clearly rejects the null hypothesis that all studies have the same underlying magnitude of effect. There is a wide variance of point estimates going from an SMD of -0.27 in the FINE Trial to -1.52 in the trial by Powell et al. 2001. That trial shows some overlap with the small studies of Fulcher and Moss-Morris, but not when compared to the other large studies.

    So I think we can say there is large heterogeneity in this review for the outcome of fatigue post-treatment. The authors acknowledged this. In the results section they write: “The analysis suffered from considerable heterogeneity (I2 = 80%, P< 0.0001) that we explored in sensitivity analysis.”

    For downgrading the quality of evidence, heterogeneity has to be unexplained. GRADE advises to do sensitivity analyses to check if it can be explained by differences in populations, interventions, outcomes or study method. This was not the case in the Cochrane review where heterogeneity was mostly caused by the trial by Powell et al. 2001. Exclusion of this trial leads to an acceptable heterogeneity (I2 = 26%, P = 0.24).

    In their justification for not downgrading for inconsistency, Larun et al. write:
    GRADE however explicitly says that “differences in direction, in and of themselves, do not constitute a criterion for variability in effect if the magnitude of the differences in point estimates is small.” It doesn’t matter whether the effect sizes are on the same side of the border crossing a positive or negative effect for an intervention. That border is arbitrary in statistical terms. What matters is the size of the difference. Similarly, the GRADE handbook doesn’t write that heterogeneity is not an issue if it is due to only one outlier. What matters is whether that outlier significantly alters the results. In a review with only 8 RCT’s most of which are really small ones, one big study can have a large impact on the results. And that seems to be the case here: If the trial by Powell et al. is removed the effect size for fatigue post-treatment is reduced by a third, from an SMD of -0.66 to an SMD of -0.44.
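
The "reduced by a third" figure can be checked with a couple of lines. This is just the arithmetic on the two pooled SMDs quoted above; the helper name is illustrative.

```python
def relative_reduction(smd_before: float, smd_after: float) -> float:
    """Fraction by which the magnitude of the pooled effect shrinks."""
    return (abs(smd_before) - abs(smd_after)) / abs(smd_before)


# Pooled SMDs quoted in the post: with and without Powell et al. (2001)
print(round(relative_reduction(-0.66, -0.44), 2))  # 0.33
```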

    In the email correspondence the authors argue that an SMD of -0.44 is still a moderate effect, so this doesn’t change much. But the terms ‘small’, ‘moderate’ and ‘large’ for effect sizes are arbitrary and there are different rules for applying them (Jacob Cohen, the person who introduced these, said an SMD of 0.5 should be considered moderate). These are just names to help interpret statistical data. They do not play an important role in the GRADE handbook. More important is that excluding the outlier reduces the effect size by a third. So I think there’s an argument for reducing the quality of evidence for fatigue post-treatment for inconsistency.

    .................................................................................................................................................................................................................................

    Imprecision is the fifth reason for downgrading the quality of evidence in GRADE. Results are imprecise when studies include relatively few patients and few events and thus have a wide confidence interval (CI) around the estimate of the effect. So one can have a moderate SMD that indicates that the intervention reduces fatigue, but if the confidence interval is wide and includes values that suggest the intervention doesn’t work at all, the quality of evidence can be downgraded for imprecision.

    David Tovey argued that this is the case for fatigue post-treatment in the review on GET. The SMD was -0.66 indicating a moderate effect, but the confidence interval (CI −1.01 to −0.31) includes values where there is no longer a clinically significant effect. The authors, however, argued that the confidence interval does not cross the line of no effect and that this is what matters. Atle Fretheim explained: “Our reading of the GRADE-handbook tells us that in a case like ours, if the 95% CI crosses the line of no effect, downgrading is warranted. You opine that downgrading is warranted if the 95% CI crosses the line for minimal clinically important difference.”

    It’s true that the GRADE handbook advises not to downgrade when the 95% CI excludes no effect, but it gives that advice for dichotomous outcomes. These usually express the risk that something (usually something bad) will happen such as a stroke or infection. The patient-reported questionnaires used in the GET review are not dichotomous but continuous outcomes. For continuous outcomes, GRADE writes: “Whether you will rate down for imprecision is dependent on the choice of the difference (Δ) you wish to detect and the resulting sample size required. Again, the merit of the GRADE approach is not that it ensures agreement between reasonable individuals, but that the judgements being made are explicit.”

    In short: I think the GET review meets that sample size requirement, so there is an argument for not downgrading for imprecision as Larun et al. did. Tovey's argument, however, makes some sense as well. For guideline panels GRADE advises that “the decision to rate down the quality of evidence for imprecision is dependent on the threshold that represents the basis for a management decision.” Given that the authors themselves have defined the minimal important difference at 2.3 points on the Chalder Fatigue Scale, and the lower bound of the confidence interval corresponds to a difference of 1.6 points, this would suggest downgrading the quality of evidence for a guideline panel such as NICE.
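
The disagreement between the two readings can be written out as two simple checks. This is only a sketch of the two decision rules as described in this post, not GRADE's own procedure; the function names are made up, and the inputs are the figures quoted above (SMD confidence interval of -1.01 to -0.31, a weakest plausible effect of 1.6 Chalder points, and a minimal important difference of 2.3 points).

```python
def crosses_no_effect(ci_low: float, ci_high: float) -> bool:
    """Rule cited by the authors: does the 95% CI include zero?"""
    return ci_low <= 0 <= ci_high


def includes_sub_mid(weakest_effect_points: float, mid_points: float) -> bool:
    """Rule argued by Tovey: is the weakest plausible effect below the MID?"""
    return weakest_effect_points < mid_points


# Figures quoted in the post
smd_ci = (-1.01, -0.31)
weakest_effect = 1.6   # Chalder points, end of the CI closest to no effect
mid = 2.3              # minimal important difference defined by the authors

print("Downgrade under 'no effect' rule:", crosses_no_effect(*smd_ci))           # False
print("Downgrade under 'MID' rule:      ", includes_sub_mid(weakest_effect, mid))  # True
```

The same confidence interval gives opposite answers depending on which threshold is chosen, which is why the choice has to be made explicit.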

    For authors of a systematic review, however, GRADE advises simply focussing on sample size calculation, because it’s not the job of reviewers to define a clinically useful threshold and to determine the economic costs or the tradeoff between desirable and undesirable consequences. But I think one could argue that in cases where the effect size is so close to the clinically significant difference there is no need for such complex considerations, and it is reasonable to downgrade for imprecision. After all, what is the point of saying there is moderate-quality evidence that exercise therapy reduces fatigue if the size of that reduction is quite likely not clinically significant?

    ..........................................................................................................................................................................................................................

    So for both Inconsistency and Imprecision, there are reasons to downgrade (for the first more than for the second in my opinion) but it’s not a clear case, more a matter of judgement. The authors acknowledged this in the email correspondence by proposing the term low-moderate quality of evidence as a consensus. The GRADE handbook, however, writes that if there is a borderline case to downgrade for two factors, the authors should downgrade for at least one of them and explain the situation in a footnote. It writes: “If, for instance, reviewers find themselves in a close-call situation with respect to two quality issues (risk of bias and, say, precision), we suggest rating down for at least one of the two.”

    Actually this is what Larun et al. did for physical function post-treatment. Both inconsistency and imprecision were borderline cases, so they rated down with one level. They explain it as follows:
    The same thing could be said about fatigue post-treatment, only here the interval is narrower; the problem is more that it includes values of no clinically significant effect.

    I think there’s a case that the quality of evidence for fatigue post-treatment is low, not moderate. Even if we exclude the main problem of using subjective outcomes in unblinded trials, there are so many issues here (ceiling effects on the Chalder Fatigue Scale, heterogeneity caused by the outlier of Powell et al., an effect size barely crossing the threshold of clinical significance) that I think it’s wrong to state that “exercise therapy probably has a positive effect on fatigue.”
     
    Last edited: Oct 10, 2019
