Independent advisory group for the full update of the Cochrane review on exercise therapy and ME/CFS (2020), led by Hilda Bastian

Discussion in '2021 Cochrane Exercise Therapy Review' started by Lucibee, Feb 13, 2020.

  1. FMMM1

    FMMM1 Senior Member (Voting Rights)

    Messages:
    2,812
    Hi folks,
    On September 21st I emailed NICE* to express concern about their use of Cochrane reviews to evaluate evidence, referring to the recent NICE review, which rated the studies as "low" or "very low" quality - studies Cochrane had evaluated as "moderate"
    [Myalgic encephalomyelitis (or encephalopathy)/chronic fatigue syndrome: diagnosis and management].

    NICE have replied to my email:
    "Dear Francis,

    Thank you for contacting the National Institute for Health and Care Excellence (NICE) regarding the Cochrane.

    Would it be possible to ask for some clarification on the following so we can look into this further for you stated:

    “Cochrane found these studies to be "moderate" quality evidence i.e. despite the fact that they were unblinded/inadequately blinded and used subjective outcome criteria”

    The wording of the enquiry suggests that the Cochrane review mentioned is of psychological interventions.

    The most recent Cochrane review of psychological therapies (CBT) in CFS<https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD001027.pub2/full> does not use the word ‘moderate’ anywhere in describing evidence quality. Additionally, there is an editorial note on this Cochrane review which states ‘This 2008 review predates the mandatory use of GRADE methodology to assess the strength of evidence, and the review is no longer current.’ It would therefore not be appropriate to compare findings from this Cochrane review with NICE because it does not use GRADE methodology and is described as not current.

    However there is a Cochrane review of exercise therapy for CFS<https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200.pub8/full> which does include some findings rated as moderate in GRADE tables. Can I confirm if you were referring to this?"

    I really don't know much about the ME/CFS NICE review, just that it seems to have appropriately rated the evidence for CBT and/or GET (exercise) as "low" or "very low" quality. I'd assumed that Cochrane had rated these studies (PACE etc.) as "moderate". So I'm not confident that I can respond to NICE's query. If anyone can explain whether there's a mismatch between Cochrane's evaluation of CBT and/or GET (exercise) and NICE's, I'd be grateful.

    Thanks in advance.


    *Email to NICE September 21st
    "This month Prof. Gillian Leng (NICE chief executive - cc) announced that NICE "have signed a collaborative agreement with Cochrane. Cochrane has a well established reputation for producing high quality systematic reviews which take into account the very latest evidence".
    The recent review by NICE, of studies relating to the use of psychological interventions to treat myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS), found that they were "low and very low quality"* evidence. Cochrane found these studies to be "moderate" quality evidence i.e. despite the fact that they were unblinded/inadequately blinded and used subjective outcome criteria (questionnaires) - rather than objective outcome criteria (FitBit type devices which reliably monitor activity).
    NICE's use of Cochrane reviews also creates a risk for those with Long covid, and Lyme disease, i.e. since "low and very low quality" evidence will be considered "moderate" quality - suitable evidence to support the use of psychological interventions like CBT.

    I ask those on the APPG for ME and APPG for Coronavirus, NICE (Prof. Gillian Leng) and others, to consider how the issue of NICEs reliance on flawed Cochrane reviews can be addressed.
    Thank you in advance for your assistance,
    Francis"
     
  2. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,103
    Location:
    Belgium
    Both reviews rate outcomes (e.g. fatigue, sleep, depression) for a particular comparison (for example, exercise therapy versus treatment as usual) rather than individual trials (like PACE).

    You could argue that there is a discrepancy between the NICE evidence review (which rated all GET outcomes as low to very low quality) and the Cochrane review (Larun et al. 2019), which rated one outcome - fatigue measured post-treatment, for the comparison of GET versus treatment as usual - as moderate quality. Larun and colleagues used that particular outcome to argue that GET reduces fatigue.

    For most other outcomes, however, the Cochrane review also gave low quality ratings, so there isn't that much of a difference from the NICE evidence review.

    Hope this helps!
     
  3. Caroline Struthers

    Caroline Struthers Senior Member (Voting Rights)

    Messages:
    990
    Location:
    Oxford UK
    There was a huge to-ing and fro-ing about this between Cochrane and Larun's boss, Atle Fretheim, with an "independent" (NOT) GRADE inventor cum arbitrator, Gordon Guyatt, brought in at the end. The authors refused to downgrade the GRADE quality rating on that one outcome (fatigue post-treatment) even to "low-to-moderate", as advised by Cochrane. Cochrane wanted to withdraw the review completely, but I think if the authors refuse to agree, Cochrane can't do so. Some sort of weird code of conduct between journal editors and academics which transcends everything else, even the safety of patients. Weird for a charity, isn't it?

    The correspondence obtained via FOI is excruciating. Arguing the toss to save reputation of Cochrane and its contributors.

    Correspondence with David Tovey


    Correspondence with Karla Soares-Weiser
     
  4. Caroline Struthers

    Caroline Struthers Senior Member (Voting Rights)

    Messages:
    990
    Location:
    Oxford UK
  5. BruceInOz

    BruceInOz Senior Member (Voting Rights)

    Messages:
    414
    Location:
    Tasmania
    How much of the difference is accounted for by NICE downgrading studies that did not require PEM? I presume Cochrane did not do this. If they had, would the two be more in line?
     
  6. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
  7. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,482
    Location:
    London, UK
    I think it may be a red herring to worry about the discrepancy between the NICE and Cochrane grading of the evidence.

    The key point is that both are committed to using an invalid system, GRADE.
    The discrepancy just shows that not only does the use of GRADE lead to results that are unrealistic, it does so inconsistently. The result is at the whim of vested interests - which is precisely what it is designed not to be.
     
  8. Hoopoe

    Hoopoe Senior Member (Voting Rights)

    Messages:
    5,446
    A study that could easily be confusing reporting bias with a true treatment effect should not count as valid evidence for treating patients.

    From what I understand, under GRADE such studies would generally count as valid evidence. In the NICE guidelines, the studies reached the appropriately low grade only when other factors were also taken into consideration.

    GRADE seems to be a system that allows even homeopathy to reach the status of evidence based treatment for almost any condition (if sufficiently subjective endpoints are chosen).
     
    Last edited: Nov 11, 2021
  9. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,482
    Location:
    London, UK
    Indeed, @strategist, it is no more complicated than that.
    The last thing we want is an argument about who is best at doing GRADE right.
     
  10. Caroline Struthers

    Caroline Struthers Senior Member (Voting Rights)

    Messages:
    990
    Location:
    Oxford UK
    And it also allows treatments that don't seem to harm people in trials (especially if you don't look for, or report, harms) to reach the status of evidence-based - even if they don't work. Doesn't Guyatt talk about moderate quality evidence for a "non-zero effect"? What sort of GRADE-induced nonsense is that??!
     
  11. Caroline Struthers

    Caroline Struthers Senior Member (Voting Rights)

    Messages:
    990
    Location:
    Oxford UK
    GRADErs are as bad as, if not worse than, Cochranites in their belief in the process they have devised, even when their own studies show it's not fit for purpose. I will dig out the paper from a presentation I attended where an arch GRADE person (really nice guy) showed that a particular aspect of GRADE (I think its accuracy in predicting whether future studies will change the strength of evidence) did not actually work in practice. He seemed puzzled and upset. But as far as I can tell, he is still one of the faithful!
     
  12. Caroline Struthers

    Caroline Struthers Senior Member (Voting Rights)

    Messages:
    990
    Location:
    Oxford UK
    I think this is the paper https://pubmed.ncbi.nlm.nih.gov/26342443/
     
  13. Hoopoe

    Hoopoe Senior Member (Voting Rights)

    Messages:
    5,446
    In engineering disciplines one method to find flaws in systems is to actively search for ways to make systems fail.

    If we translate that to GRADE, one way to show it fails would be, I suspect, to find clinical trials that show a lack of effect on blinded or objective outcomes but also report some improvement on subjective outcomes. If we take only the subjective outcomes, can we get a nice GRADE score for the treatment? I suspect yes, even though the more reliable outcomes are already telling us it doesn't work.

    That could be proof that GRADE will fail even when used properly.
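
    As a toy illustration of that failure mode (a simplified sketch: the five downgrade domains are GRADE's published ones, but the one-level-per-domain scoring and the example judgements here are invented for illustration):

    ```python
    # Toy sketch of GRADE-style certainty rating. Evidence from randomised
    # trials starts at "high" and is downgraded one level per domain judged
    # to raise a serious concern. (Real GRADE also allows two-level
    # downgrades and upgrades; this sketch ignores those.)
    LEVELS = ["very low", "low", "moderate", "high"]
    DOMAINS = ["risk_of_bias", "inconsistency", "indirectness",
               "imprecision", "publication_bias"]

    def grade(concerns):
        """concerns: set of domains judged 'serious'. Returns certainty level."""
        level = len(LEVELS) - 1  # randomised trials start at "high"
        level -= len(set(concerns) & set(DOMAINS))
        return LEVELS[max(level, 0)]

    # A trial with null objective outcomes but some subjective improvement:
    # if the reviewer assesses only the subjective outcome and does not treat
    # lack of blinding as a serious risk of bias, a single downgrade (say,
    # for imprecision) still leaves the evidence at "moderate".
    print(grade({"imprecision"}))                  # moderate
    # Counting unblinded trials with subjective endpoints as a serious risk
    # of bias drops it to "low" - the rating hinges on that judgement call.
    print(grade({"imprecision", "risk_of_bias"}))  # low
    ```

    The point of the sketch is that the output depends entirely on which concerns the reviewer chooses to count as "serious" - the mechanism itself never sees the discordant objective outcomes.
    
    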
     
  14. adambeyoncelowe

    adambeyoncelowe Senior Member (Voting Rights)

    Messages:
    2,737
    The way they pick reviewers is also problematic. NICE at least strives for a balanced and representative guideline development committee, with a separate technical team, and then has things like the consultation to put its findings out into the world and get feedback on them.

    Cochrane seems almost designed to leave decision-making to those with COIs and is neither as transparent nor as rigorous. And if the editors can't change those decisions, it's very easy for people to say they won't budge without very good reason.
     
  15. petrichor

    petrichor Senior Member (Voting Rights)

    Messages:
    322
    By my reading of this correspondence, this makes the previous Cochrane review extremely misleading. The reason they chose not to downgrade for imprecision was that they decided post hoc that the question the review was answering wasn't whether exercise therapy had a clinically meaningful effect (a question people are actually interested in), but whether it has any non-zero effect at all, trivial or not (a question no one is interested in).

    Given such a strange goal for a review, it ought to have made explicitly clear that that's the question it was answering when claiming there was moderate quality evidence. As Tovey himself suggested, they should have used the phrasing "might, or might not be, clinically important". Instead they just chose not to make any specification about the magnitude of the effect - which is misleading, because they would know everyone would assume they're referring to an effect that actually has some clinical importance (after all, helping make healthcare decisions is the purpose of Cochrane).

    I can't find anywhere in the review, let alone the abstract, where they make it clear that the question they're answering isn't whether exercise therapy has a clinically meaningful effect, but whether it has any non-zero effect at all, important or not. The authors did not follow Guyatt's advice to qualify the moderate certainty evidence rating. You would actually be led to think they're answering the first question, as they define and discuss minimally important differences in the review.

    To quote, they say "Clinical studies and meta-analysis can detect small differences in outcomes with little or no importance to individual participants. Moreover, the interpretation of what is considered an important difference may vary between patients, researchers and clinical experts (Wyrwich 2007). We therefore identified research literature to help quantify minimal important differences (MID) for important outcome measures"

    That entirely makes it sound like the goal of the review is to determine the level of certainty of the evidence with regard to what's actually clinically important. The "Implications for practice" section also implies that the purpose of the review was determining whether there is a clinically important effect.

    This, I think, essentially amounts to a deception, and that correspondence pretty much confirms it. In my opinion the previous review should be edited to explicitly specify they mean moderate quality evidence of an effect that may or may not be clinically important, as Tovey himself wanted. Otherwise it should be withdrawn, because in its current state it's extremely misleading.

    (Furthermore, the agreement between the authors and Tovey/Soares-Weiser was to follow what Guyatt said. He advised qualifying the moderate evidence rating so it's clear they aren't talking about an important effect, and the authors didn't do that.)
     
  16. Caroline Struthers

    Caroline Struthers Senior Member (Voting Rights)

    Messages:
    990
    Location:
    Oxford UK
    Let's do it! Except will we find any trials which measure and report both (in one publication)?
     
  17. Caroline Struthers

    Caroline Struthers Senior Member (Voting Rights)

    Messages:
    990
    Location:
    Oxford UK
    Fantastic summary of the key issue here - thank you!
     
  18. Hoopoe

    Hoopoe Senior Member (Voting Rights)

    Messages:
    5,446
    How about this classic?
    Active Albuterol or Placebo, Sham Acupuncture, or No Intervention in Asthma
    https://www.nejm.org/doi/full/10.1056/nejmoa1103319
     
  19. Trish

    Trish Moderator Staff Member

    Messages:
    56,117
    Location:
    UK
    Yes, the asthma one we often quote. I'll try to find it. And PACE itself which had different outcomes for subjective and objective measures.

    Edit - cross posted, yes, that's the asthma one.
     
  20. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,665
    Location:
    UK
    It's interesting looking at this from the perspective of different disciplines. As someone who works in security, we do reviews of systems to try to spot potential failings and security issues, but more importantly, I think, we try to build methodology and tools into the standard processes in order to improve quality and reduce security issues. For example, there are code analysis tools that spot issues and the use of weak libraries, and tools that look at interactions between components for known vulnerabilities. I wonder if there could be an equivalent for trial design: methodology and tools to help go through the design and identify potential issues, both with the measures taken and the controls, and with the statistical methods used (i.e. which techniques are appropriate given lack of independence, underlying distributions, and biases in errors). I was also thinking that aspects of systems are built on well-known primitives and protocols to ensure secure communications (things like crypto and associated protocols such as TLS). These are very well studied, with considerable effort put into breaking them, to the extent that anyone using their own cryptography would be laughed at. Is there an equivalent in the medical trial world? I get the impression that perhaps randomisation is studied in this way, but measurement systems seem to be a weak point, where someone doing a trial can make up what they record (or use clearly poor questionnaires such as the CFQ).

    The way to really address things is not to have a review system (like GRADE) that will pick up on issues, but to have better standards for trial design. Even running through GRADE prior to the trial could be valuable (perhaps an ethics committee should do this and say that trials that would give low or very low quality evidence are unethical, as the results are meaningless!). It feels like trying to fix the way reviews are done is fixing the wrong problem.

    The other important thing is to continuously look for flaws, both in a given trial (and hence the trustworthiness of its results) and in the underlying methods a trial uses. In the security world there is a big group of people who spend their lives trying to break systems (both attackers and defenders), and there is money for the defensive side in terms of bug bounties. Perhaps journals should offer bug bounties for people who find issues with published trials; that would help improve the quality of the published work. Also, if particular methods are shown to be weak, then there needs to be an appreciation that trials using them may be affected, and they need to be looked at from that perspective.

    I suspect there is also something around risks that could be brought out. Where, for example, there is concern that some mitigations may not work (for example, is a control group strong enough to control for the important factors?), then perhaps additional measurement strategies should be put in place to help judge the validity of a given control group.
     
