BMJ: Rapid response to 'Updated NICE guidance on CFS', 2021, Jason Busse et al, Co-chair and members of the GRADE working group

Oh no. It’s even worse. It is more authors. Look who is here:
Per Olav Vandvik, Elie A. Akl, Sue Brennan, Philipp Dahm, Marina Davoli, Signe Agnes Flottorp, Paul Garner, Joerg J. Meerpohl, Reem Mustafa, Maria X. Rojes, Gordon H. Guyatt
McMaster University
1280 Main St. West, Hamilton, ON, Canada

Edit:
Competing interests: GHG is co-chair of the GRADE Working Group; SAF, EAA, PD, SB, MD, MR, JJM, RM, and MR are members of the GRADE guidance and working groups; POV is a member of the GRADE Working Group. This Rapid Response is not an official communication from the GRADE Working Group.
 
Last edited:
Very clever. Busse, the first-named author, seems to be the only one, apart from PG, not directly associated with GRADE. He seems instead to have an interest in "insurance medicine".

It is, however, rather confusing. PG does not appear in the competing interests section. Yet there are twelve authors and only eleven COIs, and neither JB nor PG is listed. I never was good at maths.
 
I imagine there is supposed to be a point somewhere in there, but all I read is people saying that, had they been tasked with it, they would have done it differently, because they do not like the outcome it gave without them being there to steer it to their preference. There is no substantial argument beyond not liking being shown wrong and wanting to continue with their bad and harmful pseudoscience. It is very meandering. It is unclear whether it is addressed to the NICE committee or is an attempt to influence the outcome through external political pressure.

As usual it cites the Cochrane exercise review, allegedly under review. Cochrane's inexplicable behaviour continues to impair progress and cause harm. The obsession these people have with snake oil is beyond absurd: it is deranged and completely self-serving, pays no attention whatsoever to their actual work, and is naked self-interest about how it affects them personally.

https://www.bmj.com/content/371/bmj.m4774/rr-7

It appears to be people from the GRADE working group, which explains a lot about why it produces such poor results:
@Caroline Struthers, you might be interested in this. If I were into conspiracy theories, I might think that Cochrane dragging their feet over their review has something to do with this.
 
About the lead author, Dr Busse (from his McMaster page):
Dr. Busse’s clinical interests include insurance medicine, orthopedic trauma, chronic pain and other medically unexplained syndromes, and management of complex disability. Dr. Busse is also interested in methodological research including expertise-based randomization and the use of composite endpoints in clinical trials.

Note that the authors use the clever trick of separating 'lack of blinding' from 'subjective outcomes' into two unlinked paragraphs:
At the same time, trial results may or may not be appreciably affected by blinding. A recent meta-epidemiological study reported no difference in estimated treatment effect between trials with versus those without blinding of patients, healthcare providers, or outcome assessors.[5] It is therefore reasonable not to rate down the certainty of evidence for risk of bias because of failure to blind as the sole problem.
GRADE does not rate down the certainty of evidence on the basis that subjective outcomes are reported directly by patients. Indeed, GRADE provides detailed guidance on presenting and interpreting results of patient reported outcomes such as fatigue, pain, physical functioning, and quality of life.[7] This guidance reflects GRADE’s emphasis on what is most important to patients. In the case of chronic fatigue syndrome, the Cochrane review finding of important improvement in fatigue with exercise will be crucial for patients in choosing their treatment.
No one ever said that lack of blinding alone constitutes a problem; the problem is lack of blinding for subjective outcomes. The meta-epidemiological study on blinding that they cite also mentions conflicting results on the blinding of PROMs (bolding mine):
Other studies
Systematic reviews of meta-epidemiological studies [7, 23] identified four studies (comparisons within meta-analyses) estimating the impact of blinding patients, three studies estimating the impact of blinding trial personnel, and four studies estimating the impact of blinding outcome assessors. In all instances, blinding had surprisingly little effect. [7, 23] Two additional recent studies partly confirmed this pattern: an analysis of physiotherapy trials [24] found little evidence of an impact of blinding of patients or of outcome assessors, and a study of oral health trials [25] found no evidence of an impact of blinding of outcome assessors, though some evidence of a moderate effect of patient blinding.

By contrast, three systematic reviews of within-trial comparisons for 51 trials with both blinded and non-blinded outcome assessment found that blinding had a clear effect. [17, 18, 19] For example, non-blinded outcome assessors of subjective [26] outcomes exaggerated odds ratios by 36%, on average. [17] Similarly, a systematic review of 12 trials randomising patients to blinded and non-blinded substudies reported a pronounced bias due to lack of patient blinding in complementary/alternative medicine trials with patient reported outcomes, exaggerating effect sizes by 0.56 standard deviations. [13] Such comparisons within trials have no major risk of confounding. The trial design is rare, however, so to what extent the results could be generalised is not clear.

Results of meta-epidemiological studies comparing double blind trials with trials without (or with unclear) double blinding have shown noticeable variation. [7] A systematic review by Page and colleagues found an overall 8% exaggeration of odds ratios in trials without double blinding (although confidence intervals overlapped no effect), [7] and an exaggeration of 23% when outcomes were subjective. [7, 12]

The authors also note that their meta-epidemiological approach could be unreliable, or that their findings are imprecise. They recommend that the risk of bias from lack of blinding continue to be assessed with currently available tools, such as Cochrane's RoB 2.
Mechanisms and implications
Clarification of the circumstances in which blinding is important in trials, and an empirical assessment of direction and degree of bias, have important and direct implications for the design of future trials, for interpretation of trial results, and for instructions on how to assess risk of bias when conducting systematic reviews. Clarification is also pertinent to the current debate on the balance between reliability and relevance of unblinded patient reported outcome measures (PROMs), [27, 28] and the relative importance of blinded explanatory trials versus unblinded pragmatic trials. [29]

[...]

Blinding has been considered an essential methodological precaution in trials for decades. We did not expect to find that our study does not firmly underpin standard methodological practice. Further, our results are coherent with other meta-epidemiological studies that have reported similar results. The implication seems to be that either blinding is less important (on average) than often believed, that the meta-epidemiological approach is less reliable, or that our findings can, to some extent, be explained by lack of precision. At present, we suggest that assessors of the risk of bias in trials included in a systematic review continue to deal with the implications of lack of blinding for risk of bias, as is done in version 2 of the Cochrane risk of bias tool. [34]
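As an aside, the "exaggerated odds ratios by 36%" figure quoted above is conventionally expressed as a ratio of odds ratios (ROR) between non-blinded and blinded assessment of the same trials. A minimal sketch of that arithmetic, with illustrative numbers that are not taken from any of the cited studies:

```python
# Sketch of how a "36% exaggeration" of odds ratios is usually
# expressed: as a ratio of odds ratios (ROR) between non-blinded
# and blinded outcome assessment. The example ORs below are made
# up for illustration only.

def ratio_of_odds_ratios(or_nonblinded: float, or_blinded: float) -> float:
    """ROR below 1 means the non-blinded estimate is more favourable
    to the intervention (for beneficial outcomes coded so that OR < 1)."""
    return or_nonblinded / or_blinded

# A 36% exaggeration corresponds to an ROR of roughly 0.64:
# e.g. a blinded OR of 0.50 drifting to ~0.32 without blinding.
print(round(ratio_of_odds_ratios(0.32, 0.50), 2))  # 0.64
```

This is only the headline arithmetic; the cited meta-epidemiological studies estimate the ROR by pooling such comparisons across many trials.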
 
Oh no. It’s even worse. It is more authors. Look who is here:
Per Olav Vandvik, Elie A. Akl, Sue Brennan, Philipp Dahm, Marina Davoli, Signe Agnes Flottorp, Paul Garner, Joerg J. Meerpohl, Reem Mustafa, Maria X. Rojes, Gordon H. Guyatt
Gordon Guyatt (godfather of GRADE??) was the person who arbitrated on the wording of the amendment, saying, I think, that there was still "moderate" evidence, according to GRADE, of a "non-zero" effect on fatigue. The correspondence is here
 
Beyond lack of blinding, this response mentions the fatigue outcome in the Cochrane review, as measured by the Chalder Fatigue Scale, but completely omits the problems that have been identified with it: https://www.s4me.info/docs/CFQ-Critique-S4me.pdf

They blame NICE for downgrading the evidence for indirectness, which of course NICE had to do to be consistent with its diagnostic criteria, which require PEM. They say the Cochrane review on exercise therapy for CFS carried out a subgroup analysis for different diagnostic criteria and found no difference between subgroups. But the only criteria used in the studies included in the review were Oxford and Fukuda, neither of which requires PEM... They then go on to blur ME/CFS with other conditions considered as MUS. No wonder they don't know or care about PEM.
Regarding directness, changes to diagnostic criteria for chronic fatigue syndrome, fibromyalgia, irritable bowel syndrome, or other complex conditions that lack pathognomonic findings may or may not affect results. Systematic review authors can explore the issue in subgroup analysis focused on diagnostic criteria. [6] The Cochrane review carries out such a subgroup analysis, and there was little or no difference between subgroups based on different diagnostic criteria. It is inappropriate to downgrade on indirectness without clear evidence of a difference in effects between trials using different criteria.

Also, it is somewhat ironic that GRADE is used as a means to justify not rating down the evidence on fatigue for exercise therapy due to imprecision and/or inconsistency, because @Michiel Tack referred to the GRADE handbook in his commentary to Cochrane to justify that it should...
2) Fatigue post-treatment should be rated as low instead of moderate quality evidence

The certainty of evidence for all outcomes in comparison 1 (exercise therapy versus treatment as usual, relaxation or flexibility) was assessed as low to very low according to the GRADE system. (2) The sole exception is fatigue measured at the end of treatment which was assessed as providing moderate certainty evidence. It is unclear why the certainty of evidence for this outcome wasn’t downgraded for inconsistency and/or imprecision as was the case for physical function measured at the end of treatment.
The meta-analysis of post-treatment fatigue was associated with considerable heterogeneity (I2 = 80%, P< 0.0001). This heterogeneity was mainly caused by one outlier, the trial by Powell et al. If this trial is excluded, heterogeneity is reduced to acceptable levels (I2 = 26%, P = 0.24) but the standardized mean difference (SMD) drops by one third, from -0.66 to -0.44. This corresponds to a 2.3 point instead of 3.4 point reduction when re-expressed on the 33-point Chalder Fatigue Scale, a difference that may no longer be clinically meaningful. A minimal important difference (MID) of 3 points on the Chalder Fatigue Scale has previously been used in an exercise trial for CFS. (3)

Fatigue post-treatment could also be downgraded for imprecision as the confidence interval crosses the line of no clinically significant effect. The 95% confidence interval of the SMD for fatigue (.31-1.10) corresponds to a 1.6 to 5.3 point interval when re-expressed on the 33-point Chalder Fatigue Scale. For continuous outcomes, the GRADE handbook recommends: “Whether you will rate down for imprecision is dependent on the choice of the difference (Δ) you wish to detect and the resulting sample size required.” Given that the authors of this Cochrane review specified a MID of 2.3 for the Chalder Fatigue Scale and that a MID of 3 points or higher has been used for CFS (3) and other chronic conditions (4,5), it seems warranted to downgrade this outcome for imprecision.

I recognize that for both inconsistency and imprecision the case isn’t clear-cut. The GRADE handbook, however, states that if there is a borderline case to downgrade the certainty of evidence for two factors, it is recommended to downgrade for at least one of them. The handbook writes: “If, for instance, reviewers find themselves in a close-call situation with respect to two quality issues (risk of bias and, say, precision), we suggest rating down for at least one of the two.” (2) Therefore the outcome fatigue measured at the end of treatment should preferably be downgraded to low certainty evidence.
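The re-expression in the quoted commentary is simple arithmetic: multiply the SMD by the pooled standard deviation of the scale. A minimal sketch, assuming a pooled SD of about 5.2 points on the 33-point Chalder Fatigue Scale — the value implied by the quoted conversion of SMD 0.66 to roughly 3.4 points; the SD actually used in the review is not stated here, and the quoted 5.3-point upper bound suggests a slightly different value:

```python
# Re-express a standardized mean difference (SMD) in raw points on
# the 33-point Chalder Fatigue Scale. POOLED_SD is an assumption
# back-calculated from the quoted conversion (0.66 SMD ~ 3.4 points);
# the review's exact SD is not given in the commentary.

POOLED_SD = 5.2  # assumed pooled SD, in Chalder Fatigue Scale points

def smd_to_points(smd: float, pooled_sd: float = POOLED_SD) -> float:
    """Convert an SMD back to raw scale points."""
    return smd * pooled_sd

# With vs without the outlier trial (Powell et al.):
print(round(smd_to_points(0.66), 1))  # 3.4 points
print(round(smd_to_points(0.44), 1))  # 2.3 points

# 95% CI of the SMD (0.31 to 1.10) re-expressed in points:
lo, hi = smd_to_points(0.31), smd_to_points(1.10)
print(round(lo, 1), round(hi, 1))  # 1.6 to ~5.7 under this assumed SD
```

This makes the imprecision argument concrete: with a minimal important difference of around 3 points, the lower bound of the interval falls well below a clinically meaningful effect.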
 
Last edited:
@cassava7 beat me to it, but...
“current NICE methods would discount any randomised controlled trials using this approach, citing risk of bias, inconsistency, imprecision, and subjective outcomes”.
No, uncontrolled subjective outcomes. There is no objection to subjective measures, if adequate blinding and/or objective outcome measures are also used.

This is what this whole debate is going to come down to: the claim that uncontrolled subjective outcomes are sufficiently robust, effective, and safe to be applied in clinical, advisory, and medico-legal settings. Indeed, that they can be invoked even in contradiction to results from blinded or objective outcome measures.

What is wrong with grading studies according to "risk of bias, inconsistency, imprecision, and [uncontrolled] subjective outcomes."? Seems like a good idea to me. Methodology 101, even.

Maybe NICE didn't use GRADE because it's an inferior tool?

...as we have quoted above, it is because of theoretical objections and anecdotes from patients.
They don't do self-awareness, do they? The whole CBT/GET model, à la PACE, is about as theory-driven and confirmation-biased as it gets. And I see nothing but anecdotes from those claiming to have recovered. Certainly no solid, robust trial results whatsoever.

They have literally nothing beyond modest short-term changes in subjective self-report scores.
 
Last edited:
Maybe NICE didn't use GRADE because it's an inferior tool?
NICE did use GRADE, but with a bit more intelligence/common sense
 
I think this is great, because we have Gordon Guyatt, Mr GRADE himself, weighing in and saying his system would have rated PACE as reliable evidence.

It would have been easy to see the ME/CFS kerfuffle as a backwater in the evidence-based world, but I think this makes it clear that the NICE committee's decision is a real threat to the cosy EBM system.

I don't think anyone needs to feel sympathetic to people who switch what they say according to which friends they want to bond with even if they have suffered from Covid for rather a long period. The lack of professionalism here is gobsmacking. These are people claiming to be experts on dispassionate evaluation of evidence and yet they are clearly the opposite.

I am reasonably confident that none of this will affect the NICE decision because, as Caroline says, NICE used GRADE and managed to get a sensible result nonetheless.

The pressure that people who do not like the result are putting on NICE is a clear demonstration of the insanity of a system like GRADE. If it generates results that are open to lobbying, it is subjective; and if it is subjective, then there is no point in having what appears to be an objective numbering system.
 
GRADE works like this. You take a group of people who think that they have a reasonably good appreciation of the risks of methodology being faulty and giving unreliable results. You get them to invent a set of numbers that roughly reflect how they think they decide (although there are nice papers showing that we often do not decide the way we think we do). You give this set of numbers a name and suggest that people who may not quite know themselves how to decide on unreliability use the numbers to do it - not even their common sense. You then recommend that all studies are judged by these numbers, even by people who think they have a reasonably good appreciation of the risks and do not need numbers.

It is a bit like recommending that Michel Roux, rather than judging whether a salmon en croûte is perfectly cooked, goes to the set of numbers in his book and works out whether it is cooked using the numbers. It is plain silly.

As an example, try applying a set of numbers to the likelihood of statements made by heads of state with odd-looking hairstyles being true, rather than listening to what the guy says. It's nuts.

And it was identified long ago by Ralph Waldo Emerson.
A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines.

The other thing is that there doesn't appear to be a good reason to rely on poor studies and methodology to assess whether an intervention works. For example, Fluge and Mella used activity monitors to assess rituximab. So rather than defending poor studies, they could spend some time looking at objective monitoring [activity monitoring +?] and call for sufficient funding to incorporate it into studies. OK, they'd probably have some excuses about assessing other psychological outcomes from interventions. However, if it's government-funded research, then it should look at objective outcomes such as increased activity and being able to return to work or study; surely these would indicate "objective psychological" improvement.

Free meal ticket sums them up!
 
I think this is great, because we have Gordon Guyatt, Mr GRADE himself, weighing in and saying his system would have rated PACE as reliable evidence.
Well, it is great for someone looking for an example of how applying the GRADE system can lead to a problematic conclusion.
But for the ME/CFS patient community it is highly concerning that proponents of GET have managed to get Gordon Guyatt to become involved and sign a statement such as this one.

Perhaps we should try to contact him, explain how GET actively tried to influence how patients report their symptoms, and point out that GRADE provides no option to take such a risk of response bias into account?
 
I'm thinking about submitting the following rapid response - any suggestions before I do so?

-----------------------------------------

I would like to respond to the comment by Jason Busse and colleagues as it includes some remarkable statements. The authors criticize the NICE guidance committee on myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) for employing “a disastrous misapplication of GRADE methodology.” In a draft document, the committee rated the quality of evidence for GET as low to very low.

As an example of an “appropriate” application of GRADE, Busse and colleagues refer to a contested Cochrane review on GET for ME/CFS. This review rated the quality of evidence for GET in reducing fatigue as moderate. According to Busse and colleagues, this is “sufficient in GRADE methodology to justify strong favorable recommendations.” These statements are concerning for several reasons.

The Cochrane review cited by Busse and colleagues is currently being updated, following concerns about its methodology. Cochrane’s Editor-in-Chief, Dr. Karla Soares-Weiser explained that “this amended review is still based on a research question and a set of methods from 2002, and reflects evidence from studies that applied definitions of ME/CFS from the 1990s.” [1]

The review is far from an exemplary assessment of the evidence for GET. It highlights fatigue assessments made directly after treatment ended and downplays assessments that were made several months later (the latter formed the primary outcome for the trials that provide most of the data). Follow-up results do not support the recommendation that GET reduces fatigue. In addition, the review compares GET to a passive control condition where patients received less time and attention from healthcare providers. The review also focused on subjective outcomes and ignored negative results on objective measurements such as employment figures, activity levels, and fitness tests which all tend to show no significant improvement in the GET group.

There is reasonable concern that the reduction seen on fatigue questionnaires reflects response bias rather than a genuine reduction in fatigue. Patients in the GET-group, for example, received instructions to interpret their symptoms as less threatening and more benign. According to one therapist manual on GET “participants are encouraged to see symptoms as temporary and reversible, as a result of their current physical weakness, and not as signs of progressive pathology.” Treatment manuals also included strong assertions designed to strengthen patients’ expectations of GET. One patient booklet stated: “You will experience a snowballing effect as increasing fitness leads to increasing confidence in your ability. You will have conquered CFS by your own effort and you will be back in control of your body again.”

These instructions were not given to patients in the control group and result in a high risk of response bias. It was therefore reasonable for the NICE guideline committee to rate the quality of evidence for GET as low. Other reviews had previously come to a similar conclusion. [2, 3]

If the GRADE system were used as Busse and colleagues recommend, there would be a high risk that quack treatments and various forms of pseudoscience would also appear to provide reliable evidence of effectiveness in randomized trials. All that is needed is an intervention in which therapists actively manipulate how patients interpret and report their symptoms. One example should suffice to clarify this point.

Suppose an intervention based on 'neurolinguistic programming' in which therapists assume that saying one is fatigued reinforces neural circuits that perpetuate fatigue. The intervention consists of breaking this vicious cycle by encouraging patients to no longer see or report themselves as fatigued. This example is not far-fetched, as there are already behavioral interventions for ME/CFS based on similar principles. [4] According to GRADE methodology, however, such attempts to manipulate how patients report their symptoms constitute no reason to downgrade the quality of evidence of randomized trials, even if fatigue questionnaires are used as the primary outcome.

The first and foremost principle of rating quality of evidence should be to understand the specifics of what is being assessed. One has to understand the intervention and the way it impacts patients. By providing a standardized checklist and algorithm to assess quality of evidence, the GRADE methodology discourages researchers from studying the details of what happens in randomized trials. The rapid response by Busse and colleagues is an example of how this approach might result in questionable treatment recommendations.

References

1. Cochrane. Publication of Cochrane Review: ‘Exercise therapy for chronic fatigue syndrome.’ https://www.cochrane.org/news/publication-cochrane-review-exercise-therapy-chronic-fatigue-syndrome. Accessed 26 Nov 2019.

2. Vink M, Vink-Niese A. Graded exercise therapy for myalgic encephalomyelitis/chronic fatigue syndrome is not effective and unsafe. Re-analysis of a Cochrane review. Health Psychol Open. 2018;5:2055102918805187.

3. Tack M, Tuller DM, Struthers C. Bias caused by reliance on patient-reported outcome measures in non-blinded randomized trials: an in-depth look at exercise therapy for chronic fatigue syndrome. Fatigue: Biomedicine, Health & Behavior. 2020;8:181–92.

4. Reme SE, Archer N, Chalder T. Experiences of young people who have undergone the Lightning Process to treat chronic fatigue syndrome/myalgic encephalomyelitis--a qualitative study. Br J Health Psychol. 2013;18:508–25.
 