RoB 2: a revised tool for assessing risk of bias in randomised trials (2019) Sterne et al.

Sly Saint · Sep 5, 2019

will this new tool impact on the revision of the Cochrane exercise review?

Barry · Sep 5, 2019

Sly Saint said:
will this new tool impact on the revision of the Cochrane exercise review?

Maybe that is the intent. Feels like someone's car failing its MOT on major counts, and the person having the clout to change the MOT criteria so it passes. An obviously frivolous example re MOTs, but it should also be equally frivolous for life-impacting assessments in medicine.

Caroline Struthers · Sep 5, 2019

Sly Saint said:
will this new tool impact on the revision of the Cochrane exercise review?

It will not affect the revision which is currently still under consideration.

https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200.pub7/information#whatsNew

17 June 2019

Amended

Addition of new published note 'Cochrane’s Editor in Chief has received the revised version of the review from the author team with changes made in response to the complaint by Robert Courtney. The process has taken longer than hoped; the amended review is being finalised and it will be published during the next 2 months.'

So the revision should have been published by 17 August. I hope the delay means the new Editor in chief (in post since 1 June) acts on the wishes of the previous Editor in Chief who stated in writing he wanted to withdraw the review, but the authors refused to allow it.

Whether or not they withdraw the current review, if they update it, the authors (presumably a new author team...???) will likely use the new tool when assessing risk of bias of included studies. The bias assessment therefore will not take account of conflict of interest, researcher allegiance, reliance on subjective outcomes, manipulation of recovery criteria, selection of participants using dodgy diagnostic criteria etc. etc.

Barry · Sep 5, 2019

Caroline Struthers said:
Whether or not they withdraw the current review, if they update it, the authors (presumably a new author team...???) will likely use the new tool when assessing risk of bias of included studies. The bias assessment therefore will not take account of conflict of interest, researcher allegiance, reliance on subjective outcomes, manipulation of recovery criteria, selection of participants using dodgy diagnostic criteria etc. etc.

In so many other fields it would be possible to escalate such gross self serving rule manipulation to some higher authority. They realise they routinely violate all manner of sane rules, so they change the rules.

NelliePledge · Sep 5, 2019

Barry said:
In so many other fields it would be possible to escalate such gross self serving rule manipulation to some higher authority. They realise they routinely violate all manner of sane rules, so they change the rules.

It’s a pity Parliament is otherwise engaged as it strikes me this would be a good investigation for the Science Select Committee Carol Monaghan is on.

Barry · Sep 5, 2019

NelliePledge said:
It’s a pity Parliament is otherwise engaged as it strikes me this would be a good investigation for the Science Select Committee Carol Monaghan is on.

Why - is there something else going on

.

Yes, I fully agree.

Pi · Sep 5, 2019

Meanwhile, in the real world of the drug trials, blinding remains an issue:

My dad has Parkinson's disease, and I was recently reading about the GDNF trial. This was a "promising" new treatment but there was no significant difference between placebo and GDNF at 40 weeks (subjective outcome, blinded trial).

The researchers decided to extend the trial to 80 weeks, with both groups receiving GDNF treatment for the remainder of the trial, which was then open-label/non-blinded. Both groups showed a significant gain from baseline (week 0), but this was considered unreliable, because the study wasn't blinded (although there are plans to run a new trial, blinded for fall 80 weeks to explore the promising results).

One report of this is below. How can the lack of blinding be seen as a serious issue in drug trials, but miraculously not be considered a problem in psychosocial trials?

The research group continued to treat this group of patients for an additional 40 weeks. In this second stage of the trial, all patients received GDNF, without a placebo group. The results of this phase of the trial was recently published in the Journal of Parkinson’s Disease. This phase of the study did not show a statistically significant improvement in motor functioning when comparing the group that received placebo/GDNF to the group that received GDNF/GDNF over the full 80 weeks. However, both groups showed significant improvements from week 40 to week 80. The results could indicate that GDNF infusion requires more time than 40 weeks to show its full effect. However, any conclusions derived from the second part of the trial must take into account the fact that all patients received the drug. It is a well-researched phenomenon that knowing that one has received treatment can be therapeutic in and of itself; this is known as the placebo effect. Without a group that did not receive the drug, it is impossible to know what role the placebo effect played in the improvements seen in the second phase of the trial.

I am concerned that this new paper will rubberstamp a lot of flawed, non-blinded trials with subjective outcomes. Are there any plans to deal with this problem? I can't imagine writing responses to the paper will have much effect. Is there any way to get a paper published about the issues of blinding in (psychosocial) studies with subjective outcomes? And would that make any difference at all?

Simbindi · Sep 5, 2019

Pi said:
My dad has Parkinson's disease, and I was recently reading about the GNF trial. This was a "promising" new treatment but there was no significant difference between placebo and GDNF at 40 weeks (subjective outcome, blinded trial).

I followed this trial with interest as the young lady who previously supported me (for mental health reasons) dad was part of this trial.

She felt many of my symptoms were very like her dads (difficulty walking in a straight line, gait, standing upright, memory problems, tremors etc.) She accompanied me to some of my GP appointments and was shocked at how my symptoms were dismissed and trivalised by the doctor. She encouraged me to change GP, which I eventually did. The new GP did agree to refer me to a neurologist, but now I'm far too ill to be able to cope with it!

Back to the trial - it was very invasive therapy, with some severe side-affects seen. However, I hope it eventually proves to help some Parkinson sufferers.

Jonathan Edwards · Sep 15, 2019

I have sent in a Rapid Response letter, as recommended by Dr Godlee at BMJ. I have copied this to the editor in chief at Cochrane.

Letter:

In the recent publication in BMJ of the finalised Risk of Bias 2 tool (RoB2), used in the context of the GRADE system by Cochrane for assessing quality of clinical trial evidence, the discussion claims that changes made are likely to reduce stringency of RoB assessment, especially in unblinded trials. This looks to be a retrograde step, since unblinded trials with subjective outcomes appear increasingly to be given more credence than deserved. What is also of concern is that the corresponding author, Jonathan Sterne, is an author on the report of the unblinded SMILE trial of the Lightning Process for myalgic encephalomyelitis/ chronic fatigue syndrome (ME/CFS) (Crawley et al. 2018). This report has been severely criticised for multiple methodological errors. Despite rewriting, it is still considered inadequate by over 50 academics and clinicians (Tuller, 2019).

The proposals the RoB2 document makes include handling of problems with bias due to beliefs held by the patient or treatment delivery team when outcome measures are subjective and also the changing of outcome measures after trial initiation but before data analysis. Problems with the SMILE trial include both severe risk of bias relating to subjective outcomes and outcome switching midstream.

The GRADE system takes the premise that randomised trials provide high quality evidence unless they suffer from one or more defects, including bias. The fragility of this can be illustrated by proposing a trial that randomises patients to being taught to think and say they feel better, whether they do or not, or to being told to say how they really feel, using how the patient says they feel as outcome. One might think such a trial would never be proposed. Yet this appears to be more or less what the SMILE trial was (at least from the little we know about the Lightning Process). Moreover, the report does not tell us much, and it looks as if according to RoB2 the risk of bias is not scored high if we do not know enough to have specific reasons for doing so!

RoB2 does mention the possibility that bias might arise from patient or therapist beliefs, citing a physiotherapist assessing her own treatment or a patient reporting their response to homeopathy, as if to indicate bias is restricted to such self-evident examples. There appears to be no recognition that expectation bias due to beliefs about the outcome of an intervention is ubiquitous in trials (and any experiment, including laboratory work). In ME/CFS we have seen major responses due to expectation bias with conventional drugs now known to have no therapeutic effect, such as rituximab. The RoB2 analysis seems either naïve or disingenuous.

The SMILE trial had a highly unsatisfactory structure, starting as a ‘feasibility study’, recruiting more than half of the patients, and then being registered after switching of outcomes in the knowledge of progress of early participants. This might fall under RoB2’s condition of before data analysis but is a classic cherry-picking scenario. ‘Feasibility’ and ‘pragmatic’ trials appear increasingly popular. These terms look like attempts to legitimise methods that violate basic rules for gathering reliable evidence. It needs to be acknowledged that ‘methodological experts’ associated with clinical trials units and related departments have a conflict of interest in terms of co-authorship on publications of such trials.

We appear to be moving towards acceptance of methodology that for decades we have known yields meaningless results. I am concerned that bodies like Cochrane and the BMJ are sleepwalking into a situation where they rubber stamp commercial ventures of no merit.

References

Crawley, E.M., Gaunt, D.M., Garfield, K., et al. (2018). Arch. Dis. Childhood 103,2. https://adc.bmj.com/content/103/2/155

Tuller, D (2019). http://www.virology.ws/2019/08/28/t...godlee-about-bmjs-ethically-bankrupt-actions/

MSEsperanza · Sep 15, 2019

Jonathan Edwards said:
I have sent in a Rapid Response letter, as recommended by Dr Godlee at BMJ. I have copied this to the editor in chief at Cochrane.

Thank you, Jonathan, for another great letter.

In case a rapid response to the BMJ needs editorial approvement, it has been approved.
In any case, it's published here:

https://www.bmj.com/content/366/bmj.l4898/rr

Jonathan Edwards · Sep 15, 2019

MSEsperanza said:
In any case, it's published here:

https://www.bmj.com/content/366/bmj.l4898/rr

Interesting. I was not told it would appear.
Sterne's name seems to have disappeared from the author list but that may be an artefact of alphabetical formatting for the RR section.

rvallee · Sep 15, 2019

Jonathan Edwards said:
Interesting. I was not told it would appear.
Sterne's name seems to have disappeared from the author list but that may be an artefact of alphabetical formatting for the RR section.

Seems like it, I see Sterne as first author and for article correspondence.

Andy · Sep 16, 2019

which is a link to a study which is discussed here, https://www.s4me.info/threads/compare-trials-ben-goldacre-et-al.8170/

Jonathan Edwards · Sep 16, 2019

Why would one trust units that get their income and academic kudos out of doing trials?

ME/CFS Science Blog · Sep 24, 2019

Sorry that it took so long for me to come back to this thread.

Esther12 said:
Has anyone seen the full text of the earlier risk of bias tool (for which Sterne was also a co-author)?

The original tool is described in the Cochrane handbook, chapter 8, which is publically available. Here's a short overview: https://handbook-5-1.cochrane.org/c...a_for_judging_risk_of_bias_in_the_risk_of.htm This is the version used in the Cochrane review on GET.

Do note that there was a previous version of the Cochrane handbook and the risk of bias tool, that was published in 2008. An update in 2011 made some changes such as splitting up bias due to blinding into blinding of patients and therapists and blinding of outcome assessors. Here's an overview of the changes from the 2008 to the 2011 version. https://handbook-5-1.cochrane.org/c...n_risk_of_bias_tool_in_5_0_2_versus_5_1_0.htm

An overview of the changes from the 2011 version described in the handbook to the new Rob2, is given in table 2 of the paper.

ME/CFS Science Blog · Sep 24, 2019

I've mostly focused on the issue of blinding. On this aspect, the new tool is less bad than one would think after first reading the paper. I'll try to explain below:

The old version
Suppose that a trial did not blind patients or therapists and that it used subjective outcomes. In the old tool, a reviewer has to answer the question whether "the outcome is likely to be influenced by lack of blinding". So in the case of subjective outcomes, the answer would be yes and the trial would be rated as high risk of bias. If the trial also didn't blind outcome assessors and it is likely that the outcome is influenced by lack of blinding, the trial would also be rated as high risk of bias for this domain.

So the trial would be rated as high risk of bias for at least 2 out of 7 domains. It can, however, get good scores (low risk of bias) for the other domains. And there is no rule for how these risk of bias domains should be added up to assess a trial. The Cochrane handbook writes:

Cochrane Handbook said:
any assessment of the overall risk of bias involves consideration of the relative importance of different domains. A review author will have to make judgements about which domains are most important in the current review. For example, for highly subjective outcomes such as pain, authors may decide that blinding of participants is critical.

Unfortunately, Larun et al. didn't do that. In the GET review, they wrote that "risk of bias across studies was relatively low." They briefly mentioned that not blinding patients and therapist and using subjective outcomes might cause bias, but then argue that "many patient charities are opposed to exercise therapy for chronic fatigue syndrome (CFS), and this may in contrast reduce the effect." In other words, the treatment must be really working because so many patients oppose it!

The new tool
The new tool is different as it specifies how the scores for the trials should be added up to the overall risk of bias, which is probably one of the reasons why it is more complex. Sterne et al. write:

Stern et al. 2019 said:
The overall risk of bias generally corresponds to the worst risk of bias in any of the domains.

So if a study scores high risk of bias in one of the domais it should be rated as high risk of bias overall, or more precisely: for the particular outcome assessed, because Rob 2 encourages splitting up the assessment per outcome or result. The previous version just noted that one could split up the results in objective or subjective outcomes if that is thought the be helpful.

So back to our hypothetical unblinded trial: how would it be rated in the new risk of bias tool?

In domain 2 they ask wether patients and therapist were aware of the assigned intervention. But that domain only assesses one part of the bias caused by a lack of blinding and in my view, it's the least important part. It deals with changes to the interventions received because people in the trial were aware of the assignments. So for example, if patients know they are in the control group they might follow other treatments during the trial (co-intervention) or therapist might treat patients differently if they know they are in the intervention group. This can only lead to a high risk of bias if many conditions are met. So let's skip this domain.

The bias due to blinding that we are interested in is assessed in domain 4: "Risk of bias in measurement of the outcome". Skip the first two questions, it really starts at 4.3 where they ask if outcome assessors were blinded. The thing is that "for participant-reported outcomes, the outcome assessor is the study participant." So that would be the case in our hypothetical trial. The next question asks "Could assessment of the outcome have been influenced by knowledge of intervention received?" This another way of asking whether it was subjective (such as pain/fatigue questionnaires) or objective (such as all-cause mortality). The elaboration reads:

Supplementary material Sterne et al. 2019 said:
Knowledge of the assigned intervention could influence participant-reported outcomes (such as level of pain), observer-reported outcomes involving some judgement, and intervention provider decision outcomes. They are unlikely to influence observer-reported outcomes that do not involve judgement, for example all-cause mortality

That seems to be the case for our hypothetical trial. So far so good.

It's mostly the next question that annoys me. After having already asked whether it is possible that the outcome was influenced by knowledge of the intervention received, it now asks whether this influence was likely or not. It doesn't give an example where it wasn't likely and I can't really think of a scenario where this is the case. As Jonathan Edwards' letter points out, the examples of outcomes likely being influenced by unblinding are rather extreme like a physiotherapist who assed the intervention he himself delivered or a homoeopathy trial with patient-reported symptoms. What I really don't like in their elaboration is the following sentence:

Supplementary material Sterne et al. 2019 said:
When there are strong levels of belief in either beneficial or harmful effects of the intervention, it is more likely that the outcome was influenced by knowledge of the intervention received.

Both the PACE authors and Larun et al. have argued that bias due to lack of blinding is not really a big concern because patients don't like GET. In the PACE-trial expectations for GET were no higher than for APT. I suspect researchers might use such measurements of expectations before the trial to argue that bias due to lack of blinding is not really a big concern.
In my view question 4.5 in the new Rob tool is redundant and at risk of being misused, so it would better be deleted.

But back to our imagined trial: an honest reviewer would say that it is indeed likely that "assessment of the outcome was influenced by knowledge of intervention received", so answering Yes to question 4.5. And yes to 4.5 means high risk of bias for this domain (it doesn't really matter anymore what the answers to 4.1 and 4.2 were). High risk of bias in one domain means that the overall risk of bias for that outcome of the trial should be rated as high risk of bias as well.

So overall, it seems that in Rob2, it's more difficult to rate a trial as high risk of bias for each domain. But if it is rated as high risk of bias for any of the domains, that should be the overall rating as well. Whether that's an improvement or a step backwards, will probably depend on how serious reviewers take the overall rating of bias. If they are allowed to ignore it and just say: 'some domains were good, some were bad', then the bar will be lowered. The traffic light representation of bias will show more green lights. If however the overall risk of bias is taken seriously and required to be included in the conclusions of the review paper, then this could be a step forward. Because this method is able to represent the fact that one major shortcomming in a trial is sufficient to see it as flawed (high risk of bias), no matter how good the other parts are.

Esther12 · Sep 25, 2019

Thanks for all of your work summarising this stuff Michiel.

And there is no rule for how these risk of bias domains should be added up. The Cochrane handbook writes:

any assessment of the overall risk of bias involves consideration of the relative importance of different domains. A review author will have to make judgements about which domains are most important in the current review. For example, for highly subjective outcomes such as pain, authors may decide that blinding of participants is critical

Click to expand...

Unfortunately, Larun et al. didn't do that. In the GET review, they wrote that "Risk of bias across studies was relatively low." They briefly mentioned that not blinding patients and therapist and using subjective outcomes might cause bias, but then argued that "many patient charities are opposed to exercise therapy for chronic fatigue syndrome (CFS), and this may in contrast reduce the effect." In other words, the treatment must be really working because so many patients oppose it!

Thanks for explaining that. So the old tool may have given them less leeway on classing certain aspects of a trial's risk of bias, but more leeway to then just ignore areas where trials had a high risk of bias?

So the good news is that both tools seem terrible, and therefore RoB2.0 may be less of a step back that we thought. Phew.

Jonathan Edwards · Sep 25, 2019

I am not sure the new tool is any clearer on general risk of bias. It is actually wrong where the original tool assumed people new how bias works.

The general risk of bias is always worse than the worst specific risk. Any reasonably intelligent person knows that. If bad weather includes rain, hail, snow and gales, then the risk of bad weather is always the worst risk of any of those plus some more because there is some risk of them too. Whether the addition is linear or geometric or complex doesn’t matter.

Barry · Sep 25, 2019

That is gobbledegook surely. Risk of bias due to being an open label trial is orthogonal to risk of bias due to deviations from intended intervention? Mashing the two into a single supposedly explanatory sentence seems illogical. It sounds as if they are trying to take two sources of bias they want to whitewash, and make it sound like they can make it OK. Or am I missing something?

Caroline Struthers · Sep 25, 2019

Michiel Tack said:
I've mostly focused on the issue of blinding. On this aspect, the new tool is less bad than one would think after first reading the paper. I'll try to explain below:

The old version
Suppose that a trial did not blind patients or therapists and that it used subjective outcomes. In the old tool, a reviewer has to answer the question whether "the outcome is likely to be influenced by lack of blinding". So in the case of subjective outcomes, the answer would be yes and the trial would be rated as high risk of bias. If the trial also didn't blind outcome assessors and it is likely that the outcome is influenced by lack of blinding, the trial would also be rated as high risk of bias for this domain.

So the trial would be rated as high risk of bias for at least 2 out of 7 domains. It can, however, get good scores (low risk of bias) for the other domains. And there is no rule for how these risk of bias domains should be added up to assess a trial. The Cochrane handbook writes:

Unfortunately, Larun et al. didn't do that. In the GET review, they wrote that "risk of bias across studies was relatively low." They briefly mentioned that not blinding patients and therapist and using subjective outcomes might cause bias, but then argue that "many patient charities are opposed to exercise therapy for chronic fatigue syndrome (CFS), and this may in contrast reduce the effect." In other words, the treatment must be really working because so many patients oppose it!

The new tool
The new tool is different as it specifies how the scores for the trials should be added up to the overall risk of bias, which is probably one of the reasons why it is more complex. Sterne et al. write:

So if a study scores high risk of bias in one of the domais it should be rated as high risk of bias overall, or more precisely: for the particular outcome assessed, because Rob 2 encourages splitting up the assessment per outcome or result. The previous version just noted that one could split up the results in objective or subjective outcomes if that is thought the be helpful.

So back to our hypothetical unblinded trial: how would it be rated in the new risk of bias tool?

In domain 2 they ask wether patients and therapist were aware of the assigned intervention. But that domain only assesses one part of the bias caused by a lack of blinding and in my view, it's the least important part. It deals with changes to the interventions received because people in the trial were aware of the assignments. So for example, if patients know they are in the control group they might follow other treatments during the trial (co-intervention) or therapist might treat patients differently if they know they are in the intervention group. This can only lead to a high risk of bias if many conditions are met. So let's skip this domain.

The bias due to blinding that we are interested in is assessed in domain 4: "Risk of bias in measurement of the outcome". Skip the first two questions, it really starts at 4.3 where they ask if outcome assessors were blinded. The thing is that "for participant-reported outcomes, the outcome assessor is the study participant." So that would be the case in our hypothetical trial. The next question asks "Could assessment of the outcome have been influenced by knowledge of intervention received?" This another way of asking whether it was subjective (such as pain/fatigue questionnaires) or objective (such as all-cause mortality). The elaboration reads:

That seems to be the case for our hypothetical trial. So far so good.

It's mostly the next question that annoys me. After having already asked whether it is possible that the outcome was influenced by knowledge of the intervention received, it now asks whether this influence was likely or not. It doesn't give an example where it wasn't likely and I can't really think of a scenario where this is the case. As Jonathan Edwards' letter points out, the examples of outcomes likely being influenced by unblinding are rather extreme like a physiotherapist who assed the intervention he himself delivered or a homoeopathy trial with patient-reported symptoms. What I really don't like in their elaboration is the following sentence:

Both the PACE authors and Larun et al. have argued that bias due to lack of blinding is not really a big concern because patients don't like GET. In the PACE-trial expectations for GET were no higher than for APT. I suspect researchers might use such measurements of expectations before the trial to argue that bias due to lack of blinding is not really a big concern.
In my view question 4.5 in the new Rob tool is redundant and at risk of being misused, so it would better be deleted.

But back to our imagined trial: an honest reviewer would say that it is indeed likely that "assessment of the outcome was influenced by knowledge of intervention received", so answering Yes to question 4.5. And yes to 4.5 means high risk of bias for this domain (it doesn't really matter anymore what the answers to 4.1 and 4.2 were). High risk of bias in one domain means that the overall risk of bias for that outcome of the trial should be rated as high risk of bias as well.

So overall, it seems that in Rob2, it's more difficult to rate a trial as high risk of bias for each domain. But if it is rated as high risk of bias for any of the domains, that should be the overall rating as well. Whether that's an improvement or a step backwards, will probably depend on how serious reviewers take the overall rating of bias. If they are allowed to ignore it and just say: 'some domains were good, some were bad', then the bar will be lowered. The traffic light representation of bias will show more green lights. If however the overall risk of bias is taken seriously and required to be included in the conclusions of the review paper, then this could be a step forward. Because this method is able to represent the fact that one major shortcomming in a trial is sufficient to see it as flawed (high risk of bias), no matter how good the other parts are.

Thank you so much for this fantastic and clear explanation

The trouble with the risk of bias tool is that the judgements are, at the end of the day, subjective (as we well know!). When these judgements are presented in individual reviews in a nice clear traffic light picture, the impression given is that the judgements were objective, and that the arguments presented about how seriously (or not) to take high risk of bias assessments are reasonable - because it's Cochrane.

RoB 2: a revised tool for assessing risk of bias in randomised trials (2019) Sterne et al.

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Moderator

Senior Member (Voting Rights)

Established Member

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)

Senior Member (Voting Rights)