GRADE classifies certainty of evidence into 4 categories: ‘High’, ‘Moderate’, ‘Low’, or ‘Very Low’. Study limitations allow downgrading the certainty of evidence, but by at most 2 levels. This is a problem because randomized trials start out as having high-certainty evidence. Hence, even if you detect numerous serious problems (and thus a very high risk of bias), the certainty of evidence would still not be classified as ‘very low’. There is no good justification for restricting the impact of study limitations to only 2 levels of certainty. What if there is a flaw that makes the study results useless? This seems to be a blind spot in the GRADE system.
I've recently done a course on systematic reviews that had a short section on GRADE. Now, I may have things wrong, but I did not find GRADE to be a very bad system. In a systematic review at least, GRADE is applied to the collected evidence for an outcome, not to individual studies.
First off, you decide if a study is so egregiously flawed that the results must be rubbish, e.g. it looked at a population that is mostly different to your target population, or there is overt evidence of fraud. In that case, you don't include it in the systematic review at all. I don't believe unblinded studies with only subjective outcomes fit in that category, because if all the patients were suddenly rising up from their beds and reporting 'cured' levels of SF-36, and 6 months later 90% were still in that 'healthy' range, then the study could well be indicating that the treatment is useful. If there were those sorts of results in a decent-sized unblinded study of people with well-defined ME/CFS, I'd be thinking hard about trying the treatment, even if the outcomes were subjective. So, I think the GET and CBT studies mostly qualify to be in reviews.
Then, my understanding is that an outcome from RCTs starts at 'high' certainty and is assessed against 5 criteria. Risk of bias is one of the five. For each criterion, you can downgrade the particular RCT outcome by one or two levels. So, I think it's clear that the CBT and GET outcomes drop by two categories to low certainty due to a high risk of bias. The other criteria almost certainly add some further downgrades: for indirectness, for inconsistency (different results from different studies), for imprecision and for publication bias.
On indirectness, most BPS studies would be downgraded for ME/CFS outcomes due to poor selection criteria.
Outcomes could be downgraded for inconsistency if, for example, the few studies with objective measures did not find an objective increase in activity, while the subjective reporting did.
For imprecision (too few events or too wide a range of plausible effects to rule out important harm), if there is a risk that important harms were not captured, as would be the case in ME/CFS studies that did not track activity for a long enough period when there are hints of substantial harm coming from non-trial sources, then the outcome would be downgraded.
Publication bias is where there is some evidence that studies that don't show a favourable effect of the intervention are not getting published. There are several ways to work that out, including looking at trial registers.
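The bookkeeping in the steps above can be sketched as code. This is only an illustration of the "start at High, subtract one or two levels per domain, floor at Very Low" arithmetic; real GRADE judgements involve discretion within each domain, and the domain names in the example dictionary are just the five criteria mentioned above.

```python
# Illustrative sketch of GRADE certainty bookkeeping, not an official tool.
LEVELS = ["Very Low", "Low", "Moderate", "High"]

def grade_certainty(start="High", downgrades=None):
    """start: starting certainty ('High' for RCTs, lower for observational).
    downgrades: dict mapping a domain name to levels removed (0, 1 or 2)."""
    downgrades = downgrades or {}
    level = LEVELS.index(start)
    total = sum(downgrades.values())
    # Certainty cannot drop below 'Very Low', however many flaws pile up.
    return LEVELS[max(0, level - total)]

# An RCT outcome downgraded 2 for risk of bias and 1 for indirectness:
print(grade_certainty("High", {"risk of bias": 2, "indirectness": 1}))
```

The floor at 'Very Low' is also where the complaint in the opening paragraph bites: once an outcome is already at the bottom category, further downgrades change nothing.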
And then there's more. Having got the outcome, with its 'Very Low' certainty rating, you can make some comment about the size of the benefit. Even if all the studies are consistently reporting a benefit, if the benefit is small, within the range that would often be seen with ineffective but hyped treatments in unblinded situations, then you can report that there is a very low certainty that the reported small benefit is real.
As with many tools, it's not so much the quality of the tool, as the quality of the person using it. Yes, to a certain extent, the GRADE outcome is subject to the prejudices of the person using it. What GRADE does is provide a framework for analysis and a requirement that assumptions that have been made are reported and made explicit. I think it's ok.
Here's a link to the handbook.
(And edited to add publication bias, which I had forgotten about.)