
Who Agrees That GRADE is (a) unjustified in theory and (b) wrong in practice?

Discussion in 'Other research methodology topics' started by Jonathan Edwards, Mar 4, 2021.

  1. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    The idea behind GRADE, to provide a recipe for making decisions for people who are not themselves capable of making such decisions on their own, is flawed and dangerously counterproductive in a medical context.

    The pseudo-arithmetic structure of allocating evidence to 'grades' has no purpose other than to sound standardised. Standardisation in decision-making by definition makes it less precise.

    The proper process is for people with enough experience and skill in logic to view the evidence available and decide what its implications for recommended management are in one integrated decision step. Any intermediate steps of forcing information into grading levels and using arbitrary rules for moving up and down grading levels is logically invalid and bound to interfere with, rather than assist, a decision.

    It should be possible for a randomised controlled trial that has fatal flaws that make it uninterpretable to be downgraded to uninterpretable (no need for very low or grade 1 or anything) on the basis of any one flaw that is enough to reach that judgment. GRADE does not allow this and so is highly likely to produce false conclusions.

    It is interesting to see that both Cochrane and NICE use GRADE but NICE does not trust the Cochrane use of GRADE so re-does it. At NICE I can see the practical reason for using GRADE. Technical staff use GRADE to prepare a provisional analysis which is then reviewed by a committee. The technical staff have no experience of trials so will need something like GRADE. I do not see why the committee needs to make use of GRADE. I think it would be fair to ask technical staff to search for studies and document a list of features but I do not think there is any merit in asking them to grade, since I don't think grading comes into this.

    For Cochrane the worry is that nobody oversees the use of GRADE by the review team. There does not seem to be any place for anything like GRADE here. Admittedly Cochrane reviews go out to peer review but we have seen how problematic that is.


    It would be easy to think that because GRADE has been arrived at by a consensus of 'experts' it must be as good an approach as any. However, by definition those who choose to see themselves as experts suited to the construction of such a set of rules will be those who do not see that the exercise is pointless and invalid in decision-making theory terms. Those who can see that the exercise is doomed will not volunteer to be on the committee. It may be worth remembering that at least in the UK you get a pay rise for sitting on committees but not for just doing your job well, despite the fact that if you are sitting on a committee you cannot be doing the job you are paid to do.
     
    Last edited: Mar 4, 2021
    Hutan, sebaaa, oldtimer and 17 others like this.
  2. FMMM1

    FMMM1 Senior Member (Voting Rights)

    Messages:
    2,591
    Yes, surely a medical doctor who is unsure of a diagnosis can set out their views and ask a colleague (or colleagues) for theirs?

    Black boxes (like GRADE) are rightly concerning - the fact that this one requires you to give a +ve value to data that should be discarded, means that it isn't fit for purpose. Bit disappointing that the great, and the good, have been touting it and are still trying to defend it.

    Excuse me for not keeping up i.e. you folks have already done this. I Googled "cochrane insurance medicine + GRADE" and yes:
    "The evaluation for quality of evidence of cost or economic outcomes through GRADE: A survey of Cochrane reviews"
    [https://abstracts.cochrane.org/2017...-medicine-evaluation-cochrane-reviews-and-new]

    Old joke "if you can't be part of the solution then make money out of the problem" - tasteless in this context.

    I wouldn't blame anyone using strong language re this.
     
    Hutan, sebaaa, Milo and 11 others like this.
  3. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    The point about giving some sort of positive value to evidence, however flawed, is also a salient one, yes. Any system like GRADE should recognise evidence that makes it highly likely that there was no effect, as for PACE. GRADE deals with this by noting the consistency of any positive finding, but that misses the opportunity to make use of strong evidence for no effect from individual studies.
     
    Hutan, sebaaa, Milo and 14 others like this.
  4. arewenearlythereyet

    arewenearlythereyet Senior Member (Voting Rights)

    Messages:
    2,092
    I can’t say I am well read enough on the detail of GRADE but I would say that whatever system is used to put a ‘value’ on research should determine first what the purpose of the ‘grade’ is.

    PACE is a bad trial in terms of flawed methodology, so you could discount it completely. Or can you use it to provide proof that, even flawed, it shows that CBT and GET don't work (based on the principle that a negative result is as useful as a positive one)? So I guess it all hangs on what your objective is ... do a thorough search of all known research and use it to establish what facts exist?

    In this case it's a bit moot, since all the evidence we have says that we don't know very much, and there isn't a lot of anything other than that what little we have tried so far doesn't work.

    One thing I used to do when doing a literature search ahead of pitching for a research grant (food not medical) was to initially group past research in terms of quality/strength just so I could weigh things up. This was good because you could quickly filter out the wheat from the chaff and spot ‘career publishing’ by the same authors and genuine replication etc. but also negative results that showed what ideas had been disproved.

    I can see that grouping evidence might be useful initially to establish a base and even to demonstrate at a high level what you are dealing with, but that’s probably where it ends.

    The next bit (insight) should be based on skill, common sense and consensus, i.e. free thought, not some second-rate algorithm that assumes that people are incapable of learning a skill.
     
  5. Hoopoe

    Hoopoe Senior Member (Voting Rights)

    Messages:
    5,234
    The lowest GRADE certainty rating is "very low", described as "The true effect is probably markedly different from the estimated effect".

    Does that really accurately describe the worst possible scenario? It's as if only positive results are possible with this system, a bit like a questionnaire where the only allowed options are varying degrees of improved health, and there is no option to indicate a lack of improvement or deterioration.

    If we come up with a scale like this, it should start with something like "high certainty of no effect", followed by "uninterpretable".

    And looking at something like PACE, it's somewhere between uninterpretable and disproving any claims of meaningful treatment effects.
     
    Hutan, sebaaa, oldtimer and 12 others like this.
  6. Trish

    Trish Moderator Staff Member

    Messages:
    51,841
    Location:
    UK
    Hutan, Michelle, alktipping and 8 others like this.
  7. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    And this is very odd wording. It seems to indicate either a lack of understanding of probability or a hidden assumption that the result was 'fiddled'. If the accuracy of the estimated effect is just plain uncertain, the true effect is most probably something like it but might be quite different. If there is a conclusion that the true effect is 'probably markedly different' I think there has to be an assumption that the estimate is biased, and in practice that is always biased towards a positive effect, unless you are dealing with someone trying to disprove homeopathy perhaps.
     
    Hutan, Michelle, Adrian and 6 others like this.
  8. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    I think the BMJ What is GRADE is a reasonable place to get an overview. The GRADE manual is long, although it is reasonably easy to locate various aspects.
     
    Hutan, alktipping, Barry and 5 others like this.
  9. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    Looking at the BMJ 'What is GRADE' and the opening para of 'How does it work?' the following sentence is interesting:

    An overall GRADE quality rating can be applied to a body of evidence across outcomes, usually by taking the lowest quality of evidence from all of the outcomes that are critical to decision making. (my bolding)

    To me, the confusions involved in what GRADE is trying to do are apparent straight away. It is not clear whether the idea is to decide whether or not there is an effect or to decide what size it is, apparently assuming that there is one. Certainty and quality are also seen as interchangeable. The whole thing looks like a fail on a probability exam paper.
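
    To make the quoted rule concrete, here is a minimal sketch of "take the lowest rating among the critical outcomes". It is only an illustration of the sentence from the BMJ article, not GRADE's own software; the `Certainty` enum, the `overall_rating` function and the outcome names are all hypothetical.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """The four GRADE levels, ordered so that min() picks the lowest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def overall_rating(outcome_ratings, critical_outcomes):
    """Return the lowest rating among the outcomes judged critical.

    outcome_ratings: dict mapping outcome name -> Certainty
    critical_outcomes: names of the outcomes deemed critical to the decision
    """
    critical = [outcome_ratings[name] for name in critical_outcomes]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical)


# Hypothetical ratings, purely for illustration.
ratings = {
    "fatigue": Certainty.LOW,
    "physical function": Certainty.MODERATE,
    "sleep": Certainty.VERY_LOW,
}

overall = overall_rating(ratings, ["fatigue", "physical function"])
print(overall.name)
```

    Note that the result depends entirely on which outcomes are declared critical: leaving "sleep" out of the list lifts the overall rating from very low to low, and nothing in the rule says whether the rating is about the existence of an effect or its size.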
     
    Michelle, Ariel, alktipping and 6 others like this.
  10. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    3,488
    Location:
    Belgium
    Well said.

    This seems to be one of the main issues: that GRADE does not believe in a fatal flaw that makes 'evidence' totally unreliable. The only way to rate something as very low quality is if a trial suffers from several different flaws. I haven't seen any arguments why this would be the case in the real world.

    Suppose for example that there is an interpretation problem on a questionnaire: patients indicate they got better when in fact they meant something else. The GRADE approach, if I understand correctly, only offers the possibility to downgrade the quality of evidence a little bit even though the data is totally useless.

    The problem is that GRADE is now so often used that it is seen as the correct and neutral way to rate quality of evidence. If you deviate from it, by arguing there is a fatal flaw that makes evidence fully unreliable, then you're seen as arbitrary, not neutral, biased etc.

    I think the Handbook said something like: we're not trying to tell you how to rate evidence, merely how to make your decision transparent. The fact that you can only downgrade evidence two times for risk of bias, from high to low quality, shows that this isn't really the case.

    It would be interesting to have a comparison of how evidence is rated with and without GRADE. I suspect that the approach with GRADE will result in evidence being rated higher quality than the approach without GRADE.
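
    The two-step limit can be put as a small sketch. It assumes, as described in this thread, that randomised trials start at "high" and that each domain (risk of bias, indirectness and so on) can take off at most two levels; the function `rate` and the constant names are hypothetical, not part of any GRADE tool.

```python
# The four GRADE levels, lowest first. There is no "uninterpretable" level.
LEVELS = ["very low", "low", "moderate", "high"]

# Assumption taken from the discussion: at most two steps down per domain.
MAX_STEPS_PER_DOMAIN = 2


def rate(start, downgrades):
    """Apply per-domain downgrades to a starting level.

    start: one of LEVELS (randomised trials are assumed to start at "high")
    downgrades: dict mapping domain name -> number of steps down (0, 1 or 2)
    """
    index = LEVELS.index(start)
    for domain, steps in downgrades.items():
        if not 0 <= steps <= MAX_STEPS_PER_DOMAIN:
            raise ValueError(f"{domain}: steps must be 0 to {MAX_STEPS_PER_DOMAIN}")
        index -= steps
    # The scale bottoms out at "very low"; it cannot go below that.
    return LEVELS[max(index, 0)]


# One fatal flaw, counted only under risk of bias:
print(rate("high", {"risk of bias": 2}))

# "Very low" needs a second domain to be downgraded as well:
print(rate("high", {"risk of bias": 2, "indirectness": 1}))
```

    Under these assumptions a trial with a single fatal flaw still comes out as "low" rather than "very low", and no input at all can produce a verdict that the data are useless.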
     
    Hutan, Dolphin, MSEsperanza and 14 others like this.
  11. cassava7

    cassava7 Senior Member (Voting Rights)

    Messages:
    983
    It seems that previous discussions on GRADE have picked up your points @Jonathan Edwards.
    From a 2014 editorial by Malmivaara [1] (bolding mine):
    Malmivaara does not suggest creating an uninterpretable "grade" but arguably leans towards it; as in the thread on Busse et al.'s response, where you mentioned that GRADE only allows rating down up to 2 grades at a time (not 3), he suggests that this should be changed.
    His editorial is well worth a read, he analyzes the possible issues with each criterion in GRADE separately.
    [1] Malmivaara A. Methodological considerations of the GRADE method. Ann Med. 2015;47(1):1-5. doi:10.3109/07853890.2014.969766
     
    Last edited: Mar 4, 2021
    Hutan, Michelle, alktipping and 7 others like this.
  12. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    Interesting: That would seem to make a mockery of Busse et al.'s claim that GRADE had been disastrously misapplied. Clearly the GRADE people think they are telling others how decisions should come out. Moreover, if GRADE is being used by technical staff who have no experience of the psychology of trials in real life then there must be a tacit assumption that they are expecting GRADE to guide them to the right conclusion.
     
  13. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    I was hoping someone would disagree with me to point out where I am arguing badly. But still time for that!
     
    Ariel, alktipping, FMMM1 and 6 others like this.
  14. FMMM1

    FMMM1 Senior Member (Voting Rights)

    Messages:
    2,591
    Yes, the "beauty" of what they do is to note the remarkable consistency of studies which do not have objective outcome measures and are unblinded; they then ignore the question of whether this is the Hawthorne effect [https://en.wikipedia.org/wiki/Hawthorne_effect]. If you show an interest in people they respond positively: that is the Hawthorne effect.

    If these guys were doing card tricks on a street corner we would have a grudging respect --- they are involved in health care and they are coming up with dodgy black boxes [GRADE] and supporting research which is fundamentally flawed.
     
    Michelle, Ariel, alktipping and 4 others like this.
  15. cassava7

    cassava7 Senior Member (Voting Rights)

    Messages:
    983
    Much like @Jonathan Edwards' criticisms, Irving et al. (2017) wrote a critical review that is not specific to GRADE but that focuses on it in 5 points: "(1) lack of information on validity and reliability, (2) poor concurrent validity, (3) may not account for external validity, (4) may not be inherently logical, (5) susceptibility to subjectivity" [1].

    Norris and Bero (2016) highlight some of the same problems [2]:
    But the response to their concerns from the US GRADE Network seems to be that, to improve inter-rater reliability, raters should receive training on GRADE and use the GRADEpro software [2, Comments]:
    Kavanagh (2009), who has a similar position to @Jonathan Edwards on GRADE, comes to the same conclusion after repeating the issues above (external and internal consistency, not inherently logical, lack of validation although this may have evolved since, potential for bias) [3]:
    [1] Irving M, Eramudugolla R, Cherbuin N, Anstey KJ. A Critical Review of Grading Systems: Implications for Public Health Policy. Eval Health Prof. 2017;40(2):244-262. doi:10.1177/0163278716645161 (free access: Sci-hub link)

    [2] Norris SL, Bero L. GRADE Methods for Guideline Development: Time to Evolve?. Ann Intern Med. 2016;165(11):810-811. doi:10.7326/M16-1254

    [3] Kavanagh BP. The GRADE system for rating clinical guidelines. PLoS Med. 2009;6(9):e1000094. doi:10.1371/journal.pmed.1000094
     
    Last edited: Mar 4, 2021
  16. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    I see now what 'transparent' is supposed to mean - to have the reasoning explicit. But GRADE does not do this. It just requires that you say you downgraded one pip for bias and one pip for indirectness or whatever. Does it require you to say what your reasons are? I think it would be better simply to have a rule at NICE and Cochrane that reasons for evaluations must be given in full.
     
    Hutan, Ariel, alktipping and 7 others like this.
  17. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    Thanks for the literature @cassava7. So I am re-inventing the wheel, but maybe if the wheel has been forgotten by the people that matter that is not so bad! I also sense a degree of polite restraint in some of the critique that I would want to blow away.

    I don't understand quite what Kavanagh means here:
    There is a very good alternative to using the GRADE system to rate clinical guidelines: clinicians and organizations should use published guidelines while considering the clinical context, the credentials, and any conflicts of interest among the authors, as well as the expertise, experience, and education of the practitioner.

    What published guidelines should clinicians use? It seems not GRADE, but what then? The final conclusion seems to be not to have grading rules until they are of proven benefit and safety.
     
  18. Kitty

    Kitty Senior Member (Voting Rights)

    Messages:
    5,032
    Location:
    UK
    Clinical trial, then? :laugh:
     
    alktipping and FMMM1 like this.
  19. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    Having read Kavanagh I think I understand the sentence about guidelines. He is suggesting that clinicians should look at a recommendation in a guideline and then themselves judge the evidence on the basis of reading all the papers.

    What is slightly puzzling about Kavanagh's account is that it is about grading recommendations rather than evidence. With organisations like NICE recommendations are heavily coloured by cost-effectiveness and resource considerations. And the clinician does not have the opportunity to make up their own mind and act on it. On a broader front for GPs Kavanagh's suggestion is no use because they do not have the time. What would make more sense would be to say that the clinicians issuing the guidelines should make up their own minds on the basis of reading the papers.
     
    Hutan, alktipping, Barry and 5 others like this.
  20. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,267
    Location:
    London, UK
    The other useful thing about Kavanagh is that he/she makes it clear that GRADE is not itself evidence-based. I would like to get more detail on that. Presumably some sort of testing process has been done, but it sounds as if, where it has, things have turned out inconsistent.
     
    Hutan, Ariel, alktipping and 7 others like this.
