More PACE trial data released

Discussion in 'Psychosomatic research - ME/CFS and Long Covid' started by JohnTheJack, May 7, 2019.

  1. sTeamTraen

    sTeamTraen Established Member (Voting Rights)

    Messages:
    46
    I did these analyses back in May, but I don't remember where I got the description from, so the explanation that follows comes from reading my code. Maybe someone else can fill in the gaps. (There is some information on p. 828 of White et al.)

    There are four variables that determine recovery and improvement:
    CFQLSOV0 - CFQ score on a Likert scale at baseline
    CFQBSOV0 - CFQ score on a binary scale (that's what it says here; not sure what that means, maybe Y/N for 11 criteria?) at baseline
    PCFQLS52 - CFQ score on a Likert scale at 52 weeks
    PCFQBS52 - CFQ score on a binary (see above) scale at 52 weeks

    Improvement is defined as a score 2 or more points lower on the Likert scale, or 1 or more points lower on the binary scale, over the 52 weeks. (Pretty modest, it seems to me.)
    Recovery is defined as a Likert score of <= 18, or a binary score of <= 3, at 52 weeks. The third column in this section on the third tab indicates the amount by which the improvement or recovery scores are higher using the Likert scale versus the binary scale.

    The "or" clauses in the preceding statements reflect my code, which I am reading right here. My memory is generally poor for this sort of detail, and I don't have ready to hand the story of which was used, or what (if anything) was changed, so again I encourage people who have been looking at these articles for longer than me to contribute here.
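    In code form, my reading of those checks amounts to something like this (a minimal Python sketch, not the trial's own code; the variable names are the released ones, the thresholds are as stated above, and whether "or" is correct is exactly the open question I mention):

    Code:
    def improved(cfqlsov0, pcfqls52, cfqbsov0, pcfqbs52):
        """Improvement: Likert score drops by >= 2, or binary score drops by >= 1."""
        return (cfqlsov0 - pcfqls52 >= 2) or (cfqbsov0 - pcfqbs52 >= 1)

    def recovered(pcfqls52, pcfqbs52):
        """Recovery: Likert score <= 18, or binary score <= 3, at 52 weeks."""
        return pcfqls52 <= 18 or pcfqbs52 <= 3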
     
    JohnTheJack likes this.
  2. adambeyoncelowe

    adambeyoncelowe Senior Member (Voting Rights)

    Messages:
    2,736
    So I may be foggy, but APT and SMC seem to have better improvement/recovery rates at the bottom than the other arms (check the percentages given)?
     
    Last edited: Aug 18, 2019
    MSEsperanza likes this.
  3. sTeamTraen

    sTeamTraen Established Member (Voting Rights)

    Messages:
    46
    Regardless of the identity of the reviewers, it is extremely unlikely that they were given the data, or indeed that they even asked for them. Until very recently this was almost unheard of, and despite the science reform movement, it's not yet the norm. Sometimes reviewers who insist on seeing the data are told that their services are no longer required.
     
  4. sTeamTraen

    sTeamTraen Established Member (Voting Rights)

    Messages:
    46
    My reading is that CBT and GET have better outcomes (improvement and recovery) than APT and SMC with either scoring scheme, but that the gap between (CBT/GET) and (APT/SMC) is smaller with the Likert-based scoring.

    I am now getting a hazy memory of having pointed this out before, perhaps in response to criticism of the method having apparently been switched from binary to Likert. That is (again, in this hazy memory, which could be completely wrong), the switch benefitted all methods, but it benefitted APT/SMC more than CBT/GET and so it's hard to argue that the switch was done explicitly or exclusively to boost the results of what are assumed to be the authors' preferred modalities.
     
    Annamaria and JohnTheJack like this.
  5. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    My understanding is that this switch came before they had started analysing data, as part of other changes to the primary outcomes, but (I think) after data from the FINE trial had shown that for FINE a change from bimodal to Likert scoring would have allowed them to report a statistically significant improvement.

    So even if this ended up benefiting SMC/SMC+APT more, I think it's quite possible it was done in the expectation that it would make it easier for them to report positive results (along with the other changes to their primary outcome). Having said that, I don't think that is a point worth raising, and generally speculating about why the PACE researchers did something is probably an unhelpful distraction from just stating the problems with what they did.

    For the recovery criteria, it looks likely to me that the protocol deviations were finalised after trial data had been analysed, though the timing of this has never been clearly stated. (I made some recent comments on that in relation to an Oxford University statement which claimed "the study authors have repeatedly made clear, the criteria were changed on expert advice and with oversight committee approvals before any of the outcome data was analysed": https://www.s4me.info/threads/a-general-thread-on-the-pace-trial.807/page-35#post-193225)

    In their recovery paper they say (my emphasis): "We changed three of the thresholds for measuring recovery from our original protocol (White et al. 2007) before the analysis, as explained below." Which is different to "before any of the outcome data was analysed", though some readers might miss the significance of that.

    Wessely used the same phrasing of 'the analysis': https://twitter.com/user/status/848125525482774530


    If they had finalised (absurd) changes to their recovery criteria after they'd been analysing the trial data, without being clear about this, that is something they should be expected to explain.

    Thanks to everyone looking at this data - does it help show anything interesting about the associations between the trial's more objective and subjective outcomes, and whether those are the same across the different treatment arms [CBT+SMC/GET+SMC vs APT+SMC/SMC]?
     
    Last edited: Aug 18, 2019
  6. adambeyoncelowe

    adambeyoncelowe Senior Member (Voting Rights)

    Messages:
    2,736
    These numbers are confusing me:
    What are they? This is where it looks like APT and SMC look better on something.
     
  7. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    We should remember they are different scoring schemes, in that there are patients who improved with one scheme and got worse with the other. Where there was this effect, it was often that they improved on the Likert scale and got worse on the bimodal one. In the main paper they gave a reason for switching by claiming that the Likert one is more accurate, but it isn't like measuring in mm rather than cm; rather, it's like using two different bendy rulers, and they provide no empirical evidence that either is a better estimate of the ground truth.
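    A hypothetical illustration (not trial data) of how that can happen, assuming the standard CFQ scoring in which each of the 11 items is scored 0-3 on the Likert scheme and recoded 0/0/1/1 on the bimodal scheme:

    Code:
    def likert(items):
        # Likert scoring: sum of 11 item responses, each 0-3 (range 0-33)
        return sum(items)

    def bimodal(items):
        # Bimodal scoring: responses of 2 or 3 count as 1, else 0 (range 0-11)
        return sum(1 for x in items if x >= 2)

    baseline = [3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1]
    week52   = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]

    print(likert(baseline), likert(week52))    # 19 -> 16: 3 points better
    print(bimodal(baseline), bimodal(week52))  # 4 -> 5: 1 point worse

    Four items easing from 3 to 2 lower the Likert total but leave the bimodal total unchanged, while a single item crossing from 1 to 2 pushes the bimodal total up.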
     
    ukxmrv, obeat, Milo and 11 others like this.
  8. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    When I said, "I very much doubt", I really meant, "I know, because I used to work there."

    I have also done stats reviews for them, asked for more info/data, and not heard back. The whole stats/review situation is literally the reason I left The Lancet to do my MSc in Medical Statistics. I'm sorry I didn't fight harder. But that's why I'm fighting now.

    Right. I'm off to have a look at Borg. (Resistance is futile.)
     
    BurnA, 2kidswithME, RuthT and 23 others like this.
  9. sTeamTraen

    sTeamTraen Established Member (Voting Rights)

    Messages:
    46
    Yes. When they switched from binary to Likert scoring (I think this was the order of the switch), APT/SMC results improved by more than CBT/GET results did.

    Hence, any claim by opponents of the trial that "The authors switched from binary scoring to Likert scoring in order to make CBT/GET look 'even better' than APT/SMC" is not supported, because the relative advantage of CBT/GET over APT/SMC went down when this was done.

    I suppose that these numbers could be used as support for the claim that "The authors switched from binary scoring to Likert scoring in order to make CBT/GET look better, full stop" (i.e., only looking at recovery/improvement rates for those modalities). But if you're looking for a smoking gun, it probably isn't here.

    (The usual disclaimers apply as to whether the numbers mean anything clinically relevant outside of this table.)
     
  10. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    I think an analysis that was done on the FINE data showed that the trial changed from having a non-significant result to a significant one when they did this. The PACE team may have been aware of this when they were writing their stats plan.
     
  11. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    Their reasoning for making the change was not well justified, so it demonstrates a willingness to tinker with the reporting without good reason.

    I don't think the CFQ scores really have much meaning. It's a fairly random set of questions, some about physical fatigue, slightly fewer about mental fatigue, and one or two potentially relating to depression. So in their structure they have an inbuilt bias towards physical fatigue. The language of the questions is also very confusing, as it asks for change in fatigue (from some changing reference point), so it will suffer from recall biases.
     
  12. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
    Except of course they did not do this after formally analyzing the data.
     
  13. adambeyoncelowe

    adambeyoncelowe Senior Member (Voting Rights)

    Messages:
    2,736
    Thank you. It just wasn't clear to me what those numbers were showing, until you explained it. I didn't assume it was a smoking gun, but they did look weird.

    So it's the increase caused by a switch from bimodal to Likert scoring. Thanks. That's helpful.
     
    Last edited: Aug 19, 2019
  14. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,860
    Location:
    Australia
    Your personal experience? :p

    The underlying problem is that they did not provide what they said they would provide in the protocol. Even if they had sufficiently justified the change in the manuscript (they didn't), I don't think the manuscript should have passed peer review without at least providing a sensitivity analysis using what were defined as the primary outcomes in the protocol.

    The effect of the outcome switch isn't necessarily to make CBT or GET look better; the goal is for them to report more optimistic outcomes in general.

    There is one message that they think is more important than promoting CBT/GET, and that is the idea that some CFS patients improve/recover.
     
    Last edited: Aug 21, 2019
  15. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    Okey dokey. I've had a wee look at Borg. Borg is a measure of perceived effort experienced by participants as they undergo the step test (equivalent to ascending and descending three flights of stairs in 2 minutes). I explained a bit more about the step test here: https://lucibee.wordpress.com/2018/07/06/pace-trial-tiptoeing-around-the-step-test/

    All I'm going to do is to describe the data, because I think that's all we *can* do with what they've provided. It might be interesting to look at any relation with physical function, but because the test was self-paced, I'm not sure that will tell us much more than we already know.

    First is the scoring system. The Borg scale is scored between 6 and 20.
    Pts were given a laminated sheet, which indicated how to rate their effort:
    7="Very, very light"
    9="Very light"
    11="Fairly light"
    13="Somewhat hard"
    15="Hard"
    17="Very hard"
    19="Very, very hard"

    Here are the baseline scores in all groups:
    [image: baseline_borg.png]

    Rather than look at final scores, I've calculated "improvement" (+ve numbers indicate improvement, -ve numbers indicate deterioration), because that's more informative as to whether fitness has actually improved. [eta for "clarity", or not!]
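    A minimal sketch of that sign convention, assuming improvement is simply the baseline rating minus the follow-up rating (my inference from the description above, not the trial's own code):

    Code:
    def borg_improvement(baseline_borg, followup_borg):
        # Borg ratings are on the 6-20 scale above; a positive result means
        # lower perceived effort at follow-up, i.e. "improvement"
        return baseline_borg - followup_borg

    print(borg_improvement(15, 12))  # 3: rated "Hard" at baseline, easier at follow-up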

    At 24 weeks:
    [image: Borg_improvement_24weeks.png]

    At 52 weeks:
    [image: Borg_improvement_52weeks.png]

    And at 52 weeks split by group:
    [image: Borg_improvement_52wks_split.png]
     
    Last edited: Aug 21, 2019
    JohnTheJack, Barry, rvallee and 9 others like this.
  16. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    But I'll add a caveat. An improvement in Borg score could simply be indicating that pts are better at pacing themselves at the step test, rather than being any fitter.

    This is further muddied by interventions such as CBT, which aim to change pts' perception of symptoms and may affect the way they rate things like "effort". This is why I think the shapes of the graphs are important, and not just their summary measures (means, medians, etc.).
     
  17. Trish

    Trish Moderator Staff Member

    Messages:
    55,414
    Location:
    UK
    I'm not clear how the Borg test works and what it's supposed to be measuring. Does the patient have to complete the task in the set time, and then rate how hard they had to work to complete it? Or do they do as much as they can in 2 minutes, or all the steps in as long as it takes?

    If someone with mild ME manages to complete all the steps within the 2 minutes, and finds it very hard, are they classed as more or less fit than someone who only manages to complete half the steps in the time by taking it more slowly, and rates it as medium effort? Or have I missed the point? (Very likely.)
     
    Annamaria, rvallee, MEMarge and 4 others like this.
  18. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    For the Borg scale, participants were asked to give a "rating number that best indicates what effort they felt the exercise had taken at the end of the step test" [exact wording from the trial protocol]. Participants were told that the step test would be measuring their fitness. It's supposed to take about 2 minutes, but from the instructions, it seems that it can take as long as the participant needs to - that's the self-paced element. So, yes, it will matter how many steps someone does, how long they take, and how much effort they rate it as. Which is why it isn't necessarily reliable, particularly in non-healthy individuals.

    [eta: Borg is not the test. The step test is the test, and Borg just gives an idea of how much effort the pt thought they were putting into it.]

    Here is the relevant page from the protocol:
    [image: Step_test_protocol.png]
     
    Last edited: Aug 21, 2019
  19. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,860
    Location:
    Australia
    The Borg scale is a relative scale on which patients rate the intensity of effort during an exercise test. It is meaningless to compare scores between patients, or between exercise tests for the same patient if the gap is more than about a day.

    Participants rating lower peak Borg scores doesn't necessarily mean the exercise felt easier; it can mean they did not exercise at the same level of intensity. I think GET patients were more motivated to exercise at a higher intensity, hence higher scores.
     
  20. NelliePledge

    NelliePledge Moderator Staff Member

    Messages:
    14,845
    Location:
    UK West Midlands
    Did they do anything with the non-subjective data?
     
    Annamaria, MEMarge, Barry and 2 others like this.
