More PACE trial data released

Discussion in 'Psychosomatic research - ME/CFS and Long Covid' started by JohnTheJack, May 7, 2019.

  1. sTeamTraen

    sTeamTraen Established Member (Voting Rights)

    Messages:
    46
    I did these analyses back in May, but I don't remember where I got the description from, so the explanation that follows comes from reading my code. Maybe someone else can fill in the gaps. (There is some information on p. 828 of White et al.)

    There are four variables that determine recovery and improvement:
    CFQLSOV0 - CFQ score on a Likert scale at baseline
    CFQBSOV0 - CFQ score on a binary scale (that's what it says here; not sure what that means, maybe Y/N for 11 criteria?) at baseline
    PCFQLS52 - CFQ score on a Likert scale at 52 weeks
    PCFQBS52 - CFQ score on a binary (see above) scale at 52 weeks

    Improvement is defined as a score 2 or more points lower on the Likert scale, or 1 or more points lower on the binary scale, over the 52 weeks. (Pretty modest, it seems to me.)
    Recovery is defined as a Likert score of <= 18, or a binary score of <= 3, at 52 weeks. The third column in this section on the third tab indicates the amount by which the improvement or recovery scores are higher using the Likert scale versus the binary scale.

    The "or" clauses in the preceding statements reflect my code, which I am reading right here. My memory is generally poor for this sort of detail, and I don't have ready to hand the story of which was used, or what (if anything) was changed, so again I encourage people who have been looking at these articles for longer than me to contribute here.
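    In code form, my reading of those checks amounts to something like this (a minimal Python sketch, not the trial's own code; the variable names are the released ones, the thresholds are as stated above, and whether "or" is correct is exactly the open question I mention):

    Code:
    def improved(cfqlsov0, pcfqls52, cfqbsov0, pcfqbs52):
        """Improvement: Likert score drops by >= 2, or binary score drops by >= 1."""
        return (cfqlsov0 - pcfqls52 >= 2) or (cfqbsov0 - pcfqbs52 >= 1)

    def recovered(pcfqls52, pcfqbs52):
        """Recovery: Likert score <= 18, or binary score <= 3, at 52 weeks."""
        return pcfqls52 <= 18 or pcfqbs52 <= 3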
     
    JohnTheJack likes this.
  2. adambeyoncelowe

    adambeyoncelowe Senior Member (Voting Rights)

    Messages:
    2,736
    So I may be foggy, but APT and SMC seem to have better improvement/recovery rates at the bottom than the other arms (check the percentages given)?
     
    Last edited: Aug 18, 2019
    MSEsperanza likes this.
  3. sTeamTraen

    sTeamTraen Established Member (Voting Rights)

    Messages:
    46
    Regardless of the identity of the reviewers, it is extremely unlikely that they were given the data, or indeed that they even asked for them. Until very recently this was almost unheard of, and despite the science reform movement, it's not yet the norm. Sometimes reviewers who insist on seeing the data are told that their services are no longer required.
     
  4. sTeamTraen

    sTeamTraen Established Member (Voting Rights)

    Messages:
    46
    My reading is that CBT and GET have better outcomes (improvement and recovery) than APT and SMC with either scoring scheme, but that the gap between (CBT/GET) and (APT/SMC) is smaller with the Likert-based scoring.

    I am now getting a hazy memory of having pointed this out before, perhaps in response to criticism of the method having apparently been switched from binary to Likert. That is (again, in this hazy memory, which could be completely wrong), the switch benefitted all methods, but it benefitted APT/SMC more than CBT/GET and so it's hard to argue that the switch was done explicitly or exclusively to boost the results of what are assumed to be the authors' preferred modalities.
     
    Annamaria and JohnTheJack like this.
  5. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    My understanding is that this switch came before they had started analysing data, as part of other changes to the primary outcomes, but (I think) after data from the FINE trial had shown that for FINE a change from bimodal to Likert scoring would have allowed them to report a statistically significant improvement.

    So even if this ended up benefiting SMC/SMC+APT more, I think it's quite possible it was done in the expectation that it would make it easier for them to report positive results (along with the other changes to their primary outcome). Having said that, I don't think that is a point worth raising, and generally speculating about why the PACE researchers did something is probably an unhelpful distraction from just stating the problems with what they did.

    For the recovery criteria, it looks likely to me that the protocol deviations were finalised after trial data had been analysed, though the timing of this has never been clearly stated. (I made some recent comments on that in relation to an Oxford University statement which claimed "the study authors have repeatedly made clear, the criteria were changed on expert advice and with oversight committee approvals before any of the outcome data was analysed": https://www.s4me.info/threads/a-general-thread-on-the-pace-trial.807/page-35#post-193225)

    In their recovery paper they say (my emphasis): "We changed three of the thresholds for measuring recovery from our original protocol (White et al. 2007) before the analysis, as explained below." Which is different to "before any of the outcome data was analysed", though some readers might miss the significance of that.

    Wessely used the same phrasing of 'the analysis': https://twitter.com/user/status/848125525482774530


    If they had finalised (absurd) changes to their recovery criteria after they'd been analysing the trial data, without being clear about this, that is something they should be expected to explain.

    Thanks to everyone looking at this data - does it help show anything interesting about the associations between the trial's more objective and subjective outcomes, and whether those are the same across the different treatment arms [CBT+SMC/GET+SMC vs APT+SMC/SMC]?
     
    Last edited: Aug 18, 2019
  6. adambeyoncelowe

    adambeyoncelowe Senior Member (Voting Rights)

    Messages:
    2,736
    These numbers are confusing me:
    What are they? This is where it looks like APT and SMC look better on something.
     
  7. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    We should remember they are different scoring schemes, in that there are patients who improved with one scheme and got worse with the other. Where there was this effect, it was often that they improved on the Likert scale and got worse on the bimodal one. In the main paper they gave a reason for switching by claiming that the Likert one is more accurate, but it isn't like measuring in mm rather than cm; rather, it's like using two different bendy rulers, and they provide no empirical evidence that either is a better estimate of the ground truth.
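    A hypothetical illustration (not trial data) of how that can happen, assuming the standard CFQ scoring in which each of the 11 items is scored 0-3 on the Likert scheme and recoded 0/0/1/1 on the bimodal scheme:

    Code:
    def likert(items):
        # Likert scoring: sum of 11 item responses, each 0-3 (range 0-33)
        return sum(items)

    def bimodal(items):
        # Bimodal scoring: responses of 2 or 3 count as 1, else 0 (range 0-11)
        return sum(1 for x in items if x >= 2)

    baseline = [3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1]
    week52   = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]

    print(likert(baseline), likert(week52))    # 19 -> 16: 3 points better
    print(bimodal(baseline), bimodal(week52))  # 4 -> 5: 1 point worse

    Four items easing from 3 to 2 lower the Likert total but leave the bimodal total unchanged, while a single item crossing from 1 to 2 pushes the bimodal total up.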
     
    ukxmrv, obeat, Milo and 11 others like this.
  8. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    When I said, "I very much doubt", I really meant, "I know, because I used to work there."

    I have also done stats reviews for them, asked for more info/data, and not heard back. The whole stats/review situation is literally the reason I left The Lancet to do my MSc in Medical Statistics. I'm sorry I didn't fight harder. But that's why I'm fighting now.

    Right. I'm off to have a look at Borg. (Resistance is futile.)
     
    BurnA, 2kidswithME, RuthT and 23 others like this.
  9. sTeamTraen

    sTeamTraen Established Member (Voting Rights)

    Messages:
    46
    Yes. When they switched from binary to Likert scoring (I think this was the order of the switch), APT/SMC results improved by more than CBT/GET results did.

    Hence, any claim by opponents of the trial that "The authors switched from binary scoring to Likert scoring in order to make CBT/GET look 'even better' than APT/SMC" is not supported, because the relative advantage of CBT/GET over APT/SMC went down when this was done.

    I suppose that these numbers could be used as support for the claim that "The authors switched from binary scoring to Likert scoring in order to make CBT/GET look better, full stop" (i.e., only looking at recovery/improvement rates for those modalities). But if you're looking for a smoking gun, it probably isn't here.

    (The usual disclaimers apply as to whether the numbers mean anything clinically relevant outside of this table.)
     
  10. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    I think an analysis that was done on the FINE data showed that the trial changed from having a non-significant result to a significant one when they did this. The PACE team may have been aware of this when they were writing their stats plan.
     
  11. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK
    Their reasoning for making the change was not well justified, so it demonstrates a willingness to tinker with the reporting without good reason.

    I don't think the CFQ scores really have much meaning. It's a fairly random set of questions, some about physical fatigue, slightly fewer about mental fatigue, and one or two potentially relating to depression. So in their structure they have an inbuilt bias towards physical fatigue. The language of the questions is also very confusing, as it asks for change in fatigue (from some changing reference point), so it will suffer from recall biases.
     
  12. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
    Except of course they did not do this after formally analyzing the data.
     
  13. adambeyoncelowe

    adambeyoncelowe Senior Member (Voting Rights)

    Messages:
    2,736
    Thank you. It just wasn't clear to me what those numbers were showing, until you explained it. I didn't assume it was a smoking gun, but they did look weird.

    So it's the increase caused by a switch from bimodal to Likert scoring. Thanks. That's helpful.
     
    Last edited: Aug 19, 2019
  14. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,860
    Location:
    Australia
    Your personal experience? :p

    The underlying problem is that they did not provide what they said they would provide in the protocol. Even if they had sufficiently justified the change in the manuscript (they didn't), I don't think the manuscript should have passed peer review without at least providing a sensitivity analysis using what were defined as the primary outcomes in the protocol.

    The effect of the outcome switch isn't necessarily to make CBT or GET look better; the goal is for them to report more optimistic outcomes in general.

    There is one message that they think is more important than promoting CBT/GET, and that is the idea that some CFS patients improve/recover.
     
    Last edited: Aug 21, 2019
  15. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    Okey dokey. I've had a wee look at Borg. Borg is a measure of perceived effort experienced by participants as they undergo the step test (equivalent to ascending and descending three flights of stairs in 2 minutes). I explained a bit more about the step test here: https://lucibee.wordpress.com/2018/07/06/pace-trial-tiptoeing-around-the-step-test/

    All I'm going to do is to describe the data, because I think that's all we *can* do with what they've provided. It might be interesting to look at any relation with physical function, but because the test was self-paced, I'm not sure that will tell us much more than we already know.

    First is the scoring system. The Borg scale is scored between 6 and 20.
    Pts were given a laminated sheet, which indicated how to rate their effort:
    7="Very, very light"
    9="Very light"
    11="Fairly light"
    13="Somewhat hard"
    15="Hard"
    17="Very hard"
    19="Very, very hard"

    Here are the baseline scores in all groups:
    [image: baseline_borg.png]

    Rather than look at final scores, I've calculated "improvement" (+ve numbers indicate improvement, -ve numbers indicate deterioration), because that's more informative as to whether fitness has actually improved. [eta for "clarity", or not!]
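    A minimal sketch of that sign convention, assuming improvement is simply the baseline rating minus the follow-up rating (my inference from the description above, not the trial's own code):

    Code:
    def borg_improvement(baseline_borg, followup_borg):
        # Borg ratings are on the 6-20 scale above; a positive result means
        # lower perceived effort at follow-up, i.e. "improvement"
        return baseline_borg - followup_borg

    print(borg_improvement(15, 12))  # 3: rated "Hard" at baseline, easier at follow-up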

    At 24 weeks:
    [image: Borg_improvement_24weeks.png]

    At 52 weeks:
    [image: Borg_improvement_52weeks.png]

    And at 52 weeks split by group:
    [image: Borg_improvement_52wks_split.png]
     
    Last edited: Aug 21, 2019
    JohnTheJack, Barry, rvallee and 9 others like this.
  16. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    But I'll add a caveat. An improvement in Borg score could simply be indicating that pts are better at pacing themselves at the step test, rather than being any fitter.

    This is further muddied by interventions such as CBT, which aim to change pts' perception of symptoms and may affect the way they rate things like "effort". This is why I think the shapes of the graphs are important, and not just their summary measures (means, medians, etc.).
     
  17. Trish

    Trish Moderator Staff Member

    Messages:
    55,414
    Location:
    UK
    I'm not clear how the Borg test works and what it's supposed to be measuring. Does the patient have to complete the task in the set time, and then rate how hard they had to work to complete it? Or do they do as much as they can in 2 minutes, or all the steps in as long as it takes?

    If someone with mild ME manages to complete all the steps within the 2 minutes, and finds it very hard, are they classed as more or less fit than someone who only manages to complete half the steps in the time by taking it more slowly, and rates it as medium effort? Or have I missed the point? (Very likely.)
     
    Annamaria, rvallee, MEMarge and 4 others like this.
  18. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    For the Borg scale, participants were asked to give a "rating number that best indicates what effort they felt the exercise had taken at the end of the step test" [exact wording from the trial protocol]. Participants were told that the step test would be measuring their fitness. It's supposed to take about 2 minutes, but from the instructions, it seems that it can take as long as the participant needs to - that's the self-paced element. So, yes, it will matter how many steps someone does, how long they take, and how much effort they rate it as. Which is why it isn't necessarily reliable, particularly in non-healthy individuals.

    [eta: Borg is not the test. The step test is the test, and Borg just gives an idea of how much effort the pt thought they were putting into it.]

    Here is the relevant page from the protocol:
    [image: Step_test_protocol.png]
     
    Last edited: Aug 21, 2019
  19. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,860
    Location:
    Australia
    The Borg scale is a relative scale on which patients rate the intensity of effort during an exercise test. It is meaningless to compare scores between patients, or between exercise tests for the same patient if the gap is more than about a day.

    Participants rating lower peak Borg scores doesn't necessarily mean the exercise felt easier; it can mean they did not exercise at the same level of intensity. I think GET patients were more motivated to exercise at a higher intensity, hence higher scores.
     
  20. NelliePledge

    NelliePledge Moderator Staff Member

    Messages:
    14,845
    Location:
    UK West Midlands
    Did they do anything with the non-subjective data?
     
    Annamaria, MEMarge, Barry and 2 others like this.
