PACE trial TSC and TMG minutes released

I remember some online comment from someone who said that they were a PACE participant, and that they had stopped conducting the step test towards the end of the trial as they were getting too many bad reactions. It's always worth being cautious about what one reads on the internet, but these minutes indicate that there was concern about the step test. I wonder if there was a sharp decline in participation rates towards the end of the trial... I guess we'd need access to the secret data to check.
A slight digression maybe, but there was also a PACE participant who said that when their heart monitoring went berserk due to GET, there was a strong effort to simply not record those results because they were unexpected and "must be wrong", or something akin to that. This turned up fairly recently, possibly in response to one of @dave30th's blogs.
 
Useful to see that @Sasha.
I guess my point relates to what we can reasonably expect the body language to be when the assessor hands the patient the self-report form on APT - which would be crucial to bias.

Whatever the manuals say, the therapists and assessors might have viewed the trial as trying to see whether any of the three treatments was associated with improvement. If APT provided the right conditions for natural recovery then maybe there would be quite a bit of recovery. If the 'philosophy' was that nobody knew, and that this was a good opportunity to test both the therapies in use and a pacing regimen - as it was sold to patient organisations, as far as I understand, and presumably deliberately implied in the use of the PACE acronym to satisfy them - then the body language would be of one sort. If the 'philosophy' was, as Horton said, to counter one approach against the other, then the body language would be quite different.

Here's a critique of the patients' and therapists' manuals in Magical Medicine, starting at p316; the section on the APT patient and therapists' manuals starts at p369:

http://www.margaretwilliams.me/2010/magical-medicine_hooper_feb2010.pdf
 
My views of PACE's methodological flaws are on record, but there are a couple of issues around these particular arguments:
I agree that it is hardly credible that a trial overseen by the MRC clinical trials team should be doing this in 2009. When I was involved in setting up a trial of this size in 2000 it was absolutely clear to us that we had to define our primary end point before starting anything. The importance of predefined primary endpoints was understood at least by the mid 1990s.
PACE have already argued that their primary outcomes are still self-reported fatigue and function; they changed from categorical to continuous reporting*, but that's not quite the same as totally switching outcomes (such as switching from school attendance to fatigue...).

* in practice, this also involved downgrading the threshold of improvement from the protocol, as "success" on the new primary outcome only had to exceed the weaker measure of "clinically useful difference", effectively much less than the original "clinically important difference". But that complicates the argument.


But we are still left with the fact that anyone who knew about trials at this date knew that what was being done would have made the study unpublishable if it had been dealing with a commercial drug. The suspicion has to be that the MRC somehow felt that, because this was a therapist-delivered treatment within the healthcare system, pragmatic healthcare policy justified abandoning scientific methods.
That's very true, but sadly it's also the norm for trialling behavioural/psychological therapies. So it's not a flaw specific to PACE.

If these are the two main arguments, then this looks less than a simple 'open and shut' case.
 
I haven't read these minutes as closely as some appear to have done yet, but my impression is that there was no discussion in the TSC of dropping the actigraphy.

They have come up with quite elaborate explanations as to why this wasn't carried out at the end. Has anyone seen anything on it? Shouldn't this have been addressed by the TSC?
 
PACE have already argued that their primary outcomes are still self-reported fatigue and function; they changed from categorical to continuous reporting*,

I think one would regard it as the same crime. Primary endpoints had to be defined by specific method as well as subject matter. For my trial we had to choose whether to use a DAS or ACR20 or ACR50 score - no leeway. Having an 'analysis strategy committee' right up to the end of an unblinded study is plain ridiculous.

That's very true, but sadly it's also the norm for trialling behavioural/psychological therapies. So it's not a flaw specific to PACE.

If these are the two main arguments, then this looks less than a simple 'open and shut' case.

I don't follow that, @Simon M. Bad methodology may have been the norm for behavioural studies but that does not make it any less necessary for the MRC trials staff to ensure that the methods are adequate for trials they fund. I don't know how many behavioural studies were MRC-funded at this scale but I doubt many. If the incompetence was more generalised, all the more reason for it to be flagged up as a systematic failure. And in rheumatology everyone was aware that therapist-delivered studies were useless without objective measures. In the late 1980s I was research advisor to our physio unit and had to keep explaining to them that studies like this are just no good.

The third argument is that a trial of therapies that deliberately aim to induce subjective bias in patients' perceptions of their illness is uniquely vulnerable to this design.
 
That's very true, but sadly it's also the norm for trialling behavioural/psychological therapies. So it's not a flaw specific to PACE.

If these are the two main arguments, then this looks less than a simple 'open and shut' case.

I don't follow this.

The question is what is the norm for clinical trials.
How many clinical trial experts would argue that PACE met the standard required?

How many scientists in general would even argue that there can be different levels of evidence required?
Either the bar is met or it isn't; for anyone to claim the bar can be set lower for them would be admitting they are more into voodoo than science.
 
A really dumb question for the experts here:

Are the self-report measures used in PACE and the other studies (Chalder fatigue, SF-36, functional score etc. or whatever) summative scores or multiple threshold scores? I have assumed they were probably summative (i.e. you ask lots of questions and add up the answer scores). I think multicomponent assessment should probably nearly always be multiple threshold (you only score an improvement if all, or a stringent selection of, subsections improve by a certain threshold). This avoids getting an apparently meaningful improvement just from changes in the most subjective component.
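To illustrate the distinction, here is a minimal sketch with made-up items, scores and thresholds (not any questionnaire's actual scoring rules): a summative score can register "improvement" from a change in a single, possibly very subjective, component, while a multiple-threshold rule cannot.

```python
# Illustrative only: made-up items and thresholds, higher score = worse.
baseline = {"fatigue": 8, "function": 8, "mood": 8}
followup = {"fatigue": 8, "function": 8, "mood": 2}   # only the most subjective item changed

# Summative: add everything up and call any drop of more than 3 points an improvement.
improved_summative = sum(baseline.values()) - sum(followup.values()) > 3

# Multiple threshold: every component must drop by at least 2 points.
improved_threshold = all(baseline[k] - followup[k] >= 2 for k in baseline)

print(improved_summative)   # True  - a 'responder' on the added-up score
print(improved_threshold)   # False - fails once each component has to improve
```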
 
jennysunstar @jennysunstar · 3 hours ago
Replying to @keithgeraghty
Keith please look at MM p58, FMS patients actively recruited to trial, confirmed by Dept of Health, GPs given financial inducements to get FM patients to join trial.
http://www.margaretwilliams.me/2010/magical-medicine_hooper_feb2010.pdf

What a fiasco. I think those people using PACE for their undergrad course on how not to design trials will need to extend the course unit:

Item 17: Bad cherry unpicking
 
A really dumb question for the experts here:

Are the self-report measures used in PACE and the other studies (Chalder fatigue, SF-36, functional score etc. or whatever) summative scores or multiple threshold scores? I have assumed they were probably summative (i.e. you ask lots of questions and add up the answer scores). I think multicomponent assessment should probably nearly always be multiple threshold (you only score an improvement if all, or a stringent selection of, subsections improve by a certain threshold). This avoids getting an apparently meaningful improvement just from changes in the most subjective component.
I presume the point you are making here is that you could have a few crucially important criteria, where too low a score on any one renders the whole thing a failure no matter what the other scores are, but if you muddle them all together then the priorities/weightings get lost.
 
A really dumb question for the experts here:

Are the self-report measures used in PACE and the other studies (Chalder fatigue, SF-36, functional score etc. or whatever) summative scores or multiple threshold scores? I have assumed they were probably summative (i.e. you ask lots of questions and add up the answer scores). I think multicomponent assessment should probably nearly always be multiple threshold (you only score an improvement if all, or a stringent selection of, subsections improve by a certain threshold). This avoids getting an apparently meaningful improvement just from changes in the most subjective component.

They are used as summative but I don't think the story is that simple. For example the CFQ has two major components (physical and mental fatigue) so really it should be a multi-component thing, but it's not used as that. With the SF-36 there are questions about different physical abilities (walking, washing and dressing, stair climbing, general activities, bending and kneeling, lifting). I tend to think there are probably different components in here as well - looking at data from the ONS study, for mid-valued scores it was hard to tell which activities couldn't be done.
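As a sketch of how the summative use hides that multi-component structure (the SF-36 physical function items are paraphrased and the 0-100 rescaling is the conventional one; treat the details as assumptions rather than the PACE scoring code), very different answer patterns can collapse onto the same total:

```python
# Sketch of summative scoring for an SF-36-style physical function subscale.
# Each item is answered 1 ('limited a lot'), 2 ('limited a little') or
# 3 ('not limited'); items are summed and rescaled to 0-100.

PF_ITEMS = ["vigorous activities", "moderate activities", "lifting/carrying",
            "climbing several flights", "climbing one flight", "bending/kneeling",
            "walking more than a mile", "walking several blocks",
            "walking one block", "bathing/dressing"]

def pf_score(answers):
    raw = sum(answers[item] for item in PF_ITEMS)              # raw range 10-30
    return (raw - len(PF_ITEMS)) / (2 * len(PF_ITEMS)) * 100   # rescale to 0-100

patient_a = {item: 2 for item in PF_ITEMS}   # 'limited a little' at everything
patient_b = {item: (3 if i < 5 else 1)       # half the items unaffected, half severely limited
             for i, item in enumerate(PF_ITEMS)}

print(pf_score(patient_a), pf_score(patient_b))   # both 50.0 - very different patients, same score
```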
 
There are sentences on the net looking very like the one @Daisymay quoted where the name is Wessely. He was certainly director of the CTU responsible for PACE as far as I can see, at least at some point. What is not clear to me is whether this is the entire MRC CTU or whether it is a subsection.

Would this be referring, for example, to this in Magical Medicine:
'It is Professor Wessely who is in charge of the MRC PACE Clinical Trial Unit.'

My reading is that Hooper refers to PACE as 'MRC PACE', so Wessely is in charge of the CTU for the trial (the one at King's which played a leading role), but it's not the MRC CTU. It's the CTU for MRC PACE.
 
I presume the point you are making here is that you could have a few crucially important criteria, where too low a score on any one renders the whole thing a failure no matter what the other scores are, but if you muddle them all together then the priorities/weightings get lost.

The point is actually not that.
It is common to have additive scores but the logic of them is unclear. If you were confident that measure A was a reliable indicator of what you want to know then just measure A. If you are not so confident that a change in A has no false positives or false negatives, it is reasonable to measure B, C and D if you think that helps. If any of B, C or D is better than A then just use one of them, so the assumption is that none of these give you full confidence you are measuring what you want to know but that measuring several gives you more confidence.

That takes you into Bayes's theorem. If A is reasonably likely to indicate the answer and A shows such and such then in what way does knowing B change the probability that you know what you want to know about improvement?

A standard clinical algorithm in medicine is to ask one question A and, if it suggests a problem, then ask another question B that confirms your interpretation of the first answer if it is positive but may well be irrelevant if the first answer was negative. You could score in this algorithmic way, but there are times when a positive answer to a third question C will reasonably stand in for a positive answer to A but might need to be corroborated in a slightly different way with D. And if either the patient or the assessor is filling in a form you often want a full dataset anyway, because you may be interested in answers to individual questions.

So summation, even with weighting, often does not come into logical clinical decisions about how well somebody is. There is a geometric (or Bayesian) mathematical interaction between the values for the answers. However, if you want some form of quantitation, rather than just an estimate of the probability of improvement, then you want to be able to say that you are adequately confident that they are slightly but noticeably better, or substantially better, or almost completely better, based on corroboration from different measures. That is what we do in rheumatology. We use the American College of Rheumatology criteria for robust confidence in one of these three levels of improvement, which require a 20%, 50% and 70% improvement in at least five variables, two of which are mandatory. That does not mean that the person is '20% better' or '50% better', because the system is deliberately stringent and there may not be such a thing as '20% better' anyway. But it robustly captures rates of mild, moderate and major improvement.
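A minimal sketch of that kind of multiple-threshold criterion, set against a purely additive change score. The variable names and numbers are invented, and the rule follows the description above ("at least five variables, two mandatory"); the formal ACR definitions differ in detail.

```python
# Illustrative multiple-threshold improvement criterion vs a summative change
# score. Variable names and values are made up; lower values = better.

MANDATORY = ["tender_joints", "swollen_joints"]
OTHER = ["pain", "patient_global", "physician_global", "function", "crp"]

def pct_improvement(before, after):
    return (before - after) / before * 100 if before else 0.0

def meets_level(baseline, followup, threshold_pct, min_total=5):
    """True only if both mandatory variables and at least min_total variables
    overall improve by threshold_pct (e.g. 20, 50 or 70)."""
    improved = [v for v in MANDATORY + OTHER
                if pct_improvement(baseline[v], followup[v]) >= threshold_pct]
    return all(v in improved for v in MANDATORY) and len(improved) >= min_total

def summative_change(baseline, followup):
    """What an additive score would report: total change across all items."""
    return sum(baseline[v] - followup[v] for v in MANDATORY + OTHER)

baseline = dict(tender_joints=20, swollen_joints=15, pain=70,
                patient_global=70, physician_global=60, function=50, crp=30)
followup = dict(tender_joints=19, swollen_joints=15, pain=20,   # big shift only in the most subjective items
                patient_global=30, physician_global=55, function=48, crp=29)

print(summative_change(baseline, followup))   # 99 - looks like a large improvement when added up
print(meets_level(baseline, followup, 20))    # False - the mandatory measures barely moved
```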
 
It is common to have additive scores but the logic of them is unclear. If you were confident that measure A was a reliable indicator of what you want to know then just measure A.

The point that Likert made is that if you want to measure someone's views on A then you may wish to ask them basically the same question multiple times, because they may make errors or not understand the question. So you ask the same question multiple times using different words and add up the answers.

That takes you into Bayes's theorem. If A is reasonably likely to indicate the answer and A shows such and such then in what way does knowing B change the probability that you know what you want to know about improvement?
I'm not sure quite what you are getting at here, but I was assuming that you would have P(Improvement | A, B, C, D) and P(Improvement | A), so that the probability of an improvement will be more likely given more evidence. But I think that rests on assumptions about the pieces of evidence, such as being IID (independent and identically distributed).
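A minimal sketch of that kind of update, assuming a naive (conditionally independent) model and made-up likelihoods; nothing here comes from PACE data. It just shows how P(Improvement | A) shifts once B, C and D also point the same way.

```python
# Naive-Bayes style combination of several imperfect indicators of improvement.
# All numbers are invented for illustration.

def posterior(prior, likelihoods):
    """likelihoods: list of (P(indicator positive | improved),
                             P(indicator positive | not improved)),
    for indicators that all came back positive."""
    p_imp, p_not = prior, 1 - prior
    for p_pos_imp, p_pos_not in likelihoods:
        p_imp *= p_pos_imp
        p_not *= p_pos_not
    return p_imp / (p_imp + p_not)

prior = 0.3                                   # assumed prior probability of genuine improvement
A = (0.8, 0.3)                                # A is fairly informative
B, C, D = (0.7, 0.4), (0.6, 0.5), (0.6, 0.5)  # B-D are weaker indicators

print(round(posterior(prior, [A]), 2))            # 0.53 on A alone
print(round(posterior(prior, [A, B, C, D]), 2))   # 0.74 once B-D corroborate
```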

A standard clinical algorithm in medicine is to ask one question A and, if it suggests a problem, then ask another question B that confirms your interpretation of the first answer if it is positive but may well be irrelevant if the first answer was negative. You could score in this algorithmic way, but there are times when a positive answer to a third question C will reasonably stand in for a positive answer to A but might need to be corroborated in a slightly different way with D. And if either the patient or the assessor is filling in a form you often want a full dataset anyway, because you may be interested in answers to individual questions.

Sounds more like a decision tree. There are ways of deriving decision trees (or forests) from data, and this could potentially be used to order the questions in something like the SF-36 scale. This may then make more sense. There is some of this implicit in the scale already: for example, if you can walk a mile easily I assume it's not worth asking if you can walk a block easily (but both questions are currently asked).

But I do think there is value in the answers (possibly more than the sum of the scores). Also in the changes to answers.
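As a sketch of that skip-logic idea (item wording is paraphrased, not taken from the SF-36 itself): order related items from hardest to easiest, so a clear "not limited" on a harder item makes the easier ones redundant.

```python
# Illustrative skip logic for a hardest-to-easiest ordering of related items.

WALKING_ITEMS = ["walk more than a mile", "walk several blocks", "walk one block"]

def walking_assessment(can_do):
    """can_do: function item -> bool giving the patient's answer.
    Returns answers for all items while asking as few questions as possible."""
    answers, questions_asked = {}, 0
    for i, item in enumerate(WALKING_ITEMS):
        answers[item] = can_do(item)
        questions_asked += 1
        if answers[item]:
            # a harder item was unlimited, so the easier ones must be too
            for easier in WALKING_ITEMS[i + 1:]:
                answers[easier] = True
            break
    return answers, questions_asked

print(walking_assessment(lambda item: True))                       # one question answers all three
print(walking_assessment(lambda item: item == "walk one block"))   # all three questions needed
```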
 

What a fiasco.

These two quotes from Sharpe and White stood out to me in the huge report by Malcolm Hooper/Margaret Williams linked to above (http://www.margaretwilliams.me/2010/magical-medicine_hooper_feb2010.pdf), p.58:


‘At the International Science Festival held on 9th April 2004 in Edinburgh, Michael Sharpe spoke in a debate entitled “Science and ME” and was specifically asked if patients with fibromyalgia (FM) were to be included in the PACE Trial of “CFS/ME”. Sharpe replied in the affirmative, implying that patients with FM needed to be included in order to reach the recruitment target. He said (verbatim): “We want broadness and heterogeneity in the trial”.’

White, speaking/writing at a different time:

‘White also asserts: “There is little doubt that patients with fibromyalgia have close comorbidities with several disorders that are regarded by many as functional disorders. These include: irritable bowel syndrome (and) CFS/ME. I have argued against this idea, suggesting that the commonality is abnormal illness behaviour as seen in the process of somatisation” and he concludes “The final area of commonality between fibromyalgia and CFS concerns the social risk markers for maintenance of both disorders”’


The what now?

The conflation of FM with CFS (Clauw’s term) was evident in Clauw’s commentary on the GETSET paper:

“Fatigue (measured by the updated Chalder fatigue questionnaire) is the symptom that improved the most in the GETSET trial, and similar findings have been noted in many previous studies that assessed this outcome in chronic fatigue syndrome or related conditions such as fibromyalgia, making graded exercise a cornerstone of treatment recommendations in these conditions.8–11

8 Clauw DJ. Fibromyalgia: a clinical review. JAMA 2014; 311: 1547–55.
9 Hauser W, Thieme K, Turk DC. Guidelines on the management of fibromyalgia syndrome—a systematic review. Eur J Pain 2010; 14: 5–10.
10 Macfarlane GJ, Kronisch C, Dean LE, et al. EULAR revised recommendations for the management of fibromyalgia. Ann Rheum Dis 2016; published online July 4. DOI:10.1136/annrheumdis-2016-209724.
11 Jones KD, Liptan GL. Exercise interventions in fibromyalgia: clinical applications from the evidence. Rheum Dis Clin North Am 2009; 35: 373–91.

Edited to clarify that White was speaking at a different time than Sharpe.
 
Sharpe replied in the affirmative, implying that patients with FM needed to be included in order to reach the recruitment target. He said (verbatim): “We want broadness and heterogeneity in the trial”.’

That is shocking. So they are both saying on different occasions that FM is part of their definition of ME/CFS. So they went even wider than the Oxford definition.
 
I think one would regard it as the same crime. Primary endpoints had to be defined by specific method as well as subject matter. For my trial we had to choose whether to use a DAS or ACR20 or ACR50 score - no leeway. Having an 'analysis strategy committee' right up to the end of an unblinded study is plain ridiculous.
I’m assuming that those last two are two different versions of the same basic questionnaire? That would seem not so different from the change that PACE made (which does undermine their approach).

I don't follow that, @Simon M. Bad methodology may have been the norm for behavioural studies but that does not make it any less necessary for the MRC trials staff to ensure that the methods are adequate for trials they fund. I don't know how many behavioural studies were MRC-funded at this scale but I doubt many. If the incompetence was more generalised, all the more reason for it to be flagged up as a systematic failure. And in rheumatology everyone was aware that therapist-delivered studies were useless without objective measures. In the late 1980s I was research advisor to our physio unit and had to keep explaining to them that studies like this are just no good.
I don't know how easy it is to get a Parliamentary enquiry set up, but I assume there is a certain threshold to cross, and the "defendants" might get a voice. PACE still has its defenders in Parliament. My point is that if one of the major points is a generic-to-field issue, it could be harder to get the enquiry off the ground. But I've no idea how hard it is to do these things (I thought most such enquiries were run by existing select committees).
 
The point is actually not that.
It is common to have additive scores but the logic of them is unclear. If you were confident that measure A was a reliable indicator of what you want to know then just measure A. If you are not so confident that a change in A has no false positives or false negatives, it is reasonable to measure B, C and D if you think that helps. If any of B, C or D is better than A then just use one of them, so the assumption is that none of these give you full confidence you are measuring what you want to know but that measuring several gives you more confidence.

That takes you into Bayes's theorem. If A is reasonably likely to indicate the answer and A shows such and such then in what way does knowing B change the probability that you know what you want to know about improvement?

A standard clinical algorithm in medicine is to ask one question A and, if it suggests a problem, then ask another question B that confirms your interpretation of the first answer if it is positive but may well be irrelevant if the first answer was negative. You could score in this algorithmic way, but there are times when a positive answer to a third question C will reasonably stand in for a positive answer to A but might need to be corroborated in a slightly different way with D. And if either the patient or the assessor is filling in a form you often want a full dataset anyway, because you may be interested in answers to individual questions.

So summation, even with weighting, often does not come into logical clinical decisions about how well somebody is. There is a geometric (or Bayesian) mathematical interaction between the values for the answers. However, if you want some form of quantitation, rather than just an estimate of the probability of improvement, then you want to be able to say that you are adequately confident that they are slightly but noticeably better, or substantially better, or almost completely better, based on corroboration from different measures. That is what we do in rheumatology. We use the American College of Rheumatology criteria for robust confidence in one of these three levels of improvement, which require a 20%, 50% and 70% improvement in at least five variables, two of which are mandatory. That does not mean that the person is '20% better' or '50% better', because the system is deliberately stringent and there may not be such a thing as '20% better' anyway. But it robustly captures rates of mild, moderate and major improvement.
Many thanks. I think one of the ongoing concerns is how the PACE authors glibly translate questionnaire answers into simplistic numbers, and then presume to treat them as if those numbers were continuous variables with a linear relationship to their underlying real-world parameters. Quite apart from the fact that they are subjective anyway. But I don't imagine that issue is confined to PACE.
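For what it's worth, a small sketch of how much that "translation into numbers" matters for the CFQ, comparing its two conventional scorings (bimodal 0-0-1-1 vs Likert 0-1-2-3); the response patterns are made up.

```python
# Same 11 answers, two conventional Chalder-style scorings, very different 'change'.

def bimodal(responses):
    """responses coded 0-3, from 'less than usual' to 'much more than usual'."""
    return sum(1 if r >= 2 else 0 for r in responses)   # range 0-11

def likert(responses):
    return sum(responses)                                # range 0-33

baseline = [3] * 11   # 'much more than usual' on every item
followup = [2] * 11   # still 'more than usual' on every item

print(bimodal(baseline) - bimodal(followup))   # 0  - no change at all on the bimodal score
print(likert(baseline) - likert(followup))     # 11 - a sizeable 'improvement' on the Likert score
```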
 
Many thanks. I think one of the ongoing concerns is how the PACE authors glibly translate questionnaire answers into simplistic numbers, and then presume to treat them as if those numbers were continuous variables with a linear relationship to their underlying real-world parameters. Quite apart from the fact that they are subjective anyway. But I don't imagine that issue is confined to PACE.

Abso-*&^%$-lutely!

I can bore you endlessly on my explorations into CFQ as a change variable if you like... :bookworm:
 