Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019 and new date December 2024

Graham · Oct 15, 2019

rvallee said:
Ah but you are missing the magical ingredient: did you spend several months trying to convince them to think of themselves as better than they are and telling them not improving is their own fault? That's how the pros do it. Common mistake.

Yes, but I managed their improvement with just a few exhortations: does this make me a super-pro?

As far as normalizing scores are concerned, I'd have problems doing that with such a strange distribution: there's far too much clumping at the top end for relatively healthy folk, so the variety of scores is too limited. Remember that it is only a 21 point score (0 to 100 in steps of 5).

What comes over loud and clear from the CFQ analysis is that the scale does not go low enough to properly distribute the range of disability. A bimodal score of 11 covers the Likert scores 33 to 26 with reasonable density: as you move to higher bimodal scores, the clumping gets tighter.

Simon M · Oct 15, 2019

It’s also worth considering the MID in relation to the smallest possible improvement for a patient on the actual scale. For the CFQ, this is one point. So the PACE threshold of two points effectively means “a clinically useful improvement is anything bigger than the smallest possible improvement for a patient”.

And for the SF 36 physical function, the smallest possible improvement is five points (for moving from, for example, “limited a little” to “not limited at all” on any one of 10 questions).
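To make the five-point step concrete, here is a minimal sketch of how the 0 to 100 physical function score can be derived from the 10 items. It assumes each item is coded 1 ("limited a lot"), 2 ("limited a little") or 3 ("not limited at all") and that the summed raw score is rescaled linearly; the function name `pf_score` is made up for illustration.

```python
def pf_score(items):
    """Return a 0-100 physical function score from 10 item responses.

    Assumed coding: 1 = limited a lot, 2 = limited a little, 3 = not limited at all.
    """
    if len(items) != 10 or any(i not in (1, 2, 3) for i in items):
        raise ValueError("expected 10 responses, each coded 1, 2 or 3")
    raw = sum(items)          # raw sum ranges from 10 to 30
    return (raw - 10) * 5     # linear rescaling to 0..100 in steps of 5


# One patient moves up one response category on a single question.
before = [2] * 10
after = [3] + [2] * 9
step = pf_score(after) - pf_score(before)
print(step)
```

Under these assumptions, any change for an individual patient is a multiple of 5, so a threshold below 5 cannot correspond to an observable change in one person's score.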

Some of the MID emerging from the studies are less than one unit better than “the smallest improvement that can be measured for an individual patient”. That doesn’t seem right.

ME/CFS Science Blog · Oct 15, 2019

Simon M said:
Some of the MID emerging from the studies are less than one unit better than “the smallest improvement that can be measured for an individual patient”. That doesn’t seem right.

Yes, that's probably my mistake. The value of 3 for the SF-36, for example, was a normalised value. I'm trying to figure out how to recalculate them to the raw score because the study that Larun et al. cited (Ward et al. 2014) also used a normalised value. It gave a MID of 7.1. I suspect the raw value will be bigger.

Dolphin · Oct 15, 2019

Michiel Tack said:
Yes, that's probably my mistake. The value of 3 for the SF-36, for example, was a normalised value. I'm trying to figure out how to recalculate them to the raw score because the study that Larun et al. cited (Ward et al. 2014) also used a normalised value. It gave a MID of 7.1. I suspect the raw value will be bigger.

I believe to convert, the formula would be: (normalised score * healthy population SD used to make normalised score) /10.

ME/CFS Science Blog · Oct 15, 2019

Dolphin said:
I believe to convert, the formula would be: (normalised score * healthy population SD used to make normalised score) /10.

I've been searching but can't really find the full explanation. I suspect it's explained in this reference:

Ware JE, Kosinski M, Bjorner JB, Turner-Bowker DM, Gandek B, Maruish ME. User's manual for the SF-36v2 Health Survey. 2nd ed. QualityMetric Incorporated; Lincoln, RI: 2007.

I found some info here: http://www.med.uottawa.ca/courses/CMED6203/Index_notes/SF36 fn .pdf

The 0 - 100 scores for the eight subscales are standardized using a z-score transformation, which involves subtracting the population mean score for that scale from each respondent’s score, and dividing the difference by the population standard deviation. For example, the raw score for Physical Function is transformed by subtracting 83.29094 and dividing by 23.75883; the formulae are given in the scoring manual (14, Table 6.12). Next, to give a mean of 50 and standard deviation of 10, the z-score is multiplied by 10 and 50 is added to the product.
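The transformation quoted above can be sketched as follows. The mean and standard deviation are the physical function figures given in the quoted passage; the function names are made up for illustration, and the scoring manual remains the authoritative source.

```python
# Physical function constants as quoted above (population mean and SD).
PF_MEAN = 83.29094
PF_SD = 23.75883


def to_norm_based(raw_score):
    """Convert a 0-100 score to a norm-based score (mean 50, SD 10)."""
    z = (raw_score - PF_MEAN) / PF_SD
    return 50 + 10 * z


def to_raw(norm_score):
    """Invert the transformation: norm-based score back to the 0-100 scale."""
    return PF_MEAN + (norm_score - 50) / 10 * PF_SD


print(round(to_norm_based(100), 2))  # the ceiling of the scale, in norm-based units
print(round(to_raw(50), 2))          # a norm-based 50 is the population mean
```

Note that with these constants the maximum raw score of 100 lands only about 0.7 SD above the population mean, which illustrates how close healthy respondents sit to the ceiling.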

Dolphin · Oct 15, 2019

Michiel Tack said:
I believe to convert, the formula would be: (normalised score * healthy population SD used to make normalised score) /10.

I've been searching but can't really find the full explanation. I suspect it's explained in this reference:

Ware JE, Kosinski M, Bjorner JB, Turner-Bowker DM, Gandek B, Maruish ME. User's manual for the SF-36v2 Health Survey. 2nd ed. QualityMetric Incorporated; Lincoln, RI: 2007.

I found some info here: http://www.med.uottawa.ca/courses/CMED6203/Index_notes/SF36 fn .pdf

The 0 - 100 scores for the eight subscales are standardized using a z-score transformation, which involves subtracting the population mean score for that scale from each respondent’s score, and dividing the difference by the population standard deviation. For example, the raw score for Physical Function is transformed by subtracting 83.29094 and dividing by 23.75883; the formulae are given in the scoring manual (14, Table 6.12). Next, to give a mean of 50 and standard deviation of 10, the z-score is multiplied by 10 and 50 is added to the product.

Yes, I have seen something similar before. Z-scores are generally calculated like that in statistics.

Just to clarify that I was talking about converting the MID.

If I recall correctly, the population standard deviations for the SF 36 physical functioning scale used for normalised scoring tend to be around 24.

ME/CFS Science Blog · Oct 15, 2019

Dolphin said:
Just to clarify that I was talking about converting the MID.

You could be right.

Swigris et al. 2010 said they recalculated the MID raw value for Kosinski et al. 2000 (reported as 7.7) to a norm-based value of 3.

Your formula is not far off: 3 x 24/10 = 7.2

So for the Ward et al. study the raw value would be 7.1 x 24/10 ≈ 17 points. If we used a more conservative estimate for the SD, say 20, it would still result in a doubling of the norm-based MID.
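The arithmetic can be checked with a short sketch. Because a MID is a difference between two scores, the population mean cancels out and only the ratio SD/10 matters. The SD values of 24 and 20 are the rough estimates used in this thread, not figures taken from the cited studies, and the function name is made up for illustration.

```python
def norm_mid_to_raw(norm_mid, population_sd):
    """Rescale a norm-based MID (SD 10 units) to the raw 0-100 scale.

    Only the SD is needed: the population mean cancels when two scores
    are subtracted from each other.
    """
    return norm_mid * population_sd / 10


# SD values here are rough estimates from the discussion (assumptions).
for norm_mid in (3, 7.1):
    for sd in (24, 20):
        raw = norm_mid_to_raw(norm_mid, sd)
        print(f"norm-based MID {norm_mid} with SD {sd}: raw MID {raw:.1f}")
```

With either SD, the raw-scale figure comes out at roughly twice the norm-based one or more.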

ME/CFS Science Blog · Oct 15, 2019

Some more info from here: https://c-path.org/wp-content/uploads/2017/05/2017_session5_scoringfinal.pdf

EDIT: if we use the formula here, the 7.1 norm-based score would correspond to an original score of 16.54.

BruceInOz · Oct 15, 2019

Michiel Tack said:
I found some info here: http://www.med.uottawa.ca/courses/CMED6203/Index_notes/SF36 fn .pdf

But we know the general population data for SF36 PF is non normal due to hitting the ceiling of 100 for healthy people. (Eg. The first hit on a google search for "sf36 physical functioning non normal distribution" is https://www.ncbi.nlm.nih.gov/m/pubmed/17515490/ )

So do any of these manipulations using means and standard deviations actually mean anything if it's non normal?

ME/CFS Science Blog · Oct 16, 2019

BruceInOz said:
So do any of these manipulations using means and standard deviations actually mean anything if it's non normal?

I don't know. We're mostly doing the calculation to get the original value.

The study the Cochrane review cites for the minimal important difference (MID) for physical function uses a norm-based value of 7.1. Norm-based values are not very relevant to the MID estimate needed and they tend to be substantially lower than the original value. Perhaps the authors overlooked this. The info I posted about norm-based value was just to figure out how to recalculate the norm-based value to its original value. Using the formula above, the original value would be 16.5. Even if the figures used, such as the standard deviation are a little bit different, it would probably still result in a doubling of the figure.

So this seems to be the case: the Cochrane review says that a MID of 7 for physical function is common but one of the studies it refers to actually found a MID of 16.5.

Lucibee · Oct 16, 2019

Lucibee said:
When doing stats on data, you have to make certain assumptions based on its distribution so that the models work. For things like testing comparison of means, it's the distribution of the residuals that matters, not necessarily the data itself.

Although I did train as a statistician, it was 20 years ago, and I misspoke here. The distribution of residuals is important when regressing one variable on another, say when looking at association between two variables x and y, but not for comparison of means. For the quantity of data here (PACE trial and meta-analyses), parametric methods (that assume a normal distribution) are OK, and give a reasonable estimate, but, as I hope you can see, they do not describe the data particularly well. That the groups show differences, and those differences are statistically significant is not in doubt. However, it's the reasons *why* there are differences, and whether those differences are clinically significant that's important. Statsing the hell out of the data won't tell you anything about those reasons.

And using stats to find a "clinically important difference" on what is already a subjective scale, rather than just asking the patients themselves what matters to them, is not going to give you the "right" answer.

ME/CFS Science Blog · Oct 16, 2019

So I'm going to change the original summary I've posted on Minimal Important Differences (MID).

The studies of Ward et al. 2014 and Swigris et al. 2010 gave values of 7.1 and 3 as MID but these were norm-based. Recalculated these would be somewhere around 16.5 and 7.1. Thanks to Dolphin for pointing this out.

I've also found another study (Quintana et al. 2005) on patients who've had a hip joint replacement which reported a high MID of 20.40 for SF-36 physical function. It would be interesting to find more of these MIDs for SF-36 physical function. The provisional overview thus far looks something like this:

My conclusion is a bit changed now: it seems that the authors of the Cochrane review took the lowest value of 7 for MID.

There's still something weird with the Wyrwich et al. 2007 study as it reports estimates lower than 5. As @Simon M pointed out, that's smaller than the smallest possible improvement on the scale. The authors note this in the text:

the magnitude of nearly all of the small mean change estimates across all SF-36 scales did not reach the change associated with state change values, the amount of change in a scale score that occurs when shifting up or down one response category of only one item.

[...]

Results from the SF-36 did not perform as strongly. Indeed, they often yielded mean values that were smaller than a state change value.

I guess this is a consequence of the anchoring method. They usually subtract the questionnaire scores of patients who judged themselves 'the same' from the scores of patients who judged themselves 'a little bit improved'. But patients' judgement of this is not perfect and questionnaire scores do not always reflect a patient's physical functioning. Sometimes the questionnaire scores for 'remained the same' will be larger than for a small improvement. I think that's why MID estimates using the anchoring method can sometimes be inadequate.

Lucibee · Oct 16, 2019

Michiel Tack said:
But patients' judgement of this is not perfect and questionnaire scores do not always reflect a patient's physical functioning.

Please can you listen to yourself here!
Who is filling in the questionnaire? The patient.
How is the patient's physical functioning measured? By a subjective questionnaire that was filled in by the patient.
Whether they are using the anchoring method or the distribution method, both rely on data that was obtained using the patient's judgement.

See also Pitfalls and Problems section in this paper by Angst et al.

Also, don't forget that the patient's judgement and perception of their physical functioning may have been materially altered by the intervention itself, without [necessarily] affecting their underlying physical functioning.

ME/CFS Science Blog · Oct 16, 2019

Lucibee said:
Whether they are using the anchoring method or the distribution method, both rely on data that was obtained using the patient's judgement.

True. But if you take for example half a standard deviation of a sample of patients who filled in the questionnaire, it's clear that the patients didn't determine the MID. The estimate is not their judgment even though their data was used.

The anchoring method uses a bit more of the patient's judgement but it relies on an agreement between questionnaire scores and the patient assessment of global clinical improvement. The agreement between these can be a bit messy. Sometimes the physical function score of patients who said they stayed the same will be higher than for patients who said they improved a little. That's how you get these low MID estimates that are lower than the smallest possible improvement on the scale. This is once again not necessarily what patients judge the MID to be. I bet that if you would take the time to explain the problem and ask patients what they think is the MID on the scale, none of them would come up with a value that is lower than the smallest possible improvement on the scale.
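A toy sketch of the two approaches may help show how an anchor-based estimate can fall below the smallest step on the scale. All numbers below are invented for illustration and do not come from any of the studies discussed; the half-SD rule and the "little better minus same" subtraction are simplified versions of the distribution and anchoring methods described above.

```python
from statistics import mean, stdev

# Hypothetical physical function change scores (follow-up minus baseline),
# grouped by the patient's own global rating. Invented data.
change_same = [0, 5, 0, -5, 5, 0, 5, 0]
change_little_better = [5, 0, 5, 10, 0, 5, 0, 5]

# Hypothetical baseline scores, used only for the distribution-based estimate.
baseline = [30, 45, 25, 60, 40, 35, 55, 50, 20, 65, 45, 30, 40, 50, 35, 55]

SMALLEST_STEP = 5  # one response category on one item

# Anchor-based: mean change of "a little better" minus mean change of "the same".
anchor_mid = mean(change_little_better) - mean(change_same)

# Distribution-based: half a standard deviation of the sample.
distribution_mid = 0.5 * stdev(baseline)

print(f"anchor-based MID: {anchor_mid}")
print(f"distribution-based MID: {distribution_mid:.1f}")
print(f"anchor-based MID below smallest step: {anchor_mid < SMALLEST_STEP}")
```

In this made-up sample every individual score still changes in steps of 5, yet the difference between group means is 2.5, because many patients in both groups report no change or overlapping changes.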

Simon M · Oct 16, 2019

Lucibee said:
Who is filling in the questionnaire? The patient

And for many outcomes such as fatigue, mood and pain nobody knows better than the patient the levels of fatigue, pain and mood they experience. I think it is important to respect patients' ability to describe their own experience.
added (I agree that the SF36 PF has alternatives)

This is separate from two specific problems:

1) response bias eg expectation bias and "treatments" that aim to change how people feel (or at least how they describe) symptoms. (This might also apply to MID studies themselves). This is why it's important to include objective outcomes and a red flag when subjective gains are not matched by objective ones.

2) Problems that the scale fails to adequately measure what it claims to measure, eg with CFQ.

Michiel Tack said:
The anchoring method uses a bit more of the patient's judgement but it relies on an agreement between questionnaire scores and the patient assessment of global clinical improvement.

Which is a flawed approach, because it uses the global score to validate fatigue, physical function scores etc, when they are not the same thing.

Michiel Tack said:
I bet that if you would take the time to explain the problem and ask patients what they think is the MID on the scale, none of them would come up with a value that is lower than the smallest possible improvement on the scale.

I'm sure you are right. Both scale development and MID work need patients as partners. Their perspective on what matters (accurate assessment of the symptom, MID) is the most important one.

Jonathan Edwards said:
Actual usefulness is what clinical usefulness means... It means useful to the patients.

Precisely.

Added JE quote.

Snow Leopard · Oct 16, 2019

Lucibee said:
And using stats to find a "clinically important difference" on what is already a subjective scale, rather than just asking the patients themselves what matters to them, is not going to give you the "right" answer.

x100000000000000

Trish · Oct 16, 2019

Simon M said:
And for many outcomes such as fatigue, mood and pain nobody knows better than the patient the levels of fatigue, pain and mood they experience. I think it is important to respect patients' ability to describe their own experience.

I completely agree with you that the patient is the best at judging and describing what they experience, but the ways provided to report that experience are so deeply flawed as to be meaningless.

They are vague descriptors, not measures on a linear scale. I have no idea how I'm supposed to relate the statements on the CFQ or SF-36 to my lived experience in a meaningful and consistent way.

Yet the researchers pretend they are getting numerical data that can be analysed as they would the patient's heights or ages. It's simply not that sort of data.

Esther12 · Oct 16, 2019

This is a bit of an abstract post of little relevance to the Cochrane review!

Simon M said:
And for many outcomes such as fatigue, mood and pain nobody knows better than the patient the levels of fatigue, pain and mood they experience. I think it is important to respect patients' ability to describe their own experience.

I'd normally agree with "nobody knows better", but I'm not sure how much we should "respect patients' ability to describe their own experience". It seems like humans often aren't very good at that sort of thing.

If we were to ask people to rate how happy they were on a scale of 1-100, those answers would only be of limited use letting us work out who was happiest, what was associated with happiness, etc. Even ignoring the problems you mention with response bias and the specific problems with the questions being asked, I still think that there are reasons to be cautious of self-reported states for a whole range of things.

We've seen the harm that can be done by treating patient self-reports with too little respect, and of being in a group whose self-reports are presumed to be of less worth than others, but I think there's some reason to be a bit wary of self-reports from humans as a whole.

ME/CFS Science Blog · Oct 16, 2019

Just throwing this in here:

The Jason et al. 2007 study determined clinically significant effects for the physical function scale as follows:

Ferguson, Robinson, and Splaine (2002) have recommended using the Reliable Change Index (RCI), which evaluates the magnitude of change scores necessary for a measure to be considered statistically reliable. To determine clinical significance of the Physical Functioning subscale, the baseline minus 12-month follow-up change scores need to exceed the age adjusted RCI and the 12-month follow-up scores must fall within the normative value (defined for this study as being within one standard deviation of the mean). Using these two criteria, the CBT, COG, ACT, and RELAX groups achieved clinically significant improvements for physical functioning in 18.2%, 30.4%, 11.1%, and 21.7% of participants, but there were no significant differences among conditions [χ² (3, N = 86) = 2.41, p = .49].
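As a rough illustration of the two criteria, here is a sketch using the commonly cited Jacobson and Truax form of the reliable change threshold. This is an assumption: the quoted passage does not give the formula, and the study's age adjustment is not reproduced. The SD, reliability and normative values are hypothetical placeholders, and improvement is taken as follow-up minus baseline since higher scores mean better function.

```python
import math


def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest change (in score units) treated as statistically reliable.

    Jacobson-Truax style: standard error of measurement, then the standard
    error of a difference between two measurements, scaled by z.
    """
    sem = sd * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2) * sem
    return z * s_diff


def clinically_significant(baseline, follow_up, sd, reliability, norm_mean, norm_sd):
    """Apply both criteria: reliable change AND follow-up within 1 SD of the norm."""
    reliable = (follow_up - baseline) > reliable_change_threshold(sd, reliability)
    within_norm = abs(follow_up - norm_mean) <= norm_sd
    return reliable and within_norm


# Hypothetical inputs: SD 24, test-retest reliability 0.9, norm 83.3 (SD 23.8).
threshold = reliable_change_threshold(24, 0.9)
print(f"reliable change threshold: {threshold:.1f} points")
print(clinically_significant(40, 65, 24, 0.9, 83.3, 23.8))
print(clinically_significant(40, 55, 24, 0.9, 83.3, 23.8))
```

Under these placeholder values, a patient would need to gain more than about 21 points and also end up within one SD of the normative mean to count as clinically significantly improved.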

Lucibee · Oct 16, 2019

Esther12 said:
I'd normally agree with "nobody knows better", but I'm not sure how much we should "respect patients' ability to describe their own experience". It seems like humans often aren't very good at that sort of thing.

And that's precisely why these researchers take a paternalistic attitude to conditions like ME/CFS - simultaneously allowing subjective self-report (because it suits them to do so), whilst at the same time dismissing patients' experience of harm or lack of improvement.
