ME/CFS Skeptic
Senior Member (Voting Rights)
I plan to submit these comments to the current version of the Cochrane review next week (sorry it took me so long to write them down; long story). I thought it might be useful to post them here on S4ME first, in case someone notices any mistakes, so that I can still correct them before formally submitting.
Comments to the Cochrane review on exercise therapy for CFS
I appreciate the efforts made by Cochrane and the authors to correct some of the errors in the previous version of this review. There are, however, some major problems that remain and that significantly affect the results and conclusions. I hope to clarify these in my comments below. All my comments concern the main comparison of graded exercise therapy (GET) versus a passive control condition (treatment as usual or relaxation/flexibility therapy).
‘Long-term’ follow-up results downplayed
The review highlights assessments made directly after treatment ended and downplays assessments made several months later even though the latter formed the primary outcome for the trials that provide most of the data.
The authors’ conclusions in the abstract, for example, state: “Exercise therapy probably has a positive effect on fatigue in adults with CFS compared to usual care or passive therapies.” At follow-up, however, the difference in fatigue scores between the exercise and passive control groups was no longer statistically significant (Analysis 1.2). The three largest trials, which provide most of the data in this Cochrane review (PACE, FINE, and Powell et al., 2001), defined their primary outcome at this follow-up assessment, 52 to 70 weeks post-randomization. Their intent was likely to reduce bias: one of these trials, the FINE trial, noted that “assessment at week 70 is required because short-term assessments of outcome in a chronic health condition such as CFS/ME can be misleading.” Using interim assessments made directly after treatment ended to claim that “exercise therapy probably has a positive effect on fatigue” seems unwarranted, as the majority (75%) of the data for this comparison comes from trials that defined their primary outcome several months later. At that time point, the pooled between-group differences were no longer statistically significant. The same is true for other outcomes, such as physical functioning.
The review frequently uses the term ‘long-term follow-up’ for assessments made 52 to 70 weeks post-randomization. This might confuse readers. For the largest trial, the PACE trial, ‘long-term follow-up’ refers to assessments made 2 years or more after randomization, as this is the terminology the trial authors used in their protocol and publications. (1) The term ‘long-term follow-up’ does not normally refer to assessments made half a year after treatment ended, especially as these were reported as the main results of the trial. In the PACE trial, patients in the GET group also received booster sessions 36 weeks post-randomization. This indicates that the assessments 52 weeks post-randomization are not adequately described as a ‘long-term’ follow-up.
Fatigue post-treatment should be rated as low instead of moderate quality evidence
The certainty of evidence for all outcomes in comparison 1 (exercise therapy versus treatment as usual, relaxation or flexibility) was assessed as low to very low according to the GRADE system. (2) The sole exception is fatigue measured at the end of treatment, which was assessed as moderate-certainty evidence. It is unclear why the certainty of evidence for this outcome was not downgraded for inconsistency and/or imprecision, as was done for physical function measured at the end of treatment.
The meta-analysis of post-treatment fatigue showed considerable heterogeneity (I² = 80%, P < 0.0001). This heterogeneity was mainly caused by one outlier, the trial by Powell et al. If this trial is excluded, heterogeneity falls to acceptable levels (I² = 26%, P = 0.24), but the standardized mean difference (SMD) drops by one third, from -0.66 to -0.44. Re-expressed on the 33-point Chalder Fatigue Scale, this corresponds to a 2.3-point rather than a 3.4-point reduction, a difference that may no longer be clinically meaningful. A minimal important difference (MID) of 3 points on the Chalder Fatigue Scale has previously been used in an exercise trial for CFS. (3)
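To make the re-expression step explicit, here is a minimal Python sketch. It assumes the conversion is done by multiplying the SMD by a single reference standard deviation for the 33-point scale; the value of 5.2 points is a back-calculation from the figures above (3.4 / 0.66), not a number reported in the review, and `smd_to_points` is a hypothetical helper name.

```python
# Re-express a standardized mean difference (SMD) in Chalder Fatigue Scale points.
# ASSUMPTION: reference SD of ~5.2 points (33-point scoring), back-calculated
# from 3.4 / 0.66; it is not a value taken from the Cochrane review.
REFERENCE_SD = 5.2


def smd_to_points(smd: float, sd: float = REFERENCE_SD) -> float:
    """Convert an SMD to a reduction in raw scale points (SMD x reference SD)."""
    return abs(smd) * sd


all_trials = smd_to_points(-0.66)
without_powell = smd_to_points(-0.44)

print(f"All trials:           {all_trials:.1f} points")      # 3.4
print(f"Excluding Powell:     {without_powell:.1f} points")  # 2.3
print(f"Relative drop in SMD: {1 - 0.44 / 0.66:.0%}")        # 33%
```

With this assumed SD, excluding the outlier moves the pooled estimate from above a 3-point MID to below it.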
Fatigue post-treatment could also be downgraded for imprecision, as the confidence interval includes effects below the threshold for a clinically important difference. The 95% confidence interval of the SMD for fatigue (-1.01 to -0.31) corresponds to a 1.6 to 5.3 point reduction when re-expressed on the 33-point Chalder Fatigue Scale. For continuous outcomes, the GRADE handbook states: “Whether you will rate down for imprecision is dependent on the choice of the difference (Δ) you wish to detect and the resulting sample size required.” Given that the authors of this Cochrane review specified an MID of 2.3 points for the Chalder Fatigue Scale, and that an MID of 3 points or higher has been used for CFS (3) and other chronic conditions (4,5), it seems warranted to downgrade this outcome for imprecision.
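The imprecision argument can be checked the same way. This sketch uses SMD bounds of -1.01 and -0.31, the bounds that reproduce the 1.6 to 5.3 point interval under the same assumed reference SD of 5.2 points (a back-calculation, not a reported value). The helper names are hypothetical, and "crosses" here simply means the interval contains values on both sides of the MID.

```python
# Does a 95% CI, re-expressed in Chalder points, straddle a given MID?
# ASSUMPTION: reference SD of ~5.2 points, back-calculated from the review's figures.
REFERENCE_SD = 5.2


def ci_in_points(smd_a: float, smd_b: float, sd: float = REFERENCE_SD) -> tuple[float, float]:
    """Return the CI bounds as absolute point reductions, smallest first."""
    low, high = sorted((abs(smd_a) * sd, abs(smd_b) * sd))
    return low, high


def crosses_mid(ci_points: tuple[float, float], mid: float) -> bool:
    """True if the CI contains both unimportant (< MID) and important (> MID) effects."""
    low, high = ci_points
    return low < mid < high


ci = ci_in_points(-1.01, -0.31)
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f} points")  # 1.6 to 5.3
for mid in (2.3, 3.0):
    print(f"MID {mid}: CI crosses threshold -> {crosses_mid(ci, mid)}")
```

For both the review's own MID of 2.3 and the 3-point MID used elsewhere, the lower bound of about 1.6 points falls below the threshold.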
I recognize that the case isn’t clear-cut for either inconsistency or imprecision. The GRADE handbook, however, recommends that in borderline cases involving two factors, the certainty of evidence should be downgraded for at least one of them. The handbook writes: “If, for instance, reviewers find themselves in a close-call situation with respect to two quality issues (risk of bias and, say, precision), we suggest rating down for at least one of the two.” (2) Therefore the outcome fatigue measured at the end of treatment should preferably be downgraded to low-certainty evidence.
Problems with the Chalder Fatigue Scale
Fatigue is the primary outcome measure of this review, and it shows the largest effect size. All but one of the exercise trials in the review used a version of the Chalder Fatigue Scale to measure fatigue. The sole exception is the trial by Jason et al., 2007, which only provided assessments at follow-up and didn’t report a statistically significant difference between the exercise and control groups. Consequently, a large part of the review’s conclusion rests on how trial participants filled in different versions of the Chalder Fatigue Scale after receiving exercise therapy.
Unfortunately, several problems have been noted with the Chalder Fatigue Scale and its scoring systems. (6,7) The questionnaire was not included in the Common Data Elements (CDEs) formulated for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) in 2017. Instead, the CDEs list nine other questionnaires for assessing fatigue in patients with ME/CFS. (8) In my comments below I hope to show that some of the issues with the Chalder Fatigue Scale could have significantly affected the results and conclusion of the Cochrane review.
Ceiling effects
Ceiling effects have been noted for the Chalder Fatigue Scale, especially when bimodal scoring (0–11) is used. (9,10) In the trial by Powell et al., for example, patients had a mean fatigue score of 10.28 out of 11 at baseline. In the FINE trial, the second-largest trial in the review, patients had a mean score of 10.45 out of 11 at baseline. Because most participants already scored close to the scale maximum, an increase in fatigue might not have been recorded in these trials.
If a worsening of fatigue were equally likely in the exercise and passive control groups, ceiling effects might not have favored one over the other. But this assumption seems unlikely: a worsening of symptoms following (physical) exertion is one of the characteristics of CFS (11), and in multiple surveys CFS patients report having worsened following GET. (12,13) More generally, participating in an exercise intervention has been shown to increase the relative risk of non-serious adverse events. (14) It therefore seems reasonable to assume that more CFS patients in the exercise group than in the control group could have experienced an increase in fatigue after scoring (close to) the maximum on the Chalder Fatigue Scale. This would have distorted the results and created a false impression of improvement.
I would like to spell out this argument more clearly, as it could easily be overlooked or misinterpreted. Consider an imaginary exercise trial where all participants have a fatigue score of 9 out of 11 at the start. In the passive control group, half of the participants’ fatigue scores increase by 2 points while the other half decrease by 2 points. The average of the control group at the end of the trial would still be 9 out of 11. In the exercise group, half of the participants’ fatigue scores increase by 6 points while the other half decrease by 6 points. Their average is not 9 but 7 out of 11, because an increase of 6 points cannot be fully recorded on the scale: a true score of 15 is recorded as 11, giving (11 + 3) / 2 = 7. Something similar might have happened in exercise trials for CFS where patients scored close to the maximum on the Chalder Fatigue Scale at the start of the trial.
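The imaginary trial above can be reproduced in a few lines of Python. This is a toy sketch: the baseline of 9 and the changes of ±2 and ±6 are the hypothetical numbers from the example, not trial data, and the helpers `observed` and `group_mean` are illustrative names. The only mechanism modeled is that the questionnaire cannot record scores outside 0 to 11.

```python
# Toy illustration of a ceiling effect on a bounded scale (0-11, bimodal Chalder scoring).
# All numbers are the hypothetical ones from the example above, not trial data.
SCALE_MIN, SCALE_MAX = 0, 11


def observed(score: float) -> float:
    """Clamp a 'true' score to what the questionnaire can actually record."""
    return max(SCALE_MIN, min(SCALE_MAX, score))


def group_mean(baseline: float, changes: list[float]) -> float:
    """Mean recorded score after applying each participant's true change."""
    recorded = [observed(baseline + change) for change in changes]
    return sum(recorded) / len(recorded)


baseline = 9
control = group_mean(baseline, [+2, -2])   # half worsen by 2, half improve by 2
exercise = group_mean(baseline, [+6, -6])  # half worsen by 6, half improve by 6
print(f"Control mean:  {control}")   # 9.0
print(f"Exercise mean: {exercise}")  # 7.0
```

In both groups the true average change is zero; the apparent 2-point benefit in the exercise group comes entirely from truncation at the top of the scale.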
Interpretation problems
The Chalder Fatigue Scale also has problems of interpretability, as it asks trial participants whether they experience fatigue symptoms more or less than usual, where ‘usual’ is meant to be when they were last well. (6,7) When questionnaires are completed after treatment ends, patients might be confused and compare themselves to how they were before the trial started rather than to when they were last well. This misinterpretation occurred in a Japanese trial exploring the effects of yoga in CFS. (15) One participant recorded very low scores on the Chalder Fatigue Scale post-treatment because she was confused about the reference point. The authors note: “whereas the intent was to compare her current condition to when she last felt well, she had been sick and almost bed-bound for some years so she misunderstood ‘than usual’ as ‘than the days sick in bed’ because it had become a regular part of life for her.” In a trial of cognitive behavioral therapy in multiple sclerosis, patients in both the intervention and control groups reported less fatigue on the Chalder Fatigue Scale than healthy controls at the end of the trial. (16) A plausible explanation is that patients wanted to indicate that they had less fatigue than at the start of the trial, rather than compared to when they were last well. These interpretation problems call into question the validity of the Chalder Fatigue Scale for measuring improvements over time.
In this Cochrane review, none of the problems with the Chalder Fatigue Scale are mentioned or taken into consideration when interpreting effect sizes.
Bias due to lack of blinding
The review focuses on patient-reported outcome measures (PROMs) in trials where blinding of patients and healthcare providers was not practically feasible. This creates a high risk of response and expectancy bias. (17,18) The authors downgraded the certainty of evidence for almost all outcomes by one level according to the GRADE system. As I hope to explain in this section, there are reasons to think this does not adequately address the risk of bias involved.
A 2014 review by Hróbjartsson et al. of trials that compared blinded and non-blinded groups reported an average difference in effect size for patient-reported outcomes of -0.56. (19) This is very similar to the effect sizes reported in this review and suggests that bias due to a lack of blinding calls into question not only the certainty of the evidence but the evidence itself. Response and expectancy bias were likely quite high in the exercise trials included in the review, as the intervention involved close collaboration with therapists and strong statements designed to raise patients’ expectations of treatment. A therapist manual of the PACE trial, for example, said about patients: “[…] it is important that you encourage optimism about the progress that they may make with this approach. You can explain the previous positive research findings of GET and show in the way you discuss goals and use language that you believe they can get better.” (20)
Unfortunately, the review does not consider the possibility that the small to moderate effect sizes found could be the result of bias from a lack of blinding of patients and therapists. Downgrading the certainty of evidence by only one level seems to underestimate the risk of bias involved.
Objective outcomes not reported
One of the major problems with the review is that, apart from some data on health service resources from the PACE trial, it doesn’t report the objective outcome measures that were included in the eight exercise trials. Objective outcomes are thought to be more robust than PROMs to bias associated with a lack of blinding. One of the largest studies on bias in randomized trials, for example, concluded: “Our results suggest that, as far as possible, clinical and policy decisions should not be based on trials in which blinding is not feasible and outcome measures are subjectively assessed. Therefore, trials in which blinding is not feasible should focus as far as possible on objectively measured outcomes…” (21)
The authors of the Cochrane review did the opposite, focusing on subjective outcomes and largely ignoring objective ones. The eight exercise trials in this review have data on employment, activity levels, and various kinds of fitness tests. These did not show the improvements seen on the subjective outcome measures (such as fatigue questionnaires) that were summarized and highlighted in this review. In 2018, Vink & Vink-Niese provided an overview of objective outcomes in the eight exercise trials for CFS in the Cochrane review. They concluded:
“The analysis of the objective outcomes in the trials provides sufficient evidence to conclude that graded exercise therapy is an ineffective treatment for myalgic encephalomyelitis/ chronic fatigue syndrome.” (7)
The lack of reporting on objective outcomes was already noted by Kindlon T and Courtney R after a major update of the Cochrane review in 2015. The authors of the review responded that it was not possible to report on objective measurements as the protocol did not include them. The protocol published by Edmonds et al. in 2001, however, does mention “employment status” and “timed walking tests and tests of strength or aerobic capacity” as outcomes. (22) It is unclear why these weren’t reported.
The review as it currently stands gives the impression that outcomes were cherry-picked: the outcomes that showed improvements in the GET group (PROMs, which are sensitive to bias) were highlighted, while the outcomes that showed no or only rare improvements in the GET group (objective outcomes, which are more robust to bias) were largely ignored.
The fact that patients in the GET group weren’t able to increase their fitness on objective tests might help interpret scores on the SF-36 physical functioning subscale. Normally there is a significant correlation between this questionnaire and objective measures of fitness. (23) In the exercise trials, however, there seems to be a discrepancy between improvements reported with the SF-36 physical functioning questionnaire and the lack of significant improvements seen on more objective measures of fitness. This suggests that bias due to lack of blinding might have affected the validity of PROMs summarized in this review. Because objective outcomes were not reported in the Cochrane review, its readers will not be aware of this. After reading the reported improvements on the SF-36 subscale, they may be under the impression that CFS patients get fitter following GET, an assumption that is not supported by objective measurements.
No information on compliance
A lack of improvement on objective measures of fitness following a months-long exercise program could indicate a problem with compliance. Unfortunately, the review doesn’t discuss the level of compliance in the eight exercise trials, even though compliance was mentioned as an outcome measure in the protocol. Kindlon T highlighted the need for compliance measures in his comments on the review in 2015, explaining that “Information on adherence and what exercise was actually done is important in terms of helping clinicians, and indeed patients, to interpret and use the data.” None of the subsequent updates of the review acted upon this.
Information on compliance is relevant because several reports have suggested that CFS patients experience an “activity ceiling.” (24,25) These small, in-depth studies used objective measurements of activity, such as accelerometers, and reported that ME/CFS patients struggle to significantly increase their physical activity levels over extended periods. (25–27) Black & McCully, for example, report that “the inability to sustain target activity levels, associated with pronounced worsening of symptomology, suggests the subjects with CFS had reached their activity limit.” (25) Friedberg & Sohl suggest that patients may be reducing other activities to keep up with exercise prescriptions. (27) Something similar might have occurred in the eight exercise trials included in the Cochrane review. This would have implications for the data on safety and acceptability of GET reported in this review. Therefore I would recommend including a section on compliance measures (or lack thereof) in future updates of the review.
....... End of part I .......