Cochrane Review: 'Exercise therapy for chronic fatigue syndrome', Larun et al. - New version October 2019

Discussion in 'Psychosomatic research - ME/CFS and Long Covid' started by MEMarge, Oct 2, 2019.

  1. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
    What evidence is there that it is normal? Given how subjective it is, how can we know how the scores correlate with actual fatigue?
     
  2. Amw66

    Amw66 Senior Member (Voting Rights)

    Messages:
    6,769
    This
     
  3. Trish

    Trish Moderator Staff Member

    Messages:
    55,414
    Location:
    UK
    The Likert scores could have been between 17 and 33 at the start (bimodal scores 6 to 11), and the mean was around 28 which is at the upper end of the range. That suggests to me a possible skewed distribution. Do we have a graph of the actual scores at the start of the trial to see whether it looks skewed?
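
    The relationship between the two scoring methods can be sketched in a few lines. This is a minimal illustration, assuming the usual CFQ scoring (11 items, each answered 0 to 3 for Likert scoring, with answers of 2 or 3 counting as 1 point for bimodal scoring). The function names and the `floor_for_unendorsed` parameter are made up for this example. The lower bound of 17 only holds if every item that does not count towards the bimodal score is answered "no more than usual" (1) rather than "less than usual" (0).

```python
N_ITEMS = 11  # assumed number of CFQ items


def likert_score(responses):
    """Likert scoring: sum of the 0-3 item responses (range 0-33)."""
    return sum(responses)


def bimodal_score(responses):
    """Bimodal scoring: count of items answered 2 or 3 (range 0-11)."""
    return sum(1 for r in responses if r >= 2)


def likert_range_for_bimodal(b, floor_for_unendorsed=1):
    """Lowest and highest Likert totals compatible with a bimodal score b.

    floor_for_unendorsed is the lowest answer assumed for the items that
    do NOT count towards the bimodal score (0 or 1).
    """
    lowest = 2 * b + floor_for_unendorsed * (N_ITEMS - b)
    highest = 3 * b + 1 * (N_ITEMS - b)
    return lowest, highest


# Likert ranges for each bimodal score from 6 to 11
ranges = {b: likert_range_for_bimodal(b) for b in range(6, N_ITEMS + 1)}
```

    Under these assumptions a bimodal score of 6 maps to a Likert total of 17 to 23, and a bimodal score of 11 maps to 22 to 33. If "less than usual" answers are allowed on the remaining items, the lower bound drops to 12.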

    But I think that's beside the point. The normal distribution is a mathematical model of distribution of data measured on a linear scale that has random variation around a mean.

    CFQ is not that sort of data. It's an idiotic collection of vague statements that may or may not relate to fatigue, and patients' imperfect interpretations of them, and has a strong ceiling effect, and takes no account of the relative importance of each statement in patients' level of disability. It's counting descriptors, not measuring their severity. Giving numerical scores of equal weight to such responses is not science, and certainly doesn't produce meaningful linear data. The statisticians involved in analysing data based on CFQ should know applying analyses based on normal distributions was meaningless.

    As for a change of 2 points on this scale being clinically meaningful. Words fail me.
     
    Last edited: Oct 11, 2019
  4. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    Sorry if what I wrote was a bit long/detailed and focused on minor points. I'm just going through the different arguments and issues with this review one by one; to see what makes sense. The order doesn't indicate importance. Have to take breaks in between, but I'm planning to go through them all.

    6) The selection of studies

    Some have argued that the Cochrane review should have included the study by Nunez et al. (2011). In that trial, however, the intervention consisted of a “multidisciplinary treatment combining CBT, GET, and pharmacological treatment” in group form. So GET only formed one part of the treatment and if patients improved/deteriorated we wouldn’t know which part was to praise or blame. In the control group, patients received ‘exercise counselling’ by a physiotherapist the goal of which was “to provide activities that restored the patient's ability to do sustained physical exercise as far as possible.” So it wasn’t really exercise therapy that was being tested.

    The argument to include the ‘Belgian report’ is also not a strong one I think. This wasn’t a trial or study but an internal service evaluation of a multidisciplinary treatment that included both GET and CBT (in group form). In 2002 the Belgian Government created several ME/CFS centres around the country where this intervention (in group form) was provided. This was paid for by the government insurance agency, RIZIV/INAMI but part of the agreement was that the centres would record detailed information so that the results could be evaluated. The results were published in a 2006 report that is only available in French or Dutch. People usually link to a 2008 report by the Federal Knowledge Centre because it’s written in English and includes a short summary of the finding of the 2006 report, but it doesn’t provide the data. So I don’t think the Belgian report could be included in the Cochrane review (it could be mentioned though, as Vink & Vink-Niese did).

    Then finally, there’s an argument that the trial of Wallman et al. 2004 is not really GET but pacing. I think Ellen Goudsmit supports this view. I myself am not convinced. I would describe it as a symptom-contingent rather than a time/quota-contingent form of graded exercise therapy. Patients can reduce their activity if they feel worse but they are still instructed to increase their physical activity level with the expectation that this will improve their health. I think that’s a key aspect of exercise therapy and a clear difference with what pacing means to most ME/CFS patients. So I think it’s not abnormal to include this trial in the Cochrane review.
     
  5. BruceInOz

    BruceInOz Senior Member (Voting Rights)

    Messages:
    414
    Location:
    Tasmania
    I just created the histograms below from the baseline (all groups) PACE data.
    PaceCFQHistogram.jpg
    The CFQ data is definitely skewed but interestingly the SF-36 PF is less so
    PaceSF36PFHistogram.jpg
    I guess the Bowling population data is more skewed because healthy people have a strong ceiling effect but sick people less so.
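
    For anyone who wants to put a number on "skewed", a minimal sketch of a sample skewness calculation follows. The scores in `hypothetical_cfq` are invented purely for illustration (they are not the PACE data), and the function name is an assumption of this example. A negative value means the tail points towards low scores, which is what a pile-up near the ceiling of 33 would produce.

```python
from statistics import mean, pstdev


def sample_skewness(values):
    """Population-style skewness: mean of cubed standardised deviations."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Invented scores clustered near the top of a 0-33 scale (NOT trial data)
hypothetical_cfq = [33, 33, 32, 31, 30, 30, 29, 28, 28, 27, 25, 22, 18]

skew = sample_skewness(hypothetical_cfq)
```

    A symmetric sample gives a skewness close to zero, so the sign and size of this statistic can be compared between the CFQ and SF-36 PF histograms once the real data are loaded.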
     
  6. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    Just look at the data! (I've done some plots, but I don't have them to hand right now - I'll post them tomorrow.) - eta: thanks @BruceInOz !

    Errrrr.....???? When doing stats on data, you have to make certain assumptions based on its distribution so that the models work. For things like testing comparison of means, it's the distribution of the residuals that matters, not necessarily the data itself. [*eta* I need to correct an error here - see later linked post] But this isn't the issue here.

    It's *way* worse than that.

    The issue for clinically important (or useful, whatever) difference is that fundamentally the measurement scale can't change between baseline and the endpoint. But with CFQ, it very definitely does change, because the way it is interpreted by the participant changes (it *has* to if you hit the ceiling and get worse!). We know that how the participant scores themself at baseline (in order to get onto the trial) will be different from how they score themself during the trial without their underlying fatigue changing, because the baseline comparison point changes.

    And even without that very obvious change, the intervention itself is designed to change the participant's perception of fatigue without necessarily changing their underlying fatiguiness. There is no way you can establish any sort of clinically important difference (the smallest change in a treatment outcome that an individual patient would identify as important and which would indicate a change in the patient's management) either between baseline and endpoint, or between groups, when those things are going on.

    The additional problem is that when you turn a qualitative measure into a pseudo-quantitative one, you make mahoussive assumptions about the behaviour of that data, just because you have assigned numbers to it. For a start, you assume it is uni-dimensional (it isn't - CFQ asks 11 questions, some of which are correlated, some of which aren't - it simply won't behave in a linear, scalable way like say, distance, or time, or weight). You assume that it is relatable between individuals - that what one individual scores will equate to what another scores (it's very clear that's not the case because of the ambiguity of the CFQ). You assume it is relatable and comparable within an individual over time, and we've already seen that that's not the case.

    And we haven't even got onto what it actually measures, and the issues with including improvement and deterioration on the same scale, while simultaneously expecting to be able to deduce that from a difference in 2 scores that may mean entirely different things.

    Aaargh!
     
    Last edited: Oct 16, 2019
  7. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    Just to slightly complicate this point, I have heard that Wallman considers the treatment tested here to be closer to pacing than GET. It could be that is just said to people critical of GET? Or it could be that it is misleading to lump it in with GET trials. One problem with 'exercise therapy' is that it can mean such a wide range of things that it makes it very difficult to know exactly what is being tested, or what patients are being asked to consent to.
     
  8. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,792
    The Wallman intervention was counted by her herself as pacing in this paper. But it is very different to the interventions designed by Ellen Goudsmit or Leonard Jason
    https://www.ncbi.nlm.nih.gov/m/pubmed/22181560/

     
  9. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    I know, but Jo Nijs was also an author of that paper and has since proposed 'Activity Pacing Self-Management' which also includes gradual increases of physical activity after a long stabilization phase.

    I think that Nijs and Wallman interpret the term pacing somewhat differently from how ME/CFS patients use it. I think their view might be more in line with how the term pacing is used in the chronic pain literature, where it often includes a gradual increase in physical activity (think of the recent paper by Deborah Antcliff).

    There's a longer description of Wallman's therapy in this paper:
    https://www.researchgate.net/public...for_individuals_whit_chronic_fatigue_syndrome

    I agree it's very cautious and quite different from other forms of GET. But it still instructs patients to exercise more and more with the expectation that it will improve their health - which is the essence of graded exercise therapy for me. I think most patients don't see pacing as a therapy that involves trying to increase their physical activity level if able. I think patients see it more as a management strategy to minimize PEM or manage their energy budget.

    EDIT: Investing energy into maximizing physical activity could mean that patients are less able to socialize, read, work or do other meaningful activities. So if a health professional tells patients that they should do a certain amount of physical activity per day and that they should try to increase that as they are able, there's a certain assumption behind that - the idea that maximizing physical activity will be better for the patient than whatever he/she was doing before that. I think that assumption is better described as exercise therapy, than as pacing.
     
    Last edited: Oct 12, 2019
  10. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
    Yes!
     
  11. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    Just noticed that the trial by Fulcher & White, 1997 has two fatigue outcomes: the 44-point Chalder Fatigue Scale and a visual analogue scale. The Cochrane review only uses the first (they probably thought it was a good thing that they could use the same scale as other GET-trials).

    It might be interesting to compare the results of the two fatigue outcomes. My quick calculations indicate that the SMD for the Chalder Fatigue Scale (0.84) was almost twice as large as the SMD for the visual analogue scale (0.42).

    upload_2019-10-12_11-14-44.png
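
    For reference, a standardised mean difference can be computed from the mean, SD and n of each group. The sketch below is a minimal version, assuming a pooled-SD denominator and the Hedges small-sample correction (which, as far as I know, is what RevMan applies to SMDs). The function names and the example numbers are hypothetical and are not the Fulcher & White data.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def smd(mean1, sd1, n1, mean2, sd2, n2, small_sample_correction=True):
    """Standardised mean difference between group 1 and group 2."""
    d = (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)
    if small_sample_correction:
        # Hedges' correction factor J = 1 - 3 / (4 * df - 1), df = n1 + n2 - 2
        d *= 1 - 3 / (4 * (n1 + n2) - 9)
    return d


# Hypothetical example: two groups of 20, means 10 and 8, both SDs equal to 2
example_d = smd(10, 2, 20, 8, 2, 20, small_sample_correction=False)
example_g = smd(10, 2, 20, 8, 2, 20)
```

    Because the SMD divides by the spread of each scale, two fatigue measures from the same trial can be compared directly, which is what makes a gap like 0.84 versus 0.42 informative.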
     
  12. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    OK. Here are the plots. SMDs for all trials in the review will be based on data like these (CFQ at 52 weeks from PACE shown here). As you can see, the mean (which is what is being compared between the groups) is a really rubbish summary measure to be basing any comparison on.

    CFQscores_52wks.png
    At least SMD doesn't do any sort of comparison against baseline (as far as I'm aware), but even so. A few points to remember: A score above 18 was needed to be included in the trial; any score of 12 or above indicates an overall worsening of fatigue (whether this comparison was made against baseline or "when you were last well"). Both GET and CBT arms were encouraged to ignore or reframe their symptoms (including perception of fatigue - "feeling tired after exercise is normal" - making "no more than usual" a more likely option for some questions).

    It would have been so easy for the researchers to use a slightly modified version of the CFQ specifically for the trial that clearly indicated that pts should compare themselves with the start of the trial. It would have then given them better information, particularly if the scale had been a balanced likert that included "much improved" as one of the options. But hey, it is what it is.
     
    Last edited: Oct 12, 2019
  13. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    7) Objective outcomes: Where is the protocol?
    Larun et al. noted that the 8 randomized trials included in their review have a high risk of performance bias and detection bias due to a lack of blinding. In such cases, objective outcomes are considered more reliable. The largest study to date on bias in randomized trials (the BRANDO project, Savovic et al. 2012) concluded:

    “Our results suggest that, as far as possible, clinical and policy decisions should not be based on trials in which blinding is not feasible and outcome measures are subjectively assessed. Therefore, trials in which blinding is not feasible should focus as far as possible on objectively measured outcomes, and should aim to blind outcome assessors.”
    Larun et al. seem to have done the exact opposite. They have presented the subjective outcomes and left out the objective outcomes with the sole exception of service use as reported in the PACE trial.

    When Tom Kindlon and Robert Courtney pointed out the omission of objective outcomes the authors responded that “The protocol for this review did not include objective measurements” hence they were not included in the review. The only protocol I could find is one A4 piece of paper written in 2001 where Edmonds et al. say they are going to “review all randomised controlled trials of exercise therapy for adults with chronic fatigue syndrome (CFS).” That’s pretty much all it says (link here: https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD003200/full)

    EDIT: The version of the protocol I found was incomplete (I've posted it at the bottom of the page). The full protocol has been posted by Dolphin further in this thread.

    In the History overview of the Cochrane review, there is a note dated 25 May 2004, which says: “The protocol for this review has undergone post hoc alteration based on feedback from referees. The following sections have been altered: Types of interventions; Search strategy; Methods of the review.” I haven’t been able to find this updated protocol or the post-hoc changes made to it.

    The very first Cochrane review on GET (Edmonds et al. 2004) mentioned under types of outcome measures: “Other possible measures include timed walking tests and tests of strength or of aerobic capacity.” But then they only report on functional work capacity as reported in the trial by Wearden et al. 1998 (which they confusingly call Appleby et al. 1995). Other trials had objective outcomes as well, but these were not reported in the review. Perhaps Larun et al. thought that because Edmonds didn’t report objective outcomes they don’t have to do it either? In any case, I couldn’t find a protocol that specifies subjective but not objective outcomes.

    Even if there was one, the authors’ argument can still be considered problematic. A protocol is seen as a tool against bias. It’s supposed to prevent researchers from changing their analysis as they go through the data so that they can present the results in a way that favours their preferred conclusion. That’s the reason why researchers have to state in advance which hypothesis they want to test or which data they want to analyze. Otherwise, you get cherry-picking and an unbalanced review. The problem we have with the GET review is that the authors have cherry-picked the results and written an unbalanced review by leaving out the objective outcomes. So referring to a protocol to defend this imbalance doesn’t make any sense, because the whole point of a protocol is to prevent such biases.

    Finally, if the protocol really was a barrier to report objective outcomes, I suspect the authors could have changed this when they performed their 2015 update of the review. After all, the note on the history of the review dated 25 May 2004 says that the protocol has already been updated once post-hoc. Relying on a protocol written in 2001, when most of the studies included in the review were not reported yet, seems rather odd. Their new literature search in 2015 seemed like an ideal time to update the protocol as well.

    Even if there was a protocol that prevented reporting on objective outcomes, there are still some things that I don't get. For example: why does the 2004 review by Edmonds report on functional capacity (presented as a measure of quality of life) but the review by Larun et al. does not? Or why do Larun et al. report on service use, which is also an objective outcome? Was this specified in a protocol somewhere?

    So in conclusion: (1) I could not find a protocol that specified the subjective but not the objective outcomes used in the same trials (2) If there was one I don’t see why the authors could not have updated this either before their first review or following the criticism made by Kindlon and Courtney (3) Even if there was such a protocol that cannot easily be changed, that would still be an absurd situation that needs to be corrected as soon as possible. Protocols are meant to prevent bias, not maintain or justify it. Not reporting on objective outcomes seems like a major flaw in this review.
     
    Last edited: Oct 13, 2019
  14. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    7) Objective outcomes: an overview of what wasn’t reported
    I thought it might be useful to get an overview of the objective outcomes that were available in the trials that make up the Cochrane review. Mark Vink already gave a good summary in his 2018 analysis of the Cochrane review, but I would like to present the results per outcome rather than per study. Unfortunately, for most outcomes the measures used are somewhat different, making it difficult to perform a meta-analysis. So this will be mostly a narrative review of the objective outcomes. I’ve tried to group them into different categories to make the evidence more comprehensible.

    Quite a few of the exercise studies have reported significant improvements on objective outcomes. But I think that after close scrutiny and when similar outcomes from other trials are taken together, none of these hold up. If I have missed an important objective outcome reported in one of the GET-trials of the Cochrane review, please let me know so I can update this overview.

    Oxygen consumption during an exercise test
    Let’s start with exercise testing. Several studies have performed an exercise test and measured maximal oxygen consumption. Unfortunately, the procedures were quite different in each study.
    • Fulcher & White, 1997 reported statistically significant differences for peak oxygen consumption and maximum ventilation for the exercise group compared to controls. But the p-values (0.03 and 0.04 respectively) are quite close to 0.05 and the authors had analyzed 9 different measurements taken during the exercise test. So after Bonferroni correction for multiple comparisons, the results would probably no longer be considered statistically significant.
    • Wearden et al. 1998 report on ‘functional work capacity’ which was calculated as the amount of oxygen consumed in the final minute of exercise per kilogram of body weight. The paper says that “there was a significant effect of exercise on functional work capacity”. But their trial had four arms as the authors wanted to test not only exercise therapy but also fluoxetine (an antidepressant). So for the main comparison in the Cochrane review (exercise versus passive control), we would need the two groups without fluoxetine. The first Cochrane review (Edmonds et al. 2004) reported on this comparison as follows: “Functional work capacity improved in the exercise therapy group compared to the control group at 12 weeks (WMD -4.40, CIs -9.10 to 0.30) and at 26 weeks (WMD -2.89, CIs -7.71 to 1.93) in the Appleby 1995 study, but at neither time was the difference statistically significant.” (Appleby 1995 refers to an early report on the trial by Wearden et al. 1998.) It should also be noted that this trial had high dropouts in the exercise group (33%), much more than in the control group (15%). This could have impacted the results of the exercise test in favour of the intervention.
    • The information provided by Wallman is also complicated. The results section reads: “Oxygen uptake values were 9.6% higher after the intervention in the exercise group compared with an 8.9% decline in the relaxation/flexibility group, but the difference in final values for the groups was not significant.” The data provided in table 3, however, indicate a statistically significant difference. In this trial, patients performed several exercise tests: perhaps the authors had the measurement taken at a later timepoint without reporting the data in their paper? It’s a rather confusing report.
    • Finally, there’s the trial by Moss-Morris et al. Their data showed no significant difference in VO2peak between the exercise group and the control group. The authors noted however that these values should be interpreted cautiously because exercise data was only completed by half of the patient sample. But because dropouts were similar in both groups and the exercise group’s results decreased, it seems unlikely that this would have impacted the conclusion of no significant improvement.
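
    The Bonferroni argument made for Fulcher & White above can be checked with a few lines. This is a minimal sketch using only the figures quoted in this post (p-values of 0.03 and 0.04, 9 measurements, alpha of 0.05). The function names are made up for the example, and Bonferroni is only one of several possible corrections.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def survives_bonferroni(p_value, n_tests, alpha=0.05):
    """True if p_value is still significant after correcting for n_tests."""
    return p_value < bonferroni_threshold(n_tests, alpha)


N_MEASUREMENTS = 9  # measurements analysed during the exercise test
threshold = bonferroni_threshold(N_MEASUREMENTS)  # 0.05 / 9

results = {
    "peak oxygen consumption (p=0.03)": survives_bonferroni(0.03, N_MEASUREMENTS),
    "maximum ventilation (p=0.04)": survives_bonferroni(0.04, N_MEASUREMENTS),
}
```

    With 9 comparisons the threshold is 0.05 / 9, or roughly 0.0056, so neither p-value would remain significant.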
    Muscle force
    Two studies have reported on muscle force. Wallman et al. 2004 reported a statistically significant difference for the “power output adjusted for body weight (W·kg⁻¹) that coincided with a subject’s target heart rate” during a submaximal exercise test. Rather disappointingly they provide no data, just a graph of the results. Fulcher & White (1997) measured “maximal quadriceps voluntary contraction” and found no statistically significant difference for the exercise group compared to the control group.

    Activity level
    Several studies have used objective outcomes related to physical activity levels.
    • Wallman et al. (2004) for example report that “activity levels increased in the graded exercise group, although the final levels did not differ between the groups.” Unfortunately, it’s not clear what device they used and they do not provide the data, just a graph of something that is measured in kJ/week. I suppose it’s some sort of wearable that gives an indication of physical activity.
    • Wearden et al. (2010), the FINE Trial, used a step test. It measured the time to take 20 steps or the number of steps taken if this was not achieved. The data was never reported in the literature. The authors simply noted in their mediation analysis (published 3 years after the publication of the main outcomes) that “there were no between group differences in any of the step test measures at 20 or 70 week.” Thanks to a freedom of information request by Kathryn Dickenson, the data of this step test became publicly available. I hope somebody with statistical skills could present the data in an orderly way, that would be very much appreciated. Relevant thread here: https://www.s4me.info/threads/fine-trial-step-test-data-released-in-2017.11171/
    • The PACE-trial had a step test that measured fitness. The data was also never reported. There was just a graph in the mediation analysis (published 4 years after the main outcomes were published) that showed that there was no significant difference between exercise therapy and specialist medical care (or any of the other groups). People have requested the data of this fitness test using freedom of information requests, but this was denied by Queen Mary University of London for being "vexatious".
    • The PACE trial also had a 6-minute walking test, which was highlighted in the main paper because the difference between GET and specialist medical care was statistically significant. The trial by Jason et al. also had a 6-minute walking test, where the difference was not statistically significant. It would be interesting to know how these two results add up if the data is pooled together.
    Blood lactate
    Two trials measured blood lactate during an exercise test. Wallman et al. (2004) reported a statistically significant difference in blood lactate production. While it increased a little in the exercise group, it decreased a little in the control group. I don’t know if that speaks for or against exercise therapy improving fitness, to be honest. Fulcher & White 1997 reported on submaximal blood lactate and post-test blood lactate. In both cases, there was no significant difference between intervention and control group.

    Heart rate and blood pressure
    Then there are quite a few studies that measured heart rate or blood pressure before, during or after an exercise test. Unfortunately, these are all a bit different so it’s hard to compare. Wallman et al. reported statistically significant differences for the resting heart rate and the resting systolic (but not diastolic) blood pressure. I’m a bit wary though about the objective results reported in this study. There was no protocol, while the authors took multiple exercise tests and do not present their data in a comprehensive way. So there’s a danger of cherry-picking the most ‘significant’ outcomes. I don’t like having to trust authors on this.

    Anyway, all the other reports on heart rate were non-significant differences. In the trial by Fulcher & White, the measurements were maximum heart rate and recovery of the heart rate three minutes after the exercise test. Moss-Morris reported on the maximum heart rate achieved while the FINE Trial (Wearden et al. 2010) had data on the maximum heart rate reached on a step-test.

    Tolerance of exercise
    There were also some measures that were related to tolerance of exercise.
    • Wallman et al. 2004, for example, report a statistically significant difference for “achievement of target heart rate” during the exercise test. In the trial by Fulcher & White however, the percentage of predicted maximum heart rate did not show a significant difference between intervention and control group. A closer look at the Wallman data shows that there was barely an improvement for 'achievement of target heart rate' in the GET-group; it was mostly the control group that deteriorated.
    • Fulcher & White also report on the exercise test duration, but this did not show a significant difference. Wallman et al. report that ratings of perceived exertion on the Borg Scale were lower after the exercise intervention (p-value of 0.013) but there was no significant difference on this measurement in the PACE trial. Fulcher & White report a p-value of 0.04 for the difference in perceived exertion during the post-treatment exercise test, a difference that would no longer be statistically significant if a Bonferroni correction for multiple comparisons was performed.
    • Finally, Wallman also had data on the respiratory exchange ratio (RER), which is often used in exercise tests to see if patients provided full cooperation. There was a significantly larger increase in RER in the exercise group than in the control group but the p-value (0.047) was suspiciously close to 0.05. As I have said earlier, I have my doubts about the objective outcomes in the Wallman et al. study because I think it is at high risk of reporting bias (I think Larun et al. should have rated this study as high risk instead of unclear risk of bias for selective reporting of outcomes).
    Cognitive testing
    The trial by Wallman et al. (2004) also had two versions of a cognitive test, one of which resulted in a statistically significant difference in favour of the exercise group. The report reads: “on the modified Stroop Colour Word test, there were no significant differences between the groups before the intervention, but a significant difference in favour of the graded exercise group after the intervention on the more difficult level of this test (P=0.029).”

    Employment and disability payments
    For employment there’s data from two trials. Jason et al. 2007 give the percentage of patients that were employed. There was no significant difference between the two groups (although it’s notable that the percentage in the exercise group decreased from 41% to 33%, despite a high dropout rate). The PACE trial gave data on lost employment, more precisely days lost from work. In both groups there was a notable increase in lost employment, but there was no significant difference between the two. The PACE trial also had data on income benefits, illness/disability benefits and payments from income protection. There was no significant difference between the GET and SMC control group for all these outcomes (but it’s once again notable that all these benefits increased in the GET-group).

    Service use
    Service use was already reported in the Cochrane review. It wrote:

    “During the 12-month post-randomisation period, participants in the exercise group had a lower mean number of specialist medical care contacts than those allocated to treatment as usual (MD −1.40, 95% CI −1.87 to −0.93; Analysis 1.16). A variety of other health care resource use metrics did not differ significantly between the two groups (Analysis 1.16; Analysis 1.17), including use of primary care resources (e.g. GP or practice nurse), other doctor contacts (e.g. neurologist, psychiatrist or other specialists), accident and emergency contacts, medication (e.g. hypnotics, anxiolytics, antidepressants or analgesics), contacts with other healthcare professionals (e.g. dentist, optician, pharmacist, psychologist, physiotherapist, community mental health nurse or occupational therapist), inpatient contacts, and other contacts with healthcare/social services (e.g. social worker, support worker, nutritionist, magnetic resonance imaging (MRI), computed tomography (CT), electroencephalography (EEG)).”
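
    When a review gives only an estimate and its 95% confidence interval, the standard error, z-score and an approximate two-sided p-value can be backed out. The sketch below assumes the interval was built with a normal approximation (estimate ± 1.96 × SE), which may not be exactly what the review authors did, and the function name is made up. It uses the service-use interval quoted above and, for comparison, the 12-week functional work capacity interval from Edmonds et al. 2004, reading its lower bound as −9.10.

```python
from statistics import NormalDist


def z_and_p_from_ci(estimate, lower, upper, level=0.95):
    """Recover SE, z and two-sided p from an estimate and its CI.

    Assumes a symmetric, normal-approximation confidence interval.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Specialist medical care contacts, as quoted from Larun et al.
se_service, z_service, p_service = z_and_p_from_ci(-1.40, -1.87, -0.93)

# Functional work capacity at 12 weeks, as quoted from Edmonds et al. 2004
se_fwc, z_fwc, p_fwc = z_and_p_from_ci(-4.40, -9.10, 0.30)
```

    Under that assumption the service-use difference is far beyond conventional significance, while the functional work capacity difference gives a p-value of about 0.07, consistent with an interval that crosses zero.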

    The PACE trial also reported that healthcare costs were higher in the GET than in the control group but that the opposite was true for informal care. The total health costs for GET were estimated at 2224 pounds and 1424 pounds for specialist medical care (SMC) alone. The total societal costs were estimated at 20,935 pounds and 22,088 for SMC alone. This is however reported in the McCrone et al. (2012) paper for which PLOS One has issued an expression of concern.


    Conclusion: There doesn’t seem to be an objective outcome where GET causes significant improvements compared to the control group if all the exercise trials are taken together. And there are some outcomes such as employment or activity levels, where the evidence seems to agree that GET does not cause improvements. It's a shame how poorly the trial authors have reported their objective outcomes, given that these are most reliable when blinding is not possible.


    EDIT 1: I have added the data from Fulcher & White, 1997 for perceived exertion.
     
    Last edited: Oct 13, 2019
    Hutan, Simon M, alktipping and 11 others like this.
  15. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    Caution: This analysis was done by someone with no professional statistical training and is quite possibly wrong.

    Pooling the 6-minute walking test data - a preliminary attempt

    I attempted to pool the data from the 6-minute walking test in the PACE trial and Jason et al. 2007 in a meta-analysis. I used Review Manager (RevMan), the tool that Cochrane authors use, and simply entered the data (mean, SD and n). I used a random effects analysis model because of suspected heterogeneity and because Larun et al. also used this in their analyses.

    First I used the standardised mean difference (SMD) because I don’t know whether what Jason et al. and White et al. report is exactly the same measure. The result was an SMD of only 0.13, which was not statistically significant. This is because the Jason et al. study showed exactly the opposite effect: the exercise group did worse than the relaxation group. I also suspect that in a random effects analysis model the small studies are given a relatively large weight, something I noticed in the analyses by Larun et al. (RevMan calculates these weights automatically).
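
    For anyone who wants to check this kind of calculation outside RevMan, here is a minimal sketch of a standardised mean difference (Hedges' g) and a DerSimonian-Laird random effects pooling. The function names are my own, the variance formula is a common textbook approximation that may differ slightly from what RevMan uses internally, and the input numbers are made-up placeholders, not the PACE or Jason et al. data. It is only meant to show why a small study can get a much larger share of the weight under random effects than under fixed effect.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return the SMD (Hedges' g) and its approximate variance."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooling."""
    w = [1 / v for v in variances]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures how far the studies scatter around the fixed estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "random": pooled,
        "se_random": math.sqrt(1 / sum(w_re)),
        "tau2": tau2,
        "weights_fixed_pct": [100 * wi / sum(w) for wi in w],
        "weights_random_pct": [100 * wi / sum(w_re) for wi in w_re],
    }


# Placeholder inputs (hypothetical): a large trial with a positive effect
# and a small trial with a negative effect.
large = hedges_g(380.0, 100.0, 110, 350.0, 100.0, 110)
small = hedges_g(400.0, 120.0, 20, 440.0, 120.0, 20)
result = pool([large[0], small[0]], [large[1], small[1]])
print(result)
```

    With two studies pointing in opposite directions the estimated between-study variance is large, so the weights move towards equal and the small study counts for far more than its sample size alone would suggest.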

    upload_2019-10-12_23-40-59.png

    Because I suspect that the high numbers reported by Jason are the distance walked in feet rather than in meters, I’ve tried to recalculate the data (1 foot is 0.3048 meters). That gives numbers that are somewhat higher but comparable to those of the PACE trial. That allows me to do an analysis of mean difference and express the results in meters.
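
    The conversion itself is simple, but note that the standard deviation has to be converted with the same factor as the mean. A minimal sketch (the helper name is my own and the input values are placeholders, not the figures reported by Jason et al.):

```python
FEET_TO_METRES = 0.3048  # 1 foot is 0.3048 metres


def feet_to_metres(mean_ft, sd_ft):
    """Convert a mean and its SD from feet to metres (both scale linearly)."""
    return mean_ft * FEET_TO_METRES, sd_ft * FEET_TO_METRES


# Placeholder values for illustration only
mean_m, sd_m = feet_to_metres(1000.0, 100.0)
print(round(mean_m, 1), round(sd_m, 2))
```

    Because mean and SD are scaled by the same factor, a standardised mean difference is unaffected by the unit; only the mean difference analysis needs the conversion.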

    upload_2019-10-13_10-53-24.png

    I’m not confident or experienced in doing this kind of statistical analysis so I hope someone with more knowledge could have a look. I personally find it weird that you don’t have to enter baseline data for the calculation. I guess that means that these meta-analyses can only provide a very rough estimation? For example, the difference from baseline in the PACE trial for GET compared to SMC was 45 meters, not 31.
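
    If I read the two figures correctly (31 meters as the difference in final values and 45 meters as the difference in change from baseline, which is my assumption), the gap between them is just the baseline difference between the groups: the difference in change scores equals the difference in final values minus the difference in baseline values. A small sketch of that arithmetic, with a helper name of my own:

```python
def implied_baseline_difference(final_diff, change_diff):
    """Baseline difference (treatment minus control) implied by the two figures.

    Uses: change_diff = final_diff - baseline_diff
    """
    return final_diff - change_diff


# 31 = difference in final values, 45 = difference in change from baseline
baseline_diff = implied_baseline_difference(31, 45)
print(baseline_diff)
```

    A negative value would mean the GET group started out somewhat lower than the SMC group, which is the kind of chance imbalance that an analysis of final values alone does not correct for.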

    @Lucibee Could I interest you in having a look?

    EDIT: There was a minor mistake in the second graph. The study by Jason et al. doesn't give exact dropout rates, only that approximately 25% dropped out, with no difference among the groups. So I've used this as an approximation to calculate the sample size.
     
    Last edited: Oct 13, 2019
    Annamaria, Amw66, Sean and 4 others like this.
  16. Esther12

    Esther12 Senior Member (Voting Rights)

    Messages:
    4,393
    Thanks again for all your work on this @Michiel Tack

    One reading of that summary is that if the authors of a review were looking to report positive results for an objective outcome they could probably find a way to do so if they were careful in the way they grouped together objective outcomes.
     
    Last edited: Oct 13, 2019
    alktipping, Annamaria, Sean and 2 others like this.
  17. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,792
    Correction: A bimodal score of 6 was needed to be included in the trial. I can’t remember the exact data in this trial but generally at the start that would correspond to 17+. Aside: a score of 18 or less was considered fatigue in the normal range and was used as a revised recovery criterion.
     
    Last edited: Oct 12, 2019
  18. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,792
    Well done for finding this. However, people should probably see the page themselves to see what is mentioned. It does mention outcome measures it is going to look at. You do go on to discuss this further. But I had read what you wrote below as discussing the outcome document or post-hoc additions rather than the protocol.

     
  19. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,001
    Location:
    Belgium
    Now I'm confused. When I search for the protocol I get a very, very short text that does not mention the outcome measures it is going to look at.

    Could you quote from whatever you are seeing, for example where outcome measures are specified (perhaps Shub gave me another version or something, I've got a feeling I'm missing something).

    I've added the protocol I'm seeing as an attachment.
     

    Attached Files:

    Last edited: Oct 13, 2019
    alktipping, MSEsperanza and Andy like this.
  20. Dolphin

    Dolphin Senior Member (Voting Rights)

    Messages:
    5,792
     
    Annamaria, Barry, Trish and 3 others like this.
