3. It has not been proven that we should continue to use GET and CBT with all ME/CFS patients.

This analogy uses the data from table 3 of the PACE trial showing the improvement in performance on the Chalder Fatigue scale of the group that received Graded Exercise Therapy as well as Specialist Medical Care. That data has been carefully and fairly rescaled to fit.

Amy Pashunt looked at her exam result in horror. She had scored 15% for Mathematical Empiricism, and she knew that in order to become a researcher, she had to get a grade A. Her tutor sympathised with her, and explained that she had two options. She could just follow a year of Standard Maths Classes, or, in addition, she could take Graham's Expensive Tuition.

With the SMC, evidence showed that, after a year, students of her ability added 13 marks on average to their performance, but with GET that would add a further 10 marks, bringing the whole performance up to a borderline pass.

A whole year, and all she could expect would be a borderline pass!

But this is an average effect. Suppose that you find that out of each class of 30 following both SMC and GET, 12 will improve (if at all) by less than 6 marks and still fail. Of the remaining 18, only 7 will actually improve enough to reach a grade C, and the rest will possibly scrape a borderline pass. Despite having extensive information for hundreds of previous students over many years, the school does not offer any analysis of how to recognise whether you could benefit from these courses, but instead simply recommends that you follow the retake schedule for a year, taking both SMC and GET.

What would your reaction be?

Would it change if you then found that in your area, there was no realistic prospect of proper SMC, and only of a more limited form of GET? (And of course, you need to remember that a grade A is equivalent to a return to full health - this still isn't an option for anybody.)

It is a matter of priority that we find the factors that determine if GET and CBT are appropriate and effective for particular patients, and the valuable database provided by the PACE trial should enable us to do that. It is inexcusable to have all this information, but continue to suggest that most patients could benefit from the therapies. That is not true.

Much has also been made of the fact that few of the patients' scores showed any real deterioration on these therapies. Would it affect your judgement if you discovered that both the structure of the study and of the scale itself may have prevented some patients from registering deterioration, even if their scores were not anywhere near the maximum fatigue score of 33? There is a much more thorough study of the measurement (or not) of harm in this and other studies by Tom Kindlon.

The PACE trial had 4 groups, each with around 160 patients. The members of one group had sessions of specialist medical care about 5 times over the year; the other three groups had specialist medical care 3 or 4 times, but also had an additional 12 to 15 sessions of Graded Exercise Therapy (G.E.T.), or of Cognitive Behaviour Therapy (C.B.T.), or of Adaptive Pacing Therapy (A.P.T.). Their progress was tracked in a variety of ways and was measured at the start, after 12 weeks (half-way through), after 24 weeks (at the end of the courses) and after 52 weeks.

First let us look at the way that G.E.T. is claimed to moderately improve the fatigue levels of patients. We will focus on their assessment of fatigue using the continuous/Likert Chalder scale, which runs from 0 points to 33 points, where 33 points represents utter exhaustion (in the true sense of the term - not just feeling very tired). A drop in score therefore represents a drop in fatigue - an improvement in health. Patients were assessed at the start of the trial, and after 12, 24 and 52 weeks.

At the start of the trial, the average fatigue score of the patients in the group that received Graded Exercise Therapy and Specialist Medical Care was 28.2 out of 33. At 12 weeks it dropped to 22.8, at 24 weeks to 21.7 and at the end of the year to 20.6 points. This is a total decrease in fatigue as measured by the questionnaire of 7.6 points on a 33 point scale.

But we need to compare this with the drop in fatigue levels of the patients in the group who received only Specialist Medical Care. Over the year, their average fatigue level dropped by 4.5 points, and again, most of this improvement happened in the first 12 week interval. In effect, adding Graded Exercise Therapy to Specialist Medical Care meant adding an improvement in fatigue of 3.1 points to the 4.5 point improvement seen in the group that only had S.M.C.
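The subtraction behind these figures can be checked with a short Python sketch. The variable names are ours and purely illustrative; the numbers are the group averages quoted above.

```python
# Average Chalder fatigue scores (0-33, lower is better) for the
# G.E.T.+S.M.C. group at each assessment week, as quoted in the text.
get_scores = {0: 28.2, 12: 22.8, 24: 21.7, 52: 20.6}

# Drop over the year for the S.M.C.-only group, as quoted in the text.
smc_drop = 4.5

# Total drop for the G.E.T.+S.M.C. group, and the part left over once
# the S.M.C.-only drop has been subtracted.
get_drop = round(get_scores[0] - get_scores[52], 1)
extra_from_get = round(get_drop - smc_drop, 1)

# Drop within each interval, to show when the change happened.
weeks = sorted(get_scores)
interval_drops = {
    f"{a}-{b}": round(get_scores[a] - get_scores[b], 1)
    for a, b in zip(weeks, weeks[1:])
}

print(get_drop)          # 7.6
print(extra_from_get)    # 3.1
print(interval_drops)    # {'0-12': 5.4, '12-24': 1.1, '24-52': 1.1}
```

Of the 7.6 point drop, 5.4 points came in the first 12 weeks.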

In comparison with scores of healthy people, these improvements are very small indeed. They are also levelling off in a classic "exponential decay" pattern, which suggests that continuing with these therapies would not continue to produce further improvements.

On its own, S.M.C. improved scores by more than the extra caused by adding on about a dozen sessions of G.E.T. (4.5 versus 3.1). Why did the authors of the PACE trial choose to emphasise the benefits of G.E.T. and not the importance of patients having several sessions of Specialist Medical Care?

The second issue to consider is the consequence of the average improvement being so low, and of the baseline score of each group being so close to the end value of the scale (for the G.E.T.+S.M.C. group it was 28.2 out of 33). This is very similar to the problem of describing the average salary of adults in the U.K. mentioned in the previous section. If just one or two patients in a group of 15 show a large improvement (either one patient reaching the "healthy" zone, or two patients becoming much better), the remaining patients must show minimal improvements to keep the average improvement low. This is likely to be what actually happened, as the measurement of spread, the standard deviation, almost doubled by the end of the trial, which is just what happens when a few people with a very large wage are "balanced" by a large number on a very low wage.
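A rough sketch of this "balancing" effect, in Python. The group below is entirely hypothetical (we do not have the individual data): it takes the 3.1 point extra improvement attributed to G.E.T. as the group average, purely for illustration, and assumes two patients out of 15 improved by 20 points each. The function name is our own.

```python
from statistics import mean, pstdev

def average_for_the_rest(group_size, mean_improvement, big_improvers, big_improvement):
    """Average improvement left for everyone else once a few large
    improvers have been accounted for."""
    total = group_size * mean_improvement
    rest = group_size - big_improvers
    return (total - big_improvers * big_improvement) / rest

# Hypothetical: 15 patients, average improvement 3.1 points,
# two of whom improve by 20 points each.
rest_average = round(average_for_the_rest(15, 3.1, 2, 20), 2)

# Two made-up groups with the same average improvement.
even_group = [3.1] * 15                           # everyone improves equally
skewed_group = [20] * 2 + [rest_average] * 13     # two large improvers

print(rest_average)                                # 0.5
print(round(mean(skewed_group), 1))                # 3.1
print(pstdev(even_group) < pstdev(skewed_group))   # True
```

Under these assumptions the other 13 patients would improve by only half a point each, while the spread of the group grows, which is the pattern the standard deviation hints at.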

Similar results were found in the C.B.T. group, and across a range of assessments. More details of that are listed below.

In most cases, the scores of the patients who only had Specialist Medical Care showed more than half of the overall improvement in the scores of the G.E.T.+S.M.C. and of the C.B.T.+S.M.C. groups, and to use that combined score to reach targets gives a false impression of the effectiveness of G.E.T. and C.B.T. It is like setting the minimum height standard for entry into the police force at 5 feet 10 inches, then allowing candidates to stand on a box over three feet high. Interestingly the patient satisfaction with S.M.C. (50%) was below the level of satisfaction with G.E.T. and C.B.T. (82% or more), which suggests that patients overestimated the relative value of G.E.T. and C.B.T. This is an important consideration when you realise that the two main assessments, fatigue and physical function, were made through questionnaires.

All the reported measures over the course of a year:

For Fatigue, the scores for the G.E.T.+S.M.C. group improved by 7.6 points on a 33 point scale: the scores for the C.B.T.+S.M.C. group improved by 7.4 points: the S.M.C. group improved by 4.5 points.

For Physical Function, the scores for the G.E.T.+S.M.C. group improved by 21 points on a 100 point scale: the scores for the C.B.T.+S.M.C. group improved by 19.2 points: the S.M.C. group improved by 11.6 points.

On the Work and Social Adjustment Scale, the scores for the G.E.T.+S.M.C. group improved by 6.8 points on a 40 point scale: the scores for the C.B.T.+S.M.C. group improved by 6.4 points: the S.M.C. group improved by 3 points.

For the six-minute walk, the distances for the G.E.T.+S.M.C. group improved by 67 metres to 379 metres (where 600 metres is a minimum average for healthy adults): the distances for the C.B.T.+S.M.C. group improved by 21 metres, which was less than the S.M.C. group, which improved by 22 metres. Approximately a quarter of each group did not take part in these assessments.

For Sleep, the scores for the G.E.T.+S.M.C. group improved by 2.7 points on a 20 point scale: the scores for the C.B.T.+S.M.C. group improved by 2.6 points: the S.M.C. group improved by 1.4 points.

In most of the cases above, the group that did not receive any of the therapies (the SMC group) improved by over half of the amount that was seen in the groups that did receive the therapies. In other words, the therapies can only claim to account for a little under half the improvement, which is why the stated conclusion was that "CBT and GET can safely be added to SMC to moderately improve outcomes for chronic fatigue syndrome". How much of the other half of the improvement could be attributed to factors other than the specialist medical care (which was at a greater level than most patients experience) will be explored later.
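The "over half" comparison can be reproduced from the figures listed above. This is only a sketch of the arithmetic, in Python; the labels are ours, and each pair holds the one-year improvement of a therapy group followed by that of the S.M.C.-only group.

```python
# (improvement with therapy + S.M.C., improvement with S.M.C. only),
# taken from the measures listed above.
improvements = {
    "Fatigue, G.E.T.": (7.6, 4.5),
    "Fatigue, C.B.T.": (7.4, 4.5),
    "Physical function, G.E.T.": (21.0, 11.6),
    "Physical function, C.B.T.": (19.2, 11.6),
    "Work and social adjustment, G.E.T.": (6.8, 3.0),
    "Work and social adjustment, C.B.T.": (6.4, 3.0),
    "Six-minute walk, G.E.T.": (67, 22),
    "Six-minute walk, C.B.T.": (21, 22),
    "Sleep, G.E.T.": (2.7, 1.4),
    "Sleep, C.B.T.": (2.6, 1.4),
}

# Percentage of each therapy group's improvement that was also seen
# in the group receiving S.M.C. alone.
smc_share = {
    name: round(100 * smc / combined)
    for name, (combined, smc) in improvements.items()
}

over_half = [name for name, share in smc_share.items() if share > 50]

for name, share in smc_share.items():
    print(f"{name}: {share}%")
print(len(over_half), "of", len(smc_share), "comparisons are over half")
```

On these figures, 7 of the 10 comparisons come out above 50%. The exceptions are the Work and Social Adjustment Scale (both therapies) and the six-minute walk for G.E.T.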

The PACE trial reported that very few patients had scores that worsened very much over the length of the trial, and this played an important part in concluding that it was safe to treat ME/CFS patients with GET and CBT. This disagrees with many reports from other surveys (such as the 2010 survey by AfME), so needs to be considered carefully. As mentioned earlier, there is a much more thorough study of the measurement of harm in this and other studies by Tom Kindlon.

On a superficial glance, one would think that only those who scored close to the boundary would be limited in movement: on the Chalder Fatigue Scale for example, perhaps only those who scored 32 or 33 would be unable to deteriorate. This would be wrong.

This is a summary of the 11 points in the Chalder Fatigue Scale: under the Likert scoring system, each aspect scores 0 if the aspect is better than when the patient was healthy, 1 if the same, 2 if worse and 3 if much worse.

Do you have problems with tiredness? 2
Do you need to rest more? 3
Do you feel sleepy or drowsy? 2
Do you have problems starting things? 3
Do you lack energy? 3
Do your muscles have less strength? 3
Do you feel weak? 3
Do you have difficulty concentrating? 3
Do you make slips of the tongue when speaking? 2
Do you find it more difficult to find the correct word? 1
How is your memory? 3

The overall score for this would be 28 out of 33, so it would appear that the score could easily show a deterioration.

But 7 of these aspects are already at their worst score, so any deterioration in these aspects would not show. The only way for the score to become worse is for the three scoring 2 to become 3s, and for the one aspect unaffected by the illness (finding correct words) to now become a problem.
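The same count can be written out as a small Python sketch, using the example answers listed above (the variable names are ours):

```python
# Example Likert answers for the 11 aspects, in the order listed above.
answers = [2, 3, 2, 3, 3, 3, 3, 3, 2, 1, 3]
MAX_PER_ITEM = 3

total = sum(answers)                        # overall score
maximum = MAX_PER_ITEM * len(answers)       # worst possible score

# Aspects already at their worst score cannot register deterioration.
items_at_ceiling = sum(1 for a in answers if a == MAX_PER_ITEM)
items_able_to_worsen = len(answers) - items_at_ceiling

# The most the overall score could rise if everything got worse.
headroom = maximum - total

print(total, maximum)          # 28 33
print(items_at_ceiling)        # 7
print(items_able_to_worsen)    # 4
print(headroom)                # 5
```

So even if every aspect became much worse, this patient's score could rise by only 5 points, and 7 of the 11 aspects could not move at all.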

The average patient in the PACE trial would have problems with all 11 aspects, and a score of 28 or 29, so the scope for registering deterioration is very, very limited (see 6-details for a fuller explanation).

The Chalder Fatigue scale resists measuring deterioration in the average patient in the trial.

It is also possible to speculate that for some of the patients who were in the group that received both G.E.T. and S.M.C., the improvement caused by, say, help with pain control and sleep was offset by harm done by G.E.T. As long as the improvement was larger than the harm, an overall improvement would be registered. The study doesn't lend itself to this sort of analysis, but if, for example, both the S.M.C.-only group and the G.E.T. group each had ten similar patients scoring 33 points on the Chalder scale, and in the S.M.C.-only group they all improved to 28, but in the G.E.T.+S.M.C. group 5 improved to 25 and 5 only improved to 31, it is possible that G.E.T. helped half the group by 3 points and harmed the other half by 3 points. This could have been swamped by a few patients showing a marked improvement. Obviously the real results would be much more difficult to analyse.
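The hypothetical example above can be laid out in a few lines of Python. These are the made-up numbers from the paragraph, not trial data, and treating the S.M.C.-only result as what each patient "would have" scored without G.E.T. is an assumption of the illustration.

```python
from statistics import mean

# Ten similar patients in each group, all starting at 33 points.
start = [33] * 10
smc_only_end = [28] * 10              # all improve to 28
get_smc_end = [25] * 5 + [31] * 5     # five reach 25, five only reach 31

# Both groups finish with the same average score.
print(mean(smc_only_end), mean(get_smc_end))   # 28 28

# Assumed effect of G.E.T. on each patient, measured against the
# 28 points reached with S.M.C. alone (positive = helped).
get_effect = [28 - score for score in get_smc_end]
print(get_effect)                              # [3, 3, 3, 3, 3, -3, -3, -3, -3, -3]

# Every patient still records an improvement from the starting score,
# so no deterioration would show up in the results.
print(all(end < begin for begin, end in zip(start, get_smc_end)))   # True
```

In this sketch half the group is 3 points worse off than they would have been with S.M.C. alone, yet not one of them registers as having deteriorated.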

It is entirely possible that great care was taken with the patients to avoid worsening their condition, and that if similar care was taken across the country, then the various accounts and surveys indicating that some patients found that G.E.T. and C.B.T. caused them great harm may have been more due to an inflexible and less-skilled application of these therapies. But there is another problem with these hidden ceilings - patients will go through good and bad patches, and these may be of short or long duration. If a significant number of patients' scores are close to a boundary, then because there is more scope to improve than deteriorate, random changes together with a natural tendency for some patients to improve will themselves produce a weak average improvement. (If a class scores an average of only 1 out of 10 in a tables test, then pure chance would make it likely that the next test was better - it could hardly get worse!) In statistics, this is called "regression to the mean".
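A minimal sketch of how a boundary alone can produce an apparent improvement, in Python. The model is an assumption made for illustration, not something taken from the trial: each patient's underlying change is equally likely to be any whole number from 4 points better to 4 points worse, and the recorded score simply cannot go beyond the ends of the 0 to 33 scale.

```python
SCALE_MIN, SCALE_MAX = 0, 33

def expected_change(score, swing=4):
    """Average recorded change when the underlying change is equally
    likely to be any whole number from -swing to +swing, but the
    recorded score cannot leave the 0-33 range."""
    changes = []
    for delta in range(-swing, swing + 1):
        recorded = min(SCALE_MAX, max(SCALE_MIN, score + delta))
        changes.append(recorded - score)
    return sum(changes) / len(changes)

# Negative values mean the recorded fatigue score falls on average,
# which would be read as an improvement.
for score in (20, 28, 31, 33):
    print(score, round(expected_change(score), 2))
```

Under this assumed model a patient starting at 20 shows no average change, while patients starting at 31 or 33 show an average fall in score even though their underlying ups and downs are perfectly balanced.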

Of course, without access to the individual data, all of this is speculative. The authors of the PACE trial clearly expected much greater success. The fact that the results were very weak, and that there are areas of concern in the assessments which could in themselves cause a weak average improvement, means that the trial has not proved their case. It would also be unwise to assume that the assertion that these therapies are safe is a secure one.

It is appropriate here to quote Dr Alastair Miller talking about the PACE trial: "Although NICE have previously recommended graded exercise and C.B.T. as treatments for C.F.S./M.E., this was on the basis of somewhat limited evidence in the form of fairly small clinical trials. This trial represents the highest grade of clinical evidence - a large randomized clinical trial, carefully designed, rigorously conducted and scrupulously analysed and reported. It provides convincing evidence that G.E.T. and C.B.T. are safe and effective and should be widely available for our patients with C.F.S./M.E." (Science Media Centre). We find it disturbing that he accepts that the evidence base prior to PACE was limited, and yet it has formed the only regular basis of treatment on offer to patients. The evidence that PACE provides is clear - C.B.T. and G.E.T. have very little to offer a large proportion of patients: the Number Needed to Treat is around 7 or 8 (only one in 7 shows a clear improvement). If this is the strongest evidence to date for C.B.T. and G.E.T., then NICE must follow that evidence and cease to recommend these therapies for automatic use with patients diagnosed with M.E./C.F.S., in line with the approach from Norway.

This is not the same as saying that G.E.T. and C.B.T. have nothing to offer. Quite the contrary - there are a few patients, not necessarily having ME/CFS, who could benefit by a smaller or greater amount with the skilled application of these therapies, but these therapies should be appropriately targeted, especially bearing in mind the risk of causing harm. The PACE trial should contain enough information to determine the appropriate factors that distinguish these patients within the very broad Oxford Criteria.

GET was done on the basis of deconditioning and exercise intolerance theories of chronic fatigue syndrome.

CBT was done on the basis of the fear avoidance theory of chronic fatigue syndrome. (from the PACE trial)

On this evidence, given that either therapy was responsible for less than half the already small improvements in most measures, it seems inappropriate that these theories should continue to hold any relevance to ME/CFS in general. Inevitably, there will be some patients for any illness for whom deconditioning and fear avoidance are factors that hinder their progress, but it would be reasonable to suppose from this evidence that these theories are no more specific to ME/CFS than they are to heart disease, liver disease, or even a broken hip. It would be very unwise to assign all patients with a heart condition to Graded Exercise Therapy. It is equally unwise to do the same for patients with ME/CFS.

On the other part of the website there is also a page on the Chalder Fatigue Scale, and how the change from Bimodal scoring to Likert scoring muddied the waters, enabling someone to enter the trial but simultaneously score enough to be counted as being recovered to a normal level. It is heavily graphic, and I am not sure how I could translate it into text form. But if you would like me to try, please let me know.