I have been experimenting with entering different patterns of answers into the FSS questionnaire, based on my own symptoms at different stages of my illness and on the sort of pattern I would expect in different circumstances, and calculating the p values from the two different versions of the so-called prediction formulae given by Dr C.
Here are some of my observations:
The first version, which included duration of illness, was completely useless, as Dr C found out when he tried it on patients with a longer duration of illness than the range covered by his first sample.
For example, for the same set of FSS scores, a severely affected patient's chance of improvement rose from under 10% to over 99% as the duration went from 5 to 30 years. So of course that formula was a dud.
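A minimal sketch of how a duration term can produce that kind of swing, assuming the formula has a logistic form (I don't know that it does). The intercept and slope here are made up purely for illustration and are not Dr C's coefficients; they are just chosen so the numbers land in the same ballpark as above.

```python
import math

# Hypothetical coefficients for illustration only; NOT Dr C's values.
INTERCEPT = -4.0        # stands in for everything except duration (FSS terms etc.)
DURATION_SLOPE = 0.3    # assumed change in log-odds per extra year of illness


def improvement_probability(duration_years):
    """Logistic 'chance of improvement' for fixed FSS answers, varying only duration."""
    z = INTERCEPT + DURATION_SLOPE * duration_years
    return 1.0 / (1.0 + math.exp(-z))


# Same questionnaire answers, different durations of illness.
for years in (5, 10, 20, 30):
    print(years, "years:", round(improvement_probability(years), 3))
```

With a slope fitted on a narrow range of durations, nothing stops the log-odds from running off to silly values once you plug in 20 or 30 years, which is the sort of behaviour described above.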
Comparing the second formula with the first formula set at 10 years duration, there was still a large difference in p values between the two. For example, filling it in for myself when moderately affected after 10 years, the first formula gave me a 79% chance of improvement and the second gave me a 10% chance.
There was a general trend for both formulae to give a better chance of improvement to the more mildly affected across the board, so someone who filled in the maximum 7 on every question was given a 12% or 6% chance of recovery, whereas 4s across all questions gave 99% and 82%.
Finally, I had a go at filling in the questionnaire on the basis that fatigue was a purely physical problem, and compared it with a more depression-type fatigue with worse scores on motivation, social and role responsibilities.
The difference here was stark.
Physically based fatigue gave an almost 100% chance of improvement.
Depression based fatigue gave a zero chance of improvement.
So in summary - the first formula was useless because it was based on a sample with too narrow a duration range, so it failed when patients with longer durations were included.
Both formulae may be reflecting a problem with diagnosis, with patients with depression-related fatigue not responding to the treatment, and patients with physically based fatigue, who are more likely to actually have ME/CFS, responding.
In other words, the magic formula is simply an artefact of an ill-diagnosed sample containing two distinct groups of patients, and the formula is simply picking out the parts of the questionnaire that best separate the patients into those two diagnostic categories.
So my theory is, the drug is 'working' on patients with ME/CFS and failing on patients with depression-based fatigue.
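To show what I mean by the formula separating two groups, here is a rough sketch. It assumes a logistic formula over the nine FSS items (each scored 1 to 7), and it assumes a split into six "physical" items and three "motivation/social/role" items. That split and both weights are hypothetical, invented for this example, and are not taken from Dr C's formula.

```python
import math

# Hypothetical weights for illustration only; NOT Dr C's coefficients.
PHYSICAL_WEIGHT = 0.5      # assumed: higher physical scores push p up
MOTIVATION_WEIGHT = -1.5   # assumed: higher motivation/social/role scores push p down


def improvement_probability(physical_scores, motivation_scores):
    """Logistic score from two assumed groups of FSS items (each item scored 1-7)."""
    z = (PHYSICAL_WEIGHT * sum(physical_scores)
         + MOTIVATION_WEIGHT * sum(motivation_scores))
    return 1.0 / (1.0 + math.exp(-z))


# "Purely physical" pattern: bad on physical items, mild on the rest.
physical_pattern = improvement_probability([7] * 6, [2] * 3)

# "Depression-type" pattern: moderate physical items, worst on the rest.
depression_pattern = improvement_probability([4] * 6, [7] * 3)

print("physical pattern:  ", physical_pattern)
print("depression pattern:", depression_pattern)
```

A formula like this is acting as a classifier between the two answer patterns rather than telling you anything about how an individual will respond.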
We then come to interpretation.
I can see two interpretations:
1. The drug works for ME/CFS
2. The drug doesn't work for ME/CFS, but the placebo effect is operating on the ME/CFS patients differently from the depressed patients. This seems just as plausible, given that it is an open label trial, so the patients know they are getting a drug that seems promising, and those with ME/CFS will be naturally hopeful. They may also, in that hopeful state, and knowing they are taking part in a trial, be extra careful with pacing during the trial, in order not to confound the effects of the drug, crashing less often as a result of pacing, and with a combination of hope and more stable symptoms, fill in the end of trial questionnaire more positively. Compare that with the depressed group, who, being depressed, are not hopeful, and feel just as depressed at the end of the trial, so show no subjective improvement.
You may wonder why I have spent time doing all this. The answer is simple. I was bored and needed something to occupy my mind. And I was curious as to how such an outlandish seeming claim could be made about an apparently nonsensical formula.
I still maintain, as @Adrian has explained, that these are retrospective formulae based on small samples, and that they apply only to those samples. The value of the second formula as a predictive formula will only be known, and it should only be claimed to be valid, once it has been tested on a completely new cohort of patients not including any of the original sample. And even if it does turn out to be predictive, it may simply be an artefact of poor initial diagnosis, and of a tool (FSS) that picks up the effects of that poor diagnosis.
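The testing on a new cohort could be sketched roughly like this. Everything here is hypothetical: the function name, the patient IDs, the toy predictor and the 0.5 cut-off are all my own assumptions for illustration, not anything from the trial.

```python
def validate_on_new_cohort(predict, derivation_ids, new_cohort, threshold=0.5):
    """Fraction of correct predictions on patients NOT used to build the formula.

    predict        -- function mapping a patient's scores to a probability
    derivation_ids -- IDs of the patients the formula was derived from
    new_cohort     -- list of (patient_id, scores, improved) tuples
    """
    if not new_cohort:
        raise ValueError("new cohort is empty")

    # Refuse to run if any original patient has leaked into the new cohort.
    overlap = set(derivation_ids) & {pid for pid, _, _ in new_cohort}
    if overlap:
        raise ValueError(f"patients from the original sample: {sorted(overlap)}")

    correct = sum(
        (predict(scores) >= threshold) == improved
        for _, scores, improved in new_cohort
    )
    return correct / len(new_cohort)


# Toy example with made-up data: a single summary score per patient.
def toy_predict(score):
    return 0.9 if score < 4 else 0.1


original_ids = ["a1", "a2", "a3"]
fresh_patients = [("n1", 2, True), ("n2", 6, False), ("n3", 3, True), ("n4", 1, False)]
print(validate_on_new_cohort(toy_predict, original_ids, fresh_patients))
```

The overlap check is the important bit: a formula will always look good on the patients it was fitted to, so any of them turning up in the test cohort flatters the result.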
Well that was fun.
Edit to add: It is also perfectly possible that I have filled in the formulae wrongly on my spreadsheet, so don't take all this too seriously.