Preprint: Development and psychometric evaluation of The Index of Myalgic Encephalomyelitis Symptoms (TIMES), Part I… (2026). Horton, Tyson, Fleming, Gladwell.

Half of their limitations section is like asking someone what their weaknesses are and them replying ‘well I’m somewhat of a high achiever’.
Yes, @bobbler mentioned it, but I was still very surprised when I got to the Limitations section and read this:

Limitations: The strengths of this study lie in the large, representative sample, the robust co-production with people with ME/CFS and clinicians working in NHS specialist ME/CFS services. The number of people with severe/very severe ME/CFS recruited is also a strength.
After some elaboration about that, more praises are sung.
A further strength is the thoroughness with which the Rasch analysis was completed. It is unusual to see the complete cycle of scale development, analysis, empirical modification, and psychometric validation of the modified scale in a new cohort, and the conversion tables to enable parametric analyses are rarely presented.

Finally they get to some 'possible' limitations.
Possible limitations lie in the representativeness of the sample. The literature regarding the demographics of ME/CFS are sparse, so it is not possible to compare our sample with an authoritative source of epidemiological data. However, our demographics broadly reflect other large ME/CFS studies with convenience samples [31-34]: predominantly middle-aged women with moderately severe illness and extensive lived experience. Thus, we are confident data are reasonably representative of the ME/CFS community.
That seems to be saying 'we mostly talked to middle-aged women who are quite ill and have been ill for a long time, but that's okay, because that is what everyone else does too, and therefore we are confident that we have sampled a reasonable representation of the ME/CFS community'. Aside from that making no logical sense, it doesn't even seem to have occurred to them that the people who will be filling out the survey mostly won't be middle-aged women who have 'extensive lived experience' of ME/CFS. I would have thought that it was important to test the survey on people new to the illness and the language that is used to describe symptoms. For all their extensive and obscure validation, it doesn't seem that they did that.

The one limitation that they wholeheartedly get behind is that it hasn't been validated in children. I guess that's because it's another nice bit of fundable work.
 
Thanks @ME/CFS Science Blog for linking the questionnaire, it is handy.

On PEM, the one question in the Fatigue section is this:
Post exertional malaise (PEM). PEM describes a worsening of symptoms after seemingly trivial or undemanding activity of any description. It is often referred to as ‘a crash’. Onset may be delayed and it can be long lasting.
Given it is a worsening of symptoms, a change from baseline, and given the measure is of frequency rather than impact on your life, it is pretty much impossible to score a '3' (all of the time) for this. It's an (almost) fundamental impossibility to have a symptom of periodic worsening and to be able to answer to the question
'Over the last month how often have you experienced this symptom' - 'All of the time'. If you had PEM all of the time, then you had no periodic worsening, so therefore you didn't have PEM....

Most people won't be able to say that they had PEM most of the time; most people are reducing activity so as not to be in PEM most of the time, if not consciously then because having PEM results in them staying in bed for a while and then recovering. Most people with ME/CFS will probably answer 'some of the time' and score a '1' for the symptom that is core to ME/CFS and undoubtedly one of the most debilitating aspects of the illness. The way it is worded, it is very clear that you have to have experienced the symptom 'all the time', not just suffered from the knowledge that PEM could hit 'all the time'.

Because PEM is included in fatigue (because it is a consequence of exertion?), it means it is assessed by that frequency measure rather than the severity measure (i.e. how troublesome has this symptom been?). If it was assessed by a severity measure, then most people with ME/CFS would rate it a 3, because knowing you get clobbered most times you exert is very troublesome, as is having to spend days in bed feeling very ill.

There is no weighting of the fatigue scores in the total score. So that '1' from PEM contributes exactly as much to the total measure of ME/CFS symptoms as a mild problem with falling asleep when you want to, or a mild problem with excessive farting.

Contrast that with the sleep section: 6 questions about sleep - six opportunities to rate each symptom with a 3 (very severe). Even if you only rated each one a '2', that is 12 points. The survey seems set up to reward inputs from the clinic around sleep. If they manage to convince someone that they should not nap in the afternoon, that counts as an improvement, although it may mean the person is unable to prepare their evening meal.
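To make the arithmetic concrete, here's a toy sketch of how an unweighted sum lets the sleep section dwarf PEM. The item names and values are hypothetical, not the actual TIMES items or scoring; this just illustrates the point that six moderate sleep ratings contribute twelve times as much to the total as a PEM rating capped at '1' by the frequency wording.

```python
# Hypothetical illustration of an unweighted total score.
# Item names and values are made up for the example; the real
# TIMES item set and scoring rules may differ.

# PEM: core symptom, but the frequency wording caps most
# respondents at a '1' ('some of the time').
pem_only = {"PEM": 1}

# Six separate sleep items, each rated a moderate '2'.
sleep_items = {f"sleep_{i}": 2 for i in range(1, 7)}

def total_score(items: dict[str, int]) -> int:
    """Simple unweighted sum of item scores (0-3 each)."""
    return sum(items.values())

print(total_score(pem_only))     # 1
print(total_score(sleep_items))  # 12
```

With no weighting, the sleep section alone can move the total by a dozen points while the core symptom of the illness moves it by one.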

This really is rubbish. All that fancy talk about validation, and they can't even properly recognise the symptoms that really count.
 
The survey seems set up to reward inputs from the clinic around sleep. If they manage to convince someone that they should not nap in the afternoon, that counts as an improvement, although it may mean the person is unable to prepare their evening meal.
Yes, the BACME-style clinics are obsessed with sleep management, completely out of proportion to the evidence. No amount of 'sleep hygiene' or 'sleep management' fixes the sleep problems caused by PEM or a crash (which they conflate but many of us distinguish).

The more I look at their way of trying to rate and score things the more it seems clear it sets up some very perverse incentives for both patient and clinic/clinician.
 
Bit perturbed about this too.



I'd be intrigued by the context, and I'm not a fan of the wording of 'wired but tired'. But these are some of the key features of PEM: 'wired but tired' and bladder symptoms.

I'm not sure about the alcohol intolerance one, and I can certainly see how, in the hands of those who are going to use it relevantly (i.e. as a clue for scientific work into mechanisms for biomedical cures), it is a bit of a strange one to fit in.


Ah... here we are:



So 58 items.

I somehow feel it will be a compromise of all types where it isn't specific enough to be useful whilst also being too long to be fair or doable.

I also worry that this is more about PPS 'persistent symptoms' and 'counting how many' and not 'studying the impact' or 'looking for patterns'.

Particularly given what I have read above about melding mild and moderate: we're talking not just about ceiling/floor effects, but about there not even being a possibility of differentiating in the middle categories either?
I’m no scientist or researcher or statistician etc

Isn't "it wasn't giving a clear signal, so we changed/took out *symptoms* and *frequency*" a bit "tail wagging the dog"? I.e. you're ditching the measurement of an illness symptom only because the way you're measuring it isn't saying what you want?

I still want someone to start a campaign about how the NHS is rolling out untested* treatments based on baseless ideas and symptoms.

*for the benefit of Sarah Tyson, I know she likes to obfuscate, and I know she's tested the validity of the questioning. I'm talking about testing whether this helps or harms a patient with ME.
 
Do we know what the response is from elsewhere in the community? I'm not on Facebook so can't see if the MEA have posted about it there, but I know there's often discussion in the comments to items there.

It looks like a revised version of the pre-print has been posted which responds to one of the issues raised here. They've added:
Funding: This work was funded by the ME Association, UK
Disclosure Statement: The authors report no conflicts of interest but note that Mr Fleming is employed by the ME Association as the head of project development
https://www.medrxiv.org/content/10.64898/2026.02.16.26346394v2

I like that you can report no conflicts while reporting a conflict. They're hardly the only ones to do this, but still.
 
Some more thoughts, apologies again for repetition of previous points made.

The MEA funded the study, their employee and head of project development co-authored the paper, and all recruitment was done through the MEA's website and social media channels, which biases participation and frames any involvement in the way the study authors want. They talk up patient and clinician involvement, so I wonder whether there was any dissent, or whether it was groupthink with everyone saying 'this is great'.

I wish I understood more about the analysis methods used, but I do not. I can't help but think this may flatten the data and not distinguish well between a range of severities or fluctuations. The whole approach seems to be about forcing people and the data into a model rather than finding what works best. Overall, I can't see how any method can make the application of that method, all the other decisions made along the way, or the output of any study 'valid' in the way they present.

“Rasch analysis also provided empirical evidence that a four-item response option was more effective than five”
Okay, maybe, but have you considered that a linear point based scale of this type doesn’t reflect patient experience or symptom fluctuation at all?

Do we know the questionnaire works well on a range of severities, is there any breakdown of performance across severities? Do we know it works consistently over time when reapplied to people or not? Do we know if it can capture changes to symptoms following interventions at all? Has any of this been validated or is it theoretical?

Some of these apply to all questionnaires like this, especially as we have no interventions to test! And this may be good in some ways, maybe this approach is useful. But I do question the way they have presented things.
 
The mathematical/statistical methods used assume the data is clinically meaningful. No fancy analysis can make a silk purse out of a sow's ear.

If the types of questions asked and response options offered are not able to give a clear picture of a person's most troublesome symptoms and what they need a doctor to help with, and are not able to show change over time, then no amount of fancy stats will fix that.

I think the fundamental problems with the design of all the questionnaires created in this project are so bad that the end result is an earthquake scale disaster.

The fact that there is no doctor, let alone a specialist doctor who understands ME/CFS, leading this symptom survey makes it a car crash.

Of course they got positive feedback, people like to be able to tell someone vaguely clinical about their symptoms, especially when we have been so neglected and abused. That doesn't make the questionnaire useful or valid.

Sorry to keep going on about this, but I'm so angry with the MEA for funding this and continuing to promote it in the NHS, with the clear aim of propping up BACME clinics.

This particular paper is just the statistical icing on a rotten cake.
 
Psychometrics. Isn't that a contradiction in terms? All the money and people "power", for what?
Someone in the US needs a humongous amount of praise; as @Hutan pointed out, so do the authors: in case others won't give it, they praise themselves, abundantly.

Do they understand ME/CFS now? Only one question about PEM? Too difficult?

I once saw Prof. Fluge apologise to his learned colleagues at a conference, in an adorable way, because he had to rely on self-reported outcomes.
Researchers don't like self-reported outcomes.
Psychologists and others make a living out of them: questionnaire after questionnaire, all self-reported. Are they helpful?
Not one bit. Have all those questionnaires helped even one patient?
 
As I mentioned upthread, it is possible that this is a reasonable thing to do. The authors report finding that frequency and severity assessments tracked together, so they didn't get much extra information from asking about both.
Seems to me that asking about severity implicitly includes frequency by definition, as a severe symptom that happens for 10 minutes once per week, low frequency, would also be judged as low severity simply because it's not especially impactful. Hell, symptoms like this usually get entirely forgotten when it comes time to mention them.

But that's just one of way too many flaws, not much worth bothering with.
 
It was disappointing to see so much hard work wasted in those discussions and to have good faith feedback rejected in the way it was.
I’m just surprised they haven’t even mentioned one of the more modern and well known and regarded questionnaires in the field. It seems a striking omission.
I guess it's because it's not really used by anyone, it's pretty much absent from the literature as a result. Health care practices are too often a popularity contest, about how many people use them, rather than whether they are any good at all. CFQ is one of the most used ones, and it's pretty much the worst. But simply because it's been used a lot, it justifies using it some more. Failure inspires failure, because lazy failure is more popular than doing real work.

So we get the problematic loop of bad practices being overused, making them more cited, and thus get even more used, while better ones are entirely ignored, mainly because they don't validate the popular practices.

"Y using this because X is using it."
"And X is using this because Z is using it."
"Z learned it from X, because they saw them use it a lot."
Evidence-based medicine!
 
After some elaboration about that, more praises are sung.
A further strength is the thoroughness with which the Rasch analysis was completed. It is unusual to see the complete cycle of scale development, analysis, empirical modification, and psychometric validation of the modified scale in a new cohort, and the conversion tables to enable parametric analyses are rarely presented.
Seems to me like they thoroughly exposed how performative those methodologies are. Methodology before results gets you bad results in good academic standing. It might be entirely useless, even counterproductive, but all the correct paint colors have been painted within the pre-defined outlines. What a mess.
 