Preprint Development and psychometric evaluation of The Index of Myalgic Encephalomyelitis Symptoms TIMES Part I…, 2026, Horton, Tyson, Fleming, Gladwell

SNT Gatchaman

Development and psychometric evaluation of The Index of Myalgic Encephalomyelitis Symptoms TIMES Part I: Rasch Analysis and Content Validity
Mike C Horton; Sarah F Tyson; Russell Fleming; Peter Gladwell

OBJECTIVE
To develop and psychometrically evaluate an assessment of symptoms in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS)

METHODS
An initial symptom list was devised from the relevant literature with the patient and clinician advisory groups. An online survey with 85 symptom items in eight domains was completed by people with ME/CFS. Each item had two response structures (assessing symptom frequency and severity on five-point scales). Rasch analysis assessed each domain for unidimensionality, targeting, internal reliability, item fit and local dependency.

RESULTS
Survey data (n=721) indicated various item anomalies and inter-item dependencies, leading to item re-formatting or removal. The frequency and severity-based responses broadly replicated each other, and a four-point response format appeared more appropriate than a five-point response format. Following Rasch-based scale amendments, a revised version with a single four-point response format was re-administered to test the modifications. Validation data (n=354) showed the modified scale had an improved response structure and functionality across all domains, satisfying Rasch model assumptions. Additionally, domain-level super-items allowed for a summated total score along with sub-scales summarising neurological and autonomic symptoms, again satisfying Rasch model assumptions.

CONCLUSIONS
The Index of ME Symptoms (TIMES) and its associated sub-scales and domain scales are stable, valid assessments of symptoms in ME/CFS.

Web | DOI | PDF | Preprint: MedRxiv | Open Access
 
I have only read the introduction so far. There is little or no indication as to what this 'index' is for, except 'clinical assessment', whatever that may mean.

The background refers to diagnostic criteria and the De Paul questionnaire and talks mostly of 'concept validity' and consistency in 'psychometric' terms. I cannot see what any of this is supposed to achieve.

In what way does it provide more useful information than "how have you been getting on since we last met?"
 
Stable, innit.
 
Asking about symptoms like this has valid uses in research, but I have no idea what possible usefulness it has as a clinical tool. It's hard to think of who or what this is for. Much better studies have been done, and the best ones are still the first done by the Body Politic community, and they did it before the medical profession messed it all up, refusing to lead in any way.

I don't know how many people here are familiar with the weird lingo around 'looksmaxing'. I'm not; it's more of a meme, especially among influencers: the idea of optimizing and fine-tuning your appearance and making it the foundation of your identity. Basically obsessively going to the gym, doing vibes-based cosmetic surgery and adopting a "f**k everyone" attitude, or something like that. It seems to mostly be about being narcissistic jerks who think empathy is weakness.

This reminds me of that. Not just this, the whole biopsychosocial evidence-based medicine stuff. Which this is part of. This can be used to tweak and optimize some fake metric, touting fake success where nothing done actually matters.

Everything is about method. Bad method, but method still. Outcomes? Pfff. No one cares about outcomes. A century from now, if technology stopped progressing, the same junk would be repeated with the same total lack of progress. At this point, an AI apocalypse sounds like a major improvement. When you look at how the problem is discussed today, and how it's identical in every single way to how it was discussed 50 years ago, it's just so demoralizing. Millions of lives and no one gives a damn, the same failure on repeat. What a dumb waste.
 
Russell Fleming has been part of the research team from the start, presumably based on his employment by the MEA who funded the project. Since he's not a clinician, his place is presumably as a pwME.
 
I've literally only just opened it and am scanning for the number of questions they ended up with, and the first para I've read is this (I've split it into two paras rather than one for readability):

Alongside the selection of a single response format, it was apparent that for most items, the response structure wasn’t working as intended, and disordered response thresholds were observed. This was consistently due to the non-emergence of the second response category across both the frequency-based and severity-based formats.

A generic post-hoc rescoring was therefore implemented across all items, where the second and third response options were merged. Thus, the response ‘Occasionally (now and then)’ was merged with ‘Regularly (about half the time)’, and ‘Mild symptoms (able to carry on with activities)’ was merged with ‘Moderate symptoms (Interfering with some activities)’. This vastly reduced the number of items displaying disordered thresholds.
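Just to spell out what that generic rescoring does: in code terms it is a simple recode merging the second and third response options. A sketch only; the 0-4 numeric coding is my assumption, not stated in the preprint.

```python
# Sketch of the post-hoc rescoring described in the quoted paragraph:
# a five-point response (coded 0-4 here, my assumption) is collapsed
# to four points by merging the second and third options, e.g.
# 'Occasionally (now and then)' with 'Regularly (about half the time)'.

RECODE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}

def rescore(responses):
    """Collapse five-point item responses to the merged four-point format."""
    return [RECODE[r] for r in responses]

print(rescore([0, 1, 2, 3, 4]))  # -> [0, 1, 1, 2, 3]
```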

Oh come on!!

What kind of validity, I mean external validity, the proper kind, can there be when we read that second paragraph?

Particularly if its use is looking at whether people are mild, moderate or severe, or are improving or getting worse, or are struggling; or if it gets used for assessments; or if it is used in research to see whether a 'treatment' helps.

When they have had to merge 'occasionally (now and then)' with 'regularly (about half the time)', which is pretty significant with regard to employment and being able to look after yourself. Not being able to eat or drink or walk or talk 'now and then' is a pretty different issue from 'half the time', for example as regards consequences, support and how serious things are. If you can't do a task about half the time, that is surely significantly different from 'now and then', whatever is being measured.

And they have had to merge 'mild symptoms', meaning 'able to carry on with activities', with 'moderate', meaning 'interfering with some activities'. Which is like merging 'can' with 'can't'.

And why, at the point they realised this was needed because of whatever 'disordered thresholds' are, did no one think 'maybe we have the wrong concept here', and that the issue is that we are measuring the wrong things, given that, you know, the main feature is PEM. And it isn't as if they made the instructions or descriptors unambiguous even on that.

And such a flag indicated a need to redraw the questions accordingly. They should have run a better, properly exploratory piece of research first, so they understood how clear they would need to be about certain things; otherwise they will get variability representing the numerous different ways a single question wording could be interpreted.
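For what it's worth, 'disordered thresholds' in Rasch terms means the category thresholds don't increase in order, so some middle response option is never the single most likely answer at any severity level. A rough sketch with made-up threshold values (not the paper's estimates):

```python
import math

# Rasch rating-scale category probabilities: with reversed (disordered)
# thresholds, the middle category is never the most probable response
# at ANY person location theta. Threshold values are illustrative only.

def category_probs(theta, taus):
    """Category probabilities for person location theta, thresholds taus."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - tau))
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

disordered = [1.0, -1.0]  # tau_1 > tau_2: thresholds reversed
modal_categories = set()
for step in range(-40, 41):
    theta = step / 10
    probs = category_probs(theta, disordered)
    modal_categories.add(probs.index(max(probs)))

print(sorted(modal_categories))  # -> [0, 2]; category 1 is never modal
```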
 
Bit perturbed about this too:

During analysis of TIMES2 (Figure 1), no further post-hoc modelling adjustments were made to the item set, but three further items were removed due to excessive misfit: The ‘wired but tired’ item was removed from the Cranial Nerves domain; the ‘bladder’ item was removed from the Gastrointestinal domain; and the ‘alcohol intolerance’ item was removed from the Immune System domain. All other items displayed Rasch model fit within their domains, although some minor anomalies remained.

I'd be intrigued by the context and am not a fan of the wording of wired but tired. But these are some of the key features of PEM - wired but tired and bladder.

I'm not sure about the alcohol intolerance one - and can certainly see how in the hands of those who are going to use it relevantly (ie it being a clue for something scientific looking into mechanisms for biomedical cures) it is a bit of a strange one to fit in.


Ah... here we are:

The final TIMES2 item set consists of 58 items across the nine previously stated domains (fatigue, cognition, pain, motor-sensory, sleep, cranial nerves, cardio-respiratory, gastrointestinal, and immune system) and the Rasch model fit and psychometric properties of the domains are summarised in Table 3.

So 58 items.

I somehow feel it will be a compromise of all kinds, where it isn't specific enough to be useful whilst also being too long to be fair or doable.

I also worry that this is more about PPS 'persistent symptoms' and 'counting how many' and not 'studying the impact' or 'looking for patterns'.

Particularly given what I have read above about melding mild and moderate: so we're talking not just about ceiling/floor effects, but about there not even being a possibility of differentiating between the middle categories either.
 
In the end, after the discussion section:

Limitations: The strengths of this study lie in the large, representative sample, the robust coproduction with people with ME/CFS and clinicians working in NHS specialist ME/CFS services.

The number of people with severe/very severe ME/CFS recruited is also a strength. It is estimated that 25% of people with ME/CFS are severely or very severely affected [5]. This is reflected in the TIMES cohorts and is higher than other large studies using similar recruitment methods, such as DecodeME [31].

We think the accommodations offered to maximise accessibility enabled a relatively high proportion of severely affected people to participate. These included alternative ways to complete the surveys (paper, phone, video or proxy); a survey tool that automatically saved responses so it could be completed in phases; using a dyslexia friendly style guide to maximise accessibility, and encouraging discussion with researchers and (potential) participants about their needs and further accommodations.

Firstly, this feels like the cliché of being asked the interview question about your weaknesses, where some people try to turn it into a way to crowbar in a strength.

But secondly, this claim is not consistent with what I'm sure was on Facebook at the time, admitting that this whole project had an issue with not representing severe people, and that they'd have to 'do a different one for severe'.

And it is inconsistent with the feedback they directly received, which I believe led to one of the authors making quite rude, snarky responses along the lines of 'a questionnaire can't do harm', implying that feedback along these lines would be ridiculed or perceived as ridiculous. (Thereby inhibiting feedback by coercion; sorry, but I think that's fair to say. If an author can publicly respond like that to feedback that is fair and likely to be common, putting others off daring to agree with it is a consequence.) At the time, that exhibited to me a level of concern about not causing harm that is inappropriate for anyone working in this area, but hey.


It is pretty concerning that this lot and this methodology have the gall to try and write something suggesting that what they put pwME through for this is OK for any pwME, never mind the severe. But this seems to be implying it is some sort of paragon to be followed in how to treat severe and very severe patients in research!!!

The gall of citing 'like DecodeME', who actually did design their methodology to be inclusive, were very sensitive to, e.g., how much time it would take someone just to post a swab back to them, provided videos etc., and were quite the opposite in attitude to this. My jaw drops: is this delusion or just callous cheek?
 
Russell Fleming has been part of the research team from the start, presumably based on his employment by the MEA who funded the project. Since he's not a clinician, his place is presumably as a pwME.

But funders do not normally get their names on what they fund. Funders do not fund their own people. There is a conflict of interest surely?
 
I don't understand the point of all this elaborate analysis and the creation of a scoring system. Surely the overall score will depend too much on the choice of which symptoms to list, for example if they were to list 6 separate sleep symptoms and only one OI symptom, then those with mildly disturbed sleep of various sorts will seem much sicker than those with OI that is totally disabling.
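The worry above is easy to make concrete with toy numbers. The item names and scores below are entirely hypothetical, assuming each item is scored 0-3 on the final four-point format:

```python
# Toy illustration of how a summed total depends on the item list:
# six hypothetical sleep items all scored 'mild' (1) outscore a single
# orthostatic intolerance item scored at its maximum (3).

mild_sleep = {f"sleep_{i}": 1 for i in range(1, 7)}  # six items, all mild
severe_oi = {"orthostatic_intolerance": 3}           # one item, maximal

print(sum(mild_sleep.values()))  # -> 6
print(sum(severe_oi.values()))   # -> 3
```

So, on a plain sum, the mildly sleep-disturbed respondent looks twice as sick as the totally disabled one; the total reflects the composition of the item set as much as the person.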

I note that the feedback they quote from a patient is them being pleased that their symptoms are recognised. That can be done with a simple list of symptoms, or a blank page to write your symptoms on, or a doctor who takes the time to ask about the troublesome symptoms. Of course they had positive feedback, because it gives the illusion that someone is going to do something about the symptoms.

The questionnaire was designed by and for therapists to use in BACME clinics. A therapist cannot do anything about symptoms; that's the doctor's role. So why do they need this data?
 