Patient-led measure of outcomes

Why do you think it would only be suitable as a short-term approach?
I think the approach of each patient picking 3 activities they currently can't do and want to be able to do may be OK for a trial of a treatment. With the follow-up running for, say, a year, it might give a clear indication of whether the treatment has had a clinically significant effect. Also, for pwME, the effects of activities are cumulative and PEM is delayed, so that needs to be factored in.

I think for something like a long-term study of pwME's fluctuations over time, that would not be useful. It doesn't take worsening into account and, depending on the activities picked, may turn out to be too limited, with further improvements or worsening not registering. Something like FUNCAP, which covers a wider range of activities across all severity levels, would be better for that.
 
I'm of the opinion that certain test activities, e.g. a shower, are going to be harder ones for the BPS to cheat. (As well as frequency, that would need to include time, an activity monitor, and also things that might change, like whether people get a seat etc.) Because it's the one thing I could never cheat, other than by picking the only window over x time period when my body wouldn't faint and loading up on meds, caffeine or whatever I used at that level of severity to help adrenaline-up if needed for it. My hair length makes a difference now because I have to brush it before, sometimes just before, and sometimes I pace that the day before. But hey, that hasn't happened by accident either.

I also know that as I've got worse over the years I've been in denial, particularly whilst I was working, putting on my best performance for work as the priority. But if someone looked closely, first the grocery shops went online, then it was working from home, then showering at different times and frequency. There were also big things like having to be driven places, getting a stairlift, how much I can even use my downstairs, and my ability to sit in a chair or in different types of chair.

Well... it all seems pretty significant stuff when you look at it over those longer spaces of time, and that's with me not really being surrounded by anyone being much different, or kinder, or more accepting, and without being a different personality myself.

So sometimes when I look at that I wonder whether I'm 'overthinking it' by treating the forensic detail of what my calendar has been like, and how punishing it has been in the preceding weeks and months, as so pertinent. But it's true. I'm having an awful week because of a load that finished 3 weeks ago (which of course those near me still hint might be a bug I caught, even decades in, to avoid 'getting it' about ME/CFS timescales). And I'm worse than last week, even though I've had a week more 'rest' since that hectic time. Even if you were using an app I'm not sure it would make sense of that, or have picked up the differences from last week to this in how awful I feel and how debilitated I am.

So it has reminded me that even the 'home experiment' format will be affected by that: one where you could trust people to put aside e.g. a week without other things on, test whether they could get a shower or teeth-brushing in, and compare it to the year before. And I don't know how much of that will come out in the wash over matched pairs, where the threshold vs commitment balance of the week a year ago was comparatively different from today's, or whether for most of us it will tend to just go the one way of the tightening ratchet.

I know that things like the teaching survey format wouldn't work. (They give each person a date in the calendar and they have to fill in exactly what % of time they spent on tasks related to teaching, research and admin, with the idea that across all the people the dates will 'even out' because some will be in term time, some in exams, some out of term etc.) I.e. it has to be the individual compared to themself, not relying on the size of the sample to even things out over a population. That's for a multitude of reasons (including that some things might work for certain types).
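To make the 'individual compared to themself' point a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the person labels, the numbers, and the choice of 'days in a test week on which a shower was managed' as the measure are all made up. It only shows the shape of a within-person comparison, where each person's change is worked out against their own earlier week before anything is counted up across people.

```python
# Hypothetical data: days in a set-aside test week on which each person
# managed a shower, a year ago and now. All names and values are made up.
test_weeks = {
    "person_a": {"year_ago": 4, "now": 2},
    "person_b": {"year_ago": 1, "now": 1},
    "person_c": {"year_ago": 3, "now": 5},
    "person_d": {"year_ago": 2, "now": 0},
}


def within_person_changes(data):
    """Return each person's own change (now minus a year ago)."""
    return {person: weeks["now"] - weeks["year_ago"]
            for person, weeks in data.items()}


def direction_counts(changes):
    """Count how many people got worse, stayed the same, or got better."""
    counts = {"worse": 0, "same": 0, "better": 0}
    for delta in changes.values():
        if delta < 0:
            counts["worse"] += 1
        elif delta > 0:
            counts["better"] += 1
        else:
            counts["same"] += 1
    return counts


changes = within_person_changes(test_weeks)
print(changes)
print(direction_counts(changes))
```

This does nothing about the problem of the two weeks having a different load in the run-up to them; it only keeps the comparison inside each person rather than averaging over a population.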

I keep on thinking about it because, as a saddo, this fascinates me professionally, thanks to my background, as a conundrum of how to get a research method to tackle something, as well as personally. Hence the questions I ask myself. And then checking I'm not doing overkill or putting in excess that doesn't matter, i.e. remembering what we are looking at 'differentiating on' and what does or doesn't matter. Those different angles. And not getting caught in the trap of just using what's available as a measure vs picking something there isn't even a good proxy for.

I do think that somehow having better 'frames of reference' for those measuring points in time is the clever bit, and it needs something far better than a survey question, certainly of the standard format (it would need to be more of an experiment type). I wonder whether seeing a video of myself a year ago would help, for example.

It depends who we are needing to prove it to as well I guess. I think we can sort of get there but it might need to be specific things for specific things.

But yes, if you were monitoring a whole population over very long periods of time, then things like the shower and 5 items. And FUNCAP, even in the basic form: I'm surprised how at first you think you are teetering over whether 'can't do at all' or '3 days impact' vs 'can't do on same day' makes sense, but then as the very different grades of items come through you realise how much less that matters in the big picture, because I'm so disabled that half the questions later on are just dreamland stuff. So I can see how it separates pretty well even with that. Then there is how much of a change, on how many of the features, we are talking about. Given it should be long-term, and it should be 'enough', I guess, on at least something.
 
Wouldn’t free text answers also be impacted by mood, etc?
I'd warn against free-text type stuff. The iller I am, the worse my ability to find words (anomia) will be, and the more I will just want/need to lie down and not be able to do it at all. I might be more likely to underdescribe, or say the equivalent of the in-person 'I'm fine + small talk', when I'm so ill I'm having to dictate what would be 'top of head' stuff, than when I'm less ill but can at least access my words and the 'meta' how-do-I-feel part (which won't happen until I get a better moment when I'm most ill).

There is also the ambiguity of different people meaning very different things by the same words, if we ever needed to compare one person with another. And misinterpretation.
 
Thanks again to @Utsikt for giving me the link to Jo's paper. I've copied the relevant bits out below, reformatted for ease of reading.

***

On entry, a set of criteria was laid down for each patient on the basis of their clinical state at entry, indicating what would be considered 'ideal' improvement, 'useful' improvement, no change, and deterioration.

  • Ideal improvement was intended to indicate the best possible outcome which might be expected in the face of any irreversible problems such as joint deformity or chronic uraemia, sustained for at least three months.
  • Useful improvement was intended to indicate an improvement short of ideal which justified the cost, inconvenience, and potential hazard of high dose steroid infusion, and which was sustained for at least three months.
  • A static state was intended to indicate the absence of either useful improvement or significant deterioration, assessed at three months, or earlier if withdrawn for alternative dosage treatment.
  • Significant deterioration was intended to imply that clinical problems had worsened or that new problems had developed which were of greater importance than any coexisting improvement. The appearance of renal disease in the face of improved arthralgia would be considered deterioration, but the development of arthralgia in the face of significant improvement in renal function would not.

Assessment was made at three months, or earlier if withdrawn for alternative dosage treatment. The criteria for outcome were different for each patient, based on the problems of relevance to that individual.

Criteria were often quite complex, being derived from a range of baseline clinical and laboratory data. An example of a set of criteria is given in Table 2.

Table 2 Criteria for patient 10 (first arm)

Clinical features on entry: pyrexia, leucopenia, anaemia, pleural effusion, pleuritic pain, and proteinuria.
  • Deterioration = Death, or cerebral disease, or proteinuria more than 8 g/day, or increase in pleural effusion on radiography, or neutrophil count below 10^9/l.
  • Static = Neither 'deterioration' nor 'useful improvement'.
  • Useful improvement = At least two of the following at three months: no pyrexia, proteinuria less than 1 g/day, 80% resolution of pleural effusion on radiography, neutrophil count above 2.5x10^9/l on two separate occasions.
  • Ideal = No features of SLE, and specifically, all criteria for useful improvement fulfilled and haemoglobin greater than 110 g/l at three months without transfusion.
[...]

In trying to answer a question of outcome, the power of the statistical analysis is reduced in proportion to the number of analyses made which relate to the question. When dealing with very small numbers the only option is to use one outcome measure.

A point scoring system using a range of clinical data which provided a sensitive reflection of each patient's problems would have to be based on an unmanageable set of rules involving many inter-related contingencies, involving time relationships and subtle grades of severity. As an alternative we used a system of individualised criteria.

This is equally valid statistically and can be much more closely tailored to events of importance to each patient. It suffers from the disadvantage that one physician's assessment of important outcome events may differ from another's.

It became clear during the trial, however, that the two or three physicians drawing up each set of criteria agreed very closely on what constituted ideal improvement and useful improvement as originally defined.

The study shows that it is feasible to conduct double blind trials using individualised outcome criteria. Even with the use of individualised outcome criteria the power of the statistical analysis is weak because of small numbers.

The study may not have detected a modest difference in effect between the two dosages, demonstrating the almost insuperable problems of studying uncommon heterogeneous disease. Nevertheless, we consider that individualisation of outcome criteria goes part of the way to solving this problem and can be a very valuable technique.
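To show how a set of individualised criteria like the patient 10 example in Table 2 boils down to one outcome per patient, here is a rough sketch in Python. It is not from the paper. The field names and the `classify` function are made up for illustration, the thresholds are my reading of the garbled figures in the copied table (in particular the neutrophil counts), and the order of the checks (deterioration first, then ideal, then useful, otherwise static) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ThreeMonthStatus:
    """Hypothetical record of one patient's state at the three-month assessment."""
    died: bool
    cerebral_disease: bool
    proteinuria_g_per_day: float
    effusion_increased: bool
    effusion_resolution_pct: float
    neutrophils_e9_per_l: float       # latest count, in units of 10^9/l
    neutrophils_above_2_5_twice: bool  # above 2.5 x 10^9/l on two occasions
    pyrexia: bool
    haemoglobin_g_per_l: float
    transfused: bool
    other_sle_features: bool


def classify(s: ThreeMonthStatus) -> str:
    """Map one patient's status onto a single outcome category."""
    # Any one of these counts as deterioration (thresholds are assumptions).
    if (s.died or s.cerebral_disease or s.proteinuria_g_per_day > 8
            or s.effusion_increased or s.neutrophils_e9_per_l < 1.0):
        return "deterioration"

    useful_criteria = [
        not s.pyrexia,
        s.proteinuria_g_per_day < 1,
        s.effusion_resolution_pct >= 80,
        s.neutrophils_above_2_5_twice,
    ]
    # Ideal: every 'useful' criterion met, plus haemoglobin and no SLE features.
    if (all(useful_criteria) and s.haemoglobin_g_per_l > 110
            and not s.transfused and not s.other_sle_features):
        return "ideal"
    # Useful improvement: at least two of the four criteria.
    if sum(useful_criteria) >= 2:
        return "useful improvement"
    return "static"


example = ThreeMonthStatus(
    died=False, cerebral_disease=False, proteinuria_g_per_day=0.5,
    effusion_increased=False, effusion_resolution_pct=90,
    neutrophils_e9_per_l=2.0, neutrophils_above_2_5_twice=False,
    pyrexia=True, haemoglobin_g_per_l=100, transfused=False,
    other_sle_features=True,
)
print(classify(example))
```

The point is just that however complicated the rules are for each patient, the trial ends up with one outcome measure per person, which is what the paper says is needed when numbers are very small.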
 
Five seems like quite a lot - would three be good enough?

I think it should just be 'did you do them?' That cuts out the interpretation. I think any study still needs to be long enough to pick up deterioration due to overexertion, so that covers part of the negative knock-ons. I think the measure needs to be completed daily. That way there is an aspect of frequency, which also covers part of the negative knock-ons.

I like the idea of picking activities from Funcap.
There aren't many measures I can think of that I'd complete daily if I was in a crash. Today on the app, lying flat on my back (admittedly looking at my phone for much of it, but TV mostly off etc), I've been in 'activity' all day. Until today I've found it interesting where the switch-point is in the angle I'm sitting up at that makes it rest vs activity; lying on my back put me in rest mostly. So clearly today something is different. And this isn't the worst day I've had this week. So I like the idea of measuring over a month, and of something you mightn't do every day.



There are ones like toothbrushing where that is my aim, so if it doesn't happen it says something. Particularly over a month, if I could pass a month even when I had silly amounts of commitment vs threshold. Because the odd difference will iron out.

And I have one app recording where I wasn't sat down the whole time, and the intensity is so hugely different vs the rest of them. Plus, even with sitting down it's enough of an exertion that it always registers as a time peak. Trouble is that when I'm least unwell I'll be more efficient, and when I'm most unwell I'll be least efficient and give up sooner than when I'm middle unwell. So it doesn't capture everything.


Vs the ones that are daily, like drinking or toilet or moving in bed, which are impacted by other symptoms as well as being indicators of how well I am. E.g. I need to wee a lot at certain stages of PEM, and drink more if I can. And the staggering, quality and method of doing these says more about the debility etc.


Lots of really useful and insightful feedback. Thank you everyone.

So maybe something like…

- Pick 3 activity descriptors that you feel best describe your current limitations and level of activity. You can choose from FUNCAP55 or write your own.

Try to pick a range which represents you best, with one you can usually do without much difficulty, one you can do less often, and one you’re rarely able to do

- Each day (or week) record how many times you do each activity (without significant difficulty?)

- Each day (or week) record if you consider it a good, average or bad day/week for you

In this way you will get both weekly and monthly totals of ‘activities’ and ‘good/average/bad’ periods

You can use a spreadsheet, a piece of paper, or whatever method works for you (I/we could provide some ideas and templates to use, copy or print)
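For anyone who would rather keep the log electronically, the steps above could be sketched roughly like this in Python. It is only an illustration under assumptions: the `DayEntry` record, the `totals` function and the three activity names are all made up, and a spreadsheet or paper would do the same job.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class DayEntry:
    day: date
    counts: dict  # activity name -> number of times done that day
    rating: str   # "good", "average" or "bad"


def totals(entries, period):
    """Add up activity counts and day ratings per 'week' or 'month'."""
    activity_totals = defaultdict(Counter)
    rating_totals = defaultdict(Counter)
    for entry in entries:
        if period == "week":
            iso = entry.day.isocalendar()  # (ISO year, ISO week, weekday)
            key = (iso[0], iso[1])
        else:
            key = (entry.day.year, entry.day.month)
        activity_totals[key].update(entry.counts)
        rating_totals[key][entry.rating] += 1
    return dict(activity_totals), dict(rating_totals)


# Made-up example entries using three hypothetical activities.
entries = [
    DayEntry(date(2024, 1, 1), {"brush teeth": 1, "shower": 0, "short walk": 0}, "average"),
    DayEntry(date(2024, 1, 2), {"brush teeth": 1, "shower": 1, "short walk": 0}, "good"),
    DayEntry(date(2024, 1, 8), {"brush teeth": 0, "shower": 0, "short walk": 0}, "bad"),
]

weekly_activities, weekly_ratings = totals(entries, "week")
monthly_activities, monthly_ratings = totals(entries, "month")
print(weekly_activities)
print(monthly_ratings)
```

Recording a plain count each day keeps it close to 'did you do them?', and the good/average/bad rating stays separate so the two can be looked at side by side.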
 
I'm not concentrating very well at the moment, so I hope I'm not just repeating someone else's idea that I've read and then forgotten. But, having just read Jo’s paper, I think that the approach it takes gets around a lot of our issues about only occasionally being able to do particular things, and struggling to imagine what would happen if we did certain things (a major problem with FUNCAP, IMO).

In his paper, the clinicians described each patient at entry, and for each of them, said what they would consider to be deterioration, what would constitute useful improvement, and what would be the best possible outcome, allowing for the permanent damage that that patient had from their illness.

As JemPD says, if you don’t have a reasonably stable baseline, this may not be a feasible approach. Maybe any clinical trials would need to be done only on stable patients.

But for people with reasonable stability, we could first define what our basic day looks like. It’s likely to be our comfortable maximum activity, because when we are so limited, I think we generally live up to whatever energy we’ve got.

So that baseline description might be something like, ‘Have to lie in bed for 10 hours a day, can just about prepare food, have a basic flannel wash, be on the computer for a couple of hours, can’t talk, difficult to walk from one room to another, zero house work.’

Then deterioration and useful improvement would need to be defined as a change from that, but I’m not sure what the best approach is to being specific about it. Someone living like that would consider losing any of those abilities to be deterioration, but how much loss would look meaningful in a trial?

Similarly, a useful improvement to a PwME existing at such a low level of function could be something as simple as, ‘Can talk for 10 minutes’ or ‘Can unload the dishwasher’. But if you were running a trial and wanted to demonstrate that your side-effect-laden drug was worth taking, wouldn’t you want something bigger? How would you determine that?

Any thoughts, @Jonathan Edwards? How did you determine this in your trial, for the patients? Or were there already well-established clinical criteria that you could use?
 
Maybe any clinical trials would need to be done only on stable patients.



I don't think we can have something that relies on that assumption to that level - I obviously don't mind if there is a much smaller caveat of 'those in the early days or in a particularly unusual situation'.

I think the issue is that no one can predict for sure that they will be stable. We are all one noisy neighbour/building work/virus/injury/new or no carer/boss change/family emergency/powercut/broken down car away from significant change, if our threshold then goes beneath what is constantly achievable/needed over time.

And it's only over the space of decades that I know that what someone might think is stable actually isn't, because of deterioration or slight differences seen in hindsight.

But further than that, anyone who says or thinks they are stable in order to get into a trial isn't necessarily going to be more stable than the one who doesn't get in because they take that assessment too seriously and have the more cautious/conservative knowledge.


I also have a huge problem with those who think they can control their PEM dominating the research, because it will exclude the more severe and not really test any treatment to the full extent. It would only test it on those who have the type and situation, which is pretty rare, of being able to stay out of PEM.

That will likely also come with all sorts of related sociodemographic backfires on representativeness and so on, which in today's healthcare climate will bite us in the bum.

It would also not be testing whether the treatment actually works by reducing the impact of activities that would induce PEM, if you don't have the people who are regularly in PEM to different levels.
 
I just measure the time I spend lying down (TSLD). All my functionings, including brain, are inversely proportional to the amount of time I spend lying down.

One advantage of TSLD is that you can compare it across patients. Questionnaires and VAS are subjective and therefore more difficult to compare. It also lets you compare longitudinally. I used to spend 7-8 hours lying down. Now my range is 1-4 with occasional 0 or 5. Everything is so much more difficult on 4-hour days.
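As a rough illustration of the longitudinal comparison, a few lines of Python are enough to summarise two periods of daily TSLD. The daily figures below are made up, loosely in the ranges described above, and the `summarise` helper is a hypothetical name, not part of any app.

```python
from statistics import mean


def summarise(hours):
    """Return the mean, minimum and maximum of a list of daily hours lying down."""
    return {"mean": mean(hours), "min": min(hours), "max": max(hours)}


# Made-up daily hours lying down for two periods.
earlier = [7, 8, 7.5, 8, 7]
recent = [2, 4, 1, 3, 0, 5, 2]

print("earlier:", summarise(earlier))
print("recent:", summarise(recent))
```

Because it is just hours per day, the same summary could be put side by side for different people as well as for the same person at different times.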
 