SF-36 - a discussion

DokaGirl · Jan 15, 2023

As @strategist has said, pwME may not have a realistic picture of their disability. What I've seen in others, and in myself, is that we often overestimate how we're doing.

I think comparing to others in the same age bracket could give an idea of function.

As well, comparing one's pre ME activities to current activities can provide an outline.

Before ME, we used to cross country ski most weekends. Fast forward a few years, and my age cohort was going skiing most weekends, but I could only dream about it.

Just one example of how much pwME lose, and how radical a contrast it is with healthy people, as well as one's own pre ME function.

Hoopoe · Jan 15, 2023

It's typical for me to feel relatively good hours or days before crashing. The feeling good is the emotional component of a "be active and get things done" state of mind. The increased brain activity is not tolerated for long.

It would be interesting to try and measure this phenomenon and observe the transition into an excited state and the crash.

alex3619 · Jan 15, 2023

I think SF-36 is generally fine as a tool but subject to over-interpretation. Comparing any two patients, or a cohort, is problematic given how people might interpret things differently. It can only be a rough guide. I think it has a better place over time for a single patient. Is this patient doing better or worse?

FMMM1 · Jan 15, 2023

Jonathan - "I keep wondering what it is we should really be trying to do with these questionnaires and whether they really provide useful documentation over and above 'are you better' or actually hide that real answer."
I'm a parent of someone who's ill, so I don't really know, but this seems reasonable.

Ravn · Jan 15, 2023

All the questionnaires currently used in ME suffer from the fundamental flaw that they ignore how PEM and pacing to avoid PEM work.

Questionnaires about function, like the SF-36, fail to account for the can/can't factor. That's about how your answers depend on your interpretation of 'can'. 'Can' walk up a flight of stairs can be interpreted as meaning 1) being able to get to the top of the stairs irrespective of any PEM later, or 2) being able to do it once but not repeatedly without PEM later, or 3) being able to do it as often as a healthy person without PEM later. The questionnaires were designed with interpretation 1 in mind but this isn't meaningful in ME (except in the most extreme cases where there is a change from or to very severe).

Questionnaires about symptom load, like the DSQ, fail to account for the did/didn't factor. That's about how your answers to how many symptoms you had how badly during a given period depend to a large degree on how well you were able to pace during that period.

As mentioned in my previous post, the function questionnaire under development in Norway is the only one (I'm aware of) properly addressing this problem.

Jonathan Edwards said:
I keep wondering what it is we should really be trying to do with these questionnaires and whether they really provide useful documentation over and above 'are you better' or actually hide that real answer.

I suspect a single question would be just as meaningful and accurate/inaccurate - and subject to wishful thinking - as any longer questionnaire. But I also suspect we're stuck with longer questionnaires because funders and researchers are so used to having them they simply insist on using them. So the best we can do is encourage use of the least bad questionnaires possible.

My one point item for assessing treatment effect would be trying to answer the question 'has your PEM threshold shifted?'. So something like 'overall, has the level of activity you can do - consistently and without getting payback later - decreased, increased or stayed much the same [compared to some previous time point]'?

Answer options should include 'not sure, maybe I can do a tiny bit more/less, it's hard to tell' to prompt people to at least reflect on their level of uncertainty. This won't eliminate the effect of wishful thinking but should help reduce it.

Kitty · Jan 15, 2023

Ravn said:
My one point item for assessing treatment effect would be trying to answer the question 'has your PEM threshold shifted?'. So something like 'overall, has the level of activity you can do - consistently and without getting payback later - decreased, increased or stayed much the same [compared to some previous time point]'?

Yes, that does get to the nub of it.

It also has the advantage of potentially making it easier to monitor patients for longer, as they're less likely to get bored by completing a long questionnaire. Given the natural ups and downs in function that some ME patients probably experience no matter how they manage their symptoms, a major trial of a drug treatment ideally ought to be followed up for at least two years.

If there were a regular follow-up with just one or two questions, I'd also be tempted to add a free-form text box for patients to note down a couple of thoughts about the current state of play if they want. Not to be included in the trial data, but to remind them of where their thinking was the last time they made an assessment.

It might help prompt a more thoughtful reflection. I've found that I tend to overestimate gains if I happen to have had a good week, but if I also read previous notes, it's sometimes clear that there has been little change, or things aren't actually as good as they were, or I was frankly deluded last time and on the verge of a crash. If you're contacting people every month or three months, the task of responding can quickly turn from a carefully considered exercise into a familiar little job that can be dashed off quite quickly, and anything that works against that is probably useful (if established good practice allows it, that is).

Sean · Jan 16, 2023

Ravn said:
My one point item for assessing treatment effect would be trying to answer the question 'has your PEM threshold shifted?'. So something like 'overall, has the level of activity you can do - consistently and without getting payback later - decreased, increased or stayed much the same [compared to some previous time point]'?

Not just the threshold, but also the degree.

I strongly suspect that the process that causes PEM is always present and active, just with varying degrees of sensitivity and effect, both between patients, and for each patient over time.

I would even suggest that PEM is ME, for all practical purposes. Or at least the primary expression of it.

Milo · Jan 16, 2023

Jonathan Edwards said:
What I may have taken for granted is that the issue is the value of SF36 as a trial endpoint.

I think SF-36 could be used generally to assess a cohort's level of illness and compare it to another cohort between studies. For instance, one cohort with a high level of physical functioning compared to another cohort that is much more severe could be used to assess the general level of disability of that particular cohort, and perhaps explain results between the 2 cohorts.

However, as for SF-36 as an end point following intervention, I am not sure it is sensitive enough or provides enough information. So perhaps the question is what end points would measure success or failure?

1) more functionality; Dr Bateman would say # hours of feet on the floor (upright activities); it could be # of steps
2) return to work or more working hours. Return to physical activities. Return to social activities; ability to talk, converse and think with no payback
3) Absence of PEM (CPET)

The thing with end points is that some people would dream to be able to wash their hair once a month, while it is not a problem for others. Others struggle having a 10-minute conversation on the phone, while others can easily talk for 2 hours without payback. We are very diverse in terms of baseline capacity and the measure of success will vary accordingly. Some would call success the capacity to just sit up in bed for 5 minutes.

Adrian · Jan 16, 2023

The SF36 is a terrible tool for measurement in my opinion (although better than some!).

If you look at the questions they do not appear independent or evenly spaced. So for example there is a question around the ability to walk a block (in the US version) and the ability to walk a mile. I would guess that the level of improvement to walk a block is not the same as to walk a mile.

Then there is the issue that questions are (in my view) not independent. For example, questions ask about the ability to walk and climb stairs. (depending on disability) these abilities are likely to be physically related (i.e. if your walking improves then your stair climbing ability improves) which leads to double counting in certain areas.

A long time ago I had a look at some of the ONS data for the SF36 physical function scale where they have individual question answers. What was clear is that at the edges a couple of questions represent low and high ability. But in the middle there was little order in the question answers suggesting the lack of a good physical proxy.

In effect I think it measures change in the middle of the scale as bigger than change at the edges. But it’s not clear how this is. It feels a bit like counting with a scale of one, few, many. I don’t think it is anywhere near good enough to give a consistent change across a group of patients. It certainly doesn’t have properties (linearity) allowing for mean differences to be quoted.

I could see some sort of analysis of change based on all features being used but I’m not sure how that would look.

Milo · Jan 16, 2023

Adrian said:
For example, questions ask about the ability to walk and climb stairs. (depending on disability) these abilities are likely to be physically related (i.e. if your walking improves then your stair climbing ability improves) which leads to double counting in certain areas.

I have to disagree with you. Walking on a flat surface, or walking downhill, requires much less energy than climbing stairs. The demand in energy for climbing stairs is greater than for walking on a flat surface. I learnt this very early on in my disease. I simply do not do stairs. I cannot walk a whole lot mind you, but my walking usually is through a grocery store (not a big box store). If I went up stairs, I would crash.

I have organized my life to avoid stairs and uphills. It means that when I am in my underground parking, I can walk down ramps. Then take an elevator to my apartment. To go back to my car, I go down 2 flights of stairs, avoiding walking up ramps.

Of course this is N=1 and every situation is different. If a pwME had to go up 2 flights of stairs just to reach their apartment, I have no doubt that they would become housebound very quickly because going up stairs is a significant energy demand. But I would suggest that if they lived in a no stairs environment they would be able to do a little more.

Hoopoe · Jan 16, 2023

The closest to a useful questionnaire would be something that attempts to work out how large the energy reserves of the patient are, and how well they hold up over several days/weeks at a constant level of daily activity.

I can do a lot of activities, so am technically only mildly limited in all of them, but have only a few hours every day where I can really do stuff (without causing excessive stress to the body). This changes everything: the illness that seems like a mild problem if only "ability to do" is looked at is suddenly a serious disability.

The "how well can you do x" questions seem suitable for problems like having lost a leg or knee problems. They're no good when the limitation is not so much in any particular activity but more in amount multiplied by intensity over a timeframe of a week.

My physical ability has actually improved in the last few years and for several months I've been able to walk for several kilometers, part of it uphill, without major crashes (although it's clearly stressing my body and causes mild PEM). But I am not able to do anything else requiring meaningful physical activity on that day, like cooking or cleaning, and will probably have to hold back a little on the next 1-2 days too. I can cook, clean, and walk, just not all of it on the same day.

Also sometimes my muscles get weak while walking upstairs and on other days there is no limitation at all.

FMMM1 · Jan 16, 2023

Jonathan Edwards said:
What about the idea of having a list of activities ('function measures') that you fill in each time in terms of grade of difficulty

This seems like a really good idea, but I think you'd need to collect some objective data (actimetry).

If you found that the approach you propose (above) works, then it would be useful for other studies - 2 trials for the price of 1!

CRG · Jan 16, 2023

No direct experience of it, only aware of it as a screening tool in research papers. My view on SF-36 can be reduced to three questions:

1. What is it for ? The 36 SF is derived from the SF 80 developed by Ware and colleagues, their 1993 Manual and Interpretation guide (pdf) runs to 316 pages, however the introductory paragraph makes it clear that the purpose of the 36 SF is to facilitate a measure of how treatment meets the patients' expectations and needs.

2. How is it being used ? In ME/CFS it's clear that SF-36 has been considered as a tool for the assessment of the health status of ME/CFS patients rather than simply the assessment of treatments provided to patients:

Comparison of Euroqol EQ-5D and SF-36 in patients with chronic fatigue syndrome

Abstract
"The objective of the study was to compare the Euroqol EQ-5D (Euroqol) and short-form 36 (SF-36) health questionnaires in patients with chronic fatigue syndrome (CFS). One hundred and twenty-seven out-patients referred to a hospital-based infectious disease clinic with a diagnosis of CFS were contacted by post and asked to complete both questionnaires. Additional data were determined from hospital casenotes. Eighty-five patients returned correctly completed questionnaires. Euroqol health values and visual analogue scale (VAS) scores were strongly and significantly correlated with all dimensions of the SF-36, with the exception of physical limitation of role. SF-36 dimensions were in turn strongly and significantly correlated with each other, with the same exception. Patients reported a high degree of physical disability and a moderate degree of emotional or psychological ill-health. The Euroqol elements dealing with mobility and self-care referred to inappropriately severe degrees of disability for these patients with CFS. Similarly some dimensions in the SF-36 were oversensitive and did not discriminate between patients with moderate or severe disability. It was concluded that Euroqol scores correlated strongly with SF-36 scores and provided useful information about patients with CFS and that Euroqol would be a useful tool for the rapid assessment of health status in CFS. The current Euroqol instrument refers to inappropriately severe degrees of disability for patients with CFS and would need to be modified to be maximally useful in this situation."

Functional status in patients with chronic fatigue syndrome, other fatiguing illnesses, and healthy individuals

Abstract
"Chronic fatigue syndrome (CFS) is a condition that may be associated with substantial disability. The Medical Outcomes Study Short-Form General Health Survey (SF-36) is an instrument that has been widely used in outpatient populations to determine functional status. Our objectives were to describe the usefulness of the SF-36 in CFS patients and to determine if subscale scores could distinguish patients with CFS from subjects with unexplained chronic fatigue (CF), major depression (MD), or acute infectious mononucleosis (AIM), and from healthy control subjects (HC). An additional goal was to ascertain if subscale scores correlated with the signs and symptoms of CFS or the presence of psychiatric disorders and fibromyalgia."

3. Is the SF-36 useful in measuring how treatments meet ME/CFS patients' expectations and needs ? There are reasons to think that even when SF-36 is restricted to use on measures of treatments on ME/CFS patients that it is somewhat deficient in terms of both conflation of mental health measures with physical measures, and in respect of challenges presented by cognitive impairment.

SF-36 as a Predictor of Health States pdf: https://www.sciencedirect.com/scien...35f3d8c&pid=1-s2.0-S1098301510755443-main.pdf

Extracts

The SF-36 is relatively poor at accounting for the health status of respondents. There are significant paths but the variance accounted for in absolute and relative terms is small. Physical Health does a much better job of accounting for general mental health than it does for perceived health problems or physician determined illness. These findings suggest that the SF-36 may not discriminate well between healthy and nonhealthy groups and that objective measures of health status may be required in conjunction with the use of the SF-36.

Additionally, the substantial covariation between Mental Health and Physical Health, the crossloadings of General Mental Health onto both Physical Health and Mental Health, and the loading of General Health onto Mental Health rather than Physical Health raise questions about the validity of the Mental Health and Physical Health constructs. Construct validity is called into question further by the relatively low correlations between Physical Health and both physician reported illness and reported health problems. The evidence suggests that Physical Health and Mental Health may not be distinct constructs. Perhaps, as Keller et al. [4] suggest, these constructs are simply measures of health. However, as measures of health, the correlations with health states should be substantial rather than low to moderate as found in the present data. Clearly additional research on the construct validity of the SF-36 is needed.

Problems in using health survey questionnaires in older patients with physical disabilities. The reliability and validity of the SF-36 and the effect of cognitive impairment

Abstract
"Reliability and validity of the SF-36 Health Survey Questionnaire was assessed in older rehabilitation patients, comparing cognitively impaired with cognitively normal subjects. The SF-36 was administered by face-to-face interview to 314 patients (58–93 years) in the day hospital and rehabilitation wards of a department of medicine for the elderly. Reliability was measured using Cronbach’s alpha (for internal consistency) on the main sample and intraclass correlation coefficients on a test–retest sample; correlations with functional independence measure (FIM) were examined to assess validity. In 203 cognitively normal patients (Mini-Mental State Examination ≥24), Cronbach’s alpha scores on the eight dimensions of the SF-36 ranged from 0.545 (social function) to 0.933 (bodily pain). The range for the 111 cognitively impaired patients was 0.413–0.861. Cronbach’s alpha values were significantly higher (i.e. reliability was better) in the cognitively normal group for bodily pain (P = 0.003), mental health (P = 0.03) and role emotional (P = 0.04). In test–retest studies on a further 67 patients, an intraclass correlation coefficient of 0.7 was attained for five out of eight dimensions in cognitively normal patients, and four out of eight dimensions in the cognitively impaired. Only the physical function dimension in the cognitively normal group attained the criterion level (r > 0.4) for construct validity when correlated with the FIM. In this group of older physically disabled patients, levels of reliability and validity previously reported for the SF-36 in younger subjects were not attained, even on face-to-face testing. Patients with coexistent cognitive impairment performed worse than those who were cognitively normal."
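For readers unfamiliar with the reliability statistic quoted in the abstract above, here is a minimal sketch of how Cronbach's alpha (internal consistency) is usually computed from item responses. The function name and the response data are hypothetical and purely illustrative; nothing here reproduces the study's data or its software.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    columns = list(zip(*rows))                       # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: four respondents, three items coded 1-3.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [1, 1, 2],
]
print(round(cronbach_alpha(responses), 3))
```

Values closer to 1 mean the items in a dimension move together across respondents; lower values mean the items within that dimension are answered less consistently.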

---------------------------------

From which I conclude: SF-36 should not be used in circumstances other than a formal study where participants have signed appropriate consents. It should not be used as a tool of individual patient assessment, and should only be used in studies where health outcomes are being assessed. There is concern about the appropriateness of applying SF-36 to ME/CFS on two grounds: the conflation of mental health with physical health measures, which is a source of ongoing contention in the study of ME/CFS, and the impact of cognitive impairment, which has been consistently under-recognised in ME/CFS patients. Alternative measures of changes in the physical status of ME/CFS patients in treatment studies should be considered.

https://www.sciencedirect.com/science/article/pii/S1098301510755443

Simon M · Jan 16, 2023

SF-36 Physical Function subscale

I actually think the SF 36 scale provides the best available self-report measure of physical function (I'm not keen on the rest of it).

It is quick to complete and can be done online and by thousands of people without access to special kit. Plus, it's widely used and so allows comparisons between ME and many other chronic illnesses (almost always, ME/CFS comes out as worse than others). It is also effectively a 20 point scale (a hundred points in five point increments), which makes it reasonably sensitive (other subscales in SF 36 measure in large chunks only).

I found that as my physical functioning has varied over time, it’s done a pretty good job of tracking the change.

Its chief conceptual drawback is it asks you what you think you’re capable of, not what you do. But even so, I think most people have a very good idea if they are limited in climbing a set of stairs or not, or walking 100 yards.

Obviously, like every other self-report scale, it’s unsuitable for use as an outcome measure in non-blinded trials. In particular, it's easy to game the score on a couple of questions depending on how optimistic you feel, producing a swing of 5 to 10 points out of 100. Curiously, this is exactly the 'improvement' found by the PACE trial. Though for many questions (can you walk a mile, play a round of golf), nothing is going to change my answer.
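A minimal sketch of the arithmetic behind the five point increments and the 5 to 10 point swing, assuming the standard scoring of the Physical Function subscale: 10 items, each coded 1 (limited a lot), 2 (limited a little) or 3 (not limited at all), with the raw sum rescaled to 0-100. The function name and the example answers are hypothetical, for illustration only.

```python
def pf_score(answers):
    """Rescale ten item codes (1-3) to a 0-100 Physical Function score."""
    if len(answers) != 10 or any(a not in (1, 2, 3) for a in answers):
        raise ValueError("expected ten answers coded 1, 2 or 3")
    raw = sum(answers)              # raw sum ranges from 10 to 30
    return (raw - 10) * 100 / 20    # each raw point is worth 5 scale points

# Hypothetical respondent on an ordinary day.
baseline = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]

# Same respondent feeling optimistic: two borderline items move
# from "limited a little" (2) to "not limited at all" (3).
optimistic = list(baseline)
optimistic[2] = 3
optimistic[3] = 3

print(pf_score(baseline))    # 55.0
print(pf_score(optimistic))  # 65.0
```

Moving a single answer by one level shifts the score by 5 points, so two borderline items are enough to produce a 10 point change with no change in what the person actually does.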

Plus, I'm not sure what would be better than this amongst self-report scales of functioning.

SNT Gatchaman · Jan 16, 2023

Agree with others' points above. A few responses, probably duplicating.

TigerLilea said:
Personally I think these types of questionnaires are meaningless. Limiting questions to "the last 4 weeks" for someone who has had ME (or whatever health condition) for years or decades doesn't provide accurate information.
...

Cut down the amount of time you spent on work or other activities
Yes No

For this I would answer "No". The reason being, I had to cut down the amount of time I spent on work or other activities back in 2021. But by answering "no" to this question, it leaves the impression that I'm able to work or engage in my regular daily activities.

I've always answered that as "Yes" as in "During the last 4 weeks I continue to cut down the amount of time I spent on work/activities compared to before I was chronically ill". I think that should be clarified in the question, as @TigerLilea's answer is also quite logical and precise. I doubt they're looking for recent deterioration, as question 2 is "Compared to one year ago, how would you rate your health in general now?" but maybe. Either way, Q should be clarified or results won't be meaningful.

I answer "Yes" to Q13-16 (physical) and "No" for Q17-19 (emotional). Appreciate some could have impacting comorbid depression etc. Seems a little odd that after those 7 Qs, Q20 (social deterioration) is assessed with a single combined physical or emotional cause.

Not sure of the benefit of four questions which are asking much the same thing: Qs 23, 27, 29, 31 ("full of pep", "lot of energy", "worn out", "feel tired").

Q34 and Q36 also oddly duplicated: "I'm as healthy as anyone I know" and "my health is excellent". Though in theory, you could be in a rest home and (depending on your caregiver) answer both "yes" and "no"!

Q35 (expect health worsening) is also a meaningless question for ME when biology unknown, unless the questioner is all-in on the BPS model. @Jaybee00's comment about the patient recognising progressive ME noted, though.

Jonathan Edwards said:
Either you believe patients or you don't. If you are using a subjective scoring system like this (for a blinded trial) then what matters is what the patient really thinks their status is.

What is the purpose of using SF-36 in a study? If it's for inclusion/exclusion then perhaps we really only need a couple of subjective answers to identify PEM and variable energy reduction +/- orthostatic intolerance, cognitive dysfunction. Maybe even "Do you get PEM?" is sufficient. If it's to assess subjective response to therapy, do we need much more than something like the Bell scale and maybe some questions around relevant symptoms?

Trish · Jan 16, 2023

I think there's quite a lot of talking at cross purposes on this thread. Some are talking about SF-36 Physical Functioning which is scored on whether you can do 10 activities with great difficulty, some difficulty or no difficulty, and gives a score from 0 to 100. This is what was used in PACE and lots of other ME/CFS CBT/GET etc trials.

Then there is the whole SF-36 which includes several other sets of questions related to mental health and other aspects of life. I haven't come across that whole scale being used in clinical trials, I think.

DokaGirl · Jan 17, 2023

Sean said:
Not just the threshold, but also the degree.

I strongly suspect that the process that causes PEM is always present and active, just with varying degrees of sensitivity and effect, both between patients, and for each patient over time.

I would even suggest that PEM is ME, for all practical purposes. Or at least the primary expression of it.

I think your idea that PEM is ME is interesting. I can't recall for sure, but I think some pwME feel not bad until they plunge into PEM.

On the other hand, I feel dreadful all the time, always have with ME, and then gradually slide into PEM which pretty much immobilizes me on the couch for a day, or two or more.

Once in a while I plunge into PEM within a few minutes, but the usual is a slower process - as in yesterday I went to town, my husband drove, we did a few hours of errands, and the next day I was pretty much couchbound all day, and maybe the next day and so on, and so on.

Sean · Jan 17, 2023

the introductory paragraph makes it clear that the purpose of the 36 SF is to facilitate a measure of how treatment meets the patients' expectations and needs.

Now that is interesting.

Simon M · Jan 17, 2023

Jonathan Edwards said:
This has come up in the context of the Research Strategy Working Group.
I would be interested to know the views here.

It would be helpful to know the purpose of using the sf-36 scale. Apologies if this was posted earlier; I haven't been able to read the whole thread.

Jonathan Edwards · Jan 17, 2023

Simon M said:
It would be helpful to know the purpose of using the sf-36 scale.

@Simon M ,
The research working group was considering outcome measures for ME trials. Someone asked what was the view on SF36 in that context. Clearly there are lots of questions about what the trial is designed to show etc. but I think SF36 was raised in the context of a measure of 'function' or 'ability to do things you want to do' - perhaps with the assumption that only relevant parts of the questionnaire would be used.
