Assessing Functioning in adolescents with Chronic Fatigue Syndrome: Psychometric properties and Factor Structure of SSAS & SF36 PF, 2020, Loades

Discussion in 'Psychosomatic research - ME/CFS and Long Covid' started by Dolphin, Feb 22, 2020.

  1. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    I mean *lack of* linearity is going to be a problem. The authors themselves noted that the SF36 PF subscale split into 2 distinct factors. As the items within each factor group are fairly well correlated, they will have to assume that each score on each type of factor will correlate/match at least on severity. You don't want someone scoring 20 on one set of factors being equivalent to a score of 40 on the other set. But I don't know how you would account for that without there being a set of standard objective measures you can test that against.

    And then there's linearity of scale. Is the difference between a score of 10 and 20 the same as the difference between 80 and 90?

    If these scales are simply used as a rough idea of how disabled someone is, or as a summary measure of a population, then there's not so much of a problem. However, if you are using them to make direct comparisons between people, then there might be, because one person's score may not be directly equivalent to another's.
     
  2. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    It seems that these are problems with all questionnaires, not only the ones used here (and probably with other outcome measures as well). I suspect it's difficult to get around this problem. You can't measure physical function directly, so one will have to use questions and a scoring system that approximates it as well as possible.
     
    Invisible Woman and Lucibee like this.
  3. Trish

    Trish Moderator Staff Member

    Messages:
    55,414
    Location:
    UK
    There is also the potentially dramatic effect of persuasion. All it needs is for the therapist to persuade people to interpret differently the level of difficulty they have with each item on the SF-36 scale.

    Take someone with mild to moderate ME who says they have some difficulty with half the items (5x5), and a lot of difficulty with the other half (5x0). Their score is 25.

    Give them a course of therapy that persuades them that what they are experiencing is normal aches and pains and tiredness, perhaps comparing themselves with people who are bedbound, and getting them to focus on how much more they can do than that.

    Persuade them that their 'some difficulty' is normal and should be classed as no difficulty (5x10) and their idea of a lot of difficulty is exaggeration, and is really just some difficulty (5x5). Their score is now 75.

    Miracle cure!
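    The arithmetic above can be sketched as a quick calculation. (This uses the informal per-item point values from the post, 0 for "a lot of difficulty", 5 for "some", 10 for "none", summed over the 10 PF items; it is not the official SF-36 scoring algorithm.)

    ```python
    # Sketch of the scoring shift described above. Per-item values follow
    # the post, not the official SF-36 transformation:
    # "a lot of difficulty" = 0, "some difficulty" = 5, "no difficulty" = 10,
    # summed over the 10 physical functioning items.

    POINTS = {"a lot": 0, "some": 5, "none": 10}

    def pf_score(answers):
        """Total score for a list of 10 per-item answers."""
        return sum(POINTS[a] for a in answers)

    before = ["some"] * 5 + ["a lot"] * 5   # pre-therapy interpretation
    after  = ["none"] * 5 + ["some"] * 5    # same impairment, reframed

    print(pf_score(before))  # 25
    print(pf_score(after))   # 75
    ```

    Nothing about the patient's actual capacity changes between the two rows; only the mapping from experience to answer category shifts.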

    As Lucibee says, as a general indicator of population and individual levels of disability, SF-36 is useful. It is also useful as an indicator of disability levels between different patient populations, and the full SF-36, with all the other things like social and emotional functioning, is useful in homing in on where an individual's or patient group's main area of difficulty lies.

    But using individual patients' changes in SF-36 PF over time, such as in a clinical trial, is fraught with traps, which as far as I'm concerned means all the fancy statistical analyses in the world won't make it a reliable or valid measure of how ill or how disabled someone with ME is and whether the treatment has been effective. It's far too subjective.
     
  4. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    I don't think I agree with this. The problem you sketched is mostly a problem of clinical trial design, where you have to control for other factors that might influence how participants fill in the questionnaire, through blinding, an adequate control condition, etc. In a standard blinded RCT with a drug and placebo group, this shouldn't be an issue.

    I think I prefer something like the SF-36 PF subscale because it asks patients directly what (specific) activity they can or can't do, which I think will result in less subjective and more reliable answers than if you ask them to rate the severity of a symptom or impairment on a larger scoring scale. The SF-36 PF subscale is also not too long and therefore easily interpretable. There is this recent trial on intranasal mechanical stimulation, where the authors reported improvements on an ME symptom rating scale. But that's a scale that asks about multiple symptoms, using a 5-point scale from 0-4 (none, light, moderate, severe, very severe). That makes the result more difficult to interpret, and it probably makes the issues of non-linearity you sketched above even more problematic.

    In short, I don't see the problem with using the SF-36 as an outcome measure in treatment trials for ME/CFS or to measure disability compared to other patient groups.
     
  5. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,860
    Location:
    Australia
    Yes, the key issue of lack of linearity is that if you were to, say, worsen slightly overall, despite improving on a particular question, you may end up getting the same or an improved score despite the worsening. This means that overall, the set of questions does not fulfil the requirements of being a scale.
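    The masking effect described here can be illustrated with a small sketch, again using the informal 0/5/10 per-item values from earlier in the thread (not the official SF-36 scoring): a patient worsens on two items but improves a lot on one, and the summed score is unchanged.

    ```python
    # Hypothetical illustration: summing item scores can hide worsening.
    # Per-item values follow the thread's informal scheme, not the
    # official SF-36 scoring algorithm.

    POINTS = {"a lot": 0, "some": 5, "none": 10}

    def pf_score(answers):
        return sum(POINTS[a] for a in answers)

    # Worse on the first two items, much better on the third,
    # unchanged on the remaining seven.
    baseline  = ["none", "none", "a lot"] + ["some"] * 7
    follow_up = ["some", "some", "none"]  + ["some"] * 7

    print(pf_score(baseline))   # 55
    print(pf_score(follow_up))  # 55 - identical total despite the changes
    ```

    The total is the same at both time points even though the patient got worse on two of the three items that changed, which is the sense in which the summed questionnaire fails to behave as a proper scale.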
     
  6. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    Perhaps things would be clearer if you could give an example of an outcome measure that doesn't have this problem, because I still don't seem to get it.

    If you ask patients to rate their physical functioning on a scale from 1-10, patients could be making the same consideration as you sketched above. They could reflect on how most aspects of their physical functioning (for example walking, lifting things etc.) got worse but that one aspect got a lot better (for example getting up from bed) and so give a score that is the same or an improvement, despite worsening on most aspects of physical functioning.
     
  7. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,860
    Location:
    Australia
    Some are much less likely to have this problem, but regardless, the point is that PROMS used on their own lead to problems of interpretation in prospective studies (including clinical trials).
     
    alktipping and ME/CFS Skeptic like this.
  8. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    I suppose that short questionnaires that focus on a particular issue (not overall impairment) and don't have large scoring ranges would be better in this regard.

    I agree it's often better to have a combination of objective measurements and questionnaires for what you want to measure, but that also makes trials more costly and difficult to do.

    Suppose a researcher has some reason to think drug X will provide symptom relief for brain fog in ME/CFS and he wants to do a small blinded RCT to test it. In such cases, I think a relatively short questionnaire that asks patients about specific cognitive issues would be the preferred primary outcome measure. If the questionnaire hasn't got any notable issues (like questions that don't make sense or ceiling effects) I don't think it would cause many problems of interpretation, to be honest. Given the many things that can go wrong with such studies, the imperfect linearity of the scale is probably going to be low on the list of things to worry about.

    I write this as someone who has no particular expertise in this subject and who would like to know more about it, so apologies for my frankness.
     
  9. Trish

    Trish Moderator Staff Member

    Messages:
    55,414
    Location:
    UK
    The example you gave where this scale might be a useful measure of change with treatment was a double blinded trial of a medication. I agree subjective measures like this can be useful in that context because the blinding and lack of psychological persuasion mean the scores are more likely to be consistent. (And it's certainly more useful as a measure of ME severity than the ridiculous CFQ).

    But that's a world away from the situation where it is usually used for ME - unblinded psychological trials. The fact that Chalder and co, some of the worst offenders in this regard, are using fancy stats here to pretend they have proved the measure is reliable and valid without giving the contexts in which this might be true is troubling.

    I contend that any claim to reliability and validity in the context of the trials that group carry out is just plain wrong. The reliability goes out the window when persuasion is involved.

    Objective measures like employment/school attendance, 2-day CPET, actometers, fitness and cognitive tests, and tests of stamina would be preferable.
     
    Last edited: Feb 23, 2020
    Amw66, alktipping and ME/CFS Skeptic like this.
  10. Sly Saint

    Sly Saint Senior Member (Voting Rights)

    Messages:
    9,922
    Location:
    UK
    I don't see why they don't use standard physical fitness tests. Particularly as they are so certain that exercise can't cause any harm.

    eg I found this study
    Reliability of health-related physical fitness tests in
    European adolescents. The HELENA Study
    https://s3.amazonaws.com/academia.e...5a5d21f377334f6f0fbee8f3d5960f353980f3c531292

     
    MEMarge and ME/CFS Skeptic like this.
  11. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    I think we fully agree on this Trish, it's just that I would label these problems as issues with trial design (bias) rather than the reliability or validity of questionnaires.

    I think that the term reliability has a specific meaning in this context, namely that if the same patient fills in the same questionnaire, he gets more or less consistent results; otherwise the questionnaire is not considered reliable. It's also useful to know how the individual questions of the questionnaire correlate with each other and with other, more objective outcome measures. Those are basic properties of questionnaires that need to be checked and could be useful for future researchers who want to test drug therapies in ME/CFS. In short, I didn't interpret this study as an attempt to claim that the results found by this group in randomized trials are reliable - because that's another question, one about bias and trial design.
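    A minimal sketch of test-retest reliability in the sense described here: the same patients complete the same questionnaire twice, and we check how strongly the two sets of scores correlate. (The scores below are hypothetical; real studies typically report an intraclass correlation rather than a plain Pearson correlation.)

    ```python
    # Test-retest reliability sketch with made-up SF-36 PF scores.
    # A high correlation between the two administrations is taken as
    # evidence of reliability in this narrow sense.

    def pearson(x, y):
        """Pearson correlation coefficient between two equal-length lists."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    time1 = [25, 40, 55, 30, 70, 45]  # hypothetical scores, first administration
    time2 = [30, 35, 60, 30, 65, 50]  # same patients, two weeks later

    r = pearson(time1, time2)
    print(round(r, 2))  # close to 1, i.e. consistent retest scores
    ```

    The catch raised in this thread is that a high value here, measured with no intervention in between, says nothing about whether scores remain comparable once an intervention has changed how patients interpret the questions.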

    I think these have the same issues we were discussing.

    The hours of employment might decrease without the patient doing significantly worse (they could, for example, be doing more unpaid work).

    Cognitive tests generally correlate poorly with the cognitive problems that patients report (I tend to believe patients more than the tests).

    Many ME/CFS patients have relatively normal CPET (a stamina test) results despite being severely ill. The 2-day CPET studies only show a consistent decline for workload at the ventilatory threshold and the studies are too small to say this is robust (or what it actually means) while most other measurements (like VO2 or maximal workload) look relatively normal.

    Actimeters have a lot of problems too: are patients wearing them consistently, are these influenced by simple wrist movements etc. etc.
     
    Theresa and Trish like this.
  12. Snow Leopard

    Snow Leopard Senior Member (Voting Rights)

    Messages:
    3,860
    Location:
    Australia
    One of the points Trish and I are trying to make is that the reliability of these questionnaires tested outside the context of prospective studies does not indicate their reliability within the context of a prospective study.

    This is not merely an issue of trial design, but a problem with patient rated outcome measures themselves.

    CPETs are not stamina tests. Having said that, many participants (in my opinion) don't reach a true VO2max on the tests, indicated by relatively low maximum heart rates for their age. VO2max itself is simply a measure of how much oxygen can be delivered to the muscles, and therefore a measure of cardiovascular fitness. The maximal workload on CPET tests is below 50% of the power that participants can put out maximally for 8 seconds (maximal voluntary contraction), or even for 30 seconds on a bicycle (Wingate test). Which is to say, VO2max occurs well below maximal muscle drive. Given this, it should not be surprising that VO2max itself is not a useful measure for the illness.

    http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0100-879X2015000300261
     
    Last edited: Feb 23, 2020
    MEMarge, ME/CFS Skeptic and Trish like this.
  13. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    Apologies in advance for being difficult, but I think this helps in understanding things.
    But how would you test the reliability of an outcome measure over a long period of time? If you get a different score, how can you know whether the difference is due to an unreliable questionnaire or an actual change in the patient's condition? Are there outcome measures other than PROMS that have been tested and shown to be reliable in this way?

    I suspect that the trial design typical of GET/CBT studies also distorts observer-reported outcomes or things like the 6-minute walking test. It's not solely an issue of PROMS. And when bias is properly controlled for in a blinded RCT I see little reason to think that the prospective reliability of PROMS is an issue compared to other outcome measures.
     
  14. Trish

    Trish Moderator Staff Member

    Messages:
    55,414
    Location:
    UK
    Isn't that why the PACE researchers scrapped the end-of-trial actigraphy - they had found out from other studies that the patients reporting being able to be more active on SF-36 were actually not any more active?
     
    MEMarge, Sean, alktipping and 2 others like this.
  15. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    But in that case, patients were actively encouraged to interpret their symptoms differently, so they were primed to fill in the questionnaire differently.

    I think if you were to do a simple prospective study with both actimeters and the SF-36 PF subscale, it would be rather difficult to determine the reliability of one based on the other. If there was a significant divergence, I would doubt which of the two is the more reliable measure.

    It's like the cognitive tests: these sometimes correlate poorly with patient reports, but in my view, that doesn't mean that the patient reports are unreliable.
     
    Trish likes this.
  16. Trish

    Trish Moderator Staff Member

    Messages:
    55,414
    Location:
    UK
    When this sort of argument is made, I always think of the example of the asthma study:
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351653/
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351653/figure/F2/
    It's a great pity we don't have an easily administered accurate objective test as in asthma, but until we do, I don't think we can take for granted the suggestion that any questionnaire is reliable.

    Anyway, I'll bow out of this now. Thanks for an interesting discussion. I think we basically agree that ideally objective testing is best, but we will have to agree to disagree about how reliable SF-36 is likely to be for ME studies, blinded or not.
     
  17. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    The first step is to be aware that there is a problem. But most studies won't even acknowledge that.

    Saying, "we don't have anything better, so they'll just have to do" is not good enough.

    I'm not against them being used at all, but I do think they should come with much, much stronger warnings about how their use under certain circumstances may affect the interpretability of a study.

    Warnings such as, "this measure may correlate well with more objective measures at baseline, but not as an outcome measure", should be setting off alarm bells as to why that may be, and that maybe the interventions used are affecting the measurement tool (the patient themselves) in ways that are unanticipated and haven't been controlled for.

    If all the sphygmomanometers in one arm of a trial on a blood pressure treatment were being recalibrated, while the other arm was being left alone, you wouldn't hesitate to declare that trial to be flawed. Yet we accept that happening routinely in psychological trials because "that's the way it's always been done."

    Granted, we don't know how much an intervention that aims to change a patient's perception of their symptoms by endorsing a positive spin will affect how they will modify their SF36 answers at endpoint, but even if it is only a small amount, that's important, particularly if it means they are no longer reliably reporting how their symptoms affect them. But standard validity and reliability testing is not going to tell you that.
     
    Amw66, rvallee, Sly Saint and 8 others like this.