Assessing Functioning in adolescents with Chronic Fatigue Syndrome: Psychometric properties and Factor Structure of SSAS & SF36 PF, 2020, Loades

Dolphin

Free full text:
https://purehost.bath.ac.uk/ws/port...6_Psychometric_Properties_BCP_R2_Complete.pdf

Title
Assessing Functioning in adolescents with Chronic Fatigue Syndrome: Psychometric properties and Factor Structure of the School and Social Adjustment Scale and the Physical Functioning Subscale of the SF36


Loades, M.E.1,2, Vitoratou, S.3, Rimes, K.A.4,& Chalder, T.4,5.

Affiliations
1Department of Psychology, University of Bath
2Bristol Medical School, University of Bristol
3Psychometrics & Measurement Lab, Department of Biostatistics and Health Informatics, King’s College London
4King’s College London
5South London & Maudsley NHS Trust

Corresponding author contact details Maria Loades, Department of Psychology, University of Bath, Bath, BA2 7AY, England. Email m.e.loades@bath.ac.uk (+44) 01225 385249; BA(Cantab), DClinPsy

Abstract

BACKGROUND:

Chronic fatigue syndrome (CFS) has a major impact on functioning. However, no validated measures of functioning for this population exist.

AIMS:

We aimed to establish the psychometric properties of the 5-item School and Social Adjustment Scale (SSAS) and the 10-item Physical Functioning Subscale of the SF-36 in adolescents with CFS.

METHOD:

Measures were completed by adolescents with CFS (N = 121).

RESULTS:

For the Physical Functioning Subscale, a two-factor solution provided a close fit to the data. Internal consistency was satisfactory. For the SSAS, a one-factor solution provided an adequate fit to the data. The internal consistency was satisfactory. Inter-item and item-total correlations did not indicate any problematic items, and functioning scores were moderately correlated with other measures of disability, providing evidence of construct validity.

CONCLUSION:

Both measures were found to be reliable and valid and provide brief measures for assessing these important outcomes. Henceforth, we recommend that the Physical Functioning Subscale be used as 2 subscales in adolescents with CFS.

Keywords: physical, academic, functioning, social, CFS, adolescents
 
Where are the actometers or other objective measures of activity?

ETA: oh, they did include one measurable activity: 5 sit to stand manoeuvres, started from a seated position in a chair!

"The speed of completion is used as a measure of physical strength"
 
I have no idea what's going on here. Is this the "we've never done this" thing that Chalder and her colleagues have done countless times? Nobody has ever given those psychometric questionnaires to this patient population? They've literally been part of the standard Chalder questionnaire set for several decades and, no, they're not valid measures of much.

There better be some brain damage to explain this nonsense at some point because it looks completely absurd to go around in circles for this long without some explanation.
 
My knowledge of stats is far too rusty to get my head around this, but it looks to me like some stats whizkid has tried to 'analyse' the data to 'prove' reliability and validity on the false assumption that this is linear data they are dealing with.

I hope someone more up to date can clarify.
 
My knowledge of stats is far too rusty to get my head around this, but it looks to me like some stats whizkid has tried to 'analyse' the data to 'prove' reliability and validity on the false assumption that this is linear data they are dealing with.

I hope someone more up to date can clarify.

I'm currently reading a book that should help with just that: Streiner and Norman - Health Measurement Scales.

It looks like they are trying to confirm whether the design of the tests is adequate, rather than whether they are appropriate for use on a particular population.

That emoji looks like a winking biscuit.

:rofl: It was supposed to be me "peeking through fingers".
 
Where are the actometers or other objective measures of activity?

ETA: oh, they did include one measurable activity: 5 sit to stand manoeuvres, started from a seated position in a chair!

"The speed of completion is used as a measure of physical strength"

Useful test. (If you're studying 90 year olds on a dozen meds, with dementia and multiple medical comorbidities.)
 
My knowledge of stats is far too rusty to get my head around this, but it looks to me like some stats whizkid has tried to 'analyse' the data to 'prove' reliability and validity on the false assumption that this is linear data they are dealing with.

I hope someone more up to date can clarify.
What do you mean with the false assumption that this is linear data they are dealing with?
 
What do you mean with the false assumption that this is linear data they are dealing with?
Stats tests like Pearson's correlation coefficient, which they used, assume the data is measured on a linear scale, like height. Something like SF-36 physical functioning is not linear. So the ten-point difference between being able and not able to do vigorous sports cannot be directly compared with the ten-point difference between being able to climb one flight of stairs and not. And having some difficulty on two items is not the same as having a lot of difficulty on one item, but they get the same numerical scores.
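That last point can be shown with a toy sketch. The item names and the 1–3 scoring below are invented for illustration (they are not taken from the SF-36 scoring manual): two respondents with quite different limitation profiles end up with identical totals.

```python
# Hypothetical ordinal items, each scored:
# 1 = limited a lot, 2 = limited a little, 3 = not limited.
# The "distance" between 1 and 2 need not equal the distance between
# 2 and 3, and items differ in difficulty, yet the totals just add up.

# Respondent A: a little limited on two items
a = {"vigorous": 1, "stairs": 2, "walking": 2}
# Respondent B: a lot limited on one item, unlimited on another
b = {"vigorous": 1, "stairs": 1, "walking": 3}

total_a = sum(a.values())
total_b = sum(b.values())
print(total_a, total_b)  # identical totals, different patterns
```

Summing ordinal item scores like this treats every one-point step as interchangeable, which is exactly the linearity assumption being questioned.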
 
Stats tests like Pearson's correlation coefficient, which they used, assume the data is measured on a linear scale, like height. Something like SF-36 physical functioning is not linear. So the ten-point difference between being able and not able to do vigorous sports cannot be directly compared with the ten-point difference between being able to climb one flight of stairs and not. And having some difficulty on two items is not the same as having a lot of difficulty on one item, but they get the same numerical scores.
Thanks for explaining!

But isn't that the case then for all questionnaires that add scores to different questions?

In my view, it makes sense that the ten points for one question mean something different than the ten points for another question on the same scale. Otherwise, you would have almost an all-or-nothing situation with ceiling effects: as soon as somebody had significant issues with physical functioning, they would score 10 on almost all the questions. If you want to differentiate patients (or healthy controls) by severity, then it makes sense that each item means something different and that it's harder to score 10 points on one item versus another.

I think that if you ask patients to rate their physical functioning on a sort of visual analogue scale going from 0 to 10, you basically face similar problems, because people have to interpret the scores themselves and make their own baseline comparison: for a modest person, a score of 5 will mean something different than for a person known to state things strongly. One could even assume that the difference between a 9 and a 10 will be different than between a 5 and a 6, for example, because people might be reluctant to give the maximum score on the scale.

You take the example of height, but that's also a perfect measurement where each extra centimetre reflects an equal increase in height, because that's what height is. So basically I think this isn't so much a statistical issue or a question of how to analyse the data, but about measurements of physical functioning being imperfect. A higher score on the scale will not always reflect an equal increase in actual physical functioning; that's hard to get around.
 
I agree that questionnaires are by their nature not measures on linear scales, but that has implications for what sort of statistical testing is valid.

Some statistical testing is based on using measures that have a 'normal distribution' (symmetrical about the mean, and tailing away symmetrically on both sides). SF36 is a highly skewed distribution, with the majority of the population scoring between 80 and 100, and the sick, disabled and elderly forming a long tail down to zero.

That makes measures like mean and standard deviation, and statistical tests based on them, invalid, as I understand it, but they keep using them, with ridiculous consequences like Trudie Chalder claiming large numbers of ME patients returned to 'normal', and using some multiple of SD below the mean as a measure of 'normal'.
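To see how a mean-minus-SD "normal range" misbehaves on skewed data, here is a small self-contained simulation. The population below is invented for illustration (it is not real SF-36, PACE, or trial data): most scores cluster near the ceiling, with a long tail of low scores.

```python
import random

random.seed(0)

# Invented SF-36-PF-like population with a ceiling effect:
# 900 people scoring 85-100, plus a long tail of 100 low scorers.
healthy = [random.choice([85, 90, 95, 100]) for _ in range(900)]
tail = [random.choice(range(0, 70, 5)) for _ in range(100)]
scores = healthy + tail

n = len(scores)
mean = sum(scores) / n
sd = (sum((x - mean) ** 2 for x in scores) / (n - 1)) ** 0.5

# Defining "normal" as anything above mean - 1 SD: on a skewed
# distribution the tail inflates the SD, so the cut-off lands far
# below where the bulk of the population actually sits.
threshold = mean - sd
print(round(mean, 1), round(sd, 1), round(threshold, 1))
```

With these made-up numbers the vast majority score 85 or above, yet the mean-minus-SD cut-off falls well under that, so people still far below the typical range would count as "normal".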
 
The statistical tests they've used here are highly specific to the testing of particular aspects of psychometric scales while they are being designed. They're not testing how these scales are going to be subsequently used. It makes sense to test them on an adolescent population with CFS if that's what they are going to be used on, but they haven't then tested to make sure that the results they get from that population are comparable with other populations (as far as I can see).

Linearity is going to be a problem under some circumstances, but it depends how the test is ultimately used.

Although they've looked at whether the tests here (SF36 and SSAS) are good proxies for more objective tests, they haven't then assessed whether they are still good proxies once an intervention (that might change how someone completes the test) is done (ie, the intervention changes the responses on the test without changing the underlying thing that the test is supposed to be measuring). That, for me, will be the key factor. But it's not really discussed, either here or in the textbook I'm reading. I can only presume that it is so fundamental to have asked that question, that's why it's not addressed. Or it really is the "elephant in the room".
 
Some statistical testing is based on using measures that have a 'normal distribution' (symmetrical about the mean, and tailing away symmetrically on both sides). SF36 is a highly skewed distribution, with the majority of the population scoring between 80 and 100, and the sick, disabled and elderly forming a long tail down to zero.
Yes, but I think that only holds if you use the scale in the healthy population; if you look at a patient group such as ME/CFS, you get a crude approximation of a normal curve.

Below is a graph of the SF-36 scores of the PACE trial at baseline, with a normal curve with the same mean and standard deviation overlaid in red. This isn't the best example, because you can see from the graph that they used a score of 65 or lower as an inclusion criterion.

[Image: histogram of SF-36 physical functioning scores at PACE baseline, with a normal curve of the same mean and SD overlaid in red]
From what I understood, many of the statistical tests used to compare means are rather robust to such deviations from a normal curve, because they assume the sampling distribution, rather than the actual data, has a normal distribution. So, in short, based on my limited statistical knowledge, I don't think Chalder's team is making a major statistical error in such cases.
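That robustness claim can be probed with a quick simulation (again with invented numbers, not trial data): even when the raw scores are heavily skewed, the distribution of sample means is much closer to symmetric, which is the central limit theorem at work.

```python
import random
import statistics

random.seed(1)

# Invented, heavily left-skewed population with a ceiling effect.
population = [100] * 600 + [90] * 200 + list(range(0, 80, 2)) * 5

# Draw many samples of n = 50 and collect their means.
means = [statistics.mean(random.sample(population, 50)) for _ in range(2000)]

def skewness(xs):
    """Standardised third moment: 0 for a symmetric distribution."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

# The raw scores are strongly skewed; the sample means much less so.
print(round(skewness(population), 2), round(skewness(means), 2))
```

This only defends tests that compare means, though; it does nothing to rescue a mean-minus-SD "normal range", which depends on the shape of the raw distribution itself.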
Linearity is going to be a problem under some circumstances
Could you give an example of this?
 