Review Measurement Properties of the Patient Health Questionnaire–15 and Somatic Symptom Scale–8: A Systematic Review and Meta-Analysis 2024 Axelsson et al

Andy

Key Points

Question What is known about the measurement properties of the Patient Health Questionnaire–15 (PHQ-15) and Somatic Symptom Scale–8 (SSS-8)?

Findings This systematic review and meta-analysis of 305 studies with 361 243 participants found that general and symptom domain–specific factors contributed to response patterns. The PHQ-15 (α = 0.81) and SSS-8 (α = 0.80) exhibited adequate internal consistency, but with redundant PHQ-15 items. Correlations with other scales generally supported construct validity; a difference of 3 or greater constituted a relevant change on both scales; and screening properties for the identification of somatoform disorders were suboptimal.

Meaning The findings of this study suggest that the PHQ-15 and SSS-8 can be recommended for assessment and monitoring of somatic symptom burden, but clinicians need to be aware that such scores reflect complex, multifactorial structures.

Abstract

Importance The subjective experience of somatic symptoms is a key concern throughout the health care system. Valid and clinically useful instruments are needed.

Objective To evaluate the measurement properties of 2 widespread patient-reported outcomes: the Patient Health Questionnaire–15 (PHQ-15) and Somatic Symptom Scale–8 (SSS-8).

Data Sources Medline, PsycINFO, and Web of Science were last searched February 1, 2024.

Study Selection English-language studies reporting estimates pertaining to factor analysis, taxometric analysis, internal consistency, construct validity, mean scores in relevant groups, cutoffs, areas under the receiver operating characteristic curves (AUROCs), minimal clinically important difference, test-retest reliability, or sensitivity to change.

Data Extraction and Synthesis Search hits were reviewed by independent raters. Cronbach α, Pearson r, means, and between-group effect sizes indicative of sensitivity to change were pooled in random-effects meta-analysis. Study quality was assessed using 3 instruments. Reporting followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 reporting guideline.

Main Outcomes and Measures Comprehensive overview of evidence pertaining to the measurement properties of the PHQ-15 and SSS-8.

Results A total of 305 studies with 361 243 participants were included. Most concerned routine care (178 studies) and the general population (27 studies). In factor analyses, both scales reflected a combination of domain-specific factors (cardiopulmonary, fatigue, gastrointestinal, pain) and a general symptom burden factor. The pooled PHQ-15 α was 0.81 (95% CI, 0.80-0.82), but with low item-total correlations for items concerning menstrual problems, fainting spells, and sexual problems (item-total correlations <0.40), and the SSS-8 α was 0.80 (0.77-0.83). Pooled correlations with other measures of somatic symptom burden were 0.71 (95% CI, 0.64-0.78) for the PHQ-15 and 0.82 (95% CI, 0.72-0.92) for the SSS-8. Reported AUROCs for identification of somatoform disorders ranged from 0.63 (95% CI, 0.50-0.76) to 0.79 (95% CI, 0.73-0.85) for the PHQ-15 and from 0.71 (95% CI, 0.66-0.77) to 0.73 (95% CI, 0.69-0.76) for the SSS-8. The minimal clinically important difference on both scales was 3 points. Test-retest reliability could not be pooled and was inconsistent for the PHQ-15 (PHQ-15: r = 0.65-0.93; ICC, 0.87; SSS-8: r = 0.996, ICC = 0.89). The PHQ-15 showed tentative sensitivity to change (g = 0.32; 95% CI, 0.08-0.56), but data for the SSS-8 were lacking.

Conclusions and Relevance In this systematic review and meta-analysis, findings supported use of the PHQ-15 and SSS-8 for the assessment of symptom burden, but users should be aware of the complex, multifactorial structures of these scales. More evidence is needed concerning longitudinal measurement properties.
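For anyone unfamiliar with the internal consistency figure quoted in the abstract, here is a minimal sketch of how Cronbach's α is computed from item responses. The function name `cronbach_alpha` and the toy data are made up for illustration; this is not the authors' code, and real PHQ-15 data would have 15 items scored 0 to 2.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the respondents' sum scores.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items, each scored 0-2.
toy = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 2],
    [2, 2, 2],
    [0, 1, 0],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Note that α only says the items move together. A high value says nothing about whether the sum score measures what it is claimed to measure.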

Open access
 
At least moderate or high risk of bias for all studies:
[Image: figure from the paper]


Some indication of publication bias:
[Image]


This might be a useful reference for MCID:
Two studies [187,340] presented estimates of the MCID, ie, the smallest difference of clinical relevance, on the SSS-8 and argued for an MCID of 3. One study [340] estimated that the MCID for the PHQ-15, although excluding 1 item, was 2.3.

———

The scales’ accuracy in identifying somatoform disorders was only borderline acceptable. Test-retest reliability over 7 to 14 days was evaluated in a small number of studies and was found to be inconsistent for the PHQ-15. Tentatively, this parameter appeared adequate for the SSS-8, based on 2 relevant studies.[301,315] The PHQ-15 showed preliminary evidence of being sensitive to change, but data for the SSS-8 are lacking.
Focusing on the identification of somatoform disorders, the screening ability of the PHQ-15 and SSS-8 was only borderline acceptable. We can also note that the widely used cutoff of 10 was close to the pooled mean score in many health care settings (Table 2). As with all clinimetric instruments, the choice of cutoff should be informed by its purpose (such as to identify all cases or to achieve the highest possible correct classification rate) and the setting in which it is used.[346] Generally speaking, however, based on the existing evidence base, the sum scores on the PHQ-15, and tentatively also the SSS-8, appear to be of limited use for the identification of somatoform disorders.
I doubt this will stop their use for exactly that.
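The quoted point about choosing a cutoff according to its purpose can be illustrated with a rough sketch. Everything here is hypothetical: the function name `screen_stats`, the scores, and the diagnoses are invented, and the numbers say nothing about how the PHQ-15 or SSS-8 actually perform.

```python
def screen_stats(scores, has_disorder, cutoff):
    """Treat score >= cutoff as screen-positive and compare with the diagnosis."""
    pairs = list(zip(scores, has_disorder))
    tp = sum(s >= cutoff and d for s, d in pairs)      # true positives
    fn = sum(s < cutoff and d for s, d in pairs)       # missed cases
    tn = sum(s < cutoff and not d for s, d in pairs)   # true negatives
    fp = sum(s >= cutoff and not d for s, d in pairs)  # false alarms
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Invented sum scores and reference diagnoses for 8 people.
scores = [4, 7, 9, 10, 11, 12, 14, 16]
has_disorder = [False, False, True, False, True, False, True, True]

for cutoff in (8, 10, 13):
    print(cutoff, screen_stats(scores, has_disorder, cutoff))
```

In this toy data the low cutoff catches every case at the cost of false alarms, while the high cutoff does the reverse. When score distributions overlap this much, no cutoff fixes the problem.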

———

To enable reliable measurement of somatic symptom burden in routine care and clinical trials, instruments need to be able to detect change. It is also important to know what constitutes the smallest clinically relevant change. Based on the present study, there is tentative evidence that the PHQ-15 can detect change in somatic symptom burden caused by treatment. For the SSS-8, only 1 study [187] of relevance for sensitivity to change could be identified. Even though the results were promising, further evaluation is warranted. Regarding the smallest difference in sum score considered clinically relevant, the results suggest 2.3 points (in practice, 3 points) for the PHQ-15 and 3 points for the SSS-8. In summary, the current evidence base is cautiously supportive of further use of the PHQ-15 and SSS-8 in the study of change in somatic symptom burden.
Additionally, most included RCTs indicative of sensitivity to change were at high risk of bias due to threats to the measurement strategy,[36] in part caused by the lack of blinding, and it is possible that the association of CBT with changes in symptom burden may also reflect expectancy effects.
I believe the highlighted sentence above (that the evidence base is "cautiously supportive" of further use) is directly contradicted by the limitations. If most studies have a high risk of bias, they can’t be supportive of anything. They are, at best, inconclusive.
 
Oh wow, did the skull shape caliper scale, built around the specific morphology it was meant to promote as superior, find that different morphologies are different and are therefore validated as inferior? Shocker!

The most absurd thing is that unlike those biased questionnaires, measuring skull shapes is an actual objective measurement, although any interpretation remains as invalid as astrology. Measuring something has a precise meaning in science, a meaning that no questionnaire can ever reach. They rate, they don't measure anything. Their scales are not linear, so they can't even be properly called scales, and their construction has zero relevance to the physical world we live in. Scoring is also a fine term, which the authors recognize:
clinicians need to be aware that such scores reflect complex, multifactorial structures
No way, you're telling me that reducing complex multidimensional structures that can't be mapped out to a single digit on a small non-linear and arbitrary 'scale' may not reflect the complex multidimensional nature of that structure? Well, I nevah! Then why pretend in the conclusion that this is at all valid?

It's absolutely scandalous that such a large number of studies have been conducted for something this ridiculously pseudoscientific.
 