Concerns about composite reference standards in diagnostic research, Dendukuri et al, 2018

cassava7

Senior Member (Voting Rights)
Composite reference standards are used to evaluate the accuracy of a new test in the absence of a perfect reference test. A composite reference standard defines a fixed, transparent rule to classify subjects into disease positive and disease negative groups based on existing imperfect tests. The accuracy of the composite reference standard itself has received limited attention.

We show that increasing the number of tests used to define a composite reference standard can worsen its accuracy, leading to underestimation or overestimation of the new test’s accuracy. Further, estimates based on composite reference standards vary with disease prevalence, indicating that they may not be comparable across studies. These problems can be attributed to the fact that composite reference standards make a simplistic classification and then ignore the uncertainty in this classification.

Latent class models that adjust for the accuracy of the different imperfect tests and the dependence between them should be pursued to make better use of data.

Summary points
  • Composite reference standards define a fixed, transparent rule to classify subjects into disease positive and disease negative groups based on existing imperfect tests

  • They are widely regarded as appropriate for determining sensitivity and specificity of a new test in the absence of a perfect reference test

  • Though a composite reference standard is attractive for its simple and transparent construction, it can result in biased estimates as it makes suboptimal use of data

  • Bias due to a composite reference standard can worsen as more information is gathered and the new test’s accuracy can be overestimated if the errors made by the composite reference standard and the new test are correlated

  • Composite reference standards cannot aid standardisation across settings when disease prevalence varies

  • Appropriately constructed latent class models should be used to make complete use of the information gathered from multiple imperfect tests
https://www.bmj.com/content/360/bmj.j5779
 
They give an example of a composite reference standard:
a composite reference standard defined using data gathered in a South African cohort study of symptomatic children.6 It classifies a child who is positive on culture, smear microscopy, chest radiography, or the tuberculin skin test as having TB. The apparent advantage of this composite reference standard is that it would identify more TB cases than culture alone. Studies using a standard such as this typically treat it as an error-free reference test to estimate the new test’s sensitivity (proportion of all patients with the disease that are correctly detected by the new test) and specificity (proportion of all patients without the disease who are correctly detected by the new test).37

We focus on composite reference standards based on the OR rule, which classifies patients with at least one positive test as disease positive and those with all negative tests as disease negative.3 Other possible composite rules include the AND rule, which classifies a patient as disease positive only if all tests are positive, or K positive rules, which classify a patient as disease positive only if at least K tests are positive.12

So, I think we could classify the current ME/CFS diagnostic criteria each as a composite reference standard - PEM is present; AND activity levels substantially reduced for more than 6 months; AND fatigue is present etc. The multiple requirements make for a better diagnostic criteria than if there was just one test e.g. the presence of persisting fatigue. We have seen the debates about whether adding in more requirements e.g. for having sore throats or orthostatic intolerance makes the standard better or worse - for example, whether the simple IOM criteria or more complex criteria such as the CCC are better.

The drawbacks of the composite reference standard can be overcome using a statistical modelling approach called latent class analysis.3 Instead of classifying subjects into fixed disease categories, latent class analysis estimates the probability that each patient has the disease using all observed tests, including the test under evaluation (web figure 2). It adjusts for the sensitivity and specificity of each test as well as the possibility of conditional dependence between them. Simply put, latent class analysis considers how certain we are about classifying patients into diseased or non-diseased groups rather than making a black and white decision.

Composite reference standards may be considered clinically meaningful17 as they resemble clinical decision rules, which classify patients into mutually exclusive categories to support decision making—for example, rules identifying whether a subject is a candidate for TB treatment. Such decision rules are not necessary in research settings as no black or white decision needs to be made. Clinical decision rules might indicate the best possible management strategy, but are recognised by clinicians as imperfect.16 Yet similar rules are used to define composite reference standards for a diagnostic accuracy studies with no such recognition.

In the absence of a perfect reference test, a new test could be evaluated in terms of outcomes such as diagnostic yield or effect on patient management instead of accuracy.18 Latent class analysis would also be relevant in such analyses to estimate percentage of overdiagnosis or overtreatment,619 eventually supporting the development of optimal clinical decision rules.

I'm not sure that I completely understand what is being proposed here. But, given the uncertainties in ME/CFS diagnosis, I like the idea of researchers working with the signs and symptoms of individuals (with and without an ME/CFS diagnosis), using unsupervised grouping techniques to see how much a proposed biomarker adds to, or confirms, the clear separation of groups.

wikipedia said:
Latent class analysis (LCA) is a subset of structural equation modeling, used to find groups or subtypes of cases in multivariate categorical data. These subtypes are called "latent classes"

@cassava7, what are your thoughts about the paper in the context of ME/CFS?
 
Back
Top Bottom