DMissa
Senior Member (Voting Rights)
> But my real point is that a test that differentiates is not actually what we are looking for.
Clinical criteria change with the years, with borders, and with egos (to say nothing of the challenges now apparently introduced by pandemics). Having something reproducible and objective, both as a bulwark against these pressures and as a clear signpost of legitimacy to skeptics, is incredibly important imo.
> Specificities and sensitivities are not important.
They may be, especially if they correspond to differing clinical presentations within the ME/CFS population (e.g. symptoms or onset patterns). What if we had two tests that each produced outcomes within useful percentiles of their populations, as you suggest with the one-metabolite example, but one of them had much greater overlap with people without ME/CFS while possessing greater fidelity for distinguishing people with ME/CFS of clinical pattern X or Y? A combined or staged protocol using both measurements, with the specific utilities of each in mind, has obvious value: one is better for filtering out people without ME/CFS, and the other is better for confirming a precise ME/CFS diagnosis. It may also be more economical or practical to run one test first and the other only when more detail is needed. These statistics inform that process; a sketch of the arithmetic follows below.
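To make the staged idea concrete, here is a minimal sketch of how the sensitivity and specificity of two imperfect tests combine when run in series. All numbers are invented purely for illustration, and the standard serial-testing formulas below assume the two tests err independently given true disease status:

```python
# A minimal sketch of a staged (two-test) protocol. All figures are
# made-up assumptions for illustration; they are not from any study.

def serial_protocol(sens_a, spec_a, sens_b, spec_b):
    """Stage 1 screens with test A; only positives proceed to the
    confirmatory test B. A final 'positive' requires both tests to
    agree. Assumes the tests err independently given true status."""
    combined_sens = sens_a * sens_b                  # must pass both tests
    combined_spec = 1 - (1 - spec_a) * (1 - spec_b)  # either test can clear you
    return combined_sens, combined_spec

# Hypothetical test A: catches most true cases but overlaps a lot with
# controls (good screen, weak confirmation).
# Hypothetical test B: misses some cases but rarely flags controls.
sens, spec = serial_protocol(sens_a=0.95, spec_a=0.70,
                             sens_b=0.80, spec_b=0.97)
print(f"staged protocol: sensitivity={sens:.2f}, specificity={spec:.3f}")
# -> staged protocol: sensitivity=0.76, specificity=0.991
```

Neither hypothetical test alone has the final profile; it is the per-test statistics that tell you which one belongs at which stage.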
> in general terms diseases are not as obscure as that
Who is to say that we are dealing with a disease that is best described in general terms?
I understand the general criticisms (not specifically of this paper) of machine learning, and of taking too many concurrent, individually meaningless measurements and applying them to diagnostic applications, especially given the randomly generated dataset shown earlier in the thread (a useful and impressive demonstration by @chillier btw, love it). I actually agree with many of these critical ideas in a broad sense, and they apply to my own past work; I am not coming from a place of defensiveness whatsoever.

I think the problem is in the language. Blanket language is dangerous, and generalised criticisms delivered from a position of expert authority are dangerous. These threads are very public. The issues raised here (and earlier in the thread) come in varying degrees and interact with different studies variably. Yes, this stuff can be overused and is sometimes spurious. But equally, not all studies using machine learning are producing throwaway fluff from thousands of otherwise meaningless measurements, and not every inclusion of sens/spec is a cheap way to put a "score" on a throwaway biomarker paper for clout.

Criticism of these approaches needs to be communicated very carefully so as not to herd less knowledgeable community members into a habit of rejecting studies the moment they recognise superficial red flags in an abstract. The fine details of each case at hand must always, always, always be of chief importance.
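For anyone curious about the legitimate core of that criticism, here is a rough sketch of the failure mode the random-data demonstration targets. This is not @chillier's actual code, just a generic Python illustration with invented numbers: with many measurements and few participants, a model fit to pure noise can look like a perfect biomarker panel until it is scored on data it has not seen.

```python
# Sketch only: 40 fake "participants", 500 random "metabolites",
# labels assigned by coin flip. There is no real signal anywhere.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))   # measurements: pure noise
y = rng.integers(0, 2, size=40)  # case/control labels: pure noise

model = LogisticRegression(max_iter=1000)
model.fit(X, y)
print("in-sample accuracy:", model.score(X, y))  # ~1.0: looks like a great panel

# Refitting on held-out splits tells the real story: chance-level performance.
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())  # ~0.5
```

The point is not that every multivariate study fails this way, only that in-sample performance alone cannot distinguish a genuine signal from this scenario, which is exactly why the fine details of each study matter.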