Preprint: A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program 2023 Lorman et al

Discussion in 'Long Covid research' started by Andy, Jan 5, 2023.

  1. Andy

    Andy Committee Member

    Abstract

    Background
    As clinical understanding of pediatric Post-Acute Sequelae of SARS-CoV-2 (PASC) develops, and hence the clinical definition evolves, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data.

    Methods and Findings
    In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values.
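
    For readers unfamiliar with "cross-validated grid search", here is a minimal sketch of the idea in plain Python. It is not the authors' pipeline: a toy one-feature classifier stands in for XGBoost, the data are synthetic, and every name in it (make_folds, fit_midpoint, grid_search_cv, the "bias" hyperparameter) is made up for illustration.

```python
import random
from statistics import mean

def make_folds(n, k, seed=0):
    """Shuffle the row indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_midpoint(train):
    """'Training' for the toy model: the midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (mean(pos) + mean(neg)) / 2

def accuracy(midpoint, bias, rows):
    """Share of rows where 'x >= midpoint + bias' agrees with the label."""
    return mean(1.0 if (x >= midpoint + bias) == (y == 1) else 0.0 for x, y in rows)

def grid_search_cv(rows, grid, k=5):
    """Score every candidate hyperparameter by k-fold CV; return the best one and all scores."""
    folds = make_folds(len(rows), k)
    cv_scores = {}
    for bias in grid:
        per_fold = []
        for held_out in folds:
            held = set(held_out)
            # Fit on the other k-1 folds, score on the held-out fold.
            train = [rows[i] for i in range(len(rows)) if i not in held]
            test = [rows[i] for i in held_out]
            per_fold.append(accuracy(fit_midpoint(train), bias, test))
        cv_scores[bias] = mean(per_fold)
    best = max(cv_scores, key=cv_scores.get)
    return best, cv_scores

# Synthetic stand-in data (an assumption, not study data): one numeric feature, two classes.
rng = random.Random(42)
rows = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)] + [(rng.gauss(3.0, 1.0), 1) for _ in range(50)]

best_bias, cv_scores = grid_search_cv(rows, grid=[-1.0, 0.0, 1.0])
print(best_bias, round(cv_scores[best_bias], 3))
```

    The point is only the shape of the procedure: each candidate setting is scored on data the model was not fitted on, and the setting with the best average held-out score is kept.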

    Conclusions
    The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses.
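
    The precision/recall trade-off mentioned above can be illustrated with a small sketch. The scores and labels below are invented for the example (they are not from the preprint), and precision_recall is a hypothetical helper, not part of the authors' code.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is called positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    # Convention chosen here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up model scores and true labels (1 = has the condition).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 1, 0, 1, 0, 1, 0]

strict = precision_recall(scores, labels, 0.75)   # high threshold
lenient = precision_recall(scores, labels, 0.25)  # low threshold
print(strict, lenient)
```

    With these toy numbers the high threshold flags only true cases but misses two of the five, while the low threshold catches all five at the cost of two false positives. That is the choice the abstract describes between selecting patients for study inclusion and screening for trials.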

    https://www.medrxiv.org/content/10.1101/2022.12.22.22283791v1
     
  2. rvallee

    rvallee Senior Member (Voting Rights)

    I genuinely don't have a clue what this is even about. They're checking their own filtered data and have nothing to compare it to, so it's circular checking: basically, they're checking what has already been checked. So far the 3 papers (by my count) published as part of the NIH RECOVER program have been useless at best, and seem built to influence rather than to actually research anything. This is research in the same way that looking at explorers' maps and notes is exploration.
     
  3. rvallee

    rvallee Senior Member (Voting Rights)

    Thinking further, they are making the massive mistake of taking a single snapshot in time. They think of symptoms as fixed and permanent: patients either develop a symptom or they don't, and when they do, it follows a linear course.

    I've continued to read LC forums every day. By now I've read tens of thousands of comments. It's impossible to miss the fluctuations and variations, not just between individuals but within individuals as well. The 2nd Body Politic study was pretty good at showing how some symptoms are staggered in time, and how they also varied.

    Some symptoms are more common early on, while some individuals may develop them months later. They can improve then worsen, sometimes but not always with a clear trigger, almost always exertion. Triggers are really important, especially as some cases develop after an nth infection, but also because they clearly influence symptoms: you can easily see how common reports are of temporary improvement in symptoms from having a minor cold, or of relapses and even some rare improvement from vaccines. Those are events in time, and they need to be factored in.

    This has to change; it's ruining everything. Time is a major dimension and it's ignored completely, other than in waiting for this to all blow over. There needs to be continuity in the individual-level data. Using random groups like this is not only massively wasteful, it loses almost all the information. People need to be followed over time; single-use cohorts are a complete waste of effort.

    This has been attempted from the start: clusters of symptoms. But the symptoms change, and most of the studies are not capturing those changes. Asking the same questions at different points in time will give different answers, simply because reality changes with time. This is something where they seriously need to involve engineers and other experts who specialize in differential analysis.

    Not sure I'm using the right term here, but there is a type of mathematical analysis that works great on data that change over time, whereas medicine usually uses statistical analysis that ignores time and considers features permanent. There possibly aren't even actual clusters, just probabilities. It sucks, but this is as big a difference as between classical mechanics and quantum mechanics. There is a fundamental uncertainty that simply cannot be worked around; it has to be taken into account. There has to be more than a plan A.

    And of course RECOVER has no mechanism for patient engagement with the public, so I have no idea where to even take that. Everyone needs to change that, but if at least someone can just stop fooling around and do useful stuff, anything useful at all, others will follow, if only from the sheer shock of being confronted with something that isn't entirely useless. All this is doing otherwise is contributing to the asymmetry of BS.
     
