Preprint A Proposed Explainable Artificial Intelligence-Based Machine Learning Model for Discriminative Metabolites for ME/CFS, 2023, Yagin et al

Discussion in 'ME/CFS research' started by John Mac, Jul 25, 2023.

  1. John Mac

    John Mac Senior Member (Voting Rights)

    Messages:
    959
    Full title:
    A Proposed Explainable Artificial Intelligence-Based Machine Learning Model for Discriminative Metabolites for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome

    Abstract
    Background:
    Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex and debilitating disease with a significant global prevalence of over 65 million individuals. It affects various systems, including the immune, neurological, gastrointestinal, and circulatory systems.

    Studies have shown abnormalities in immune cell types, increased inflammatory cytokines, and brain abnormalities. Further research is needed to identify consistent biomarkers and develop targeted therapies. A multidisciplinary approach is essential for diagnosing, treating, and managing this complex disease.

    The current study aims at employing explainable artificial intelligence (XAI) and machine learning (ML) techniques to identify discriminative metabolites for ME/CFS.

    Material and Methods:
    The present study used a metabolomics dataset of CFS patients and healthy controls, including 26 healthy controls and 26 ME/CFS patients aged 22-72. The dataset encapsulated 768 metabolites, classified into nine metabolic super-pathways: amino acids, carbohydrates, cofactors, vitamins, energy, lipids, nucleotides, peptides, and xenobiotics.

    Random forest-based feature selection and Bayesian Approach based-hyperparameter optimization were implemented on the target data. Four different ML algorithms [Gaussian Naive Bayes (GNB), Gradient Boosting Classifier (GBC), Logistic regression (LR) and Random Forest Classifier (RFC)] were used to classify individuals as ME/CFS patients and healthy individuals.

    XAI approaches were applied to clinically explain the prediction decisions of the optimum model. Performance evaluation was performed using the indices of accuracy, precision, recall, F1 score, Brier score, and AUC.

    Results:
    The metabolomics of C-glycosyltryptophan, oleoylcholine, cortisone, and 3-hydroxydecanoate were determined to be crucial for ME/CFS diagnosis. The RFC learning model outperformed GNB, GBC, and LR in ME/CFS prediction using the 1000 iteration bootstrapping method, achieving 98% accuracy, precision, recall, F1 score, 0.01 Brier score, and 99% AUC.

    Conclusion:
    RFC model proposed in this study correctly classified and evaluated ME/CFS patients through the selected biomarker candidate metabolites. The methodology combining ML and XAI can provide a clear interpretation of risk estimation for ME/CFS, helping physicians intuitively understand the impact of key metabolomics features in the model.

    https://www.preprints.org/manuscript/202307.1585/v1
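For readers unfamiliar with the performance indices listed in the abstract, here is a minimal sketch of how accuracy, precision, recall, F1, Brier score and AUC are conventionally computed from true labels and predicted probabilities. The function name `classification_metrics` and the 0.5 decision threshold are illustrative assumptions; this is not the authors' code.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """y_true holds 0/1 labels (1 = patient); y_prob holds predicted P(label == 1).

    Hypothetical helper for illustration only, not taken from the paper.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # patients predicted as patients
    fp = pairs.count((0, 1))  # controls predicted as patients
    fn = pairs.count((1, 0))  # patients predicted as controls
    tn = pairs.count((0, 0))  # controls predicted as controls

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Brier score: mean squared gap between predicted probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

    # AUC: share of (patient, control) pairs ranked correctly, ties count half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "brier": brier,
        "auc": auc,
    }
```

A Brier score of 0.01 therefore means the predicted probabilities sat very close to 0 or 1 and were almost always on the correct side.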
     
  2. Trish

    Trish Moderator Staff Member

    Messages:
    53,399
    Location:
    UK
    It looks like this is a data analysis project using metabolomics data from a repository. I can't open the supplementary material to find out the data source. The sample seems to be all female and, judging by the description in the discussion section, probably diagnosed using the ICC or CCC.

    The sample was 26 each of ME/CFS patients and healthy controls, from a wide age range. I don't know whether age, diet, time of day, etc. affect metabolite levels.

    I haven't attempted to follow all the details of what they did; it's way over my head. They used 80% of the sample to train the model and 20% to test it, and claim 98% accuracy. I'm not sure how that works with such small numbers of patients.
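A rough sketch of why a 20% hold-out from 52 people is hard to interpret. It assumes the test set has 10 or 11 subjects (the exact split size is not stated in the abstract) and uses the standard Wilson score interval; the function names are illustrative. Note that the abstract attributes the 98% figure to the bootstrap method rather than to a single hold-out split.

```python
from math import sqrt


def attainable_accuracies(n_test):
    """Every accuracy value a single test set of n_test subjects can produce."""
    return [k / n_test for k in range(n_test + 1)]


def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for an observed proportion."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Assumed test-set sizes for a 20% split of 52 subjects.
for n_test in (10, 11):
    low, high = wilson_interval(n_test, n_test)  # a perfect score
    print(f"n_test={n_test}: accuracy moves in steps of {1 / n_test:.3f}; "
          f"a perfect score has a 95% interval of about {low:.2f} to {high:.2f}")
```

With 11 test subjects, accuracy can only be a multiple of 1/11, so one misclassification already drops it to about 91%, and even a perfect score is compatible with a true accuracy in the mid-70s.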
     
  3. Hoopoe

    Hoopoe Senior Member (Voting Rights)

    Messages:
    5,265
    It seems too small a sample for the number of metabolites.
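To illustrate the concern, here is a small simulation under stated assumptions: 52 subjects, 768 features that are pure random noise, and a deliberately simple one-feature threshold classifier standing in for the real models. It makes no claim about how the authors ordered feature selection and validation; it only shows that picking the best of 768 noise features on 52 people can look predictive until it is checked on fresh data. All names here are hypothetical.

```python
import random


def fit_stump(column, y):
    """Threshold at the midpoint of the two class means; sign gives direction."""
    mean1 = sum(v for v, t in zip(column, y) if t == 1) / y.count(1)
    mean0 = sum(v for v, t in zip(column, y) if t == 0) / y.count(0)
    sign = 1 if mean1 >= mean0 else -1
    return (mean0 + mean1) / 2, sign


def stump_accuracy(column, y, threshold, sign):
    """Accuracy of predicting 1 when the value lies on the 'patient' side."""
    preds = [1 if sign * (v - threshold) > 0 else 0 for v in column]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def noise_experiment(n_per_group=26, n_features=768, seed=0):
    """Select the best of n_features noise columns, then retest on new noise."""
    rng = random.Random(seed)
    n = 2 * n_per_group
    y = [1] * n_per_group + [0] * n_per_group
    # Each inner list is one feature measured on all n subjects.
    seen = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]
    fresh = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]

    best = None
    for j, column in enumerate(seen):
        threshold, sign = fit_stump(column, y)
        acc = stump_accuracy(column, y, threshold, sign)
        if best is None or acc > best[0]:
            best = (acc, j, threshold, sign)

    apparent, j, threshold, sign = best
    # Same feature and rule, but applied to subjects never used for selection.
    honest = stump_accuracy(fresh[j], y, threshold, sign)
    return apparent, honest


apparent, honest = noise_experiment()
print(f"apparent accuracy on the data used for selection: {apparent:.2f}")
print(f"accuracy of the same rule on fresh noise:         {honest:.2f}")
```

The apparent accuracy comes out well above chance even though no feature carries any signal, while the fresh-data accuracy should hover around 50%. Keeping feature selection inside each validation fold is the usual guard against this.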
     
  4. dreampop

    dreampop Senior Member (Voting Rights)

    Messages:
    443
    While doing a little reading, I found this extremely useful table in an ME/CFS metabolomics review article from 2021. It lists the abnormal metabolite findings of various studies.

    Table

    Cortisone and 3-hydroxydecanoate were low in the first study in the table, and so was linoleoylcholine (not oleoylcholine, but probably related). That's from a Columbia study, here, which uses pretty good matching... but also the Fukuda criteria. That said, I'd like to think that author group was able to get a fairly accurate ME/CFS cohort.
     
  5. Andy

    Andy Committee Member

    Messages:
    22,308
    Location:
    Hampshire, UK
    "ME/CFS Metabolomics Dataset

    The metabolomics data of CFS patients and healthy controls were utilized to perform the experiments in the study [2]."

    Reference [2] is Germain A, Barupal DK, Levine SM, Hanson MR. Comprehensive circulatory metabolomics in ME/CFS reveals disrupted metabolism of acyl lipids and steroids. Metabolites. 2020. Our discussion thread for that is here: Comprehensive Circulatory Metabolomics in ME/CFS Reveals Disrupted Metabolism of Acyl Lipids and Steroids: Levine, Hanson et al 2020

    Selection criteria: "Out of the 52 female subjects who participated in this study, 26 were healthy controls, while the remaining 26 were established patients of a ME/CFS specialist in New York City (NYC), Susan Levine, M.D. All patients had a confirmed and rigorous diagnosis of ME/CFS according to the CDC criteria"
     
  6. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,511
    Location:
    UK
    I don't really get how they can make claims with such a small sample set.
     
  7. CRG

    CRG Senior Member (Voting Rights)

    Messages:
    1,857
    Location:
    UK
    Re: the claim of a 65 million global patient population, this figure, which comes via an earlier paper by the same authors, is based (numbers not explicit) on this paper: Estimating Prevalence, Demographics, and Costs of ME/CFS Using Large Scale Medical Claims Data and Machine Learning, discussed here: https://www.s4me.info/threads/estim...ne-learning-2018-valdez-proskauer-et-al.7279/

    Number bloat is a problem: the Estimating Prevalence paper would put the UK ME/CFS patient population at 500,000, yet DecodeME is struggling to find a twentieth of that. The 65 million figure has no material implication for the Metabolites paper, but placing headline figures into the wider academic discourse without referencing their contestability is problematic. There's also an issue of consistency: do the Hanson team, in addition to accepting the high prevalence number, also accept the low male/female ratio that the Estimating Prevalence paper evidences? That surely has implications for any model of the illness assessed by metabolomics, e.g. Sex differences in the human metabolome
     
    Last edited: Jul 26, 2023
  8. Sly Saint

    Sly Saint Senior Member (Voting Rights)

    Messages:
    9,626
    Location:
    UK
  9. Andy

    Andy Committee Member

    Messages:
    22,308
    Location:
    Hampshire, UK
    Published as
    An Explainable Artificial Intelligence Model Proposed for the Prediction of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome and the Identification of Distinctive Metabolites

    Abstract

    Background: Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex and debilitating illness with a significant global prevalence, affecting over 65 million individuals. It affects various systems, including the immune, neurological, gastrointestinal, and circulatory systems. Studies have shown abnormalities in immune cell types, increased inflammatory cytokines, and brain abnormalities. Further research is needed to identify consistent biomarkers and develop targeted therapies. This study uses explainable artificial intelligence and machine learning techniques to identify discriminative metabolites for ME/CFS.

    Material and Methods: The model investigates a metabolomics dataset of CFS patients and healthy controls, including 26 healthy controls and 26 ME/CFS patients aged 22–72. The dataset encapsulated 768 metabolites into nine metabolic super-pathways: amino acids, carbohydrates, cofactors, vitamins, energy, lipids, nucleotides, peptides, and xenobiotics. Random forest methods together with other classifiers were applied to the data to classify individuals as ME/CFS patients and healthy individuals. The classification learning algorithms’ performance in the validation step was evaluated using a variety of methods, including the traditional hold-out validation method, as well as the more modern cross-validation and bootstrap methods. Explainable artificial intelligence approaches were applied to clinically explain the optimum model’s prediction decisions.

    Results: The metabolomics of C-glycosyltryptophan, oleoylcholine, cortisone, and 3-hydroxydecanoate were determined to be crucial for ME/CFS diagnosis. The random forest model outperformed the other classifiers in ME/CFS prediction using the 1000-iteration bootstrapping method, achieving 98% accuracy, precision, recall, F1 score, 0.01 Brier score, and 99% AUC. According to the obtained results, the bootstrap validation approach demonstrated the highest classification outcomes.

    Conclusion: The proposed model accurately classifies ME/CFS patients based on the selected biomarker candidate metabolites. It offers a clear interpretation of risk estimation for ME/CFS, aiding physicians in comprehending the significance of key metabolomic features within the model.
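One possible reason bootstrap validation can report the highest scores: a bootstrap resample of 52 subjects contains most of the original subjects. The abstract does not say whether each bootstrap model was scored only on the subjects left out of its resample, so the sketch below is an assumption-laden illustration and not a description of the authors' procedure. It estimates what fraction of the original sample appears in a typical resample; the function names are illustrative.

```python
import random


def expected_overlap(n):
    """Probability that a given subject appears in a bootstrap resample of size n."""
    return 1 - (1 - 1 / n) ** n


def simulated_overlap(n, iterations=1000, seed=0):
    """Average fraction of distinct original subjects per bootstrap resample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        # Draw n subjects with replacement and keep the distinct ones.
        resample = {rng.randrange(n) for _ in range(n)}
        total += len(resample) / n
    return total / iterations


n_subjects = 52  # sample size stated in the abstract
print(f"expected overlap:  {expected_overlap(n_subjects):.3f}")
print(f"simulated overlap: {simulated_overlap(n_subjects):.3f}")
```

If a model trained on a resample were then scored on the full original sample, roughly 64% of the scored subjects would already have been seen during training, which inflates accuracy. Scoring only on the out-of-bag subjects avoids that overlap.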
     
  10. Creekside

    Creekside Senior Member (Voting Rights)

    Messages:
    1,039
    I wonder how reliable the classification is when presented with unhealthy controls. Those metabolites might be commonly abnormal in people "who aren't feeling well".
     
