A generalist medical language model for disease diagnosis assistance, 2025, Liu et al.

Discussion in 'Other health news and research' started by SNT Gatchaman, Jan 8, 2025.

  1. SNT Gatchaman

    A generalist medical language model for disease diagnosis assistance
    Liu, Xiaohong; Liu, Hao; Yang, Guoxing; Jiang, Zeyu; Cui, Shuguang; Zhang, Zhaoze; Wang, Huan; Tao, Liyuan; Sun, Yongchang; Song, Zhu; Hong, Tianpei; Yang, Jin; Gao, Tianrun; Zhang, Jiangjiang; Li, Xiaohu; Zhang, Jing; Sang, Ye; Yang, Zhao; Xue, Kanmin; Wu, Song; Zhang, Ping; Yang, Jian; Song, Chunli; Wang, Guangyu

    The delivery of accurate diagnoses is crucial in healthcare and represents the gateway to appropriate and timely treatment. Although recent large language models (LLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, their effectiveness in clinical diagnosis remains unproven.

    Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical text and real-world clinical records. We further fine-tuned MedFound to learn physicians’ inferential diagnosis with a self-bootstrapping strategy-based chain-of-thought approach and introduced a unified preference alignment framework to align it with standard clinical practice. Extensive experiments demonstrate that our medical LLM outperforms other baseline LLMs and specialized models in in-distribution (common diseases), out-of-distribution (external validation) and long-tailed distribution (rare diseases) scenarios across eight specialties. Further ablation studies indicate the effectiveness of key components in our medical LLM training approach.

    We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis, involving an artificial intelligence (AI)-versus-physician comparison, an AI-assistance study and a human evaluation framework. Our proposed framework incorporates eight clinical evaluation metrics, covering capabilities such as medical record summarization, diagnostic reasoning and risk management.

    Our findings demonstrate the model’s feasibility in assisting physicians with disease diagnosis as part of the clinical workflow.

    Link | PDF (Nature Medicine)
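    The abstract only sketches the training recipe (chain-of-thought fine-tuning on physicians' diagnostic reasoning, followed by preference alignment), but the general pattern is standard supervised fine-tuning of a causal LM on reasoning-annotated records. Below is a purely illustrative sketch using Hugging Face transformers; the model name, record fields and hyperparameters are my own placeholders, not details from the paper, and the preference-alignment stage isn't shown.

    ```python
    # Illustrative only: supervised chain-of-thought fine-tuning of a causal LM on
    # (clinical note, diagnostic rationale, diagnosis) records. Names and settings
    # are placeholders, not taken from the MedFound paper.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    MODEL = "gpt2"  # small stand-in; MedFound itself is a 176B-parameter model
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(MODEL)

    def to_features(rec):
        # Hypothetical record fields: free-text note, step-by-step reasoning, final label.
        text = (f"Clinical record:\n{rec['note']}\n\n"
                f"Reasoning:\n{rec['rationale']}\n\n"
                f"Diagnosis: {rec['diagnosis']}{tokenizer.eos_token}")
        return tokenizer(text, truncation=True, max_length=1024)

    dataset = load_dataset("json", data_files="diagnosis_cot.jsonl", split="train")
    tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="medlm-cot-sft", per_device_train_batch_size=2,
                               num_train_epochs=1, learning_rate=2e-5),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    ```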
     
  2. SNT Gatchaman

     
  3. SNT Gatchaman

    But NB —

    Medical large language models are vulnerable to data-poisoning attacks (2025)
    Alber, Daniel Alexander; Yang, Zihao; Alyakin, Anton; Yang, Eunice; Rai, Sumedha; Valliani, Aly A.; Zhang, Jeff; Rosenbaum, Gabriel R.; Amend-Thomas, Ashley K.; Kurland, David B.; Kremer, Caroline M.; Eremiev, Alexander; Negash, Bruck; Wiggan, Daniel D.; Nakatsuka, Michelle A.; Sangwon, Karl L.; Neifert, Sean N.; Khan, Hammad A.; Save, Akshay Vinod; Palla, Adhith; Grin, Eric A.; Hedman, Monika; Nasir-Moin, Mustafa; Liu, Xujin Chris; Jiang, Lavender Yao; Mankowski, Michal A.; Segev, Dorry L.; Aphinyanaphongs, Yindalon; Riina, Howard A.; Golfinos, John G.; Orringer, Daniel A.; Kondziolka, Douglas; Oermann, Eric Karl

    The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation.

    Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%).

    Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety.

    Link | PDF (Nature Medicine) [Open Access]
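    For anyone wondering what the knowledge-graph screening step might look like in practice, here's a toy sketch of the idea (claims extracted from model output are checked against hard-coded relations). It's my own illustration, not the authors' code; the graph contents, triples and extractor are made-up placeholders.

    ```python
    # Toy illustration of knowledge-graph screening of LLM output: extracted
    # (subject, relation, object) claims are checked against a set of known-good
    # biomedical relations. All triples here are invented placeholders.

    KNOWLEDGE_GRAPH = {
        ("metformin", "treats", "type 2 diabetes"),
        ("warfarin", "interacts_with", "aspirin"),
    }

    def extract_triples(llm_output: str):
        """Stand-in for a real biomedical relation-extraction pipeline."""
        # Hard-coded for the demo; a real system would parse llm_output.
        return [("metformin", "treats", "type 1 diabetes")]

    def screen(llm_output: str):
        """Flag any extracted claim not supported by the knowledge graph."""
        unsupported = [t for t in extract_triples(llm_output) if t not in KNOWLEDGE_GRAPH]
        return {"passes": not unsupported, "unsupported_claims": unsupported}

    print(screen("Metformin is first-line therapy for type 1 diabetes."))
    # -> {'passes': False, 'unsupported_claims': [('metformin', 'treats', 'type 1 diabetes')]}
    ```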


    (Not sure what this all means when the "deliberately planted misinformation" is widely understood to be the gold standard of evidence-based medicine.)
     
