OpenEvidence: an AI resource from Mayo Clinic and NEJM (currently with misleading advice)

Discussion in 'USA clinics and doctors' started by rvallee, Feb 20, 2025.

  1. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    14,246
    Location:
    Canada
    A good example of an AI platform that is obviously tuned and biased towards a cherry-picked editorial viewpoint of ME/CFS. I'm not sure where to place this; I went with the fact that it seems to be built by Mayo.

    The platform is marketed as "the leading medical information platform", which is doubtful. http://www.openevidence.com/

    I'm not sure what access there is for the public. It says free and unlimited for professionals, and there is a Reddit discussion where some users asked questions; they appear to be limited to X per day, or something like it. The answers are very misleading and promote the company line. A great example of misusing AI.

    There is, however, a way to report problems with the information. Considering that it has obviously been tuned to promote psychosomatic pseudoscience, I don't have much hope. The editorial slant is blatant, IMO.

    There is a Reddit thread with some discussion, including some additional questions and answers posted by users.
     
    Last edited: Feb 20, 2025
  2. Yann04

    Yann04 Senior Member (Voting Rights)

    Messages:
    1,700
    Location:
    Romandie (Switzerland)
    GIGO
     
    alktipping likes this.
  3. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    14,246
    Location:
    Canada
    To clarify: I doubt it has been trained specifically to think this about ME/CFS. It's probably a general bias, not a specific one: the biases come from what the platform gets fed and from the set of instructions that make it weigh some evidence and discard the rest.

    By contrast, the main general LLMs like ChatGPT do far better. They would have to be trained incorrectly, with medical biases, to get things like this wrong.
     
    alktipping, Peter Trewhitt and Yann04 like this.
  4. Utsikt

    Utsikt Senior Member (Voting Rights)

    Messages:
    1,143
    Location:
    Norway
    @rvallee bias in AI always means how far off the model is from the training data.

    Bias in AI has nothing to do with how closely the training data matches the ‘true’ world or how we want the world to be.

    That being said, I’m assuming this is a training data issue and not a tuning issue, i.e. the data doesn’t represent the current knowledge of ME/CFS.
     
    alktipping and Peter Trewhitt like this.
  5. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    14,246
    Location:
    Canada
    Notable that some comments on the Reddit thread mention getting different results, likely because of how they framed the question. You actually find similar issues with the big LLMs, where you get different results depending on whether you use CFS/ME vs ME/CFS vs myalgic encephalomyelitis, for example.

    And of course we can pretty much expect most MDs to use 'chronic fatigue' and to ask poor questions, since asking good ones takes a bit of training, and a genuine interest in getting accurate information rather than validating one's opinions.
     
    alktipping, Yann04 and Peter Trewhitt like this.
  6. Utsikt

    Utsikt Senior Member (Voting Rights)

    Messages:
    1,143
    Location:
    Norway
    Pretty much all LLMs are tuned to please the user, to be agreeable. I don’t know how that is for this model.

    The prompts are also extremely important. If you look at research using LLMs to complete tasks, the prompts can be many pages long and take a long time to develop.

    This is the prompt from a recent Norwegian AI study that tried to get the AI to answer simple medical questions. The prompt is under the header ‘Instruksjoner’; a sketch of how such a prompt is wired in follows below.
    https://github.com/MMIV-ML/helseveileder/blob/main/README.md#instruksjonene
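
    (As an illustration of how a long instruction text like that is typically wired in - as the "system" message of a chat completion - here is a minimal sketch using OpenAI's Python SDK. The model name, file name and question are my assumptions, not details from the study:)

    Code:
    # Load a multi-page instruction prompt and pass it as the system message.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    with open("instruksjoner.txt", encoding="utf-8") as f:
        system_prompt = f.read()  # the long instruction text, e.g. the 'Instruksjoner' section

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "What is the recommended work-up for long-lasting fatigue?"},
        ],
    )
    print(response.choices[0].message.content)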
     
    Peter Trewhitt, alktipping and Yann04 like this.
  7. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,838
    Location:
    UK
    My guess would be they have an LLM running alongside a RAG db containing various medical data. The flow is roughly: embed the user's query, find the closest matching text chunks in the RAG db, and put all of that (plus the chat history) into a prompt that is passed to the LLM. So the bias is probably in the data provided.
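
    (A rough sketch of what that retrieve-then-prompt flow could look like - a minimal illustration only; the chunk texts, embedding model and the call_llm stub are placeholders, not anything OpenEvidence has published:)

    Code:
    # Minimal retrieval-augmented generation (RAG) sketch: embed the query,
    # find the closest document chunks by cosine similarity, and put them
    # (plus chat history) into the prompt passed to the LLM.
    import numpy as np
    from sentence_transformers import SentenceTransformer  # assumed embedding model

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    # Hypothetical "RAG db": in practice a vector store holding the curated medical corpus.
    chunks = [
        "Chunk of guideline text A ...",
        "Chunk of review article B ...",
        "Chunk of editorial C ...",
    ]
    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Return the k chunks whose embeddings are closest to the query."""
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = chunk_vecs @ q  # cosine similarity (vectors are normalised)
        top = np.argsort(scores)[::-1][:k]
        return [chunks[i] for i in top]

    def build_prompt(query: str, history: list[str]) -> str:
        """Assemble the final prompt: retrieved context + chat history + question."""
        context = "\n\n".join(retrieve(query))
        history_text = "\n".join(history)
        return (
            "Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Chat history:\n{history_text}\n\n"
            f"Question: {query}"
        )

    # call_llm(build_prompt(...)) would be whatever model endpoint sits behind the
    # platform; the point is that whatever lands in `context` (i.e. what the db was
    # fed) largely determines the answer.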

    I've asked a few SLMs (small language models) about ME (without additional RAG info in the prompt). I think (but I didn't keep notes) Phi3.5/4 and Llama3.x were OK; DeepSeek-R1 7b was poor (though I found that was generally the case with it), and a larger version (70b) was more coherent. They seemed to know about the IoM report.
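
    (For comparing answers across local models, something like this works - assuming the models are pulled and served locally via Ollama's default HTTP API; the model tags here are just illustrative:)

    Code:
    # Put the same ME/CFS question to several local models and compare the answers.
    import requests

    QUESTION = "What is known about the diagnosis and management of ME/CFS?"

    for model in ["phi3.5", "llama3.1", "deepseek-r1:7b"]:
        r = requests.post(
            "http://localhost:11434/api/generate",  # Ollama's local endpoint
            json={"model": model, "prompt": QUESTION, "stream": False},
            timeout=600,
        )
        print(f"=== {model} ===")
        print(r.json()["response"][:500])  # first 500 characters of each answer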

    There are some LLMs fine-tuned on medical data - BioMistral, for example - but I wasn't convinced it was good.

    With the new training techniques from DeepSeek, it may be easier to train for medical diagnosis etc. if there are resources that could be used for reinforcement learning.
     
    Medfeb, Yann04 and Peter Trewhitt like this.
  8. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,838
    Location:
    UK
    I asked the deepseek-r1:60b model about ME diagnosis and treatment and got this:
    (Note this is a chain-of-reasoning model, so it thinks first (between the <think></think> tags) and then produces the result.)
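
    (For anyone parsing the raw output of these models, a small sketch of separating the thinking from the answer - plain regex, nothing model-specific:)

    Code:
    # Split a reasoning model's raw output into its <think> block and the final answer.
    import re

    def split_reasoning(raw: str) -> tuple[str, str]:
        """Return (thinking, answer) from output containing one <think>...</think> block."""
        m = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
        if not m:
            return "", raw.strip()
        return m.group(1).strip(), raw[m.end():].strip()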

     
    Peter Trewhitt likes this.
