A good example of an AI platform that is obviously tuned and biased towards a cherry-picked editorial viewpoint of ME/CFS. I'm not sure where to place this; I went with the fact that it seems to be built by Mayo. The platform is marketed as "the leading medical information platform", which is doubtful. http://www.openevidence.com/

I'm not sure what access there is for the public. It says free and unlimited for professionals, and there is a Reddit thread with some discussion, including additional questions and answers from users; they appear to be limited to X per day, or something like it. The answers are very misleading and promote the company line. A great example of misusing AI.

There is, however, a way to report problems with the information. Considering that it's been obviously tuned to promote psychosomatic pseudoscience, I don't have much hope. The editorial slant is blatant, IMO.
To clarify: I doubt it has been trained specifically to think this of ME/CFS. It's probably a general bias, not a specific one: the bias comes from what the platform gets fed and from the set of instructions that make it weigh some evidence and discard other evidence. By contrast, the main general LLMs like ChatGPT do far better. They would need to be trained incorrectly, with medical biases, to get things like this wrong.
@rvallee bias in AI always means how far off the model is from the training data. Bias in AI has nothing to do with how closely the training data matches the ‘true’ world or how we want the world to be. That being said, I’m assuming this is a training data issue and not a tuning issue, i.e. the data doesn’t represent the current knowledge of ME/CFS.
Notable that some comments on the Reddit thread mention getting different results, likely because of how they framed the question. You actually find similar issues with the big LLMs, where you get different results depending on whether you write CFS/ME, ME/CFS, or myalgic encephalomyelitis, for example. And of course we can pretty much expect most MDs to use 'chronic fatigue' and to ask poor questions, since it takes a bit of training to ask good ones, and there is a difference between wanting accurate information and wanting to validate one's opinions.
Pretty much all LLMs are tuned to please the user, to be agreeable. I don’t know how that is for this model. The prompts are also extremely important. If you look at research using LLMs to complete tasks, the prompts can be many pages long and take a long time to develop. This is the prompt from a recent Norwegian AI study that tried to get the AI to answer simple medical questions; the prompt is under the header ‘Instruksjoner’: https://github.com/MMIV-ML/helseveileder/blob/main/README.md#instruksjonene
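For illustration, here is a minimal sketch of how that kind of long instruction block is typically passed to a model as a "system" message alongside the user's question. It assumes the OpenAI Python client purely as an example (other providers work much the same way), and the instruction text is a short made-up stand-in, not the study's actual ‘Instruksjoner’ prompt:

```python
from openai import OpenAI

# Made-up stand-in for the kind of multi-page instruction block used in
# studies like the Norwegian one linked above; the real prompt is far longer.
SYSTEM_INSTRUCTIONS = """\
You are a medical information assistant.
- Answer in plain language a patient can understand.
- Base answers on current clinical guidelines and say when evidence is uncertain.
- Do not give a diagnosis; advise the user to see a clinician for that.
"""

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # any chat model; named here only for illustration
    messages=[
        {"role": "system", "content": SYSTEM_INSTRUCTIONS},
        {"role": "user", "content": "What are the diagnostic criteria for ME/CFS?"},
    ],
)
print(response.choices[0].message.content)
```

The point is that everything in that system block shapes the answer before the user ever types a word, which is why those instructions take so long to develop and test.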
My guess would be they have an LLM running alongside a RAG database containing various medical data. The flow is roughly: embed the user's query, find the closest matching text chunks in the RAG database, and put all of that (plus the chat history) into a prompt that is passed to the LLM. So the bias is probably in the data provided.

I've asked a few SLMs (small language models) about ME (without additional RAG info in the prompt). I think (but I didn't keep notes) Phi-3.5/4 and Llama 3.x were OK; DeepSeek-R1 7B was poor (but I found that was generally the case), and a larger version (70B) is more coherent. They seemed to know about the IOM report. There are some LLMs fine-tuned with medical data, BioMistral for example, but I wasn't convinced it was good. With the new training techniques from DeepSeek it may be easier to train for medical diagnosis etc. if there are resources that could be used for reinforcement learning.
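As a rough illustration of that guessed-at flow, here is a minimal, self-contained sketch: a toy bag-of-words "embedding" and cosine similarity stand in for a real embedding model and vector database, and the chunk texts and prompt template are made up for the example, not anything from OpenEvidence:

```python
import math
import re
from collections import Counter

# Placeholder chunks standing in for a RAG database of medical text.
# In a real system, which sources end up here is where bias creeps in.
CHUNKS = [
    "Placeholder chunk about ME/CFS diagnostic criteria from one source.",
    "Placeholder chunk about ME/CFS management guidance from another source.",
    "Placeholder chunk about an unrelated condition.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system uses a neural embedding model."""
    return Counter(re.findall(r"[a-z0-9/]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Embed the query and return the k closest-matching chunks."""
    q = embed(query)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, history: list[str]) -> str:
    """Stuff retrieved chunks plus chat history into the prompt sent to the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query))
    past = "\n".join(history)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Chat history:\n{past}\n\n"
        f"Question: {query}\n"
    )

print(build_prompt("What are the ME/CFS diagnostic criteria?", history=[]))
```

Whatever the LLM says downstream is then largely determined by which chunks get retrieved, which is why the selection of source material matters so much.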
I asked the deepseek-r1:70b model about ME diagnosis and treatment and got the response below. (Note: this is a chain-of-thought reasoning model, so it thinks first, between the <think></think> tags, and then produces the result.)