Idea: Web app to compile all ME/CFS study test results

Discussion in 'General ME/CFS discussion' started by forestglip, Sep 6, 2024.

  1. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,245
    I downloaded every abstract from PubMed matching the search term "chronic fatigue syndrome" (in quotes). I wrote a script that sends each abstract, one by one, to the Claude API and asks whether it is original research on ME/CFS. Here's the prompt:

    I sent 121 abstracts so far as a test. It's a bit expensive. There are about 8,400 abstracts. It'll cost about $40-50 to get responses for all of them. There's another Claude model that is 10 times cheaper, but the answers I was getting weren't making much sense. This model seems pretty good at making decisions. A lot of the cost is the length of the prompt above, so I might have to figure out a way to shorten it without it losing accuracy.
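
    A rough way to see how much the prompt length drives the total is to estimate the cost from token counts. This is only a sketch: the function name, the token counts and the per-million-token prices are all placeholder assumptions, not measured values from the script.

```python
def estimate_cost_usd(n_requests, prompt_tokens, abstract_tokens, output_tokens,
                      usd_per_m_input, usd_per_m_output):
    """Estimate total API cost when every request repeats the same prompt."""
    input_tokens = n_requests * (prompt_tokens + abstract_tokens)
    total_output_tokens = n_requests * output_tokens
    return (input_tokens * usd_per_m_input
            + total_output_tokens * usd_per_m_output) / 1_000_000


# All token counts and prices below are illustrative placeholders (assumptions).
full_prompt = estimate_cost_usd(8400, 1200, 400, 60, 3.0, 15.0)
short_prompt = estimate_cost_usd(8400, 300, 400, 60, 3.0, 15.0)
print(f"full prompt:  ${full_prompt:.2f}")
print(f"short prompt: ${short_prompt:.2f}")
```

    Because the instructions are resent with every abstract, shortening them reduces the input cost of all 8,400 requests at once.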

    Anyway, here are the results of the first few. I attached a text file with all the abstract responses I've gotten so far.

     

    Peter Trewhitt and hotblack like this.
  2. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    393
    Location:
    UK
    Interesting, are you using Sonnet or Opus? If you don’t mind rate limits, maybe it’s worth seeing what Gemini comes up with; I’ve been doing some simple experiments (unrelated to this) using it. Gemini 1.5 Flash may not be powerful enough and, with its low limits, 1.5 Pro would take too long, but 1.0 Pro may fit the bill if it produces good enough results and you can schedule the work over a week.
    https://ai.google.dev/pricing

    And perhaps an obvious or silly question, but have you tried any of the local models to see if they’re up to the task? It depends on what hardware you’ve got access to. I’ve only experimented with small (2B) models for basic tasks, which wouldn’t be up to this, but maybe the larger models would be if you can run them?
     
    Last edited: Sep 7, 2024
    Peter Trewhitt likes this.
  3. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,245
    This was with Sonnet. Opus would cost about 5 times more.

    Interesting, looks like for the free tier of 1.0 Pro, I can do all 8,400 in about 10 hours. I'm not too hopeful it'll be much better than Claude Haiku, which was pretty bad, but I'll try with the same 121 abstracts.
     
    Peter Trewhitt and hotblack like this.
  4. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,245
    I don't have a GPU on my computer. I once tried the biggest model that could comfortably run on my laptop, and it was both incredibly slow and pretty bad at understanding compared to the biggest ones.
     
    Peter Trewhitt and hotblack like this.
  5. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,245
    Oh I wasn't looking carefully at the rate limits. There's also a daily limit of 1500 requests, so it would be closer to a week. But it's also a lot cheaper if I just want to pay and do it quickly. Somewhere around $5-10 for all of them, which isn't that bad.
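
    The two estimates can be checked with a couple of lines of arithmetic. In this sketch the 1,500 requests per day comes from the post above, while the 15 requests per minute is an assumed free-tier limit and the function name is hypothetical.

```python
import math


def batch_time(n_requests, per_minute, per_day):
    """Return (hours if only the per-minute limit applied, days under the daily cap)."""
    hours = n_requests / per_minute / 60
    days = math.ceil(n_requests / per_day)
    return hours, days


# per_minute=15 is an assumption; per_day=1500 is the daily limit mentioned above.
hours, days = batch_time(8400, per_minute=15, per_day=1500)
print(f"{hours:.1f} hours ignoring the daily cap, {days} days with it")
```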

    Anyway, I ran the same 121 abstracts through Gemini Pro 1.0. It had a different opinion on 28 of them, 9 of which were a flip from YES to NO or vice versa. For the rest, one of the models said MAYBE.
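
    A comparison like this can be tallied with a small helper. This is a minimal sketch assuming each model's answers are stored as a dict from abstract ID to YES/NO/MAYBE; the IDs and labels in the example are made up for illustration.

```python
from collections import Counter


def compare_labels(labels_a, labels_b):
    """Count agreements, YES/NO flips, and mismatches involving MAYBE."""
    counts = Counter()
    for key in labels_a.keys() & labels_b.keys():
        a, b = labels_a[key], labels_b[key]
        if a == b:
            counts["agree"] += 1
        elif {a, b} == {"YES", "NO"}:
            counts["flip"] += 1
        else:
            counts["maybe_involved"] += 1
    return counts


# Hypothetical IDs and labels, not real results.
claude = {"a1": "YES", "a2": "NO", "a3": "MAYBE", "a4": "YES"}
gemini = {"a1": "YES", "a2": "YES", "a3": "NO", "a4": "MAYBE"}
counts = compare_labels(claude, gemini)
print(dict(counts))
```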

    Gemini said this for one: "YES: The study is a meta-analysis of six randomized controlled trials, pooling data to investigate whether cognitive behavioral therapy (CBT) effectiveness is moderated by depressive symptoms in patients with ME/CFS. This analysis is original research and focuses specifically on ME/CFS."

    Even though the prompt explicitly says: "NO: If it describes a review, meta-analysis, other non-original research, or does not specifically focus on CFS/ME/ME-CFS."

    I think it might have to be a better model, or there will be a lot of mistakes like this.

    Here are the first five that didn't match, along with each model's explanation. There's a character limit in these posts, so all 28 are in an attached text file.

     

    Last edited: Sep 7, 2024
    Peter Trewhitt and hotblack like this.
  6. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,245
    Oh, I can filter that list of studies down on PubMed to just clinical trials. Does that include everything that would have tested something? Not sure. But it comes down from 8,426 to 516. Much better.

    My plan is to try the embedding approach. Once Claude has told me which of these match the criteria, as above, I'll try to get the full text of each match somehow, send that to Claude, and ask it to list every test and result in detail. Depending on how long the papers are and how many there are, that step might still turn out to be very expensive; we'll see.
     
    Peter Trewhitt and hotblack like this.
  7. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,245
    Oops, I had used Gemini 1.5 Flash previously, not 1.0 Pro. I just ran it again with 1.0 Pro, and it's worse: 37 mismatches out of 120. Here are just a few in case anyone is interested, but I'm just going to use Claude Sonnet.

    Edit: Oh right, I need to filter for more than clinical trials to get tests like serotonin levels. But no filter seems to include the deep phenotyping study. I'll probably have to filter backwards by downloading them all and eliminating the ones that are tagged as reviews, commentaries, etc.

    Edit 2: No, this doesn't seem as straightforward as I had hoped. A couple main issues:

    1. I don't think this will work for interventions, at least not for getting a nice binary result for the heatmap showing whether an intervention made things better or worse, since there can be multiple outcomes per intervention. Maybe this would be okay just for observational studies.

    2. It's expensive to give it full studies, and it's not very good at following instructions perfectly if the text is really long. It was about 5 cents for just the methods and results sections of a random study. If I'm doing 1000 studies, that's $50, more if there are significantly longer ones.

    I may have burned out my brain for a while too. So I'll leave this alone for now, I think. I still think there might be something cool if I could get a bunch of data like this:

    <test>SF-36 Physical Function score</test><result>increased</result>
    <test>xanthine metabolism compounds in urine samples</test><result>increased</result>

    And make a map where it groups similar items together (e.g. serotonin would be closer to dopamine than to symptom questionnaire) and makes items that are increased one color, like blue dots, and items that are decreased red dots. If you see a lot of blue dots clumped together in one spot, or red dots clumped together in one spot, you can zoom in and see that many somewhat similar tests have gotten the same result.
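
    Getting from tagged output like the two lines above to coloured dots could start with something like this. It is a sketch only: it covers parsing and colouring, not the grouping of similar items, and the grey fallback for other results is an assumption.

```python
import re

# Matches one <test>...</test><result>...</result> pair, as in the lines above.
PAIR = re.compile(r"<test>(.*?)</test>\s*<result>(.*?)</result>", re.DOTALL)

# Colour scheme from the post: increased = blue, decreased = red.
COLORS = {"increased": "blue", "decreased": "red"}


def parse_findings(text):
    """Return (test name, direction) tuples found in tagged model output."""
    return [(test.strip(), result.strip().lower())
            for test, result in PAIR.findall(text)]


def dot_color(direction):
    """Colour for a map dot; any other result is grey here (an assumption)."""
    return COLORS.get(direction, "grey")


sample = (
    "<test>SF-36 Physical Function score</test><result>increased</result>\n"
    "<test>xanthine metabolism compounds in urine samples</test><result>increased</result>\n"
)
findings = parse_findings(sample)
for name, direction in findings:
    print(dot_color(direction), name)
```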

    I'm not even sure this would work as well as I hope. Anyway, maybe a project for the future or for someone else.
     
    Last edited: Sep 7, 2024
  8. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    393
    Location:
    UK
    @forestglip Thanks for sharing your progress, comparisons and results. I’m not surprised your brain is a little fried, mine has been just from following! There are some really interesting ideas here.
     
  9. hotblack

    hotblack Senior Member (Voting Rights)

    Messages:
    393
    Location:
    UK
  10. Nightsong

    Nightsong Senior Member (Voting Rights)

    Messages:
    806
    On cost: you don't actually have to use the provided APIs. When ChatGPT first came out I wrote a quick Python script to interact with it using browser instrumentation (Selenium/ChromeDriver with a few modifications) - much cheaper!

    Also occurs to me that the hallucination risk might be reduced by using ensemble (e.g. consensus of multiple LLMs) or cross-verification (where one LLM evaluates the output of another LLM for correctness) methods.
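
    The consensus idea could be as simple as a majority vote over each model's label, with anything unclear sent to a human. A minimal sketch, where the function name, the threshold and the NEEDS_REVIEW label are all assumptions:

```python
from collections import Counter


def consensus(votes, min_agreement=2):
    """Return the majority label, or NEEDS_REVIEW if there is no clear winner."""
    if not votes:
        return "NEEDS_REVIEW"
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if top_count < min_agreement or tied:
        return "NEEDS_REVIEW"
    return top_label


print(consensus(["YES", "YES", "MAYBE"]))
print(consensus(["YES", "NO", "MAYBE"]))
```

    The first call returns YES and the second returns NEEDS_REVIEW, since no label reaches two votes.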

    Lots of interesting ideas on this thread. I've no energy to take on a project like this but hope someone picks it up and runs with it.
     
  11. kasi-leko

    kasi-leko New Member

    Messages:
    2
    It seems that the PubMed format has a field PT ("publication type") that indicates if the article is a review: https://pubmed.ncbi.nlm.nih.gov/help/#pt (the list: https://pubmed.ncbi.nlm.nih.gov/help/#publication-types) so you don't have to use an LLM for that. These would also give the answer to your question 1.
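
    Filtering on that field could look roughly like this. It is a sketch assuming records exported in the tagged PubMed text format (a four-character tag, then "- ", with indented continuation lines and a blank line between records); the set of excluded types is an illustrative choice and the sample records are made up.

```python
from collections import defaultdict

# Publication types to drop; an illustrative choice, adjust as needed.
EXCLUDED_TYPES = {"Review", "Systematic Review", "Meta-Analysis",
                  "Comment", "Editorial", "Letter"}


def parse_records(text):
    """Split tagged PubMed text into dicts mapping field tag -> list of values."""
    records, current, last_tag = [], defaultdict(list), None
    for line in text.splitlines():
        if not line.strip():                 # blank line ends a record
            if current:
                records.append(dict(current))
            current, last_tag = defaultdict(list), None
        elif line[4:6] == "- ":              # new field, e.g. "PT  - Review"
            last_tag = line[:4].strip()
            current[last_tag].append(line[6:].strip())
        elif last_tag is not None:           # continuation of the previous field
            current[last_tag][-1] += " " + line.strip()
    if current:
        records.append(dict(current))
    return records


def is_candidate(record):
    """True if none of the record's publication types are excluded."""
    return not EXCLUDED_TYPES & set(record.get("PT", []))


# Made-up records for illustration only.
sample = "\n".join([
    "PMID- 1",
    "TI  - A hypothetical title that wraps",
    "      onto a second line",
    "PT  - Journal Article",
    "",
    "PMID- 2",
    "PT  - Journal Article",
    "PT  - Review",
])
kept = [r["PMID"][0] for r in parse_records(sample) if is_candidate(r)]
print(kept)
```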

    If you really want to make something that works, you will have to label some data manually. This is absolutely necessary if you at least want to know how well your extraction system works. Incidentally, you could use the labelled data to train a classifier model using one of the transformer models specifically trained on medical data (for example Med-BERT, Clinical BERT, etc). The advantages of classifiers are that they don't rely on text generation, which is prone to hallucinations no matter what, their performance is quantifiable (as opposed to using an LLM with no manually labelled data), and finally, they're cheaper than querying an LLM.

    As for measurements you probably want to train a dedicated NER model, for the same reasons as above. Of course you could always try an LLM and ask it to extract the relevant information into JSON format, as long as you have some manually labelled data in hand to evaluate the LLM outputs.
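
    Evaluating against manual labels does not need much code. A minimal sketch, assuming both the manual labels and the model outputs are dicts from abstract ID to label; the IDs and labels in the example are made up.

```python
def precision_recall(gold, predicted, positive="YES"):
    """Precision and recall of the positive label against manual (gold) labels."""
    tp = sum(1 for k in gold if gold[k] == positive and predicted.get(k) == positive)
    fp = sum(1 for k in gold if gold[k] != positive and predicted.get(k) == positive)
    fn = sum(1 for k in gold if gold[k] == positive and predicted.get(k) != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels, not real results.
manual = {"a": "YES", "b": "NO", "c": "YES", "d": "NO"}
model = {"a": "YES", "b": "YES", "c": "NO", "d": "NO"}
precision, recall = precision_recall(manual, model)
print(f"precision={precision:.2f} recall={recall:.2f}")
```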
     
  12. forestglip

    forestglip Senior Member (Voting Rights)

    Messages:
    1,245
    I decided I'm going to try to do a variation of this idea. The plan is to make a wiki (using the same software as Wikipedia/MEpedia) about the research of chronic illnesses (including ME/CFS, multiple sclerosis, depression, cancer, etc). It won't just be a normal encyclopedia format like MEpedia, though. Each page will be dedicated to a single research paper, and the goal would be to include as many papers as possible about these illnesses.

    Then on top of that structure, I will try to incorporate a MediaWiki extension called Semantic MediaWiki. This allows turning a wiki into something more like a database for making queries.

    What that means is that I can add "parameters" (there's a chance I'm mixing up the terminology they use, I'm just learning about the extension) to every page which include information about that study. For example, a page could have "Condition:ME/CFS" and "Criteria:IOM". Then it becomes simple to search for every study that used IOM criteria.

    But the main goal is to have each study include parameters for findings. For example, "ventricular brain lactate increased" or "plasma lactate decreased". Then a search for "ME/CFS" and "lactate" would bring up both of these studies. A search for "plasma lactate" would only bring up the second. This way it would be trivial to look up all studies that tested lactate. Or all studies that tested a specific drug, and so on. A simple text search for "lactate", like you could do on MEpedia, would list all these studies, but would also bring up irrelevant pages - anything that mentioned that word.
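
    The difference between a structured search and a plain text search can be shown with a toy model. This is plain Python, not Semantic MediaWiki syntax, and the class names, page titles and fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    measure: str      # e.g. "plasma lactate"
    direction: str    # e.g. "increased" or "decreased"


@dataclass
class StudyPage:
    title: str
    condition: str
    findings: list = field(default_factory=list)
    text: str = ""    # free text, deliberately not searched below


def search(pages, condition, measure_query):
    """Titles of pages for a condition with a finding whose measure matches."""
    query = measure_query.lower()
    return [p.title for p in pages
            if p.condition == condition
            and any(query in f.measure.lower() for f in p.findings)]


# Hypothetical pages for illustration.
pages = [
    StudyPage("Study A", "ME/CFS", [Finding("ventricular brain lactate", "increased")]),
    StudyPage("Study B", "ME/CFS", [Finding("plasma lactate", "decreased")]),
    StudyPage("Study C", "ME/CFS", [], text="Mentions lactate only in passing."),
]
print(search(pages, "ME/CFS", "lactate"))
print(search(pages, "ME/CFS", "plasma lactate"))
```

    The first search returns Study A and Study B, the second only Study B. Study C never appears because it mentions the word without recording a finding, which is the case a simple text search would get wrong.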

    Also, with this tool, the data could be formatted in various other ways. As another example, one could do a search for "lactate" and view a timeline chart showing when every study on lactate was done. This would use a date parameter from the pages.

    Other than the parameter information for using it as a database, the pages can also include text information, like regular wiki pages, relevant to the specific study. For example, there could be links to all online discussions about the specific paper (e.g. links to threads on S4ME, Phoenix Rising, Reddit, PubPeer), quoted interesting bits of the study, or "meta" information about the study (like links to pages about the institutions or researchers that performed the study).

    No plans to incorporate AI, as when I tested it before, it was too expensive/bad for accurately pulling out information from long papers. It'll just be whoever wants to contribute, and hopefully over time it can grow to include a lot of papers.

    The information would be entered in a structured form. Here is an example of a wiki, MitoPedia, that uses the extension I plan to use. This one actually also has pages for individual research papers. The topic of the wiki is "mitochondrial and chloroplast physiology".

    Screenshot of MitoPedia data entry:
    upload_2024-10-30_20-32-37.png
     
    Last edited: Nov 4, 2024
    alktipping and Peter Trewhitt like this.
