Anyone doing a study should do a preliminary literature review beforehand to see what's already been done, so such a collation should already exist within research teams.
There are lots of review articles that are intended to collate and review evidence on a particular theme.
That's true, but because the number of different things tested is so large (for example, 445 chemicals in the NIH CSF metabolomics study alone), and there are so many studies, it's hard to get a bird's-eye view of the entire ME/CFS field. So while a researcher studying metabolomics might have a pretty good idea of the data in that field, it can be hard to stay current on everything else and to make connections to other fields.
Even within a given field, I think the advantage would be that this is exhaustive, including every single test, while a review necessarily does some summarizing.
There are thousands of medical and biomedical papers published every year, so the task is either massive or has to be confined to ME/CFS and restricted to specific aspects, such as CPET.
The idea is just for ME/CFS, though if people wanted to put in the work, I suppose it could be more general. When I search "chronic fatigue syndrome" on PubMed, I get 8,425 total results. That's high, but I think not impossible. It would probably be a good bit higher if long COVID were included, of course.
Is there already such a collation done for ME/CFS?
If there is, I assume it would be in static form and couldn't be continuously updated, or at least not without delay. Though maybe something similar does exist.
How far back in history would the search go?
Ideally, all the way back to the earliest ME/CFS studies (under the alternative names more frequently used back then).
Does it require expert knowledge to be able to use the resulting compilation in any meaningful way?
As for the motivation behind the core concept: I was just thinking of this as more of a fun tool for me to follow the most promising areas. If some random chemical, say X-29542, is repeatedly low in the body in every test, I can set a PubMed alert for that chemical to see what happens next.
But I think there is potential for it to be genuinely useful as a quick overview of everything for a researcher: to see at a glance, across every field, every paper in which a test has been replicated multiple times. If anything catches their eye, they might read the paper and gain some insights.
Or if they're reading a paper in a different, unfamiliar field and want to quickly check a few of the reported tests that might relate to their own field, they could just get the info from the website without spending time reading through multiple studies and reviews.
Also, as new papers come out, they can be immediately added, which might help since reviews take some time.
There are thousands of things that can be measured in humans, from metabolomics, proteomics, genomics, and beyond. Would such a website resource just end up with hundreds of short, unsorted lists?
The main goal would be to concentrate on the fraction of tests that have been measured multiple times, to see those results. Tests that have only been measured once so far are mostly useless for seeing where the science stands, but they'd still be there, ready to be added to when new studies are done.
The main concept would be one master list that includes tests from all fields. If, for example, serotonin has been high in 10 studies and low in 1, it might be near the top.
Though I do imagine a tagging system as well. Maybe each test could be tagged with a field or organ, for example "exercise" or "brain", and you could filter by those if desired (see the sketch below).
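To make that concrete, here's a rough sketch of how the master list and tag filter might work, assuming each finding is boiled down to a test name, a direction, and a set of tags. All names and data here are made up for illustration:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One result from one study: the test, its direction, and topic tags."""
    test: str                               # e.g. "serotonin"
    direction: str                          # "high", "low", or "ns" (not significant)
    tags: set = field(default_factory=set)  # e.g. {"brain"}

def master_list(findings, tag=None):
    """Rank tests by how many times they've been measured, most-replicated first."""
    if tag is not None:
        findings = [f for f in findings if tag in f.tags]
    counts = {}
    for f in findings:
        counts.setdefault(f.test, Counter())[f.direction] += 1
    return sorted(counts.items(), key=lambda kv: -sum(kv[1].values()))

# Hypothetical data: serotonin measured 11 times floats above cortisol
# measured once, and the "brain" filter drops cortisol entirely.
findings = [Finding("serotonin", "high", {"brain"}) for _ in range(10)]
findings += [Finding("serotonin", "low", {"brain"}),
             Finding("cortisol", "low", {"endocrine"})]
for test, directions in master_list(findings, tag="brain"):
    print(test, dict(directions))   # serotonin {'high': 10, 'low': 1}
```

Sorting by total measurement count is just one possible ranking; it could instead weight how consistently the direction agrees across studies.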
The vast majority of studies only report summary data (mean, SD, p-value), or in some cases just that a measurement wasn't significant. So the overview of publicly available data would likely be a small subset of what has actually been measured and tested.
It's true that it could only use what is publicly available, but it might still have value from what is published.
I was thinking that, at minimum, this would store any test that at least reports significance or a p-value, using p < 0.05 as the cutoff if the authors didn't specify one themselves. And if significant, it would record whether the ME/CFS group's result is higher or lower than the other group's. A minimal record might look like the sketch below.
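Here's what I have in mind for that minimal record, as a sketch. The field names (study_id, reported_significant, and so on) and the example PMID are hypothetical; the only real assumption is the 0.05 fallback cutoff:

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_ALPHA = 0.05   # fallback only when the paper states no cutoff of its own

@dataclass
class TestResult:
    """Minimal record for one reported test in one study."""
    study_id: str                                # e.g. a PubMed ID
    test: str                                    # e.g. "serotonin"
    p_value: Optional[float] = None              # None if only (non)significance is stated
    reported_significant: Optional[bool] = None  # the paper's own verdict, if given
    cutoff: Optional[float] = None               # the paper's own alpha, if given
    direction: Optional[str] = None              # "higher" or "lower" than comparison group

    def is_significant(self) -> bool:
        # Trust the paper's stated verdict first, then fall back to the p-value.
        if self.reported_significant is not None:
            return self.reported_significant
        alpha = self.cutoff if self.cutoff is not None else DEFAULT_ALPHA
        return self.p_value is not None and self.p_value < alpha

# A study reporting p = 0.03 with no stated cutoff would count as significant:
r = TestResult(study_id="PMID:00000000", test="serotonin",
               p_value=0.03, direction="lower")
print(r.is_significant())   # True
```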
In addition, there is often a problem with the quality of the data. If one does not take this into account, the overview may be misleading.
Quality is often so poor that the data is worthless; how would such a compilation deal with that?
I don't think it could handle curation at that scale. But I think it might have value through sheer scope, where bad studies would only spoil a small portion of the data. And for anyone interested in a specific test, it would be easy to follow through to all of its studies to check them.