Machine Learning-assisted Research on ME/CFS

Thanks. If I understand correctly, though, your network analysis is based on mentions in abstracts and the text of publications, not on the actual data of the studies?
First of all, thank you for the reply. Much appreciated.

What you say is correct. All of the work was an attempt to connect medical concepts that appear together in the text of abstracts. This was the first part of the work, the Network Analysis (which was later used by Wenzhong Xiao).

The second part of the work was taking the various symptoms of ME/CFS, retrieving abstracts related to each of these symptoms, and then asking machine learning to tell us which combinations of topics could predict a symptom vs a non-symptom state.
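
To make that second step concrete, here is a minimal sketch of the kind of classifier I mean, assuming scikit-learn and a handful of toy abstracts. My actual pipeline and feature set were more involved, so treat the names and data here as illustrative only:

```python
# Minimal sketch: classify abstracts as symptom-related vs not,
# using bag-of-words features and logistic regression.
# Toy data; real corpora would come from PubMed retrieval.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

abstracts = [
    "Patients reported post-exertional malaise and fatigue after activity.",
    "Orthostatic intolerance and dizziness on standing were frequent.",
    "This study describes enzyme kinetics in yeast cultures.",
    "We measured soil microbiome diversity across farmland sites.",
]
labels = [1, 1, 0, 0]  # 1 = symptom-related, 0 = not

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(abstracts, labels)

print(model.predict(["Severe fatigue and malaise followed mild exertion."]))
```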


I do not quite understand what you mean by the "actual data" of studies, so I would appreciate your time in explaining what this means. But for the moment, let's look at the results so far, given the input I described.

The key question here is: can the above results appear by pure chance or not? What do we need in order to claim that this methodology indeed works? From what I understand, a key part is to determine the number of plausible topics for ME/CFS and then run an appropriate analysis. I find it extremely hard to see how this can be done reliably, but I am open to any suggestions. I can provide a full list of the concepts I identified.

If this is cherry-picking, then I see no harm done apart from wasting 15 years of my life trying to convince others, but at least I can now function and have a near-normal life. So, harm to my time (and income), but no harm to patients' health and wallets.

But if something is indeed taking place here, then we are talking about consistent negative bias over the years. Can patients "afford" not to look at this methodology? @forestglip @Hutan, I would also appreciate your input.
 
I do not quite understand what you mean by the "actual data" of studies, so I would appreciate your time in explaining what this means.
The raw data of the experiments, for example, in CSV format.

A big problem I see is that almost all studies misrepresent their data in the abstract and text (to make it look like a bigger deal than it is or to promote the authors' favoured theory).

So I think machine learning/AI/network analysis, etc. will only be useful if we skip how authors present their findings and only train it on the actual data, with strong selection criteria so that only high-quality experiments, such as DecodeME, are included.
 
But if something is indeed taking place here, then we are talking about consistent negative bias over the years. Can patients "afford" not to look at this methodology?
Looking at the list above, I do wonder about matching findings by chance. How many genes did your program identify, and how many genes did the latest PrecisionLife study identify? Didn't the latter report a few hundred or a few thousand genes? That seems like an opportunity for a few genes to overlap by chance, as the rough calculation below illustrates.
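
As a rough illustration of the scale of the problem, with made-up numbers (I don't know the actual gene counts from either analysis):

```python
# Expected overlap if two gene lists were drawn independently at random
# from the same pool. All counts below are hypothetical placeholders.
N = 19_000   # approximate number of human protein-coding genes
k1 = 50      # genes identified by the program (hypothetical)
k2 = 2_000   # genes reported by the PrecisionLife study (hypothetical)

expected_overlap = k1 * k2 / N
print(f"Expected chance overlap: {expected_overlap:.1f} genes")  # ~5.3
```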

I think it would be important to clearly outline what exactly the methodology is. I only have a vague idea of what is being done.

I'm sorry, I don't otherwise have the energy to really try to understand this, and I'm not sure I'd be able to follow a detailed description of the entire pipeline anyway, but a clear write-up might be good for demonstrating to someone that there's some promise to this tool. It's too hard for anyone new to these ideas to discern what is being argued from ten pages of posts in this thread and assorted posts on other threads and websites.
 
The raw data of the experiments, for example, in CSV format.

A big problem I see is that almost all studies misrepresent their data in the abstract and text (to make it look like a bigger deal than it is or to promote the authors' favoured theory).

So I think machine learning/AI/network analysis, etc. will only be useful if we skip how authors present their findings and only train it on the actual data, with strong selection criteria so that only high-quality experiments, such as DecodeME, are included.


I see, so my understanding is that you are referencing Garbage In, Garbage Out (GIGO): if the input data is garbage, then the output is garbage.

The thing is that in my analysis I did not look at the results of the studies. The search space was built by identifying which concepts appeared together in research papers, so a co-occurrence analysis was at the basis of it. I wanted to understand the connections between symptoms and various medical concepts.
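
To illustrate the idea (this is a toy sketch, not my actual code; the real analysis used a far larger corpus and a curated concept list):

```python
# Toy co-occurrence analysis: count how often pairs of medical
# concepts appear together in the same abstract.
# The concept list and abstracts are illustrative placeholders.
from itertools import combinations
from collections import Counter

concepts = ["fatigue", "cortisol", "serotonin", "microbiome"]
abstracts = [
    "Morning cortisol was lower in patients reporting severe fatigue.",
    "Serotonin signalling changes accompanied fatigue in this cohort.",
    "Gut microbiome composition correlated with cortisol levels.",
]

cooccurrence = Counter()
for text in abstracts:
    present = [c for c in concepts if c in text.lower()]
    for a, b in combinations(sorted(present), 2):
        cooccurrence[(a, b)] += 1

# Each pair is an edge in the network; the count is its weight.
for (a, b), count in cooccurrence.most_common():
    print(f"{a} -- {b}: {count}")
```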

If this is so, does this change your GIGO belief?

@forestglip, you said:

Looking at the list above, I do wonder about matching findings by chance

Wondering is one thing, but we cannot dismiss a hypothesis, especially one with repeated confirmations, that easily, correct? My message above was about how we can identify whether the computational techniques I used did in fact do better than pure chance. Which search space should I use? The roughly 19,000 human genes? How many pathways? How many symptoms?
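
Once the search space is fixed, the test itself is straightforward. As a sketch, assuming the gene search space and placeholder counts for both gene lists (the real numbers would need to be filled in):

```python
# Hypergeometric test: probability of seeing at least `overlap` shared
# genes if both lists were independent random draws from the same pool.
# All counts are hypothetical placeholders, not the real results.
from scipy.stats import hypergeom

N = 19_000     # search space: human genes
k1 = 2_000     # genes in the PrecisionLife list (hypothetical)
k2 = 50        # genes my analysis flagged (hypothetical)
overlap = 12   # observed shared genes (hypothetical)

# P(X >= overlap) under random sampling
p_value = hypergeom.sf(overlap - 1, N, k1, k2)
print(f"p-value for the overlap arising by chance: {p_value:.2g}")
```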

In any case, thank you both for your time.
 