Raman Spectroscopy Combined with [ML] Reveals [ME]–Associated Biomolecular Signatures at Rest and After Standardized Stress, 2026, Heidarifard et al

forestglip

Raman Spectroscopy Combined with Machine Learning Reveals Myalgic Encephalomyelitis–Associated Biomolecular Signatures at Rest and After Standardized Stress

Heidarifard, Maryam; Moezzi, Atefeh; Dallaire, Frédérick; Ember, Katherine; Elremaly, Wesam; Caraus, Iurie; Franco, Anita; Leblond, Frédéric; Moreau, Alain; Dehaes, Mathieu

Abstract
Myalgic encephalomyelitis (ME) is characterized by profound fatigue, post-exertional malaise (PEM), and cognitive dysfunction. Despite its clinical significance, the pathophysiology of PEM and disease heterogeneity remain unclear, and no validated biomarkers are available for rapid diagnosis or monitoring.

We aimed to develop a screening approach combining label-free Raman spectroscopy (RS) and machine learning modeling (ML) to detect biomolecular changes in blood plasma and differentiate patients with ME from sedentary healthy controls.

Blood plasma was collected from 115 patients with ME and 45 controls at rest (T0) and 90 min after a standardized, non-invasive stress test designed to induce PEM.

Plasma samples were analyzed by RS, and ML models were developed independently at each time point to differentiate patients with ME and controls. The RS-ML models identified spectral features consistent with contributions from proteins, lipids, and low-molecular-weight metabolites.

At T0 and T90, the area under the receiver operating characteristic curve, accuracy, specificity and sensitivity were 0.85 and 0.83, 79% and 84%, 82% and 90%, and 73% and 69%, respectively.

RS-ML provides a rapid, low-cost approach to detect ME-associated biomolecular signatures in plasma and capture biochemical alterations associated with standardized stress.

Published in International Journal of Molecular Sciences (open access).
 
The model successfully differentiates patients from sedentary healthy controls. Is there any reason to think this would still work if the controls included people with similar illnesses? Why not include them here?
Probably because it is expensive. Also, we don’t really know of illnesses that are similar.

The best case-control match you can get is sedentary controls from NASA testing, where they have studied bed rest in preparation for sending people to space. I think Rob Wust had access to these controls, but obviously this is extremely expensive and in limited supply.
 
Maybe "similar illnesses" was not the right wording. I think I've seen diabetes or MS patients used as controls in similar studies. I'd argue including people with depression or chronic fatigue without PEM would make the same result a lot more interesting.

Without non-healthy, non-ME/CFS controls, can we conclude anything beyond the fact that this method successfully differentiates healthy from non-healthy people?
 
In the acknowledgements I don't see any mention of the Oxford team or Dr. Xu, who did the pioneering work on Raman in ME/CFS. I thought that a bit strange, as it would have made perfect sense to start from best practices, talk to the original authors and build on their work. After all, OMF and OMF research center directors like to promote themselves as good collaborators.
 
The problem isn’t to identify people that look like they might have ME/CFS. That’s very easy.

The problem is to figure out if they have ME/CFS or something else, or both.

If they can get something useful out of this work, it would be great, but it feels like yet another fishing expedition. The biggest benefit might be that the field could move away from CPETs towards these kinds of passive massage stimulus tests, or the thumb tests Fluge and Mella are working on.
 
I can understand the push for a diagnostic test. Patients want it to prove something which is otherwise disbelieved, so charities push for it. But I don’t think having a test alone would change or improve things for us much. And when you add in sensitivity and specificity, you just introduce more problems over who can legitimately say they have a condition.

What may be more interesting is what a technique like this may be able to tell us about differences, and therefore potential mechanisms. That could be worth exploring more in this and the other Raman spectroscopy studies, if something reproducible can be found and then dug into further.

Also agree with @Utsikt that the passive nature may be an interesting one. Although I guess it raises questions over what you’re measuring: can you passively produce PEM? Or, if you’re avoiding PEM, what are you measuring, and what does that tell us?
 
Hello all. I am concerned about the machine learning practice in this paper, but do not have the capacity to dig in.

From a cursory glance, there appears to be no validation split of the data? I could have misread while skimming (fog), but the (many) hyperparameters would therefore have been optimised on the test sets of the 5-fold splits. This is poor practice and means model performance will be an overestimate of real life performance on a new cohort.
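The usual remedy for the problem described above is nested cross-validation: hyperparameters are chosen using inner folds only, and each outer test fold is scored exactly once by a model that never saw it during tuning. Below is a minimal standard-library sketch of that loop. The names `nested_cv`, `fit_and_score` and the candidate values are hypothetical placeholders, not anything taken from the paper, and the toy scoring function simply returns the parameter so the control flow can be checked.

```python
import random


def k_fold(indices, k, seed=0):
    """Shuffle the indices and deal them into k folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def without(folds, held_out):
    """Concatenate every fold except the one at position `held_out`."""
    return [j for f, fold in enumerate(folds) if f != held_out for j in fold]


def nested_cv(n_samples, candidates, fit_and_score, outer_k=5, inner_k=4, seed=0):
    """Return one held-out score per outer fold.

    fit_and_score(param, train_idx, eval_idx) is a hypothetical callable that
    trains on train_idx with hyperparameter `param` and scores on eval_idx.
    """
    outer_folds = k_fold(range(n_samples), outer_k, seed)
    outer_scores = []
    for i, test_idx in enumerate(outer_folds):
        dev_idx = without(outer_folds, i)  # everything except the test fold
        inner_folds = k_fold(dev_idx, inner_k, seed + 1)

        def inner_score(param):
            # Tuning only ever touches dev_idx, never test_idx.
            scores = [fit_and_score(param, without(inner_folds, v), val_idx)
                      for v, val_idx in enumerate(inner_folds)]
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        outer_scores.append(fit_and_score(best, dev_idx, test_idx))
    return outer_scores


# Toy stand-in for a real model: the "score" is just the parameter value.
calls = []


def toy_fit_and_score(param, train_idx, eval_idx):
    assert not set(train_idx) & set(eval_idx)  # train and eval never overlap
    calls.append((param, len(train_idx), len(eval_idx)))
    return float(param)


scores = nested_cv(160, [0.1, 0.5, 0.3], toy_fit_and_score)
print(scores)
```

Returning one score per outer fold, rather than a single average, also makes the fold-to-fold variance visible.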

Additionally, I am not a fan of the random splitting, but may have missed further comment on why it was done. This leads to the situation where the model may, for every test sample, have seen a train sample from the same input space cluster (rather than being able to generalise across input space clusters). Again, this would lead to overestimating real performance. Is there a table somewhere reporting performance on all folds of the final model version, or some other model version? This would clarify performance variance.
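One alternative to purely random splitting is a grouped split, where every sample sharing a group label lands in the same fold, so a test sample never has a same-group neighbour in the training set. The sketch below assumes a group label is available for each sample. What that label would be here (a cluster assignment of the spectra, a participant ID, a collection batch) is not stated in the thread, so `groups` is purely illustrative.

```python
import random
from collections import defaultdict


def group_k_fold(groups, k, seed=0):
    """Assign whole groups to folds so that no group is split across folds."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    group_ids = sorted(members)
    random.Random(seed).shuffle(group_ids)
    folds = [[] for _ in range(k)]
    for pos, g in enumerate(group_ids):
        folds[pos % k].extend(members[g])  # deal groups out round-robin
    return folds


# Hypothetical toy data: 40 samples in 10 groups of 4.
groups = [i // 4 for i in range(40)]
folds = group_k_fold(groups, k=5)
for n, fold in enumerate(folds):
    print(n, sorted({groups[i] for i in fold}))
```

Scoring each of these folds separately and reporting all of the values would answer the question about performance variance.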

For context, I have a PhD in AI (partly using health records) and currently research AI for drug discovery for a big tech company (off sick with ME).
 
Hi and welcome @Jacob Deasy
There are plenty of AI papers here where your expertise would be very useful.
Also agree with @Utsikt that the passive nature may be an interesting one. Although I guess it raises questions over what you’re measuring, can you passively produce PEM or if you’re avoiding PEM what are you measuring, or what does that tell us?
I forgot to respond to this. I think it could possibly tell us whether there are «background» alterations in something related to physical exertion. If PEM is purely caused by, e.g., neurons misinterpreting normal peripheral signals, a sub-PEM test wouldn’t show anything in the periphery, unless PEM has a lasting effect on the periphery beyond the flare.

The issue might be controls, as always. Lots of things probably affect the things we are trying to measure in the periphery.
 
Welcome, and thank you for catching that. I do not understand how such issues get through the review process.

You are right, this kind of cross-validation is way too optimistic. I did not have much time to look at the paper, but from a quick look I see a discrepancy between the specificity and sensitivity values shown in Table 3 and those implied by the confusion matrix (sensitivity and specificity appear to be swapped). Have they used HCs as the positive class?
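The positive-class question can be sanity-checked from the abstract alone, since accuracy is a weighted average of sensitivity and specificity. The sketch below assumes that all 115 patients and 45 controls contributed at both time points and that the reported metrics behave like pooled values. Neither assumption is confirmed by the thread, and metrics averaged over folds could differ slightly.

```python
def pooled_accuracy(sens, spec, n_pos, n_neg):
    """Accuracy implied by sensitivity and specificity for given class sizes."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


N_ME, N_HC = 115, 45  # cohort sizes from the abstract

# Values as reported in the abstract.
reported = {
    "T0": {"acc": 0.79, "spec": 0.82, "sens": 0.73},
    "T90": {"acc": 0.84, "spec": 0.90, "sens": 0.69},
}

implied = {}
for t, m in reported.items():
    me_positive = pooled_accuracy(m["sens"], m["spec"], N_ME, N_HC)
    hc_positive = pooled_accuracy(m["sens"], m["spec"], N_HC, N_ME)
    implied[t] = (me_positive, hc_positive)
    print(f"{t}: reported {m['acc']:.2f}, "
          f"ME positive {me_positive:.3f}, HC positive {hc_positive:.3f}")
```

Under these assumptions the reported accuracies (0.79 and 0.84) match the reading in which controls are the positive class (about 0.795 and 0.841), not the one in which patients are (about 0.755 and 0.749). That is consistent with the swap suspected above, though it is not proof of it.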

There is also a potential issue with feature selection (FS) leakage, because it is not clear whether FS took place inside the cross-validation loop.
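To make the leakage concern concrete: feature selection is leak-free only if the ranking is recomputed from the training rows of each fold, never from the full dataset before splitting. The sketch below shows that structure on synthetic data. The class-mean-gap ranking, the function names and the data are all illustrative assumptions and say nothing about which selection method the paper used.

```python
import random
from statistics import mean


def top_features(X, y, rows, n_keep):
    """Rank features by absolute class-mean difference, using only `rows`."""
    def gap(f):
        pos = [X[r][f] for r in rows if y[r] == 1]
        neg = [X[r][f] for r in rows if y[r] == 0]
        return abs(mean(pos) - mean(neg))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:n_keep]


def cv_with_inner_selection(X, y, k=5, n_keep=2, seed=0):
    """For each fold, select features from the training rows only."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    results = []
    for i, test_rows in enumerate(folds):
        train_rows = [r for f, fold in enumerate(folds) if f != i for r in fold]
        feats = top_features(X, y, train_rows, n_keep)  # test rows never seen
        results.append((feats, test_rows))
    return results


# Synthetic data: feature 0 carries the signal, features 1-9 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[5 * label + rng.uniform(-1, 1)] + [rng.uniform(-1, 1) for _ in range(9)]
     for label in y]

selected = cv_with_inner_selection(X, y)
for feats, test_rows in selected:
    print(feats, len(test_rows))
```

The leaky variant would call `top_features` once on all rows before the loop. With many spectral features and few samples, that alone can inflate cross-validated performance.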

EDIT: I also see a problem with how they are attempting to correct the class imbalance. They are not using equal weights per class, and they effectively inverted the balancing.
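For reference, the common inverse-frequency heuristic gives each class a weight of n_samples / (n_classes × class_count), so the minority class gets the larger weight and both classes contribute equal total weight. This is the same formula scikit-learn documents for `class_weight="balanced"`. The sketch below applies it to the cohort sizes from the abstract. It shows the intended direction of the correction only and makes no claim about the weights actually used in the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Cohort sizes from the abstract: 115 patients with ME, 45 healthy controls.
labels = ["ME"] * 115 + ["HC"] * 45
weights = balanced_class_weights(labels)
for cls, w in weights.items():
    print(f"{cls}: weight {w:.4f}, total class weight {w * labels.count(cls):.1f}")
```

If the larger weight ends up on the majority class (ME here) instead, the imbalance is amplified, not corrected.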
 
I used to think a biomarker should be a research priority. I thought it would make diagnosis faster and validate our disease, but I no longer think so. It seems the diagnostic delay is mainly driven by a lack of doctors knowledgeable in our disease. And then, for the validation piece, this interview with Dr. Luis Nacul from 2019 gave me a new perspective. Here’s a quote that sums up his point quite nicely (my emphasis added):
I know quite a few doctors who know about ME and we can talk about ME in a unified language. But many do not understand it and so there is still a stigma about ME. As long as doctors will not take this illness seriously, the stigma will not be resolved. The current paradigm in medicine is that to prove something, you need to show a biomarker, and that’s why I think it’s so important to find one. But really, we don’t need a biomarker. We don’t have a biomarker for migraines, and yet we have legitimized it as an illness. So this should also be possible for ME.
 
Useable biomarkers are also based on pathology. The ones that aren’t will be calibrated against diagnoses based on clinical assessments, so they can’t surpass those in accuracy.

And any understanding of pathology will solve the perceived validity issue much better than a non-pathological test, so we might as well shoot for pathological understanding if we’re going to prioritise something.
 