Developing a blood cell-based diagnostic test for ME/CFS using peripheral blood mononuclear cells, 2023, Xu, Morten et al

Andy

Preprint.
Paper now published, see this post


Abstract

A blood-based diagnostic test for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and multiple sclerosis (MS) would be of great value in both conditions, facilitating more accurate and earlier diagnosis, helping with current treatment delivery, and supporting the development of new therapeutics.

Here we use Raman micro-spectroscopy to examine differences between the spectral profiles of blood cells of ME/CFS, MS and healthy controls. We were able to discriminate the three groups using ensemble classification models with high levels of accuracy (91%) with the additional ability to distinguish mild, moderate, and severe ME/CFS patients from each other (84%). To our knowledge, this is the first research using Raman micro-spectroscopy to discriminate specific subgroups of ME/CFS patients on the basis of their symptom severity. Specific Raman peaks were linked with the different disease types, with the potential in further investigations to provide insights into biological changes associated with the different conditions.

https://www.medrxiv.org/content/10.1101/2023.03.18.23286575v1

 
It seems that there was also a failed replication attempt included in this paper. The authors write (my bolding):

"We first used a simple approach to examining mitochondrial oxidative phosphorylation in frozen PBMCs from 41 out of the 98 subjects. A previous report by Tomas et al. has shown a difference in whole-cell mitochondrial respiration, consistent with a deficiency in cellular energetics associated with 135 mitochondrial dysfunction or substrate flux feeding into the TCA cycle and mitochondrial respiratory chain (15). However, this assay was difficult to reproduce; Missailidis et al. failed to reproduce this finding in PBMCs but did find differences in immortalised lymphocytes (24). In our study, cell viability following thawing was between 70–85% with a noticeable drop in viability following 24 hr in culture. Mitochondrial respiration was measured in 5-mM glucose media with rates measured over 1–2 hr. No difference was observed in rates of mitochondrial respiration between ME/CFS patients, MS patients and healthy controls (Figure S1A). When ME/CFS patients were divided into severe, moderate and mild patients, no difference was observed (Figure S1B). This demonstrated that mitochondrial function assessment of PBMCs using an oxygen consumption assay on cryopreserved frozen samples failed to discriminate disease cohorts and will be challenging to be developed as a diagnostic approach."​
 
Among the things they found with the Raman spectroscopy were increases in tryptophan and tyrosine, elevated glycerol levels, reduced cholesterol and cholesteryl esters, and reduced glycogen levels.

But it seems that most of the differences applied to ME/CFS and MS patients compared to healthy controls and that there were few clear differences between the MS and ME/CFS groups.
 
Xu et al: “This demonstrated that mitochondrial function assessment of PBMCs using an oxygen consumption assay on cryopreserved frozen samples failed to discriminate disease cohorts and will be challenging to be developed as a diagnostic approach”


The study they were trying to replicate is Tomas et al: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231136

Tomas et al: “Blood samples were prepared and PBMCs isolated as described previously in Tomas et al. (6)”

Tomas et al (6) (my bold): “Blood samples were processed using the Histopaque® method. Briefly, the whole blood sample was centrifuged at 700 x g for 10 minutes and plasma removed. Blood was made up to its original volume with sterile PBS (Sigma Aldrich D8537). Density gradients were prepared with Histopaque® 1.077 (Sigma Aldrich 10771) and Histopaque® 1.119 (Sigma Aldrich 11191). Blood was slowly layered on top of the Histopaque® gradient and the tube spun at 700 x g for 30 minutes with the break off. Plasma layer was aspirated off and the PBMC layer collected. PBMCs were washed with fresh PBS and either used for experiments immediately or frozen at -80°C after being combined with freezing medium (40% FBS (Sigma Aldrich F0804), 10% DMSO (Sigma Aldrich D8418) and 50% RPMI-1640 (Sigma Aldrich R7388). To revive frozen samples, vials were rapidly defrosted in a water bath at 37°C and added to 10ml of fresh RPMI-1640. Cells were centrifuged at 700 x g for 10 minutes, the supernatant removed and cells resuspended in fresh RPMI-1640. Cell viability was then determined using the trypan blue method (see below). PBMC experiments were conducted using RPMI-1640 medium supplemented with 10% FBS and 1% penicillin-streptomycin (Sigma Aldrich P4333). Blood samples were processed within 4 hours of blood collection.”


Xu et al say they couldn’t replicate the results using cryopreserved frozen samples, but I don’t know whether this is more likely to be because of the biobank’s cryopreservation process or because the Tomas et al result isn’t reliable.

I’m also not sure if Tomas et al looked for any differences between their frozen and fresh samples.

I’ve not read both papers in full so I may have missed discussion of these issues.
 
I’m also not sure if Tomas et al looked for any differences between their frozen and fresh samples.
Figure 2 in the paper you quoted shows the difference between fresh and frozen. There is a big difference.
https://journals.plos.org/plosone/article/figure?id=10.1371/journal.pone.0186802.g002

For bioenergetics it's important to process samples promptly, and all samples should be prepared the same way. I think I read somewhere that the UK ME/CFS Biobank can take up to 14 hours to process samples, which is not ideal for many studies where metabolite half-life is important.
 
First, awesome paper and glad to see the Morten lab getting their ME/CFS work into the literature. Very nice to see other people looking to cell-based biomarker potential.

It seems that there was also a failed replication attempt included in this paper. The authors write (my bolding):

"We first used a simple approach to examining mitochondrial oxidative phosphorylation in frozen PBMCs from 41 out of the 98 subjects. A previous report by Tomas et al. has shown a difference in whole-cell mitochondrial respiration, consistent with a deficiency in cellular energetics associated with 135 mitochondrial dysfunction or substrate flux feeding into the TCA cycle and mitochondrial respiratory chain (15). However, this assay was difficult to reproduce; Missailidis et al. failed to reproduce this finding in PBMCs but did find differences in immortalised lymphocytes (24). In our study, cell viability following thawing was between 70–85% with a noticeable drop in viability following 24 hr in culture. Mitochondrial respiration was measured in 5-mM glucose media with rates measured over 1–2 hr. No difference was observed in rates of mitochondrial respiration between ME/CFS patients, MS patients and healthy controls (Figure S1A). When ME/CFS patients were divided into severe, moderate and mild patients, no difference was observed (Figure S1B). This demonstrated that mitochondrial function assessment of PBMCs using an oxygen consumption assay on cryopreserved frozen samples failed to discriminate disease cohorts and will be challenging to be developed as a diagnostic approach."​

This matter is a bit complicated. It's kind of a replication but also not. There are similarities and differences in both method and outcome across all 3 studies (Tomas, mine (Missailidis) and Xu/Morten). I'll break it down by the important points, in the sequence needed to understand what's going on here.

1) Tomas et al found reduced basal OCR in PBMCs. Their paper says the PBMCs were "washed with fresh PBS and either used for experiments immediately or frozen at -80°C after being combined with freezing medium". This means that their results show a combination of frozen and fresh PBMCs. Their frozen ME/CFS PBMCs have the lowest oxygen consumption.

2) I found reduced basal OCR in PBMCs and unchanged basal OCR (but other abnormalities) in lymphoblastoid cell lines (LCLs) created by immortalising PBMCs with EBV. I also found that PBMCs from pwME died faster than healthy control PBMCs post-thaw. PBMCs without stimulation or immortalization will die after being isolated from the blood since their metabolism and proliferation are inactivated. It just happens faster for those from pwME from what I have seen/reported, and maybe damage from the freezing process interacts with the cells being diseased to make this more observable. You also can't really measure untreated PBMCs reliably with Seahorse respirometry. The oxygen consumption rate is so low that it's effectively barely at the threshold of detection. So much so that you often get erroneous readings below 0, because the normal error/variation in the reading is probably greater than the actual signal itself in those cases (i.e. the machine is not sensitive enough to always handle the negligible amount of respiration done by effectively sleeping/dying cells; see the toy simulation after this list). Not only have I seen this myself, but if you look at the Tomas paper they have OCR values below 0 as well, so it's happened for them too. So this makes sense, especially given that their post-thaw ME/CFS PBMCs consumed the least oxygen.

3) Xu/Morten et al found no difference in basal OCR in post-thaw PBMCs using a different instrument to measure OCR. It may be that their ME/CFS PBMCs do not die faster like mine did, or were not incubated long enough for dead cells to accumulate (the 24–48 hr mark was where the difference became most apparent, and my PBMCs were also incubated for a day prior to assay post-thaw, as with the Tomas paper). Maybe the sensitivity of the different instrument used is related; it's different to the instrument used by Tomas et al and by myself. I don't know which of these possibilities it is. But IMO this doesn't matter so much since, again, PBMCs are metabolically quiescent anyway. I wouldn't expect to see much or any difference if the cells weren't dying faster in one group. The analogy is like trying to hear which of two speakers has its volume set higher while both speakers are powered off.
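As a toy illustration of that detection-limit point (my own sketch, not from any of the papers; the OCR and noise values are made up), here is what happens in R when the instrument noise is larger than the true signal - readings routinely dip below zero and a real group difference becomes undetectable:

set.seed(1)
# Hypothetical true basal OCR values for quiescent PBMCs: tiny, and slightly lower
# in the patient group; the assumed instrument noise is larger than the signal itself
true_ocr <- c(control = 2.0, mecfs = 1.5)
noise_sd <- 5
readings <- data.frame(
  group = rep(names(true_ocr), each = 30),
  ocr = rnorm(60, mean = rep(true_ocr, each = 30), sd = noise_sd)
)
mean(readings$ocr < 0)                 # a fair fraction of "measured" rates fall below zero
t.test(ocr ~ group, data = readings)   # the real 0.5-unit difference is usually not detectable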

So basically I think that everything makes total sense taken together as a whole across the 3 studies. PBMCs are useful for a lot of things but I don't think they are for oxygen consumption measurements. Xu/Morten and friends have done amazing work here and their other, much more important results using Raman spectroscopy in this paper are extremely valuable.
 
@chillier made this post about the paper's Raman micro-spectroscopy results on the "The current state of ME/CFS research, and its prospects" thread:
chillier said:
I remember seeing this before and not being convinced, so I'm looking again to try and remind myself why. The first thing is that the wave forms of the different cohorts have been translated up or down in figures A and B to aid visibility. I thought this was a bit misleading, as it makes it look like they are strikingly different between the groups when just glancing with the naked eye, when in reality they look pretty much identical. The second thing is that the cohorts separate out very nicely in the factor analysis on the right - but this isn't a principal component analysis, it's a linear discriminant analysis, which is supervised - that is, you give it the information about which cohort each datapoint belongs to in advance. The algorithm goes on to find a way to display the data that separates the groups best.

They do go on to train a classifier to distinguish between the groups and they divide the data into training and testing partitions which is important. So these numbers of 91% and 92% sensitivity and specificity in picking out ME from the others could be fair enough.

You can kind of see a spike in the variation of the cells in the healthy control cell intensity corresponding to a peak around 1000 cm-1 in their Raman spectroscopy data (I've highlighted it with a red arrow). It looks like this might correspond to the phenylalanine or possibly glycerol wavelength, which they quantify in the following figure. They attach the breakdowns of intensity at each wavelength in a supplementary table, but I guess we can't access it yet while the paper is a preprint:

[Attached image: Figure 2 Raman spectra from the paper, with a red arrow added at the ~1000 cm-1 peak]



[Further images attached in the original post]
 
About the method - it provides information about molecules in single cells:
Raman spectroscopy is a non-invasive and label-free approach to probe molecular vibrations in a sample, and when combined with confocal microscopy, it can interrogate individual cells (18). A single-cell Raman spectrum (SCRS) is a phenotypic fingerprint of all biomolecules in that cell and could potentially differentiate between various cell types and give insights into underlying biology (18).

The study builds on an earlier study by this group, comparing ME/CFS (three levels of severity; 61 people), MS with fatigue (16 people) and healthy controls (21 people):
Our previous pilot study demonstrated that a comparison of SCRS could distinguish between ME/CFS patients and healthy controls, and identified a potential PBMC biomarker for ME/CFS (19). Here, we built on our pilot study and further assessed the diagnostic potential of a blood-based platform using single-cell Raman spectroscopy and state-of-the-art ensemble learning classification models to discriminate ME/CFS from two control groups. We also evaluated the capability of the approach to differentiating between different ME/CFS disease severity groups, including mild, moderate and severe.

ME/CFS was defined broadly - CCC or CDC 1994 (I think that's Fukuda) plus PEM - although it was noted that many of the participants met both criteria. Severities were based on SF-36 physical function scores, with people classed as severe being house-bound. There was some effort to exclude people with other conditions presenting similarly to ME/CFS, e.g. by blood and urine testing.

All Raman measurements were blinded in this study. Figure 2A presents the averaged SCRS of single PBMCs from the HC (number of cells = 410), ME/CFS (number of cells = 1151) and MS cohorts (number of cells = 594) at the fingerprint region (300–1800 cm–1).
There are of course big differences between cell types - I'll be interested to see if the authors acknowledge this. We have seen some studies lately focusing on percentages of specific cell types and doing experiments on those specific cell types.
 
Results

I agree with @chillier that there don't appear to be a lot of differences in the spectral fingerprints, as shown in figure 2a that was copied in chillier's post (the one with the red arrow added). I find it a bit amazing that the mean signatures are so similar, given the potential for differences in the percentage of cell types and cell ages.

However, Figure 2c plots the differences between ME/CFS and Healthy control groups (the red line) and between MS and healthy controls (blue line). The green straight baseline is the healthy control. And there are interesting differences.

[Attached image: Figure 2C from the paper - difference spectra for ME/CFS vs HC (red) and MS vs HC (blue)]
 
[Attached image: Figures 2D and 2E from the paper - LDA plots at the cell and individual level]

Figure 2D presents the differences on two axes of variation for each of the 1151 cells; Figure 2E presents the differences for each individual. Again, green is healthy controls; red is ME/CFS and blue is MS.

I found the differences in 2D pretty surprising - I would have thought differences between different cell types would have resulted in a lot more overlap. But I think my surprise is the result of expecting the analysis that chillier mentioned, a principal component analysis, which finds the main axes of variation without using the group labels. Instead, the analysis is a linear discriminant analysis (LDA). If I'm understanding things correctly, this operates in reverse - you tell the analysis which group each datapoint belongs to, and it finds the combinations of features that account for the most separation between the groups.
Wikipedia said:
Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features that characterizes or separates two or more classes of objects or events.
(This webpage explains LDA - link - I found the video good, although a little annoying, especially in the beginning. It gets better - it is 15 minutes though.)
I think the 63% for the first axis LD1 in the cell analysis (Figure 2D) is pretty good. Maybe one of you has some background in this and can comment?
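To make the PCA vs LDA contrast concrete, here's a minimal sketch in R using the built-in iris data as a stand-in (nothing to do with the paper's data): PCA never sees the group labels, while LDA is given them and reports a "proportion of trace" per discriminant axis, which I assume is what the 63% for LD1 refers to:

library(MASS)

# Unsupervised: PCA only looks for the directions of greatest variance,
# without using the species labels at all
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)                  # proportion of variance per principal component

# Supervised: LDA is told the grouping up front and finds the linear
# combinations of features that best separate those groups
fit <- lda(Species ~ ., data = iris)
fit$svd^2 / sum(fit$svd^2)    # "proportion of trace" per discriminant axis
plot(fit)                     # samples projected onto LD1 and LD2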

Figure 2E, an LDA at the level of the participants, with the three groups (HC,ME, MS), also looks pretty good.

There's some intriguing separation of the ME/CFS severity levels in the LDAs of Figure 2E and F when a third axis is considered (LD3).
[Attached image: Figures 2E and 2F from the paper - LDA plots including the third axis (LD3)]

Figures 2H and 2I present LDAs for the five groups: Healthy controls, MS, and mild, moderate and severe ME/CFS at the cell and individual levels respectively.

[Attached image: Figures 2H and 2I from the paper - LDA plots for the five groups at the cell and individual level]
I'm having a bit of trouble understanding why these two analyses look so different to 2D and 2E, why it becomes so much harder to separate the groups when ME/CFS is broken down into three severity groups.
 
Identification of the differentiating molecules

Figure 3 gives charts comparing the identified levels of tryptophan, tyrosine, phenylalanine, glycogen, glycerol, unsaturated fatty acids, cholesterol and cholesteryl esters, and glucose. There are significant differences here, but also substantial overlaps. It's hard to know what to make of them. The healthy control and MS cohorts were small.

I note that the earlier study found that phenylalanine levels might be a potential biomarker of ME/CFS, finding higher levels of phenylalanine than in controls. In this study, the results were different - individuals classed as having moderate or severe ME/CFS had lower levels. So, that is a bit of a blow, and casts some doubt over all of the findings.
Quantification of intracellular phenylalanine, on the other hand, suggests metabolic subtypes existing in the ME patients, with the moderate and severe groups having significantly reduced phenylalanine and the mild ME and MS having increased levels relative to controls.
[Attached image: chart from the paper of intracellular phenylalanine levels by group]
 
There is some work done on producing diagnostic models, but I don't find that sort of thing very interesting when we aren't sure if the differences that drive the models are real.

The discussion notes the difficulty of identifying the severity levels of people with ME/CFS, including because of fluctuations in severity. It is certainly a problem, as we have discussed here many times before.

On tryptophan, I find the discussion confusing.
Here's the chart of the intracellular tryptophan levels
[Attached image: chart from the paper of intracellular tryptophan levels by group]
There is enormous overlap between tryptophan levels in the cells of people in each group. Look at the range of results in the healthy controls (green, on the left). Means are slightly higher in the disease groups, as compared to the controls, but it's hard to make a coherent story.
discussion said:
Tryptophan, a necessary amino acid that causes significant changes in mood and fatigue, is the precursor to serotonin and kynurenine. There was a particular decrease in the tryptophan in all disease groups, possibly suggesting changes in the kynurenine pathway and NAD biosynthesis (32). Reduced neuroactive tryptophan metabolites could induce central fatigue via neuronal mechanisms, which is a hallmark of both ME/CFS and MS (39). It has been proposed that high levels of tryptophan in the immune cells of ME/CFS patients could link to a metabolic trap hypothesis (40). The hypothesis proposes that patients are unable to generate kynurenine due to mutations in the Indoleamine 2,3-Dioxygenase 2 (IDO2) gene, causing a build-up of tryptophan inhibiting the production of kynurenine, via repression of the more catalytically active IDO1 isozyme. Our data also suggests a build-up of tryptophan in the PBMC cell fractions in ME/CFS patients. However, PBMC fractions contain mixed cell populations; cells expressing IDO1 and IDO2, including myeloid and plasmacytoid dendritic cells, only make up a small percentage of the PBMC fraction. In the future, this study should be continued using IDO1 and IDO2-expressing cell types.

The discussion says that there was a decrease in tryptophan in the disease groups - but there wasn't. There were slightly increased means and a lot of overlap. It then says that their data suggest a build-up of tryptophan in ME/CFS patients, in line with the tryptophan trap hypothesis. But the data don't really support that either - many healthy people have similar levels and some healthy people have much higher levels. Have I misunderstood something there?

The discussion goes on to speculate about the causes of the differences, but I find it hard to get enthusiastic about such speculation until there is more replication.



In conclusion, I really like the application of single cell Raman spectroscopy to ME/CFS, and I like the inclusion of a disease control. But, I'd like to see the tool applied to more precisely defined cell types, including tissue cells.

I agree with the authors that it would be good to use fresh cells in an analysis - I'd like to see some comparisons of outcomes using fresh cells as compared to frozen ones.

I don't think we have any particularly solid finding here yet. I hope the team get more funding for further application of single cell Raman spectroscopy though.
 
Thanks for the breakdown @Hutan ! The problem with supervised machine learning of any kind - where you first tell the algorithm what the groups are (ME, MS, controls) and then tell it to separate them as best as possible - is that if you have a large enough number of 'features' for each sample you will always find some way to separate the groups, no matter what. This is because there will be some noise in the data that just so happens to divide the data up by chance.

We've seen this a lot in metabolomics data, where for each sample (a patient) they have thousands of features (measurements of thousands of different metabolites). They then go on to train some machine learning classifier and get apparently good results distinguishing patients from controls. The usefulness of their classifier can only be demonstrated if they then apply it to a completely independent set of test data and see if they still get good separation. If they don't do this step then it's basically useless.
 
I've modelled this in R here using data with no signal and only random noise.

In this paper they have about 1000 features (readings for 1000 different wavelengths) over 1000s of cells - so its dimensionality is high. I've generated a dataset of 1000 'samples', each with 1000 'features'. The dataset is populated with random decimal values between 0 and 1, so there is no pattern, only noise. I've then assigned each of the samples randomly to a group numbered 0, 1 or 2 to emulate three groups (controls, ME or MS).

Here is a scatterplot of features 1 and 2, each dot corresponds to a sample. You can see there's no pattern:
[Attached image: scatterplot of features 1 and 2 - no pattern]

I've then split up the data into two parts, 70% of the samples will be used to train an LDA model to predict the groups from the data, and the remaining 30% will be used to test it. When you plot the first two LDs from the training data you can see it separates the groups amazingly - based off of absolutely no real signal at all. I was surprised at just how strongly this resembles the plot in the paper:
[Attached image: LDA plot of the training data - the three random groups separate cleanly]

Then when you go on to use the trained model to predict the groupings on the test data you can see it can't do it at all:
[Attached image: LDA plot of the test data - no separation between groups]

Here's the R code if anyone wants to try it themselves:
library(tidyverse)
library(MASS)

# Set the seed for reproducibility
set.seed(12345)

# Generate a data frame with 1000 datapoints and 1000 features drawn randomly from a uniform distribution
mat <- replicate(1000, runif(1000))
colnames(mat) <- paste0("V", seq_len(ncol(mat)))
df <- as_tibble(mat)

# Assign each datapoint to one of three groups (stand-ins for controls, ME and MS);
# because the features are pure noise, the labels carry no real signal
grouping <- c(rep(0, 333), rep(1, 333), rep(2, 334))
df <- df %>% mutate(grouping = factor(grouping))

# Plot the first two features as a scatter plot - no pattern is visible
ggplot(df, aes(x = V1, y = V2)) +
  geom_point() +
  labs(x = "Feature 1", y = "Feature 2") +
  theme_minimal()

# Use 70% of the dataset as the training set and the remaining 30% as the testing set
in_train <- sample(c(TRUE, FALSE), nrow(df), replace = TRUE, prob = c(0.7, 0.3))
train <- df[in_train, ]
test <- df[!in_train, ]

# Train the model using the lda function and the training dataset
# (expect a "variables are collinear" warning, since there are more features than samples)
model <- lda(grouping ~ ., data = train)

# Plot the LDA projection of the training data - the groups separate despite there being no signal
lda_plot <- cbind(train, predict(model)$x)
ggplot(lda_plot, aes(LD1, LD2)) +
  geom_point(aes(color = grouping))

# Predict the groups using the model on the held-out test data
predicted <- predict(model, test)

# Plot the LDA projection of the test data - no separation
lda_plot <- cbind(test, predicted$x)
ggplot(lda_plot, aes(LD1, LD2)) +
  geom_point(aes(color = grouping))
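A possible extension (continuing from the code above; this wasn't in the original post) is to put a number on how badly the model generalises to the held-out samples - accuracy should come out near 1/3, i.e. chance level:

# Confusion matrix and overall accuracy on the test data
table(Predicted = predicted$class, Actual = test$grouping)
mean(predicted$class == test$grouping)   # roughly 0.33, i.e. no better than guessing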
 