Multi-omics identifies lipid accumulation in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome cell lines: a case-control study, 2026, Missailidis et

@jnmaciuch so a "coherent biological story" is the conversation about B cells and lipids? So maybe we are looking at a Single Point Of Failure (SPOF) in ME/CFS, am I correct (so we do not care about ER stress, impaired N-glycans, LXR downregulation, etc.)? If this is what you are implying and you believe a SPOF is at play, I have no further comments.
I’m sorry I don’t quite know what you mean. [Edit: I don’t think any chronic disease model is talking about a “single point of failure” except genetic disorders and that’s not what I’ve been discussing?] Maybe best to leave the conversation where it’s at to avoid derailing the thread.
 
I have a question about the treatment of missing values. At line 356 it is mentioned that missing values were replaced with the feature mean and that features with more than 50% of values missing were excluded. 50% of missing values is quite a high bar for exclusion (not a criticism, just an observation). Figure 3A of the lipid PC(O-38:4) is compelling, even exciting, but I'm wondering how many of the ME/CFS fold change values there are the result of an absolute value imputed due to missingness?
I don’t want to speak over Daniel but from the methods it seems like imputation was only used for the specific step of pathway analysis using MetaboAnalyst—individual feature comparisons would not use imputation.

It’s possible for imputation to skew enrichment of some pathways if controls had a much different proportion of imputed values than cases. The way you check for that is by running univariate tests on some of the top features driving your pathway of interest, which they did [edit: shown in the main text] for at least a few features (I haven’t checked the supplementals).

That slight difference in cholesterol sulphate that does not survive adjustment for multiple comparisons is the sole driver of the MetaboAnalyst Pathway analysis (line 432) result of 'possible effects on steroid hormone biosynthesis'. Line 439 notes that in each of the 'dysregulated pathways' 'there was one metabolite that the dysregulation was attributable to'. Given all the uncertainties, it seems a very long bow to draw to suggest the slight increase in the mean cholesterol sulphate result is evidence of disease-relevant 'possible effects on steroid hormone biosynthesis'.
I think that part of the text refers to the joint pathway analysis--meaning that those 3 pathways had one lipid feature each, but the enrichment was also driven by features in other assays that are known to be part of the same pathway.
 
Here's Figure 1A. The caption and the text are quite clear that the things driving the selection of the three 'potentially dysregulated pathways' are each of the three features (i.e. one feature driving one pathway). The steroid hormone biosynthesis datapoint is barely significant (I think that's before adjustment for multiple comparisons) and it rates approximately 0 on the x axis, which the caption suggests is a measure of pathway importance, which considers pathway enrichment.


"Figure 1: Potentially dysregulated metabolic pathways in ME/CFS LCLs are attributable to the levels of pyridoxal, thiamine, and cholesterol sulphate. (A) MetaboAnalyst Pathway Analysis of the recognised subset of the polar metabolome shows three significantly dysregulated pathways shown above the blue threshold line: vitamin B6 metabolism, thiamine metabolism, and steroid hormone biosynthesis. The Y-axis indicates significance, and the X-axis represents pathway importance, integrating pathway enrichment with degree centrality (see Methods)."

Screenshot 2026-01-14 at 11.29.39 AM.png

It's looking a little like a shoe-horning of the data to fit with recent claims elsewhere, especially given the relevance of the age differences between the cohorts of cell donors to that particular feature.
 
Ah my bad! I thought that was the paragraph talking about Fig 2 not Fig 1 (since the bar plot showed some pathways having 1 lipid hit). I agree that having one upregulated metabolite doesn’t mean there’s evidence for enrichment of the whole pathway—I appreciate the text was upfront about there being only one metabolite at least.

As to your point about the possible influence of age—the finding that seems least likely to be influenced by that is the most striking finding in Fig 4. If anything, you’d expect less mobilization of lipids in B cells with age given the known decline of eg antibody protection from vaccines with age (and the increase in high cholesterol with age would be much more a function of changes in liver cells than B cells).

But the effect of age could be quickly checked just by doing a univariate association between age and total or average lipid content. Ideally you’d want to do a linear model with both age and disease status as covariates to fully rule it out but unfortunately there are too few participants for that.
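
A minimal sketch of the univariate check described above, assuming a rank-based (Spearman) correlation between donor age and total lipid content. The ages and lipid values are made-up placeholders, not data from the paper, and the helper names are only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical donor ages and total lipid signal (arbitrary units).
ages = [23, 31, 38, 44, 52, 57, 63]
total_lipid = [1.10, 0.92, 1.31, 0.88, 1.25, 0.97, 1.15]
print(f"Spearman rho (age vs total lipid) = {spearman(ages, total_lipid):.2f}")
```

A rho near zero would be consistent with no monotonic age effect, although with this few participants the check has limited power, which is the same caveat that applies to the two-covariate linear model.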
 
It's looking a little like a shoe-horning of the data to fit with recent claims elsewhere, especially given the relevance of the age differences between the cohorts of cell donors to that particular feature.
I've just gotten back to the office so I haven't read the whole thread yet but just saw this comment (edit: I am editing in more responses as I see them). I'm not sure why it's raising a red flag for shoehorning so I'll explain what I did. I put the features into MetaboAnalyst and those three pathways came up as significantly affected. I reported what it determined just as it was described in MetaboAnalyst. I didn't manipulate or choose anything different; it's an unbiased output of the software. The blue significance line wasn't drawn arbitrarily, it's the threshold that the software reported to me. I didn't linger on it and moved on to other results because I didn't think it was a clearly important result either (the polar pathway analysis that is, as we say it is based on individual metabolites which were not hugely different). I very intentionally chose not to belabour it further at that stage of the results section. I included everything that my analysis pipeline included for the sake of thoroughness and transparency even if particular results are or aren't convincing, and distributed my focus accordingly. I included a mountain of supplemental data for the sake of this completeness and transparency as well. I think my language was also pretty transparent in that this little part of the analysis was an exercise included for thoroughness, just in case.

Regarding the age thing, I did look for relationships between age and total lipid and didn't see evidence of an effect. From memory I also did it for the altered features reported in the paper and didn't detect any relationships either.

Believe me, my approach in this paper was "put the data through unbiased tools and stats and then report the outcome neutrally with relevant context but minimum of interpretation" - I really didn't try to game or push anything in particular and I hope (and believe) that this is apparent in the text. I actually had a disagreement with a reviewer in a prior submission to another journal that was rejected... they wanted more of a "story" and claimed the data read too much like a neutral report of the observations. That was my intention and I didn't budge on it even to my own disadvantage in that instance. The intention was, in the Results, to have all of the data included and left to speak for itself, with some context but a minimum of interpretation, and then, in the Discussion, to focus on interpretation and put forward the most compelling avenues for future work in the Conclusions (hence why the polar pathway analysis outputs are not mentioned in specific terms in the Abstract or Conclusions). Hope this makes sense.
Missing values
I have a question about the treatment of missing values. At line 356 it is mentioned that missing values were replaced with the feature mean and that features with more than 50% of values missing were excluded. 50% of missing values is quite a high bar for exclusion (not a criticism, just an observation). Figure 3A of the lipid PC(O-38:4) is compelling, even exciting, but I'm wondering how many of the ME/CFS fold change values there are the result of an absolute value imputed due to missingness?
This is only for the MetaboAnalyst polar metabolite pathway analysis. The individual features I am showing in the paper and analysing elsewhere are all real data, no imputation. All of the scatter plots are real data. Most if not all of the lipids would have signal for every sample so there was no need to deal with missing values for those analyses. I tried very hard to present everything as transparently and readably as possible, hence the scatter plots with minimum stats graphics so as to let the points speak for themselves.

That slight difference in cholesterol sulphate that does not survive adjustment for multiple comparisons is the sole driver of the MetaboAnalyst Pathway analysis (line 432) result of 'possible effects on steroid hormone biosynthesis'. Line 439 notes that in each of the 'dysregulated pathways' 'there was one metabolite that the dysregulation was attributable to'. Given all the uncertainties, it seems a very long bow to draw to suggest the slight increase in the mean cholesterol sulphate result is evidence of disease-relevant 'possible effects on steroid hormone biosynthesis'.
This is all why I intentionally used softer (but I think transparent) language to summarise the MetaboAnalyst output and in specific terms the rationale for its inclusion, and then move on to the more interesting results which received more focus and emphasis, especially at the Abstract and Conclusions which is where people are going to go looking for the important bits. As I say, it is there for completeness, transparency, and just in case it is a piece of the puzzle. The focus is firmly on the more clear results.
 
Thanks @DMissa.
To be clear, I'm impressed with lots of things about this paper so far and I very much appreciate the transparency and readability of the paper and you being here to discuss it. I'm not suggesting you have aimed to misrepresent data, I know you wouldn't do that. I chose the wrong word in 'shoehorning'.

But, when I look at the 1B chart and then read that it is evidence of a possible steroid hormone biosynthesis dysregulation in ME/CFS, it doesn't seem quite right. So, I'm trying to understand.
Edit - There was a decision to undertake and present the pathway analysis on the polar metabolites despite not identifying any polar metabolite dysregulation, so there was a choice made there.
We decided to proceed with a subsequent pathway analysis of the polar metabolite data using a feature inclusion threshold of p < 0.05 as only a brief indicative exercise to ensure that any potential processes of note weren’t overlooked as possible false negatives

Missing values
This is only for the metaboanalyst polar metabolite pathway analysis, the individual features I am showing in the paper and analysing elsewhere are all real data, no imputation.
So, Figure 1B is just the actual values, and any imputed values aren't shown? How many missing values were imputed for the three polar metabolites in the MetaboAnalyst metabolite pathway analysis?

Is it possible that the inclusion of imputed values affected the p values for the pathways? For example, if, as is possible under that rule of allowing up to 50% imputed values, a metabolite had a large number of missing values, and mean values were used to replace the missing values, wouldn't that improve the p values? The extra data clustering at each of the group means would surely make the groups look more different.
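
To make that concern concrete, here is a small sketch with made-up numbers (not data from the paper). It compares Welch's t statistic on the observed values alone, after filling hypothetical missing values with each group's own mean (the scenario worried about above), and after filling them with the overall feature mean (one reading of "replaced with the feature mean"). The group sizes, values and number of missing entries are all assumptions chosen only for illustration.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def impute(observed, n_missing, fill_value):
    """Return the observed values plus n_missing copies of fill_value."""
    return observed + [fill_value] * n_missing


# Hypothetical observed values for one metabolite (arbitrary units).
cases = [1.4, 0.9, 1.6, 1.1, 1.5]
controls = [1.0, 0.7, 1.3, 0.8, 1.2]
n_missing = 4  # assumed number of missing values per group

t_observed = welch_t(cases, controls)

# Scenario 1: each missing value is replaced by its own group's mean.
t_group_mean = welch_t(
    impute(cases, n_missing, mean(cases)),
    impute(controls, n_missing, mean(controls)),
)

# Scenario 2: each missing value is replaced by the overall feature mean.
overall_mean = mean(cases + controls)
t_feature_mean = welch_t(
    impute(cases, n_missing, overall_mean),
    impute(controls, n_missing, overall_mean),
)

print(f"observed only:       t = {t_observed:.2f}")
print(f"group-mean filled:   t = {t_group_mean:.2f}")
print(f"feature-mean filled: t = {t_feature_mean:.2f}")
```

Filling with group means leaves each group mean unchanged while shrinking the standard error, so the statistic grows. In this toy example, filling with the overall feature mean instead pulls the two groups together and the statistic drops slightly. Which of these applies depends on how the imputation was actually implemented.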

That's great that the lipid PC(O-38:4) chart consists only of actual values, no imputed values. I'm really looking forward to getting to the discussion about that and I hope you can get support for replication.


Steroid hormone biosynthesis
432 MetaboAnalyst Pathway analysis suggests possible effects on vitamin B6 metabolism, thiamine metabolism and steroid hormone biosynthesis
I'm glad the suggestion of a dysregulated steroid hormone biosynthesis didn't make it into the discussion or the abstract, but I'm still a bit concerned that that sentence in the Results (edit - it's actually a title in the paper) overstates the situation. The paper says
418 No polar metabolites satisfied the threshold for significance (FDR < 0.05) in ME/CFS LCLs using the Benjamini-Hochberg procedure for multiple comparison correction (23).
In any case, we identified no clear polar metabolite dysregulation in these LCLs

So, the possibility of a pathway being dysregulated seems to be based on only one polar metabolite that is actually not in itself significantly different, that isn't itself dysregulated. I get that if there were 4 metabolites that weren't themselves quite significant enough, but that were all in the same pathway, a valid case could be made for the pathway being dysregulated. But, when it is written that only one metabolite is driving the identification of a pathway, and that one metabolite is not significantly different between the ME/CFS cells and the controls, the lack of difference seems to be a problem.

The chart shows that the ME/CFS levels of cholesterol sulphate fit entirely within the range of the healthy controls - there are healthy control cells with both more and with less cholesterol sulphate than the ME/CFS cells. So yes, many things are technically possible, but it surely is extremely unlikely on the basis of that chart that that metabolite is the driver of a steroid hormone biosynthesis dysfunction in ME/CFS?

Figure 1B

Screenshot 2026-01-14 at 5.26.03 AM.png


Was there an adjustment for multiple comparisons in the MetaboAnalyst Pathway analysis?


Age
Regarding the age thing, I did look for relationships between age and total lipid and didn't see evidence of an effect. From memory I also did it for the altered features reported in the paper and didn't detect any relationships either.
It would be great if you could check the relationship between the actual values and age for LCL cholesterol sulphate levels.
 
Lipidome analysis
There were 454 lipids detected. The dataset was adjusted for multiple comparisons. Only one lipid was found to be significantly reduced in the ME/CFS samples compared to the controls, that's the PC(O-38:4).

This is Figure 3. Figure 3A shows the remarkable separation of the groups on PC(O-38:4)
Screenshot 2026-01-15 at 6.07.32 AM.png

Figure 3: A) PC(O-38:4) was the most significantly altered lipid in ME/CFS LCLs after correcting for multiple comparisons via Benjamini-Hochberg method (p = 3.823 × 10⁻⁶, Mann-Whitney U-test). Data expressed as fold change relative to HC average abundance. B) PCA plot developed using PC(O-38:4) and DG(36:2) levels separates ME/CFS and HC LCLs perfectly. Original values are ln(x)-transformed. Unit variance scaling is applied to rows; SVD with imputation is used to calculate principal components. X and Y axis show principal component 1 and principal component 2 that explain 50.1% and 49.9% of the total variance, respectively. Prediction ellipses are such that with probability 0.95, a new observation from the same group will fall inside the ellipse.

526 Indeed, PC(O-38:4) levels combined with DG(36:2) levels (the most elevated lipid), showed clear clustering of ME/CFS and HC LCLs in PCA (Figure 3B), as might be expected given their preselection as highly altered features.

I don't understand why the PCA (Figure 3B) is there. The point of a PCA is to compress information from a whole lot of features into just two (shown on the x and y axes). If you are only going to use two features, you might as well just plot them against each other and be clear about what you are doing. Basically what that PCA is saying is that 'if we take only two of the lipids that are the most different between the ME/CFS and control groups, just two out of 454, then the PCA shows a separation'. Well, yes.
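
A short worked sketch of why a two-feature PCA adds little, assuming the PCA is run on two features that have each been scaled to unit variance (as the caption's "unit variance scaling" suggests). In that case the covariance matrix is [[1, r], [r, 1]], where r is the correlation between the two features. Its eigenvalues are 1 + |r| and 1 - |r|, and the two components are simply the sum and the difference of the two z-scores (each divided by √2). The function names below are hypothetical.

```python
def pca_two_standardized_features(r):
    """Explained-variance fractions for PCA on two unit-variance features.

    With correlation r between the features, the covariance matrix is
    [[1, r], [r, 1]], whose eigenvalues are 1 + |r| and 1 - |r|. Dividing
    by their sum (2) gives the fraction of variance on PC1 and PC2.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be a correlation in [-1, 1]")
    return (1 + abs(r)) / 2, (1 - abs(r)) / 2


def implied_abs_correlation(pc1_fraction):
    """Invert the relationship: |r| implied by PC1's variance fraction."""
    return 2 * pc1_fraction - 1


pc1, pc2 = pca_two_standardized_features(0.6)
print(f"r = 0.6 -> PC1 explains {pc1:.1%}, PC2 explains {pc2:.1%}")
print(f"|r| implied by 50.1% on PC1: {implied_abs_correlation(0.501):.3f}")
```

Under this assumption the two fractions always sum to 100% whatever the data are, and the 50.1% / 49.9% split reported in the caption only encodes how correlated the two lipids are across the pooled samples. A plain scatter plot of the two lipids carries the same information, with the axes labelled by name.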

I could take a large random data set for two groups, select the two most different features and plot them and always show a difference. To say that the two lipids account for 100% of the variability means very little when only two lipids are included in the PCA. @DMissa, you seem to be aware of the problem, with that note 'as might be expected given their preselection as highly altered features'. So, why is it useful?
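
The selection effect described here can be sketched with a small simulation. Every feature below is pure noise with no true group difference; the 454 features match the lipid count mentioned above, while the 20 samples per group and the random seed are assumptions made only for this illustration.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def top_null_features(n_features=454, n_per_group=20, n_top=2, seed=0):
    """Simulate features with no real group difference and return the
    n_top largest |t| values as (abs_t, feature_index) pairs."""
    rng = random.Random(seed)
    scores = []
    for feature in range(n_features):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        scores.append((abs(welch_t(group_a, group_b)), feature))
    scores.sort(reverse=True)
    return scores[:n_top]


ranked = top_null_features(n_top=454)
median_abs_t = ranked[len(ranked) // 2][0]
for abs_t, feature in ranked[:2]:
    print(f"noise feature {feature}: |t| = {abs_t:.2f}")
print(f"median |t| across all 454 noise features = {median_abs_t:.2f}")
```

The two hand-picked noise features look far more different between groups than a typical feature does, even though nothing real separates the groups. This does not mean a selected pair will always separate groups perfectly; it only shows why separation on preselected features is weak evidence unless the selection step is accounted for.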

To me, having that PCA there weakens the paper, it reduces credibility, and that's a shame with that really interesting result in Figure 3A. I would far rather have a plot of the lipid that was found to be most increased in the ME/CFS cells compared to the control cells (DG(36:2)) in the place of the PCA. Or plots for the 5 most different lipids.
 
There was a decision to undertake and present the pathway analysis on the polar metabolites despite not identifying any polar metabolite dysregulation, so there was a choice made there.
The choice was made prior to obtaining any data, I decided to be thorough and include the whole pipeline so as not to keep any results from the community. That includes negative results.
So, Figure 1B is just the actual values, and any imputed values aren't shown? How many missing values were imputed for the three polar metabolites in the MetaboAnalyst metabolite pathway analysis?
Correct, and none.
So, the possibility of a pathway being dysregulated seems to be based on only one polar metabolite that is actually not in itself significantly different, that isn't itself dysregulated.
It is based on the levels of that metabolite within context of levels of each metabolite in the whole metabolome, which is a different question to its levels in isolation.
The chart shows that the ME/CFS levels of cholesterol sulphate fit entirely within the range of the healthy controls - there are healthy control cells with both more and with less cholesterol sulphate than the ME/CFS cells. So yes, many things are technically possible, but it surely is extremely unlikely on the basis of that chart that that metabolite is the driver of a steroid hormone biosynthesis dysfunction in ME/CFS?
See prior comment
It would be great if you could check the relationship between the actual values and age for LCL cholesterol sulphate levels.
What I'm saying in the prior post is that I have done this, specifically.

To be clear, I'm impressed with lots of things about this paper so far and I very much appreciate the transparency and readability of the paper and you being here to discuss it. I'm not suggesting you have aimed to misrepresent data, I know you wouldn't do that. I chose the wrong word in 'shoehorning'.
All good, please don't mistake my tone as terse, I'm just time poor atm so writing with brevity.
I don't understand why the PCA (Figure 3B) is there. The point of a PCA is to compress information from a whole lot of features into just two (shown on the x and y axes). If you are only going to use two features, you might as well just plot them against each other and be clear about what you are doing. Basically what that PCA is saying is that 'if we take only two of the lipids that are the most different between the ME/CFS and control groups, just two out of 454, then the PCA shows a separation'. Well, yes.

I could take a large random data set for two groups, select the two most different features and plot them and always show a difference. To say that the two lipids account for 100% of the variability means very little when only two lipids are included in the PCA. @DMissa, you seem to be aware of the problem, with that note 'as might be expected given their preselection as highly altered features'. So, why is it useful?

To me, having that PCA there weakens the paper, it reduces credibility, and that's a shame with that really interesting result in Figure 3A. I would far rather have a plot of the lipid that was found to be most increased in the ME/CFS cells compared to the control cells (DG(36:2)) in the place of the PCA. Or plots for the 5 most different lipids.
Yeah I'd seen what occurred with the Tate paper so I took this question to an accredited statistician we've worked with on other papers where PCA was used to test whether two features were able to produce clusters between clinical groups, and they said that it's a valid application of PCA. In any case, the separation is clear from the scatter so I don't think anything is being misrepresented. Maybe a scatter of the other lipid would have been better. I'll take it on board for future analyses.
Was there an adjustment for multiple comparisons in the MetaboAnalyst Pathway analysis?
Yep, it's built in to the tool.

As I say, this early step in the analysis was included for completeness and I don't think I confidently stated anything to be happening that didn't have clear evidence behind it. I tried to be pretty deliberate with the language.
 
Were the adjusted p-values used in the paper? I see that there are FDR values in table S2 for the pathways, in which only vitamin B6 metabolism is below .05. But the text and figure 1A seem to be based on the raw p-values.
-Log10(p-value) is the standard choice for plotting things like enrichment of a bunch of pathways or a volcano plot, since FDR correction will leave you with a lot of redundant values. It's recommended to just use an additional visual indicator for which ones passed FDR, or to mention it in the text/legend. I assume it wasn't mentioned here largely because it becomes a moot point anyway after explicitly stating that each finding was attributable only to one feature.
 
PCA
Yeah I'd seen what occurred with the Tate paper so I took this question to an accredited statistician we've worked with on other papers where PCA was used to test whether two features were able to produce clusters between clinical groups, and they said that it's a valid application of PCA. In any case, the separation is clear from the scatter so I don't think anything is being misrepresented. Maybe a scatter of the other lipid would have been better. I'll take it on board for future analyses.
Honestly, I'm a bit flabbergasted. After going to all that effort to highlight the problem with the use of a PCA in the Tate paper, one of my favourite ME/CFS scientists is aware of that and still commits even worse PCA abuse.... I don't understand why you would choose to do a PCA in this way.

I've checked a number of guides on the use of PCA and none recommend it for use in this situation of only two highly selected features, when interpretability of the features is what is needed. I don't think that any statistician would disagree.

The figure caption says 'PCA plot developed using PC(O-38:4) and DG(36:2) levels separates ME/CFS and HC LCLs perfectly.' But, I could take an entirely random dataset of 454 features, choose the two most differentiating independent features and produce a PCA that looks much like the one you have there. It's essentially extreme cherrypicking, ignoring the multiple comparisons it took to find those two differentiating features.

The fact that the two most differentiating features out of 454 features can separate the two groups is not at all remarkable. What is truly interesting here are the identities of the particular features that are separating the ME/CFS cells from the controls. We want to start thinking about why those particular features and not other ones? Could they tell us something useful about ME/CFS rather than just be a random result?

So, if you want to plot them against each other, please show their names on the chart, don't bury their identities under the PC1 and PC2 labels and confuse people about what has been found.

I'll take it on board for future analyses.
Hopefully it's not too late to fix it in this paper? Yes, a scatter plot of one or more of the most differentiating lipids would be so much better.
 
Were the adjusted p-values used in the paper? I see that there are FDR values in table S2 for the pathways, in which only vitamin B6 metabolism is below .05. But the text and figure 1A seem to be based on the raw p-values.
-Log10(p-value) is the standard choice for plotting things like enrichment of a bunch of pathways or a volcano plot, since FDR correction will leave you with a lot of redundant values. It's recommended to just use an additional visual indicator for which ones passed FDR, or to mention it in the text/legend. I assume it wasn't mentioned here largely because it becomes a moot point anyway after explicitly stating that each finding was attributable only to one feature.
That makes sense.

The problem is that there is a blue line and big blue text on Figure 1A indicating that three pathways are significantly affected.

Screenshot 2026-01-14 at 11.29.39 AM.png

Here are the p values and FDR from S2:
Screenshot 2026-01-15 at 8.38.23 PM.png

After correction for multiple comparisons, the p value for steroid hormone biosynthesis is 0.551, nowhere near significant. The line on the chart saying 'Significantly affected' is misleading. FDR is done for a reason.
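
For anyone wanting to see how a raw p value under 0.05 can end up far from significant after correction, here is a minimal sketch of the Benjamini-Hochberg adjustment. The raw p values are hypothetical and are not the values in table S2; the function is a plain reimplementation for illustration, not MetaboAnalyst's own code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw pathway p-values (NOT the values from table S2).
raw = [0.004, 0.03, 0.045, 0.20, 0.35, 0.60]
for p, q in zip(raw, benjamini_hochberg(raw)):
    verdict = "passes" if q < 0.05 else "fails"
    print(f"raw p = {p:.3f}  BH-adjusted = {q:.3f}  {verdict} FDR < 0.05")
```

In this made-up example three pathways have raw p < 0.05 but only one survives the adjustment. With many more pathways tested, as in a real pathway analysis, the adjustment is harsher still.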

The caption on Figure 1A adds further confusion:
454 MetaboAnalyst Pathway Analysis of the recognised subset of the polar metabolome shows three significantly dysregulated pathways shown above the blue threshold line: vitamin B6 metabolism, thiamine metabolism, and steroid hormone biosynthesis.
It looks as though there are 87 features that could potentially have contributed to the steroid hormone biosynthesis pathway, and yet there is only one hit, only one feature that is suggestive of pathway dysregulation - and the pathway isn't significant after correction for multiple comparisons. It has an impact of zero.

I think it is important that this is fixed.
 