Preprint Identification of a multi-omics factor predictive of long COVID in the IMPACC study, 2025, Gabernet et al.

SNT Gatchaman

Identification of a multi-omics factor predictive of long COVID in the IMPACC study
Gisela Gabernet; Jessica Maciuch; Jeremy P Gygi; John F Moore; Annmarie Hoch; Caitlin Syphurs; Tianyi Chu; Naresh Doni Jayavelu; David B Corry; Farrah Kheradmand; Lindsey R Baden; Rafick-Pierre Sekaly; Grace A McComsey; Elias K Haddad; Charles B Cairns; Nadine Rouphael; Ana Fernandez-Sesma; Viviana Simon; Jordan P Metcalf; Nelson I Agudelo Higuita; Catherine L Hough; William B Messer; Mark M Davis; Kari C Nadeau; Bali Pulendran; Monica Kraft; Chris Bime; Elaine F Reed; Joanna Schaenman; David J Erle; Carolyn S Calfee; Mark A Atkinson; Scott C Brackenridge; Esther Melamed; Albert C Shaw; David A Hafler; Al Ozonoff; Steven E Bosinger; Walter Eckalbar; Holden T Maecker; Seunghee Kim-Schulze; Hanno Steen; Florian Krammer; Kerstin Westendorf; IMPACC Network; Bjoern Peters; Slim Fourati; Matthew C Altman; Ofer Levy; Kinga K Smolen; Ruth R Montgomery; Joann Diray-Arce; Steven H Kleinstein; Leying Guan; Lauren I R Ehrlich

Following SARS-CoV-2 infection, ~10-35% of COVID-19 patients experience long COVID (LC), in which often debilitating symptoms persist for at least three months. Elucidating the biologic underpinnings of LC could identify therapeutic opportunities.

We utilized machine learning methods on biologic analytes and patient reported outcome surveys provided over 12 months after hospital discharge from >500 hospitalized COVID-19 patients in the IMPACC cohort to identify a multi-omics "recovery factor".

IMPACC participants who experienced LC had lower recovery factor scores compared to participants without LC. Biologic characterization revealed increased levels of plasma proteins associated with inflammation, elevated transcriptional signatures of heme metabolism, and decreased androgenic steroids in LC patients. The recovery factor was also associated with altered circulating immune cell frequencies.

Notably, recovery factor scores were predictive of LC occurrence in patients as early as hospital admission, irrespective of acute disease severity. Thus, the recovery factor identifies patients at risk of LC early after SARS-CoV-2 infection and reveals LC biomarkers and potential treatment targets.


Link | PDF (Preprint: BioRxiv) [Open Access]
 
I think this study may suffer from an overly permissive view of what Long Covid is. That, coupled with their hospitalised samples means that the biochemical markers are not necessarily relevant to LC ME/CFS.

The Recovery factor is the combination of the parameters that best differentiated their model training recovered versus LC cohorts. Higher 'Recovery Factor' values are better. I don't think the results are overly impressive.


They only had about 500 people in their sample, and only 80 of those seem to have the LC 'physical deficit' that they built their model on. Slice and dice that into the 80% training and 20% test, and take into account that they had a very large number of measurements (nearly 7000) and I don't think they got very good separation of the LC and Recovered groups.
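To put rough numbers on that, here is a quick back-of-the-envelope sketch. It assumes the approximate figures above (about 500 participants, about 80 with the physical-deficit LC label, about 7,000 features) and a split that keeps the LC proportion roughly equal in both sets. It also ignores that each participant contributed samples at several visits, which adds rows but not independent people.

```python
# Assumed inputs: approximate figures from the post, not exact study counts.
n_total = 500      # participants
n_lc = 80          # participants with the physical-deficit LC label
n_features = 7000  # measured analytes
train_frac = 0.8   # 80/20 train/test split

n_train = round(n_total * train_frac)
n_test = n_total - n_train
# Assume the split keeps the LC proportion roughly equal in both sets.
lc_train = round(n_lc * train_frac)
lc_test = n_lc - lc_train

print(f"train: {n_train} participants, about {lc_train} with LC")
print(f"test:  {n_test} participants, about {lc_test} with LC")
print(f"features per training participant: {n_features / n_train:.1f}")
```

Under these assumptions there would be only around 16 LC participants in the test set, so a single misclassified case shifts test-set sensitivity by about 6 percentage points.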

We focused on predicting LC in the Convalescent cohort from the multi-omics immune profiling data collected during the convalescent phase. Since the binary LC labels (presence or absence of LC) per participant could omit valuable information captured by the numeric PRO measures at each participant visit, we constructed separate SPEAR models to generate supervised factors including PRO measure scores (SPEAR Physical, SPEAR Cognitive, SPEAR Mental, SPEAR Impact, SPEAR Dyspnea) or the LC binary labels (SPEAR LC) as response variables (Figure S2A).

The lasso model trained on the SPEAR Physical multi-omics factor achieved the highest predictive performance as evaluated with the area under the receiver-operating characteristic curve (AUROC) (Figure 2A).
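For readers less familiar with AUROC: it equals the probability that a randomly chosen LC participant gets a higher risk score than a randomly chosen non-LC participant (ties count half), so 0.5 is chance and 1.0 is perfect separation. A minimal sketch using that pairwise definition. The scores are made up, and since a lower recovery factor goes with LC, they are negated so they act as risk scores:

```python
from itertools import product

def auroc(scores_pos, scores_neg):
    """AUROC via its rank (Mann-Whitney) interpretation: the fraction of
    (positive, negative) pairs where the positive case scores higher;
    ties count as half."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical recovery-factor scores (higher = better recovery).
lc = [-1.2, -0.4, 0.1, -0.8]
recovered = [0.3, -0.5, 0.9, 1.1, 0.0]

# LC is the "positive" class and lower recovery scores indicate LC,
# so negate the scores to make higher = more likely LC.
print(auroc([-s for s in lc], [-s for s in recovered]))
```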

As far as I can see, they subsetted their LC cohort, and then played around with models, selecting the model that performed best for them. So, for example, the model predicting people who had ongoing breathlessness didn't make the cut. It's just another opportunity for bias.
 
The SPEAR Physical model identified 26 analytes across four assays that were significant in the recovery factor (SPEAR Bayesian posterior selection probability ≥ 0.95), and we performed individual associations of these features with LC status in the test cohort, adjusting for age and sex (Figure 3B, Figure S5).

Of these, DNER (Delta And Notch-Like Epidermal Growth Factor-Related Receptor), a non-canonical Notch ligand that has been implicated in promoting tumor growth and metastasis and in supporting wound healing48,49 was significantly reduced in LC participants, consistent with a prior study of plasma proteomics in LC subjects28. The remaining serum Olink analytes were negatively associated with the recovery factor.

In particular, they included proteins and cytokines associated with chronic inflammatory conditions50–55, particularly endothelial/vascular inflammation (FGF23, FGF21, CXCL9, TNFRSF11B and TNFRSF9 (CD137)), as well as inflammation-associated myeloid regulators56–58 (MMP10 and CSF1). Elevated levels of IL10RB have been previously associated with worse outcomes in acute COVID-19 infection59, consistent with elevation under inflammatory conditions. LRG1, a protein elevated in LC participants, is induced by IL-6 and other inflammatory cytokines and has been implicated in angiopathic activity60–62.

Phenylacetylglutamate and phenylacetylglutamine are gut microbiota-derived metabolites associated with vascular inflammation and thrombosis63. Finally, the OSBP2 (ORP4) transcript, which encodes an oxysterol binding protein64, was a leading edge gene in the Hallmark Heme Metabolism gene set that was elevated in LC participants.
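"Adjusting for age and sex" here means estimating the LC-associated difference in a feature while age and sex are held fixed in a regression. A minimal sketch on simulated data, assuming a plain linear model (feature ~ LC + age + sex) fitted by ordinary least squares with NumPy. A real longitudinal analysis would also need to account for repeated visits per participant, which this ignores, so treat it as an illustration of the idea rather than the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(30, 80, n)
sex = rng.integers(0, 2, n)  # 0/1 indicator (hypothetical coding)
lc = rng.integers(0, 2, n)   # 1 = LC, 0 = recovered
# Simulated feature: depends on age and sex, plus a true LC effect of -0.5.
feature = 2.0 - 0.02 * age + 0.3 * sex - 0.5 * lc + rng.normal(0, 0.3, n)

# Design matrix: intercept, LC status, and the two adjustment covariates.
X = np.column_stack([np.ones(n), lc, age, sex])
coef, *_ = np.linalg.lstsq(X, feature, rcond=None)

# coef[1] is the LC difference with age and sex held fixed.
print(f"age- and sex-adjusted LC effect: {coef[1]:.2f}")
```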
 
androgenic steroids
Several metabolites from the androgenic steroids pathway were represented in the 26 significant analytes and were positively associated with the recovery factor, indicating higher levels correlate with better physical function. When we tested these metabolites for their individual association with LC status, five (DHEA-S, epiandrosterone sulfate, androsterone sulfate, 5alpha-androstan-3beta,17beta-diol monosulfate (2), 5alpha-androstan-3beta,17alpha-diol disulfate) were significantly lower in LC participants, adjusting for age and sex (Figure 3B).

Androgens can suppress inflammation65, suggesting that the higher level of androgenic steroids in participants of the MIN group could reflect better control of chronic inflammation. These findings are consistent with prior reports showing lower levels of sex hormones in LC31. Five metabolites related to pregnenolone were also represented in the significant SPEAR analytes (Figure 3B).

Pregnenolone is synthesized from cholesterol as the first step of the steroid hormone biosynthesis pathway and is known to have potent effects as an inhibitor of inflammation66 and as a neurosteroid67. Altogether, these findings are consistent with a prominent role for persistent inflammation in LC with dysregulation of key analytes that may contribute to symptoms in LC, including elements that drive angiopathy, reduce wound healing, and alter heme metabolism.

Figure 3b shows that of the 15 or so steroids, cortisol is not one found to be different.


________
73 unique features they believe are particularly useful in separating the recovered and LC groups
The feature sets from heme metabolism and androgenic steroids identified by GSEA analysis combined with the significant SPEAR analytes represent 73 unique features that potentially condense the predictive power of the recovery factor into a smaller feature set. To test this hypothesis, we calculated the geometric mean of the 43 leading edge heme metabolism and 12 androgenic steroid features, as well as the 26 significant SPEAR analytes. All three geometric mean scores were independently significantly associated with LC in the test cohort (Figure 3C). Furthermore, the combined score that includes analytes from all three feature sets discriminates MIN and LC participants with even greater significance (Figure 3C). Thus, while the recovery factor is comprised of weighted contributions from 6,807 features, we have identified a smaller set of 73 unique features that discriminates participants according to LC status in the convalescent period.
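A geometric mean score like this is simply the exponential of the average log value across the features in a set. A minimal sketch, assuming each feature has already been scaled to strictly positive values. The excerpt doesn't say how features that move in opposite directions (some up, some down in LC) were oriented before combining, so this sketch does not try to handle that. Python's standard library also offers `statistics.geometric_mean`, which could replace the helper.

```python
import math

def geometric_mean_score(values):
    """Geometric mean of strictly positive feature values for one participant,
    computed in log space to avoid overflow/underflow with many features."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical normalized abundances for a small feature set, per participant.
participants = {
    "P1": [1.2, 0.8, 1.5],
    "P2": [0.6, 0.5, 0.9],
}
scores = {pid: geometric_mean_score(v) for pid, v in participants.items()}
print(scores)
```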
 
Consistent with our finding, the Hallmark Heme Metabolism pathway was previously reported by Hanson et al.29 as an enriched pathway in participants with persisting symptoms 1-3 months after acute SARS-CoV-2 infection compared to participants without persisting symptoms. This cohort comprised 102 participants, including non-hospitalized and hospitalized individuals29. To determine whether the same heme metabolism-related genes were dysregulated in LC participants in the IMPACC and Hanson et al. cohorts, we used the leading edge genes from the significant Hallmark Heme Metabolism pathway in our GSEA analysis (Figure S5A) and calculated the geometric mean gene expression in PBMCs from the Hanson et al. cohort29. We found that our heme metabolism leading edge genes significantly differentiated participants with persistent vs. resolved symptoms after COVID-19 infection at multiple time points in the independent cohort (Figure S5C), validating the reproducibility of the gene expression datasets and underscoring the importance of this subset of heme metabolism genes.
 
With the help of Grok 3, I realized that the study found several metabolites (below) that are related to dihydrotestosterone, and more specifically to 5-alpha-reductase. Given my story (I got ME/CFS from finasteride, which is a 5-alpha-reductase inhibitor), these results may be yet another important piece of the puzzle. Here is part of the answer from Grok 3:

The presence of 5α-androstan-3β,17β-diol monosulfate and 5α-androstan-3β,17α-diol disulfate indicates involvement of 5α-reductase activity (which converts testosterone or DHEA intermediates into 5α-reduced compounds) followed by sulfation
 
Hi all! I wasn't sure if the forum would have some discussion of the preprint, or if that would come later when we finally got published. @Hutan thanks for your analysis and feedback so far. I'll provide some clarification from the text based on your comments below:

I think this study may suffer from an overly permissive view of what Long Covid is.
I see your point here. Since this paper is connected to others which have examined the same cohort, the definition of LC was derived from this paper, which performed hierarchical clustering on several patient-reported outcome measures. The purpose was to better identify a post-COVID deficit that was not solely based on the presence or absence of symptoms and that incorporated information across different health domains.

The four PRO groups were collapsed for the purposes of our study, with COG (primarily cognitive deficit), PHYS (primarily physical deficit), and MLT (strong deficit in both) combined into the LC label. The purpose of this was to gain more statistical power, since this was a convalescent cohort following hospitalization and we were not able to recruit a set number of LC participants and controls at the outset. The LC definition may still suffer from being too broad, though I think that is an unfortunate general trend in the field that needs to be addressed more formally.

The Recovery factor is the combination of the parameters that best differentiated their model training recovered versus LC cohorts.
Given the ambiguity of the LC label, other co-authors and I actually pushed to train the supervised ML model on available PROMIS scores, rather than a binary or categorical label. So the best performing model (that gave us the recovery factor) is actually trained on predicting PROMIS Physical Scores. The ranking of analytes within the factor represents their relative importance in the model's ability to predict physical function for each participant at each measured timepoint.

There were several reasons for this choice:
1) This provided the opportunity for the algorithm to learn from data that was taken at the same time point as the patient reported outcome surveys were completed (as opposed to one categorical label applied to 2-4 different time points from each person, which may flatten time-dependent changes).
2) It allowed for the model to be trained on a more "objective" measure that was not defined by our previous analysis.
3) Since we still had the LC labels, we would be able to run additional statistical tests on the association between recovery factor scores and binary LC label, which could include additional corrections for age, sex, and sample collection site. This essentially tells us that training a model on only PROMIS Physical Scores actually gives you good enough information to predict a more general label of post-COVID deficit.

That last point is why all the graphs show the difference between MIN vs LC groups even though the model was trained on PROMIS Physical Scores (in addition to just being visually easier to see, as opposed to a bunch of scatter plots with a million dots). We also did additional analysis correlating the recovery factor with other clinical outcomes besides the MIN vs LC label.

I don't think the results are overly impressive.
I understand your assessment--it's not too impressive visually. The reason we were so excited about these results is that finding even a weak statistically significant multi-omic signature of such a potentially heterogeneous phenomenon like LC is a phenomenally difficult ML task. To use a dramatic example: it's not only finding a needle in a haystack, it's trying to find a needle that may or may not be perceptible at all with your available tools, which may also change every time you look in the haystack.

The main issue of big data approaches to ME/CFS and LC is the extremely low signal-to-noise ratio. The differences in metabolites or transcript levels associated with the outcome may be pretty subtle, especially when there's a massive amount of interpersonal variation in all the data that you measure, even between members of the same group. Other chronic illnesses like rheumatoid arthritis have the benefit of a very strong and persistent molecular signature which can be detected even despite interpersonal variation. Something like LC is a different beast--it was a distinct possibility that such a wide-net search wouldn't find any signal at all.

In particular, I think it's a strength of this paper that two of the 3 main findings were almost exactly replicated in other cohorts (the heme metabolism finding in the Hanson et al. LC cohort and the androgenic steroid finding in the Germain et al. ME/CFS cohort). Meaning that our shape-shifting needle in a haystack search actually found signatures strong enough to be consistent despite differences in disease definition and experimental design. In a field where inconsistent results are par for the course, I think this is something notable (though obviously I'm biased in my assessment).

As far as I can see, they subsetted their LC cohort, and then played around with models, selecting the model that performed best for them. So, for example, the model predicting people who had ongoing breathlessness didn't make the cut. It's just another opportunity for bias.
Selecting the best performing model would actually be the best practice in this case to avoid bias. Much of the patient reported outcome data suffered from missingness from factors outside of our control, and as in any study, we don't know ahead of time if the particular -omics data we collected would have a strong correlation with any of the patient reported outcomes. This might be due to the fact that a phenomenon like breathlessness simply won't be strongly reflected in PBMCs, or blood plasma, or serum cytokine levels. The AUROC analysis tells us that our best performing model does much better than what would be expected by chance. The reason for splitting the cohort into test and train is to allow for independent confirmation of the model's validity once you have chosen the best performing model based on the train data only--another standard best practice in the field.

If we tested a bunch of models and then ran the entire analysis of the paper with every single one of them only to choose our favorite results out of everything, that would be an example of cherry picking and bias. However, in this case, the best model was chosen before any test cohort data was used for validation and before any downstream analysis was performed, which is what was recommended by the biostatistics experts on our team.
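The protocol described here (choose the model using training data only, then evaluate that one model once on the held-out test set) can be sketched as follows. The candidate "models" are two hypothetical single-feature scorers on simulated data, not SPEAR or lasso models, and a real pipeline would usually rank candidates by cross-validation within the training set rather than by training-set performance directly:

```python
import random

def auroc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evaluate(scorer, data):
    """AUROC of a scoring function on a list of (features, label) pairs."""
    pos = [scorer(x) for x, y in data if y == 1]
    neg = [scorer(x) for x, y in data if y == 0]
    return auroc(pos, neg)

def select_then_test(candidates, train, test):
    # 1) Choose the model using the training data only.
    best = max(candidates, key=lambda name: evaluate(candidates[name], train))
    # 2) Score that single model once on the held-out test set.
    return best, evaluate(candidates[best], test)

# Hypothetical data: feature 0 is shifted in cases, feature 1 is pure noise.
rng = random.Random(0)

def make_row():
    label = int(rng.random() < 0.2)  # roughly 20% "cases"
    return [label + rng.gauss(0, 1), rng.gauss(0, 1)], label

rows = [make_row() for _ in range(500)]
train, test = rows[:400], rows[400:]  # rows are already in random order

candidates = {
    "feature_0": lambda x: x[0],
    "feature_1": lambda x: x[1],
}
best, test_auc = select_then_test(candidates, train, test)
print(best, round(test_auc, 2))
```

The key design point is that the test set never influences which model is chosen, so the test AUROC is an honest estimate for that one model.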

Figure 3b shows that of the 15 or so steroids, cortisol is not one found to be different.
Cortisol was not detectable at all in our metabolomics assay, and even if it were, I would not necessarily trust the results since cortisol has strong diurnal variation and the sample collection could not be done at the same time each day for everyone. However, as I wrote in the discussion, the androgenic steroid findings actually point to a rate limiting step upstream of all the steroid hormones, including testosterone and cortisol. I'm actually already looking into the implications of this finding, hoping to have something positive to report soon!

Happy to answer any other questions that come up, and grateful to see this interaction with our work.
 
The main issue of big data approaches to ME/CFS and LC is the extremely low signal-to-noise ratio. The differences in metabolites or transcript levels associated with the outcome may be pretty subtle, especially when there's a massive amount of interpersonal variation in all the data that you measure, even between members of the same group. Other chronic illnesses like rheumatoid arthritis have the benefit of a very strong and persistent molecular signature which can be detected even despite interpersonal variation. Something like LC is a different beast--it was a distinct possibility that such a wide-net search wouldn't find any signal at all.

Given the stark difference in symptoms between ME/CFS and healthy controls why would we expect that subtle differences in metabolites could play an important role in the disease? There are thousands and thousands of variables that can cause changes in given levels of metabolites that subtle shifts don't seem that impressive. In theory I can see how it could be useful as a clue to some other upstream process. But I think small shifts can be just as likely to result from some unrelated variable that we can't know and probably unrelated to whatever is causing symptoms.
 
Given the stark difference in symptoms between ME/CFS and healthy controls why would we expect that subtle differences in metabolites could play an important role in the disease? There are thousands and thousands of variables that can cause changes in given levels of metabolites that subtle shifts don't seem that impressive. In theory I can see how it could be useful as a clue to some other upstream process. But I think small shifts can be just as likely to result from some unrelated variable that we can't know and probably unrelated to whatever is causing symptoms.

Great question! It would all be dependent on the stoichiometry of the reaction at hand, whether baseline levels of the metabolite are even abundant enough to detect in a global metabolomics assay, and how much its downstream effects are actually amplified. For example, there might be a very vital reaction where, because the metabolite is supposed to get recycled as part of a cycle, a small difference in abundance may have a strong phenotypic effect because you've cut off the cycle at a choke point. As another example, sex steroids are actually incredibly sparse in the body. But because their binding with a receptor triggers such a strong cascade of gene transcription, a microscopic difference in levels may be the difference between gene programs turning on or not in a specific tissue.
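The amplification point can be illustrated with a steep Hill-type dose-response curve, a standard way of modelling threshold-like responses. The constants below (half-maximal response at 1.0, Hill coefficient 4) are arbitrary assumptions and are not measured values for any steroid or receptor:

```python
def hill_response(ligand, k_half, n):
    """Fractional response from the Hill equation:
    ligand**n / (k_half**n + ligand**n)."""
    return ligand ** n / (k_half ** n + ligand ** n)

# Arbitrary units: half-maximal response at 1.0, steep cooperativity (n = 4).
for conc in (0.8, 1.0, 1.2):
    print(conc, round(hill_response(conc, k_half=1.0, n=4), 2))
```

In this toy example, a 20% drop or rise in concentration around the threshold moves the response from about 29% to about 67% of maximum.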

It might also be a case where the "small shift" would be a big shift if we knew the exact right cell type to look at in the exact right conditions. If we're just looking at blood plasma in steady state, we might only be able to detect small differences in the excretion of a metabolite mostly used in one particular type of cell. If very little of that metabolite usually gets excreted from the cell anyways, the difference between healthy and control may be juuuuust below detection limits. Finding the "big shift" would be a matter of more hypothesis-driven studies, rather than a wide-net exploratory study like this one. But this type of study gives us a great place to start.

And you're exactly right that small shifts could be related to some unrelated thing--that's the point of doing a robust statistical analysis. Theoretically, you'd get some indications of whether an important latent factor is missing in the analysis. And if the variance happens to be large overall, then you often need a very very high sample size in order for a small trend to come up as statistically significant.
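The trade-off between variance and sample size can be made concrete with the textbook normal-approximation formula for comparing two group means, n per group ≈ 2 × (z_{1-α/2} + z_{1-β})² × σ² / δ², where δ is the true difference between groups and σ the within-group standard deviation. A minimal sketch (not the analysis used in the paper), assuming a two-sided test at α = 0.05 with 80% power:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate participants per group for a two-sided, two-sample
    comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)

# Same true difference (delta = 1), increasing person-to-person spread.
for sigma in (1, 2, 4):
    print(sigma, n_per_group(delta=1, sigma=sigma))
```

Doubling the person-to-person spread roughly quadruples the required sample size, and testing thousands of features with a multiple-testing correction lowers α and pushes n up further.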
 
However, as I wrote in the discussion, the androgenic steroid findings actually point to a rate limiting step upstream of all the steroid hormones, including testosterone and cortisol. I'm actually already looking into the implications of this finding, hoping to have something positive to report soon!
Happy to answer any other questions that come up, and grateful to see this interaction with our work.

Hello!

Could I ask what a rate limiting step is? I can guess (something upstream that limits 'how much/time' but then I come a bit unstuck as to whether that is in reaction to something etc) and I'm no biology expert so risk of me 2+2=5ing!
 
The differences in metabolites or transcript levels associated with the outcome may be pretty subtle, especially when there's a massive amount of interpersonal variation in all the data that you measure,
And intra-personal? i.e. dynamic/transient but still significant & relevant changes in the body? Could that be a confounding variable?

Appreciate your feedback and engagement here. You will not find a more motivated group to find the correct explanation and best therapies (or at least management practices) than patients.
 
Thanks very much for engaging here @jnmaciuch, it's much appreciated.

I need to read the paper again to try to understand exactly what was done.

But, in the meantime, I guess one question is, given the participants were 100% hospitalised during their acute Covid-19 infection and your model seems to have been built on scores of physical function, isn't it possible that the model tells us mostly about the multi-omics of someone who has lasting physical impacts from a severe Covid-19 infection?

Could your steroid (and other) findings be related to treatments given during and after the severe infection? Or the period (possibly ongoing) of reduced oxygenation?

Do you think the study tells us much about post-Covid ME/CFS?
 
Yes, thanks for engaging, @jnmaciuch.

I think studies of this sort are very useful, but I guess I would see the data rather in the same light as I saw those from Sjoerd Beentjes in Chris Ponting's group looking at ME/CFS.

I agree that looking for minor shifts is important because they may be indicating what is going on indirectly or in suboptimal sampling systems. The difficulty with identifying such shifts statistically in big cohorts is that you are likely to pick up systematic confounders. My conclusion in relation to the Beentjes paper was that some of the findings might well relate to confounders but that one or two unexpected results might be very important.

The most likely confounder I see for an ME/CFS cohort is a higher rate of other subclinical pathology such as glucose intolerance or low grade chronic infection which might take the subject to see more doctors and because of the hit and miss nature of ME/CFS diagnosis that would increase the chance of getting a (valid) ME/CFS diagnosis.

The worry I would have about post-Covid subjects is that those diagnosed with 'Long Covid' are almost certainly likely to include a higher proportion of people who had some form of subclinical health problem before getting Covid. So omics studies may pick up things like slight 'inflammatory signals' or again glucose intolerance. And comparing across studies, as you have, to see if we can identify 'usual suspect' confounders may be a very important part of methodology for ME/CFS-type illness with no structural pathology to guide us.

The specific niggle I have is authors' tendency to be vague in abstracts - referring to 'inflammatory pathways' when what is really meant is some particular set of cytokines or other signals. A key feature of ME/CFS, and I suspect LC, is that there isn't any inflammation as such. So if these pathways are active they are not being inflammatory pathways, they are doing something else.
And thinking out of the box 'something else' seems to me to be essential, whether we are thinking of immune complexes in RA or the mystery of PEM.
 
Hello!

Could I ask what a rate limiting step is? I can guess (something upstream that limits 'how much/time' but then I come a bit unstuck as to whether that is in reaction to something etc) and I'm no biology expert so risk of me 2+2=5ing!
Oh sure! It’s absolutely understandable to find it confusing since there’s a technical definition and a more colloquial meaning, and it can be hard to parse which is intended.

Technically, rate-limiting step refers to whichever step is the slowest in a chain of reactions where the output of one reaction is the input of the next. Think of cholesterol getting converted to pregnenolone and then a bunch of other forms before finally becoming testosterone or cortisol. Each individual reaction (e.g. cholesterol -> pregnenolone) has its own rate which is determined by a host of factors. In a chain, the slowest reaction sets the maximum rate for anything downstream, since those later reactions can’t happen without the first one. To use a more accessible example, the slowest driver on a one-lane street sets the maximum speed for everyone else. So, theoretically, if you can identify the slowest driver and measure their speed, you know the speed of everyone else.
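The one-lane-street picture can be turned into a toy simulation: a linear chain where each step converts at most a fixed amount of its substrate per time tick. The capacities are arbitrary numbers, not enzyme kinetics for the steroid pathway, and real pathways are regulated at several points (via feedback and enzyme regulation), which this toy ignores:

```python
def simulate_chain(capacities, supply, ticks=1000):
    """Toy discrete-time model of a linear pathway S0 -> S1 -> ... -> product.

    Step i converts at most capacities[i] units of its substrate per tick;
    anything it can't process waits in the pool in front of it.
    Returns the average amount of final product made per tick.
    """
    pools = [0.0] * (len(capacities) + 1)  # pools[-1] holds the final product
    for _ in range(ticks):
        pools[0] += supply
        # Run the downstream steps first so a unit moves at most one step per tick.
        for i in reversed(range(len(capacities))):
            moved = min(pools[i], capacities[i])
            pools[i] -= moved
            pools[i + 1] += moved
    return pools[-1] / ticks

# Arbitrary per-tick capacities for a three-step chain; the middle step is slowest.
print(simulate_chain([5, 2, 8], supply=10))
```

The output settles at about 2 units per tick, the capacity of the slowest (middle) step, even though the other steps could run faster.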

According to my old bio prof, that definition has gone out of fashion since we now know that rates of multi-step reactions in biological systems are often regulated at many points (usually via the enzymes involved), not just the rate-limiting step.

So the colloquial use, which I’m using in the quote, generally tends to mean “the step at which things are getting held up,” but it doesn’t necessarily mean that you can derive the numerical rate of everything downstream by measuring that one reaction.
 
And intra-personal? i.e. dynamic/transient but still significant & relevant changes in the body? Could that be a confounding variable?
If I’m understanding you correctly, you’re referring to the changes over the course of hours, perhaps in response to something like exertion? That’s definitely a confounding factor in most studies where reduced tolerance to activity is at play, since you don’t know how much they’ve pushed themselves just to get to the sample collection site. In future studies where I’m a part of the planning phase, I definitely plan to bring this up.

The other instance where this might be relevant is changes over time, i.e. if someone started recovering after 6 months and still gave samples at 9 and 12 months. We did our best to address that by training the model on PROMIS Physical scores, so we can match up samples and reported physical function at that same point in time. The statistical analysis also included a random effect to account for patient-level differences, so that the differences over time are less obscured by inter-personal variation.

I hope this addresses your question!
 
But, in the meantime, I guess one question is, given the participants were 100% hospitalised during their acute Covid-19 infection and your model seems to have been built on scores of physical function, isn't it possible that the model tells us mostly about the multi-omics of someone who has lasting physical impacts from a severe Covid-19 infection?

That’s a good concern to have. I think the answer is yes and no (though more "no" than "yes"). Yes, because we did not have non-hospitalized patients to compare to. No, because not everyone displayed this signature despite the fact that everyone was hospitalized.

That’s another reason why we trained the model on PROMIS scores—they are normalized to the general, pre-COVID population. Meaning that if someone had a score of 50+ after hospitalization, they recovered enough to match the average physical function of the population. Now if someone was a marathon runner pre-COVID and a score of 50 was a downgrade, we wouldn't really be able to tell that. But considering how low pwME tend to score against the general population, it's a decent indication towards ME/CFS or an ME/CFS-like deficit.
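For context on the 50 cut-off: PROMIS T-scores are scaled so that the reference population (for most item banks, the US general population) has a mean of 50 and a standard deviation of 10. A minimal sketch converting a T-score to an approximate population percentile, assuming scores are roughly normally distributed in that reference population:

```python
from statistics import NormalDist

# PROMIS T-score metric: reference population mean 50, SD 10.
REFERENCE = NormalDist(mu=50, sigma=10)

def percentile_of_t_score(t):
    """Approximate share (%) of the reference population scoring below `t`,
    assuming a normal distribution of scores."""
    return 100 * REFERENCE.cdf(t)

for t in (30, 40, 50):
    print(t, round(percentile_of_t_score(t), 1))
```

Under that assumption, a T-score of 40 (one SD below the mean) sits at roughly the 16th percentile, and 30 at roughly the 2nd.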

Additionally, the heme metabolism signature validation in the other LC cohort included non-hospitalized patients, and the androgenic steroid signature was validated in a pre-COVID ME/CFS cohort, so it doesn't seem to be hospitalization that is driving at least 2/3 of our strongest signatures.

I'm planning to make an additional comment highlighting some of the additional analysis we did with acute phase data, which may provide more insight to your thoughts here.

Briefly, a prior analysis stratified patient trajectories during hospitalization, from more mild cases where they presented with chest pain but got discharged quickly, to more severe cases where they were on a ventilator for an extended period of time. Even comparing participants in the same exact trajectory group, the analytes in the "recovery factor" signature (that was identified only on post-hospitalization data) are actually able to distinguish, in acute phase data, who is going to go on and display long-term physical deficit. And it wasn't just the most severe cases that were going on to have low physical function scores in the convalescent phase.

Could your steroid (and other) findings be related to treatments given during and after the severe infection?
Our clinical team members asked the same question, and thankfully we had data about medication administered during hospitalization. Dexamethasone was one of the ones tracked--my co-author did that analysis a while ago, but iirc it was determined not to be a concern.

Or the period (possibly ongoing) of reduced oxygenation?
Although oxygenation was only directly measured during hospitalization, hypoxia has a well-studied transcriptomic signature that would have been evaluated in the pathway analysis. That pathway did not come up as significant. It's not a perfect analog, but it is indicative.

Do you think the study tells us much about post-Covid ME/CFS?
That's one of the reasons I was so excited that the best performing model ended up being the one trained on PROMIS Physical Scores--out of the patient-reported outcomes that were measured, I felt that it would come the closest towards measuring ME/CFS. Obviously a more in-depth patient assessment would be needed to confirm ME/CFS. Some of the study participants are still being seen in some of the site LC clinics, so there is potential to do a more thorough diagnostic assessment for participants that meet ME/CFS criteria and retroactively label the samples from those participants. Unfortunately I can't make promises on that, but I can tell you I've already been looking into it.

Thanks for your questions! You and others are asking questions very similar to the ones we asked ourselves during analysis, which is always a good sign.
 
The difficulty with identifying such shifts statistically in big cohorts is that you are likely to pick up systematic confounders.
That concern drove a lot of our choices in the analysis. Initially we started with a completely unsupervised multi-omics integration approach, but it kept getting tripped up by sources of variance in the population that were not at all correlated with patient-reported outcomes. One of our team members pioneered a method for supervised multi-omics integration, so that's when we started training the models first. Since physical function might be correlated with other factors (most notably age), we included those as covariates in the downstream statistical analysis. Batch corrections were performed on the -omics data, and the statistical analysis included additional corrections for sample collection site and for patient-level baseline differences, as appropriate for a longitudinal analysis.

Obviously this may not have addressed everything, but we did spend a lot of time on the issue of confounders.

The worry I would have about post-Covid subjects is that those diagnosed with 'Long Covid' are very likely to include a higher proportion of people who had some form of subclinical health problem before getting Covid.
I think a strength of this study is that we actually didn't recruit LC patients. We only followed participants after hospitalization, and then retroactively looked at their reported outcomes to see who had a clear post-COVID deficit. Unfortunately, this does introduce some additional bias related to who is most likely to be hospitalized in the first place. However, in my previous comment to @Hutan, I described some additional points regarding this potential bias and why this data is still valuable.

The specific niggle I have is authors' tendency to be vague in abstracts - referring to 'inflammatory pathways' when what is really meant is some particular set of cytokines or other signals. A key feature of ME/CFS, and I suspect LC, is that there isn't any inflammation as such.
I see your point. I ended up doing some additional literature review to characterize the specific inflammatory signature, and found some really interesting hits tying all of them to conditions of chronic vascular inflammation in particular (e.g. chronic kidney disease, aging-associated inflammation). There are more details on that in the results section. There was some discussion, and other authors preferred a more general descriptor in the abstract. We were also up against a pretty strict abstract word limit for our journal submissions. Since we're looking at other journals now, it might be possible to change.

Thanks for your thoughts!
 
With the help of Grok 3, I realized that the study found several metabolites that are related to dihydrotestosterone, and more specifically to 5-alpha reductase. Given my story (I got ME/CFS from finasteride, which is a 5-alpha reductase inhibitor), these results may be yet another important piece of the puzzle. Here is part of the answer from Grok 3.

Thanks for sharing your story! Given that our study identified differences in both androgenic steroids and pregnenolone (the latter of which is upstream of the 5-alpha reductase-mediated reactions), my best guess is that the first conversion step in the steroid hormone biosynthesis pathway (cholesterol -> pregnenolone) is responsible for our signal. However, there would probably be some overlap between symptoms caused by impairment of everything downstream of cholesterol and those caused by impairment of 5-alpha reductase-dependent metabolites.
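The "upstream" reasoning can be sketched as reachability in a pathway graph. The hypothetical Python example below uses a heavily simplified steroid biosynthesis graph (many intermediates and alternative routes omitted; it is not taken from the study) and asks which metabolites can no longer be made from cholesterol when one enzyme step is blocked. Blocking the cholesterol -> pregnenolone step (CYP11A1) removes a superset of what blocking 5-alpha reductase removes, which is why the two kinds of impairment could overlap.

```python
from collections import deque

# Simplified, illustrative steroid biosynthesis graph: (substrate, product, enzyme).
# Many intermediates and alternative routes are deliberately omitted.
REACTIONS = [
    ("cholesterol", "pregnenolone", "CYP11A1"),
    ("pregnenolone", "progesterone", "3beta-HSD"),
    ("pregnenolone", "17-OH-pregnenolone", "CYP17A1"),
    ("17-OH-pregnenolone", "DHEA", "CYP17A1"),
    ("DHEA", "androstenedione", "3beta-HSD"),
    ("androstenedione", "testosterone", "17beta-HSD"),
    ("testosterone", "dihydrotestosterone", "5alpha-reductase"),
    ("progesterone", "5alpha-dihydroprogesterone", "5alpha-reductase"),
]


def reachable(start, blocked_enzyme=None):
    """Breadth-first search: metabolites producible from `start` with one enzyme blocked."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for substrate, product, enzyme in REACTIONS:
            if substrate == current and enzyme != blocked_enzyme and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


def lost_when_blocked(enzyme):
    """Metabolites that become unreachable from cholesterol when `enzyme` is blocked."""
    return reachable("cholesterol") - reachable("cholesterol", blocked_enzyme=enzyme)


early_block = lost_when_blocked("CYP11A1")
srd5a_block = lost_when_blocked("5alpha-reductase")
print(sorted(srd5a_block))         # ['5alpha-dihydroprogesterone', 'dihydrotestosterone']
print(srd5a_block <= early_block)  # True: the early block also removes these products
print(sorted(early_block - srd5a_block))  # losses unique to the early block
```

In this toy model, the metabolites lost only under the early block (pregnenolone, DHEA, testosterone and so on) are what would distinguish the two scenarios.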
 
The vast majority of ME/CFS cases are not preceded by a serious infection requiring hospitalization. What is causing the symptoms in these typical ME/CFS cases may be quite different from what is causing symptoms in people who were hospitalized for covid.

Long covid is a broad category that includes many different problems, ME/CFS being just one of them.
 