Systemic increase of AMPA receptors associated with cognitive impairment of long COVID, 2025, Fujimoto et al.

The scientist in me says you're right, but the patient can't help but be happy about drug trials.
The lay scientist in me says that the fastest way forward is to do the basic research first, unless a trial is the only way to check your hypothesis.

Trials are incredibly resource intensive and even most of the ones with a decent rationale will fail. We have limited resources and should minimise the expected time to the answer through systematic work, not playing the lottery and hoping to get lucky with a very long shot.
 
I think you're right that it theoretically shouldn't make a big difference, it's just such a bizarre choice that it makes me wonder whether somehow predicting age/sex/LC all together generates a model that performs better for LC prediction than training on LC alone. Off the top of my head I can't think of how that might work mathematically, but it wouldn't be the first time I've come across something unexpected like that in machine learning.
Neither can I. Sounds more like an artifact of something they tried that didn't make it to the paper in the end.

Edit: I could imagine that it might prevent overfitting in some scenarios. Maybe. But applied machine learning seems to be only 50% math. The rest is voodoo.
 
Okay these methods are really hard to follow from the minimal details and it was bothering me how every reread seemed to give a different interpretation. So to make sense of it I made a dedicated effort to map out what they did:

The supplementary figure schematic depicts the input data as the images themselves, but I think this is actually incorrect if they're just using a standard PLS model. The input would have to be in discrete data points already, so going by the text it seems to be the voxel-wise SUVR values. Basically for every identified "voxel" (a little portion of the brain), there's a corresponding signal value. Depending on how many "voxels" they defined, this could end up being a very large number of features.

Therefore I was wrong earlier: it's not 2400 image slices. Since it said they used leave one "pair" out cross validation, I realized 2400 is the number of possible LC-HC pairs: 30 LC x 80 HC participants.
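To make the arithmetic concrete, here is a minimal sketch of the pair enumeration. The participant IDs are made up for illustration; only the group sizes (30 LC, 80 HC) come from the discussion above, and this is my reading of the fold structure rather than anything confirmed by the paper.

```python
from collections import Counter
from itertools import product

# Hypothetical participant IDs; only the group sizes come from the thread.
lc_ids = [f"LC{i:02d}" for i in range(30)]
hc_ids = [f"HC{i:02d}" for i in range(80)]

# Every possible (LC, HC) combination is one held-out pair, i.e. one CV fold.
pairs = list(product(lc_ids, hc_ids))
print(len(pairs))  # 2400

# Count how many folds hold out each participant.
held_out = Counter()
for lc_id, hc_id in pairs:
    held_out[lc_id] += 1
    held_out[hc_id] += 1

print(held_out["LC00"], held_out["HC00"])  # 80 30
```

So under this reading, each LC participant is held out in 80 folds and each HC participant in 30.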

In normal machine learning, leave-one-out cross validation is a method where you train the model N times, each time on N-1 data points, leaving out one data point and generating a predicted value for it. Theoretically this is the best option for reducing overfitting, since each data point is only used to generate a prediction once. Therefore, if looking at a particular rare feature gives perfect classification for only 2 or 3 individuals, its contribution to the model should be basically nil for the vast majority of CV folds and it will end up not having much importance in the final model that aggregates weights across the CV folds.
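For reference, standard leave-one-out looks roughly like the sketch below. The nearest-class-mean classifier and the one-dimensional toy data are stand-ins I picked to keep it short; they are not the PLS model or the data from the paper.

```python
def class_means(xs, ys):
    """Mean feature value per class label (a toy stand-in for a real model)."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means


def predict(means, x):
    """Assign x to the class whose training mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))


def leave_one_out(xs, ys):
    """Fit N models, each on N-1 samples, and predict the one sample left out."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(predict(class_means(train_x, train_y), xs[i]))
    return preds


# Made-up, well-separated toy values.
xs = [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1]
ys = ["HC", "HC", "HC", "HC", "LC", "LC", "LC", "LC"]

preds = leave_one_out(xs, ys)
accuracy = sum(p == y for p, y in zip(preds, ys)) / len(ys)
print(len(preds), accuracy)  # 8 1.0
```

The point is just the loop structure: one prediction per data point, each from a model that never saw that point.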

So what they're doing for the cross-validation is training a model with all the voxel-wise SUVR values for every participant except for the ones from the held-out LC and HC individuals in that CV fold. But this is not truly leave-one-out cross validation, since every participant is being "tested" multiple times in the combinatorial pairs (each LC participant 80 times, each HC participant 30 times), meaning that the likelihood of overfitting features being retained in the final model goes way up.

For the performance metric, it seems like both of the participants in the held-out pair get a score predicting the likelihood of being LC or HC. Then, according to the text, they average the score for each participant across all the folds that held them out. The ROC in Fig 4B plots the sensitivity/specificity at increasing "threshold" values used to determine whether a participant with a given score is classified as LC or HC. At the best performing threshold, the reported sensitivity is 0.912 and specificity is.
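Here is a rough sketch of how I read that pipeline: hold out one LC/HC pair per fold, score both, average each participant's scores over their folds, then sweep a threshold. Everything specific in it is an assumption: the single made-up feature per participant, the toy score (distance above the midpoint of the two training means) standing in for the PLS output, and picking the "best" threshold by Youden's J, which the paper may not have used.

```python
from collections import defaultdict
from itertools import product


def mean(values):
    return sum(values) / len(values)


def pair_cv_scores(lc, hc):
    """lc, hc: dict of id -> feature value. Returns id -> mean held-out score."""
    scores = defaultdict(list)
    for lc_id, hc_id in product(lc, hc):
        # Train on everyone except the held-out pair.
        train_lc = [v for k, v in lc.items() if k != lc_id]
        train_hc = [v for k, v in hc.items() if k != hc_id]
        midpoint = (mean(train_lc) + mean(train_hc)) / 2
        # Toy score (assumption): how far above the midpoint the value lies.
        scores[lc_id].append(lc[lc_id] - midpoint)
        scores[hc_id].append(hc[hc_id] - midpoint)
    # Average each participant's score over all folds that held them out.
    return {k: mean(v) for k, v in scores.items()}


def roc_points(scores, positives):
    """(threshold, sensitivity, specificity), calling score >= threshold LC."""
    negatives = [k for k in scores if k not in positives]
    points = []
    for t in sorted(set(scores.values())):
        tp = sum(scores[k] >= t for k in positives)
        tn = sum(scores[k] < t for k in negatives)
        points.append((t, tp / len(positives), tn / len(negatives)))
    return points


# Made-up values, not data from the paper.
lc = {"LC0": 1.8, "LC1": 2.1, "LC2": 1.4, "LC3": 2.4}
hc = {"HC0": 1.0, "HC1": 1.2, "HC2": 0.9, "HC3": 1.5, "HC4": 1.1}

avg = pair_cv_scores(lc, hc)
points = roc_points(avg, set(lc))
# Youden's J (sensitivity + specificity - 1) as one possible "best" criterion.
best = max(points, key=lambda p: p[1] + p[2] - 1)
print(best)
```

Note that the averaged scores come from folds whose training sets overlap almost completely, so they are not independent estimates.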

As I already noted, these metrics are not on a truly held-out test cohort which is what you would want to see to determine if this model is purely hung up on artifacts or actually generalizable.

Also as already noted earlier in the thread, the healthy control images were from a prior study. There is no mention in the text about doing any sort of batch correction. The fact that this signal appears higher in every single brain region shown in Fig. 2 makes it pretty obvious it's a global skew. So if you have consistently higher levels for every single one of your features, you can get a fantastic prediction model no matter which features you use.
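A quick simulation shows why a global skew is a problem. This is purely synthetic: pure-noise features, plus a constant offset added to every feature for the "LC" group. The offset size, the feature count, and the use of a plain mean over a random feature subset as the score are all arbitrary choices for illustration, not anything taken from the paper.

```python
import random

random.seed(0)

N_LC, N_HC, N_FEATURES = 30, 80, 200
GLOBAL_SHIFT = 1.0  # hypothetical batch offset, in units of the noise SD


def simulate(n, shift):
    """n participants, each with N_FEATURES noise values plus a global shift."""
    return [[random.gauss(0.0, 1.0) + shift for _ in range(N_FEATURES)]
            for _ in range(n)]


def auc(pos, neg):
    """Probability that a random positive outscores a random negative."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subset_score(row, subset):
    """Score a participant by the mean of an arbitrary subset of features."""
    return sum(row[j] for j in subset) / len(subset)


lc = simulate(N_LC, GLOBAL_SHIFT)
hc = simulate(N_HC, 0.0)

aucs = []
for _ in range(5):
    # Pick 20 features at random; none of them is "special" in any way.
    subset = random.sample(range(N_FEATURES), 20)
    lc_scores = [subset_score(row, subset) for row in lc]
    hc_scores = [subset_score(row, subset) for row in hc]
    aucs.append(auc(lc_scores, hc_scores))

print([round(a, 3) for a in aucs])
```

Every random subset separates the groups almost perfectly, even though no individual feature carries any group-specific signal beyond the shared offset.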

@fst I'm much more confident in this analysis than in what I replied to you with initially, sorry if my going back-and-forth caused confusion!
 
No problem at all. Appreciate you taking the time for a deep dive!
 