Thanks @jnmaciuch. So maybe not quite as exciting as what I thought was happening, which did seem too good to be true: lots and lots of synaptic genes coming up independently. As far as I understand your description, it's something more like this: if the model sees some variants in, say, both NLGN1 and DLGAP3, then it says, might as well invite all the rest of the synaptic genes to the party.
On another note, I wonder if there's any reason to be concerned about a point brought up by @mariovitali's AI:
External validation is underpowered
The Cornell cohort (36 cases, 21 controls) gives an AUROC of 0.670, but the confidence interval is artificially tight because the authors repeat the same split 500 times; the effective N is still 57, so real world performance uncertainty is larger than portrayed.
I don't know much about the machine learning world, but most or all of the reported confidence intervals do seem very small, for example AUROC: 0.670 ± 0.003. Maybe with a test set this small, it's not actually very unlikely to get an AUROC of 0.670 just by chance.
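For what it's worth, here is a rough back-of-the-envelope sketch of the effective-N point, using only the numbers quoted here (36 cases, 21 controls, AUROC 0.670). It is not the authors' method: it assumes the Hanley-McNeil approximation for the standard error of an AUROC, the Mann-Whitney null (no ties) for what chance alone would give, and a normal approximation throughout. The function and variable names are my own. Under those assumptions the standard error comes out at roughly 0.07, so the 95% interval runs from about 0.53 to about 0.81, which is far wider than ± 0.003. The same sketch puts 0.670 at about two null standard deviations above 0.5, so it would not be a typical chance result either.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUROC (Hanley-McNeil formula)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

def null_auc_sd(n_pos, n_neg):
    """SD of the AUROC when scores are pure noise (Mann-Whitney null, no ties)."""
    return math.sqrt((n_pos + n_neg + 1) / (12 * n_pos * n_neg))

# Numbers quoted in this thread for the external cohort.
n_cases, n_controls, auc = 36, 21, 0.670

# Uncertainty implied by a test set of 57 people (normal approximation).
se = hanley_mcneil_se(auc, n_cases, n_controls)
ci_low, ci_high = auc - 1.96 * se, auc + 1.96 * se

# How far 0.670 sits from chance (0.5), and the one-sided tail probability.
z_null = (auc - 0.5) / null_auc_sd(n_cases, n_controls)
p_one_sided = 0.5 * math.erfc(z_null / math.sqrt(2))

print(f"SE = {se:.3f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"z vs. chance = {z_null:.2f}, one-sided p = {p_one_sided:.3f}")
```

Repeating the same split 500 times only averages away the randomness of the procedure, not the sampling noise of the 57 people, which is why something like the interval above seems like the fairer one to compare against.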