Hey everyone, I finally gathered enough spoons to sort through some of this data myself, and it does not look good.
I ran mixed-effects models on the data from 4B and 4C (two-way ANOVA cannot handle missing values). 4B showed no main effect of block or patient group. There was an interaction, but my understanding is that when no main effect is present, the interaction term should not be interpreted.
Similar results for 4C:
Then I tried a regression analysis on the grouped data. This gave results closer to what was reported in the manuscript, but still off.
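For anyone who wants to check a grouped regression outside of Prism, here is a minimal sketch of what a simple linear regression computes: the best-fit slope, its standard error, and the degrees of freedom used to test the slope against zero. The function name `slope_with_se` and the example numbers are my own placeholders, not the study data, and I am assuming that the "± value" in the regression output is the standard error of the slope.

```python
from math import sqrt
from statistics import mean

def slope_with_se(x, y):
    """Ordinary least-squares slope of y on x, with its standard error and df."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = sqrt(sse / df / sxx)
    return slope, se, df

# Placeholder values for illustration only (NOT the 4B/4C data)
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
slope, se, df = slope_with_se(x, y)
t_vs_zero = slope / se  # compare against a t distribution with df degrees of freedom
```

A slope "significantly differs from zero" when `t_vs_zero` is large relative to the t distribution with `df` degrees of freedom.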
For 4B (Dimitrov Index):
- Reported data for 4B: 0.2 ± 0.5 versus −0.43 ± 0.3, t(12) = 3.2, p = 0.008
- Simple linear regression in Prism 10 for 4B: 0.0003470 ± 0.002101 versus −0.004309 ± 0.001137
In the 4B dataset at least, the HV regression does not significantly differ from zero, but the PI-ME/CFS group does:
For 4C (MEP amplitude):
- Reported data for 4C: −0.13 ± 0.2 versus 0.13 ± 0.2 MEP units; t(12) = 2.4, p = 0.03
- Simple linear regression in Prism 10 for 4C: −0.1084 ± 0.07854 versus 0.1451 ± 0.1141
For the MEP dataset, neither regression significantly differs from zero:
So where did the significant values come from in the manuscript? I did a few different calculations, but it actually seems like they performed individual regressions on each subject, and then used THOSE values (slope of the line) to perform the t-test.
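To make that two-stage procedure concrete, here is a rough sketch of what I think was done: fit a separate line to each subject across blocks, keep only the slope, then run an ordinary two-sample t-test on the slopes. Everything here is an assumption on my part (the helper names `ols_slope` and `pooled_t`, the equal-variance t-test, the way missing blocks are skipped), and the numbers are made up, not the study data.

```python
from math import sqrt
from statistics import mean, variance

def ols_slope(x, y):
    """Least-squares slope of y on x, skipping blocks where y is missing (None)."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(p[0] for p in pairs)
    my = mean(p[1] for p in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    return sxy / sxx

def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

# Made-up values for illustration only (NOT the 4B/4C data)
blocks = [1, 2, 3, 4]
hv = [[1.0, 1.1, 1.2, 1.3], [0.9, 1.0, 1.0, 1.2], [1.0, 1.0, 1.1, 1.1]]
pt = [[1.2, 1.1, 1.0, 0.8], [1.0, 0.9, 0.9, 0.7], [1.1, 1.1, 1.0, 0.9]]

# Stage 1: one slope per subject
hv_slopes = [ols_slope(blocks, y) for y in hv]
pt_slopes = [ols_slope(blocks, y) for y in pt]

# Stage 2: t-test on the slopes alone
t, df = pooled_t(hv_slopes, pt_slopes)
```

Note that the second stage only ever sees one number per subject, so how well each individual line actually fit does not enter the t-test at all. The degrees of freedom are just the number of subjects minus 2, which is consistent with the t(12) in the manuscript if 14 subjects were included.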
I plotted these out too, and found that apparently, only 2 of the 8 ME/CFS participants had regression slopes that were significantly different from 0:
4C is a similar story, with only 1 ME/CFS participant showing a significantly non-zero slope. I don't think I need to paste in another table here, but I can if anyone wants.
So when you take the best-fit "slope" values from these individual regressions, we finally get data that mirrors the reported statistical effects:
This matches up with the reported effects, with some errors in the order of magnitude of the reported effect size. The p=0.0077 would be rounded up to 0.008, and the t value to 3.2.
The same story is true for 4C:
This also matches up with the reported effects, with a p value rounded down to 0.03 and t=2.4.
I'm having a hard time wrapping my brain around this. I don't think I have ever seen a published analysis of this type of data that ran a t-test on individually derived slopes from a study with two independent variables (patient group and block). This is raising a lot of red flags re: p-hacking - I find it hard to believe that the NIH team would have jumped to such a convoluted analytical approach organically, especially considering the fMRI data in Fig. 4E were analyzed correctly using two-way ANOVA. If I had tried to get away with this kind of analysis as a graduate student, I would have been rightly roasted by my thesis committee for misrepresenting data.
I'm still considering what to do about this moving forward. I was planning on publishing some of this on my blog and reaching out to Dr. Nath to see what he has to say about it.