Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Discussion in 'ME/CFS research' started by Andy, Feb 21, 2024.

  1. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,282
    Please excuse my break; I had to take some time away from this study, but I’m still very much up to this task and agree with the 4 ICMJE criteria. Google doc sounds good, and I'm also fine with something like Overleaf in case you used LaTeX.

    After the correspondence @Murph had with Treadway and the correspondence I had with Ohmann, I'm not too sure whether they would join a letter or how sensible that would be. Both of them seemed rather satisfied with the use of EEfRT in this study and the conclusion the study made. Alternatively I was thinking of writing a message to Carmen Scheibenbogen (who has co-authored similar responses in the past and has many statisticians in her team), would that seem like a good idea?

    The most interesting part to me, which I've begun looking at again after a longer break, is still the effect of motoric ability on the results. There is precedent in the literature: multiple studies imply that participants with higher motoric abilities go hard more often. Of course we didn't see an obvious correlation (for example, the percentage of completed trials is not directly correlated with the percentage of hard tasks chosen). However, apart from the things already mentioned in various discussions above, there are patterns such as "a necessary condition for people to go hard in at least 20 out of the first 35 rounds is a 100% completion rate", and I've been trying to figure out how well these hold up in the studies that found that motoric abilities influence the choice of going hard. If it turns out that other studies have a similar necessary condition for participants to exceed some threshold of hard trials, this would strengthen the GEE you did where you accounted for certain cut-off rates (for example the 90% you used), or at least help us find a meaningful, non-arbitrary cut-off (did you run the calculation with a cut-off at 99%, and what did this yield?).

    Since there is enough evidence for the authors to argue that no calibration phase was needed in this study, I was also wondering whether you, @andrewkq, had evaluated whether something equivalent to the MaxMot that Ohmann uses reaches statistical significance. It's hard to say what exactly an equivalent would be, since there is no motoric ability/calibration phase, but I think taking the click count for one of the hard practice rounds and adding the clicks that would have been performed in the remaining time could be close enough, i.e. essentially taking the click rate of the first hard task in the trial rounds. One could then check whether such an analysis holds up by looking at the high-reward trials where the majority of people go hard in the beginning, for example taking the average click rate across trials 1, 4, 11 and 14 for whoever went hard in those trials.
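
A rough sketch of the click-rate idea described above, in Python. Everything here is an assumption made for illustration: the field names (`trial`, `chose_hard`, `clicks`, `seconds`), the helper names, the example numbers, and the 21-second window, which should be replaced with the actual time limit of the task version used. It is not the MaxMot procedure from Ohmann, only one possible stand-in.

```python
WINDOW_SECONDS = 21.0  # placeholder time limit; substitute the real value


def click_rate(clicks, seconds):
    """Clicks per second for one trial."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return clicks / seconds


def projected_clicks(clicks, seconds, window_seconds=WINDOW_SECONDS):
    """Extrapolate how many clicks would fit in the full window at the observed rate."""
    return click_rate(clicks, seconds) * window_seconds


def mean_hard_click_rate(trials, trial_ids):
    """Average click rate over the listed trials, using only trials where hard was chosen.

    Returns None if the participant chose hard in none of the listed trials.
    """
    rates = [
        click_rate(t["clicks"], t["seconds"])
        for t in trials
        if t["trial"] in trial_ids and t["chose_hard"]
    ]
    return sum(rates) / len(rates) if rates else None


# Made-up trials for a single hypothetical participant (not study data).
example_trials = [
    {"trial": 1, "chose_hard": True, "clicks": 98, "seconds": 19.6},
    {"trial": 4, "chose_hard": True, "clicks": 90, "seconds": 20.0},
    {"trial": 11, "chose_hard": False, "clicks": 30, "seconds": 7.0},
    {"trial": 14, "chose_hard": True, "clicks": 84, "seconds": 21.0},
]

print(projected_clicks(98, 19.6))
print(mean_hard_click_rate(example_trials, {1, 4, 11, 14}))
```

The per-participant value could then be entered as a covariate in place of a calibrated MaxMot, with the caveat that it is only observed for participants who chose hard in at least one of the selected trials.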
     
    bobbler, Arvo, Braganca and 5 others like this.
  2. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,684
    Location:
    London, UK
    Yes, happy to review a draft with a view to joining authorship.
     
    bobbler, andrewkq, alktipping and 9 others like this.
  3. JoClaire

    JoClaire Established Member (Voting Rights)

    Messages:
    54
    Location:
    USA
    Thank you.

    I am compiling a list of concerns based on what I’ve seen so far.

    Many do not require further modeling/analysis but may support or expand a critique of logic/rigor.

    Others are areas of suspicion where strange data modeling/statistical practices may be distorting or misrepresenting results. These may require further modeling to validate.

    I included grip test in this evaluation.

    I hope to share today, depending on my capacity. I will tag you and @andrewkq

    (Andrew, I’m not sure of timing constraints/scope of letter to Nature. Some of the concerns may be relevant depending on context/timing. I’d love to support/review letter. Not sure if my stamina will allow me to fully support as a named contributor.)

    ________

    A quick concern is that the data for many of the HVs and pwME have missing trials, e.g. missing trials 3, 5, etc.

    Is this something that you or others noticed, asked about? (based on data shared in spreadsheet earlier in thread.)

    I haven’t fully caught up.
     
    Arvo, Braganca, alktipping and 3 others like this.
  4. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,282
    I don't see any missing data. Where did you see that (do you have some specific examples)?
     
  5. JoClaire

    JoClaire Established Member (Voting Rights)

    Messages:
    54
    Location:
    USA
    Very disappointing but I guess not surprising.

    The paper directly cautions that the test could conflate ability and choices.

    The culture seems very protective of the published.
     
    Arvo, alktipping, Sean and 3 others like this.
  6. JoClaire

    JoClaire Established Member (Voting Rights)

    Messages:
    54
    Location:
    USA
    This is the data I'm using:

    HVA includes trials from Trial 1 to Trial 44, but only 38 observations, missing Trials 3, 6, 8, 11 and others.
    PI-ME/CFS A includes trials from Trial 1 to Trial 47, but only 33 observations; missing Trials 4, 6, 8 and others.

    Edited to add: There are other examples like this.
     
    Last edited: Mar 28, 2024
    Arvo, alktipping and Peter Trewhitt like this.
  7. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,282
    I am still not sure what you are referring to. Figure 3A is complete and there is no data missing as far as I see.

    For example: The data for HV A and trial 3 isn't missing. He chose hard and pressed 79 times. What do you mean by "observations"?
     
  8. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,127
    Location:
    Belgium
    I also don't see missing data, HVA has data for all trials in my dataset. Perhaps an error with uploading or something like that?
     
    Arvo, alktipping, JoClaire and 2 others like this.
  9. JoClaire

    JoClaire Established Member (Voting Rights)

    Messages:
    54
    Location:
    USA
    Facepalm. My bad. I double checked my filtering, but on reopening, data reappears. (The error was between the screen and the chair.)

    I'll go with my inner wisdom and rest now!!!
     
    Evergreen, Arvo, alktipping and 4 others like this.
  10. JoClaire

    JoClaire Established Member (Voting Rights)

    Messages:
    54
    Location:
    USA
    Yep. Filtering bug or user error.

    thanks!
     
    Arvo, alktipping and Peter Trewhitt like this.
  11. Dakota15

    Dakota15 Senior Member (Voting Rights)

    Messages:
    891
    @JoClaire @Murphy @ME/CFS Skeptic - my FOIA agent asked me what timeframe I'd like to audit from Walitt with the following terms:

    - Effort-Expenditure for Reward Task
    - EEfRT
    - Effort Preference

    If I state to start from 2016 (when he was named a lead researcher in the study), that could be a massive file sent my way. Just taking inventory on what you all think I should advise.
     
    bobbler, Arvo, alktipping and 5 others like this.
  12. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,127
    Location:
    Belgium
    Thanks for pointing this out and apologies for the typo about the p-value (I accidentally wrote 0.41 instead of 0.041). So when taking all trials into consideration in the GEE modelling, the results are quite similar, so I don't suspect anything fishy here.

    I'm still not quite sure what they have done with this calculation because in the results for Trial 1, patients (6 out of 15) chose hard tasks more often than controls (3 out of 13). So this would result in an odds ratio of 0.34 instead of 1.65. They also refer to the 'probability' of choosing the hard tasks so I assume this refers to predicted results from the modelling and not the actual data. But if they use the predicted probability (which is continuous between 0 and 1, not categorical) why did they use a Fisher exact test on this?
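
For anyone who wants to check the raw Trial 1 comparison by hand, here is a minimal sketch in plain Python. It assumes the counts are exactly as quoted above (6 of 15 patients and 3 of 13 controls choosing hard); the helper names are made up for illustration, and the two-sided p-value uses the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k "hard" choices in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )


# Rows: controls, patients. Columns: chose hard, chose easy (counts as quoted above).
controls_hard, controls_easy = 3, 10
patients_hard, patients_easy = 6, 9

print(odds_ratio(controls_hard, controls_easy, patients_hard, patients_easy))
print(fisher_exact_two_sided(controls_hard, controls_easy, patients_hard, patients_easy))
```

With these exact inputs the sample odds ratio for controls relative to patients works out to 27/60 = 0.45, so the precise figure depends on which counts and which direction of comparison are used. Either way it falls below 1, unlike the 1.65 reported.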
     
    Last edited: Mar 28, 2024
    Arvo, alktipping, Amw66 and 1 other person like this.
  13. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,127
    Location:
    Belgium
    My initial guess would be to ask for 2016 anyway: if it is too large, perhaps we could filter it ourselves but then at least we have the info. But I don't have any experience with these things so happy to hear what others think.
     
    Last edited: Mar 28, 2024
    bobbler, Arvo, JoClaire and 7 others like this.
  14. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,127
    Location:
    Belgium
    This is what I got for the GEE modelling with the 3-way interaction:
    model_formula = "Successful_Completion_Yes_is_1 ~ Sex_Male_is_1 + Value_of_Reward + Probability_of_Reward + Trial + Expected_Value + is_patient + Trial * Trial_Difficulty_Hard_is_1 * is_patient"

    upload_2024-3-28_15-10-44.png

    Taking the results for the interaction Trial_Difficulty_Hard_is_1:is_patient, this would result in an odds ratio of approximately 100, rather than 27 so that is probably not the correct analysis.

    EDIT: a 2-way interaction between patient group and hard task choices results in an odds ratio of approximately 10, which is also what I get when I don't use interactions and filter the data to include only hard tasks.
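
As a reminder of where these odds ratios come from: assuming the GEE uses a binomial family with a logit link, each coefficient is on the log-odds scale and the odds ratio is its exponential. A minimal sketch of that conversion with a Wald-style 95% interval follows; the coefficient and standard error are placeholders, not values from the model output above, and `odds_ratio_with_ci` is a made-up helper name.

```python
import math


def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Convert a log-odds coefficient and its standard error to an odds ratio with a Wald interval."""
    point = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return point, lower, upper


# Placeholder inputs: a coefficient of ln(10) corresponds to an odds ratio of 10.
example_coef = math.log(10)
example_se = 0.5  # hypothetical standard error

point, lower, upper = odds_ratio_with_ci(example_coef, example_se)
print(round(point, 2), round(lower, 2), round(upper, 2))
```

On this scale an odds ratio of about 10 corresponds to a coefficient near 2.3, 27 to about 3.3, and 100 to about 4.6, which makes it easier to match a reported odds ratio against a row of the coefficient table.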
     
    Last edited: Mar 28, 2024
    Arvo, alktipping and Peter Trewhitt like this.
  15. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,978
    Location:
    Canada
    This really says that the entire premise of such tests is flawed beyond repair, and that they simply should never be used for anything or taken seriously beyond how Myers-Briggs is.

    When you read how the creators of this test intend it to be used and the caution they place on its use, and how this use here doesn't respect them and they're fine with it, you get the general idea that basically nothing matters and that everything about those kinds of tests is arbitrary and ultimately as useless as any of the rules of alchemy or astrology.

    Which shouldn't be surprising. If there's anything to learn from decades of clinical psychology applying bizarre tests, it's that not a single one of them is actually useful at measuring the thing it claims to, or at measuring anything. No more than Myers-Briggs assesses people's true personalities, or any quiz in a magazine will tell you which Succession character you are.

    The entire discipline is biased towards producing positive results, even when completely fake. Especially when you notice that the entire basis of belief in pla/nocebo is based around interpretation of such tests, and inability to understand that fuzzy results cannot be used as if they are cardinal and accurate.

    There is too much resting on continuing the lies, entire pillars of modern medicine are based on pretending that this is a legitimate way of assessing reality, so the pressure isn't even limited to psychology, it goes deep in the fabric of our modern economies, affecting government policies and some of the biggest industries on the planet.

    But of course how do we get people who have been trained to interpret those fake results to acknowledge that they are fake? When everyone around them is OK with them being fake? Even wants the fake results? Needs them? I don't think we can reason with people here; beliefs are not open to it. I still think our best chance will be with AIs, once they are able to reason and will refuse reasoning that depends on logical fallacies. But it's still worth arguing it beforehand, it's just unlikely to get any traction. Until everyone will "have always known it was overhyped and unreliable".
     
    alktipping, Sean and Peter Trewhitt like this.
  16. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,282
    I think the crucial point is actually that the authors are aware of all of these problems but believe that they have sufficiently proven that the results in the study are not driven by ability and this is essentially what the other authors seem to agree with.
     
    bobbler, Evergreen, Arvo and 3 others like this.
  17. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    41
    Sounds good! I'll send the link as a private message.

    I had missed your message about Ohmann's response but I just went back and read it. That's disappointing, I don't get how you could see the discrepancy in hard trial ability and not conclude that this is invalid. I am still slightly (perhaps naively) optimistic that Treadway just didn't read the study closely before he responded to Murph and that he might change his mind if he understands how bad the completion rate is and the high level of disability inherent in ME. I tailored the letter to his comment that this can be thought of as an initial validation of EEfRT in ME and focused on how, when looked at from that perspective, the results show that it was a failed validation. I think it wouldn't hurt to present the facts to him and see if he'll join and I think it would drastically increase the likelihood of the letter being published if he did, but I'm not holding my breath.

    Reaching out to Carmen sounds good to me. I have a former professor/mentor who is a cognitive psych researcher and has a lot of experience doing stats for cognitive tasks. I think there's a good chance he'd agree to be a co-author if I asked him. What do you think of that? I think he'd be able to help refine the validity argument and he'd definitely have opinions about what the best argument is for showing statistically that ability confounds preference in the task. He's generally very motivated to get involved with social justice applications of science and is a bit familiar with ME because of my illness.

    Yes, I ran this at < 100 vs. >= 100 and it was significant (p=0.00042) and made diagnostic group non-significant (p=0.18), so same trend as the 90% threshold. It was also significant at 85% but not at 80% or below. I think that can be explained as the result of decreasing sample size, because you only have 9 participants below 80% (compared to 10 below 85, 11 below 90, and 15 below 100). The argument could be that experiencing even 1 failed hard trial biases participants to pick fewer hard trials overall, which theoretically makes more sense to me than the dose-response that would be modeled by a correlation. I tried writing this up but it got kind of complicated; I think we'd have to cut from other parts of the letter in order to focus on this. I do think it's probably important to address though, because Walitt will almost certainly come back and say "There's no relationship between ability and preference so it's not a confound and the task is valid" if we don't address it.
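
The cut-off comparison described above amounts to turning each participant's hard-trial completion rate into a below/at-or-above indicator and seeing how many people fall below each threshold. A minimal sketch in Python; the completion rates are made up for illustration and are not the study data, and `below_cutoff` is a hypothetical helper.

```python
def below_cutoff(completion_rates, cutoff):
    """Return 1 for each participant whose completion rate (percent) is below the cutoff, else 0."""
    return [1 if rate < cutoff else 0 for rate in completion_rates]


# Made-up hard-trial completion rates in percent, one per hypothetical participant.
example_rates = [100.0, 100.0, 97.5, 88.0, 84.0, 72.0, 100.0, 55.0]

for cutoff in (80, 85, 90, 100):
    flags = below_cutoff(example_rates, cutoff)
    # The indicator would replace the continuous rate as a covariate in the model.
    print(f"cutoff {cutoff}%: {sum(flags)} of {len(flags)} participants below")
```

The resulting 0/1 column is what would be entered into the GEE at each threshold. As the cut-off drops, the "below" group shrinks, which is the loss of sample size offered above as an explanation for the effect fading at 80%.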

    Let me think more about how we could include a pseudo-MaxMot, I think it would be difficult to explain our method there in a way that they wouldn't just dismiss as speculative.
     
    bobbler, Evergreen, Arvo and 8 others like this.
  18. Dakota15

    Dakota15 Senior Member (Voting Rights)

    Messages:
    891
    FYI the person processing my FOIA claim sent me this today: 'My understanding based on your descriptions is that you are concerned about potential research misconduct in the intramural study, and were looking at Dr. Walitt because he was the lead researcher. In consulting NINDS, we learned that Brian Walitt did not administer the EEfRT task. Brian Walitt also did not develop the version of the test used in the study. Nor did Brian Walitt conduct most of the data analysis. Results were reviewed and analyses were refined in discussions with Brian Walitt and others.'

    (I know this could be a smokescreen, but sharing)
     
    bobbler, Arvo, JoClaire and 9 others like this.
  19. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    41
    I'm really impressed with all of your work on this @Dakota15. Thank you for everything you are doing; I think it could be really fruitful. This strikes me as a smokescreen. This would be true of any PI on any large study: they are never the ones who administer the task or do the majority of the data analysis. Their job is to review the analysis results, refine the interpretation, and (sometimes) write it up. Perhaps you could expand to clarify that you are requesting all communications from any team members on the NIH study team regarding EEfRT, though Walitt and Nath are of particular concern because they are ultimately responsible for the study as the two leads.
     
    bobbler, Arvo, alktipping and 8 others like this.
  20. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,282
    I don't know if it's a smokescreen because I a priori don't see how it would matter who administered the EEfRT or who conducted most of the data analysis. Of course they are working together with other people. Whether it's a smokescreen would depend on the basis you used for filing for research misconduct (I don't see evidence of research misconduct, just bad research).

    The problem with the EEfRT is not that it is being used in this study or that a certain version of the EEfRT was chosen and then incorrectly applied. It's more that the conclusions of the study hang by a barely visible thread of overchewed bubble gum that depends on a very particular and arbitrary statistical interpretation that is very close to being completely statistically insignificant.
     
    bobbler, Evergreen, Arvo and 7 others like this.
