Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Discussion in 'ME/CFS research' started by Andy, Feb 21, 2024.

  1. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    41
    I agree that HV F's data suggests that they were not performing the task in the way it was meant to be performed, but I don't think that alone is justification for removing them.

    If the study team came back and said "There was clear evidence of equipment failure during HV F's task. The button cracked in half during the task because they were extremely strong. We bought a new keyboard and tested it before administering the task to the next participant. The issue was documented by study staff in the following protocol deviation log [see scanned document]" then I'd be satisfied.

    If they just looked at the data and noticed that HV F completed the easy tasks at an abnormally low rate (which they did: only 2% completion on easy tasks), then I wouldn't consider that enough reason to remove the participant, because participants in the ME group had similar completion levels on the hard trials. One participant had a 0% completion rate on hard trials - they completed 0 of the 19 they attempted.

    Which is actually the more important point here. There was a massive difference between the groups in their ability to complete the hard trials. HVs completed hard and easy trials at a similar rate (means: easy 96%, hard 99%), but ME patients had a significantly lower completion rate for hard trials (means: easy 98%, hard 65%). This is exactly the result that Treadway warns would invalidate the data in his original paper, but Walitt et al. neglect to perform this validity check. So while they could argue that HV F was excluded because they had a low completion rate on easy trials, they would then need to exclude half of the ME patients on the hard trials. I believe that this difference in ability actually invalidates the findings completely, but I'm curious to hear others' thoughts.
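
    For anyone who wants to reproduce this from the shared data, here's a minimal sketch of that validity check (Python/pandas; the file name and column names are my assumptions, not the study's):

    ```python
    import pandas as pd

    # Hypothetical long-format trial data, one row per trial.
    # Assumed columns: subject, group ("HV"/"PI-ME/CFS"),
    # chose_hard (bool), completed (bool).
    df = pd.read_csv("eefrt_trials.csv")

    # Mean completion rate per group, split by easy vs hard choice.
    check = (df.groupby(["group", "chose_hard"])["completed"]
               .mean()
               .unstack("chose_hard")
               .rename(columns={False: "easy", True: "hard"}))
    print(check.round(2))
    # If one group completes hard trials at a much lower rate (65% vs
    # 99% here), choices confound ability with preference, which is
    # the situation Treadway warns invalidates the data.
    ```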

    Here's where Treadway explains why the validity check needs to be performed, which others have already noted earlier in the thread:
     
    JoClaire, Zombie Lurker, Ash and 14 others like this.
  2. Sam Carter

    Sam Carter Established Member (Voting Rights)

    Messages:
    41
    Curiouser and curiouser.

    Before the clock started ticking, each participant (I think) got four trial runs (they're marked as trials -4, -3, -2 and -1 in the data).

    In HV F's trial runs he scored 30 / 30, 30 / 30, 98 / 98 and 98 / 98 (a perfect run), but when the experiment proper started he scored 10 / 30, 9 / 30, 28 / 30, ... It's a pretty striking pattern. No wonder eyebrows were raised.

    Just to show how strange his answers are, here's a list of the participants and the percentage of tasks they completed successfully.

    HV A 86
    HV B 94
    HV C 100
    HV D 100
    HV E 98
    HV F 19 <--
    HV G 100
    HV H 100
    HV I 100
    HV J 98
    HV K 100
    HV L 100
    HV M 100
    HV N 98
    HV O 93
    HV P 100
    HV Q 100
    PI-ME/CFS A 70
    PI-ME/CFS B 77
    PI-ME/CFS C 100
    PI-ME/CFS D 52 <-- lowest ME completion rate
    PI-ME/CFS E 98
    PI-ME/CFS F 100
    PI-ME/CFS G 84
    PI-ME/CFS H 56
    PI-ME/CFS I 89
    PI-ME/CFS J 100
    PI-ME/CFS K 100
    PI-ME/CFS L 92
    PI-ME/CFS M 100
    PI-ME/CFS N 96
    PI-ME/CFS O 87


    All that said, it would still be useful to hear directly from the PIs what the justification for his exclusion was.
     
    JoClaire, Hutan, Fero and 9 others like this.
  3. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    41
    From what I can tell, they only performed the GEEs using the binary ME vs Control variable in place of the anhedonia scales that Treadway used. They were essentially equating ME with anhedonia in the analysis to answer the question "Behaviorally, is ME just anhedonia?" Of course they don't frame it that way, because the answer they found was no, ME participants did not act like people with anhedonia, and they instead invented the effort preference theory based on their one flimsy significant result (which isn't what Treadway designed the task to measure).
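
    To make that concrete, here's a minimal sketch of a GEE with the binary diagnosis variable standing in where Treadway used continuous anhedonia scores (statsmodels; the covariate list is a simplified Treadway-style one, and all column names are my assumptions):

    ```python
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical trial-level data; group assumed coded 0 = HV, 1 = PI-ME/CFS.
    df = pd.read_csv("eefrt_trials.csv")

    # Logistic GEE for hard-task choice, clustered by participant,
    # with diagnosis in the slot Treadway's anhedonia scales occupied.
    model = smf.gee("choice_hard ~ group + reward + probability + trial",
                    groups="subject", data=df,
                    family=sm.families.Binomial(),
                    cov_struct=sm.cov_struct.Exchangeable())
    print(model.fit().summary())
    ```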
     
    Hutan, Fero, Simon M and 11 others like this.
  4. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    41
    Interesting, yeah, I can see why they might justify that to themselves as being enough of an outlier to exclude (even if I don't think it's a legitimate justification given all the other factors). But this is also why it's important to specify a priori what criteria will be used to determine whether data is valid, which of course they didn't do. I'll send them an email and report back.
     
    Hutan, Midnattsol, Jaybee00 and 15 others like this.
  5. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,964
    Yes, it is a tool that its creators have validated to 'operationalise reward wanting' in Treadway et al (2009). There are important but separate questions about whether any of the changes mean it isn't valid, or make it measure other things. Hence my cut-pastes at the bottom below (I'm sure there are better ones, but it is something).

    On your main point/gist: indeed. I've stopped short of wondering whether it was used simply because it was something off-the-shelf etc. But you can't validate something new on this sample size with this method... or can you?

    I'm guessing, given that Walitt keeps using 'effort-preference' rather than 'reward-wanting', that maybe they weren't thinking it was relevant to phenotyping ME either?

    I'm also curious about the very specific choice of term in the data analysis part pasted by Simon M, where he keeps saying 'emulating Treadway' when it comes to his tweaks/extras etc. As if he knows it isn't 'as per' Treadway, unlike the earlier part of the analyses before his additions, of which he says: "Following the analytic strategy described by Treadway15". I've even looked up the definition of 'emulate': "to copy someone's behavior or try to be like someone else because you admire or respect that person", which isn't the same thing.

    Then the next part notes he 'departed from':

    I'd love anyone to nail that down or build on it. My brain is currently just about managing to regurgitate the odd copy-paste when a question reminds me of something I spotted or came across in the past few days - so it isn't that I don't think those are the most fundamental questions. And then there's working out how to communicate it in a way most laypersons might care to read to the end of...

    Does the following paragraph help? from the discussion (section 4.1 on reliability and limitations) of Examining the reliability and validity of two versions of the Effort-Expenditure for Rewards Task (EEfRT) | PLOS ONE :

    And this second one from section 4.2 Limitations and Future Directions - which seems to talk about validity when you are messing around with modifications and using it for different measures and cohorts:

     
    Last edited: Mar 1, 2024
    Hutan, Peter Trewhitt, Sean and 4 others like this.
  6. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,964
    Oh wow.

    And the same again for managing to run the GEE models they specified - it looks like no small task to work that one out. I'm really glad to hear from you!

    Yes! Although I've never done something like this before, so I might need to be told what to do?
     
  7. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,964
    I agree with your thoughts on this in relation to the validity check - it was pretty explicit from Treadway. I didn't have the figures, and note that a 65% completion rate doesn't seem consistent with what Walitt has inferred when he talks about 'having checked it isn't fatigue' in his paper either.
     
    Hutan, Laurie P, EndME and 6 others like this.
  8. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,964
    Except, and correct me if I'm wrong, in the GEE equations they were correlating 'more or less anhedonia-ness' with the various other variables designated in Models 1, 2, ..., 6 - in fact I didn't even check whether there is such a thing as an anhedonia diagnosis (or other diagnoses said students might have had) in Treadway et al (2009). They also used specific parts of specific scales to narrow it down.

    But if I'm hearing you right, they just used 'ME-CFS or HV'. Whereas if you go by the SF-36, from memory the scale for ME-CFS ran from something really low at the bottom end up to 75, where the (EDIT:) average bottom for HV was around 85. So if he was testing certain factors (like "One new two-way interaction, the interaction of PI-ME/CFS diagnosis and trial number, was tested as well in order to determine whether rate of fatigue differed by diagnostic group."), then theoretically on that factor some ME-CFS participants could have been closer to some HVs than to other ME-CFS participants, or vice versa - something a scale-based measure (like the one used for 'anhedonia' in Treadway et al (2009)) would capture, but which a binary variable wouldn't have accounted for?
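
    If it helps, here's a sketch of the difference (statsmodels again; sf36_pf is a hypothetical continuous covariate such as SF-36 physical function, and all column names are my assumptions):

    ```python
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("eefrt_trials.csv")  # hypothetical trial-level data

    # Binary version (as the paper appears to do): group:trial asks only
    # whether the decline over trials differs between the two labels.
    m_binary = smf.gee("choice_hard ~ group * trial + reward + probability",
                       groups="subject", data=df,
                       family=sm.families.Binomial(),
                       cov_struct=sm.cov_struct.Exchangeable()).fit()

    # Scale-based alternative, closer to Treadway's use of continuous
    # anhedonia scores: participants near each other on the scale are
    # treated as similar regardless of diagnostic label.
    m_scale = smf.gee("choice_hard ~ sf36_pf * trial + reward + probability",
                      groups="subject", data=df,
                      family=sm.families.Binomial(),
                      cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(m_binary.summary())
    print(m_scale.summary())
    ```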
     
    Last edited: Feb 29, 2024
    Hutan, EndME, Amw66 and 3 others like this.
  9. Evergreen

    Evergreen Senior Member (Voting Rights)

    Messages:
    363
    I am unwisely logging on despite my need to not click buttons (so a pain flare will go down) because I think your point is absolutely key and needs to be amplified.

    If you're not successful at completing the hard tasks, then it would be illogical to keep attempting them, because that changes the game - suddenly the choice is between $0 for a hard task (because you won't be able to complete it) and $1 or whatever for an easy task. It's more complicated than that if people can complete the hard task some of the time, but it still changes the game, and potentially flips the reward system on its head: the reward is effectively higher for easy tasks than hard tasks if you cannot complete the hard ones.
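
    To put rough numbers on that (a hypothetical illustration using the task's published payoffs - easy $1.00, hard $1.24-$4.30, with the same win probability applying to either choice - and the completion means quoted upthread):

    ```python
    # Expected payout = P(win) * P(successfully completing) * reward.
    def expected_payout(win_prob, reward, completion_rate):
        return win_prob * completion_rate * reward

    WIN = 0.50               # the mid-level win probability
    EASY_COMPLETION = 0.98   # ~ME group mean on easy trials (from the thread)
    HARD_COMPLETION = 0.65   # ~ME group mean on hard trials (from the thread)

    for hard_reward in (1.24, 2.00, 4.30):
        easy = expected_payout(WIN, 1.00, EASY_COMPLETION)
        hard = expected_payout(WIN, hard_reward, HARD_COMPLETION)
        print(f"hard ${hard_reward:.2f}: easy EV ${easy:.2f} vs hard EV ${hard:.2f}")
    # hard $1.24: easy EV $0.49 vs hard EV $0.40  <- easy now pays more
    # hard $2.00: easy EV $0.49 vs hard EV $0.65
    # hard $4.30: easy EV $0.49 vs hard EV $1.40
    ```

    So at the low end of the hard rewards, declining hard tasks is simply the money-maximising choice, not "effort aversion".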

    Is there a pattern in the data of non-completion of maybe two hard tasks followed by non-choice of hard tasks?

    This would show that patients are not avoiding the hard tasks; they're choosing to increase their chances of winning money.
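
    That pattern could be checked with something like this sketch (pandas; the file and column names are my assumptions):

    ```python
    import pandas as pd

    # Hypothetical trial-level data, sorted within participant.
    # Assumed columns: subject, trial, chose_hard (bool), completed (bool).
    df = pd.read_csv("eefrt_trials.csv").sort_values(["subject", "trial"])

    # Flag trials that immediately follow a failed hard attempt.
    failed_hard = df["chose_hard"] & ~df["completed"]
    df["after_hard_fail"] = failed_hard.groupby(df["subject"]).shift(fill_value=False)

    # P(choose hard) after a failed hard trial vs after any other trial.
    rates = (df.groupby(["subject", "after_hard_fail"])["chose_hard"]
               .mean()
               .unstack("after_hard_fail"))
    print(rates.round(2))  # a drop in the True column = switching after failures
    ```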

    OK, enough clicking. (Does anyone else find double-clicking excruciating when pain is bad?)
     
    Sam Carter, Hutan, Binkie4 and 8 others like this.
  10. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,964
    I've found the following paper, and it shows a bit more than the abstract but still not in full, Ohmann et al (2018): Left frontal anodal tDCS increases approach motivation depending on reward attributes - ScienceDirect

    This has the following, which seems to be consistent for the HVs


    PS I looked this paper up for more detail on the weakness of the EEfRT in relation to people employing strategies, hence I was disappointed I couldn't see it in full.

    It is reference 28 in the Ohmann et al (2022) paper here:

     
    Last edited: Mar 1, 2024
    Hutan, Lilas, Binkie4 and 4 others like this.
  11. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,964
    It's relevant and would be interesting to do.

    Ohmann et al (2022) did include a test of motoric abilities (but it wasn't significant 'as they tested it'), which also has a few references worth looking up:


    and then in the section 3.2.1 Original EEfRT—validity of basic task variables:



    However, I think that the following reference might be worth digging into further:
    Effort-Based Decision-Making Paradigms for Clinical Trials in Schizophrenia: Part 1—Psychometric Characteristics of 5 Paradigms | Schizophrenia Bulletin | Oxford Academic (oup.com)

    Particularly because its focus is very much on 'effort', and it uses different tests here and compares them.

    Of note in this is the fact that they calibrated the EEfRT, so that 'hard' was a number of clicks calibrated to the individual - seemingly based on motoric tests beforehand.

    EDIT: ie they ran a motor test at the start and the number of clicks required for hard was individualised to ability.
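
    In other words, something like this sketch (entirely hypothetical numbers and scaling parameter):

    ```python
    # Sketch of the calibration idea: set each participant's "hard"
    # requirement from a baseline motor test rather than using one
    # fixed press count for everyone. The 0.8 fraction is an assumption.
    def calibrate_hard_presses(max_presses_at_baseline: int,
                               fraction: float = 0.8) -> int:
        return max(1, round(fraction * max_presses_at_baseline))

    print(calibrate_hard_presses(120))  # -> 96 presses for this person
    print(calibrate_hard_presses(70))   # -> 56 for a slower presser
    ```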


    On the second reference (32) in the top paper, I can only see the abstract and snippets, which don't have clues as to what 'motoric' content there is: Incentive motivation deficits in schizophrenia reflect effort computation impairments during cost-benefit decision-making - ScienceDirect
     
    Last edited: Mar 2, 2024
  12. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    1,005
    Location:
    UK
    Thanks for running the GEEs - and yes, I completely agree. I think the participant exclusion is flawed but not that relevant, because the 65% hard-task completion rate for pwME (thanks for running that analysis) shows the test was invalid for use in this study. Game over.
     
  13. Sid

    Sid Senior Member (Voting Rights)

    Messages:
    1,190
    Are you planning to write a letter to the editor? This right here invalidates their whole thing.
     
    ME/CFS Skeptic, Hutan, Lilas and 11 others like this.
  14. Evergreen

    Evergreen Senior Member (Voting Rights)

    Messages:
    363
    I agree completely.

    I have to say I missed the corresponding part in the results section when I read the paper, because the results section is above the methods. So when I read this, I thought "complete" meant "chose":
    It's only in the methods section that it is made clear that "complete" means "complete successfully".

    Maybe I'm particularly dense but I think lots of readers will miss that, and I'm pretty sure most readers will never darken the door of the methods section.
     
    ME/CFS Skeptic, Hutan, Lilas and 6 others like this.
  15. Evergreen

    Evergreen Senior Member (Voting Rights)

    Messages:
    363
    PS My interpretation on first skim does not, of course, make sense given that earlier in the same paragraph they say this:
    But still, readers have limits and biases and I'm not sure how many would go, hm, I'll check out the method.
     
    bobbler, Sean, EndME and 2 others like this.
  16. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,241
    I haven’t gotten to looking at the actual data yet but, in case it hasn’t already been mentioned, data on the following things would also seem interesting to me:
    • Was it mentioned whether someone was ambidextrous (seems unlikely at this sample size, but would still be possible)?
    • How often do HVs choose hard rounds back-to-back, how often do pwME do so, and how do these statistics change as the game progresses?
    • Did HVs or pwME time out more often when given certain choices?
    • As @andrewkq said, ME patients had a significantly lower completion rate for hard trials, which could invalidate the data according to Treadway's original paper. In “Trait Anticipatory Pleasure Predicts Effort Expenditure for Reward”, Treadway further states “There was also a significant negative effect of trial number, which is routinely observed in studies using the EEfRT [47,48], potentially reflecting a fatigue effect.” Have other papers looked at such things? Is there an analysis of the completion rate of hard trials in pwME as the game progresses (see the first sketch after this list)? What can we see in the choices of the first 4 test rounds compared to the choices as the trial progresses? Do learning effects dominate motivational effects? Other trials have found that expected value is a significant independent predictor of choice. It might be interesting to look at something like "real expected value", which would be a combination of expected value and the probability of completing a hard task, and whether that differs here from other studies.
    • It’s hard to exclude someone a posteriori on the basis that they are playing a strategy. In a game everything can be considered a strategy, even randomly pressing a button. If you want to exclude certain strategies that you believe don’t capture the nature of the game, or that are non-reflective of the psychological phenomena you want to study, then it’s most sensible to specify which strategies are not allowed/will be excluded before the game starts. Doing this a posteriori creates problems if it isn't done for very specific reasons (like a broken button) or isn't rigorously justified (other EEfRT studies also look at noncompliance of participants, and I have started to look into this) - especially if all the results of your study depend on this exclusion. In a sample size this small there will often be statistical outliers that change your results depending on what you’re looking at, and the authors should have known this. Depending on what you look at, PI-ME/CFS D & H could also be “outliers” in terms of completion rate, whilst PI-ME/CFS B could also be an “outlier” in terms of how often they chose an easy task. If they had something like prespecified exclusion criteria for data this would seem very fair (there have been over 30 EEfRT studies, so they should have had sufficient knowledge to do this). Only looking at completion rate looks like a bad a posteriori exclusion criterion to me (because the completion rate depends on the choices you’re given in the game, your capabilities, the results in your first rounds etc, i.e. it depends on your “strategy”), but who knows. If the authors’ reasoning is somewhere along the lines of “his strategy is non-reflective of the average strategy in the population”, then that reads to me more as a sign that your sample size isn’t able to reflect the average population, especially if one “outlier” completely changes your analysis. Perhaps the authors could provide an analysis where “outliers” aren’t thrown out but instead “averaged out”, which is the expected behaviour you would see if your sample size was sufficiently powered and your sample was reflective of the average population (see the leave-one-out sketch after this list).
      • Note: I haven’t had time to look at the data yet, but quickly glancing over it, it’s already very clear that whilst the person that was excluded (HV F) has by far the lowest completion rate, he is also clearly playing a non-optimal strategy.
    • I will keep looking at other EEfRT studies to see how often people were excluded from the analysis and for what reasons and whether completion rate is one of those.
    • How capable are HVs and pwME, and which choices do they make, when the maximal income is available? I.e. what choices are made when the 88% win probability shows up alongside the maximal hard-task reward of $4.30, and how likely are wins in that scenario (it further seems sensible to me to look at this data at different probabilities and some intervals around the maximal reward)?
    • The original 2009 EEfRT paper found gender to influence the results: “We also found a main effect of gender, with men making more hard-task choices than women (F(1,59) = 3.9, p = .05). Consequently, gender was included as a covariate in all subsequent analyses.” Is such an analysis included in the intramural study (note there are proportionally more males in the HV group than in the ME/CFS group)? For most parts of the study they actually did a sex-dependent analysis, even though the sample sizes were minuscule. Was the same done here, and if not, what would the results be? I will have a look at some other EEfRT papers to see whether sex differences are commonly reported.
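
    On the completion-rate-over-the-game question above, a minimal sketch of one way to look at it (pandas; the file and column names are my assumptions):

    ```python
    import pandas as pd

    # Assumed columns: subject, group ("HV"/"PI-ME/CFS"), trial,
    # chose_hard (bool), completed (bool).
    df = pd.read_csv("eefrt_trials.csv")

    # Hard-task completion rate per group in bins of 10 trials, to see
    # whether completion falls off as the game progresses.
    hard = df[df["chose_hard"]].assign(trial_bin=lambda d: d["trial"] // 10)
    rates = (hard.groupby(["group", "trial_bin"])["completed"]
                 .mean()
                 .unstack("trial_bin"))
    print(rates.round(2))
    ```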
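
    And on the outlier point, a leave-one-out sensitivity check is one way to show whether any single participant flips the result, instead of excluding anyone. A sketch under the same assumptions (group assumed coded 0 = HV, 1 = PI-ME/CFS; the covariate list is a simplified Treadway-style one, not necessarily the paper's exact models):

    ```python
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("eefrt_trials.csv")  # hypothetical trial-level data

    # Refit the choice model once per left-out participant and record
    # the p-value of the group effect; if significance appears only
    # when HV F is dropped, the finding hinges on that one exclusion.
    pvals = {}
    for subj in df["subject"].unique():
        fit = smf.gee("choice_hard ~ group + reward + probability + trial",
                      groups="subject", data=df[df["subject"] != subj],
                      family=sm.families.Binomial(),
                      cov_struct=sm.cov_struct.Exchangeable()).fit()
        pvals[subj] = fit.pvalues["group"]
    print(pd.Series(pvals).sort_values())
    ```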

    Finally, as @Trish mentioned, the team said subsequent studies would be published. Given the focus of the study, I’d be surprised if they didn't publish a study called “The EEfRT in ME/CFS”. Apart from needing to read more about the EEfRT in general, wrap my head around everything, and actually look at the data in the intramural study, that's one of the reasons why I don't think an immediate response makes sense. A response seems sensible to me, but one should at least wait for the answer @andrewkq gets and then decide on further steps.
     
    Last edited: Mar 1, 2024
  17. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,964
    Plus the added flaw that it's for incentives like $5, for which 'pushing through' on the particular task makes little difference anyway? And (classic for certain research we see elsewhere) the 'consequences' weren't being measured (given we get PEM, but there is also fatiguability and being very ill afterwards)?

    Whereas in real life the scenario is 'I'll lose my job' or 'must pick up my child from nursery' or 'have no dinner'. Sometimes the 'can't' wins out because you actually can't, and sometimes you push through and collapse (and may or may not hide it, because we do so in a way that others either don't see or choose not to see as 'collapse', given they are allowed to reframe it as 'oh, she just didn't have breakfast, I'm sure' etc). Either way, you always feel it afterwards.
     
    Zombie Lurker, Hutan, FMMM1 and 8 others like this.
  18. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,964
    And of course, where they have used the EEfRT to claim that, it looks like the 'choice behaviour' is more than explained by the 'completion rate'.
     
  19. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,842
    Location:
    Canada
    If this were a test of ability, it could be defended. But this was a test of motivation on a pointless task, and it seems to me that tester F made his effort preference clear enough. Assign a stupid test, get stupid results. When the test is about motivation, "I don't want to play, this is stupid" is a valid result, just not one that Walitt wanted.

    And that this one single outlier would have tipped the statistical significance says it all. The authors made a preference choice here, to preserve the 'validity' of their effort.
     
    Zombie Lurker, Lilas, EzzieD and 5 others like this.
  20. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,842
    Location:
    Canada
    Given that the test was designed to be about reward, not performance, and features a 96% completion rate for the hard task in its validation experiment, I don't see how this doesn't invalidate the entire test - among many other reasons. This is much farther outside the test's criteria than the one outlier, F. The creator of the test explicitly states that it's not supposed to be limited by ability, and the BS interpretation rests strictly on the hard/easy task ratio. And yet here it is clearly limited by ability.

    Did the reviewers miss all of this? Did they just not bother to look into it, given the prominence this single test was given in the paper? Good grief, this is ridiculous.
     
    JoClaire, Hutan, EzzieD and 6 others like this.
