Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Discussion in 'ME/CFS research' started by Andy, Feb 21, 2024.

  1. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
Yep, this starts to make it look like Treadway's description, which emphasises 'especially when rewards are uncertain',

is the more appropriate one, vs Walitt's 'when probability and reward were equal' (which describes something else - not the data, but what people were shown. So has he misunderstood the test when he says this? Is it a misrepresentation he is aware of? Or, worse, could what @Hutan has discovered be correct - that there were no differential rewards, and he is pointing at the 50:50 condition and the fact that you win the same amount whichever option you pick?)

as well, of course, as Treadway et al (2009) stating that the EEfRT describes 'reward wanting' (and uses choosing the hard task to operationalise that), rather than Walitt inverting it and claiming he measures 'willingness to exert effort'.

    From the conclusion of Treadway et al (2009):
    "Based on a well-validated animal paradigm, the EEfRT operationalized reduced reward ‘wanting’ as a decreased willingness to choose greater-effort/greater-reward options, particularly when rewards are uncertain."

Anyway, I'm aware that in the quote above 'when the rewards are uncertain' relates to the overall set-up of the experiment/trial, but I think you are correct in pointing out that, logically, 'the business end' would in theory be where the probability is 50:50 and where the reward is neither at the low nor the high end of the spread of reward magnitudes

- that is where, if you were being purely logical about 'the game', it becomes a judgement call rather than a no-brainer: do you spend 15 secs of extra time (if you aren't ill) on a 50:50 trial worth $2-3 that might not even count, or 'guess' that by doing so you might miss out on some higher-probability, higher-amount trials later on? It is where all the 'unknowns' come in. To label this 'effort preference' seems inaccurate, even before you take in all the additional factors of a cohort with an energy-limiting condition, and the way the set-up throws off the whole balance of pros vs cons (which Ohmann et al (2022) makes clear: Examining the reliability and validity of two versions of the Effort-Expenditure for Rewards Task (EEfRT) | PLOS ONE )
     
    Last edited: Feb 28, 2024
    Hutan, Ron, Kitty and 3 others like this.
  2. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
I thought it was worth pointing out a few other things, given that you are correct - the point/set-up of the game makes it 'the logical thing' to choose 'easy' for low probability and 'hard' for high probability:

ME-CFS being a lower % hard than HVs for low probability is 'more correct' (even as a 'reward wanting' strategy, based on choosing the short task when it's likely not to count). The difference is 2.03%.

ME-CFS being a higher % hard than HVs for high probability is also 'more correct', i.e. more 'reward wanting' as per the Treadway definition. Except the difference is only 0.34% here.

But if the calculation Walitt used rolled both of these into an 'overall % hard' (without differentiating by probability), he is basically including ~2% of HV behaviour that, it could be argued, isn't consistent with a sensible 'reward-wanting' strategy.

[Attached image: effort preference totals.png]

    Anyway, there were more reasons I wanted to repost this.

It's worth noting that there were only 15 ME-CFS vs 17 HV participants in this.

Even if Walitt had only used the 50:50 probability for his claims, you'd indeed be looking at 30.86% of ME-CFS vs 40.67% of HV choices being 'hard' at 50:50.

But what is worth underlining is the number/count that makes this up: 122 − 83 = 39 'choices'.

It's worse than that too, because HVs had 2 extra participants (so roughly 15 of that 122 comes simply from there being extra people): for HVs, 122/17 = 7.18; for ME-CFS, 83/15 = 5.53.

Which means that at the 50:50 probability level he is basing his assertion on each ME-CFS participant choosing 'hard', on average, about 1.6 fewer times than each HV participant?
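
Here's that arithmetic as a quick sanity check (a minimal sketch of the calculation above, using the counts quoted in this post rather than the paper's raw data files):

```python
# Per-participant 'hard' choice averages at the 50:50 probability level.
hv_hard, hv_n = 122, 17   # HV 'hard' choices and participant count
me_hard, me_n = 83, 15    # ME-CFS 'hard' choices and participant count

print(hv_hard - me_hard)                          # 39 choices' difference in raw counts
print(round(hv_hard / hv_n, 2))                   # 7.18 'hard' choices per HV
print(round(me_hard / me_n, 2))                   # 5.53 'hard' choices per ME-CFS participant
print(round(hv_hard / hv_n - me_hard / me_n, 2))  # 1.64 fewer per ME-CFS participant
```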

    Am I correct, or have I done something silly here?

Because I'm almost rubbing my eyes, wondering if it is right...

Or of course, given I'm talking averages, almost all of that could be accounted for by one or two ME-CFS participants passing out, or something else meaning they had to select 'easy' for some reason that had nothing to do with anything.
     
    Last edited: Feb 28, 2024
    RedFox, cfsandmore, Hutan and 3 others like this.
  3. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
Oh and PS, given that the claims are based on a difference of, on average, roughly 1.6 more tasks being chosen as 'hard' by each HV vs each ME-CFS participant at the 50:50 probability level,

I also have questions about power. Maybe you could try to blag that that meant something if you had 1,000 participants. But this was 15 vs 17.

    The original Treadway et al (2009) used 60 participants

    Ohmann et al (2022) used 120 participants

and that latter study (a reliability and validity study: Examining the reliability and validity of two versions of the Effort-Expenditure for Rewards Task (EEfRT) | PLOS ONE) notes the following in its introduction:

    "However, there are also various limiting aspects (see Table 1). First, the number of studies reporting a significant link between the behavioral measurements within the EEfRT and self-reported personality traits related to approach motivation is still small, although many studies refer to this link as validity evidence of the EEfRT.

    Second, the number of participants in studies which used the EEfRT has often been relatively small, resulting in low statistical power to detect effects sizes that can be expected in individual difference research [42]. "

So even those participant numbers, larger by comparison with Walitt's, are noted as having low statistical power.
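
Just to put a number on that, here's a rough power calculation (a sketch under assumed values: it treats each participant's proportion of hard choices as a single observation and compares groups with a t-test, which is simpler than the paper's GEE; the assumed 10-point difference and 0.15 SD are illustrative, not from the paper):

```python
# Approximate power for an n=15 vs n=17 comparison of mean PHTC
# (proportion of hard-task choices), under assumed effect and spread.
from statsmodels.stats.power import TTestIndPower

mean_diff = 0.10  # assumed group difference: 10 percentage points of PHTC
sd = 0.15         # assumed between-participant SD of PHTC

d = mean_diff / sd  # Cohen's d of roughly 0.67, a 'medium-to-large' effect
power = TTestIndPower().power(effect_size=d, nobs1=15, ratio=17 / 15, alpha=0.05)
print(round(power, 2))  # well under the conventional 0.80 target
```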


PS, in case anyone is curious, the reference '[42]' at the end of the quote is: Effect size guidelines for individual differences researchers - ScienceDirect
     
    Hutan, Sean, RedFox and 3 others like this.
  4. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
And on that last point, I'm intrigued by the data on the SF-36 'physical function' scores for ME-CFS participants in @Karen Kirke's comment:

    Deep phenotyping of post-infectious myalgic encephalomyelitis/chronic fatigue syndrome, 2024, Walitt et al | Page 31 | Science for ME (s4me.info)
     
    Kitty and Peter Trewhitt like this.
  5. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    12,919
    Location:
    Canada
So it's an element of gamification with an element of gambling, leveraging either or both effects. Neither of which has scientifically accurate models, it must be said, and gambling is all about the law of large numbers, with a small number of outliers - gambling addicts - accounting for a huge % of the total.

    But of course a problem with this is that not everyone cares about this stuff. Not everyone likes to gamble, so you'd need a large number of participants or some way to either filter or account for gamification (token rewards) and gambling preferences. And the number of participants here was so tiny that a single outlier creates or removes statistical significance.

I personally don't give a fig about token rewards or gambling, and would always choose the easy task simply because I don't care about what this test has to offer me. My data would be worthless, in a similar way that data from someone who doesn't believe in a religion would be invalid in a test of the healing power of prayer - if the test is about the effect of personal belief, anyway.

    This test seems to assume that something common must be universal. It's ridiculously wrong in even more ways than it looked at first.
     
    Hutan, Sean, Amw66 and 7 others like this.
  6. Sam Carter

    Sam Carter Established Member (Voting Rights)

    Messages:
    41
    I've been trying to work out what the optimal play is given the rules of the game.

I think it would be something like this: since your actual prize is two amounts chosen probabilistically from the basket of all the tasks you complete successfully, the best strategy is to i) flunk all the low-value tasks (i.e. don't press the button enough times to win), especially the $1 easy tasks, because you don't want to fill your basket with low-value prizes, and ii) go hell for leather on the high-value, high-probability tasks so that your two actual prizes are pulled from the higher end of the range of prizes.
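
Out of curiosity, here's a rough Monte Carlo of that strategy (a sketch under invented assumptions, since the paper doesn't spell out the reward-magnitude distribution: hard rewards uniform between $1.24 and $4.30, win probabilities of 12/50/88%, and a final payout equal to the sum of two trials drawn at random from your winning trials):

```python
# Compare 'complete everything' vs the selective strategy above, under the
# invented assumptions in the lead-in. Not the real task parameters.
import random

def play(selective, n_trials=50):
    wins = []
    for _ in range(n_trials):
        prob = random.choice([0.12, 0.50, 0.88])  # assumed win probabilities
        reward = random.uniform(1.24, 4.30)       # assumed hard-task reward
        # Selective: only bother completing high-probability, high-reward trials.
        completed = (prob >= 0.50 and reward >= 3.00) if selective else True
        if completed and random.random() < prob:  # trial 'wins' and counts
            wins.append(reward)
    return sum(random.sample(wins, min(2, len(wins))))  # payout: two random wins

for selective in (True, False):
    avg = sum(play(selective) for _ in range(20_000)) / 20_000
    print("selective:" if selective else "complete everything:", round(avg, 2))
```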

Did anyone do that? I think Healthy Volunteer F might have! Out of 52 games, HV F only successfully completed 10 of them: one easy and nine hard (for which the prize was always $3.22 or higher).

    HVF's data were declared invalid so someone must have noticed that the system was being gamed!
     
    Murph, Fero, Missense and 11 others like this.
  7. Kitty

    Kitty Senior Member (Voting Rights)

    Messages:
    5,920
    Location:
    UK
    I wonder how many pwME would have been able to work that out, given that they were in a very unfamiliar situation and almost certainly well into PEM?

    I couldn't work out probabilities if I were given a calculator, a maths tutor, and a month to do the sums, but even for people without that problem, surely brain fog would come into play? If someone presented me with options when I was shattered and foggy, I'd choose randomly because I wouldn't have the mental bandwidth to process it.
     
    Hutan, Fero, Peter Trewhitt and 12 others like this.
  8. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
Fascinating - not a bad strategy. I hadn't noticed the data being declared invalid, so I wonder on what basis they can justify that?
     
    Peter Trewhitt and Kitty like this.
  9. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,010
    I've started looking into it a bit now, but still have to wrap my head around it. So far I've gathered:

There is no universal optimal strategy (see the example below) for this game, since the optimal strategy depends on the person's ability to complete easy and hard tasks (which, additionally, is something nobody has knowledge of, especially not before starting to play; so any optimal strategy will be based on an assessment of the individual's capabilities and results as the game progresses).

It’s fairly easy to identify optimal strategies for maximising your income (which is not the goal for most people participating in this trial) for two groups of people:
    • For those that a priori* know that they cannot complete any task at all: All strategies are optimal.
    • For those people that a priori* know that they can complete any task: Always choosing the hard task is the optimal strategy.**
This example shows that there is no universal optimal strategy, in the sense that an optimal strategy will depend on the assessment of your own capabilities. The above groups are polar opposites in terms of capabilities, and they have distinct optimal strategies. In the absence of complete knowledge (i.e. knowing one's capabilities), which is the case here, everything becomes a bit more complicated.

    The strategy you proposed above is not universally optimal because there might be people that are not able to complete any hard tasks at all. In this case they wouldn't be left with any money as they would flunk all the games, even though they might be capable of completing easy tasks.

    It's actually quite a hard task to figure out your own optimal strategy and certainly not something most people would be capable of, especially not within a trial setting.

    Note: Optimal strategy refers to the game theoretic idea of optimising your monetary income. In reality, people have all sorts of reasons to participate in trials and as such are not looking for an optimal strategy in the classical sense to begin with because not everybody wants to optimise their income.

    *Of course a priori knowledge does not exist in this game, but that's irrelevant for this example.

**This is not entirely correct and is an oversimplification. The optimal strategy in this case depends on the “reward magnitude” they are given after completing hard tasks and on whether a given trial was a win. For example, if they are twice given the maximal reward of $4.30 at the beginning of the game and both of these games are "win" trials, their optimal strategy then becomes to lose every game on purpose (and it doesn't matter whether they choose easy or hard). One important thing to note is that the distribution of the “reward magnitude” is not specified in the trial - which is either a mistake by psychologists trying to do mathematics, or a deliberate choice to make finding an optimal strategy even harder.
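
To illustrate the point in that second footnote with a toy calculation (assuming, as the footnote does, that the payout is two trials drawn at random from all winning trials):

```python
# Expected payout when two winning trials are drawn at random from the basket.
from itertools import combinations

def expected_payout(wins):
    pairs = list(combinations(wins, 2))
    return sum(a + b for a, b in pairs) / len(pairs)

print(expected_payout([4.30, 4.30]))        # 8.60 - two maximal wins banked
print(expected_payout([4.30, 4.30, 2.00]))  # ~7.07 - an extra, smaller win lowers it
```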
     
    Last edited: Mar 1, 2024
    Hutan, Sean, Peter Trewhitt and 2 others like this.
  10. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,010
    I've only now begun looking at "EEfRT" and the data, so I might still be a bit behind. My first two questions are:
    1. Walitt seems to not be using the "EEfRT" as used in Worth the ‘EEfRT’? The Effort Expenditure for Rewards Task as an Objective Measure of Motivation and Anhedonia. At least Figure S5 from the supplementary information file reads "Subjects receive reward feedback as to whether they received a score increase for that trial (adapted from Treadway, 2009)". I don't know what that is supposed to mean, but that makes it seem like they changed the rules of the game. Are the specific rules Walitt uses, the changes made to the original game, written down more precisely somewhere? Typically such games are not robust to rule changes at all and their dynamics change fundamentally under small adaptations.
    (Another very slight difference to the rules of the Treadway study is that there people received a base compensation for playing the game. It seems inconsequential and it's most likely inconsequential, but even that may impact the dynamics of how the game is played.)
2. @bobbler, has any reason been given why one person was excluded, and has anybody looked at how the data look with this person included vs excluded?
     
    Last edited: Feb 29, 2024
    SNT Gatchaman, Hutan, Kitty and 4 others like this.
  11. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    925
    Location:
    UK
    I've not read this, but it might answer some of the questions being raised about EEfRT and how it was used

    Statistical analysis of effort expenditure for rewards task
    Following the analytic strategy described by Treadway15, generalized estimating equations (GEE) were used to model the effects of trial-by-trial and participant variables on hard task choice. A binary distribution and logit link function were used to model the probability of choosing the hard task versus the easy task. All models included reward probability, reward magnitude, expected value (the product of reward probability and reward magnitude), and trial number, in addition to binary categorical variables indicating participant group and sex. Emulating Treadway et al., the two-way interactions between PI-ME/CFS diagnosis and reward probability, PI-ME/CFS diagnosis and reward magnitude, and PI-ME/CFS diagnosis and expected value were also tested, as was the three-way interaction among PI-ME/CFS diagnosis, reward magnitude, and reward probability. One new two-way interaction, the interaction of PI-ME/CFS diagnosis and trial number, was tested as well in order to determine whether rate of fatigue differed by diagnostic group.

    Departing from the procedures described by Treadway15, GEE was also used to model the effects of trial-by-trial and participant variables on task completion. A binary distribution and logit link function were again used given the binary nature of the task completion variable (i.e., success or failure). The model included reward probability, reward magnitude, expected value, trial number, participant diagnosis, and participant sex, as well as a new term indexing the difficulty of the task chosen (easy or hard). The three-way interaction of participant diagnosis, trial number, and task difficulty was evaluated in order to determine whether participants’ abilities to complete the easy and hard tasks differed between diagnostic group, and in turn whether fatigue demonstrated differential effects on probability of completion based on diagnosis and task difficulty. Additionally, GEE was used to model the effects of these independent variables and interactions on button press rate, to provide an alternative quantification of task performance. This time, the default distribution and link function were used. The model’s independent variables and interaction terms were the same as in the above task completion model.

    All three sets of GEE models were performed using an exchangeable working correlation structure. Unstructured models were tested as well, but failed to converge. All GEE models were implemented in SAS 9.4.
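
For anyone who wants to poke at this themselves, the first of those models corresponds roughly to the following (a sketch, not the authors' code - the paper used SAS 9.4, and all the column names here are hypothetical stand-ins for a trial-by-trial export):

```python
# A sketch of the 'hard task choice' GEE described above, in Python rather than
# SAS 9.4. All column names (chose_hard, probability, magnitude, expected_value,
# trial_number, group, sex, participant_id) are hypothetical stand-ins.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("eefrt_trials.csv")  # hypothetical file: one row per trial

model = smf.gee(
    "chose_hard ~ probability + magnitude + expected_value + trial_number"
    " + group + sex"                          # group = PI-ME/CFS diagnosis vs HV
    " + group:probability + group:magnitude + group:expected_value"
    " + group:magnitude:probability"          # the three-way interaction
    " + group:trial_number",                  # the new term: differential fatigue
    groups="participant_id",                  # trials clustered within participant
    data=df,
    family=sm.families.Binomial(),            # binary outcome, logit link
    cov_struct=sm.cov_struct.Exchangeable(),  # exchangeable working correlation
)
print(model.fit().summary())
```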
     
  12. duncan

    duncan Senior Member (Voting Rights)

    Messages:
    1,628
I'm still not clear on the purpose of this test in ME/CFS patients. Its purpose - not whether it's actually capable of measuring anything, or whether inferences are rooted in reality, or whether the motivations of pwME make sense. Why'd this element find its way into a phenotype study?

    "The EEfRT test was used to assess effort, task-related fatigue, and reward sensitivity."

    As far as reward sensitivity goes, in what way is this relevant to phenotyping pwME?

    If it is to distinguish fatigue between HV and pwME, this seems hopelessly like a force fit of a square peg into a round reality. There are easier and less theoretical mechanisms to deploy for comparisons.

    As for assessing effort....what are they shooting for here again? Didn't patients endure a battery of neuropsych studies which have effort sensors embedded?

    To me, it almost has the feel of a "gotcha" attempt.

What is its purpose as it pertains to phenotyping, and why was that purpose compelling enough to get the entire team to sign off on it?

    I apologize if this has already been explored here.
     
    Last edited: Feb 29, 2024
    Fero, Sean, Keela Too and 12 others like this.
  13. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    37
    Answer to #2: I reran the GEE models and when you include participant HV F (who was excluded for having "invalid data") the effect is no longer significant (p-value goes from .04 to .14). HV F just so happens to be the participant in the control group with the lowest PHTC value (aka the lowest effort preference). They do not provide any justification for removing this participant from the analysis other than saying that they had invalid data. I'm going to request that they provide a detailed explanation of why this data was deemed invalid and what process they used to decide this, because that looks awfully suspicious to me. I would never dream of removing participant data from an analysis without explaining in detail why that decision was made, especially when the decision gives you the significant result that you hang your entire theory on.
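
For reference, the with/without refit is straightforward to reproduce given trial-by-trial data (a sketch with hypothetical column names, not the exact model I ran):

```python
# Refit a simplified model with and without "HV F" (hypothetical column names;
# 'group' coded 0 = HV, 1 = PI-ME/CFS, so res.pvalues["group"] is the group effect).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("eefrt_trials.csv")  # hypothetical trial-by-trial export

for label, data in [("HV F excluded", df[df.participant_id != "HV_F"]),
                    ("HV F included", df)]:
    res = smf.gee("chose_hard ~ probability + magnitude + expected_value"
                  " + trial_number + group + sex",
                  groups="participant_id", data=data,
                  family=sm.families.Binomial(),
                  cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(label, "p(group) =", round(res.pvalues["group"], 3))
```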

@EndME @bobbler would you, or others you know of, be interested in being part of requesting those details?

Sorry, I've been working on this on my own and just found out about the conversations here, and it's hard to track what has already been discussed. Would people be open to moving the EEfRT discussion to its own thread so it's easier to track? I saw @EndME started a separate thread for it here
     
  14. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,010
    I'd be highly interested in this! I'm currently still working out the details myself and only got to the whole "EEfRT" discussion yesterday.

    Wow, thanks for the info! They definitely have to provide a justification on why this person was excluded. In this setting it's additionally hard to just argue "we removed his data because he was a statistical outlier", because these games are a priori designed to always create statistical outliers. That's why it seems fundamentally important to me to have large sample sizes (it's called the law of LARGE numbers after all, not the law of "32 is enough") or a priori specify which kind of people you exclude from your data before the games begin. Judging players once the games are finished is inherently flawed in this setting if you don't do it extremely rigorously.
     
    Last edited: Feb 29, 2024
    JoClaire, Hutan, Fero and 11 others like this.
  15. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,965
    Location:
    London, UK
    Oh, dearie me!

    A thread for the effort measure sounds a good idea. I would love to see conclusions but am unlikely to be able to follow the detail (even if because of lack of effort preference).
     
    Hutan, Ash, FMMM1 and 13 others like this.
  16. Sam Carter

    Sam Carter Established Member (Voting Rights)

    Messages:
    41
    Healthy volunteer F was indeed a highly atypical button presser(!), and I can see why the investigators treated his data with caution.

    If you subtract the number of times he pressed a button from the required number of presses for a given trial, you get this list:

    [20, 21, 2, 0, 19, 1, 2, 4, 2, 4, 0, 2, 2, 3, 0, 1, 0, 2, 0, 2, 4, 3, 0, 0, 5, 5, 5, 4, 5, 7, 2, 4, 12, 10, 7, 0, 0, 5, 10, 14, 15, 9, 8, 11, 12, 5, 22, 9, 0, 5, 4, 5].

    Where a 0 (zero) appears it means that he completed the correct number of presses for the task (which happened only 10 out of 52 tries).

Also note how often he only just missed the correct number of presses (see the quick tally after this list):
    -- 1 press too few on 2 occasions
    -- 2 presses too few on 8 occasions
    -- 3 presses too few on 2 occasions
    -- 4 presses too few on 6 occasions
    -- 5 presses too few on 8 occasions
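
For anyone checking, those tallies can be reproduced straight from the list as posted:

```python
# Count how often each press deficit occurs in HV F's 52 trials.
from collections import Counter

deficits = [20, 21, 2, 0, 19, 1, 2, 4, 2, 4, 0, 2, 2, 3, 0, 1, 0, 2, 0, 2, 4, 3,
            0, 0, 5, 5, 5, 4, 5, 7, 2, 4, 12, 10, 7, 0, 0, 5, 10, 14, 15, 9, 8,
            11, 12, 5, 22, 9, 0, 5, 4, 5]

counts = Counter(deficits)
print(counts[0])                            # 10 completed trials
print({k: counts[k] for k in range(1, 6)})  # {1: 2, 2: 8, 3: 2, 4: 6, 5: 8}
```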

    Something is definitely up with him. I think either the equipment failed during his run or (just maybe) he really was gaming the system and trying to conceal it.
     
  17. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,010
One cannot really judge whether something is up with him from the final results of the game (the analogy is that you can't judge a prisoner's choice in the prisoner's dilemma by the end result; the optimal strategy in the prisoner's dilemma is always to testify, even though that doesn't equal the maximal possible reward). That is because the game is played round by round, and any optimal strategy has to develop round by round. I personally think it is unlikely that he was perfectly gaming the system, given the complexity of the task at hand. The whole point of the game is to be set up in a way that it cannot be perfectly gamed - that is why participants are not given complete information on the game (for example, the distribution of high-reward money) - at least that is my interpretation up until now (but I might have missed something). In any case, if he was playing a strategy, his strategy would most likely have been suboptimal.

Furthermore, I see no reason to exclude participants on the basis that they were following some strategy: the authors themselves have acknowledged that all ME/CFS patients were applying a gamified version of pacing, which in itself is already a strategy.
     
    Last edited: Mar 1, 2024
    Zombie Lurker, Fero, rvallee and 10 others like this.
  18. Kitty

    Kitty Senior Member (Voting Rights)

    Messages:
    5,920
    Location:
    UK
    It's always possible he was just bored witless and couldn't be bothered. Which is a perfectly valid response to a psychology exercise like this, as is attempting to game the game.
     
    Hutan, rvallee, Sean and 8 others like this.
  19. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
It sounds like you are doing well getting up to speed with the complexity.

Re question 2: are you referring to the following? Deep phenotyping of post-infectious myalgic encephalomyelitis/chronic fatigue syndrome, 2024, Walitt et al | Page 33 | Science for ME (s4me.info)
     
    cfsandmore, EndME, Binkie4 and 2 others like this.
  20. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
    @EndME in particular here, if you are interested?

Treadway et al (2009) used these GEEs with scale-based measures (e.g. a depression inventory) rather than an 'on-off' type measure of diagnosis or not: they were looking for the 'amount of trait anhedonia' in students correlating with these different aspects.

One of my questions, given the data we now have on e.g. the SF-36 (and the range it showed for ME-CFS vs HVs, but also the potential range within ME-CFS), is whether they used any of these types of scales, or whether all of these GEEs were done with just 'ME-CFS diagnosis vs HV'?
     
    Amw66, Sean, Peter Trewhitt and 2 others like this.
