Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Discussion in 'ME/CFS research' started by Andy, Feb 21, 2024.

  1. Eddie

    Eddie Senior Member (Voting Rights)

    Messages:
    126
    Location:
    Australia
    That's awesome; I think that's exactly what's going on.

    The authors concluding that this is a test of effort preference, when the optimal strategy involves failing the vast majority of tasks, is so ironic. You know it's bad when you can maximize returns by pressing the button a total of zero times in all but two of the tasks (so long as you get lucky with the probabilities). Sorry if I missed it, but what was their rationale for throwing out the data of HV F when presumably he got paid out more than the vast majority of other participants?

    Also, I think a major issue with this game is the absolutely paltry rewards. Imagine if participants had gotten to keep all their winnings (rather than receiving two randomly selected ones) and the rewards were 100x larger. We would almost certainly have seen a significantly higher percentage of all participants choosing the harder tasks (assuming that you couldn't win more by doing many fast easy tasks). If by the end of that test the ME/CFS participants had started failing the harder tasks (or choosing easier ones they could complete), we wouldn't assume that they had an effort preference issue; we would assume they had a fatigue/PEM problem.
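    To make the payout arithmetic concrete, here is a rough Python sketch of the expected payout, assuming (as described elsewhere in this thread) that two rewards are drawn at random from a participant's completed, winning trials. The reward values are made up for illustration, not taken from the study's data.

    Code:
    def expected_payout(completed_win_values, n_draws=2):
        """Expected payout if n_draws rewards are drawn at random from the
        values of a participant's completed, winning trials (assumed rule)."""
        if not completed_win_values:
            return 0.0
        # Each random draw has the same expected value: the mean of the eligible pool.
        return n_draws * sum(completed_win_values) / len(completed_win_values)

    # Illustrative, made-up values roughly in the EEfRT's $1.00-$4.30 range:
    complete_everything = [1.00] * 38 + [4.12, 3.85]  # dozens of small wins plus two big ones
    only_the_big_ones = [4.12, 3.85]                  # skip everything except the two largest

    print(expected_payout(complete_everything))  # ~ $2.30
    print(expected_payout(only_the_big_ones))    # ~ $7.97

    Under that assumed rule, grinding through dozens of easy $1 wins dilutes the pool the two rewards are drawn from, so completing only a couple of high-value trials comes out well ahead.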
     
  2. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593

    OK, so I've tried to see if I could put this into a table format that is more pictorial than scrolling through each participant's data in order. I've stripped it down to just 'complete' (1 and green for completed, 0 and red for not completed) and clicks (anything less than 98 is non-completing on hard tasks), for hard tasks chosen only.

    I haven't got ALL of the trial numbers on there - so far only the ones that at least one participant chose hard for, so there will be some gaps (which would just be back-filled by adding a blank row, as those are trials where everyone picked 'easy' or didn't respond because the trial timed out/ended). From a quick look it might just be trial 5 all the way up to trial 49.

    Anyway, it's my attempt to show what I've described above, which was quite stark when I looked through. On this table each participant is in a different column, so you are looking 'down' at the colours and noting the differing patterns as you go across: many HVs are 'all green', versus those who had lots of non-completions (most of whom were ME-CFS; just one HV had many non-completions).

    I'm not sure I'm quite there on making it striking - I've been playing with formatting (size, lines and so on) to try and make those downward colour patterns easier to see.

    walitt hard tasks chosen completion and click numbers.png
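    In case it helps anyone reproduce or tweak a table like this from the published data, here's a minimal pandas sketch. The file name and column names (participant, trial, choice, presses) are placeholders for whatever the supplementary spreadsheet actually uses.

    Code:
    import pandas as pd

    # Hypothetical file and column names; the real supplementary spreadsheet may differ.
    trials = pd.read_csv("walitt_eefrt_trials.csv")

    # Keep hard-task choices only, then mark completion (fewer than 98 presses = not completed).
    hard = trials[trials["choice"] == "hard"].copy()
    hard["complete"] = (hard["presses"] >= 98).astype(int)

    # Pivot to a trial-by-participant grid: 1 = completed, 0 = failed, blank = hard not chosen.
    grid = hard.pivot(index="trial", columns="participant", values="complete")

    # Quick green/red rendering of the same idea (displays in a notebook).
    styled = grid.style.background_gradient(cmap="RdYlGn", vmin=0, vmax=1)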
     
    Last edited: Mar 2, 2024
  3. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    Patient B is unusual because they only chose hard 3 times when given 'high probability' (which is when you'd be more likely to choose hard, on the basis of there being an 88% chance of it 'counting' as a win trial). The values were 3.22, 1.96 and 3.94, and they completed all 3.

    They only chose hard once when it was low probability (with a small value), and then only managed 85 clicks, after apparently spending 5 seconds deciding whether to choose hard or easy.

    Then on 50:50 probability they selected hard 10 times, failing to complete only once - a few clicks short - on a relatively smaller value (1.73); the others were somewhat higher values, though I don't know that they were 'the highest'.

    On Treadway's description, apparently everyone got each value magnitude shown to them with each probability combination. I'm not sure that's the case here, but if so, it would mean it was only at 50:50 that HVB seemed motivated to select hard.
     
    Peter Trewhitt, Murph and Hutan like this.
  4. Sid

    Sid Senior Member (Voting Rights)

    Messages:
    1,057
    Reaching out to Treadway is a good idea as long as the letter is laser focused on the misinterpretation of the EEfRT task, not the wider ME/CFS politics of the study. I don't know him but generally most civilians have been very reluctant (to put it mildly) to wade into ME/CFS waters. You could also try contacting people who hold academic positions in psychology departments like Hughes and Wilshire.
     
    Peter Trewhitt, Ash, bobbler and 2 others like this.
  5. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    When you stay focused on the completion data but switch to looking only at the easy tasks, HVF stands out, given that you only get the odd fail from the other HVs in a mass of green completions.

    ME-CFS B, however, does have quite a few fails on easy. I don't know whether that is strategy in the money/incentive sense, because the pattern from trial 26 onwards - stopping at e.g. 4 taps - could also be rest breaks; and on one 3.76 50:50 trial they'd chosen easy and got 23 clicks through.

    But all the rest of the HVs, and indeed the ME-CFS participants, seem to have managed to complete nearly all of the easy ones.
     
  6. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    I'm putting a link to this post from @Karen Kirke which includes SF-36 info

    Deep phenotyping of post-infectious myalgic encephalomyelitis/chronic fatigue syndrome, 2024, Walitt et al | Page 26 | Science for ME (s4me.info)

    I'm sure there are other disability-related scales somewhere we could also scan against this, but when I saw the range within ME-CFS on this, and versus the HVs, it struck me that the pattern of non-completion of hard tasks, and the way it seems to 'group', reflects it.

    Which to me would indicate a disability issue: for example, the level of the 'hard' task not having been pre-calibrated to individualise it for disability (or something similar) as part of a properly done checks process.
     
    Peter Trewhitt likes this.
  7. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    I've just noticed how similar HVN and HVO look, and that might have been me making an error, so I will check it again when I'm operational enough to do so, in case I've duplicated one or the other rather than entered the right data for each.

    Yes - I've dragged the copy of N across, because HVO only has one non-complete, on trial 10.

    I'll amend and update when I can, but for now I'm intrigued whether the 'concept' of something like this is a good way of getting things across - i.e. are these the right variables (does the pattern seem as useful to anyone else?), and how can it be made more visually 'readable' so the patterns are easier to see? In short: is this useful?
     
    Peter Trewhitt likes this.
  8. Murph

    Murph Established Member (Voting Rights)

    Messages:
    56
    Above I posted a chart of Healthy volunteer F and their button presses. Below is an equivalent chart for all participants, which shows two important things.

    1. Several participants lose easy tasks at various points, possibly deliberately (look for short red bars). Healthy Volunteer B and Healthy Volunteer O are notable. As is PI-ME/CFS B. None make such a habit of it as Healthy Volunteer F, however!
    2. Many people with ME repeatedly fail at the hard tasks. This would appear to violate the specifications made by Treadway et al in their 2009 paper that established this test as a valid research tool.

    facets of buttong presses.jpeg



    As I said above, the chart shows two important things. The first - that several participants lose easy tasks, possibly deliberately - makes the decision to chuck out HVF's data debatable; with his data included, the test shows no significant between-group differences in the primary endpoint. The second - that many people with ME repeatedly fail the hard tasks - would appear to violate the specifications made by Treadway et al in their 2009 paper that established this test as a valid research tool, quoted here:

    "An important requirement for the EEfRT is that it measure individual differences in motivation for rewards, rather than individual differences in ability or fatigue. The task was specifically designed to require a meaningful difference in effort between hard and easy-task choices while still being simple enough to ensure that all subjects were capable of completing either task, and that subjects would not reach a point of exhaustion. Two manipulation checks were used to ensure that neither ability nor fatigue shaped our results. First, we examined the completion rate across all trials for each subject, and found that all subjects completed between 96%-100% of trials. This suggests that all subjects were readily able to complete both the hard and easy tasks throughout the experiment. As a second manipulation check, we used trial number as an additional covariate in each of our GEE models."


    Side observation: how amazing is it when authors make their data available directly with the paper. Makes a big difference. This is one thing NIH has done well with this paper.
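    For anyone who wants to run the first of those manipulation checks on the NIH data themselves, here's a minimal sketch using hypothetical file and column names (adjust to match the supplementary spreadsheet). The 98-press hard-task threshold comes from the data discussion above; the easy-task threshold of 30 is an assumption to verify against the paper's methods.

    Code:
    import pandas as pd

    # Hypothetical file and column names; adjust to match the supplementary data.
    trials = pd.read_csv("walitt_eefrt_trials.csv")

    # Completion threshold depends on the chosen task (the easy threshold is an assumption).
    needed = trials["choice"].map({"hard": 98, "easy": 30})
    trials["completed"] = trials["presses"] >= needed

    # Treadway-style manipulation check: completion rate across all trials, per subject.
    completion = trials.groupby("participant")["completed"].mean()
    print(completion.sort_values())
    print((completion < 0.96).sum(), "participant(s) below the 96% floor Treadway et al report")

    If several participants fall well below that floor, the check Treadway et al rely on to rule out ability/fatigue effects has failed for this sample.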
     
    Last edited: Mar 2, 2024
  9. Hutan

    Hutan Moderator Staff Member

    Messages:
    27,186
    Location:
    Aotearoa New Zealand
    I'm interested in what the participants remember knowing about the experiment before they started.

    If they understood that they would get paid for two rewards chosen randomly only from the tasks that they completed, then I think it would be fairly easy to realise that you want to keep the number of low value rewards down and just have a few of the highest value rewards. I think a significant number of people would work that out before the live games started. It is sort of hilarious that the smartest solution was to carefully select the most important work to do and not worry about the rest - pacing was the best strategy.

    But, it's possible that the explanation wasn't clear or the participants misinterpreted what they were told, and so thought that they needed to try to get a reward for each task. I mean, it is a rather unusual, counterintuitive approach, to not pay out for each task, or an averaged amount, but to instead randomly select two rewards to pay. I wonder how the investigators explained it. Some participants might have thought that the pool of tasks that the reward would be chosen from included the ones that they didn't complete or tasks that ended up with a zero reward. If participants' understanding of the rules of the game they were playing differed, then that makes the experiment fairly worthless.
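    Whether that misunderstanding matters can be put in rough numbers with a toy expected-payout calculation (made-up values, and assuming two rewards are drawn at random from whatever pool applies). Under the "only completed trials are in the pool" reading, skipping low-value trials helps you; under the "every trial is in the pool, with uncompleted ones worth zero" reading, it hurts you.

    Code:
    def expected_payout(pool_values, n_draws=2):
        # Expected payout when n_draws rewards are drawn at random from pool_values.
        return n_draws * sum(pool_values) / len(pool_values) if pool_values else 0.0

    high_value_wins = [4.12, 3.85]   # the two trials a 'pacing' player completes
    skipped = [0.0] * 38             # everything else, worth nothing if uncompleted

    # Reading 1: pool = completed trials only -> skipping is rewarded.
    print(expected_payout(high_value_wins))             # ~ $7.97
    # Reading 2: pool = all trials, uncompleted count as zero -> skipping is punished.
    print(expected_payout(high_value_wins + skipped))   # ~ $0.40

    So if participants genuinely read the rules differently, they were effectively playing two different games, which is another reason the between-group comparison is hard to interpret.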

    I'm also interested in what the participants were thinking as they did the task. Were they motivated to get the highest total payout? At what time of the day was the experiment done? Did participants find the tasks hard from the beginning; did they feel fatigued as time went on? As others have said, if they struggled to complete the tasks, that would invalidate the experiment.

    I guess participants' retrospective accounts of what happened might be inaccurate, but I still think it would be interesting to hear them.
     
    Last edited: Mar 2, 2024
  10. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    907
    Location:
    UK
    Brilliant analysis and data presentation from @Murph.

    Perhaps just as important, these patients keep trying despite many near-miss failures - that surely suggests they are trying VERY hard (because they nearly succeed and seem desperate to do so), which is the opposite of what the paper suggests. By contrast, HVs have relatively few failures, suggesting they don't need to work so hard.

    Unintentionally, the EEfRT - inappropriately used (and interpreted) to show low effort preference in pwME - appears to show they are a bunch of triers. Hardly surprising, given their level of disability and the ordeal of intense testing they signed up to (shout out to @Robert 1973).
     
    Last edited: Mar 3, 2024
  11. Robert 1973

    Robert 1973 Senior Member (Voting Rights)

    Messages:
    1,317
    Location:
    UK
    My experience is that qualifications make little or no difference in determining whether letters are published in journals. I don't have a degree and have had letters published in Nature, The Lancet, etc. @Tom Kindlon has had numerous letters published and has no degree.

    The problem with letters is the word limit. If you are collaborating, I would suggest coordinating so that different people write different letters making different points, or submitting a paper instead of a letter so that you can use more words. The latter might be harder to get published, but I'm not knowledgeable about that.

    Apologies if this has already been said, I've not read all the posts.
     
    Last edited: Mar 2, 2024
  12. Evergreen

    Evergreen Senior Member (Voting Rights)

    Messages:
    312
    I love this kind of discussion, where people are thinking and sharing and honing and finally figuring it out. So here's a summary of some of the key observations that moved it forward:

    Bobbler & Simon M start spotting the real problem:
    Sam Carter spots what healthy volunteer F is doing:
    Andrewkq gets to the heart of the matter:
    Murph exposes healthy volunteer F's gaming:
    Simon sums it up:
    I like to think I contributed a teensy bit by hectoring people to look at the data and fangirling about @andrewkq 's observation!

    I know we're all kind of lying groaning on the battlefield now, but it also feels like this hard task was worth it.:thumbup:

    Edited to add a narrative.
     
    Last edited: Mar 2, 2024
  13. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    12,602
    Location:
    Canada
    Seems worth a try to me. I am quite sure he would see it as a misuse of his test, and if he cares about its validity, he should be motivated to at least say so.
     
  14. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    12,602
    Location:
    Canada
    A reminder, again, that in the original design validation the completion rates were 98% and 96%. Failing the easy task here clearly shows a strategic choice based around the way the 'game' was designed, which invalidates the test in yet another way.

    And yeah, basically they threw away HV F's results because he played the game as they designed it. I guess the players were not supposed to catch on to it, but they clearly did. There are so many reasons why this test should have been thrown out; it makes the researchers look like a bunch of fools at best.
     
    Peter Trewhitt, Sean, Hutan and 8 others like this.
  15. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    907
    Location:
    UK
    I think it is a great example of the power of the crowd, even when it is a crowd as ill as this one.

    Yes - and the graphs!

    Thank you for the great narrative.
     
  16. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    36
    @bobbler @Murph @Karen Kirke I love all of your visualizations - they've been really helpful for trying to see all of the pieces at play. It's so hard to visualize these at the trial-by-trial level, but y'all nailed it.

    100%
    I second this @Evergreen :party:
     
  17. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    36
    And as our reward... drum roll please...

    A lackluster response from Nath!

    Screenshot 2024-03-02 at 12.10.22 PM.png

    Sure sounds like @Sam Carter @Murph and @EndME hit the nail on the head. Presumably the participant was attempting a unique strategy, but it seems very subjective to consider this "not following instructions" if the instructions were to try to win as much money as possible without wasting energy. At least we know for sure that it wasn't a mechanical failure.

    I'm going to ask him to clarify how the participant was not following instructions and how this was determined (i.e., was it determined somehow at the time the task was administered, or post hoc by looking at their data).
     
  18. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    36
    That's great to know. It would especially be helpful to have some folks contribute who have experience with the PACE initiatives and can contextualize why this may seem like a minor detail but has major implications for the community. Brian seems like the perfect co-author given his psych expertise.

    I looked into Nature Communications' letter-to-the-editor policies and they call them "Matters Arising" articles. They have a 1200-word limit, and they say that "If the submission serves only to identify an important error or mistake in the published paper, it will usually lead to the publication of a clarification statement (correction or retraction, for example)." The main methodological critiques of the EEfRT could certainly take up 1200 words, so I think it makes sense to write just one letter on this that can then be referenced by others looking to critique the full paper. I'm thinking I'll write up a draft presenting just the methodological critiques, then circle back to see if co-authors want to sign on and help write the final product. Does that sound good?

    That's good to know that it's been a pain point in the past. I think that, combined with the short word limit, makes it pretty clear to focus just on the methodological critiques, as you said.

    That's reassuring! It especially might not matter as much in this situation, because it sounds like they likely won't even publish the letter if it only points out methodological flaws.

    Yes, definitely - I think this can fill the 1200-word limit. I'll keep an eye out for messages about coordinating different letters or submitting a paper. If anyone knows of someone already starting to coordinate this, please let me know!
     
  19. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,690
    Location:
    London, UK
    yep
     
  20. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    I REALLY like this :). It is a neat way of showing, and being able to see, the disability effect on the hard tasks, alongside how e.g. ME-CFS B used some pacing on easy to try and help with that. It also shows the 'noise' of having different strategies operating alongside this within the same task.

    And how, I think, the disability-related effects are basically 'drowning out' anything the tool might be trying to operationalise: the 'peaks' of non-completion from the most disabled participants not having a calibrated 'hard task' would overwhelm any analysis of the (necessarily) subtle differences in the other test variables.
     
