Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Discussion in 'ME/CFS research' started by Andy, Feb 21, 2024.

  1. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    13,662
    Location:
    Canada
    I think we have a basis to demand that. This entire test and any discussion of it needs to be removed from the paper. It will still be a largely useless study, but at least it will not cause more harm.

    I don't have the mental/energy bandwidth to do this, but it should be rather short as the creators of the test made it explicitly clear that it's about reward and should not be affected by performance.

    Damn we are so close to being able to rely on AIs to do this. It will make things so much easier for us.
     
    Midnattsol, cfsandmore, Lilas and 7 others like this.
  2. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    73
    Does this help to visualise the fundamental problem with the effort task's validity when used to compare pwME and healthies?

    It helps me see that while, yes, the blue lines are a little lower for the patients than for the healthies, the real issue is the difference between the red lines, i.e. between patients' and healthies' ability to successfully complete hard tasks when they try to.

    Patient H does not have a red line because despite valiantly attempting 18 hard tasks, they did not complete any successfully.

    Would a different chart type/visualisation show this better?

    [Attached chart: blue and red lines for each patient and healthy volunteer; the red lines show each participant's rate of successfully completing attempted hard tasks]
     
  3. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,204
    Great illustration! Thanks a lot!

    This shows the high variance among pwME in their ability to complete hard tasks. It seems patient H never got a fair chance at playing the game at all. What happens to the conclusion if we decide to exclude those patients?

    If it's possible, and for whoever has the time and energy, the following seems sensible to me:

    Something that could also be done with this chart, or better, with an additional chart, is bucketing, since certain buckets will help in interpreting this data. One would need buckets for the magnitude of the reward and also for the probability attached to the task (perhaps plot expected value first to make things a bit easier). For reward magnitude, other studies have typically used three buckets, low, medium and high (in this study, where the maximal reward was $4.00, they used: high >$3, medium $2.01–$3.00, and low <$2).
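
    A minimal sketch of the bucketing I mean, in pandas (the file name and the columns `reward`, `probability`, `choice`, `completed` and `group` are my assumptions, not necessarily what the released data uses):

    ```python
    import pandas as pd

    # Hypothetical trial-level data; the real column names may differ.
    trials = pd.read_csv("eefrt_trials.csv")

    # Expected value of a trial = reward magnitude x win probability.
    trials["expected_value"] = trials["reward"] * trials["probability"]

    # Reward-magnitude buckets roughly as in the study (maximal reward $4.00).
    trials["reward_bucket"] = pd.cut(
        trials["reward"],
        bins=[0.0, 2.00, 3.00, 4.00],
        labels=["low", "medium", "high"],
    )

    # Hard-task completion rate per group and reward bucket.
    hard = trials[trials["choice"] == "hard"]
    print(hard.groupby(["group", "reward_bucket"], observed=True)["completed"].mean())
    ```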

    Otherwise the above data might still just mean that pwME are less likely to finish hard tasks when the chance of a high reward is minimal. (Ideally one would also like to know whether they had already successfully completed two tasks of a higher value and been given the reward, or, even better, whether their current average reward was below the reward of the trial, but that would further complicate the analysis, and at the end of the day players are not supposed to think but to play intuitively.) That could also just mean something along the lines of "they are playing the game better", or that they need more motivation to complete something; who knows. Funnily enough, the reverse argument has also been used in EEfRT schizophrenia studies, which conclude that pwSCZ display non-ideal behaviour because their strategies are bad: "Thus, individuals with schizophrenia displayed inefficient effort allocation for trials in which it would be most advantageous to put forth more effort, as well as trials when it would appear strategic to conserve effort."

    To not make things too complicated, I would just start off by graphing the above data for the high-probability + high-reward trials. If that data looks similar to the data above, then it is very clear to me that they simply cannot exert themselves at all.

    Another thing that could be done is to repeat the above graph but split into two (or more) parts: one for the first half of the total number of rounds played and one for the second half of the game. Does this show fatiguability in the pwME but not in the HVs?
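
    A rough sketch of that split, under the same assumed file and column names as above (and assuming practice trials are numbered zero or below):

    ```python
    import pandas as pd

    trials = pd.read_csv("eefrt_trials.csv")  # same assumed file/columns as above
    live = trials[trials["trial_number"] > 0].copy()  # drop assumed practice trials

    # Split each participant's game into first and second half by trial number.
    mid = live.groupby("participant")["trial_number"].transform("median")
    live["half"] = (live["trial_number"] <= mid).map({True: "first", False: "second"})

    # Does hard-task completion drop in the second half for pwME but not HVs?
    hard = live[live["choice"] == "hard"]
    print(hard.groupby(["group", "half"])["completed"].mean())
    ```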

    All of these suggestions might be obsolete if this was already sufficiently analysed in the paper. I still haven't gotten to looking at the data from the intramural study.
     
    Last edited: Mar 1, 2024
  4. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    995
    Location:
    UK
    Thank you for this analysis.

    I've always had a thing for making graphs easy to understand and would like to make a couple of suggestions (without considering changing chart type):

    1. The paper consistently uses red for pwME and blue for HVs, and I think we should stick with that for % hard choices. The completion rate might be another colour (e.g. pale pink, pale blue) or, say, black for both.
    2. Rather than rank alphabetically, place pwme and hv in order ranked by % hard choices.

    Possibly add a mean/median line for each group, or show this data as text. That would help show both between- and within-group differences (see the sketch below).

    Added:
    Or a scatter plot?
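
    Roughly what I have in mind, as a matplotlib sketch (the summary file and the column names are placeholders, not the real data):

    ```python
    import pandas as pd
    import matplotlib.pyplot as plt

    # Assumed per-participant summary: group ("pwME" or "HV"), pct_hard_choices,
    # hard_completion_rate. File and column names are placeholders.
    df = pd.read_csv("eefrt_summary.csv")
    colors = {"pwME": "red", "HV": "blue"}  # the paper's colour convention

    for group, grp in df.groupby("group"):
        plt.scatter(grp["pct_hard_choices"], grp["hard_completion_rate"],
                    color=colors[group], label=group)
        # Dashed vertical line at each group's median % hard choices.
        plt.axvline(grp["pct_hard_choices"].median(), color=colors[group],
                    linestyle="--", linewidth=1)

    plt.xlabel("% hard choices")
    plt.ylabel("Completion rate on hard tasks")
    plt.legend()
    plt.show()
    ```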
     
    Last edited: Mar 1, 2024
  5. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    73
    I've sent you a message! I wanted to do number 2 but my brain imploded at the thought of patients and healthies being mixed and at the thought of how to separate them. You clearly have skillz. So maybe we can collaborate?
     
  6. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    73
    Glad it made some sense. My brain can't follow this after chart-exertion, but other brains will and I trust they will reply and do what needs doing!
     
  7. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    73
    Ooh I managed to do this bit.
     
  8. Murph

    Murph Senior Member (Voting Rights)

    Messages:
    147
    I've been looking at this data for a few days now and thought I'd make an account here to post some of the things I've found.
    First, each participant's choices on a chart. I placed a dot high to show a hard choice ('Hard' on the y-axis) and low to show an easy choice ('Easy' on the left axis). The top-left chart is healthy volunteer H, depicted in blue. They chose easy on the first practice round, then chose hard twice, then easy on round 4, the last practice round, etc.

    The charts are arranged as per Simon's suggestion above, from most hard choices to fewest. I'd like to draw your attention to Healthy volunteer F at the bottom there. Theirs is the data that got chucked out. My next post is about that!

    [Attached charts: each participant's hard/easy choices per round, one panel per participant, ordered from most to fewest hard choices]
     
  9. Murph

    Murph Senior Member (Voting Rights)

    Messages:
    147
    Healthy control F matters a lot. They chucked his data, but what his data shows is that the EEfRT is a joke. To understand why, I'm going to ask you to imagine a lottery...

    1. ... you will win two prizes drawn from a barrel. This is a pretty great lottery, because you choose the prizes that go in the barrel. I give you a choice: I have 50 prizes we can put in the barrel, some worth $1, some worth $2, some worth $3, some worth $4. You may put in as few or as many as you like. Would you put in 50 of many different values? Or simply put in two prizes, both worth $4?

    2. Remember that in the effort preference test you get paid for two of your wins. If you have only a few wins, those will be the ones NIH pay out on. Like putting just two prizes in the barrel. If you could complete two wins worth $4.12 and that's all, the NIH would pay you $8.24.

    3. And that's what healthy control F tried to do. He lost on purpose when the prize was low. To win on easy you had to press 30 times; he would choose easy and stop on exactly 29. He was playing an optimal strategy to maximise payout. In other words, he was trying his hardest to win the game. That confused the researchers. They chucked the data out.

    This next chart shows each round and how often he pushed the button. When he wins the round, I outlined it in green.

    [Attached chart: healthy volunteer F's button presses per round, with winning rounds outlined in green]


    Healthy volunteer F is a 21-year-old male. The four practice rounds show he can easily do the button-pressing. In what follows, he chooses not to.

    This next chart is the same as the above, but with a bit more information. It shows that he played to win only the rounds where the prize was high. After he got a feel for the range of prizes on offer, he chose hard and completed the task only when the prize was over $3.50. If the prize was low he didn't try to win (except round 23, where even if you did win there was only a 12% probability of the prize being awarded and added to the metaphorical prize basket).

    [Attached chart: the same rounds with prize values added, showing F completed hard tasks only when the prize was high]

    4. So it turns out the test was solvable. Most people just tried to push buttons as much as they could, but this guy understood it, and it meant he mostly chose easy. That confounded the primary endpoint (how often you choose hard). The metric was supposedly validated in depressed people; looks like they didn't battle-test it enough!

    5. No other participant took it to the same extreme. But there are signs others flirted with a similar strategy, choosing easy and not trying to win in certain rounds. Throwing out the data of only one participant is suss. Smarter would be to drop the whole metric. Certainly drawing major conclusions based on such a fundamentally flawed game is dumb.

    tl;dr: despite what they think, the EEfRT can be played strategically, rendering it void as a measure of anything.
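
    To put rough numbers on the barrel intuition, a minimal simulation (the prize values are hypothetical, and it assumes the two paid trials are drawn at random from your wins, as described in point 2):

    ```python
    import random

    def expected_payout(win_values, draws=2, n_sim=100_000):
        """Estimate the expected payout when `draws` wins are drawn at
        random from a participant's set of winning trials."""
        if len(win_values) <= draws:
            return float(sum(win_values))
        return sum(sum(random.sample(win_values, draws))
                   for _ in range(n_sim)) / n_sim

    # Hypothetical outcomes: win everything vs. win only the big prizes.
    win_everything = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0] * 5  # 35 mixed-value wins
    win_selectively = [4.0, 4.0]                              # two $4 wins only

    print(expected_payout(win_everything))   # ~= $5.00 (twice the mean prize)
    print(expected_payout(win_selectively))  # exactly $8.00
    ```

    Grinding out wins on every round dilutes the barrel; two high-value wins beat thirty-five mixed ones.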
     
  10. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,175
    Location:
    London, UK
    Is the punch line that HV F looks just like ME B?
    Cross posted. It seems not quite but maybe ME B was on to it?
     
    Last edited: Mar 1, 2024
  11. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    73
    My second attempt, which addresses @Simon M's second point, but not his first, because his first is beyond me. Hopefully one or other of us will be able to make it better at some point.

    [Attached chart: participants ordered by % hard choices, per Simon M's second suggestion]
     
  12. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    73
    I thought patient B might have been purposely giving himself breaks: he had four tasks in the second half of his 53 trials where he pressed the button only a few times, interspersed with successfully completed tasks - no-one else did this (unless Healthy F did). I thought he was giving his hands a rest because he needed to. But I did not look at the probabilities and rewards, so @Murph's thing may hold for patient B too.

    Edited to correct.
     
    Last edited: Mar 1, 2024
  13. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,175
    Location:
    London, UK
    Very possibly. But had he realised what F had as well?
     
  14. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    73
    Sorry I was editing my post as realisation dawned that Murph's point could apply. I'll leave that to others to check.
     
  15. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    73
    Correction:
    My original post stated that 7/17 (41%) of patients had a lower success rate for hard tasks than all healthy volunteers. This should have been 7/15, making the correct percentage 47%.

    No wonder there were so many zeros on that p-value.
    Surely the major finding of this task should have been that patients couldn't do the hard task due to their condition and that, as such, it had to be removed from the analysis.
     
    Last edited: Mar 4, 2024
  16. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    41
    Sorry I haven't been very active the past two days, all this work has me crashing pretty hard.

    Yes, I think I'd like to write a letter to the editor arguing that the task was misused and that the results were misinterpreted, based largely on the 65% completion rate finding. I worked in an affective neuroscience lab for 3 years after undergrad, running participants on tasks similar to the EEfRT, and I've been a co-author on a few papers in the same general area, so I feel like I could write it. But I only have a bachelor's degree (thanks ME), so I think I'd need some PhDs to join as co-authors in order to have any hope of a letter to the editor getting published. I was thinking I'd reach out to Treadway, present the concerns to him, and ask if he'd be willing to be a co-author. I figure the worst that could happen is he says no. I've never done this before, so I'm definitely open to thoughts people have, especially around whether this is enough to warrant an explicit retraction request and how that is usually done.
     
  17. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,175
    Location:
    London, UK
    Brian Hughes might be interested in co-authoring. I am happy to, but not an expert in the area. There are one or two other senior members who might be ready to join in, although not chipping in just at present. I guess Treadway might or might not want to get involved but might join a letter expressing methodological concern. There may well be scope for a more extended response to the study which those outside the immediate field would probably not want to join, but that is probably a different project.
     
  18. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,204
    I'll definitely be joining in! Before that I'll still plot some different data though, for example completion rates on hard tasks with low expected values, as well as completion rates on different hard tasks as time progresses (otherwise Walitt et al might argue something like pwME weren't having problems with completing hard tasks but were having problems motivating themselves to complete hard tasks with low expected values, so I'd like to make sure of all the necessary details first).

    I've also thought about reaching out to Treadway or alternatively Ohmann (from what I've gathered he might be a bit more interested in critically analysing the EEfRT) and think that could be a solid idea. I'll get back to you once I've analysed all the data I still want to analyse, or just by Monday.

    I also had to end my PhD studies due to ME, so I am an MSc, but I'm fairly certain that that will be no problem, especially as other members on here have sufficient credentials to join us on these endeavours, wherever they may lead. But I also don't think time is running away too quickly. I think we should first wait for a response from Walitt to your email and then also ask them if or when they are planning to publish a separate paper on "EEfRT in ME", since many companion papers were originally planned and I'd be a bit surprised if this isn't planned, since it's one of their main results. In that case one may even also write one's own paper with "our analysis" if one wishes to.

    In either case, if things keep adding up like they have been, an extended response to the study seems feasible. I wouldn't be surprised if other users find more things that look like irregularities whilst fishing through the data.

    But I also need to take a break for now and I'll be back on Monday.
     
  19. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,734
    OK, I'm trying to go through and look at whether, e.g., two non-completions put participants off picking hard (or something similar). I'm not doing anything high-brow: just filters and hiding columns initially, and then simple conditional formatting.

    The first thing that struck me when I filtered by completion and hard was how few HVs had failed to complete hard tasks. Or to be more precise, how few hard tasks HVs had failed to complete.

    Of the hard tasks chosen, one HV failed to complete 6 'live' and 2 further prep tests (the negatively numbered trials on the Excel sheet), and all the rest of the HVs together only failed to complete 5 'live' and 3 'prep' ones. Compare that with over 100 non-completions from the ME/CFS group. It's pretty striking when you just chuck those filters on.
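
    For anyone who'd rather replicate those filters in code than in Excel, a minimal pandas equivalent (the file and column names are my assumptions about the released data):

    ```python
    import pandas as pd

    trials = pd.read_csv("eefrt_trials.csv")  # assumed file and column names
    hard = trials[trials["choice"] == "hard"].copy()

    # Separate live trials from prep, assuming prep trials are the
    # negatively numbered rows on the sheet.
    hard["phase"] = hard["trial_number"].gt(0).map({True: "live", False: "prep"})

    # Count failed hard tasks per group and phase (assumes 'completed'
    # is, or can be coerced to, a boolean column).
    fails = hard[~hard["completed"].astype(bool)]
    print(fails.groupby(["group", "phase"]).size())
    ```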

    I then tried highlighting the last hard tasks each participant failed to complete, then removing that filter to leave just 'hard selected', to see if there was any pattern of participants obviously stopping choosing hard after that. So far that doesn't look like the case. In fact, if you use conditional formatting on the two columns 'completed task yes/no' and 'reward yes/no' (I added the latter because, since this is testing motivation, 'look what you could've won' vs 'was a no-win one anyway' seemed worth keeping an eye on), the difference between participants becomes quite striking.

    HV A failed to complete 6 rounds early on (completed one, failed one, completed one, failed 5 hard ones, then completed the rest of the hard ones they selected), then got it together and is 'all green' (I conditionally formatted completed tasks green and failed ones red) pretty much from there. They did 12 more hard ones after they got it together, and seemed to be basing their choices on reward magnitude, since they selected enough low-probability trials that probability didn't seem to be a discriminator for them.

    Across the rest of the HVs you've then just got HV B failing 2 non-prep hard ones, and E, N and O each failing to complete just one out of all the hard tasks they selected.



    Then the variation within the ME-CFS group falls into pretty stark 'groupings'. It's not what I expected, though; it's not people just giving up when they fail x hard tasks in a row.

    The following 6 ME-CFS participants don't seem to have a 'completion issue' / are relatively consistent with what you might see in some of the HVs: ME-CFS C, E, F (who seems to be sensibly picking high-probability or 50:50, high-value trials for hard), J (similar, but even more restricted to high probability and the odd 50:50), K (same, high probability and higher-value 50:50s) and M (same, high probability and high-value 50:50s).

    ME-CFS N failed twice (having also failed to complete once in the warm-up), not in a row (there was a completed hard task in between), but they then seemed to pick only high-probability trials as 'hard' and completed them, choosing 15 hard in total - by comparison, HV N chose only 13 hard, though HV M chose 20.


    Then there is a group of 4 ME-CFS with 'a lot of red' from non-completion, yet clearly continuing to choose hard after that. Excluding the 'prep' trials: ME-CFS A failed to complete all but 2 of the 15 they chose hard for (making similar strategic choices, and in a few of those falling just a few clicks short); B chose hard only 9 times and completed only 3 of those, falling short by just one click on a few, so clearly a capability issue; D chooses hard loads of times despite failing nearly every time by significant amounts (clicks in the 70s and 80s) and is clearly determined to 'get there', finally managing two completions and two near-misses right at the end of their hard ones; and H selects hard loads of times based on probability and value, but fails to complete, with clicks in the 80s.

    Then there are the 'in-betweeners', who to my eye are clearly affected by capability issues in the task in some way, just not to the extent of the group above.

    L was 'OK' and selected hard a good bit early on, but failed 3 in the middle, 2 of them by quite a way (83, 85 clicks), by trial 27; after that they picked hard only 3 more times (all completed, and all high-probability, high-value).

    O looks like 'fatigue/fatiguability' too: after failing in the warm-up, they did 5 successful hard completions early on, one fail (96 clicks), two completions, one fail (97), one completion, three fails (96, 96, 97), and then selected hard only 3 more times. It's not quite obvious/direct enough whether the fails were 'sequential', but their first fail came from doing hard on trial 9 (successful) and then 10 (just missed); on 13 and 15 they selected hard and completed, on 17 they selected hard again and failed (97 clicks, though), completed 19, then failed 20, another hard straight after (97); on 23 and 24 they selected hard and failed both (97, 96); their next hard was trial 29, which they completed, then they failed 32 (97) and completed 37.

    So it's easy for me to look at that and relate, and think this person was 'borderline', i.e. when they missed, their clicks were just a few off, vs the group above, who were often 10-20+ clicks away. Given how close the misses were, my gut is that fatiguability is playing a part - but provability-wise the stats wouldn't be there, etc.

    G fails to complete hard ones 6 times in a row early on (by just a few clicks), then selects hard 7 more times, a bit more spaced out, and completes 5 of those 7.

    I shows a similar pattern: failing 4 hard ones early on (90-95 clicks), completing one, failing one (97), completing 2, then failing 1 (96), and then completing 4 more hards. It's interesting to note for 'I' that those two latter fails came when they had selected 2 trials in a row as 'hard' (26 they complete, 27 they fail; 36 they complete, 37 they fail).


    It makes it quite hard to come up with an analytical strategy that would be a neat 'calculation' - unless you can think of something genius? But I think it is worth analysing at the descriptive level and noting that the 'within group' variation is significant. Some of the inferences may also not be there, because the 'fails to complete' could be coming from one sub-group (who seemed desperate to give it a go and eventually grab the odd win - 'I will manage 98' - which sounds like me on some days in the past with my illness), while another group are doing strategic things to manage their perhaps less extreme functional issue by 'picking hard less often' and so on, for the purpose, I suspect, of getting the best out of the body they are working with (which also sounds like me on better days, when I have had a little more in the capability-tank than blind effort and so had to use it wisely).
     
    Last edited: Mar 2, 2024
    Peter Trewhitt, Keela Too and EndME like this.
  20. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    3,734
    Tbf I'm wondering if there are ways of getting some neat visual angles out of this. Given the complexity of the context - explaining to someone 'the tool' and/or 'the condition' - the whole thing carries a heck of a communication load, which would be eased by having the odd thing to point at.

    So people who are good at graphs and visuals, and at linking those to circled-out bits of 'the game' or 'the condition' that they relate to, would perhaps also help. It's a big cognitive load to layer on descriptions of the twists of the game and the illness and so on, and then try to point out 'and so this stat'. So communication angles might be useful to think about too.

    There might be other bits where 'common sense doesn't add up' findings just jump out too. But, e.g., when I look at the above, even if we were to use it, I'm trying to get my head around how you could 'display it' - would it be taking each participant as a 'column' and then showing their conditionally formatted completion for all their hards (though then you couldn't use trial number, because not everyone chose the same trials)? A rough sketch of one option is below.
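
    One way it might look, as a rough matplotlib sketch (assumed file and column names again): each participant is a column, their hard tasks are stacked in the order chosen rather than by trial number, and cells are coloured green/red like the conditional formatting.

    ```python
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    trials = pd.read_csv("eefrt_trials.csv")  # assumed file and column names
    hard = trials[trials["choice"] == "hard"].sort_values("trial_number")

    participants = list(hard["participant"].unique())
    max_hards = int(hard.groupby("participant").size().max())
    grid = np.full((max_hards, len(participants)), np.nan)  # NaN = no nth hard task

    for col, p in enumerate(participants):
        completed = hard.loc[hard["participant"] == p, "completed"].to_numpy(float)
        grid[: len(completed), col] = completed  # 1 = completed, 0 = failed

    plt.imshow(grid, cmap="RdYlGn", aspect="auto", vmin=0, vmax=1)
    plt.xticks(range(len(participants)), participants, rotation=90)
    plt.ylabel("nth hard task chosen")
    plt.show()
    ```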
     
    Peter Trewhitt and Keela Too like this.
