Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Discussion in 'ME/CFS research' started by Andy, Feb 21, 2024.

  1. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    12,919
    Location:
    Canada
    I think we have a basis to demand that. This entire test and any discussion of it needs to be removed from the paper. It will still be a largely useless study, but at least it will not cause more harm.

    I don't have the mental/energy bandwidth to do this, but it should be rather short as the creators of the test made it explicitly clear that it's about reward and should not be affected by performance.

    Damn we are so close to being able to rely on AIs to do this. It will make things so much easier for us.
     
  2. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    57
    Does this help to visualise the fundamental problem with the effort task's validity when used to compare pwME and healthies?

    It helps me see that while, yes, the blue lines are a little lower for the patients than for the healthies, the real issue is the difference between the red lines, i.e. between patients' and healthies' ability to successfully complete hard tasks when they try to.

    Patient H does not have a red line because despite valiantly attempting 18 hard tasks, they did not complete any successfully.

    Would a different chart type/visualisation show this better?

    [Attached chart: upload_2024-3-1_15-52-42.png]
     
  3. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,010
    Great illustration! Thanks a lot!

    This shows the high variance among pwME in their abilities to complete hard tasks. It seems patient H never got a fair chance at playing the game at all. What happens to the conclusion if we decide to exclude those patients?

    If it's possible, and for whoever has the time and energy, the following seems sensible to me:

    Something that could also be done with this chart, or better still as an additional chart, is bucketing, which will help with interpreting this data. One would need buckets for the magnitude of the reward and for the probability attached to the task (perhaps plot expected value first to make things a bit easier). For reward magnitude, other studies have typically used three buckets: low, medium and high (in this study, where the maximal reward was $4.00, they used high >$3.00, medium $2.01–$3.00 and low <$2.00).
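
    For anyone with the energy, here's a minimal pandas sketch of that bucketing. The file name and column names (group, hard_choice, completed, reward, probability) are my guesses at how the released trial-level data might be laid out, not the sheet's actual headers; 12/50/88% are the standard EEfRT probability levels.

        import pandas as pd

        df = pd.read_csv("eefrt_trials.csv")   # hypothetical export of the trial-level data

        hard = df[df["hard_choice"] == 1].copy()             # hard-task trials only
        hard["ev"] = hard["reward"] * hard["probability"]    # expected value per trial
        hard["reward_bucket"] = pd.cut(hard["reward"],
                                       bins=[0, 2.00, 3.00, float("inf")],
                                       labels=["low", "medium", "high"])
        hard["prob_bucket"] = pd.cut(hard["probability"],
                                     bins=[0, 0.2, 0.7, 1.0],
                                     labels=["12%", "50%", "88%"])

        # hard-task completion rate per group within each bucket combination
        rates = (hard.groupby(["group", "reward_bucket", "prob_bucket"], observed=True)
                     ["completed"].mean().unstack("group"))
        print(rates)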

    Otherwise the above data might still just mean that pwME are less likely to finish hard tasks when the chance of a high reward is minimal. (Ideally one would also like to know whether they had already successfully completed two tasks of a higher value and been given the reward, or better still whether their current average reward was below the reward of the trial, but that would further complicate the analysis, and at the end of the day players are not supposed to think but play intuitively.) That could also just mean something along the lines of "they are playing the game better", or that they need more motivation to complete something; who knows. Funnily enough, the reverse argument has also been used in EEfRT schizophrenia studies, which conclude that pwSCZ display non-ideal behaviour because their strategies are bad: "Thus, individuals with schizophrenia displayed inefficient effort allocation for trials in which it would be most advantageous to put forth more effort, as well as trials when it would appear strategic to conserve effort."

    To not make things too complicated, I would just start off by graphing the above data for the high-probability + high-reward trials. If that data looks similar to the data above, then it is very clear to me that they simply cannot exert themselves at all.

    Another thing that could be done is to repeat the above graph but split in two (or more): once for the first half of the total number of rounds played, and again for the second half of the game. Does this show fatiguability in the pwME but not in the HVs?
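
    A rough sketch of that split, under the same assumed file and column names as above, plus a trial_number column giving the order of each participant's rounds:

        import pandas as pd

        df = pd.read_csv("eefrt_trials.csv")
        # split each participant's rounds at their median trial number
        df["half"] = (df.groupby("participant")["trial_number"]
                        .transform(lambda t: t > t.median())
                        .map({False: "first half", True: "second half"}))

        hard = df[df["hard_choice"] == 1]
        print(hard.groupby(["group", "half"])["completed"].mean().unstack())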

    All of these suggestions might be obsolete if this was already sufficiently analysed in the paper. I still haven't gotten to looking at the data from the intramural study.
     
    Last edited: Mar 1, 2024
  4. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    925
    Location:
    UK
    Thank you for this analysis.

    I've always had a thing for making graphs easy to understand and would like to make a couple of suggestions (without considering changing chart type):

    1. The paper consistently uses red for pwME and blue for HV, and I think we should stick with that for % hard choices. The completion rate might be another colour (e.g. pale pink, pale blue) or, say, black for both.
    2. Rather than ranking alphabetically, place pwME and HV in order ranked by % hard choices.

    Possibly add a mean/median line for each group, or show this data as text. That would help show both between- and within-group differences.

    Added:
    Or a scatter plot?
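
    If it helps, here's a hedged matplotlib sketch of the scatter-plot idea, keeping the paper's red/blue convention. The file and column names are the same guesses as earlier in the thread, and the group labels ("pwME", "HV") are assumptions:

        import pandas as pd
        import matplotlib.pyplot as plt

        df = pd.read_csv("eefrt_trials.csv")
        pct_hard = df.groupby(["participant", "group"])["hard_choice"].mean()
        completion = (df[df["hard_choice"] == 1]
                        .groupby(["participant", "group"])["completed"].mean())
        per = pd.concat({"pct_hard": pct_hard, "completion": completion},
                        axis=1).reset_index()

        colours = {"pwME": "red", "HV": "blue"}       # assumed group labels
        for grp, sub in per.groupby("group"):
            plt.scatter(sub["pct_hard"], sub["completion"],
                        color=colours.get(grp, "grey"), label=grp)
        plt.xlabel("Proportion of hard choices")
        plt.ylabel("Hard-task completion rate")
        plt.legend()
        plt.show()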
     
    Last edited: Mar 1, 2024
  5. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    57
    I've sent you a message! I wanted to do number 2 but my brain imploded at the thought of patients and healthies being mixed and at the thought of how to separate them. You clearly have skillz. So maybe we can collaborate?
     
  6. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    57
    Glad it made some sense. My brain can't follow this after chart-exertion, but other brains will and I trust they will reply and do what needs doing!
     
  7. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    57
    Ooh I managed to do this bit.
     
  8. Murph

    Murph Established Member (Voting Rights)

    Messages:
    62
    I've been looking at this data for a few days now and thought I'd make an account here to post some of the things I've found.
    First, each participant's choices on a chart. I placed a dot high on the y-axis to show a hard choice, and low to show an easy choice. The top left chart is healthy volunteer H, depicted in blue. They chose easy on the first practice round, then chose hard twice, then easy on round 4, the last practice round, etc.

    The charts are arranged as per Simon's suggestion above, from most hard choices to fewest. I'd like to draw your attention to Healthy volunteer F at the bottom there. Theirs is the data that got chucked out. My next post is about that!

    [Attached chart: allchoices.jpeg]
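
    For anyone who wants to reproduce these, a rough sketch of one panel. The file and column names, and the "HV H" label, are my guesses at how the spreadsheet is laid out, not its actual headers:

        import pandas as pd
        import matplotlib.pyplot as plt

        df = pd.read_csv("eefrt_trials.csv")
        one = df[df["participant"] == "HV H"].sort_values("trial_number")
        plt.scatter(range(1, len(one) + 1), one["hard_choice"], color="blue")
        plt.yticks([0, 1], ["Easy", "Hard"])
        plt.xlabel("Round")
        plt.title("Healthy volunteer H")
        plt.show()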
     
  9. Murph

    Murph Established Member (Voting Rights)

    Messages:
    62
    Healthy control F matters a lot. They chucked his data, but what his data shows is that the EEfRT is a joke. To understand why, I'm going to ask you to imagine a lottery...

    1. ... you will win two prizes drawn from a barrel. This is a pretty great lottery, because you choose the prizes that go into the barrel. I give you a choice: I have 50 prizes we can put in, some worth $1, some worth $2, some worth $3, some worth $4. You may put in as few or as many as you like. Would you put in 50 of many different values, or simply put in two prizes both worth $4?

    2. Remember that in the effort preference test you get paid for two of your wins. If you have only a few wins, those will be the ones the NIH pays out on. It's like putting just two prizes in the barrel. If you completed two wins worth $4.12 each and nothing else, the NIH would pay you $8.24.

    3. And that's what healthy control F tried to do. He lost on purpose when the prize was low. To win on easy you had to press 30 times; he would choose easy and stop on exactly 29. He was playing an optimal strategy to maximise payout. In other words, he was trying his hardest to win the game. That confused the researchers, so they chucked the data out.
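
    A quick simulation of why that strategy pays. You're paid the sum of two wins drawn at random from your pool of wins, so diluting the pool with low-value rounds lowers the expected payout. The prize values here are illustrative, not the study's actual reward schedule, and I'm ignoring the win probabilities and the effort cost:

        import random

        random.seed(1)
        # 50 illustrative prizes between $1 and $4
        prizes = [round(random.uniform(1.00, 4.00), 2) for _ in range(50)]

        def expected_payout(wins, n_sims=100_000):
            # average total of two wins drawn at random from your pool of wins
            if len(wins) < 2:
                return float(sum(wins))   # assumed handling of fewer than 2 wins
            return sum(sum(random.sample(wins, 2)) for _ in range(n_sims)) / n_sims

        win_everything = prizes                           # complete every round
        win_only_high = [p for p in prizes if p > 3.50]   # HV F's approach

        print(f"win everything: ${expected_payout(win_everything):.2f}")   # ~$5.00
        print(f"win only high : ${expected_payout(win_only_high):.2f}")    # ~$7.50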

    This next chart shows each round and how often he pushed the button. The rounds he won are outlined in green.

    [Attached chart: upload_2024-3-2_7-28-30.png]


    Healthy volunteer F is a 21-year-old male. The four trial rounds show he can easily do the button-pressing. In what follows, he chooses not to.

    This next chart is the same as the one above, but with a bit more information. It shows that he played to win only in the rounds where the prize was high. After he got a feel for the range of prizes on offer, he chose hard and completed the task only when the prize was over $3.50. If the prize was low, he didn't try to win (except in round 23, where even if you did win there was only a 12% probability of the prize being awarded and added to the metaphorical prize barrel).

    [Attached chart: upload_2024-3-2_7-33-50.png]

    4. So it turns out the test was solvable. Most people just tried to push buttons as much as they could, but this guy understood it, and that meant he mostly chose easy. That confounded the primary endpoint (how often you choose hard). The metric was supposedly validated in depressed people; looks like they didn't battle-test it enough!

    5. No other participant took it to the same extreme, but there are signs others flirted with a similar strategy, choosing easy and not trying to win in certain rounds. Throwing out the data of only one participant is suss. Smarter would be to drop the whole metric. Certainly, drawing major conclusions from such a fundamentally flawed game is dumb.

    tl;dr: despite what they think, the EEfRT can be played strategically, rendering it void as a measure of anything.
     
  10. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,967
    Location:
    London, UK
    Is the punch line that HV F looks just like ME B?
    Cross-posted. It seems not quite, but maybe ME B was on to it?
     
    Last edited: Mar 1, 2024
  11. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    57
    My second attempt, which addresses @Simon M 's 2nd point, but not his first, because his first is beyond me. Hopefully one or other of us will be able to make it better at some point.

    [Attached chart: upload_2024-3-1_21-14-41.png]
     
  12. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    57
    I thought patient B might have been purposely giving himself breaks, as he had four tasks in the second half of his 53 trials where he only pressed the button a few times. No one else did this (unless Healthy F did), and he did it four times, interspersed with successfully completed tasks. I thought he was giving his hands a rest because he needed to. But I did not look at the probabilities and rewards, so @Murph 's point may hold for patient B too.

    Edited to correct.
     
    Last edited: Mar 1, 2024
  13. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,967
    Location:
    London, UK
    Very possibly. But had he realised what F had as well?
     
  14. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    57
    Sorry I was editing my post as realisation dawned that Murph's point could apply. I'll leave that to others to check.
     
  15. Karen Kirke

    Karen Kirke Established Member (Voting Rights)

    Messages:
    57
    Correction:
    My original post stated that 7/17 (41%) of patients had a lower success rate for hard tasks than all healthy volunteers. This should have been 7/15, making the correct percentage 47%.

    No wonder there were so many zeros on that p-value.
    Surely the major finding from this task should have been that patients couldn't do the hard task due to their condition and that, as such, the task had to be removed from the analysis.
     
    Last edited: Mar 4, 2024
  16. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    37
    Sorry I haven't been very active the past two days; all this work has me crashing pretty hard.

    Yes, I think I'd like to write a letter to the editor arguing that the task was misused and that the results were misinterpreted, based largely on the 65% completion-rate finding. I worked in an affective neuroscience lab for three years after undergrad, running participants on tasks similar to the EEfRT, and I've been a co-author on a few papers in the same general area, so I feel like I could write it. But I only have a bachelor's degree (thanks, ME), so I think I'd need some PhDs to join as co-authors in order to have any hope of a letter to the editor getting published. I was thinking I'd reach out to Treadway, present the concerns to him, and ask if he'd be willing to be a co-author; the worst that could happen is he says no. I've never done this before, so I'm definitely open to thoughts people have, especially around whether this is enough to warrant an explicit retraction request and how that is usually done.
     
  17. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,967
    Location:
    London, UK
    Brian Hughes might be interested in co-authoring. I am happy to, but not an expert in the area. There are one or two other senior members who might be ready to join in, although not chipping in just at present. I guess Treadway might or might not want to get involved but might join a letter expressing methodological concern. There may well be scope for a more extended response to the study which those outside the immediate field would probably not want to join, but that is probably a different project.
     
  18. EndME

    EndME Senior Member (Voting Rights)

    Messages:
    1,010
    I'll definitely be joining in! Before that, I'll still plot some different data: for example, completion rates on hard tasks with low expected values, as well as completion rates on hard tasks as time progresses. Otherwise Walitt et al might argue something like pwME weren't having problems completing hard tasks per se, but were having problems motivating themselves to complete hard tasks with low expected values, so I'd like to make sure of all the necessary details first.

    I've also thought about reaching out to Treadway, or alternatively Ohmann (from what I've gathered, he might be a bit more interested in critically analysing the EEfRT), and think that could be a solid idea. I'll get back to you once I've analysed all the data I still want to analyse, or by Monday at the latest.

    I also had to end my PhD studies due to ME, so I am an MSc, but I'm fairly certain that will be no problem, especially as other members on here have sufficient credentials to join us on these endeavours, wherever they may lead. But I also don't think time is running away too quickly. I think we should first wait for a response from Walitt to your email, and then also ask them if or when they are planning to publish a separate paper on "EEfRT in ME", since many companion papers were originally planned and I'd be a bit surprised if this isn't one of them, given it's one of their main results. In that case one could even write one's own paper with "our analysis" if one wishes.

    In either case, if things keep adding up like they have been, an extended response to the study seems feasible. I wouldn't be surprised if other users find further apparent irregularities while fishing through the data.

    But I also need to take a break for now and I'll be back on Monday.
     
  19. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
    OK, I'm trying to go through and look at whether, e.g., two non-completions put participants off picking hard (or something similar). I'm not doing anything high-brow: just filters and hidden columns initially, and then simple conditional formatting.

    The first thing that struck me when I filtered by completion and hard was how few HVs had failed to complete hard tasks. Or to be more precise, how few hard tasks HVs had failed to complete.

    Of the hard tasks chosen, one HV failed to complete 6 'live' and 2 further prep tests (the negative-numbered trials on the Excel sheet), and all the rest of the HVs together only failed to complete 5 'live' and 3 'prep' ones. Compare that to over 100 non-completions from the ME-CFS group. It's pretty striking when you just chuck those filters on.
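
    For anyone who'd rather apply the same filters in pandas than Excel, a rough equivalent. The file and column names are guesses, but the negative trial numbers marking prep rounds are as on the sheet:

        import pandas as pd

        df = pd.read_csv("eefrt_trials.csv")
        # negative trial numbers on the sheet mark the prep/practice rounds
        df["phase"] = df["trial_number"].lt(0).map({True: "prep", False: "live"})

        fails = df[(df["hard_choice"] == 1) & (df["completed"] == 0)]
        print(fails.groupby(["group", "phase"]).size())          # totals per group
        print(fails.groupby(["group", "participant"]).size()     # per participant
                   .sort_values(ascending=False))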

    I then tried highlighting the last ones each participant failed to complete, then removed that filter to leave just 'hard selected', to see if there was any pattern where participants obviously stopped choosing hard after that. So far that doesn't look like the case. In fact, if you use conditional formatting on the two columns 'completed task yes/no' and 'reward yes/no' (I added the latter because, since the task is testing motivation, 'look what you could've won' vs 'was a no-win one anyway' was worth keeping an eye on), the difference between participants becomes quite striking.

    HV A failed to complete 6 rounds early on (completed one, failed one, completed one, failed 5 hard ones, then completed the rest of the hard ones they selected), then got it together and it's pretty much 'all green' for them (I conditionally formatted 'completed tasks' into green and red for failed). They did 12 more hard ones after they got it together, and seemed to be basing their choices on reward magnitude; they selected enough low-probability trials that probability didn't seem to be a discriminator for them.

    Across the rest of the HVs, you've then just got HV B failing 2 non-prep hard ones, and E, N and O each failing to complete just one out of all the hard tasks they selected.



    Then the variation within the ME-CFS group falls into pretty stark 'groupings'. It's not what I expected, though: people don't just give up after failing x number of hard tasks in a row.

    The following 6 ME-CFS participants don't seem to have a 'completion issue' and are relatively consistent with what you might see in some of the HVs: ME-CFS C, E, F (who seems to be sensibly picking high-probability or 50:50, high-value trials for hard), J (even more so: only high probability and the odd 50:50), K (same: high probability and higher-value 50:50) and M (same: high probability and high-value 50:50).

    ME-CFS N failed twice (having also failed to complete once in the warm-up), not in a row (there was a completed hard task in between), but they then seemed to pick only high-probability trials as 'hard' and completed them, choosing 15 hard in total. By comparison, HV N chose only 13 hard, but M chose 20.


    There is a group of 4 ME-CFS participants with 'a lot of red' from non-completions who clearly continued to choose hard after that. Excluding the 'prep' trials: ME-CFS A failed to complete all but 2 of the 15 they chose hard for (with similar strategic choices, some just a few clicks short); B chose only 9 hard and completed only 3 of those, falling short by just one click on a few, so clearly a capability issue; D chose hard loads of times despite failing nearly every time by significant amounts (clicks in the 70s and 80s) and was clearly determined to 'get there', finally managing two completions and two near-misses right at the end of their hard tasks; H selected hard loads of times based on probability and value but failed to complete, with clicks in the 80s.

    Then there are the 'in-betweeners', who to my eye are clearly affected by capability issues in the task in some way, just not to the extent of the group above.

    L was 'OK' and selected hard a good bit early on, but failed 3 in the middle, 2 by quite a way (83, 85 clicks), by trial 27, and then picked hard only 3 more times (which they completed, and which were high probability, high value).

    O looks like 'fatigue/fatiguability' too: after warm-up fails, they did 5 successful hard completions early on, one fail (96), two completions, one fail (97), one completion, then three fails (96, 96, 97), and then selected hard only 3 more times. It's not quite obvious enough whether the failed ones were 'sequential', but their first fail came from doing hard on trial 9 (successful) and then 10 (just missed); on trials 13 and 15 they selected hard and completed, on 17 they selected hard again and failed (97 clicks though), completed 19, then failed 20, another hard straight after (97); then on 23 and 24 they selected hard and failed both (97, 96); their next hard was trial 29, which they completed, then they failed 32 (97) and completed 37.

    So it's easy for me to look at that and relate, and to think the person was 'borderline', i.e. their clicks when they missed were just a few off, versus the group above who were often 10-20+ clicks away. Given how close their misses were, my gut says fatiguability is playing a part, but in terms of provability the stats wouldn't be there.

    G failed to complete hard ones 6 times in a row early on (by just a few clicks), then selected hard 7 more times, a bit more spaced out, and completed 5 of the 7.

    ME-CFS I shows a similar pattern: failing 4 hard early on (90-95 clicks), completing one, failing one (97), completing 2, then failing 1 (96), and then completing 4 more hards. Interesting to note for 'I' that those two latter fails were when they had selected 2 trials as 'hard' in a row (26 they completed, 27 they failed; 36 they completed, 37 they failed).


    It makes it quite hard to come up with an analytical strategy that would be a neat 'calculation' - unless you can think of something genius? But I think it is worth analysing at the descriptive level and noting that the within-group variation is significant. Some of the paper's inferences perhaps then fall away, because 'fails to complete' could be coming from one sub-group (who seemed desperate to give it a go and eventually grab the odd win - 'I will manage 98' - which sounds like me on some days in the past with my illness), while another group were doing strategic things to manage their perhaps less extreme functional issue by picking hard less often, presumably to get the best out of the body they were working with (which also sounds like me on better days, when I had a little more in the capability tank than blind effort and so had to use it wisely).
     
    Last edited: Mar 2, 2024
  20. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,909
    To be fair, I'm wondering if there are ways of getting some neat visual angles out of this. Explaining 'the tool' and/or 'the condition' to someone already makes the whole thing a heck of a communication load, which would be eased by having the odd thing to point at.

    So people who are good at graphs and visuals, and at linking those to circled-out bits of 'the game' or 'the condition', would also help. It's a big cognitive load to describe the twists of the game and the illness and so on and then try to point out 'and so this stat'. So communication angles might be useful to think about too.

    There might be other bits where 'common sense doesn't add up' findings just jump out too. But e.g. when I look at the above, even if we were to use it, I'm trying to get my head around how you could display it. Would it be taking each participant as a 'column' and then showing their conditionally formatted completion for all their hards? (But then you couldn't use trial number, because not everyone chose the same trials.)
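
    One way around the trial-number problem might be to index by each participant's nth hard choice instead. A rough sketch, with the same guessed file and column names as earlier in the thread:

        import pandas as pd
        import matplotlib.pyplot as plt
        from matplotlib.colors import ListedColormap

        df = pd.read_csv("eefrt_trials.csv")
        hard = df[df["hard_choice"] == 1].sort_values(["participant", "trial_number"])
        hard["nth_hard"] = hard.groupby("participant").cumcount()

        # participants as columns, their hard choices in order down each column:
        # green = completed, red = failed, blank = no nth hard choice made
        grid = hard.pivot(index="nth_hard", columns="participant", values="completed")
        plt.imshow(grid, aspect="auto",
                   cmap=ListedColormap(["red", "green"]), vmin=0, vmax=1)
        plt.xticks(range(len(grid.columns)), grid.columns, rotation=90)
        plt.xlabel("Participant")
        plt.ylabel("nth hard choice")
        plt.show()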
     
