Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Discussion in 'ME/CFS research' started by Andy, Feb 21, 2024.

  1. Eddie

    Eddie Senior Member (Voting Rights)

    Messages:
    126
    Location:
    Australia
    That's awesome; I think that's exactly what's going on.

    The authors concluding that this is a test of effort preference, when the optimal strategy involves failing the vast majority of tasks, is so ironic. You know it's bad when you can maximize returns by pressing the button a total of zero times in all but two of the tasks (so long as you get lucky with the probabilities). Sorry if I missed it, but what was their rationale for throwing out the data of HV F when presumably he got paid out more than the vast majority of other participants?

    Also, I think a major issue with this game is the absolutely paltry rewards. Imagine if participants had gotten to keep all their winnings (rather than receiving two randomly selected ones) and the rewards were 100x larger. We would almost certainly have seen a significantly higher percentage of all participants choosing the harder tasks (assuming that you couldn't win more by doing many fast easy tasks). If by the end of that test the ME/CFS participants had started failing the harder tasks (or choosing easier ones they could complete), we wouldn't assume that they had an effort preference issue; we would assume they had a fatigue/PEM problem.
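    To make the payout arithmetic concrete, here is a rough Python sketch of the expected payout, assuming (as described elsewhere in this thread) that two rewards are drawn at random from a participant's completed, winning trials. The reward values are made up for illustration, not taken from the study's data.

    Code:
    def expected_payout(completed_win_values, n_draws=2):
        """Expected payout if n_draws rewards are drawn at random from the
        values of a participant's completed, winning trials (assumed rule)."""
        if not completed_win_values:
            return 0.0
        # Each random draw has the same expected value: the mean of the eligible pool.
        return n_draws * sum(completed_win_values) / len(completed_win_values)

    # Illustrative, made-up values roughly in the EEfRT's $1.00-$4.30 range:
    complete_everything = [1.00] * 38 + [4.12, 3.85]  # dozens of small wins plus two big ones
    only_the_big_ones = [4.12, 3.85]                  # skip everything except the two largest

    print(expected_payout(complete_everything))  # ~ $2.30
    print(expected_payout(only_the_big_ones))    # ~ $7.97

    Under that assumed rule, grinding through dozens of easy $1 wins dilutes the pool the two rewards are drawn from, so completing only a couple of high-value trials comes out well ahead.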
     
  2. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593

    OK, so I've tried to see if I could put this into a table format that is more pictorial than scrolling through each participant's data in order. I've stripped it down to just 'complete' (1 and green for completed, 0 and red for not completed) and clicks (anything less than 98 is non-completing on hard tasks), for hard tasks chosen only.

    I haven't got ALL of the trial numbers on there - so far only the ones that at least one participant chose hard for, so there will be some gaps (which would just be back-filled by adding a blank row, as those are trials where everyone picked 'easy' or didn't respond because the trial timed out/ended). From a quick look it might just be trial 5 all the way up to trial 49.

    Anyway, it's my attempt to show what I've described above, which was quite stark when I looked through. On this table each participant is in a different column, so you are looking 'down' at the colours and noting the differing patterns as you go across: many HVs are 'all green', versus those who had lots of non-completions (most of whom were ME-CFS; just one HV had many non-completions).

    I'm not sure I'm quite there on making it striking - I've been playing with formatting (size, lines and so on) to try and make those downward colour patterns easier to see.

    walitt hard tasks chosen completion and click numbers.png
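    In case it helps anyone reproduce or tweak a table like this from the published data, here's a minimal pandas sketch. The file name and column names (participant, trial, choice, presses) are placeholders for whatever the supplementary spreadsheet actually uses.

    Code:
    import pandas as pd

    # Hypothetical file and column names; the real supplementary spreadsheet may differ.
    trials = pd.read_csv("walitt_eefrt_trials.csv")

    # Keep hard-task choices only, then mark completion (fewer than 98 presses = not completed).
    hard = trials[trials["choice"] == "hard"].copy()
    hard["complete"] = (hard["presses"] >= 98).astype(int)

    # Pivot to a trial-by-participant grid: 1 = completed, 0 = failed, blank = hard not chosen.
    grid = hard.pivot(index="trial", columns="participant", values="complete")

    # Quick green/red rendering of the same idea (displays in a notebook).
    styled = grid.style.background_gradient(cmap="RdYlGn", vmin=0, vmax=1)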
     
    Last edited: Mar 2, 2024
  3. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    Patient B is unusual because they only chose hard 3 times when given 'high probability' (which is when you'd be more likely to choose hard, on the basis of there being an 88% chance of it 'counting' as a win trial). The values were 3.22, 1.96 and 3.94, and they completed all 3.

    They only chose hard once when it was low probability (with a small value), and then only managed 85 clicks, after apparently spending 5 seconds deciding whether to choose hard or easy.

    Then on 50:50 probability they selected hard 10 times, failing to complete only once - a few clicks short - on a relatively smaller value (1.73); the others were somewhat higher values, though I don't know that they were 'the highest'.

    On Treadway's description, apparently everyone got each value magnitude shown to them with each probability combination. I'm not sure that's the case here, but if so, it would mean it was only at 50:50 that HVB seemed motivated to select hard.
     
    Peter Trewhitt, Murph and Hutan like this.
  4. Sid

    Sid Senior Member (Voting Rights)

    Messages:
    1,057
    Reaching out to Treadway is a good idea as long as the letter is laser focused on the misinterpretation of the EEfRT task, not the wider ME/CFS politics of the study. I don't know him but generally most civilians have been very reluctant (to put it mildly) to wade into ME/CFS waters. You could also try contacting people who hold academic positions in psychology departments like Hughes and Wilshire.
     
    Peter Trewhitt, Ash, bobbler and 2 others like this.
  5. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    When you stay focused on the completion data but switch to looking only at the easy tasks, HVF stands out, given that you only get the odd fail from the other HVs in a mass of green completions.

    ME-CFS B, however, does have quite a few fails on easy. I don't know whether that is strategy in the money/incentive sense, because the pattern from trial 26 onwards - stopping at e.g. 4 taps - could also be rest breaks; and on one 3.76 50:50 trial they'd chosen easy and got 23 clicks through.

    But all the rest of the HVs, and indeed the ME-CFS participants, seem to have managed to complete nearly all of the easy ones.
     
  6. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    I'm putting a link to this post from @Karen Kirke which includes SF-36 info

    Deep phenotyping of post-infectious myalgic encephalomyelitis/chronic fatigue syndrome, 2024, Walitt et al | Page 26 | Science for ME (s4me.info)

    I'm sure there are other disability-related scales somewhere we could also scan against this, but when I saw the range within ME-CFS on this, and versus the HVs, it struck me that the pattern of non-completion of hard tasks, and the way it seems to 'group', reflects it.

    Which to me would indicate a disability issue: for example, the level of the 'hard' task not having been pre-calibrated to individualise it for disability (or something similar) as part of a properly done checks process.
     
    Peter Trewhitt likes this.
  7. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    I've just noticed how similar HVN and HVO look, and that might have been me making an error, so I will check it again when I'm operational enough to do so, in case I've duplicated one or the other rather than entered the right data for each.

    Yes - I've dragged the copy of N across, because HVO only has one non-complete, on trial 10.

    I'll amend and update when I can, but for now I'm intrigued whether the 'concept' of something like this is a good way of getting things across - i.e. are these the right variables (does the pattern seem as useful to anyone else?), and how can it be made more visually 'readable' so the patterns are easier to see? In short: is this useful?
     
    Peter Trewhitt likes this.
  8. Murph

    Murph Established Member (Voting Rights)

    Messages:
    56
    Above I posted a chart of Healthy volunteer F and their button presses. Below is an equivalent chart for all participants, which shows two important things.

    1. Several participants lose easy tasks at various points, possibly deliberately (look for short red bars). Healthy Volunteer B and Healthy Volunteer O are notable. As is PI-ME/CFS B. None make such a habit of it as Healthy Volunteer F, however!
    2. Many people with ME repeatedly fail at the hard tasks. This would appear to violate the specifications made by Treadway et al in their 2009 paper that established this test as a valid research tool.

    facets of buttong presses.jpeg



    As I said above, the chart shows two important things. The first - that several participants lose easy tasks, possibly deliberately - makes the decision to chuck out HVF's data debatable; with his data included, the test shows no significant between-group differences in the primary endpoint. The second - that many people with ME repeatedly fail the hard tasks - would appear to violate the specifications made by Treadway et al in their 2009 paper that established this test as a valid research tool, quoted here:

    "An important requirement for the EEfRT is that it measure individual differences in motivation for rewards, rather than individual differences in ability or fatigue. The task was specifically designed to require a meaningful difference in effort between hard and easy-task choices while still being simple enough to ensure that all subjects were capable of completing either task, and that subjects would not reach a point of exhaustion. Two manipulation checks were used to ensure that neither ability nor fatigue shaped our results. First, we examined the completion rate across all trials for each subject, and found that all subjects completed between 96%-100% of trials. This suggests that all subjects were readily able to complete both the hard and easy tasks throughout the experiment. As a second manipulation check, we used trial number as an additional covariate in each of our GEE models."


    Side observation: how amazing is it when authors make their data available directly with the paper. Makes a big difference. This is one thing NIH has done well with this paper.
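    For anyone who wants to run the first of those manipulation checks on the NIH data themselves, here's a minimal sketch using hypothetical file and column names (adjust to match the supplementary spreadsheet). The 98-press hard-task threshold comes from the data discussion above; the easy-task threshold of 30 is an assumption to verify against the paper's methods.

    Code:
    import pandas as pd

    # Hypothetical file and column names; adjust to match the supplementary data.
    trials = pd.read_csv("walitt_eefrt_trials.csv")

    # Completion threshold depends on the chosen task (the easy threshold is an assumption).
    needed = trials["choice"].map({"hard": 98, "easy": 30})
    trials["completed"] = trials["presses"] >= needed

    # Treadway-style manipulation check: completion rate across all trials, per subject.
    completion = trials.groupby("participant")["completed"].mean()
    print(completion.sort_values())
    print((completion < 0.96).sum(), "participant(s) below the 96% floor Treadway et al report")

    If several participants fall well below that floor, the check Treadway et al rely on to rule out ability/fatigue effects has failed for this sample.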
     
    Last edited: Mar 2, 2024
  9. Hutan

    Hutan Moderator Staff Member

    Messages:
    27,186
    Location:
    Aotearoa New Zealand
    I'm interested in what the participants remember knowing about the experiment before they started.

    If they understood that they would get paid for two rewards chosen randomly only from the tasks that they completed, then I think it would be fairly easy to realise that you want to keep the number of low value rewards down and just have a few of the highest value rewards. I think a significant number of people would work that out before the live games started. It is sort of hilarious that the smartest solution was to carefully select the most important work to do and not worry about the rest - pacing was the best strategy.

    But, it's possible that the explanation wasn't clear or the participants misinterpreted what they were told, and so thought that they needed to try to get a reward for each task. I mean, it is a rather unusual, counterintuitive approach, to not pay out for each task, or an averaged amount, but to instead randomly select two rewards to pay. I wonder how the investigators explained it. Some participants might have thought that the pool of tasks that the reward would be chosen from included the ones that they didn't complete or tasks that ended up with a zero reward. If participants' understanding of the rules of the game they were playing differed, then that makes the experiment fairly worthless.
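    Whether that misunderstanding matters can be put in rough numbers with a toy expected-payout calculation (made-up values, and assuming two rewards are drawn at random from whatever pool applies). Under the "only completed trials are in the pool" reading, skipping low-value trials helps you; under the "every trial is in the pool, with uncompleted ones worth zero" reading, it hurts you.

    Code:
    def expected_payout(pool_values, n_draws=2):
        # Expected payout when n_draws rewards are drawn at random from pool_values.
        return n_draws * sum(pool_values) / len(pool_values) if pool_values else 0.0

    high_value_wins = [4.12, 3.85]   # the two trials a 'pacing' player completes
    skipped = [0.0] * 38             # everything else, worth nothing if uncompleted

    # Reading 1: pool = completed trials only -> skipping is rewarded.
    print(expected_payout(high_value_wins))             # ~ $7.97
    # Reading 2: pool = all trials, uncompleted count as zero -> skipping is punished.
    print(expected_payout(high_value_wins + skipped))   # ~ $0.40

    So if participants genuinely read the rules differently, they were effectively playing two different games, which is another reason the between-group comparison is hard to interpret.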

    I'm also interested in what the participants were thinking as they did the task. Were they motivated to get the highest total payout? At what time of the day was the experiment done? Did participants find the tasks hard from the beginning; did they feel fatigued as time went on? As others have said, if they struggled to complete the tasks, that would invalidate the experiment.

    I guess participants' retrospective accounts of what happened might be inaccurate, but I still think it would be interesting to hear them.
     
    Last edited: Mar 2, 2024
  10. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    907
    Location:
    UK
    Brilliant analysis and data presentation from @Murph.

    Perhaps just as important, these patients keep trying despite many near-miss failures - that surely suggests they are trying VERY hard (because they nearly succeed and seem desperate to do so), which is the opposite of what the paper suggests. By contrast, HVs have relatively few failures, suggesting they don't need to work so hard.

    Unintentionally, the EEfRT - inappropriately used (and interpreted) to show low effort preference in pwME - appears to show they are a bunch of triers. Hardly surprising, given their level of disability and the ordeal of intense testing they signed up to (shout out to @Robert 1973).
     
    Last edited: Mar 3, 2024
  11. Robert 1973

    Robert 1973 Senior Member (Voting Rights)

    Messages:
    1,317
    Location:
    UK
    My experience is that qualifications make little or no difference in determining whether letters are published in journals. I don't have a degree and have had letters published in Nature, The Lancet, etc. @Tom Kindlon has had numerous letters published and has no degree.

    The problem with letters is the word limit. If you are collaborating, I would suggest coordinating so that different people write different letters making different points, or submitting a paper instead of a letter so that you can use more words. The latter might be harder to get published, but I'm not knowledgeable about that.

    Apologies if this has already been said, I've not read all the posts.
     
    Last edited: Mar 2, 2024
  12. Evergreen

    Evergreen Senior Member (Voting Rights)

    Messages:
    312
    I love this kind of discussion, where people are thinking and sharing and honing and finally figuring it out. So here's a summary of some of the key observations that moved it forward:

    Bobbler & Simon M start spotting the real problem:
    Sam Carter spots what healthy volunteer F is doing:
    Andrewkq gets to the heart of the matter:
    Murph exposes healthy volunteer F's gaming:
    Simon sums it up:
    I like to think I contributed a teensy bit by hectoring people to look at the data and fangirling about @andrewkq 's observation!

    I know we're all kind of lying groaning on the battlefield now, but it also feels like this hard task was worth it.:thumbup:

    Edited to add a narrative.
     
    Last edited: Mar 2, 2024
  13. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    12,602
    Location:
    Canada
    Seems worth a try to me. I am quite sure he would see it as a misuse of his test, and if he cares about its validity, he should be motivated to at least say so.
     
  14. rvallee

    rvallee Senior Member (Voting Rights)

    Messages:
    12,602
    Location:
    Canada
    A reminder, again, that in the original design validation the completion rates were 98% and 96%. Failing the easy task here clearly shows a strategic choice based around the way the 'game' was designed, which invalidates the test in yet another way.

    And yeah, basically they threw away HV F's results because he played the game as they designed it. I guess the players were not supposed to catch on to it, but they clearly did. There are so many reasons why this test should have been thrown out; it makes the researchers look like a bunch of fools at best.
     
    Peter Trewhitt, Sean, Hutan and 8 others like this.
  15. Simon M

    Simon M Senior Member (Voting Rights)

    Messages:
    907
    Location:
    UK
    I think it is a great example of the power of the crowd, even when it is a crowd as ill as this one.

    Yes - and the graphs!

    Thank you for the great narrative.
     
  16. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    36
    @bobbler @Murph @Karen Kirke I love all of your visualizations - they've been really helpful for trying to see all of the pieces at play. It's so hard to visualize these at the trial-by-trial level, but y'all nailed it.

    100%
    I second this @Evergreen :party:
     
  17. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    36
    And as our reward... drum roll please...

    A lackluster response from Nath!

    Screenshot 2024-03-02 at 12.10.22 PM.png

    Sure sounds like @Sam Carter @Murph and @EndME hit the nail on the head. Presumably the participant was attempting a unique strategy, but it seems very subjective to consider this "not following instructions" if the instructions were to try to win as much money as possible without wasting energy. At least we know for sure that it wasn't a mechanical failure.

    I'm going to ask him to clarify how the participant was not following instructions and how this was determined (i.e., was it determined somehow at the time the task was administered, or post hoc by looking at their data).
     
  18. andrewkq

    andrewkq Established Member (Voting Rights)

    Messages:
    36
    That's great to know. It would especially be helpful to have some folks contribute who have experience with the PACE initiatives and can contextualize why this may seem like a minor detail but has major implications for the community. Brian seems like the perfect co-author given his psych expertise.

    I looked into Nature Communications' letter-to-the-editor policies and they call them "Matters Arising" articles. They have a 1200-word limit, and they say that "If the submission serves only to identify an important error or mistake in the published paper, it will usually lead to the publication of a clarification statement (correction or retraction, for example)." The main methodological critiques of the EEfRT could certainly take up 1200 words, so I think it makes sense to write just one letter on this that can then be referenced by others looking to critique the full paper. I'm thinking I'll write up a draft presenting just the methodological critiques, then circle back to see if co-authors want to sign on and help write the final product. Does that sound good?

    That's good to know that it's been a pain point in the past. I think that, combined with the short word limit, makes it pretty clear to focus just on the methodological critiques, as you said.

    That's reassuring! It especially might not matter as much in this situation, because it sounds like they likely won't even publish the letter if it only points out methodological flaws.

    Yes, definitely - I think this can fill the 1200-word limit. I'll keep an eye out for messages about coordinating different letters or submitting a paper. If anyone knows of someone already starting to coordinate this, please let me know!
     
  19. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    13,690
    Location:
    London, UK
    yep
     
  20. bobbler

    bobbler Senior Member (Voting Rights)

    Messages:
    2,593
    I REALLY like this :). It is a neat way of showing, and being able to see, the disability effect on the hard tasks, alongside how e.g. ME-CFS B used some pacing on easy to try and help with that. It also shows the 'noise' of having different strategies operating alongside this within the same task.

    And how, I think, the disability-related effects are basically 'drowning out' anything the tool might be trying to operationalise: the 'peaks' of non-completion from the most disabled participants not having a calibrated 'hard task' would overwhelm any analysis of the (necessarily) subtle differences in the other test variables.
     
