Use of EEfRT in the NIH study: Deep phenotyping of PI-ME/CFS, 2024, Walitt et al

Discussion in 'ME/CFS research' started by Andy, Feb 21, 2024.

  1. bobbler

    bobbler Senior Member (Voting Rights)

    OK @Evergreen one more version (let me know if the rose works OK)

    Because I'd had the warm-up rounds in grey and lost that while playing with the background, I've reinstated the greying for the first 4 rounds. Of course there are pros and cons: it could be that it adds contrast that removes any benefit the greying would provide :)

    edited to update table with paler pink

    walitt big table hard choices paler pink.png


    Last edited: Mar 6, 2024
    cfsandmore, Sean and Ash like this.
  2. Murph

    Murph Senior Member (Voting Rights)

    I feel satisfied that I understand the test fails on 3 levels:

    1. It is conceptually inappropriate to use in ME/CFS, a physical disease where effort preference isn't a legitimate scientific question, and in which the test hasn't been validated.
    2. The high failure rate of ME/CFS patients on hard tasks renders the measure invalid as a measure of preference. This argument is the strongest one because it flies even if you accept EEfRT as a good and legitimate test: It fails on its own terms.
    3. The exclusion of healthy volunteer F's data is necessary for making the primary endpoint significant. It can't be rationally justified but it can be justified based on precedent.

    The to-do list from here might be to see what was pre-specified in terms of exclusion. Did they consider the possibility that ME/CFS patients would fail the hard task at such high rates?
    Then to communicate the problems. Ask for erratum? Retraction? Fight rearguard action in the press?
    JoClaire, oldtimer, Sean and 14 others like this.
  3. bobbler

    bobbler Senior Member (Voting Rights)

    Interesting and really well laid out to get these points across. It builds/layers each finding in a way I think that most will get what you are demonstrating very clearly. I like the approach.

    I am yet to catch up with the next posts, so apologies if I jump the gun on you saying this in those. But before I forget it forever I'm bunging something down on these thoughts. :)

    Is the dotted line the median? Assuming it is, these charts make it clear just how different the mean and median are for ME/CFS, even though on some of the charts they are almost the same for HVs. (I will look back, because now I'm intrigued as to whether that 'slips' in the later rounds. It is an interesting hypothesis that this nods towards a fatiguing effect for HVs, with their field beginning to separate, and where that might end.)

    Anyway, I point this out because a lot of the statistics that the EEfRT relies on for analysis require looking at the distribution, and here you are basically comparing a normal distribution, where the mean and median are as tight as the HVs' are, with one where that is certainly not the case. (I'm thinking it is tri-modal on the capability front, to be honest: the group in the middle are coping with the effects of the task being undoable in a different way to those playing it as per the HVs but with the disability just 'showing' the non-completion effects that has. But then who knows, because it will be overlapping with the additional effects it causes on how people can approach the test itself.)

    This is where I'd have to re-acquaint myself with all the ins and outs of all possible tests, but surely that is an issue for quite a lot of analyses, and we all know the whole parametric vs non-parametric options and so on. How can you do the tests you'd want to when the distributions you are 'comparing' are so different? And then of course I'm unsure (you've probably seen a hint of it) about using an on-off approach rather than a scale (for ME/CFS-'ness' of different types, which could of course have been interesting too if the data had been calibrated before doing it, because different aspects within those scales could have pointed to different issues) when there is such an obvious split even if you just look at the ME/CFS participant data.
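
    A quick way to see the mean/median point is to compute both for each group. The sketch below uses made-up per-participant hard-task completion rates (not the study data) and only Python's standard library; the group values and the function name `mean_median_gap` are assumptions for illustration.

```python
from statistics import mean, median

# Hypothetical per-participant hard-task completion rates (0 to 1).
# These numbers are invented for illustration; they are NOT the study data.
hv_rates = [0.92, 0.95, 0.97, 0.98, 1.00, 1.00]
mecfs_rates = [0.00, 0.05, 0.10, 0.15, 0.90, 1.00]  # deliberately skewed and split


def mean_median_gap(values):
    """Return (mean, median, absolute gap between them) for a list of numbers."""
    m, med = mean(values), median(values)
    return m, med, abs(m - med)


for label, rates in [("HV", hv_rates), ("ME/CFS", mecfs_rates)]:
    m, med, gap = mean_median_gap(rates)
    print(f"{label}: mean={m:.2f} median={med:.2f} gap={gap:.2f}")
```

    A large gap in one group and a small gap in the other is a hint that mean-based summaries, and tests that assume roughly normal data, may mislead; rank-based (non-parametric) comparisons are the usual alternative.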

    Sorry I'm rambling on now! o_O
    Last edited: Mar 5, 2024
    Sean, Sam Carter, Binkie4 and 3 others like this.
  4. Murph

    Murph Senior Member (Voting Rights)

    Got a reply from Treadway on EEfRT. He seems suitably cautious about the utility of using it in this population.


    Sorry for the delay. The task has not been specifically validated as a measure of fatigue. My understanding of the paper is that they are trying to understand the clinical features of this illness, which are not well known. In that sense, it seems appropriate to me to use the EEfRT as an exploratory measure to determine whether it is sensitive to PI-ME. In other words, I would look at this study as an initial attempt in the validation of the EEfRT for this type of population. Viewed in that way, I think this is a valid use of the task.

    I hope that helps.


    Last edited: Mar 5, 2024
  5. bobbler

    bobbler Senior Member (Voting Rights)

    This first point is useful insight regarding the way the test ended up not being re-run or calibrated when surely, even if you are so-minded (to confuse disability with other concepts), as soon as you look at how far off a significant chunk of ME-CFS were the whole way through from even completing hard before having done many rounds etc it would have flagged as an issue. Surely.

    Unless you are stuffed because you've been doing it one by one and it's all gone seemingly swimmingly for the first x number so you just don't 'see' it building in the way that would have happened if you'd been able to run a few testers on enough representative of the future participants.

    I'm interested in the experience @andrewkq notes of running similar tests and whether under circumstances where testing (I'm assuming normally) is more participants taking it in a closer timeframe to each other, there was always an eye out/chance that if strange things cropped up such information would be fed back in at that point to take a look and a pause on the way forward?

    Tricky when it might be a test you can't re-run on same participants for various reasons, and the point of your trial is the participants 'in-depth' vs a trial where the tool could perhaps be the only test or one of only a few other tests being run past larger numbers of people.

    However I'm aware that it was 8yrs in the making and perhaps it might have been possible for better validation outside of the trial to have been taking place before those with ME were drafted in (noting there seem to be more than one 'trip' to the centre) for said tests?
    Sean, Sam Carter, Kitty and 1 other person like this.
  6. bobbler

    bobbler Senior Member (Voting Rights)

    I cross-posted with the post above. But this is interesting. Thanks for posting it.
    Ash, Binkie4, Kitty and 1 other person like this.
  7. Eddie

    Eddie Senior Member (Voting Rights)

    Just adding on, this isn't a problem with the use of EEfRT in this study; it is a problem with the test in general. The fact that the strategy that makes the most money is different from the strategy that takes the most effort is a conflict.

    If participants don't actually care about the money then this is a moot point. However, if you really did want a larger payout then it is an issue that a lower effort strategy achieves that goal. If a very highly motivated participant can score as having "low effort preference" just because they wanted more money, then there is an issue with the test. Yes, you could just throw out any of the data that appears to follow this strategy. However, it is possible that others attempted to do this, just in more subtle or less efficient ways that change the outcome.

    It is also silly because this would be an easy thing to fix. Just ensure that the highest payouts occur with the most effort.
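
    The conflict between payout and effort can be shown with a toy calculation. This is only a sketch under stated assumptions: it assumes the payout comes from a small number of winning trials drawn at random from the participant's pool of wins (two draws is assumed here, in line with how the payout is described elsewhere in this thread), it ignores win probabilities, and the reward values, the threshold and the function name `expected_payout` are all invented for illustration.

```python
from statistics import mean

# Hypothetical hard-task reward offers (in dollars) across a session.
# Invented values for illustration; NOT taken from the study.
hard_offers = [1.25, 1.50, 2.00, 2.50, 3.50, 4.00]


def expected_payout(won_rewards, draws=2):
    """Expected total when `draws` wins are picked at random from the pool.

    The expected sum of draws taken without replacement equals
    draws * mean(pool), so a pool holding only high-value wins pays more.
    """
    if len(won_rewards) < draws:
        raise ValueError("need at least as many wins as draws")
    return draws * mean(won_rewards)


# Strategy 1: maximum effort, complete every hard task offered.
all_hard = list(hard_offers)

# Strategy 2: complete only high-value hard tasks and skip or fail the rest,
# so low-value wins never enter the pool (the 3.50 threshold is an assumption).
selective = [r for r in hard_offers if r >= 3.50]

print("max effort:", len(all_hard), "hard wins, expected payout", expected_payout(all_hard))
print("selective: ", len(selective), "hard wins, expected payout", expected_payout(selective))
```

    Under these assumptions the selective strategy completes fewer hard tasks yet has the higher expected payout, which is the conflict described above.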

    As I mentioned earlier, I have a feeling these points won't be well understood by the researchers though.
    oldtimer, Sean, Sam Carter and 7 others like this.
  8. Eddie

    Eddie Senior Member (Voting Rights)

    The EEfRT is also terrible because it takes a complex decision-making task, with many 100s or 1000s of inputs, and simplifies it to a single measure of "effort preference". If we think about the reasons why people might choose one task over another, there are obvious reasons like: the level of fatigue, monetary desires, understanding of the instructions, intelligence, and influence of PEM. However there are many other more subtle factors like: what they ate for breakfast, how comfortable their clothing is, what they thought about the researchers, what time of day the tests were performed, how well they slept, if they needed to use the bathroom, what they would spend the money on, etc. etc.

    While on their own these small factors may have little influence, they do all impact our decision making on a day to day basis. The researchers trying to simplify decision making in this way is like saying the US landed on the moon before Russia because they had a stronger effort preference to do so. It just makes no sense to frame the problem in those terms.
    Last edited: Mar 5, 2024
    oldtimer, Sean, Sam Carter and 6 others like this.
  9. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    London, UK
    You are all doing a great job on this @Murph.

    The multiple levels of failure are worth itemising.

    I would just advise against any suggestion that ME/CFS is a 'physical disease' that excludes consideration of effort. Diseases involving events known to us as thoughts are just as physical - they must be to have physical effects. It is quite legitimate to study effort in ME/CFS as long as results are interpreted in a plausible way. That is why I suggested the need to distinguish B causes A and C from B causes A, which causes C.

    The 'physical disease' argument is the easiest of all for the BPS people to shoot down and win in medical circles. And psychiatric diseases are just as physical and disabling as ME so it brings in a prejudice we can do without.

    I don't think exclusion of HVF is justified, even if somebody's prespecified rules allow it. Treadway may be of help in that he has identified various issues with the test that need careful handling but there is no guarantee that the test is any use even with that care. My impression from what I have seen of its use is that you simply cannot draw the sort of conclusions psychologists would like to draw from overlapping scatter plots with weak correlations with modest p values.
    oldtimer, Sean, Evergreen and 10 others like this.
  10. Sid

    Sid Senior Member (Voting Rights)

    Going down this route will also allow them to focus on the stigmatising bit and give a lecture on dualism, ignoring the stronger and most important points about pwME/CFS being unable to complete the hard tasks and therefore the measure is not a measure of preference.
    oldtimer, Sean, Evergreen and 8 others like this.
  11. EndME

    EndME Senior Member (Voting Rights)

    The solid line is the mean number of hard tasks chosen and the dotted line is the mean number of hard tasks successfully completed. IMO these are the two most important basic values we are looking at in our argument (I have also calculated variance for each group in each plot, but didn't want to overload the plots/the post for now).
    Last edited: Mar 5, 2024
    oldtimer, Evergreen, Kitty and 2 others like this.
  12. EndME

    EndME Senior Member (Voting Rights)

    I will play devil's advocate for a moment:

    Something that is quite striking, and something I believe one still has to discuss, is that in the second half of the game both HV and ME/CFS chose to play hard more often than in the first half, even though both complete it a lower percentage of the time than in the first half. (I will additionally cut the games in half after I have specified a fixed duration of games played, to ensure that everyone gets to play the same number of games; this is typically what other studies have done.)

    The ratio of successful completion of hard to choice of hard in the second half of the game for HV gets closer to the ratio that patients with ME/CFS start the game with.
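
    The first-half versus second-half comparison can be sketched as below. The trial records, their `(chose_hard, completed)` layout and the function name `half_summary` are assumptions made up for illustration; this is not the study data or its file format, and the split here is simply by trial count.

```python
# Hypothetical trial records for one participant, in the order played.
# Each record is (chose_hard, completed). Invented for illustration only.
trials = [
    (True, True), (False, True), (True, True), (False, True),
    (True, True), (True, False), (True, False), (True, True),
]


def half_summary(records):
    """Return (share of trials where hard was chosen, share of hard choices completed)."""
    hard = [completed for chose_hard, completed in records if chose_hard]
    choice_rate = len(hard) / len(records)
    # If hard was never chosen, the completion ratio is undefined.
    completion_ratio = sum(hard) / len(hard) if hard else float("nan")
    return choice_rate, completion_ratio


mid = len(trials) // 2
first, second = half_summary(trials[:mid]), half_summary(trials[mid:])
print("first half:  chose hard %.2f, completed %.2f" % first)
print("second half: chose hard %.2f, completed %.2f" % second)
```

    Running the same summary per group, after fixing the number of trials everyone is counted on, would give the two ratios being compared here.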

    The main argument of Walitt et al. in its simplified version is “pwME choose hard less often, this means they prefer to not use effort”. Now we are arguing “that is an incomplete assessment because pwME have lower chances of completing hard tasks and as such some in-game learning effects might lead to choosing hard less often rather than the intrinsic nature of pwME”.

    The problem with this argument of ours, and something that Walitt et al can argue, is that comparing the second half of the game data to the first half of the game data shows that striking out more often doesn’t lead to choosing hard less often. I haven’t seen a convincing counterargument of our own against such a counterargument. @Murph @bobbler @andrewkq
    Last edited: Mar 5, 2024
    cfsandmore, Sean, bobbler and 2 others like this.
  13. Peter Trewhitt

    Peter Trewhitt Senior Member (Voting Rights)

    At the risk of proposing an unfalsifiable argument: as well as fatigue increasing as the task goes on, participants are learning about what they can do, so perhaps they become less cautious. Alternatively, as time runs down participants see the finish line in sight and become less cautious. For me these multiple possibilities just illustrate the problems of interpreting anything meaningful from this task.

    If you don’t know in advance what the cost of the exertion is going to be it makes sense to focus initially on the easy tasks but as time runs down to then try more hard ones.
    cfsandmore, Sean, bobbler and 4 others like this.
  14. EndME

    EndME Senior Member (Voting Rights)

    I understand your point and certainly agree that the EEfRT is far from robust enough to tell us anything about ME/CFS, but this would be an argument based purely on hypotheticals and ideas, not on any data. That is not convincing to me, especially if the other side has data to back up their argument (even if this data is iffy and not robust).
    Sean, Kitty and Peter Trewhitt like this.
  15. Evergreen

    Evergreen Senior Member (Voting Rights)

    These are great – the mean lines (which Simon M wisely suggested for Karen Kirke’s graph) are so helpful for seeing what is going on. And I agree that the practice rounds contain important info.

    Can I suggest some tweaks?
    · Add a legend for the dotted lines.
    · Instead of True vs False, Yes vs No. Or a descriptive Successfully completed vs Not successfully completed or Successfully completed vs Failed
    · Change “First 4” to “Practice rounds” to distinguish them from the initial rounds of the real task.
    · Make the “False”/not successfully completed bar colour a little darker as it’s hard to see.
  16. Hutan

    Hutan Moderator Staff Member

    Aotearoa New Zealand
    Pre-test calibration
    I mentioned that back a bit - the review paper I was talking about noted that there were a number of studies that did this calibration. I'll paste the link to the study here - it's one of the ones that bobbler noted.
    Examining the reliability and validity of two versions of the Effort-Expenditure for Rewards Task (EEfRT) | PLOS ONE

    post-hoc data selection
    I think there's quite a lot of vagueness in that response from Walitt. I don't think the 'pre-existing way of handling the data' necessarily means that they had pre-specified criteria for valid data. I think it just means that they had always intended to look at the data and throw out what they didn't like. And the following phrase 'the evaluation of invalid performance and task validity takes place after the data is collected' makes me think it was all post-hoc even more. I remain amazed that the investigators didn't foresee participants doing what HVF did, and take some steps to make their experiment better.

    Summary of arguments
    Excellent. I think we need to add in there the ridiculously small size of the experiment, especially compared to much larger experiments where those investigators were still concerned about noise in the data. Perhaps it's part of argument 3 - in that one person pursuing their own strategy could make the difference between a significant or a non-significant difference. I wonder what would happen if we took one or two of the ME/CFS participants' data out, or replicated one of the ME/CFS participant's data? How much fiddling would we need to do adding in or subtracting a participant to make the result non-significant again?
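
    The "take one participant out and see what happens" check can be sketched as a leave-one-out loop around a significance test. This is a minimal sketch, not the paper's own statistical model: it swaps in an exact permutation test on the difference in group means, and the data, group sizes and function names (`permutation_p`, `leave_one_out`) are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        hits += diff >= observed - 1e-12  # small tolerance for float rounding
        splits += 1
    return hits / splits


def leave_one_out(group_a, group_b):
    """p-value after dropping each participant in turn, keyed by (group, index)."""
    results = {}
    for i in range(len(group_a)):
        results[("a", i)] = permutation_p(group_a[:i] + group_a[i + 1:], group_b)
    for j in range(len(group_b)):
        results[("b", j)] = permutation_p(group_a, group_b[:j] + group_b[j + 1:])
    return results


# Hypothetical proportion of hard choices per participant; NOT the study data.
hv = [0.55, 0.48, 0.60, 0.52, 0.45, 0.58, 0.10]  # last value mimics an outlying strategy
mecfs = [0.40, 0.35, 0.42, 0.30, 0.45, 0.38, 0.33]

print("all participants: p = %.3f" % permutation_p(hv, mecfs))
for key, p in leave_one_out(hv, mecfs).items():
    print(key, "dropped: p = %.3f" % p)
```

    If the p-value crosses the significance threshold when a single participant is dropped, the result hinges on that one person, which is the fragility being described for such a small experiment.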

    why participants keep trying to do hard tasks after repeatedly failing to complete
    I think I'd argue that participants had already bagged their $1 from the easy tasks. The rules of the game had been explained to the participants and they had time to think about it before they even started. So, most of them would have known that doing more $1 easy tasks was not going to increase their chances of getting a higher payout. Instead, to increase the number of higher value rewards in their pool for later selection, they would have to try to get some of the hard task rewards. There was nothing lost by trying and failing. And some of the participants (ME/CFS- A and ME/CFS - D) did manage to complete a hard task after trying and failing. I think it should be very hard to argue that those people weren't putting effort in.
    Last edited: Mar 5, 2024
  17. Evergreen

    Evergreen Senior Member (Voting Rights)

    I think having a contrast between the practice and real rounds is good. I still need to look at the graph on low brightness to be able for it, to make everything paler/more muted. If the rose were paler and the background cells were paler, it would be easier. Thank you so much for all the experimentation. Do stop when it makes sense to.
  18. Trish

    Trish Moderator Staff Member

    I haven't been able to follow all this discussion, and don't want to add to the burden of too many posts to read, so I'll try to make this brief.

    Are participants told before the task that they are being assessed for their effort preference? Or for anhedonia, or something else? If not, what are they told?

    I am imagining if I were a participant with ME/CFS and I were asked to perform this task, I would think it remarkably stupid and not worth expending effort on. The ME/CFS participants were there for biomedical testing, not silly mind games where they are trying to second guess strategies. My preference would be to opt out and conserve my energy for the worthwhile stuff.
  19. Evergreen

    Evergreen Senior Member (Voting Rights)

    Oooh I thought it was the mean successfully completed. But I also found myself wondering how the means and medians would differ.
    bobbler, Kitty and Peter Trewhitt like this.
  20. EndME

    EndME Senior Member (Voting Rights)

    I'm not fully understanding this point, as I don't see how that is a counterargument to their argument "pwME try hard less often because they prefer to exert less effort". In any case, if having already bagged the $1 twice was our counterargument, one would have to look at whether the data backs such an argument, i.e. split the data into "before everyone has bagged $1 twice" and "after everyone has bagged $1 twice", which I don't think anybody has done yet, and I expect (but could be wrong) one would still see that in both splits pwME are choosing hard less often.

    Something that wouldn’t be a direct counterargument, but rather an alternative interpretation of the data, but for which one would have to do some analysis to see how well this holds up, is that they roughly argue that "HV and pwME can be separated to some degree by how often they choose to do a hard task". If we show that "percentage of hard tasks completed” offers a better separation and that this separation is not equivalent (or strongly correlated) to their separation one has grounds for coming up with an alternative interpretation of the data that could be in fact better than theirs, which just makes their whole argument look insufficient, but wouldn't falsify it.
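
    One way to put a number on "offers a better separation" is the probability that a randomly picked HV scores higher than a randomly picked pwME on a given measure (the area under the ROC curve). The sketch below is an assumption about how one might do this, not something anyone in the thread or the paper has run; the values and the function name `separation_auc` are invented, and it does not cover the second step of checking how strongly the two measures are correlated.

```python
def separation_auc(group_a, group_b):
    """Probability that a random member of group_a scores higher than one of group_b.

    Ties count as half. 0.5 means no separation; 0.0 or 1.0 means complete separation.
    """
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


# Hypothetical per-participant values; invented for illustration, NOT the study data.
hv_choice, me_choice = [0.55, 0.48, 0.60, 0.40], [0.42, 0.35, 0.50, 0.45]
hv_completed, me_completed = [0.98, 0.95, 1.00, 0.90], [0.60, 0.10, 0.92, 0.30]

print("hard chosen:    AUC = %.2f" % separation_auc(hv_choice, me_choice))
print("hard completed: AUC = %.2f" % separation_auc(hv_completed, me_completed))
```

    A clearly higher value for "percentage of hard tasks completed" than for "hard tasks chosen" on the real data would support the alternative interpretation, without falsifying theirs.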
    Amw66, Hutan, Kitty and 1 other person like this.
