Trial Report Plasma cell targeting with the anti-CD38 antibody daratumumab in ME/CFS -a clinical pilot study, 2025, Fluge et al

I think phase 3 proves that survey data is meaningless and step counts are the real indicator.

Survey data is only reliable if it correlates with step count data. In Ritux P3 it did not; in Dara it did.

Again, self reported outcomes vs observed outcomes.
I know where you're coming from, but I don't think the data we have to date support that position. It is an assumption.

In the 2015 phase II study, the people who were thought to have responded particularly well to rituximab had step counts that are the stuff of dreams for most of us:
After 15–20 months follow-up, we had available Sensewear electronic armbands that continuously measured physical activity in the home setting. No data from baseline before intervention were available. The analyses were not preplanned, and were performed only in some patients (mainly in responders). They were performed in order to gain experience with the armbands for design of the protocol for the now ongoing randomized phase III-study. However, 12 out of 14 major responders in this study measured physical activity for 4–6 consecutive days in the time interval 15–20 months follow-up, with a mean value for “mean number of steps per 24h” 9829 (range 5794–18177), and a mean value for “maximum number of steps per 24h” 14623 (range 9310–23407).

That step count sounds about right for people who are at about 80 on the SF36 PF scale (at 15-20 months when their step count was measured):

[chart attached]
 
I know where you're coming from, but I don't think the data we have to date support that position. It is an assumption.

In the 2015 phase II study, the people who were thought to have responded particularly well to rituximab had step counts that are the stuff of dreams for most of us:


That step count sounds about right for people who are at about 80 on the SF36 PF scale:

[chart attached]
I saw that, but because they did not measure the pre baseline steps for a run in period, for non responders and responders, it is hard to conclude if this was a measurement error or not. In the Dara study, the baseline steps were measured.

Also, they had people wear the armband for 4-6 days. In Dara I believe they wore the device for a year.

I am quite suspicious of this number.
 
I saw that, but because they did not measure the pre baseline steps for a run in period, for non responders and responders, it is hard to conclude if this was a measurement error or not. In the Dara study, the baseline steps were measured.

Also, they had people wear the armband for 4-6 days. In Dara I believe they wore the device for a year.
Mm, that's too big of an assumption for me. Am I understanding correctly that you're saying that the step count (for major responders in phase II ritux above) must be wrong, even though it tallies well with the SF36 physical function score, because you think step count only increases if someone's getting a real effect from a drug/intervention?

I think step count goes up with both real and placebo effects, though not to the same extent. Alas, I've never had the pleasure of experiencing either!

Without a placebo group, we just don't know.

I am eternally grateful to @Jonathan Edwards for reducing my expectations of the phase III rituximab trial before its publication. It was like getting advance warning that someone's going to dump you/fire you. Not pleasant, but helpful.

I wouldn't be surprised at all if Fluge & Mella are part of a research breakthrough in ME/CFS in the future. And their studies are a pleasure to read.
 
Mm, that's too big of an assumption for me. Am I understanding correctly that you're saying that the step count (for major responders in phase II ritux above) must be wrong, even though it tallies well with the SF36 physical function score, because you think step count only increases if someone's getting a real effect from a drug/intervention?

No, I am saying I am suspicious because

1. There is no pre-treatment run-in baseline measurement for either group, whereas in Dara we know the run-in baselines for 90 days.
2. There is no data on the severity of these patients pre-treatment, whereas in Dara we know severity pre-treatment. By severity, I mean step count, not survey score.
3. The step counts were recorded for a period of 4–6 days, in contrast to a period of 9 months pre- and post-treatment in Dara.

If I saw a pre-recorded value of, say, 3k for a 90-day run-in, and a gradual increase to 9k post-Ritux, I would instantly be a non-believer in Dara.
 
Can you give some examples of biased selection criteria then?
Sure. For example, having stricter diagnostic criteria to make sure the people you’re studying actually have ME/CFS can end up selecting for people who are temporarily at the worst point of their illness. We know plenty of people can float in between severities.

Also inadvertently selecting for people who have the ability to travel for treatment. That can mean they have a caretaker with a lot of availability, allowing them to pace much more at baseline but have more leeway to test out higher levels of activity later on. And the fact that this is a risky immunosuppressant treatment—people who have tried a bunch of things with not even mild symptom relief may have meaningfully different biology, or people who don’t mind limiting their social contact to reduce risk of infection might also be different wrt baseline pacing. Duration of illness can be a factor as well, as it seemed to be in the intramural study.

That’s only a few potential issues, all of which can be very hard to predict the exact effect of in advance, which is why you set up a placebo arm with the same conditions so you don’t have to guess the cumulative effect.

Also you can’t assume that natural recovery from ME/CFS follows a normal or binomial distribution either. The oft-cited 5% figure is a very limited and context-specific estimate. The number of people who recover with different duration of illness, or who just partially recover, or who actually have a different underlying biology despite all being under the label of ME/CFS might be quite different.

I can definitely understand your frustration coming up against people who seem to ignore the common-sense logic of these results. I think what many people are trying to explain is that phase 2 results are quite notorious for being deceptive, precisely because of all the things that common sense doesn't account for. We all would love for Dara to work and see these results as encouraging, but know from experience to hold back from drawing conclusions.
 
Phase 2 is generally the hardest phase to get through. Drugs with a lot of funding behind them and preliminary results that look like a slam dunk frequently don’t make the cut. From AI:

The overall likelihood of approval (LOA) for a drug entering Phase I clinical trials is low, generally ranging from 6.7% to 13.8%. Success rates vary by phase: Phase I (approx. 47–63% success), Phase II (approx. 28–31% success), and Phase III (approx. 55–58% success), with Phase II representing the highest hurdle (lowest success rate).
Clinical Phase Success Rates & Transition Probabilities
Key Factors Influencing Success
  • Overall Probability of Success (PoS): Data from 2014–2023 shows the average likelihood of approval for a new Phase I drug is 6.7%, a decrease from previous, higher estimates.
  • Disease Area: Oncology drugs often have lower success rates (3.4% overall) compared to other areas, while vaccines can have higher success rates (33.4%).
  • Trial Design: The use of biomarkers to terminate ineffective programs early in Phase II has contributed to lower overall success rates but higher efficiency in weeding out failures.
Summary of Transition Probability (Example Data)
  • Phase I to II: 47%
  • Phase II to III: 28%
  • Phase III to Approval: 55%
Once a drug passes all three phases, the final regulatory review is usually successful (roughly 92%).
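As a sanity check on those figures, the quoted transition probabilities can simply be multiplied together to recover the overall likelihood of approval. A minimal sketch, using only the rates quoted above (the stage labels are just illustrative):

```python
# Multiply the quoted per-stage success rates to get the overall
# likelihood of approval (LOA) for a drug entering Phase I.
phase_success = {
    "Phase I -> II": 0.47,
    "Phase II -> III": 0.28,
    "Phase III -> submission": 0.55,
    "Regulatory review -> approval": 0.92,
}

loa = 1.0
for stage, p in phase_success.items():
    loa *= p  # surviving all stages requires passing each in turn

print(f"Overall LOA from Phase I: {loa:.1%}")  # ~6.7%, matching the quoted figure
```

Which is reassuring: 0.47 × 0.28 × 0.55 × 0.92 lands almost exactly on the 6.7% overall figure the summary leads with.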
 
No, I am saying I am suspicious because

1. There is no baseline pre run in measurement, for both groups, whereas in Dara we know the run in baselines for 90 days.
2. There is no data on the severity of these patients pre treatment, whereas in Dara we know severity pre treatment. By severity, I mean step count, not survey score.
3. The step counts were recorded for a period of 4-6 days, in contrast to a period of 9 months pre and post treatment in Dara.
Gotcha.

I don't think these things matter as much as we might think they would.

The part of the placebo response that is drug-related starts when the drug starts (unless the participants know the drug would not be expected to have an effect till X weeks). The run-in data is still useful to have, but it's not relevant for the point we're debating here.

It's true that we don't have baseline step counts for ritux in the phase II trial, but we do know that the responders were a little, but not a lot, less severe than the non-responders (mean SF36 PF at baseline 42.9 vs 36.5, table 4), and steps per day correlate well with SF36 PF (van Campen et al. 2020).

Continuous measurement is likely to give more representative data for individuals, but when we're comparing group means to determine if something is effective or not, I'm not sure there'll be an advantage over shorter-term measurements.

None of these points can explain normal or near-normal step counts of responders in the phase II rituximab trial. But the placebo response can.

Editing to add this graph from van Campen et al. 2020:

 
Gotcha.

I don't think these things matter as much as we might think they would.

The part of the placebo response that is drug-related starts when the drug starts (unless the participants know the drug would not be expected to have an effect till X weeks). The run-in data is still useful to have, but it's not relevant for the point we're debating here.

It's true that we don't have baseline step counts for ritux in the phase II trial, but we do know that the responders were a little, but not a lot, less severe than the non-responders (mean SF36 PF at baseline 42.9 vs 36.5, table 4), and steps per day correlate well with SF36 PF (van Campen et al. 2020).

Continuous measurement is likely to give more representative data for individuals, but when we're comparing group means to determine if something is effective or not, I'm not sure there'll be an advantage over shorter-term measurements.

None of these points can explain normal or near-normal step counts of responders in the phase II rituximab trial. But the placebo response can.

Run-in baseline is most important because we don't have any baseline reference for these step counts, and continuous measurement is important because we need to see whether these increases are sustained over a year or are just a Hawthorne effect.

For example, does this armband overestimate steps and baselines might be 6k on average? Were the responders with the armbands purposely stepping up their counts just for the period of being measured?
 
For example, does this armband overestimate steps and baselines might be 6k on average? Were the responders with the armbands purposely stepping up their counts just for the period of being measured?
What's nice is, if Fluge & Mella do a phase III, we will know, because they put it all in their papers for everyone to pore over.
 
My worry is that there is an effect, because of the LLPC depletion, but Dara in general is too weak. We know in MM results vary with Dara, not because it doesn't work, but because the strength or effectiveness varies amongst patients.

So put it in the Phase 2 study, and there will be some effect, but some non responders too, even with higher NK cells.

For example, maybe some plasma cells, the faulty ones, have less CD38 on them? Maybe in some people their immune systems are too weak for Dara to work? Dara does not kill anything directly; it flags the CD38 cells and lets the body's immune system kill them.

The problem is, right now there is no way of identifying the antibodies, and also no way of sampling the plasma cells, assuming a small fraction are the bad actors. You would have to take bone marrow samples of ME patients, which is very painful and invasive, I presume.
 
Survey data is only reliable if it correlates with step count data. In Ritux P3 it did not; in Dara it did.
Coming back to this point. Can you explain what you mean by saying that in the phase III rituximab trial, the SF36 PF did not correlate with step count data?

As far as I can see, it correlated very well. Looking at table 2 of the 2019 paper:
  • The rituximab group improved by roughly 10 points on the SF36 PF and 480 steps.
  • The placebo group improved by roughly 13 points on the SF36 PF and 671 steps.

And looking at Appendix table 4 of the 2019 paper:
  • At baseline, those who went on to worsen had both lower SF36 PF and lower steps than the other two groups, and those who continued with stable symptoms had higher mean SF36PF and higher steps than the other two groups. (When I say lower and higher here I do not mean statistically significantly lower or higher, just that the numbers are slightly lower or higher.)
What am I missing?
 
Sorry, I mean the no-intervention study. In that case it was the physical fatigue score that didn't correlate with step counts at all.
 
For example, does this armband overestimate steps and baselines might be 6k on average?
Found something interesting in Rekeland et al. 2022 about this. They compared steps measured by Fitbit with steps measured by the Sensewear armband that was used in the phase II rituximab trial, where "responders" were measured as having an end-of-treatment mean step count of 9829 (range 5794–18177). You were wondering if that number could be artificially high.

Compared to Fitbit, Sensewear step counts were lower.

At baseline in Rekeland et al. 2022, the group mean step count over 8 days measured by Fitbit was 7816 vs 4768 by Sensewear. At 24 weeks, the group mean step count over 5 days measured by Fitbit was 7051 vs 4923 by Sensewear.

So the mean step count of 9829 reported for "responders" in phase II ritux, which was measured by Sensewear, would have been higher if measured by Fitbit.

A Bland-Altman plot (Fig 5C) showed a systematic difference between the two devices, with a bias of 974 steps per 24 hours (95% CI -542 to 2489), which corresponds to a bias of 27.5% (95% CI -5% to 60%).
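For anyone unfamiliar with Bland-Altman analysis: the bias is just the mean of the per-pair differences between the two devices, and the agreement interval is that mean plus or minus 1.96 standard deviations of the differences. A minimal sketch with made-up step counts (the numbers below are not from the paper; only the method mirrors Fig 5C):

```python
import numpy as np

# Paired 24h step counts for the same days from two devices (invented values).
fitbit    = np.array([7800, 6200, 9100, 4500, 8300], dtype=float)
sensewear = np.array([6900, 5100, 8400, 3600, 7200], dtype=float)

diff = fitbit - sensewear            # per-pair difference between devices
bias = diff.mean()                   # systematic offset (Bland-Altman bias)
sd = diff.std(ddof=1)                # sample SD of the differences
loa_low, loa_high = bias - 1.96 * sd, bias + 1.96 * sd  # 95% limits of agreement

print(f"bias = {bias:.0f} steps/24h, 95% limits of agreement = ({loa_low:.0f}, {loa_high:.0f})")
```

With real data the plot also puts the pairwise mean on the x-axis and the difference on the y-axis, so you can see whether the offset grows with activity level.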

This simple chart shows the difference:
[chart attached]
 
Sorry, I mean the no-intervention study. In that case it was the physical fatigue score that didn't correlate with step counts at all.
So Rekeland et al. 2022, right? I still don't follow, because SF36 physical function correlated reasonably well with steps there too, at group level: 0.49 (p<0.01) in figure 4, second only to SF36 social function.

The correlations between steps per day and self-reported SF-36 Physical function, Social function, and DSQ-SF were significant.
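For reference, the group-level figure of 0.49 is a Pearson correlation computed across participants' mean steps per day and their SF36 PF scores. A minimal sketch with invented paired values (only the method is the point, not the numbers):

```python
import numpy as np

# One (steps/day, SF36 PF) pair per participant (invented values).
steps   = np.array([2100, 3500, 4200, 5600, 7000, 8100], dtype=float)
sf36_pf = np.array([  25,   40,   35,   55,   60,   75], dtype=float)

# Pearson r: off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(steps, sf36_pf)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A tidy positive r at group level, as here, can still coexist with messy individual trajectories, which is the point being made below.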

The individual data is always going to be messy.

I think there has been such a need for more objective outcome measures that people have maybe overestimated how reliable or straightforward step count will be as an outcome measure. Step count is likely to be influenced by many factors other than change in underlying disease. For example, I’ve had to move a lot between rooms and houses in recent years, and that impacts my step count. Sometimes my range, and hence step count, has been smaller or bigger, simply because of the layout of rooms, even though my physical function remains the same. Someone with mild or mild-moderate ME/CFS might walk outside more during summer compared to winter.
 
Measuring steps is completely pointless for mild patients.

Wouldn't even want mild patients included in ME/CFS treatment studies given the current research state. I say this as someone who has spent the majority of my time with ME/CFS as mild, so it isn't some gatekeeping thing.
 
Measuring steps is completely pointless for mild patients.

Wouldn't even want mild patients included in ME/CFS treatment studies given the current research state. I say this as someone who has spent the majority of my time with ME/CFS as mild, so it isn't some gatekeeping thing.

From my mild perspective, I don't think it's pointless.

I could go up to 10k steps for a week, maybe. But not for a month, or several months. I wouldn't be able to sustain it at all.
 
Sorry, of course it would not be pointless; I was being careless with how I expressed myself.
The point is that a sustained step count increase is a signal no matter the severity, because it represents the patient going out instead of staying at home all day.
 
So Rekeland et al. 2022, right? I still don't follow, because SF36 physical function correlated reasonably well with steps there too, at group level: 0.49 (p<0.01) in figure 4, second only to SF36 social function.



The individual data is always going to be messy.

I think there has been such a need for more objective outcome measures that people have maybe overestimated how reliable or straightforward step count will be as an outcome measure. Step count is likely to be influenced by many factors other than change in underlying disease. For example, I’ve had to move a lot between rooms and houses in recent years, and that impacts my step count. Sometimes my range, and hence step count, has been smaller or bigger, simply because of the layout of rooms, even though my physical function remains the same. Someone with mild or mild-moderate ME/CFS might walk outside more during summer compared to winter.
Any seasonal or temporal effects should smooth out over a year of monitoring
 