Bias due to a lack of blinding: a discussion

Discussion in 'Trial design including bias, placebo effect' started by ME/CFS Skeptic, Sep 22, 2019.

  1. Sean

    Sean Moderator Staff Member

    Messages:
    8,064
    Location:
    Australia
    It is possible to be a fool in the first instance, then a fraud in trying to cover up that fact.
     
    MEMarge, Amw66 and rvallee like this.
  2. dave30th

    dave30th Senior Member (Voting Rights)

    Messages:
    2,447
    This definitely seems to be true.
     
    Mithriel, rvallee, Barry and 3 others like this.
  3. Lucibee

    Lucibee Senior Member (Voting Rights)

    Messages:
    1,498
    Location:
    Mid-Wales
    Indeed. And that is what the SF36 was designed for: looking at overall population trends.

    Part of the reason it has ended up as the go-to measure in these sorts of trials is expediency (laziness?) and comparability. Although whether its use truly makes all these trials comparable is moot. In some ways it gives rise to an illusion of comparability, which is more dangerous.

    It would be better if each trial designed its own scale, relevant to the condition and patient group it is trying to treat. Trialists also need to take more care over whether such a measure can be used repeatedly over time. But then they wouldn't be able to combine results in meta-analyses. There is an added illusion that, because these scales have been used so much in trials over the years, they were properly validated for the use they are being put to. In most cases, they probably (definitely) weren't.

    But who is going to say anything about that now?
     
  4. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    There's a third option, mentioned by the authors, namely a lack of precision. Previous studies of this kind did find an effect for blinding when subjective outcomes were used, and in the current study the confidence intervals are wide, suggesting that other samples might find a clear effect. It's possible that if you ran this type of study a few times, one would happen to find no effect of blinding (and that the Moustgaard study happened to be that one). I don't know how plausible this explanation is, but it's possible, so best to keep it in mind.
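
    To make the precision point concrete, here's a toy simulation in Python (all numbers invented, not taken from the paper): even if lack of blinding truly inflates effects on average, a study with wide confidence intervals will often happen to find no clear effect.

    Code:
    # Toy simulation: a true average "blinding bias" of 0.15 SD, estimated
    # from 30 blinded-vs-unblinded trial comparisons with lots of noise.
    # How often does the 95% CI include zero ("no effect found")?
    import random
    import statistics

    def one_meta_study(n_pairs=30, true_bias=0.15, noise_sd=0.5):
        diffs = [random.gauss(true_bias, noise_sd) for _ in range(n_pairs)]
        mean = statistics.mean(diffs)
        se = statistics.stdev(diffs) / n_pairs ** 0.5
        return mean - 1.96 * se <= 0  # CI crosses zero -> null result

    random.seed(1)
    nulls = sum(one_meta_study() for _ in range(1000))
    print(f"{nulls / 10:.0f}% of simulated studies find no clear effect")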

    I don't think this is really about the validity of meta-analyses. What they do in this type of study is look at trials which have a particular methodological characteristic (for example, patients were blinded) and similar trials that don't have that characteristic (patients weren't blinded). Reviews and meta-analyses are used to find studies that differ in this one aspect (blinding of patients) but are similar in all other regards (the same type of intervention, study population, etc.). I think that's what they mean by "meta-epidemiological study".

    As Jonathan suggested earlier, this isn't how science usually measures effects. Suppose someone in Australia reports that ME/CFS patients' fatigue got 30% better with treatment A, and a Canadian doctor reports that his ME/CFS patients' fatigue only got 15% better following treatment B. From this information alone we wouldn't normally conclude that treatment A is 15 percentage points more effective than treatment B. We would require a proper randomized, controlled, blinded trial so that all possible confounders are controlled for.

    Meta-epidemiological studies, however, use the methodology sketched above: they take trials that look alike except for the characteristic of interest (blinding) and then subtract the treatment effects to see whether the characteristic is associated with inflated treatment effects. One obvious problem is that the trials being compared may differ in all sorts of other respects that influence the treatment effect. That's why I hope to have a look at the individual trials being compared, to better understand the results.
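
    As a minimal sketch of that subtraction logic (the effect sizes below are invented, not data from the paper):

    Code:
    # Meta-epidemiological contrast: pair up otherwise-similar trials that
    # differ only in whether patients were blinded, then subtract effects.
    blinded   = [0.20, 0.35, 0.10, 0.25]   # effect sizes, patients blinded
    unblinded = [0.45, 0.50, 0.30, 0.40]   # effect sizes, patients unblinded

    bias = [u - b for u, b in zip(unblinded, blinded)]
    print(f"estimated inflation: {sum(bias) / len(bias):.2f}")
    # The obvious weakness: any other systematic difference between the two
    # sets of trials ends up in this number too (confounding).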
     
    Woolie, Snow Leopard, Sean and 3 others like this.
  5. Adrian

    Adrian Administrator Staff Member

    Messages:
    6,563
    Location:
    UK

    Having looked at some of the ONS data for the SF36, I would be worried that it is not really comparable when looking at people with different severity. I did a cluster analysis looking at how question answers clustered around different scores, and the mid scores were a mess in terms of which difficulties people had, suggesting that the questions may be quite close in terms of the abilities they probe. There was more clarity at the edges, but I suspect that is down to their distance from the middle points.
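
    For illustration, a cluster analysis of this kind might look roughly like the sketch below (random stand-in data; the actual ONS extract and the method used above aren't specified):

    Code:
    # Hypothetical sketch: cluster respondents' answers to the ten SF-36
    # physical-function items (0 = limited a lot, 1 = limited a little,
    # 2 = not limited) and see how cleanly clusters separate by total score.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    answers = rng.integers(0, 3, size=(500, 10))   # stand-in survey data
    totals = answers.sum(axis=1)

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(answers)

    # Clean separation would give each cluster a narrow score range; heavy
    # overlap in the mid-range would show the "messy middle" described above.
    for c in range(4):
        s = totals[labels == c]
        print(f"cluster {c}: scores {s.min()}-{s.max()}")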

    I think the properties you need for many experiments will be similar, so reusable scales may be possible. But it would be good to see the required properties set out clearly, along with the evidence that a scale meets them and the effect if it doesn't (i.e. as it deviates, is the effect on results small or massive?). But formally stating required properties doesn't seem to be part of the culture with trials.

    I also think some sort of static analysis of protocols would be good, to check for common errors, bad assumptions, or likely unstated assumptions. This kind of thing gets done for code these days, and I think it could be extended to trials. But that would be a research topic for those who study methodology.
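
    As a sketch of what such a protocol "linter" could look like (the protocol fields and rules below are invented purely for illustration):

    Code:
    # Lint-style checks over a structured protocol description. Real tools
    # would parse registered protocols; this just shows the rule pattern.
    protocol = {
        "patients_blinded": False,
        "primary_outcome_objective": False,
        "outcome_changed_after_start": True,
    }

    RULES = [
        (lambda p: p["patients_blinded"] or p["primary_outcome_objective"],
         "unblinded trial relies on a subjective primary outcome"),
        (lambda p: not p["outcome_changed_after_start"],
         "primary outcome was changed after the trial began"),
    ]

    for passes, warning in RULES:
        if not passes(protocol):
            print("WARNING:", warning)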
     
    Sean, Robert 1973 and JohnTheJack like this.
  6. Hoopoe

    Hoopoe Senior Member (Voting Rights)

    Messages:
    5,424
    The sad truth appears to be that large portions of research done under a mental health banner, ostensibly to help patients, are junk. Patients are being taken advantage of by professionals who seem to have collectively agreed to pretend that methodology known with certainty to be unreliable is actually reliable. They can't offer any coherent arguments to justify what they're doing, and respond instead with distractions.

    There is no illness that can be objectively measured that meaningfully improves with placebo. Not ensuring that treatments delivered are better than a placebo is very negligent.

    Even with mental health conditions they could at least try to find the most reliable way to measure improvement. Instead they appear to be trying to find the therapies and methods that give the best results on biasable outcomes, thereby, in effect, gradually refining therapies to be more and more biased.
     
    Last edited: Jan 30, 2020
  7. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,175
    Location:
    London, UK
    yup
     
    Trish likes this.
  8. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,175
    Location:
    London, UK
    Indeed. I am constantly reminded of Emerson's 'A foolish consistency...'
     
    TrixieStix and Trish like this.
  9. dave30th

    dave30th Senior Member (Voting Rights)

    Messages:
    2,447
    I will have to look that up. I have no idea what that refers to. Is that Ralph Waldo?
     
    Lucibee and Trish like this.
  10. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,175
    Location:
    London, UK
    Really, David, I thought you were a well-read man of letters.

    Ralph Waldo Emerson: 'A [certain? - sources vary] foolish consistency is the hobgoblin of little minds.' He attributed it to statesmen and religious folk, but there is no lack of it in medicine and science.
     
  11. dave30th

    dave30th Senior Member (Voting Rights)

    Messages:
    2,447
    Ah yes, OK. The 'hobgoblin of little minds' part I recognize, although I wouldn't have been able to attribute it to Ralph.
     
    Trish likes this.
  12. Sly Saint

    Sly Saint Senior Member (Voting Rights)

    Messages:
    9,922
    Location:
    UK
     
  13. dave30th

    dave30th Senior Member (Voting Rights)

    Messages:
    2,447
    Thanks! I read a lot of Emerson in my American Romantics course in college (Emerson, Thoreau, Hawthorne, Melville, etc). That was in, uh, 1976. So, you know. A while ago.
     
    Sly Saint, Trish and JohnTheJack like this.
  14. Mithriel

    Mithriel Senior Member (Voting Rights)

    Messages:
    2,816
    Using patient self-report in a trial is always subjective, but in ME the situation is even worse.

    The SF36 gives numbers you can add up, with a yes or no for each section, so it has a certain objectivity. Patients are asked whether they can do things or not, so the answers should be some reflection of reality.
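
    Roughly, the kind of additive scoring being described looks like this (a simplified sketch with made-up items; the real SF-36 mixes yes/no and multi-level items and rescales each domain to 0-100):

    Code:
    # Count the "yes" answers and rescale: fixed numbers per answer, summed.
    answers = {"climb stairs": "yes", "walk 100m": "yes", "carry shopping": "no"}
    raw = sum(1 for a in answers.values() if a == "yes")
    print(f"raw={raw}, scaled={100 * raw / len(answers):.0f} / 100")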

    However, the Chalder fatigue scale does not even do that. Marking a score of 1 to 10 for how fatigued a person feels might reflect reality approximately, but asking people to compare how they feel with "usual" gives a subjective answer to a subjective questionnaire.

    The CFQ fails even the simplest definition of 'validated', namely that the questions can be understood. If they really want to know how we usually feel, do they mean usual after 12 months of treatment? But then a successful trial would give the same score at the end as at the beginning, because 'usual' would itself have become a lot better.

    But we are told they mean 'usual' to be what you felt when well - then why did they not say that? That fails too: I can't compare how I felt 52 years ago with now!

    I don't understand who validated this scale and thought it useful. Anyone who defends it is showing their ignorance and prejudice: they are accepting the word of authority without question, over the concerns and wellbeing of patients. I am sorry for their patients.
     
    chrisb, Sean, Snow Leopard and 2 others like this.
  15. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
    The trouble is that the numbering only gives an illusion of objectivity, a sort of scientists' comfort factor (certain scientists, anyway). Presuming that someone's perception of changes to their perceptions follows a nice linear numerical scale is a pretty fluffy presumption. And the way people's perceptions map onto reality will vary according to what sort of perceptions are being 'measured', so the mismatch between reality and the numerical scale also varies with what is being 'measured'. So not only are all these numbers pretty meaningless, the notion of then adding them all together gets even more fantastical. To me it all seems rather pretentious and not very scientific at all.
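
    A concrete version of the summing problem (made-up item scores): two patients in very different situations can land on exactly the same total.

    Code:
    patient_a = [3, 3, 3, 3]   # moderate difficulty with everything
    patient_b = [6, 6, 0, 0]   # severe on two items, fine on the rest
    print(sum(patient_a) == sum(patient_b))   # True: the sum hides the difference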
     
  16. chrisb

    chrisb Senior Member (Voting Rights)

    Messages:
    4,602
    Well said. It's just another example of the "babble" family.
     
    Barry likes this.
  17. Mithriel

    Mithriel Senior Member (Voting Rights)

    Messages:
    2,816
    I agree that trials should never be based on things like the SF36; they are not objective enough. I may be remembering it wrong, but I think that scale was designed to help doctors track how well cancer patients were doing, not as a research tool.

    What I meant was that trials which rely on subjective scales should make some attempt at objectivity by asking questions like "How often do you manage to leave the house in an average week?", which the patient can count up, or "Can you still do the sports you played before you were ill?", which can be answered yes or no.

    Then there are scales where you have to estimate how much pain or fatigue you have on a scale of 1 to 10, which are little better than guesses, so they take the results even further from anything objective.

    But the PACE trial and other BPS trials excel themselves by using things like the Chalder fatigue scale, where the patient has to guess what the question means before they guess at the answer. As far from objective as it is possible to get.
     
    Sean, Snow Leopard, Hoopoe and 2 others like this.
  18. Barry

    Barry Senior Member (Voting Rights)

    Messages:
    8,420
    Yep :D
     
    Mithriel and Snow Leopard like this.
  19. ME/CFS Skeptic

    ME/CFS Skeptic Senior Member (Voting Rights)

    Messages:
    4,002
    Location:
    Belgium
    Robert 1973, Esther12 and Trish like this.
  20. Jonathan Edwards

    Jonathan Edwards Senior Member (Voting Rights)

    Messages:
    15,175
    Location:
    London, UK
    Seems very fence-sitting to me.
    The MetaBLIND paper is based on too poor a methodology to be worth considering or repeating, in my view, and Bastian agrees that the person she quotes as saying so has a good point. But then she ums and ahs.
     
