The Rise and Fall of Peer Review, 2022, Mastroianni

Keela Too

Senior Member (Voting Rights)
https://experimentalhistory.substack.com/p/the-rise-and-fall-of-peer-review

The rise and fall of peer review
Why the greatest scientific experiment in history failed, and why that's a great thing
Adam Mastroianni Dec 13

“For the last 60 years or so, science has been running an experiment on itself. The experimental design wasn’t great; there was no randomization and no control group. Nobody was in charge, exactly, and nobody was really taking consistent measurements. And yet it was the most massive experiment ever run, and it included every scientist on Earth.”

Worth a read. ME isn't mentioned, but many of the points made are similar to the situation we find ourselves in.

PS I was able to read the whole article on my phone in Reader View, but when I turned that off, it asked me to subscribe. :thumbsdown:

Edit to add. This is the Tweet I saw.
 
My mind is blown by the novelty of his points, and the clarity with which he expresses, then demonstrates, them. He makes some solid points:
  • The best way to review science is to reproduce it. Carefully analyzing the data is second best. Peer review is more like a smoke test--"Does it smoke when you plug it in? Good enough." If it's not reviewed, but it's reproduced thoroughly, does it matter?
  • My own point: This is what we do in hard science. We don't rely on peer review. If we doubt how another team measured the gravitational constant, let's build our own experiment that's better.
  • The language in scientific papers is often thick and impenetrable, a senseless stumbling block to comprehension. You can explain complex concepts with simple language: https://xkcd.com/1133/ is a humorous example.
  • Almost all papers bury the authors' feelings and opinions in the same linguistic thicket. We can live a little. Science can be amusing. Check out this two-sentence mathematics paper.
  • Science can be humble. Explaining why you came up with a hypothesis, or the thought processes that led you from experiment to experiment, can inspire further research. "I expected to get X result but I was dead wrong. Then I started investigating why." Exploring your work's limitations ("This only proves this part of my hypothesis") has the same effect.
  • Statistical analysis is overrated. Strong results don't need p-values. P-values and confidence intervals only protect against random noise, not bias (see the sketch just after this list).
  • "But science is a strong-link problem: progress depends on the quality of our best work." I keep saying this! Quality over quantity. A few definitive studies beat preliminary chaff.
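To illustrate that statistical point about bias, here's a minimal simulation sketch of my own (not from the article; the numbers are made up): a systematic bias, such as unblinded, subjective outcome rating, yields a tiny p-value even when the true effect is zero, because a p-value only guards against random sampling noise.

```python
# Hypothetical illustration: a p-value can't tell bias apart from a real effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_effect = 0.0   # no real difference between groups
bias = 0.3          # systematic error, e.g. unblinded, subjective outcome rating
n = 500             # participants per group

control = rng.normal(0.0, 1.0, n)
treatment = rng.normal(true_effect + bias, 1.0, n)  # bias masquerades as an effect

t, p = stats.ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.2g}")  # "highly significant", yet the finding is an artifact
```

The test has no way of knowing whether that 0.3 came from the treatment or from the measurement process; only better design (blinding, objective outcomes, replication with different methods) can tell those apart.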
 
  • The best way to review science is to reproduce it. Carefully analyzing the data is second best. Peer review is more like a smoke test--"Does it smoke when you plug it in? Good enough." If it's not reviewed, but it's reproduced thoroughly, does it matter?
Compared to quality control in software development, peer review is equivalent to what is called a "sanity test": the lowest form of basic test, run automatically on a constant basis. It just checks the basics and amounts to very little.
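For anyone outside software, a hedged sketch of what such a sanity/smoke test looks like in code (the names here are invented for illustration, not from any real project):

```python
# Hypothetical smoke/sanity test: it only checks that the code runs at all and
# returns something of the right type -- nothing about whether the answer is
# correct, or how it behaves on messy real-world inputs.
def analyze(data):
    """Stand-in for a real analysis pipeline."""
    return sum(data) / len(data)

def test_smoke():
    result = analyze([1.0, 2.0, 3.0])
    assert isinstance(result, float)  # "Does it smoke when you plug it in? Good enough."
```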

In real life, most of the quality control is about what comes after that: how you handle real-life circumstances testing the hell out of your product, in ways no tester will ever think of. It also involves customer service, which medicine categorically refuses to do, and which explains a lot.

Quality control is a process, just like science. It's not a step; it's a continuous effort where there is ownership of how something is used in the real world, not just in an artificial testing environment. But peer review is treated as a step, and there's basically nothing else after it.

So basically science has a fully optional quality control process; how much of it happens depends entirely on the discipline. In medicine it's turned off entirely, because once something starts getting used it becomes nearly impossible to criticize anymore: it takes on a life of its own, and the people who use it get very personal about it, for odd reasons. Because if you find that it's wrong, then it means harm was done, and medicine is not able to deal with that.

The chronic illness disaster is the best example of how this system fails miserably. It's the place where all the worst features are maximized and none of the good parts of science are allowed. And yet somehow it's held up as the highest level of evidence purely out of belief, showing how performative most of the system really is. Science works through experimentation and objective measurements. The rest is just screwing around. And EBM has neither, so it's almost entirely a bunch of screwing around with nothing to show for it.

I've come around to thinking of intelligence as a mostly brute-force process. It works by the sheer mass of having many people working on different problems. But the academic system has become too rigid for that. It's like a brute-force search for the one key among millions that will unlock a particular lock, except they keep trying the same 10 or so keys and ritualize the process of using them. It transforms a useful process into the most useless one imaginable.
 
Loved this article.

Nothing is done about this, because many are comfortable with the status quo.

They can wield their bias, and block excellent papers from publication. They can do the same for funding applications. They can get their own flawed studies published in prestigious journals, and carry on with this very unjust, erroneous process that affects millions of lives.
 
My mind is blown by the novelty of his points, and the clarity with which he expresses, then demonstrates, them. He makes some solid points:
  • The best way to review science is to reproduce it. Carefully analyzing the data is second best. Peer review is more like a smoke test--"Does it smoke when you plug it in? Good enough." If it's not reviewed, but it's reproduced thoroughly, does it matter?
  • My own point: This is what we do in hard science. We don't rely on peer review. If we doubt how another team measured the gravitational constant, let's build our own experiment that's better.
  • The language in scientific papers is often thick and impenetrable, a senseless stumbling block to comprehension. You can explain complex concepts with simple language: https://xkcd.com/1133/ is a humorous example.
  • Almost all papers bury the authors' feelings and opinions in the same linguistic thicket. We can live a little. Science can be amusing. Check out this two-sentence mathematics paper.
  • Science can be humble. Explaining why you came up with a hypothesis, or the thought processes that led you from experiment to experiment, can inspire further research. "I expected to get X result but I was dead wrong. Then I started investigating why." Exploring your work's limitations ("This only proves this part of my hypothesis") has the same effect.
  • Statistical analysis is overrated. Strong results don't need p-values. P-values and confidence intervals only protect against random noise, not bias.
  • "But science is a strong-link problem: progress depends on the quality of our best work." I keep saying this! Quality over quantity. A few definitive studies beat preliminary chaff.


To combine a few of these when thinking specifically of the ME/CFS BPS literature: the problem is 'getting stuck in the loop' of replicating really poor methods without updating them.

So, not critiquing the design or logic, because they seem to have misunderstood that replication is normally used where there aren't major issues with bias, or with validity (as in a lot of their research), rendering the results unreliable.

Replication is about checking that a good result wasn't a coincidence. It isn't about repeating the same method and its flaws precisely, so as to reproduce the same patterns of bias, but about testing from different angles to remove them. Putting in a blinded control wouldn't have stopped it from being a test of the treatment approach.
 

:thumbup:

I have read that innovative studies are more favored for funding. Not so with BPS work.
 
A current example of what may turn out to be a major failure of peer review is in the thread Apparent risks of postural orthostatic tachycardia syndrome diagnoses after COVID-19 vaccination and SARS-Cov-2 Infection.

The currently top-voted comment on one YouTube video where this paper is discussed is —

This study had 15 authors, 3 reviewers, 2 POTS experts writing an editorial on it, and it was published in the best journal. It came out of Cedars-Sinai, a medical system with an IRB, manuscript review process, and lots of prestige. It was amplified by WebMD, NBC News, AMA, Eric Topol, and countless others. And it is 100% false. [...] If the rest of peer-review medical research resembles this in any way, well, draw your own conclusions. This is a watershed failure of the system. If you thought the rumblings of medical corruption & incompetence were overblown, here is your evidence that they were not. Bias overwhelmed the system. Go read the original paper and the 2 editorials that amplified it; wonder at the audacity of their statements, all unfounded.
 
I just watched the YouTube critique. This man is my new hero! He went at my non-scientist speed with his explanation and I understood the whole lot without glazing over and giving up.

It looks like such a basic mistake I don't understand why it wasn't picked up before publication, let alone by peer review. Is it because there's such a lot of data in these studies that even the authors get confused with the logic and calculations?
 
Don't want to turn this thread into RetractionWatch but one more current example.

‘We made a mistake.’ Omicron origin study retracted after widespread criticism (Science)

The paper drew criticism almost from the moment it was published, and some scientists say the problem could have been avoided if the study had been posted as a preprint first, allowing independent scientists to comment. “This would have been slaughtered on Twitter within a few days of being on preprints,” says Aris Katzourakis, an evolutionary virologist at the University of Oxford.

The paper’s critics say the mistakes should have been caught in peer review. “Some hard questions certainly need to be asked,” Andersen says.
 
I think the big problem is that peer review is done by peers!

People in any specific area of research all have a collaborative approach to taking things forward along the lines they think correct.

These are the peers who get asked to review, because they are "knowledgeable" in the area.

So if “your” research produces the outcomes that support the current collective view of that group, what incentive is there for “me” to criticise your work? I wouldn’t want to start undermining the whole foundations of the work we do, nor would I want “you” to later examine my projects with the same scrutiny.

So methods that might be dubious pass peer reviews, and the more often they do, the more they become accepted as standard, and the harder it becomes to critically assess them.

Group-think thus gets given authority, which becomes more and more difficult to question as time goes on, and as the original authors become increasingly senior and established.

Meh
 
I think there is serious value in a culture of semi-formal preprint review. Probably one of the more important developments in science in recent years. Done right it could save us all a lot of time and energy, and heartache. Could also be a good, and mutually beneficial, training ground for emerging scientists, citizen and professional.
 
I think the big problem is that peer review is done by peers!

People in any specific area of research all have a collaborative approach to taking things forward along the lines they think correct.

These are the peers who get asked to review, because they are "knowledgeable" in the area.

So if “your” research produces the outcomes that support the current collective view of that group, what incentive is there for “me” to criticise your work? I wouldn’t want to start undermining the whole foundations of the work we do, nor would I want “you” to later examine my projects with the same scrutiny.

So methods that might be dubious pass peer reviews, and the more often they do, the more they become accepted as standard, and the harder it becomes to critically assess them.

Group-think thus gets given authority, which becomes more and more difficult to question as time goes on, and as the original authors become increasingly senior and established.

Meh

Someone who has been in academia over the last few decades might be able to shed more light from their experience, but I know that the change in the UK from the RAE (Research Assessment Exercise) to the REF (introduced in 2008) changed the measures that matter. These exercises only happen every seven years, but as the results stick with the department and often influence funding, they have a big effect on how academics are trained to do papers. Peer review also often happens internally to some extent in a lot of places, and you can imagine that this, along with hierarchy issues, has an influence (because you are following one or the other, or both, in your checking).

The REF has adapted a bit since, but the RAE used to be more methodologically focused and objective about 'quality'. The REF introduced 'citations', i.e. how much your paper is cited by others, and 'impact', which relates to the section at the end where researchers write about the relevance of their work in layperson's terms, often outside their area of expertise or beyond what they've just investigated (someone might be good at getting the method right and spend a year on that, but then be riffing on how it might be used).

Quote below from the following page: https://www.ref.ac.uk/about-the-ref/what-is-the-ref/

The REF is a process of expert review, carried out by expert panels for each of the 34 subject-based units of assessment (UOAs), under the guidance of four main panels. Expert panels are made up of senior academics, international members, and research users.

For each submission, three distinct elements are assessed: the quality of outputs (e.g. publications, performances, and exhibitions), their impact beyond academia, and the environment that supports research.

That impact section is particularly used and enjoyed by PR people and journalists because the language is easy, which sort of defeats the point of spending all that time getting the actual investigation right if laypeople only read that last bit.

Public engagement is also being encouraged a lot more. All of these have potential benefits in the right circumstances, but in a game-playing/box-ticking context they could skew things.

Citation counting was spotted as an issue at the time because, by its very nature, if you have a network all citing each other and throwing out hundreds of lightweight papers versus a more niche area doing in-depth work, that measure really plays into your hands. It also encourages the back-scratching, don't-upset-anyone problem, because you need their citations as much as they need yours.

I don't know how the BPS researchers get away with not touching on or reviewing the biomedical work on ME as it comes through, but I suspect that by dismissing it as 'nothing' they also avoid having to cite it. It's not much of a 'debate', is it, if they never cite the other side or base their suggestions on findings. That attitude, plus hiding raw data and not presenting it in tables, blows out of the water the idea of a literature getting to the bottom of theories by working up findings that either fit or don't fit and amending the model.
 
I think there is serious value in a culture of semi-formal preprint review. Probably one of the more important developments in science in recent years. Done right it could save us all a lot of time and energy, and heartache. Could also be a good, and mutually beneficial, training ground for emerging scientists, citizen and professional.

In theory, but I can also see how that could play to populism. I don't know how (or whether it would be desirable) to limit who comments or what they comment on, so that it doesn't end up being a quick skim and a vote on whether you find it interesting or are convinced by the rhetoric. You are probably correct that there would be citizen scientists and so on, but you only have to look at the popularity of the storytelling books for BPS to imagine that this might not end up where you'd hope unless it is focused on measures, validity and so on.

Marking schemes are a bit of an issue, and a way of focusing things on broad principles: people shouldn't be getting their voice heard on the narrative bit if it doesn't marry with what is found, or if the method and design are not valid.

Done right, however, it could be a much better way of 'getting the public involved in science' and 'making the taxpayer feel that what they are paying for isn't someone's niche interest of no use to anything whatever' (which I think was perhaps part of the sentiment behind putting in citations, impact and public engagement).

I'd also hesitantly suggest - and hope it isn't naive - that while open data-sharing is done by some scientists but isn't possible for all, some sort of protocol requiring tables to be presented in full, as per the primary outcomes and protocols, rather than just a number from an end calculation and the odd 2-3 out of 100 'findings' picked out, would also mean that the literature was accessible as a moving entity.

People could see, for example, whether one paradigm was getting closer, or whether it was just money for old rope, finding significance that merely replicates 'placebo' (and I hate that term, as I believe it often represents a 'trial effect', including the behaviour of staff, more than the nonsense attributed to subjects). The current random layouts in certain units/subject areas, with space given to pseudo-philosophy over data and plain graphs that would let people see what happened, prevent anyone from connecting with this at all, and put the public at arm's length from actual science, with less access to the information they theoretically paid to create or find.

I agree that, as things currently stand, I'm not sure those assigned to do the reviews are well positioned to feel they can eviscerate the work of someone more powerful. And of course people are trained in how to do papers by those who are currently doing them, so it is cyclical.
 
I think the big problem is that peer review is done by peers!

People in any specific area of research all have a collaborative approach to taking things forward along the lines they think correct.

These are the peers who get asked to review, because they are "knowledgeable" in the area.

So if “your” research produces the outcomes that support the current collective view of that group, what incentive is there for “me” to criticise your work? I wouldn’t want to start undermining the whole foundations of the work we do, nor would I want “you” to later examine my projects with the same scrutiny.

So methods that might be dubious pass peer reviews, and the more often they do, the more they become accepted as standard, and the harder it becomes to critically assess them.

Group-think thus gets given authority, which becomes more and more difficult to question as time goes on, and as the original authors become increasingly senior and established.

Meh
The last few years have clearly shown us that peer review is more about cultural compliance than substance. Popular but wrong = popular. Unpopular but right = wrong.

The original mistake here is thinking about quality control as a step, rather than a process. There is no perfect process, especially one that doesn't allow for corrections. Real scientists keep poking and breaking ideas. Pseudoscientists rubberstamp everything in the expectation of the same.
 
I just caught up with this article today, and very much liked it. I also enjoyed the self-published paper on some of his latest research, which, as he says, is far more readable than if he'd put it into psych jargon for journal publication.
 
I just watched the YouTube critique. This man is my new hero! He went at my non-scientist speed with his explanation and I understood the whole lot without glazing over and giving up.

It looks like such a basic mistake I don't understand why it wasn't picked up before publication, let alone by peer review. Is it because there's such a lot of data in these studies that even the authors get confused with the logic and calculations?


Sorry to make a little diversion... Thank you very much @oldtimer! My two statistics classes go back several decades, and this doctor explains wonderfully well why the conclusion of this article is outrageously distorted.
 

He explains the most outstanding problem, perhaps.
But as far as I can see he has accepted other aspects of the study as valid which look extremely shaky to me. He seems happy that the true incidence of 'POTS' increased in both vaccination and infection groups and that this must be due to the vaccine or infection. I don't think you can draw any conclusion like that.

The recent health context has been very complicated and rates of diagnosis of POTS may be changing for all sorts of spurious reasons. The simplest suggestion is that with all the media coverage doctors are more aware of the commercial potential of diagnosing POTS. Another is that doctors have been alerted to POTS. And one can go on for ever.
 