Rating scales institutionalise a network of logical errors and conceptual problems in research practices, 2022, Uher

SNT Gatchaman

Rating scales institutionalise a network of logical errors and conceptual problems in research practices: A rigorous analysis showing ways to tackle psychology’s crises
Jana Uher

This article explores in-depth the metatheoretical and methodological foundations on which rating scales—by their very conception, design and application—are built and traces their historical origins. It brings together independent lines of critique from different scholars and disciplines to map out the problem landscape, which centres on the failed distinction between psychology’s study phenomena (e.g., experiences, everyday constructs) and the means of their exploration (e.g., terms, data, scientific constructs)—psychologists’ cardinal error.

Rigorous analyses reveal a dense network of 12 complexes of problematic concepts, misconceived assumptions and fallacies that support each other, making them difficult to identify and recognise for those (unwittingly) relying on them (e.g., various forms of reductionism, logical errors of operationalism, constructification, naïve use of language, quantificationism, statisticism, result-based data generation, misconceived nomotheticism).

Through the popularity of rating scales for efficient quantitative data generation, uncritically interpreted as psychological measurement, these problems have become institutionalised in a wide range of research practices and perpetuate psychology’s crises (e.g., replication, confidence, validation, generalizability).

The article provides an in-depth understanding that is needed to get to the root of these problems, which preclude not just measurement but also the scientific exploration of psychology’s study phenomena and thus its development as a science. From each of the 12 problem complexes, specific theoretical concepts, methodologies and methods are derived, as well as key directions of development.

The analyses—based on three central axioms for transdisciplinary research on individuals, (1) complexity, (2) complementarity and (3) anthropogenicity—highlight that psychologists must (further) develop an explicit metatheory and unambiguous terminology as well as concepts and theories that conceive individuals as living beings, open self-organising systems with complementary phenomena and dynamic interrelations across their multi-layered systemic contexts—thus, theories not simply of elemental properties and structures but of processes, relations, dynamicity, subjectivity, emergence, catalysis and transformation.

Philosophical and theoretical foundations of approaches suited for exploring these phenomena must be developed together with methods of data generation and methods of data analysis that are appropriately adapted to the peculiarities of psychologists’ study phenomena (e.g., intra-individual variation, momentariness, contextuality). Psychology can profit greatly from its unique position at the intersection of many other disciplines and can learn from their advancements to develop research practices that are suited to tackle its crises holistically.

Link | PDF
 
It's a long paper.

This leaves but one conclusion: Unless ratings are removed from psychology’s portfolio of research methods, its recurrent crises (e.g., replication, confidence, validation and generalisability) cannot be tackled. Ratings may be useful for pragmatic purposes in applied fields, but they preclude measurement and—far more importantly—they preclude the scientific exploration of psychology’s study phenomena and thus its development as a science.
 
I've only read the abstract and the bit quoted in the second post.

My solution doesn't require meta-anything.

If something's not working, scrap it.

Psychology has spent 50+ years trying to pretend to be a science with numerical data that is amenable to statistical analysis, and drawing all sorts of erroneous conclusions that affect real people's lives, as we've experienced to our cost.

It wasn't too bad back in the days before computers - when realistically all you could do was ask a few questions, draw some graphs by hand and do a simple statistical test on a single set of numbers. At least everyone could see what was being done, and most people would probably conclude that human thoughts and actions are too complicated to be turned into single numbers.

Now, with the huge expansion of psychology as a 'science', anyone can make up a set of questions, allocate scores to the answers that suit their prejudices, 'validate' them against someone else's set of questions, use social media to collect vast quantities of data from willing participants, stuff it into a stats package that spits out hundreds of results they don't understand, and pretend to be doing science.

I say - junk all psych questionnaires.
 
This leaves but one conclusion: Unless ratings are removed from psychology’s portfolio of research methods, its recurrent crises (e.g., replication, confidence, validation and generalisability) cannot be tackled.
If something's not working, scrap it.
The author seems to be thinking along the same lines as us.

Rigorous analyses reveal a dense network of 12 complexes of problematic concepts, misconceived assumptions and fallacies that support each other, making them difficult to identify and recognise for those (unwittingly) relying on them (e.g., various forms of reductionism, logical errors of operationalism, constructification, naïve use of language, quantificationism, statisticism, result-based data generation, misconceived nomotheticism).
I have no idea what at least one of those problems is, but I'm glad that the author has identified them and has bothered to write a paper suggesting that the field of psychology has major problems.


The language might be a bit waffly
Philosophical and theoretical foundations of approaches suited for exploring these phenomena must be developed together with methods of data generation and methods of data analysis that are appropriately adapted to the peculiarities of psychologists’ study phenomena (e.g., intra-individual variation, momentariness, contextuality).
but they seem to be recognising the problems we have talked about here - things like how a 'do you still enjoy the activities you used to do?' question doesn't work to diagnose depression when someone has a disabling illness. They seem to be calling for sensible context-driven thinking.


I'm not sure that I can muster the enthusiasm to read the paper, but the philosophical and theoretical foundations the authors talk about might include finding objective outcomes that have meaning for people when studying psychological interventions. I think this paper could be a useful reference when talking about the problems of specific BPS papers. Maybe the '12 complexes of problematic concepts, misconceived assumptions and fallacies' can be a checklist to tick off.
 
Failure to distinguish the study phenomena from the means of their exploration—here called psychologists’ cardinal error—is reflected in many practices and jargon established in psychology. [...] This logical error has serious implications for entire research programmes because it makes the distinction of disparate elements of research technically impossible, thereby distorting basic conceptions and procedures of science.

The paper itself is jargon-heavy (natch) but discusses the failings within a framework of 12 concepts

The present analyses are based on the Transdisciplinary Philosophy-of-Science Paradigm for Research on Individuals. The TPS-Paradigm is targeted at making explicit the most basic assumptions that different disciplines (e.g., psychology, biology, medicine, social sciences, physical sciences, metrology) make about research on individuals involving phenomena from all domains of life (e.g., abiotic, biotic, psychical, socio-cultural). Their holistic investigation, necessitated by their joint emergence in the single individual, poses challenges because different phenomena require different epistemologies, theories, methodologies and methods, which are based on different and even contradictory basic assumptions.

The 12 sections are —

1. Psychologists’ own role in their research: Unintended influences
2. Beliefs in researchers’ objectivity: Illusions of scholarly distance
3. Mistaken dualistic views: Individuals as closed systems
4. Lack of definition and theoretical distinction of study phenomena: Conceptual conflations and intersubjective confusions
5. Reductionism: Category mistakes, atomistic fallacy and decontextualisation
6. Operationalism: Logical errors and impeded theory development
7. Constructification: Studying constructs without also studying their intended referents
8. Naïve use of language-based methods: Reification of abstractions and studying merely linguistic propositions
9. Variable-based psychology and data-driven approaches: Overlooking the semiotic nature of ‘data’
10. Quantificationism: Numeralisation instead of measurement
11. Statisticism: Result-based data generation, methodomorphism and pragmatic quantification instead of measurement
12. Nomotheticism: Sociological/ergodic fallacy and primacy of sample-based over case-by-case based nomothetic approaches
 
I think hidden in all that jargon there is a core message that is sound, namely that psychology based on questionnaire data has gone horribly wrong. Maybe the author thinks the only way to reach the psychologists trapped in a mire of obfuscation, data mishandling and pretentious waffle is to waffle back even harder at them.
[Image: cartoon of a cute waffle-cone soldier fighting with a sword and shield]
 
Trish got there first --- why don't they just look at Brian Hughes' blogs and then design experiments that are (relatively) sound --- I'm not noted for brevity, but they could have encapsulated it in a paragraph or even asked Brian for permission to use that cartoon ---
"apart from the lack of blinding, the subjective outcome indicators the ------ it's not that bad a study".
 
The paper itself is jargon-heavy (natch) but discusses the failings within a framework of 12 concepts
Thanks for venturing in so others didn't have to.

12. Nomotheticism: Sociological/ergodic fallacy and primacy of sample-based over case-by-case based nomothetic approaches
the paper said:
This entails the sociological fallacy, which arises from the failed consideration of individual-level characteristics when drawing inferences regarding the causes of group-level variability (Diez Roux, 2002).

This inferential fallacy required axiomatic acceptance of ergodicity, a property of stochastic processes and dynamic systems, which presumes isomorphisms between inter-individual (synchronic) and intra-individual (diachronic) variations.

:rofl: I think that is saying that Error 12 is that generalisations are made at population level and then it is assumed that such generalisations always and fully apply at the individual level.
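For anyone who wants to see the ergodicity point concretely, here is a toy simulation in Python (my own sketch, not from the paper, with made-up numbers for an imaginary activity/fatigue questionnaire). It shows how an association pooled across people can point in the opposite direction from the association within any one person:

```python
# Toy illustration of the ergodic fallacy (my own sketch, not from the paper):
# the association pooled across people can differ from, or even reverse,
# the association within one person, so group-level findings need not
# describe what happens to any given individual.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_days = 200, 50
records = []

for person in range(n_people):
    baseline_activity = rng.normal(50, 10)            # stable between-person level
    # Between people: habitually more active people report LESS fatigue...
    baseline_fatigue = 80 - 0.8 * baseline_activity + rng.normal(0, 5)
    for day in range(n_days):
        activity = baseline_activity + rng.normal(0, 5)
        # ...but within a person, a more active day brings MORE fatigue.
        fatigue = baseline_fatigue + 0.8 * (activity - baseline_activity) + rng.normal(0, 2)
        records.append((person, activity, fatigue))

data = np.array(records)
pooled = np.corrcoef(data[:, 1], data[:, 2])[0, 1]        # everyone lumped together
person0 = data[data[:, 0] == 0]
within = np.corrcoef(person0[:, 1], person0[:, 2])[0, 1]  # one individual's days

print(f"Pooled (between-person) correlation: {pooled:.2f}")   # comes out negative
print(f"Within-person correlation:           {within:.2f}")   # comes out positive
```

So a group-level finding of 'more activity goes with less fatigue' tells you nothing about what a more active day does to any particular individual.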


Having skimmed some of the paper, I now must go and like the post with the waffle picture. I think the author might have been more effective if they had given some examples of each of the problems they wanted to talk about.
 
Finally some people willing to say it. The reason psychology has adopted this is because it gives them the answers they want, it's that simple. And for that reason alone it's invalid. If you're not measuring what you think you're measuring, even more if you're not actually measuring things and instead use ratings, then you aren't doing science. Period. There is no scientific hack that can put science back into fake numbers.

It would have been bad enough on its own, but the application of this to medicine has been even more catastrophic, not just for the massive harm done, but for having stagnated the foundations of medicine. The errors are all conceptual; they carry the fatal flaw of being potentially completely invalid, which makes everything downstream (and when applied to concepts, that is literally everything) also completely invalid.

The error here is not a flawed concept, it's that the entire discipline is OK with conceptual errors and in fact escalates commitment to them as more of their work inevitably has to be considered invalid. Point #1 is basically most of it: the problem of wanting something to be true more than to actually make sure it is.

The fact that this is the present and future of medicine puts major urgency into reforming this. It will only get worse, which will reinforce the natural politics and bickering that stifle medical progress so completely they ended up betting their entire future on flawed magical thinking.

Numeralisation is a good description of this approach of doing fuzzy math on feelings. It creates the illusion that there are valid numbers to fiddle with, when actually it makes the whole thing even less objective. Having been reading this for years, the papers, the studies, the questionnaires, none of what they claim to represent means anything to me; they are useless.

But extending this means invalidating most of "evidence-based medicine", and certainly everything BPS. I suspect medicine will be far more zealous in this regard than psychology, and probably keep this error going if and after psychology finally grows up about it. So much embarrassment. All for good, the best outcome for patients, but horrible for giant egos who made catastrophic errors of judgment because everyone was doing them, never thinking about the consequences.
 
Thanks, @SNT Gatchaman, I actually read the paper! Some interesting thoughts if you can wade through the jargon - not entirely new, but there are some nice references to other stuff in there that I've made a note of.

The waffle is the philosophy type, not the psychology type. I think the author may have a philosophy background (or at the very least has ambitions to be a philosopher).

Another reason the writing style put me off is that it made use of bald, unsubstantiated claims without supporting argument or evidence, e.g. "Psychology’s core constructs (e.g., mind, behaviour, actions) are poorly defined; common definitions are discordant, ambiguous, overlapping and circular". Is this claim true just because the author says it is? Or because it's a known fact and everyone agrees on it? Or because bald negative statements don't require any further justification - they are somehow true by definition?

On the plus side, there were nice reminders that self-report ratings are likely to be influenced by a number of spurious factors:

* Interpreting the item/question. Some previous researchers have emphasised that responding to an item isn't done in a vacuum, but involves building a model of what the researcher might mean by the item/question. The respondent must also decide on the meaning of each of the key words (e.g. what do they mean by "worry"?)

* Interpreting the rating scale labels. Deciding which actual rating to choose involves interpreting what the terms mean relative to some internal standard the person has. Some descriptors - such as often/rarely - involve comparing event frequency over a time period (which can be subject to recall biases, see below). Some descriptors involve comparing oneself to some model of what the person thinks other people do or feel (e.g. rating the severity of fatigue). Although it's not mentioned in the paper, these types of ratings can be vulnerable to recalibration bias (when you take part in a treatment that encourages you to see your pain as more common and widespread than you previously thought).

* Timeframe and current context. On scales that ask people to describe events that occurred over the past few minutes (e.g. pain ratings), we might see strong context effects - events occurring just prior to or during that period might strongly influence ratings. At the other end of the spectrum, scales that ask about the last few months require accurate recall of both confirming and disconfirming instances and an evaluation of their relative frequency. Most cognitive psychologists agree that memory doesn't really work in this way - when we look back over a period, we remember only salient or unusual events. Plus our current situation can massively bias our recall of the past - if a person is currently in pain they may recall many previous similar experiences, but if a person is feeling well at the time of the survey, they might recall far fewer negative experiences.

This doesn't even touch on the bigger issues of expectation and the way interventions can shape rater behaviour.

I thought a lot of the rest of the paper was not helpful in clarifying the real issues. The "cardinal error" thing just seemed to be phrased at a level too general for it to be held to account for any of its claims. While "personality" is an everyday concept, and so is "intelligence", many of the things we study in psychology are not at all related to everyday folk psychological concepts, and are not attempting to address those kinds of concepts in any way.
 
PS On re-reading, this point seemed interesting (I've edited out some of the jargon):
Uher paper said:
Rating items are often thought to reflect standardised meanings... or are equated with the phenomena they describe (signifier–referent conflation; Table 1)... raters may consider in their ratings different meanings, thus also different phenomena than intended by researchers... Researchers often conceive item responses as verbal behaviours, mixing up raters’ semantically guided meaning construction, everyday beliefs and hand movements for ticking off answer boxes... leading to just pseudo-empirical findings.

Rating-based research runs the risk of studying just linguistic propositions and the constructs designated ... both of which are often mistaken for the concrete phenomena to which they are intended to refer, thereby also often conflating description with explanation...
After wading through the jargon, I came out with the point that some decent proportion of the variation we measure with rating scales might not be genuine variation in the experiences or behaviours of the person completing the scale; it might be variation in the way the person interprets the item.
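A back-of-the-envelope illustration of why that matters (my own toy numbers, not the paper's): if people differ about as much in how they read an item as they do in the thing the item is supposed to be about, then less than half of the spread in scores reflects the phenomenon at all.

```python
# Toy sketch (mine, not the paper's): how much of the spread in ratings can come
# from people interpreting the item differently rather than from differences in
# the thing the item is meant to capture. All numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

true_experience = rng.normal(0, 1.0, n)   # the phenomenon we hope to capture
interpretation  = rng.normal(0, 1.0, n)   # each person's stable reading of the item wording
momentary_noise = rng.normal(0, 0.5, n)   # mood, context, carelessness on the day

rating = true_experience + interpretation + momentary_noise

share = np.var(true_experience) / np.var(rating)
print(f"Share of rating variance due to the phenomenon itself: {share:.0%}")
# With these made-up settings, under half the variance reflects the experience
# the item was written to measure.
```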

This point might be picking up on something interesting too:
Uher paper said:
To enable between-case comparisons, psychometricians develop rating ‘scales’ enabling the generation of scores that differentiate well (discrimination) and consistently (reliability) between cases and in ways considered meaningful (validity), such as by selecting items that produce norm-distributed values, show desired levels of item difficulty and item discrimination, or coherent score distributions across different items used for the same construct. But this adapts methods and results to statistical criteria and theories rather than to properties of the actual study phenomena ... thus enabling only result-dependent data generation but not measurement.
So for example (and this paper could do with a whole lot of examples), the popular five-factor model of personality might have arisen, not because there are five big factors that account for much of the variation in personality, but because the rating questions/items we have thought to ask, and that meet the statistical criteria we require for inclusion, fall into five broad categories.

Another interesting issue that isn't talked about here is that the choice of questions is not theoretically neutral - we pick questions that we think represent aspects of personality that are real, then we pare them down to those that generate good statistical curves, and then we analyse their intercorrelations. But the factors that emerge will depend heavily on our assumptions before we started.
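As a rough illustration of that last point (again my own sketch, not anything from the paper): if responses are really driven by three underlying traits but we only wrote items tapping two of them, the analysis can only ever hand us back two factors, however large the sample.

```python
# Toy sketch (my own, not from the paper): the factor structure you recover is
# bounded by the items you chose to write. Responses here are driven by three
# latent traits, but the questionnaire only contains items for two of them, so
# the analysis reports two dominant components and the third trait is invisible.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_people = 2_000

traits = rng.normal(size=(n_people, 3))   # three "real" latent dimensions

# Ten items each for traits 0 and 1; no items were ever written for trait 2.
loadings = np.zeros((20, 3))
loadings[:10, 0] = 0.8
loadings[10:, 1] = 0.8

items = traits @ loadings.T + rng.normal(scale=0.5, size=(n_people, 20))

pca = PCA().fit(items)
print("Variance explained by first 4 components:",
      np.round(pca.explained_variance_ratio_[:4], 2))
# Two components dominate; nothing in the item intercorrelations can reveal the
# trait we never asked about, however carefully we analyse them.
```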
 