Key points
• You've got to Ac-Cent-Tchu-Ate the Positive
• Eliminate the negative
• Latch on to the affirmative
• Don't mess with Mister In-Between
(Ac-Cent-Tchu-Ate the Positive, a 1944 song by Harold Arlen and Johnny Mercer)
Karl Popper is celebrated for his lucid conception of science as a creative enterprise, the intellectual fruits of which are pruned and ripened by perpetual challenge (1). He argued that scientists do not distill knowledge from accumulated facts. Instead, by leaps of imagination they develop hypotheses that go beyond what has yet been observed, hypotheses that yield new predictions by which they can be tested. Most hypotheses will be found wanting. By revealing anomalies, by exposing contradictions, by refuting hypotheses, we close off blind alleys in our understanding, and the few hypotheses that survive determined, rigorous challenge will guide the progress of science.
This notion became known as the hypothetico-deductive method of scientific inquiry.
But in 2010, Daniele Fanelli studied a large sample of papers from diverse fields – from physics and chemistry to psychology and social sciences. He analysed papers that “by declaring to have tested a hypothesis, had placed themselves at the research frontier of all disciplines and explicitly adopted the hypothetico-deductive method of scientific inquiry, with its assumptions of objectivity and rigour” (2). In all fields, most of the papers that Fanelli analysed reported evidence in favour of the tested hypothesis. At the top, more than 90% of the papers in Psychology and Psychiatry reported positive findings.
In 2012, Fanelli followed this up (3). He collected 2,434 papers drawn from 20 disciplines and published between 2000 and 2007, and compared these with 2,222 papers published in the 1990s. Of the papers published in the 1990s, 70% reported findings that supported the hypothesis being tested; in the later sample this had risen to 89%. ‘Negative findings’ appear to be fast disappearing from the literature. Science, it seems, is increasingly suffering from ‘positive-outcome’ bias.
The conclusions seem inescapable. Either scientists misreport the outcomes of experiments as supportive when they are not. Or they misrepresent their hypotheses: the hypotheses that they purport to be testing were conceived not before the experiments but after them, to fit the outcomes. Or they misrepresent the state of understanding at the outset of their experiments: their experiments were not, as purported, innovative leaps into the unknown inspired by bold hypotheses, but had outcomes that were predictable (and predicted) from evidence and reasoning hidden from the readers. Or they suppress inconvenient evidence: many experiments that fail to support their hypotheses are never reported in the scientific literature – the phenomenon known as publication bias (4).
Put it like this: if scientists can correctly predict the outcomes of their experiments 90% of the time, why do they waste time and money actually doing them?
So what’s going on?
Scientists are remembered not for the things they got wrong, but for the things they can claim to have got right. Thus, they ‘accentuate the positive’ in the outcomes of their studies – they cleave to any findings that are clear, and then (sometimes at least) retrospectively construct a hypothesis according to which these findings should be expected, while minimising or disregarding findings for which they have no reasonable interpretation. Such narratives give the appearance of hypothesis-driven research, but the appearance is artificial. The research might indeed have been driven by a hypothesis – just not by the hypothesis apparent from the narrative. The hypothesis as presented is an imposter.
Sadly, it seems that such creative re-presentation may be an effective strategy, at least for boosting citations. In August 2019, Lee Treanor and colleagues looked at citation practices in imaging research and asked “Are Diagnostic Accuracy Studies With Positive Titles and Conclusions Cited More Often?” (5). True to form, their own conclusion was positive.
They looked at 995 primary studies. Of these, 782 came to conclusions that were “positive or positive with qualifiers”, 127 came to neutral conclusions and just 86 came to negative conclusions. Studies with positive conclusions were cited, on average, 0.54 times per month, those with neutral conclusions 0.42 times per month, and those with negative conclusions just 0.34 times per month.
Then Treanor et al. looked at the titles of the papers. Fifty-one papers declared positive findings in their titles, and these were cited more often than the other positive studies – on average, 0.66 times per month. Just two studies declared negative findings in their titles; these were cited a mere 0.06 times per month.
Of course, having the conclusions in the title saves the bother of reading further.
A reflection: Gareth Leng
The risk of this was known long ago, but seems to have been forgotten. When I began my career, The Journal of Physiology was the most respected journal in my field; then, it had a rule that the titles of papers should not express a conclusion. In 1980, my first paper was one of 438 published in the Journal that year. Three years later, in 1983, these papers were cited 1,987 times – an average of 4.5 times each.
The rule that titles should not express conclusions has long been abandoned, an early casualty of the need to enhance the impact of the Journal. Many other initiatives followed: the Journal attempted to become more selective, reducing its acceptance rate; it began to publish reviews, which were thought to attract higher citations; and it began to publish commentaries on selected papers to increase their visibility.
In its heyday, papers in the Journal were published without abstracts – its readers had no alternative but to work their way through the detail. Then Summaries began to appear at the end of papers, as a recap of the key steps. These were then moved to the front of papers and trimmed to 300 words or fewer to facilitate indexing. Now, for those who would find even a 250-word abstract a struggle to read, authors must also provide a ‘Key Points Summary’ of fewer than 150 words in bullet-pointed sentences (https://jp.msubmit.net/html/Keypoints_Guidelines.pdf; see above).
Like other highly respected journals, The Journal of Physiology has struggled to reconcile academic integrity with the need to demonstrate its authority through metrics that have little to do with quality. Many of the innovations were reasonable; however, the ambition to enhance the journal impact factor was not a success. In 2016, the Journal published 310 research papers; on my quick estimate, 136 had titles that disclosed a primary conclusion, while 164 had ‘neutral’ titles. Three years later, in 2019, the papers with ‘positive’ titles had been cited (on average) 4.9 times each, while those with neutral titles had been cited just 3.5 times each. Yet, as a matter of simple fact, the impact factor of the Journal has scarcely changed in 40 years (6).
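The arithmetic behind such a quick estimate is easy to reproduce. Here is a minimal sketch in Python, assuming a citation export (note 6 used the Web of Science) reduced to pairs of title category and citation count; the records shown are hypothetical stand-ins, not the actual Journal data.

    from statistics import mean

    # Hypothetical (title_category, citations) pairs standing in for a
    # citation-index export of a journal's papers; a real analysis would
    # use the full set of records.
    papers = [
        ("positive", 6), ("positive", 5), ("positive", 4),
        ("neutral", 4), ("neutral", 3), ("neutral", 2),
    ]

    # Average citations per paper, grouped by title category
    for category in ("positive", "neutral"):
        counts = [c for cat, c in papers if cat == category]
        print(f"{category}: {len(counts)} papers, "
              f"cited {mean(counts):.1f} times each on average")

On the full 2016 dataset, this same calculation gives the 4.9 versus 3.5 comparison reported above.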
Perhaps, if all its papers had positive titles, the impact factor would increase. Someone will probably suggest that this should be the new rule, but since all other journals are favouring positive titles too, there is probably little advantage to be gained. Chasing journal impact is a zero-sum game, and seeking to gain advantage in it at the expense of other journals, by manipulations that have nothing to do with quality, seems to me cynical and unworthy.
I rather liked the old rule, the rule that expressed the expectation that the experiments reported in the Journal should have a value that endures longer than any particular hypothesis that inspired them.
Let’s return to Popper:
"Science does not rest upon solid bedrock. The bold structure of its theories rises, as it were, above a swamp. It is like a building erected on piles. The piles are driven down from above into the swamp, but not down to any natural or 'given' base; and if we stop driving the piles deeper, it is not because we have reached firm ground. We simply stop when we are satisfied that the piles are firm enough to carry the structure, at least for the time being.”(1)
For Popper, a hypothesis was not the end of an experimental programme but the means to an end – the scaffold by which to design experiments whose outcomes would have a value not only for testing that particular (perhaps flawed) hypothesis, but for constructing better hypotheses for future testing. The piles on which we build our castles of theory are built from bricks, bricks made of straw and clay, and the straw and clay are gathered and recycled from older castles, often now in ruins. The straw and clay we use are the “facts” – messy things, open to diverse and malleable interpretation.
In a good paper, the data will be fully and transparently declared, and will be the outcomes of methodologically sound, well-designed experiments using validated analytical approaches. If so, the data will be transportable – open to fresh interpretation in the light of other evidence yet to be discovered.
Accordingly, the enduring value of a paper lies not in the conclusions that its authors come to, nor in the interpretations that its authors prefer, though these may drive its short-term citation impact. Conclusions are always provisional, always open to re-evaluation. Interpretations are always part reason, part rhetoric. So we tell our students that papers should never be cited for their conclusions, and that interpretations should never be borrowed lazily. Papers should be cited for what within them should endure – the data that the authors have harvested. Our students, and the authors of citing papers, must make their own interpretation of the data and draw their own conclusions, informed, as the authors of the original study could not be, by the evidence published since then.
Scientists are story-tellers; we are in the business of persuading others of the soundness and importance of the claims we make. That is the nature of science, and we must be robustly sceptical of the conclusions of others, and also of our own. But we must trust, as we have no choice but to trust, in the integrity of scientists in reporting the outcomes of their experiments, however much we may doubt their interpretation of those outcomes. We can freely reject conclusions, but we should not ignore the data.
Authors: Gareth Leng and Rhodri Ivor Leng | 08.04.2020
Notes
This blogpost is based on an abridged extract from The Matter of Facts, by Gareth and Rhodri Leng, to be published by MIT Press in 2020. For further information and a list of retailers, see https://mitpress.mit.edu/books/matter-facts.
1 Popper K ([1934] 2000) The Logic of Scientific Discovery. London: Routledge. Quotation on p. 94.
2 Fanelli D (2010) “Positive” results increase down the hierarchy of the sciences. PLoS One 5(4): e10068. Quotation on p. 5.
3 Fanelli D (2012) Negative results are disappearing from most disciplines and countries. Scientometrics 90: 891–904.
4 Rosenthal R (1979) The file drawer problem and tolerance for null results. Psychol Bull 86: 638–641.
5 Treanor L et al. (2019) Selective Citation Practices in Imaging Research: Are Diagnostic Accuracy Studies With Positive Titles and Conclusions Cited More Often? AJR Am J Roentgenol 213: 397–403. doi: 10.2214/AJR.18.20977. For a broader discussion of citation bias – the preferential citing of literature with a particular outcome – see: Duyx B, Urlings MJ, Swaen GM, et al. (2017) Scientific citations favor positive results: a systematic review and meta-analysis. J Clin Epidemiol 88: 92–101.
6 Analysis was performed with data derived from the Web of Science – https://wok.mimas.ac.uk/