How far can we trust the men in lab coats?

A month ago the Lancet and the New England Journal of Medicine each retracted a major study on Covid-19 drug therapies. One article had been up for more than a month, the other for less than two weeks. Both were based on faked data. That the rush to publish on Covid-19 led established researchers, reviewers and journals to skip elementary checks is deplorable, if not entirely surprising. But is there a more deep-seated crisis in scientific research? Stuart Ritchie claims an epidemic of ‘fraud, bias, negligence, and hype’. Alas, he overhypes his own argument.

In 2011 this book would have been a wonderful path-breaker. Back then, a reputable psychology journal published an article claiming undergraduates could predict an image that was about to appear on their computer screen, but, oddly, only if it was pornographic. Ritchie, then a PhD student, tried to replicate this experiment, but the journal refused to publish his report that replication was (surprise, surprise) impossible.

We may wonder how a serious journal could expect its readers to believe in time-travelling porn; but at least Ritchie would now have no trouble publishing his report on a replication failure. Times have changed. Indeed his book is best read not as a call for radical change but as a guide to the revolution in the culture of science that has taken place over the past decade.

Unfortunately scientists, including Ritchie, remain more than a little naive about their own disciplines. They are shocked and horrified when there is evidence of fraud; yet even Newton and Pasteur cheated on their results. They are puzzled when, in psychology, only 65 per cent of experiments can be successfully replicated (a figure that rises sharply to 92 per cent if someone from the original team participates in the replication — a fact Ritchie doesn’t mention). But for years historians of science have been studying disputes over replication, which go all the way back to Galileo’s experiments, dropping balls from high towers in order to disprove Aristotelian physics.

In 1958 Michael Polanyi introduced the category of ‘tacit knowledge’ to explain a key aspect of the replication problem. Just as I would have difficulty explaining to you how to ride a bicycle without falling off, so it can be almost impossible to explain how to set up an experiment so that it works; you learn how to do it by watching someone else and copying them. Naturally, the replication problem often disappears if there is someone present to show how the experiment should be done.

Replication, in any case, isn’t the end of the matter. Galileo’s high-tower experiments were systematically skewed because he released his balls by hand, always letting go of the lighter ball slightly ahead of the heavy ball simply because his grip on the heavier ball was tighter. The design of the experiment was defective, but he didn’t question it because he liked the results. The idea that fraud, replication disputes and skewed results can ever be eliminated from science is simply wishful thinking: they are part and parcel of the enterprise.

Much more easily eliminated are false arguments from statistics. The agreed standard for a significant result is that the chance of it occurring randomly is no more than one in 20. Ronald Fisher, who published The Design of Experiments in 1935, gave the example of a lady who claimed to be able to tell by taste whether the milk had been poured into a cup of tea before or after the tea. Presented with eight cups of tea, four milk first and four tea first, she identified which was which unerringly. Had she been choosing at random, the experiment would have had to be run, on average, 70 times to get this result; with six cups of tea it would have had to be run 20 times, Fisher’s threshold for a significant result. But of course if you put 70 people who are merely guessing through the same eight-cup test, you are likely to find one with apparently unerring powers of discrimination. Many medical studies are unreliable because one favourable result has been cherry-picked amongst lots of unfavourable ones.
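
The arithmetic behind Fisher’s figures is easy to check. The short Python sketch below assumes the usual reading of the experiment: half the cups are milk first, the taster knows this, and a pure guesser picks one of the equally likely arrangements. The function names are illustrative only, not taken from Fisher or from Ritchie.

```python
from math import comb


def chance_of_perfect_score(cups: int) -> float:
    """Chance of labelling every cup correctly by pure guessing.

    Assumes half the cups are milk first and the taster knows it,
    so a guess amounts to picking one of comb(cups, cups // 2) arrangements.
    """
    return 1 / comb(cups, cups // 2)


def chance_someone_gets_lucky(guessers: int, p_single: float) -> float:
    """Chance that at least one of several independent guessers scores perfectly."""
    return 1 - (1 - p_single) ** guessers


p8 = chance_of_perfect_score(8)  # one in 70
p6 = chance_of_perfect_score(6)  # one in 20

print(comb(8, 4), comb(6, 3))                       # 70 20
print(round(chance_someone_gets_lucky(70, p8), 2))  # 0.63
```

On these assumptions, 70 guessers given the eight-cup test have roughly a 63 per cent chance of producing at least one ‘unerring’ taster, which is why a single favourable result picked out of many proves so little.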

Back in 2005 John Ioannidis published an article with the striking title ‘Why Most Published Research Findings Are False’. For the field he was discussing, medicine, his claim was sound; but, as Ritchie shows, the situation has recently been radically transformed by the requirement that medical scientists pre-register the design of their trials before conducting them, thus making cherry-picking much harder.

Considerable comfort can be drawn from an unlikely source: the database which, for the past decade, has recorded articles retracted by journals. Up until 2012, retraction rates were rising fast; but now they have levelled out, and only about four articles in 10,000 are retracted — hardly justifying Ritchie’s claim that there is an epidemic of fraud.

Moreover, moves to police fraud actually work: plagiarism, false refereeing and the use of doctored photographs all dropped once regular checks were introduced to make them harder to carry out. A group at Tilburg University has introduced Statcheck, a set of simple statistical tests which can be automatically performed to check if the numbers in a report have been doctored. The result, we can be confident, will be a rapid improvement in the statistical accuracy of published scientific reports for the simple reason that authors will Statcheck their own papers before submitting them.
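
To give a flavour of what an automated check of this kind can look like, here is a minimal Python sketch that recomputes a p-value from a reported test statistic and flags a mismatch. It is not Statcheck’s own code (Statcheck is an R package), and the choice of a two-sided z-test, the tolerance and the function names are all assumptions made for illustration.

```python
from math import erfc, sqrt


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return erfc(abs(z) / sqrt(2))


def is_consistent(z: float, reported_p: float, tolerance: float = 0.005) -> bool:
    """True if the reported p-value matches the recomputed one within tolerance.

    The tolerance of 0.005 is an illustrative allowance for p-values
    rounded to two decimal places.
    """
    return abs(two_sided_p_from_z(z) - reported_p) <= tolerance


# Hypothetical reported results, invented for this example.
print(is_consistent(1.96, 0.05))  # True: z = 1.96 does correspond to p of about 0.05
print(is_consistent(1.96, 0.01))  # False: the reported p is too small for this z
```

A mismatch of this sort does not by itself show that anything was doctored; it shows only that the reported numbers cannot all be right, which is reason enough for an author or a referee to look again.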

Often, of course, mistakes are not deliberate but unintentional. The classic example is a 2010 paper by Reinhart and Rogoff, arguing that when public debt passes 90 per cent of GDP, economic growth is throttled. This provided the intellectual foundation for austerity programmes in Britain and elsewhere. Unfortunately the distinguished authors had made elementary errors in their spreadsheet formulae, producing totally spurious results. This was easy to establish once the spreadsheet was made available to their critics. Publishing raw data online is a simple way of eliminating crude errors of this sort.

In 1983, Peter Medawar, a Nobel prize winner in physiology, wrote:

The number of dishonest scientists cannot, of course, be known, but even if they were common enough to justify scary talk of ‘tips of icebergs’, they have not been so numerous as to prevent science’s having become the most successful enterprise (in terms of the fulfilment of declared ambitions) that human beings have ever engaged upon.

Science Fictions is a lively read; but it contains too much scaremongering and too little celebration of the progress that has been made. Yes, there have been serious problems. Many have already been fixed; some whole disciplines (above all social psychology, where key experiments, such as the Stanford prison experiment, turn out to have been rigged) have been shaken to their foundations. But the solution is not, as Ritchie seems to think, to make science less competitive. Take out the competition and scientific progress will simply grind to a halt.

In 1942 the great sociologist Robert K. Merton set out to explain the norms which make science successful. Thus, for example, scientists practise a form of communism in that they share their discoveries without charge. (Merton worried about patents, but not, as many do now, about the cost of journal subscriptions.) What Merton wanted to understand was how highly competitive scientists, each striving to outdo their colleagues and peers, could produce reliable new knowledge, and end up helping the discipline move forward. It’s an example of the old Mandevillian paradox of private vices producing public benefits.

Ritchie, well aware of the difficulty of enforcing the norms, thinks there is some sort of ineradicable contradiction between competition and cooperation, and he attacks ‘publish or perish’, the use of citation counts in promotions and the vagaries of peer review in awarding grants. He wants a disinterested community of scientists, not a war of all against all.

But competition is not the problem: it is the motor which drives progress. Nor is it easy to devise better measures of success (in the sciences at least) than citations and grants. Alas, committees of academics will always tend to reward mediocrity and distrust originality. For this, I am sorry to say, there is no cure. Human beings are pack animals. In science, as in market economies, competition is always imperfect; but the solution is never to eliminate but always to regulate it. Scientists are getting better at regulating fraud and misrepresentation; indeed it may no longer be true that most published research results are false. There is cause for celebration.