Don’t be deceived by Covid stats – we know a lot less than the numbers suggest

There is a concept at least as old as computing itself, one whose sentiment, if not its exact words, was expressed by Charles Babbage, the father of the field: ‘garbage in, garbage out’. The idea is not a complicated one. No matter how advanced the calculating machine, how good the statistical model or how intricate the formulae, if the data going in isn’t reliable, the result that comes out will be suspect at best, and is liable to be outright wrong. We are regularly told that economic growth figures were wrong and have had to be dramatically revised. If we cannot reliably measure something as closely studied as the economy, what hope do we have of predicting the course of coronavirus?

There is no reason to doubt either the skill or the dedication of the academics working to produce figures and forecasts to shape the response to the virus. But we do risk being misled by bad data. The daily influx of statistics on infections, deaths and more builds, for most of us, a false impression. It suggests that coronavirus is knowable, that we understand what is happening, that we have a plan. This impression begins to erode the moment we examine the bedrock on which it is built. Even the most reliable-looking of the figures we see tells us far less than it appears to.

It takes only a moment’s thought about the numbers of new cases to see that they cannot be comparable across countries: how could they be, if the recorded death rate for the same disease is ten times higher in Italy than in Germany? A gap that large says more about how many cases each country is detecting than about the disease itself. The infection numbers will only ever be the tip of an iceberg. In the UK, officially only those being considered for admission to hospital are eligible for coronavirus testing; even NHS front-line workers cannot get tested before that stage, though certain celebrities, politicians and others seem to get tests without much struggle.

Other countries test any suspected coronavirus patient, and even some asymptomatic people thought to have come into contact with the virus. Many countries test somewhere between those poles, meaning that in practice we are comparing largely meaningless figures with one another, with little idea of the true scale of the outbreak in any of them.

The swine flu scare holds plenty of lessons. Around two months after the first recorded US case in the 2009 swine flu epidemic, it was estimated to have caused 87 US deaths, against more than 1,000 at the equivalent point for coronavirus. But later estimates put US deaths from swine flu, still regarded more as a near-miss than a pandemic, at between 8,000 and 18,000. Estimating the lethality of a new disease takes months or years, not days, yet we need to make decisions now.

Incomplete and inconsistent data like this might be manageable if we knew more about the disease: how virulent it is, how many infected people show symptoms, how many of those become seriously ill, and how many of those die. Many flu strains, for example, are well understood. But because this particular coronavirus (SARS-CoV-2, the virus that causes the disease Covid-19) is a new strain, we do not yet have reliable answers to any of these questions.

We are caught in a cycle of ignorance. For a start, we need more tests and more reliable tests — at this stage, we don’t even know how well the tests being carried out work. We have good reason to believe quite a few of them simply don’t.

Then factor in that we also have good reason to believe that multiple countries are lying about their coronavirus cases and deaths. It is a simple statement of fact that China covered up coronavirus in its early stages, downplayed the outbreak and wasted valuable time. Outside reporting suggests Wuhan deaths could be ten or more times higher than the government figures. Now that China has lifted its lockdown, any further outbreak would suggest that the Party had handled the crisis badly, and so would in turn be covered up.

China is hardly alone. Russia, which had been seeking to hold a referendum granting President Putin sweeping new powers, claimed just two weeks ago to have coronavirus under control. Not long afterwards, Putin compromised with reality, and Moscow was placed under a lockdown so strict that residents may go out only for essentials such as the nearest shop or pharmacy, and may walk a dog no more than 100 metres from home. Other countries, to a greater or lesser extent, also have incentives to misreport their figures.

Given this impossible backdrop of missing, flawed and fake data, it should no longer be hard to see why respected epidemiologists can come up with models suggesting radically different outcomes. Consider the discrepancy between the Imperial College estimates driving UK government policy and the Oxford study which last week suggested coronavirus may be far more widespread, but far less deadly, than it first appeared to be.

We have never had this type of pandemic in the modern era: Aids was horribly under-estimated by epidemiologists, but teaches us little about Covid-19, given its radically different transmission mechanism. Swine flu, Sars and Mers were all handled at relatively early stages. We have never got this far into a full-blown respiratory pandemic, and don’t know how to model what happens next. The people making the models are trying to navigate us through a storm in an entirely unknown ocean.

For all the limitations of the academics and number-crunchers, we are better off with their efforts than without them. Even flawed models are better than guesswork, and these efforts, and the expertise behind them, can save lives. But we shouldn’t treat their outputs as gospel truth or even settled science. We seek a grim kind of reassurance in modelling: even if what it tells us is frightening, we feel we know what is coming.

The truth is we don’t. Everyone is trying to make their best guess on horribly limited information. We can only hope that we learn more soon — and that the knowledge doesn’t come at too high a price.