Dominic Cummings placed a job advert back in January calling for data scientists, statisticians and modellers. Since then, the coronavirus epidemic has made all of us 'weirdos and misfits' in our growing obsession with data. Everyone now has opinions on the latest coronavirus statistics, whether it's South Korean test numbers, German fatality rates or Italian regional differences. The latest data visualisations get shared widely across the internet. But how should we make sense of them?
To a professional mathematician like me who gives lectures about probability and uncertainty, the rise of the epidemic modellers is somewhat bewildering. Compared to the problems I normally work on, modelling the epidemic is both incredibly easy and unbelievably hard.
We think we know how epidemics play out. In the simplest picture, every newly infected person passes on the virus to the same number of people the following day. Let’s suppose that it’s two people, and then we can see the terrifying maths of exponential growth. If on Day 0, one person has the virus, on Day 1, two new people become infected. On Day 10, it’s 1,024 new infections. Three weeks in, it's a couple of million. Of course, it won’t be exactly two people per day, but the principle remains: the number of cases will be multiplied by the same remorseless factor every day.
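The arithmetic can be checked in a few lines. This is only a sketch: it assumes the number of new infections simply doubles each day, which is the reading that reproduces the figures quoted, and the function name and `factor` parameter are illustrative rather than part of any real epidemic model.

```python
def new_infections(day, factor=2):
    """New infections on `day`, assuming the count is multiplied by `factor` daily.

    Assumption for illustration: one new infection on Day 0 and a constant factor.
    """
    return factor ** day


for day in (0, 1, 10, 21):
    print(f"Day {day}: {new_infections(day):,} new infections")
```

Changing `factor` to something more modest, such as 1.3, slows the numbers down but does not change their shape: the count is still multiplied by the same amount every day.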
But if you plot cases on a graph, you’ll find it hard to judge what is going on because the numbers involved grow so fast. This is where the main trick of coronavirus data visualisation comes in. The ‘slide rule generation’ may dimly remember this: taking the logarithm gets rid of an exponential. This is why the graphs you see have strange labels, where the step from 10 to 100 is as big as the step from 100 to 1,000, and so on. Squashing the axes like this turns hard-to-visualise exponential growth into something much easier to understand: a straight line. The more steeply the line goes up, the faster cases are multiplying every day, and the worse the news is.
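To see why the logarithm produces a straight line, here is a minimal sketch using the same idealised doubling as before. The function `log_cases` and its parameters are assumptions made for illustration. It shows that on the log scale every day adds the same fixed amount, which is exactly what a straight line means.

```python
import math


def log_cases(day, factor=2.0, initial=1.0):
    """Base-10 logarithm of the case count under idealised exponential growth."""
    return math.log10(initial * factor ** day)


# Daily increase on the log scale: constant, so the plotted points lie on a line.
steps = [log_cases(day + 1) - log_cases(day) for day in range(10)]
for day, step in enumerate(steps):
    print(f"Day {day} to {day + 1}: log10 rises by {step:.4f}")
```

Each step equals log10 of the daily factor, so a bigger factor gives a steeper line.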
Unfortunately, this is where things start to get messy. We don’t all meet the same number of people every day, we don’t all develop symptoms at exactly the same rate, tests aren’t processed as fast at weekends, and so on. As a result, the numbers won’t go up as precisely as the theory suggests. When we use real data, our logarithmic graphs don’t have straight lines, but much wigglier ones instead.
It's worth remembering this when you next hear the latest grim tally of coronavirus victims.
Firstly, get into the habit of converting the figures into percentage changes, because it’s these you should care about. Secondly, remember those changes won’t be the same every day, so nobody should be trying to judge the Government’s approach from any particular day’s values: look at the long-term trend instead. The current UK death toll illustrates this well; daily fluctuations cancel out, and a clear straight-line pattern emerges.
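Both habits can be sketched in code. The numbers below are made up purely for illustration and are not real coronavirus data; the function names are likewise hypothetical. The first function converts a run of totals into daily percentage changes, and the second estimates the long-term trend by fitting a least-squares straight line to the logarithms, which is one reasonable way (among several) to average out the daily wiggles.

```python
import math


def daily_percentage_changes(totals):
    """Percentage change from each day's total to the next."""
    return [100.0 * (b - a) / a for a, b in zip(totals, totals[1:])]


def trend_growth_factor(totals):
    """Average daily multiplication factor, from a least-squares line
    fitted to log(total) against day number."""
    n = len(totals)
    days = range(n)
    logs = [math.log(t) for t in totals]
    mean_day = sum(days) / n
    mean_log = sum(logs) / n
    slope = sum((d - mean_day) * (y - mean_log) for d, y in zip(days, logs)) / sum(
        (d - mean_day) ** 2 for d in days
    )
    return math.exp(slope)


# Made-up totals with deliberate wiggles (NOT real data).
made_up_totals = [100, 130, 140, 190, 210, 280, 310]

print([round(c, 1) for c in daily_percentage_changes(made_up_totals)])
print(round(trend_growth_factor(made_up_totals), 3))
```

The individual percentage changes jump around from day to day, while the fitted factor summarises the whole run in a single number, which is the point of looking at the trend rather than any one day.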
But of course, things aren't quite so simple. The numbers won’t grow exponentially forever. As social distancing measures start to work, eventually deaths won’t go up by the same factor every day. Indeed, in graphs from Italy or Spain, the straight lines have now started to flatten, as things get slightly less terrible.
As the death toll grows each day in Britain, we are desperate for the same to happen here. We want to think that every day at home fighting for Ocado slots and wrestling with Zoom settings has been worth it. Like a dieter fooling themselves that a vanished pound is the start of something bigger, we stare at the graph of deaths and convince ourselves that things must be working, that the line really is starting to flatten. Unfortunately, there’s no easy way to tell at first if this is another wiggle in the figures, or the start of the good news we are longing to see. All we can do is be patient, keep our fingers crossed, and not jump to conclusions before we’ve seen three or four days of good news in a row. Only then can we begin to hope.
It's at that stage that the real modellers must show their worth, if we are to get through this epidemic for good. It seems likely that lockdown measures will be relaxed and reinstated over the summer. But how those decisions are made – with the ultimate objective of preventing the NHS from being overwhelmed – will be based on complicated interactions. By then, the armchair modellers’ simple straight-line plots will be long forgotten, and the data science skills that Cummings espoused will be more valuable than ever. Thank goodness for the real weirdos and misfits.
Oliver Johnson, School of Mathematics, University of Bristol @BristOliver