COVID-19 testing strategy – why random sampling matters

The release of the government’s five-pillared testing strategy is very welcome. However, the word “random” is conspicuous by its absence from the accompanying report, “Coronavirus (COVID-19): Scaling up our testing programmes”.

Testing, whether for current infection (antigen) or previous infection and, presumably, a degree of immunity (antibody), has two quite different purposes: to understand the health status of the individual being tested; and to understand the health status of the population at large.

For the former, a focus on key workers and/or high-risk individuals makes perfect sense; for the latter, it can be seriously misleading. Simple example: if, unbeknown to you, COVID-19 is twice as likely to infect men as it is women and you test twice as many men as women, a simple average will come out at 5/3 times the prevalence among women, whereas the population-wide figure (assuming equal numbers of men and women) is 3/2 times it, so you over-estimate by a factor of 10/9. It’s easy to adjust for this because gender is an easy thing to recognise, but when you don’t know how to put individuals into the relevant groups (gender? age? underlying health issues…which ones, exactly?) you’re stuck unless you sample at random, or at least “as-if-random.”
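The arithmetic behind this example can be checked with a short sketch. It assumes a population split 50/50 between men and women and uses a purely hypothetical prevalence among women (3%); the function name `weighted_average` and all the numbers are illustrative, not taken from any real data.

```python
from fractions import Fraction


def weighted_average(prevalence_by_group, share_by_group):
    """Average the group prevalences, weighting each group by its share."""
    return sum(prevalence_by_group[g] * share_by_group[g]
               for g in prevalence_by_group)


# Hypothetical prevalence among women; men are twice as likely to be infected.
p_women = Fraction(3, 100)
prevalence = {"women": p_women, "men": 2 * p_women}

# The sample contains twice as many men as women.
sample_share = {"women": Fraction(1, 3), "men": Fraction(2, 3)}
# Assumed population split (an assumption, not a census figure).
population_share = {"women": Fraction(1, 2), "men": Fraction(1, 2)}

naive = weighted_average(prevalence, sample_share)         # what the raw sample shows
adjusted = weighted_average(prevalence, population_share)  # reweighted to the population
overestimate = naive / adjusted

print(naive / p_women, adjusted / p_women, overestimate)
```

Reweighting by known population shares is only possible when you know which grouping matters, which is exactly why an as-if-random sample is needed when you do not.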

That’s easier said than done, but it’s what statisticians and epidemiologists are trained to think about whenever they are designing a new study. For COVID-19, food outlets might be a reasonable way to get an “as-if-random” sample provided you make adjustments for age and gender, for which you can tap into the national census data.

Another problem we have heard a lot about is that current tests are “not good enough.” A “good” test is one that is likely to give a positive result when applied to an infected individual and likely to give a negative result when applied to an uninfected individual – these represent the “Sensitivity” (Se) and “Specificity” (Sp) of the test.
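As a minimal sketch of these two definitions, Se and Sp can be computed as percentages from the results of a validation study in which the true infection status of each person is known. The counts below are hypothetical and the function names are illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Percentage of infected individuals who test positive."""
    return 100.0 * true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Percentage of uninfected individuals who test negative."""
    return 100.0 * true_negatives / (true_negatives + false_positives)


# Hypothetical validation study: 100 infected and 200 uninfected people.
se = sensitivity(true_positives=90, false_negatives=10)
sp = specificity(true_negatives=190, false_positives=10)

print(se, sp)
```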

Now, for making decisions about individuals you obviously need high Se and high Sp – you don’t want to release infected individuals from quarantine, or keep uninfected individuals in it. Estimating the prevalence in the whole population, or in a sub-population of particular interest, e.g. school-children or people in Yorkshire, is a different matter. Provided you have an as-if-random sample, and unless the test is no better than the flip of a coin (Se + Sp = 100), you can apply a simple correction:

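True prevalence = 100 × (Apparent prevalence − 100 + Sp)/(Se + Sp − 100),

where apparent prevalence, Se, Sp and the resulting true prevalence are all expressed as percentages.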
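The correction above (often called the Rogan-Gladen estimator) can be sketched as follows, with every quantity in percent, so the ratio is multiplied by 100 to return a percentage. The function name, the clipping of the result to the range 0 to 100, and the example numbers are assumptions made for illustration; the Se and Sp values are hypothetical.

```python
def corrected_prevalence(apparent_pct, se_pct, sp_pct):
    """Correct an apparent prevalence (percent) for imperfect Se and Sp (percent)."""
    denominator = se_pct + sp_pct - 100.0
    if denominator <= 0:
        # Se + Sp = 100 is the coin-flip case; this sketch also rejects
        # tests that are worse than chance rather than trying to use them.
        raise ValueError("test must be better than chance (Se + Sp > 100)")
    estimate = 100.0 * (apparent_pct + sp_pct - 100.0) / denominator
    # Sampling noise can push the raw estimate outside 0..100; clip it.
    return min(100.0, max(0.0, estimate))


# Hypothetical check: with a true prevalence of 10%, Se = 90 and Sp = 95,
# the expected share of positive tests is 10*0.90 + 90*0.05 = 13.5%.
estimate = corrected_prevalence(apparent_pct=13.5, se_pct=90.0, sp_pct=95.0)
print(estimate)
```

The correction recovers the 10% that generated the apparent figure, but only because the sample is assumed to be as-if-random; it does nothing to repair a biased sample.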

There may be some merit in rolling out a programme of as-if-random testing sooner rather than later.
