In February’s review of published papers and pre-prints from the HDR UK community, the Early Career Committee considered dozens of open access articles. They were ranked against the core pillars of the HDR UK ethos: research quality; team science; scale; open science; patient and public involvement; impact; and equality, diversity and inclusion. This month’s winning publication was “Improving local prevalence estimates of SARS-CoV-2 infections using a causal debiasing framework” by Nicholson et al.

Throughout the COVID-19 pandemic, it has been crucial for policymakers to have access to accurate and reliable data on the rates of infection across the country, so they can act quickly and make decisions about vaccinations, mask-wearing and lockdowns.

A challenge in obtaining an accurate picture of infections is that national testing is targeted at individuals at higher risk of being infected, such as healthcare workers and people contact-traced to a COVID-19 case, so estimates based on these tests alone may be biased. In this study, Nicholson et al. present a statistical framework, using cutting-edge causal inference methods, that combines large-scale targeted testing data with smaller-scale but unbiased randomized testing data to improve the accuracy of monitoring the spread of COVID-19.

Nicholson et al. combined different testing data sources in their analysis: national testing data collected through the National Health Service (NHS) programme, and two randomized surveillance surveys, the Office for National Statistics (ONS) COVID-19 Infection Survey (CIS) and the Real-time Assessment of Community Transmission (REACT) study. The national testing data included all the PCR tests conducted in England, whereas the survey studies were designed to be nationally representative and, importantly, recruited participants regardless of symptom status and high-risk group status. This allowed Nicholson et al. to use the surveys to correct the bias in the targeted testing data and so estimate infection prevalence and the R number over time.
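To give a feel for why the randomized surveys matter, the short Python sketch below works through a deliberately simplified version of the idea. It is not the authors' model or code: the function names, the single testing odds ratio and all the numbers are assumptions made for illustration, and it assumes perfectly accurate tests. It shows how targeted testing inflates the share of positive tests, and how an estimate of how much more likely infected people are to be tested (learned where unbiased survey data exist) can be used to correct a local positivity rate.

```python
def raw_positivity(prevalence, p_test_infected, p_test_uninfected):
    """Share of targeted tests that are positive when infected people
    are tested with a different probability from uninfected people."""
    positives = prevalence * p_test_infected
    negatives = (1 - prevalence) * p_test_uninfected
    return positives / (positives + negatives)


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def estimate_testing_odds_ratio(positivity, survey_prevalence):
    """Estimate the testing bias by comparing targeted positivity with
    the prevalence measured in an unbiased randomized survey."""
    return odds(positivity) / odds(survey_prevalence)


def debias(positivity, testing_odds_ratio):
    """Correct a targeted positivity rate using the estimated bias."""
    corrected_odds = odds(positivity) / testing_odds_ratio
    return corrected_odds / (1 + corrected_odds)


# Hypothetical region: the survey says 2% are infected; infected people
# are tested with probability 0.30, uninfected people with 0.03.
regional_positivity = raw_positivity(0.02, 0.30, 0.03)
bias = estimate_testing_odds_ratio(regional_positivity, 0.02)

# Hypothetical local area with no survey data of its own but the same
# testing behaviour and a true prevalence of 5%.
local_positivity = raw_positivity(0.05, 0.30, 0.03)
local_estimate = debias(local_positivity, bias)

print(f"regional positivity: {regional_positivity:.3f}")
print(f"estimated testing odds ratio: {bias:.1f}")
print(f"local positivity: {local_positivity:.3f}")
print(f"debiased local prevalence: {local_estimate:.3f}")
```

In this toy setting the raw positivity badly overstates prevalence, while the corrected figure recovers the value assumed for the local area. The published framework is considerably richer, but the sketch captures the basic reason that pairing targeted tests with randomized surveys can sharpen local estimates.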

The potential for impact of this study stems from the flexibility of the approach. The model accommodates several important features, including differing testing capacity, different test types and estimation of the number of infectious people. Overall, this study combines national testing with randomized testing surveys to deliver more accurate evidence to inform pandemic response. It also opens up the potential for future work and collaborations in other countries to contribute towards public health responses across the world.

What the committee said

We scored this paper highly on open science and reproducibility, as all analysis code was made publicly available on GitHub and the statistical framework was designed with interoperability and potential use on other datasets in mind.

The potential for impact on the lives of the public and patients was high in this study, as the framework could provide accurate real-time prevalence estimates and improve the global public health response to both the current COVID-19 pandemic and future pandemics.

Multiple co-authors, including the lead author, acknowledged support from HDR UK relevant to this study. HDR UK’s Early Career Committee would like to congratulate and commend this team for their contribution to HDR UK’s vision of uniting the UK’s health data to enable discoveries that improve people’s lives.

Read the paper