Data is crucial to the way we live our lives, and the current pandemic has brought that into sharp focus. The data has highlighted different COVID-19 outcomes for people across a number of protected characteristics, for example:

  • People aged 80 or over are 11 times more likely to die from COVID-19 than those aged under 50.
  • Mortality in women with COVID-19 is around 20% lower than in men after adjusting for age.
  • Individuals “limited a lot” in terms of disability had a mortality rate 2.4 times higher for females and 1.9 times higher for males.
  • And, perhaps the finding that has captured most attention: Black, Asian and minority ethnic groups are at higher risk of poor outcomes from COVID-19, as is the case for a number of other diseases.

Often it has been the data (or at least the interpretation and modelling of it) that has driven the measures put in place to control the spread of the virus and hence dictated what we can and cannot do in our daily lives.

Data can also help us to understand whether policies introduced to tackle the pandemic have been inclusive, both in the short term and the longer term. However, this presents a challenge if the data we collect is not representative, either because our sampling is not inclusive or because the data about important individual characteristics is incomplete, inaccurate, or inconsistent.

Both are very real issues. Even highly curated national datasets such as Hospital Episode Statistics have at least 8% of entries missing ethnicity data. Consistent collection, retention and use of data about protected characteristics can be extremely challenging and sensitive.
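As a rough illustration of how completeness might be measured, the sketch below computes the share of records with no usable ethnicity value. The function name `missing_rate`, the list of placeholder values and the sample records are all assumptions made for this example; they are not taken from Hospital Episode Statistics or any other real dataset.

```python
from typing import Iterable, Optional

# Hypothetical placeholder values treated as "not recorded" (an assumption).
MISSING_VALUES = {"", "unknown", "not stated", "not known"}


def missing_rate(values: Iterable[Optional[str]]) -> float:
    """Return the fraction of records with no usable value."""
    items = list(values)
    if not items:
        raise ValueError("no records supplied")
    missing = sum(
        1 for v in items
        if v is None or v.strip().lower() in MISSING_VALUES
    )
    return missing / len(items)


# Illustrative records only, not real data.
records = ["White", "Asian", None, "Not stated", "Black",
           "", "Mixed", "White", "Asian", "White"]
rate = missing_rate(records)
print(f"{rate:.0%} of records lack ethnicity")
```

A check like this only counts gaps; it says nothing about whether the values that are recorded are accurate or consistent, which would need separate validation.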

A June 2020 Lancet article, “COVID-19 and ethnicity: who will research results apply to?”, highlighted a number of stark facts regarding ethnicity and clinical trials, both within the UK and internationally, before and during COVID-19:

  • In a review of 12 UK type 2 diabetes trials, 4 studies did not even report ethnicity, whilst across the remainder the mean South Asian involvement was 5.5%, despite South Asians representing 11.2% of the UK type 2 diabetes population.[1]
  • Of 1,518 international COVID-19 studies registered on ClinicalTrials.gov between 1 December 2019 and 15 May 2020, only six were collecting data on ethnicity.[2]
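One simple way to express the gap in the diabetes trials above is a representation ratio: a group's share of study participants divided by its share of the relevant population. The sketch below uses the two percentages quoted above; the function name `representation_ratio` is an assumption made for illustration, and this is not necessarily the measure used by the cited authors.

```python
def representation_ratio(share_in_study: float, share_in_population: float) -> float:
    """Ratio of a group's share in a study to its share in the population.

    A value of 1.0 means the group is represented in proportion;
    values below 1.0 mean it is under-represented.
    """
    if share_in_population <= 0:
        raise ValueError("population share must be positive")
    return share_in_study / share_in_population


# Percentages quoted in the text: 5.5% of trial participants versus
# 11.2% of the UK type 2 diabetes population.
ratio = representation_ratio(5.5, 11.2)
print(f"Representation ratio: {ratio:.2f}")
```

On these figures the ratio is a little under one half, meaning South Asian participants were recruited at roughly half the rate their share of the patient population would suggest.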

Fortunately, we can report a much better picture for the RECOVERY trial where recruitment has broadly matched hospitalisation rates for different ethnic groups.  However, we must make this the rule and not the exception.

That’s why the UK Health Data Research Alliance, at its July 2020 board meeting, acknowledged the vital role members have to play in ensuring both that the coding of ethnicity and other protected characteristics is as complete and accurate as possible and that the data we collect is sufficiently representative of all parts of our population. Only then can we begin to be confident that we are doing all we can to help ensure that the research and innovation resulting from the use of our datasets is also appropriately representative, free of bias and inclusive. This is a priority area for HDR UK, as set out in our current Diversity and Inclusion Policy, and is one of the ways in which we are proactively championing a significant increase in the diversity of datasets and showing why this is important in achieving our mission.


[1] Khunti K, Bellary S, Karamat MA et al. Representation of people of South Asian origin in cardiovascular outcome trials of glucose-lowering therapies in type 2 diabetes. Diabetic Med. 2016; 34: 64-68

[2] Pan D, Sze S, Minhas JS et al. The impact of ethnicity on clinical outcomes in COVID-19: a systematic review. EClinicalMedicine. 2020; (published online June 3). https://doi.org/10.1016/j.eclinm.2020.100404