Making large-scale health data useable for research requires computer programs to identify traits, or phenotypes, from complex records. To enable researchers to share, compare and analyse different phenotypes, HDR UK has created a library that is already improving the quality of research.

The challenge 

Data researchers can find new ways to improve health and care by studying electronic health records. Large data sources offer many valuable insights, but these records are often neither standardised nor designed for research purposes. To make them useable for studies, researchers create computer programs that analyse the records and identify traits, also known as phenotypes, such as a condition someone has or a medication they are taking.

Researchers define each phenotype, for example as a specific combination of clinical codes appearing in the health record. The program uses this definition to identify the records that are relevant for the study, for instance by distinguishing between people with type 1 and type 2 diabetes. Using these specialised programs, researchers and clinicians can maximise the value of data in electronic health records and use it to improve healthcare.
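To make the idea concrete, here is a minimal sketch in Python of how a phenotype definition might be represented and applied. It is an illustration only, not how the Library or any particular study implements phenotypes. The class and function names, the patient records and the choice of matching on ICD-10 category prefixes (E10 for type 1 diabetes, E11 for type 2) are all assumptions made for this example; real definitions are usually far more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeDefinition:
    """A phenotype expressed as a named set of clinical code prefixes (illustrative)."""
    name: str
    codes: frozenset


# Hypothetical, heavily simplified definitions based on ICD-10 categories.
TYPE_1_DIABETES = PhenotypeDefinition("type 1 diabetes", frozenset({"E10"}))
TYPE_2_DIABETES = PhenotypeDefinition("type 2 diabetes", frozenset({"E11"}))


def matches(record_codes, phenotype):
    """Return True if any code in the record starts with one of the phenotype's prefixes."""
    return any(
        code.startswith(prefix)
        for code in record_codes
        for prefix in phenotype.codes
    )


def identify(records, phenotype):
    """Return the IDs of the records that match the phenotype definition."""
    return [pid for pid, codes in records.items() if matches(codes, phenotype)]


# Made-up records: patient ID mapped to the clinical codes in that record.
records = {
    "patient_a": ["E10.9", "I10"],
    "patient_b": ["E11.9"],
    "patient_c": ["J45.9"],
}

print(identify(records, TYPE_1_DIABETES))
print(identify(records, TYPE_2_DIABETES))
```

Because the definition is kept separate from the matching logic, the same definition could in principle be shared and reused across studies, which is the need the rest of this article describes.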

As more researchers take this approach to generate new insights from health data, there is a growing need for tools to support them. For example, if each researcher creates their own phenotype definitions, effort is duplicated and it becomes harder to compare different studies. It is also important for researchers to openly share their methods so they can be reviewed, improved and reused by others.

The solution 

The HDR UK National Phenotype Library is the first national platform to store and freely share these phenotype definitions. This allows researchers to find and reuse existing phenotypes for greater breadth and accuracy in their work. 

Since its launch, over 1,000 phenotypes have been uploaded to the repository. This draws on contributions from hundreds of researchers, with phenotypes defined using dozens of different datasets. It is now a major resource to help answer important questions using UK health data. 

The team behind the Library was also keen to ensure that it led to public benefits, so they invited non-experts to review the site for accessibility during its development.

Emily Jefferson, Director of the Health Informatics Centre at the University of Dundee and co-lead of the Phenotype Library, said: “Over the last few years, the Library has grown to provide a powerful platform for researchers. Now, the ability to access curated, data-driven definitions for 1,000 common and rare health conditions gives researchers the opportunity to save time and improve the quality of their research on an all-new scale, to ultimately bring major benefits to patients.”  

The impact 

The Library is already helping to improve research and care for patients. For example, phenotypes from the Library were used to show that people with cardiovascular diseases are at higher risk of death from COVID-19. Their use also led to the discovery of over 10,000 deaths from COVID-19 that were previously unattributed.

Gathering different phenotypes together has also helped researchers to create ‘gold-standard’ definitions for various conditions, such as asthma. This will improve the quality and comparability of future research in these areas. 

The phenotypes also lead to faster recruitment for clinical trials. The Library can integrate with existing clinical trial software to help proactively identify eligible people, rather than waiting until individuals happen to speak to a healthcare professional.