All algorithms are available as open source under a CC Attribution 4.0 license; see the CALIBER GitHub repository.
Since GPs and hospital staff electronically record so much information every day, the UK has a hugely valuable data resource for healthcare research. However, different parts of the healthcare system record clinical information using different data collection systems – and often use different codes to describe the same health condition – so electronic health data can be complicated to interrogate and use for research.
HDR UK is helping to address this challenge through a research platform called CALIBER. Created by a team of data scientists led by Professor Spiros Denaxas at the University College London Institute of Health Informatics, CALIBER helps determine which pieces of information stored in electronic health records – such as prescription data, blood results or other test results – most reliably indicate the presence of a specific disease. These ‘definitions’ are then turned into algorithms which researchers run against large health databases so they can define these health conditions in their own datasets. They then immediately have access to all the relevant data for their research study – meaning that they can more quickly advance improvements in care.
On the face of it, using electronic health records to define specific conditions is a simple proposition. But different organisations, and even different healthcare professionals, have different ways of recording the same information about a patient and about a specific diagnosis. The information is also recorded across a number of different clinical systems, depending on whether a GP or a hospital was treating the patient at the time. Additionally, researchers may be interested in identifying different aspects of a specific disease (occurrence, subtype, severity, or propensity).
This means that if researchers want to study, for instance, the proportion of smokers who have a heart attack before the age of 60, they have to disentangle the different ways in which that information has been recorded and the different ways in which a heart attack has been defined.
This can slow down efforts to find better ways of caring for specific patients or treating specific diseases.
CALIBER is a platform which supports consistent data definitions for particular diseases and conditions. Created by a team at University College London, the platform provides algorithms that help researchers easily identify and potentially adapt the specific clinical information necessary to address their research interests. These algorithms are then made available, via an open access website, to anyone who would like to use them.
This means that researchers no longer have to gain access to data, work through the steep learning curve needed to understand and adequately use the different sections of electronic health data, and then decide on the classification rules to identify and validate the disease of interest. Instead, they can simply run the algorithm on their own data and explore, promptly and in depth, the clinical context that they want to research.
The algorithms hold true across electronic health records within GP practices and hospitals, as well as national databases which record deaths from particular conditions or all-cause mortality.
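To make the idea concrete, the sketch below shows one way a disease definition could be expressed as a set of code lists, one per data source, and then applied to a researcher's own records. It is a minimal illustration only: the structure, the names (`PHENOTYPE_MI`, `Record`, `matches`, `first_events`) and the handful of codes are assumptions made for this example, not the published CALIBER definition of a heart attack or the format CALIBER itself uses.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical definition: one code list per data source and coding system.
# The codes are illustrative examples, not the CALIBER code lists.
PHENOTYPE_MI = {
    "primary_care": {"read_v2": {"G30.."}},
    "hospital": {"icd10": {"I21", "I22"}},
    "deaths": {"icd10": {"I21", "I22"}},
}


@dataclass(frozen=True)
class Record:
    patient_id: str
    source: str      # "primary_care", "hospital" or "deaths"
    system: str      # coding system used by that source
    code: str
    event_date: date


def matches(record, phenotype):
    """Return True if the record's code is in the list for its source and system."""
    codes = phenotype.get(record.source, {}).get(record.system, set())
    # Prefix match, so a more specific code such as I21.0 is caught by I21.
    return any(record.code.startswith(c) for c in codes)


def first_events(records, phenotype):
    """Map each matching patient to the date of their earliest matching record."""
    first = {}
    for r in records:
        if matches(r, phenotype):
            if r.patient_id not in first or r.event_date < first[r.patient_id]:
                first[r.patient_id] = r.event_date
    return first


# Made-up records for three patients across the three kinds of source.
records = [
    Record("p1", "primary_care", "read_v2", "G30..", date(2010, 5, 1)),
    Record("p1", "hospital", "icd10", "I21.0", date(2010, 4, 28)),
    Record("p2", "hospital", "icd10", "J45", date(2012, 3, 3)),
    Record("p3", "deaths", "icd10", "I22", date(2015, 1, 9)),
]

for patient_id, when in sorted(first_events(records, PHENOTYPE_MI).items()):
    print(patient_id, when.isoformat())
```

Keeping the code lists in one shared structure, separate from the matching logic, is what lets the same definition be reused across studies: a researcher answering the smoking and heart attack question above would only need to supply their own records.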
There are now nearly 1,000 algorithms, all available on an open access website for researchers to use. Researchers can also upload their own algorithms to the site, where they can in turn be used by others.
What Was Learned
The project has underscored that the UK has an incomparable health dataset and that, when researchers are provided with tools that allow them to more easily interrogate it, research flourishes.
The algorithms created by the UCL team via CALIBER have now been reused by other researchers, on different studies. This facilitates research because it means that researchers are able to reuse information without having to “reinvent the wheel”. This also promotes transparency, replicability and reproducibility, so important in medical research. CALIBER proves that where tools are shared openly, and the logic behind them is clearly laid out, researchers are happy to reuse techniques established elsewhere.
Traditionally, however, it has been difficult to secure funding to support the development of tools like CALIBER. This in turn means that it can be difficult to establish reliable and trusted platforms within the research community.
In the case of CALIBER, funding is now in place to further support its development, with HDR UK supporting the creation of a National Phenotype Library. That means that CALIBER will increasingly become the established go-to ‘library’ of algorithms which detail the data that defines specific conditions or pieces of information.
Impacts and Outcomes
The team at University College London has published hundreds of clinical research papers using the algorithms to reliably identify relevant data for studies. Many of the papers are highly cited in other research, and some have served to influence the development of new clinical guidelines which improve care and population health.
Other teams of researchers are re-using the algorithms to contribute to their own clinical research. Significantly, the data that CALIBER algorithms make easier to interrogate is not just relevant to the patterns and determinants of ill health in specific populations, or to clinical trials. It will also be applicable to genomics, helping experts to determine the genetic bases of disease and to find more targeted and personalised treatments.
If all researchers across all areas of health research use the same definitions, then it will be easier to bridge existing gaps between different areas of research – for the ultimate benefit of patients and the healthcare system more broadly.
Accelerating reproducible science through open science and open code
We have brought together over 150 repositories of open standards, data and source code, tackling some of the most important challenges in wrangling multi-modal data and generating replicable insights.
Appendix. SSI Evidence of originality, significance and rigour for software outputs
Visit www.software.ac.uk/REF2021guidance for more info.