Projects range from the creation of tools to enable trustworthy health data research, through pilot studies investigating the use of new sources of health data, to a public engagement campaign to improve awareness and understanding of data use.

All projects are, however, united by their ability to gear the institute up to deliver the next five years of its strategy from 1st April 2023. 

Andrew Morris, Director of HDR UK said: “The next phase of our strategy will build on our success in uniting the UK’s health data assets, improving them through cutting-edge infrastructure and strong information governance, and using them to deliver radical improvements to people’s lives.  

“I’m delighted to support these 12 innovative projects from across our HDR UK community, which bring together some of the best of the UK’s world-leading talent and expertise in health data research. 

“Together, they’ll allow us to hit the ground running in delivering impact in the next five years of our ambitious strategy, and ultimately support our vision for large scale data and advanced analytics to benefit every patient interaction, clinical trial, biomedical discovery, and public health.”

This investment is designed to allow the institute to rapidly mobilise teams across the UK to respond to emerging needs and support priority strategic activity. 

A total of 63 collaborative applications were received and taken forward for review by an independent panel of representatives from HDR UK’s national community and members of the public from its Public Advisory Board.

The projects are all community-led efforts from HDR UK institute members, and were assessed on their ability to be delivered in line with the Institute’s commitment to collaborative, open and team science approaches, as well as for their scientific excellence and ability to deliver rapidly. 

Three of the projects focus on improving data use in clinical trials, helping them harness the power of routinely collected data to allow new treatments and interventions to be assessed faster and more efficiently. 

Enabling data research at scale is also a theme across the projects, with two of the projects looking to enhance the ability of the health data research community to undertake UK population-wide research studies. Some of this will build on the significant progress made throughout the COVID-19 pandemic to enable large-scale analyses across each of the four nations to improve population health. 

Other projects look to make health data research more inclusive, through creating informatics tools to promote ethnic and gender equality in genomic medicine, and leveraging nationwide data to better understand rare diseases and improve the lives of those affected.  

An additional two proposals were considered to closely align with the Data and Connectivity Programme, part of the National Core Studies established by Sir Patrick Vallance in 2020 to support the UK’s COVID-19 response, and were funded via this route.  

More information on the funded projects is found below.  


Summary of projects funded through the Innovation Fund 

Note: this list includes the project titles and leads; however, many are collaborations across multiple institutions.

Enhancing health data science services to support UK population wide health data research – Led by Cathie Sudlow  

HDR UK, BHF Data Science Centre, University of Leeds, University College London, University of Liverpool, DATA-CAN, University of Cambridge, University of Dundee, Swansea University, SAIL, Edinburgh Parallel Computing Centre, Public Health Scotland, NHS Digital, Queen’s University Belfast

This project includes expansion of the health data science team to rapidly enhance the capacity to provide optimum support for projects and researchers, including ‘on the job’ training for new team members and recruitment from non-health backgrounds where appropriate. The project plan includes improvement to existing frameworks and development of new reproducible and reusable pipelines for data curation and analysis; establishment of efficient mechanisms for sharing code and curation approaches across trusted research environments (TREs) and nations, beyond the existing external GitHub model; provision of off-the-shelf resources for researchers to ‘hit the ground running’ when they first access the TREs; and development of joined-up approaches between BHF Data Science Centre and DATA-CAN Health Data Research Hub teams to facilitate improved working across the teams.

Demonstrate Data Integrity for Clinical Trials (DEDICaTe) – Led by Macey Murray

University College London, University of Oxford, NHS Digital

Clinical trialists need to demonstrate to regulatory authorities (e.g. MHRA, EMA) that all data used in a clinical trial, including healthcare systems data, are integral, reliable, and complete. The project team have developed a process to ascertain and document the integrity and provenance of these types of data and selected and assessed two NHS Digital data assets using this process. This project will extend this important work to automate the assessment process using a metadata cataloguing tool, Collibra, within NHS Digital’s data platform, so that more data assets can be evaluated for integrity, and to allow this evaluation to be made available to trialists.

Medicines Data Resource – Led by Reecha Sofat and Munir Pirmohamed

University of Liverpool, University College London, University of Birmingham, University of Leeds, University of Manchester, University of Dundee, BHF Data Science Centre, University of Strathclyde

The project aims to create an open medication resource to transform medicines-related health data research. The team plan to harmonise coding across data sets to enable drug codes and drug dosages to be downloadable for research across databases (including UK Biobank, Clinical Practice Research Datalink, BHF-DSC TRE, incorporating collaboration across NHSE, EDRIS (Scotland), SAIL Databank and Honest Broker Service Northern Ireland). They then plan to create a website for researchers to devise code lists and, if possible, an R package to enable federated analysis across medicines data sets.

Pioneering study of the ‘glycome’ at population scale: pilot assay of a major under-studied ‘omics layer in diverse cohorts from across the HDR UK Multiomics Cohorts Consortium – Led by Adam Butterworth

University of Cambridge, University of Edinburgh, Queen’s University Belfast, University of Leicester, MRC Epidemiology Unit, University of Oxford, Queen Mary University of London, University College London, Imperial College London, University of Bristol

During HDR UK’s first five years, the institute has primarily focused its multiomics research on data types that are common across population cohorts in the UK, such as proteomics and metabolomics. One of the goals of the Molecules to Health Records Driver Programme for HDR UK’s second five years is to build on this initial success by expanding multiomics research in several ways, including newer data types, more diverse cohorts and participants, and expanded international collaboration. This project aims to accelerate the delivery of those goals by conducting glycomics assays across several diverse UK population cohorts during 2022/23, which can then be analysed in the early phases of HDR UK’s next five years.

Public Engagement in Data Research Initiative (PEDRI) – Led by Amanda White and Sinduja Manohar  

HDR UK, HDR UK Public Advisory Board, Office for National Statistics, Administrative Data Research UK, NHS Transformation Directorate, DARE UK, Ada Lovelace Institute, Cancer Research UK, Research Data Scotland, Our Future Health, National Cancer Research Institute, Charities Research Involvement Group

The project will work across the data research and statistics ecosystem to systematically embed best practice in public engagement and involvement, improve public awareness and understanding, and build public trust in data use. HDR UK and the Office for National Statistics are working in partnership with other organisations that use and enable data for research, to collaborate on best practice of bringing the views of the public to policymakers and data holders and to ensure data are managed and used in ways that demonstrate trustworthiness and build public confidence. The initiative will build and progress on the foundations already set by the partners to rapidly improve the collective impact and efficiency.

Creation of FAIR (Findable, Accessible, Interoperable, Reusable) informatics tools to promote ethnic and gender equity in genomic medicine – Led by Michael Inouye

University of Cambridge, EMBL European Bioinformatics Institute, HDR Global, University of Oxford, University of Dundee, Swansea University, Wellcome Sanger Institute, the PAGE consortium, Johns Hopkins School of Public Health

The project aims to develop and deploy polygenic scores for disease and molecular traits which are highly predictive, unbiased with respect to ethnicity and gender, and which fulfil FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. This will be achieved by systematically benchmarking the Polygenic Score (PGS) Catalog to identify predictive and equitable polygenic scores; expanding OmicsPred to include additional technology platforms and cohorts, more than doubling its size; promoting equity by developing robust PGSs for OmicsPred in diverse ancestries; and disseminating polygenic scores freely and FAIR’ly via globally interoperable and field-leading portals.

Developing capacity and capability to undertake UK-wide studies of >65M people using COVID-19 as an exemplar – Led by Aziz Sheikh and Cathie Sudlow 

University of Edinburgh, University College London, University of Leicester, University of Bristol, University of Cambridge, Queen’s University Belfast, Public Health Agency, Public Health Scotland, University of Aberdeen, University of Strathclyde, Swansea University

This project builds on a significant amount of COVID-19-related work across the UK nations, which has been funded by the Data and Connectivity National Core Studies programme. The project aims to further develop the team of investigators, representatives from TREs, and patient and public involvement (PPI) representatives across all four UK nations, working closely with Chief Medical Officer Directorates, to develop processes and robust scientific methods for near real-time identification of health indicators in a whole UK population cohort. Using COVID-19 as an exemplar to address key health questions, analyses in each of the nations will enable comparisons to be made across the UK as well as a meta-analysis for more than 65 million people. Findings from this work will be presented to GO-SCIENCE, national COVID-19 Advisory Groups and public health agencies, in addition to publication in high-impact journals. The lessons learned from this work will contribute vital information for society’s recovery from the pandemic and resilience against future waves of infection. This work will also demonstrate the potential to use data intelligence to improve our responses to many other pressing health challenges (e.g. cancer, heart disease, respiratory disorders), yielding significant future benefits to the health of the UK population.

Addressing high priority methodological questions for the trials community to use routinely-collected healthcare data – Led by Paula Williamson, Amanda Farrin and Fiona Lugg-Widger

University of Liverpool, University College London, NHS Digital, BHF Data Science Centre, University of Oxford, HDR UK

This project builds on the ongoing MRC-NIHR TMRP-funded COMORANT-UK study. It will address two of the highest-ranked questions identified in the COMORANT-UK Delphi survey on the challenges trials face in using routinely-collected healthcare data. HDR UK North will lead on establishing best practice for a prioritised methodological challenge, and Cardiff University will lead on a challenge that can be addressed through training. In each instance, the team will pick a challenge that is both highly ranked and on which meaningful progress can be made during the funding window.

Assessing the utility of healthcare systems data for trials: data utility comparisons in STAMPEDE trial (DUCkS) – Led by Matt Sydes and Sharon Love 

University College London, Institute of Cancer Research, STAMPEDE Trial Management Group, NHS DigiTrials, NHS Digital, University of Manchester

The DUCkS project will assess the concordance between traditional trial-specific data collection and healthcare systems data (routinely-collected healthcare data) in 10,000 STAMPEDE patients. The analyses will involve assessment of survival, chemotherapy and radiotherapy treatments, second-line treatment and toxicities.

An Academy for PhD Training in Health Data Research – Led by Iain Styles and Christopher Yau

University of Birmingham, HDR UK, University College London, University of Manchester, University of Edinburgh, University of Cambridge, Queen’s University Belfast

This project will develop HDR UK’s PhD programmes and a sustainable programme of advanced (doctoral-level) training in health data research. This will lay the foundations for a National Doctoral Academy for Health Data Research with the aim of making advanced health data science training accessible to all UK PhD students and contributing to “levelling-up”.

In addition to the projects listed above, two proposals were considered to closely align with the National Core Studies Data and Connectivity programme and were approved for funding through this route.

Data Protection and Governance – Led by Cassie Smith

HDR UK, University of Bristol

This project will recruit a Data Protection and Governance Manager to accelerate HDR UK’s Ethics and Governance workstream in its next five years. The workstream will provide best practice ethics and governance toolkits to the Research Driver Projects and Infrastructure programmes. Early recruitment to this role will enable HDR UK to begin work on streamlining the governance processes across TREs, enabling faster access and influencing NHS thinking at a critical juncture.

Rare Disease Phenomics: leveraging nationwide scale and richer local data – Led by Tom Lumbers  

University College London, University of Edinburgh, University of Dundee, University of Liverpool, London School of Hygiene and Tropical Medicine

This project will develop and implement a data-driven framework for defining rare diseases in nationwide datasets and richer hospital data. This will be used to chart an atlas of rare diseases in the UK, including comparative estimates of disease burden to complement current single disease approaches. This project will catalyse a strategic focus on health data infrastructure for rare diseases to support the UK Rare Diseases Framework Priorities and improve the lives of those affected and their families.