A worldwide drive has been taking place to standardise databank infrastructure so researchers can carry out swift, large-scale international studies to tackle COVID-19 and other public health challenges. A recent project has ensured that the UK Biobank – which holds immensely rich data about more than 500,000 people – is accessible for researchers carrying out the kind of work that is crucial in shaping healthcare policies and responses to emergencies.

The Challenge

The COVID-19 pandemic has demonstrated the importance of international federated analyses of healthcare data. However, the databases created by different countries and organisations are very different from one another. This makes it time-consuming and resource-intensive to make use of data from multiple sources.

The IMI European Health Data Evidence Network (EHDEN) project called for the situation to be improved by converting datasets to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). CDMs, which standardise databases so that many different applications can make use of their data, are essential for fast and accurate international federated research.

A team involving HDR UK affiliated researchers, including Spiros Denaxas, Richard J B Dobson, Folkert W Asselbergs and Vaclav Papez, took on the challenge of converting the UK Biobank (UKB) to the OMOP CDM.

This is a potentially valuable contribution to international research, as UKB holds immensely rich genetic, phenotypic and other data on more than 500,000 volunteers.

The biggest of the many challenges the team faced was to translate the many coding systems used in the source data.
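
To give a sense of what translating coding systems involves, here is a minimal sketch in Python. It is not the team's actual pipeline. The vocabulary names, source codes and concept IDs are all hypothetical placeholders, and the lookup table stands in for the much larger vocabulary mappings a real conversion would rely on. The one detail taken from the OMOP convention is that a concept ID of 0 marks a code with no matching standard concept.

```python
from typing import Dict, List, NamedTuple, Tuple


class SourceEvent(NamedTuple):
    """One clinical event as recorded in a source dataset."""
    person_id: int
    vocabulary: str  # name of the source coding system (hypothetical here)
    code: str        # code within that coding system (hypothetical here)


# Hypothetical lookup: (source vocabulary, source code) -> standard concept ID.
# Two different source codes can point to the same standard concept.
SOURCE_TO_STANDARD: Dict[Tuple[str, str], int] = {
    ("VOCAB_A", "A001"): 1000001,
    ("VOCAB_B", "X9.1"): 1000001,
    ("VOCAB_B", "Y2.0"): 1000002,
}

# OMOP convention: concept ID 0 means "no matching concept".
UNMAPPED_CONCEPT_ID = 0


def to_standard_concept(event: SourceEvent) -> int:
    """Return the standard concept ID for an event, or 0 if it is unmapped."""
    key = (event.vocabulary, event.code)
    return SOURCE_TO_STANDARD.get(key, UNMAPPED_CONCEPT_ID)


def mapping_coverage(events: List[SourceEvent]) -> float:
    """Fraction of events whose source code maps to a standard concept."""
    if not events:
        return 0.0
    mapped = sum(
        1 for e in events if to_standard_concept(e) != UNMAPPED_CONCEPT_ID
    )
    return mapped / len(events)


events = [
    SourceEvent(1, "VOCAB_A", "A001"),
    SourceEvent(2, "VOCAB_B", "X9.1"),
    SourceEvent(3, "VOCAB_B", "Y2.0"),
    SourceEvent(4, "VOCAB_A", "ZZZZ"),  # not in the lookup, so unmapped
]

print([to_standard_concept(e) for e in events])
print(f"coverage: {mapping_coverage(events):.0%}")
```

In this toy example, events recorded in two different source vocabularies end up sharing one standard concept, which is what lets a single study script run unchanged across databases. The coverage figure is a simple way to see how much of the source data survives the translation.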


The team’s methods and approach, outlined in “Transforming and evaluating the UK Biobank to the OMOP Common Data Model for COVID-19 research and beyond” (published in JAMIA, October 2022), allowed them to convert UKB to OMOP CDM v5.3. This involved transforming research data as well as electronic health records (EHRs) from primary care, hospitalisations, cancer registrations, and mortality information from providers in England, Scotland, and Wales.

The scale of the task is indicated by the fact that the team processed primary care data about 212,828,306 clinical events from EMIS and 133,092,016 from TPP. Overall, some 1.3 billion rows of data were processed and converted to the OMOP CDM.


Converting UKB brings many benefits. The OMOP CDM is an open community data standard renowned for facilitating efficient analyses that produce reliable results.

Approved research projects can now simply provide UKB with straightforward scripts. UKB can then rapidly provide answers to the research question – a safe and secure approach which ensures confidentiality by never releasing the actual patient data.
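
The idea of sending a script to the data, rather than data to the researcher, can be sketched as follows. This is an illustration only and does not describe UKB's actual systems. The table and column names (person, condition_occurrence, person_id, year_of_birth, condition_concept_id) follow the OMOP CDM, but the in-memory database, the toy records, the concept ID and the minimum cell count are all assumptions made for the example.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical disclosure rule: groups smaller than this are not reported.
MIN_CELL_COUNT = 5


def build_toy_cdm() -> sqlite3.Connection:
    """Create a tiny in-memory database with two OMOP-style tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            person_id INTEGER,
            condition_concept_id INTEGER
        );
        """
    )
    # Toy cohort: six people born in 1945, six in 1955, two in 1965.
    people = (
        [(i, 1945) for i in range(1, 7)]
        + [(i, 1955) for i in range(7, 13)]
        + [(i, 1965) for i in range(13, 15)]
    )
    conn.executemany("INSERT INTO person VALUES (?, ?)", people)
    # Everyone has the same (hypothetical) condition concept.
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?)",
        [(person_id, 1000001) for person_id, _ in people],
    )
    return conn


def count_by_birth_decade(
    conn: sqlite3.Connection, concept_id: int
) -> List[Tuple[int, int]]:
    """Return (birth decade, person count) pairs; no patient rows leave."""
    rows = conn.execute(
        """
        SELECT (p.year_of_birth / 10) * 10 AS decade,
               COUNT(DISTINCT p.person_id) AS n
        FROM person AS p
        JOIN condition_occurrence AS c ON c.person_id = p.person_id
        WHERE c.condition_concept_id = ?
        GROUP BY decade
        ORDER BY decade
        """,
        (concept_id,),
    ).fetchall()
    # Drop small groups before anything is handed back to the researcher.
    return [(decade, n) for decade, n in rows if n >= MIN_CELL_COUNT]


conn = build_toy_cdm()
print(count_by_birth_decade(conn, 1000001))
```

Only the aggregated counts are returned, and in this sketch the group of two people born in the 1960s is withheld because it falls below the assumed threshold. Because every OMOP-converted database shares the same table and column names, the same query could in principle be run at each participating site and the summaries combined afterwards.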

Thanks to the conversion, UKB has been assisting two international federated studies, and a third is planned.

Dr Papez, Research Associate in Clinical Epidemiology at UCL Institute of Health Informatics, and joint lead author (together with Maxim Moinat) of the JAMIA paper, said: “It was time-consuming to create the transformation of the original data to the OMOP CDM. But now we can easily and quickly contribute to new studies. And because we have a standardised form we don’t have the difficulty of shaping scripts to our needs.”

What the Impact Committee said:

The committee selected the paper for its research excellence and its contribution to applied analytics. They highlighted the project’s significance in creating a framework for data across different centres and places.


Email v.papez@ucl.ac.uk.