The recent posting of UK Biobank data for sale on the Chinese online marketplace Alibaba points to a gap in the control of people’s sensitive health data. As UK ministers noted in Parliament, the data were accessed legitimately by accredited institutions and then misused, rather than stolen.

Public confidence in health research is hard won and easily lost. For patients and the public, confidence that data is handled consistently and responsibly is a precondition for its use. Even though direct identifiers such as name and date of birth had been removed, the data could still potentially allow reidentification, especially if combined with other data. People who contributed their data to UK Biobank – and the many millions whose data sits in similar systems – did so on the understanding that it would be properly protected.

Thankfully, approaches for managing the risks of accessing data for research already exist. Typically, sensitive data is held in secure environments that apply strong safeguards to established standards.

What is required now is independent oversight and accreditation against best practice, coupled with continued development of the infrastructure needed to support the latest research methods on such large-scale datasets. Consistent governance is what makes large-scale health research worthy of public trust.

Necessary controls

The Five Safes framework provides a structured way to balance access to data for research with the necessary protection. Developed by the Office for National Statistics in 2003, the framework is widely adopted nationally and internationally. Its five controls together underpin the use of sensitive data for research:

  • Safe people: only trained, accredited researchers access data
  • Safe projects: data use is appropriate, ethical and delivers public benefit
  • Safe data: identifiability is reduced through data minimisation and removal of direct identifiers like name, address and NHS number
  • Safe settings: data access happens in secure, controlled environments
  • Safe outputs: results are checked before release to prevent disclosure of personal information

UK Biobank is an extraordinary scientific resource, with a track record of enabling high-impact discoveries across genomics, epidemiology, and population health. It has been built through significant support, substantial investment and, crucially, the willingness of participants to share their data.

Its controlled data environment, called the Research Analysis Platform (RAP), represented a significant step forward in strengthening ‘safe settings’. However, UK Biobank’s implementation of ‘safe outputs’ has lagged behind best practice.

In most secure data environments for health data, controlled ‘airlock’ processes systematically check outputs before release. While UK Biobank had begun exploring approaches to output checking with sufficient scale and automation, a fully implemented solution was not yet in place. UK Biobank has cited the scale at which it operates and the technical complexity involved as reasons for this.

Compliance has therefore relied more on contractual obligations for researchers than on technical controls embedded in the data environment. As we’ve since seen, technical controls are needed too, and at the necessary scale.

The importance of accreditation

This raises an important question that goes beyond a single organisation: who ensures that the UK’s network of secure health data environments operates to consistent standards?

Around 100 such environments now exist across the UK. They underpin vital research into cancer, dementia, cardiovascular disease and pandemic preparedness. Most apply strict controls, including limits on what data can be taken out by researchers. Others may be more permissive.

This is not primarily a failure of individual organisations. It reflects a gap in how the system is governed. In the absence of mandatory standards, decisions about basic protections are left to each institution. The result is a patchwork, where similar data may be handled very differently depending on where it is stored.

Independent accreditation would provide the necessary assurance that secure data environments meet established benchmarks. This was one of the five recommendations of the 2024 Sudlow review on health data.

Even better, we do not need to start from scratch. Scotland has established accreditation arrangements for its Safe Havens. The UK Statistics Authority (UKSA) already operates an accreditation framework for non-health data under the Digital Economy Act (DEA). In addition, SATRE provides a standard architecture for data environments, developed through the DARE UK programme by those who manage secure environments. The UKSA is exploring how future iterations of DEA accreditation could align with this approach.

Enabling the latest research

Continued technical innovation is also essential. We know from a soon-to-be-published survey carried out by DARE UK that the vast majority of secure data environments in the UK rely on manual approaches to output checking.

As research increasingly relies on larger datasets and advanced analytical methods (such as training AI models), our infrastructure must evolve in parallel. This includes scalable approaches to privacy protection and disclosure control, as well as more capable secure environments. Investment here is not only about risk mitigation, but about enabling high-quality, timely research and maintaining the UK’s global competitiveness.

Good practice and underlying solutions already exist for the safe and secure use of health data. What is missing is system-wide oversight to ensure they are followed consistently. That is the vital step that would help protect the public and support researchers.

Experts, including Professor Andrew Morris, Director of HDR UK, respond to news on 23 April that UK Biobank data had been listed on the Chinese marketplace Alibaba.