New service helps researchers identify relevant cohorts for research
20 May 2026 | Author: Ruth Milne, Communications Manager, Infrastructure and Services
Researchers can now take a faster, more confident first step in their research studies with Cohort Discovery on the Health Data Research Gateway.
For many researchers, the journey from a promising idea to accessing health data can begin with a simple but essential question: does the right data exist to support my research?
Too often, answering this question involves time-consuming emails, fragmented processes, or speculative applications. The Cohort Discovery Service, available through the Health Data Research Gateway, is changing that.
The service enables researchers to quickly assess whether relevant patient cohorts exist across multiple datasets held in different Secure Data Environments (also known as Trusted Research Environments) – providing a clearer, faster starting point for planning studies and supporting more efficient and informed data access requests.
Sangya Pundir, Product Owner for the Cohort Discovery Service, says:
“Cohort Discovery gives researchers a simple way to answer one of the most important early questions in their work: Does the right data exist to support their study? By enabling feasibility checks across multiple datasets in one place, we’re helping researchers move forward with greater confidence and submit more informed data access requests.”
A smarter starting point for research
Planning research using health data often begins with uncertainty:
- Do the right patients exist?
- Which datasets are relevant?
- Who should you contact?
The Cohort Discovery Service is designed to answer these questions early, enabling researchers to assess feasibility in one place before progressing further.
The service allows researchers to run a single query across multiple datasets and receive near real-time insights into cohort availability. For example, a researcher could search for “female asthma patients aged under 35”, and Cohort Discovery will query multiple pseudonymised datasets to return an aggregated count of how many individuals match those criteria in each dataset. This helps researchers quickly understand whether a suitable study cohort exists, without needing to contact each data custodian individually.

Importantly, these early insights are delivered without accessing identifiable data, enabling researchers to identify potential cohorts safely while reducing speculative requests and supporting a more efficient journey from research question to data access.
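As a rough illustration of the single-query idea described above, the sketch below shows one way a federated count could work: the same criteria are evaluated inside each dataset, and only a number leaves it. This is a toy model, not the Cohort Discovery or Bunny implementation; the names `Patient`, `DatasetNode`, `federated_count` and `is_female_asthma_under_35` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical pseudonymised record held inside one dataset."""
    sex: str
    age: int
    conditions: frozenset


@dataclass
class DatasetNode:
    """Hypothetical stand-in for one dataset held in its own secure environment."""
    name: str
    records: list

    def count_matching(self, predicate):
        # The predicate is evaluated locally; only the total is returned,
        # never the matching records themselves.
        return sum(1 for record in self.records if predicate(record))


def federated_count(nodes, predicate):
    """Run the same criteria against every node and collect per-dataset counts."""
    return {node.name: node.count_matching(predicate) for node in nodes}


def is_female_asthma_under_35(patient):
    # Illustrative encoding of "female asthma patients aged under 35".
    return (
        patient.sex == "F"
        and patient.age < 35
        and "asthma" in patient.conditions
    )
```

Calling `federated_count(nodes, is_female_asthma_under_35)` on a list of nodes returns a dictionary mapping each dataset name to its count, which mirrors the per-dataset totals a researcher would see. In practice the raw totals would also be rounded and small numbers suppressed before being returned.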
Peter Harrison, Interim Chief Technology Officer at Health Data Research UK (HDR UK), says:
“Improving how researchers discover and access data is essential to accelerating research that benefits people’s health. Cohort Discovery helps remove uncertainty at the earliest stage, supporting a more efficient and responsible journey from research idea to data access and ultimately enabling valuable research to happen faster.”
Built with privacy at its core
Protecting patient privacy is fundamental to the service. Researchers never see patient-level data. Cohort Discovery queries are run on pseudonymised data, and only aggregated totals of patient numbers available in searchable datasets are returned. Patient counts are rounded, and small numbers are suppressed to reduce any risk of re-identification.
This allows researchers to safely assess whether relevant cohorts exist before deciding which datasets to pursue and applying to data custodians for access. The approach supports responsible data use and gives data custodians confidence that queries are secure and protect underlying patient data. The federated capability is delivered through the Bunny application, which sits inside each custodian’s SDE/TRE and was developed by the University of Nottingham, one of Cohort Discovery’s technology partners.
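The rounding and small-number suppression described above can be sketched in a few lines. The article does not state the actual thresholds, so the values here (suppress counts below 10, round to the nearest 5) are assumptions made purely for illustration, and `protect_count` is a hypothetical helper rather than part of the service.

```python
def protect_count(raw_count, threshold=10, base=5):
    """Apply simple disclosure control to an aggregate count.

    threshold and base are assumed values for this example only;
    the real service's settings are not given in the article.
    Returns None when the count is suppressed.
    """
    if raw_count < 0:
        raise ValueError("raw_count must be non-negative")
    if raw_count < threshold:
        # Small numbers are suppressed to reduce re-identification risk.
        return None
    # Round to the nearest multiple of base using integer arithmetic,
    # with halves rounded up.
    return ((raw_count + base // 2) // base) * base
```

With these assumed settings, a raw count of 3 would be suppressed, while a raw count of 13 would be reported as 15. The researcher still learns whether a cohort of useful size exists, but not the exact number of individuals.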
Improving the research ecosystem
By introducing a structured way to assess the usefulness of potential cohorts early in the research journey, the Cohort Discovery Service benefits both researchers and data custodians.
Researchers can validate their ideas earlier and with greater confidence, helping them focus their efforts where it matters most – on the research. By assessing feasibility across multiple datasets with a single query, they can quickly understand whether the right patient cohorts exist, identify the most relevant data custodians to engage, and reduce uncertainty before investing time in a full application. This leads to more informed data access requests and a faster path to funded research.
At the same time, data custodians benefit from a more streamlined and efficient approach to managing demand. By enabling early feasibility checks, they receive more targeted, higher-quality data access requests, reducing the volume of speculative enquiries and the associated administrative burden. With robust privacy and governance safeguards in place, custodians can support responsible data use with confidence, while increasing the visibility and impact of their datasets within the research community.
The result is a more efficient research process across the UK health data ecosystem, enabling valuable studies to progress faster.
Get started
Researchers can begin exploring potential cohorts straight away through Cohort Discovery on the Gateway (access approval required).
Data custodians can participate by onboarding their datasets and enabling cohort searches across them, helping to improve the quality and efficiency of research engagement. Visit the Support Centre to find out how to get started.
About the Cohort Discovery Service
Cohort Discovery helps researchers quickly assess whether relevant patient cohorts exist across multiple datasets, without accessing identifiable data. This provides a clearer and faster starting point for planning studies and supports more efficient and informed data access requests.
Available through the Health Data Research Gateway, the service is supported by a federated analytics ecosystem that enables queries to run securely within Secure Data Environments (Trusted Research Environments). This ecosystem includes tools such as Bunny (developed by the University of Nottingham as part of the HDR UK Federated Analytics programme), along with data pre-processing and transformation support from the Health Informatics Centre (HIC) at the University of Dundee.
The Cohort Discovery Service is developed and maintained by HDR UK and builds on earlier work from the CO-CONNECT programme, which enabled researchers to rapidly discover and access COVID-19 data while ensuring patient information remained private and secure.