Why is synthetic data potentially important?
Synthetic datasets have the potential to speed up access to UK healthcare datasets – a key step towards uniting health datasets to enable discoveries that improve people’s lives. Synthetic datasets:
- Could facilitate easier access to data for those who are generating hypotheses, developing tools, or benchmarking analysis methods.
- Could prepare researchers for the practical challenges of working with national clinical datasets.
- Could provide pilot data that would strengthen researchers’ applications when they apply for access to real clinical datasets.
What next steps are needed to advance the generation of synthetic healthcare datasets?
To address these challenges, it will be important to bring together a diverse group of stakeholders to collaborate on a driver project for a synthetically generated national healthcare dataset. In particular, this group should include data architects, methods developers, and clinical scientists, all supported by an active public and patient involvement group. Such a dataset could be hosted on HDR UK’s Innovation Gateway, which could then support a variety of uses, including academic use, teaching and training, benchmarking of methods, and commercial use (algorithm development).
HDR UK held an initial workshop on 9th December 2020 to understand current synthetic data activity across the UK health data landscape. The workshop included presentations from data custodians, industry specialists, and academics working in both medicine and computer science.