Opinion

Health data research in 2020 – the year of privacy by design

14 January 2020 | Author: Andrew Morris, Director, HDR UK

It has been an immense privilege to work in partnership with colleagues across the United Kingdom and beyond since the establishment of Health Data Research UK 18 months ago.

2019 was a breathtaking year! As it is invidious to pick out individuals, I would instead like to highlight a handful of significant developments that have been achieved by working in partnership across an array of communities:

Launch of the £10M BHF Data Science Centre, in partnership with the British Heart Foundation
Award of a ~£9M Wellcome PhD programme in health data science, working in partnership with the Alan Turing Institute
Development of innovative approaches to attribution, public engagement, new career pathways and national team science, working with our outstanding Public Advisory Board and Early Career Researcher Committee.
Establishment of the Digital Innovation Hub Programme in partnership with UKRI and the Office for Life Sciences: (i) formation of the UK Health Data Research Alliance, which now has 27 members, following its launch in February 2019 with NHS Digital and HDR UK; (ii) the launch of seven Health Data Research Hubs in October 2019 (a ~£40M programme); and (iii) the imminent launch of the UK Health Data Research Innovation Gateway, making up to 400 UK-wide datasets discoverable. This rapid progress has been achieved by working with over 110 academic, NHS and industry organisations, charities and members of the public, and has created a terrific environment for shared learning.
Award of ISO 9001 status for HDR UK’s high-quality business operations, making it one of very few research institutes in the UK to achieve this accolade in its very first year.

2019 was also a year in which societal concerns about access to and use of health data were front-page news, almost on a weekly basis. There are legitimate concerns about the social, economic and political dimensions of “corporate-clinical alliances” that are beginning to share data to train AI models at scale. For example, Project Nightingale raised concerns that the patient data transferred between the US healthcare provider Ascension and Google was identifiable. Likewise, in the UK, Italy, Singapore and many other countries, the ethics of potentially lucrative “data partnerships” have been challenged. The public expects clarity about benefit sharing and value exchange, and that clarity is currently lacking.

Another challenge exemplified in 2019 is that making data private, and keeping it private, will always be an evolving field. Evidence emerged from Zak Kohane’s lab in Boston in January 2019 that re-identification was possible by linking completely de-identified EHR data sets. Likewise, a report from the Mayo Clinic in the New England Journal of Medicine in October demonstrated that it was possible to reconstruct patients’ faces from de-identified MRI images, and that these reconstructions could then be identified using commercially available facial recognition systems.

As a first step, we are working in partnership with UK Health Data Research Alliance partners (including NHS colleagues across the four nations) to establish best practice by bringing together working groups and, when needed, publishing “white papers” of principles to support the development of a trustworthy research-enabled health data ecosystem in the UK. We are looking at key issues that will be developed further this year, including:

Data Quality – how do we apply standards for quality of data and metadata, test and implement tools for evaluating data quality and share best practice for curation?

Gateway Design – how do we establish, enhance and encourage the use of the Gateway to provide safe and secure discovery and access to the UK’s health data?

Governance Infrastructure – how can we work across institutional boundaries to allow access to linked, large, federated, Alliance member research datasets?

One such white paper that we are exploring asks the question: what are the characteristics of trusted research environments for health data research? It borrows a model from the UK Data Service, which has been adopted by the Office for National Statistics, that summarises the five safes: safe people, safe projects, safe settings, safe outputs and safe data. This is a model we are exploring with partners, whilst adding a sixth safe: safe cloud! Since the original definition was developed, a new issue has become important that is not explicitly covered by the five safes and needs to be addressed to maintain public trust: the use of public cloud provision for all or part of the safe setting. We will share more details on this as it develops.

I therefore wish you a very happy New Year. I would like 2020 to be known as the year when we worked in partnership to develop a well-engineered, consensual, safe and transparent system for the use of health data for research and innovation in the public good. This will require clarity around value exchange and data access, and knowledge of how data is being used, by whom and for what purposes. It will also require collaboration between colleagues from fields such as ethics, the law, the NHS, sociology and anthropology, and the technology sector, as well as patients and the public. It is a grand and ambitious challenge, but the opportunity is too great to squander.

As Osler, the famous Canadian physician who died 100 years ago, said: “The best preparation for tomorrow, is to do today’s work superbly well.”
