What do the words ‘free text’ mean to you?
You could be forgiven for thinking about tempting offers on your latest mobile phone data plan. However, in the world of health data science, so-called ‘free text’ refers to unstructured data or ‘natural language’ – words, phrases and sentences.
Not just numbers
Not everything that is important to patient health and care is (or can be) recorded using numbers, quantitative measures or clinical codes. The free text found in notes, letters and reports within medical records provides a rich narrative of the context, symptoms and outcomes surrounding patient care.
But in what circumstances would you be happy with sharing the contents of your doctor’s notes with researchers? That was the big question we posed at a recent public event on using free text for research purposes, held at the Alan Turing Institute’s headquarters in London.
Big opportunities, big responsibility
Interest in free text has grown rapidly in recent years with advances in natural language processing techniques. As an HDR UK Fellow with a project that focuses on natural language processing, I’m interested in its potential to unlock greater insights into health, care and patient experience.
Yet, it’s not all plain sailing.
Free text can contain sensitive and personal data. Names, places and situations that could be used to re-identify a person must be stripped out before analysis to protect confidentiality. Machines and algorithms can do this task well, but not yet perfectly, and in some cases it may still need to be done manually.
Changes in data protection law and increasing concerns about privacy in recent years mean that there are barriers to accessing free text, with some data providers revoking access completely. Yet, we know surprisingly little about public opinion in this area.
No drama, just public engagement
When my colleague Dr Elizabeth Ford (Brighton and Sussex Medical School) invited me and others from the governance working group of the UK Healthcare Text Analytics Network (Healtex) to run an event for the public about free text data, we knew getting feedback on such technical concepts could be a challenge.
We had already run a citizens’ jury on free text data in Brighton in 2018, so this would not be our first time talking to the public about this topic. On this occasion, however, we had to get the audience up to speed in a much shorter space of time.
We decided to get creative, acting out a consultation between a GP and a patient (complete with sound effects) to show how free text might be used to record significant details not typically covered by clinical codes. Delivered in the first session of the day, this provided a common reference point for the presentations that followed on the benefits, risks and procedures associated with using free text for research purposes.
The afternoon session kicked off with a Q&A with a line-up of esteemed panellists. Speakers from NHS England, the Office for the National Data Guardian, useMYdata and Understanding Patient Data all gave their expert opinions on the proceedings, giving due consideration to the benefits and risks of using free text.
Group work followed, encouraging delegates – who included members of useMYdata and HDR UK’s public advisory board – to join in discussions about recommendations for conducting research and public engagement. This yielded some key learnings:
- Researchers need to be prepared to articulate the benefits and risks of using free text and demonstrate how richer data can lead to greater insights and health benefits.
- Conversations about free text need to be embedded in more general discussions about health data for research, leveraging existing resources (e.g. by Understanding Patient Data and HDR UK).
- Researchers, patients, healthcare professionals, communications specialists and data governance experts need to work together to map models of access and find mutually acceptable ways of working.
Armed with growing knowledge of public views, our next step is to finalise a series of recommendations for the future governance of free text research. These will be presented to the free text research community at Healtex’s annual conference in Cardiff this April, and we’ll continue to work with HDR UK, useMYdata and other researcher and patient networks to disseminate them widely and provide opportunities for comment and development.
Dr Lamiece Hassan is an HDR UK Fellow and receives funding from HDR UK.
Dr Elizabeth Ford does not receive funding from HDR UK; funding for this event came from the Alan Turing Institute’s Event & Engagement Funding and Healtex.
Agenda and line-up for the Turing event: https://www.turing.ac.uk/events/sharing-your-healthcare-data-safely
To join the Healtex community, visit: http://healtex.org/apply/
This is a guest blog for HDR UK and reflects the interests and knowledge of its author. Our blogs are designed to stimulate debate and are not necessarily reflective of HDR UK’s opinion.