The promise of AI for health data research
2 February 2021
The 2020 UK Health Data Research Alliance Symposium tackled some of the biggest topics in health data research. Panellists Jim Hendler, Cian Hughes and Laura Robinson explored the promise of using artificial intelligence (AI) for health data research at scale.
Today’s data scientists can do things with machine learning and AI that they could barely have dreamed of in the past. Modern supercomputers can crunch through vast swathes of health data to gain insights into diseases faster than ever before.
Crisis often spurs innovation, and COVID-19 has been no exception. At the start of the pandemic, companies and universities agreed to share their supercomputers for research, making available a mind-boggling processing power of around 600,000 petaflops (millions of times that of the average PC).
What’s more, there are now more types of human health data available at a scale that would have been unimaginable just a few years ago.
“Through COVID, what we’ve witnessed from a digital transformation is unprecedented. Over three to four months we saw innovation that would typically have taken us four to five years,” remarked symposium panellist Laura Robinson, Health and Life Sciences Lead at Microsoft.
Using AI to unlock the power of health data
The challenge for data scientists now is figuring out how best to harness these rich datasets, impressive computing powers, and machine learning capabilities to unlock the benefits of health data research for patients and the public.
Cian Hughes, Informatics Lead at Google Health Research, works with a type of deep learning algorithm known as a neural network, inspired by the patterns of cell-to-cell connections found in brains.
He told symposium attendees how he has taken advantage of cloud computing to scale up his research. That’s been critical for collaborative Health Data Research UK (HDR UK) projects he’s working on with INSIGHT, the Health Data Research Hub for Eye Health.
He explained, “We’re trying to pull together imaging and non-imaging data about patients in a linked pattern…that enables researchers to answer not only eye health questions but to look for signs of non-eye disease within images of the eye. For example, cardiovascular or neurological diseases that are visible in the eye that researchers could build algorithms to detect, based on the datasets that we’re pulling together as part of INSIGHT.”
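To make the kind of linkage Cian describes more concrete, here is a minimal, hypothetical sketch in Python. The record structure, field names and labels are assumptions for illustration only; they are not the actual INSIGHT data model or Google Health’s code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LinkedEyeRecord:
    """Hypothetical linked record pairing an eye image with non-imaging data.

    Illustrative only -- the real INSIGHT hub uses its own, far richer,
    de-identified data model.
    """
    pseudonymised_id: str                   # patient identifier after de-identification
    retinal_image_uri: str                  # pointer to an OCT or fundus scan in cloud storage
    age_band: str                           # coarse, privacy-preserving demographics
    diagnoses: List[str] = field(default_factory=list)  # coded conditions, eye and non-eye
    cardiovascular_flag: Optional[bool] = None          # a label a researcher might derive for training

def records_for_non_eye_research(records: List[LinkedEyeRecord]) -> List[LinkedEyeRecord]:
    """Select records that carry a non-eye label, e.g. to train a model that
    looks for cardiovascular signals within images of the eye."""
    return [r for r in records if r.cardiovascular_flag is not None]
```

The point of the sketch is simply that once imaging and non-imaging data are linked per patient, the same dataset can serve eye-health questions and research into diseases visible in the eye.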
Data diversity is essential
AI and machine learning applications are ‘data hungry’, requiring huge, high-quality datasets that fairly represent the population – something that the HDR UK Alliance is working hard to achieve.
To make them good at the job they’re designed for, algorithms – essentially sets of rules – need to be ‘trained’ by learning from many datasets, getting better at their job with experience. But just like people, machines can make mistakes – so how do we correct them?
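As a rough illustration of what ‘training’ means in practice, the sketch below fits a small classifier on labelled examples and then counts its mistakes on unseen data. Every detail – the library, the model, the synthetic data – is a generic assumption for illustration, not anything the panellists described.

```python
# A minimal illustration of 'training': a model adjusts its internal rules by
# seeing many labelled examples, then is checked on data it has never seen.
# Entirely synthetic and generic -- not the panellists' actual methods or data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                  # 1,000 synthetic 'patients', 5 features each
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # a made-up outcome the model must learn

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)   # 'learning from experience'
accuracy = model.score(X_test, y_test)               # how often it gets unseen cases right

# The cases it still gets wrong are exactly the failures a researcher would
# want to inspect -- which is hard when the underlying data is confidential.
mistakes = int((model.predict(X_test) != y_test).sum())
print(f"accuracy: {accuracy:.2f}, mistakes on unseen data: {mistakes}")
```

The residual mistakes in the last two lines are the failures Cian describes wanting to examine case by case below.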
“When the algorithm I’m creating fails on a patient’s case, I want to go through every slice of that CT scan to see it with my own eyes and understand where that neural network has got it wrong… But what do I do if I can’t see that failure or share it with the community because it’s confidential medical information?” Cian said.
What’s needed, Cian suggests, is more publicly available data, so that the data research community can learn from it and see how to deal with failure.
Microsoft recently published research that benchmarked its deep learning-augmented radiotherapy analysis tool against archived cancer imaging data that had been made public for the advancement of science. Cian hopes to see more diverse public datasets like this made available in the future.
“There are specific examples, let’s say Asian patients with predispositions to certain cancers of the nose, where if we don’t have publicly available examples of those patients and those cancers then we can inadvertently train algorithms and do research which will underperform in those sections of society,” Cian added.
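One way to surface the kind of under-performance Cian warns about is to break model evaluation down by population subgroup. The sketch below is a generic, hypothetical check with toy numbers – not a description of any panellist’s pipeline or dataset.

```python
# Hypothetical check for subgroup under-performance: evaluate the same model's
# predictions separately for each population group in the test data.
import numpy as np

def accuracy_by_group(y_true: np.ndarray, y_pred: np.ndarray, groups: np.ndarray) -> dict:
    """Return accuracy per subgroup so gaps in performance become visible."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        results[str(g)] = float((y_true[mask] == y_pred[mask]).mean())
    return results

# Toy example: a model that happens to do worse on the under-represented group.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "A", "B", "B", "B"])
print(accuracy_by_group(y_true, y_pred, groups))   # {'A': 1.0, 'B': 0.0}
```

If a group is barely present in the training data, a gap like the one in this toy output is exactly what researchers would expect to see – and what publicly available, diverse datasets help to prevent.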
Such work depends first and foremost on the trust of the people whose data is being used in this way. The panel agreed that being open and honest about what patient data is used for, as well as the fallibility of both researchers and computers, is the best way to maintain their support.
“Patients and the public are very understanding when humans make genuine mistakes, and they are similarly understanding when algorithms make mistakes,” said Cian.
There is much excitement about AI research, and panellist Jim Hendler, Director and Head of Computer Science, Rensselaer Polytechnic Institute, said that it is important to separate the ‘hype’ from the genuine benefits when publicising health data research.
For example, he pointed out that the new COVID-19 vaccines, developed in just one year when previously it would have taken five or ten, all have supercomputing, AI and rich data resources to thank for their speedy development.
“The role of the computation needs to be explained as part of this much larger ecosystem,” he added. “It wasn’t that the computer built the vaccine, nor that the humans did it without computers.”
“If the last decade was all about digitising healthcare, the next decade is all about making that data good for something, through the power of computing and machine learning,” added Laura.