This is an exciting programme of research that brings together data science and machine learning, health data from wearables such as ‘Fitbit-like’ devices, and the important quality of reproducibility. The overall aim is to enable this kind of technology to provide trustworthy clinical insights from the enormous amounts of health data now available. Ultimately, it should help clinicians to better prevent and treat a wide range of human diseases.

What is machine learning?

Machine learning is a type of artificial intelligence that gives computer programs the ability to learn and improve from experience automatically, without being explicitly programmed. It focuses on developing computer algorithms that can access data and use it to learn for themselves. This approach offers the potential to use computers to learn clinical insights automatically from vast amounts of complex data. One example of how this might help in healthcare is a computer program that analyses electronic health records to predict people’s risk of being readmitted to hospital because they have additional community healthcare needs. This could help doctors prioritise patients who might need extra support after being discharged from hospital, so that they remain in good health for longer.

Using a new form of health data – from wearables, such as Fitbits

Another exciting aspect of this research is that it makes use of a relatively new form of health data, from wearables (devices such as a Fitbit or Apple Watch), to investigate important health questions that could improve people’s lives. For example, our current understanding of how physical activity and sleep relate to cardiovascular disease is severely limited, because we have relied on asking people how much exercise they do rather than measuring it directly. The advantage of data from a wearable device is that it is much more accurate than self-reported activity.

Many of us routinely wear such a device, which measures the number of steps we take, the speed at which we walk or run, our heart rate, and the length and quality of our sleep. Because these devices collect millions of data points per person, machine learning is important for identifying new patterns of physical activity behaviour. This project makes use of such data to answer important medical questions.

What is reproducibility and why is it important?

Machine learning has been used to address clinical questions before, but there is room for improvement in using this technique to support clinical decision-making. There have been many reports of health-related studies involving machine learning whose results are difficult, or even impossible, to recreate. This lack of reproducibility, or repeatability, suggests that the methods are not reliable. It also calls into question whether clinical decisions supported by machine learning are reliable and, above all, trustworthy.
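
As a minimal illustration of what reproducibility means in practice (a sketch only, not part of the project's own methods), the Python snippet below runs a toy analysis twice with the same random seed and checks that the result is identical. The function name `run_experiment`, the seed values and the simulated step counts are all hypothetical and chosen purely for illustration.

```python
import random
import statistics


def run_experiment(seed):
    """Simulate a tiny 'study': draw a random sample and summarise it."""
    # A local generator, so the seed alone determines every random draw.
    rng = random.Random(seed)
    # Hypothetical daily step counts for 100 made-up participants.
    steps = [rng.gauss(8000, 2000) for _ in range(100)]
    return round(statistics.mean(steps), 2)


first = run_experiment(seed=42)
second = run_experiment(seed=42)   # same seed: the analysis is repeated exactly
third = run_experiment(seed=7)     # different seed: a different random sample

print("Same seed gives the same result:", first == second)
print("Results for seeds 42 and 7:", first, third)
```

If the seed were not recorded and shared alongside the code, a second researcher would have no way to recreate the first result exactly, which is one simple way in which a machine learning study can become hard to reproduce.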

This project

In this project, four key challenges will be investigated to help improve the trustworthiness of machine learning in medicine:

  1. How should researchers report machine learning in health data science? First, the team will work with international experts to develop a set of guidelines to help healthcare professionals report how they actually use machine learning in decision-making.
  2. Can synthetic datasets be used to evaluate the stability of machine learning models in health data science? Sharing data along with code can aid reproducibility, but this is often difficult with confidential medical data. The team will therefore test whether synthetically generated, or mock, datasets can support reproducible machine learning in healthcare.
  3. What are the minimal requirements for reproducibility in restricted ‘safe-haven’ environments? The team will also seek to address how reproducible machine learning should be conducted when accessing restricted medical data in ‘safe-haven’ environments.
  4. Can we strengthen the culture of reproducible machine learning within HDR UK? To help start a culture of reproducible machine learning in UK health data science, the team will train the next generation of researchers, initiate a reproducibility ambassador programme across HDR UK and create awards for good practice in reproducibility.

In this way, the project will address a current unmet need: to provide confidence that machine learning results are trustworthy before they enter routine use in public health and clinical practice.

This project is led by Aiden Doherty (HDR UK Oxford).