As a scientist, I’m used to questioning things.

Throughout my career, I’ve seen some amazing advances in medicine, especially in precision medicine, and I really believe we’re now working our way to a world in which no patient gets left behind. However, excited as I am by this, there are still questions that we have yet to answer.

These are pretty fundamental questions around the data being used to power new medical advances, innovations and treatments. Unfortunately, this data is flawed. And I don’t mean a little bit flawed.

The data on which we’re basing innovation in our healthcare system is not reflective of the global population. Not even nearly. For example, 81% of participants in genome-wide association studies are of European descent. Whilst efforts are increasing to diversify the global genome data pool, more is needed, and fast!

How can we possibly develop products to help a global population if our starting point is data from only those of western origin? Your age, ethnicity, gender or socioeconomic status can all strongly influence the way you respond to treatment. Extrapolating data from a homogeneous group will therefore not provide solutions for a diverse society.

How can we accept a world where drugs are approved and registered for use based on data generated from a select few? Take the pivotal multiple myeloma studies that supported FDA registration. The trials were conducted in a largely Caucasian male population, even though it is an established fact that multiple myeloma is highly prevalent in black men. Do you see the problem?

And this goes beyond backgrounds and ethnicities. The underlying misperception that coronary heart disease and heart attacks are a man’s disease has resulted in women being 50% more likely than men to get the wrong initial diagnosis after a heart attack. These inequalities in the diagnosis and treatment of coronary heart failure are costing lives.

It’s not all bad news: we are moving, but we’re not moving fast enough. The disproportionate impact of COVID-19 on the elderly and BAME groups has forced the issue into mainstream debate, certainly in the UK, yet we still need to include diverse populations in clinical trials in order to understand the safety and efficacy of vaccines and treatments in the groups hardest hit by the virus.

This is about more than just good scientific practice. This is about ethics and it’s about combatting inequalities in medical treatment and care.

That’s why I worked to establish the Data Diversity Initiative (DDI) at BenevolentAI. When we began this work, the biggest challenge was raising awareness. Whilst most people are aware that research isn’t representative of the global population, their jaws often drop when they hear about the scale of the problem.

Thankfully, once they have lifted their jaws back up, their reaction is almost always, ‘Right, so what do we do about it?’

As I see it, there are two main things I personally can do to tackle this problem. While I might not be able to upend the system on my own – much as I would like to – I believe that if we each identify areas we can chip away at, together we can make a real difference.

The first is using my experience and understanding of clinical trials to try to improve representation and make them more accessible to the underserved. Have you ever wondered how easy it is for a disabled person to take part in a clinical trial, for example?

The answer to that is not to exclude them: it’s to change the system so they can take part, even if this means taking the research to them. I have been involved in the NIHR’s Include initiative, which is working to tackle the challenge of including underserved populations in research in the UK, and I have worked with various patient society groups to make clinical trials more accessible. I have also contributed to the development of Trial Forge’s Include Ethnicity Framework, which helps trial teams consider which ethnic groups should be included in their research work.

Secondly, at BenevolentAI, with advanced technologies paving the way to new treatments, I see it as part of my role to continually challenge our internal team with questions: What is your strategy for data diversity? Are you thinking about it? How can you do better? BenevolentAI is an incredible platform for this. As well as the DDI, we were able to develop an open-source Diversity Analysis Tool to help scientists across the world understand the diversity of the data they are using.

Tackling this problem won’t always be easy. But that doesn’t mean it’s not our responsibility.

This issue is something I’ve been aware of throughout my career, but it wasn’t until I saw the potential of AI in drug discovery that the penny dropped for me. We have to take action now, and because I feel empowered to do something about it, I really feel I have ‘found my why’: my purpose, and the difference I want to make in the world.

I believe we have the solution to improve diversity in data. It lies in us working together, talking about the topic, and weaving the issue into the fabric of our organisations, as we have done at BenevolentAI. I’m going to keep working to build awareness and make positive changes – however small – until we really do get to a world in which no patient, of any ethnicity, gets left behind.