Health Data Research UK recently announced an exciting new partnership with the national supercomputing facilities as a part of wider funding led by the Engineering and Physical Sciences Research Council. But how will this help to enable discoveries that improve people’s lives?

HDR UK is contributing to an upgrade to the Cambridge Service for Data Driven Discovery (CSD3). This will create one of the most powerful academic supercomputers in the UK. Although it is located in Cambridge, CSD3 is very much a national resource. It is part of a network of eight UK high-performance computing facilities available to researchers from academic institutions and industry. And we’re planning to make the most of it for HDR UK.

Our Understanding the Causes of Disease Science Priority seeks to redefine diseases at a biological level. Traditionally, diseases were defined by symptoms relating to one organ or part of the body, e.g. heart disease or asthma. However, different diseases can have similar symptoms but completely different causes, and vice versa, meaning treatment based on symptoms can be hit and miss.

New technologies are increasing our ability to measure the huge numbers of molecules present in blood and other tissues and to read the genes that provide the basic ‘instruction book’ for our bodies’ development and function. By combining these measurements with information about people’s health and lifestyle, we can study how the combination of our biological make-up and lifestyle affects our health. This lets us identify the mechanisms by which our genes and our environment lead to disease. By focusing on the cause and not just the symptoms, we will improve prevention, diagnosis and treatment.

Uncovering these interactions requires measuring and then finding the links between enormous numbers of genetic variants, thousands of proteins and other molecules, and health events. In one study we compared expression of over 3,000 proteins with over 10,000,000 genetic variants, each measured in over 3,000 healthy volunteers. This is where a supercomputer comes in. Running that analysis took several days even on the current version of the CSD3 supercomputer. It would have taken far too long to be feasible without access to this sort of computing power. And this is just the start of what we need to do. We plan to look at more molecular measurements in more volunteers, and the supercomputing capacity required will need to go up with these increases in participants and measurements. The upgrade to CSD3 will allow us to do this.
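
To give a rough sense of the scale of that study, the short Python sketch below multiplies out the figures quoted above. It assumes, purely for illustration, that every protein is tested against every genetic variant and that each test uses every volunteer. The quoted figures are lower bounds ("over 3,000", "over 10,000,000"), the function names are hypothetical, and this is not a description of how the analysis was actually run.

```python
# Lower-bound figures quoted in the text; the real numbers are somewhat larger.
N_PROTEINS = 3_000
N_VARIANTS = 10_000_000
N_VOLUNTEERS = 3_000


def association_tests(n_proteins: int, n_variants: int) -> int:
    """Protein-variant pairs, assuming every pair is compared exactly once."""
    return n_proteins * n_variants


def measurements_read(n_proteins: int, n_variants: int, n_volunteers: int) -> int:
    """Paired measurements read, assuming each test uses every volunteer."""
    return association_tests(n_proteins, n_variants) * n_volunteers


tests = association_tests(N_PROTEINS, N_VARIANTS)
reads = measurements_read(N_PROTEINS, N_VARIANTS, N_VOLUNTEERS)

print(f"At least {tests:,} protein-variant comparisons")
print(f"At least {reads:,} paired measurements read across volunteers")
```

Under these assumptions, the comparison count grows with the product of the number of proteins and the number of variants, and the work per comparison grows with the number of volunteers, which is why adding measurements and participants pushes up the computing capacity required so quickly.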

We also need to add more details about the health outcomes of volunteers into the analysis. To do this, we need the right security in place, and that’s the other advantage that CSD3 brings. The CSD3 team is building a Trustworthy Research Environment that meets international and NHS security standards on top of its supercomputing platform. That will give us the best of both worlds: huge processing power, combined with the confidence that any health data stored on the platform is safe and accessible only to authorised researchers.

Access to this resource will accelerate our analyses of different layers of molecular and health data. We hope to find new targets for medicines development, identify where existing treatments for one disease may help others and spot potential side effects of new drugs. Part of the excitement is that we don’t know what we might discover, but we do know that a better understanding of the molecular basis of disease will make the process of developing new treatments faster and more efficient, and will help us get the right treatments to the right people.

CSD3 provides the UK with a world-leading resource for this type of work. Together with the generous volunteers who provide samples and give permission for us to study their health records, it gives us the opportunity to change our understanding of disease and therefore improve prevention, diagnosis and treatment for millions of people around the world.