Author: Dr Rhoswyn Walker, Chief Science Strategy Officer, Health Data Research UK

What is health data science? You would expect this question to have an obvious answer at HDR UK, but the diversity of disciplines that come together across our research sites means that sometimes the answer is not so clear. In its simplest terms, we define health data science as a unique blend of maths, statistics, computational science across all of its specialties, and domain knowledge – which in our case is, of course, health. This domain context is vital, and making the most of these diverse skills truly requires a team science approach.

Last week I saw the great opportunities that can come from mixing this diverse set of expertise at a workshop focused on building HDR UK’s national approach to defining health and disease.

Professor Cathie Sudlow and Professor Harry Hemingway hosted a high-energy discussion on ‘phenomics’ (more on defining this later). A unifying theme across the HDR UK sites is the use of routine ‘real world’ data for research. But real-world data is messy. Some data is structured (using coding, but these coding systems change – watch out for ICD-11, which is coming soon!), and the majority of real-world data is currently untapped, locked away within unstructured sources including free text in electronic medical records and diverse forms of medical imaging data.

Key questions for the day included: What is phenomics? How do we build the national community resource required to integrate and share diverse, digital representations of health data? How do we build, share and recognise the important (typically non-publication) outputs required to build scalable phenomic resources? And how will we know when we have succeeded in reproducibly defining health and disease “states, traits and rates”?

We heard diverse perspectives from all four nations of the UK, with universal agreement that partnership with the custodians of health data was vital. Perspectives from health data domain leaders, including UK Biobank, the clinical trial community and current UK disease-specific phenomics resources, provided important reminders of the existing and complementary phenomics expertise across the UK. Expert data engineers and technical teams will be essential to develop and implement the methods needed to integrate, curate and validate diverse and complex data, and exciting new connections were made across the genomics and health data engineering communities.

So did we agree a common definition of phenomics?  Nearly! Here is a starting point:

  • A digital representation of health characteristics (from structured and unstructured data using algorithms to derive validated definitions)
  • Of all, or a critical mass of, diseases, health states, traits and rates (time is important too!)
  • Relevant to health interventions and decisions.
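
To make the first two points concrete, here is a minimal sketch of how an algorithm might derive one simple phenotype, the date of a first type 2 diabetes diagnosis, from structured coded data. It is an illustration only and not an HDR UK method: the record shape (`CodedEvent`), the function name and the one-entry code list are all assumptions made for this example, and a real phenotype would rest on a clinically validated code list and would draw on unstructured data too.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CodedEvent:
    """One coded entry from a structured health record (hypothetical shape)."""
    patient_id: str
    code: str          # for example, an ICD-10 code
    recorded_on: date


# Illustrative code list only: ICD-10 codes beginning "E11" denote
# type 2 diabetes. A real phenotype needs a validated code list.
T2D_PREFIXES = ("E11",)


def first_diagnosis_dates(events, prefixes=T2D_PREFIXES):
    """Return {patient_id: earliest date a matching code was recorded}."""
    earliest = {}
    for event in events:
        # str.startswith accepts a tuple of prefixes
        if event.code.startswith(prefixes):
            known = earliest.get(event.patient_id)
            if known is None or event.recorded_on < known:
                earliest[event.patient_id] = event.recorded_on
    return earliest


# Made-up records for two fictional patients
events = [
    CodedEvent("p1", "E11.9", date(2015, 3, 2)),
    CodedEvent("p1", "E11.9", date(2012, 7, 19)),
    CodedEvent("p2", "I10", date(2016, 1, 5)),
]
result = first_diagnosis_dates(events)
```

Keeping the earliest date, rather than a simple yes or no, reflects the point that time is important: the same coded events can support both a health state (has the condition) and a rate (new diagnoses per year).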

And why is this important? At HDR UK we are committed to delivering new insights from complex data to achieve outcomes that will help people live longer and healthier lives. If we continue to develop and derive phenotypes in silos, the power of data to save lives will be severely held back.

I look forward to seeing the HDR UK phenomics community build and work together to answer some of these important questions over the next few months.

Photo credit: Haiyan Wu