This project is funded by UK Research and Innovation’s Industrial Strategy Challenge Fund (ISCF).



We know that answers to many in-depth healthcare questions can only be found by looking across data for the whole UK. However, most healthcare data is managed locally and is set up differently to suit the needs of local communities. To make better use of local data we need a map that tells us precisely where to look for data and how to interpret it when we find it. With this sort of map we can link data between localities. This can make access to data more straightforward, while allowing the people managing different data sets to keep control of how the data in their charge is shared. We can also treat the map itself as data that can be shared, both to give insights into potential uses of data linkage and to encourage as wide a variety of innovators as possible to build tools that can be used across the data landscape, enriching the data, revealing new knowledge and extending the map.

The Challenge

Health data sets are described or formatted differently by local communities to best suit their own needs. As a result, they are both isolated from each other and hard to interpret for anyone outside the community that collected them. The complete picture of a patient or a population can’t be easily obtained or explored, which prevents many health questions from being answered.

The Solution

By creating a map that links up the data available across different communities, we will address the challenges that health data scientists face with separately managed data sets. The map will give us an overarching view of the data available in a selection of real-world data sets, held in separate geographical locations, and will tell us precisely where to look for data and how to interpret it when we find it. The data map will be designed so that further data sets can be efficiently added in the future. This will increase the number and depth of health-related questions that we, and other researchers, can ask.

Data is collected routinely to support the work of different aspects of health care or research. Information about the same thing might be recorded in slightly different ways depending on how the group collecting the data routinely refers to it. This is a bit like a local dialect. For example, a routine blood test called ACT is also known as Activated Coagulation Time or Activated Clotting Time, but all three names describe the same test. Because data is described differently by each community collecting it, a new computer language will be written to act as a ‘translator’ between the separate data collections. This language is called an ‘interlingua’. The map will describe the data from each source using this new common language, allowing related data to be identified regardless of location. The interlingua will also describe how local data has been formatted and processed. Approved users will be able to read the map of available data in one viewing place. They will also be able to ask health-related questions using the ‘interlingua’, which through the translation process will indicate whether, and how, their questions can be answered.
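The ‘local dialect’ idea can be illustrated with a small Python sketch. This is not the project’s interlingua, whose design is not described here. It is only a toy example, under the assumption that each site publishes a list of its local field names, of how different local names for the same test could be translated to one common term and then used to find which data sets hold related data. The common term `activated_clotting_time`, the site names and the function names are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical lookup from local names to one shared term.
# The three ACT names come from the text; the shared term is invented.
COMMON_TERMS: Dict[str, str] = {
    "act": "activated_clotting_time",
    "activated coagulation time": "activated_clotting_time",
    "activated clotting time": "activated_clotting_time",
}


def to_common_term(local_name: str) -> Optional[str]:
    """Translate a local field name to the shared term, if one is known."""
    # Ignore case and extra spaces so "ACT" and " act " are treated alike.
    normalised = " ".join(local_name.lower().split())
    return COMMON_TERMS.get(normalised)


def find_sources(catalogue: Dict[str, List[str]], common_term: str) -> Dict[str, List[str]]:
    """Return, per data set, the local field names that mean `common_term`.

    Only descriptions of the data (field names) are inspected, never the
    underlying records.
    """
    matches: Dict[str, List[str]] = {}
    for data_set, local_names in catalogue.items():
        hits = [name for name in local_names if to_common_term(name) == common_term]
        if hits:
            matches[data_set] = hits
    return matches


# Hypothetical descriptions published by three sites.
catalogue = {
    "site_a": ["ACT", "Haemoglobin"],
    "site_b": ["Activated Clotting Time"],
    "site_c": ["Blood pressure"],
}

result = find_sources(catalogue, "activated_clotting_time")
print(result)
```

In this sketch a question about clotting time is matched to `site_a` and `site_b` even though each uses a different name, and `site_c` is correctly left out. A real interlingua would need to capture much more, such as units, formats and processing steps.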

The mapping process will not change the original local data and, by describing it rather than directly linking it, will create a non-confidential resource. The communities who originally gathered the data will still be able to control how much of the described confidential data is visible, depending on the permissions given to the data viewer.

Impact and Outcomes

A new data language, or interlingua, is the main outcome from this project. It will be developed to link a set of up to seven Scottish data sets, but could be used to link diverse data sets from across the UK. This language, and the linking process it supports, could lead to improved access to a virtual UK-wide data set and the exploration of possible health-related research questions. The shared map describes the data rather than sharing the data itself. It will allow researchers to explore which health-related questions could be asked without having to see all the data in advance, which is a current stumbling block.

This project could lead to improvements in healthcare, as researchers would be able to ask questions across multiple data sets, such as identifying groups of patients that might benefit from a treatment approach, or designing a clinical trial that would require multiple locations to be viable.

Linking data up, data set by data set, without the need to change local data storage approaches could also allow the NHS and health researchers to ask questions that are not possible with current data storage practice. New methods of diagnosis shown to be effective in one area of the country, for example London, could be quickly verified for use by an NHS board in Scotland.

The map itself will link up a larger data set that could become an attractive research tool for those around the world who commission research projects or test research approaches. This in turn could lead to further investment in research, which could benefit the UK economy.


Academic/NHS partners: University of Edinburgh, University of Strathclyde, University of Glasgow, University of Aberdeen, University of St Andrews, NHS Education for Scotland Digital Platform, University of Dundee, NHS National Services Scotland

Industry partners: Microsoft Research, Platinum Informatics, Trento


Research Manager: Rebecca Dawson 
Lead Applicant: Professor Dave Robertson