As the name suggests, health data research relies on researchers having access to health datasets. But it’s essential that this is done in a safe, secure way, with safeguards and controls in place to make sure that people’s privacy is protected and confidential information doesn’t fall into the wrong hands.

One of the best ways to ensure this is through Trusted Research Environments (TREs). These enable vetted researchers to securely access and analyse anonymised health datasets in a safe environment.

The SAIL databank in Swansea is a leading TRE, providing secure access to anonymised health data via a remote gateway. This allows approved researchers to log in and conduct their analyses, taking away only their results without ever receiving a copy of the data itself.

Unlocking insights from the COVID Symptom Study

Just one of the many datasets available to researchers via SAIL comes from the COVID Symptom Study app, which asks users to log their health on a daily basis throughout the COVID-19 pandemic. The app was created in March by researchers at King’s College London led by HDR UK Associated Researcher Professor Tim Spector, working together with the health science company ZOE, and now has more than 4 million users across the UK.

The data collected through the app has underpinned an impressive amount of scientific research and insights into this new disease. For example, data from the app has identified new symptoms of COVID-19, such as loss of smell and skin rash, and has been used to develop an algorithm that can tell whether someone is likely to have COVID-19 in the absence of a test.

However, there is so much more to learn from this huge dataset, which contains more than 170 million health reports and is growing every day. To aid this scientific effort, anonymised data from the app is being fed into SAIL, so that health data researchers across the UK can conduct their own analyses and gather even more insights.

“Working together through the HDR UK community allows us to make the data available to other researchers so we can access different skills, and produce different analytics, providing maximum benefit from the data available,” explains Dr Richard Fry, part of the Health Data Research UK (HDR UK) Wales and Northern Ireland team at Swansea University, who works closely with SAIL.

Mapping millions of datapoints

The COVID Symptom Study provides maps of COVID-19 prevalence by county, based on the number of people predicted to have COVID-19 from their symptoms and reported test results. But known risk factors for COVID-19 like social deprivation or ethnicity often vary over much smaller areas within towns and cities.

In order to target testing most efficiently and implement effective local controls to slow the spread of COVID-19, Richard and his collaborators knew they needed to pinpoint virus hotspots more accurately.

“We securely accessed the COVID Symptom Study app data through SAIL and used sophisticated mapping methods to create predictions of COVID-19 prevalence across the UK at high resolution in near real-time,” Richard explains.

The team were able to map the likely number of cases over areas covering an average of 1500 people (ranging from 1000 to 3000). These areas are geographically smaller in densely populated urban environments and larger in rural locations, which ensures that it’s not possible to identify individual homes.

“We also produced confidence intervals around these predictions. So in each area, we have a predicted COVID-19 prevalence, and an indication of how reliable that prediction is,” he says.  
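
To illustrate what a prediction with a confidence interval looks like for a small area, here is a minimal Python sketch. It is not the team's geospatial model, which the article does not describe in detail. Instead it uses a standard Wilson score interval for a simple proportion, and the area names and counts are hypothetical values chosen only for illustration.

```python
import math


def wilson_interval(positives, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] because a prevalence cannot fall outside that range.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical areas: (area id, users predicted positive, users reporting).
# These numbers are made up and do not come from the COVID Symptom Study.
areas = [("area_A", 3, 120), ("area_B", 30, 1200)]

for area_id, positives, total in areas:
    low, high = wilson_interval(positives, total)
    print(f"{area_id}: prevalence {positives / total:.3f}, "
          f"95% CI {low:.3f} to {high:.3f}")
```

Both hypothetical areas have the same estimated prevalence, but the area with more reports gets a narrower interval, which is the sense in which the interval indicates how reliable each prediction is.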

Tracking how COVID-19 spreads

Published as a preprint in August, the models produced by Richard and the team show us how the disease can move across communities.

“We’ve seen in recent months that the spread of COVID is very localised. It’s a disease of proximity, and the interactions we have within a local community are what drives the spread of the virus,” says Richard. 

He explains that when a disease hotspot forms, it can spread into the surrounding geographic areas over time if left unchecked, sometimes very quickly, so it’s important to get this detailed local data to keep track of what is happening ‘on the ground’.

The team’s analysis of the COVID Symptom Study data has been sent to the Scottish and Welsh governments on a weekly basis to help inform their pandemic responses.

“The COVID Symptom Study data is based on self-reported data and subsequent predictions which have some limitations, but it can give us an idea of community prevalence and may provide an early indicator of where hotspots are forming before they are picked up by testing,” says Richard.

He notes that a strength of the data from the COVID Symptom Study is that it has been available since March at a UK-wide level with consistent levels of coverage. This has enabled the team to produce weekly updates of the mapped outputs at a local level for the whole of the UK since April.

Richard believes that the datasets held within SAIL – including the data from the COVID Symptom Study – have been instrumental in supporting the Welsh response to the virus.

“We’ve got all this new COVID-19 data coming in, but we also have lots of contextual, historical data about the population of Wales from primary care data to hospital admissions,” he says. “This enables us to look at other risk factors which may be influencing people’s outcomes, adapt the geospatial modelling and inform policy accordingly.”
