The SARS-CoV-2 virus, which causes COVID-19, gets into our body by binding with a molecular ‘gateway’ on the surface of our cells called ACE2. Because of this entry mechanism, some doctors suspected that common drugs called ACE inhibitors, which increase the amount of ACE2 on the surface of cells, might put people at increased risk from COVID.

ACE inhibitors are widely prescribed for conditions including high blood pressure, heart failure and kidney disease, potentially putting a significant proportion of the population at increased risk from coronavirus. At the same time, stopping taking them could also have significant health consequences.

“A lot of people are on ACE inhibitors, and some were asking if they should stop taking them during the pandemic,” says Health Data Research UK associate researcher Professor Richard Dobson, from King’s College London, who led the study. “So we needed to find out quickly whether people were at greater risk from COVID-19 if they continued their medication”.

Richard and his collaborators from the Precision Health Informatics Data Lab turned to health data for answers.

“In March 2020, the patient-led KERRI research committee of King’s College Hospital approved the use of deidentified patient data in the electronic health records for COVID-19 research because doctors needed to understand how to treat patients with this new disease urgently”, said Dr James Teo, a consultant from King’s College Hospital, who worked with Richard on the project.

“We looked at anonymised electronic health records from people that were hospitalised with COVID using a system called CogStack, which is an information retrieval and extraction platform that allows us to securely search anonymised electronic health records in near real-time,” says Richard. “We investigated whether people that were taking ACE inhibitors were more likely to be admitted to intensive care or die compared with other patients.”

In theory, this sounds simple: just look up which patients are taking ACE inhibitors and compare their outcomes with people who aren’t taking the medication. But in reality, a lot of the data the researchers required to understand the connection between COVID-19 and ACE inhibitors was hidden in parts of the hospital notes called ‘unstructured health data’.

“When you go to the hospital or the GP, you will notice that the doctor spends a lot of time typing things. And most of what they are typing goes into a descriptive text box,” explains Richard. “That unstructured data often contains a lot of subtle, nuanced information that describes your individual circumstances, such as any existing conditions or previously prescribed medications like ACE inhibitors.”

For example, the unstructured text field on a hospital admission note may say: ‘we’ve admitted a 55-year-old hypertensive male, who regularly takes drugs X, Y, and Z.’ The pertinent bits of information within that text are: this person is male; they have hypertension (high blood pressure), and they are taking three different medications. But how can a researcher extract that information without having to individually read every bit of unstructured data and pull out the significant bits by hand?
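To make the problem concrete, here is a minimal Python sketch of pulling those details out of a note with simple pattern matching. It is an illustration only, not the researchers' method: the function name `extract_facts`, the tiny vocabularies and the drug names in the example note are all assumptions made for this example.

```python
import re

# Hypothetical mini-vocabularies for illustration only; a real system would
# draw on large medical terminologies rather than hand-written lists.
CONDITIONS = {
    "hypertensive": "hypertension",
    "hypertension": "hypertension",
    "diabetic": "diabetes",
}
ACE_INHIBITORS = {"ramipril", "lisinopril", "enalapril"}


def extract_facts(note: str) -> dict:
    """Pull age, sex, known conditions and ACE inhibitors out of free text."""
    text = note.lower()
    age = re.search(r"(\d{1,3})-year-old", text)
    sex = re.search(r"\b(male|female)\b", text)
    words = set(re.findall(r"[a-z]+", text))
    return {
        "age": int(age.group(1)) if age else None,
        "sex": sex.group(1) if sex else None,
        "conditions": sorted({CONDITIONS[w] for w in words if w in CONDITIONS}),
        "ace_inhibitors": sorted(words & ACE_INHIBITORS),
    }


# Invented example note, modelled on the sentence quoted in the article.
note = "We've admitted a 55-year-old hypertensive male, who regularly takes ramipril and aspirin."
facts = extract_facts(note)
print(facts)
```

Rules like these only work when the note happens to use the expected wording. A misspelt drug name or an unusual turn of phrase would slip straight through.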

To solve this challenge, the researchers used natural language processing, drawing on tools developed by a Health Data Research UK project called the National Text Analytics Resource.

“Natural language processing is all about understanding the language used in text, generating meaning from that language, and then turning it into a form that can be easily analysed by a computer,” Richard explains.

But again, it’s more complicated than it sounds. Natural language processing algorithms need to understand context, complex strings of words and medical concepts. 

For example, they should be able to distinguish whether particular annotations are current (‘Mr Smith has been admitted with hypertension’), historic (‘Mr Smith has previously been diagnosed with hypertension’) or familial (‘Mr Smith’s sister suffers from hypertension’). They also have to deal with variations in structure, vocabulary and writing style between different doctors – and not get tripped up by typos! 
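The distinction between current, historic and familial mentions can be sketched with a few trigger phrases. This is a deliberately simplified, rule-based illustration in Python, not the trained model the researchers describe. The cue lists and the function name `classify_context` are assumptions chosen to fit the three example sentences above.

```python
import re

# Hypothetical trigger phrases; real clinical text needs far broader coverage.
FAMILY_CUES = re.compile(r"\b(sister|brother|mother|father|family history)\b")
HISTORIC_CUES = re.compile(r"\b(previously|history of|in the past)\b")


def classify_context(sentence: str) -> str:
    """Label a mention as 'familial', 'historic' or 'current' using cue words."""
    text = sentence.lower()
    # Check family cues first so 'family history of ...' is not read as the
    # patient's own history.
    if FAMILY_CUES.search(text):
        return "familial"
    if HISTORIC_CUES.search(text):
        return "historic"
    return "current"


examples = [
    "Mr Smith has been admitted with hypertension",
    "Mr Smith has previously been diagnosed with hypertension",
    "Mr Smith’s sister suffers from hypertension",
]
labels = [classify_context(s) for s in examples]
print(labels)
```

A fixed cue list cannot cope with typos or the different writing styles of individual doctors, which is one reason to prefer a model that learns from examples.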

Just as humans learn to read and understand texts as children, practice and learning from experience are crucial to creating an algorithm that can accurately understand and process whatever it is reading. And where better to find written material to practise on than the internet?

“We used data from Wikipedia to pre-train the algorithm in the way that language is used. Then we improved the model using real hospital data,” says Richard. 

Once the algorithm was up and running, it was able to extract the data the researchers needed to analyse whether ACE inhibitors put people at risk of worse outcomes from COVID-19. Their results were good news for patients.

“We found no evidence that people on these medications had worse outcomes from COVID, and as a result, our advice was that patients should continue taking their ACE inhibitor medications.” 

The full results of their study were published in the European Journal of Heart Failure. Richard and his collaborators are now putting their language processing algorithm to work revealing more hidden information about COVID-19 in health data. This includes assessing the effectiveness of the scoring system currently used to predict COVID-19 risk and how it might be improved, and investigating the links between ethnicity and outcomes from the disease.

Health Data Research UK is working to make health data securely and safely accessible for research to improve people’s lives. Find out more and follow on Twitter @hdr_uk and LinkedIn.

Read more: