Overview
Being able to link addresses across systems offers a valuable resource for health data science. However, addresses are often not standardised, despite a government push towards standardisation. Researchers at the Clinical Effectiveness Group (CEG) at Queen Mary University of London, supported by HDR UK and Endeavour Health, have developed an algorithm to accurately match addresses from health records to a reference database, providing a powerful tool for future work.
The challenge
Health data researchers are increasingly linking data from different sources to draw new insights from the connections. This includes location information to allow, for instance, linking health records to environmental data. To do this, the non-standardised addresses in health records need to be linked to reference databases that contain unique property reference numbers (UPRNs) and coordinates.
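To make the linkage task concrete, the sketch below normalises a free-text address and looks it up in a small reference table to retrieve a UPRN and coordinates. It is a minimal illustration only: the reference entries, the abbreviation list and the function names `normalise` and `lookup_uprn` are all hypothetical, and the exact-match lookup is far simpler than a real address-matching algorithm.

```python
import re

# Hypothetical reference table: normalised address -> (UPRN, easting, northing).
# Every value here is invented for illustration.
REFERENCE = {
    "FLAT 2 10 EXAMPLE ROAD LONDON E1 1AA": ("100000000001", 533000, 181000),
    "12 SAMPLE AVENUE LONDON E2 2BB": ("100000000002", 534000, 182000),
}

# A small, assumed set of abbreviations to expand before matching.
ABBREVIATIONS = {"RD": "ROAD", "AVE": "AVENUE", "FLT": "FLAT"}


def normalise(address: str) -> str:
    """Upper-case, strip punctuation, collapse spaces and expand abbreviations."""
    text = re.sub(r"[^A-Z0-9 ]", " ", address.upper())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def lookup_uprn(address: str):
    """Return (UPRN, easting, northing) for an exact normalised match, else None."""
    return REFERENCE.get(normalise(address))


if __name__ == "__main__":
    print(lookup_uprn("Flat 2, 10 Example Rd., London E1 1AA"))
    print(lookup_uprn("99 Unknown Street, London"))
```

Addresses recorded in health records vary in many more ways than this sketch handles (misspellings, missing flat numbers, reordered elements), which is why a dedicated and tested algorithm is needed.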
Address-matching algorithms are available in the UK, but they are either not transparent or have not been tested against patient-recorded addresses in electronic health records. Linking places to people is a key part of the Government’s policy to connect databases together and improve lives.
The solution
Researchers supported by HDR UK and Endeavour Health Charity have developed a new algorithm specifically designed for working with electronic health records to match the registered addresses to UPRNs. The algorithm, called ASSIGN, was tested using gold-standard datasets from London and Wales, each containing over 9,000 addresses. The team then applied the ASSIGN algorithm to the recorded addresses of a sample of 1,700,000 adults registered with all general practices in northeast London.
The aim was to transparently carry out quality assurance and examine potential biases in matching, using multivariable analyses to estimate the likelihood of a match by demographic, registration and organisational variables.
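As a rough illustration of this kind of bias check, the sketch below computes the match rate for each subgroup and an unadjusted odds ratio of matching between two groups. It is a simplification under stated assumptions: the records, the field names `moved_recently` and `uprn`, and the helper functions are hypothetical, and a multivariable analysis would adjust for several variables at once rather than comparing one variable in isolation.

```python
from collections import defaultdict


def match_rates(records, group_key):
    """Proportion of records with a UPRN, per value of group_key."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record["uprn"] is not None:
            matched[group] += 1
    return {group: matched[group] / totals[group] for group in totals}


def odds_ratio(records, group_key, exposed, reference):
    """Unadjusted odds of a match in `exposed` relative to `reference`."""
    def odds(group):
        rows = [r for r in records if r[group_key] == group]
        n_matched = sum(r["uprn"] is not None for r in rows)
        n_unmatched = len(rows) - n_matched
        if n_unmatched == 0:
            raise ValueError(f"no unmatched records in group {group!r}")
        return n_matched / n_unmatched
    return odds(exposed) / odds(reference)


# Invented toy data: 10 people who moved recently, 10 who did not.
records = (
    [{"moved_recently": True, "uprn": "100000000001"}] * 8
    + [{"moved_recently": True, "uprn": None}] * 2
    + [{"moved_recently": False, "uprn": "100000000002"}] * 9
    + [{"moved_recently": False, "uprn": None}] * 1
)

if __name__ == "__main__":
    print(match_rates(records, "moved_recently"))
    print(odds_ratio(records, "moved_recently", exposed=True, reference=False))
```

An odds ratio below 1 in this toy data means that the recently moved group is less likely to be matched than the comparison group.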
Impact and outcomes
The researchers found that ASSIGN had a match rate of at least 99.5% in the gold-standard datasets and 98.6% in the northeast London study population. The 1.4% of people without a UPRN match were more likely to have changed registered address in the last 12 months, to be from a Chinese ethnic background, or to be registered with a GP using the SystmOne clinical record system. People who had been registered with their GP for more than 6.5 years were more likely to have a match than those who had registered more recently.
The paper describing ASSIGN was published in the International Journal of Population Data Science, and the algorithm code is available open source for others to use freely. The work done to identify the accuracy and biases of the address-matching algorithm will support the use of UPRNs in electronic health records and potentially in other sectors. It builds on the pioneering work of Welsh HDR UK colleagues who, more than a decade ago, proposed the use of Residential Anonymised Linkage Fields to enable evaluation of the wider determinants of health. ASSIGN is to be implemented in Trusted Research Environments in both Wales and Scotland to provide a uniform way of assigning UPRNs in collaborative work involving addresses from more than one nation.
Impact committee
The impact committee thought that this was impressive research. The incredible accuracy of the algorithm and the thoroughness of validation in a large population mean that ASSIGN is likely to have a significant impact on research and, ultimately, patients.