All components are available as open source software under an Apache 2 license; see the CogStack GitHub repository.
CogStack represents a major advance in the capacity to extract and analyse unstructured data from electronic health records (EHRs). It uses a range of technologies to support modern open source healthcare analytics, including NLP, within the NHS. CogStack has opened up huge, rich sources of previously inaccessible or underused information for researchers and is having a clear impact within the health service.
Much NHS patient data is recorded as unstructured, descriptive text. It is a vast asset but difficult to collate and interpret. Overcoming this challenge offers the opportunity to quickly and accurately answer an enormous variety of healthcare-related questions.
CogStack has been developed (with support from HDR UK) by a team at NIHR Maudsley BRC and PhiDataLab. The team is led by Richard Dobson, an HDR UK associated researcher and Professor of Health Informatics at KCL and UCL, with clinical lead Dr James Teo, Clinical Director of Data and AI at KCH. It uses best-of-breed enterprise search, natural language processing (NLP), analytics and visualisation technologies.
Advanced NLP technology allows it to read and understand unstructured records, such as clinicians’ case notes.
CogStack has created one of the largest NHS-trained language models and is system-agnostic, so it can be widely used. It operates in near real-time and is user-friendly, allowing data to be accessed by asking straightforward questions.
It can be used for everything from large-scale research and business intelligence to planning personalised patient care.
Spread and use
The system has been tested at four London NHS Foundation Trusts: King’s College Hospital NHS Foundation Trust, South London and Maudsley NHS Foundation Trust (SLaM), Guy’s and St Thomas’ NHS Foundation Trust, and University College London Hospitals NHS Foundation Trust.
It is being adopted by trusts in Manchester and East Anglia, as well as internationally. Funding of £16m has been awarded by the Office for Life Sciences (OLS) for dissemination to 11 further trusts.
Impact and outcomes
CogStack is allowing the NHS to implement new data mining techniques. Some 12m free-text documents and over 250m diagnostic results and reports have already been processed at KCH alone.
It has been used for successful projects including:
- Large-scale research (BMC Medicine, January 2021) investigating the effectiveness of the NEWS2 hospital early warning score system to predict 14-day outcomes for the most seriously ill COVID-19 patients.
- Large-scale research showing that COVID-19 patients prescribed ACE inhibitors or ARBs are not more likely to be severely ill.
- Improving the safety of prescribing Methotrexate for rheumatology patients.
- Improving the use of the antipsychotic medication Clozapine.
- Identifying outpatient orthopaedic procedures missed by manual coding, leading to an NHS Trust revenue gain of over £1.25m a year.
Recognition of CogStack
The importance of the platform for delivery of precision medicine was recognised in Professor Dame Sally Davies’ 2016 Chief Medical Officer (CMO) annual report.
Secretary of State for Health and Social Care, Matt Hancock MP, highlighted CogStack during his keynote address to the Health Tech Alliance, describing it as a system that “can perform manual coding and data collection tasks in a tenth of the time that it takes a human analyst”.
Accelerating reproducible science through open science and open code
We have brought together over 150 repositories of open standards, data and source code, tackling some of the most important challenges in wrangling multi-modal data and generating replicable insights.
Appendix. SSI Evidence of originality, significance and rigour for software outputs
Visit www.software.ac.uk/REF2021guidance for more info.