Day 2 of HDR UK Conference 2025
View the agenda for Day 2 of HDR UK Conference 2025, on Thursday 16 October 2025.
View agenda for Day 1: Wednesday 15 October
Day 2: Thursday 16 October
08:30 – 09:30 (Alsh 1)
ECR breakfast event (booking only)
08:45 – 09:30 (Hall 1 and 2, Exhibition)
Arrival, tea and coffee
09:30 – 09:45 (Lomond Auditorium)
Title: Welcome address
Speaker: Emily Jefferson, Chief Technology Officer, HDR UK and Interim Director, DARE UK
09:45 – 10:30 (Lomond Auditorium)
Title: Health Data Research Service (panel talk)
Panellists:
- Ming Tang, Interim Chief Digital and Information Officer and Chief Data and Analytics Officer at NHS England
- Dr Richard Torbett, Chief Executive at ABPI
Chair: Sir David Sloman, Senior Advisory Consultant at SDMS Advisory Ltd.
10:30 – 11:00 (Hall 1 and 2, Exhibition)
Coffee break
11:00 – 12:15 (Parallel sessions)
Stream 7: Data-driven solutions to NHS pressures
Location: Lomond Auditorium
Chair: Anthony Wilson, Consultant at Manchester University NHS Foundation Trust
Speakers:
- Sarah Bowers – Unscheduled care in the last year of life for people with multimorbidity: A national cohort study
Introduction
People living with multimorbidity (≥2 long-term conditions) are high users of healthcare, particularly unscheduled care — the unplanned use of health services. Unscheduled care use rises sharply towards the end of life, but how multimorbidity shapes this pattern is less well understood.
Methods
We conducted a national retrospective cohort study of all adults (≥18 years) who died in Scotland between 2017 and 2021. Linked routine healthcare data were used to explore associations between multimorbidity, sociodemographic factors, and patterns of unscheduled care use in the last year of life. Logistic regression was used to estimate adjusted odds ratios (aOR) with 95% confidence intervals (CI). Public contributors were involved throughout this work via a dedicated advisory group.
Results
Among 299,361 decedents, 45.6% (n=136,953) had multimorbidity prior to their last year of life. People with multimorbidity were older, more often female, and more likely to live in socioeconomically deprived areas. Unscheduled care use was almost universal, but higher among those with multimorbidity (97.5% vs. 95.5%), with greater contact rates (971 vs. 956 contacts per 1,000 decedents) and higher out-of-hours use (5,541 vs. 4,214 per 1,000). They also began using unscheduled care earlier before death (median 13 vs. 8 weeks). Multimorbidity was strongly associated with higher overall unscheduled care use (aOR 1.50, 95% CI 1.45–1.57).
Conclusions
Unscheduled care use is high in the last year of life, particularly among people with multimorbidity — highlighting likely unmet needs. We are now undertaking in-depth interviews with bereaved caregivers of people with multimorbidity to better understand experiences of care and how services might respond to support this growing population. Insights from both studies will inform future policy and service design.
- Tristram Gale – Enabling exploration of the dynamics of service pressure
Background
During the COVID-19 pandemic, the systemic anti-cancer therapy (SACT) analysis team in Public Health Scotland (PHS) was tasked with providing activity data showing how SACT delivery was affected by the pandemic. These data were calculated and released on the COVID-19 wider impacts dashboard.
The data showed changes during the pandemic, but also long-term increases in the level of SACT activity. These data were used as references by the Scottish Government during the development of Scotland’s cancer strategy.
When the COVID-19 wider impacts dashboard ended regular updates in 2022, PHS was asked to develop an enhanced standalone activity dashboard.
Objectives
For PHS to develop a public-facing SACT activity dashboard that managers and clinical staff can use to explore the dynamics of service pressure.
Methods
PHS analysts developed a scope for a new R Shiny dashboard, working with stakeholders and drawing on previous requests that used SACT data to better understand prescribing activity. The scope was presented to national groups and amended to meet user requirements.
Results
The new dashboard was successfully published and continues to be updated weekly. It enables users to explore the dynamics of service pressure by displaying breakdowns that distinguish potential causes of increased activity.
New data breakdowns include cancer sub-groups, drug types, and identifying patients by whether they are in their first treatment reporting period for a type of cancer.
Conclusion
The publication of SACT activity for Scotland has been successful and well received by stakeholders. Users include managers, clinical staff, the Scottish Government and the public. It is a valuable resource for local analytical teams, allowing them to redirect requests to publicly available data.
Examples of dynamics visible in SACT activity include increases in activity for breast cancer and prostate cancer. The number of patients being treated for breast cancer increased by 67% between 2015 and 2024. The dashboard lets users establish that the number of patients in their first treatment year increased by only 8%; the remainder of the difference is accounted for by patients continuing treatment. This implies that patients are, on average, being treated for longer.
The number of appointments for prostate cancer in Scotland rose from 7,003 in 2015 to 29,864 in 2024. The dashboard lets users see that this increase is due to appointments for hormone drugs.
Acknowledgments
PHS acknowledges the invaluable contribution of colleagues in National Services Scotland, the Scottish regional cancer networks, the Scottish cancer treatment centres, the Innovative Healthcare Delivery Programme (IHDP) Scotland and our Scottish Government sponsors.
Lay summary
Public Health Scotland regularly publishes information about the use of cancer drugs as tables and graphs. These can help users understand changes over time in appointment or patient numbers.
- Samantha (Hiu Yan) Ip – Using health data to influence COVID-19 policy decisions – A study to assess the impact of COVID-19 vaccination uptake on stroke, heart attack and blood clots
Within accredited Secure Data Environments, we analysed UK-wide, linked routine health records (including GP, hospital, laboratory testing, vaccination, primary-care dispensing and death registration records) for up to 68 million people. Through HDR UK’s BHF Data Science Centre’s CVD-COVID-UK/COVID-IMPACT consortium and the HDR UK COALESCE collaboration, multidisciplinary teams turned these data into evidence that informed COVID-19 vaccination policy.
Vaccine safety (England, 46M adults): The incidence of arterial thromboses (heart attack, ischaemic stroke, other arterial embolism) was up to 10% lower in the 13–24 weeks after the first dose of a COVID-19 vaccine. Following a second dose, the incidence was 27% lower after the AstraZeneca vaccine and 20% lower after the Pfizer/BioNTech vaccine. Boosters generally lowered the incidence further. The incidence of venous thromboses (deep-vein thrombosis, pulmonary embolism, portal or intracranial venous thrombosis) showed similar patterns. Short-term increases in incidence were observed only for already known rare outcomes: intracranial venous thrombosis and thrombocytopenia after a first dose of the AstraZeneca vaccine, and myocarditis/pericarditis after mRNA-based vaccines. No new cardiovascular safety signals appeared with later doses. Overall, the lower post-vaccination incidence of common cardiovascular diseases, set against the higher incidence following COVID-19 infection, outweighed the small absolute excesses in rare events.
Under-vaccination (UK-wide, 68M people aged ≥5): In summer 2022, 33–50% of people were under-vaccinated across the four UK nations, especially those who were younger, more deprived, had fewer comorbidities, and were of non-White ethnicity. Under-vaccination was associated with more hospitalisations and deaths across all age groups, with under-vaccinated people over 75 more than twice as likely to have a severe COVID-19 outcome as those who were fully vaccinated. It was estimated that over 7,000 hospitalisations/deaths were potentially preventable with better vaccine coverage.
These findings were briefed to the UK medicines regulator (MHRA), the national vaccination advisory committee (JCVI) and the UK Chief Medical Officers/Chief Scientific Adviser, and were also cited in World Health Organization guidance on thrombosis with thrombocytopenia syndrome after vaccination. These examples are part of a wider portfolio that showcases how HDR UK’s national data infrastructure and team science delivered real-world impact.
- Tassella Isaac – Clustering multiple long-term conditions and social care needs in 7.2M adults: A 10-year study
Lay summary
People living with multiple long-term conditions (MLTC) often need extra support with everyday activities—such as dressing, washing, moving around, or managing the home. This non-medical support is known as social care. As more people live with MLTC, it is critical to understand how their health needs intersect with social care needs, and how these combined challenges affect long-term survival.
In this study, we analysed electronic health records from 7.2 million adults in England to identify common patterns of long-term conditions and social care needs, and to examine how these patterns relate to 10-year mortality risk. We found that people with the highest levels of medical and social complexity were significantly more likely to die earlier. Our findings highlight the urgent need to align health and social care systems—particularly for individuals facing socioeconomic disadvantage—to improve outcomes and reduce inequalities. This research provides a scalable, data-driven framework that supports NHS priorities for integrated, person-centred care.
Background
Multiple Long-Term Conditions (MLTC)—defined as two or more co-existing chronic health problems—affect approximately 45% of adults in the UK and are linked to higher hospital admissions, poorer quality of life, and increased mortality. These risks are exacerbated when social care needs (SCNs) go unmet. SCNs include support for essential aspects of daily living, such as mobility, finances, housing, and social connection. While NHS policy increasingly recognises the need to integrate health and social care data, large-scale analyses quantifying how these factors jointly impact mortality are scarce.
Aims
- To identify population subgroups with distinct patterns of MLTC and social care needs using a data-driven clustering approach.
- To assess how these subgroups differ in their risk of all-cause mortality over 10 years.
Methods
We conducted Latent Class Analysis (LCA) using data from the Clinical Practice Research Datalink (CPRD). The analysis included the ten most prevalent MLTC categories and eight social care need domains: activities of daily living, mobility, financial, disability, community care, residential status, social networking, and bereavement. Clusters were described based on condition profiles, SCNs, and socio-demographic factors (age, sex, ethnicity, region, deprivation). We then used Cox proportional hazards models and Kaplan–Meier curves to estimate and visualise mortality risk across clusters, adjusting for covariates.
Results
Among 7.2 million adults with MLTC (mean age 54, mostly female and White), four distinct clusters emerged. Clusters with more social care needs and higher multimorbidity had significantly elevated 10-year mortality risks compared to the healthiest group (HRs: 1.83, 1.49, 2.43).
Conclusion
This is the first national-scale study to jointly cluster MLTC and social care needs. Our findings demonstrate that early identification of high-risk groups using routinely collected data can inform strategies for integrated care, reduce inequalities, and support more targeted service planning.
- Julian Matthewman – Mapping risks of hospital-recorded health conditions in people with eczema
Introduction and objectives
Atopic eczema is associated with adverse health outcomes. Existing work has largely focused on associations with individual outcomes, with limited work looking more comprehensively at multiple outcomes. Here, we produced a comprehensive atlas of results on how eczema is associated with thousands of diseases, covering the full health spectrum.
Materials and methods
We harnessed UK hospital admissions data to systematically assess associations with all 2,056 category-level ICD-10 (International Classification of Diseases) morbidity codes (e.g., “J45 Asthma”, “K58 Irritable bowel syndrome”), recorded in any diagnostic position. We identified over 3 million people with eczema from primary care records (Clinical Practice Research Datalink) and matched them (on age, sex and general practice) to over 14 million comparators without eczema. We followed participants for up to 25 years using their hospital records. By mapping ICD-10 codes to Global Burden of Disease (GBD) categories and phecodes, we used additional strategies to define diagnoses. We performed multiple types of analyses on different cohorts.
Results
We produced results from 60,000 statistical models and created a dashboard to explore these. Some of the most relevant results are:
- Strong associations in all cohorts and analyses with eye outcomes including the rarer and more serious complications of cataracts and keratoconus.
- An increased risk of Crohn’s disease emerged as the strongest signal that would not typically be seen as a complication of atopic eczema, and was markedly stronger than for ulcerative colitis.
- Reassuringly, for people who had eczema in childhood, any increase in risk in other organ systems was much smaller than for outcomes informing atopic eczema diagnosis (e.g., asthma, allergy, skin infections, conjunctivitis).
Conclusion
Our results give the first comprehensive overview of excess hospital diagnoses in eczema and provide a resource for generating and testing individual association hypotheses. The ability to contextualise and compare across all diseases is key to prioritisation decisions (e.g., on awareness, screening and prevention) given limited NHS resources.
We highlight ophthalmologic outcomes, for which there has previously been a research gap with few studies available, as key complications of eczema. Given the emergence of novel eczema treatments that have themselves been linked to ophthalmic complications, it is especially important to consider these outcomes. For Crohn’s disease, evidence has been accumulating from other studies, including large routinely collected datasets, birth cohorts and Mendelian randomisation studies, but here, for the first time, we demonstrate this increased risk in the context of all hospital-recorded diagnoses.
Together with our recently published large-scale work investigating eczema outcomes in primary care (doi: 10.1038/s41467-024-54035-1) the evidence on eczema outcomes has received a considerable update, with implications for guidelines and future research, e.g., studies on adverse effects of new medicines.
Stream 8: Patient and public involvement and engagement (PPIE) throughout the research cycle
Location: Alsh 1
Chair: Farheen Yameen, Public Adviser at Applied Research Collaboration North West Coast (ARCNWC)
Speakers:
- Robin Flaig – Findings from a novel Citizen Panel project with minoritised groups reviewing data access process to linked longitudinal data
Background: The ‘Citizen Panel’ concept was developed by the organisation Understanding Patient Data as a learning data governance model for data access. The model shifts the involvement of the public in data access from a one-way direction to a feedback cycle. To date, this conceptual model has not been deployed in the context of researcher access to public data. In existing data access processes, public contributors are involved only in reviewing research applications, not in evaluating the full data access process and decision making; this is what makes the Citizen Panel a groundbreaking methodological innovation.
Objective: The UK Longitudinal Linkage Collaboration (UK LLC) is the national Trusted Research Environment (TRE) for data linkage in longitudinal research. UK LLC piloted the Citizen Panel research project (2024–2025) with the aim of embedding public feedback and perceptions into data access and decision-making. The project involves the public in decisions around the acceptance and suitability of data use, assessing the scope and benefit of research questions, and creating a learning feedback loop for decision-making.
Methods: The Citizen Panel project includes a Steering Group that helped co-design the protocol, co-lead recruitment and train the panel (July–September 2024). The Citizen Panel itself comprised 50% participants from UK LLC partner Longitudinal Population Studies (LPS) and 50% from seldom-heard communities known to be under-represented in LPS. The project was funded by UKRI and has ethical approval from the University of Edinburgh.
Results: The Panel operated over two rounds. In each round, Panel members had four online meetings and one hybrid workshop. At the end of round one (March 2025), the Panel provided a set of recommendations to UK LLC, which included: (1) developing its own standards for defining and measuring public good, (2) integrating an assessment of equality, diversity and inclusion (EDI) at every step of the data application process, (3) encouraging researchers to engage sensitively with people with lived experience and with diverse, minoritised groups, (4) establishing a feedback loop in which public contributors review completed research projects and assess whether they achieved what they aimed to, and (5) reassuring the public that UK LLC is taking into consideration the challenges of Artificial Intelligence (AI), and explaining what actions are in place to mitigate these. The second round of the Panel took place in April–June 2025 and will conclude with a second set of recommendations.
Conclusion: The Citizen Panel’s list of recommendations is shared with UK LLC management, who feed back their response to the Panel and implement recommendations where appropriate and practical. Learnings and findings from the Citizen Panel are shared with partner organisations and with academic and non-academic institutions. A full evaluation of the Citizen Panel will be completed before the end of the project (December 2025).
- Elsie Makachiya – Public perspectives on NHS data use: Insights from the East of England SDE engagement workshops
To understand public views on the East of England Secure Data Environment (SDE), the team partnered with ten Healthwatch organisations across the East of England. Six workshops were held in Bedford, Colchester, Hertfordshire, Peterborough, and Norwich. These informed communication materials, security measures, and decision-making processes for the SDE, while promoting meaningful, diverse public involvement in NHS data use.
The workshops aimed to explore participants’ existing knowledge of NHS data use, their concerns, hopes for data-driven research, and the safeguards needed to build trust. Materials co-created with the Core Public Advisory Group (CPAG) were tested. The CPAG is a group of public representatives supporting the long-term development of the SDE. Participants also advised on effective communication channels, opt-out processes, and ways to make information accessible and inclusive.
A key finding was limited awareness of how NHS data is used, both in individual care and research. Despite this, participants broadly supported data use that benefits patient care and services. They strongly favoured transparency, plain language, and materials in varied formats (e.g. EasyRead, video, translations).
Concerns were raised about data accuracy, fairness, risks of re-identification (especially for rare disease patients), and exclusion from research due to privacy worries. Participants called for clear explanations of anonymisation, safeguarding, and ethical governance in public communications. They stressed that data use must serve public benefit and avoid discrimination, especially against underserved communities.
Public involvement in decision-making was seen as essential. There was strong support for a diverse Data Access Committee (DAC), including professionals and people with lived experience. The DAC reviews requests to use sensitive data (e.g. health records), ensuring ethical, safe and responsible access. Participants recognised the need to balance efficiency with meaningful public involvement, i.e. ensuring public voices are informed and empowered.
Low awareness of opt-out options was noted. Participants recommended multiple, accessible routes (e.g. phone, email, website), with clear explanations and flexibility to change decisions. While optional reasons for opting out could be offered, they should not be required.
The SDE’s first use case, the cardiovascular disease project, was well received. Participants viewed it as a strong example of how SDE research can improve health outcomes. However, trust remained a concern. Ethical, transparent research, free from political influence or links to untrustworthy organisations, was seen as vital. These insights provide practical, community-informed guidance for building public trust through inclusive design, transparent governance, and sustained, meaningful public engagement.
How PPIE feedback has shaped our approach:
- Emphasising benefits in communications about health and care research
- Shaping the DAC’s composition and terms of reference
- Defining the DAC’s role in approving projects in the public interest and tracking requests that cannot proceed due to technical or data issues
- Kayleigh Mason – Impact of Public and Patient Involvement (PPI) on health data research in children and young people
Background: Public and Patient Involvement (PPI) is key to delivering accessible medical research that is relevant to the needs of those seeking healthcare. Health data research and qualitative research have previously been less inclusive of PPI, particularly research involving children and young people (CYP).
Objective: To highlight the role of PPI across the research lifecycle for a mixed methods programme using health care records from the Clinical Practice Research Datalink (CPRD) and qualitative research investigating the primary care prevalence, prognosis, and outcomes of musculoskeletal pain in children and young people (aged 8–18 years).
Methods: A Young Persons Advisory Group (YPAG) was established to provide input throughout all stages of the research via meetings with researchers. This included input on the research questions, data analysis, results and interpretation, and dissemination for the CPRD study; and on recruitment advertising and the design of data collection techniques (e.g., topic guides and storyboards) for the qualitative study.
Results: One young adult (female, aged 25, white British) and 4 CYP (3 females; age range 8–14; 3 white British) experiencing musculoskeletal pain have attended at least 3 of 7 YPAG meetings with a parent/family member from grant development (2021) to present. Examples of key YPAG involvement include:
- Grant development: Provided feedback on outcomes for work packages. Helped to define the time period for a “new episode” of musculoskeletal pain. Input on the frequency of PPI meetings and dissemination plans.
- Quantitative: Developed an information sheet to explain the use of electronic health records to CYP. Input on diagrams to explain how electronic health record research is conducted. The finding that 1 in 10 individuals consult primary care for musculoskeletal pain each year was rephrased as 3 children per class. Feedback on draft infographics to communicate results (clarity; colour; wording; alternate icons).
- Qualitative: Input into the design of “storyboards”, visual aids used in interviews with CYP to spur conversation about healthcare interactions. Input into the design of topic guides for interviews and focus groups, helping to ensure the conversation covered relevant areas of CYP healthcare experience in lay modes of communication. Input regarding the results of data collection and analysis.
- Funding: The impact of neurodiversity or comorbid mental health conditions on consulting for musculoskeletal pain and how it is managed in primary care was raised in a YPAG meeting. This led to a further NIHR-funded grant.
Conclusion: Inclusion of affected individuals and the formation of the YPAG have been instrumental in our research using health care data including methods to explain health data research and interpreting and disseminating findings, in shaping our patient-facing documents and recruitment strategies for the qualitative study, and generating future research questions for funding applications and studentships.
- Sarah Robertson – The power of positive participation: Meaningfully engaging young people in health research
Lay summary / Background and hypothesis
Generation Scotland (GS) is a research resource of over 40,000 people from across Scotland, used to study the genetic and environmental basis of common diseases. It was established in 2006, but originally only people over the age of 18 could join. In 2023 we were able to include young people aged 12–18; however, through our youth engagement approach we realised that there were significant barriers to their participation. We therefore began a journey to address these barriers and improve our systems.
Objective
We actively involve young people in shaping health research by gathering feedback from diverse groups. Our goal is to improve youth engagement and inform research practices. These groups have influenced decisions in our longitudinal health study, aligning with children’s rights and challenging tokenistic assumptions for meaningful engagement.
Methods
We have developed positive, healthy relationships with young people and stakeholders by establishing Youth Advisory Groups, made up of a diverse group of young people from across the country, and a new Mental Health Youth Forum, creating new social media accounts, and developing “young people friendly” materials. We have also given young people a voice in ethics submissions and a platform to share their journey at national conferences, enhanced partnerships with youth work organisations, and developed a lesson plan for secondary schools to raise awareness of health research.
Results
After feedback from young people, we have significantly redesigned our youth communication and recruitment pathways, including text-based communication and compensation for young people’s time. 12–15 year-olds can now initiate the process of joining the study without a parent/guardian joining first, consistent with children’s rights-based approaches. Youth co-production has also been central to projects, ensuring age-appropriate resources, engagement with new technology, raising awareness of health research, and optimising and maintaining engagement. To raise awareness of health research and reach more young people, we developed a schools’ lesson plan, “Why Big Data Matters”, in partnership with SHINE (School’s Health Improvement Network); within 1 month of launch, 37 schools from 65% of local authority areas had engaged with the new lesson plans. In April 2025 we received funding from Research Data Scotland to develop this resource further.
Conclusion
Young people have made a meaningful contribution to our research projects, and contributed social media content to disseminate knowledge and highlight the importance of young people’s engagement in health research. Insight from young people has strengthened research design and increased participant engagement. Information about how to contact young people has led to system changes to remove barriers and allow young people to be heard. Effective communication strategies with young people will benefit future research involving the younger generation.
- Ana Paula Rubio – Beyond the dataset: Integrating public voices in data science
Lay summary: The SCONe project (Scottish Collaborative Optometry-Ophthalmology Network eResearch) links eye images with health records to help detect health conditions earlier. As large data-driven health research grows, involving patients and the public is more important than ever. This article shares how public contributors helped co-develop SCONe’s approach to patient and public involvement and engagement (PPIE); reflecting on what worked, what didn’t, and offering practical suggestions to support more inclusive and meaningful public involvement in health data research.
Background/hypothesis: Projects like SCONe operate within complex systems and strict governance rules that limit public involvement. In data science, meaningful PPIE is still emerging, with little practical guidance on how to apply it in real-world settings. Without a clear framework, and with inconsistent reporting, much of this work remains hidden—creating an “iceberg effect” that obscures both challenges and successes in large-scale data projects.
Objective: To co-develop a PPIE strategy with public partners to make SCONe more inclusive, transparent, and trustworthy.
Methods: In June 2024, SCONe launched a PPI group of individuals with age-related macular degeneration (AMD) and their carers to shape the project from the outset, without predefined goals or a fixed framework. The strategy evolved through iterative collaboration, with priorities emerging from ongoing dialogue. Aligned with PEDRI and Aitken et al.’s guidance, it follows the IAP2 Public Participation Spectrum, structured around three tiers: involvement, engagement, and consultation—to ensure public voices influence decisions. The approach will expand to include a more diverse reference group over time.
Results: Large-scale data projects like SCONe face unique technical and governance challenges that can restrict direct public influence. To navigate this, SCONe co-developed a three-tiered PPIE model (Involvement, Engagement, and Consultation) that enables layered participation despite structural limitations. Intensive co-design occurs at the involvement tier, where public contributors act as partners in decision-making through a public steering group. Engagement builds transparency by explaining SCONe’s data governance processes, sharing project information, and co-developing resources that make data science more accessible, while also fostering wider public understanding of data-driven systems. Consultation, while not based on partnership, is vital in large-scale contexts where direct involvement of all data subjects is unfeasible. It gathers diverse perspectives that inform the other tiers and align with PPI group priorities. This layered model fosters collaboration and understanding, amplifies public insight, and ensures PPIE is integral to ethical data-driven research. Rather than a singular approach, it offers multiple entry points for interaction, enabling public influence to grow over time.
Conclusion: The lack of an established PPIE framework in data science is not a limitation; it is an opportunity to rethink and redefine research with and for the public. As SCONe shows, meaningful public involvement must be flexible, iterative, and responsive to context in order to remain relevant and impactful.
- Ana Paula Rubio – Beyond the dataset: Integrating public voices in data science
Stream 9: Addressing societal challenges by linking health and non-health records
Location: Alsh 2
Chair: Jo Knight, Professor of Applied Data Science at Lancaster University
Speakers:
- Colin Mclean – Preparing Lothian healthcare data for federated studies into social determinants of health
Lay summary
We are using health data to better understand how factors like income, local environment, smoking, alcohol use, and body weight affect people’s risk of developing cancer and other long-term illnesses. By securely linking and analysing anonymised health records from across Lothian, we can study these patterns without moving or exposing sensitive data. This helps researchers work together across the UK to find ways to reduce health inequalities and improve prevention.
Background/hypothesis
Federated analytics enables secure, privacy-preserving investigations of social determinants of health (including deprivation, lifestyle and demographic factors) across distributed datasets. This is especially valuable for cancer research in the UK, where early-onset cancer rates have risen since the mid-1990s, likely due to early-life exposures such as poor diet, smoking, and socio-economic disadvantage. These same factors also contribute to the broader burden of chronic diseases, which are unequally distributed across the population. Understanding these relationships is essential for informing prevention strategies and reducing health inequalities.
Objective
To investigate how early-life exposures and social determinants (such as diet, smoking, alcohol use, and socio-economic status) contribute to the rising incidence of early-onset cancers in the UK.
Methods
To participate in federated analysis of social determinants of health, we extended our OMOP Common Data Model (CDM) to include socio-economic status (Scottish Index of Multiple Deprivation (SIMD) and Townsend Scores), demographics, and lifestyle factors including smoking, alcohol use and Body Mass Index (BMI). We used standard OMOP vocabularies to map the core concepts and observations from Lothian extracts of registry data for outpatient appointments (SMR00), inpatient admissions (SMR01), cancer (SMR06), death records (National Records of Scotland), lab results (TRAKCare), and GP records. Socio-economic indicators came from Public Health Scotland (SIMD) and the UK Data Service (Townsend Scores).
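In federated analysis, each participating site typically runs the same query against its own data and shares only aggregate results, which are then pooled centrally. The Python sketch below shows that pattern for an outcome risk stratified by deprivation quintile. The site names, strata and records are hypothetical. This is not the HERON Network's tooling, and real deployments would also apply statistical disclosure controls before any result leaves a site.

```python
from dataclasses import dataclass, field


@dataclass
class SiteSummary:
    """Aggregate counts returned by one site; no row-level records leave the site."""
    site: str
    people: dict = field(default_factory=dict)  # stratum -> number of people
    cases: dict = field(default_factory=dict)   # stratum -> number with the outcome


def local_summary(site, records):
    """Run inside each site's secure environment: (stratum, is_case) rows -> counts."""
    summary = SiteSummary(site)
    for stratum, is_case in records:
        summary.people[stratum] = summary.people.get(stratum, 0) + 1
        summary.cases[stratum] = summary.cases.get(stratum, 0) + int(is_case)
    return summary


def pool(summaries):
    """Run by the coordinator: add counts across sites and return pooled risk per stratum."""
    people, cases = {}, {}
    for s in summaries:
        for stratum, n in s.people.items():
            people[stratum] = people.get(stratum, 0) + n
        for stratum, n in s.cases.items():
            cases[stratum] = cases.get(stratum, 0) + n
    return {stratum: cases.get(stratum, 0) / people[stratum] for stratum in sorted(people)}


if __name__ == "__main__":
    # Hypothetical rows: (SIMD quintile, developed the outcome?)
    site_a = local_summary("site_a", [("SIMD1", True), ("SIMD1", False), ("SIMD5", False)])
    site_b = local_summary("site_b", [("SIMD1", True), ("SIMD5", True), ("SIMD5", False)])
    print(pool([site_a, site_b]))
```

Only the `SiteSummary` objects would cross institutional boundaries. The pooled risks are identical to those from a centralised analysis, because counts are additive.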
Results
In collaboration with HDR UK’s HERON Network, we developed a robust OMOP CDM based on routinely collected primary and secondary healthcare data for the Lothian population. Covering 1984 to present, it includes de-identified records for ~2 million unique patients. This infrastructure supports secure, longitudinal studies of cancer and chronic disease risk in relation to social determinants, enabling research into topics such as early-onset cancer and health inequalities.
Conclusion
By extending our OMOP CDM to include social determinants of health, we have created a robust foundation for federated healthcare and cancer research in the UK. This approach enables secure, large-scale analysis of early-life exposures and their long-term impact on cancer and chronic disease risk. Our work supports collaborative, privacy-preserving research that can inform targeted prevention strategies and help address persistent health inequalities.
- Difei Shi – Maternal contact with social care services and their children’s health and educational outcomes
Lay summary
In England, children in contact with children’s social care (CSC) services represent a vulnerable group, and some disadvantages may continue into the next generation. We examined whether babies born to mothers with CSC histories experienced poorer outcomes. Using linked national data, we found that these babies had higher risks of low birthweight, preterm birth, and infant death. These findings suggest mothers with CSC experience may benefit from targeted support to improve outcomes for the next generation.
Background
One in four children in England receive CSC services by age 18, facing higher risks of adverse outcomes in health, education, and other areas than the general population. These challenges persist across the life course, affecting well-being.
Objective
Given the well-established intergenerational effects of disadvantage, such as poverty, our study aims to investigate whether a history of CSC involvement among mothers contributes to poorer outcomes for their children and whether such histories could trigger targeted public health interventions.
Methods
We used Education and Child Health Insights from Linked Data (ECHILD), which links all-of-England health (Hospital Episode Statistics and mortality), education (National Pupil Database), and social care data. We identified births and then, via a mother-baby link, linked the data back to mothers’ social care, education, and health records and forward to the child’s education and health records. We examined a range of outcomes and estimated adjusted risk ratios (aRRs) using Poisson regression, comparing the risk of each outcome between babies born to mothers with and without CSC experience. Models were adjusted for maternal characteristics and used robust standard errors to account for clustering of siblings born to the same mother.
Results
The cohort included 192,380 babies (born 2012-2020) for health outcomes and 7,080 babies (born 2012-2014) for educational outcomes, with mothers born between 1995-2003 and 1995-1997, respectively. Babies born to mothers with a history of CSC contact were more likely to have low birthweight (aRR=1.10; 95% CI=1.05-1.14), be born preterm (aRR=1.06; 95% CI=1.01-1.10), or die before age 1 (aRR=1.25; 95% CI=1.04-1.50), compared to those born to mothers without CSC experience. However, children born to mothers with CSC experience were less likely to have unplanned hospital admissions or Accident and Emergency attendances between ages 1 and 5 than those without, and there were no differences between groups in school readiness at age 5. Associations were partly explained by maternal characteristics: maternal education, age at first delivery, and deprivation. Results also varied by level of maternal CSC experience, including being a Child in Need, on a Child Protection Plan, or Looked After.
Conclusion
Infants born to mothers with CSC experience were at higher risk of poor birth and infant outcomes. Better support for mothers with a history of social care involvement is needed to avoid adverse outcomes for their children, including a higher risk of child death.
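The regression described in the Methods estimates risk ratios by fitting a Poisson model to binary outcomes, with robust standard errors that allow for siblings sharing a mother. This approach is often called modified Poisson regression. The dependency-free Python sketch below shows the mechanics under those assumptions. It fits a log-linear model by Newton-Raphson and then computes a cluster-robust sandwich variance with mothers as clusters. The function names and toy data are hypothetical. The abstract does not state the study's software, covariates or clustering details, and a real analysis would use an established statistics package.

```python
import math


def matmul(a, b):
    """Multiply two small dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def inverse(a):
    """Invert a small matrix one column at a time."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def fitted(x, beta):
    """Fitted means exp(x @ beta) for a log-link model."""
    return [math.exp(sum(b * v for b, v in zip(beta, row))) for row in x]


def information(x, mu):
    """Poisson Fisher information X' diag(mu) X."""
    p = len(x[0])
    return [[sum(m * row[j] * row[k] for row, m in zip(x, mu)) for k in range(p)]
            for j in range(p)]


def modified_poisson(x, y, cluster, max_iter=100, tol=1e-10):
    """Poisson regression for a 0/1 outcome with cluster-robust (sandwich) SEs.

    x: covariate rows, each starting with 1 for the intercept.
    y: 0/1 outcomes; cluster: a group label per row (e.g. a mother identifier).
    Returns (beta, se); exp(beta[j]) is a risk ratio. No small-sample correction.
    """
    p = len(x[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = fitted(x, beta)
        score = [sum((yi - mi) * row[j] for row, yi, mi in zip(x, y, mu)) for j in range(p)]
        step = solve(information(x, mu), score)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = fitted(x, beta)
    bread = inverse(information(x, mu))
    # Add up score contributions within each cluster so siblings are not treated as independent
    cluster_scores = {}
    for row, yi, mi, g in zip(x, y, mu, cluster):
        u = cluster_scores.setdefault(g, [0.0] * p)
        for j in range(p):
            u[j] += (yi - mi) * row[j]
    meat = [[sum(u[j] * u[k] for u in cluster_scores.values()) for k in range(p)]
            for j in range(p)]
    cov = matmul(matmul(bread, meat), bread)
    return beta, [math.sqrt(cov[j][j]) for j in range(p)]


def risk_ratio_ci(coef, se, z=1.96):
    """Risk ratio with a 95% confidence interval from a log-scale coefficient."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


if __name__ == "__main__":
    # Hypothetical toy data: second column is maternal CSC history (1 = yes)
    x = [[1, 1]] * 4 + [[1, 0]] * 4
    y = [1, 1, 0, 0, 1, 0, 0, 0]
    mother = ["a", "a", "b", "c", "d", "e", "e", "f"]
    beta, se = modified_poisson(x, y, mother)
    print("RR %.2f (95%% CI %.2f-%.2f)" % risk_ratio_ci(beta[1], se[1]))
```

With a single binary exposure, the fitted risk ratio equals the ratio of crude risks, which here is 0.5 / 0.25 = 2. Adding maternal covariates as extra columns of `x` turns this into an adjusted risk ratio.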
- Olivia Swann – The Homes, Heat and Healthy Kids Study: The role of underheated homes in preschool respiratory infection
Background
Preschool children spend a lot of time at home when their lungs are developing. Acute respiratory infections (ARIs) are the main cause of hospitalisations in preschool children and are associated with cold, damp housing. Preschool ARIs have lifelong health consequences.
Half of Scotland’s housing has low energy efficiency. The Scottish Government is retrofitting housing to increase home energy efficiency (HEE) for Net Zero targets. However, whilst making homes warmer, some HEE measures reduce ventilation, decreasing heat loss but trapping air pollutants indoors, potentially worsening respiratory health.
We can now link children’s healthcare data with every home they have lived in across Scotland. This new national linkage will allow us to investigate how underheated homes and the different approaches to warming them affect preschool ARI risk.
Objectives
1. Identify children living in underheated homes using routinely collected data.
2. Estimate how many preschool ARIs are due to underheated, damp homes.
3. Explore whether different ways of making homes warmer increase or decrease preschool ARI risk.
Planned methods
We have established a retrospective birth cohort of children born in Scotland from 2008 to 2025, followed up until 5 years of age. The cohort links cross-sectoral data from healthcare, housing, HEE measures, smart meters, air pollution, high street banking and weather for ~4 million child-years and 230,000 properties across Scotland.
Risk prediction models will identify underheated homes using routine data, and we will estimate the contribution of underheated homes to preschool ARIs using causal epidemiology methods. We will use machine learning approaches and natural experiment methodology to determine the effects of different HEE measures on preschool ARI risk.
Impact
Results will inform energy, welfare, climate and building policies and ensure that Net Zero policies do not worsen health inequalities. The new methodology developed will be applicable to other health conditions, demographics and populations.
- Isabella Tortora Brayda – Heat-related deaths and hospitalisations across Scotland: 2005-2024
Background
The climate is changing on a vast scale, with increasing weather extremes affecting human health worldwide. Rising global temperatures increase exposure to hazardous heat levels, leading to higher rates of hospitalisation and death. By global standards Scotland is a cold country, and consequently there have been few investigations into heat-related health impacts. Those that exist typically consider Scotland as a whole or focus on large cities, overlooking the high degree of temperature variability experienced across Scotland. Further, existing studies have used data from before the COVID-19 pandemic and so do not capture heat-related health impacts since the pandemic.
Objective
The Climate Analyst Team in Public Health Scotland aims to establish an up-to-date baseline for heat-related deaths and hospitalisations in Scotland, using 20 years of data (2005-2024). We will evaluate impacts at both health board and council area level, to help our stakeholders understand how heat will affect their populations and develop mitigation and adaptation strategies. We will focus on health inequalities and how vulnerable populations are differentially affected. We are also developing interactive indicators, which will be hosted on the ScotPHO profiles tool (https://www.scotpho.org.uk/) with downloadable open access data.
Methods
We first link hospital admissions and deaths data to Met Office weather data, then use a Distributed Lag Non-Linear Model (DLNM) to estimate the relative risk of hospitalisation and death in Scotland. This involves fitting quasi-Poisson models at regional level and then deriving Scotland-level estimates through a meta-analysis of the regional estimates. The strength of a DLNM is its lag component, which captures hospital admissions or deaths occurring on the day of exposure as well as on the following day(s).
Although DLNMs are widely used to explore the relationship between high heat and mortality, few studies have used this approach to analyse the impact of heat on hospital admissions. Our analysis identifies optimal temperatures, at which there is no increased or decreased risk of death or hospitalisation. We also identify two key temperature thresholds: first, where the relative risk of death or hospitalisation begins to increase and, second, where risks are 10% higher than at optimal temperatures. Additionally, we estimate the annual rate of attributable deaths and hospitalisations above these thresholds.
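Once a DLNM has been fitted, the threshold and attribution steps described above operate on its overall cumulative exposure-response curve. The Python sketch below assumes such a curve is already available as (temperature, relative risk, lower 95% limit) points, centred so that the relative risk equals 1 at the optimal temperature. It does not fit the DLNM itself. Two choices here are our own assumptions rather than the team's stated rules:

- The first threshold is taken as the lowest temperature above the optimum whose lower confidence limit exceeds 1.
- Attributable counts use the simple same-day fraction 1 - 1/RR rather than a lag-aware calculation.

All numbers are hypothetical.

```python
def heat_thresholds(curve, excess=0.10):
    """Return (optimal temperature, first threshold, second threshold).

    curve: (temperature, rr, rr_lower) tuples sorted by temperature and centred
    on the optimum, so rr == 1 at the minimum-risk temperature.
    """
    t_opt = min(curve, key=lambda point: point[1])[0]
    hotter = [point for point in curve if point[0] > t_opt]
    # Assumed rule: risk "begins to increase" where the lower 95% limit exceeds 1
    first = next((t for t, rr, lower in hotter if lower > 1.0), None)
    # Risk at least 10% higher than at the optimum
    second = next((t for t, rr, lower in hotter if rr >= 1.0 + excess), None)
    return t_opt, first, second


def attributable_count(daily, curve, threshold):
    """Sum n * (1 - 1/RR) over days at or above the threshold (same-day attribution only)."""
    if threshold is None:
        return 0.0
    total = 0.0
    for temperature, count in daily:
        if temperature >= threshold:
            # Use the RR at the nearest temperature on the fitted curve
            rr = min(curve, key=lambda point: abs(point[0] - temperature))[1]
            total += count * (1.0 - 1.0 / rr)
    return total


if __name__ == "__main__":
    curve = [(10, 1.05, 0.98), (14, 1.00, 1.00), (18, 1.02, 0.99),
             (22, 1.06, 1.01), (26, 1.12, 1.05), (30, 1.25, 1.12)]
    t_opt, first, second = heat_thresholds(curve)
    daily = [(20, 100), (26, 110), (30, 120)]  # (daily mean temperature, deaths)
    print(t_opt, first, second, round(attributable_count(daily, curve, second), 1))
```

Dividing the attributable total by the number of years covered (and by the population) would turn it into an annual rate.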
Impact
Heat estimates and risk thresholds currently available for the UK are based on the population and climate of England and Wales. With a better understanding of heat risk thresholds for Scotland’s population, which is adapted to a cooler climate, we will provide data-driven recommendations for adverse weather plans and support health boards and local authorities to implement effective adaptation and mitigation strategies and to communicate the risks of heat to the general public.
12:15 – 13:15
Lunch (Hall 1 and 2, Exhibition)
Working well with the public in health data science – workshop (Alsh 1 & Alsh 2 – booking only)
13:15 – 14:30 (Parallel sessions)
Stream 10: Innovating the data infrastructure
Location: Lomond Auditorium
Chair: Amy Tilbrook, Health Data Research UK
Speakers:
- Wendi Bacon – Easier validation, faster approvals: Safe People Registry for streamlining Safe People verifications
Background
Health data research is improving lives every day. But because health data is highly sensitive, it must only be accessed under strict conditions: by trusted researchers with the right credentials; within secure environments that protect against unauthorised use; and for projects that benefit the public. These principles are captured in the Five Safes Framework, which guides the responsible use of data in research. The Safe People Registry is a secure software solution, maintained and supported by the HDR UK Technology Team, designed to standardise and simplify researcher verification for secure data access (the Safe People principle).
Problem
In practice, verifying that individual researchers meet the Safe People criteria is repetitive, time-consuming, and difficult to track. Researchers also submit multiple data access requests, each of which often requires a separate Safe People validation in line with the relevant data custodian’s local policies and procedures. This fragmented process results in significant inefficiencies, diverting time away from generating valuable medical insights.
Solution
We developed a Safe People Registry to create a more integrated and efficient process for assessing responsible researcher access to sensitive data.
How does the registry help researchers?
- Reduces duplication: Researchers create one profile that can be shared with multiple data custodians.
- Fast setup: Integrations such as Google, Microsoft, and ORCID speed up profile creation.
- Easy tracking: Safe People validation is managed in one place – no more chasing email threads.
- Smart notifications: Automated alerts when training course certificates expire.
- Sustainable: Researchers can keep and update their profile when moving between organisations or institutions.
How does the registry help data custodians?
- Reduces duplication: Custodians can reuse past verification steps to speed up repeat checks.
- Rapid validation: Minimum validation requirements are customisable and missing elements can be flagged directly to the researcher.
- Centralised tracking: Safe People validation statuses are managed in one place.
- Automated notifications: Custodians can receive alerts when researchers change organisations, training expires, profiles are updated, or a project ends.
- Visibility: See which other custodians and projects a researcher is involved with, including whether they have been previously validated.
- Optional features: Can enable Single Sign-on for access, record project approvals, or automatically list project details on a public data use register.
How does the registry help organisations?
- Reduces duplication: Researchers are affiliated once, not with each new research project data access request.
- Automated notifications: When a researcher leaves an organisation, all relevant data custodians are informed, reducing the administrative burden on the organisation.
- Easy tracking: Organisations can view all their associated researcher profiles, which projects they are involved with, and which custodians they engage with.
- Jonathan Batty – IMPACT: an algorithm for phenotyping long-term conditions in UK electronic health records
Lay Summary
To identify long-term conditions in healthcare records, diagnosis codes must be mapped to clinical conditions, which is time-consuming and error-prone. In this project, we worked with clinical experts, patients and members of the public to define key long-term conditions that are of importance to healthcare research. We used artificial intelligence to exhaustively screen all 254,546 diagnosis codes used in UK healthcare data against a definition of each condition, identifying a list of candidate codes. These were then assessed by multiple clinical and health data experts. Overall, 321 conditions were mapped to 39,196 diagnosis codes. Comparisons with existing codelists showed these to be comprehensive and reliable. The full set of codes is publicly available and has been built into software to help other researchers standardise and accelerate their work.
Background
An exhaustive mapping from a standardised set of long-term conditions to all coding systems used in routinely collected health data in the UK does not exist.
Objective
To develop a comprehensive phenotyping algorithm for an inclusive set of long-term conditions, spanning physical and mental health, sex-specific disorders and conditions under-represented in research to date, operationalised across all coding systems in use in the UK.
Methods
Candidate long-term conditions were identified from existing systematic reviews, consensus studies and multidisciplinary clinical, patient and public representative input. Full-text descriptions were retrieved for each code appearing at least once in large primary and secondary care datasets (n = 254,546 codes in total). A large language model was used to perform pairwise screening of each code against a detailed synopsis of every long-term condition, resulting in 321 candidate codelists. These were then independently reviewed by multiple clinicians to produce final adjudicated codelists. A subset of these (including diabetes mellitus, atrial fibrillation and chronic obstructive pulmonary disease) was compared with pre-existing, expert-curated codelists to validate this approach.
Results
From 11 systematic reviews, 6 consensus studies and multidisciplinary clinical and patient input, 321 unique long-term conditions were identified and grouped into 116 clinically related phenotypes spanning 19 body systems. The two-stage screening approach mapped 39,196 codes, spanning multiple coding systems (including SNOMED-CT, Read/CTV3, ICD-10, OPCS-4, local and vendor-specific codes), to the 321 long-term conditions. A subset of these was validated against pre-existing codelists, confirming the high sensitivity and specificity of this approach (diabetes: sensitivity 97.0%, specificity 100%; atrial fibrillation: sensitivity 100%, specificity 100%; chronic obstructive pulmonary disease: sensitivity 100%, specificity 99.9%).
Conclusions
The Inclusive Multimorbidity Phenotyping Algorithm Codelist Tool (IMPACT) enables the ascertainment of 321 long-term conditions, which may be aggregated into 116 clinical phenotypes, across a range of clinical coding systems. The final codelists (and packages to implement IMPACT in R, Python and Stata) are available via GitHub.
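Validating a codelist against an expert-curated reference reduces to set arithmetic if the comparison is made code by code within the universe of screened codes. This is our assumption; the abstract does not say whether validation was at code level or patient level. The Python sketch below computes sensitivity and specificity on that basis, using hypothetical code identifiers.

```python
def codelist_validation(candidate, reference, universe):
    """Code-level sensitivity and specificity of a candidate codelist.

    candidate: codes selected for a condition (e.g. after screening and clinical review).
    reference: codes in a pre-existing, expert-curated codelist.
    universe: every code that was screened.
    """
    candidate, reference, universe = set(candidate), set(reference), set(universe)
    true_pos = len(candidate & reference)
    false_neg = len(reference - candidate)
    negatives = universe - reference
    true_neg = len(negatives - candidate)
    false_pos = len(negatives & candidate)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


if __name__ == "__main__":
    universe = {f"code{i}" for i in range(100)}   # hypothetical screened codes
    reference = {f"code{i}" for i in range(10)}   # hypothetical curated codelist
    candidate = {f"code{i}" for i in range(9)} | {"code50"}
    sens, spec = codelist_validation(candidate, reference, universe)
    print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Because most codes in a universe of this size are true negatives, specificity tends to stay close to 100% even when a few extra codes are included. Sensitivity is usually the more informative figure.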
- Cosmika Goswami – Medicines in Acute and Chronic Care in Scotland (MACCS): a new linkable medicine intelligence research
Background
Medicines are fundamental to modern healthcare, offering essential benefits in the prevention, management, and treatment of a wide range of health conditions. However, their safe, effective, and equitable use presents increasing challenges, particularly against the backdrop of an ageing population, growing levels of multimorbidity, and widespread polypharmacy. These factors complicate prescribing and monitoring, leading to potential issues around medicine safety, adherence, and effectiveness. The Health Data Research (HDR) UK Medicines Driver Programme is addressing these challenges by working to optimise medication use across the UK. One of its flagship initiatives is the development of a federated, linked health data infrastructure that will support advanced research into medication use, safety, and outcomes.
Methods
This medicine-focused data infrastructure aims to integrate a wide array of national-level health datasets in Scotland. These include prescribing and dispensing data (community, hospital and homecare), hospital episodes, death records, and laboratory tests (biochemistry, haematology, microbiology). In Scotland, the infrastructure is being developed with the support of Public Health Scotland. Data are securely linked using the Community Health Index (CHI) number (a unique patient identifier), pseudonymised, and accessible through the National Safe Haven, a secure platform for health data research. The development process is guided by the FAIR data principles (Findable, Accessible, Interoperable, and Reusable), ensuring data quality and maximising research potential.
Results
The Medicines in Acute and Community Care in Scotland (MACCS) research resource is progressing steadily. Significant milestones have been achieved, including the identification of key datasets – HEPMA, SMR00/01/02/04/06, NRS Death Records, PIS, SRR, SCI, SICSAG, ECOSS and Regional Lab data. To enable the secure integration of these individual-level datasets across care settings, robust data linkage mechanisms have been established, supported by the creation of a comprehensive governance framework. Once fully implemented, MACCS will offer researchers streamlined and efficient access to high-quality, linked datasets and analytical tools. This will enable investigations into various dimensions of medicine use, including patterns of access and prescribing, equity in care, safety profiles, treatment adherence, and real-world health outcomes. The resource is being developed with input from clinicians, data scientists, and patient groups, ensuring relevance to real-world challenges in medicine optimisation.
Conclusion
The MACCS initiative underscores the transformative potential of linked health data in advancing medicines research. By enabling comprehensive, population-scale analyses, it opens the door to more precise and evidence-based decision-making in clinical and policy contexts. Moreover, MACCS represents an open invitation to researchers across the UK and internationally to collaborate with Scotland in conducting high-impact research that can ultimately improve medication use and patient outcomes on a global scale.
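The Methods describe linking datasets on the CHI number and pseudonymising them before access through the National Safe Haven. A common way to pseudonymise an identifier while keeping datasets joinable is a keyed hash (HMAC). The same CHI always maps to the same pseudonym, but the mapping cannot be reproduced without the secret key. The Python sketch below illustrates that idea only; it is not Public Health Scotland's actual process. The key, the CHI values and the dataset contents are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(chi: str, key: bytes) -> str:
    """Return a stable pseudonym for a CHI number using HMAC-SHA256."""
    normalised = chi.replace(" ", "")  # tolerate spaced formatting of the 10 digits
    return hmac.new(key, normalised.encode("ascii"), hashlib.sha256).hexdigest()


def link(left, right, key):
    """Join two datasets of (chi, payload) rows on pseudonymised CHI.

    Assumes one row per person in `right`. Returns
    (pseudonym, left_payload, right_payload or None); raw CHIs are dropped.
    """
    lookup = {pseudonymise(chi, key): payload for chi, payload in right}
    linked = []
    for chi, payload in left:
        pid = pseudonymise(chi, key)
        linked.append((pid, payload, lookup.get(pid)))
    return linked


if __name__ == "__main__":
    key = b"hypothetical-secret-held-by-the-linkage-service"
    prescribing = [("0101011234", "metformin"), ("0202025678", "warfarin")]
    labs = [("010101 1234", "HbA1c"), ("0303039999", "eGFR")]
    for row in link(prescribing, labs, key):
        print(row)
```

Keeping the key with a trusted linkage service, separate from researchers, is what stops analysts from recomputing pseudonyms from known CHI numbers.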
- Tamsin Nash – Data science design for conflict resolution within the Scottish Cancer Intelligence Platform
Lay Summary
The Cancer Intelligence Platform (CIP) is a national infrastructure transforming access to cancer data in Scotland by enabling secure, integrated analysis across population-level datasets. However, combining datasets with overlapping and sometimes conflicting information poses challenges for researchers and clinicians. Without a systematic approach to assess consistency, accuracy, and agreement, differences in data coding and completeness can cause confusion or misinterpretation. We developed a standardised, clinically informed protocol to harmonise data within CIP, demonstrated by linking the Scottish Cancer Registry and the Quality Performance Indicator (QPI) dataset. This approach identifies overlap and conflict clearly, supports metadata development, and enables robust, scalable population health research.
Background and Rationale
CIP integrates multiple datasets describing different stages of the cancer care pathway. However, duplication and variation across datasets complicate combined use. Without a structured method to quantify agreement and resolve inconsistencies, users must rely on ad hoc judgements, undermining reproducibility and confidence. A scalable, protocolised approach is required to evaluate data quality and guide appropriate use within trusted research environments.
Objectives
- Develop a reproducible, clinically informed protocol for assessing conflict and agreement across CIP datasets.
- Demonstrate this method by harmonising the Scottish Cancer Registry and QPI datasets across shared clinical domains.
Methods
Individual-level records from the Scottish Cancer Registry and QPI were linked. Variables and coding were reviewed by clinical experts to identify overlapping domains. Equivalent but differently coded items were mapped, and logic rules based on clinical reasoning were applied to determine true conflicts. Domain-level metrics on completion, agreement, and conflict were generated, highlighting areas needing further validation or single-source prioritisation.
Results
In total, 29,930 matched records were analysed. We reviewed 94 variables from the Scottish Cancer Registry and 68 from QPI across diagnosis, histopathology, tumour characteristics, demographics, treatments, and outcomes. Twelve shared domains with potential overlap or conflict were identified. Domains with the highest agreement included lymph node examination (98.3%), basis of diagnosis (94.6%), and clinical trial participation (96.1%). The highest conflict occurred in Dukes’ stage (60.1%) and tumour site (41.2%). Nodal status required complex harmonisation because of differences in naming and dataset-specific definitions. Diagnostic method and mode of detection required nuanced interpretation owing to overlapping clinical categories. Staging differences often reflected variations in granularity rather than true conflicts, illustrating the need for clinically informed reconciliation.
Conclusion
This protocol provides a scalable foundation for population-level cancer research within CIP. Reducing ambiguity around data quality and guiding dataset interpretation will lower barriers for researchers and clinicians using integrated health data. As CIP grows, this approach will enable systematic metadata generation, support automation of harmonisation, and contribute to FAIR data principles and reproducible UK health research.
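The domain-level metrics described in the Methods can be sketched in three steps. First, for each linked record, map dataset-specific codes to a common value. Second, treat pairs with a missing side as incomplete. Third, apply a clinical rule to decide whether two differing values are a true conflict. The Python below is a hypothetical illustration, and several parts of it are our own assumptions rather than the published protocol:

- the code mapping;
- the "same or finer" granularity rule;
- the denominators (agreement and conflict are computed among records where both datasets are complete).

```python
from collections import Counter


def compare_domain(pairs, mapping=None, compatible=None):
    """Summarise completion, agreement and conflict for one shared domain.

    pairs: (registry_value, qpi_value) for each linked record; None means missing.
    mapping: optional dict translating dataset-specific codes to a common code.
    compatible: optional rule deciding that two differing values are not a true conflict.
    """
    mapping = mapping or {}
    counts = Counter()
    for a, b in pairs:
        if a is None or b is None:
            counts["incomplete"] += 1
            continue
        a, b = mapping.get(a, a), mapping.get(b, b)
        if a == b or (compatible is not None and compatible(a, b)):
            counts["agree"] += 1
        else:
            counts["conflict"] += 1
    both = counts["agree"] + counts["conflict"]
    return {
        "completion": both / len(pairs),
        "agreement": counts["agree"] / both if both else float("nan"),
        "conflict": counts["conflict"] / both if both else float("nan"),
    }


def same_or_finer(a, b):
    """Hypothetical rule: 'C18' vs 'C18.7' is a granularity difference, not a conflict."""
    return a.startswith(b) or b.startswith(a)


if __name__ == "__main__":
    tumour_site = [("C18.7", "C18"), ("C20", "C20"), ("C18.2", "C20"),
                   (None, "C19"), ("Rectum", "C20")]
    print(compare_domain(tumour_site, mapping={"Rectum": "C20"}, compatible=same_or_finer))
```

Keeping the mapping and the compatibility rule as separate inputs lets clinicians review and version them independently of the metric code.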
- Ruth Norris – Can training and development change research culture and support non-traditional, hidden and technical roles?
The Teams Build Dreams (TBD) programme was scaled up from initial applications in health informatics research programmes to promote and support collaborative, team-based, interdisciplinary ways of working, in order to:
- Increase awareness of Team Research (TR) principles and their value
- Provide a suite of training and development activities
- Develop a reputation for TR application and expertise
Funded by Research England, we conducted workshops and surveys with academic, operational and technical staff and members of the public to co-develop a TR definition and to identify the benefits of, blockers to, and solutions for implementation. Using this input, we developed a comprehensive offering on the principles of collaboration, high-performance and interdisciplinary working, providing:
I. Training and Development Suite using a pedagogically informed, developmental, learner-centred approach. To date, around 200 academic, operations and technical staff have been trained using:
- Ia. Facilitator-led live training modules:
- Introduction & Creating Teams
- Team Visioning and Ways of Working
- Team Conflict and Resolution
- Followership
- Mapping your Network
- Team Building
- Ib. Bespoke sessions based on needs assessment
- Ic. Online training set for self-led learning
II. Micro catalyst funding call to support small, impactful collaborative ideas from the community (12 projects of ~£1,000 each)
III. Community event and networking series
IV. Online knowledge repository
V. Monthly reading group
VI. Mailing list (320+ users)
We use a developmental, iterative, learner-centred and feedback-centred approach to improving content and delivery. Recent feedback includes:
“…really engaging and interactive. Can take a lot of tools to use in my role….”
“…well-structured and engaging, with clear real-world applications…”
Learnings from a pilot phase with one faculty informed the offering, including:
- Recruitment of a qualified Training Manager
- Targeted communications
- Module Development and improvement
- Delivery Options review: resulting in comprehensive online/in person/self-led suite.
Driving parity and diversity in attendance and engagement across disciplines and role types remains challenging: senior academics and male participants are the least represented groups, while women, operational staff and early career researchers are the most represented. We are currently undertaking an evaluation and sustainability planning exercise, the findings of which will inform future offerings.
The TBD training is being utilised in a number of externally funded health and technology projects, including the NIHR COL-LAB project (Fu; Nolan); and the British Council ODA project delivering a Research Management and Workforce training suite across Kenyatta Hospital and University of Manchester (Jameson/Ngugi).
TBD is also being applied to the Secure Data Environment Research Technical Professional (RTP) Team Development Hub (EPSRC; Couch). The project will embed team research ways of working into the career development frameworks of RTPs working on data platforms and infrastructure for health and care: e.g. Platform/DevOps Engineers, System Administrators and Cloud Architects. Participants will receive tailored training and team building activities that equip them to work collaboratively, and to drive recognition and value of SDE RTP and other technical roles.
Stream 11: Public health and health inequalities
Location: Alsh 1
Chair: Karyn Mégy, Associated Principal Scientist, Clinical Informatics Centre for Genomics Research
Speakers:
- Hannah Bunk – Self-harm and suicide by sexual orientation – A national population-based study in England and Wales
Background
Previous research indicates that people identifying with a non-heterosexual orientation have a higher risk of self-harm and suicidality compared with their heterosexual peers. The UK Government’s 2023 Suicide Prevention Strategy reports there is developing research to understand the risk of suicides in this group. However, national population-based estimates on self-harm and suicide by sexual orientation are limited. We aimed to examine differences in rates of self-harm and suicide by sexual orientation, including differences by sexual orientation sub-group and characteristics such as age, sex and ethnicity.
Methods
The 2021 Census of England and Wales included a voluntary question on sexual orientation for the first time. We linked individual-level Census 2021 data to death registrations and Electronic Health Records via the NHS Personal Demographics Service (PDS). Our study population of 28.7 million people included those aged 16 years and over who responded to the sexual orientation question in Census 2021 (not by proxy) and could be linked to the NHS PDS. We analysed two outcomes independently: (i) at least one hospital admission or Accident and Emergency (A&E) attendance for intentional self-harm and (ii) death by suicide. We calculated age-standardised rates per 100,000 people by sexual orientation between March 2021 and December 2023, broken down by sociodemographic, geographical, socioeconomic and health variables. Rates were standardised to the age distribution of the LGB+ population (gay, lesbian, bisexual or other non-heterosexual orientation) observed in the linked study dataset. We calculated rate ratios for the LGB+ groups vs. the heterosexual group to identify the groups with the largest relative risk of self-harm and suicide.
Results
The risk of self-harm for people identifying with an LGB+ orientation was 2.5 times that of people identifying as heterosexual, with age-standardised rates of 1,508.9 per 100,000 people for the LGB+ population and 598.4 per 100,000 people for the heterosexual population. The risk of suicide for people identifying with an LGB+ orientation was 2.2 times that of people identifying as heterosexual, with age-standardised rates of 50.3 and 23.1 per 100,000 people, respectively. The relative risk of self-harm was higher for LGB+ individuals who were female, younger adults, or from a Black ethnic group. For suicide, we also saw this elevated relative risk in females, older adults and those from a Black ethnic group.
Conclusion
This study provides the first national population-based estimates of self-harm and suicide by sexual orientation in England and Wales. Our findings show that people identifying with an LGB+ orientation are at a substantially increased risk of self-harm and suicide compared with their heterosexual peers. These findings are important to inform the UK Government’s suicide prevention strategy and further research supporting the mental health of non-heterosexual groups.
- Joana Cruz – Towards cleaner air: PM2.5 exposure and disparities around childcare providers in England
View Joana’s abstract
Introduction
Air pollution poses a significant health risk for young children, particularly in urban areas. This study aimed to examine changes in PM2.5 concentrations around Early Years Providers (EYPs), including nurseries, across England from 2018 to 2022, and to identify neighbourhood characteristics associated with EYPs located in areas exceeding the WHO air quality guideline (>5 µg/m³).
Methodology
Using Geographic Information Systems (GIS) methodology, we integrated data on EYP locations from the Ordnance Survey with air pollution data for England from the Department for Environment, Food and Rural Affairs (DEFRA). We analysed changes in PM2.5 concentrations around EYPs over time, used the Local Moran’s I method to identify clusters of EYPs in areas exceeding the WHO interim target of 10 µg/m³, and explored the impact of local population characteristics on PM2.5 exposure. Deprivation data were obtained from the Ministry of Housing, Communities and Local Government, and data on urbanicity and ethnic group distribution from the 2021 Census of Population for England and Wales. To investigate associations between PM2.5 exposure levels near EYPs and small-area socio-demographic factors, we used a Bayesian spatial regression model incorporating random effects.
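The Local Moran's I step can be illustrated with a minimal pure-Python sketch. It computes the standard local statistic I_i = (z_i / m2) × Σ_j w_ij z_j on mean-centred values, where m2 is the mean squared deviation and the neighbour weights are row-standardised, and it labels each site High-High, Low-Low, High-Low or Low-High from the signs of its own deviation and its neighbours' average deviation. The abstract does not state the spatial weights, how "high" was defined relative to the 10 µg/m³ target (the sketch uses the sample mean) or the significance test; the permutation-based inference that usually decides which sites count as clusters is omitted, and the function name and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def local_morans_i(
    values: Sequence[float], weights: Sequence[Sequence[float]]
) -> Tuple[List[float], List[str]]:
    """Local Moran's I with row-standardised weights (no inference step).

    values: one PM2.5 value per site (e.g. per EYP); weights[i][j] > 0 marks
    site j as a neighbour of site i. The diagonal is ignored.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]      # deviations from the sample mean
    m2 = sum(d * d for d in z) / n      # mean squared deviation
    if m2 == 0:
        raise ValueError("all values are identical; Local Moran's I is undefined")
    stats: List[float] = []
    labels: List[str] = []
    for i in range(n):
        nbrs = [(w, z[j]) for j, w in enumerate(weights[i]) if j != i and w > 0]
        total_w = sum(w for w, _ in nbrs)
        # Row-standardised spatial lag: weighted mean deviation of neighbours.
        lag = sum(w * zj for w, zj in nbrs) / total_w if total_w else 0.0
        stats.append(z[i] * lag / m2)
        own = "High" if z[i] > 0 else "Low"
        around = "High" if lag > 0 else "Low"
        labels.append(f"{own}-{around}")
    return stats, labels


# Hypothetical chain of four sites: 0-1, 1-2 and 2-3 are neighbours.
pm25 = [12.0, 11.0, 6.0, 5.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
stats, labels = local_morans_i(pm25, adjacency)
print(labels)  # ['High-High', 'High-High', 'Low-Low', 'Low-Low']
```

In practice a site is usually reported as part of a High-High or Low-Low cluster only when its statistic is significant under a conditional permutation test, which this sketch leaves out for brevity.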
Results
The annual number of EYPs in England ranged from 15,780 in 2018 to 18,427 in 2019. Although average PM2.5 levels around EYPs fell by 17.8% between 2018 and 2022, from 9.4 µg/m³ (SD 1.8) to 7.8 µg/m³ (SD 1.5), over 96% of EYPs consistently recorded concentrations above the WHO annual guideline of 5 µg/m³. PM2.5 levels fell temporarily during the COVID-19 lockdowns (mean (SD): 7.5 µg/m³ (1.3) in 2020 and 7.5 µg/m³ (1.0) in 2021) before rebounding by 2022 (7.8 µg/m³ (1.5)). High-High clusters (EYPs with high PM2.5 levels, >10 µg/m³, surrounded by other high-exposure EYPs) were observed around major urban centres, including Greater London, Birmingham, and other parts of the Midlands and South East England. Conversely, Low-Low clusters (EYPs with low PM2.5 exposure near other low-exposure EYPs) were mainly located in rural regions such as Devon and Cornwall, and in northern parts of England. EYPs in deprived and urban areas, and in areas with a predominantly non-White population, had higher PM2.5 concentrations.
Conclusion
These findings underscore the urgent need for targeted interventions to reduce air pollution in deprived urban regions to protect young children’s health.
- Rabia Khan – Breast cancer inequities using real world data mapped to the OMOP CDM from Pakistan and the UK
View Rabia’s abstract
Background
Advancements in breast cancer (BC) diagnosis and treatment have notably improved survival in many high-income countries. However, patients in low- and middle-income countries (LMICs) continue to face substantial disparities in treatment and clinical outcomes. In Pakistan, where BC incidence and mortality remain high, real-world evidence is critically limited. There is an urgent need to use routinely collected health data to better understand the clinical characteristics and diagnosis trends of BC patients in Pakistan.
Methods
We used patient-level data from the UK through The Health Improvement Network (THIN), a primary care data source, and from Pakistan through the Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH&RC) hospital data source, both previously mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Using the PhenotypeR package, we reported BC patient characteristics, including medication uptake following diagnosis, and calculated crude incidence rates (IR) per 100,000 person-years from 2018 to 2022 in a random subset of 100,000 patients from each cohort.
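A crude incidence rate is the number of new cases divided by the person-time at risk, here scaled to 100,000 person-years. The pure-Python sketch below adds an exact (Garwood) Poisson 95% confidence interval, found by bisection on the Poisson cumulative distribution function. The choice of interval is an assumption: the abstract does not say which method PhenotypeR (or the packages it relies on) uses, and the function names are illustrative.

```python
import math
from typing import Callable, Tuple


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), accumulated term by term in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    total, log_term = 0.0, -mu  # log P(X = 0) = -mu
    for i in range(k + 1):
        if i > 0:
            log_term += math.log(mu) - math.log(i)
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect(g: Callable[[float], float], lo: float, hi: float, iters: int = 200) -> float:
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    lo_positive = g(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def crude_incidence_rate(
    cases: int, person_years: float, per: float = 100_000, alpha: float = 0.05
) -> Tuple[float, float, float]:
    """Crude rate per `per` person-years with an exact Poisson CI (assumed method)."""
    scale = per / person_years
    if cases == 0:
        lower_mu = 0.0
    else:
        # Lower limit: the Poisson mean at which P(X >= cases) = alpha / 2.
        lower_mu = _bisect(
            lambda mu: 1 - poisson_cdf(cases - 1, mu) - alpha / 2, 0.0, float(cases)
        )
    # Upper limit: the Poisson mean at which P(X <= cases) = alpha / 2.
    upper_mu = _bisect(
        lambda mu: poisson_cdf(cases, mu) - alpha / 2, float(cases), 10.0 * (cases + 10)
    )
    return cases * scale, lower_mu * scale, upper_mu * scale


# Hypothetical input: 10 cases over 20,000 person-years.
rate, low, high = crude_incidence_rate(cases=10, person_years=20_000)
print(f"{rate:.2f} ({low:.2f}-{high:.2f}) per 100,000 person-years")
```

The exact interval is asymmetric around the point estimate, as in the intervals reported in the Results, because the Poisson distribution is skewed when case counts are small.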
Results
A total of 29,714 breast cancer patients were identified from SKMCH&RC, out of 10,031,718 cancer patients overall, whereas 63,621 breast cancer patients were identified from THIN UK, drawn from a broader population of 10,977,326 individuals. Sex distribution was comparable across both cohorts, with females accounting for 99.22% and 98.96% of patients in SKMCH&RC and THIN UK, respectively. The median age at diagnosis was 45 years (IQR: 38–54) in SKMCH&RC and 65 years (IQR: 54–75) in THIN UK. Following breast cancer diagnosis, the most frequently recorded medications in the SKMCH&RC data were metoclopramide (80.56%), omeprazole (77.74%), acetaminophen (76.44%), ondansetron (74.81%) and dexamethasone (74.70%). In THIN UK, the most frequently recorded medications were omeprazole (33.53%), letrozole (31.42%), acetaminophen (30.31%), amoxicillin (29.51%) and tamoxifen (26.69%). The annual crude IRs per 100,000 person-years (95% CI) in SKMCH&RC were 89.98 (43.15–165.48) in 2018, 194.63 (121.97–294.67) in 2019, 159.42 (94.48–251.96) in 2020, 193.19 (121.07–292.50) in 2021 and 145.07 (82.92–235.59) in 2022. The corresponding IRs in THIN UK were 88.59 (58.38–128.90) in 2018, 88.49 (56.70–131.67) in 2019, 98.04 (62.15–147.10) in 2020, 96.21 (57.92–150.24) in 2021 and 132.20 (80.75–204.16) in 2022.
Conclusion
This study demonstrates the feasibility of applying a standardized breast cancer phenotype across two international data sources using the OMOP CDM. Patients in Pakistan were younger at diagnosis and had consistently higher crude IRs than the UK cohort, the higher IRs reflecting the specialized oncology hospital source of the Pakistan data. Omeprazole and acetaminophen were common to both datasets, suggesting a shared need for gastroprotection and pain management in BC patients across settings.
- Esther Wainwright – How well do women understand their menstrual cycles? Insights from 383,085 UK based women
View Esther’s abstract
Lay summary
Many women in the UK lack a clear understanding of their menstrual cycles, including basic metrics such as cycle length, period duration, and what constitutes a regular cycle. This knowledge gap is particularly evident among younger individuals and those using hormonal contraception. Importantly, even women who believe they have “regular” cycles often misclassify their cycle patterns, potentially delaying recognition of underlying health issues.
Background/hypothesis
Menstrual health literacy is a critical yet frequently overlooked component of reproductive health; it shapes an individual’s ability to identify abnormal menstrual patterns, seek timely medical care, and make informed decisions. However, many women of reproductive age lack essential knowledge about their cycles, contributing to widespread misconceptions, stigma, and delays in healthcare engagement. This study explores menstrual health awareness, the misclassification of cycle regularity, and demographic disparities among UK-based women.
Objective
To explore common misconceptions about cycle length, regularity, and period length among women across various life stages, using large-scale data from an online health assessment (OHA).
Methods
This study utilises large-scale, real-world data from 383,085 UK-based women aged 18 to 58 who completed an OHA between September 2020 and January 2025. The assessment collected self-reported data on menstrual patterns, period characteristics, contraception use, reproductive history, and lifestyle factors. Participants described their cycle regularity and provided numerical entries for cycle and period lengths or selected “I don’t know.”
Results
More than one in five participants (22.2%) did not know their cycle length, with lack of awareness highest in those under 25 (33.4%). Although knowledge improved with age, it declined again after 45 years. Awareness also varied by ethnicity, with white participants reporting the highest uncertainty (22.5%). Period length was better understood overall, yet 9.5% of participants could not recall their period duration. Awareness again varied by age and contraceptive use, with hormonal contraceptive users showing the highest uncertainty across both measures (cycle length: 36.3%, period length: 14.5%). Among women who self-reported having regular cycles, 4.9% had numerical values that fell outside the clinically recognised 21–35 day range, suggesting a misunderstanding of what constitutes a regular cycle. Misclassification was particularly common among women under 25 (7.2%) and among Black (6.4%) and Asian (5.5%) women.
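The two headline measures above can be expressed as simple proportions. This sketch assumes that the misclassification denominator is participants who described their cycles as regular and entered a numerical cycle length (so "I don't know" answers are excluded), and that lengths of exactly 21 or 35 days count as within range; the abstract does not spell out either choice, and the `Response` record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

REGULAR_RANGE_DAYS = (21, 35)  # clinically recognised range cited in the abstract


@dataclass
class Response:
    says_regular: bool                # self-described cycle regularity
    cycle_length_days: Optional[int]  # None when "I don't know" was selected


def share_unknown_cycle_length(responses: Sequence[Response]) -> float:
    """Proportion of all participants who did not know their cycle length."""
    if not responses:
        return 0.0
    return sum(r.cycle_length_days is None for r in responses) / len(responses)


def misclassification_rate(responses: Sequence[Response]) -> float:
    """Among self-reported regular cycles with a numeric length, the share
    whose length falls outside the 21-35 day range (assumed inclusive)."""
    low, high = REGULAR_RANGE_DAYS
    eligible = [r for r in responses if r.says_regular and r.cycle_length_days is not None]
    if not eligible:
        return 0.0
    outside = sum(not (low <= r.cycle_length_days <= high) for r in eligible)
    return outside / len(eligible)


# Hypothetical responses, not study data.
sample = [
    Response(True, 28), Response(True, 40), Response(True, None),
    Response(False, 50), Response(True, 21), Response(True, 36),
]
print(share_unknown_cycle_length(sample), misclassification_rate(sample))
```

Keeping "I don't know" as an explicit `None` lets the same records feed both measures without the unknown answers being counted as out-of-range values.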
Conclusions
These findings, based on the largest UK menstrual health dataset to date, highlight persistent gaps in menstrual health literacy and underscore the need for targeted, demographically tailored public health interventions. Improving menstrual cycle literacy could counter misinformation and empower individuals to recognise and address menstrual health concerns. Addressing these disparities is critical to improving reproductive health outcomes and promoting menstrual equity, and using population-scale health data to inform menstrual health policy and education is a key step toward advancing reproductive health equity in the UK.
- Elizabeth Wrench – Understanding the distribution of cancer and stage of diagnosis across geographic regions in England
View Elizabeth’s abstract
Background/hypothesis
There are well known health differences between the North and South of England, including regional disparities in cancer incidence. There is, however, limited evidence on how the stage at diagnosis differs between regions.
Objective
This analysis therefore aimed to determine and understand regional disparities in cancer incidence based on cancer type and the stage at diagnosis in 2022 whilst controlling for deprivation level.
Methods
The analysis used 2022 data exported from the National Cancer Registration and Analysis Service on counts of seventeen cancer types at Integrated Care Board (ICB) level. The dataset also included counts of diagnoses at stages 1 and 2 (combined) and at stages 3 and 4 (combined). Separately, data on the median age, total population, proportion of each sex and region of each ICB were extracted from the ONS mid-year population estimates for mid-2022. The overall English Indices of Multiple Deprivation (IMD) 2019 score for each ICB was obtained from The Health Foundation.
The count of cancer cases in each ICB was used as the outcome in a negative binomial generalised linear mixed model, with the total ICB population as an offset and the ICB’s median age, proportion of each sex and IMD 2019 score as fixed effects. A random effect for the interaction between region, stage and cancer type was included to investigate which regions had the highest number of cases of a specific cancer type diagnosed at a later stage.
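One way to write the model described above is shown below. It is a sketch under stated assumptions rather than the authors' specification: each observation is taken to be the case count for ICB i, cancer type c and stage group s; the female proportion stands in for the sex covariate (the two sex proportions sum to one); the NB2 variance form is assumed; r(i) is the region of ICB i and N_i its population.

```latex
\begin{aligned}
Y_{ics} &\sim \operatorname{NegBin}\left(\mu_{ics}, \theta\right),
  \qquad \operatorname{Var}(Y_{ics}) = \mu_{ics} + \mu_{ics}^{2}/\theta \\
\log \mu_{ics} &= \log N_i + \beta_0 + \beta_1\,\mathrm{age}_i
  + \beta_2\,\mathrm{female}_i + \beta_3\,\mathrm{IMD}_i + u_{r(i)sc} \\
u_{rsc} &\sim \mathcal{N}\left(0, \sigma_u^{2}\right)
\end{aligned}
```

Under this form, exp(u) > 1 for a region, stage and cancer-type combination means more cases than the offset and fixed effects alone would predict, which is one reading of the random-effect estimates above 1 described in the Results.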
Results
Median age and IMD score were significantly (p < 0.01) associated with a higher number of cancer cases, and a higher proportion of females in the ICB was significantly (p < 0.05) associated with a lower number of cases. Among the random effects, stage 3 & 4 prostate cancer in the South East and South West had the largest estimates (>1). These two regions had the highest random effects even after controlling for the fixed effects (age, IMD and sex), suggesting that other factors are associated with the higher number of cases in these areas and require further investigation. By contrast, the North West and London had the lowest-ranked random-effect estimates for stage 3 & 4 prostate cancer; although these were above 1, suggesting more cases than expected, a larger share of these cases can be explained by the fixed effects (age, IMD and sex).
Conclusion
Overall, these results confirm the inequalities between the North and South of England, with deprivation and demographics playing a stronger role in the North than in the South, where other factors are likely to be contributing to a higher number of later-stage cases.
Stream 12: Patient and public involvement and engagement (PPIE) throughout the research cycle
Location: Alsh 2
Chair: Jan Speechley, Public Advisory Board member at Health Data Research UK (HDR UK)
- Kirsteen Campbell – How UKLLC and its public contributors work together to evolve their data access process
View Kirsteen’s abstract
Background
UK Longitudinal Linkage Collaboration (UKLLC) is the national Trusted Research Environment for data linkage in longitudinal research. The Data Access Public Review Panel (DAPRP) forms an integral part of the data access application process, enabling the public to share their perspective on who can access these data for research. The DAPRP comprises public members, including study participants.
Objectives
UKLLC collaborates with the Panel on continuous improvement, responding to UKLLC’s growth and the resulting additional demands.
Methods
The Panel met to discuss methods to support a new remit (assessing public good in data access applications) and to consider potential risks and harms. A further discussion was held on ethical considerations in light of UKLLC’s transition to accepting applications for research in the public good, beyond COVID-19 research.
Results
The session on public good led the Panel to implement a bespoke solution drawing on the National Data Guardian’s types of benefit and the UK Statistics Authority Centre for Applied Data Ethics public good checklist. The Panel recommended that UKLLC include examples of risks and harms and develop a values statement and public good criteria, co-developed with Panel members.
Conclusion
Involving public contributors from the beginning of a process can bring diverse viewpoints, co-develop solutions and demonstrate trustworthiness. The UKLLC DAPRP is forging a route that others can adopt or adapt.
- Ruth Jack and Debbie Butler – Meaningful patient and public involvement in large electronic health database research
View Ruth and Debbie’s abstract
Background
Patient and public involvement (PPI) is essential to health and social care research. Many PPI training courses focus on studies that recruit participants, leaving gaps for work with secondary data.
Objective
To create a way of working that demonstrates meaningful PPI in large electronic health database research.
Methods
Researchers, public contributors and PPI managers co-designed support to maintain involvement throughout projects using large electronic health record (EHR) databases.
Results
Public contributors helped shape research questions, were co-applicants on funding bids, attended project and PPI meetings, and received sessions on methods and on interpreting statistical outputs. We produced an online training course, a handbook and an animation about contributors’ experiences. Public contributors co-authored papers and co-presented findings.
Conclusion
Meaningful PPI is possible across the lifecycle of complex EHR studies when contributors are supported, informed and involved throughout.
- Brian Kennedy – Leading with Public and Patient Involvement and Engagement in public-private data partnerships
View Brian’s abstract
Lay summary
Flatiron Health embeds PPIE at every level of its NHS data partnerships, from engagement to governance, to ensure transparency, trust and public benefit.
Background/Hypothesis
To learn from the experience of every person with cancer, Flatiron partners with NHS Trusts to curate and anonymise routine data. Because public trust is essential, PPIE is integrated throughout.
Methods
A three-pronged PPIE strategy: governance and operations (a Patient Voices Panel and an independent Research Transparency Panel), national advocacy, and local partner engagement.
Results
More than 200 people were directly engaged and around 2,500 reached; more than 37,000 multilingual letters were issued, with a 4.31% opt-out rate. Feedback highlights the value of transparency, choice and a clearly articulated public benefit.
Conclusion
Meaningful, sustained PPIE builds trust in, and support for, data partnerships and is scalable across organisations.
- Katie Oldfield – Who decides: Involving the public in deciding how administrative data is used for research
View Katie’s abstract
Research Data Scotland (RDS) has been planning how to bring the public meaningfully into a complex system whilst ensuring researchers can safely access data more simply and quickly to do research for public good. Building on public engagement and sector research, RDS will pilot a new advisory group within its Researcher Access Service and has developed criteria for research in the public good to guide applications and approvals.
Background
Data access requests are reviewed by committees of data owners and experts, and the extent of public involvement varies. RDS launched Phase One of the Researcher Access Service (RAS) in April 2024; Phase Two will include public members.
Aims
To involve the public meaningfully in approvals while keeping the process efficient for researchers, data owners and data managers.
Challenges
Scaling to high volumes of requests, meeting information governance requirements, and integrating a public advisory group without slowing approvals.
Approach
Consultation with the public and the Scotland Talks Data panel informed the “public good” criteria; reviews of existing models and wider consultation shaped the plan. RDS will pilot a Public Impact and Advisory Group (PIAG) to review requests and give feedback on public benefit, engagement and dissemination.
Conclusion
The PIAG pilot is planned for autumn 2025 to embed meaningful public involvement in Scotland’s national approvals process.
- Ruth Blackburn – Co-producing Public Involvement and Engagement guidance for research using pseudonymised address data
View Ruth’s abstract
Lay overview
Linking address data, in the form of Unique Property Reference Numbers (UPRNs), to health records can reveal environment–health relationships but raises privacy concerns. With a public advisory board (PAB), we co-created guidance to embed PPIE across the research cycle.
Background
UPRNs encode public address data that can be linked to sensitive health data. Integrated PPIE helps guide a project’s ethics, inclusivity and direction.
Objectives
To co-create guidance that helps researchers build public trust when using UPRNs.
Methods
We consolidated programme learning on privacy-preserving methods and worked with six PAB volunteers to develop “top tips” for embedding PPIE. The group met regularly, supported by HDR UK’s PPIE team.
Results
Outputs include a learning module (featuring researchers and volunteers) and summary guidance describing PPIE at each research stage. The process also provided a platform to explore risks and mitigations collaboratively.
Conclusions
Co-created resources amplify the public voice in training and guide researchers to plan, manage and communicate UPRN-based research with PPIE at its core.
14:30 – 15:00 (Hall 1 and 2, Exhibition)
Coffee break
15:00 – 16:15 (Lomond Auditorium)
Title: UK data assets
Chair: Professor Cathie Sudlow, Director of the Adolescent Health Study
Keynote speakers:
- Naomi Allen, UK Biobank
- Professor Ben Goldacre, OpenSAFELY
- Dr Raghib Ali, Our Future Health
- Dr Ellen Thomas, Genomics England
16:15 – 16:30 (Lomond Auditorium)
Title: Summary remarks and close
Speaker: Andrew Morris, Director of Health Data Research UK (HDR UK)
16:30
Close