Minassian C, Williams R, Meeraus WH, Smeeth L, Campbell OMR, Thomas SL

Pharmacoepidemiology and Drug Safety (2019) 28:923-933

PURPOSE: Primary care databases are increasingly used for researching pregnancy, e.g. the effects of maternal drug exposures. However, ascertaining pregnancies, their timing, and outcomes in these data is challenging. While individual studies have adopted different methods, no systematic approach to characterise all pregnancies in a primary care database has yet been published. Therefore, we developed a new algorithm to establish a Pregnancy Register in the UK Clinical Practice Research Datalink (CPRD) GOLD primary care database.

METHODS: We compiled over 4000 Read and entity codes to identify pregnancy-related records among women aged 11 to 49 years in CPRD GOLD. Codes were categorised by the stage or outcome of pregnancy to facilitate delineation of pregnancy episodes. We constructed hierarchical rule systems to handle information from multiple sources. We assessed the validity of the Register for identifying pregnancy outcomes by comparing our results with linked hospitalisation records and Office for National Statistics population rates.
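
To illustrate the general idea of delineating episodes from categorised records and resolving conflicting sources with a hierarchy, a minimal Python sketch follows. It is not the published algorithm: the record fields, the category labels, the source priority order, and the 280-day gap threshold are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical source hierarchy: a lower number wins when sources disagree.
SOURCE_PRIORITY = {"delivery_record": 0, "outcome_code": 1, "antenatal_code": 2}

@dataclass
class Record:
    patient_id: int
    event_date: date
    category: str  # illustrative labels: "antenatal", "delivery", "loss"
    source: str    # key into SOURCE_PRIORITY

def group_into_episodes(records, max_gap_days=280):
    """Group one patient's records into episodes.

    A new episode starts when the gap since the previous record exceeds
    max_gap_days (an assumed threshold, not taken from the paper).
    """
    episodes = []
    for rec in sorted(records, key=lambda r: r.event_date):
        if episodes and (rec.event_date - episodes[-1][-1].event_date).days <= max_gap_days:
            episodes[-1].append(rec)
        else:
            episodes.append([rec])
    return episodes

def best_outcome(episode):
    """Return the outcome record from the highest-priority source, or None."""
    outcomes = [r for r in episode if r.category in ("delivery", "loss")]
    if not outcomes:
        return None
    return min(outcomes, key=lambda r: (SOURCE_PRIORITY[r.source], r.event_date))

records = [
    Record(1, date(2010, 1, 10), "antenatal", "antenatal_code"),
    Record(1, date(2010, 8, 1), "delivery", "outcome_code"),
    Record(1, date(2010, 8, 2), "delivery", "delivery_record"),
    Record(1, date(2012, 3, 1), "antenatal", "antenatal_code"),
]
episodes = group_into_episodes(records)
```

In this toy input the first three records fall into one episode and the 2012 record starts a second episode with no recorded outcome; the hierarchy selects the `delivery_record` date for the first episode.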

RESULTS: Our algorithm identified 5.8 million pregnancies among 2.4 million women (January 1987-February 2018). We observed close agreement with hospitalisation data regarding completeness of pregnancy outcomes (91% sensitivity for deliveries and 77% for pregnancy losses) and their timing (median 0 days difference, interquartile range 0-2 days). Miscarriage and prematurity rates were consistent with population figures, although termination and, to a lesser extent, live birth rates were underestimated in the Register.
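
As a rough illustration of how agreement measures of this kind can be computed, the sketch below takes the linked hospitalisation outcomes as the reference standard and calculates sensitivity and the median absolute difference in outcome dates. The dictionary layout, the linkage keys, and the dates are hypothetical toy data, not figures from the study.

```python
from datetime import date
from statistics import median

def sensitivity(reference, register):
    """Proportion of reference outcomes that also appear in the Register."""
    matched = [key for key in reference if key in register]
    return len(matched) / len(reference)

def date_differences(reference, register):
    """Absolute difference in days for outcomes found in both sources."""
    return [
        abs((register[key] - reference[key]).days)
        for key in reference
        if key in register
    ]

# Toy data keyed by a hypothetical (patient, pregnancy) linkage identifier.
hospital = {
    "p1-a": date(2015, 3, 1),
    "p2-a": date(2016, 7, 9),
    "p3-a": date(2017, 1, 20),
    "p4-a": date(2017, 5, 5),
}
register = {
    "p1-a": date(2015, 3, 1),
    "p2-a": date(2016, 7, 9),
    "p3-a": date(2017, 1, 22),
}

sens = sensitivity(hospital, register)
diffs = date_differences(hospital, register)
med = median(diffs)
```

Here three of the four hospital-recorded outcomes are found in the toy Register, giving a sensitivity of 0.75, and the median date difference is 0 days.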

CONCLUSIONS: The Pregnancy Register offers huge research potential because of its large size, high completeness, and availability. Further validation work is underway to enhance this data resource and identify optimal approaches for its use.