The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation
22 March 2021
Samuel Lambert and colleagues have developed an open resource of published PGSs, including full scoring information annotated with expertly curated metadata.
Overview
Polygenic scores (PGSs), also known as polygenic risk scores, have immense potential for helping to predict people’s likelihood of developing many serious illnesses. They work by aggregating the effects of many genetic variants, each typically small on its own, into a single number. However, progress is being hindered by a lack of common standards and best practice. The new Polygenic Score (PGS) Catalog is a major step forward.
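To make the idea of "aggregating into a single number" concrete, here is a minimal sketch of a score as a weighted sum: each variant's weight is multiplied by the number of copies of the effect allele a person carries (0, 1 or 2), and the results are added up. The variant IDs, weights and the function name `polygenic_score` are hypothetical and chosen only for illustration; published scores differ in how they handle missing genotypes and normalisation.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of effect-allele dosages.

    dosages: variant ID -> copies of the effect allele carried (0, 1 or 2)
    weights: variant ID -> per-allele effect weight
    Variants missing from `dosages` are treated as 0 copies, which is a
    simplifying assumption made for this sketch.
    """
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in weights.items())


# Hypothetical weights for three made-up variants.
example_weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}

# Hypothetical genotype for one person.
example_dosages = {"rs0001": 2, "rs0002": 1, "rs0003": 0}

example_score = polygenic_score(example_dosages, example_weights)
print(f"score = {example_score:.2f}")
```

A single raw score means little in isolation; it is usually interpreted relative to the distribution of scores in a reference population.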
The challenge
PGSs could be invaluable for quantifying a patient’s risk of everything from heart disease to certain cancers. They could also help predict the prognosis and best treatment options. Other potential uses include the identification of new disease biomarkers and drug targets.
The area is attracting a lot of interest, with around 1,000 PGS-related articles published since 2009. However, there are major challenges because researchers lack consensus on what to report and how to express it. The underreporting of key PGS information has made it difficult to compare different scores that predict the same disease and to test which are the most accurate.
The solution
A new paper in Nature Genetics (March 2021) by Dr Samuel Lambert (University of Cambridge and an HDR UK Associated Researcher) et al. describes how the new catalogue helps. It is an open resource of published PGSs that includes full scoring information annotated with expertly curated metadata. It allows PGSs to be reused and evaluated so their accuracy and clinical value can be fully assessed.
Impact and outcomes
The catalogue has transformative potential to help a young area of research deliver real clinical value. It is already being used extensively and holds more than 140 papers, demonstrating its capacity to harmonise and unify them in one database.
More material is being added and the project is moving into new phases, partly thanks to HDR UK funding to support the employment of a software developer and two trainee curators.
The team is adding new features to make the data even more usable by providing the scores in a way that can more easily be applied by users of genetic data. Work is also underway to better catalogue how well the scores perform in different ancestry groups.
HDR UK aims and priorities
The catalogue contributes directly to the HDR UK objectives of:
- Improving healthy life expectancy for people living with a common disease
- Developing and applying advanced health data science to address major health challenges
- Creating a world-leading data infrastructure and UK-wide approach to secure data services
It also addresses the HDR UK research priorities for:
- Understanding the causes of diseases
- Improving public health
- Better care
- Human phenome
- Applied analytics
Team and authors
Dr Lambert’s work was supervised by Professor Michael Inouye and Dr Jackie MacArthur, now Research Project Manager at the BHF Data Science Centre.
The catalogue is funded by the University of Cambridge, the European Bioinformatics Institute (EMBL-EBI) and the Baker Institute.
Abstract
We present the Polygenic Score (PGS) Catalog (https://www.PGSCatalog.org), an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications. The PGS Catalog has capabilities for user deposition, expert curation and programmatic access, thus providing the community with a platform for PGS dissemination, research and translation.
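As an illustration of what "variants, alleles and weights" might look like when read by a program, the sketch below parses a small tab-delimited table into records. The column names (`rsID`, `effect_allele`, `effect_weight`), the `#` comment convention, the sample rows and the function name `read_scoring_text` are all assumptions made for this example; the Catalog's own documentation defines the actual file layout.

```python
import csv
import io


def read_scoring_text(text):
    """Parse tab-delimited scoring rows, skipping blank and '#' comment lines."""
    lines = [line for line in text.splitlines()
             if line.strip() and not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return [
        {
            "variant": row["rsID"],
            "effect_allele": row["effect_allele"],
            "weight": float(row["effect_weight"]),
        }
        for row in reader
    ]


# Hypothetical file contents, not taken from any real score.
EXAMPLE_TEXT = (
    "# hypothetical metadata line\n"
    "rsID\teffect_allele\teffect_weight\n"
    "rs0001\tA\t0.12\n"
    "rs0002\tG\t-0.05\n"
)

records = read_scoring_text(EXAMPLE_TEXT)
for record in records:
    print(record["variant"], record["effect_allele"], record["weight"])
```

Keeping the effect allele alongside each weight matters because a weight only has meaning relative to the allele it was estimated for.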