Our increased understanding of the genes involved in cancer development and progression has ushered in the exciting new era of precision medicine. However, the tsunami of data that has emerged can be confusing, making it difficult to distinguish between gene mutations with clinical relevance and gene variants of uncertain significance.

Over 700 genes have been associated with cancer to date, and hundreds of thousands of changes in these genes may be implicated in cancer development and progression. This highlights the need for a searchable data dictionary that yields the breadth and depth of evidence needed for targeted clinical intervention. While data resources exist, they are fragmented, siloed and, in many cases, incompatible with each other. Now, a team of experts from both sides of the Atlantic has brought all this disparate information together in a single searchable online resource.

The Variant Interpretation for Cancer Consortium (VICC) is a Driver Project of the Global Alliance for Genomics and Health (GA4GH), a worldwide organisation dedicated to effective and secure sharing of data to accelerate knowledge and enhance clinical care. VICC has emerged from the previous work of the GA4GH Clinical Working Group. In the current issue of the premier scientific journal Nature Genetics, the Consortium describes the creation and application of the VICC Meta-Knowledgebase, which aims to aggregate all known information about gene mutations and variants of uncertain significance into a single searchable resource for the cancer community.

“The new resource serves as a kind of translator so that all of this information generated by different research efforts can be searched simultaneously and their results presented in a common language,” said Obi Griffith, assistant director of the McDonnell Genome Institute (MGI) at Washington University School of Medicine in St. Louis and one of three senior authors on the paper.

“Distinguishing the variants of uncertain significance from gene mutations that are disease-causing or predict patient outcomes or treatment responses is critical to providing effective care to cancer patients and those at risk of developing disease,” said Malachi Griffith, Obi’s twin brother, also an assistant director of MGI and another senior author on the paper.

The VICC meta-knowledgebase represents an unprecedented framework for structuring and harmonizing clinical interpretations across the entire world’s knowledge of cancer-related genetic variation. “We relied on established community resources, standards, and guidelines to transform the data from each research effort into a consistent vocabulary,” said Alex Wagner, a post-doctoral fellow in the Griffith Lab and first author on the paper. “As a result, we have consolidated interpretations into a single, harmonized open-access meta-knowledgebase that contains 12,856 harmonized interpretations supported by 4,354 distinct scientific publications.”
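As a rough illustration of what transforming data "into a consistent vocabulary" can involve, the Python sketch below maps records from two made-up knowledgebases onto one shared structure. It is not the consortium's code: the field names, the evidence labels, the `Interpretation` class and the `harmonize` function are all assumptions invented for this example.

```python
from dataclasses import dataclass

# Hypothetical mapping from source-specific evidence labels to one shared
# vocabulary. The labels are illustrative and are not taken from VICC.
EVIDENCE_LEVELS = {
    "A": "guideline",
    "Level 1": "guideline",
    "B": "clinical_trial",
    "Level 3": "clinical_trial",
}


@dataclass(frozen=True)
class Interpretation:
    """One clinical interpretation expressed in the shared vocabulary."""
    gene: str
    variant: str
    disease: str
    evidence: str
    source: str


def harmonize(record: dict, source: str, field_map: dict) -> Interpretation:
    """Rename source-specific fields, then normalize their values."""
    values = {
        common: str(record[original]).strip()
        for common, original in field_map.items()
    }
    return Interpretation(
        gene=values["gene"].upper(),
        variant=values["variant"],
        disease=values["disease"].lower(),
        evidence=EVIDENCE_LEVELS.get(values["evidence"], "unclassified"),
        source=source,
    )


# Two invented records describing the same finding in different "dialects".
kb_a_record = {"gene_symbol": "braf", "alteration": "V600E",
               "cancer_type": "Melanoma", "level": "A"}
kb_b_record = {"Gene": "BRAF", "Variant": " V600E",
               "Disease": "melanoma", "Evidence": "Level 1"}

a = harmonize(kb_a_record, "kb_a", {"gene": "gene_symbol", "variant": "alteration",
                                    "disease": "cancer_type", "evidence": "level"})
b = harmonize(kb_b_record, "kb_b", {"gene": "Gene", "variant": "Variant",
                                    "disease": "Disease", "evidence": "Evidence"})

# Apart from their source label, the two records are now identical.
same_finding = (a.gene, a.variant, a.disease, a.evidence) == \
               (b.gene, b.variant, b.disease, b.evidence)
```

Once records share one structure, a single query can search every source at the same time, which is the "translator" role described above.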

“The cancer variation classification world is a fragmented one,” said Mark Lawler, Professor of Digital Health at Queen’s University Belfast, Associate Director of Health Data Research Wales-Northern Ireland, and Scientific Director of DATA-CAN, the UK Health Data Research Hub for Cancer, and an author on the paper. “Hundreds of siloed efforts have sprung up around the globe to collate information from clinicians and researchers about whether any particular variant is worthy of concern. But until the work of VICC, none of these knowledgebases spoke the same language. Our ‘data translator’ has harmonised the information and made it freely available online to the entire cancer community.”

“This work was a few years in the making, and is ongoing,” said Wagner, who noted that the six knowledgebases currently aggregated by the VICC meta-knowledgebase represent some of the world’s preeminent publicly accessible knowledge on clinical interpretations of cancer variants. “But there are also a multitude of knowledgebases from academic and clinical centres that would likely benefit from adopting the harmonization framework we outline in our work.” The team is encouraging institutions that maintain such resources to participate in the VICC to increase global consensus on the clinical relevance of each variant and improve cancer care around the world.

“Our clear focus in GA4GH is on the development and provision of robust standards for the clinical and scientific community, so that we realise the enormous power of data,” said Ewan Birney, Director of EMBL’s European Bioinformatics Institute and Chair of GA4GH. “The VICC meta-knowledgebase is an exemplar project which demonstrates what we can achieve when we harness the resources of the data science community to a global challenge.”

“This is a very exciting initiative that highlights the important role of enabling access to data at scale to drive increased knowledge and enhanced provision of modern healthcare,” said Caroline Cake, newly appointed CEO of Health Data Research UK (HDR UK), the national institute for health data in the UK. “We at HDR UK see the value of such a meta-knowledgebase, particularly as it has been made available online to the community to enhance cancer data and its application to improving care for cancer patients.”

The paper, entitled “A harmonized meta-knowledgebase of clinical interpretations of somatic genomic variants in cancer,” is published in Nature Genetics.