Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC
Open Access
- 7 August 2019
- journal article
- research article
- Published by Georg Thieme Verlag KG in Applied Clinical Informatics
- Vol. 10 (04), 679-692
- https://doi.org/10.1055/s-0039-1695793
Abstract
Background High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks.
Objectives To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task.
Methods Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application.
Results The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48% of the 1,492 local biobank terms, and human experts were able to complete the remaining mappings. We used the expert-curated mappings to successfully process 147,608 data records from 3,415 patients.
Conclusion A generic harmonization approach was created and successfully used for cross-institutional data harmonization across 10 European biobanks. The software tools were made available as open source.
The experiments were performed using anonymized patient data. The authors therefore declare that this study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects.
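The Methods paragraph describes a lexical bag-of-words matcher with fuzzy matching and synonym support. The following is a minimal sketch of that general idea, not the authors' implementation: the synonym table, the central term list, the thresholds, and all function names are hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for whatever string-similarity measure the actual tool uses. Sentiment tagging is not covered.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real one would come from the metadata repository.
SYNONYMS = {"sex": "gender", "dob": "birthdate"}

# Hypothetical central terminology, for illustration only.
CENTRAL_TERMS = ["Gender", "Age at diagnosis", "Histopathology"]


def tokens(term):
    """Lowercase a term, split on non-alphanumerics, and normalize synonyms."""
    cleaned = "".join(c if c.isalnum() else " " for c in term.lower())
    return {SYNONYMS.get(word, word) for word in cleaned.split()}


def word_similarity(a, b):
    """Fuzzy similarity of two words in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def bag_score(local, central, threshold=0.8):
    """Share of words with a fuzzy counterpart, relative to the larger bag."""
    local_words, central_words = tokens(local), tokens(central)
    if not local_words or not central_words:
        return 0.0
    hits = sum(
        1
        for c in central_words
        if max(word_similarity(c, l) for l in local_words) >= threshold
    )
    return hits / max(len(local_words), len(central_words))


def best_match(local, central_terms=CENTRAL_TERMS, min_score=0.5):
    """Return (central_term, score) for the best candidate, or None."""
    scored = [(bag_score(local, c), c) for c in central_terms]
    score, term = max(scored)
    return (term, score) if score >= min_score else None
```

In a semiautomatic workflow of the kind the abstract outlines, a `None` result (or a low score) would be the point at which a human expert completes the mapping by hand.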