Eight practices for data management to enable team data science

Open Access

23 June 2020

journal article
research article
Published by Cambridge University Press (CUP) in Journal of Clinical and Translational Science

Vol. 5 (1), 1-21
https://doi.org/10.1017/cts.2020.501

Abstract

Introduction: In clinical and translational research, data science is often, and fortuitously, integrated with data collection. This contrasts with the typical position of data scientists in other settings, where they are isolated from data collectors. Because of this integration, effective use of data science techniques to resolve translational questions requires innovation in the organization and management of these data.

Methods: We propose an operational framework that respects this important difference in how research teams are organized. To maximize the accuracy and speed of the clinical and translational data science enterprise under this framework, we define a set of eight best practices for data management.

Results: In our own work at the University of Rochester, we have striven to implement these practices in a customized version of the open source LabKey platform for integrated data management and collaboration. We have applied this platform to cohorts that longitudinally track multidomain data from over 3000 subjects.

Conclusions: We argue that this approach has made analytical datasets more readily available and lowered the bar to interdisciplinary collaboration, enabling a team-based data science that is unique to the clinical and translational setting.
