Insights from Adopting a Data Commons Approach for Large-scale Observational Cohort Studies: The California Teachers Study
- 11 February 2020
- journal article
- research article
- Published by American Association for Cancer Research (AACR) in Cancer Epidemiology, Biomarkers & Prevention
- Vol. 29 (4), 777-786
- https://doi.org/10.1158/1055-9965.EPI-19-0842
Abstract
Background: Large-scale cancer epidemiology cohorts (CEC) have successfully collected, analyzed, and shared patient-reported data for years. CECs increasingly need to make their data more findable, accessible, interoperable, and reusable, or FAIR. How CECs should approach this transformation is unclear.
Methods: The California Teachers Study (CTS) is an observational CEC of 133,477 participants followed since 1995-1996. In 2014, we began updating our data storage, management, analysis, and sharing strategy. With the San Diego Supercomputer Center, we deployed a new infrastructure based on a data warehouse to integrate and manage data and a secure and shared workspace with documentation, software, and analytic tools that facilitate collaboration and accelerate analyses.
Results: Our new CTS infrastructure includes a data warehouse and data marts, which are focused subsets from the data warehouse designed for efficiency. The secure CTS workspace utilizes a remote desktop service that operates within a Health Insurance Portability and Accountability Act (HIPAA)- and Federal Information Security Management Act (FISMA)-compliant platform. Our infrastructure offers broad access to CTS data, includes statistical analysis and data visualization software and tools, flexibly manages other key data activities (e.g., cleaning, updates, and data sharing), and will continue to evolve to advance FAIR principles.
Conclusions: Our scalable infrastructure provides the security, authorization, data model, metadata, and analytic tools needed to manage, share, and analyze CTS data in ways that are consistent with the NCI's Cancer Research Data Commons Framework.
Impact: The CTS's implementation of new infrastructure in an ongoing CEC demonstrates how population sciences can explore and embrace new cloud-based and analytics infrastructure to accelerate cancer research and translation.
Funding Information
- NCI
- NIH (U01-CA199277, P30-CA033572, P30-CA023100, UM1-CA164917, R01-CA077398)
- California Department of Public Health (103885)
- Centers for Disease Control and Prevention's National Program of Cancer Registries (5NU58DP006344)
- National Cancer Institute's Surveillance, Epidemiology and End Results Program (HHSN261201800032I, HHSN261201800015I, HHSN261201800009I)