Knowledge Graph-Enabled Cancer Data Analytics
Open Access
- 4 May 2020
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Journal of Biomedical and Health Informatics
- Vol. 24 (7), 1952-1967
- https://doi.org/10.1109/jbhi.2020.2990797
Abstract
Cancer registries collect unstructured and structured cancer data for surveillance purposes, and these data provide important insights regarding cancer characteristics, treatments, and outcomes. Cancer registry data typically (1) categorize each reportable cancer case or tumor at the time of diagnosis, (2) contain demographic information about the patient such as age, gender, and location at time of diagnosis, (3) include planned and completed primary treatment information, and (4) may contain survival outcomes. As structured data are extracted from various unstructured sources, such as pathology reports, radiology reports, and medical records, and stored for reporting and other needs, the information associated with a reportable cancer is constantly expanding and evolving. While popular analytic tools such as SEER*Stat and SAS exist, we provide a knowledge graph approach to organizing cancer registry data. Our approach offers unique advantages for timely data analysis and for the presentation and visualization of valuable information. This knowledge graph approach semantically enriches the data and makes it easy to link with third-party data that can help explain variation in cancer incidence patterns, disparities, and outcomes. We developed a prototype knowledge graph based on the Louisiana Tumor Registry dataset. We present the advantages of the knowledge graph approach by examining: i) scenario-specific queries, ii) links with openly available external datasets, iii) schema evolution for iterative analysis, and iv) data visualization. Our results demonstrate that this graph-based solution can perform complex queries, improve query run-time performance by up to 76%, and more easily conduct iterative analyses to enhance researchers' understanding of cancer registry data.
Funding Information
- U.S. Department of Energy
- National Cancer Institute
- National Institutes of Health
- Argonne National Laboratory (DE-AC02-06-CH11357)
- Lawrence Livermore National Laboratory (DE-AC52-07NA27344)
- Los Alamos National Laboratory (DE-AC52-06NA25396)
- Oak Ridge National Laboratory (DE-AC05-00OR22725)
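To make the abstract's knowledge graph idea concrete, here is a minimal sketch in Python. It assumes a toy in-memory triple store rather than whatever graph database or schema the authors actually used, which the abstract does not specify. The predicate names (`hasPatient`, `residesIn`, `medianHouseholdIncome`, `receptorStatus`), the parish nodes, the helper `breast_tumors_low_income`, and all values are hypothetical and are not drawn from the Louisiana Tumor Registry. The sketch illustrates three of the advantages the abstract lists: a scenario-specific query, linking an external dataset by attaching triples to shared nodes, and schema evolution by adding a new predicate.

```python
# Minimal in-memory triple store: a sketch of the knowledge graph idea,
# NOT the authors' implementation. All identifiers and values are hypothetical.

class TripleStore:
    def __init__(self) -> None:
        self.triples: set = set()  # each element is a (subject, predicate, object) tuple

    def add(self, s: str, p: str, o: object) -> None:
        self.triples.add((s, p, o))

    def objects(self, s: str, p: str) -> list:
        """All objects o such that (s, p, o) is in the graph."""
        return [o for (s2, p2, o) in self.triples if s2 == s and p2 == p]

    def subjects(self, p: str, o: object) -> list:
        """All subjects s such that (s, p, o) is in the graph."""
        return [s for (s, p2, o2) in self.triples if p2 == p and o2 == o]

    def one(self, s: str, p: str) -> object:
        """First object for (s, p), or None if the fact is absent."""
        vals = self.objects(s, p)
        return vals[0] if vals else None


kg = TripleStore()

# Toy registry facts (made-up values, not real registry data).
for tumor, patient, site, stage, age, parish in [
    ("tumor:1", "patient:1", "breast", "II", 48, "parish:A"),
    ("tumor:2", "patient:2", "breast", "I", 67, "parish:B"),
    ("tumor:3", "patient:3", "lung", "III", 55, "parish:A"),
]:
    kg.add(tumor, "hasPatient", patient)
    kg.add(tumor, "primarySite", site)
    kg.add(tumor, "stage", stage)
    kg.add(patient, "ageAtDiagnosis", age)
    kg.add(patient, "residesIn", parish)

# Linking a third-party dataset: just more triples about the same parish nodes.
kg.add("parish:A", "medianHouseholdIncome", 38000)
kg.add("parish:B", "medianHouseholdIncome", 52000)

# Schema evolution: a newly extracted attribute is simply a new predicate;
# existing triples are untouched.
kg.add("tumor:1", "receptorStatus", "triple-negative")


def breast_tumors_low_income(kg: TripleStore, income_cap: int) -> list:
    """Scenario query: breast tumors whose patient lives in a parish
    with median household income below income_cap."""
    hits = []
    for tumor in kg.subjects("primarySite", "breast"):
        patient = kg.one(tumor, "hasPatient")
        parish = kg.one(patient, "residesIn")
        income = kg.one(parish, "medianHouseholdIncome")
        if income is not None and income < income_cap:
            # receptorStatus may be missing for older records; one() returns None then.
            hits.append((tumor, kg.one(tumor, "receptorStatus")))
    return sorted(hits)


print(breast_tumors_low_income(kg, 45000))  # -> [('tumor:1', 'triple-negative')]
```

Because every fact is a triple, the income data and the new receptor attribute join the graph without altering existing records or the traversal logic; in a relational layout the same changes would typically require new columns or join tables. A production system would more likely use an RDF store queried with SPARQL, or a property-graph database, than this linear-scan toy.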