Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services

Open Access

11 December 2017

journal article
research article
Published by Hindawi Limited in Computational and Mathematical Methods in Medicine

Vol. 2017, 1-16
https://doi.org/10.1155/2017/6120820

Abstract

Big data analytics (BDA) is important for reducing healthcare costs. However, it poses many challenges in data aggregation, maintenance, integration, translation, analysis, and security/privacy. The objective of this study was to establish an interactive BDA platform with simulated patient data using open-source software technologies. This was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) with HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to HBase store files revealed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a week for 10 TB and a month for 30 TB (three billion indexed patient records). Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance, with high usability for technical support but poor usability for clinical services. Building a hospital system based on patient-centric data was challenging with HBase, in that not all data profiles were fully integrated with the complex patient-to-hospital relationships. Nevertheless, we recommend using HBase to secure patient data while querying entire hospital volumes in a simplified clinical event model across clinical services.
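
To put the reported ingestion times in perspective, the following is a rough back-of-envelope sketch of the average throughput they imply. It is not taken from the paper: the function name `implied_throughput_mb_per_s` is hypothetical, and the calculation assumes decimal terabytes, a 7-day week, and a 30-day month.

```python
# Back-of-envelope throughput implied by the durations quoted in the abstract.
# Assumptions (not from the paper): decimal terabytes (10**12 bytes),
# a 7-day week, and a 30-day month.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6


def implied_throughput_mb_per_s(terabytes: float, days: float) -> float:
    """Average rate in MB/s needed to move `terabytes` of data in `days`."""
    total_bytes = terabytes * BYTES_PER_TB
    total_seconds = days * SECONDS_PER_DAY
    return total_bytes / total_seconds / BYTES_PER_MB


# 10 TB in one week and 30 TB in one month, as stated in the abstract.
week_rate = implied_throughput_mb_per_s(10, 7)
month_rate = implied_throughput_mb_per_s(30, 30)

if __name__ == "__main__":
    print(f"10 TB in 7 days:  {week_rate:.1f} MB/s")
    print(f"30 TB in 30 days: {month_rate:.1f} MB/s")
```

Under these assumptions the averages come out to roughly 16.5 MB/s and 11.6 MB/s, which gives a sense of scale for the abstract's remark that MapReduce limited efficient data generation and replication.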

Funding Information

Vancouver Island Health Authority
