Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services
Open Access
- 11 December 2017
- journal article
- research article
- Published by Hindawi Limited in Computational and Mathematical Methods in Medicine
- Vol. 2017, 1-16
- https://doi.org/10.1155/2017/6120820
Abstract
Big data analytics (BDA) is important for reducing healthcare costs. However, it poses many challenges in data aggregation, maintenance, integration, translation, analysis, and security/privacy. The objective of this study was to establish an interactive BDA platform with simulated patient data using open-source software technologies. This was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) with HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to HBase store files showed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a week for 10 TB and a month for 30 TB (three billion indexed patient records). Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance with high usability for technical support but poor usability for clinical services. Building a hospital system on patient-centric data was challenging with HBase, because not all data profiles were fully integrated with the complex patient-to-hospital relationships. Nevertheless, we recommend using HBase to keep patient data secured while querying entire hospital volumes in a simplified clinical event model across clinical services.
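To put the reported load times in perspective, the following is a minimal back-of-the-envelope sketch of the average sustained throughput they imply. It is not taken from the study: it assumes decimal terabytes (10^12 bytes), a 30-day month, and a perfectly even ingestion rate, and the helper name `sustained_mb_per_s` is hypothetical.

```python
# Rough throughput implied by the load times quoted in the abstract.
# Assumptions (not from the source): decimal TB, 30-day month, constant rate.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def sustained_mb_per_s(terabytes, days):
    """Average throughput in MB/s needed to move `terabytes` in `days`."""
    total_bytes = terabytes * BYTES_PER_TB
    return total_bytes / (days * SECONDS_PER_DAY) / 10**6


week_rate = sustained_mb_per_s(10, 7)    # 10 TB in one week
month_rate = sustained_mb_per_s(30, 30)  # 30 TB in one month (assumed 30 days)

print(f"10 TB / week : {week_rate:.1f} MB/s")
print(f"30 TB / month: {month_rate:.1f} MB/s")
```

Under these assumptions the figures come out to roughly 16.5 MB/s for the 10 TB load and 11.6 MB/s for the 30 TB load, so the larger load ran at a lower average rate than the smaller one.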
Funding Information
- Vancouver Island Health Authority