Multi-Objective Big Data View Materialization Using NSGA-II

1 April 2021

journal article
research article
Published by IGI Global in Information Resources Management Journal

Vol. 34 (2), 1-28
https://doi.org/10.4018/irmj.2021040101

Abstract

Big data views, in the context of a distributed file system (DFS), are defined over voluminous structured, semi-structured, and unstructured data with the purpose of reducing the response time of queries over Big data. Because the semi-structured and unstructured data in Big data are very large compared to the structured data, a framework based on query attributes can be used to identify Big data views. Materializing Big data views can improve query response time and facilitate efficient distribution of data across a DFS-based application. Since not all Big data views can be materialized, a subset must be selected for materialization. The purpose of this selection is to improve query response time subject to resource constraints. The Big data view materialization problem was defined as a bi-objective problem with two objectives, minimization of the query evaluation cost and minimization of the update processing cost, subject to a constraint on the total size of the materialized views. This paper addresses the problem using the multi-objective genetic algorithm NSGA-II. The experimental results show that the proposed NSGA-II-based Big data view selection algorithm is able to select reasonably good quality views for materialization.
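
The bi-objective formulation in the abstract can be illustrated with a small sketch. The Python below is not the paper's algorithm and not NSGA-II itself: it exhaustively enumerates subsets of four hypothetical candidate views, discards those that exceed an assumed size budget, and keeps the non-dominated (Pareto-optimal) subsets with respect to query evaluation cost and update processing cost. NSGA-II ranks its population with the same dominance relation, but it searches the space with genetic operators instead of enumeration. All view names, sizes, costs, and the additive cost model are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical candidate views: name -> (size, query_cost_saving, update_cost).
# These numbers are illustrative assumptions, not data from the paper.
VIEWS = {
    "v1": (4, 50, 8),
    "v2": (3, 30, 2),
    "v3": (5, 45, 9),
    "v4": (2, 10, 1),
}
BASE_QUERY_COST = 200  # assumed cost of answering all queries with no views
SIZE_BUDGET = 8        # assumed limit on the total size of materialized views


def objectives(subset):
    """Return (query_cost, update_cost) for a set of view names; both are minimized."""
    saving = sum(VIEWS[v][1] for v in subset)
    update = sum(VIEWS[v][2] for v in subset)
    return (BASE_QUERY_COST - saving, update)


def feasible(subset):
    """A subset is feasible if its total size fits within the budget."""
    return sum(VIEWS[v][0] for v in subset) <= SIZE_BUDGET


def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front():
    """Enumerate feasible subsets and keep those not dominated by any other."""
    names = sorted(VIEWS)
    candidates = [
        frozenset(c)
        for r in range(len(names) + 1)
        for c in combinations(names, r)
        if feasible(c)
    ]
    scored = {s: objectives(s) for s in candidates}
    return {
        s: f
        for s, f in scored.items()
        if not any(dominates(g, f) for g in scored.values())
    }


if __name__ == "__main__":
    # Print the trade-off curve, ordered by query cost.
    for subset, (query_cost, update_cost) in sorted(
        pareto_front().items(), key=lambda item: item[1]
    ):
        print(sorted(subset), query_cost, update_cost)
```

Under these assumed numbers, materializing nothing and materializing {v1, v2} both lie on the front: the first has the lowest update cost, the second the lowest query cost. No single subset is best on both objectives, which is why the problem is treated as multi-objective.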

