QMapper for Smart Grid

Abstract

Apache Hive is widely used by Internet companies for big data analytics applications. It compiles high-level language queries into efficient MapReduce workflows, which frees users from complicated and time-consuming programming. The popularity of Hive and of HiveQL-compatible systems such as Impala and Shark has attracted attention from traditional enterprises as well. However, enterprise big data processing systems, such as Smart Grid applications, often have to migrate their RDBMS-based legacy applications to Hive rather than write new logic directly in HiveQL. Because the two differ in syntax and cost model, manually translating SQL written for an RDBMS into HiveQL is difficult and error-prone, and often leads to poor performance. In this paper, we propose QMapper, a tool for automatically translating SQL into proper HiveQL. QMapper consists of a rule-based rewriter and a cost-based optimizer. Experiments based on the TPC-H benchmark demonstrate that, compared to manually rewritten Hive queries provided by Hive contributors, QMapper dramatically reduces the query latency on average. Our real-world Smart Grid application also shows its efficiency.

Funding Information

National Natural Science Foundation of China (61020106002, 61070027, 61161160566)
