QMapper for Smart Grid
- 27 May 2015
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
- p. 647-658
- https://doi.org/10.1145/2723372.2742792
Abstract
Apache Hive has been widely used by Internet companies for big data analytics applications. It can provide the capability of compiling high-level languages into efficient MapReduce workflows, which frees users from complicated and time-consuming programming. The popularity of Hive and its HiveQL-compatible systems like Impala and Shark attracts attention from traditional enterprises as well. However, enterprise big data processing systems such as Smart Grid applications often have to migrate their RDBMS-based legacy applications to Hive rather than directly writing new logic in HiveQL. Considering their differences in syntax and cost model, manual translation from SQL in RDBMS to HiveQL is very difficult, error-prone, and often leads to poor performance. In this paper, we propose QMapper, a tool for automatically translating SQL into proper HiveQL. QMapper consists of a rule-based rewriter and a cost-based optimizer. The experiments based on the TPC-H benchmark demonstrate that, compared to manually rewritten Hive queries provided by Hive contributors, QMapper dramatically reduces the query latency on average. Our real-world Smart Grid application also shows its efficiency.
Funding Information
- National Natural Science Foundation of China (61020106002, 61070027, 61161160566)
This publication has 18 references indexed in Scilit:
- DualTable: A hybrid storage model for update optimization in Hive. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- DGFIndex for smart grid. Proceedings of the VLDB Endowment, 2014
- SQL-on-Hadoop. Proceedings of the VLDB Endowment, 2014
- Interactive analytical processing in big data systems. Proceedings of the VLDB Endowment, 2012
- Stubby. Proceedings of the VLDB Endowment, 2012
- Query optimization for massively parallel data processing. Published by Association for Computing Machinery (ACM), 2011
- HadoopDB. Proceedings of the VLDB Endowment, 2009
- Hive. Proceedings of the VLDB Endowment, 2009
- Performance tradeoffs for client-server query processing. ACM SIGMOD Record, 1996
- Starburst mid-flight: as the dust clears (database project). IEEE Transactions on Knowledge and Data Engineering, 1990