Llama
- 12 June 2011
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 961-972
- https://doi.org/10.1145/1989323.1989424
Abstract
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted a cluster-based architecture. In this paper, we propose the design of a new cluster-based data warehouse system, Llama, a hybrid data management system that combines the features of row-wise and column-wise database systems. In Llama, columns are formed into correlation groups to provide the basis for the vertical partitioning of tables. Llama employs a distributed file system (DFS) to disseminate data among cluster nodes. Above the DFS, a MapReduce-based query engine is supported. We design a new join algorithm to facilitate fast join processing. We present a performance study on the TPC-H dataset and compare Llama with Hive, a data warehouse infrastructure built on top of Hadoop. The experiment is conducted on EC2. The results show that Llama has excellent load performance and that its query performance is significantly better than that of the traditional MapReduce framework based on row-wise storage.
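The idea of vertically partitioning a table by column groups can be illustrated with a minimal Python sketch. It is not Llama's implementation: the sample rows, the group assignment in `GROUPS`, and the helper names `partition_by_groups`, `scan_column`, and `reconstruct` are all hypothetical, and the paper's actual grouping criteria, file layout, and join algorithm are not reproduced here. The sketch only shows that each group stores a subset of columns together with a row position, so a query can read one group without touching the others.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]
Partition = List[Tuple[int, Tuple[object, ...]]]

# Hypothetical sample table, loosely shaped like a TPC-H orders relation.
ROWS: List[Row] = [
    {"orderkey": 1, "custkey": 10, "totalprice": 99.5, "comment": "a"},
    {"orderkey": 2, "custkey": 11, "totalprice": 12.0, "comment": "b"},
    {"orderkey": 3, "custkey": 10, "totalprice": 47.25, "comment": "c"},
]

# Assumed column groups; how the paper chooses them is not modeled.
GROUPS: Dict[str, Sequence[str]] = {
    "keys": ("orderkey", "custkey"),
    "facts": ("totalprice",),
    "text": ("comment",),
}


def partition_by_groups(
    rows: List[Row], groups: Dict[str, Sequence[str]]
) -> Dict[str, Partition]:
    """Build one vertical partition per column group.

    Every entry keeps the row position so tuples can be reassembled later.
    """
    return {
        name: [
            (row_id, tuple(row[col] for col in columns))
            for row_id, row in enumerate(rows)
        ]
        for name, columns in groups.items()
    }


def scan_column(
    partitions: Dict[str, Partition],
    groups: Dict[str, Sequence[str]],
    group_name: str,
    column: str,
) -> List[object]:
    """Read a single column by touching only the group that stores it."""
    idx = list(groups[group_name]).index(column)
    return [values[idx] for _, values in partitions[group_name]]


def reconstruct(
    partitions: Dict[str, Partition], groups: Dict[str, Sequence[str]]
) -> List[Row]:
    """Reassemble full rows by matching entries on their row position."""
    rows: Dict[int, Row] = {}
    for name, columns in groups.items():
        for row_id, values in partitions[name]:
            rows.setdefault(row_id, {}).update(zip(columns, values))
    return [rows[i] for i in sorted(rows)]


PARTITIONS = partition_by_groups(ROWS, GROUPS)
```

In this sketch, an aggregate over `totalprice` would call `scan_column(PARTITIONS, GROUPS, "facts", "totalprice")` and never read the `text` group, which is the general motivation for columnar layouts in scan-heavy warehouse queries.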
This publication has 12 references indexed in Scilit:
- MAP-JOIN-REDUCE: Toward Scalable and Efficient Data Analysis on Large Clusters. IEEE Transactions on Knowledge and Data Engineering, 2010
- The performance of MapReduce. Proceedings of the VLDB Endowment, 2010
- Dremel. Proceedings of the VLDB Endowment, 2010
- Making cloud intermediate data fault-tolerant. Association for Computing Machinery (ACM), 2010
- A comparison of join algorithms for log processing in MapReduce. Association for Computing Machinery (ACM), 2010
- Optimizing joins in a map-reduce environment. Association for Computing Machinery (ACM), 2010
- HadoopDB. Proceedings of the VLDB Endowment, 2009
- Materialization Strategies in a Column-Oriented DBMS. Institute of Electrical and Electronics Engineers (IEEE), 2007
- Optimizing database architecture for the new bottleneck: memory access. The VLDB Journal, 2000
- On searching transposed files. ACM Transactions on Database Systems, 1979