Biscuit: A Framework for Near-Data Processing of Big Data Workloads
- 25 August 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 153-165
- https://doi.org/10.1109/isca.2016.23
Abstract
Data-intensive queries are common in business intelligence, data warehousing and analytics applications. Typically, processing a query involves full inspection of large in-storage data sets by CPUs. An intuitive way to speed up such queries is to reduce the volume of data transferred over the storage network to a host system. This can be achieved by filtering out extraneous data within the storage, motivating a form of near-data processing. This work presents Biscuit, a novel near-data processing framework designed for modern solid-state drives. It allows programmers to write a data-intensive application to run on the host system and the storage system in a distributed, yet seamless manner. In order to offer a high-level programming model, Biscuit builds on the concept of data flow. Data processing tasks communicate through typed and data-ordered ports. Biscuit does not distinguish tasks that run on the host system and the storage system. As a result, Biscuit has desirable traits like generality and expressiveness, while promoting code reuse and naturally exposing concurrency. We implement Biscuit on a host system that runs the Linux OS and a high-performance solid-state drive. We demonstrate the effectiveness of our approach and implementation with experimental results. When data filtering is done by hardware in the solid-state drive, the average speed-up obtained for the top five queries of TPC-H is over 15x.
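The data-flow model described in the abstract can be illustrated with a minimal sketch. This is not Biscuit's API, which the abstract does not show: the names `Port`, `Task`, `run_pipeline` and the `location` label are hypothetical, and the whole pipeline runs in a single Python process. It only shows the idea of tasks connected by typed, order-preserving ports, where a filter task placed "in storage" reduces how many rows would reach the host.

```python
from collections import deque
from typing import Callable, Deque, Generic, Iterable, Iterator, List, Tuple, Type, TypeVar

T = TypeVar("T")


class Port(Generic[T]):
    """Hypothetical typed port: rejects wrong-typed items, preserves FIFO order."""

    def __init__(self, item_type: Type[T]) -> None:
        self.item_type = item_type
        self._queue: Deque[T] = deque()

    def put(self, item: T) -> None:
        if not isinstance(item, self.item_type):
            raise TypeError(f"port expects {self.item_type.__name__}")
        self._queue.append(item)

    def drain(self) -> Iterator[T]:
        # Yield items in the order they were put.
        while self._queue:
            yield self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)


class Task:
    """Hypothetical task: same code path whether labelled 'ssd' or 'host'."""

    def __init__(self, name: str, location: str,
                 fn: Callable[[object], Iterable[object]],
                 inp: Port, out: Port) -> None:
        self.name, self.location = name, location
        self.fn, self.inp, self.out = fn, inp, out

    def run(self) -> None:
        # Apply fn to each input item; fn returns zero or more outputs.
        for item in self.inp.drain():
            for result in self.fn(item):
                self.out.put(result)


def run_pipeline(rows: List[Tuple[int, float]],
                 threshold: float) -> Tuple[int, List[float]]:
    raw, filtered, prices = Port(tuple), Port(tuple), Port(float)
    for row in rows:
        raw.put(row)

    # Assumed placement: the filter runs near the data, the projection on the host.
    scan = Task("filter", "ssd",
                lambda r: [r] if r[1] > threshold else [], raw, filtered)
    project = Task("project", "host", lambda r: [r[1]], filtered, prices)

    scan.run()
    transferred = len(filtered)  # rows that would cross the storage interface
    project.run()
    return transferred, list(prices.drain())
```

In this sketch the `location` field is only a label; the point is that the task definition itself does not change with placement, which mirrors the abstract's statement that host-side and storage-side tasks are not distinguished.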