Biscuit: A Framework for Near-Data Processing of Big Data Workloads

25 August 2016

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 153-165
https://doi.org/10.1109/isca.2016.23

Abstract

Data-intensive queries are common in business intelligence, data warehousing and analytics applications. Typically, processing a query involves full inspection of large in-storage data sets by CPUs. An intuitive way to speed up such queries is to reduce the volume of data transferred over the storage network to a host system. This can be achieved by filtering out extraneous data within the storage, motivating a form of near-data processing. This work presents Biscuit, a novel near-data processing framework designed for modern solid-state drives. It allows programmers to write a data-intensive application to run on the host system and the storage system in a distributed, yet seamless manner. To offer a high-level programming model, Biscuit builds on the concept of data flow. Data processing tasks communicate through typed and data-ordered ports. Biscuit does not distinguish between tasks that run on the host system and tasks that run on the storage system. As a result, Biscuit has desirable traits like generality and expressiveness, while promoting code reuse and naturally exposing concurrency. We implement Biscuit on a host system that runs the Linux OS and a high-performance solid-state drive. We demonstrate the effectiveness of our approach and implementation with experimental results. When data filtering is done by hardware in the solid-state drive, the average speed-up obtained for the top five queries of TPC-H is over 15x.
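
The data-flow model described in the abstract can be illustrated with a minimal sketch. This is not Biscuit's API: the names Port, Row, filter_task, sum_task and run_query are hypothetical, and both tasks run sequentially in one Python process rather than being split between a host and a solid-state drive. The sketch only shows the general idea of a filtering task passing matching records through a typed, order-preserving port to a downstream task, so that less data reaches the consumer.

```python
from dataclasses import dataclass
from queue import SimpleQueue
from typing import Generic, Iterable, Iterator, Type, TypeVar

T = TypeVar("T")

_END = object()  # sentinel that marks the end of a stream


class Port(Generic[T]):
    """Hypothetical typed, FIFO-ordered port (an assumption, not Biscuit's API)."""

    def __init__(self, item_type: Type[T]) -> None:
        self._item_type = item_type
        self._queue: SimpleQueue = SimpleQueue()

    def put(self, item: T) -> None:
        # Reject items whose type does not match the port's declared type.
        if not isinstance(item, self._item_type):
            raise TypeError(f"port expects {self._item_type.__name__}")
        self._queue.put(item)

    def close(self) -> None:
        self._queue.put(_END)

    def __iter__(self) -> Iterator[T]:
        # Yield items in the order they were put, until the port is closed.
        while True:
            item = self._queue.get()
            if item is _END:
                return
            yield item


@dataclass(frozen=True)
class Row:
    order_id: int
    price: float


def filter_task(rows: Iterable[Row], out: Port[Row], min_price: float) -> None:
    """Stands in for a task placed near the data: forwards only matching rows."""
    for row in rows:
        if row.price >= min_price:
            out.put(row)
    out.close()


def sum_task(inp: Port[Row]) -> float:
    """Stands in for a host-side task: aggregates what the filter forwarded."""
    return sum(row.price for row in inp)


def run_query(rows: Iterable[Row], min_price: float) -> float:
    port: Port[Row] = Port(Row)
    filter_task(rows, port, min_price)  # run to completion first, for simplicity
    return sum_task(port)
```

Calling run_query with a list of Row values and a price threshold returns the total price of the rows that pass the filter. In this sketch the filter runs to completion before the consumer starts; a real data-flow runtime would run the two tasks concurrently.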

Cited by 90 articles