Sparrow
- 3 November 2013
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.
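The abstract does not spell out the sampling mechanism, so the following is only a minimal sketch of the general idea behind randomized sampling for load balancing (the "power of two choices" technique named in the reference list), not Sparrow's actual implementation. The function name `place_tasks` and its parameters are hypothetical: each task probes a few randomly chosen workers and is placed on the one with the shortest queue, with no central view of cluster state.

```python
import random


def place_tasks(num_workers, num_tasks, probes, rng):
    """Place tasks one at a time using randomized sampling.

    Each task probes `probes` distinct workers chosen uniformly at random
    and joins the shortest queue among them. Returns the final queue
    length of every worker. (Illustrative only; names are assumptions.)
    """
    queues = [0] * num_workers
    for _ in range(num_tasks):
        # Sample a small random subset instead of inspecting every worker.
        sampled = rng.sample(range(num_workers), probes)
        # Pick the least-loaded worker among the probed ones.
        target = min(sampled, key=lambda w: queues[w])
        queues[target] += 1
    return queues


if __name__ == "__main__":
    for probes in (1, 2):
        rng = random.Random(0)  # fixed seed so runs are repeatable
        queues = place_tasks(num_workers=1000, num_tasks=1000,
                             probes=probes, rng=rng)
        print(f"probes={probes}: longest queue = {max(queues)}")
```

With `probes=1` this reduces to purely random placement; with `probes=2` the longest queue is typically noticeably shorter, which is the effect the power-of-two-choices literature analyzes.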
Funding Information
- Amazon Web Services
- Ericsson
- Microsoft
- Defense Advanced Research Projects Agency (FA8750-12-2-0331)
- Intel Corporation
- Cisco Systems
- Huawei Technologies
- Oracle
- Cloudera
- Hortonworks
- Samsung
- VMware
- U.S. Department of Defense
- WANdisco
- Hertz Foundation
- Division of Computing and Communication Foundations (CCF-1139158)
- General Electric
- NetApp
- Yahoo!
- SAP America
- Clearstory Data
- FitWave
- Splunk
This publication has 15 references indexed in Scilit:
- The tail at scale. Communications of the ACM, 2013
- An update on the scalability limits of the Condor batch system. Journal of Physics: Conference Series, 2011
- A generalization of multiple choice balls-into-bins. Published by Association for Computing Machinery (ACM), 2011
- Dremel. Proceedings of the VLDB Endowment, 2010
- Quincy. Published by Association for Computing Machinery (ACM), 2009
- The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems, 2001
- The Power of Two Random Choices: A Survey of Techniques and Results. Published by Springer Science and Business Media LLC, 2001
- How useful is old information? IEEE Transactions on Parallel and Distributed Systems, 2000
- Analysis and simulation of a fair queueing algorithm. Published by Association for Computing Machinery (ACM), 1989
- Adaptive load sharing in homogeneous distributed systems. IEEE Transactions on Software Engineering, 1986