Sparrow

Abstract
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.
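To make the idea of decentralized, randomized sampling concrete, the following is a minimal sketch of one well-known form of it: probe a few randomly chosen workers and place the task on the least loaded one. It is an illustration under stated assumptions, not Sparrow's implementation. The names `pick_worker` and `schedule`, the probe count `d`, and the use of queue length as the load signal are all hypothetical choices made for this example.

```python
import random


def pick_worker(queue_lengths, d=2, rng=random):
    """Probe d distinct workers chosen uniformly at random and
    return the index of the probed worker with the shortest queue.

    Assumption: load is modeled as a plain queue length per worker.
    """
    k = min(d, len(queue_lengths))
    probes = rng.sample(range(len(queue_lengths)), k)
    return min(probes, key=lambda w: queue_lengths[w])


def schedule(num_tasks, num_workers, d=2, seed=0):
    """Place tasks one at a time using only sampled load information.

    No global view of the cluster is consulted, so many schedulers
    could run this loop independently (hypothetical simplification:
    tasks never finish, so queues only grow).
    """
    rng = random.Random(seed)
    queues = [0] * num_workers
    for _ in range(num_tasks):
        w = pick_worker(queues, d, rng)
        queues[w] += 1
    return queues


if __name__ == "__main__":
    random_only = schedule(num_tasks=10_000, num_workers=100, d=1)
    sampled = schedule(num_tasks=10_000, num_workers=100, d=2)
    print("max queue, d=1:", max(random_only))
    print("max queue, d=2:", max(sampled))
```

Running the script compares purely random placement (`d=1`) with two probes per task (`d=2`). In this toy model each scheduling decision touches only `d` workers, which is what lets the approach avoid a central point of coordination.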
Funding Information
  • Facebook
  • Amazon Web Services
  • Ericsson
  • Microsoft
  • Defense Advanced Research Projects Agency (FA8750-12-2-0331)
  • Intel Corporation
  • Cisco Systems
  • Huawei Technologies
  • Oracle
  • Cloudera
  • Hortonworks
  • Samsung
  • VMware
  • U.S. Department of Defense
  • WANdisco
  • Hertz Foundation
  • Division of Computing and Communication Foundations (CCF-1139158)
  • General Electric
  • NetApp
  • Yahoo!
  • Google
  • SAP America
  • Clearstory Data
  • FitWave
  • Splunk
