Stela: Enabling Stream Processing Systems to Scale-in and Scale-out On-demand

Abstract
The era of big data has led to the emergence of new real-time distributed stream processing engines like Apache Storm. We present Stela (STream processing ELAsticity), a stream processing system that supports scale-out and scale-in operations in an on-demand manner, i.e., when the user requests such a scaling operation. Stela meets two goals: 1) it optimizes post-scaling throughput, and 2) it minimizes interruption to the ongoing computation while the scaling operation is being carried out. We have integrated Stela into Apache Storm. We present experimental results using micro-benchmark Storm applications, as well as production applications from industry (Yahoo! Inc. and IBM). Our experiments show that compared to Apache Storm's default scheduler, Stela's scale-out operation achieves throughput that is 21-120% higher, and interruption time that is significantly smaller. Stela's scale-in operation chooses the right set of servers to remove and achieves 2X-5X higher throughput than Storm's default strategy.

This publication has 15 references indexed in Scilit: