S4: Distributed Stream Computing Platform
- 1 December 2010
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 170-177
- https://doi.org/10.1109/icdmw.2010.172
Abstract
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or more events which may be consumed by other PEs, (2) publish results. The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers. In this paper, we outline the S4 architecture in detail and describe various applications, including real-life deployments. Our design is primarily driven by large-scale applications for data mining and machine learning in a production environment. We show that the S4 design is surprisingly flexible and lends itself to running in large clusters built with commodity hardware.
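The routing model described in the abstract can be illustrated with a small single-process sketch. It is not the S4 API: `Event`, `CounterPE`, `PublisherPE`, and `Router` are hypothetical names chosen for illustration, and the distribution of PEs across cluster nodes is left out entirely. The sketch assumes one PE instance per (stream, key) pair, so that all events with the same key reach the same instance; a PE either emits further events or publishes a result.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    stream: str  # stream (event type) name
    key: str     # keyed attribute value used for routing
    value: int


class CounterPE:
    """Hypothetical PE: counts events for its key and emits the running count."""

    def __init__(self, key, sink):
        self.key = key
        self.count = 0

    def process(self, event):
        self.count += event.value
        # (1) emit an event that another PE may consume
        return [Event("counts", self.key, self.count)]


class PublisherPE:
    """Hypothetical PE: publishes the latest count for its key."""

    def __init__(self, key, sink):
        self.key = key
        self.sink = sink

    def process(self, event):
        # (2) publish a result; here, write it to a shared dict
        self.sink[self.key] = event.value
        return []


class Router:
    """Routes each event to the PE instance owning its (stream, key) pair."""

    def __init__(self, pe_classes):
        self.pe_classes = pe_classes  # stream name -> PE class
        self.instances = {}           # (stream, key) -> PE instance
        self.results = {}             # published results

    def dispatch(self, event):
        pending = deque([event])
        while pending:
            ev = pending.popleft()
            pe_class = self.pe_classes.get(ev.stream)
            if pe_class is None:
                continue  # no PE consumes this stream
            slot = (ev.stream, ev.key)
            if slot not in self.instances:
                # Create the PE lazily the first time its key is seen.
                self.instances[slot] = pe_class(ev.key, self.results)
            pending.extend(self.instances[slot].process(ev))


router = Router({"words": CounterPE, "counts": PublisherPE})
for word in ["s4", "stream", "s4"]:
    router.dispatch(Event("words", word, 1))
print(router.results)  # {'s4': 2, 'stream': 1}
```

Because state lives only inside the PE instance that owns a key, no PE needs to share memory with another, which is the property that makes the actor-like model easy to spread over many machines.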
This publication has 7 references indexed in Scilit:
- Actor frameworks for the JVM platform. Published by Association for Computing Machinery (ACM), 2009
- MapReduce. Communications of the ACM, 2008
- Internet Advertising and the Generalized Second-Price Auction: Selling Billions of Dollars Worth of Keywords. American Economic Review, 2007
- The 8 requirements of real-time stream processing. ACM SIGMOD Record, 2005
- Aurora: a new model and architecture for data stream management. The VLDB Journal, 2003
- Actors. Published by MIT Press, 1986
- A Simplex Method for Function Minimization. The Computer Journal, 1965