MapReduce
- 1 January 2008
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in Communications of the ACM
- Vol. 51 (1), 107-113
- https://doi.org/10.1145/1327452.1327492
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
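The user-visible contract described in the abstract (a map function emitting key/value pairs, a reduce function combining the values for each key) can be illustrated with the canonical word-count example. This is a minimal single-process sketch in Python, not the paper's distributed C++ implementation: the function names `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative, and the parallel execution, fault tolerance, and shuffle over the network are all elided.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # map: emit (key, value) pairs -- here, (word, 1) for each word in a document
    for word in text.split():
        yield word, 1

def reduce_fn(key, values):
    # reduce: combine all intermediate values emitted for a single key
    return sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # hypothetical in-memory driver standing in for the runtime system:
    # the "shuffle" phase groups intermediate values by key...
    groups = defaultdict(list)
    for doc_id, text in inputs:
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    # ...and the reduce phase makes one call per distinct key
    return {key: reduce_fn(key, list(values)) for key, values in groups.items()}

counts = run_mapreduce([("d1", "the quick the")], map_fn, reduce_fn)
# counts == {"the": 2, "quick": 1}
```

In the real system these same two user-supplied functions run across thousands of machines; the point of the model is that the driver above is the only part the programmer ever has to replace.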