MapReduce
- 1 January 2008
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in Communications of the ACM
- Vol. 51 (1), 107-113
- https://doi.org/10.1145/1327452.1327492
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
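The user-visible contract described in the abstract (a map function emitting key/value pairs, a reduce function combining the values for each key) can be illustrated with the canonical word-count example. This is a minimal single-process sketch in Python, not the paper's distributed C++ implementation: the function names `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative, and the parallel execution, fault tolerance, and shuffle over the network are all elided.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # map: emit (key, value) pairs -- here, (word, 1) for each word in a document
    for word in text.split():
        yield word, 1

def reduce_fn(key, values):
    # reduce: combine all intermediate values emitted for a single key
    return sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # hypothetical in-memory driver standing in for the runtime system:
    # the "shuffle" phase groups intermediate values by key...
    groups = defaultdict(list)
    for doc_id, text in inputs:
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    # ...and the reduce phase makes one call per distinct key
    return {key: reduce_fn(key, list(values)) for key, values in groups.items()}

counts = run_mapreduce([("d1", "the quick the")], map_fn, reduce_fn)
# counts == {"the": 2, "quick": 1}
```

In the real system these same two user-supplied functions run across thousands of machines; the point of the model is that the driver above is the only part the programmer ever has to replace.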