When Do Redundant Requests Reduce Latency?

Abstract
Many systems possess the flexibility to serve requests in more than one way, such as distributed storage systems that store multiple copies of the data. In such systems, the latency of serving the requests may potentially be reduced by sending redundant requests: a request may be sent to more servers than needed and deemed served when the requisite number of servers complete service. Such a mechanism trades off the possibility of faster execution of the request with the increase in the load on the system. Several recent works empirically evaluate the latency performance of redundant requests in diverse settings. In this paper, we perform an analytical study of the latency performance of redundant requests, with the primary goals of characterizing under what scenarios sending redundant requests will help (and under what scenarios it will not), and of designing optimal redundant-requesting policies. We show that when service times are i.i.d. memoryless or “heavier,” and when the additional copies of already-completed jobs can be removed instantly, maximally scheduling redundant requests achieves the optimal average latency. On the other hand, when service times are i.i.d. “lighter” or when service times are memoryless and removal of jobs is not instantaneous, then not having any redundancy in the requests is optimal under high loads. Our results are applicable to arbitrary arrival processes.
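As a rough illustration of the memoryless case discussed in the abstract (a toy sketch, not the paper's model or proof), the snippet below simulates a system of n FCFS servers with Poisson arrivals and i.i.d. exponential service times, comparing uniform random dispatch (no redundancy) against replicating every job to all servers with instantaneous cancellation of the remaining copies once one finishes. The function and parameter names (simulate, lam, mu, n) are invented for this example. The sketch relies on the assumption that, with identical FCFS order at every queue and independent exponential copies, full replication with instant cancellation behaves like a single queue served at the pooled rate n*mu, since the first of n Exp(mu) copies completes after an Exp(n*mu) time.

```python
import random

def fcfs_mean_response(arrival_times, service_times):
    """Mean response time (arrival -> departure) of a single FCFS queue."""
    free_at, total = 0.0, 0.0
    for t, s in zip(arrival_times, service_times):
        start = max(t, free_at)      # wait if the server is still busy
        free_at = start + s
        total += free_at - t
    return total / len(arrival_times)

def simulate(lam=8.0, mu=1.0, n=10, jobs=200_000, seed=0):
    rng = random.Random(seed)

    # Poisson arrivals with total rate lam.
    t, arrivals = 0.0, []
    for _ in range(jobs):
        t += rng.expovariate(lam)
        arrivals.append(t)

    # Policy A: no redundancy -- each job joins one uniformly chosen FCFS queue,
    # with an Exp(mu) service time.
    per_server = [([], []) for _ in range(n)]
    for t in arrivals:
        ts, ss = per_server[rng.randrange(n)]
        ts.append(t)
        ss.append(rng.expovariate(mu))
    resp_a = sum(fcfs_mean_response(ts, ss) * len(ts)
                 for ts, ss in per_server if ts) / jobs

    # Policy B: full redundancy with instantaneous cancellation -- under this
    # sketch's assumptions the n queues stay in lockstep, so the system acts
    # like one FCFS queue whose service times are Exp(n * mu).
    services_b = [rng.expovariate(n * mu) for _ in arrivals]
    resp_b = fcfs_mean_response(arrivals, services_b)

    print(f"no redundancy  : mean latency ~ {resp_a:.3f}")
    print(f"full redundancy: mean latency ~ {resp_b:.3f}")

if __name__ == "__main__":
    simulate()
```

With these illustrative parameters the two policies reduce to standard M/M/1 systems, with predicted mean latencies 1/(mu - lam/n) = 5.0 without redundancy and 1/(n*mu - lam) = 0.5 with full redundancy, so the simulated averages should land near those values; this only illustrates the memoryless, instant-removal regime in which the paper shows maximal redundancy is optimal, not the "lighter"-tailed or non-instantaneous-removal regimes where it is not.
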
Funding Information
  • Berkeley Fellowship
  • Microsoft Research PhD Fellowship
  • KFAS Fellowship
  • AFOSR MURI (FA9550-10-1-0567)
  • NSF (CCF-1116404)
