When Do Redundant Requests Reduce Latency?

Abstract
Many systems possess the flexibility to serve requests in more than one way, such as distributed storage systems that store multiple copies of the data. In such systems, the latency of serving the requests may potentially be reduced by sending redundant requests: a request may be sent to more servers than needed and deemed served when the requisite number of servers complete service. Such a mechanism trades off the possibility of faster execution of the request with the increase in the load on the system. Several recent works empirically evaluate the latency performance of redundant requests in diverse settings. In this paper, we perform an analytical study of the latency performance of redundant requests, with the primary goals of characterizing under what scenarios sending redundant requests will help (and under what scenarios it will not), and of designing optimal redundant-requesting policies. We show that when service times are i.i.d. memoryless or “heavier,” and when the additional copies of already-completed jobs can be removed instantly, maximally scheduling redundant requests achieves the optimal average latency. On the other hand, when service times are i.i.d. “lighter” or when service times are memoryless and removal of jobs is not instantaneous, then not having any redundancy in the requests is optimal under high loads. Our results are applicable to arbitrary arrival processes.
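As a rough illustration of the memoryless case discussed in the abstract (a toy sketch, not the paper's model or proof), the snippet below simulates a system of n FCFS servers with Poisson arrivals and i.i.d. exponential service times, comparing uniform random dispatch (no redundancy) against replicating every job to all servers with instantaneous cancellation of the remaining copies once one finishes. The function and parameter names (simulate, lam, mu, n) are invented for this example. The sketch relies on the assumption that, with identical FCFS order at every queue and independent exponential copies, full replication with instant cancellation behaves like a single queue served at the pooled rate n*mu, since the first of n Exp(mu) copies completes after an Exp(n*mu) time.

```python
import random

def fcfs_mean_response(arrival_times, service_times):
    """Mean response time (arrival -> departure) of a single FCFS queue."""
    free_at, total = 0.0, 0.0
    for t, s in zip(arrival_times, service_times):
        start = max(t, free_at)      # wait if the server is still busy
        free_at = start + s
        total += free_at - t
    return total / len(arrival_times)

def simulate(lam=8.0, mu=1.0, n=10, jobs=200_000, seed=0):
    rng = random.Random(seed)

    # Poisson arrivals with total rate lam.
    t, arrivals = 0.0, []
    for _ in range(jobs):
        t += rng.expovariate(lam)
        arrivals.append(t)

    # Policy A: no redundancy -- each job joins one uniformly chosen FCFS queue,
    # with an Exp(mu) service time.
    per_server = [([], []) for _ in range(n)]
    for t in arrivals:
        ts, ss = per_server[rng.randrange(n)]
        ts.append(t)
        ss.append(rng.expovariate(mu))
    resp_a = sum(fcfs_mean_response(ts, ss) * len(ts)
                 for ts, ss in per_server if ts) / jobs

    # Policy B: full redundancy with instantaneous cancellation -- under this
    # sketch's assumptions the n queues stay in lockstep, so the system acts
    # like one FCFS queue whose service times are Exp(n * mu).
    services_b = [rng.expovariate(n * mu) for _ in arrivals]
    resp_b = fcfs_mean_response(arrivals, services_b)

    print(f"no redundancy  : mean latency ~ {resp_a:.3f}")
    print(f"full redundancy: mean latency ~ {resp_b:.3f}")

if __name__ == "__main__":
    simulate()
```

With these illustrative parameters the two policies reduce to standard M/M/1 systems, with predicted mean latencies 1/(mu - lam/n) = 5.0 without redundancy and 1/(n*mu - lam) = 0.5 with full redundancy, so the simulated averages should land near those values; this only illustrates the memoryless, instant-removal regime in which the paper shows maximal redundancy is optimal, not the "lighter"-tailed or non-instantaneous-removal regimes where it is not.
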
Funding Information
  • Berkeley Fellowship
  • Microsoft Research PhD Fellowship
  • KFAS Fellowship
  • AFOSR MURI (FA9550-10-1-0567)
  • NSF (CCF-1116404)
