Gemini: Learning to Manage CPU Power for Latency-Critical Search Engines

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE) in 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

p. 637-649
https://doi.org/10.1109/micro50266.2020.00059

Abstract

Saving energy for latency-critical applications like web search can be challenging because of their strict tail latency constraints. State-of-the-art power management frameworks use Dynamic Voltage and Frequency Scaling (DVFS) and sleep-state techniques to slow down request processing so that each search finishes just in time. However, accurately predicting the compute demand of a request can be difficult. In this paper, we present Gemini, a novel power management framework for latency-critical search engines. Gemini has two unique features to capture the per-query service time variation. First, at light loads without request queuing, a two-step DVFS is used to manage the CPU power. Our two-step DVFS selects the initial CPU frequency based on the query-specific service time prediction and then judiciously boosts the initial frequency at the right time to catch up to the deadline. The boosting time is determined by estimating the error in the prediction of each individual query’s service time. At high loads, where there is request queuing, only the request currently being executed and the critical request in the queue adopt the two-step DVFS. All the requests in between use the same frequency to reduce the frequency transition overhead. Second, we develop two separate neural network models, one for predicting the service time and the other for predicting the error in that prediction. The combination of these two predictors significantly improves the power saving and tail latency results of our two-step DVFS. Gemini is implemented on the Solr search engine. Evaluations on three representative query traces show that Gemini saves 41% of the CPU power and outperforms other state-of-the-art techniques.
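
The two-step DVFS idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `plan_two_step`, the `TwoStepPlan` type, and the frequency list are hypothetical, and the sketch assumes that service time scales inversely with CPU frequency and that frequency transitions are free. The predicted service time and predicted error, which Gemini obtains from its two neural network models, are taken here as plain inputs measured at the maximum frequency.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TwoStepPlan:
    initial_ghz: float   # step 1: frequency chosen from the service time prediction
    boost_at_ms: float   # step 2: time at which the core switches to max frequency


def plan_two_step(pred_ms: float, err_ms: float, deadline_ms: float,
                  freqs_ghz: Sequence[float]) -> TwoStepPlan:
    """Hypothetical planner; assumes work scales linearly with frequency."""
    freqs = sorted(freqs_ghz)
    f_max = freqs[-1]
    # Work is expressed in "GHz * ms"; predictions are given at f_max.
    pred_work = pred_ms * f_max
    worst_work = (pred_ms + max(err_ms, 0.0)) * f_max

    # Step 1: lowest frequency that finishes the *predicted* work by the deadline.
    initial = next((f for f in freqs if pred_work / f <= deadline_ms), f_max)
    if initial == f_max:
        return TwoStepPlan(f_max, 0.0)  # already at max, nothing left to boost

    # Step 2: latest boost time t such that
    #   initial * t + f_max * (deadline - t) >= worst_work,
    # i.e. the query still meets the deadline if the prediction was too low.
    boost_at = (f_max * deadline_ms - worst_work) / (f_max - initial)
    boost_at = min(max(boost_at, 0.0), deadline_ms)
    return TwoStepPlan(initial, boost_at)


if __name__ == "__main__":
    print(plan_two_step(pred_ms=10.0, err_ms=5.0, deadline_ms=25.0,
                        freqs_ghz=[1.0, 2.0]))
```

In this toy setting, a larger predicted error moves the boost earlier, which trades some power saving for a safer tail latency. Queuing at high load, which the abstract handles by applying two-step DVFS only to the current and critical requests, is outside the scope of the sketch.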

This publication has 44 references indexed in Scilit:

Optimal Aggregation Policy for Reducing Tail Latency of Web Search
Published by Association for Computing Machinery (ACM), 2015
Selective Search
ACM Transactions on Information Systems, 2015
SleepScale
ACM SIGARCH Computer Architecture News, 2014
Unicorn
Proceedings of the VLDB Endowment, 2013
Faster top-k document retrieval using block-max indexes
Published by Association for Computing Machinery (ACM), 2011
Wikipedia workload analysis for decentralized hosting
Computer Networks, 2009
A pipelined architecture for distributed text query evaluation
Information Retrieval Journal, 2006
Formal online methods for voltage/frequency control in multiple clock domain microprocessors
Published by Association for Computing Machinery (ACM), 2004
Combining fuzzy information
ACM SIGMOD Record, 2002
Inverted files versus signature files for text indexing
ACM Transactions on Database Systems, 1998

Cited by 9 articles