Gemini: Learning to Manage CPU Power for Latency-Critical Search Engines
- 1 October 2020
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Abstract
Saving energy for latency-critical applications like web search can be challenging because of their strict tail latency constraints. State-of-the-art power management frameworks use Dynamic Voltage and Frequency Scaling (DVFS) and sleep-state techniques to slow down request processing and finish the search just in time. However, accurately predicting the compute demand of a request can be difficult. In this paper, we present Gemini, a novel power management framework for latency-critical search engines. Gemini has two unique features to capture the per-query service time variation. First, at light loads without request queuing, a two-step DVFS is used to manage the CPU power. Our two-step DVFS selects the initial CPU frequency based on the query-specific service time prediction and then judiciously boosts the initial frequency at the right time to catch up to the deadline. The boosting time is determined by estimating the error in the prediction of each individual query's service time. At high loads, where there is request queuing, only the request currently being executed and the critical request in the queue adopt a two-step DVFS. All the requests in between use the same frequency to reduce the frequency transition overhead. Second, we develop two separate neural network models, one for predicting the service time and the other for predicting the error in that prediction. The combination of these two predictors significantly improves the power saving and tail latency results of our two-step DVFS. Gemini is implemented on the Solr search engine. Evaluations on three representative query traces show that Gemini saves 41% of the CPU power and outperforms other state-of-the-art techniques.
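The two-step DVFS idea described in the abstract can be illustrated with a minimal sketch. This is not Gemini's actual algorithm, which the abstract does not spell out: the function name `plan_two_step_dvfs`, the `TwoStepPlan` type, the megacycle work model, and the boost-time formula are all assumptions made for illustration. In Gemini the predicted service time and the predicted error come from two neural network models; here they are plain numeric inputs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TwoStepPlan:
    """Hypothetical result type: start frequency and when to boost."""
    initial_freq_ghz: float  # frequency used when the request starts
    boost_time_ms: float     # time after start at which to switch frequency
    boost_freq_ghz: float    # frequency used after the boost


def plan_two_step_dvfs(
    predicted_mcycles: float,
    error_mcycles: float,
    deadline_ms: float,
    freqs_ghz: Sequence[float],
) -> TwoStepPlan:
    """Sketch of a two-step plan under an assumed linear work model.

    Assumption: work is measured in megacycles, so a core running at
    f GHz completes f * t megacycles in t milliseconds.
    """
    if deadline_ms <= 0 or not freqs_ghz:
        raise ValueError("need a positive deadline and at least one frequency")

    freqs = sorted(freqs_ghz)
    f_max = freqs[-1]

    # Step 1: lowest frequency that finishes the *predicted* work on time.
    f_init = next(
        (f for f in freqs if f * deadline_ms >= predicted_mcycles), f_max
    )
    if f_init == f_max:
        # No slack to exploit: run at the maximum frequency from the start.
        return TwoStepPlan(f_max, 0.0, f_max)

    # Step 2: latest boost time t such that, even if the request needs
    # predicted + error megacycles, it still finishes by the deadline:
    #   f_init * t + f_max * (deadline - t) >= predicted + error
    worst_case = predicted_mcycles + max(error_mcycles, 0.0)
    t_boost = (f_max * deadline_ms - worst_case) / (f_max - f_init)
    t_boost = min(max(t_boost, 0.0), deadline_ms)  # clamp to [0, deadline]
    return TwoStepPlan(f_init, t_boost, f_max)


if __name__ == "__main__":
    plan = plan_two_step_dvfs(
        predicted_mcycles=30.0,
        error_mcycles=6.0,
        deadline_ms=20.0,
        freqs_ghz=[1.2, 1.6, 2.0, 2.4],
    )
    print(plan)
```

In this sketch, a larger predicted error pulls the boost time earlier, which trades some power saving for a safer deadline margin. A small predicted error lets the request stay at the lower frequency for longer.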