Multi-level anomaly prediction in Tier-0 datacenter

Abstract

Modern scientific discoveries are driven by an insatiable demand for computational resources. To solve large problems in science, engineering, and business, datacenters provide High-Performance Computing (HPC) systems that aggregate the computing capacity of thousands of computing nodes. Anomaly prediction is critical to preserving the service continuity of HPC systems and preventing hardware deterioration. In a datacenter, a thermal anomaly occurs when the balance between cooling capacity and computational demand is disturbed, and such an anomaly can be identified from suspicious or abnormal patterns in the monitoring signals. This poster investigates the anomaly prediction task in HPC systems by defining complex statistical rule-based and Deep Learning (DL)-based anomaly detection methods, and then using these detection methods within an anomaly prediction framework.
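The abstract does not state which statistical rules or prediction horizon are used. The sketch below is only a hypothetical illustration of how a rule-based detector can feed a prediction framework. It assumes a rolling z-score rule on a single temperature signal and a hypothetical look-ahead `horizon`. Detection flags abnormal samples. Prediction then asks whether an anomaly will be flagged within the next few samples, which gives the targets a DL model could be trained to predict. Names such as `rule_based_anomalies`, `prediction_labels`, `window`, `k`, and `horizon` are illustrative assumptions, not the authors' method.

```python
from statistics import mean, stdev


def rule_based_anomalies(signal, window=10, k=3.0):
    """Hypothetical rolling z-score rule: flag sample t when it deviates from
    the mean of the preceding `window` samples by more than k standard
    deviations. Requires window >= 2 so that stdev is defined."""
    flags = [False] * len(signal)
    for t in range(window, len(signal)):
        history = signal[t - window:t]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat histories (sigma == 0) to avoid dividing by zero.
        if sigma > 0 and abs(signal[t] - mu) > k * sigma:
            flags[t] = True
    return flags


def prediction_labels(flags, horizon=5):
    """Turn detection flags into prediction targets: label[t] is True when an
    anomaly is detected in the next `horizon` samples (t+1 .. t+horizon)."""
    return [any(flags[t + 1:t + 1 + horizon]) for t in range(len(flags))]


# Synthetic inlet-temperature trace (degrees C) with one thermal spike.
temps = [24.0, 24.1, 23.9, 24.0, 24.2, 24.1, 23.9, 24.0,
         24.1, 24.0, 24.0, 24.1, 31.5, 24.0, 24.1]
flags = rule_based_anomalies(temps, window=10, k=3.0)
labels = prediction_labels(flags, horizon=3)
print([i for i, f in enumerate(flags) if f])   # -> [12]
print([i for i, y in enumerate(labels) if y])  # -> [9, 10, 11]
```

Detection only marks the spike itself (index 12). The prediction labels mark the three samples before it, so a model trained on these targets would be asked to raise the alarm ahead of time rather than at the moment the anomaly occurs.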
