Dynamic Autoselection and Autotuning of Machine Learning Models for Cloud Network Analytics

19 October 2018

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems

Vol. 30 (5), 1052-1064
https://doi.org/10.1109/tpds.2018.2876844

Abstract

Cloud network monitoring data is dynamic and distributed. The signals used to monitor the cloud can appear, disappear, or change in importance and clarity over time. Machine learning (ML) models tuned to a given data set can therefore quickly become inadequate: a model may be highly accurate at one point in time and lose that accuracy later because the input data and their features have changed. Distributed learning with dynamic model selection is therefore often required. Under such selection, poorly performing models (even those aggressively tuned for the prior data) are retired or put on standby, while new or standby models are brought in. The well-known method of Ensemble ML (EML) could be applied to improve the overall accuracy of a family of ML models. Unfortunately, EML has several disadvantages, including the need for continuous training, excessive computational resource requirements, the need for large training datasets, a high risk of overfitting, and a time-consuming model-building process. In this paper, we propose a novel cloud methodology for automatic ML model selection and tuning that automates model building and selection and is competitive with existing methods. We use unsupervised learning to better explore the data space before generating targeted supervised learning models in an automated fashion. In particular, we create a Cloud DevOps architecture for autotuning and selection based on container orchestration and messaging between containers, and we take advantage of a new autoscaling method to dynamically create and evaluate instantiations of ML algorithms. The proposed methodology and tool are demonstrated on cloud network security datasets.
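
The dynamic model selection described in the abstract (retiring a model whose accuracy has decayed and bringing in a standby model) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `ThresholdModel` and `ModelPool`, the sliding-window accuracy measure, the 0.7 threshold, and the synthetic data stream are all assumptions made for illustration, and the container orchestration and autoscaling layers are omitted entirely.

```python
from collections import deque


class ThresholdModel:
    """Hypothetical toy classifier: predicts 1 when the feature exceeds a cutoff."""

    def __init__(self, name, cutoff):
        self.name = name
        self.cutoff = cutoff

    def predict(self, x):
        return 1 if x > self.cutoff else 0


class ModelPool:
    """Keeps one active model and swaps it out when its recent accuracy drops.

    Assumption: standby models are scored in shadow mode on every labeled
    sample, so a replacement can be chosen as soon as the active model fails.
    """

    def __init__(self, models, window=20, min_accuracy=0.7):
        self.models = list(models)
        self.active = self.models[0]
        self.min_accuracy = min_accuracy
        # One sliding window of hit/miss flags per model.
        self.hits = {m.name: deque(maxlen=window) for m in self.models}

    def accuracy(self, model):
        h = self.hits[model.name]
        return sum(h) / len(h) if h else 0.0

    def observe(self, x, label):
        """Serve a prediction, record outcomes, and reselect if needed."""
        prediction = self.active.predict(x)
        for m in self.models:
            self.hits[m.name].append(m.predict(x) == label)
        active_hits = self.hits[self.active.name]
        window_full = len(active_hits) == active_hits.maxlen
        if window_full and self.accuracy(self.active) < self.min_accuracy:
            # Put the weak model on standby; promote the best recent performer.
            self.active = max(self.models, key=self.accuracy)
        return prediction


# Synthetic stream whose labeling rule changes halfway (simulated drift).
xs = [0.1, 0.3, 0.4, 0.6, 0.9]
pool = ModelPool([ThresholdModel("early", 0.5), ThresholdModel("late", 0.2)])

for i in range(40):  # phase 1: the true boundary is 0.5
    x = xs[i % 5]
    pool.observe(x, int(x > 0.5))
after_phase1 = pool.active.name

for i in range(40):  # phase 2: the true boundary moves to 0.2
    x = xs[i % 5]
    pool.observe(x, int(x > 0.2))
after_phase2 = pool.active.name

print(after_phase1, after_phase2)
```

In this sketch the model tuned for the first phase stays active while its windowed accuracy holds, and is replaced once the drifted data pushes that accuracy below the threshold. A real deployment would also need a way to generate and tune new candidate models, which is the part the paper automates.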

Funding Information

IBM T. J. Watson Research Center, Yorktown Heights, NY
Joint Study Agreement (W1463335)
IBM Research
Khalifa University, Abu Dhabi, UAE

Cited by 14 articles