Dynamic Autoselection and Autotuning of Machine Learning Models for Cloud Network Analytics
- 19 October 2018
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 30 (5), 1052-1064
- https://doi.org/10.1109/tpds.2018.2876844
Abstract
Cloud network monitoring data is dynamic and distributed. Signals used to monitor the cloud can appear, disappear, or change in importance and clarity over time. Machine learning (ML) models tuned to a given dataset can therefore quickly become inadequate: a model that is highly accurate at one point in time may lose accuracy later as the input data and their features change. Distributed learning with dynamic model selection is thus often required. Under such selection, poorly performing models (even if aggressively tuned for earlier data) are retired or put on standby, while new or standby models are brought in. The well-known method of Ensemble ML (EML) could be applied to improve the overall accuracy of a family of ML models. Unfortunately, EML has several disadvantages, including the need for continuous training, excessive computational resources, large training datasets, a high risk of overfitting, and a time-consuming model-building process. In this paper, we propose a novel cloud methodology for automatic ML model selection and tuning that automates model building and selection and is competitive with existing methods. We use unsupervised learning to better explore the data space before automatically generating targeted supervised learning models. In particular, we create a Cloud DevOps architecture for autotuning and selection based on container orchestration and messaging between containers, and take advantage of a new autoscaling method to dynamically create and evaluate instantiations of ML algorithms. The proposed methodology and tool are demonstrated on cloud network security datasets.
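The retire-and-promote step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The per-window accuracy check, the fixed `threshold`, and the names `Candidate`, `DynamicSelector`, and `threshold_model` are all hypothetical, and the toy one-feature threshold classifiers stand in for real ML models. The sketch also omits the paper's unsupervised exploration step and its container-based autoscaling.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, true label)


@dataclass
class Candidate:
    """Hypothetical model wrapper: `predict` maps a feature value to a label."""
    name: str
    predict: Callable[[float], int]


def window_accuracy(model: Candidate, window: Sequence[Sample]) -> float:
    """Fraction of samples in the window that the model classifies correctly."""
    if not window:
        return 0.0
    correct = sum(1 for x, y in window if model.predict(x) == y)
    return correct / len(window)


@dataclass
class DynamicSelector:
    """Keeps one active model plus a standby pool (assumed design).

    If the active model's accuracy on the latest window falls below
    `threshold`, the best standby model on that same window is promoted,
    provided it beats the active model; the retired model joins standby.
    """
    active: Candidate
    standby: List[Candidate]
    threshold: float = 0.8
    history: List[str] = field(default_factory=list)

    def update(self, window: Sequence[Sample]) -> Candidate:
        active_acc = window_accuracy(self.active, window)
        if active_acc < self.threshold and self.standby:
            best = max(self.standby, key=lambda m: window_accuracy(m, window))
            if window_accuracy(best, window) > active_acc:
                self.standby.remove(best)
                self.standby.append(self.active)  # retire to standby, not delete
                self.active = best
        self.history.append(self.active.name)
        return self.active


def threshold_model(name: str, cutoff: float) -> Candidate:
    """Toy classifier: predicts 1 when the feature exceeds `cutoff`."""
    return Candidate(name, lambda x: int(x > cutoff))


# Simulated drift: the decision boundary moves from 0.5 to 2.0.
early = [(0.2, 0), (0.8, 1), (1.5, 1), (3.0, 1)]
late = [(0.2, 0), (0.8, 0), (1.5, 0), (3.0, 1)]

selector = DynamicSelector(
    active=threshold_model("cut-0.5", 0.5),
    standby=[threshold_model("cut-2.0", 2.0)],
)
selector.update(early)  # cut-0.5 scores 1.0, so it stays active
selector.update(late)   # cut-0.5 drops to 0.5; cut-2.0 (1.0) is promoted
print(selector.history)  # ['cut-0.5', 'cut-2.0']
```

Retired models are kept in the standby pool rather than discarded, mirroring the abstract's "retired or put on standby", so a model can be promoted again if the data drifts back.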
Funding Information
- IBM T. J. Watson Research Center, Yorktown Heights, NY
- Joint Study Agreement (W1463335)
- IBM Research
- Khalifa University, Abu Dhabi, UAE
This publication has 23 references indexed in Scilit, including the following:
- Integration of Cloud computing and Internet of Things: A survey. Future Generation Computer Systems, 2016
- The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Information Security Journal: A Global Perspective, 2016
- UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). Institute of Electrical and Electronics Engineers (IEEE), 2015
- Ensemble learning of rule-based evolutionary algorithm using multi-layer perceptron for supporting decisions in stock trading problems. Applied Soft Computing, 2015
- Automating model search for large scale machine learning. Association for Computing Machinery (ACM), 2015
- Neural Networks and Support Vector Machine Algorithms for Automatic Cloud Classification of Whole-Sky Ground-Based Images. IEEE Geoscience and Remote Sensing Letters, 2014
- Energy-Efficient Virtual Machines Consolidation in Cloud Data Centers Using Reinforcement Learning. Institute of Electrical and Electronics Engineers (IEEE), 2014
- Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Computers & Geosciences, 2014
- A survey on feature selection methods. Computers and Electrical Engineering, 2014
- Evaluating the Impact of Categorical Data Encoding and Scaling on Neural Network Classification Performance: The Case of Repeat Consumption of Identical Cultural Goods. Communications in Computer and Information Science, 2012