Automatic Device Identification and Anomaly Detection with Machine Learning Techniques in Smart Factories

Abstract
With the development of Industrial Internet of Things (IIoT) technologies, an increasing number of diverse smart devices and sensors are connected in smart factories. Because these devices are designed only to connect with one another, they usually have very limited security mechanisms. Moreover, because different devices behave differently, it is difficult to design an individual security mechanism for each of them manually. To detect potential threats to these devices, machine learning methods can help learn their diverse behaviors from the packets they generate and thereby identify device types. In this paper, we propose a machine learning approach to automatic device identification and anomaly detection through network traffic analysis. First, we use both unsupervised and supervised learning to identify different types of IoT devices. Second, based on the model learned in the device identification module, we conduct feature selection to improve classification performance for anomaly detection. In our experiments on real data from a smart factory, supervised learning outperforms unsupervised learning for device identification. XGBoost performs best, with an accuracy of 97.6% and a micro-averaged F1 score of 97.6%. In emulated attacks on real devices, gradient boosted decision trees proved useful for anomaly detection, achieving an accuracy of 99.997% and an F1 score of 99.995%. These results show the potential of the proposed approach for anomaly detection in smart factories. Further investigation is needed to verify the approach on more types of devices and network attacks.
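The identical accuracy and micro-averaged F1 score reported for device identification (both 97.6%) are what one would expect if each traffic sample is assigned exactly one device type. In that single-label setting, every misclassified sample counts once as a false positive (for the predicted type) and once as a false negative (for the true type), so pooled precision, pooled recall, and accuracy coincide. The following is a minimal sketch of that relationship, not the evaluation code used in the experiments; the device labels and predictions are hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1          # correctly assigned to this class
            elif p == label:
                fp += 1          # assigned to this class, belongs to another
            elif t == label:
                fn += 1          # belongs to this class, assigned to another
    if tp == 0:
        return 0.0               # no correct predictions at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical device-type labels (illustration only, not data from the paper).
y_true = ["camera", "plc", "sensor", "plc", "camera", "sensor", "plc", "camera"]
y_pred = ["camera", "plc", "plc", "plc", "sensor", "sensor", "plc", "camera"]

acc = accuracy(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, micro-F1 = {f1:.3f}")
```

Macro-averaged F1, which averages per-class scores, would generally differ from accuracy when device types are imbalanced, so the two averaging schemes should not be read interchangeably.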