A Spatial-Temporal based Next Frame Prediction and Unsupervised Classification of Video Anomalies in Real Time Estimation

Abstract
Anomaly detection is an area of video analysis of great importance in automated surveillance. Although it has been studied extensively, little work has used convolutional neural networks (CNNs). In this thesis we present a novel approach for learning motion features and modeling normal spatio-temporal dynamics for anomaly detection. We capture variations in the scale of the motion patterns of an image object using dense optical flow estimation and, because the input is a sequence of video frames, train an autoencoder built from convolutional long short-term memory layers (ConvLSTM2D). Anomalies are predicted in real time from the Euclidean distance between the generated frame and the ground-truth frame; the method achieved a real-time accuracy of nearly 98% on YouTube videos that were not used for either training or testing. The error between the network's output and the target output is used to classify a video volume as normal or abnormal. In addition to reconstruction error, we also use prediction error for anomaly detection. The prediction models show performance comparable to state-of-the-art methods, and the proposed method improves on them on one dataset. Moreover, its running time is significantly faster.
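The scoring step described in the abstract, comparing a generated frame with the ground-truth frame by Euclidean distance, can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy 4x4 frames, and the threshold value are all assumptions made for illustration, and the abstract does not say how the decision threshold is chosen.

```python
import numpy as np

def frame_error(predicted, actual):
    """Euclidean (L2) distance between two frames of equal shape."""
    diff = predicted.astype(np.float64) - actual.astype(np.float64)
    return float(np.sqrt(np.sum(diff ** 2)))

def is_anomalous(predicted, actual, threshold):
    """Flag a frame as abnormal when its prediction error exceeds the threshold.

    The threshold is a hypothetical parameter; in practice it would be
    tuned on held-out normal footage.
    """
    return frame_error(predicted, actual) > threshold

# Toy frames standing in for real video data (assumed values).
actual = np.zeros((4, 4))
good_prediction = np.full((4, 4), 0.1)  # close to the ground truth
bad_prediction = np.ones((4, 4))        # far from the ground truth

THRESHOLD = 1.0  # hypothetical value chosen only for this example

normal_flag = is_anomalous(good_prediction, actual, THRESHOLD)
abnormal_flag = is_anomalous(bad_prediction, actual, THRESHOLD)
```

In this sketch a frame the model predicts well yields a small distance and is treated as normal, while a poorly predicted frame yields a large distance and is flagged as abnormal.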
