Shallow Convolutional Neural Networks for Acoustic Scene Classification

Abstract
Recently, deep neural networks, which include convolutional neural networks (CNNs), have been widely applied to acoustic scene classification (ASC). Motivated by the fact that some simplified CNNs have shown improvements over deep CNNs, such as Visual Geometry Group Net (VGG-Net), we have figured out how to simplify the VGG-Net style architecture to a shallow CNN with improved performance. Max pooling and batch normalization are also applied for better accuracy. With a series of controlled tests on detection and classification of acoustic scenes and events (DCASE) 2016 data sets, our shallow CNN achieves 6.7% improvement, and reduces time complexity to 5%, compared with the VGG-Net style CNN.