Speech Emotion Recognition from Spectrograms with Deep Convolutional Neural Network
- 23 March 2017
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
This paper presents a method for speech emotion recognition using spectrograms and a deep convolutional neural network (CNN). Spectrograms generated from the speech signals are input to the deep CNN. The proposed model, consisting of three convolutional layers and three fully connected layers, extracts discriminative features from spectrogram images and outputs predictions for the seven emotions. In this study, we trained the proposed model on spectrograms obtained from the Berlin emotions dataset. We also investigated the effectiveness of transfer learning for emotion recognition using a pre-trained AlexNet model. Preliminary results indicate that the proposed approach, based on the freshly trained model, is better than the fine-tuned model and is capable of predicting emotions accurately and efficiently.
This publication has 18 references indexed in Scilit.
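The abstract's first step, turning a speech signal into a spectrogram that a CNN can consume as an image, can be illustrated with a minimal sketch. The abstract does not state the paper's window length, overlap, or frequency scaling, so `frame_len`, `hop`, the Hann window, and the synthetic test tone below are all assumptions chosen for illustration, and the function name `spectrogram` is hypothetical.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_len//2."""
    # Hann window (assumed choice) reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(frame_len^2) per frame: fine for a sketch,
        # but a real pipeline would use an FFT routine.
        mags = []
        for k in range(n_bins):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical input (not from the paper): a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = spectrogram(tone)
# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

With these assumed settings the bins are 125 Hz apart, so the tone's energy concentrates in bin 8 (1000 Hz). Stacking the frames as rows gives the two-dimensional time-frequency array that would then be rendered or resized as an image for the network's input.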