Emotion recognition from speech with StarGAN and Dense-DCNN
Open Access
- 14 September 2021
- Research article (journal)
- Published by Institution of Engineering and Technology (IET) in IET Signal Processing
- Vol. 16 (1), 62-79
- https://doi.org/10.1049/sil2.12078
Abstract
Both traditional and state-of-the-art speech emotion recognition methods face the same problem: the lack of standard emotional speech data sets. With limited data, a network cannot learn emotion features comprehensively. Moreover, these methods require very long training times, which makes efficient classification difficult to guarantee. The proposed network, Dense-DCNN combined with StarGAN, addresses these issues. StarGAN is used to generate numerous Log-Mel spectrograms with the corresponding emotions, and the Dense-DCNN extracts high-dimensional features from them to achieve high-precision classification; accuracy exceeded 90% on all data sets. At the same time, DenseNet's layer-skipping connections speed up classification, improving efficiency. Experimental verification shows that the model not only generalises well but also remains robust in multi-scene and multi-noise environments, showing potential for application in the medical and social-education industries.
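The Log-Mel spectrogram mentioned in the abstract is the standard input representation here: the signal is framed, a power spectrum is taken per frame, the spectrum is pooled through a triangular mel filterbank, and the result is log-compressed. The paper does not give its extraction parameters, so the following is only a minimal NumPy sketch; the sample rate, FFT size, hop length, and number of mel bands are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Illustrative log-Mel extraction; parameters are assumed, not from the paper."""
    # Frame the signal and apply a Hann window to each frame.
    frames = np.array([
        signal[start:start + n_fft] * np.hanning(n_fft)
        for start in range(0, len(signal) - n_fft + 1, hop)
    ])
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank: mel-spaced centre frequencies mapped to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    # Pool spectral power into mel bands, then log-compress.
    return np.log(power @ fbank.T + 1e-10)
```

In the paper's pipeline such spectrograms serve both as the image-like domain StarGAN translates between emotions and as the input the Dense-DCNN classifies.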