Emotion recognition from speech with StarGAN and Dense-DCNN

Abstract
Both traditional and recent speech emotion recognition methods face the same problem: the lack of standard emotional speech data sets. With only limited data, a network cannot learn emotion features comprehensively. Moreover, these methods require extremely long training times, which makes efficient classification difficult. The proposed Dense-DCNN network, combined with StarGAN, addresses these issues. StarGAN is used to generate numerous Log-Mel spectrograms for the relevant emotions, and the Dense-DCNN extracts high-dimensional features from them to achieve high-precision classification. The classification accuracy exceeded 90% on all data sets. At the same time, the skip connections inherited from DenseNet speed up the classification process and thereby improve efficiency. Experimental verification shows that our model not only has good generalisation ability but also exhibits good robustness in multi-scene and multi-noise environments, showing potential for application in the medical and social-education industries.
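The abstract does not specify the exact pipeline, so the following is only a minimal sketch of the two stages it describes: computing Log-Mel spectrograms from speech and classifying them with a small densely connected CNN. It assumes librosa and PyTorch; the file path, layer sizes, class count, and the `DenseDCNN`/`DenseBlock` names are illustrative assumptions, not the authors' implementation, and the StarGAN augmentation stage is omitted.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn


def log_mel_spectrogram(path, sr=16000, n_mels=64):
    """Load an utterance and compute its Log-Mel spectrogram (n_mels x frames)."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)


class DenseBlock(nn.Module):
    """Dense block: each layer receives the concatenation of all earlier feature maps."""

    def __init__(self, in_ch, growth=12, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)


class DenseDCNN(nn.Module):
    """Toy densely connected CNN classifier over Log-Mel inputs of shape (1, mels, frames)."""

    def __init__(self, n_classes=4):
        super().__init__()
        self.stem = nn.Conv2d(1, 24, kernel_size=3, padding=1)
        self.block = DenseBlock(24)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(self.block.out_channels, n_classes))

    def forward(self, x):
        return self.head(self.block(self.stem(x)))


# Hypothetical usage: classify one utterance (file name is a placeholder).
spec = log_mel_spectrogram("utterance.wav")
x = torch.tensor(spec).unsqueeze(0).unsqueeze(0).float()  # (batch, channel, mels, frames)
logits = DenseDCNN(n_classes=4)(x)
```

The dense concatenation mirrors the skip-connection idea the abstract attributes to DenseNet: later layers reuse earlier feature maps directly, which shortens gradient paths and keeps the network compact.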
