Improving deep neural network acoustic models using generalized maxout networks

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE) in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

p. 215-219
https://doi.org/10.1109/icassp.2014.6853589

Abstract

Recently, maxout networks have brought significant improvements to various speech recognition and computer vision tasks. In this paper we introduce two new types of generalized maxout units, which we call p-norm and soft-maxout. We investigate their performance in Large Vocabulary Continuous Speech Recognition (LVCSR) tasks in various languages with 10 hours and 60 hours of data, and find that the p-norm generalization of maxout consistently performs well. Because our training setup sometimes becomes unstable when training unbounded-output nonlinearities such as these, we also present a method to control that instability: the “normalization layer”, a nonlinearity that scales down all dimensions of its input so that the average squared output does not exceed one. The performance of our proposed nonlinearities is compared with maxout, rectified linear units (ReLU), tanh units, and a discriminatively trained SGMM/HMM system; our p-norm units with p equal to 2 are found to perform best.
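
The units named in the abstract can be illustrated with a minimal sketch. The abstract itself gives no formulas, so the definitions used here are assumptions based on the usual reading of the terms: maxout takes the maximum over each group of inputs, p-norm takes the p-norm of each group, soft-maxout takes the log-sum-exp of each group, and the normalization layer divides by the root-mean-square of its input only when that value exceeds one. The function names, the `group_size` parameter, and the use of consecutive, non-overlapping groups are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def _groups(values, group_size):
    """Split a flat list of activations into consecutive, equal-sized groups."""
    if group_size <= 0 or len(values) % group_size != 0:
        raise ValueError("len(values) must be a positive multiple of group_size")
    return [values[i:i + group_size] for i in range(0, len(values), group_size)]


def maxout(values, group_size):
    """Maxout: one output per group, equal to the largest input in the group."""
    return [max(g) for g in _groups(values, group_size)]


def p_norm(values, group_size, p=2.0):
    """p-norm unit (assumed form): (sum_i |x_i|^p)^(1/p) over each group."""
    return [sum(abs(x) ** p for x in g) ** (1.0 / p)
            for g in _groups(values, group_size)]


def soft_maxout(values, group_size):
    """Soft-maxout (assumed form): log(sum_i exp(x_i)) over each group."""
    out = []
    for g in _groups(values, group_size):
        m = max(g)  # subtract the max before exponentiating to avoid overflow
        out.append(m + math.log(sum(math.exp(x - m) for x in g)))
    return out


def normalization_layer(values):
    """Scale the input down so its average squared value does not exceed one.

    Inputs whose root-mean-square is already at most one pass through unchanged.
    """
    rms = math.sqrt(sum(x * x for x in values) / len(values))
    if rms <= 1.0:
        return list(values)
    return [x / rms for x in values]


if __name__ == "__main__":
    activations = [3.0, 4.0, 0.0, -2.0]
    pooled = p_norm(activations, group_size=2, p=2.0)
    print(pooled)
    print(normalization_layer(pooled))
```

Under these assumptions, all three pooling units reduce each group of inputs to a single output, so a layer with group size G has G times fewer outputs than inputs. The p-norm and soft-maxout outputs are unbounded, which is why the sketch pairs them with the normalization step.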

