Improving deep neural network acoustic models using generalized maxout networks
- 1 May 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract
Recently, maxout networks have brought significant improvements to various speech recognition and computer vision tasks. In this paper we introduce two new types of generalized maxout units, which we call p-norm and soft-maxout. We investigate their performance in Large Vocabulary Continuous Speech Recognition (LVCSR) tasks in various languages with 10 hours and 60 hours of data, and find that the p-norm generalization of maxout consistently performs well. Because our training setup is sometimes unstable when training unbounded-output nonlinearities such as these, we also present a method to control that instability: the “normalization layer”, a nonlinearity that scales down all dimensions of its input in order to stop the average squared output from exceeding one. The performance of our proposed nonlinearities is compared with maxout, rectified linear units (ReLU), tanh units, and a discriminatively trained SGMM/HMM system; our p-norm units with p equal to 2 are found to perform best.
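The units named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so the definitions below are assumptions: p-norm is taken as the p-norm of each non-overlapping group of inputs, soft-maxout as the log-sum-exp of each group, and the normalization layer as a division by the root-mean-square of the input whenever that value exceeds one. The function names and the `group_size` parameter are hypothetical and chosen only for illustration.

```python
import math


def _groups(x, group_size):
    """Split x into consecutive, non-overlapping groups of equal size."""
    if group_size <= 0 or len(x) % group_size != 0:
        raise ValueError("len(x) must be a positive multiple of group_size")
    return [x[i:i + group_size] for i in range(0, len(x), group_size)]


def maxout(x, group_size):
    """Plain maxout: keep the largest value in each group."""
    return [max(g) for g in _groups(x, group_size)]


def p_norm(x, group_size, p=2.0):
    """Assumed p-norm unit: (sum_i |x_i|^p)^(1/p) over each group."""
    return [sum(abs(v) ** p for v in g) ** (1.0 / p)
            for g in _groups(x, group_size)]


def soft_maxout(x, group_size):
    """Assumed soft-maxout unit: log(sum_i exp(x_i)) over each group."""
    out = []
    for g in _groups(x, group_size):
        m = max(g)  # subtract the max before exponentiating to avoid overflow
        out.append(m + math.log(sum(math.exp(v - m) for v in g)))
    return out


def normalize(x):
    """Assumed normalization layer: scale the whole vector down only when
    its mean squared value exceeds one, so the output's mean square is <= 1."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    scale = 1.0 / rms if rms > 1.0 else 1.0
    return [v * scale for v in x]


if __name__ == "__main__":
    hidden = [3.0, -4.0, 0.5, -0.5]
    print(maxout(hidden, 2))
    print(p_norm(hidden, 2, p=2.0))
    print(soft_maxout(hidden, 2))
    print(normalize(hidden))
```

Under these assumptions, maxout, p-norm, and soft-maxout each reduce a group of inputs to a single output, so a layer with `group_size` of 2 halves its dimension. The normalization step leaves small activations untouched and only rescales vectors whose mean square is above one.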