Deep maxout neural networks for speech recognition
- 1 December 2013
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
A recently introduced type of neural network called maxout has worked well in many domains. In this paper, we propose to apply maxout to acoustic models in speech recognition. The maxout neuron picks the maximum value within a group of linear pieces as its activation. This nonlinearity is a generalization of the rectified linear nonlinearity and can approximate activation functions of arbitrary form. We apply maxout networks to the Switchboard phone-call transcription task and evaluate their performance under both a 24-hour low-resource condition and a 300-hour core condition. Experimental results demonstrate that maxout networks converge faster, generalize better, and are easier to optimize than rectified linear networks and sigmoid networks. Furthermore, experiments show that maxout networks reduce underfitting and are able to achieve good results without dropout training. Under both conditions, maxout networks yield relative improvements of 1.1-5.1% over rectified linear networks and 2.6-14.5% over sigmoid networks on benchmark test sets.
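The maxout activation described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions (variable names, shapes, and the grouping layout are not from the paper): each output unit computes k linear pieces and keeps the maximum.

```python
import numpy as np

def maxout(x, W, b, k):
    """Maxout activation for a single layer.

    x: input vector of shape (d,)
    W: weight matrix of shape (k*m, d) -- k linear pieces per output unit
    b: bias vector of shape (k*m,)
    k: number of linear pieces in each group
    Returns an output vector of shape (m,).
    """
    z = W @ x + b                        # all k*m linear pieces at once
    return z.reshape(-1, k).max(axis=1)  # max within each group of k pieces

# Tiny example: 2 output units, k=3 pieces each, 4-dimensional input
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
h = maxout(x, W, b, k=3)   # shape (2,)
```

Note that the rectified linear unit is a special case: with k=2 and one piece fixed at zero weights and bias, the group maximum reduces to max(w·x + b, 0), which is why the abstract calls maxout a generalization of the rectified linear nonlinearity.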