Deep maxout neural networks for speech recognition

1 December 2013

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 291-296
https://doi.org/10.1109/asru.2013.6707745

Abstract

A recently introduced type of neural network called maxout has worked well in many domains. In this paper, we propose applying maxout to acoustic models in speech recognition. The maxout neuron takes the maximum value within a group of linear pieces as its activation. This nonlinearity is a generalization of the rectified linear nonlinearity and can approximate any form of activation function. We apply maxout networks to the Switchboard phone-call transcription task and evaluate performance under both a 24-hour low-resource condition and a 300-hour core condition. Experimental results demonstrate that maxout networks converge faster, generalize better, and are easier to optimize than rectified linear networks and sigmoid networks. Furthermore, experiments show that maxout networks reduce underfitting and achieve good results without dropout training. Under both conditions, maxout networks yield relative improvements of 1.1-5.1% over rectified linear networks and 2.6-14.5% over sigmoid networks on benchmark test sets.
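
The maxout activation described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function name maxout_layer, the group_size parameter, and the toy weights are all hypothetical, chosen only to show how a unit takes the maximum over a group of linear pieces and how a rectified linear unit falls out as the special case of two pieces with one fixed at zero.

```python
from typing import List, Sequence


def maxout_layer(x: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 biases: Sequence[float],
                 group_size: int) -> List[float]:
    """Apply one maxout layer to the input vector x (illustrative only).

    weights[j] and biases[j] define the j-th linear piece; each consecutive
    run of `group_size` pieces forms one maxout unit.
    """
    if len(weights) != len(biases) or len(weights) % group_size != 0:
        raise ValueError("piece count must be a multiple of group_size")
    # Linear pieces: z_j = w_j . x + b_j
    pieces = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    # Each unit outputs the maximum over its own group of pieces.
    return [max(pieces[g:g + group_size])
            for g in range(0, len(pieces), group_size)]


def relu(z: float) -> float:
    """Rectified linear activation, used here only for comparison."""
    return max(0.0, z)


# Assumed toy parameters: one unit with two pieces, the second fixed at zero.
w, b = [2.0, -1.0], 0.5
zero_piece = [0.0, 0.0]

for x in ([1.0, 4.0], [3.0, 1.0]):
    pre_activation = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    unit = maxout_layer(x, [w, zero_piece], [b, 0.0], group_size=2)
    # With a constant-zero piece, the maxout unit matches ReLU.
    assert unit == [relu(pre_activation)]
```

In a trained maxout network every piece in a group has learned weights, so each unit computes a piecewise-linear function of its input rather than the fixed shape used in this comparison.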


Cited by 49 articles