Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario

18 August 2011

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Speech and Language Processing

Vol. 7 (4), 1-22
https://doi.org/10.1145/1998384.1998386

Abstract

In this article, we focus on keyword detection in children's speech as it is needed in voice command systems. We use the FAU Aibo Emotion Corpus, which contains emotionally colored spontaneous children's speech recorded in a child-robot interaction scenario, and investigate various recent keyword spotting techniques. As the principle of bidirectional Long Short-Term Memory (BLSTM) is known to be well-suited for context-sensitive phoneme prediction, we incorporate a BLSTM network into a Tandem model for flexible coarticulation modeling in children's speech. Our experiments reveal that the Tandem model prevails over a triphone-based Hidden Markov Model approach.

Funding Information

Seventh Framework Programme (211486 (SEMAINE), IST-2002-50742 (HUMAINE), IST-2001-37599 (PF-STAR))

This publication has 30 references indexed in Scilit.

Cited by 9 articles