Improving temporal representation in TDNN structure for phoneme recognition

Abstract
The authors address the problem of increasing the amount of temporal information that a time delay neural network (TDNN) can extract in speech recognition. In addition to input time windows, frequency windows are considered for connection to the hidden units. The frequency windows let the lowest layer of the network extract more information, such as the change in the energy content of the speech signal over time. The approach was verified by designing an unvoiced stop consonant classifier and evaluating it on continuous speech. Results are presented that demonstrate the viability of the approach.
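
The idea of connecting each hidden unit to a window in both time and frequency can be illustrated with a minimal sketch. The abstract does not give the network's dimensions or exact connectivity, so everything here is an assumption made for illustration: the function name `windowed_hidden_layer`, the sigmoid activation, the array sizes, and the use of a single shared weight patch per hidden unit. It should not be read as the authors' implementation.

```python
import numpy as np

def windowed_hidden_layer(spec, weights, bias):
    """Slide shared (freq_win x time_win) weight patches over a spectrogram.

    spec:    (n_freq, n_frames) array of band energies
    weights: (n_units, freq_win, time_win) shared weights (hypothetical layout)
    bias:    (n_units,) one bias per hidden unit
    returns: (n_units, n_freq - freq_win + 1, n_frames - time_win + 1)
    """
    n_units, freq_win, time_win = weights.shape
    # Every freq_win x time_win patch of the input, shape (F', T', freq_win, time_win).
    patches = np.lib.stride_tricks.sliding_window_view(spec, (freq_win, time_win))
    # Weighted sum of each patch for each hidden unit, then add the bias.
    pre = np.einsum("ftij,uij->uft", patches, weights) + bias[:, None, None]
    # Sigmoid activation (an assumption; the abstract does not name one).
    return 1.0 / (1.0 + np.exp(-pre))

rng = np.random.default_rng(0)
spec = rng.random((16, 15))  # 16 frequency bands, 15 frames (illustrative sizes)

# Time window only: each unit sees all 16 bands over 3 consecutive frames.
w_time = 0.1 * rng.normal(size=(8, 16, 3))
h_time = windowed_hidden_layer(spec, w_time, np.zeros(8))

# Time and frequency windows: each unit sees 4 adjacent bands over 3 frames.
w_tf = 0.1 * rng.normal(size=(8, 4, 3))
h_tf = windowed_hidden_layer(spec, w_tf, np.zeros(8))

print(h_time.shape, h_tf.shape)
```

With a window spanning all bands, the hidden layer keeps only a time axis; with a narrower frequency window, it also keeps a frequency axis, so later layers can see how energy in each band region changes across frames.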
