Transform representation of the spectra of acoustic speech segments with applications. I. General approach and application to speech recognition

Abstract
We present in this series of two papers a new approach for modeling and capturing the time-varying structure of the spectral envelope of speech. In this approach, we use an acoustic subword decomposition and the Karhunen-Loeve transform (KLT) to extract and efficiently represent the highly correlated structure of the spectral envelope. Integration of the KLT with acoustic subword modeling is a novel approach that concisely represents both steady-state and dynamic features of the spectra in a unified framework that very effectively captures acoustic-phonetic patterns. The organization of these two papers is as follows: the first paper, Part I, presents the physiological and perceptual basis for the approach, the frame-based and acoustic-subword-based spectral representation, and applications to speaker-dependent recognition. The performance of the recognition algorithm based on this approach compares favorably to other existing techniques. Part II will present a frequency-domain coding technique by analysis/synthesis. This application of the new method produces good-quality speech at low bit rates.
