Abstract
Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when the amount of adaptation data is small, say, less than 10 seconds. In traditional eigenvoice (EV) speaker adaptation, linear principal component analysis (PCA) is used to derive the eigenvoices. Recently, we proposed that eigenvoices found by nonlinear kernel PCA could be more effective, and we called the eigenvoices thus derived kernel eigenvoices (KEV). One of the novelties of our approach is the use of a composite kernel that makes it possible to compute state observation likelihoods via kernel functions. In this paper, we investigate two composite kernels for KEV adaptation: the direct sum kernel and the tensor product kernel. In an evaluation on the TIDIGITS task, KEV speaker adaptation with either form of composite kernel is found to be equally effective, and both outperform a speaker-independent model as well as models adapted by EV, MAP, or MLLR adaptation using 2.1s and 4.1s of speech. For example, with 2.1s of adaptation data, KEV adaptation outperforms the speaker-independent model by 27.5%, whereas EV, MAP, and MLLR adaptations are not effective at all.
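For readers unfamiliar with composite kernels, a minimal sketch of the two standard constructions referred to above is given here, assuming two base kernels $k_1$ and $k_2$ defined on the two parts of a partitioned input; the exact base kernels and feature partitioning used for KEV adaptation are specified in the body of the paper, so the forms below are illustrative only:

$$k_{\oplus}\bigl((x_1,x_2),(y_1,y_2)\bigr) = k_1(x_1,y_1) + k_2(x_2,y_2), \qquad k_{\otimes}\bigl((x_1,x_2),(y_1,y_2)\bigr) = k_1(x_1,y_1)\,k_2(x_2,y_2).$$

Both constructions yield valid kernels, since sums and products of positive semi-definite kernels remain positive semi-definite.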