A study of various composite kernels for kernel eigenvoice speaker adaptation
- 28 September 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when the amount of adaptation data is small, say, less than 10 seconds. In traditional eigenvoice (EV) speaker adaptation, linear principal component analysis (PCA) is used to derive the eigenvoices. Recently, we proposed that eigenvoices found by nonlinear kernel PCA could be more effective, and the eigenvoices thus derived were called kernel eigenvoices (KEV). One of our novelties is the use of a composite kernel, which makes it possible to compute state observation likelihoods via kernel functions. We investigate two different composite kernels for KEV adaptation: the direct sum kernel and the tensor product kernel. In an evaluation on the TIDIGITS task, KEV speaker adaptation is found to be equally effective with either form of composite kernel, and both outperform a speaker-independent model and the adapted models from EV, MAP, or MLLR adaptation using 2.1 s and 4.1 s of speech. For example, with 2.1 s of adaptation data, KEV adaptation outperforms the speaker-independent model by 27.5%, whereas EV, MAP, and MLLR adaptations are not effective at all.
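The abstract names the two composite kernels without giving their form. As a minimal sketch, the textbook definitions are a sum and a product of constituent kernels, each evaluated on one segment of the speaker supervector: k(x, y) = Σ_r k_r(x_r, y_r) for the direct sum and k(x, y) = Π_r k_r(x_r, y_r) for the tensor product. The Gaussian RBF constituent kernel, the `beta` width, the equal-length segmentation, and all function names below are assumptions made for illustration; they are not taken from the paper.

```python
import math


def rbf_kernel(x, y, beta=0.1):
    """Gaussian RBF kernel between two equal-length vectors.

    Assumed constituent kernel; the abstract does not say which one is used.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-beta * sq_dist)


def split_supervector(supervector, num_parts):
    """Split a flat supervector into num_parts equal-length segments.

    Each segment stands in for one constituent (for example, one HMM state).
    Equal segment lengths are an assumption of this sketch.
    """
    if num_parts <= 0 or len(supervector) % num_parts != 0:
        raise ValueError("supervector length must be a multiple of num_parts")
    size = len(supervector) // num_parts
    return [supervector[i * size:(i + 1) * size] for i in range(num_parts)]


def direct_sum_kernel(x, y, num_parts, beta=0.1):
    """Direct sum composite kernel: sum of the constituent kernel values."""
    xs = split_supervector(x, num_parts)
    ys = split_supervector(y, num_parts)
    return sum(rbf_kernel(a, b, beta) for a, b in zip(xs, ys))


def tensor_product_kernel(x, y, num_parts, beta=0.1):
    """Tensor product composite kernel: product of the constituent kernel values."""
    xs = split_supervector(x, num_parts)
    ys = split_supervector(y, num_parts)
    return math.prod(rbf_kernel(a, b, beta) for a, b in zip(xs, ys))


# Usage: two toy 4-dimensional "supervectors", each made of 2 segments.
speaker_a = [0.0, 1.0, 2.0, 3.0]
speaker_b = [0.5, 1.0, 2.0, 2.0]
k_sum = direct_sum_kernel(speaker_a, speaker_b, num_parts=2)
k_prod = tensor_product_kernel(speaker_a, speaker_b, num_parts=2)
```

In both forms every constituent kernel value remains individually available, which is the property the abstract relies on when it says the composite kernel makes state observation likelihoods computable via kernel functions. With RBF constituents of equal width, the tensor product reduces to a single RBF kernel over the whole supervector, whereas the direct sum does not.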
This publication has 8 references indexed in Scilit:
- A database for speaker-independent digit recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- The Pre-Image Problem in Kernel Methods. IEEE Transactions on Neural Networks, 2004
- Face recognition using eigenfaces. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Sparse Kernel Feature Analysis. Published by Springer Science and Business Media LLC, 2002
- Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 2000
- Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1998
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech & Language, 1995
- Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 1994