Session Th1A Speaker Adaptation I

Chairperson: Harald Hoege, Siemens AG, Germany



COMBINED ON-LINE MODEL ADAPTATION AND BAYESIAN PREDICTIVE CLASSIFICATION FOR ROBUST SPEECH RECOGNITION

Authors: Qiang Huo (1) and Chin-Hui Lee (2)

(1) ATR Interpreting Telecommunications Research Labs., 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan (2) Multimedia Communications Research Lab, Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974, USA

Volume 4 pages 1847 - 1850

ABSTRACT

In this paper, we study a class of robust automatic speech recognition problems in which mismatches between training and testing conditions exist but accurate knowledge of the mismatch mechanism is unavailable. The only available information is the test data, along with a set of pretrained speech models and the decision parameters. We compensate for the abovementioned mismatches by jointly adopting a dynamic system design strategy, called on-line Bayesian adaptation, to incrementally improve the estimation of the model parameters used in the recognizer, and a robust decision strategy, called Bayesian predictive classification, to average over the remaining uncertainty in the model parameters. We report on a series of experimental results that show the viability and effectiveness of the proposed method.
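The core idea of Bayesian predictive classification can be sketched as follows: instead of scoring each class with a single point estimate of its parameters, average the likelihood over several plausible parameter settings, approximating the predictive density. The Gaussian toy problem and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of a scalar observation under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def bpc_decision(x, class_params):
    """Bayesian predictive classification (sketch): average the likelihood
    over several plausible parameter samples per class, rather than trusting
    one point estimate, then pick the class with the highest average."""
    def predictive(c):
        liks = [math.exp(gaussian_loglik(x, m, v)) for (m, v) in class_params[c]]
        return sum(liks) / len(liks)
    return max(class_params, key=predictive)

# Hypothetical two-class problem: each class carries a few (mean, variance)
# samples reflecting uncertainty about the true (mismatched) test condition.
params = {
    "A": [(0.0, 1.0), (0.5, 1.5)],
    "B": [(3.0, 1.0), (2.5, 1.2)],
}
print(bpc_decision(0.3, params))  # observation near class A
```

Averaging over parameter uncertainty makes the decision rule less brittle when the point estimates are unreliable, which is exactly the mismatched-condition setting the abstract describes.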

A0076.pdf



SPEAKER ADAPTIVE TRAINING APPLIED TO CONTINUOUS MIXTURE DENSITY MODELING

Authors: Xavier Aubert, Eric Thelen

Philips GmbH Forschungslaboratorien Aachen, P.O. Box 50 01 45, D-52085 Aachen, Germany E-mail: {aubert,thelen}@pfa.research.philips.com

Volume 4 pages 1851 - 1854

ABSTRACT

Speaker Adaptive Training (SAT) has been investigated for mixture density estimation and applied to large vocabulary continuous speech recognition. SAT integrates MLLR adaptation into HMM training and aims at reducing inter-speaker variability to obtain enhanced speaker-independent models. Starting from BBN's work on compact models, we derive a one-pass Viterbi formulation of SAT that performs joint estimation of MLLR-based transformations and density parameters. The computational complexity is analyzed and an approximation based on inverse affine transformations is discussed. Compared to applying MLLR to standard SI models, our experiments show lower error rates as well as reduced decoding costs, for both supervised batch and unsupervised incremental adaptation. In the latter case, it is shown that the enrollment of a new speaker can be sped up by selecting, from among the transformations estimated for the training speakers, the one that best fits the first test utterance.
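The MLLR mean transform at the heart of SAT, and the inverse-affine view mentioned above, can be sketched in a few lines. The notation and values here are illustrative assumptions: each speaker maps a shared mean through an affine transform, and the inverse transform moves speaker-space observations back into the canonical model space.

```python
import numpy as np

def transform_mean(mu, A, b):
    """Speaker-specific MLLR mean: mu_s = A @ mu + b."""
    return A @ mu + b

def normalize_observation(x, A, b):
    """Inverse-affine view: map a speaker-space point back to the canonical
    model space, x_canonical = A^{-1} @ (x - b)."""
    return np.linalg.solve(A, x - b)

# Hypothetical 2-D transform and canonical mean.
A = np.array([[1.1, 0.0], [0.1, 0.9]])
b = np.array([0.5, -0.2])
mu = np.array([1.0, 2.0])

mu_s = transform_mean(mu, A, b)
# Round-tripping a speaker-space point recovers the canonical mean exactly.
print(np.allclose(normalize_observation(mu_s, A, b), mu))  # True
```

Normalizing observations with the inverse transform, rather than transforming every density mean, is what makes the approximation attractive computationally: one transform per speaker instead of one per Gaussian.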

A0090.pdf



SPEAKER NORMALIZATION TRAINING FOR MIXTURE STOCHASTIC TRAJECTORY MODEL

Authors: Irina Illina, Yifan Gong

CRIN/CNRS & INRIA-Lorraine, B.P. 239, 54506 Vandoeuvre-lès-Nancy, France, illina@loria.fr; Speech Research, Media Technologies Laboratory, Texas Instruments, Dallas, TX 75265, USA, Yifan.Gong@ti.com

Volume 4 pages 1855 - 1858

ABSTRACT

In this paper we are interested in speaker and environment adaptation techniques for speaker-independent (SI) continuous speech recognition. These techniques are used to reduce the mismatch between training and testing conditions, using a small amount of adaptation data. In addition to reducing this mismatch during adaptation, we propose to reduce the variation due to speakers or environments during training itself, in the context of a Speaker Normalisation (SN) approach, using the MLLR transformation. SN also combines context-dependent, phone-dependent, and broad phonetic class dependent information. The use of linear regression to model broad phonetic class dependent information allows the model to be used even when adaptation or training data is not available for some phonetic symbols. SN is developed for the Mixture Stochastic Trajectory Model, a segment-based model. The approach can be used for speaker, gender, or environment normalization. We show the performance of SN compared to SI recognition and to MLLR speaker adaptation, through experiments on continuous speech recognition.
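The broad-phonetic-class fallback described above can be sketched as a simple lookup: prefer a phone-specific transform when one was estimated, otherwise fall back to the transform of the phone's broad class. The class names and transform placeholders below are hypothetical, chosen only to illustrate the mechanism.

```python
# Hypothetical mapping from phones to broad phonetic classes.
BROAD_CLASS = {"b": "stop", "d": "stop", "s": "fricative", "z": "fricative"}

def pick_transform(phone, phone_transforms, class_transforms):
    """Prefer a phone-specific MLLR transform; otherwise use the broad-class
    one, so every phone can be normalized even with sparse adaptation data."""
    if phone in phone_transforms:
        return phone_transforms[phone]
    return class_transforms[BROAD_CLASS[phone]]

# Only /b/ had enough adaptation data for its own transform (placeholders
# stand in for actual (A, b) affine-transform parameters).
phone_tf = {"b": ("A_b", "b_b")}
class_tf = {"stop": ("A_stop", "b_stop"), "fricative": ("A_fric", "b_fric")}

print(pick_transform("b", phone_tf, class_tf))  # phone-specific transform
print(pick_transform("d", phone_tf, class_tf))  # falls back to "stop" class
```

This tiered sharing is what lets the normalization cover phonetic symbols that never appear in the adaptation data.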

A0222.pdf



ON-LINE ADAPTATION OF HIDDEN MARKOV MODELS USING INCREMENTAL ESTIMATION ALGORITHMS

Authors: V. Digalakis

Dept. of Electronics & Computer Engineering Technical University of Crete 73100 Chania, Crete, GREECE vas@telecom.tuc.gr

Volume 4 pages 1859 - 1862

ABSTRACT

The mismatch that frequently occurs between the training and testing conditions of an automatic speech recognizer can be efficiently reduced by adapting the parameters of the recognizer to the testing conditions. The maximum likelihood adaptation algorithms for continuous-density hidden-Markov-model (HMM) based speech recognizers are fast, in the sense that a small amount of data is required for adaptation. They are, however, based on reestimating the model parameters using the batch version of the expectation-maximization (EM) algorithm. The multiple iterations required for the EM algorithm to converge make these adaptation schemes computationally expensive and not suitable for on-line applications, since multiple passes through the adaptation data are required. In this paper we show how incremental versions of the EM and the segmental k-means algorithm can be used to improve the convergence of these adaptation methods so that they can be used in on-line applications.
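The batch-versus-incremental distinction can be sketched for a single Gaussian mean: batch EM waits for all adaptation data before re-estimating, while the incremental version folds each new observation's sufficient statistics in immediately, so no second pass over the data is needed. This is a toy sketch under assumed notation, not the paper's full HMM formulation.

```python
class IncrementalGaussianMean:
    """Incremental sufficient-statistic update for one Gaussian mean."""

    def __init__(self, prior_mean, prior_count=1.0):
        # Seed the statistics with the pretrained (speaker-independent) mean,
        # weighted by a pseudo-count, so early updates stay stable.
        self.count = prior_count
        self.total = prior_count * prior_mean

    def update(self, x, gamma=1.0):
        """Fold in one observation x with posterior occupancy gamma and
        re-estimate the mean on the spot (no second pass over the data)."""
        self.count += gamma
        self.total += gamma * x
        return self.total / self.count

m = IncrementalGaussianMean(prior_mean=0.0, prior_count=2.0)
for x in [1.0, 1.0, 1.0]:
    mean = m.update(x)
print(round(mean, 2))  # the adapted mean drifts from 0.0 toward 1.0
```

Because each update is O(1) per observation, the model is usable after every utterance, which is the property that makes the scheme suitable for on-line applications.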

A0241.pdf



MODELING DEPENDENCY IN ADAPTATION OF ACOUSTIC MODELS USING MULTISCALE TREE PROCESSES

Authors: Ashvin Kannan and Mari Ostendorf

Electrical and Computer Engineering Department, Boston University, 44 Cummington Street, Boston, MA 02215, USA http://raven.bu.edu/{ashvin,mo}

Volume 4 pages 1863 - 1866

ABSTRACT

To adapt the large number of parameters in a speech recognition acoustic model with a small amount of data, some notion of parameter dependence is needed. We present a dependence model to relate parameters in a parsimonious framework using a Gaussian multiscale process defined by the evolution of a linear stochastic dynamical system on a tree. To adapt all classes from all adaptation data, we formulate adaptation as optimal smoothing of the tree process. This approach is used to adapt two types of models: Gaussians, and Gaussian processes (segment models) characterized by a polynomial mean trajectory. Recognition results presented on the Switchboard corpus show improvements in supervised and unsupervised modes.
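The borrow-strength idea behind tree-structured adaptation can be sketched with a simple top-down smoothing pass (an assumed simplification, not the paper's exact linear-dynamical-system formulation): each node shrinks its own data estimate toward its parent's smoothed estimate, so classes with little or no adaptation data inherit from coarser ancestors.

```python
def smooth_tree(node, parent_mean, tau=5.0):
    """node = {"count": n, "mean": data mean or None, "children": [...]}.
    tau is a shrinkage pseudo-count toward the parent's smoothed estimate."""
    n = node["count"]
    if n > 0:
        # Blend the node's own data mean with the parent estimate.
        node["smoothed"] = (n * node["mean"] + tau * parent_mean) / (n + tau)
    else:
        node["smoothed"] = parent_mean  # no data: inherit the parent estimate
    for child in node.get("children", []):
        smooth_tree(child, node["smoothed"], tau)

# Hypothetical two-level tree: one well-observed class, one unseen class.
root = {"count": 100, "mean": 0.0, "children": [
    {"count": 20, "mean": 1.0, "children": []},
    {"count": 0, "mean": None, "children": []},   # unseen class
]}
smooth_tree(root, parent_mean=0.0)
print(root["children"][0]["smoothed"], root["children"][1]["smoothed"])
```

The well-observed child keeps most of its own estimate, while the unseen child falls back entirely on its parent, which is how all classes get adapted from all of the adaptation data.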

A0403.pdf



ACOUSTIC CLUSTERING AND ADAPTATION FOR ROBUST SPEECH RECOGNITION

Authors: Larry Heck and Ananth Sankar

Speech Technology And Research Laboratory, SRI International, Menlo Park, CA {heck,sankar}@speech.sri.com

Volume 4 pages 1867 - 1870

ABSTRACT

We describe an algorithm based on acoustic clustering and acoustic adaptation that significantly improves speech recognition performance. The method is particularly useful when speech from multiple speakers is to be recognized and the boundaries between speakers are not known. We assume that each test data segment is relatively homogeneous with respect to the acoustic background and speaker. These segments are then grouped using an agglomerative acoustic clustering algorithm, so that acoustically similar test segments fall in the same cluster. The speech recognition models are then adapted separately to each test data cluster, and the adapted models are used to recognize the data from that cluster. This algorithm was used in SRI's system for the 1996 DARPA Hub4 partitioned evaluation. Experimental results are presented on the 1996 H4 development data set, where an improvement of 9.5% was achieved using this algorithm.
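The clustering step can be sketched as standard bottom-up agglomeration: repeatedly merge the two closest clusters until the closest remaining pair is farther apart than a threshold. The scalar segment summaries, the centroid distance, and the stopping threshold below are illustrative assumptions, not SRI's actual acoustic similarity measure.

```python
def agglomerate(segments, threshold):
    """Agglomerative clustering sketch. segments: a list of 1-D acoustic
    summaries (e.g., segment mean features, here plain scalars)."""
    clusters = [[s] for s in segments]

    def centroid(c):
        return sum(c) / len(c)

    while len(clusters) > 1:
        # Find the closest pair of cluster centroids.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: abs(centroid(clusters[ab[0]]) - centroid(clusters[ab[1]])),
        )
        if abs(centroid(clusters[i]) - centroid(clusters[j])) > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Segments from two hypothetical speakers separate into two clusters;
# each cluster would then get its own adapted recognition models.
print(agglomerate([0.1, 0.2, 5.0, 5.3], threshold=1.0))
```

Each resulting cluster pools enough homogeneous data to adapt the models reliably, even though the speaker boundaries were never given.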

A1237.pdf
