Speaker Adaptation 2

Eigenvoices for Speaker Adaptation

Authors:

Roland Kuhn, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Patrick Nguyen, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Jean-Claude Junqua, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Lloyd Goldwasser, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Nancy Niedzielski, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Steven Fincke, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Ken Field, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Matteo Contolini, Panasonic Technologies Inc., Speech Technology Laboratory (USA)

Paper number 303

Abstract:

We have devised a new class of fast adaptation techniques for speech recognition. These techniques are based on prior knowledge of speaker variation, obtained by applying Principal Component Analysis (PCA) or a similar technique to T vectors of dimension D derived from T speaker-dependent models. This offline step yields T basis vectors called "eigenvoices". We constrain the model for new speaker S to be located in the space spanned by the first K eigenvoices. Speaker adaptation involves estimating the K eigenvoice coefficients for the new speaker; typically, K is very small compared to D. We conducted mean adaptation experiments on the Isolet database. With a large amount of supervised adaptation data, most eigenvoice techniques performed slightly better than MAP or MLLR; with small amounts of supervised adaptation data or for unsupervised adaptation, some eigenvoice techniques performed much better. We believe that the eigenvoice approach would yield rapid adaptation for most speech recognition systems.
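
Below is a minimal numpy sketch of the eigenvoice idea described in the abstract: speaker-dependent models are flattened into supervectors, PCA (here via SVD) extracts the eigenvoice basis offline, and adaptation then estimates only the K coefficients for a new speaker. The supervector construction, the variable names, and the least-squares fit standing in for the paper's maximum-likelihood coefficient estimate are illustrative assumptions, not the authors' exact procedure.

import numpy as np

# Offline step: T speaker-dependent models, each flattened into a D-dimensional
# "supervector" (e.g. all Gaussian mean vectors concatenated).  Toy shapes only.
T, D, K = 50, 2000, 5                       # speakers, supervector dimension, retained eigenvoices
rng = np.random.default_rng(0)
supervectors = rng.normal(size=(T, D))      # placeholder for real SD-model supervectors

mean_voice = supervectors.mean(axis=0)
# PCA via SVD: rows of Vt are the principal directions ("eigenvoices").
_, _, Vt = np.linalg.svd(supervectors - mean_voice, full_matrices=False)
eigenvoices = Vt[:K]                        # (K, D) basis spanning the speaker space

# Adaptation: constrain the new speaker's model to mean_voice + w @ eigenvoices and
# estimate only the K coefficients w (least squares here stands in for the ML estimate).
new_speaker_stats = rng.normal(size=D)      # placeholder for statistics from adaptation data
w, *_ = np.linalg.lstsq(eigenvoices.T, new_speaker_stats - mean_voice, rcond=None)
adapted_supervector = mean_voice + w @ eigenvoices   # K free parameters instead of D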

SL980303.PDF (From Author) SL980303.PDF (Rasterized)

Speaker Clustering Using Direct Maximisation of the MLLR-Adapted Likelihood

Authors:

Sue E. Johnson, Cambridge University (U.K.)
Philip C. Woodland, Cambridge University (U.K.)

Paper number 726

Abstract:

In this paper, speaker clustering schemes are investigated in the context of improving unsupervised adaptation for broadcast news transcription. The various techniques are presented within a framework of top-down split-and-merge clustering. Since these schemes are to be used for MLLR-based adaptation, a natural evaluation metric for clustering is the increase in data likelihood from adaptation. Two types of cluster splitting criteria have been used: the first minimises a covariance-based distance measure, while for the second we introduce a two-step E-M type procedure that forms clusters which directly maximise the likelihood of the adapted data. It is shown that the direct maximisation technique produces a higher data likelihood and also gives a reduction in word error rate.
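
The sketch below illustrates the two-step E-M style loop in simplified form: each speaker is reassigned to the cluster whose adapted model gives its data the highest likelihood, then each cluster's adaptation is re-estimated from its members. A single diagonal Gaussian with a mean-offset "adaptation" stands in for the paper's HMMs and full MLLR transforms, and all data and names are toy assumptions.

import numpy as np

rng = np.random.default_rng(1)
dim, n_clusters = 13, 2
# Ten "speakers", each a block of feature frames drawn around a speaker-specific mean.
speakers = [rng.normal(loc=rng.normal(size=dim), size=(100, dim)) for _ in range(10)]
base_mean, base_var = np.zeros(dim), np.ones(dim)

def adapted_loglik(frames, offset):
    # Log-likelihood of the frames under the base Gaussian shifted by the cluster offset.
    diff = frames - (base_mean + offset)
    return -0.5 * np.sum(diff ** 2 / base_var + np.log(2 * np.pi * base_var))

assignments = rng.integers(n_clusters, size=len(speakers))
for _ in range(10):
    # "M-step": re-estimate each cluster's adaptation (here an ML mean offset).
    offsets = []
    for c in range(n_clusters):
        member_data = [s for s, a in zip(speakers, assignments) if a == c]
        data = np.vstack(member_data) if member_data else np.zeros((1, dim))
        offsets.append(data.mean(axis=0) - base_mean)
    # "E-step": move each speaker to the cluster whose adapted model fits its data best.
    assignments = np.array(
        [int(np.argmax([adapted_loglik(s, off) for off in offsets])) for s in speakers]
    )
print(assignments)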

SL980726.PDF (From Author) SL980726.PDF (Rasterized)

Incremental On-Line Speaker Adaptation in Adverse Conditions

Authors:

Olli Viikki, Nokia Research Center, Speech and Audio Systems Laboratory (Finland)
Kari Laurila, Nokia Research Center, Speech and Audio Systems Laboratory (Finland)

Paper number 313

Abstract:

In this paper, we examine the use of speaker adaptation in adverse noise conditions. In particular, we focus on incremental on-line speaker adaptation since, in addition to its other advantages, it enables joint speaker and environment adaptation. First, we show that on-line adaptation is superior to off-line adaptation when realistic changing noise conditions are considered. Next, we show that a conventional left-to-right HMM structure is not well suited for on-line adaptation in variable noise conditions because of unreliable state-frame alignments of noisy utterances. To overcome this problem, we suggest the use of state-duration-constrained HMMs. Our experimental results indicate that the performance gain due to adaptation is much greater with duration-constrained HMMs than with conventional left-to-right HMMs. In addition to the choice of model structure, we point out that in long-term adaptation, such as incremental on-line adaptation, a supervised approach is a necessity.
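
As a rough illustration of the incremental on-line setting, the sketch below updates a single Gaussian mean utterance by utterance using a generic MAP-style interpolation between the prior mean and accumulated frame statistics. The class, parameter names, and update rule are assumptions chosen for illustration; the paper's duration-constrained HMM alignment is not reproduced here.

import numpy as np

class OnlineMeanAdapter:
    # Incrementally adapts one Gaussian mean as supervised utterances arrive.
    def __init__(self, prior_mean, tau=10.0):
        self.prior_mean = np.asarray(prior_mean, dtype=float)
        self.mean = self.prior_mean.copy()
        self.tau = tau                                   # prior weight: larger => more cautious updates
        self.occupancy = 0.0                             # accumulated frame count for this Gaussian
        self.frame_sum = np.zeros_like(self.prior_mean)  # accumulated sum of aligned frames

    def update(self, frames):
        # Fold in the frames aligned to this Gaussian from the latest utterance.
        frames = np.atleast_2d(frames)
        self.occupancy += len(frames)
        self.frame_sum += frames.sum(axis=0)
        # MAP-style interpolation between the prior mean and the accumulated sample mean.
        self.mean = (self.tau * self.prior_mean + self.frame_sum) / (self.tau + self.occupancy)
        return self.mean

adapter = OnlineMeanAdapter(prior_mean=np.zeros(13))
rng = np.random.default_rng(2)
for utterance in range(5):                               # utterances processed one at a time
    aligned_frames = rng.normal(loc=0.5, size=(40, 13))  # frames assigned to this Gaussian
    adapter.update(aligned_frames)
print(adapter.mean[:3])                                  # drifts from 0.0 toward the observed 0.5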

SL980313.PDF (From Author) SL980313.PDF (Rasterized)

Cluster Adaptive Training for Speech Recognition

Authors:

Mark J.F. Gales, IBM Almaden Research Center (USA)

Paper number 375

Abstract:

When performing speaker adaptation there are two conflicting requirements. First, the transform must be powerful enough to model the speaker. Second, the transform should be rapidly estimated for any particular speaker. Recently the most popular adaptation schemes have used many parameters to adapt the models. This paper examines cluster adaptive training (CAT), an adaptation scheme requiring few parameters to adapt the models. It may be viewed as a simple extension to speaker clustering: a linear interpolation of the cluster means is used as the mean for the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given, along with re-estimation formulae for the cluster means, represented both explicitly and by sets of transforms of some canonical mean. On a speaker-independent task, CAT reduced the word error rate using very little adaptation data, compared to a standard speaker-independent model set.
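
The sketch below shows a closed-form maximum-likelihood estimate of the interpolation weights in the standard CAT form: per-Gaussian cluster means, diagonal covariances, and frame occupancies are accumulated into normal equations and solved for one speaker's weight vector. The toy statistics and variable names are assumptions, and explicit cluster means are used rather than the transform-based representation the paper also covers.

import numpy as np

rng = np.random.default_rng(3)
n_gauss, n_clusters, dim = 8, 3, 13
M = rng.normal(size=(n_gauss, dim, n_clusters))       # M[m][:, c] = cluster-c mean of Gaussian m
inv_var = np.ones((n_gauss, dim))                     # diagonal inverse covariances
occ = rng.uniform(1.0, 50.0, size=n_gauss)            # frame occupancy of each Gaussian (toy values)
obs_sum = rng.normal(size=(n_gauss, dim)) * occ[:, None]   # posterior-weighted sum of frames per Gaussian

# Accumulate the normal equations G w = k over all Gaussians:
#   G = sum_m occ_m * M_m^T Sigma_m^-1 M_m,   k = sum_m M_m^T Sigma_m^-1 obs_sum_m
G = np.zeros((n_clusters, n_clusters))
k = np.zeros(n_clusters)
for m in range(n_gauss):
    Mi = M[m] * inv_var[m][:, None]                   # Sigma_m^-1 M_m (diagonal covariance)
    G += occ[m] * M[m].T @ Mi
    k += Mi.T @ obs_sum[m]
weights = np.linalg.solve(G, k)                       # ML interpolation weights for this speaker
speaker_means = np.einsum('mdc,c->md', M, weights)    # adapted mean of every Gaussian
print(weights)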

SL980375.PDF (From Author) SL980375.PDF (Rasterized)
