Speaker Recognition I

Chair: Douglas A. Reynolds, MIT, USA


Speaker Verification Using Minimum Verification Error Training

Authors:

Aaron E. Rosenberg, AT&T Labs (U.S.A.)
Olivier Siohan, AT&T Labs (U.S.A.)
S. Parthasarathy, AT&T Labs (U.S.A.)

Volume 1, Page 105, Paper number 1767

Abstract:

We propose a Minimum Verification Error (MVE) training scenario to design and adapt an HMM-based speaker verification system. Using the discriminative training paradigm, we show that customer and background models can be jointly estimated so that the expected number of verification errors (false accepts and false rejects) on the training corpus is minimized. An experimental evaluation was carried out on a fixed-password speaker verification task over the telephone network. The evaluation shows that MVE training/adaptation performs as well as MLE training and MAP adaptation when performance is measured by the average individual equal error rate (based on a posteriori threshold assignment). After model adaptation, both approaches lead to an individual equal error rate close to 0.6%. However, experiments performed with a priori dynamic threshold assignment show that MVE-adapted models exhibit false rejection and false acceptance rates 45% lower than MAP-adapted models, and therefore lead to a more robust system for practical applications.

ic981767.pdf (From Postscript)
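The equal error rates above are measured per speaker with an a posteriori threshold, i.e. the threshold is chosen after the test scores are known, and then averaged over speakers. A minimal Python sketch of that measurement, assuming higher scores favour the claimed speaker and that a claim is accepted when its score is at or above the threshold. The function names are hypothetical and this is not the authors' evaluation code:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) by sweeping every observed score as a candidate threshold.

    Accept when score >= threshold. At the threshold where the false acceptance
    and false rejection rates are closest, report their average as the EER.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best = (np.inf, None, None)  # (|FAR - FRR|, EER, threshold)
    for t in thresholds:
        frr = np.mean(genuine < t)     # true speakers wrongly rejected
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]

def average_individual_eer(scores_by_speaker):
    """Mean of per-speaker EERs; values are (genuine_scores, impostor_scores) pairs."""
    return float(np.mean([equal_error_rate(g, i)[0] for g, i in scores_by_speaker.values()]))
```

Because each threshold is picked on the test scores themselves, this figure is optimistic; the a priori threshold experiments described in the abstract fix the threshold before testing.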


Speaker Identification Using Minimum Classification Error Training

Authors:

Olivier Siohan, AT&T Labs (U.S.A.)
Aaron E. Rosenberg, AT&T Labs (U.S.A.)
S. Parthasarathy, AT&T Labs (U.S.A.)

Volume 1, Page 109, Paper number 1783

Abstract:

In this paper we use a Minimum Classification Error (MCE) training paradigm to build a speaker identification system. The training is optimized at the string level for a text-dependent speaker identification task. Experiments performed on a small-set speaker identification task show that MCE training can reduce closed-set identification errors by up to 20-25% over a baseline system trained using Maximum Likelihood Estimation. Further experiments suggest that additional improvement can be obtained by using some additional training data from speakers outside the set of registered speakers, leading to an overall reduction in closed-set identification errors of about 35%.

ic981783.pdf (From Postscript)
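MCE training replaces the 0/1 identification error with a smooth loss that can be minimized by gradient descent. The abstract does not give the exact formulation; the sketch below assumes the commonly used Juang-Katagiri form. For an utterance from speaker c with per-speaker string scores g_k (e.g. HMM log-likelihoods), the misclassification measure is d = -g_c + (1/eta) log[(1/(K-1)) sum over k != c of exp(eta g_k)], and the loss is a sigmoid of d. The parameter names eta, gamma and theta are illustrative assumptions, and only the loss is computed, not the model updates:

```python
import numpy as np

def mce_loss(scores, correct, eta=1.0, gamma=1.0, theta=0.0):
    """Smoothed classification loss for one training string.

    scores:  discriminant scores g_k, one per registered speaker.
    correct: index of the true speaker.
    Returns (loss, d); d > 0 roughly means the string is misidentified.
    """
    scores = np.asarray(scores, dtype=float)
    competitors = np.delete(scores, correct)
    a = eta * competitors
    m = a.max()
    # (1/eta) * log(mean(exp(eta * g_k))) over competitors, computed stably
    soft_best_competitor = (m + np.log(np.mean(np.exp(a - m)))) / eta
    d = -scores[correct] + soft_best_competitor
    loss = 1.0 / (1.0 + np.exp(-gamma * d + theta))
    return float(loss), float(d)
```

As eta grows, the competitor term approaches the best competing score. Training sums this loss over the training strings and adjusts the speaker models to reduce it, which targets identification errors directly rather than likelihood.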


Model Adaptation Methods for Speaker Verification

Authors:

William Mistretta, T-NETIX Inc. (U.S.A.)
Kevin R. Farrell, T-NETIX Inc. (U.S.A.)

Volume 1, Page 113, Paper number 2502

Abstract:

Model adaptation methods for a text-dependent speaker verification system are evaluated in this paper. The speaker verification system represents each enrolled speaker with a discriminant model and a statistical model: a neural tree network and a Gaussian mixture model, respectively. Adaptation methods are evaluated for both modeling approaches. We show that overall system performance with adaptation is comparable to that obtained by retraining the model with the additional data. However, adaptation can be performed in a fraction of the time required to retrain a model. Additionally, we evaluated the adapted and non-adapted models on data recorded six months after the initial enrollment; adaptation reduced the error rate on this aged data by 40%.

ic982502.pdf (Scanned)
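The abstract does not state the adaptation rule used for the Gaussian mixture models. As an illustration only, one widely used option is relevance-MAP adaptation of the component means, which needs a single pass over the new data instead of a full retraining and so shows why adaptation can be much cheaper. A minimal sketch assuming diagonal covariances; the relevance factor of 16 and the function names are assumptions, and neural tree network adaptation is not covered:

```python
import numpy as np

def log_gaussian_diag(x, means, variances):
    """log N(x_t; mean_k, diag(var_k)) for every frame t and component k.

    x is (T, D); means and variances are (K, D); the result is (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Relevance-MAP update of GMM means from adaptation frames x.

    Weights and variances are left unchanged.
    """
    log_post = np.log(weights)[None, :] + log_gaussian_diag(x, means, variances)
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    resp = np.exp(log_post)
    resp /= resp.sum(axis=1, keepdims=True)           # component posteriors per frame
    n = resp.sum(axis=0)                              # soft frame count per component
    data_mean = (resp.T @ x) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]            # weight on new data vs. old model
    return alpha * data_mean + (1.0 - alpha) * means
```

Components that see many adaptation frames move toward the new data, while rarely used components stay close to the enrolled model.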


Robust Model for Speaker Verification Against Session-Dependent Utterance Variation

Authors:

Tomoko Matsui, NTT Human Interface Laboratories (Japan)
Kiyoaki Aikawa, NTT Human Interface Laboratories (Japan)

Volume 1, Page 117, Paper number 1880

Abstract:

This paper investigates a new method for creating speaker models that are robust against utterance variation in continuous-density HMM-based speaker verification. In this method, the distribution of the session-independent features for each speaker is estimated by separately modeling the session-to-session utterance variation as two distinct components: one session-dependent and the other session-independent. In practice, normalization of the session-dependent utterance variation and estimation of the speaker model parameters are performed jointly using a speaker adaptive training algorithm. The resulting speaker models represent session-independent speaker characteristics more accurately, and their discriminative capability increases. In text-independent speaker verification experiments using data from 20 speakers recorded in 7 sessions over 16 months, we show that the proposed method achieves a 15% reduction in the error rate.

ic981880.pdf (From Postscript)


Speaker Verification in Noisy Environments with Combined Spectral Subtraction and Missing Feature Theory

Authors:

Andrzej Drygajlo, EPFL (Switzerland)
Mounir El-Maliki, EPFL (Switzerland)

Volume 1, Page 121, Paper number 2014

Abstract:

In the framework of Gaussian mixture models (GMMs), we present a new approach to robust automatic speaker verification (SV) in adverse conditions. This new and simple approach combines speech enhancement using traditional spectral subtraction with missing feature compensation, which dynamically modifies the probability computations performed in GMM recognizers. The identity of the spectral features missing due to noise masking is provided by the spectral subtraction algorithm. Previous work has demonstrated that the missing feature modeling method is effective in speech recognition under artificially generated interruptions, filtering and noise. In this paper, we show that this method also improves noise compensation techniques used for speaker verification in more realistic conditions.

ic982014.pdf (From Postscript)
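A minimal sketch of the two ingredients, assuming power-spectrum input and a diagonal-covariance GMM trained on features in the same spectral channels. Spectral subtraction yields an enhanced spectrum. The abstract only says that spectral subtraction supplies the identity of the missing features; here, as an assumption, a channel is marked missing when its estimated local SNR falls below a threshold. With diagonal covariances, integrating out the missing channels reduces to evaluating each Gaussian on the reliable channels only. The floor value, the 0 dB threshold and all function names are hypothetical:

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Power spectral subtraction with a spectral floor to avoid negative power."""
    return np.maximum(noisy_power - noise_power, floor * noisy_power)

def reliability_mask(noisy_power, noise_power, snr_threshold_db=0.0, floor=0.01):
    """Assumed rule: a channel is reliable when enhanced-speech/noise SNR exceeds the threshold."""
    enhanced = spectral_subtraction(noisy_power, noise_power, floor)
    local_snr_db = 10.0 * np.log10(enhanced / noise_power)
    return local_snr_db > snr_threshold_db

def marginal_gmm_loglik(frame, mask, weights, means, variances):
    """Log-likelihood of one frame under a diagonal GMM, integrating out masked channels."""
    r = np.asarray(mask, dtype=bool)
    diff = frame[r] - means[:, r]
    comp = np.log(weights) - 0.5 * np.sum(
        diff ** 2 / variances[:, r] + np.log(2 * np.pi * variances[:, r]), axis=1)
    m = comp.max()
    return float(m + np.log(np.sum(np.exp(comp - m))))  # log-sum-exp over components
```

When every channel is masked, each Gaussian integrates to one and the frame contributes a log-likelihood of zero, so heavily corrupted frames neither help nor hurt the speaker score.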


A Comparison of A Priori Threshold Setting Procedures for Speaker Verification in the CAVE Project

Authors:

Jean-Benoit Pierrot, ENST (France)
Johan Lindberg, KTH (Sweden)
Johan Koolwaaij, KUN (The Netherlands)
Hans-Peter Hutter, Ubilab-UBS (Sweden)
Dominique Genoud, IDIAP (Switzerland)
Mats Blomberg, KTH (Sweden)
Frederic Bimbot, ENST (France)

Volume 1, Page 125, Paper number 5229

Abstract:

A priori threshold setting is a key problem for field applications of speaker verification. In the context of the CAVE project, we compared several methods for estimating speaker-independent and speaker-dependent decision thresholds. Relevant parameters are estimated from development data only, i.e. without resorting to additional client data. The various approaches are tested on the Dutch SESP database.

ic985229.pdf (Scanned)
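The abstract does not list the methods compared. As one illustrative example of a speaker-dependent a priori threshold that needs no extra client data, the sketch below sets each speaker's threshold from impostor scores against that speaker's model, theta = mu_imp + alpha * sigma_imp, and tunes the single offset alpha on development speakers by minimizing the mean half total error rate (FAR + FRR)/2. The formula, the criterion and the function names are assumptions, not necessarily one of the CAVE procedures:

```python
import numpy as np

def speaker_dependent_threshold(impostor_scores, alpha):
    """theta = mean + alpha * std of impostor scores against one speaker's model."""
    s = np.asarray(impostor_scores, dtype=float)
    return float(s.mean() + alpha * s.std())

def tune_alpha(dev_speakers, alphas):
    """Choose alpha on development speakers only.

    dev_speakers: list of (genuine_scores, impostor_scores) pairs.
    Returns (best_alpha, mean half total error rate at that alpha).
    """
    best_alpha, best_err = None, np.inf
    for a in alphas:
        errs = []
        for genuine, impostor in dev_speakers:
            t = speaker_dependent_threshold(impostor, a)
            far = np.mean(np.asarray(impostor, dtype=float) >= t)
            frr = np.mean(np.asarray(genuine, dtype=float) < t)
            errs.append(0.5 * (far + frr))
        err = float(np.mean(errs))
        if err < best_err:
            best_alpha, best_err = a, err
    return best_alpha, best_err
```

For a new client, only impostor scores against the client's model are needed to place the threshold, because alpha is shared and fixed beforehand on the development set.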


Text Dependent Speaker Verification Using Binary Classifiers

Authors:

Dominique Genoud, IDIAP (Switzerland)
Miguel Moreira, IDIAP (Switzerland)
Eddy Mayoraz, IDIAP (Switzerland)

Volume 1, Page 129, Paper number 1741

Abstract:

This paper describes how a speaker verification task can be advantageously decomposed into a series of binary classification problems, each discriminating between two classes only. Each binary classifier is specific to one speaker, one anti-speaker and one word. Continuous-attribute decision trees are used as classifiers. The set of classifiers is then pruned to eliminate the less relevant ones. Several pruning methods are evaluated, and it is shown that when the speaker verification decision is made with an a priori threshold, some of them give better results than a reference HMM system.

ic981741.pdf (From Postscript)


Speaker Verification Using Verbal Information Verification for Automatic Enrollment

Authors:

Qi Li, Bell Labs, Lucent Technologies (U.S.A.)
Biing-Hwang Juang, Bell Labs, Lucent Technologies (U.S.A.)

Volume 1, Page 133, Paper number 2397

Abstract:

A conventional speaker verification (SV) system needs an enrollment session to collect training data. In [1], we introduced a speaker authentication method called Verbal Information Verification (VIV), which verifies a speaker by the verbal content of the utterances rather than by speech characteristics. Such a system does not need an enrollment session. In this paper, VIV is combined with SV. We propose a system that uses VIV to automatically collect training data during the first few accesses, which often come from different acoustic environments. A speaker-dependent model is then trained, and speaker authentication can be performed by SV. This approach not only avoids a formal enrollment session, which is more convenient for the user, but also mitigates the mismatch caused by different acoustic environments in training and test sessions. Our experiments show that the proposed system improved SV performance by over 40% compared to the conventional SV system.

ic982397.pdf (From Postscript)
