ICASSP '98
Abstract - SP4


 
SP4.1

   
Speaker Verification Using Minimum Verification Error Training
A. Rosenberg, O. Siohan, S. Parthasarathy  (AT&T Labs, USA)
We propose a Minimum Verification Error (MVE) training scenario to design and adapt an HMM-based speaker verification system. Using the discriminative training paradigm, we show that customer and background models can be jointly estimated so that the expected number of verification errors (false acceptances and false rejections) on the training corpus is minimized. An experimental evaluation of a fixed-password speaker verification task over the telephone network was carried out. The evaluation shows that MVE training/adaptation performs as well as MLE training and MAP adaptation when performance is measured by average individual equal error rate (based on a posteriori threshold assignment). After model adaptation, both approaches lead to an individual equal error rate close to 0.6%. However, experiments performed with a priori dynamic threshold assignment show that MVE-adapted models exhibit false rejection and false acceptance rates 45% lower than those of the MAP-adapted models, and therefore lead to the design of a more robust system for practical applications.
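As a rough illustration of the MVE objective described in this abstract, the sketch below smooths the counts of false rejections and false acceptances with a sigmoid so they can be minimized with gradient-based methods. The function names, the smoothing constant, and the use of simple trial scores are assumptions for illustration, not the authors' exact formulation.

```python
# Illustrative sketch of a smoothed Minimum Verification Error (MVE) objective.
# All names and constants are assumptions for illustration only.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid


def mve_loss(customer_scores, impostor_scores, threshold, alpha=5.0):
    """Smoothed expected number of verification errors.

    customer_scores: scores of true-speaker (customer) trials
    impostor_scores: scores of background/impostor trials
    A customer scoring below the threshold is a false rejection; an
    impostor scoring above it is a false acceptance.  The sigmoid makes
    both counts differentiable, so customer and background model
    parameters can be updated jointly by gradient descent.
    """
    false_reject = expit(alpha * (threshold - customer_scores)).mean()
    false_accept = expit(alpha * (impostor_scores - threshold)).mean()
    return false_reject + false_accept
```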
 
SP4.2

   
Speaker Identification Using Minimum Identification Error Training
O. Siohan, A. Rosenberg, S. Parthasarathy  (AT&T Labs, USA)
In this paper, we use a Minimum Classification Error (MCE) training paradigm to build a speaker identification system. The training is optimized at the string level for a text-dependent speaker identification task. Experiments performed on a small-set speaker identification task show that MCE training can reduce closed-set identification errors by up to 20-25% over a baseline system trained using Maximum Likelihood Estimation. Further experiments suggest that additional improvement can be obtained by using additional training data from speakers outside the set of registered speakers, leading to an overall reduction of closed-set identification errors of about 35%.
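The sketch below illustrates a standard smoothed MCE misclassification measure for closed-set identification of the kind described above: the true speaker's score is compared against a soft maximum over the competing speakers' scores. The constants and names are assumptions for illustration, not the authors' exact setup.

```python
# Illustrative sketch of a smoothed MCE loss for closed-set speaker
# identification; names and constants are assumptions for illustration.
import numpy as np
from scipy.special import logsumexp, expit


def mce_loss(scores, true_idx, eta=10.0, alpha=1.0):
    """scores: log-likelihoods of one utterance under all registered speaker models."""
    g_true = scores[true_idx]
    competitors = np.delete(scores, true_idx)
    # Soft maximum over competing speakers; approaches the best competitor as eta grows.
    g_anti = (logsumexp(eta * competitors) - np.log(competitors.size)) / eta
    d = g_anti - g_true              # positive when the utterance is misidentified
    return expit(alpha * d)          # smoothed 0/1 identification error
```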
 
SP4.3

   
Model Adaptation Methods for Speaker Verification
W. Mistretta, K. Farrell  (T-NETIX Inc., USA)
Model adaptation methods for a text-dependent speaker verification system are evaluated in this paper. The speaker verification system uses a discriminant model and a statistical model to represent each enrolled speaker, namely a neural tree network and a Gaussian mixture model. Adaptation methods are evaluated for both modeling approaches. We show that the overall system performance with adaptation is comparable to that obtained by retraining the model with the additional data. However, the adaptation can be performed in a fraction of the time required to retrain a model. Additionally, we have evaluated the adapted and non-adapted models with data recorded six months after the initial enrollment. Adaptation reduced the error rate on the aged data by 40%.
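As a rough sketch of the kind of incremental adaptation discussed above, the code below adapts the means of a Gaussian mixture model toward new enrollment data instead of retraining from scratch. The MAP-style mean-only update and the relevance factor are assumptions for illustration, not the system's actual algorithm.

```python
# Illustrative MAP-style adaptation of GMM means with new enrollment data,
# avoiding a full retrain.  The relevance factor r and the mean-only update
# are assumptions for illustration.
import numpy as np


def adapt_gmm_means(means, variances, weights, frames, r=16.0):
    """means, variances: (M, D); weights: (M,); frames: (N, D) adaptation data."""
    # Responsibilities of each diagonal-covariance mixture for each frame.
    log_p = np.stack(
        [-0.5 * (((frames - m) ** 2) / v + np.log(2 * np.pi * v)).sum(axis=1)
         for m, v in zip(means, variances)],
        axis=1) + np.log(weights)
    resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)

    n_k = resp.sum(axis=0)                                   # soft counts per mixture
    data_means = resp.T @ frames / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + r))[:, None]                       # data-dependent weight
    return alpha * data_means + (1.0 - alpha) * means        # interpolate old and new
```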
 
SP4.4

   
Robust Model for Speaker Verification Against Session-Dependent Utterance Variation
T. Matsui, K. Aikawa  (NTT Human Interface Laboratories, Japan)
This paper investigates a new method for creating speaker models that are robust against utterance variation in continuous-distribution hidden-Markov-model-based speaker verification. In this method, the distribution of the session-independent features of each speaker is estimated by modeling session-to-session utterance variation as two distinct components: one session-dependent and the other session-independent. In practice, joint normalization of the session-dependent utterance variation and estimation of the speaker model parameters are performed using a speaker adaptive training algorithm. The resulting speaker models represent session-independent speaker characteristics more accurately, and the discriminatory capability of these models increases. In text-independent speaker verification experiments using data uttered by 20 speakers in 7 sessions over 16 months, we show that the proposed method achieves a 15% reduction in the error rate.
 
SP4.5

   
Speaker Verification in Noisy Environments with Combined Spectral Subtraction and Missing Feature Theory
A. Drygajlo, M. El-Maliki  (EPFL, Switzerland)
In the framework of Gaussian mixture models (GMMs), we present a new approach to robust automatic speaker verification (SV) in adverse conditions. This new and simple approach combines speech enhancement using traditional spectral subtraction with missing feature compensation, which dynamically modifies the probability computations performed in GMM recognizers. The identities of the spectral features missing due to noise masking are provided by the spectral subtraction algorithm. Previous work has demonstrated that the missing feature modeling method succeeds in speech recognition under artificially generated interruptions, filtering, and noise. In this paper, we show that this method also improves noise compensation techniques used for speaker verification in more realistic conditions.
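The sketch below shows one way such a combination could look in a diagonal-covariance GMM: spectral components flagged as noise-masked by the spectral-subtraction front end are simply marginalized out of each mixture component's likelihood. Function and variable names are assumptions for illustration.

```python
# Illustrative missing-feature marginalization in a diagonal-covariance GMM:
# dimensions flagged unreliable by spectral subtraction are omitted from the
# likelihood.  Names are assumptions for illustration.
import numpy as np
from scipy.stats import norm


def gmm_loglik_missing(x, reliable, weights, means, variances):
    """Frame log-likelihood using only the reliable spectral dimensions.

    x:         (D,) spectral feature vector after spectral subtraction
    reliable:  (D,) boolean mask (False where noise masking was detected)
    weights, means, variances: GMM parameters of shapes (M,), (M, D), (M, D)
    """
    log_mix = []
    for w, mu, var in zip(weights, means, variances):
        # Marginalize the masked dimensions by leaving them out of the product.
        ll = norm.logpdf(x[reliable], mu[reliable], np.sqrt(var[reliable])).sum()
        log_mix.append(np.log(w) + ll)
    return np.logaddexp.reduce(log_mix)
```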
 
SP4.6

   
A Comparison of A Priori Threshold Setting Procedures for Speaker Verification in the CAVE Project
J. Pierrot  (ENST, France);   J. Lindberg  (KTH, Sweden);   J. Koolwaaij  (KUN, The Netherlands);   H. Hutter  (Ubilab-UBS, Switzerland);   D. Genoud  (IDIAP, Switzerland);   M. Blomberg  (KTH, Sweden);   F. Bimbot  (ENST, France)
The issue of a priori threshold setting in speaker verification is a key problem for field applications. In the context of the CAVE project, we compared several methods for estimating speaker-independent and speaker-dependent decision thresholds. The relevant parameters are estimated from development data only, i.e. without resorting to additional client data. The various approaches are tested on the Dutch SESP database.
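One simple instance of such a procedure is sketched below: a speaker-independent threshold is placed at the equal-error point of client and impostor scores pooled over the development speakers. This is only one plausible variant, not necessarily any of the specific methods compared in the paper.

```python
# Illustrative a priori threshold estimation from development data only:
# pick the score at which false-rejection and false-acceptance rates cross.
import numpy as np


def eer_threshold(client_scores, impostor_scores):
    candidates = np.sort(np.concatenate([client_scores, impostor_scores]))
    best_t, best_gap = candidates[0], np.inf
    for t in candidates:
        frr = np.mean(client_scores < t)      # development clients rejected
        far = np.mean(impostor_scores >= t)   # development impostors accepted
        if abs(frr - far) < best_gap:
            best_gap, best_t = abs(frr - far), t
    return best_t
```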
 
SP4.7

   
Text Dependent Speaker Verification Using Binary Classifiers
D. Genoud, M. Moreira, E. Mayoraz  (IDIAP, Switzerland)
This paper describes how a speaker verification task can be advantageously decomposed into a series of binary classification problems, i.e. each problem discriminating between two classes only. Each binary classifier is specific to one speaker, one anti-speaker, and one word. Continuous-attribute decision trees are used as classifiers. The set of classifiers is then pruned to eliminate the less relevant ones. Several pruning methods are evaluated experimentally, and it is shown that, when the speaker verification decision is performed with an a priori threshold, some of them give better results than a reference HMM system.
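A minimal sketch of this decomposition is given below, assuming scikit-learn decision trees and a simple vote-averaging decision rule; both choices, and all names, are illustrative assumptions rather than the paper's exact setup.

```python
# Illustrative decomposition of verification into per-(anti-speaker, word)
# binary decision trees for one client speaker.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class BinaryVerifier:
    def __init__(self, anti_speakers, words):
        # One binary tree per (anti-speaker, word) pair for this client.
        self.trees = {(a, w): DecisionTreeClassifier(max_depth=5)
                      for a in anti_speakers for w in words}

    def fit(self, training_sets):
        # training_sets[(anti, word)] = (X, y), y = 1 for client frames, 0 for anti-speaker frames
        for key, (X, y) in training_sets.items():
            self.trees[key].fit(X, y)

    def verify(self, test_sets, threshold=0.5):
        # test_sets[(anti, word)] = feature frames of the matching word in the test utterance
        votes = [self.trees[key].predict(X).mean() for key, X in test_sets.items()]
        return float(np.mean(votes)) >= threshold   # accept if enough trees favor the client
```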
 
SP4.8

   
Speaker Verification Using Verbal Information Verification for Automatic Enrollment
Q. Li, B. Juang  (Bell Labs, Lucent Technologies, USA)
A conventional speaker verification (SV) system needs an enrollment session to collect training data. In [1], we introduced a speaker authentication method called Verbal Information Verification (VIV), which verifies a speaker by verbal content instead of speech characteristics. Such a system does not need an enrollment session. In this paper, VIV is combined with SV. We propose a system that uses VIV to collect training data automatically during the user's first few accesses, which are often made from different acoustic environments. A speaker-dependent model is then trained, and speaker authentication can subsequently be performed by SV. This approach not only avoids a formal enrollment session, which is more convenient for the user, but also mitigates the mismatch problem caused by differing acoustic environments between training and test sessions. Our experiments show that the proposed system improves SV performance by over 40% compared to the conventional SV system.
 
