Session W1C Speaker Recognition II

Chairperson: Aaron Rosenberg, AT&T Labs, USA

SPEAKER IDENTIFICATION WITH USER-SELECTED PASSWORD PHRASES

Authors: Aaron E. Rosenberg and S. Parthasarathy

Speech and Image Processing Services Research Lab, AT&T Labs, Florham Park, NJ 07932, USA

Volume 3 pages 1371 - 1374

ABSTRACT

An open-set speaker identification system is described in which general-text, sentence-long phrases are used as passwords. Customers are allowed to select their own password phrases and the system has no knowledge of the text. Passwords are represented by phone transcriptions and whole-phrase Hidden Markov Models (HMM's). Phrase identification, carried out using both speaker dependent and speaker independent models, constitutes an identity claim. Verification of the claim uses likelihood ratio scoring with speaker independent phone HMM's providing the background model score. An evaluation has been carried out over a database of password phrases spoken by 250 speakers. 100 of the speakers are test speakers. In an experimental trial, each test speaker is designated as a customer or an imposter and speaks the phrase associated with the customer. The imposter set for each customer consists of same-gender test speakers excluding the customer. At a 5% reject level, the rate of imposter identification is approximately 4%. The misidentification rate for both customers and imposters is less than 0.1%. The closed-set identification error rate is less than 1%, while the average verification equal-error rate is approximately 3%.
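
The likelihood ratio scoring mentioned in this abstract can be illustrated with a minimal sketch. It assumes the log-likelihoods of the utterance under the claimed customer's model and under the background model have already been computed elsewhere; the function names, the per-frame normalisation, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
def llr_score(target_loglik: float, background_loglik: float, n_frames: int) -> float:
    """Per-frame log-likelihood ratio between the claimed-speaker model
    and the background model (hypothetical helper, not from the paper)."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (target_loglik - background_loglik) / n_frames


def accept_claim(target_loglik: float, background_loglik: float,
                 n_frames: int, threshold: float = 0.0) -> bool:
    """Accept the identity claim when the normalised ratio reaches the
    threshold; the default threshold is an arbitrary placeholder."""
    return llr_score(target_loglik, background_loglik, n_frames) >= threshold
```

Raising the threshold trades more customer rejections for fewer imposter acceptances, which is the trade-off the reported reject and imposter identification rates describe.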

A0243.pdf

SPEAKER VERIFICATION BASED ON PHONETIC DECISION MAKING

Authors: Jesper O. Olsen

Center for PersonKommunikation, Aalborg University, Fredrik Bajers Vej 7A-2, DK-9220 Aalborg Øst, Denmark email: jo@cpk.auc.dk

Volume 3 pages 1375 - 1378

ABSTRACT

Speaker verification based on phone modelling is examined in this paper. Phone modelling is attractive, because different phonemes have different levels of usefulness for speaker recognition, and because phone modelling essentially makes a speaker verification algorithm text independent. The speaker verification system used here is based on a two-stage approach, where speech recognition (segmentation) is separated from the actual speaker modelling. Hidden Markov Models are employed in the initial stage, whereas Radial Basis Function networks are used in the second for modelling speaker identity. The system is evaluated on a large realistic telephone database.

A0389.pdf

ANALYSIS AND COMPARISON OF SCORE NORMALISATION METHODS FOR TEXT-DEPENDENT SPEAKER VERIFICATION

Authors: A. M. Ariyaeeinia and P. Sivakumaran

University of Hertfordshire, Hatfield, Hertfordshire, AL10 9AB, UK A.M.Ariyaeeinia@herts.ac.uk, P.Sivakumaran@herts.ac.uk

Volume 3 pages 1379 - 1382

ABSTRACT

This paper presents an investigation into the relative effectiveness of various score normalisation methods for speaker verification. The study provides a thorough analysis of different approaches for normalising verification scores, and comparatively examines these under identical experimental conditions. The experiments are based on the use of subsets of the Brent (telephone quality) speech database, consisting of repetitions of isolated digit utterances zero to nine spoken by native English speakers. Based on the experimental results it is demonstrated that amongst the considered methods, a particular form of the cohort normalisation method provides the best performance in terms of the verification accuracy. The paper discusses details of the experimental study and presents an analysis of the results.
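
As a rough illustration of cohort normalisation in general, the sketch below subtracts the mean cohort log-likelihood from the claimant's log-likelihood. The abstract does not say which form of cohort normalisation performed best, so the use of the mean, the function name, and the inputs are all assumptions made for illustration.

```python
from statistics import fmean
from typing import Sequence


def cohort_normalised_score(claimant_loglik: float,
                            cohort_logliks: Sequence[float]) -> float:
    """Claimant log-likelihood minus the mean log-likelihood of the same
    utterance under a set of cohort speaker models (one generic variant;
    not necessarily the form evaluated in the paper)."""
    if not cohort_logliks:
        raise ValueError("at least one cohort score is required")
    return claimant_loglik - fmean(cohort_logliks)
```

Other variants replace the mean with, for example, the maximum cohort score; which statistic works best is the kind of question the paper's comparison addresses.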

A0762.pdf

AUTOMATIC SPEAKER RECOGNITION ON A VOCODER LINK

Authors: Frédéric Jauquet, Patrick Verlinde and Claude Vloeberghs

Signal and Image Processing Center (SIC) Electrical & Telecommunication Dept. Royal Military Academy, Av. de la Renaissance, 30 - 1000 Brussels - Belgium Tel : +32 2 737 62 53, FAX : +32 2 737 62 53, E-mail : frederic.jauquet@tele.rma.ac.be

Volume 3 pages 1383 - 1386

ABSTRACT

Automatic speaker recognition on a vocoder link has rarely been explicitly tested. In this paper, we show how automatic speaker recognition can be used on a vocoder link. In a first experiment, where we consider the "coder-link-decoder" speech system as a black box, a classic speaker recognition method (applied to the reconstructed speech) is shown to be able to provide an objective measurement of the voice quality of the vocoder. In a second experiment, the same speaker recognition method is applied directly to the information contained in the coded frames. In the latter case, the recognition scores provide an interesting analysis.

A1089.pdf

LIKELIHOOD RATIO ADJUSTMENT FOR THE COMPENSATION OF MODEL MISMATCH IN SPEAKER VERIFICATION

Authors: Frederic BIMBOT (1) and Dominique GENOUD (2)

(1) ENST - Dept Signal, CNRS - URA 820, 46 Rue Barrault, 75634 Paris cedex 13, FRANCE, European Union (2) IDIAP, Rue du Simplon 4, Case Postale 592, CH-1920 Martigny, SWITZERLAND bimbot@sig.enst.fr genoud@idiap.ch

Volume 3 pages 1387 - 1390

ABSTRACT

This paper presents a method for adjusting speaker verification thresholds based on a Gaussian model of the distributions of the log-likelihood ratio. The paper states the hypotheses under which this model is valid, describes several threshold adjustment methods, and illustrates their benefits and limitations through verification experiments on a database of 20 speakers.
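
One consequence of a Gaussian model of the log-likelihood ratio can be sketched as follows. If client scores and impostor scores are each assumed Gaussian, with the client mean above the impostor mean, the threshold at which false rejection and false acceptance rates are equal has a closed form. This is a generic illustration under those assumptions; the function names are hypothetical and the paper's own adjustment methods may differ.

```python
from statistics import NormalDist


def eer_threshold(client: NormalDist, impostor: NormalDist) -> float:
    """Threshold t where P(client score < t) equals P(impostor score >= t).
    Solving (t - mu_c) / s_c = -(t - mu_i) / s_i for t gives the formula
    below; assumes client.mean > impostor.mean."""
    return ((client.mean * impostor.stdev + impostor.mean * client.stdev)
            / (client.stdev + impostor.stdev))


def error_rates(threshold: float, client: NormalDist,
                impostor: NormalDist) -> tuple[float, float]:
    """Return (false rejection rate, false acceptance rate) at a threshold."""
    false_rejection = client.cdf(threshold)
    false_acceptance = 1.0 - impostor.cdf(threshold)
    return false_rejection, false_acceptance
```

For example, `eer_threshold(NormalDist(3.0, 2.0), NormalDist(-1.0, 1.0))` places the threshold closer to the impostor mean, because the impostor distribution in that example is the narrower of the two.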

A1147.pdf

A LOGNORMAL TIED MIXTURE MODEL OF PITCH FOR PROSODY BASED SPEAKER RECOGNITION

Authors: M. Kemal Sonmez, Larry Heck, Mitchel Weintraub and Elizabeth Shriberg

SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025

Volume 3 pages 1391 - 1394

ABSTRACT

Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks which are subject to doubling and/or halving. We first argue by a simple correlation model and empirically demonstrate by QQ plots that "clean" pitch is distributed with a lognormal distribution rather than the often assumed normal distribution. Second, we present a probabilistic model for estimated pitch via a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances for a total of four free parameters. We use the obtained pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (subset of Switchboard) and report results on the most difficult portion of the database: the "one-session" condition with males only for both the claimant and imposter speakers. Pitch statistics provide a 22% reduction in false alarm rate at a 1% miss rate and an 11% reduction in false alarm rate at a 10% miss rate over the cepstrum-only system.
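
The tied mixture described in this abstract can be sketched in the log-pitch domain, where a lognormal becomes a normal. The sketch assumes that halving and doubling shift the log-pitch mean by minus and plus log 2 and that all three components share one standard deviation, leaving four free parameters (mean, standard deviation, and two mixture weights). This parameterisation and all names are an illustrative reading of the abstract, not the authors' implementation.

```python
import math
from statistics import NormalDist

LOG2 = math.log(2.0)


def log_pitch_density(log_f0: float, mu: float, sigma: float,
                      w_half: float, w_double: float) -> float:
    """Density of observed log-pitch under a three-component mixture with
    tied means (mu - log 2, mu, mu + log 2) and a shared sigma.
    w_half and w_double are the assumed halving and doubling weights."""
    w_clean = 1.0 - w_half - w_double
    if min(w_half, w_double, w_clean) < 0.0:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return (w_half * NormalDist(mu - LOG2, sigma).pdf(log_f0)
            + w_clean * NormalDist(mu, sigma).pdf(log_f0)
            + w_double * NormalDist(mu + LOG2, sigma).pdf(log_f0))
```

With both error weights set to zero the density reduces to a single normal on log-pitch, which corresponds to the "clean" lognormal pitch case.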

A1148.pdf