ABSTRACT
An open-set speaker identification system is described in which general-text, sentence-long phrases are used as passwords. Customers are allowed to select their own password phrases, and the system has no prior knowledge of the text. Passwords are represented by phone transcriptions and whole-phrase Hidden Markov Models (HMMs). Phrase identification, carried out using both speaker-dependent and speaker-independent models, constitutes an identity claim. Verification of the claim uses likelihood ratio scoring, with speaker-independent phone HMMs providing the background model score. An evaluation has been carried out over a database of password phrases spoken by 250 speakers, of whom 100 are test speakers. In an experimental trial, each test speaker is designated as a customer or an imposter and speaks the phrase associated with the customer. The imposter set for each customer consists of same-gender test speakers excluding the customer. At a 5% reject level, the rate of imposter identification is approximately 4%. The misidentification rate for both customers and imposters is less than 0.1%. The closed-set identification error rate is less than 1%, while the average verification equal-error rate is approximately 3%.
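As a rough illustration of how an operating point such as "a 5% reject level" can be read, the sketch below picks a score threshold that rejects a given fraction of genuine customer trials and then measures how many imposter trials pass it. It assumes that the reject level refers to customer trials and that higher scores mean a better match. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
def threshold_at_reject_level(customer_scores, reject_level):
    """Return a threshold that rejects about `reject_level` of customer trials.

    Scores strictly below the threshold are rejected (assumed convention).
    """
    ordered = sorted(customer_scores)
    n_reject = int(reject_level * len(ordered))
    return ordered[n_reject]


def acceptance_rate(scores, threshold):
    """Fraction of trials whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy scores, for illustration only.
customer_scores = [1, 2, 3, 4, 5, 6, 7, 8]
imposter_scores = [0, 1, 2, 3, 4]

threshold = threshold_at_reject_level(customer_scores, 0.25)
customer_accept = acceptance_rate(customer_scores, threshold)
imposter_accept = acceptance_rate(imposter_scores, threshold)
```

With real data, the imposter acceptance rate at the chosen threshold plays the role of the imposter identification rate quoted at the fixed reject level.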
ABSTRACT
Speaker verification based on phone modelling is examined in this paper. Phone modelling is attractive because different phonemes have different levels of usefulness for speaker recognition, and because phone modelling essentially makes a speaker verification algorithm text-independent. The speaker verification system used here is based on a two-stage approach, in which speech recognition (segmentation) is separated from the actual speaker modelling. Hidden Markov Models are employed in the first stage, whereas Radial Basis Function networks are used in the second stage for modelling speaker identity. The system is evaluated on a large, realistic telephone database.
ABSTRACT
This paper presents an investigation into the relative effectiveness of various score normalisation methods for speaker verification. The study provides a thorough analysis of different approaches for normalising verification scores, and comparatively examines these under identical experimental conditions. The experiments are based on the use of subsets of the Brent (telephone quality) speech database, consisting of repetitions of isolated digit utterances zero to nine spoken by native English speakers. Based on the experimental results it is demonstrated that amongst the considered methods, a particular form of the cohort normalisation method provides the best performance in terms of the verification accuracy. The paper discusses details of the experimental study and presents an analysis of the results.
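The abstract does not say which form of cohort normalisation performed best, so the following is only a generic sketch of the idea: the claimed speaker's log-likelihood is offset by the average log-likelihood of a set of cohort (competing) speaker models. The function name and the choice of the mean as the cohort statistic are assumptions made for illustration.

```python
def cohort_normalised_score(target_loglik, cohort_logliks):
    """Subtract the mean cohort log-likelihood from the target log-likelihood.

    target_loglik: log p(X | claimed speaker model)
    cohort_logliks: log p(X | cohort model) for each cohort speaker
    The mean is one common choice; other variants use the maximum instead.
    """
    cohort_mean = sum(cohort_logliks) / len(cohort_logliks)
    return target_loglik - cohort_mean


# Toy values, for illustration only.
score = cohort_normalised_score(-40.0, [-50.0, -60.0])
```

A positive normalised score indicates that the utterance fits the claimed speaker's model better than it fits the cohort on average.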
ABSTRACT
Automatic speaker recognition over a vocoder link has rarely been explicitly tested. In this paper, we show how automatic speaker recognition can be used on a vocoder link. In a first experiment, in which we treat the "coder-link-decoder" speech system as a black box, a classic speaker recognition method (applied to the reconstructed speech) is shown to provide an objective measurement of the voice quality of the vocoder. In a second experiment, the same speaker recognition method is applied directly to the information contained in the coded frames. In the latter case, the recognition scores provide an interesting analysis.
ABSTRACT
This article presents a method for adjusting speaker verification thresholds based on a Gaussian model of the distributions of the log-likelihood ratio. The article sets out the hypotheses under which this model is valid, describes several threshold adjustment methods, and illustrates their benefits and limitations through verification experiments on a database of 20 speakers.
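One way a Gaussian model of the log-likelihood ratio can yield a threshold is sketched below. It assumes that client and impostor scores are each normally distributed, with the client mean above the impostor mean, and it solves for the threshold at which the false rejection and false acceptance rates are equal. This equal-error criterion and the function names are illustrative assumptions; the abstract does not state which adjustment methods the article uses.

```python
from statistics import NormalDist


def equal_error_threshold(mu_client, sd_client, mu_impostor, sd_impostor):
    """Threshold where false rejection equals false acceptance.

    Setting (t - mu_client) / sd_client = -(t - mu_impostor) / sd_impostor
    and solving for t gives the closed form below.
    """
    return (mu_client * sd_impostor + mu_impostor * sd_client) / (
        sd_client + sd_impostor
    )


def error_rates(threshold, mu_client, sd_client, mu_impostor, sd_impostor):
    """Return (false rejection rate, false acceptance rate) under the model."""
    false_reject = NormalDist(mu_client, sd_client).cdf(threshold)
    false_accept = 1.0 - NormalDist(mu_impostor, sd_impostor).cdf(threshold)
    return false_reject, false_accept


# Toy parameters, for illustration only.
t = equal_error_threshold(3.0, 2.0, -1.0, 1.0)
fr, fa = error_rates(t, 3.0, 2.0, -1.0, 1.0)
```

In practice the four parameters would be estimated from enrolment or development scores, which is where the validity of the Gaussian hypothesis matters.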
ABSTRACT
Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks that are subject to doubling and/or halving. First, we argue using a simple correlation model, and demonstrate empirically using QQ plots, that "clean" pitch follows a lognormal distribution rather than the often-assumed normal distribution. Second, we present a probabilistic model for the pitch estimated by a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances, for a total of four free parameters. We use the resulting pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (a subset of Switchboard) and report results on the most difficult portion of the database: the "one-session" condition with males only for both the claimant and imposter speakers. Pitch statistics provide a 22% reduction in false alarm rate at a 1% miss rate and an 11% reduction in false alarm rate at a 10% miss rate over the cepstrum-only system.
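The mixture described above can be sketched as a density over log-pitch. The sketch assumes that "tied means" means the halving and doubling components are centred at the clean log-pitch mean shifted by -log 2 and +log 2, and that all three components share one standard deviation. The four free parameters are then the mean, the standard deviation, and the halving and doubling probabilities. The function and parameter names are hypothetical.

```python
import math

LOG2 = math.log(2.0)


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def tracked_log_pitch_pdf(log_f, mu, sigma, p_half, p_double):
    """Density of tracked log-pitch under a three-component tied mixture.

    mu, sigma: mean and standard deviation of clean log-pitch
    p_half, p_double: probabilities of a halving or doubling error
    """
    p_clean = 1.0 - p_half - p_double
    return (
        p_half * normal_pdf(log_f, mu - LOG2, sigma)
        + p_clean * normal_pdf(log_f, mu, sigma)
        + p_double * normal_pdf(log_f, mu + LOG2, sigma)
    )


# Toy parameters, for illustration only: clean pitch centred near 120 Hz.
mu = math.log(120.0)
density_at_mean = tracked_log_pitch_pdf(mu, mu, 0.2, 0.05, 0.05)
```

Because the function is a density over log-pitch, the corresponding density over pitch in Hz is obtained by dividing by the pitch value.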