ABSTRACT
A hybrid neural network is proposed for speaker verification (SV). The basic idea of this system is the use of vector quantization preprocessing as the feature extractor. The experiments were carried out using a neural network model (NNM) with frame labelling performed from a client codebook, known as NNM-C. Improved performance for NNM-C with more inputs and proper alignment of the speech signals supports the hypothesis that a more detailed representation of the speech patterns is helpful to the system. The flexibility of this system allows an equal error rate (EER) of 11.2% on a single isolated digit and 0.7% on a sequence of 12 isolated digits. This paper also compares the neural network speaker verification system with more conventional methods such as hidden Markov models.
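The vector quantization preprocessing described above labels each speech frame with its nearest client codebook entry before the frames reach the neural network. A minimal sketch of that labelling step, assuming Euclidean distance and generic feature vectors (the paper's exact features and codebook size are not specified here):

```python
import numpy as np

def label_frames(frames, codebook):
    """Assign each feature frame to its nearest client codebook vector.

    frames:   (T, D) array of per-frame feature vectors
    codebook: (K, D) array of client codebook centroids
    Returns the index of the nearest centroid for each frame.
    """
    # Squared Euclidean distance from every frame to every centroid
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

frames = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]])
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = label_frames(frames, codebook)  # → [0, 1, 1]
```

The resulting label sequence is what a system like NNM-C would feed to the network in place of (or alongside) the raw frames.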
ABSTRACT
Traditionally, speaker authentication has focused on two categories of techniques: speaker verification and speaker identification. In this paper, we introduce a third category called verbal information verification (VIV), in which a claimed speaker's utterances are verified against the key information in the speaker's registered profile to decide whether the claimed identity should be accepted or rejected. The proposed VIV technique can be used independently or combined with traditional speaker verification techniques to achieve flexible and improved speaker authentication. Instead of accomplishing VIV by recognizing the key information, the proposed VIV algorithm is based on the concept of sequential utterance verification. In a telephone speaker authentication experiment on 100 speakers, using three pass-utterances in response to three categories of questions, the proposed VIV system achieved a 0.00% equal-error rate, compared to a 30% false rejection rate for an automatic speech recognition approach.
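The sequential utterance verification idea can be sketched as a chain of per-utterance accept/reject decisions: each pass-utterance must individually verify against the profile, and the claim is rejected at the first failure. The score scale and threshold below are illustrative assumptions, not the paper's actual verification statistic:

```python
def verify_identity(utterance_scores, threshold=0.5):
    """Sequential utterance verification (sketch).

    Each pass-utterance score must individually exceed the
    verification threshold; reject on the first failure and
    accept only if every utterance passes.
    """
    for score in utterance_scores:
        if score < threshold:
            return False  # reject as soon as one utterance fails
    return True

accepted = verify_identity([0.9, 0.8, 0.7])  # all three pass-utterances verify
rejected = verify_identity([0.9, 0.3, 0.7])  # fails at the second utterance
```

Checking utterances sequentially lets the system stop early on impostors rather than recognizing the full answer content.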
ABSTRACT
The main goal of this work is to develop a competitive segment-based speaker verification system that is computationally efficient. To achieve our goal, we modified SUMMIT [12] to suit our needs. The speech signal was first transformed into a hierarchical segment network using frame-based measurements. Next, acoustic models for 168 speakers were developed for a set of 6 broad phoneme classes. The models represented feature statistics with diagonal Gaussians, preceded by principal component analysis. The feature vector included segment-averaged MFCCs, plus three prosodic measurements: energy, fundamental frequency (F0), and duration. The size and content of the feature vector were determined through a greedy algorithm while optimizing overall speaker verification performance. We were able to achieve a performance of 2.74% equal error rate (EER) using cohorts during testing, and 1.59% EER using all speakers during testing. We reduced computation significantly through the use of a small number of features, a small number of phonetic models per speaker, few model parameters, and few competing speakers during testing (when cohorts are used).
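With diagonal-covariance Gaussians, scoring a segment-averaged feature vector reduces to a cheap per-dimension sum, which is one source of the computational savings the abstract cites. A minimal sketch of the per-model log-likelihood (the actual feature set and model topology follow the paper, not this snippet):

```python
import math

def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal-covariance
    Gaussian. Segment-based verification sums this quantity over the
    segments assigned to each broad phoneme class of a speaker model."""
    ll = 0.0
    for xi, mi, vi in zip(x, mean, var):
        # each dimension contributes independently under a diagonal covariance
        ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll
```

Because dimensions are treated independently, the cost is linear in the feature-vector size, so the greedy feature selection directly controls scoring cost.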
ABSTRACT
This paper describes a system for controlling access to web resources built using well-known speaker verification techniques. We describe the implementation of a speech verification server and an associated authentication module for the Apache web server. Speaker verification requires two inputs: a sample of the user's speech and an identity claim for the user, typically the user's name. However, a more convenient system would not require a user name to be entered. We present the results of an attempt to implement speech-only authentication using open-set speaker identification. We explore the effect of database size on performance.
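Open-set speaker identification replaces the explicit identity claim with a search over enrolled speakers plus a rejection option: pick the best-matching speaker, but reject if even that match is too weak. A sketch under assumed score conventions (speaker names and threshold are illustrative):

```python
def open_set_identify(scores_by_speaker, threshold):
    """Open-set speaker identification (sketch).

    scores_by_speaker: dict mapping enrolled speaker name -> match score
    Returns the best-scoring speaker, or None (reject) when even the
    best score falls below the acceptance threshold.
    """
    best_speaker = max(scores_by_speaker, key=scores_by_speaker.get)
    if scores_by_speaker[best_speaker] < threshold:
        return None  # no enrolled speaker matches well enough
    return best_speaker

hit = open_set_identify({"alice": 0.9, "bob": 0.4}, threshold=0.5)   # "alice"
miss = open_set_identify({"alice": 0.9, "bob": 0.4}, threshold=0.95)  # None
```

The search over all enrolled speakers is why database size matters: both accuracy and cost degrade as the enrolled population grows.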
ABSTRACT
The problem of how to prompt a client for a password in an automatic, prompted speaker verification system is addressed. Text-prompting of four-digit sequences is compared to speech-prompting of the same sequences, and speech-prompting of four digits is compared to speech-prompting of five digits. Speech recordings are analyzed by comparing speaker verification performance and by inspecting the number and type of speaking errors that subjects made. From the experiment it is clear that text-prompting gives the subjects an easier task, and fewer speaking errors are produced in that context. When enrolling clients with text-prompted speech and performing verification with an HMM-based system, the average EER was larger for speech-prompted items than for text-prompted items, but changes in individual EERs vary across the test population.
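The comparisons above all rest on the equal error rate, the operating point where the false-rejection and false-acceptance rates coincide. A simple threshold-sweep sketch of that computation (an illustrative approximation, not the paper's evaluation code):

```python
def equal_error_rate(client_scores, impostor_scores):
    """Approximate the EER by sweeping a decision threshold over the
    pooled scores and returning the FRR/FAR midpoint at the threshold
    where the two rates are closest."""
    best = None
    for t in sorted(client_scores + impostor_scores):
        frr = sum(s < t for s in client_scores) / len(client_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]

eer = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.4, 0.3, 0.2, 0.1])  # → 0.0
```

With perfectly separated client and impostor scores, as in this toy example, the sweep finds a threshold where both error rates are zero.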
ABSTRACT
A novel approach to scoring Gaussian mixture models is presented. Feature vectors are assigned to the individual Gaussians making up the model, and log-likelihoods of the separate Gaussians are computed and summed. Furthermore, the log-likelihoods of the individual Gaussians can be decomposed into sample weight, mean, and covariance log-likelihoods. Correlation likelihoods can also be computed. The results of the various systems are comparable on text-independent speaker recognition experiments despite the fact that the models and scoring are all quite different. By decomposing log-likelihoods of models into various sample statistic log-likelihoods, it is possible to diagnose which part of the model has the greatest discriminative power, whether the location of the Gaussians or their shapes.
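The assignment step described above can be sketched as hard-assigning each frame to its best-matching Gaussian and summing that component's log-likelihood, in contrast to the usual soft log-sum over all components. This is a minimal illustration of the assignment idea only; the paper's further decomposition into weight, mean, and covariance terms is not shown:

```python
import numpy as np

def per_gaussian_loglik(frames, means, variances, weights):
    """Score a GMM by assigning each frame to its most likely
    component (weight included) and summing only that component's
    log-likelihood, normalized by the number of frames."""
    total = 0.0
    for x in frames:
        # log-likelihood of x under each diagonal-covariance component
        lls = [np.log(w)
               - 0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
               for m, v, w in zip(means, variances, weights)]
        total += max(lls)  # keep only the best-matching Gaussian
    return total / len(frames)

score = per_gaussian_loglik([np.array([0.0])],
                            [np.array([0.0])],
                            [np.array([1.0])],
                            [1.0])
```

Scoring components separately is what makes the subsequent diagnostic decomposition possible: each Gaussian's contribution to the total can be inspected on its own.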