Speaker and Language Recognition 2

1: ICSLP'98 Proceedings

Speaker Detection in Broadcast Speech Databases

Authors:

Aaron E. Rosenberg, AT&T Labs Research (USA)
Ivan Magrin-Chagnolleau, AT&T Labs Research (USA)
S. Parthasarathy, AT&T Labs Research (USA)
Qian Huang, AT&T Labs Research (USA)

Page (NA) Paper number 202

Abstract:

Experiments have been carried out to assess the feasibility of detecting target speaker segments in multi-speaker broadcast databases. The experimental database consists of NBC Nightly News broadcasts. The target speaker is the news anchor, Tom Brokaw. Gaussian mixture models are constructed from labelled training data for the target speaker as well as background models for other speakers, commercials, and music. Four labelled 30-min. broadcasts are used for testing. Mel-frequency cepstral features, augmented by delta cepstral features, are calculated over 20 msec. windows shifted every 10 msec. through a broadcast. Likelihood ratio scores are calculated for each test frame and averaged over blocks of frames with a specified duration. The block scores are input to a detection routine which returns estimates of target segment boundaries. The range of best results obtained over the test broadcasts is 82% to 100% detection of target segments with segment frame accuracy ranging from 86% to 95%. Between 0 and 2 false alarm segments are detected over each 30-min. broadcast.
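
The block-averaging and detection steps described in this abstract can be illustrated with a minimal sketch. It assumes that per-frame log-likelihood ratios (target model against background models) are already available. The function names, the non-overlapping blocks, and the simple thresholding rule are assumptions for illustration, since the abstract does not specify the detection routine.

```python
from typing import List, Tuple

FRAME_SHIFT_S = 0.010  # 10 ms frame shift, as stated in the abstract


def block_average(frame_llr: List[float], block_len: int) -> List[float]:
    """Average per-frame log-likelihood ratios over consecutive,
    non-overlapping blocks of block_len frames (the last may be shorter)."""
    blocks = []
    for start in range(0, len(frame_llr), block_len):
        chunk = frame_llr[start:start + block_len]
        blocks.append(sum(chunk) / len(chunk))
    return blocks


def detect_segments(block_scores: List[float], block_len: int,
                    threshold: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each run of blocks scoring above threshold.

    Thresholding is a hypothetical stand-in for the paper's detection routine.
    """
    runs = []
    run_start = None
    for i, score in enumerate(block_scores):
        if score > threshold and run_start is None:
            run_start = i                      # a target run begins
        elif score <= threshold and run_start is not None:
            runs.append((run_start, i))        # the run ended before block i
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(block_scores)))
    block_s = block_len * FRAME_SHIFT_S        # block duration in seconds
    return [(a * block_s, b * block_s) for a, b in runs]
```

With 100-frame blocks, each block covers one second of audio, so the returned boundaries have one-second resolution. A longer block smooths the scores more but localizes segment boundaries less precisely.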

SL980202.PDF (From Author) SL980202.PDF (Rasterized)



Multilateral Techniques for Speaker Recognition

Authors:

Eluned S. Parris, Ensigma Ltd. (U.K.)
Michael J. Carey, Ensigma Ltd. (U.K.)

Page (NA) Paper number 444

Abstract:

Speaker recognition is usually accomplished by building a set of models from speech of a known speaker (the training data) and subsequently using a pattern matching algorithm to score the speech from an unknown speaker (the test data). In this paper we discard the notion of train and test data in speaker recognition and introduce the multilateral scoring technique. This technique comprises building speaker models on material for the known speaker and matching the unknown speaker data to these models, which is the traditional approach to speaker recognition. The resultant scores are fused with an equivalent set of scores produced by matching the known speaker utterance to models built on the unknown speaker data. Significant improvements have been achieved using this technique on the NIST 1996, 1997 and 1998 Speaker Recognition Evaluation data. Results are presented for two speaker recognition systems, the first based on Hidden Markov models and the second based on Gaussian Mixture models.
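
The idea of scoring in both directions can be sketched as follows. This is only an illustration under stated assumptions: a single diagonal-covariance Gaussian stands in for the HMM and GMM systems of the paper, and the weighted average used for fusion is hypothetical, since the abstract does not say how the two score sets are combined.

```python
import math
from typing import List, Sequence, Tuple

Frames = Sequence[Sequence[float]]
Model = Tuple[List[float], List[float]]  # (per-dimension mean, variance)


def fit_diag_gaussian(frames: Frames) -> Model:
    """Fit one diagonal-covariance Gaussian (a stand-in for an HMM or GMM)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(model: Model, frames: Frames) -> float:
    """Average per-frame log-likelihood of frames under the model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def multilateral_score(known: Frames, unknown: Frames,
                       weight: float = 0.5) -> float:
    """Fuse the traditional score with the reverse-direction score."""
    # Traditional direction: model built on the known speaker, unknown data scored.
    forward = avg_log_likelihood(fit_diag_gaussian(known), unknown)
    # Reverse direction: model built on the unknown data, known utterance scored.
    reverse = avg_log_likelihood(fit_diag_gaussian(unknown), known)
    return weight * forward + (1 - weight) * reverse
```

With `weight=0.5` the fused score is symmetric in its two arguments, which reflects the abstract's point that the train/test distinction is dropped.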

SL980444.PDF (From Author) SL980444.PDF (Rasterized)



Real Time Speaker Indexing Based on Subspace Method - Application to TV News Articles and Debate

Authors:

Masafumi Nishida, Ryukoku University (Japan)
Yasuo Ariki, Ryukoku University (Japan)

Page (NA) Paper number 125

Abstract:

In this paper, we propose a method to extract and verify individual speaker utterances using a subspace method. This method can extract speech sections of the same speaker by repeating speaker verification between the present speech section and the immediately previous speech section. The speaker models are automatically trained in the verification process without constructing speaker templates in advance. As a result, this speaker verification method can be applied to speaker indexing. In this study, announcer utterances are automatically extracted from news speech data which includes reporter or interviewer utterances. The utterances of each participant in a debate program broadcast on TV are also extracted automatically.
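
The repeated verification between the present and the previous speech section can be sketched as a simple loop. This is a deliberately reduced illustration, not the authors' system: the subspace is rank 1 (the leading eigenvector of the autocorrelation matrix, found by power iteration), and the function names, the energy-ratio test, and the `threshold` value are assumptions.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def leading_direction(frames: Sequence[Vec], iters: int = 50) -> List[float]:
    """Power iteration for the top eigenvector of sum(x x^T) over the frames."""
    dim = len(frames[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [0.0] * dim
        for x in frames:
            proj = sum(a * b for a, b in zip(x, v))
            for d in range(dim):
                w[d] += proj * x[d]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # degenerate input: keep the current direction
        v = [c / norm for c in w]
    return v


def projection_ratio(basis: Vec, frames: Sequence[Vec]) -> float:
    """Fraction of the frames' energy captured by the rank-1 subspace."""
    captured = sum(sum(a * b for a, b in zip(x, basis)) ** 2 for x in frames)
    energy = sum(sum(a * a for a in x) for x in frames)
    return captured / energy if energy else 0.0


def index_speakers(sections: Sequence[Sequence[Vec]],
                   threshold: float = 0.9) -> List[int]:
    """Label sections; the label changes when verification against the
    model of the preceding speech fails."""
    labels = [0]
    model_frames = list(sections[0])
    for section in sections[1:]:
        basis = leading_direction(model_frames)
        if projection_ratio(basis, section) >= threshold:
            labels.append(labels[-1])
            model_frames.extend(section)   # same speaker: keep training
        else:
            labels.append(labels[-1] + 1)
            model_frames = list(section)   # speaker change: restart the model
    return labels
```

Note that this sketch only detects speaker changes. It assigns a fresh label after every change and does not recognize a speaker who returns later, which a full indexing system would need to handle.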

SL980125.PDF (From Author) SL980125.PDF (Rasterized)



SHEEP, GOATS, LAMBS and WOLVES: A Statistical Analysis of Speaker Performance in the NIST 1998 Speaker Recognition Evaluation

Authors:

George Doddington, NIST (USA)
Walter Liggett, NIST (USA)
Alvin Martin, NIST (USA)
Mark Przybocki, NIST (USA)
Douglas A. Reynolds, MIT Lincoln Laboratory (USA)

Page (NA) Paper number 608

Abstract:

Performance variability in speech and speaker recognition systems can be attributed to many factors. One major factor, which is acknowledged but seldom analyzed, is differences in the recognizability of different speakers. In speaker recognition systems such differences are characterized by the use of animal names for different types of speakers, including sheep, goats, lambs and wolves, depending on their behavior with respect to automatic recognition systems. In this paper we propose statistical tests for the existence of these animals and hunt for such animals using results from the 1998 NIST speaker recognition evaluation.
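
One way to test whether speakers differ systematically in their scores is sketched below. The abstract does not name the tests used, so this permutation test on a one-way ANOVA F statistic is an assumption chosen for illustration, and the function names and input layout (a dictionary of true-speaker scores per speaker) are hypothetical.

```python
import random
from typing import Dict, List


def f_statistic(groups: List[List[float]]) -> float:
    """One-way ANOVA F: between-speaker over within-speaker variance."""
    all_scores = [s for g in groups for s in g]
    grand = sum(all_scores) / len(all_scores)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    within = sum((s - sum(g) / len(g)) ** 2 for g in groups for s in g)
    if within == 0.0:
        return float("inf")
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (between / df_between) / (within / df_within)


def speaker_effect_p_value(scores_by_speaker: Dict[str, List[float]],
                           n_perm: int = 1000, seed: int = 0) -> float:
    """Permutation p-value for the hypothesis that speaker identity
    has no effect on the scores."""
    groups = list(scores_by_speaker.values())
    sizes = [len(g) for g in groups]
    observed = f_statistic(groups)
    pooled = [s for g in groups for s in g]
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled scores into groups of the original sizes.
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that some speakers score consistently lower or higher than others, which is the kind of evidence needed before labelling individual speakers as goats or sheep.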

SL980608.PDF (From Author) SL980608.PDF (Rasterized)



Progress in Speaker Recognition at Dragon Systems

Authors:

Andres Corrada-Emmanuel, Dragon Systems, Inc. (USA)
Michael Newman, Dragon Systems, Inc. (USA)
Barbara Peskin, Dragon Systems, Inc. (USA)
Lawrence Gillick, Dragon Systems, Inc. (USA)
Robert Roth, Dragon Systems, Inc. (USA)

Page (NA) Paper number 1017

Abstract:

We present a new algorithm for speaker recognition (the Sequential Non-Parametric system, or SNP) that has the potential to overcome two limitations of the current approaches. It uses sequences of frames instead of one frame at a time, and it avoids the need to model a speaker with mixtures of Gaussians by scoring the data non-parametrically. Although SNP is at an early stage in its development, its output can be interpolated with that of our GMM system to outperform state-of-the-art GMMs. Comparative results are presented for the 1998 NIST Speaker Recognition Evaluation test set.

SL981017.PDF (Scanned)



A Comparative Study Of Speaker Verification Systems Using The Polycost Database

Authors:

Tomas Nordström, Telia Research AB (Sweden)
Haakan Melin, KTH, Dept. of Speech, Music and Hearing (Sweden)
Johan Lindberg, KTH, Dept. of Speech, Music and Hearing (Sweden)

Page (NA) Paper number 773

Abstract:

This paper reports on a comparative study of several automatic speaker verification systems using the Polycost database. Polycost is a multi-lingual database with non-native English and mother-tongue speech by subjects from 14 countries. We present results for the first three baseline experiments defined for the database as well as explore the multi-lingual aspects of Polycost in a number of experiments where we compare cross-language and same-language impostor attempts. Our results then lead us to suggest a revised set of baseline experiments.

SL980773.PDF (From Author) SL980773.PDF (Rasterized)
