Authors:
Aaron E. Rosenberg, AT&T Labs Research (USA)
Ivan Magrin-Chagnolleau, AT&T Labs Research (USA)
S. Parthasarathy, AT&T Labs Research (USA)
Qian Huang, AT&T Labs Research (USA)
Page (NA) Paper number 202
Abstract:
Experiments have been carried out to assess the feasibility of detecting
target speaker segments in multi-speaker broadcast databases. The experimental
database consists of NBC Nightly News broadcasts. The target speaker
is the news anchor, Tom Brokaw. Gaussian mixture models are constructed
from labelled training data for the target speaker as well as background
models for other speakers, commercials, and music. Four labelled
30-minute broadcasts are used for testing. Mel-frequency cepstral features,
augmented by delta cepstral features, are calculated over 20 msec.
windows shifted every 10 msec. through a broadcast. Likelihood ratio
scores are calculated for each test frame averaged over blocks of frames
with a specified duration. The block scores are input to a detection
routine which returns estimates of target segment boundaries. The
range of best results obtained over the test broadcasts is 82% to 100%
detection of target segments with segment frame accuracy ranging from
86% to 95%. Zero to two false-alarm segments are detected over each
30-minute broadcast.
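A minimal Python sketch of the block-averaged likelihood-ratio scoring described above; the GMM sizes, block length, threshold, and the use of scikit-learn are illustrative assumptions, not the authors' implementation:

import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm(features, n_components=32):
    # Diagonal-covariance GMM fit to labelled training frames
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    return gmm.fit(features)

def block_llr_scores(features, target_gmm, background_gmm, block_len=100):
    # Per-frame log-likelihood ratios averaged over fixed-length blocks
    # (100 frames = 1 second at the 10 msec. frame shift used above)
    llr = target_gmm.score_samples(features) - background_gmm.score_samples(features)
    n_blocks = len(llr) // block_len
    return llr[:n_blocks * block_len].reshape(n_blocks, block_len).mean(axis=1)

def detect_segments(block_scores, threshold=0.0):
    # Contiguous runs of above-threshold blocks become target segments
    segments, start = [], None
    for i, above in enumerate(block_scores > threshold):
        if above and start is None:
            start = i
        elif not above and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(block_scores)))
    return segments

# Toy usage with random vectors standing in for MFCC + delta features
rng = np.random.default_rng(0)
target_gmm = train_gmm(rng.normal(0.5, 1.0, (2000, 26)))
background_gmm = train_gmm(rng.normal(-0.5, 1.0, (2000, 26)))
test = np.vstack([rng.normal(-0.5, 1.0, (500, 26)), rng.normal(0.5, 1.0, (500, 26))])
print(detect_segments(block_llr_scores(test, target_gmm, background_gmm)))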
Authors:
Eluned S. Parris, Ensigma Ltd. (U.K.)
Michael J. Carey, Ensigma Ltd. (U.K.)
Page (NA) Paper number 444
Abstract:
Speaker recognition is usually accomplished by building a set of models
from the speech of a known speaker (the training data) and subsequently using
a pattern-matching algorithm to score the speech from an unknown speaker
(the test data). In this paper we discard the notion of train and test data
in speaker recognition and introduce the multilateral scoring technique.
This technique comprises building speaker models on material from the
known speaker and matching the unknown speaker's data to these models,
as in the traditional approach to speaker recognition. The resultant scores
are fused with an equivalent set of scores produced by matching the
known speaker utterance to models built on the unknown speaker data.
Significant improvements have been achieved using this technique on
the NIST 1996, 1997 and 1998 Speaker Recognition Evaluation data. Results
are presented for two speaker recognition systems, the first based
on Hidden Markov models and the second based on Gaussian Mixture models.
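A minimal Python sketch of the multilateral scoring idea, using GMMs and simple score averaging as the fusion rule; both choices are illustrative assumptions, not Ensigma's HMM or GMM systems:

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(features, n_components=16):
    return GaussianMixture(n_components=n_components, covariance_type="diag").fit(features)

def multilateral_score(known_feats, unknown_feats):
    # Direction 1: unknown data scored on a model of the known speaker
    # (the traditional approach); direction 2: known data scored on a
    # model built on the unknown data. The two scores are then fused.
    s_forward = fit_gmm(known_feats).score(unknown_feats)    # mean log-likelihood
    s_backward = fit_gmm(unknown_feats).score(known_feats)
    return 0.5 * (s_forward + s_backward)

# Toy usage: a same-speaker pair should score higher than a different-speaker pair
rng = np.random.default_rng(1)
spk_a1, spk_a2 = rng.normal(0.0, 1.0, (1000, 20)), rng.normal(0.0, 1.0, (1000, 20))
spk_b = rng.normal(1.0, 1.0, (1000, 20))
print(multilateral_score(spk_a1, spk_a2), multilateral_score(spk_a1, spk_b))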
Authors:
Masafumi Nishida, Ryukoku University (Japan)
Yasuo Ariki, Ryukoku University (Japan)
Page (NA) Paper number 125
Abstract:
In this paper, we propose a method to extract and verify individual
speaker utterances using a subspace method. This method can extract
speech sections of the same speaker by repeating speaker verification
between the present speech section and the immediately previous speech
section. The speaker models are automatically trained in the verification
process, without constructing speaker templates in advance. As a result,
this speaker verification method can be applied to speaker indexing. In
this study, announcer utterances are automatically extracted from news
speech data which includes reporter or interviewer utterances. The
utterances of each participant in a debate program broadcast on TV are
also extracted automatically.
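A minimal Python sketch of sequential verification between adjacent speech sections for indexing; principal-component subspaces compared via subspace angles stand in for the paper's subspace method, and the subspace dimension and angle threshold are arbitrary assumptions:

import numpy as np
from scipy.linalg import subspace_angles
from sklearn.decomposition import PCA

def section_subspace(features, dim=5):
    # Principal subspace of one speech section (columns = basis vectors)
    return PCA(n_components=dim).fit(features).components_.T

def same_speaker(prev_feats, curr_feats, dim=5, max_angle=0.6):
    # Accept the two sections as the same speaker when the largest
    # principal angle between their subspaces is small (radians)
    return subspace_angles(section_subspace(prev_feats, dim),
                           section_subspace(curr_feats, dim)).max() < max_angle

def index_sections(sections):
    # Group consecutive sections into speaker turns, building models on the
    # fly from the stream itself, with no pre-trained speaker templates
    labels = [0]
    for prev, curr in zip(sections[:-1], sections[1:]):
        labels.append(labels[-1] + (0 if same_speaker(prev, curr) else 1))
    return labels

# Toy usage: three sections, the middle one with a different spectral structure
rng = np.random.default_rng(2)
def toy_section(lo, hi):
    x = rng.normal(0, 0.05, (300, 20))
    x[:, lo:hi] += rng.normal(0, 1.0, (300, hi - lo))
    return x
sections = [toy_section(0, 5), toy_section(10, 15), toy_section(0, 5)]
print(index_sections(sections))  # expected: [0, 1, 2]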
Authors:
George Doddington, NIST (USA)
Walter Liggett, NIST (USA)
Alvin Martin, NIST (USA)
Mark Przybocki, NIST (USA)
Douglas A. Reynolds, MIT Lincoln Laboratory (USA)
Page (NA) Paper number 608
Abstract:
Performance variability in speech and speaker recognition systems can
be attributed to many factors. One major factor, which is acknowledged
but seldom analyzed, is differences in the recognizability of different
speakers. In speaker recognition systems such differences are characterized
by the use of animal names for different types of speakers, including
sheep, goats, lambs and wolves, depending on their behavior with respect
to automatic recognition systems. In this paper we propose statistical
tests for the existence of these animals and hunt for such animals
using results from the 1998 NIST speaker recognition evaluation.
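As an illustration of this kind of speaker-effect test (the paper's specific tests are not reproduced here), a nonparametric test can ask whether target-trial scores differ systematically across speakers, i.e. whether "goats" exist. A minimal Python sketch with hypothetical scores:

import numpy as np
from scipy.stats import kruskal

# Hypothetical target-trial scores for four speakers; the last one is
# simulated as a "goat" whose scores are systematically lower
rng = np.random.default_rng(3)
speaker_scores = [rng.normal(2.0, 1.0, 30) for _ in range(3)]
speaker_scores.append(rng.normal(0.5, 1.0, 30))

stat, p_value = kruskal(*speaker_scores)
print(f"H = {stat:.2f}, p = {p_value:.4f}")  # a small p suggests a real speaker effect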
Authors:
Andres Corrada-Emmanuel, Dragon Systems, Inc. (USA)
Michael Newman, Dragon Systems, Inc. (USA)
Barbara Peskin, Dragon Systems, Inc. (USA)
Lawrence Gillick, Dragon Systems, Inc. (USA)
Robert Roth, Dragon Systems, Inc. (USA)
Page (NA) Paper number 1017
Abstract:
We present a new algorithm for speaker recognition (the Sequential
Non-Parametric system, or SNP) that has the potential to overcome two
limitations of the current approaches. It uses sequences of frames
instead of one frame at a time; and it avoids the need to model a speaker
with mixtures of Gaussians by scoring the data non-parametrically.
Although the SNP system is at an early stage in its development, its output
can be interpolated with that of our GMM system to outperform state-of-the-art
GMMs. Comparative results are presented for the 1998 NIST Speaker
Recognition Evaluation test set.
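A minimal Python sketch of the general idea as we read it from the abstract: stack short frame sequences and score them non-parametrically by nearest-neighbour distance to enrolment data, with no Gaussian mixtures. The sequence length and distance measure are assumptions, not Dragon's SNP system:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def stack_sequences(frames, seq_len=5):
    # Concatenate seq_len consecutive frames so local temporal
    # structure is kept in each scoring unit
    n = len(frames) - seq_len + 1
    return np.stack([frames[i:i + seq_len].ravel() for i in range(n)])

def snp_score(enrol_frames, test_frames, seq_len=5):
    # Average nearest-neighbour distance from test sequences to the
    # enrolment sequences, negated so that larger means a better match
    nn = NearestNeighbors(n_neighbors=1).fit(stack_sequences(enrol_frames, seq_len))
    dists, _ = nn.kneighbors(stack_sequences(test_frames, seq_len))
    return -dists.mean()

# The interpolation with a GMM system mentioned above could then be a
# weighted sum such as: fused = alpha * snp + (1 - alpha) * gmm_score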
Authors:
Tomas Nordström, Telia Research AB (Sweden)
Håkan Melin, KTH, Dept. of Speech, Music and Hearing (Sweden)
Johan Lindberg, KTH, Dept. of Speech, Music and Hearing (Sweden)
Page (NA) Paper number 773
Abstract:
This paper reports on a comparative study of several automatic speaker
verification systems using the Polycost database. Polycost is a multi-lingual
database with non-native English and mother-tongue speech by subjects
from 14 countries. We present results for the first three baseline
experiments defined for the database, and explore the multi-lingual
aspects of Polycost in a number of experiments comparing cross-language
and same-language impostor attempts. Our results lead us to suggest
a revised set of baseline experiments.
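A minimal Python sketch of the cross-language versus same-language impostor comparison, using hypothetical scores and a simple equal-error-rate calculation; these are not Polycost results:

import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    # EER: the operating point where false-acceptance and false-rejection rates meet
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    far = np.array([np.mean(impostor_scores >= t) for t in thresholds])
    frr = np.array([np.mean(target_scores < t) for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Hypothetical score distributions: same-language impostors are assumed
# to score closer to the targets than cross-language impostors
rng = np.random.default_rng(4)
targets = rng.normal(1.5, 1.0, 200)
same_lang = rng.normal(0.3, 1.0, 500)
cross_lang = rng.normal(-0.3, 1.0, 500)
print("same-language EER :", equal_error_rate(targets, same_lang))
print("cross-language EER:", equal_error_rate(targets, cross_lang))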