Authors:
Padma Ramesh, Lucent Technologies (USA)
Chin-Hui Lee, Lucent Technologies (USA)
Biing-Hwang Juang, Lucent Technologies (USA)
Page (NA) Paper number 880
Abstract:
Utterance verification is used in spoken language dialog systems to
reject speech that does not belong to the task and to correctly recognize
the sentences that do. Current verification systems use context-dependent
(CD) or context-independent (CI) subword models together with CI anti-subword
models. We propose several methods for modeling CD anti-subword models.
We compare these anti-models and show that anti-models sharing the same
context give the greatest separation between speech that contains the
subword and speech that does not. We have also conducted recognition/verification
experiments with a two-pass verifier and two one-pass verification systems
to compare the different types of anti-subword models. Our results show
that the same-context anti-subword models give the best recognition/verification
performance.
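The verification test behind subword/anti-subword modeling is typically a
log-likelihood ratio between the two models. The sketch below is a minimal
illustration of that test; the function name, the threshold, and the toy
per-frame scores are assumptions for illustration, not the authors'
implementation.

```python
import numpy as np

def verify_subword(frame_loglik_model, frame_loglik_anti, threshold=0.0):
    """Accept a hypothesized subword segment when the average per-frame
    log-likelihood ratio against its anti-model exceeds a threshold."""
    llr = np.mean(frame_loglik_model - frame_loglik_anti)
    return llr > threshold, llr

# toy per-frame log-likelihoods for one hypothesized segment
model_scores = np.array([-4.1, -3.8, -4.5, -3.9])   # target subword model
anti_scores  = np.array([-5.0, -4.9, -5.3, -4.7])   # anti-subword model
accepted, score = verify_subword(model_scores, anti_scores)
print(accepted, round(float(score), 3))
```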
Authors:
J.G.A. Dolfing, Philips Research Laboratories (Germany)
Andreas Wendemuth, Philips Research Laboratories (Germany)
Page (NA) Paper number 481
Abstract:
In the context of command-and-control applications, we exploit confidence
measures to classify utterances into two categories: in-vocabulary utterances
that are recognized correctly, and other (out-of-vocabulary (OOV) and
misrecognized) utterances. We investigate the classification error rate
(CER) of several classes of confidence measures and transformations
on a database of 3345 utterances from 50 male and female speakers,
employing both data-independent and data-dependent measures. The
transformations investigated include mappings to single confidence
measures, LDA-transformed measures, and other linear combinations of
these measures. These combinations are computed by neural networks
trained with Bayes-optimal and with Gardner-Derrida-optimal criteria.
Compared to a recognition system without confidence measures, the
selection of (various combinations of) confidence measures and of suitable
neural network architectures and training methods progressively improves
the CER from 16.7% to 6.6% (a 60% relative reduction). Furthermore, a
linear perceptron generalizes better than a non-linear backpropagation
network.
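As a rough illustration of the classification task described above, the
sketch below trains a plain linear perceptron on synthetic confidence-feature
vectors and reports a classification error rate. The features, data, and
learning-rate settings are invented for illustration; the paper's
Bayes-optimal and Gardner-Derrida-optimal training criteria are not
reproduced here.

```python
import numpy as np

# Hypothetical per-utterance confidence features (e.g. acoustic score,
# n-best score spread, duration); labels: 1 = correctly recognized,
# 0 = OOV or misrecognized.  The data is synthetic.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.5, (100, 3)),
               rng.normal(-1.0, 0.5, (100, 3))])
y = np.array([1] * 100 + [0] * 100)

# Linear perceptron trained with the classic update rule.
w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

errors = sum((1 if xi @ w + b > 0 else 0) != yi for xi, yi in zip(X, y))
print("classification error rate: %.1f%%" % (100.0 * errors / len(y)))
```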
Authors:
Daniel Willett, Gerhard-Mercator-University, Duisburg (Germany)
Andreas Worm, Gerhard-Mercator-University, Duisburg (Germany)
Christoph Neukirchen, Gerhard-Mercator-University, Duisburg (Germany)
Gerhard Rigoll, Gerhard-Mercator-University, Duisburg (Germany)
Page (NA) Paper number 525
Abstract:
In this paper, we describe our work in the field of confidence measures
for HMM-based speech recognition. Confidence measures estimate the
reliability of individual words in the recognizer output, and their
possible applications are manifold. We present experiments with well-known
approaches and propose some new ones. In particular, for continuous speech
recognition with a stochastic language model, we propose combining purely
acoustic measures with language-model-based ones. This slightly improves
on the acoustic measures while preserving their advantage of being
computationally very cheap. Experiments are carried out on a German
isolated-word recognition system and on continuous speech recognition
systems for the Resource Management database and the Wall Street Journal
WSJ0 task.
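One simple way to combine an acoustic confidence measure with a
language-model-based one, as proposed above, is a linear interpolation of
the two per-word scores. The sketch below is only an assumption-level
illustration of that idea; the weight alpha and the example scores are not
taken from the paper.

```python
def combined_confidence(acoustic_conf, lm_conf, alpha=0.8):
    """Interpolate a per-word acoustic confidence with a language-model-based
    one; alpha weights the acoustic term (both scores assumed in [0, 1])."""
    return alpha * acoustic_conf + (1.0 - alpha) * lm_conf

# e.g. an acoustic posterior-like score and a normalized LM score for one word
print(combined_confidence(0.72, 0.55))
```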
Authors:
Li Jiang, Microsoft Research (USA)
Xuedong Huang, Microsoft Research (USA)
Page (NA) Paper number 625
Abstract:
This paper discusses how to compute word-level confidence measures
based on sub-word features for large-vocabulary speaker-independent
speech recognition. The performance of confidence measures using features
at the word, phone, and senone levels is studied experimentally. A framework
based on a transformation function over sub-word features is proposed
for high-performance confidence estimation. In this system, discriminative
training is used to optimize the parameters of the transformation function.
Compared to the baseline, experiments show that the proposed system reduces
the equal error rate by 15%, with up to 40% false-acceptance error reduction
at various fixed false-rejection rates. The combination of multiple features
under the proposed framework is also discussed.
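One common form for such a transformation function is a sigmoid over a
weighted sum of sub-word features, with the weights set by discriminative
training. The sketch below illustrates that general form; the feature set,
weights, and bias are hypothetical and not the paper's trained parameters.

```python
import numpy as np

def word_confidence(subword_feats, weights, bias):
    """Map sub-word (phone/senone) features of one word to a confidence
    score via a sigmoid transformation of their weighted sum."""
    z = np.dot(weights, subword_feats) + bias
    return 1.0 / (1.0 + np.exp(-z))

# three hypothetical per-word features: mean phone posterior,
# worst senone score, phone-count-normalized acoustic score
feats = np.array([0.8, -1.2, 0.3])
weights = np.array([2.0, 0.5, 1.0])   # would be set by discriminative training
print(round(word_confidence(feats, weights, bias=-0.5), 3))
```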
Authors:
Qiguang Lin, IBM T. J. Watson Research Center (USA)
Subrata Das, IBM T. J. Watson Research Center (USA)
David Lubensky, IBM T. J. Watson Research Center (USA)
Michael Picheny, IBM T. J. Watson Research Center (USA)
Page (NA) Paper number 806
Abstract:
This paper presents a new approach to measuring how confidently a word
has been correctly recognized, i.e., a confidence measure. The approach
consists of three major steps: (1) standard decoding; (2) forced Viterbi
alignment (needed for stack decoders); and (3) rank-ordering of the
subphone scores. More specifically, from the aligned sentence the third
step computes the likelihood scores of the hypothesized subphone and
all competing subphones. A list of the subphones is generated in descending
order of these scores, and a rank is assigned to the hypothesized subphone
according to its position. Additional processing, namely selective weighting
and upper-bound limiting, is applied to minimize contamination of the rank
computation by bad segments or by highly variable phones. The resulting
rank is then used as the confidence measure. Results of word-rejection
experiments show that the new approach outperforms other measures such
as whole-word scores, reducing the equal error rate from 32% to 20%.
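The rank computation in step (3) can be pictured as follows: the hypothesized
subphone's likelihood is placed among all competing subphone likelihoods for
the aligned segment, and its position in the descending score list is its
rank. The sketch below shows this, with an optional cap standing in for
upper-bound limiting; the toy scores and the cap value are assumptions for
illustration.

```python
import numpy as np

def subphone_rank(hyp_score, competitor_scores, cap=None):
    """Rank of the hypothesized subphone among all subphone scores for one
    aligned segment (rank 0 = best score); the rank is the confidence cue."""
    all_scores = np.concatenate(([hyp_score], competitor_scores))
    order = np.argsort(-all_scores)          # descending likelihood
    rank = int(np.where(order == 0)[0][0])   # position of the hypothesis
    return min(rank, cap) if cap is not None else rank  # upper-bound limiting

# toy log-likelihoods for one aligned segment
hyp = -3.2
competitors = np.array([-2.9, -4.0, -3.5, -5.1])
print("rank of hypothesized subphone:", subphone_rank(hyp, competitors, cap=10))
```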
Authors:
Tatsuya Kawahara, Kyoto University (Japan)
Kentaro Ishizuka, Kyoto University (Japan)
Shuji Doshita, Kyoto University (Japan)
Chin-Hui Lee, Bell Laboratories (USA)
Page (NA) Paper number 761
Abstract:
A task-independent filler model for robust key-phrase detection and
verification is proposed. Instead of assuming task-specific lexical
knowledge, our model is designed to characterize phrases according to
speaking style, so it can be trained with large corpora from different
but similar tasks. We present two implementations of this portable and
general model. A dialogue-style-dependent model trained on the ATIS corpus
is used as a filler and is shown to be effective for detection-based speech
understanding in different dialogue applications. A lecture-style-dependent
filler model trained on transcriptions of various oral presentations also
improves the verification of key-phrases uttered during lectures.
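Key-phrase verification with a filler model is commonly cast as a
duration-normalized log-likelihood ratio between the hypothesized phrase
and the filler. The sketch below illustrates that scoring only; the function
and the toy log-likelihoods are assumptions, not the authors' system.

```python
def keyphrase_score(phrase_loglik, filler_loglik, n_frames):
    """Duration-normalized log-likelihood ratio of a detected key-phrase
    against a general filler model; higher values mean higher confidence."""
    return (phrase_loglik - filler_loglik) / n_frames

# toy total log-likelihoods over a 120-frame detected segment
print(round(keyphrase_score(-520.0, -560.0, 120), 3))
```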