Utterance Verification and Word Spotting 2

Context Dependent Anti Subword Modeling for Utterance Verification

Authors:

Padma Ramesh, Lucent Technologies (USA)
Chin-Hui Lee, Lucent Technologies (USA)
Biing-Hwang Juang, Lucent Technologies (USA)

Paper number 880

Abstract:

Utterance verification is used in spoken language dialog systems to reject speech that does not belong to the task and to correctly recognize the sentences that do. Current verification systems use context-dependent (CD) or context-independent (CI) subword models together with CI anti-subword models. We propose several methods for building CD anti-subword models. Comparing these anti-models, we show that anti-models sharing the same context as the target subword give the greatest separation between speech that contains the subword and speech that does not. We have also conducted recognition/verification experiments with a two-pass verifier and two one-pass verification systems to compare the different types of anti-subword models. Our results show that the same-context anti-subword models yield the best recognition/verification performance.
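
As an illustration of the underlying decision rule (not the authors' exact formulation), the sketch below shows duration-normalized likelihood-ratio verification of a recognized subword string against anti-models; the scores, segment lengths, and threshold are hypothetical placeholders.

    # A minimal sketch of subword-level likelihood-ratio verification with
    # anti-models; all scores and the threshold are invented illustrations.
    def segment_llr(target_loglik, anti_loglik, num_frames):
        """Duration-normalized log-likelihood ratio for one subword segment."""
        return (target_loglik - anti_loglik) / max(num_frames, 1)

    def verify_utterance(segments, threshold=0.0):
        """Accept the hypothesis if the average segment LLR clears the threshold.

        `segments` holds (target_loglik, anti_loglik, num_frames) tuples,
        one per recognized subword segment.
        """
        llrs = [segment_llr(t, a, n) for t, a, n in segments]
        utterance_score = sum(llrs) / len(llrs)
        return utterance_score > threshold, utterance_score

    # Example: three subword segments with invented scores.
    segments = [(-120.0, -135.0, 30), (-80.0, -82.0, 20), (-200.0, -210.0, 45)]
    accepted, score = verify_utterance(segments, threshold=0.1)
    print(accepted, round(score, 3))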

SL980880.PDF (From Author) SL980880.PDF (Rasterized)

Combination of Confidence Measures in Isolated Word Recognition

Authors:

J.G.A. Dolfing, Philips Research Laboratories (Germany)
Andreas Wendemuth, Philips Research Laboratories (Germany)

Paper number 481

Abstract:

In the context of command-and-control applications, we exploit confidence measures to classify utterances into two categories: in-vocabulary utterances that are recognized correctly, and all other (out-of-vocabulary, OOV, and misrecognized) utterances. We investigate the classification error rate (CER) of several classes of confidence measures and transformations on a database of 3345 utterances from 50 male and female speakers, employing both data-independent and data-dependent measures. The transformations investigated include mappings to single confidence measures, LDA-transformed measures, and other linear combinations of these measures. The combinations are computed by neural networks trained with Bayes-optimal and with Gardner-Derrida-optimal criteria. Compared to a recognition system without confidence measures, selecting suitable (combinations of) confidence measures, neural network architectures, and training methods progressively improves the CER from 16.7% to 6.6% (a 60% relative reduction). Furthermore, a linear perceptron generalizes better than a non-linear backpropagation network.
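
As a rough illustration of combining several confidence measures with a single linear unit (in the spirit of the linear perceptron mentioned above, though not the paper's actual training setup), the sketch below fits a linear classifier to synthetic confidence features; the feature values, labels, and learning rate are all invented.

    # A minimal sketch: classify utterances as correctly recognized (1) or
    # not (0) from a linear combination of confidence features.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic confidence features (e.g. acoustic score ratio, n-best gap,
    # duration-normalized likelihood); 200 utterances, 3 features each.
    X = rng.normal(size=(200, 3))
    true_w = np.array([1.5, -0.8, 0.5])
    y = (X @ true_w + 0.1 * rng.normal(size=200) > 0).astype(float)

    w = np.zeros(3)
    b = 0.0
    lr = 0.1
    for _ in range(50):  # simple perceptron-style updates
        errors = y - (X @ w + b > 0).astype(float)
        w += lr * X.T @ errors / len(y)
        b += lr * errors.mean()

    cer = np.mean(((X @ w + b) > 0).astype(float) != y)
    print("classification error rate:", cer)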

SL980481.PDF (From Author) SL980481.PDF (Rasterized)

Confidence Measures for HMM-based Speech Recognition

Authors:

Daniel Willett, Gerhard-Mercator-University, Duisburg (Germany)
Andreas Worm, Gerhard-Mercator-University, Duisburg (Germany)
Christoph Neukirchen, Gerhard-Mercator-University, Duisburg (Germany)
Gerhard Rigoll, Gerhard-Mercator-University, Duisburg (Germany)

Paper number 525

Abstract:

In this paper, we describe our work in the field of confidence measures for HMM-based speech recognition. Confidence measures estimate the recognition reliability of individual words in the recognizer output, and their possible applications are manifold. We present experiments with well-known approaches and propose some new ones. In particular, we propose combining purely acoustic measures with language-model-based ones for continuous speech recognition that involves a stochastic language model. This slightly improves the acoustic measures while preserving their advantage of being computationally very cheap. Experiments are carried out on a German isolated word recognition system and on continuous speech recognition systems for the Resource Management database and the Wall Street Journal WSJ0 task.
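
One possible way to fuse an acoustic word confidence with a language-model-based one is a simple linear interpolation, as sketched below; this is an assumed illustration, not the paper's definition, and the weight and example scores are placeholders.

    def combined_confidence(acoustic_conf, lm_conf, weight=0.8):
        """Linear interpolation of an acoustic and an LM-based word confidence."""
        return weight * acoustic_conf + (1.0 - weight) * lm_conf

    # The acoustic confidence might be a normalized posterior and the LM
    # confidence a scaled n-gram log-probability mapped to [0, 1]; both
    # values here are illustrative only.
    print(combined_confidence(acoustic_conf=0.72, lm_conf=0.40, weight=0.8))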

SL980525.PDF (From Author) SL980525.PDF (Rasterized)

Vocabulary-Independent Word Confidence Measure Using Subword Features

Authors:

Li Jiang, Microsoft Research (USA)
Xuedong Huang, Microsoft Research (USA)

Paper number 625

Abstract:

This paper discusses how to compute word-level confidence measures based on sub-word features for large-vocabulary speaker-independent speech recognition. The performance of confidence measures using features at the word, phone, and senone levels is studied experimentally. A framework based on transformation functions over sub-word features is proposed for high-performance confidence estimation, with discriminative training used to optimize the parameters of the transformation function. Compared to the baseline, experiments show that the proposed system reduces the equal error rate by 15%, with up to 40% false-acceptance error reduction at various fixed false-rejection rates. The combination of multiple features under the proposed framework is also discussed.
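
The sketch below illustrates the general shape of such a transformation-function estimator: per-subword scores are passed through a parametric sigmoid and pooled into a word-level confidence. The score values, weights, and bias are hypothetical, not the discriminatively trained parameters of the proposed system.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def word_confidence(subword_scores, weights, bias):
        """Map per-phone (or per-senone) scores to a word confidence in [0, 1]."""
        transformed = [sigmoid(w * s + bias) for s, w in zip(subword_scores, weights)]
        return sum(transformed) / len(transformed)

    # Example: normalized acoustic scores for the three phones of a word.
    phone_scores = [-0.3, 0.8, 0.1]
    print(word_confidence(phone_scores, weights=[2.0, 2.0, 2.0], bias=0.5))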

SL980625.PDF (From Author) SL980625.PDF (Rasterized)

A New Confidence Measure Based on Rank-Ordering Subphone Scores

Authors:

Qiguang Lin, IBM T. J. Watson Research Center (USA)
Subrata Das, IBM T. J. Watson Research Center (USA)
David Lubensky, IBM T. J. Watson Research Center (USA)
Michael Picheny, IBM T. J. Watson Research Center (USA)

Paper number 806

Abstract:

This paper presents a new approach to measuring how confidently a word has been recognized, i.e., a confidence measure. The approach consists of three major steps: (1) standard decoding; (2) forced Viterbi alignment (needed for stack decoders); and (3) rank-ordering of subphone scores. More specifically, from the aligned sentence the third step computes the likelihood scores of the hypothesized subphone and of all competing subphones. The subphones are listed in descending order of score, and a rank is assigned to the hypothesized subphone according to its position in the list. Selective weighting and upper-bound limiting are applied to keep bad segments and highly variable phones from contaminating the rank computation. The resulting rank is then used as the confidence measure. Word rejection experiments show that the new approach outperforms other measures such as whole-word scores, reducing the equal error rate from 32% to 20%.
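
A minimal sketch of the rank-ordering idea (with invented scores and an assumed cap value) is given below: each hypothesized subphone is ranked among its competitors on the aligned segment, the rank is capped so that a few bad segments cannot dominate, and the capped ranks are averaged into a word-level measure.

    def subphone_rank(hyp_score, competitor_scores, max_rank=50):
        """Rank of the hypothesized subphone among all candidates (1 = best)."""
        ordered = sorted([hyp_score] + competitor_scores, reverse=True)
        rank = ordered.index(hyp_score) + 1
        return min(rank, max_rank)  # upper-bound limiting

    def word_rank_measure(segments, max_rank=50):
        """Average capped rank over a word's aligned subphone segments."""
        ranks = [subphone_rank(h, c, max_rank) for h, c in segments]
        return sum(ranks) / len(ranks)

    # Example: two segments, each with a hypothesized score and competitor scores.
    segments = [(-10.0, [-12.0, -11.0, -15.0]), (-9.0, [-8.5, -9.5, -20.0])]
    print(word_rank_measure(segments))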

SL980806.PDF (From Author) SL980806.PDF (Rasterized)

Speaking-Style Dependent Lexicalized Filler Model for Key-Phrase Detection and Verification

Authors:

Tatsuya Kawahara, Kyoto University (Japan)
Kentaro Ishizuka, Kyoto University (Japan)
Shuji Doshita, Kyoto University (Japan)
Chin-Hui Lee, Bell Laboratories (USA)

Paper number 761

Abstract:

Task-independent filler modeling for robust key-phrase detection and verification is proposed. Instead of assuming task-specific lexical knowledge, our model is designed to characterize phrases according to speaking style, and can therefore be trained on large corpora from different but similar tasks. We present two implementations of this portable, general model. A dialogue-style-dependent model trained on the ATIS corpus is used as a filler and shown to be effective for detection-based speech understanding in different dialogue applications. A lecture-style-dependent filler model trained on transcriptions of various oral presentations also improves the verification of key-phrases uttered during lectures.
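
As a rough illustration of filler-based verification in general (not the speaking-style dependent models themselves), the sketch below accepts a detected key-phrase when its keyword-model score sufficiently exceeds a filler-model score over the same segment; all scores and the threshold are placeholders.

    def keyphrase_confidence(keyword_loglik, filler_loglik, num_frames):
        """Duration-normalized log-likelihood ratio of key-phrase vs. filler."""
        return (keyword_loglik - filler_loglik) / max(num_frames, 1)

    def accept_keyphrase(keyword_loglik, filler_loglik, num_frames, threshold=0.2):
        return keyphrase_confidence(keyword_loglik, filler_loglik, num_frames) > threshold

    # Example with invented log-likelihoods over an 80-frame segment.
    print(accept_keyphrase(keyword_loglik=-310.0, filler_loglik=-340.0, num_frames=80))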

SL980761.PDF (From Author) SL980761.PDF (Rasterized)
