Utterance Verification and Word Spotting

Chair: Mazin Rahim, AT&T Labs, USA

Experiments in Confidence Scoring Using Spanish Callhome Data

Authors:

Jon G Vaver, Department of Defense (U.S.A.)

Volume 1, Page 209, Paper number 1194

Abstract:

We present results relevant to tasks involved in the confidence scoring of output from a continuous speech recognition system, including the search for predictor variables and model selection. We introduce the DET curve characteristic (DCC) score, which we use along with the normalized cross entropy (NCE) score, to perform the model and predictor variable evaluation. We also show results from experiments that suggest how the NCE and DCC scores vary with recognizer performance.
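As a rough illustration of the NCE score mentioned in the abstract, the sketch below uses the commonly cited definition: the cross entropy of the confidence estimates is compared against a baseline that assigns every word the average probability of being correct. The function name and the clipping constant are assumptions made for this example and are not taken from the paper. The DCC score is not sketched because the abstract does not define it.

```python
import math

def normalized_cross_entropy(confidences, correct, eps=1e-12):
    """NCE of per-word confidences against correct/incorrect labels.

    confidences: estimated probabilities that each word is correct.
    correct: booleans, True where the recognizer output was correct.
    eps: clipping constant (an assumption) to avoid log(0).
    """
    n = len(confidences)
    n_correct = sum(1 for ok in correct if ok)
    if n_correct == 0 or n_correct == n:
        raise ValueError("NCE is undefined when all words share one label")

    # Baseline entropy: every word gets the prior probability p_c.
    p_c = n_correct / n
    h_max = -(n_correct * math.log2(p_c)
              + (n - n_correct) * math.log2(1.0 - p_c))

    # Log-likelihood of the labels under the confidence estimates.
    log_lik = 0.0
    for p, ok in zip(confidences, correct):
        p = min(max(p, eps), 1.0 - eps)
        log_lik += math.log2(p) if ok else math.log2(1.0 - p)

    return (h_max + log_lik) / h_max
```

Under this definition, confidences that carry no more information than the prior give a score of 0, perfect confidences approach 1, and estimates worse than the prior give negative values.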

ic981194.pdf (Scanned)

A New Decoder Based on a Generalized Confidence Score

Authors:

Myoung-Wan Koo, Telecom (Korea)
Chin-Hui Lee, Lucent Technologies (U.S.A.)
Biing-Hwang Juang, Lucent Technologies (U.S.A.)

Volume 1, Page 213, Paper number 1745

Abstract:

We propose a new decoder based on a generalized confidence score. The generalized confidence score is defined as a product of confidence scores obtained from confidence information sources such as likelihood, likelihood ratio, duration, duration ratio, language model probabilities, and supra-segmental information. All confidence information sources are converted into confidence scores by a confidence pre-processor. We show an extended hybrid decoder as an example of a decoder based on the generalized confidence score. The extended hybrid decoder uses multi-level confidence scores such as frame-level, phone-level, and word-level likelihood ratios, while the conventional hybrid decoder uses only the frame-level confidence score. Experimental results show that the extended decoder gives better results than the conventional hybrid decoder, particularly in dealing with out-of-vocabulary words or out-of-task sentences.
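The product form of the generalized confidence score can be sketched as follows. The abstract does not say how the confidence pre-processor maps raw information sources to scores, so the sigmoid used here, along with the names `preprocess`, `slope`, and `offset`, is purely an assumption for illustration.

```python
import math

def preprocess(raw_value, slope=1.0, offset=0.0):
    """Hypothetical confidence pre-processor: squash a raw source value
    (for example a log-likelihood ratio) into (0, 1) with a sigmoid.
    The sigmoid is an assumption, not the paper's stated mapping."""
    return 1.0 / (1.0 + math.exp(-slope * (raw_value - offset)))

def generalized_confidence(scores):
    """Product of the per-source confidence scores, as in the abstract."""
    return math.prod(scores)
```

For example, `generalized_confidence([preprocess(llr), preprocess(duration_ratio)])` would combine two sources, where `llr` and `duration_ratio` stand for whatever raw values a recognizer provides.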

ic981745.pdf (From Postscript)

Rejection of Out-of-Vocabulary Words Using Phoneme Confidence Likelihood

Authors:

Takatoshi Jitsuhiro, NTT Human Interface Laboratories (Japan)
Satoshi Takahashi, NTT Human Interface Laboratories (Japan)
Kiyoaki Aikawa, NTT Human Interface Laboratories (Japan)

Volume 1, Page 217, Paper number 1872

Abstract:

The rejection of unknown words is important in improving the performance of speech recognition. The anti-keyword model method can reject unknown words with high accuracy for a small vocabulary and a specified task. Unfortunately, it is either inconvenient or impossible to apply if words in the vocabulary change frequently. We propose a new method for task-independent rejection of unknown words, in which a new phoneme confidence measure is used to verify partial utterances. It is used to verify each phoneme while locating candidates. Furthermore, the whole utterance is verified by a phonetic typewriter. This method can improve the accuracy of verification for each phoneme and improve the speed of the candidate search. Tests show that the proposed method improves the recognition rate by 4% compared to the conventional algorithm at equal error rates. Furthermore, a 3% improvement is obtained by training acoustic models with the MCE algorithm.

ic981872.pdf (From Postscript)

Keyword Verification Considering the Correlation of Succeeding Feature Vectors

Authors:

Jochen Junkawitsch, Siemens AG (Germany)
Harald Hoege, Siemens AG (Germany)

Volume 1, Page 221, Paper number 1931

Abstract:

The assumption of statistically independent feature vectors within the HMM approach is a well-known problem. The aim of this study is to explore a simple and feasible method that takes the correlation of adjacent feature vectors into account. A so-called correlated HMM, which estimates the emission probability of a state with respect to correlated feature vectors, is built by combining two separate knowledge sources. On the one hand, a traditional HMM provides an emission probability under the condition of a certain state; on the other hand, a linear predictor delivers an emission probability considering the previous feature vectors. The efficiency of this method is shown with the help of the German SpeechDat(M) database. The application of the correlated HMM within the verification procedure of a keyword spotter provided an improvement of the Figure-of-Merit from 87.1% to 88.6%.

ic981931.pdf (From Postscript)

Using Word Probabilities as Confidence Measures

Authors:

Frank Wessel, RWTH Aachen (Germany)
Klaus Macherey, RWTH Aachen (Germany)
Ralf Schlüter, RWTH Aachen (Germany)

Volume 1, Page 225, Paper number 1949

Abstract:

Estimates of confidence for the output of a speech recognition system can be used in many practical applications of speech recognition technology. They can be employed for detecting possible errors and can help to avoid undesirable verification turns in automatic inquiry systems. In this paper we propose to estimate the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the speaker utterance. The basic idea of our approach is to estimate the posterior word probabilities as the sum of all word hypothesis probabilities which represent the occurrence of the same word in more or less the same segment of time. The word hypothesis probabilities are approximated by paths in a wordgraph and are computed using a simplified forward-backward algorithm. We present experimental results on the North American Business (NAB'94) and the German Verbmobil recognition task.
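The idea of summing hypothesis probabilities over a word graph can be sketched with a standard forward-backward pass on a small directed acyclic graph. This is an illustration under several assumptions, not the authors' simplified algorithm: node identifiers double as time indices, every edge runs from a lower to a higher node, edge scores are plain probabilities rather than scaled log scores, and "more or less the same segment of time" is approximated by requiring time spans to overlap.

```python
from collections import defaultdict

def edge_posteriors(edges, start, final):
    """Posterior of each edge in a word graph.

    edges: list of (src, dst, word, prob) with src < dst (assumption:
    node ids are time indices, so the graph is topologically ordered).
    """
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for src, dst, _, prob in sorted(edges, key=lambda e: e[0]):
        alpha[dst] += alpha[src] * prob          # forward pass

    beta = defaultdict(float)
    beta[final] = 1.0
    for src, dst, _, prob in sorted(edges, key=lambda e: -e[1]):
        beta[src] += prob * beta[dst]            # backward pass

    total = alpha[final]
    return [alpha[src] * prob * beta[dst] / total
            for src, dst, _, prob in edges]

def word_confidence(edges, posteriors, index):
    """Sum posteriors of edges with the same word and overlapping times."""
    src_i, dst_i, word_i, _ = edges[index]
    return sum(
        post
        for (src, dst, word, _), post in zip(edges, posteriors)
        if word == word_i and max(src, src_i) < min(dst, dst_i)
    )

# Toy graph: "hello" appears twice with slightly different end times.
toy_edges = [
    (0, 2, "hello", 0.3),
    (0, 3, "hello", 0.2),
    (0, 2, "yellow", 0.5),
    (2, 5, "world", 1.0),
    (3, 5, "world", 1.0),
]
toy_posteriors = edge_posteriors(toy_edges, start=0, final=5)
```

In the toy graph, the two "hello" edges have individual posteriors of 0.3 and 0.2, and summing them gives a confidence of 0.5 for "hello" in that region.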

ic981949.pdf (From Postscript)

Subword-Based Minimum Verification Error (SB-MVE) Training for Task Independent Utterance Verification

Authors:

Rafid A. Sukkar, Lucent Technologies (U.S.A.)

Volume 1, Page 229, Paper number 2143

Abstract:

In this paper we formulate a training framework and present a method for task independent utterance verification. Verification-specific HMMs are defined and discriminatively trained using minimum verification error training. Task independence is accomplished by performing the verification on the subword level and training the verification models using a general phonetically balanced database that is independent of the application tasks. Experimental results show that the proposed method significantly outperforms two other commonly used task independent utterance verification techniques. It is shown that the equal error rate of false alarms and false keyword rejection is reduced by more than 22% compared to the other two methods on a large vocabulary recognition task.

ic982143.pdf (From Postscript)

A Fast Vocabulary Independent Algorithm for Spotting Words in Speech

Authors:

Satya Dharanipragada, IBM (U.S.A.)
Salim E. Roukos, IBM (U.S.A.)

Volume 1, Page 233, Paper number 2379

Abstract:

In applications such as audio-indexing, spoken message retrieval and video-browsing, it is necessary to have the ability to detect spoken words that are outside the vocabulary of the speech recognizer used in these systems, in large amounts of speech at speeds many times faster than real-time. In this paper we present a fast, vocabulary-independent algorithm for spotting words in speech. The algorithm consists of a preprocessing stage and a coarse-to-detailed search strategy for spotting a word/phone sequence in speech. The preprocessing method provides a phone-level representation of the speech that can be searched efficiently. The coarse search, consisting of phone-ngram matching, identifies regions of speech as putative word hits. The detailed acoustic match is then conducted only at the putative hits identified in the coarse match. This gives us the desired accuracy and speed in wordspotting.
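The coarse stage described above can be sketched as a phone n-gram lookup. The abstract does not specify the phone-level representation, so this example assumes a single best phone string per recording; the function names, the trigram order, and the `min_matches` threshold are all hypothetical. The detailed acoustic match that would follow is not shown.

```python
from collections import defaultdict

def build_ngram_index(phones, n=3):
    """Map each phone n-gram to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(phones) - n + 1):
        index[tuple(phones[i:i + n])].append(i)
    return index

def coarse_hits(index, query, n=3, min_matches=1):
    """Return implied start positions of the query phone sequence.

    Each matching n-gram votes for the position where the whole query
    would have to begin; positions with enough votes are putative hits.
    """
    votes = defaultdict(int)
    for k in range(len(query) - n + 1):
        for pos in index.get(tuple(query[k:k + n]), []):
            votes[pos - k] += 1
    return sorted(s for s, v in votes.items() if v >= min_matches)

# Toy phone string and a query for the phones of "world".
speech = "sil h eh l ow w er l d sil".split()
hits = coarse_hits(build_ngram_index(speech), "w er l d".split(),
                   min_matches=2)
```

Because the index is built once per recording, each new query costs only a few dictionary lookups, which is what makes searching for words outside the recognizer vocabulary cheap at this stage.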

ic982379.pdf (From Postscript)

Integration of Utterance Verification with Statistical Language Modeling and Spoken Language Understanding

Authors:

Richard C. Rose, AT&T Labs - Research (U.S.A.)
Huan Yao, AT&T Labs - Research (U.S.A.)
Giuseppe Riccardi, AT&T Labs - Research (U.S.A.)
Jeremy H. Wright, AT&T Labs - Research (U.S.A.)

Volume 1, Page 237, Paper number 5102

Abstract:

Methods for utterance verification (UV) and their integration into statistical language modeling and spoken language understanding formalisms for a large vocabulary spoken understanding system are presented. The paper consists of three parts. First, a set of acoustic likelihood ratio based utterance verification techniques is described and applied to the problem of rejecting portions of a hypothesized word string that may have been incorrectly decoded by a large vocabulary continuous speech recognizer. Second, a procedure for integrating the acoustic level confidence measures with the statistical language model is described. Finally, the effect of integrating acoustic level confidence into the spoken language understanding unit (SLU) in a call-type classification is discussed. These techniques were evaluated on utterances collected from a highly unconstrained call routing task performed over the telephone network. They have been evaluated in terms of their ability to classify utterances into a set of 15 semantic actions corresponding to call-types that are accepted by the application.
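A minimal sketch of likelihood ratio based word rejection, in the spirit of the first part of the abstract, is given below. It assumes that a recognizer supplies, for each hypothesized word, a log-likelihood under the target model and under an alternative model, plus the number of frames; the function names, the per-frame normalization, and the threshold value are assumptions for this example rather than details from the paper.

```python
def llr_score(target_loglik, alt_loglik, num_frames):
    """Duration-normalized log-likelihood ratio for one word segment."""
    return (target_loglik - alt_loglik) / num_frames

def verify_words(hypotheses, threshold=0.0):
    """Accept or reject each hypothesized word.

    hypotheses: list of (word, target_loglik, alt_loglik, num_frames).
    Returns a list of (word, accepted) pairs.
    """
    return [
        (word, llr_score(target, alt, frames) >= threshold)
        for word, target, alt, frames in hypotheses
    ]

# Toy input: the first word fits the target model better, the second
# fits the alternative model better.
decisions = verify_words([
    ("collect", -100.0, -120.0, 10),
    ("card", -150.0, -140.0, 10),
])
```

Raising the threshold rejects more words, trading false acceptances for false rejections; in practice it would be tuned on held-out data.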