Session T3A Confidence Measures in ASR

Chairperson: Jose M. Pardo, UPM, Spain

A Low-Cost Phonetic Transcription Method

Authors: Pablo Fetter, Udo Haiber, Peter Regel-Brietzmann

Daimler-Benz AG, Research and Technology, Wilhelm-Runge-Str. 11, D-89081 Ulm, Germany e-mail: fetter@dbag.ulm.daimlerbenz.com

Volume 2 pages 811 - 814

ABSTRACT

In this paper our goal is to find the phonetic transcription of spoken utterances. We present a method which uses information extracted directly from the word-based search to compute the most likely phoneme sequence. Utterances are transcribed during recognition, so that the phonetic representation of the input is available after the search. Using this method, the computational cost of the word-based search remains almost unaltered, and the phonetic transcription is obtained almost for free.

A0444.pdf

WORD AND ACOUSTIC CONFIDENCE ANNOTATION FOR LARGE VOCABULARY SPEECH RECOGNITION

Authors: Lin Chase

The Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA, e-mail: chase@cs.cmu.edu

Volume 2 pages 815 - 818

ABSTRACT

We present improvements in confidence annotation of automatic speech recognizer output for large vocabulary, speaker-independent systems. Several strong additions to the set of predictor variables used for this purpose are discussed. Extensions which allow prediction of separate types of errors, as opposed to the simple presence of an error, are presented. A new development, acoustic confidence annotation, is explored, in which a predictor is built that indicates the likely successes and failures of the acoustic models alone. Four separate learning mechanisms are compared in terms of their ability to provide good confidence annotations from the same set of predictor variables. Performance figures are reported on both read news (the North American Business news corpus) and conversational telephone speech (the Switchboard corpus), both in American English. The Sphinx-II system [1] is used for the NAB tests. The Janus system [2] is used for the Switchboard tests.

A0612.pdf

A SENONE BASED CONFIDENCE MEASURE FOR SPEECH RECOGNITION

Authors: Z. Bergen, W. Ward

Berdy Medical Systems, 4909 Pearl East Circle, Suite 202, Boulder, Colorado, USA 80301, Tel. 303-417-1603, FAX 303-417-1662, E-mail: zbergen@berdy.com; Carnegie Mellon University, Pittsburgh, PA, USA 15213, E-mail: whw@cs.cmu.edu

Volume 2 pages 819 - 822

ABSTRACT

This paper describes three experiments in using frame level observation probabilities as the basis for word confidence annotation in an HMM speech recognition system. One experiment is at the word level, one uses word classes, and the other uses phone classes. In each experiment we categorize hypotheses into correct and incorrect categories by aligning a best recognition hypothesis with the known transcript. The confidence of error prediction for each class is a measure of the resolvability between the correct and incorrect histograms.
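
The categorization step described in this abstract (aligning a recognition hypothesis with the known transcript and marking each hypothesized word as correct or incorrect) can be illustrated with a minimal sketch. It assumes a plain word-level Levenshtein alignment with unit costs; the abstract does not state which alignment procedure the authors used, and the function name `label_hypothesis` is hypothetical.

```python
def label_hypothesis(ref, hyp):
    """Return one boolean per hypothesis word.

    True means the word was aligned to an identical reference word.
    Assumption: unit-cost Levenshtein alignment (not taken from the paper).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end; words never matched stay labeled False.
    labels = [False] * m
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            labels[j - 1] = ref[i - 1] == hyp[j - 1]  # match or substitution
            i -= 1
            j -= 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1  # deletion: reference word missing from the hypothesis
        else:
            j -= 1  # insertion: hypothesis word with no reference counterpart
    return labels


reference = ["show", "me", "flights", "to", "boston"]
hypothesis = ["show", "me", "lights", "to", "boston", "please"]
print(label_hypothesis(reference, hypothesis))
```

Labels produced this way could then be used to split frame-level scores into the correct and incorrect histograms the abstract mentions.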

A0846.pdf

OOV Utterance Detection based on the Recognizer Response Function

Authors: Erica Bernstein and Ward R. Evans

The MITRE Corporation, 1820 Dolly Madison Blvd., McLean, VA 22102, e-mail: egb@mitre.org or wrevans@mitre.org

Volume 2 pages 823 - 826

ABSTRACT

This paper addresses the problem of out-of-vocabulary (OOV) utterance detection for spoken language systems in an open microphone environment. This problem is becoming crucial as use of spoken language systems grows beyond the research laboratory. In the past this problem has been addressed in the context of keyword spotting, e.g., for connected digits in a telephone environment, and more recently in OOV word detection in a large vocabulary continuous speech recognition system. We develop a novel technique for designing a lexical garbage model that takes advantage of application-specific knowledge and any potential bias in the recognizer. We do this through the formulation of a recognizer response function.

A0868.pdf

ESTIMATING CONFIDENCE USING WORD LATTICES

Authors: Thomas Kemp, Thomas Schaaf

Interactive Systems Laboratories, ILKD, University of Karlsruhe, 76128 Karlsruhe, Germany

Volume 2 pages 827 - 830

ABSTRACT

For many practical applications of speech recognition systems, it is desirable to have an estimate of confidence for each hypothesized word, i.e. to have an estimate of which words of the speech recognizer's output are likely to be correct and which are not reliable. Many of today's speech recognition systems use word lattices as a compact representation of a set of alternative hypotheses. We exploit the use of such word lattices as information sources for the measure-of-confidence tagger JANKA [1]. In experiments on spontaneous human-to-human speech data the use of word lattice related information significantly improves the tagging accuracy.

A1045.pdf

IMPROVED ESTIMATION, EVALUATION AND APPLICATIONS OF CONFIDENCE MEASURES FOR SPEECH RECOGNITION

Authors: Man-hung Siu, Herbert Gish, Fred Richardson

BBN Systems and Technologies, 70 Fawcett St., Cambridge, MA 02138

Volume 2 pages 831 - 834

ABSTRACT

This paper describes our approach to the estimation of confidence in the words generated by a speech recognition system. We describe the models and the features employed for confidence estimation. In addition we discuss the characteristics of an information-theoretic metric for assessing the performance of the confidence measure. We provide a simple application of confidence measures in which we rank the performance of speakers.
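
The abstract does not name or define its information-theoretic metric. As an illustration only, the sketch below computes normalized cross entropy, a measure commonly used to score word confidences. Treating it as the metric meant here is an assumption, and the function name `normalized_cross_entropy` and the clipping constant `eps` are hypothetical.

```python
import math


def normalized_cross_entropy(confidences, correct, eps=1e-12):
    """Relative entropy reduction of per-word confidences over a constant prior.

    confidences: estimated probabilities that each word is correct.
    correct:     booleans, True where the word was actually correct.
    Assumption: this standard definition may differ from the paper's.
    """
    n = len(confidences)
    n_correct = sum(1 for ok in correct if ok)
    prior = n_correct / n
    if prior in (0.0, 1.0):
        raise ValueError("need both correct and incorrect words")

    # Entropy (in bits) when every word receives the prior as its confidence.
    h_base = -(n_correct * math.log2(prior)
               + (n - n_correct) * math.log2(1.0 - prior))

    # Cross entropy (in bits) of the actual confidence scores.
    h_conf = 0.0
    for c, ok in zip(confidences, correct):
        c = min(max(c, eps), 1.0 - eps)  # clip to avoid log(0)
        h_conf -= math.log2(c if ok else 1.0 - c)

    return (h_base - h_conf) / h_base


tags = [True, True, True, False]
print(normalized_cross_entropy([0.9, 0.8, 0.95, 0.2], tags))
```

Under this definition a score near 1 indicates confidences that almost perfectly separate correct from incorrect words, a score of 0 matches always reporting the overall accuracy, and negative scores indicate confidences that are worse than that constant baseline.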

A1144.pdf