ABSTRACT
In this paper, our goal is to find the phonetic transcription of spoken utterances. We present a method that uses information extracted directly from the word-based search to compute the most likely phoneme sequence. Utterances are transcribed during recognition, so that the phonetic representation of the input is available as soon as the search completes. With this method, the computational cost of the word-based search remains almost unaltered, and the phonetic transcription is obtained almost for free.
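The abstract does not spell out how the phone sequence is read off the word-based search, so the following is only a minimal sketch under one simple assumption: the recognized word sequence is expanded into phones through a pronunciation lexicon. The lexicon entries and function names are illustrative, not the paper's implementation.

```python
# Minimal sketch (not the paper's method): derive a phoneme sequence from the
# word-level recognition result via a pronunciation lexicon lookup. In the
# described approach the phones come directly from the word-based search.

# Hypothetical pronunciation lexicon: word -> list of phones.
LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
}

def phonetic_transcription(best_word_sequence):
    """Map the recognized word sequence to a phoneme sequence."""
    phones = []
    for word in best_word_sequence:
        phones.extend(LEXICON.get(word, ["<unk>"]))  # unknown words get a filler
    return phones

if __name__ == "__main__":
    # "the cat sat" -> DH AH K AE T S AE T
    print(phonetic_transcription(["the", "cat", "sat"]))
```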
ABSTRACT
We present improvements in confidence annotation of automatic speech recognizer output for large vocabulary, speaker-independent systems. Several strong additions to the set of predictor variables used for this purpose are discussed. Extensions which allow prediction of separate types of errors, as opposed to the simple presence of an error, are presented. A new development, acoustic confidence annotation, is explored, in which a predictor is built that indicates the likely successes and failures of the acoustic models alone. Four separate learning mechanisms are compared in terms of their ability to provide good confidence annotations from the same set of predictor variables. Performance figures are reported on both read news (the North American Business news corpus) and conversational telephone speech (the Switchboard corpus), both in American English. The Sphinx-II system [1] is used for the NAB tests. The Janus system [2] is used for the Switchboard tests.
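As an illustration of the general setup (per-word predictor variables fed to a trained learner), here is a hedged sketch using logistic regression. The specific features and the choice of learner are assumptions for demonstration only; the paper uses its own predictor set and compares four learning mechanisms.

```python
# Illustrative sketch only: confidence annotation as classification of each
# hypothesized word from predictor variables. Feature choices and the use of
# logistic regression are assumptions, not the paper's configuration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: predictor variables for one hypothesized word, e.g.
# [normalized acoustic score, language-model score, duration (s), n-best count]
X_train = np.array([
    [-5.2, -3.1, 0.30, 8],
    [-9.8, -6.4, 0.12, 1],
    [-4.9, -2.7, 0.41, 9],
    [-8.7, -7.0, 0.09, 2],
])
y_train = np.array([1, 0, 1, 0])  # 1 = word was correct, 0 = word was an error

clf = LogisticRegression().fit(X_train, y_train)

# Confidence annotation for a new hypothesized word: probability of being correct.
X_new = np.array([[-5.0, -3.0, 0.35, 7]])
print(clf.predict_proba(X_new)[:, 1])
```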
ABSTRACT
This paper describes three experiments in using frame-level observation probabilities as the basis for word confidence annotation in an HMM speech recognition system. One experiment is at the word level, one uses word classes, and the other uses phone classes. In each experiment we classify hypotheses as correct or incorrect by aligning the best recognition hypothesis with the known transcript. The confidence of error prediction for each class is a measure of the resolvability between the correct and incorrect histograms.
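The abstract does not define the resolvability measure, so the sketch below uses histogram overlap purely as an illustrative stand-in: per-word scores derived from frame-level observation probabilities are histogrammed separately for correct and incorrect words, and the two distributions are compared.

```python
# Sketch of the histogram comparison implied by the abstract; the concrete
# "resolvability" measure is an assumption (1 minus histogram overlap).
import numpy as np

def resolvability(scores_correct, scores_incorrect, bins=20):
    """Return 1 - overlap of the normalized score histograms.

    scores_* hold a per-word statistic of frame-level observation
    probabilities (e.g., average log probability over the word).
    A value near 1 means the two classes are easy to separate.
    """
    lo = min(scores_correct.min(), scores_incorrect.min())
    hi = max(scores_correct.max(), scores_incorrect.max())
    h_c, _ = np.histogram(scores_correct, bins=bins, range=(lo, hi))
    h_i, _ = np.histogram(scores_incorrect, bins=bins, range=(lo, hi))
    h_c = h_c / h_c.sum()
    h_i = h_i / h_i.sum()
    return 1.0 - np.minimum(h_c, h_i).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    correct = rng.normal(-2.0, 0.5, 500)    # synthetic per-word scores
    incorrect = rng.normal(-4.0, 0.8, 500)
    print(resolvability(correct, incorrect))
```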
ABSTRACT
This paper addresses the problem of out-of-vocabulary (OOV) utterance detection for spoken language systems in an open microphone environment. This problem is becoming crucial as the use of spoken language systems grows beyond the research laboratory. In the past this problem has been addressed in the context of keyword spotting, e.g., for connected digits in a telephone environment, and more recently in OOV word detection in a large vocabulary continuous speech recognition system. We develop a novel technique for designing a lexical garbage model that takes advantage of application-specific knowledge and any potential bias in the recognizer. We do this through the formulation of a recognizer response function.
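The recognizer response function itself is not reproduced in the abstract; the following is only a generic sketch of score-based OOV rejection with a garbage model, to make the decision mechanism concrete. The function name and threshold are hypothetical.

```python
# Generic sketch, not the paper's formulation: compare the best in-vocabulary
# hypothesis score against the lexical garbage model score and reject the
# utterance as OOV when the margin falls below a tuned threshold.
def is_oov(in_vocab_logprob, garbage_logprob, threshold=0.0):
    """Flag an utterance as out-of-vocabulary.

    in_vocab_logprob: log score of the best in-vocabulary hypothesis
    garbage_logprob:  log score of the lexical garbage model
    threshold:        operating point, tuned on development data
    """
    return (in_vocab_logprob - garbage_logprob) < threshold

# Example: the garbage model outscores every lexical hypothesis,
# so the utterance is rejected as OOV.
print(is_oov(in_vocab_logprob=-310.5, garbage_logprob=-295.2))  # True
```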
ABSTRACT
For many practical applications of speech recognition systems, it is desirable to have an estimate of confidence for each hypothesized word, i.e., to have an estimate of which words of the speech recognizer's output are likely to be correct and which are not reliable. Many of today's speech recognition systems use word lattices as a compact representation of a set of alternative hypotheses. We exploit the use of such word lattices as information sources for the measure-of-confidence tagger JANKA [1]. In experiments on spontaneous human-to-human speech data, the use of word lattice related information significantly improves the tagging accuracy.
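The abstract does not list the lattice-derived features actually fed to JANKA; as one plausible example of such a feature, the sketch below counts the competing lattice entries that overlap a hypothesized word in time. Data layout and field names are assumptions.

```python
# Sketch of one kind of lattice-derived predictor (illustrative only): for a
# hypothesized word, count competing lattice entries that overlap it in time.
# Fewer competitors is usually a sign of a more reliable word.
def overlapping_competitors(hyp_word, lattice_entries):
    """hyp_word and lattice entries are dicts with 'word', 'start', 'end' (seconds)."""
    count = 0
    for entry in lattice_entries:
        overlaps = entry["start"] < hyp_word["end"] and entry["end"] > hyp_word["start"]
        if overlaps and entry["word"] != hyp_word["word"]:
            count += 1
    return count

lattice = [
    {"word": "ship",  "start": 0.40, "end": 0.72},
    {"word": "sheep", "start": 0.38, "end": 0.75},
    {"word": "and",   "start": 0.75, "end": 0.90},
]
hyp = {"word": "ship", "start": 0.40, "end": 0.72}
print(overlapping_competitors(hyp, lattice))  # 1 competitor ("sheep")
```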
ABSTRACT
This paper describes our approach to the estimation of confidence in the words generated by a speech recognition system. We describe the models and the features employed for confidence estimation. In addition, we discuss the characteristics of an information-theoretic metric for assessing the performance of the confidence measure. We provide a simple application of confidence measures in which we rank the performance of speakers.
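The abstract does not name its metric; normalized cross entropy (NCE) is a common information-theoretic way to score word confidences, so it serves here only as an illustrative example of such a metric.

```python
# Illustrative example: normalized cross entropy of per-word confidences
# against a baseline that assigns every word the prior probability of being
# correct. This is an assumed stand-in, not necessarily the paper's metric.
import math

def normalized_cross_entropy(confidences, correct):
    """confidences: per-word P(correct) in (0, 1); correct: matching booleans."""
    n = len(confidences)
    n_correct = sum(correct)
    p_c = n_correct / n  # baseline: prior probability of a correct word
    h_base = -(n_correct * math.log2(p_c) + (n - n_correct) * math.log2(1 - p_c)) / n
    h_conf = -sum(
        math.log2(c) if ok else math.log2(1 - c)
        for c, ok in zip(confidences, correct)
    ) / n
    return (h_base - h_conf) / h_base  # > 0 means better than the prior alone

print(normalized_cross_entropy([0.9, 0.8, 0.2, 0.7], [True, True, False, True]))
```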