Chair: Salim E. Roukos, IBM, USA
Jerome R Bellegarda, Apple Computer (U.S.A.)
A new framework is proposed to integrate the various constraints, both local and global, that are present in the language. Local constraints are captured via n-gram language modeling, while global constraints are taken into account through the use of latent semantic analysis. An integrative formulation is derived for the combination of these two paradigms, resulting in several families of multi-span language models for large vocabulary speech recognition. Because of the inherent complementarity in the two types of constraints, the performance of the integrated language models, as measured by perplexity, compares favorably with the corresponding n-gram performance.
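As a rough illustration of how local n-gram evidence and global latent-semantic evidence might be combined, the sketch below multiplies a trigram probability by an exponentiated cosine similarity between a word's latent-semantic vector and the centroid of the document history. The function names, the temperature parameter, and the product form are illustrative assumptions, not the integrative formulation derived in the paper.

```python
import numpy as np

def multispan_score(word_ngram_prob, word_vec, history_vec, temperature=10.0):
    """Blend a local n-gram probability with a global LSA-style semantic score.

    word_ngram_prob : P(word | last n-1 words) from a conventional n-gram model
    word_vec        : latent-semantic vector for the candidate word
    history_vec     : centroid of the latent-semantic vectors of the document so far
    The exponential scaling and the product combination are illustrative choices.
    """
    cosine = float(np.dot(word_vec, history_vec) /
                   (np.linalg.norm(word_vec) * np.linalg.norm(history_vec) + 1e-12))
    semantic_score = np.exp(temperature * cosine)   # global, topic-level evidence
    return word_ngram_prob * semantic_score         # unnormalized; renormalize over the vocabulary
```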
Stanley F Chen, Carnegie Mellon University (U.S.A.)
Kristie Seymore, Carnegie Mellon University (U.S.A.)
Ronald Rosenfeld, Carnegie Mellon University (U.S.A.)
In this paper, we present novel techniques for performing topic adaptation on an n-gram language model. Given training text labeled with topic information, we automatically identify the most relevant topics for new text. We adapt our language model toward these topics using an exponential model, by adjusting probabilities in our model to agree with those found in the topical subset of the training data. For efficiency, we do not normalize the model; that is, we do not require that the "probabilities" in the language model sum to 1. With these techniques, we were able to achieve a modest reduction in speech recognition word-error rate in the Broadcast News domain.
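One way the unnormalized exponential adjustment might look is sketched below: the baseline log-probability is shifted by a weighted log-ratio of the word's topical and global unigram frequencies. The helper names, the default smoothing constant, and the specific functional form are assumptions for illustration; only the decision to leave the adapted scores unnormalized mirrors the abstract.

```python
import math

def topic_adapted_logprob(word, base_logprob, topic_unigram, base_unigram, alpha=0.5):
    """Unnormalized exponential adjustment of an n-gram score toward a topic.

    base_logprob  : log P(word | history) from the baseline n-gram model
    topic_unigram : dict of P(word) estimated on the topical subset of the training data
    base_unigram  : dict of P(word) estimated on all training data
    alpha         : adaptation weight (hypothetical parameter)
    The result is deliberately left unnormalized.
    """
    boost = alpha * (math.log(topic_unigram.get(word, 1e-9)) -
                     math.log(base_unigram.get(word, 1e-9)))
    return base_logprob + boost
```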
Adam L Buchsbaum, AT&T Labs (U.S.A.)
Raffaele Giancarlo, University of Palermo (Italy)
Jeffery R Westbrook, AT&T Labs (U.S.A.)
We study the problem of reducing the size of a language model while preserving recognition performance (accuracy and speed). A successful approach has been to represent language models by weighted finite-state automata (WFAs). Analogues of classical automata determinization and minimization algorithms then provide a general method to produce smaller but equivalent WFAs. We extend this approach by introducing the notion of approximate determinization. We provide an algorithm that, when applied to language models for the North American Business task, achieves 25-35% size reduction compared to previous techniques, with negligible effects on recognition time and accuracy.
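The paper's approximate determinization algorithm is not reproduced here. As a loosely related illustration of the underlying idea of trading exact weight equality for a tolerance, the toy sketch below performs a Moore-style minimization of an already deterministic WFA in which weights are bucketed to within eps before states are compared; the data layout and the bucketing rule are assumptions.

```python
def approx_minimize(arcs, finals, eps=1e-3):
    """Toy Moore-style minimization of a deterministic WFA with weights bucketed
    to a tolerance eps, so states whose weights differ only slightly can merge.
    This is an illustration of relaxed weight comparison, not the
    approximate-determinization algorithm described in the paper.

    arcs   : dict state -> dict symbol -> (next_state, weight); every state appears as a key
    finals : dict state -> final weight (states absent are non-final)
    Returns a dict mapping each state to the id of its merged block.
    """
    def bucket(w):
        return round(w / eps)

    # Initial partition: group states by finality and bucketed final weight.
    block = {s: (s in finals, bucket(finals[s]) if s in finals else None) for s in arcs}
    while True:
        # Signature = own block plus a bucketed view of the outgoing arcs.
        sig = {s: (block[s],
                   tuple(sorted((sym, block[dst], bucket(w))
                                for sym, (dst, w) in arcs[s].items())))
               for s in arcs}
        ids, new_block = {}, {}
        for s, g in sig.items():
            new_block[s] = ids.setdefault(g, len(ids))
        if new_block == block:
            return block
        block = new_block
```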
Doug Beeferman, Carnegie Mellon University (U.S.A.)
Adam Berger, Carnegie Mellon University (U.S.A.)
John Lafferty, Carnegie Mellon University (U.S.A.)
This paper describes a lightweight method for the automatic insertion of intra-sentence punctuation into text. Despite the intuition that pauses in an acoustic stream are a positive indicator for some types of punctuation, this work will demonstrate the feasibility of a system which relies solely on lexical information. Besides its potential role in a speech recognition system, such a system could serve equally well in non-speech applications such as automatic grammar correction in a word processor and parsing of spoken text. After describing the design of a punctuation-restoration system, which relies on a trigram language model and a straightforward application of the Viterbi algorithm, we summarize results, both quantitative and subjective, of the performance and behavior of a prototype system.
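Since the abstract names a trigram language model and the Viterbi algorithm, a minimal sketch of that search is given below: the decoder state is the last two emitted tokens, and after each word it considers appending each candidate punctuation mark (or none), assuming the punctuation symbols appear in the trigram model's vocabulary. The punctuation inventory and the scoring interface are hypothetical.

```python
PUNCT = ["", ",", "."]   # hypothetical inventory: no punctuation, comma, period

def restore_punctuation(words, trigram_logprob):
    """Viterbi search over punctuation inserted after each word (minimal sketch).

    trigram_logprob(w3, w1, w2) is assumed to return log P(w3 | w1 w2) from a
    trigram model whose vocabulary includes the punctuation tokens themselves.
    """
    beams = {("<s>", "<s>"): (0.0, [])}          # state (last two tokens) -> (score, output)
    for w in words:
        new_beams = {}
        for (u, v), (score, toks) in beams.items():
            base = score + trigram_logprob(w, u, v)
            for p in PUNCT:
                if p:                            # insert punctuation after the word
                    s, state, out = base + trigram_logprob(p, v, w), (w, p), toks + [w, p]
                else:                            # no punctuation here
                    s, state, out = base, (v, w), toks + [w]
                if state not in new_beams or s > new_beams[state][0]:
                    new_beams[state] = (s, out)
        beams = new_beams
    best_score, best_tokens = max(beams.values(), key=lambda x: x[0])
    return best_tokens
```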
Kristine W. Ma, GTE/BBN Technologies (U.S.A.)
George Zavaliagkos, GTE/BBN Technologies (U.S.A.)
Marie Meteer, GTE/BBN Technologies (U.S.A.)
According to discourse theories in linguistics, conversational utterances possess an informational structure that partitions each sentence into two portions: a "given" and a "new". In this work, we explore this idea by building sub-sentence discourse language models for conversational speech recognition. The internal sentence structure is captured in statistical language modeling by training multiple n-gram models using the Expectation-Maximization algorithm on the Switchboard corpus. The resulting model contributes to a 30% reduction in language model perplexity and a small gain in word error rate.
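One component of such a system, the EM re-estimation of mixture weights over several sub-sentence n-gram models, might look like the sketch below. The event representation and the held-out-data setup are assumptions, and the paper's assignment of sub-sentence regions to components is not shown.

```python
def em_mixture_weights(heldout_events, model_probs, iters=20):
    """EM re-estimation of mixture weights for K sub-sentence n-gram components.

    heldout_events : list of n-gram events, e.g. (history, word) tuples
    model_probs    : list of K functions, each mapping an event to P_k(event)
    Only the weight re-estimation step is sketched here.
    """
    K = len(model_probs)
    weights = [1.0 / K] * K
    for _ in range(iters):
        counts = [0.0] * K
        for ev in heldout_events:
            post = [w * p(ev) for w, p in zip(weights, model_probs)]
            z = sum(post) or 1e-12
            for k in range(K):
                counts[k] += post[k] / z        # expected component responsibility
        total = sum(counts)
        weights = [c / total for c in counts]   # M-step: renormalize
    return weights
```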
Shoichi Matsunaga, NTT Human Interface Laboratories (Japan)
Shigeki Sagayama, NTT Human Interface Laboratories (Japan)
This paper proposes two-step generation of a variable-length class-based language model that integrates local and global constraints. In the first step, an initial class set is recursively designed using local constraints. Word elements for each class are determined using Kullback divergence and total entropy. In the second step, the classes and words are recursively and iteratively recreated by grouping consecutive words to generate longer units and by splitting the initial classes into finer classes. These operations in the second step are carried out selectively, taking into account local and global constraints on the basis of a minimum entropy criterion. Experiments showed that the perplexity of the proposed initial class set is superior to that of the conventional part-of-speech class, and the perplexity of the variable-word-length model consequently becomes lower. Furthermore, this two-step model generation approach greatly reduces the training time.
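A minimal sketch of the word-grouping step under a total-entropy criterion is given below, using a unigram code length as a stand-in for the paper's class-based criterion; the merging of consecutive word pairs and the accept/reject test are illustrative simplifications.

```python
from collections import Counter
import math

def total_entropy(tokens):
    """Total code length (in nats) of a corpus under its own unigram MLE,
    i.e. N * H(unigram); a simple stand-in for the paper's entropy criterion."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c * math.log(c / n) for c in counts.values())

def merge_if_better(tokens, pair):
    """Tentatively fuse consecutive occurrences of `pair` into one longer unit
    and keep the merge only if the total entropy decreases (a sketch of the
    word-grouping step; class splitting would be handled analogously)."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return (merged, True) if total_entropy(merged) < total_entropy(tokens) else (tokens, False)
```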
Dietrich G Klakow, Philips GmbH Forschungslaboratorien (Germany)
It is questionable whether words are really the best basic units for the estimation of stochastic language models; grouping frequent word sequences into phrases can improve language models. More generally, we have investigated various coding schemes for a corpus. In this paper, they are applied to optimize the perplexity of n-gram language models. In tests on two large corpora (WSJ and BNA) the bigram perplexity was reduced by up to 29%. Furthermore, this approach makes it possible to tackle the problem of an open vocabulary with no unknown word.
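A simple frequency-driven version of phrase grouping is sketched below: the most frequent adjacent word pair is repeatedly fused into a single phrase token. The paper instead chooses the coding scheme so as to minimize n-gram perplexity directly, so this greedy criterion is only an assumption for illustration.

```python
from collections import Counter

def build_phrases(corpus, num_merges=100, sep="_"):
    """Greedy phrase building: repeatedly fuse the most frequent adjacent word
    pair into a single phrase token.  A frequency-driven coding scheme, not the
    perplexity-optimizing scheme of the paper."""
    tokens = list(corpus)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:                       # nothing worth merging
            break
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + sep + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens
```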
Adam Berger, Carnegie Mellon University (U.S.A.)
Robert C Miller, Carnegie Mellon University (U.S.A.)
Traditional approaches to language modelling have relied on a fixed corpus of text to inform the parameters of a probability distribution over word sequences. Increasing the corpus size often leads to better-performing language models, but no matter how large, the corpus is a static entity, unable to reflect information about events which postdate it. In these pages we introduce an online paradigm which interleaves the estimation and application of a language model. We present a Bayesian approach to online language modelling, in which the marginal probabilities of a static trigram model are dynamically updated to match the topic being dictated to the system. We also describe the architecture of a prototype we have implemented which uses the World Wide Web (WWW) as a source of information, and the results of some initial proof of concept experiments.
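One way such dynamic updating of marginals might be realized is sketched below: unigram counts gathered from newly retrieved text are folded into a MAP-style estimate, and the static trigram score is rescaled by the resulting shift in each word's marginal. The class name, the Dirichlet-style update, and the rescaling form are assumptions, not necessarily the Bayesian formulation used in the paper.

```python
from collections import Counter
import math

class OnlineAdaptedTrigram:
    """Sketch of adapting a static trigram model by rescaling it with unigram
    marginals re-estimated from recently observed (e.g. Web-retrieved) text."""

    def __init__(self, static_trigram_logprob, static_unigram, prior_strength=1000.0):
        self.static = static_trigram_logprob   # function: (w, u, v) -> log P0(w | u, v)
        self.p0 = static_unigram                # dict: w -> static marginal P0(w)
        self.tau = prior_strength               # pseudo-count weight on the static marginals
        self.cache = Counter()                  # counts from newly observed text
        self.n = 0

    def observe(self, words):
        """Fold newly retrieved text into the dynamic marginals."""
        self.cache.update(words)
        self.n += len(words)

    def adapted_unigram(self, w):
        """Posterior-mean (Dirichlet/MAP-style) estimate of P(w)."""
        return (self.tau * self.p0.get(w, 1e-9) + self.cache[w]) / (self.tau + self.n)

    def logprob(self, w, u, v, beta=0.5):
        """Rescale the static trigram score by the shift in the unigram marginal
        (unnormalized; beta controls the adaptation strength)."""
        ratio = self.adapted_unigram(w) / self.p0.get(w, 1e-9)
        return self.static(w, u, v) + beta * math.log(ratio)
```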