Chair: Salim E. Roukos, IBM, USA
Jerome R Bellegarda, Apple Computer (U.S.A.)
A new framework is proposed to integrate the various constraints, both local and global, that are present in the language. Local constraints are captured via n-gram language modeling, while global constraints are taken into account through the use of latent semantic analysis. An integrative formulation is derived for the combination of these two paradigms, resulting in several families of multi-span language models for large vocabulary speech recognition. Because of the inherent complementarity in the two types of constraints, the performance of the integrated language models, as measured by perplexity, compares favorably with the corresponding n-gram performance.
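As a rough illustration of how local n-gram evidence and global latent-semantic evidence might be combined, the sketch below multiplies a trigram probability by an exponentiated cosine similarity between a word's latent-semantic vector and the centroid of the document history. The function names, the temperature parameter, and the product form are illustrative assumptions, not the integrative formulation derived in the paper.

```python
import numpy as np

def multispan_score(word_ngram_prob, word_vec, history_vec, temperature=10.0):
    """Blend a local n-gram probability with a global LSA-style semantic score.

    word_ngram_prob : P(word | last n-1 words) from a conventional n-gram model
    word_vec        : latent-semantic vector for the candidate word
    history_vec     : centroid of the latent-semantic vectors of the document so far
    The exponential scaling and the product combination are illustrative choices.
    """
    cosine = float(np.dot(word_vec, history_vec) /
                   (np.linalg.norm(word_vec) * np.linalg.norm(history_vec) + 1e-12))
    semantic_score = np.exp(temperature * cosine)   # global, topic-level evidence
    return word_ngram_prob * semantic_score         # unnormalized; renormalize over the vocabulary
```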
Stanley F Chen, Carnegie Mellon University (U.S.A.)
Kristie Seymore, Carnegie Mellon University (U.S.A.)
Ronald Rosenfeld, Carnegie Mellon University (U.S.A.)
In this paper, we present novel techniques for performing topic adaptation on an n-gram language model. Given training text labeled with topic information, we automatically identify the most relevant topics for new text. We adapt our language model toward these topics using an exponential model, by adjusting probabilities in our model to agree with those found in the topical subset of the training data. For efficiency, we do not normalize the model; that is, we do not require that the "probabilities" in the language model sum to 1. With these techniques, we were able to achieve a modest reduction in speech recognition word-error rate in the Broadcast News domain.
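One way the unnormalized exponential adjustment might look is sketched below: the baseline log-probability is shifted by a weighted log-ratio of the word's topical and global unigram frequencies. The helper names, the default smoothing constant, and the specific functional form are assumptions for illustration; only the decision to leave the adapted scores unnormalized mirrors the abstract.

```python
import math

def topic_adapted_logprob(word, base_logprob, topic_unigram, base_unigram, alpha=0.5):
    """Unnormalized exponential adjustment of an n-gram score toward a topic.

    base_logprob  : log P(word | history) from the baseline n-gram model
    topic_unigram : dict of P(word) estimated on the topical subset of the training data
    base_unigram  : dict of P(word) estimated on all training data
    alpha         : adaptation weight (hypothetical parameter)
    The result is deliberately left unnormalized.
    """
    boost = alpha * (math.log(topic_unigram.get(word, 1e-9)) -
                     math.log(base_unigram.get(word, 1e-9)))
    return base_logprob + boost
```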
Adam L Buchsbaum, AT&T Labs (U.S.A.)
Raffaele Giancarlo, University of Palermo (Italy)
Jeffery R Westbrook, AT&T Labs (U.S.A.)
We study the problem of reducing the size of a language model while preserving recognition performance (accuracy and speed). A successful approach has been to represent language models by weighted finite-state automata (WFAs). Analogues of classical automata determinization and minimization algorithms then provide a general method to produce smaller but equivalent WFAs. We extend this approach by introducing the notion of approximate determinization. We provide an algorithm that, when applied to language models for the North American Business task, achieves 25-35% size reduction compared to previous techniques, with negligible effects on recognition time and accuracy.
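The paper's approximate determinization algorithm is not reproduced here. As a loosely related illustration of the underlying idea of trading exact weight equality for a tolerance, the toy sketch below performs a Moore-style minimization of an already deterministic WFA in which weights are bucketed to within eps before states are compared; the data layout and the bucketing rule are assumptions.

```python
def approx_minimize(arcs, finals, eps=1e-3):
    """Toy Moore-style minimization of a deterministic WFA with weights bucketed
    to a tolerance eps, so states whose weights differ only slightly can merge.
    This is an illustration of relaxed weight comparison, not the
    approximate-determinization algorithm described in the paper.

    arcs   : dict state -> dict symbol -> (next_state, weight); every state appears as a key
    finals : dict state -> final weight (states absent are non-final)
    Returns a dict mapping each state to the id of its merged block.
    """
    def bucket(w):
        return round(w / eps)

    # Initial partition: group states by finality and bucketed final weight.
    block = {s: (s in finals, bucket(finals[s]) if s in finals else None) for s in arcs}
    while True:
        # Signature = own block plus a bucketed view of the outgoing arcs.
        sig = {s: (block[s],
                   tuple(sorted((sym, block[dst], bucket(w))
                                for sym, (dst, w) in arcs[s].items())))
               for s in arcs}
        ids, new_block = {}, {}
        for s, g in sig.items():
            new_block[s] = ids.setdefault(g, len(ids))
        if new_block == block:
            return block
        block = new_block
```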
Doug Beeferman, Carnegie Mellon University (U.S.A.)
Adam Berger, Carnegie Mellon University (U.S.A.)
John Lafferty, Carnegie Mellon University (U.S.A.)
This paper describes a lightweight method for the automatic insertion of intra-sentence punctuation into text. Despite the intuition that pauses in an acoustic stream are a positive indicator for some types of punctuation, this work will demonstrate the feasibility of a system which relies solely on lexical information. Besides its potential role in a speech recognition system, such a system could serve equally well in non-speech applications such as automatic grammar correction in a word processor and parsing of spoken text. After describing the design of a punctuation-restoration system, which relies on a trigram language model and a straightforward application of the Viterbi algorithm, we summarize results, both quantitative and subjective, of the performance and behavior of a prototype system.
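Since the abstract names a trigram language model and the Viterbi algorithm, a minimal sketch of that search is given below: the decoder state is the last two emitted tokens, and after each word it considers appending each candidate punctuation mark (or none), assuming the punctuation symbols appear in the trigram model's vocabulary. The punctuation inventory and the scoring interface are hypothetical.

```python
PUNCT = ["", ",", "."]   # hypothetical inventory: no punctuation, comma, period

def restore_punctuation(words, trigram_logprob):
    """Viterbi search over punctuation inserted after each word (minimal sketch).

    trigram_logprob(w3, w1, w2) is assumed to return log P(w3 | w1 w2) from a
    trigram model whose vocabulary includes the punctuation tokens themselves.
    """
    beams = {("<s>", "<s>"): (0.0, [])}          # state (last two tokens) -> (score, output)
    for w in words:
        new_beams = {}
        for (u, v), (score, toks) in beams.items():
            base = score + trigram_logprob(w, u, v)
            for p in PUNCT:
                if p:                            # insert punctuation after the word
                    s, state, out = base + trigram_logprob(p, v, w), (w, p), toks + [w, p]
                else:                            # no punctuation here
                    s, state, out = base, (v, w), toks + [w]
                if state not in new_beams or s > new_beams[state][0]:
                    new_beams[state] = (s, out)
        beams = new_beams
    best_score, best_tokens = max(beams.values(), key=lambda x: x[0])
    return best_tokens
```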
Kristine W. Ma, GTE/BBN Technologies (U.S.A.)
George Zavaliagkos, GTE/BBN Technologies (U.S.A.)
Marie Meteer, GTE/BBN Technologies (U.S.A.)
According to discourse theories in linguistics, conversational utterances possess an informational structure that partitions each sentence into two portions: a "given" and a "new". In this work, we explore this idea by building sub-sentence discourse language models for conversational speech recognition. The internal sentence structure is captured in statistical language modeling by training multiple n-gram models using the Expectation-Maximization algorithm on the Switchboard corpus. The resulting model contributes to a 30% reduction in language model perplexity and a small gain in word error rate.
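One component of such a system, the EM re-estimation of mixture weights over several sub-sentence n-gram models, might look like the sketch below. The event representation and the held-out-data setup are assumptions, and the paper's assignment of sub-sentence regions to components is not shown.

```python
def em_mixture_weights(heldout_events, model_probs, iters=20):
    """EM re-estimation of mixture weights for K sub-sentence n-gram components.

    heldout_events : list of n-gram events, e.g. (history, word) tuples
    model_probs    : list of K functions, each mapping an event to P_k(event)
    Only the weight re-estimation step is sketched here.
    """
    K = len(model_probs)
    weights = [1.0 / K] * K
    for _ in range(iters):
        counts = [0.0] * K
        for ev in heldout_events:
            post = [w * p(ev) for w, p in zip(weights, model_probs)]
            z = sum(post) or 1e-12
            for k in range(K):
                counts[k] += post[k] / z        # expected component responsibility
        total = sum(counts)
        weights = [c / total for c in counts]   # M-step: renormalize
    return weights
```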
Shoichi Matsunaga, NTT Human Interface Laboratories (Japan)
Shigeki Sagayama, NTT Human Interface Laboratories (Japan)
This paper proposes two-step generation of a variable-length class-based language model that integrates local and global constraints. In the first step, an initial class set is recursively designed using local constraints. Word elements for each class are determined using Kullback divergence and total entropy. In the second step, the classes and words are recursively and iteratively recreated by grouping consecutive words to generate longer units and by splitting the initial classes into finer classes. These operations in the second step are carried out selectively, taking into account local and global constraints on the basis of a minimum entropy criterion. Experiments showed that the perplexity of the proposed initial class set is superior to that of the conventional part-of-speech class, and the perplexity of the variable-word-length model consequently becomes lower. Furthermore, this two-step model generation approach greatly reduces the training time.
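A minimal sketch of the word-grouping step under a total-entropy criterion is given below, using a unigram code length as a stand-in for the paper's class-based criterion; the merging of consecutive word pairs and the accept/reject test are illustrative simplifications.

```python
from collections import Counter
import math

def total_entropy(tokens):
    """Total code length (in nats) of a corpus under its own unigram MLE,
    i.e. N * H(unigram); a simple stand-in for the paper's entropy criterion."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c * math.log(c / n) for c in counts.values())

def merge_if_better(tokens, pair):
    """Tentatively fuse consecutive occurrences of `pair` into one longer unit
    and keep the merge only if the total entropy decreases (a sketch of the
    word-grouping step; class splitting would be handled analogously)."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return (merged, True) if total_entropy(merged) < total_entropy(tokens) else (tokens, False)
```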
Dietrich G Klakow, Philips GmbH Forschungslaboratorien (Germany)
It is questionable whether words are really the best basic units for the estimation of stochastic language models; grouping frequent word sequences into phrases can improve language models. More generally, we have investigated various coding schemes for a corpus. In this paper, they are applied to optimize the perplexity of n-gram language models. In tests on two large corpora (WSJ and BNA) the bigram perplexity was reduced by up to 29%. Furthermore, this approach makes it possible to tackle the problem of an open vocabulary with no unknown word.
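A simple frequency-driven version of phrase grouping is sketched below: the most frequent adjacent word pair is repeatedly fused into a single phrase token. The paper instead chooses the coding scheme so as to minimize n-gram perplexity directly, so this greedy criterion is only an assumption for illustration.

```python
from collections import Counter

def build_phrases(corpus, num_merges=100, sep="_"):
    """Greedy phrase building: repeatedly fuse the most frequent adjacent word
    pair into a single phrase token.  A frequency-driven coding scheme, not the
    perplexity-optimizing scheme of the paper."""
    tokens = list(corpus)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:                       # nothing worth merging
            break
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + sep + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens
```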
Adam Berger, Carnegie Mellon University (U.S.A.)
Robert C Miller, Carnegie Mellon University (U.S.A.)
Traditional approaches to language modelling have relied on a fixed corpus of text to inform the parameters of a probability distribution over word sequences. Increasing the corpus size often leads to better-performing language models, but no matter how large, the corpus is a static entity, unable to reflect information about events which postdate it. In these pages we introduce an online paradigm which interleaves the estimation and application of a language model. We present a Bayesian approach to online language modelling, in which the marginal probabilities of a static trigram model are dynamically updated to match the topic being dictated to the system. We also describe the architecture of a prototype we have implemented which uses the World Wide Web (WWW) as a source of information, and the results of some initial proof of concept experiments.
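One way such dynamic updating of marginals might be realized is sketched below: unigram counts gathered from newly retrieved text are folded into a MAP-style estimate, and the static trigram score is rescaled by the resulting shift in each word's marginal. The class name, the Dirichlet-style update, and the rescaling form are assumptions, not necessarily the Bayesian formulation used in the paper.

```python
from collections import Counter
import math

class OnlineAdaptedTrigram:
    """Sketch of adapting a static trigram model by rescaling it with unigram
    marginals re-estimated from recently observed (e.g. Web-retrieved) text."""

    def __init__(self, static_trigram_logprob, static_unigram, prior_strength=1000.0):
        self.static = static_trigram_logprob   # function: (w, u, v) -> log P0(w | u, v)
        self.p0 = static_unigram                # dict: w -> static marginal P0(w)
        self.tau = prior_strength               # pseudo-count weight on the static marginals
        self.cache = Counter()                  # counts from newly observed text
        self.n = 0

    def observe(self, words):
        """Fold newly retrieved text into the dynamic marginals."""
        self.cache.update(words)
        self.n += len(words)

    def adapted_unigram(self, w):
        """Posterior-mean (Dirichlet/MAP-style) estimate of P(w)."""
        return (self.tau * self.p0.get(w, 1e-9) + self.cache[w]) / (self.tau + self.n)

    def logprob(self, w, u, v, beta=0.5):
        """Rescale the static trigram score by the shift in the unigram marginal
        (unnormalized; beta controls the adaptation strength)."""
        ratio = self.adapted_unigram(w) / self.p0.get(w, 1e-9)
        return self.static(w, u, v) + beta * math.log(ratio)
```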