ABSTRACT
To improve recognition accuracy for large-vocabulary speech recognition systems we use language models based on linguistic classes (extended POS). In this paper an adaptation technique is presented that profits from linguistic knowledge about unknown words of the new domain. When switching from the base domain to the new domain we keep the bigram probabilities of the linguistic classes fixed and adapt only the unigram word probabilities. In our experiments we use three different corpora: financial columns of a newspaper corpus and two medical corpora (computer tomography and magnetic resonance). Adapted language models show an improvement in test-set perplexity of 48% to 51% compared to putting unknown words into the language model's ``unknown'' class.
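The adaptation scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i) is the standard class-bigram form, and all names and probability values are hypothetical.

```python
def class_bigram_prob(w, w_prev, class_of, class_bigram, word_given_class):
    """P(w | w_prev) ~= P(c(w) | c(w_prev)) * P(w | c(w))."""
    c, c_prev = class_of[w], class_of[w_prev]
    return class_bigram[(c_prev, c)] * word_given_class[(w, c)]

# The class bigram table stays fixed (trained on the base domain);
# only the word-given-class probabilities are re-estimated on the
# new domain, so unknown new-domain words just need a class label.
class_of = {"the": "DET", "scan": "NOUN", "report": "NOUN"}
class_bigram = {("DET", "NOUN"): 0.6}            # fixed, base domain
word_given_class = {("the", "DET"): 0.9,          # adapted, new domain
                    ("scan", "NOUN"): 0.3,
                    ("report", "NOUN"): 0.2}

p = class_bigram_prob("scan", "the", class_of, class_bigram, word_given_class)
# p = 0.6 * 0.3 = 0.18
```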
ABSTRACT
A new method is presented to quickly adapt a given language model to local text characteristics. The basic approach is to choose the adaptive models as close as possible to the background estimates while constraining them to respect the locally estimated unigram probabilities. Several means are investigated to speed up the calculations. We measure both perplexity and word error rate to gauge the quality of our model.
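The idea of staying close to the background model while matching locally estimated unigrams can be sketched as a single rescaling step: each conditional background probability is scaled by the ratio of local to background unigram mass and renormalized. This is an illustrative sketch only; the paper's actual estimator and its speed-ups are not specified here, and all probability values are hypothetical.

```python
def adapt_distribution(p_background, local_unigram, background_unigram):
    """Rescale a background conditional P_bg(w | h) toward locally
    estimated unigrams, then renormalize over the vocabulary."""
    scaled = {w: p * local_unigram[w] / background_unigram[w]
              for w, p in p_background.items()}
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}

p_bg = {"stock": 0.5, "scan": 0.1, "the": 0.4}        # P_bg(w | h), one history h
uni_bg = {"stock": 0.20, "scan": 0.05, "the": 0.75}   # background unigrams
uni_local = {"stock": 0.05, "scan": 0.25, "the": 0.70}  # local unigrams

p_adapted = adapt_distribution(p_bg, uni_local, uni_bg)
# "scan" gains probability because it is locally frequent; "stock" loses it.
```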
ABSTRACT
Standard statistical language modeling techniques suffer from sparse-data problems when applied to real tasks in speech recognition, where large amounts of domain-dependent text are not available. In this work, we introduce a modified representation of the standard word n-gram model using part-of-speech (POS) labels that compensates for word and POS usage differences across domains. Two different approaches are explored: (i) imposing an explicit transformation of the out-of-domain n-gram distributions before combining with an in-domain model, and (ii) POS smoothing of multi-domain n-gram components. Results are presented on a spontaneous speech recognition task (Switchboard), showing that the POS smoothing framework reduces word error rate and perplexity over a standard word n-gram model on in-domain data, with increased gains using multi-domain models.
ABSTRACT
The amount of text data available from a corpus for training language models is usually limited. Data from larger general or related corpora can be utilized to improve the performance of the language model on the corpus of interest. We explore one method of adapting a prior model from a large corpus to a smaller one of interest. Perplexity results of adapting a prior model constructed using the NAB corpus to the Switchboard and ATIS corpora are presented and compared with those of interpolated models.
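The interpolated baseline mentioned above can be sketched in a few lines: the in-domain and general models are mixed with a weight λ, typically tuned on held-out data. This is a generic sketch of linear interpolation, not the paper's prior-adaptation method; all values are hypothetical.

```python
def interpolate(p_in_domain, p_general, lam):
    """P(w | h) = lam * P_in(w | h) + (1 - lam) * P_gen(w | h)."""
    vocab = set(p_in_domain) | set(p_general)
    return {w: lam * p_in_domain.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
            for w in vocab}

# In-domain model (small corpus) backed off by a general model (large corpus).
p_in = {"flight": 0.4, "the": 0.6}
p_gen = {"flight": 0.1, "the": 0.7, "stock": 0.2}
p_mix = interpolate(p_in, p_gen, lam=0.5)
# p_mix["flight"] = 0.5*0.4 + 0.5*0.1 = 0.25
```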
ABSTRACT
The first class-based adaptation approaches [FGH+97, Ueb97] take the use of classes in the construction of statistical m-gram models one significant step further than just using them as a smoothing technique: the m-gram of classes is trained on the large background corpus, while the word likelihoods given the class are estimated on the small target corpus. To make full use of this technique, a specialized clustering algorithm has been developed [FGH+97, Ueb97]. In this paper we extend class adaptation to make use of the m-gram distribution of the target domain. As a second, independent contribution, this paper introduces an efficient morphing algorithm that tries to achieve adaptation through a stochastic mapping of words between the vocabularies of the respective domains. As a result we can show that for small adaptation steps class-based adaptation is a very useful technique. For larger adaptation steps the perplexity of the modified model is greatly improved, yet no improvement over the unadapted model was observed when it was used in linear interpolation. Whether this is due to the fact that we use class-based adaptation or that we modify only the unigram distribution remains unresolved, although the new stochastic mapping technique might help answer this question in the future.
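The stochastic word mapping ("morphing") idea can be sketched as pushing a background-domain distribution through a conditional mapping P(w_target | v_background): each background word's mass is redistributed over target-vocabulary words. This is a plausible reading of the abstract only; the mapping estimation itself is not described here, and all words and probabilities are invented for illustration.

```python
def morph(p_background, mapping):
    """Map a background-domain distribution into the target vocabulary
    via a stochastic word mapping P(w_target | v_background)."""
    p_target = {}
    for v, pv in p_background.items():
        for w, pw_given_v in mapping.get(v, {}).items():
            p_target[w] = p_target.get(w, 0.0) + pv * pw_given_v
    return p_target

# Background (financial) words mapped onto target (medical) vocabulary.
p_bg = {"stocks": 0.7, "bonds": 0.3}
mapping = {"stocks": {"scans": 0.8, "images": 0.2},   # rows sum to 1
           "bonds": {"reports": 1.0}}
p_target = morph(p_bg, mapping)
# p_target = {"scans": 0.56, "images": 0.14, "reports": 0.3}
```

Because each mapping row is itself a probability distribution, the morphed result remains normalized.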
ABSTRACT
The subject matter of any conversation or document can typically be described as some combination of elemental topics. We have developed a language model adaptation scheme that takes a piece of text, chooses the most similar topic clusters from a set of over 5000 elemental topics, and uses topic-specific language models built from the topic clusters to rescore N-best lists. We are able to achieve a 15% reduction in perplexity and a small improvement in WER by using this adaptation. We also investigate the use of a topic tree, where the amount of training data for a specific topic can be judiciously increased in cases where the elemental topic cluster has too few word tokens to build a reliably smoothed and representative language model. Our system is able to fine-tune topic adaptation by interpolating models chosen from thousands of topics, allowing for adaptation to unique, previously unseen combinations of subjects.
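The topic-selection step can be sketched as ranking topic clusters by cosine similarity between the text's term vector and each cluster's term vector, then interpolating the language models of the top-ranked topics. The similarity measure and all term vectors below are illustrative assumptions, not the paper's actual configuration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb)

def choose_topics(doc_vec, topic_vecs, k):
    """Return the k topic clusters most similar to the document."""
    ranked = sorted(topic_vecs, key=lambda t: cosine(doc_vec, topic_vecs[t]),
                    reverse=True)
    return ranked[:k]

doc = {"ct": 3, "scan": 2, "patient": 1}
topics = {"medical": {"scan": 1, "patient": 1, "ct": 1},
          "finance": {"stock": 1, "bond": 1}}
top = choose_topics(doc, topics, k=1)
# top = ["medical"]
```

The language models of the selected clusters would then be interpolated and used to rescore the N-best lists.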