Session ThAC Language Modelling

Chairperson: Ronald Rosenfeld, Carnegie Mellon Univ., USA



CONSTRUCTION OF LANGUAGE MODELS USING THE MORPHIC GENERATOR GRAMMATICAL INFERENCE (MGGI) METHODOLOGY

Authors: E. Segarra*, L. Hurtado

Dept. Sistemas Informáticos y Computación Universidad Politécnica de Valencia (Spain) E-mail: esegarra,lhurtado@dsic.upv.es, Tel.: 34 6 3877738

Volume 5 pages 2695 - 2698

ABSTRACT

Over the last few years, some alternatives to N-gram language models, based on stochastic regular grammars, have been proposed. These grammars are estimated from data through Grammatical Inference algorithms. In particular, the Morphic Generator Grammatical Inference (MGGI) methodology has been applied to tasks involving written natural language queries to databases. As with N-gram models, language models obtained through this methodology require the use of smoothing techniques. This work incorporates a version of the well-known Back-Off smoothing method into the MGGI language models to solve the problem of estimating events unseen in the training corpus, and shows the behaviour of the smoothed MGGI models on two written-sentence tasks. The results illustrate that the smoothed MGGI model performs better than the standard smoothed bigram model.
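For reference, the Back-Off estimate alluded to above has the following general form (a sketch of the usual Katz-style scheme; the MGGI-specific variant in the paper may differ in its discounting and in what plays the role of the shortened history):

    P_{\mathrm{bo}}(w \mid h) =
      \begin{cases}
        d(h,w)\,\dfrac{c(h,w)}{c(h)}            & \text{if } c(h,w) > 0,\\[4pt]
        \alpha(h)\,P_{\mathrm{bo}}(w \mid h')   & \text{otherwise,}
      \end{cases}

where c(.) are training counts, d(h,w) is a discount factor, h' is the shortened (backed-off) history or grammar state, and alpha(h) distributes the discounted probability mass over unseen events.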

A0052.pdf



An Integrated Language Modeling with n-gram model and WA model for Speech Recognition*

Authors: Shuwu ZHANG, Taiyi HUANG

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, P.R.China Email: {zsw,huang}@prldec3.ia.ac.cn

Volume 5 pages 2699 - 2702

ABSTRACT

In a traditional n-gram model, the small value of n is an inherent limitation when estimating language probabilities for speech recognition, because estimation cannot exploit longer-range word associations and relies only on short, strictly sequential word contexts. This strongly affects recognition performance. This paper introduces an integrated language model combining an n-gram model with a word association model (abbreviated as the WA model). The integrated model combines two kinds of joint probabilities, the traditional n-gram probability and the word association probability, to estimate the actual output probability. The WA model is based on a combined probability estimate over ordered word associations without distance or strict sequential limitations. In addition, two kinds of local linguistic constraints have been incorporated into the n-gram estimation to smooth sparse data and to adjust the scores of special language units locally. A substantial improvement in the performance of Chinese phonetic-to-text transcription in speech recognition has been obtained.
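As a sketch of the kind of integration described (the exact combination used in the paper may differ; the interpolation below is only the simplest possibility):

    P(w_i \mid w_1^{i-1}) \;\approx\;
      \lambda\, P_{\mathrm{ngram}}(w_i \mid w_{i-n+1}^{i-1})
      \;+\; (1-\lambda)\, P_{\mathrm{WA}}(w_i \mid \{\, w_j : j < i \,\}),
      \qquad 0 \le \lambda \le 1,

where the WA term scores w_i against associated words anywhere earlier in the history, without a distance or strict ordering restriction.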

A0057.pdf



STATISTICAL ANALYSIS OF DIALOGUE STRUCTURE

Authors: Ye-Yi Wang and Alex Waibel

Language Technology Institute School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213, USA Email: {yyw,waibel}@cs.cmu.edu

Volume 5 pages 2703 - 2706

ABSTRACT

We introduce a statistical model for dialogues. We describe a dynamic programming algorithm that can be used to bracket a dialogue into segments and label each segment with its speech act, and we evaluate the performance of the model. We also use this model for language modelling and obtain a reduction in perplexity.
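A dynamic-programming recursion of the kind described can be sketched as follows (a generic segmentation-and-labelling Viterbi with notation of my own; the paper's actual model may differ). Let delta(j, a) be the score of the best bracketing of units u_1..u_j whose last segment carries speech act a; then

    \delta(j, a) \;=\; \max_{0 \le i < j}\; \max_{a'}\;
        \delta(i, a')\; P(a \mid a')\; P(u_{i+1} \dots u_j \mid a),

and the best segmentation is read off by backtracking from max_a delta(T, a).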

A0068.pdf



STATISTICAL LANGUAGE MODELING USING THE CMU-CAMBRIDGE TOOLKIT

Authors: (1) Philip Clarkson, (2) Ronald Rosenfeld

(1) prcl4@eng.cam.ac.uk Cambridge University Engineering Department, Trumpington Street, Cambridge, CB2 1PZ, UK. (2) roni@cmu.edu School of Computer Science, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213, USA.

Volume 5 pages 2707 - 2710

ABSTRACT

The CMU Statistical Language Modeling toolkit was released in 1994 in order to facilitate the construction and testing of bigram and trigram language models. It is currently in use in over 40 academic, government and industrial laboratories in over 12 countries. This paper presents a new version of the toolkit. We outline the conventional language modeling technology, as implemented in the toolkit, and describe the extra efficiency and functionality that the new toolkit provides as compared to previous software for this task. Finally, we give an example of the use of the toolkit in constructing and testing a simple language model.
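For orientation, a typical toolkit run chains the released tools text2wfreq, wfreq2vocab, text2idngram, idngram2lm and evallm. The Python driver below is only an illustrative sketch; the exact flags and file names should be checked against the toolkit documentation.

    import subprocess

    def build_and_test_lm(train="train.txt", test="test.txt", vocab_size=20000):
        """Illustrative CMU-Cambridge SLM toolkit pipeline (flags are assumptions)."""
        # 1. Word-frequency list -> vocabulary of the most frequent words.
        subprocess.run(f"cat {train} | text2wfreq | wfreq2vocab -top {vocab_size} > lm.vocab",
                       shell=True, check=True)
        # 2. Convert the training text into id n-grams over that vocabulary.
        subprocess.run(f"cat {train} | text2idngram -vocab lm.vocab > lm.idngram",
                       shell=True, check=True)
        # 3. Estimate a back-off n-gram model and write it in ARPA format.
        subprocess.run("idngram2lm -idngram lm.idngram -vocab lm.vocab -arpa lm.arpa",
                       shell=True, check=True)
        # 4. Report test-set perplexity.
        subprocess.run(f'echo "perplexity -text {test}" | evallm -arpa lm.arpa',
                       shell=True, check=True)

    if __name__ == "__main__":
        build_and_test_lm()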

A0124.pdf



TEXT NORMALIZATION AND SPEECH RECOGNITION IN FRENCH

Authors: Gilles Adda, Martine Adda-Decker, Jean-Luc Gauvain, Lori Lamel

Spoken Language Processing Group LIMSI-CNRS, BP 133, 91403 Orsay cedex, FRANCE {gadda,madda,gauvain,lamel}@limsi.fr http://www.limsi.fr/TLP

Volume 5 pages 2711 - 2714

ABSTRACT

In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for speech recognition in French. The text normalization process defines what is considered to be a word by the recognition system. Depending on this definition we can measure different lexical coverages and language model perplexities, both of which are closely related to the speech recognition accuracies obtained on read newspaper texts. Different text normalizations of up to 185M words of newspaper texts are presented along with corresponding lexical coverage and perplexity measures. Some normalizations were found to be necessary to achieve good lexical coverage, while others were more or less equivalent in this regard. The choice of normalization used to create language models for the recognition experiments with read newspaper texts was based on these findings. Our best system configuration obtained an 11.2% word error rate in the AUPELF 'French-speaking' speech recognizer evaluation test held in February 1997.

A0153.pdf



A Novel Tree-Based Clustering Algorithm for Statistical Language Modeling

Authors: G. Damnati and J. Simonin

France Telecom CNET DIH/RCP, 2 av. Pierre Marzin, 22307 Lannion Cedex, France. Tel.(33)2.96.05.13.88 / Fax.(33)2.96.05.35.30 e-mail: damnati@lannion.cnet.fr, simonin@lannion.cnet.fr

Volume 5 pages 2715 - 2718

ABSTRACT

In this paper, a new method for clustering words into classes is proposed in order to define a statistical language model. The purpose of this algorithm is to decrease the computational cost of the clustering task without degrading speech recognition performance. The algorithm provides a bottom-up hierarchical clustering using the reciprocal neighbours method. This technique consists of merging several pairs of classes within a single iteration. Experiments on a spontaneous speech corpus are presented. Results are given both in terms of perplexity and word recognition error rate. We obtain a large reduction in the number of iterations necessary to build a classification tree, and thus a reduction in the CPU time needed to build the model, as well as a reduction in both perplexity and word error rate.
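The reciprocal-neighbours step can be sketched as follows (a minimal illustration assuming a generic distance function between classes; the paper's actual merging criterion is tied to the language model and may differ):

    def reciprocal_neighbour_step(classes, distance):
        """One bottom-up iteration: merge every pair of classes that are mutual
        (reciprocal) nearest neighbours, rather than a single pair per iteration."""
        classes = [frozenset(c) for c in classes]
        if len(classes) < 2:
            return classes
        nearest = {c: min((o for o in classes if o is not c),
                          key=lambda o: distance(c, o))
                   for c in classes}
        merged, used = [], set()
        for c in classes:
            n = nearest[c]
            if c in used or n in used:
                continue
            if nearest[n] is c:              # reciprocal nearest neighbours
                merged.append(c | n)         # merge the two word classes
                used.update((c, n))
        merged.extend(c for c in classes if c not in used)
        return merged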

A0181.pdf



VARIABLE-LENGTH LANGUAGE MODELING INTEGRATING GLOBAL CONSTRAINTS

Authors: Shoichi Matsunaga and Shigeki Sagayama

NTT Human Interface Labs., 1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa 238 Japan. E-mail: mat@nttspch.hil.ntt.co.jp

Volume 5 pages 2719 - 2722

ABSTRACT

This paper proposes a novel variable-length class-based language model that integrates local and global constraints. In this model, the classes are iteratively recreated by grouping consecutive words and by splitting initial part-of-speech (POS) clusters into finer clusters (word classes). The main characteristic of this modeling is that these grouping and splitting operations are carried out selectively, taking into account global constraints between noncontiguous words on the basis of a minimum entropy criterion. To capture the global constraints, the model takes into account the sequences of function words and of content words, which are expected to represent the syntactic and semantic relationships between words, respectively. Experiments showed that the perplexity of the proposed model on the test corpus is lower than that of conventional models and that the model requires a small number of statistical parameters, showing its effectiveness.

A0194.pdf



An Hybrid Language Model For a Continuous Dictation Prototype

Authors: K. Smaili, I. Zitouni, F. Charpillet and J-P. Haton

CRIN-CNRS/INRIA Lorraine BP 239 54506 Vandoeuvre Lès-Nancy France E-mail: {smaili, zitouni, charp, jph}@loria.fr Tel: (33) 03-83-59-20-83 Fax: (33) 03-27-83-29

Volume 5 pages 2723 - 2726

ABSTRACT

This paper describes the combination of a stochastic language model and a formal grammar modelled as a unification grammar. The stochastic model is trained on 42 million words extracted from the newspaper Le Monde. It is based on smoothed 3-gram and 3-class models; the 3-class model is represented by a Markov chain made up of four states. Several experiments have been carried out to determine which values are best for a specific training and test corpus. The experiments indicate that the unification grammar strongly reduces the number of hypotheses (sentences) produced by the stochastic model.

A0227.pdf



DEALING WITH PRONUNCIATION VARIANTS AT THE LANGUAGE MODEL LEVEL FOR THE CONTINUOUS AUTOMATIC SPEECH RECOGNITION OF FRENCH

Authors: L. Pousse and G. Pérennou

IRIT University Paul Sabatier 118 Route de Narbonne, 31062 TOULOUSE, France. Tel. 33 561 55 61 73, FAX: 33 561 55 62 58, E-mail: pousse@irit.fr, perennou@irit.fr

Volume 5 pages 2727 - 2730

ABSTRACT

In this paper, we describe three approaches to continuous speech recognition. Two of them (referred to as the (W,P) and (W',P) models) take pronunciation variants of words into account. They make it possible to handle very common French phonological phenomena such as liaison and mute-e elision. The (W',P) model introduces the phonotypical level as defined in the MHAT model [4,5]. Comparing the (W,P) and (W',P) models shows a significant improvement in recognition accuracy when a contextual language model is introduced at this phonotypical level.

A0271.pdf



RATIONAL INTERPOLATION OF MAXIMUM LIKELIHOOD PREDICTORS IN STOCHASTIC LANGUAGE MODELING

Authors: Ernst Gunter Schukat-Talamazzini (1), Florian Gallwitz (2), Stefan Harbeck (2), Volker Warnke (2)

(1) Institute for Computer Science, University of Jena, Ernst-Abbe-Platz 1-4, D-07740 Jena, Germany, e-mail: schukat@informatik.uni-jena.de (2) Chair for Pattern Recognition, University of Erlangen-Nuremberg, Martensstrasse 3, D-91058 Erlangen, Germany, e-mail: {name}@informatik.uni-erlangen.de

Volume 5 pages 2731 - 2734

ABSTRACT

In our paper, we address the problem of estimating stochastic language models based on n-gram statistics. We present a novel approach, rational interpolation, for the combination of a competing set of conditional n-gram word probability predictors, which consistently outperforms the traditional linear interpolation scheme. The superiority of rational interpolation is substantiated by experimental results from language modeling, speech recognition, dialog act classification, and language identification.
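For orientation, linear interpolation of predictors and one common form of rational (history-weighted) interpolation can be contrasted as follows; this is a sketch of the general idea, and the weighting functions used in the paper may be parameterized differently:

    P_{\mathrm{lin}}(w \mid h) \;=\; \sum_{i} \lambda_i \, P_i(w \mid h),
    \qquad
    P_{\mathrm{rat}}(w \mid h) \;=\;
      \frac{\sum_{i} \lambda_i \, g_i(h) \, P_i(w \mid h)}
           {\sum_{j} \lambda_j \, g_j(h)},

where each g_i(h) measures how reliable predictor i is for the particular history h (for example, a function of its training-set count), so that the effective mixture weights vary with the context instead of being constant.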

A0298.pdf



N-gram language model adaptation using small corpus for spoken dialog recognition

Authors: Akinori Ito, Hideyuki Saitoh, Masaharu Katoh and Masaki Kohda

Faculty of Engineering, Yamagata University Jonan 4-3-16, Yonezawa, Yamagata 992 Japan TEL&FAX +81 238 26 3369 Email: aito@ei5sun.yz.yamagata-u.ac.jp

Volume 5 pages 2735 - 2738

ABSTRACT

This paper describes an N-gram language model adaptation technique. Since an N-gram model requires a large sample corpus for probability estimation, it is difficult to use an N-gram model for a specific small task. In this paper, N-gram task adaptation is proposed using a large corpus from a general task (TI text) and a small corpus from the specific task (AD text). A simple weighting is employed to mix the TI and AD texts. In addition to mixing the two texts, the effect of the vocabulary is also investigated. The experimental results show that the adapted N-gram model with a proper vocabulary size has significantly lower perplexity than the task-independent models.
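A minimal sketch of the kind of weighted mixing the abstract refers to (the actual weighting scheme in the paper may differ): if c_TI and c_AD are the n-gram counts collected from the task-independent and adaptation texts,

    c_{\mathrm{mix}}(h, w) \;=\; c_{\mathrm{TI}}(h, w) \;+\; \beta \, c_{\mathrm{AD}}(h, w),
    \qquad
    P_{\mathrm{adapt}}(w \mid h) \;=\;
      \frac{c_{\mathrm{mix}}(h, w)}{\sum_{w'} c_{\mathrm{mix}}(h, w')},

where beta > 1 boosts the contribution of the small in-domain corpus; interpolating the two separately estimated models is an equivalent alternative.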

A0309.pdf



VARIABLE N-GRAM LANGUAGE MODELING AND EXTENSIONS FOR CONVERSATIONAL SPEECH

Authors: Manhung Siu* and Mari Ostendorf

Boston University, 730 Commonwealth Ave, Boston, MA 02215 *Currently working for BBN Inc.

Volume 5 pages 2739 - 2742

ABSTRACT

Recent progress in variable n-gram language modeling provides an efficient representation of n-gram models and makes training of higher order n-grams possible. In this paper, we apply the variable n-gram design algorithm to conversational speech, extending the algorithm to learn skips and classes in context to handle conversational speech characteristics such as repetitions and disfluency markers. We show that using the extended variable n-gram, we can build a language model that uses fewer parameters for longer context and improves the test perplexity and recognition accuracy.

A0402.pdf



FUZZY CLASS RESCORING: A PART-OF-SPEECH LANGUAGE MODEL

Authors: P. Geutner

pgeutner@ira.uka.de Interactive Systems Laboratories Department of Computer Science, University of Karlsruhe, 76128 Karlsruhe, Germany

Volume 5 pages 2743 - 2746

ABSTRACT

Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N-best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how to interpolate with a Part-of-Speech (POS) tag-based language model as an example of a class-based model in which a word can be a member of many different classes. Here the actual class membership of a word in the lattice becomes a hidden event of the A* algorithm used for rescoring. A forward-type algorithm is defined as an extension of the lattice rescorer to handle these hidden events in a mathematically sound fashion. Applying this mixture of Viterbi- and forward-style rescoring to the German Spontaneous Scheduling Task (GSST) yields some improvement in word accuracy. Above all, the rescoring procedure enables the use of any fuzzy/stochastic class definition for recognition units that might be determined through automatic clustering algorithms in the future.
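One generic way to write the class-based probability when class membership is ambiguous (and therefore hidden) is the following; this is a sketch in my own notation, not the exact model interpolated in the paper:

    P(w_i \mid w_{i-1}) \;=\; \sum_{c \in C(w_i)} \; \sum_{c' \in C(w_{i-1})}
        P(w_i \mid c)\, P(c \mid c')\, P(c' \mid w_{i-1}),

where C(w) is the set of POS tags a word may carry; summing over the tags (forward-style) rather than maximizing (Viterbi-style) is what treats the class sequence as a hidden event.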

A0605.pdf



SPEECH UNDERSTANDING BASED ON INTEGRATING CONCEPTS BY CONCEPTUAL DEPENDENCY

Authors: Akito Nagai and Yasushi Ishikawa

Human Media Technology Dept. Information Technology R&D Center MITSUBISHI Electric Corporation 5-1-1, Ofuna, Kamakura, Kanagawa 247, Japan Tel. +81 467 41 2077, FAX: +81 467 41 2136, E-mail: nagai@media.isl.melco.co.jp

Volume 5 pages 2747 - 2750

ABSTRACT

We have proposed a concept-driven semantic interpretation method for a spoken dialogue system that robustly understands various expressions uttered by naive users. The method is now being improved for practical application, for which domain knowledge is important; the system must also be portable. This paper discusses the generalization of the semantic interpretation method and proposes a method that integrates concepts using general linguistic knowledge of conceptual dependency. Speech understanding of various utterances about Kamakura sightseeing with a 1000-word vocabulary was evaluated empirically. The results show that this method can achieve a satisfactory understanding rate.

A0682.pdf



DYNAMIC LANGUAGE MODELS FOR INTERACTIVE SPEECH APPLICATIONS

Authors: Fabio Brugnara and Marcello Federico

IRST - Istituto per la Ricerca Scientifica e Tecnologica, I-38050 Povo, Trento, Italy. {brugnara,federico}@irst.itc.it

Volume 5 pages 2751 - 2754

ABSTRACT

This work proposes the use of hierarchical LMs as an effective method both for efficiently dealing with context-dependent LMs in a dialogue system and for increasing the robustness of LM estimation and adaptation. Starting from basic LMs that express elementary semantic units, concepts, or data types, sentence-level LMs are built recursively. The resulting LMs may be a combination of grammars, word classes, and statistical LMs. Moreover, these LMs can be efficiently compiled into probabilistic recursive transition networks. A speech decoding algorithm directly exploits the recursive representation and produces the most probable parse tree matching the speech signal. The proposed approach has been implemented for a data-entry task which covers structured data, e.g. numbers, dates, and proper names, as well as free text. In this task, the active LM must continuously change according to the current status, the active form, and the data entered so far. Finally, while the hierarchical approach proves very convenient for this task, it is also quite general and can give advantages in other applications, e.g. dictation.
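A toy sketch of the hierarchical idea, assuming invented class names and a deliberately simplified scoring interface (this is not the authors' implementation; in the paper the models are compiled into probabilistic recursive transition networks used directly by the decoder):

    class NgramLM:
        """Toy n-gram leaf model: scores a symbol or word sequence."""
        def __init__(self, probs, floor=1e-6):
            self.probs = probs                      # dict: (previous, current) -> prob
            self.floor = floor
        def score(self, seq):
            p, prev = 1.0, "<s>"
            for s in seq:
                p *= self.probs.get((prev, s), self.floor)
                prev = s
            return p

    class HierarchicalLM:
        """Sentence-level LM whose symbols may expand into sub-LMs (e.g. DATE,
        NAME), i.e. a probabilistic recursive transition network in miniature."""
        def __init__(self, top_lm, sub_lms):
            self.top, self.sub = top_lm, sub_lms    # sub_lms: symbol -> LM
        def score(self, parse):
            # parse: list of (symbol, words) pairs proposed by the decoder
            p = self.top.score([sym for sym, _ in parse])   # P(symbol sequence)
            for sym, words in parse:
                if sym in self.sub:                          # expand non-terminals
                    p *= self.sub[sym].score(words)          # P(words | symbol)
            return p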

A0715.pdf



LARGE-SCALE LEXICAL SEMANTICS FOR SPEECH RECOGNITION SUPPORT

Authors: George Demetriou, Eric Atwell & Clive Souter

Centre for Computer Analysis of Language And Speech (CCALAS) & Artificial Intelligence Division, School of Computer Studies University of Leeds, Leeds LS2 9JT, UK Tel. +44 113 233 6827, FAX: +44 113 233 5468, e-mail: george@scs.leeds.ac.uk

Volume 5 pages 2755 - 2758

ABSTRACT

This paper presents a study on the use of wide-coverage semantic knowledge for large vocabulary (theoretically unrestricted) domain-independent speech recognition. A machine-readable dictionary was used to provide the semantic information about the words, and a semantic model was developed based on the conceptual association between words as computed directly from the textual representations of their meanings. The findings of our research suggest that the model is capable of capturing phenomena of semantic associativity or connectivity between words in texts and of considerably reducing the semantic ambiguity in natural language. The model can cover both short- and long-distance semantic relationships between words and has shown signs of robustness across various text genres. Experiments with simulated speech recognition hypotheses indicate that the model can be used efficiently to reduce word error rates when applied to word lattices or N-best sentence hypotheses.
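A toy illustration of scoring conceptual association directly from definition text (the function name and the simple Jaccard overlap are assumptions for illustration; the paper's model is more elaborate):

    def semantic_association(word_a, word_b, definitions, stopwords=frozenset()):
        """Toy association score: Jaccard overlap between the content words of
        the two words' dictionary definitions. Illustrative only."""
        def content(word):
            return {t for t in definitions.get(word, "").lower().split()
                    if t not in stopwords}
        a, b = content(word_a), content(word_b)
        return len(a & b) / len(a | b) if (a or b) else 0.0

    defs = {"bank": "an institution for keeping and lending money",
            "loan": "money lent by a bank or institution"}
    print(semantic_association("bank", "loan", defs,
                               stopwords={"a", "an", "and", "by", "for", "or"}))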

A0739.pdf



INTEGRATION OF GRAMMAR AND STATISTICAL LANGUAGE CONSTRAINTS FOR PARTIAL WORD-SEQUENCE RECOGNITION

Authors: Hajime Tsukada, Hirofumi Yamamoto, Yoshinori Sagisaka

ATR Interpreting Telecommunications Research Laboratories Tel: +81 774 95 1374, Fax: +81 774 95 1308, E-mail: tsukada@itl.atr.co.jp

Volume 5 pages 2759 - 2762

ABSTRACT

This paper proposes a novel spontaneous speech recognition approach to obtain not a whole utterance but reliably recognized partial segments of an utterance to achieve robust speech understanding. Our method obtains reliably recognized partial segments of an utterance by using both grammatical and n-gram based statistical language constraints cooperatively, and uses a robust parsing technique to apply the grammatical constraints. Through an experiment, it has been confirmed that the proposed method can recognize partial segments of an utterance with a higher reliability than conventional continuous speech recognition methods using an n-gram based statistical language model.

A0758.pdf



USING INTONATION TO CONSTRAIN LANGUAGE MODELS IN SPEECH RECOGNITION

Authors: Paul Taylor, Simon King, Stephen Isard, Helen Wright and Jacqueline Kowtko

Centre for Speech Technology Research, University of Edinburgh, 80, South Bridge, Edinburgh, U.K. EH1 1HN http://www.cstr.ed.ac.uk email: {pault, simonk, stepheni, helen, kowtko}@cstr.ed.ac.uk

Volume 5 pages 2763 - 2766

ABSTRACT

This paper describes a method for using intonation to reduce word error rate in a speech recognition system designed to recognise spontaneous dialogue speech. We use a form of dialogue analysis based on the theory of conversational games. Different move types under this analysis conform to different language models. Different move types are also characterised by different intonational tunes. Our overall recognition strategy is first to predict from intonation the type of game move that a test utterance represents, and then to use a bigram language model for that type of move during recognition.

A0791.pdf



INCORPORATING POS TAGGING INTO LANGUAGE MODELING

Authors: Peter A. Heeman and James F. Allen

France Télécom CNET Technopole Anticipa - 2 Avenue Pierre Marzin 22301 Lannion Cedex, France. heeman@lannion.cnet.fr Department of Computer Science University of Rochester Rochester NY 14627, USA james@cs.rochester.edu

Volume 5 pages 2767 - 2770

ABSTRACT

Language models for speech recognition tend to concentrate solely on recognizing the words that were spoken. In this paper, we redefine the speech recognition problem so that its goal is to find both the best sequence of words and their syntactic role (part-of-speech) in the utterance. This is a necessary first step towards tightening the interaction between speech recognition and natural language understanding.
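The redefinition described above can be written as a joint search over words and tags; a sketch under the usual noisy-channel assumptions (notation mine, not copied from the paper):

    (\hat{W}, \hat{P}) \;=\; \operatorname*{argmax}_{W, P} \; P(W, P \mid A)
      \;=\; \operatorname*{argmax}_{W, P} \; P(A \mid W)\, P(W, P),
    \qquad
    P(W, P) \;=\; \prod_{i} P(w_i \mid p_i, w_1^{i-1}, p_1^{i-1})\,
                            P(p_i \mid w_1^{i-1}, p_1^{i-1}),

where A is the acoustic signal, W the word sequence and P the corresponding part-of-speech tags.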

A0803.pdf



CONFIDENCE METRICS BASED ON N-GRAM LANGUAGE MODEL BACKOFF BEHAVIORS

Authors: C. Uhrik and W. Ward

Berdy Medical Systems 4909 Pearl East Circle, Suite 202 Boulder, Colorado, USA 80301 Tel. 303-417-1603, FAX 303-417-1662, E-mail: uhrik@berdy.com Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA Tel. 303-442-8807, FAX 303-417-1662, E-mail: whw@cs.cmu.edu

Volume 5 pages 2771 - 2774

ABSTRACT

We report results from using language model confidence measures based on the degree of backoff used in a trigram language model. Both utterance-level and word-level confidence metrics proved useful for a dialog manager to identify out-of-domain utterances. The metric assigns successively lower confidence as the language model estimate is backed off to a bigram or unigram. It also bases its estimates on sequences of backoff degree. Experimental results with utterances from the domain of medical records management showed that the distributions of the confidence metric for in-domain and out-of-domain utterances are separated. Use of the corresponding word-level confidence metric shows similar encouraging results.
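A minimal sketch of a backoff-based confidence score of the kind described (the function name and the particular level-to-score mapping are assumptions for illustration):

    def backoff_confidence(backoff_levels, level_score=None):
        """Word- and utterance-level confidence from n-gram backoff behaviour.
        backoff_levels: one entry per word; 0 = trigram found, 1 = backed off
        to a bigram, 2 = backed off to a unigram."""
        if level_score is None:
            level_score = {0: 1.0, 1: 0.6, 2: 0.3}   # illustrative mapping
        word_conf = [level_score[b] for b in backoff_levels]
        utt_conf = sum(word_conf) / len(word_conf) if word_conf else 0.0
        return word_conf, utt_conf

    # A mostly-trigram utterance scores higher than a heavily backed-off one.
    print(backoff_confidence([0, 0, 1, 0])[1])   # 0.9
    print(backoff_confidence([2, 1, 2, 2])[1])   # 0.375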

A0898.pdf



STRUCTURE AND PERFORMANCE OF A DEPENDENCY LANGUAGE MODEL

Authors: Ciprian Chelba (1) David Engle (2) Frederick Jelinek (1) Victor Jimenez (3) Sanjeev Khudanpur (1) Lidia Mangu (1) Harry Printz (4) Eric Ristad (5) Ronald Rosenfeld (6) Andreas Stolcke (7) Dekai Wu (8)

(1) Johns Hopkins University Baltimore, MD (2) Department of Defense Fort Meade, MD (3) U Politecnica de Valencia Valencia, Spain (4) IBM Watson Research Center Yorktown Heights, NY (5) Princeton University Princeton, NJ (6) Carnegie Mellon Pittsburgh, PA (7) SRI International Menlo Park, CA (8) Hong Kong Tech University Hong Kong

Volume 5 pages 2775 - 2778

ABSTRACT

We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar. Such a grammar expresses the relations between words by a directed graph. Because the edges of this graph may connect words that are arbitrarily far apart in a sentence, this technique can incorporate the predictive power of words that lie outside of bigram or trigram range. We have built several simple dependency models, as we call them, and tested them in a speech recognition experiment. We report experimental results for these models here, including one that has a small but statistically significant advantage (p<.02) over a bigram language model.
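For reference, the maximum entropy form underlying such a model is the following (a generic statement; the paper's feature set is built from dependency-grammar links):

    P(w \mid h) \;=\; \frac{1}{Z(h)} \exp\!\Big( \sum_{i} \lambda_i f_i(h, w) \Big),
    \qquad
    Z(h) \;=\; \sum_{w'} \exp\!\Big( \sum_{i} \lambda_i f_i(h, w') \Big),

where the features f_i can test dependency links to words arbitrarily far back in the history h, which is how the model reaches beyond bigram or trigram range.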

A0901.pdf



MODELING LINGUISTIC SEGMENT AND TURN BOUNDARIES FOR N-BEST RESCORING OF SPONTANEOUS SPEECH

Authors: Andreas Stolcke

Speech Technology and Research Laboratory SRI International, Menlo Park, CA, U.S.A. http://www.speech.sri.com/ stolcke@speech.sri.com

Volume 5 pages 2779 - 2782

ABSTRACT

Language modeling, especially for spontaneous speech, often suffers from a mismatch of utterance segmentations between training and test conditions. In particular, training often uses linguistically-based segments, whereas testing occurs on acoustically determined segments, resulting in degraded performance. We present an N-best rescoring algorithm that removes the effect of segmentation mismatch. Furthermore, we show that explicit language modeling of hidden linguistic segment boundaries is improved by including turn-boundary events in the model.

A0924.pdf



HYBRID LANGUAGE MODELS: IS SIMPLER BETTER?

Authors: P.E.Kenne and Mary O'Kane

The University of Adelaide South Australia 5005 Australia Tel. +61 8 83033282 FAX:+61 8 83034417 E-mail: pek@dvcr.adelaide.edu.au

Volume 5 pages 2783 - 2786

ABSTRACT

The use of several n-gram and hybrid language models, with and without a cache, is examined in the context of producing court transcripts. Language models with a cache (in which recently uttered words are preferred) have seen considerable use. The suitability of cache models (with a fixed-size cache) for the production of court transcripts is not clear. A decrease in perplexity and an improvement in the word error rate are observed with some of the models when a cache is used; however, performance deteriorates with increasing cache size.
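A minimal sketch of a fixed-size cache language model of the kind examined (the interpolation weight, cache size, and interface are illustrative assumptions):

    from collections import Counter, deque

    class CacheLM:
        """Fixed-size unigram cache interpolated with a base n-gram model:
        recently uttered words receive a boosted probability."""
        def __init__(self, base_lm, cache_size=200, lam=0.1):
            self.base = base_lm                    # any object with .prob(word, history)
            self.cache = deque(maxlen=cache_size)
            self.counts = Counter()
            self.lam = lam
        def prob(self, word, history):
            p_cache = self.counts[word] / len(self.cache) if self.cache else 0.0
            return self.lam * p_cache + (1.0 - self.lam) * self.base.prob(word, history)
        def observe(self, word):
            if len(self.cache) == self.cache.maxlen:
                self.counts[self.cache[0]] -= 1    # oldest word falls out of the cache
            self.cache.append(word)
            self.counts[word] += 1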

A0940.pdf



Internal and External Tagsets in Part-of-Speech Tagging

Authors: Thorsten Brants

Universität des Saarlandes Computational Linguistics D-66041 Saarbrücken, Germany thorsten@coli.uni-sb.de

Volume 5 pages 2787 - 2790

ABSTRACT

We present an approach to statistical part-of-speech tagging that uses two different tagsets, one for its internal and one for its external representation. The internal tagset is used in the underlying Markov model, while the external tagset constitutes the output of the tagger. The internal tagset can be modified and optimized to increase tagging accuracy (with respect to the external tagset). We evaluate this approach in an experiment and show that it performs significantly better than approaches using only one tagset.
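A minimal sketch of the two-tagset idea, assuming a toy tagger interface (the mapping and tag names are invented for illustration and are not the author's tagsets):

    def tag_with_internal_tagset(words, internal_tagger, to_external):
        """Tag with a refined internal tagset (used inside the Markov model),
        then map each internal tag to the coarser external tag for output."""
        internal_tags = internal_tagger(words)   # e.g. Viterbi decoding over internal tags
        return [to_external[t] for t in internal_tags]

    # Example with a stand-in tagger: the internal tagset might split determiners
    # by case to sharpen the model, while the external output stays coarse.
    to_external = {"DET-nom": "DET", "DET-acc": "DET", "NN": "NOUN"}
    dummy_tagger = lambda ws: ["DET-nom", "NN"]
    print(tag_with_internal_tagset(["der", "Hund"], dummy_tagger, to_external))
    # -> ['DET', 'NOUN']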

A0977.pdf
