ABSTRACT
Over the last few years, some alternatives to N-gram language models have been proposed that are based on stochastic regular grammars. These grammars are estimated from data through Grammatical Inference algorithms. In particular, the Morphic Generator Grammatical Inference (MGGI) methodology has been applied to tasks of written natural language queries to databases. As with N-gram models, language models obtained through this methodology require the use of smoothing techniques. This work incorporates a version of the well-known Back-Off smoothing method into the MGGI language models to solve the problem of estimating events unseen in the training corpus, and shows the behaviour of the smoothed MGGI models in two tasks of written sentences. The results illustrate that the smoothed MGGI model works better than the standard smoothed bigram model.
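For reference, the standard Katz back-off estimate for a bigram (a generic sketch, not the paper's exact MGGI formulation; d is the usual discount factor and \alpha the back-off weight) is

    P(w_i \mid w_{i-1}) =
      \begin{cases}
        d(w_{i-1}, w_i)\, C(w_{i-1} w_i) / C(w_{i-1}) & \text{if } C(w_{i-1} w_i) > 0 \\
        \alpha(w_{i-1})\, P(w_i) & \text{otherwise,}
      \end{cases}

with \alpha(w_{i-1}) chosen so that the conditional distribution sums to one.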
ABSTRACT
For the traditional n-gram model, the small value of n is an inherent limitation when estimating language probabilities in speech recognition, simply because the estimation cannot exploit longer-range word associations and relies only on short sequences of adjacent words. This has a strong effect on the performance of speech recognition. This paper introduces an integrated language model combining an n-gram model and a word association model (abbreviated as the WA model). This model integrates two kinds of joint probabilities, the traditional n-gram probability and the word association probability, to estimate the actual output probability. The WA model is based on a combined probability estimate over ordered word associations without strict distance or sequence limitations. In addition, two kinds of local linguistic constraints have also been incorporated into the n-gram estimation, to smooth sparse data and to adjust the scores of special language units locally. A substantial improvement in the performance of Chinese phonetic-to-text transcription in speech recognition has been obtained.
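Purely as an illustration of how such an integration might look (the abstract does not give the exact combination rule, so the linear form and the weight \lambda below are assumptions), one simple scheme interpolates the two probabilities:

    P(w_i \mid h) \approx \lambda\, P_{n\text{-gram}}(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) + (1-\lambda)\, P_{\mathrm{WA}}(w_i \mid h),

where P_{WA} scores w_i against associated words anywhere in the history h rather than only in the immediately preceding positions.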
ABSTRACT
We introduce a statistical model for dialogues. We describe a dynamic programming algorithm that can be used to bracket a dialogue into segments and label each segment with its speech act. We evaluate the performance of the model. We also use this model for language modelling and obtain a reduction in perplexity.
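A minimal sketch of this kind of segmentation dynamic program, assuming a hypothetical segment_score(words, act) that returns the log score of labelling a word span with a given speech act (the paper's actual model and scores are not reproduced here):

    def segment_and_label(words, acts, segment_score, max_len=20):
        """Bracket a word sequence into segments, labelling each with a speech act.

        best[j] holds the best log score of any segmentation of words[:j];
        back[j] remembers the segment start and act that achieved it.
        """
        n = len(words)
        best = [float("-inf")] * (n + 1)
        best[0] = 0.0
        back = [None] * (n + 1)
        for j in range(1, n + 1):
            for i in range(max(0, j - max_len), j):
                for act in acts:
                    score = best[i] + segment_score(words[i:j], act)
                    if score > best[j]:
                        best[j], back[j] = score, (i, act)
        # Recover the segmentation by walking the back-pointers.
        segments, j = [], n
        while j > 0:
            i, act = back[j]
            segments.append((words[i:j], act))
            j = i
        return list(reversed(segments))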
ABSTRACT
The CMU Statistical Language Modeling toolkit was released in 1994 in order to facilitate the construction and testing of bigram and trigram language models. It is currently in use in over 40 academic, government and industrial laboratories in over 12 countries. This paper presents a new version of the toolkit. We outline the conventional language modeling technology, as implemented in the toolkit, and describe the extra efficiency and functionality that the new toolkit provides as compared to previous software for this task. Finally, we give an example of the use of the toolkit in constructing and testing a simple language model.
ABSTRACT
In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for speech recognition in French. The text normalization process defines what is considered to be a word by the recognition system. Depending on this definition we can measure different lexical coverages and language model perplexities, both of which are closely related to the speech recognition accuracies obtained on read newspaper texts. Different text normalizations of up to 185M words of newspaper texts are presented along with corresponding lexical coverage and perplexity measures. Some normalizations were found to be necessary to achieve good lexical coverage, while others were more or less equivalent in this regard. The choice of normalization used to create language models for the recognition experiments with read newspaper texts was based on these findings. Our best system configuration obtained an 11.2% word error rate in the AUPELF 'French-speaking' speech recognizer evaluation test held in February 1997.
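For concreteness, lexical coverage as used here is the fraction of running-text tokens found in the recognition lexicon (one minus the OOV rate); the sketch below is a generic illustration, with the tokenization and lexicon as placeholders rather than the paper's normalization pipeline:

    def lexical_coverage(tokens, lexicon):
        """Fraction of tokens covered by the lexicon (1 - OOV rate)."""
        lexicon = set(lexicon)
        in_vocab = sum(1 for t in tokens if t in lexicon)
        return in_vocab / len(tokens) if tokens else 0.0

    # Example: a fixed word list measured against normalized newspaper text.
    # coverage = lexical_coverage(normalized_text.split(), word_list)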
ABSTRACT
In this paper, a new method to cluster words into classes is proposed in order to define a statistical language model. The purpose of this algorithm is to decrease the computational cost of the clustering task while not degrading speech recognition performance. The algorithm provides a bottom-up hierarchical clustering using the reciprocal neighbours method. This technique consists of merging several pairs of classes within a single iteration. Experiments on a spontaneous speech corpus are presented. Results are given both in terms of perplexity and word recognition error rate. We obtain a large reduction in the number of iterations necessary to build a classification tree, and thus a reduction in the CPU time needed to build the model, as well as a reduction in both perplexity and word error rate.
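A rough sketch of the reciprocal-neighbours step, assuming some distance(a, b) between classes (for instance a likelihood-loss criterion; the actual criterion is not specified in the abstract): all pairs of classes that are mutual nearest neighbours are merged within the same iteration, rather than a single pair per iteration.

    def reciprocal_neighbour_merges(classes, distance):
        """Indices of class pairs that are mutual nearest neighbours."""
        n = len(classes)
        nearest = [min((j for j in range(n) if j != i),
                       key=lambda j: distance(classes[i], classes[j]))
                   for i in range(n)]
        merges, used = [], set()
        for i in range(n):
            j = nearest[i]
            if nearest[j] == i and i not in used and j not in used:
                merges.append((i, j))
                used.update((i, j))
        return merges  # all of these merges are applied within one iteration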
ABSTRACT
This paper proposes a novel variable-length class-based language model that integrates local and global constraints. In this model, the classes are iteratively recreated by grouping consecutive words and by splitting initial part-of-speech (POS) clusters into finer clusters (word-classes). The main characteristic of this modeling is that these operations of grouping and splitting are carried out selectively, taking into account global constraints between noncontiguous words on the basis of a minimum entropy criterion. To capture the global constraints, the model takes into account the sequences of the function words and of the content words, which are expected to represent the syntactic and semantic relationships between words, respectively. Experiments showed that the perplexity of the proposed model on the test corpus is lower than that of conventional models and that this model requires a small number of statistical parameters, showing the model's effectiveness.
ABSTRACT
This paper describes the combination of a stochastic language model and a formal grammar, modelled as a unification grammar. The stochastic model is trained over 42 million words extracted from the newspaper Le Monde. The stochastic model is based on smoothed 3-gram and 3-class models. The 3-class model is represented by a Markov chain made up of four states. Several experiments have been carried out to determine which values are best for specific training and test corpora. The experiments indicate that the unification grammar strongly reduces the number of hypotheses (sentences) produced by the stochastic model.
ABSTRACT
In this paper, we describe three approaches to continuous speech recognition. Two of them (referred to as the (W,P) and (W',P) models) take into account pronunciation variants of words. They make it possible to handle (very common) French phonological phenomena such as liaisons or mute-e elision. The (W',P) model introduces the phonotypical level as defined in the MHAT Model [4,5]. Comparing the (W,P) and (W',P) models shows a significant improvement in recognition accuracy when a contextual language model is introduced at this phonotypical level.
ABSTRACT
In our paper, we address the problem of estimating stochastic language models based on n-gram statistics. We present a novel approach, rational interpolation, for the combination of a competing set of conditional n-gram word probability predictors, which consistently outperforms the traditional linear interpolation scheme. The superiority of rational interpolation is substantiated by experimental results from language modeling, speech recognition, dialog act classification, and language identification.
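For reference, the traditional linear interpolation baseline mentioned above combines M predictors with fixed non-negative weights that sum to one,

    P(w \mid h) = \sum_{m=1}^{M} \lambda_m\, P_m(w \mid h), \qquad \lambda_m \ge 0, \quad \sum_{m} \lambda_m = 1;

the rational scheme of the paper replaces this weighted sum with a ratio of weighted terms (its exact form is given in the paper and is not reproduced here).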
ABSTRACT
This paper describes an N-gram language model adaptation technique. As an N-gram model requires a large sample corpus for probability estimation, it is difficult to use an N-gram model for a specific small task. In this paper, N-gram task adaptation is proposed using a large corpus from the general task (TI text) and a small corpus from the specific task (AD text). A simple weighting is employed to mix the TI and AD text. In addition to mixing the two texts, the effect of the vocabulary is also investigated. The experimental results show that the adapted N-gram model with a proper vocabulary size has significantly lower perplexity than the task-independent models.
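One plausible reading of the "simple weighting" is a weighted mixture of counts from the two corpora (equivalently, at the probability level, a linear interpolation); the weight w and the use of raw counts below are assumptions for illustration only:

    from collections import Counter

    def mix_counts(ad_counts, ti_counts, w):
        """Combine adaptation-task (AD) and task-independent (TI) n-gram counts.

        The weight w scales down the large general-task corpus so that the
        small in-domain corpus is not swamped.
        """
        mixed = Counter()
        for ngram, c in ad_counts.items():
            mixed[ngram] += c
        for ngram, c in ti_counts.items():
            mixed[ngram] += w * c
        return mixed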
ABSTRACT
Recent progress in variable n-gram language modeling provides an efficient representation of n-gram models and makes training of higher order n-grams possible. In this paper, we apply the variable n-gram design algorithm to conversational speech, extending the algorithm to learn skips and classes in context to handle conversational speech characteristics such as repetitions and disfluency markers. We show that using the extended variable n-gram, we can build a language model that uses fewer parameters for longer context and improves the test perplexity and recognition accuracy.
ABSTRACT
Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N-best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how to interpolate with a Part-of-Speech (POS) tag-based language model as an example of a class-based model in which a word can be a member of many different classes. Here the actual class membership of a word in the lattice becomes a hidden event of the A-algorithm used for rescoring. A forward-type algorithm is defined as an extension of the lattice rescorer to handle these hidden events in a mathematically sound fashion. Applying this mixture of Viterbi and forward rescoring to the German Spontaneous Scheduling Task (GSST) yields some improvement in word accuracy. Above all, the rescoring procedure enables the use of any fuzzy/stochastic class definition for recognition units that might be determined through automatic clustering algorithms in the future.
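To make the hidden-event issue concrete: with a class bigram in which a word may carry several POS tags, one common formulation sums the word-level probability over the unobserved tag assignments, which is what a forward-style extension of the rescorer has to compute (written here for a bigram only; this is an illustration, not the paper's exact model):

    P(w_i \mid w_{i-1}) = \sum_{c_{i-1}} \sum_{c_i} P(c_{i-1} \mid w_{i-1})\, P(c_i \mid c_{i-1})\, P(w_i \mid c_i).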
ABSTRACT
We have proposed a concept-driven semantic interpretation method for a spoken dialogue system that robustly understands various expressions uttered by a naive user. The method is now being improved for practical application. Domain knowledge is important for this improvement. The system must also have portability. This paper discusses the generalization of the semantic interpretation method, and proposes a method that integrates concepts using general linguistic knowledge of conceptual dependency. Speech understanding for various utterances about Kamakura sightseeing with a 1000-word vocabulary was empirically evaluated. The results show that this method can achieve a satisfactory understanding rate.
ABSTRACT
This work proposes the use of hierarchical LMs as an effective method both for efficiently dealing with context-dependent LMs in a dialogue system and for increasing the robustness of LM estimation and adaptation. Starting from basic LMs that express elementary semantic units, concepts, or data-types, sentence-level LMs are recursively built. The resulting LMs may be a combination of grammars, word classes, and statistical LMs. Moreover, these LMs can be efficiently compiled into probabilistic recursive transition networks. A speech decoding algorithm directly exploits the recursive representation and produces the most probable parse tree matching the speech signal. The proposed approach has been implemented for a data-entry task which covers structured data, e.g. numbers, dates, and proper names, as well as free text. In this task, the active LM must continuously change according to the current status, the active form, and the data entered so far. Finally, while the hierarchical approach proves very convenient for this task, it also looks very general and can give advantages in other applications, e.g. dictation.
ABSTRACT
This paper presents a study on the use of wide-coverage semantic knowledge for large vocabulary (theoretically unrestricted) domain-independent speech recognition. A machine readable dictionary was used to provide the semantic information about the words and a semantic model was developed based on the conceptual association between words as computed directly from the textual representations of their meanings. The findings of our research suggest that the model is capable of capturing phenomena of semantic associativity or connectivity between words in texts and considerably reducing the semantic ambiguity in natural language. The model can cover both short and long-distance semantic relationships between words and has shown signs of robustness across various text genres. Experiments with simulated speech recognition hypotheses indicate that the model can efficiently be used to reduce the word error rates when applied to word lattices or N-best sentence hypotheses.
ABSTRACT
This paper proposes a novel spontaneous speech recognition approach to obtain not a whole utterance but reliably recognized partial segments of an utterance to achieve robust speech understanding. Our method obtains reliably recognized partial segments of an utterance by using both grammatical and n-gram based statistical language constraints cooperatively, and uses a robust parsing technique to apply the grammatical constraints. Through an experiment, it has been confirmed that the proposed method can recognize partial segments of an utterance with a higher reliability than conventional continuous speech recognition methods using an n-gram based statistical language model.
ABSTRACT
This paper describes a method for using intonation to reduce word error rate in a speech recognition system designed to recognise spontaneous dialogue speech. We use a form of dialogue analysis based on the theory of conversational games. Different move types under this analysis conform to different language models. Different move types are also characterised by different intonational tunes. Our overall recognition strategy is first to predict from intonation the type of game move that a test utterance represents, and then to use a bigram language model for that type of move during recognition.
ABSTRACT
Language models for speech recognition tend to concentrate solely on recognizing the words that were spoken. In this paper, we redefine the speech recognition problem so that its goal is to find both the best sequence of words and their syntactic role (part-of-speech) in the utterance. This is a necessary first step towards tightening the interaction between speech recognition and natural language understanding.
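Stated as an equation, the usual criterion \hat{W} = argmax_W P(A \mid W) P(W) over word sequences W given acoustics A is replaced by a joint search over words and their tags; this restates the redefinition above, with C denoting the part-of-speech sequence:

    (\hat{W}, \hat{C}) = \operatorname{argmax}_{W, C}\; P(A \mid W)\, P(W, C).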
ABSTRACT
We report results from using language model confidence measures based on the degree of backoff used in a trigram language model. Both utterance-level and word-level confidence metrics proved useful for a dialog manager to identify out-of-domain utterances. The metric assigns successively lower confidence as the language model estimate is backed off to a bigram or unigram. It also bases its estimates on sequences of backoff degree. Experimental results with utterances from the domain of medical records management showed that the distributions of the confidence metric for in-domain and out-of-domain utterances are separated. Use of the corresponding word-level confidence metric shows similar encouraging results.
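A minimal sketch of a word-level confidence based on back-off degree, assuming access to the hit level of the trigram model for each word; the score values below and the omission of the sequence-based refinement are assumptions, not the paper's settings:

    def backoff_confidence(words, hit_level, scores={3: 1.0, 2: 0.6, 1: 0.3}):
        """Per-word and utterance-level confidence from back-off degree.

        hit_level(i) should return 3 if the trigram ending at word i was found
        in the model, 2 if only the bigram was, and 1 if the model backed off
        to the unigram.
        """
        word_conf = [scores[hit_level(i)] for i in range(len(words))]
        utt_conf = sum(word_conf) / len(word_conf) if word_conf else 0.0
        return word_conf, utt_conf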
ABSTRACT
We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar. Such a grammar expresses the relations between words by a directed graph. Because the edges of this graph may connect words that are arbitrarily far apart in a sentence, this technique can incorporate the predictive power of words that lie outside of bigram or trigram range. We have built several simple dependency models, as we call them, and tested them in a speech recognition experiment. We report experimental results for these models here, including one that has a small but statistically significant advantage (p<.02) over a bigram language model.
ABSTRACT
Language modeling, especially for spontaneous speech, often suffers from a mismatch of utterance segmentations between training and test conditions. In particular, training often uses linguistically-based segments, whereas testing occurs on acoustically determined segments, resulting in degraded performance. We present an N-best rescoring algorithm that removes the effect of segmentation mismatch. Furthermore, we show that explicit language modeling of hidden linguistic segment boundaries is improved by including turn-boundary events in the model.
ABSTRACT
The use of several n-gram and hybrid language models with and without cache is examined in the context of producing court transcripts. Language models with cache (in which words that have recently been uttered are preferred) have seen considerable use. The suitability of cache models (with a fixed-size cache) in the production of court transcripts is not clear. A decrease in perplexity and an improvement in the word error rate are observed with some of the models when using a cache; however, performance deteriorates with increasing cache size.
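The fixed-size cache models referred to here typically interpolate the static n-gram with a unigram cache estimated over the last K words; a generic form (the weight \lambda and cache size K are illustrative, not the paper's settings) is

    P(w \mid h) = \lambda\, P_{n\text{-gram}}(w \mid h) + (1-\lambda)\, P_{\mathrm{cache}}(w), \qquad P_{\mathrm{cache}}(w) = \frac{\#\{w \text{ in the last } K \text{ words}\}}{K}.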
ABSTRACT
We present an approach to statistical part-of-speech tagging that uses two different tagsets, one for its internal and one for its external representation. The internal tagset is used in the underlying Markov model, while the external tagset constitutes the output of the tagger. The internal tagset can be modified and optimized to increase tagging accuracy (with respect to the external tagset). We evaluate this approach in an experiment and show that it performs significantly better than approaches using only one tagset.
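To illustrate the two-tagset idea with a hedged sketch: tagging is performed over the internal (finer) tagset, and each internal tag is then projected onto the external tagset through a many-to-one map; the tagger, tagsets, and map below are placeholders, not the paper's implementation:

    def tag_with_two_tagsets(tokens, internal_tagger, internal_to_external):
        """Tag with the internal tagset, then output external tags only."""
        internal_tags = internal_tagger(tokens)  # e.g. Viterbi over the internal-tag HMM
        return [internal_to_external[t] for t in internal_tags]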