Session T2A Multilingual Recognition

Chairperson Richard Lippman MIT Lincoln Lab., USA

YINHE: A MANDARIN CHINESE VERSION OF THE GALAXY SYSTEM

Authors: Chao Wang, James Glass, Helen Meng, Joe Polifroni, Stephanie Seneff, and Victor Zue

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA {wangc, jrg, hmmeng, joe, seneff, zue}@sls.lcs.mit.edu

Volume 1 pages 351 - 354

ABSTRACT

The GALAXY system is a human-computer conversational system providing a spoken language interface for accessing on-line information. It was initially implemented for English in travel-related domains, including air travel, local city navigation, and weather. We began an effort to develop multilingual systems within the framework of GALAXY several years ago. This paper describes our recent work on porting the system to Mandarin Chinese, including speech recognition, language understanding, and language generation components. Overall, the system produced reasonable responses nearly 70% of the time for spontaneous test data collected in a wizard environment.

A0586.pdf

MULTILINGUAL SPEECH RECOGNITION FOR FLEXIBLE VOCABULARIES

Authors: Bonaventura P. (1), Gallocchio F. (2) and Micca G. (3)

(1) CSELT Consultant, Turin, Italy (2) Dipartimento di Elettronica e Informatica, Università di Padova, Italy (3) CSELT, Turin, Italy patrizia.bonaventura@cselt.it, filippo@luna.cselt.it, giorgio.micca@cselt.it

Volume 1 pages 355 - 358

ABSTRACT

The paper addresses the problem of designing a speech recogniser for multilingual vocabularies. The goal of the research is twofold: future Interactive Voice Response (IVR) systems, like a speech-activated flight information service, are likely to require multilinguality as a major feature; in addition, a general language-independent phonetic inventory might be very useful in bootstrapping phonetic models for a new language for which insufficient training data are available. Metrics were introduced in order to measure cross-language phonetic dissimilarities, and a multilingual phonemic inventory was created. Experiments were run on a speech database including Italian (I), Spanish (S), English (E) and German (G) words. Results clearly show that it is possible to reduce the complexity of a multilingual phonetic recogniser by exploiting phonetic commonalities across different languages, without significant losses in word accuracy (WA) for multilingual tasks with respect to single-language recognition tasks.

A0705.pdf

A STUDY OF MULTILINGUAL SPEECH RECOGNITION

Authors: Fuliang Weng, Harry Bratt, Leonardo Neumeyer, and Andreas Stolcke

Speech Technology And Research Laboratory, SRI International, Menlo Park, California http://www.speech.sri.com

Volume 1 pages 359 - 362

ABSTRACT

This paper describes our work in developing multilingual (Swedish and English) speech recognition systems in the ATIS domain. The acoustic component of the multilingual systems is realized through sharing Gaussian codebooks across Swedish and English allophones. The language model (LM) components are constructed by training a statistical bigram model, with a common backoff node, on bilingual texts, and by combining two monolingual LMs into a probabilistic finite state grammar. This system uses a single decoder for Swedish and English sentences, and is capable of recognizing sentences with words from both languages. Preliminary experiments show that sharing acoustic models across the two languages has not resulted in improved performance, while sharing a backoff node at the LM component provides flexibility and ease in recognizing bilingual sentences at the expense of a slight increase in word error rate in some cases. As a by-product, the bilingual decoder also achieves good performance on language identification (LID).

A0902.pdf

MULTILINGUAL SPEECH RECOGNITION: THE 1996 BYBLOS CALLHOME SYSTEM

Authors: Jayadev Billa (1, 2), Kristine Ma (1), John W. McDonough (1), George Zavaliagkos (1), David R. Miller (1), Kenneth N. Ross (1), and Amro El-Jaroudi (2)

(1) BBN Systems and Technologies, Cambridge, MA 02138, USA (2) University of Pittsburgh, Pittsburgh, PA 15261, USA

Volume 1 pages 363 - 366

ABSTRACT

This paper describes the 1996 Byblos Callhome speech recognition system for Spanish and Egyptian Colloquial Arabic. The system uses a combination of Phonetically Tied-Mixture Gaussian HMMs and State-Clustered Tied-Mixture Gaussian HMMs in a multiple-pass decoder. We focus here on the aspects of the system which are language specific and demonstrate the adaptability of the Byblos English system to new languages. Language-related issues arising from dialectal differences, as well as from differences between transcribed and spoken language, are discussed. This system gave the lowest error rates in both Egyptian Colloquial Arabic and Spanish in the October 1996 NIST Callhome evaluation.

A0950.pdf

JAPANESE LVCSR ON THE SPONTANEOUS SCHEDULING TASK WITH JANUS-3

Authors: T. Schultz, D. Koll, and A. Waibel

Interactive Systems Laboratories, University of Karlsruhe (Germany), Carnegie Mellon University (USA) {tanja,koll,waibel}@ira.uka.de

Volume 1 pages 367 - 370

ABSTRACT

This paper presents our findings during the development of the recognition engine for the Japanese part of the VERBMOBIL speech-to-speech translation project. We describe an efficient method to bootstrap a large vocabulary speech recognizer for spontaneously spoken Japanese speech from a German recognizer and show that the amount of effort in developing the system could be reduced by using this rapid cross-language bootstrapping technique. The Japanese recognizer is integrated into the VERBMOBIL system and shows very promising results, achieving a 9.3% word error rate.

A0997.pdf

FAST BOOTSTRAPPING OF LVCSR SYSTEMS WITH MULTILINGUAL PHONEME SETS

Authors: T. Schultz and A. Waibel

Interactive Systems Laboratories, University of Karlsruhe (Germany), Carnegie Mellon University (USA) {tanja,waibel}@ira.uka.de

Volume 1 pages 371 - 374

ABSTRACT

In this paper we describe an efficient method to bootstrap continuously spoken, large vocabulary speech recognition systems by multilingual phoneme sets. To evaluate this technique we collected the multilingual database GlobalPhone, which currently consists of 9 different languages. A multilingual recognizer (MULTI) based on the four languages German, English, Japanese and Spanish was developed to serve as a source system. Likewise, this system is very useful for language identification and achieves a 100% language identification rate. Based on the MULTI system we evaluated our bootstrap technique on such completely different languages as Chinese, Croatian, and Turkish.

A1401.pdf