ABSTRACT
The GALAXY system is a human-computer conversational system providing a spoken language interface for accessing on-line information. It was initially implemented for English in travel-related domains, including air travel, local city navigation, and weather. We began an effort to develop multilingual systems within the framework of GALAXY several years ago. This paper describes our recent work on porting the system to Mandarin Chinese, including the speech recognition, language understanding, and language generation components. Overall, the system produced reasonable responses nearly 70% of the time on spontaneous test data collected in a Wizard-of-Oz environment.
ABSTRACT
The paper addresses the problem of designing a speech recogniser for multilingual vocabularies. The motivation is twofold: future Interactive Voice Response (IVR) systems, such as a speech-activated flight information service, are likely to require multilinguality as a major feature; in addition, a general language-independent phonetic inventory could be very useful for bootstrapping phonetic models for a new language for which insufficient training data are available. Metrics were introduced to measure cross-language phonetic dissimilarities, and a multilingual phonemic inventory was created. Experiments were run on a speech database including Italian (I), Spanish (S), English (E) and German (G) words. Results clearly show that it is possible to reduce the complexity of a multilingual phonetic recogniser by exploiting phonetic commonalities across different languages, without significant losses in word accuracy (WA) on multilingual tasks relative to single-language recognition tasks.
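The abstract does not say which dissimilarity metric was used or how phonemes were grouped, so the following is only a minimal sketch of the general idea under stated assumptions: each (language, phoneme) pair is summarised by a hypothetical feature vector, Euclidean distance stands in for the paper's metric, and any two phonemes closer than a threshold are merged (single-linkage) into one shared multilingual unit. The vectors are toy values, not data from the paper.

```python
import math
from itertools import combinations

# Hypothetical per-phoneme summary vectors keyed by (language, phoneme).
# Toy numbers for illustration only; a real system would derive them
# from trained acoustic models.
PHONEMES = {
    ("I", "a"): [1.0, 0.2],
    ("S", "a"): [1.05, 0.18],
    ("G", "a:"): [1.1, 0.25],
    ("E", "ae"): [0.8, 0.6],
    ("E", "th"): [-1.0, 2.0],
}


def dissimilarity(u, v):
    """Euclidean distance: an assumed stand-in for the paper's metric."""
    return math.dist(u, v)


def merge_inventory(phonemes, threshold):
    """Single-linkage merging via union-find: phonemes closer than
    `threshold` end up in the same shared multilingual unit."""
    parent = {k: k for k in phonemes}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(phonemes, 2):
        if dissimilarity(phonemes[a], phonemes[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for k in phonemes:
        clusters.setdefault(find(k), []).append(k)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    for unit in merge_inventory(PHONEMES, threshold=0.2):
        print(unit)
```

With a threshold of 0.2, the three open-vowel variants collapse into one shared unit and the inventory shrinks from five language-specific phonemes to three units; raising the threshold trades more sharing (a smaller recogniser) against coarser phonetic distinctions.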
ABSTRACT
This paper describes our work in developing multilingual (Swedish and English) speech recognition systems in the ATIS domain. The acoustic component of the multilingual systems is realized by sharing Gaussian codebooks across Swedish and English allophones. The language model (LM) components are constructed in two ways: by training a statistical bigram model, with a common backoff node, on bilingual texts, and by combining two monolingual LMs into a probabilistic finite state grammar. The system uses a single decoder for Swedish and English sentences and is capable of recognizing sentences containing words from both languages. Preliminary experiments show that sharing acoustic models across the two languages did not improve performance, while sharing a backoff node in the LM component makes it flexible and easy to recognize bilingual sentences, at the expense of a slight increase in word error rate in some cases. As a by-product, the bilingual decoder also achieves good performance on language identification (LID).
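To illustrate why a common backoff node lets one LM score mixed-language sentences, here is a minimal sketch, not the authors' implementation: every bigram history falls back to a single unigram distribution shared by both languages, so a Swedish word after an English word never receives zero probability. The smoothing scheme (interpolated absolute discounting), the toy corpus, the language tags, and the majority-vote LID helper are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class SharedBackoffBigram:
    """Bigram LM over a mixed Swedish/English vocabulary in which every
    history backs off to ONE shared unigram distribution. Smoothing is
    interpolated absolute discounting, an assumed stand-in for the
    original system's scheme."""

    def __init__(self, sentences, discount=0.5):
        self.d = discount
        self.unigram = Counter()            # counts of predicted tokens
        self.bigram = defaultdict(Counter)  # history -> next-word counts
        for words in sentences:
            toks = ["<s>"] + list(words) + ["</s>"]
            self.unigram.update(toks[1:])
            for v, w in zip(toks, toks[1:]):
                self.bigram[v][w] += 1
        self.total = sum(self.unigram.values())
        self.vocab = set(self.unigram)

    def p_shared(self, w):
        """The common backoff node: one unigram shared by both languages."""
        return self.unigram[w] / self.total

    def prob(self, w, v):
        """P(w | v): discounted bigram mass plus weighted shared unigram."""
        counts = self.bigram.get(v)
        if not counts:
            return self.p_shared(w)
        c_v = sum(counts.values())
        direct = max(counts[w] - self.d, 0) / c_v
        backoff_weight = self.d * len(counts) / c_v
        return direct + backoff_weight * self.p_shared(w)


def identify_language(decoded_words, lexicon):
    """Hypothetical LID by-product: majority vote over word language tags."""
    votes = Counter(lexicon[w] for w in decoded_words if w in lexicon)
    return votes.most_common(1)[0][0] if votes else None


# Toy bilingual training text and language-tagged lexicon (illustrative only).
CORPUS = [
    ["show", "flights", "to", "boston"],
    ["visa", "flyg", "till", "stockholm"],
]
LEXICON = {"show": "en", "flights": "en", "to": "en", "boston": "en",
           "visa": "sv", "flyg": "sv", "till": "sv", "stockholm": "sv"}

lm = SharedBackoffBigram(CORPUS)
```

Here `lm.prob("till", "flights")` is positive even though that English-to-Swedish pair never occurs in training, because the mass reserved by discounting flows through the shared unigram. The same tagged lexicon gives a crude language decision on the decoder output, e.g. `identify_language(["visa", "flights", "till", "stockholm"], LEXICON)` votes for Swedish.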
ABSTRACT
This paper describes the 1996 Byblos Callhome speech recognition system for Spanish and Egyptian Colloquial Arabic. The system uses a combination of Phonetically Tied-Mixture Gaussian HMMs and State-Clustered Tied-Mixture Gaussian HMMs in a multiple-pass decoder. We focus here on the aspects of the system that are language-specific and demonstrate the adaptability of the Byblos English system to new languages. Language-related issues arising both from dialectal differences and from differences between transcribed and spoken language are discussed. This system gave the lowest error rates in both Egyptian Colloquial Arabic and Spanish in the October 1996 NIST Callhome evaluation.
ABSTRACT
This paper presents our findings from the development of the recognition engine for the Japanese part of the VERBMOBIL speech-to-speech translation project. We describe an efficient method to bootstrap a large-vocabulary speech recognizer for spontaneous Japanese speech from a German recognizer, and show that this rapid cross-language bootstrapping technique reduces the effort needed to develop the system. The Japanese recognizer is integrated into the VERBMOBIL system and shows very promising results, achieving a 9.3% word error rate.
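For readers less familiar with the metric quoted above, word error rate is conventionally the minimum number of substitutions, deletions, and insertions needed to turn the recognizer output into the reference, divided by the number of reference words, so 9.3% means roughly 9.3 errors per 100 reference words. The sketch below computes it with the standard Levenshtein dynamic program. It assumes the text is already segmented into words; Japanese is written without spaces, and the segmentation used in the project is not stated here. The romanized example sentence is made up.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words,
    computed with the Levenshtein dynamic program over words.
    Assumes whitespace-segmented input."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (wa -> ga) and one deletion (desu): 2 errors / 4 words.
    print(word_error_rate("kaigi wa ashita desu", "kaigi ga ashita"))  # 0.5
```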
ABSTRACT
In this paper we describe an efficient method to bootstrap continuous-speech, large-vocabulary speech recognition systems using multilingual phoneme sets. To evaluate this technique, we collected the multilingual database GlobalPhone, which currently covers nine languages. A multilingual recognizer (MULTI) based on four languages, German, English, Japanese and Spanish, was developed to serve as the source system. This system is also very useful for language identification, achieving a 100% language identification rate. Based on the MULTI system, we evaluated our bootstrap technique on languages as different as Chinese, Croatian, and Turkish.
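The abstract does not describe how target-language phonemes are matched to the multilingual set, so the following is only a sketch of one common style of cross-language bootstrapping, under loudly stated assumptions: each target phoneme is seeded with a copy of the MULTI model sharing its IPA symbol, and a phoneme missing from the pool falls back to the pool entry with the greatest articulatory-feature overlap (Jaccard similarity). The pool contents, feature sets, and "model parameters" are hypothetical placeholders; the seeded models would then be retrained on target-language data.

```python
import copy

# Hypothetical multilingual (MULTI) pool: IPA symbol -> (features, model params).
# Feature sets and parameters are toy placeholders, not GlobalPhone data.
MULTI_POOL = {
    "a": ({"vowel", "open", "front"}, {"mean": [1.0, 0.2]}),
    "e": ({"vowel", "mid", "front"}, {"mean": [0.7, 0.5]}),
    "u": ({"vowel", "close", "back", "rounded"}, {"mean": [-0.4, 0.9]}),
    "k": ({"consonant", "plosive", "velar"}, {"mean": [-1.2, -0.3]}),
}


def jaccard(a, b):
    """Overlap between two feature sets (assumed similarity measure)."""
    return len(a & b) / len(a | b)


def bootstrap_models(target_inventory, pool):
    """Seed each target phoneme from the pool: identical IPA symbol if
    available, otherwise the pool phoneme with the most feature overlap.
    Parameters are deep-copied so retraining never alters the pool."""
    seeded = {}
    for symbol, features in target_inventory.items():
        if symbol in pool:
            source = symbol
        else:
            source = max(pool, key=lambda s: jaccard(features, pool[s][0]))
        seeded[symbol] = (source, copy.deepcopy(pool[source][1]))
    return seeded


# Toy fragment of a target inventory (e.g. Turkish /y/ is absent from the pool).
TARGET = {
    "a": {"vowel", "open", "front"},
    "k": {"consonant", "plosive", "velar"},
    "y": {"vowel", "close", "front", "rounded"},
}

if __name__ == "__main__":
    for phone, (source, params) in bootstrap_models(TARGET, MULTI_POOL).items():
        print(phone, "<-", source, params)
```

In this toy setup /y/ is seeded from /u/ (feature overlap 0.6 versus 0.4 for the front vowels); a real system might instead use hand-written mapping tables or data-driven distances between acoustic models.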