Speech-to-Speech Translation



VERBMOBIL: The Combination Of Deep And Shallow Processing For Spontaneous Speech Translation

Authors:

Thomas Bub, DFKI (Germany)
Wolfgang Wahlster, DFKI (Germany)
Alex Waibel, Carnegie Mellon University (U.S.A.)

Volume 1, Page 71

Abstract:

Verbmobil is a speech-to-speech translation system for spontaneously spoken negotiation dialogs. The current system translates 74.2% of spontaneously spoken German input. We give an overview of the Verbmobil system. After introducing the Verbmobil scenario and the unique constraints of the project, we describe the underlying system architecture and its realization. The progress achieved in the end-to-end translation rate owes much to the increase in the word recognition rate from 45% in 1993 to 87% in 1996. However, to achieve the envisaged coverage on the uncertain speech recognizer output, deep and shallow approaches to the analysis and transfer problem had to be combined.

ic970071.pdf




Prosodic Processing and its Use in Verbmobil

Authors:

Heinrich Niemann, University of Erlangen (Germany)
Elmar Nöth, University of Erlangen (Germany)
Andreas Kießling, University of Erlangen (Germany)
Ralf Kompe, University of Erlangen (Germany)
Anton Batliner, L.M.-Univ. München (Germany)

Volume 1, Page 75

Abstract:

We present the prosody module of the VERBMOBIL speech-to-speech translation system, the world's first complete system that successfully uses prosodic information in linguistic analysis. This is achieved by computing probabilities for clause boundaries, accentuation, and different types of sentence mood for each of the word hypotheses computed by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is thus achieved during the analysis itself rather than by a prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries, which are detected with a recognition rate of 94%. For the parsing of word hypothesis graphs, the use of clause boundary probabilities yields a speed-up of 92% and a 96% reduction of alternative readings.

ic970075.pdf




The Language Components in Verbmobil

Authors:

Hans Ulrich Block, Siemens AG (Germany)

Volume 1, Page 79

Abstract:

This paper gives an overview of the main problems and their solutions in the language components of the Verbmobil speech translation system. Interpretation of spontaneously spoken language has to take into account that syntax and semantics differ from written language, that punctuation is missing, that accent and intonation affect the meaning and the translation, that the output of the speech recognizer may be noisy, and that speakers produce errors due to distraction. The Verbmobil interpretation and translation components address these problems by means of a grammar for spoken language, heavy use of prosodic information, a syntactic search on word hypothesis graphs, and a shallow, robust fallback translation device that is used in case the "deep" translation fails.

ic970079.pdf




The Karlsruhe-Verbmobil Speech Recognition Engine

Authors:

Michael Finke, University of Karlsruhe (Germany)
Petra Geutner, University of Karlsruhe (Germany)
Hermann Hild, University of Karlsruhe (Germany)
Thomas Kemp, University of Karlsruhe (Germany)
Klaus Ries, University of Karlsruhe (Germany)
Martin Westphal, University of Karlsruhe (Germany)

Volume 1, Page 83

Abstract:

Verbmobil, a German research project, aims at machine translation of spontaneous speech input. The ultimate goal is the development of a portable machine translator that will allow people to negotiate in their native language. Within this project, the University of Karlsruhe has developed a speech recognition engine that has been evaluated on a yearly basis during the project and shows very promising word accuracy results on large-vocabulary spontaneous speech. In this paper we introduce the Janus Speech Recognition Toolkit underlying the speech recognizer. The main new contributions to the acoustic modeling part of our 1996 evaluation system -- speaker normalization, channel normalization and polyphonic clustering -- are discussed and evaluated. Besides the acoustic models, we describe the different language models used in our evaluation system: word trigram models interpolated with class-based models, and a separate spelling language model. As a result of using the toolkit and integrating all these parts into the recognition engine, the word error rate on the German Spontaneous Scheduling Task (GSST) decreased from 30% in 1995 to 13.8% in 1996.
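For readers unfamiliar with the metric, the word error rate quoted above is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the scoring tool used in the Verbmobil evaluations; the function names and the German example sentence are illustrative assumptions. By the same arithmetic, going from 30% to 13.8% corresponds to a relative reduction of 54%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which an error rate dropped, relative to the old rate."""
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    # Hypothetical example utterance: one of five reference words is missing.
    print(word_error_rate("wir treffen uns am montag", "wir treffen uns montag"))
    print(relative_reduction(30.0, 13.8))
```

Because insertions count as errors, this measure can exceed 100% when the hypothesis is much longer than the reference.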

ic970083.pdf




An Experiment On Korean-To-English And Korean-To-Japanese Spoken Language Translation

Authors:

Jae-Woo Yang, ETRI (Korea)
Jun Park, ETRI (Korea)

Volume 1, Page 87

Abstract:

We have implemented a prototype Korean-to-English and Korean-to-Japanese spoken language translation system. The system can translate speech in the travel planning domain with a 5,000-word vocabulary. In our prototype, we concentrate on how to convey the intention of a user to the partner in spite of the current limitations of spoken language processing technology. We measured the end-to-end performance of the prototype with a subjective measure to test whether the output of the system is understandable. We also used an objective measure to evaluate the system performance and found that its results are consistent with the subjective test. The test results show that the user can understand the output even when the system cannot translate the speech correctly. Thus it is important to provide even partially correct translation output to the user, so as not to neglect the possibility that the user can infer the intended message using the context and his/her intelligence.

ic970087.pdf




Multilingual Person to Person Communication at IRST

Authors:

Bianca Angelini, IRST (Italy)
Mauro Cettolo, IRST (Italy)
Anna Corazza, IRST (Italy)
Daniele Falavigna, IRST (Italy)
Gianni Lazzari, IRST (Italy)

Volume 1, Page 91

Abstract:

This paper describes a machine-mediated person-to-person multilingual communication system. The emphasis is on robustness, that is, the ability of the system to preserve communication even in the presence of the variability and errors typical of spoken language systems. The statistical approach is adopted not only at the acoustic level, but also for the linguistic processing. Therefore, while the global architecture is briefly introduced, the focus is on the acoustic recognizer and the understanding module. Experimental evaluations complete the presentation.

ic970091.pdf




Fast Word-Graph Generation For Spontaneous Conversational Speech Translation

Authors:

Tohru Shimizu, ATR-ITL (Japan)
Harald Singer, ATR-ITL (Japan)
Yoshinori Sagisaka, ATR-ITL (Japan)

Volume 1, Page 95

Abstract:

This paper introduces the latest advances in research at ATR on speech translation for spontaneous conversations, focusing especially on speech recognition efforts. For recognition, we employ a word search technique that generates moderate-sized word graphs in real time. To cope with the varying length of utterances in spontaneous speech (e.g., word, phrase, sentence fragment, sentence, and concatenated sentences), we have adopted a two-pass search strategy that uses variable-order word n-gram statistics in the first stage and task-dependent language constraints in the second stage. This strategy is evaluated using the "ATR Travel Arrangement" corpus.

ic970095.pdf




JANUS-III: Speech-to-Speech Translation in Multiple Languages

Authors:

Alon Lavie, Carnegie Mellon University (U.S.A.)
Alex Waibel, Carnegie Mellon University (U.S.A.)
Lori Levin, Carnegie Mellon University (U.S.A.)
Michael Finke, Carnegie Mellon University (U.S.A.)
Donna Gates, Carnegie Mellon University (U.S.A.)
Marsal Gavalda, Carnegie Mellon University (U.S.A.)
Torsten Zeppenfeld, Carnegie Mellon University (U.S.A.)
Puming Zhan, Carnegie Mellon University (U.S.A.)

Volume 1, Page 99

Abstract:

This paper describes JANUS-III, our most recent version of the JANUS speech-to-speech translation system. We present an overview of the system and focus on how system design facilitates speech translation between multiple languages, and allows for easy adaptation to new source and target languages. We also describe our methodology for evaluation of end-to-end system performance with a variety of source and target languages. For system development and evaluation, we have experimented with both push-to-talk and cross-talk recording conditions. To date, our system has achieved performance levels of over 80% acceptable translations on transcribed input, and over 75% acceptable translations on speech input recognized with a 75-90% word accuracy. Our current major research effort is concentrated on enhancing the capabilities of the system to deal with input in broad and general domains.

ic970099.pdf




State-Transition Cost Functions and an Application to Language Translation

Authors:

Hiyan Alshawi, AT&T Labs (U.S.A.)
Adam L. Buchsbaum, AT&T Labs (U.S.A.)

Volume 1, Page 103

Abstract:

We define a general method for ranking the solutions of a search process by associating costs with equivalence classes of state transitions of the process. We show how the method accommodates models based on probabilistic, discriminative, and distance cost functions, including assignment of costs to unseen events. By applying the method to our machine translation prototype, we are able to experiment with different cost functions and training procedures, including an unsupervised procedure for training the numerical parameters of our English-Chinese translation model. Results from these experiments show that the choice of cost function leads to significant differences in translation quality.
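As a rough illustration of the idea of attaching costs to equivalence classes of state transitions, the sketch below uses one possible probabilistic cost: the negative log of a smoothed relative frequency of each transition class, with one reserved slot so that unseen classes still receive a finite cost. This is an assumption made for illustration and not the authors' formulation; the names `train_costs`, `class_of`, and `rank`, the tuple layout of a transition, and the add-one style smoothing are all hypothetical.

```python
import math
from collections import Counter


def train_costs(observed, class_of, smoothing=1.0):
    """Build a cost function from observed transitions.

    class_of maps a transition to its equivalence class; every transition
    in the same class receives the same cost.
    """
    counts = Counter(class_of(t) for t in observed)
    total = sum(counts.values())
    # Assumption: reserve one extra class slot for events never seen in training.
    n_classes = len(counts) + 1

    def cost(transition):
        seen = counts.get(class_of(transition), 0)
        probability = (seen + smoothing) / (total + smoothing * n_classes)
        return -math.log(probability)

    return cost


def rank(solutions, cost):
    """Order candidate solutions (lists of transitions) by total cost, cheapest first."""
    return sorted(solutions, key=lambda path: sum(cost(t) for t in path))


if __name__ == "__main__":
    # Hypothetical transitions of the form (state, word, action); the class
    # ignores the state, so transitions from different states share a cost.
    observed = [("s0", "the", "shift"), ("s1", "the", "shift"), ("s2", "cat", "reduce")]
    cost = train_costs(observed, class_of=lambda t: t[1:])
    candidates = [[("s9", "dog", "shift")], [("s5", "the", "shift")]]
    print(rank(candidates, cost))
```

Swapping in a different `cost` function (for example a distance-based one) leaves `rank` unchanged, which is the kind of experimentation with alternative cost functions that the abstract describes.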

ic970103.pdf




Hybrid language processing in the Spoken Language Translator

Authors:

Manny Rayner, SRI International (U.K.)
David M. Carter, SRI International (U.K.)

Volume 1, Page 107

Abstract:

The paper presents an overview of the Spoken Language Translator (SLT) system's hybrid language-processing architecture, focussing on the way in which rule-based and statistical methods are combined to achieve robust and efficient performance within a linguistically motivated framework. In general, we argue that rules are desirable in order to encode domain-independent linguistic constraints and achieve high-quality grammatical output, while corpus-derived statistics are needed if systems are to be efficient and robust; further, that hybrid architectures are superior from the point of view of portability to architectures which only make use of one type of information. We address the topics of "multi-engine" strategies for robust translation; robust bottom-up parsing using pruning and grammar specialization; rational development of linguistic rule-sets using balanced domain corpora; and efficient supervised training by interactive disambiguation. All work described is fully implemented in the current version of the SLT-2 system.

ic970107.pdf




Finite-State Speech-to-Speech Translation

Authors:

Enrique Vidal, DSIC UPV (Spain)

Volume 1, Page 111

Abstract:

A fully integrated approach to Speech-Input Language Translation in limited-domain applications is presented. The mapping from the input to the output language is modeled in terms of a finite state translation model which is learned from examples of input-output sentences of the task considered. This model is tightly integrated with standard acoustic-phonetic models of the input language and the resulting global model directly supplies, through Viterbi search, an optimal output-language sentence for each input-language utterance. Several extensions to this framework, recently developed to cope with the increasing difficulty of translation tasks, are reviewed. Finally, results for a task in the framework of hotel front-desk communication, with a vocabulary of about 700 words, are reported.

ic970111.pdf




An Experimental Bidirectional Japanese/English Interpreting Video Phone System Using Internet

Authors:

Shoji Hiraoka, MRIT (Japan)
Masakatsu Hoshimi, MRIT (Japan)
Kenji Matsui, CRL (Japan)
Jean-Claude Junqua, STL (U.S.A.)

Volume 1, Page 115

Abstract:

In this paper we report on an experimental bidirectional Japanese/English interpreting video phone system that operates over the Internet. We particularly emphasize the motivation for this work, the task, and the experiments conducted. Using in-house technology developed both in Japan and in the United States, we demonstrated an Internet home shopping application in which an American shop assistant and a Japanese customer engaged in task-directed dialogues, each using their native language. The experiments showed that when users are familiar with the application language, a natural interaction can be obtained.

ic970115.pdf
