Authors:
Esther Klabbers, IPO, Center for Research on User-System Interaction (The Netherlands)
Emiel Krahmer, IPO, Center for Research on User-System Interaction (The Netherlands)
Mariët Theune, IPO, Center for Research on User-System Interaction (The Netherlands)
Paper number 278
Abstract:
The defining property of a concept-to-speech system is that it combines
language and speech generation. Language generation converts the input
concepts into natural language, which speech generation subsequently
transforms into speech. Potentially, this leads to more natural-sounding
output than can be achieved in a plain text-to-speech system, since the
correct placement of pitch accents and intonational boundaries, an
important factor contributing to the naturalness of the generated speech,
is co-determined by syntactic and discourse information that is typically
available in the language generation module. In this paper, a generic
algorithm for the generation of coherent spoken monologues, called D2S,
is discussed. Language generation is performed by a module called the
LGM, which is based on TAG-like syntactic structures with open slots,
combined with conditions that determine when a given syntactic structure
can be used appropriately. A speech generation module converts the output
of the LGM into speech using either phrase concatenation or diphone
synthesis.
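The combination of syntactic structures with open slots and applicability
conditions can be pictured with a small sketch. The Python below is an
assumption-based illustration, not the LGM implementation; the Template
class, the example template, and the context dictionary are hypothetical.

# A minimal sketch, assuming a simple dictionary-based discourse context;
# not the LGM code, and the example template is hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Template:
    # syntactic structure with named open slots
    structure: str
    # condition that determines when the structure can be used
    condition: Callable[[Dict], bool]

    def applicable(self, context: Dict) -> bool:
        return self.condition(context)

    def realize(self, context: Dict) -> str:
        # fill the open slots with values taken from the context
        return self.structure.format(**context["slots"])

# Hypothetical template: use it only if the player is not yet mentioned.
template = Template(
    structure="{player} scored in the {minute}th minute.",
    condition=lambda ctx: ctx["slots"]["player"] not in ctx["mentioned"],
)

context = {"slots": {"player": "Kluivert", "minute": 38}, "mentioned": set()}
if template.applicable(context):
    print(template.realize(context))  # Kluivert scored in the 38th minute.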
Authors:
Janet Hitzeman, Centre for Speech Technology Research, University of Edinburgh (U.K.)
Alan W. Black, Centre for Speech Technology Research, University of Edinburgh (U.K.)
Paul Taylor, Centre for Speech Technology Research, University of Edinburgh (U.K.)
Chris Mellish, Department of Artificial Intelligence, University of Edinburgh (U.K.)
Jon Oberlander, Human Communication Research Centre, University of Edinburgh (U.K.)
Paper number 591
Abstract:
This paper describes the latest version of the SOLE concept-to-speech
system, which uses linguistic information provided by a natural language
generation system to improve the prosody of synthetic speech. We discuss
the types of linguistic information that prove most useful and the
implications for text-to-speech systems.
Authors:
Hiyan Alshawi, AT&T Labs (USA)
Srinivas Bangalore, AT&T Labs (USA)
Shona Douglas, AT&T Labs (USA)
Paper number 293
Abstract:
We describe a method for learning head-transducer models of translation
automatically from examples consisting of transcribed spoken utterances
and reference translations of the utterances. The method proceeds
by first searching for a hierarchical alignment (specifically a synchronized
dependency tree) of each training example. The alignments produced
are optimal with respect to a cost function that takes into account
co-occurrence statistics and the recursive decomposition of the example
into aligned substrings. A probabilistic head-transducer model is
then constructed from the alignments. We report results of applying
the method to English-to-Spanish translation in the domain of air travel
information and English-to-Japanese translation in the domain of telephone
operator assistance. We also report on a variation on this model-construction
method in which multi-word pairings are used in the computation of
the hierarchical alignments and head-transducer models.
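As a rough illustration of the two ingredients named above, a
co-occurrence-based pairing cost and a synchronized dependency tree whose
total cost decomposes recursively over aligned substructures, the Python
sketch below may help. It is an assumed simplification, not the authors'
algorithm; the function and class names are hypothetical.

# A minimal sketch, assuming counts collected from the parallel training
# data; not the paper's implementation.
import math
from dataclasses import dataclass, field
from typing import List, Optional

def pairing_cost(cooc: int, src_count: int, tgt_count: int, total: int) -> float:
    # negative pointwise mutual information as a co-occurrence-based cost
    if cooc == 0:
        return float("inf")  # words that never co-occur are effectively forbidden pairs
    p_pair = cooc / total
    p_src = src_count / total
    p_tgt = tgt_count / total
    return -math.log(p_pair / (p_src * p_tgt))

@dataclass
class SyncNode:
    # one node of a synchronized dependency tree: a source head aligned
    # with a target head (None marks an unaligned word), plus the aligned
    # substructures of their dependents
    src_head: Optional[str]
    tgt_head: Optional[str]
    cost: float = 0.0
    children: List["SyncNode"] = field(default_factory=list)

    def total_cost(self) -> float:
        # cost of the hierarchical alignment: local pairing cost plus the
        # costs of the recursively decomposed, aligned substrings
        return self.cost + sum(child.total_cost() for child in self.children)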
Authors:
Toshiaki Fukada, ATR-ITL (Japan)
Detlef Koll, CMU-ISL (USA)
Alex Waibel, CMU-ISL (USA)
Kouichi Tanigaki, ATR-ITL (Japan)
Paper number 657
Abstract:
This paper describes a probabilistic method for dialogue act (DA) extraction
for concept-based multilingual translation systems. A DA is a unit
of a semantic interlingua consisting of speaker information, a speech
act, a concept, and an argument. Probabilistic models for the extraction
of speech acts or concepts are trained as speech-act-dependent or
concept-dependent word n-gram models. The proposed method is evaluated
on DA-annotated English and Japanese databases. The experimental results
show that the proposed method performs better than the conventional
grammar-based approach. In addition, the proposed method is much more
robust against erroneous inputs such as those obtained from speech
recognition.
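The extraction step can be pictured with a small sketch: one word n-gram
model is trained per speech act, and an input utterance is labelled with
the act whose model assigns it the highest likelihood. The Python below
is an assumed simplification (bigram models with add-one smoothing), not
the authors' system; all names are hypothetical.

# A minimal sketch, assuming tokenized utterances; not the proposed
# system's code.
import math
from collections import defaultdict
from typing import Dict, List

class BigramModel:
    def __init__(self) -> None:
        self.bigram = defaultdict(lambda: defaultdict(int))
        self.context = defaultdict(int)
        self.vocab = set()

    def train(self, utterances: List[List[str]]) -> None:
        for words in utterances:
            padded = ["<s>"] + words
            for prev, cur in zip(padded, padded[1:]):
                self.bigram[prev][cur] += 1
                self.context[prev] += 1
                self.vocab.add(cur)

    def log_prob(self, words: List[str]) -> float:
        # add-one smoothing so unseen word pairs keep a nonzero probability
        padded = ["<s>"] + words
        v = len(self.vocab) + 1
        return sum(
            math.log((self.bigram[prev][cur] + 1) / (self.context[prev] + v))
            for prev, cur in zip(padded, padded[1:])
        )

def extract_speech_act(utterance: List[str], models: Dict[str, BigramModel]) -> str:
    # choose the speech act whose act-dependent n-gram model scores the input highest
    return max(models, key=lambda act: models[act].log_prob(utterance))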
Authors:
Ye-Yi Wang, Carnegie Mellon University (USA)
Alex Waibel, Carnegie Mellon University (USA)
Paper number 826
Abstract:
We investigated an efficient decoding algorithm for statistical machine
translation. Compared with other algorithms, the new algorithm is
applicable to different translation models and is much faster. Experiments
showed that it achieved overall performance comparable to state-of-the-art
decoding algorithms.
Authors:
Toshiyuki Takezawa, ATR Interpreting Telecommunications Research Laboratories (Japan)
Tsuyoshi Morimoto, Fukuoka University (Japan)
Yoshinori Sagisaka, ATR Interpreting Telecommunications Research Laboratories (Japan)
Nick Campbell, ATR Interpreting Telecommunications Research Laboratories (Japan)
Hitoshi Iida, ATR Interpreting Telecommunications Research Laboratories (Japan)
Fumiaki Sugaya, ATR Interpreting Telecommunications Research Laboratories (Japan)
Akio Yokoo, ATR Interpreting Telecommunications Research Laboratories (Japan)
Seiichi Yamamoto, ATR Interpreting Telecommunications Research Laboratories (Japan)
Paper number 957
Abstract:
We have built a new speech translation system called ATR-MATRIX (ATR's
Multilingual Automatic Translation System for Information Exchange).
This system can recognize natural Japanese utterances such as those
used in daily life, translate them into English, and output synthesized
speech. The system runs on a workstation or a high-end PC and achieves
nearly real-time processing. The current implementation handles a hotel
room reservation task. We plan to develop a bidirectional speech
translation system, i.e., Japanese-to-English and English-to-Japanese.
We also plan to develop multilingual output functions for ATR-MATRIX
(Japanese to English, German, and Korean) for the international joint
experiment of C-STAR II (Consortium for Speech Translation Advanced
Research).
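The recognize-translate-synthesize pipeline the abstract describes can be
sketched as follows; the component interfaces are assumptions introduced
for illustration and do not correspond to the ATR-MATRIX implementation.

# A minimal sketch of the pipeline, with assumed component interfaces.
from typing import Protocol

class Recognizer(Protocol):
    def recognize(self, audio: bytes) -> str: ...

class Translator(Protocol):
    def translate(self, text: str) -> str: ...

class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...

def speech_translate(audio: bytes, asr: Recognizer,
                     mt: Translator, tts: Synthesizer) -> bytes:
    japanese_text = asr.recognize(audio)        # recognize the Japanese utterance
    english_text = mt.translate(japanese_text)  # translate it into English
    return tts.synthesize(english_text)         # output synthesized English speech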