ICSLP'98 Proceedings
The Modeling and Realization of Natural Speech Generation System

Authors:
Chen Fang, Institute of Information Science, Northern Jiaotong University (China)
Page (NA) Paper number 1034

Abstract: This paper gives an overall discussion of the problems in Chinese natural speech generation. A Chinese bi-directional grammar is developed to support both Chinese language understanding and generation, and a comprehensive description of the structure of characteristic networks at all ranks of the language has been built up. In natural language generation, text planning is carried out first to extract the concrete content related to the semantics. Through text organization the internal generation structure is formed, and grammar realization transforms this internal structure into natural language. Once the natural-language text has been generated, the next step is to convert it into speech. We have built a speech characteristic database containing the speech of 50 thousand phrases and hundreds of pronunciation rules. After recognizing the structure of the input text and extracting its rhythmic characteristics, the database provides a complete mapping from Chinese characters to speech; every Chinese character in GB2312-80 can be rendered as speech. Based on the research above, a natural speech generation system is established which can automatically plan and organize the output sentences as natural speech. The synthetic speech has good naturalness and intelligibility.
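The abstract describes a three-stage generation pipeline (text planning, text organization, grammar realization) followed by text-to-speech conversion. The sketch below illustrates that staged structure only; all function names and the toy data are assumptions for illustration, not from the paper.

```python
# Illustrative sketch of the staged pipeline described in the abstract:
# text planning -> text organization -> grammar realization.
# All names and data are hypothetical.

def text_planning(semantics):
    """Select the content items relevant to the input semantics."""
    return [item for item in semantics if item.get("relevant", True)]

def text_organization(content):
    """Order selected content into an internal generation structure."""
    return sorted(content, key=lambda item: item.get("order", 0))

def grammar_realization(structure):
    """Map the internal structure to a surface string."""
    return " ".join(item["word"] for item in structure)

def generate(semantics):
    """Run the full planning -> organization -> realization pipeline."""
    return grammar_realization(text_organization(text_planning(semantics)))

print(generate([
    {"word": "weather", "order": 1},
    {"word": "today", "order": 0},
]))  # -> today weather
```

In the described system the realized text would then be passed to the phrase-level speech characteristic database for conversion to speech.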
0804_01.WAV (was 0804_01.WAV.gz) | Example sound file. File type: Sound; Format: NIST/Sphere; Sampling rate: 16 kHz; Bits per sample: 16; Encoding: Linear PCM; Creating application: Unknown; Creating OS: Unix
0804_02.WAV (was 0804_02.WAV.gz) | Example sound file. File type: Sound; Format: NIST/Sphere; Sampling rate: 16 kHz; Bits per sample: 16; Encoding: Linear PCM; Creating application: Unknown; Creating OS: Unix
0804_03.WAV (was 0804_03.WAV.gz) | Example sound file. File type: Sound; Format: NIST/Sphere; Sampling rate: 16 kHz; Bits per sample: 16; Encoding: Linear PCM; Creating application: Unknown; Creating OS: Unix
0804_04.WAV (was 0804_04.WAV.gz) | Example sound file. File type: Sound; Format: NIST/Sphere; Sampling rate: 16 kHz; Bits per sample: 16; Encoding: Linear PCM; Creating application: Unknown; Creating OS: Unix
0804_05.WAV (was 0804_05.WAV.gz) | Example sound file. File type: Sound; Format: NIST/Sphere; Sampling rate: 16 kHz; Bits per sample: 16; Encoding: Linear PCM; Creating application: Unknown; Creating OS: Unix
0804_06.WAV (was 0804_06.WAV.gz) | Example sound file. File type: Sound; Format: NIST/Sphere; Sampling rate: 16 kHz; Bits per sample: 16; Encoding: Linear PCM; Creating application: Unknown; Creating OS: Unix
Ismael García-Varea, Instituto Tecnológico de Informática, Universidad Politécnica de Valencia (Spain)
Francisco Casacuberta, Instituto Tecnológico de Informática, Universidad Politécnica de Valencia (Spain)
Hermann Ney, Lehrstuhl für Informatik VI, RWTH Aachen University of Technology (Germany)
The increasing interest in the statistical approach to Machine Translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the problems with Statistical Machine Translation is the design of efficient algorithms for translating a given input string. For some interesting models, only (good) approximate solutions can be found. Recently, a Dynamic-Programming-like algorithm was introduced which computes approximate solutions for some models. These solutions can be improved by an iterative algorithm that refines the successive solutions and uses a smoothing technique for some probability distributions of the models, based on an interpolation of different distributions. The technique resulting from this combination has been tested on the "Tourist Task" corpus, which was generated in a semi-automated way. The best results achieved were a translation word-error rate of 9.3% and a sentence-error rate of 44.4%.
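The smoothing mentioned in the abstract interpolates several probability distributions into one. A minimal sketch of such linear interpolation, with illustrative dictionary-based distributions (the paper's actual model distributions and weights are not specified here):

```python
# Linear interpolation of probability distributions: a common smoothing
# technique. Distributions are dicts mapping an event to its probability;
# weights must sum to 1. This is a generic sketch, not the paper's exact model.

def interpolate(dists, weights):
    """Return the weighted mixture of the given distributions."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    events = set().union(*dists)
    return {
        e: sum(w * d.get(e, 0.0) for w, d in zip(weights, dists))
        for e in events
    }

# A sharp distribution smoothed with a flatter one: events unseen in the
# first distribution now receive non-zero probability.
smoothed = interpolate([{"a": 1.0}, {"a": 0.5, "b": 0.5}], [0.5, 0.5])
print(smoothed)  # -> {'a': 0.75, 'b': 0.25} (key order may vary)
```

The mixture remains a proper distribution: the interpolated probabilities still sum to 1.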
Barbara Gawronska, University of Skövde (Sweden)
David House, KTH (Royal Institute of Technology) (Sweden)
This paper describes an experimental dialog system designed to retrieve information and generate summaries of internet news reports related to user queries in Swedish and English. The extraction component is based on parsing and on matching the parsing output against stereotypic event templates. Bilingual text generation is accomplished by filling the templates, after which grammar components generate the final text. The interfaces between the templates and the language-specific text generators are marked for prosodic information, resulting in a text output where deaccentuation, accentuation, levels of focal accentuation, and phrasing are specified. These prosodic markers, which are primarily dependent on the givenness/newness structure of the text, modify the default prosody rules of the text-to-speech system, which then reads the text with subsequent improvement in intonation.
Joris Hulstijn, University of Twente (The Netherlands)
Arjan van Hessen, University of Twente (The Netherlands)
This paper discusses the utterance generation module of a spoken dialogue system for transactions. Transactions are interesting because they involve obligations of both parties: the system should provide all relevant information; the user should feel committed to the transaction once it has been concluded. Utterance generation plays a major role in this. The utterance generation module works with prosodically annotated utterance templates. An appropriate template for a given dialogue act is selected by the following parameters: utterance type, body of the template, given information, wanted and new information. Templates respect rules of accenting and deaccenting.
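The abstract describes selecting a prosodically annotated utterance template by parameters such as utterance type and given/wanted/new information. A toy sketch of such parameter-driven template selection; the field names, slot syntax, and example templates are assumptions for illustration only:

```python
# Hypothetical sketch of template selection by dialogue-act parameters.
# Template fields and slot markers (*SLOT*) are illustrative assumptions.

TEMPLATES = [
    {"type": "confirm", "given": True,  "text": "So you want to travel to *TO_PLACE*?"},
    {"type": "inform",  "given": False, "text": "The train leaves at *TIME*."},
]

def select_template(utt_type, given):
    """Return the text of the first template matching the parameters."""
    for t in TEMPLATES:
        if t["type"] == utt_type and t["given"] == given:
            return t["text"]
    return None  # no matching template

print(select_template("confirm", True))
# -> So you want to travel to *TO_PLACE*?
```

In the described system the selected template additionally carries prosodic annotation, so accenting and deaccenting rules apply when the slots are filled.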
Kai Ishikawa, ATR Interpreting Telecommunications Research Laboratories (Japan)
Eiichiro Sumita, ATR Interpreting Telecommunications Research Laboratories (Japan)
Hitoshi Iida, ATR Interpreting Telecommunications Research Laboratories (Japan)
In speech translation, recognition errors produced by the speech recognition process can cause parsing and translation errors. Because of this, the development of a robust error handling framework is essential to improve the performance of a speech translation system. Previously, a robust translation method was proposed by Wakita, which translates only the reliable parts of utterances. In this method, however, the recall of translated parts over a whole utterance is low, and sometimes no translation is output at all. In this paper, we propose an example-based error recovery method to solve the low-recall problem of Wakita's method. The proposed method recovers an unreliable utterance by repairing its parse-tree based on similar example parse-trees in the treebank. A recovered translation is then generated from the recovered tree.
Emiel Krahmer, IPO, Center for Research on User-System Interaction (The Netherlands)
Mariët Theune, IPO, Center for Research on User-System Interaction (The Netherlands)
Probably the best current algorithm for generating definite descriptions is the Incremental Algorithm due to Dale and Reiter. If we want to use this algorithm in a Concept-to-Speech system, however, we encounter two limitations: (1) the algorithm is insensitive to the linguistic context and thus always produces the same description for an object, (2) the output is a list of properties which uniquely determine one object from a set of objects: how this list is to be expressed in spoken natural language is not addressed. We propose a modification of the Incremental Algorithm based on the idea that a definite description refers to the most salient element in the current context satisfying the descriptive content. We show that the modified algorithm allows for the context-sensitive generation of both distinguishing and anaphoric descriptions, while retaining the attractive properties of Dale and Reiter's original algorithm.
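The abstract contrasts the original Incremental Algorithm with a salience-sensitive variant: a description succeeds once the target is the most salient object still matching it. A simplified sketch of that idea follows; the object representation, the preference order, and the numeric salience scores are assumptions for illustration, not the authors' exact formulation:

```python
# Simplified sketch of Dale & Reiter's Incremental Algorithm with the
# salience modification described above: attributes are tried in a fixed
# preference order, each kept only if it rules out distractors, and the
# search stops as soon as every remaining distractor is less salient than
# the target. Data representation is a hypothetical assumption.

def incremental_describe(target, objects, preferred_attrs):
    description = {}
    distractors = [o for o in objects if o is not target]
    for attr in preferred_attrs:
        value = target["props"].get(attr)
        ruled_out = [o for o in distractors if o["props"].get(attr) != value]
        if ruled_out:
            description[attr] = value
            distractors = [o for o in distractors if o not in ruled_out]
        # salience modification: succeed once the target is the most
        # salient object still satisfying the description
        if all(o["salience"] < target["salience"] for o in distractors):
            return description
    return None  # no distinguishing description found

dog1 = {"props": {"type": "dog", "color": "brown"}, "salience": 10}
dog2 = {"props": {"type": "dog", "color": "black"}, "salience": 5}
cat  = {"props": {"type": "cat", "color": "black"}, "salience": 5}
print(incremental_describe(dog1, [dog1, dog2, cat], ["type", "color"]))
# -> {'type': 'dog'}: "the dog" suffices, since dog1 outranks the other dog
```

With the original, salience-blind algorithm, "the dog" alone would be ambiguous here; the modification licenses the shorter anaphoric description because dog1 is currently the most salient dog.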
Lori Levin, Carnegie Mellon University (USA)
Donna Gates, Carnegie Mellon University (USA)
Alon Lavie, Carnegie Mellon University (USA)
Alex Waibel, Carnegie Mellon University (USA)
This paper describes an interlingua for spoken language translation that is based on domain actions in the travel planning domain. Domain actions are composed of speech acts (e.g., request-information), attributes (e.g., size, price), and objects (e.g., hotel, flight) and can take arguments. Development of the interlingua is guided by a database containing travel dialogues in English, Korean, Japanese, and Italian. There are currently 423 domain actions that cover hotel reservation and transportation. The interlingua will soon be extended to cover tours, tourist attractions, and events. The interlingua is used by the C-STAR speech translation consortium for translating travel planning dialogues in six languages: English, Japanese, German, Korean, Italian, and French. The paper also addresses the role of the interlingua in Carnegie Mellon's JANUS translation system.
Sandra Williams, Microsoft Research Institute (Australia)
This paper describes a concept-to-speech system for generating spoken descriptions of routes between places within the Macquarie University Computing Department. The Natural Language Generation (NLG) component of the system generates a textual route description marked with intonational information. The discourse structure of the route description is closely related to the knowledge representation of the route. The NLG component includes a pitch accenting algorithm which places appropriate pitch accents on elements of the utterance requiring particular emphasis or stress. Our pitch accenting algorithm uses a domain knowledge base and a discourse history. From these it determines whether information selected to form the content of the utterance is shared mutual domain knowledge, given information, or new information. It can then assign an appropriate pitch accent to one word in each prosodic phrase. The text-to-speech component then determines the appropriate syllable to be accented in the word.
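The abstract's core mechanism, accenting one word per prosodic phrase based on given/new status tracked in a discourse history, can be sketched roughly as below. The choice of the last new word as the accent bearer and the set-based discourse history are simplifying assumptions for illustration, not the paper's exact algorithm:

```python
# Hypothetical sketch of given/new-based pitch accent assignment:
# one word per prosodic phrase is accented, preferring the last word
# not yet mentioned in the discourse history.

def assign_accents(phrases, discourse_history):
    """phrases: list of prosodic phrases (lists of words).
    Returns each phrase as (word, accented?) pairs; updates the history."""
    accented = []
    for phrase in phrases:
        new_words = [w for w in phrase if w.lower() not in discourse_history]
        # accent the last new word; fall back to the phrase-final word
        focus = new_words[-1] if new_words else phrase[-1]
        accented.append([(w, w == focus) for w in phrase])
        discourse_history.update(w.lower() for w in phrase)
    return accented

history = set()
print(assign_accents([["turn", "left"], ["then", "turn", "right"]], history))
# second "turn" is given (deaccented); "right" carries the accent
```

A text-to-speech component would then, as the abstract notes, decide which syllable of each accented word receives the accent.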
Tobias Ruland, Siemens AG (Germany)
C. J. Rupp, University of the Saarland (Germany)
Jörg Spilker, University of Erlangen-Nürnberg (Germany)
Hans Weber, University of Erlangen-Nürnberg (Germany)
Karsten L. Worm, University of the Saarland (Germany)
This paper describes ongoing research on robust spoken language understanding in the context of the Verbmobil speech-to-speech machine translation project. We focus on recent developments in the processing steps which map a word lattice to semantic representations. The approach described first applies speech-repair correction to word lattices. Four analysis methods of varying depth are then applied in parallel to the normalized word lattices, producing output for sub-portions of the lattice in the same semantic description language, the VIT format. These fragmentary analyses are stored and combined by a further processing component, which finally selects a sequence of semantic representations as a result.
Jon R.W. Yi, MIT Laboratory for Computer Science (USA)
James R. Glass, MIT Laboratory for Computer Science (USA)
The goal of this work was to develop a speech synthesis system which concatenates variable-length units to create natural-sounding speech. Our initial work showed that by careful design of system responses to ensure consistent intonation contours, natural-sounding speech synthesis was achievable with word- and phrase-level concatenation. In order to extend the flexibility of this framework, we focused on generating novel words from a corpus of sub-word units. The design of the corpus was motivated by perceptual experiments that investigated where speech could be spliced with minimal audible distortion and what contextual constraints were necessary to maintain in order to produce natural-sounding speech. From this sub-word corpus, a Viterbi search selects a sequence of units based on how well they match the input specification and concatenation constraints. This concatenative speech synthesis system, ENVOICE, has been used in a conversational system in two application domains to convert meaning representations into speech waveforms.
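The Viterbi search over candidate units described in the abstract minimizes a combination of how well each unit matches the input specification (target cost) and how well adjacent units join (concatenation cost). A generic dynamic-programming sketch of such unit selection; the cost functions and unit representation are illustrative assumptions, not ENVOICE's actual costs:

```python
# Generic sketch of Viterbi unit selection for concatenative synthesis:
# pick one unit per position minimizing total target + concatenation cost.
# Units are plain strings here; real systems use rich acoustic features.

def viterbi_select(spec, candidates, target_cost, concat_cost):
    """spec[i]: desired unit at position i; candidates[i]: available units.
    Returns the unit sequence with minimal total cost."""
    # best[i][u] = (cheapest cost of a path ending in unit u, backpointer)
    best = [{u: (target_cost(spec[0], u), None) for u in candidates[0]}]
    for i in range(1, len(spec)):
        layer = {}
        for u in candidates[i]:
            prev, cost = min(
                ((p, c + concat_cost(p, u)) for p, (c, _) in best[i - 1].items()),
                key=lambda pc: pc[1])
            layer[u] = (cost + target_cost(spec[i], u), prev)
        best.append(layer)
    # trace back from the cheapest final unit
    u = min(best[-1], key=lambda k: best[-1][k][0])
    path = [u]
    for i in range(len(spec) - 1, 0, -1):
        u = best[i][u][1]
        path.append(u)
    return list(reversed(path))

# Toy costs: prefer units whose name starts with the spec symbol, and
# joins that follow unit "a1".
tc = lambda s, u: 0 if u.startswith(s) else 1
cc = lambda p, u: 0 if p == "a1" else 1
print(viterbi_select(["a", "b"], [["a1", "a2"], ["b1"]], tc, cc))
# -> ['a1', 'b1']
```

The search is linear in the number of positions and quadratic in the candidates per position, which keeps selection tractable even for large unit corpora.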
1151_01.WAV | 1st of 3 example waveforms from section 2. File type: Sound; Format: WAV; Sampling rate: 16000 Hz; Bits per sample: 16; Channels: mono; Encoding: PCM; Creating application: Unknown; Creating OS: Unknown
1151_02.WAV | 2nd of 3 example waveforms from section 2. File type: Sound; Format: WAV; Sampling rate: 16000 Hz; Bits per sample: 16; Channels: mono; Encoding: PCM; Creating application: Unknown; Creating OS: Unknown
1151_03.WAV | 3rd of 3 example waveforms from section 2. File type: Sound; Format: WAV; Sampling rate: 16000 Hz; Bits per sample: 16; Channels: mono; Encoding: PCM; Creating application: Unknown; Creating OS: Unknown
1151_04.WAV | 1st of 2 example waveforms from section 6. File type: Sound; Format: WAV; Sampling rate: 16000 Hz; Bits per sample: 16; Channels: mono; Encoding: PCM; Creating application: Unknown; Creating OS: Unknown
1151_05.WAV | 2nd of 2 example waveforms from section 6. File type: Sound; Format: WAV; Sampling rate: 16000 Hz; Bits per sample: 16; Channels: mono; Encoding: PCM; Creating application: Unknown; Creating OS: Unknown