Spoken Language Generation and Translation 1

ICSLP'98 Proceedings

The Modeling and Realization of Natural Speech Generation System

Authors:

Chen Fang, Institute of Information Science, Northern Jiaotong University (China)
Yuan Baozong, Institute of Information Science, Northern Jiaotong University (China)

Page (NA) Paper number 1034

Abstract:

The paper gives an overall discussion of problems in Chinese natural speech generation. A Chinese Bi-directional Grammar is developed that is suited to both Chinese language understanding and generation. A comprehensive description of the structure of the characteristic network at all ranks of the language has been built up. In natural language generation, text planning is performed first to extract the concrete content related to the semantics. Through text organization, the internal generation structure is formed. Grammar realization then transforms the internal structure into natural language. After the natural-language text has been generated, the next step is to convert the text into speech. We built a speech characteristic database containing speech for 50,000 phrases and hundreds of pronunciation rules. After recognizing the structure of the input text and extracting its rhythm characteristics, the database provides a complete description from Chinese characters to speech. Every Chinese character in GB2312-80 can be converted to speech. Based on all of the above research, a natural speech generation system has been established. It can automatically plan and organize the output sentences in natural speech. The synthetic speech has good naturalness and intelligibility.

SL981034.PDF (From Author) SL981034.PDF (Scanned)


"Ko Tok Ples Ensin bilong Tok Pisin" or The TP-CLE: A First Report From a Pilot Speech-to-Speech Translation Project From Swedish to Tok Pisin

Authors:

Robert Eklund, Telia Research AB (Sweden)

Page (NA) Paper number 804

Abstract:

This paper describes an operational speech-to-speech translation system from Swedish to Tok Pisin within the framework of the Spoken Language Translator project. The domain of translation is ATIS. The grammar formalism used in the SLT project is the Core Language Engine. A general presentation of Tok Pisin is provided, as well as a description of some grammatical characteristics of Tok Pisin of potential interest for the testing of grammar machines. The first step of a CLE implementation of Tok Pisin is described. A corpus of Tok Pisin ATIS data has been created from data collected on location in New Ireland, Papua New Guinea, and observations are made as to the relative importance of some of the grammatical phenomena discussed in the paper. A Tok Pisin synthesizer based on an already existing Swedish concatenative synthesis is described. Despite a marked Swedish accent, preliminary evaluation indicates that intelligible speech output is produced.

SL980804.PDF (From Author) SL980804.PDF (Rasterized)

0804_01.WAV
(was: 0804_01.WAV.gz)
Example sound file.
File type: Sound File
Format: NIST/Sphere
Tech. description: Sampling rate: 16 kHz, Bits-per-sample: 16, Encoding: Linear PCM
Creating Application: Unknown
Creating OS: unix
0804_02.WAV
(was: 0804_02.WAV.gz)
Example sound file.
File type: Sound File
Format: NIST/Sphere
Tech. description: Sampling rate: 16 kHz, Bits-per-sample: 16, Encoding: Linear PCM
Creating Application: Unknown
Creating OS: unix
0804_03.WAV
(was: 0804_03.WAV.gz)
Example sound file.
File type: Sound File
Format: NIST/Sphere
Tech. description: Sampling rate: 16 kHz, Bits-per-sample: 16, Encoding: Linear PCM
Creating Application: Unknown
Creating OS: unix
0804_04.WAV
(was: 0804_04.WAV.gz)
Example sound file.
File type: Sound File
Format: NIST/Sphere
Tech. description: Sampling rate: 16 kHz, Bits-per-sample: 16, Encoding: Linear PCM
Creating Application: Unknown
Creating OS: unix
0804_05.WAV
(was: 0804_05.WAV.gz)
Example sound file.
File type: Sound File
Format: NIST/Sphere
Tech. description: Sampling rate: 16 kHz, Bits-per-sample: 16, Encoding: Linear PCM
Creating Application: Unknown
Creating OS: unix
0804_06.WAV
(was: 0804_06.WAV.gz)
Example sound file.
File type: Sound File
Format: NIST/Sphere
Tech. description: Sampling rate: 16 kHz, Bits-per-sample: 16, Encoding: Linear PCM
Creating Application: Unknown
Creating OS: unix


An Iterative, DP-Based Search Algorithm For Statistical Machine Translation

Authors:

Ismael García-Varea, Instituto Tecnológico de Informática, Universidad Politécnica de Valencia (Spain)
Francisco Casacuberta, Instituto Tecnológico de Informática, Universidad Politécnica de Valencia (Spain)
Hermann Ney, Lehrstuhl für Informatik VI, RWTH Aachen, University of Technology (Germany)

Page (NA) Paper number 209

Abstract:

The increasing interest in the statistical approach to Machine Translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the problems with Statistical Machine Translation is the design of efficient algorithms for translating a given input string. For some interesting models, only (good) approximate solutions can be found. Recently, a Dynamic-Programming-like algorithm has been introduced that computes approximate solutions for some models. These solutions can be improved by an iterative algorithm that refines the successive solutions and applies a smoothing technique, based on an interpolation of different distributions, to some of the models' probability distributions. The technique resulting from this combination has been tested on the "Tourist Task" corpus, which was generated semi-automatically. The best results achieved were a translation word-error rate of 9.3% and a sentence-error rate of 44.4%.
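The abstract reports results as a translation word-error rate and a sentence-error rate. The paper's exact definitions are not given here, so the following is only a minimal sketch under common assumptions: WER is the word-level Levenshtein distance to a single reference divided by the number of reference words, and SER is the percentage of output sentences that differ from their reference at all. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] = distance between the first i-1 hypothesis words and ref[:j]
    prev = list(range(len(ref) + 1))
    for i in range(1, len(hyp) + 1):
        curr = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # extra hypothesis word
                          curr[j - 1] + 1,     # missing reference word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(hyps: List[str], refs: List[str]) -> Tuple[float, float]:
    """Return (WER, SER) in percent; assumes at least one reference word."""
    errors = 0
    ref_words = 0
    wrong_sentences = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        d = edit_distance(h, r)
        errors += d
        ref_words += len(r)
        wrong_sentences += int(d > 0)  # any error makes the sentence wrong
    return 100.0 * errors / ref_words, 100.0 * wrong_sentences / len(refs)
```

For example, translating as "the hotel is near the station" against the reference "the hotel is close to the station" costs one substitution and one insertion, giving a WER of 2/7 (about 28.6%) and a SER of 100% for that single sentence. This also shows why SER is typically much higher than WER: one wrong word is enough to count the whole sentence as an error.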

SL980209.PDF (From Author) SL980209.PDF (Rasterized)


Information Extraction and Text Generation of News Reports for a Swedish-English Bilingual Spoken Dialogue System

Authors:

Barbara Gawronska, University of Skövde (Sweden)
David House, KTH (Royal Institute of Technology) (Sweden)

Page (NA) Paper number 1047

Abstract:

This paper describes an experimental dialog system designed to retrieve information and generate summaries of internet news reports related to user queries in Swedish and English. The extraction component is based on parsing and on matching the parsing output against stereotypic event templates. Bilingual text generation is accomplished by filling the templates, after which grammar components generate the final text. The interfaces between the templates and the language-specific text generators are marked for prosodic information, resulting in text output in which deaccentuation, accentuation, levels of focal accentuation, and phrasing are specified. These prosodic markers, which depend primarily on the givenness/newness structure of the text, modify the default prosody rules of the text-to-speech system, which then reads the text with improved intonation.
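The idea of deriving accent markers from givenness can be illustrated with a minimal sketch. It assumes a discourse history of previously mentioned content words: a new content word is marked for accent, a word already in the history is deaccented, and function words are left unmarked. The label names and the small function-word list are hypothetical, and the sketch ignores the levels of focal accentuation and phrasing that the system also specifies.

```python
from typing import List, Set, Tuple

# Hypothetical closed-class list; a real system would rely on the parser's tags.
FUNCTION_WORDS = frozenset({"the", "a", "an", "of", "in", "on", "to", "and", "is", "was"})


def mark_accents(words: List[str], history: Set[str]) -> List[Tuple[str, str]]:
    """Label each word 'accent' (new), 'deaccent' (given) or 'none' (function word).

    `history` is updated in place, so later sentences see earlier mentions.
    """
    marked = []
    for word in words:
        key = word.lower()
        if key in FUNCTION_WORDS:
            marked.append((word, "none"))
        elif key in history:
            marked.append((word, "deaccent"))  # given: already mentioned
        else:
            marked.append((word, "accent"))    # new: first mention
            history.add(key)
    return marked
```

Processing "The minister resigned" and then "The minister was replaced" with a shared history accents "minister" in the first sentence and deaccents it in the second, while "replaced" is accented as new information.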

SL981047.PDF (From Author) SL981047.PDF (Rasterized)


Utterance Generation for Transaction Dialogues

Authors:

Joris Hulstijn, University of Twente (The Netherlands)
Arjan van Hessen, University of Twente (The Netherlands)

Page (NA) Paper number 776

Abstract:

This paper discusses the utterance generation module of a spoken dialogue system for transactions. Transactions are interesting because they involve obligations on both parties: the system should provide all relevant information, and the user should feel committed to the transaction once it has been concluded. Utterance generation plays a major role in this. The utterance generation module works with prosodically annotated utterance templates. An appropriate template for a given dialogue act is selected according to the following parameters: utterance type, body of the template, given information, wanted information, and new information. Templates respect the rules of accenting and deaccenting.
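Template selection by these parameters can be sketched as a simple lookup. This is a minimal illustration, not the module described in the paper: it assumes each template records its utterance type, body, the slots filled with given information (rendered deaccented), the slots filled with new information (rendered accented), and the information it asks for ("wanted"). The `Template` class, the `+word+` accent markup, and the theatre-ticket data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Template:
    utt_type: str            # e.g. "inform", "question" (hypothetical labels)
    body: str                # e.g. "price", "tickets"
    text: str                # slots written as {name}
    new_slots: frozenset     # fillers carry an accent
    given_slots: frozenset   # fillers are deaccented
    wanted: Optional[str] = None  # information the system asks for, if any


def select(templates: List[Template], utt_type: str, body: str,
           given: Dict[str, str], new: Dict[str, str],
           wanted: Optional[str]) -> Optional[Template]:
    """Return the first template whose parameters match the dialogue act."""
    for t in templates:
        if (t.utt_type == utt_type and t.body == body
                and t.given_slots.issubset(given)
                and t.new_slots.issubset(new)
                and t.wanted == wanted):
            return t
    return None


def realize(t: Template, given: Dict[str, str], new: Dict[str, str]) -> str:
    """Fill the slots; accented fillers are wrapped as +word+ (hypothetical markup)."""
    fillers = {k: given[k] for k in t.given_slots}
    fillers.update({k: f"+{new[k]}+" for k in t.new_slots})
    return t.text.format(**fillers)
```

With a template "Tickets for {show} cost {price}." where the show is given and the price is new, the realized text is "Tickets for Hamlet cost +25 guilders+.", so only the new information receives the accent marker.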

SL980776.PDF (From Author) SL980776.PDF (Rasterized)


Example-Based Error Recovery Method For Speech Translation: Repairing Sub-Trees According to the Semantic Distance

Authors:

Kai Ishikawa, ATR Interpreting Telecommunications Research Laboratories (Japan)
Eiichiro Sumita, ATR Interpreting Telecommunications Research Laboratories (Japan)
Hitoshi Iida, ATR Interpreting Telecommunications Research Laboratories (Japan)

Page (NA) Paper number 725

Abstract:

In speech translation, recognition errors produced by the speech recognition process can cause parsing and translation errors. The development of a robust error-handling framework is therefore essential to improving the performance of speech translation systems. Wakita previously proposed a robust translation method that translates only the reliable parts of utterances. With this method, however, the recall of translated parts over a whole utterance is low, and sometimes no translation is output at all. In this paper, we propose an example-based error recovery method to solve the low-recall problem of Wakita's method. The proposed method recovers an unreliable utterance by repairing its parse tree based on similar example parse trees in the treebank. A recovered translation is then generated from the recovered tree.

SL980725.PDF (From Author) SL980725.PDF (Rasterized)


Context Sensitive Generation of Descriptions

Authors:

Emiel Krahmer, IPO, Center for Research on User-System Interaction (The Netherlands)
Mariët Theune, IPO, Center for Research on User-System Interaction (The Netherlands)

Page (NA) Paper number 277

Abstract:

Probably the best current algorithm for generating definite descriptions is the Incremental Algorithm due to Dale and Reiter. If we want to use this algorithm in a Concept-to-Speech system, however, we encounter two limitations: (1) the algorithm is insensitive to the linguistic context and thus always produces the same description for an object; (2) the output is a list of properties that uniquely distinguishes one object from a set of objects, and how this list is to be expressed in spoken natural language is not addressed. We propose a modification of the Incremental Algorithm based on the idea that a definite description refers to the most salient element in the current context satisfying the descriptive content. We show that the modified algorithm allows for the context-sensitive generation of both distinguishing and anaphoric descriptions, while retaining the attractive properties of Dale and Reiter's original algorithm.
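The core loop of the Incremental Algorithm, and the salience idea behind the proposed modification, can be sketched as follows. This is a simplified sketch, not the authors' implementation: it omits basic-level value selection, assumes every object has a "type" attribute, and models the modification only by restricting the distractors to objects at least as salient as the target. The salience numbers and the toy domain are hypothetical.

```python
from typing import Dict, List, Optional

Obj = Dict[str, str]  # attribute -> value; every object is assumed to have a "type"


def incremental(target: str, domain: Dict[str, Obj], preferred: List[str],
                salience: Optional[Dict[str, int]] = None) -> Optional[Dict[str, str]]:
    """Simplified Incremental Algorithm with an optional salience restriction."""
    props = domain[target]
    if salience is None:
        distractors = {o for o in domain if o != target}
    else:
        # Only objects at least as salient as the target compete with it.
        threshold = salience.get(target, 0)
        distractors = {o for o in domain
                       if o != target and salience.get(o, 0) >= threshold}
    description: Dict[str, str] = {}
    for attr in preferred:  # attributes in a fixed order of preference
        if not distractors:
            break
        if attr not in props:
            continue
        value = props[attr]
        ruled_out = {o for o in distractors if domain[o].get(attr) != value}
        if ruled_out:  # keep a property only if it removes some distractor
            description[attr] = value
            distractors -= ruled_out
    if distractors:
        return None  # no distinguishing description from these attributes
    description.setdefault("type", props["type"])  # head noun always included
    return description
```

In a domain with a black dog, a white dog and a black cat, the plain algorithm describes the black dog as "the black dog". If the black dog was just mentioned and is therefore the most salient object, there are no distractors left and the sketch returns only the type, corresponding to the reduced anaphoric description "the dog".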

SL980277.PDF (From Author) SL980277.PDF (Rasterized)


An Interlingua Based on Domain Actions for Machine Translation of Task-Oriented Dialogues

Authors:

Lori Levin, Carnegie Mellon University (USA)
Donna Gates, Carnegie Mellon University (USA)
Alon Lavie, Carnegie Mellon University (USA)
Alex Waibel, Carnegie Mellon University (USA)

Page (NA) Paper number 999

Abstract:

This paper describes an interlingua for spoken language translation that is based on domain actions in the travel planning domain. Domain actions are composed of speech acts (e.g., request-information), attributes (e.g., size, price), and objects (e.g., hotel, flight) and can take arguments. Development of the interlingua is guided by a database containing travel dialogues in English, Korean, Japanese, and Italian. There are currently 423 domain actions that cover hotel reservation and transportation. The interlingua will soon be extended to cover tours, tourist attractions, and events. The interlingua is used by the C-STAR speech translation consortium for translating travel planning dialogues in six languages: English, Japanese, German, Korean, Italian, and French. The paper also addresses the role of the interlingua in Carnegie Mellon's JANUS translation system.
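The composition of a domain action from a speech act, attributes and objects, plus optional arguments, can be shown as a small data structure. This is only a sketch: the `DomainAction` class is hypothetical, and although the "+"-joined label with arguments in parentheses is modelled loosely on the C-STAR interlingua, it is not guaranteed to match the consortium's actual notation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DomainAction:
    speech_act: str                # e.g. "request-information"
    concepts: Tuple[str, ...]      # attributes and objects, e.g. ("price", "hotel")
    args: Dict[str, str] = field(default_factory=dict)

    def name(self) -> str:
        """Domain-action label: speech act and concepts joined by '+'."""
        return "+".join((self.speech_act,) + self.concepts)

    def render(self) -> str:
        """Label plus arguments in parentheses (illustrative notation only)."""
        if not self.args:
            return self.name()
        arg_str = ", ".join(f"{k}={v}" for k, v in sorted(self.args.items()))
        return f"{self.name()} ({arg_str})"
```

Because the same label is produced whether the source sentence was English, Korean, Japanese or Italian, each language needs only an analyser into this representation and a generator out of it, rather than one translation component per language pair.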

SL980999.PDF (From Author) SL980999.PDF (Rasterized)


Generating Pitch Accents in a Concept-to-Speech System Using a Knowledge Base

Authors:

Sandra Williams, Microsoft Research Institute (Australia)

Page (NA) Paper number 799

Abstract:

This paper describes a concept-to-speech system for generating spoken descriptions of routes between places within the Macquarie University Computing Department. The Natural Language Generation (NLG) component of the system generates a textual route description marked with intonational information. The discourse structure of the route description is closely related to the knowledge representation of the route. The NLG component includes a pitch accenting algorithm that places appropriate pitch accents on elements of the utterance requiring particular emphasis or stress. Our pitch accenting algorithm uses a domain knowledge base and a discourse history. From these, it determines whether information selected to form the content of the utterance is shared mutual domain knowledge, given information, or new information. It can then assign an appropriate pitch accent to one word in each prosodic phrase. The text-to-speech component then determines the appropriate syllable to be accented in that word.

SL980799.PDF (From Author) SL980799.PDF (Rasterized)


Making the Most of Multiplicity: a Multi-Parser Multi-Strategy Architecture for the Robust Processing of Spoken Language

Authors:

Tobias Ruland, Siemens AG (Germany)
C. J. Rupp, University of the Saarland (Germany)
Jörg Spilker, University of Erlangen-Nürnberg (Germany)
Hans Weber, University of Erlangen-Nürnberg (Germany)
Karsten L. Worm, University of the Saarland (Germany)

Page (NA) Paper number 570

Abstract:

This paper describes ongoing research on robust spoken language understanding in the context of the Verbmobil speech-to-speech machine translation project. We focus on recent developments in the processing steps that map a word lattice to semantic representations. The approach described first applies speech repair correction to word lattices. Four analysis methods of varying depth are then applied in parallel to the normalized word lattices, each producing output for sub-portions of the lattice in the same semantic description language, the VIT format. These fragmentary analyses are stored and combined by a further processing component, which finally selects a sequence of semantic representations as the result.
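The abstract does not state how the combining component chooses among fragments, so the following is only a hypothetical illustration of one way to select a best-scoring sequence of partial analyses. It simplifies the lattice to a linear sequence of word positions, assumes each fragment covers a span `[start, end)` with a score, and charges a fixed penalty for every word left uncovered; dynamic programming then finds the highest-scoring left-to-right sequence of non-overlapping fragments. The scores, labels and penalty are assumptions.

```python
from typing import List, Optional, Tuple

# A fragment: (start, end, score, label) over word positions [start, end).
Fragment = Tuple[int, int, float, str]


def best_sequence(n: int, fragments: List[Fragment],
                  skip_penalty: float = -1.0) -> Tuple[float, List[str]]:
    """Choose non-overlapping fragments over n words maximising the total score."""
    best = [float("-inf")] * (n + 1)   # best[i]: best score for words [0, i)
    back: List[Tuple[int, Optional[str]]] = [(0, None)] * (n + 1)
    best[0] = 0.0
    ending_at: List[List[Fragment]] = [[] for _ in range(n + 1)]
    for frag in fragments:
        ending_at[frag[1]].append(frag)
    for i in range(1, n + 1):
        # Option 1: leave word i-1 uncovered.
        best[i] = best[i - 1] + skip_penalty
        back[i] = (i - 1, None)
        # Option 2: finish a fragment at position i.
        for start, _end, score, label in ending_at[i]:
            if best[start] + score > best[i]:
                best[i] = best[start] + score
                back[i] = (start, label)
    labels: List[str] = []
    i = n
    while i > 0:  # follow back-pointers from the end of the utterance
        prev, label = back[i]
        if label is not None:
            labels.append(label)
        i = prev
    return best[n], labels[::-1]
```

With a full-span analysis scoring 3.0 and two adjacent partial analyses scoring 2.5 and 2.0, the sketch prefers the two partial analyses (total 4.5). This mirrors the general motivation for keeping several shallower analyses alongside a deep one, although the real scoring is certainly more involved.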

SL980570.PDF (From Author) SL980570.PDF (Rasterized)


Natural-Sounding Speech Synthesis Using Variable-Length Units

Authors:

Jon R.W. Yi, MIT Laboratory for Computer Science (USA)
James R. Glass, MIT Laboratory for Computer Science (USA)

Page (NA) Paper number 1151

Abstract:

The goal of this work was to develop a speech synthesis system that concatenates variable-length units to create natural-sounding speech. Our initial work showed that, by carefully designing system responses to ensure consistent intonation contours, natural-sounding speech synthesis was achievable with word- and phrase-level concatenation. To extend the flexibility of this framework, we focused on generating novel words from a corpus of sub-word units. The design of the corpus was motivated by perceptual experiments that investigated where speech could be spliced with minimal audible distortion and which contextual constraints had to be maintained in order to produce natural-sounding speech. From this sub-word corpus, a Viterbi search selects a sequence of units based on how well they match the input specification and concatenation constraints. This concatenative speech synthesis system, ENVOICE, has been used in a conversational system in two application domains to convert meaning representations into speech waveforms.
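A Viterbi search over candidate units can be sketched generically. The abstract only says that units are scored by how well they match the input specification and the concatenation constraints, so this sketch assumes two hypothetical cost functions: a target cost (mismatch between a unit and the specification) and a join cost (penalty for splicing two units). It is a standard unit-selection search under those assumptions, not ENVOICE's actual implementation, and it assumes every target has at least one candidate.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target specification, e.g. a phone label
U = TypeVar("U")  # candidate unit from the corpus


def select_units(targets: Sequence[T],
                 candidates: Sequence[Sequence[U]],
                 target_cost: Callable[[T, U], float],
                 join_cost: Callable[[U, U], float]) -> Tuple[float, List[U]]:
    """Viterbi search for the unit sequence minimising target + join costs."""
    # cost[j]: cheapest path ending in candidate j of the current target
    cost = [target_cost(targets[0], u) for u in candidates[0]]
    back: List[List[int]] = []
    for t in range(1, len(targets)):
        new_cost, pointers = [], []
        for u in candidates[t]:
            prev_best = min(range(len(candidates[t - 1])),
                            key=lambda j: cost[j] + join_cost(candidates[t - 1][j], u))
            new_cost.append(cost[prev_best]
                            + join_cost(candidates[t - 1][prev_best], u)
                            + target_cost(targets[t], u))
            pointers.append(prev_best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost)), key=lambda k: cost[k])
    total = cost[j]
    path = [candidates[-1][j]]
    for t in range(len(targets) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(candidates[t - 1][j])
    return total, path[::-1]
```

If units are (phone, source-utterance) pairs and the hypothetical join cost is zero only for units that were contiguous in the same recording, the search prefers long stretches taken from a single utterance. This is one way to favour splicing at as few points as possible.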

SL981151.PDF (From Author) SL981151.PDF (Rasterized)

1151_01.WAV
1st of 3 example waveforms from section 2
File type: Sound File
Format: Sound File: WAV
Tech. description: 16000 Hz, 16 bits/sample, mono, PCM
Creating Application: Unknown
Creating OS: Unknown
1151_02.WAV
2nd of 3 example waveforms from section 2
File type: Sound File
Format: Sound File: WAV
Tech. description: 16000 Hz, 16 bits/sample, mono, PCM
Creating Application: Unknown
Creating OS: Unknown
1151_03.WAV
3rd of 3 example waveforms from section 2
File type: Sound File
Format: Sound File: WAV
Tech. description: 16000 Hz, 16 bits/sample, mono, PCM
Creating Application: Unknown
Creating OS: Unknown
1151_04.WAV
1st of 2 example waveforms from section 6
File type: Sound File
Format: Sound File: WAV
Tech. description: 16000 Hz, 16 bits/sample, mono, PCM
Creating Application: Unknown
Creating OS: Unknown
1151_05.WAV
2nd of 2 example waveforms from section 6
File type: Sound File
Format: Sound File: WAV
Tech. description: 16000 Hz, 16 bits/sample, mono, PCM
Creating Application: Unknown
Creating OS: Unknown
