Prosody and Emotion 5

The Tilt Intonation Model

Authors:

Paul Taylor, University of Edinburgh (U.K.)

Page (NA) Paper number 827

Abstract:

The tilt intonation model facilitates automatic analysis and synthesis of intonation. The analysis algorithm detects intonational events in F0 contours and parameterises them in terms of continuously varying parameters. We describe the analysis system and give results for speaker-independent spontaneous dialogue speech. We then describe a synthesis algorithm which can generate F0 contours given a tilt parameterisation of an utterance. We give results showing how well the automatically produced contours match natural ones. The paper concludes with a discussion of the linguistic relevance of the tilt parameters and shows that they are both a useful and a natural way of representing intonation.
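
As a rough illustration of the parameterisation described in the abstract, the sketch below computes the standard tilt parameters (event amplitude, event duration and tilt) from the rise and fall components of a single intonational event; the function name and the zero-division guards are ours, and the paper should be consulted for the definitive formulation.

def tilt_parameters(rise_amp, fall_amp, rise_dur, fall_dur):
    """Tilt parameters for one intonational event (sketch only).

    rise_amp, fall_amp: F0 excursions of the rise and fall parts (Hz or semitones).
    rise_dur, fall_dur: durations of the rise and fall parts (seconds).
    """
    amplitude = abs(rise_amp) + abs(fall_amp)            # overall event amplitude
    duration = rise_dur + fall_dur                       # overall event duration
    tilt_amp = (abs(rise_amp) - abs(fall_amp)) / amplitude if amplitude else 0.0
    tilt_dur = (rise_dur - fall_dur) / duration if duration else 0.0
    tilt = 0.5 * (tilt_amp + tilt_dur)                   # +1 = pure rise, -1 = pure fall
    return amplitude, duration, tilt

Under this formulation a pure rise gets tilt +1, a pure fall -1, and a symmetrical rise-fall 0.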

SL980827.PDF (From Author) SL980827.PDF (Rasterized)



Analysis of Occurrence of Pauses and Their Durations in Japanese Text Reading

Authors:

Hiroya Fujisaki, Department of Applied Electronics, Science University of Tokyo (Japan)
Sumio Ohno, Department of Applied Electronics, Science University of Tokyo (Japan)
Seiji Yamada, Department of Applied Electronics, Science University of Tokyo (Japan)

Page (NA) Paper number 831

Abstract:

Pauses play important roles in both the intelligibility and the naturalness of speech. Their occurrence and duration in text reading are influenced by the syntactic structure of the text as well as by the physiological constraints of respiration on the part of the speaker. The present paper describes some preliminary findings on Japanese text reading, especially on the effects of the syntactic role of the preceding phrase on the rate of occurrence and the duration of a pause at a syntactic boundary.
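
The abstract does not detail the authors' analysis procedure, but the kind of tabulation it describes, pause occurrence rate and mean pause duration broken down by the syntactic role of the preceding phrase, can be sketched as follows; the data layout and field names are hypothetical, not the authors' annotation scheme.

from collections import defaultdict

def pause_statistics(boundaries):
    """Tabulate pause occurrence rate and mean pause duration per
    syntactic role of the preceding phrase.

    `boundaries`: hypothetical list of (preceding_role, pause_dur) tuples,
    with pause_dur = 0.0 where no pause was produced at the boundary.
    """
    stats = defaultdict(lambda: {"n": 0, "paused": 0, "total_dur": 0.0})
    for role, dur in boundaries:
        s = stats[role]
        s["n"] += 1
        if dur > 0.0:
            s["paused"] += 1
            s["total_dur"] += dur
    return {
        role: {
            "occurrence_rate": s["paused"] / s["n"],
            "mean_duration": s["total_dur"] / s["paused"] if s["paused"] else 0.0,
        }
        for role, s in stats.items()
    }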

SL980831.PDF (From Author) SL980831.PDF (Rasterized)



A Statistical Study of Pitch Target Points in Five Languages

Authors:

Estelle Campione, Université de Provence (France)
Jean Véronis, Université de Provence (France)

Page (NA) Paper number 845

Abstract:

We present the results of a large-scale statistical study of pitch target points in five languages, on a corpus comprising 4 hours 20 minutes of speech and involving 50 different speakers. The entire corpus was stylized automatically by a technique that reduces the F0 contour to a series of target points representing the significant pitch changes. It was then entirely verified by experts using a resynthesis method, in order to ensure that there was no audible difference from the original. The set of ca. 50000 pitch target points thus obtained was then analyzed from a statistical point of view. In this paper we describe the main results of this study, in terms of the frequency distribution of target points, pitch movements, and the relation of pitch movements to time intervals. Our study reveals interesting differences across languages and between sexes.
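
The stylization technique itself is not specified in the abstract; purely as an illustration of the general idea, the sketch below reduces an F0 track to target points wherever the pitch has moved by more than a threshold since the last retained point. The greedy rule and the 1-semitone default are assumptions, not the procedure used in the study.

def stylize_f0(times, f0_semitones, threshold=1.0):
    """Keep a target point whenever F0 has moved by more than `threshold`
    semitones since the last retained point (illustrative greedy rule,
    not the stylization algorithm used in the paper)."""
    targets = [(times[0], f0_semitones[0])]
    for t, f in zip(times[1:], f0_semitones[1:]):
        if abs(f - targets[-1][1]) >= threshold:
            targets.append((t, f))
    if targets[-1][0] != times[-1]:
        targets.append((times[-1], f0_semitones[-1]))    # always keep the endpoint
    return targets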

SL980845.PDF (From Author) SL980845.PDF (Rasterized)



Fully Automatic Prosody Generator For Text-to-Speech

Authors:

Fabrice Malfrère, Faculté Polytechnique de Mons (Belgium)
Thierry Dutoit, Faculté Polytechnique de Mons (Belgium)
Piet Mertens, K.U. Leuven - Département de Linguistique (Belgium)

Page (NA) Paper number 355

Abstract:

Text-to-prosody systems based on prosodic databases extracted from natural speech will be a key component in the further development of new text-to-speech systems. This paper describes a system that uses such speech databases to generate the rhythm and the intonation of a written French text. The system is based on a very crude chinks 'n chunks prosodic phrasing algorithm and on a prosodic analysis of a natural speech database. The rhythm of the synthetic speech is generated with a CART tree trained on a large single-speaker speech corpus. The acoustic aspect of the intonation is derived from a set of prosodic patterns automatically extracted from the same corpus. The system has been tested on single sentences and news paragraphs. Informal listening tests have shown that the resulting prosody is convincing most of the time.
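
For readers unfamiliar with chinks 'n chunks phrasing, the sketch below shows the basic rule: a prosodic phrase is a run of function words ("chinks") followed by a run of content words ("chunks"), so a phrase break is inserted whenever a chink follows a chunk. The part-of-speech tag set is an assumption and is far cruder than what the paper's French front-end would use.

CHINK_TAGS = {"DET", "PREP", "CONJ", "PRON", "AUX"}      # illustrative function-word tags

def chinks_n_chunks(tagged_words):
    """Split a POS-tagged sentence into prosodic phrases: start a new
    phrase whenever a chink (function word) follows a chunk (content word)."""
    phrases, current, prev_was_chunk = [], [], False
    for word, tag in tagged_words:
        is_chink = tag in CHINK_TAGS
        if is_chink and prev_was_chunk and current:
            phrases.append(current)
            current = []
        current.append(word)
        prev_was_chunk = not is_chink
    if current:
        phrases.append(current)
    return phrases

On the example sentence from the sound file below, 'Le petit canard apprend à nager', this rule yields the two phrases 'Le petit canard apprend' and 'à nager'.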

SL980355.PDF (From Author) SL980355.PDF (Rasterized)

0355_01.WAV
(was: sound355_01.wav)
Synthesized French sentence: 'Le petit canard apprend à nager' (The little duck learns to swim).
File type: Sound File
Format: WAV
Tech. description: 16000 Hz, 16 bits/sample, mono, PCM, PC Windows WAV format
Creating Application: Unknown
Creating OS: Unknown



Automatic Prosodic Labeling of 6 Languages

Authors:

Halewijn Vereecken, ELIS, University of Ghent (Belgium)
Jean-Pierre Martens, ELIS, University of Ghent (Belgium)
Cynthia Grover, Lernout & Hauspie Speech Products NV (Belgium)
Justin Fackrell, Lernout & Hauspie Speech Products NV (Belgium)
Bert Van Coile, Lernout & Hauspie Speech Products NV (Belgium)

Page (NA) Paper number 45

Abstract:

This contribution describes a method for the automatic prosodic labeling of multilingual speech data. The automatic labeler assigns a boundary strength between 0 and 3 to each word boundary, and a word prominence between 0 and 9 to each word. The speech signal and its orthographic representation are first transformed into feature vectors comprising acoustic and linguistic features such as pitch, duration, energy, part-of-speech, punctuation, word frequency and stress. Next, the feature vectors are mapped to prosodic labels by a cascade of multi-layer perceptrons. Experiments on 6 different languages demonstrate that combining acoustic and linguistic features yields better performance than is obtainable from acoustic features alone. We also present experiments assessing the influence of the quality of the underlying phonetic segmentation and labeling on the prosodic labeling performance.
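
The exact cascade architecture is not given in the abstract; the following sketch shows one plausible two-stage arrangement, in which a first multi-layer perceptron predicts boundary strength from the combined acoustic and linguistic features and a second predicts word prominence from the same features plus the first network's output. The layout, layer sizes and use of scikit-learn are assumptions for illustration only.

import numpy as np
from sklearn.neural_network import MLPClassifier

def train_labeler_cascade(features, boundary_strength, prominence):
    """Two-stage MLP cascade (illustrative layout, not the paper's):
    stage 1 maps feature vectors to boundary strength (0-3),
    stage 2 maps features + stage-1 output to word prominence (0-9)."""
    boundary_net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    boundary_net.fit(features, boundary_strength)

    stage2_input = np.hstack([features, boundary_net.predict(features).reshape(-1, 1)])
    prominence_net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    prominence_net.fit(stage2_input, prominence)
    return boundary_net, prominence_net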

SL980045.PDF (From Author) SL980045.PDF (Rasterized)



Automatic Utterance Type Detection Using Suprasegmental Features

Authors:

Helen Wright, Centre for Speech Technology Research, University of Edinburgh (U.K.)

Page (NA) Paper number 575

Abstract:

The goal of the work presented here is to automatically predict the type of an utterance in spoken dialogue by using automatically extracted suprasegmental information. For this task we present and compare three stochastic algorithms: hidden Markov models, artificial neural nets, and classification and regression trees. These models are easily trainable, reasonably robust and fit into the probabilistic framework required for speech recognition. Utterance type detection is dependent on the assumption that different types of utterances have different suprasegmental characteristics. The categorisation of these utterance types is based on the theory of conversation games and consists of 12 move types (e.g. reply to a question, wh-question, acknowledgement). The system is speaker independent and is trained on spontaneous goal-directed dialogue collected from Canadian speakers. This utterance type detector is used in an automatic speech recognition system to reduce word error rate.
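
As a concrete example of the classification-tree variant mentioned in the abstract, the sketch below fits a CART-style tree over per-utterance suprasegmental feature vectors. The move-type labels follow the conversational-games literature but are listed here only for illustration, and the feature choice and tree settings are assumptions rather than the paper's configuration.

from sklearn.tree import DecisionTreeClassifier

MOVE_TYPES = ["instruct", "explain", "check", "align", "query-yn", "query-w",
              "acknowledge", "reply-y", "reply-n", "reply-w", "clarify", "ready"]
# 12 move types in the conversational-games tradition (illustrative list).

def train_move_classifier(supraseg_features, move_labels):
    """Fit a classification tree mapping per-utterance suprasegmental
    features (e.g. F0 mean and range, duration, energy, speaking rate)
    to one of the move types. Feature set and tree settings are
    illustrative assumptions."""
    cart = DecisionTreeClassifier(max_depth=8, min_samples_leaf=20)
    cart.fit(supraseg_features, move_labels)
    return cart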

SL980575.PDF (From Author) SL980575.PDF (Rasterized)
