Session MAB Prosody

Chairperson: Nick Campbell, ATR, Japan


Persistence of prosodic features between dialectal and standard Italian utterances in six sub-varieties of a region of Southern Italy (Salento): first assessments of the results of a recognition test and an instrumental analysis

Authors: Antonio Romano

Centre de Dialectologie Université Stendhal - Grenoble III Domaine Universitaire, 38040 Grenoble, France. Tel. +4 76 82 43 80 Fax +4 76 82 43 56, E-mail: romano@u-grenoble3.fr

Volume 1 pages 175 - 178

ABSTRACT

The aim of the work reported here is to verify that some dialectal varieties of a restricted area show appreciable differences in their prosody, and that the same differences normally characterise the prosodic system of speakers when they produce sentences in Italian. To verify these hypotheses, two kinds of experiment were carried out: a perceptual recognition test based on sentences differing only in prosodic cues and uttered in Italian by different speakers of the region; and a detailed phonetic inspection of the acoustic makeup to detect which cues are most likely to be responsible for the listeners' success in the recognition task.

A0070.pdf


IMPROVING THE PHONETIC ANNOTATION BY MEANS OF PROSODIC PHRASING

Authors: H. Vereecken (1), A. Vorstermans (2), J.-P. Martens (1) and B. Van Coile (1),(2)

(1) ELIS, University of Gent, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium (2) Lernout & Hauspie Speech Products NV, Sint-Krispijnstraat 7, B-8900 Ieper, Belgium E-mail: halewijn@elis.rug.ac.be

Volume 1 pages 179 - 182

ABSTRACT

It was established that the performance of our annotation system [8] is affected by the length of the utterances: the error rate, the CPU load and the memory requirements tend to increase as the utterances get longer. In this contribution the speech signal is first segmented into speech, pauses and noise (breaths, clicks, ...) and subsequently split into signal phrases prior to the annotation. Experiments on 3 different databases (3 languages) demonstrate that this strategy yields a significant improvement of the annotation accuracy.

A0098.pdf


A DESCRIPTIVE STUDY OF PROSODIC PHENOMENA IN MPUR (WEST PAPUAN PHYLUM)*

Authors: Cecilia Ode

Irian Jaya Studies, Leiden University, The Netherlands email: ode@rullet.leidenuniv.nl

Volume 1 pages 183 - 186

ABSTRACT

A descriptive study of prosody in Mpur (West Papuan Phylum), an unwritten tone language with perceptually five tone contrasts, is presented, using the stylization method (see 1.). Three issues, observed at prosodic boundaries, are analysed and compared to their occurrence in other positions: 1) realization of tone; 2) vowel lengthening; 3) expression of emotive emphasis by means of repeated words, tail-head constructions, clitics and particles (2 and 3 frequently occur in the oral tradition of peoples of New Guinea). Results show that 1) level tones exhibit clearly audible pitch movements (falling or rising) at prosodic boundaries, sometimes with vowel lengthening; 2) vowels may be lengthened up to more than five times their original duration; 3) words may be repeated up to ten times without any change in the realization of tone; in tail-head constructions a reset (a jump upwards or downwards in the course of F0) may be observed.

A0252.pdf


Automated Quantitative Analysis of F0 Contours of Utterances from a German ToBI-Labeled Speech Database

Authors: Hansjörg Mixdorff (1) and Hiroya Fujisaki (2)

(1) Technical University Dresden, Institute of Technical Acoustics, Mommsenstr. 13, 01062 Dresden, Germany (2) Science University of Tokyo E-mail: mixdorff@teles.de

Volume 1 pages 187 - 190

ABSTRACT

The present paper proposes a method for automating the analysis of F0 contours using the Fujisaki model on ToBI-labeled speech data. ToBI labels are used to preselect the number of necessary phrase and accent commands and align the onsets and offsets of these commands with the segments of the utterance. Local optimization is then performed with special regard to `reliable' portions of the F0 contour, for instance, the syllable nuclei. Analysis results are used for formulating quantitative F0 control rules for speech synthesis.
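As a rough illustration of the model referred to in this abstract, the following minimal sketch evaluates a Fujisaki-style F0 contour as a baseline value plus phrase and accent components in the log-frequency domain. The function names, the default values of alpha, beta and gamma, and the weighted error used to emphasise `reliable' frames are illustrative assumptions, not the authors' implementation.

```python
import math

def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset, amplitude) pairs
    accent_cmds -- list of (onset, offset, amplitude) triples
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)

def weighted_log_error(times, observed_f0, weights, fb, phrase_cmds, accent_cmds):
    """Weighted squared error in the log domain.

    The weights are a hypothetical way of giving more influence to
    reliable frames (for instance, syllable nuclei) during optimization.
    """
    total = 0.0
    for t, f0, w in zip(times, observed_f0, weights):
        model = fujisaki_f0(t, fb, phrase_cmds, accent_cmds)
        total += w * (math.log(f0) - math.log(model)) ** 2
    return total
```

In such a sketch, the labels would fix how many entries phrase_cmds and accent_cmds contain and roughly where they start and end, leaving the optimizer to adjust amplitudes and timing.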

A0285.pdf


IDENTIFICATION AND AUTOMATIC GENERATION OF PROSODIC CONTOURS FOR A TEXT-TO-SPEECH SYNTHESIS SYSTEM IN FRENCH

Authors: S. de Tournemire

France Telecom, CNET (Centre National d'Etudes des Télécommunications) Technopole Anticipa, 2 avenue Pierre Marzin, 22307 Lannion Cedex E-mail: detourns@lannion.cnet.fr

Volume 1 pages 191 - 194

ABSTRACT

This paper presents the realisation of an automatically trainable computational prosodic model for French Text-to-Speech Synthesis. The methodology proposes the construction of the model in two steps. The first step consists in predicting fundamental frequency contours and duration of syllables from abstract prosodic markers using neural networks [17,12]. In this step, the abstract prosodic markers are automatically extracted from the signal by analysing prosodic realisations [2] and identifying a prosodic alphabet and a set of labelling rules. The second step integrates the model into the CNET Text-to-Speech Synthesis system [7] by using its linguistic levels and predicting abstract prosodic markers from text and linguistic labels. The system is evaluated by naïve listeners and compared with the current CNET Text-to-Speech Synthesis system.

A0410.pdf


Quantitative Analysis and Formulation of Tone Concatenation in Chinese F0 Contours

Authors: Jin-Fu Ni*, Ren-Hua Wang*, Keikichi Hirose**

*Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, P.R. China, 230027 **Department of Information and Communication Engineering, School of Engineering, University of Tokyo, Bunkyo-ku, Tokyo, 113, Japan. E-mail: jfn@eeis.ustc.edu.cn, rhw@ustc.edu.cn, hirose@gavo.t.u-tokyo.ac.jp

Volume 1 pages 195 - 198

ABSTRACT

With the aim of constructing a set of prosodic rules for generating high-quality synthetic Chinese speech, tone concatenation features were investigated for Chinese words. Using a superpositional model developed for Chinese F0 contours, quantitative analyses were conducted on 124 Chinese multi-syllable words to identify features of their F0 contours, especially those related to tone concatenation. A set of rules was then introduced for the control of model parameters to generate F0 contours of connected tones using the model. A comparison between F0 contours of natural utterances and rule-generated contours for 340 words with various tone combinations showed the validity of the proposed rules.

A0493.pdf


AN ENVIRONMENT FOR THE LABELLING AND TESTING OF MELODIC ASPECTS OF SPEECH

Authors: Christel Brindopke, Arno Pahde, Franz Kummert, Gerhard Sagerer

Technical Faculty, University of Bielefeld, Postfach 10 01 31, 33501 Bielefeld Germany email: christel@techfak.uni-bielefeld.de

Volume 1 pages 199 - 202

ABSTRACT

In this paper, we present an environment for labelling and testing of melodic aspects of spoken language. The environment has three modes of application: First, the environment provides labelling facilities for a model-based melodic description for German. Second, it supports a language-independent pre-theoretical description of speech melody allowing the development of new melodic categories. Third, our test bed can be used to generate speech samples with controlled melodic parameters for further use in perception experiments. The melodic description facilities (model-based, pre-theoretical) are supported by visual and audible feedback allowing a step-by-step refinement of the melodic description in question.

A0507.pdf


PROPAUSE: A SYNTACTICO-PROSODIC SYSTEM DESIGNED TO ASSIGN PAUSES

Authors: David Casacuberta*, Lourdes Aguilar**, Rafael Marín**

Departament de Filosofia*, Departament de Filologia Espanyola** Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain {david, lourdes, rafa}@liceu.uab.es

Volume 1 pages 203 - 206

ABSTRACT

In this study, a PROLOG-based computational tool designed to assign pauses in Spanish texts is proposed. Our purpose is to develop a prosodic segmentation algorithm suitable to be implemented in a text-to-speech system for Spanish. By means of the analysis of a corpus of read texts in Spanish, prosodic and syntactic factors guiding the location of orthographically unmarked pauses are identified. These factors are used to design a computational model for assigning pauses in unrestricted texts. The performance of the system has been assessed by means of a comparison between its suggested segmentation and natural speech. The obtained results indicate that the system is able to capture empirical facts.

A0578.pdf


INTEGRATED DIALOG ACT SEGMENTATION AND CLASSIFICATION USING PROSODIC FEATURES AND LANGUAGE MODELS

Authors: V. Warnke (1), R. Kompe (2), H. Niemann (1), E. Nöth (1)

(1) Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung, 91058 Erlangen, Germany http://www5.informatik.uni-erlangen.de/ (2) Sony International (Europe) GmbH, 70736 Fellbach, Germany

Volume 1 pages 207 - 210

ABSTRACT

This paper presents an integrated approach for the segmentation and classification of dialog acts (DA) in the Verbmobil project. In Verbmobil it is often sufficient to recognize the sequence of DAs occurring during a dialog between the two partners. In our previous work [5] we segmented and classified a dialog in two steps: first we calculated hypotheses for the segment boundaries and decided on a boundary if the probabilities exceeded a predefined threshold level. Second we classified the segments into DAs using semantic classification trees or stochastic language models. In our new approach we integrate the segmentation and classification in the A*-algorithm to search for the optimal segmentation and classification of DAs on the basis of word hypotheses graphs (WHGs). The hypotheses for the segment boundaries are calculated with the help of a stochastic language model operating on the word chain and a multi-layer perceptron (MLP) classifying prosodic features. The DA classification is done using a category-based language model for each DA. For our experiments we used data from the Verbmobil corpus.
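The first step of the earlier two-step procedure described in this abstract, deciding on a boundary wherever its probability exceeds a predefined threshold, might be sketched as follows. The function name, the default threshold of 0.5 and the assumption of one boundary probability per word are hypothetical; the abstract does not specify how the probabilities are represented.

```python
def segment_by_threshold(words, boundary_probs, threshold=0.5):
    """Split a word chain into segments.

    boundary_probs[i] is the assumed probability of a segment boundary
    directly after words[i]; a boundary is set when it exceeds threshold.
    """
    if len(words) != len(boundary_probs):
        raise ValueError("one boundary probability is expected per word")
    segments, current = [], []
    for word, prob in zip(words, boundary_probs):
        current.append(word)
        if prob > threshold:
            segments.append(current)
            current = []
    if current:
        # Words after the last detected boundary form a final segment.
        segments.append(current)
    return segments
```

Each resulting segment would then be passed to a separate dialog act classifier. The integrated approach of the paper instead searches over segmentations and classifications jointly, which this sketch does not attempt.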

A0591.pdf


EVALUATION OF PROSODIC CHARACTERISTICS IN RETOLD STORIES IN DUTCH BY MEANS OF SEMANTIC SCALES

Authors: Monique E. van Donzel and Florien J. Koopmans-van Beinum

University of Amsterdam, Institute of Phonetic Sciences/IFOTT Herengracht 338, 1016 CG Amsterdam, The Netherlands tel: +31 20 525 2183, fax: +31 20 525 2197, email: vandonzel@fon.let.uva.nl

Volume 1 pages 211 - 214

ABSTRACT

This paper describes an experiment in which listeners were asked to evaluate various prosodic aspects of retold stories in Dutch, using semantic scales. The aim was to see which prosodic features listeners prefer when listening to a retold story in Dutch, and whether 'good' and 'bad' speakers can be distinguished in this respect. Results from a factor analysis show that listeners use Voice appreciation, Dynamics, and Articulation quality as main cues in evaluating the retold stories.

A0648.pdf


TEXT-TO-INTONATION IN SPONTANEOUS SWEDISH

Authors: Gösta Bruce*, Marcus Filipsson*, Johan Frid*, Björn Granström**, Kjell Gustafson**, Merle Horne* & David House* (names in alphabetical order)

*Dept of Linguistics and Phonetics, Helgonabacken 12, S-22362 Lund {gosta.bruce | marcus.filipsson | johan.frid | merle.horne | david.house} @ ling.lu.se **Dept of Speech, Music and Hearing, KTH, Box 70014, S-10044 Stockholm {bjorn | kjellg} @speech.kth.se

Volume 1 pages 215 - 218

ABSTRACT

This paper deals with a number of aspects of intonation in spontaneous dialogues from a language technology perspective. The key topics to be addressed are: I) the analysis of global intonation and its interaction with textual structure, II) the implementation of global and textual aspects of discourse intonation in an analysis-by-synthesis environment. We present models for the analyses of intonation and textual content in spontaneous conversations in Swedish. The models are implemented in a computational environment, making it possible to generate F0 contours, which can be imposed on a speech waveform using the PSOLA technique. The result is a text-to-intonation system, where textual and lexical analyses automatically generate hypothetical intonation contours, which can be applied through resynthesis and eventually be used in a text-to-speech system.

A0701.pdf


SYNTHESISING ATTITUDES WITH GLOBAL RHYTHMIC AND INTONATION CONTOURS

Authors: Yann Morlec, Gérard Bailly and Véronique Aubergé

Institut de la Communication Parlée 46, av. Félix Viallet 38031 Grenoble CEDEX FRANCE e-mail: (morlec,bailly,auberge)@icp.grenet.fr

Volume 1 pages 219 - 222

ABSTRACT

We present here a trainable generative model of French prosody. We focus on the sentence level and design SNNs able to generate both rhythmic and intonation contours for diverse attitudes. First results of a perceptual test show that listeners are able to retrieve the right definition of attitudes by listening to synthetic PSOLA stimuli.

A0786.pdf


PROSODY-PARTICLE PAIRS AS DISCOURSE CONTROL SIGNS

Authors: Dafydd Gibbon and Claudia Sassen

Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Postfach 100151, D-33501 Bielefeld Tel. +49 521 106 3510, FAX: +49 521 106 6008 E-mail: gibbon@spectrum.uni-bielefeld.de, csassen@hrz.uni-bielefeld.de

Volume 1 pages 223 - 226

ABSTRACT

We address the problem of integrating the description of discourse particles and their intonation into an HPSG-based lexicon for spontaneous speech applications, and propose a lexical sign type called prosody-particle pair which has similar structure to grammatical inflexions and is formally described as a nested attribute-value structure. In our discussion we generalise a known class of `stylised intonations' to include the level intonation of hesitation particles. Previous descriptions of discourse particles, their roles and their relations to intonation, have been informal. Our proposal is the first to model particle-intonation relations explicitly and in detail as an inflexion-like complex sign in a formal lexicon. The inclusion of the intonation of hesitation phenomena in the class of stylised intonations on formal and functional grounds is also new.

A0933.pdf


FOCUS DETECTION WITH ADDITIONAL INFORMATION OF PHRASE BOUNDARIES AND SENTENCE MODE

Authors: Anja Elsner

e-mail: ape@ikp.uni-bonn.de Institut für Kommunikationsforschung und Phonetik (IKP), University of Bonn, Poppelsdorfer Allee 47, 53115 Bonn, Germany

Volume 1 pages 227 - 230

ABSTRACT

In this paper an improved method for detection of focus accents is presented. The focus detection algorithm works with a rule-based approach. The main information source is the fundamental frequency F0 of an utterance. Results for the original version are 79 % recognition rate and 67 % average recognition rate for spontaneous speech. By integration of additional information like phrase boundaries and sentence mode, recognition rate increases by about 3 to 4 percent, depending on the dialogue.

A0937.pdf


THE ROLE OF PROSODY IN INFANTS' NATIVE-LANGUAGE DISCRIMINATION ABILITIES: THE CASE OF TWO PHONOLOGICALLY CLOSE LANGUAGES

Authors: Laura Bosch and Núria Sebastián-Gallés

Departament de Psicologia Bàsica, Universitat de Barcelona, Campus Vall d'Hebrón, 08035 Barcelona (Spain). Tel. +343 4021100 ext.3168, FAX +343 4021363, E-mail: lbosch@psi.ub.es

Volume 1 pages 231 - 234

ABSTRACT

In this paper, the capacity of four-month-old infants from monolingual environments to distinguish between two syllable-timed languages is analysed. Catalan and Spanish are both Romance languages which present differences at the segmental level and at the syllable structure level, but show important similarities concerning prosodic structure at the phonological phrase level. Nevertheless, the presence of vowel reduction only in Catalan may determine rhythmic differences which could be detected by infants and used to tell these two languages apart. Two experiments have been run, with normal and low-pass filtered utterances, using a visual orientation procedure with a reaction time measure. Results indicate that infants are able to discriminate even when segmental information has been removed. The distinction seems to be the result of basic differential rhythmic properties between these two languages.

A0953.pdf


PROSODIC CYCLES AND INTERPERSONAL SYNCHRONY IN AMERICAN ENGLISH AND SWEDISH

Authors: Eugene H. Buder and Anders Eriksson

School of Audiology & Speech-Language Pathology, University of Memphis, 807 Jefferson Ave., Memphis, TN 38105, USA, ehbuder@cc.memphis.edu; Department of Phonetics, Umeå University, S-901 87 Umeå, Sweden, anderse@ling.umu.se

Volume 1 pages 235 - 238

ABSTRACT

The paper addresses the question of rhythmic structuring of conversational interaction. Conversational speech requires active co-operation and co-ordination of the behavior of two or more speakers. Previous research indicates that one of the mechanisms used by speakers to regulate conversational interaction is close monitoring of, and adaptation to, rhythmic patterns. When this does not function properly, interaction may be adversely affected or even break down. There are reasons to believe that these mechanisms are used universally across languages, but there are also likely to be patterns that are language-specific. The research project, of which the present paper forms a first published report, is an attempt at separating the universal and language-specific aspects of the regulating rhythmic patterns. Although this research is primarily meant to clarify the mechanisms of conversational interaction from a linguistic/phonetic point of view, its applicability to speech technology is evident. Growing interest in dialogue systems for applications to man-machine communication demands more detailed data on all aspects of natural human conversation.

A0980.pdf


Relating Prosody to Syntax: Boundary Signalling in Swedish

Authors: Eva Strangert

Department of Linguistics, Phonetics Umeå University, S-901 87 Umeå, Sweden Tel: +46 90 165680, Fax: +46 90 166377, E-mail: strangert@ling.umu.se

Volume 1 pages 239 - 242

ABSTRACT

Two factors were experimentally varied in order to study their effects on silent interval and segment duration at NP-VP boundaries in Swedish sentences. These factors, the syntactic complexity of the NP and VP portions and the length of the words in the sentence, both had significant effects on silent interval duration. Concerning word length, the general trend was an increase in silent interval duration when longer words, as compared to shorter ones, preceded the boundary. Furthermore, silent interval duration increased, while preboundary segment duration decreased, when the NP complexity was increased. Moreover, there was a tendency towards decreasing silence duration when the NP had the simplest structure, containing just a noun, and the VP increased in complexity. The same tendency was observed in the consonant preceding the boundary. This adjustment pattern, common to the silent interval and the final consonant, was assumed to occur in order to counteract imbalance in complexity between the NP and VP.

A0984.pdf


ON REPRESENTATION OF FUNDAMENTAL FREQUENCY OF SPEECH FOR PROSODY ANALYSIS USING RELIABILITY FUNCTION

Authors: Mitsuru NAKAI and Hiroshi SHIMODAIRA

Japan Advanced Institute of Science and Technology, Hokuriku 1-1 Asahidai, Tatsunokuchi, Nomi, Ishikawa, 923-12 Japan

Volume 1 pages 243 - 246

ABSTRACT

This paper highlights a method that provides a new prosodic feature called 'F0 reliability field' based on a reliability function of the fundamental frequency (F0). The proposed method does not employ any correction process for F0 estimation errors that occur during automatic F0 extraction. By applying this feature as a score function for prosodic analyses like prosodic structure estimation or superpositional modeling of prosodic commands, this prosodic information could be acquired with higher accuracy. The feature has been applied to the 'F0 template matching method', which detects accent phrase boundaries in Japanese continuous speech. The experimental results show that, compared to the conventional F0 contour, the proposed feature overcomes the harmful influence caused by F0 errors.

A1016.pdf


Efficient Method of Establishing Words Tone Dictionary for Korean TTS system

Authors: Seong-hwan Kim and Jin-young Kim

DSP Laboratory Dept. of Electronics Engineering Chonnam National University, 500-757 Kwangju, South Korea Tel. +82 62 267 0595, Fax: +82 62 514 6472. E-mail : kseong@dsp.chonnam.ac.kr, kimjin@dsp.chonnam.ac.kr

Volume 1 pages 247 - 250

ABSTRACT

In this paper, we propose an efficient method to establish a Word Tone Dictionary (WTD). Vector quantization (VQ) is applied for compressing word tones, and a phonetic-syntactic distance is adopted for searching the word tone dictionary. Because a word tone is a sequence of syllable tones, VQ is used in encoding the syllable tones. As word tones in utterances are specified by their syntactic roles and phonetic features, we propose an adequate distance function to search for the appropriate word tone in the WTD. It is a combined distance function of syntactic distance and phonetic distance. We tested on a 100-utterance corpus. Preliminary experiments showed that the proposed method could lead to natural pitch-controlled speech.
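The two ingredients named in this abstract, VQ encoding of syllable tones and a combined syntactic and phonetic distance, can be illustrated with a minimal sketch. The feature vectors, the codebook values, the squared Euclidean metric and the linear weighting are all assumptions made for illustration; the abstract does not state how the distances are defined or combined.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to vector
    (squared Euclidean distance, an assumed metric)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

def encode_word_tone(syllable_tones, codebook):
    """Encode a word tone (a sequence of syllable tone vectors)
    as a sequence of codebook indices."""
    return [nearest_code(v, codebook) for v in syllable_tones]

def combined_distance(syntactic_dist, phonetic_dist, weight=0.5):
    """Hypothetical linear combination of the two distances;
    weight is the share given to the syntactic distance."""
    return weight * syntactic_dist + (1.0 - weight) * phonetic_dist

# Toy codebook of two-dimensional syllable tone vectors (invented values).
codebook = [(100.0, 100.0), (120.0, 90.0), (90.0, 130.0)]
codes = encode_word_tone([(118.0, 92.0), (101.0, 99.0)], codebook)
```

A dictionary lookup would then return the stored word tone whose entry minimises the combined distance to the target word.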

A1036.pdf


Perception of questions and statements in Neapolitan Italian

Authors: Mariapaola D'Imperio* and David House**

*Department of Linguistics, Ohio State University, 222 Oxley Hall, 1712 Neil Avenue Columbus, OH 43210-1298, USA. e-mail: dimperio@ling.ohio-state.edu **Department of Linguistics and Phonetics, Lund University, Helgonabacken 12, S-223 62 Lund, Sweden e-mail: david.house@ling.lund.se

Volume 1 pages 251 - 254

ABSTRACT

This paper addresses the problem of the perception of two different pitch accents in Italian which signal two utterance types (interrogative and declarative). The questions asked concern whether the major perceptual cue to this category distinction involves only the temporal alignment of the high level target with the syllable or if the category percept also depends on the presence of a rising or falling melodic movement within the syllable nucleus. The results show that the primary perceptual cue for questions is a rise through the vowel, while the primary cue for statements is a fall through the vowel. The results bear upon a general theory of intonation and our understanding of intonation in Italian as well as on current models of tonal perception in speech.

A1085.pdf
