Session T1C: Modelling of Prosody

Chairperson: Hiroya Fujisaki, Science University of Tokyo, Japan


METRICAL REPRESENTATIONS OF DEMARCATION AND CONSTITUENCY IN NOUN PHRASES

Authors: Christos Malliopoulos (1) and George Mikros (2)

(1) National Technical University of Athens, Department of Electrical Engineering; (2) Institute for Language and Speech Processing, 22 Margari Str., 115 25 Athens, Greece. Tel: (+301) 6712250. E-mail: chmall@ilsp.gr, gmikros@ilsp.gr

Volume 1 pages 303 - 306

ABSTRACT

This paper reports the results of two groups of experiments conducted to examine the melodic correlates of demarcation and constituency in subordination and coordination structures. The experimental material consisted of noun phrases of the form "A of B of C of D" and "A, B, C and D" followed by a short VP, where A, B, C and D were noun groups such as "article + noun", "pronoun + noun" or "pronoun + adjective + noun". The relation between grammatical structure and pitch is examined in terms of both phonological interpretation and phonetic data. A phonological model of intonation, based on the experimental results, is also proposed.

A0027.pdf


A SYSTEM OF STYLIZED INTONATION CONTOURS IN GERMAN

Authors: Hannes PIRKER, Kai ALTER+, Erhard RANK, John MATIASEK, Harald TROST and Gernot KUBIN++

Austrian Research Institute for Artificial Intelligence (OFAI), Schottengasse 3, A–1010 Vienna, Austria. Email: {hannes|erhard|john|harald}@ai.univie.ac.at +Max-Planck-Institute of Cognitive Neuroscience, Inselstraße 22-26, D–04103 Leipzig, Germany. Email: alter@cns.mpg.de ++Institute of Communications and High-Frequency Engineering, Vienna University of Technology, Gußhausstraße 25/E389, A–1040 Vienna, Austria. Email: g.kubin@ieee.org

Volume 1 pages 307 - 310

ABSTRACT

Modeling intonation, i.e., specifying adequate fundamental frequency (F0) contours, remains a challenging task for speech synthesis systems. This paper discusses the development of a system for phonetically specifying intonation contours for German. It deals with the problem of translating an abstract phonological representation of intonation, namely the tone-sequence model, into a concrete phonetic model. Design options and evaluation methods are discussed.
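To make the phonological-to-phonetic translation step concrete, the following is a minimal sketch of one common textbook approach (target placement plus linear interpolation), not the authors' system. The tone inventory, the target heights in `TONE_LEVELS`, the speaker range and the frame rate are all hypothetical values chosen for illustration; bitonal accents such as L+H* are not handled.

```python
import numpy as np

# Hypothetical target heights as fractions of the speaker's F0 range
# (0.0 = range floor, 1.0 = range top). Not taken from the paper.
TONE_LEVELS = {"H*": 0.8, "L*": 0.2, "H-": 0.6, "L-": 0.25, "H%": 0.9, "L%": 0.0}

def tones_to_f0(tones, times, floor_hz=80.0, top_hz=200.0, frame_s=0.01):
    """Place one F0 target per tone label at the given time (in seconds)
    and linearly interpolate between targets on a fixed frame grid."""
    levels = np.array([TONE_LEVELS[t] for t in tones])
    targets_hz = floor_hz + levels * (top_hz - floor_hz)
    # Frame grid spanning the first to the last target (inclusive).
    frames = np.arange(times[0], times[-1] + frame_s / 2, frame_s)
    return frames, np.interp(frames, times, targets_hz)

frames, f0 = tones_to_f0(["H*", "L-", "L%"], [0.2, 0.5, 0.8])
```

Interpolating in Hz keeps the sketch short; a real system might interpolate in semitones or use smoother curves between targets.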

A0232.pdf


A METHOD OF REPRESENTING FUNDAMENTAL FREQUENCY CONTOURS OF JAPANESE USING STATISTICAL MODELS OF MORAIC TRANSITION

Authors: Keikichi Hirose and Kouji Iwano

Department of Information and Communication Engineering, School of Engineering, University of Tokyo, Bunkyo-ku, Tokyo, 113, Japan. E-mail: hirose@gavo.t.u-tokyo.ac.jp, iwano@gavo.t.u-tokyo.ac.jp

Volume 1 pages 311 - 314

ABSTRACT

A statistical model of voice fundamental frequency contours is proposed with the aim of developing effective ways to use prosodic features in speech recognition. Since prosodic features should be treated over longer units, the model represents F0 transitions in moraic units. A fundamental frequency contour is first segmented into moraic units, and each moraic contour is then represented by a code according to its shape. After fundamental frequency contours over several morae around the boundaries in question were modeled with an HMM scheme, syntactic boundary detection experiments were conducted. The detection rate reached 89.2% in the closed-condition experiment and was around 85% in the open (speaker and topic) condition experiment. Accent type recognition experiments were also conducted, yielding around 74% correct recognition in the speaker-independent case.
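The step "each moraic contour is represented by a code depending on its shape" can be illustrated with a deliberately simple sketch. The abstract does not define the code set, so the scheme below is hypothetical: each mora's log-F0 segment gets a level-change letter relative to the previous mora (`u`p, `d`own, `s`ame) and a shape letter from a fitted linear slope (`R`ising, `F`alling, `L`evel). The thresholds are assumptions, and the subsequent HMM modelling is not shown.

```python
import numpy as np

def encode_morae(morae, slope_thresh=0.1, level_thresh=0.05):
    """Map each mora's log-F0 segment to a hypothetical two-letter code:
    level change w.r.t. the previous mora + shape within the mora."""
    codes = []
    prev_mean = None
    for seg in morae:
        seg = np.asarray(seg, dtype=float)
        t = np.linspace(0.0, 1.0, len(seg))   # normalised time within the mora
        slope, _ = np.polyfit(t, seg, 1)      # log-F0 change across the mora
        mean = seg.mean()
        if slope > slope_thresh:
            shape = "R"
        elif slope < -slope_thresh:
            shape = "F"
        else:
            shape = "L"
        if prev_mean is None or abs(mean - prev_mean) <= level_thresh:
            level = "s"
        elif mean > prev_mean:
            level = "u"
        else:
            level = "d"
        codes.append(level + shape)
        prev_mean = mean
    return codes
```

For example, a flat mora at 120 Hz, then a rise from 120 to 150 Hz, then a fall from 150 to 110 Hz (10 frames each, in log F0) would be coded `["sL", "uR", "sF"]` under these thresholds. A code sequence of this kind is what a discrete HMM could then model around candidate boundaries.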

A0275.pdf


Modeling arbitrarily long sentence-spanning F0 contours by parametric concatenation of word-spanning patterns

Authors: Evita F. Fotinea, Michael A. Vlahakis† and George V. Carayannis†

National Technical University of Athens, Electrical and Computer Engineering Dept., Division of Computer Science, Digital Signal Processing Lab., 9, Heroon Polytechniou St., Zographou 157 73, Athens, Greece. E-mail: evita@ilsp.gr or efotin@image.ntua.gr †Institute for Language and Speech Processing, 22, Margari St., Athens 11525, Greece. Tel: +301 6712250, Fax: +301 6741262, E-mail: gcara@ilsp.gr

Volume 1 pages 315 - 318

ABSTRACT

Modeling F0 contours of arbitrarily long and complex sentences of the Greek language may prove difficult, given the various parameters involved, namely focus, the position of the prominent vowel within words, syntactic structure and type of expression. Nonetheless, this complexity can be reduced significantly if the expressive requirements of the intended application area are taken into account. A study of the expressive requirements of information broadcasting applications revealed that the affirmative type of expression is used heavily and that, regardless of size and complexity, each sentence-spanning contour can be composed of only four word-spanning patterns. This result not only leads to significant savings in the resources required for natural-sounding speech output but also indicates a highly structured intonational component.

A0347.pdf


Strong interaction between factors influencing consonant duration

Authors: R.J.J.H. van Son and Jan P.H. van Santen

Institute for Phonetic Sciences, University of Amsterdam, Herengracht 338, NL-1016CG Amsterdam, The Netherlands, E-mail: rob@fon.let.uva.nl Bell Labs, Lucent Technologies, Murray Hill NJ, USA, E-Mail: jphvs@research.bell-labs.com

Volume 1 pages 319 - 322

ABSTRACT

Interactions between factors affecting consonant duration are well known, but they have proved difficult to quantify. The difficulty lies in the enormous amount of speech needed to resolve all factor combinations and in their uneven distribution in speech, i.e., factor confounding. Assuming piecewise independence of factor combinations and an additive duration model, it is possible to reconstruct "balanced" mean durations from unbalanced data. Analysis of a corpus of read speech from two speakers allowed us to model the interaction between syllable stress, position in the word, and consonant identity. The strong interactions could be attributed to a "floor" in the shortest durations and to irregular behavior of coronal consonants. The duration distribution of coronal consonants is linked to a shift to ballistic articulation, i.e., flaps, in reducing circumstances.
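The idea of reconstructing "balanced" means from confounded data can be sketched for the simplest case: two factors and a purely additive model fitted by least squares. This is an illustrative sketch under those assumptions, not the authors' procedure (which applies additivity only piecewise across factor combinations). The balanced mean of a level is the model's prediction for that level averaged with equal weight over every level of the other factor, so it no longer depends on how often each combination happens to occur in the corpus.

```python
import numpy as np

def balanced_means(levels_a, levels_b, durations):
    """Fit d = mu + a[i] + b[j] by least squares, then return the balanced
    means of factor A and factor B (equal weight over the other factor)."""
    levels_a = np.asarray(levels_a)
    levels_b = np.asarray(levels_b)
    durations = np.asarray(durations, dtype=float)
    na, nb = levels_a.max() + 1, levels_b.max() + 1
    rows = np.arange(len(durations))
    # Design matrix: intercept + one-hot columns for each level of A and B.
    X = np.zeros((len(durations), 1 + na + nb))
    X[:, 0] = 1.0
    X[rows, 1 + levels_a] = 1.0
    X[rows, 1 + na + levels_b] = 1.0
    # The design is over-parameterised; lstsq returns the minimum-norm
    # solution, whose cell predictions are still unique.
    coef, *_ = np.linalg.lstsq(X, durations, rcond=None)
    mu, a, b = coef[0], coef[1:1 + na], coef[1 + na:]
    pred = mu + a[:, None] + b[None, :]   # predicted mean of every (A, B) cell
    return pred.mean(axis=1), pred.mean(axis=0)
```

As a toy check with synthetic, noise-free data (factor A = stress with a true +30 ms effect, factor B = word position with true effects 0/10/40 ms, and stressed tokens deliberately over-represented in final position), the raw stress means differ by 50 ms while the balanced means recover the true 30 ms difference.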

A0456.pdf


SPEECH TIMING IN SLOVENIAN TTS

Authors: J. Gros, N. Pavesic and F. Mihelic

Artificial Perception Laboratory, Faculty of Electrical Engineering, University of Ljubljana, Traska 25, 1000 Ljubljana, Slovenia. E-mail: nejka@fe.uni-lj.si

Volume 1 pages 323 - 326

ABSTRACT

Speech timing at different speaking rates was studied for the Slovenian language, and the results were applied to duration modelling in the Slovenian text-to-speech system S5 [1]. To enable the synthesiser to pronounce input text at several speaking rates, experiments were carried out to study the impact of speaking rate on the duration of syllables, individual phonemes and phoneme groups in Slovenian [2]. A two-level approach to duration modelling is described. A segment duration prediction method was developed which adapts a word with an intrinsic duration to the desired extrinsic duration, taking into account how stretching and squeezing affect the durations of individual segments.
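One simple way to adapt a word's intrinsic duration to a target (extrinsic) duration is sketched below. It borrows the form of Klatt's duration rule, in which each segment has an incompressible minimum and only the part above that minimum is scaled; this is a swapped-in illustration, not necessarily the method of the paper. The per-segment minima are assumed inputs, and a single common factor `k` is solved so that the segment durations sum exactly to the target.

```python
def fit_word_duration(intrinsic_ms, minimum_ms, target_ms):
    """Stretch or squeeze segment durations so they sum to target_ms.

    Each segment keeps its incompressible minimum; only its elastic part
    (intrinsic - minimum) is scaled by a common factor k.
    """
    if len(intrinsic_ms) != len(minimum_ms):
        raise ValueError("one minimum duration per segment is required")
    floor = sum(minimum_ms)
    elastic = sum(d - m for d, m in zip(intrinsic_ms, minimum_ms))
    if elastic <= 0:
        raise ValueError("segments have no elastic part to adjust")
    if target_ms < floor:
        raise ValueError("target is shorter than the sum of minimum durations")
    k = (target_ms - floor) / elastic   # k < 1 squeezes, k > 1 stretches
    return [m + k * (d - m) for d, m in zip(intrinsic_ms, minimum_ms)]
```

For instance, segments with intrinsic durations of 80, 120 and 60 ms and minima of 40, 60 and 40 ms, squeezed to a 220 ms word, give k = 2/3 and durations of about 66.7, 100 and 53.3 ms. Segments with a larger elastic part absorb more of the change, which is one way of encoding that stretching and squeezing do not affect all segments equally.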

A1352.pdf

Recordings