ABSTRACT
This paper reports the results of two groups of experiments conducted to examine the melodic correlates of demarcation and constituency in subordination and coordination structures. The experimental material consisted of noun phrases of the form "A of B of C of D" and "A, B, C and D", each followed by a short VP. A, B, C and D were noun groups such as "article + noun", "pronoun + noun" or "pronoun + adjective + noun". The relation between grammatical structure and pitch is examined in terms of both phonological interpretation and phonetic data. A phonological model of intonation is also proposed on the basis of the experimental results.
ABSTRACT
Modeling intonation, i.e., specifying adequate fundamental frequency (F0) contours, remains a challenging task for speech synthesis systems. This paper discusses the development of a system for phonetically specifying intonation contours for German. It deals with the problem of translating an abstract phonological representation of intonation, namely the tone-sequence model, into a concrete phonetic model. Design options and evaluation methods are discussed.
ABSTRACT
A statistical model of voice fundamental frequency contours was proposed for the purpose of developing effective ways to utilize prosodic features in speech recognition. Because prosodic features should be treated in longer units, the proposed model represents transitions in moraic units. A fundamental frequency contour was first segmented into moraic units, and each moraic contour was then represented by a code according to its shape. After the fundamental frequency contours spanning several morae around the boundaries in question were modeled with an HMM scheme, experiments on syntactic boundary detection were conducted. The detection rate reached 89.2 % in the closed-condition experiment and was around 85 % in the open (speaker and topic) condition experiment. Experiments on accent type recognition were also conducted, yielding around 74 % correct recognition in the speaker-independent cases.
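The coding step described above can be illustrated with a minimal sketch. It assumes three shape classes (rise, fall, level) decided by the net F0 change within each mora; the abstract does not specify the actual code inventory, so the labels, the threshold and the function name `code_moraic_contours` are hypothetical. The resulting code sequence is the kind of discrete observation stream an HMM could be trained on.

```python
from typing import List, Sequence, Tuple


def code_moraic_contours(
    f0: Sequence[float],
    mora_bounds: Sequence[Tuple[int, int]],
    threshold: float = 0.5,
) -> List[str]:
    """Assign one shape code per mora (hypothetical inventory).

    f0          -- F0 samples, e.g. in semitones
    mora_bounds -- (start, end) sample indices of each mora, end exclusive
    threshold   -- minimum net change counted as a rise or a fall (assumed)
    """
    codes = []
    for start, end in mora_bounds:
        segment = f0[start:end]
        if len(segment) < 2:
            codes.append("U")  # too short to classify
            continue
        change = segment[-1] - segment[0]  # net movement across the mora
        if change > threshold:
            codes.append("R")  # rise
        elif change < -threshold:
            codes.append("F")  # fall
        else:
            codes.append("L")  # level
    return codes


f0_track = [0.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 0.0, 5.0]
bounds = [(0, 3), (3, 6), (6, 9), (9, 10)]
codes = code_moraic_contours(f0_track, bounds)
```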
ABSTRACT
Modeling F0 contours of arbitrarily long and complex sentences of the Greek language may prove to be a difficult task if one considers the various parameters involved, namely focus, position of the prominent vowel within words, syntactic structure and type of expression. Nonetheless, this complexity may be significantly reduced if the expressive requirements of the intended application area are taken into account. A study of the expressive requirements of information broadcasting applications revealed that the affirmative type of expression is heavily used and that, regardless of size and complexity, each sentence-spanning contour may be composed of only four word-spanning patterns. This result not only leads to significant savings in the resources required for natural-sounding speech output but also indicates a highly structured intonative component.
ABSTRACT
Interactions between factors affecting consonant duration are well known, but it has proved difficult to quantify them. The difficulty lies in the enormous amount of speech necessary to resolve all factor combinations and in their uneven distribution in speech, i.e., factor confounding. Assuming piecewise independence of factor combinations and an additive duration model, it is possible to reconstruct "balanced" mean durations from unbalanced data. Analysis of a corpus of read speech from two speakers allowed us to model the interaction between syllable stress, position in the word, and consonant identity. The strong interactions could be attributed to a "floor" in the shortest durations and to irregular behavior of coronal consonants. The distribution of durations of coronal consonants is linked to a shift to ballistic articulation, i.e., flaps, in reducing circumstances.
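The idea of reconstructing "balanced" means under an additive model can be illustrated with a small sketch. It assumes only two factors and fits duration = effect_a + effect_b by simple backfitting (alternately averaging residuals); this is a generic technique chosen for illustration, not necessarily the estimation procedure used in the paper, and the function names and data are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Obs = Tuple[Hashable, Hashable, float]  # (level of factor A, level of factor B, duration in ms)


def fit_additive(observations: List[Obs], n_iter: int = 200):
    """Fit duration ~ eff_a[a] + eff_b[b] by backfitting on unbalanced data."""
    levels_a = sorted({a for a, _, _ in observations})
    levels_b = sorted({b for _, b, _ in observations})
    eff_a = {a: 0.0 for a in levels_a}
    eff_b = {b: 0.0 for b in levels_b}
    for _ in range(n_iter):
        # Update each A effect as the mean residual after removing B effects.
        for a in levels_a:
            res = [d - eff_b[b] for x, b, d in observations if x == a]
            eff_a[a] = sum(res) / len(res)
        # Update each B effect as the mean residual after removing A effects.
        for b in levels_b:
            res = [d - eff_a[a] for a, y, d in observations if y == b]
            eff_b[b] = sum(res) / len(res)
    return eff_a, eff_b


def balanced_means(eff_a: Dict, eff_b: Dict) -> Dict:
    """Mean per A level as if every B level occurred equally often."""
    mean_b = sum(eff_b.values()) / len(eff_b)
    return {a: e + mean_b for a, e in eff_a.items()}


# Invented, exactly additive data with unequal cell counts.
data = (
    [("stressed", "initial", 100.0)] * 3
    + [("stressed", "medial", 90.0)]
    + [("unstressed", "initial", 70.0)]
    + [("unstressed", "medial", 60.0)] * 3
)
eff_a, eff_b = fit_additive(data)
balanced = balanced_means(eff_a, eff_b)
raw_stressed = sum(d for a, _, d in data if a == "stressed") / 4
```

In this invented example the raw mean for stressed consonants is pulled toward the over-represented word-initial cell, whereas the balanced mean weights both positions equally.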
ABSTRACT
Speech timing at different speaking rates was studied for the Slovenian language, and the results were applied to duration modelling in the Slovenian text-to-speech system S5 [1]. To enable the synthesiser to pronounce input text at several speaking rates, tests were made to study the impact of speaking rate on syllable duration and on the duration of individual phonemes and phoneme groups in Slovenian [2]. A two-level approach to durational modelling is described. A method for segment duration prediction was developed, which adapts a word with an intrinsic duration to the desired extrinsic duration, taking into account how stretching and squeezing apply to the duration of individual segments.
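The adaptation of a word from its intrinsic to a desired extrinsic duration can be sketched as follows. The abstract does not give the actual rule, so this sketch assumes that each segment carries an elasticity weight and that the duration difference is shared out in proportion to intrinsic duration times elasticity; the weights, segment values and function name are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (label, intrinsic duration in ms, elasticity weight)


def adapt_word_duration(segments: List[Segment], target_ms: float) -> List[Tuple[str, float]]:
    """Stretch or squeeze segments so that their durations sum to target_ms.

    Assumed rule: a segment absorbs a share of the total difference
    proportional to (intrinsic duration * elasticity).
    """
    intrinsic_total = sum(dur for _, dur, _ in segments)
    diff = target_ms - intrinsic_total  # positive: stretch, negative: squeeze
    weight_total = sum(dur * el for _, dur, el in segments)
    if weight_total == 0:
        # Nothing is elastic: leave the intrinsic durations unchanged.
        return [(label, dur) for label, dur, _ in segments]
    return [
        (label, dur + diff * (dur * el) / weight_total)
        for label, dur, el in segments
    ]


# Invented word: vowels assumed more elastic than consonants.
word = [("k", 60.0, 0.5), ("a", 100.0, 1.0), ("t", 40.0, 0.5)]
stretched = adapt_word_duration(word, 250.0)
squeezed = adapt_word_duration(word, 170.0)
```

A practical system would also need a lower bound on segment durations, since a strong squeeze under this simple rule could drive a highly elastic segment toward zero.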