ABSTRACT
A unit inventory for concatenative speech synthesis in Brazilian Portuguese was built on the basis of an analysis of segment-prosody interactions. Segments are viewed as full or reduced depending on stress, syllable structure and phonological boundaries. Demisyllabic units preserve the integrity of segments reduced due to syllable structure. Intersyllabic units preserve the integrity of segments reduced due to stress and boundaries. Integrity of vowel clusters is also preserved, but nasal vowels and diphthongs are successfully concatenated to oral onsets. The resulting units were recorded in carrier words and sentences designed on phonotactic and grammatical grounds. Good quality concatenation is achieved even before the addition of prosodic treatment.
ABSTRACT
This paper investigates "what's in a number", i.e. how natural numbers are pronounced in several European languages. As regards reading numbers above 20, 29 languages read the decade first and then the digit, e.g. twenty-four, and 10 languages read the digit before the decade, e.g. four-and-twenty. Two languages, Norwegian and Czech, use both systems, and 9 languages use (partly) a vigesimal system. An analysis of the Norwegian part of the European SpeechDat database showed that reading the decade first is used more in formal than in non-formal (spontaneous) speech and that typographic layout of digits influenced the reading of them.
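The two reading orders contrasted above can be illustrated with a minimal sketch for English number words. The function name `read_number` and the `order` parameter are hypothetical, introduced only for this illustration; the sketch covers two-digit numbers with a non-zero digit and does not model the vigesimal systems mentioned in the abstract.

```python
# Hypothetical illustration of decade-first vs. digit-first reading.
UNITS = ["", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine"]
DECADES = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
           6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def read_number(n, order="decade-first"):
    """Read a number from 21 to 99 (non-zero final digit) in either order."""
    if not 21 <= n <= 99 or n % 10 == 0:
        raise ValueError("only 21..99 with a non-zero digit are handled")
    decade, digit = DECADES[n // 10], UNITS[n % 10]
    if order == "decade-first":
        return f"{decade}-{digit}"        # e.g. twenty-four
    if order == "digit-first":
        return f"{digit}-and-{decade}"    # e.g. four-and-twenty
    raise ValueError("order must be 'decade-first' or 'digit-first'")
```

Calling `read_number(24)` gives the decade-first form, and `read_number(24, order="digit-first")` gives the digit-first form.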
ABSTRACT
This study investigates, from an articulatory point of view, the extent to which the different nasalization processes in Brazilian Portuguese (BP) - phonemic, allophonic and coarticulatory - are the result of phonological, language specific rules or a purely phonetic transitional phenomenon between an oral vowel and a nasal consonant. The study revealed that the magnitude of the velic gestures is similar in phonemic and in allophonic nasalization, which suggests that both processes are the result of the application of phonological nasalization rules in BP. On the other hand, in coarticulatory nasalization the degree of velic opening reached during the vowel is smaller, suggesting that in this case we have a purely transitional, coarticulatory phenomenon.
ABSTRACT
Having analyzed various peculiarities of phonetic word formation and speech, we established that phonetic words are constructed according to specific laws. These laws pertain to the patterns of phonetic word formation and to the parts of speech to which both the stressed word and its proclitic parts belong. The phonetic image of the units used to describe spoken text practically corresponds to the phonetic nature of the traditional concept of the word as a lexical unit, containing unstressed syllables structured differently in several hierarchies and the main stressed syllable, whose position determines the accentual type of the phonetic word.
ABSTRACT
The paper addresses the issue of whether the "probabilities" delivered by a speech recognizer can be used directly as a measure of the confidence of the recognition. As current recognizers have to make many modelling assumptions, and because of estimation problems due to sparse data, this is certainly questionable. Nevertheless, this investigation shows, in the framework of recognizing semantic items in the Philips automatic telephone exchange board system PADIS, that there exists a useful correlation between probabilities and confidences. The proposed method turns out to be a generalization of the more standard method of using likelihood ratios between the first- and second-best recognition paths. It offers two distinct advantages: a) the integration of all available knowledge sources, and b) the direct and theoretically sound computation of confidence measures on all levels of interest.
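The relation between likelihood ratios and probability-based confidences can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in PADIS: it assumes an N-best list of log scores is available, and the names `posterior_confidences`, `log_likelihood_ratio` and `item_confidences` are hypothetical.

```python
import math


def posterior_confidences(log_scores):
    """Normalize N-best log scores into posterior-like confidences."""
    top = max(log_scores)                       # subtract max for stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood_ratio(log_scores):
    """Standard baseline: best score minus second-best score."""
    ordered = sorted(log_scores, reverse=True)
    return ordered[0] - ordered[1]


def item_confidences(nbest):
    """Sum the posteriors of all hypotheses that carry the same item.

    `nbest` is a list of (item, log_score) pairs; the item stands for a
    semantic item such as a recognized name (assumed representation).
    """
    posteriors = posterior_confidences([score for _, score in nbest])
    confidences = {}
    for (item, _), p in zip(nbest, posteriors):
        confidences[item] = confidences.get(item, 0.0) + p
    return confidences
```

With only two hypotheses, the confidence of the best one equals 1 / (1 + exp(-r)), where r is the log likelihood ratio, which is one way to see the normalized score as a generalization of the ratio test to longer lists.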
ABSTRACT
This paper describes the processing of 2465 sentences (or utterances) collected according to phonetic rules from a large corpus of recent years' newspapers, such as "People's Daily", as material for speech recognition and speech synthesis databases. These sentences cover both phonetic phenomena and sentence patterns. We first consider the phonetic distribution among syllables: inter-syllabic diphones, inter-syllabic triphones and final-initial structure. The syllabic balance ensures coverage of intra-syllabic phenomena such as phonemes, initial/final and consonant/vowel. Roughly 17 kinds of sentence patterns appear in our sentence set. We have also created a set of phonetically balanced 2-4 syllable phrases which includes all of the tone structures.
ABSTRACT
The digit sequence items that were spoken as chains of individual digits were extracted from the German SpeechDat(M) database of telephone speech. From these digit strings, a subset of 39 strings was selected by dialect experts and according to the region information provided by the speaker. The German federal states were used as region classes because this information can easily be provided by the speaker. Seven test persons were asked to listen to the subset of digit strings and to classify them by region. It was found that the overall success rate for the classification is 40%; if the regions neighboring the correct region are also counted as correct, the success rate is 68%.
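The two success rates reported above differ only in what counts as a correct answer. A minimal sketch of the two scoring rules is given below; the function name `success_rates` and the data layout (pairs of true and guessed region, plus a neighbour map) are assumptions made for illustration, and no data from the study is reproduced.

```python
def success_rates(trials, neighbours):
    """Return (strict, lenient) success rates.

    trials:     list of (true_region, guessed_region) pairs
    neighbours: dict mapping a region to the set of regions bordering it
    Strict counts exact matches only; lenient also accepts a guess that
    names a region neighbouring the true one.
    """
    if not trials:
        raise ValueError("no trials given")
    strict = sum(1 for true, guess in trials if guess == true)
    lenient = sum(
        1 for true, guess in trials
        if guess == true or guess in neighbours.get(true, set())
    )
    return strict / len(trials), lenient / len(trials)
```

The neighbour map would have to be built from the borders of the federal states; the lenient rate can never be lower than the strict one, since every exact match is also accepted by the lenient rule.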
ABSTRACT
Production of a trill depends on several articulatory and aerodynamic constraints. These constraints can be held responsible for various sound changes in Slavic languages which all involve depalatalization or frication of the Proto-Slavic palatalized trilled r. As soon as a trill is affected by palatalization, the aerodynamic conditions change and the likelihood of trill production decreases. Small deviations in aperture size and air velocity can lead to the failure of a trill. This paper proposes a phonetic explanation for the depalatalization and/or frication of the Proto-Slavic palatalized trilled r by considering the detrimental effects of articulatory and aerodynamic constraints on the production of a palatalized trill.
ABSTRACT
The boundaries found in 100 target sentences from travel-domain dialogues were labelled automatically according to a relative 9-step phonetic depth. The text database was also tagged with syntactic information. Having established 4 kinds of acoustic features, we arranged the prosodic aspect, which can be depicted as a continuous change of duration and intonation across the penultimate, boundary, and post-boundary syllables along an X-Y two-dimensional scale. The majority of the syntactic pairs seemed to share the characteristic that the intonation tends to fluctuate from rising to falling while, simultaneously, the duration shows a short-long-short or a long-short-short pattern in the same penultimate-boundary-post-boundary syllable string.
ABSTRACT
Based on formant measurements of more than 10000 vowels from 16 German speakers, vowel quality differences between the speakers have been analyzed. The main result of this investigation is that different speakers not only show different formant values (as one would expect due to individual differences), but also exhibit different arrangements in their vowel systems. These differences are demonstrated by examples from the general distribution of vowels, the structure of vowel prototypes, and the differing number of degrees of tongue height for different speakers. The problem of vowel normalization for these data will also be demonstrated and discussed.
ABSTRACT
In this paper, we give a first account of speech tempo and its change in spontaneous speech in a very large database (Verbmobil, i.e., human-human appointment dialogs). As features representing speech tempo, we computed mean normalized speech duration (speaking rate) and normalized phone duration in different ways. The importance of these features is evaluated with an automatic classification of boundaries and accents where different sets of prosodic features (including also information about F0, energy, pause, etc.) were used. The best results (83% for accents, 88% for boundaries, two classes each) could be achieved when all features were used. For the second issue, change of tempo was labelled manually. We present the characterizing feature values for changes from slow to fast and from fast to slow, as well as the results of an automatic classification of change of tempo (72% for three classes). Finally, we discuss the possible function of change of tempo and its use in automatic speech processing.
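One common way to normalize phone durations is to express each duration as a z-score against the mean and standard deviation of that phone in a reference corpus, and to average these scores over a stretch of speech as a relative tempo measure. The sketch below assumes this scheme; it is not necessarily one of the normalizations used in the paper, and the names `phone_stats`, `normalized_durations` and `relative_tempo` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def phone_stats(corpus):
    """Per-phone (mean, standard deviation) from (phone, duration) pairs."""
    by_phone = defaultdict(list)
    for phone, duration in corpus:
        by_phone[phone].append(duration)
    return {p: (mean(d), pstdev(d)) for p, d in by_phone.items()}


def normalized_durations(segment, stats):
    """Z-score each phone duration against the statistics of its phone."""
    scores = []
    for phone, duration in segment:
        mu, sigma = stats[phone]
        # A phone with no variation in the reference data gets score 0.
        scores.append((duration - mu) / sigma if sigma > 0 else 0.0)
    return scores


def relative_tempo(segment, stats):
    """Mean normalized duration: above 0 is slower than average,
    below 0 is faster than average."""
    return mean(normalized_durations(segment, stats))
```

Normalizing per phone keeps intrinsically long phones, such as long vowels, from being mistaken for slow speech; a change of tempo could then be looked for as a shift in this value between adjacent stretches.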
ABSTRACT
This paper presents an inventory and relative frequency estimation of glides in the 527,190 word-form Standard Slovenian lexicon. Detailed acoustic-phonetic measurements for the four most frequent glides /ai/, /au/, /ou/, and /ei/ in stressed syllables are given. Inspection of their formant trajectory plots enabled measurements of the first four formants in the onset and offset steady-states. Normalized duration patterns for the onset steady-state, glide and offset steady-state are also given. Results represent a broader view of the recently published JIPA paper [4] and are an initial step towards the decision on the most appropriate allophonic symbols to be used in narrow transcription for the glides of Standard Slovenian.
ABSTRACT
The temporal organization of discourse has given rise to a great deal of work in several languages with different aims, from studies that identify cues to the planning of the linguistic message to studies that propose duration models for text-to-speech systems. This work is a first step towards the description of Catalan vowel duration. Within the Catalan vowel system, two subsystems can be distinguished according to stress: stressed vowels /i/, /e/, /E/, /a/, /O/, /o/, /u/, and unstressed vowels /i/, /u/, /@/. The purpose of the present study is to provide data for Catalan vowels in order to achieve a data-oriented description and, at the same time, a predictive model suitable for implementation in a TTS system.
ABSTRACT
This paper examines the phonology and phonetics of intonational patterns of vocatives functioning as calls in discourse. In addition, it defines the relevant discourse context for the study of the vocatives and examines the relations between discourse contexts and intonational patterns. The paper is based on a corpus of spontaneous speech. The analysis shows the existence of many different patterns (rises, falls, and levels). The presence of an interrogative vs a non-interrogative discourse context accounts for the occurrence of rises and falls respectively, while the level patterns exhibit context-neutrality. The paper concludes by stating that the vocatives with non-level intonation function as modal clues in discourse.
ABSTRACT
This paper presents an acoustic study of spontaneous and read Italian speech based on the analysis of monologues and corresponding read transcribed texts, each produced by three different subjects. The speaking styles were examined in terms of articulation, speech, fluency and word rate indices; typology of pauses and their cooccurrence; mean and range of F0 values; classification of phonetic events resulting from adjacency of vowels situated at word boundaries.
ABSTRACT
This paper focusses on the intonation of yes-no questions in a local non-standard variety of Italian: that spoken in Bari. It has been claimed in an early study [7, 8] that Bari Italian (BI) has a final rise on yes-no questions. However, subsequent accounts of BI [5, 6] have found a predominance of final falls on such questions. Since the former study was based on a corpus read aloud and the latter on spontaneous dialogues, it was decided to compare read and spontaneous questions produced by the same speaker. Spontaneous questions by six speakers were extracted from recordings of task-oriented dialogues and presented for reading, both in list form and in specially constructed contexts. It was found that the final tonal contour was predominantly falling in the spontaneous questions and predominantly rising in the corresponding read questions. These results throw light on the discrepancy in the literature as to the typical yes-no question intonation for this variety of Italian. It is argued that the falling final contour is more natural than the rising one since it is typical of spontaneous speech.
ABSTRACT
Both Standard Austrian German and the Austrian dialects are affected by an ongoing change which turns the diphthongs /aE/ and /AO/ into the monophthongs /E:/ and /O:/ respectively. However, this process shows different assimilation patterns according to the two main dialect regions in Austria: In the South Bavarian dialect region, the offset of the diphthong is assimilated towards the onset, whereas in the Middle Bavarian dialect region, the onset is assimilated towards the offset. The present study provides a detailed description of the diphthongs in both reading and spontaneous speech material. In order to give an answer to the question concerning the two different assimilation patterns, historical speech material of the late fifties has been analyzed additionally.
ABSTRACT
In English, the focus of a sentence is an important factor in determining the prosody of an utterance. Some linguistic analyses of focus [9][10][11] claim that (1) prosodic representation of focus is determined by pitch accents, (2) the distribution of pitch accents is determined by the size of the focus constituent, and (3) one prosodic realization may be ambiguous for several focus constituents. In this study, two experiments were conducted to test the interaction of focus with certain structures: verb phrases and noun phrases. Duration and f0 measurements within these phrases were analyzed, and a prosodic analysis was conducted. Results show that speakers tend to distinguish broad and narrow focus using several prosodic strategies, where different pitch accent types and patterns within the phrases signal the different focus conditions.
ABSTRACT
This study describes speech production experiments designed to determine the domain of accentual lengthening in Scottish English. Results suggest that accentual lengthening affects not only the syllable which bears the pitch accent (phrasal stress), but extends rightwards beyond this syllable. Secondly, the amount of lengthening on a syllable adjacent to a pitch accent appears to depend upon its membership in a pitch accented unit. Several candidates for the accentual-lengthening unit are entertained.
ABSTRACT
This paper presents the first results of a semantic-pragmatic model which assigns a specific label to the relevant words of dialogue utterances and predicts their F0 value. The originality of this work lies in the kind of utterances the model has been designed for: dialogue utterances. The labels of the model represent the degrees of both the expected/unexpected and known/unknown aspects of the lexical information, while the predicted value of F0 represents the corresponding weight of that information. The aim of this work is 1) to observe the real values of F0 for each label and 2) to compare the predictions of the model to the real values. The real values correspond to the 3 relevant F0 indices (maximum F0, DF0 and mean F0). In this paper, only levels 2 and 3 are discussed because they represent most of the population.
ABSTRACT
This paper examines double articulations in three African languages, Mamvu, Lese and Efe, all belonging to the Central Sudanic language family. The phonetic inventory of these languages exhibits some very interesting facts, among which the most striking are voiceless labiovelar stops involving a trilled release and a labiouvular stop which shows the combination of a voiceless and a voiced part in the same consonant. Acoustic and aerodynamic measurements describing the production of these sounds are presented.
ABSTRACT
In this paper we present part of the analysis performed on intonation for the Basque language. After a brief description of the most relevant characteristics of the language, the criteria for corpus compilation and speaker selection are described. Results of the analysis show the importance of the F0 drop in focus positioning. A first classification of the selected varieties is made according to the relationship between accent position and F0 values.