ICSLP'98 Proceedings
The Tilt Intonation Model
Paul Taylor, University of Edinburgh (U.K.)
Page (NA) Paper number 827
Abstract: The tilt intonation model facilitates automatic analysis and synthesis of intonation. The analysis algorithm detects intonational events in F0 contours and parameterises them in terms of continuously varying parameters. We describe the analysis system and give results for speaker-independent spontaneous dialogue speech. We then describe a synthesis algorithm which can generate F0 contours given a tilt parameterisation of an utterance. We give results showing how well the automatically produced contours match natural ones. The paper concludes with a discussion of the linguistic relevance of the tilt parameters and shows that they provide a useful and natural way of representing intonation.
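For illustration only, here is a minimal Python sketch of how a single pitch event could be collapsed into a tilt value from its rise and fall parts, assuming the commonly cited tilt formulation (the average of the normalised amplitude and duration asymmetries); the function and variable names are ours, not the paper's.

# Illustrative sketch, not the paper's code: reduce the rise and fall of a
# pitch event to one tilt value in [-1, 1] (+1 = pure rise, -1 = pure fall,
# 0 = symmetric rise-fall).  Assumes the usual amplitude/duration formulation.
def tilt(rise_amp, fall_amp, rise_dur, fall_dur):
    amp_term = (abs(rise_amp) - abs(fall_amp)) / (abs(rise_amp) + abs(fall_amp))
    dur_term = (rise_dur - fall_dur) / (rise_dur + fall_dur)
    return 0.5 * (amp_term + dur_term)

# Example: an accent rising 30 Hz over 0.12 s, then falling 10 Hz over 0.08 s.
print(tilt(30.0, -10.0, 0.12, 0.08))  # ~0.35, i.e. a predominantly rising event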
0355_01.WAV (was: sound355_01.wav) | Synthesized French sentence: 'Le petit canard apprend à nager' (The little duck learns to swim). File type: Sound. File format: WAV. Technical description: 16000 Hz, 16 bits/sample, mono, PCM, PC Windows WAV format. Creating application: Unknown. Creating OS: Unknown.
Halewijn Vereecken, ELIS, University of Ghent (Belgium)
Jean-Pierre Martens, ELIS, University of Ghent (Belgium)
Cynthia Grover, Lernout & Hauspie Speech Products NV (Belgium)
Justin Fackrell, Lernout & Hauspie Speech Products NV (Belgium)
Bert Van Coile, Lernout & Hauspie Speech Products NV (Belgium)
This contribution describes a method for the automatic prosodic labeling of multilingual speech data. The automatic labeler assigns a boundary strength between 0 and 3 to each word boundary, and a word prominence between 0 and 9 to each word. The speech signal and its orthographic representation are first transformed into feature vectors comprising acoustic and linguistic features such as pitch, duration, energy, part-of-speech, punctuation, word frequency and stress. Next, the feature vectors are mapped to prosodic labels via a cascade of multi-layer perceptrons. Experiments on six different languages demonstrate that combining acoustic with linguistic features yields better performance than is obtainable from acoustic features alone. We also present experiments in which we assess the influence of the quality of the underlying phonetic segmentation and labeling on the prosodic labeling performance.
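As an illustration of the cascade idea (not the authors' implementation), the sketch below chains two scikit-learn multi-layer perceptrons: the first predicts boundary strength from word-level features, and the second predicts prominence from those features plus the first stage's output. The feature set, network sizes and cascade ordering are assumptions.

# Hypothetical sketch of a two-stage MLP cascade for prosodic labelling.
# Random placeholder data stands in for the real acoustic/linguistic features.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_words, n_feats = 2000, 12                 # e.g. pitch, duration, energy, POS, punctuation, ...
X = rng.normal(size=(n_words, n_feats))
boundary = rng.integers(0, 4, n_words)      # boundary strength 0..3
prominence = rng.integers(0, 10, n_words)   # word prominence 0..9

# Stage 1: predict boundary strength from the acoustic + linguistic features.
stage1 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, boundary)

# Stage 2: predict prominence from the same features plus the stage-1 output.
X2 = np.hstack([X, stage1.predict(X).reshape(-1, 1)])
stage2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X2, prominence)

new_word = rng.normal(size=(1, n_feats))
b = stage1.predict(new_word)
p = stage2.predict(np.hstack([new_word, b.reshape(-1, 1)]))
print(b, p)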
Helen Wright, Centre for Speech Technology Research, University of Edinburgh (U.K.)
The goal of the work presented here is to automatically predict the type of an utterance in spoken dialogue using automatically extracted suprasegmental information. For this task we present and compare three stochastic algorithms: hidden Markov models, artificial neural networks, and classification and regression trees. These models are easily trainable, reasonably robust, and fit into the probabilistic framework required for speech recognition. Utterance type detection rests on the assumption that different types of utterances have different suprasegmental characteristics. The categorisation of these utterance types is based on the theory of conversation games and consists of 12 move types (e.g. reply to a question, wh-question, acknowledgement). The system is speaker independent and is trained on spontaneous goal-directed dialogue collected from Canadian speakers. This utterance type detector is used in an automatic speech recognition system to reduce the word error rate.
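A minimal sketch of one of the three classifier types (CART, here via scikit-learn's decision tree) applied to per-utterance suprasegmental features; the feature set and the subset of move labels are illustrative assumptions, not the paper's.

# Illustrative sketch only: a CART classifier over suprasegmental features,
# standing in for one of the three models compared in the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
moves = ["instruct", "query-w", "acknowledge", "reply"]  # assumed subset of the 12 move types
n_utts, n_feats = 500, 6      # e.g. F0 mean/range, duration, energy, speaking rate, pause length
X = rng.normal(size=(n_utts, n_feats))
y = rng.choice(moves, size=n_utts)

cart = DecisionTreeClassifier(max_depth=5).fit(X, y)

# Class scores for a new utterance could then be combined with the
# recogniser's language model to constrain the word search.
print(cart.predict_proba(X[:1]))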