ABSTRACT
This paper studies the information that can be derived from the melodic, temporal and intensity characteristics of the material to be recognized in a French speech recognition system. More precisely, it describes experiments we conducted at the suprasegmental level with a system that performs automatic correlation between prosodic labels and the linguistic organization of the message to be decoded. First, an overview of the system is given, along with the results of experiments carried out to determine which prosodic indexes are best suited for syntactic and rhythmic prediction.
ABSTRACT
This paper is concerned with measuring the amount of syntactic information contained in prosodic features of Japanese utterances. Five prosodic features are employed, and the statistical relationship between those features and the inter-phrase dependency distance is estimated from training data. Parsing experiments are then conducted in two different ways: one utilizing the posterior distribution of the inter-phrase dependency distance given the prosodic feature values, and the other without such information. The results show that utilizing the prosodic information significantly improves parsing accuracy, and that the duration of the pause between adjacent phrases is more effective than prosodic features related to the fundamental frequency and the power.
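As a rough illustration of what a posterior distribution of dependency distance given a prosodic feature might look like, the sketch below estimates it from counts over a single feature, the pause duration. The abstract does not say how the distribution is estimated, so the binning, the bin edges, the add-alpha smoothing, the distance cap and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def pause_bin(pause_ms, edges=(50, 200)):
    """Map a pause duration (ms) to a coarse bin index; the edges are hypothetical."""
    return sum(pause_ms >= e for e in edges)


def fit_posterior(samples, edges=(50, 200), max_dist=3, alpha=1.0):
    """Estimate P(distance | pause bin) from (pause_ms, distance) pairs.

    Distances above max_dist are pooled into max_dist, and add-alpha
    smoothing keeps unseen distances from getting zero probability.
    """
    counts = defaultdict(Counter)
    for pause_ms, dist in samples:
        counts[pause_bin(pause_ms, edges)][min(dist, max_dist)] += 1

    posterior = {}
    for b in range(len(edges) + 1):
        total = sum(counts[b].values()) + alpha * max_dist
        posterior[b] = {
            d: (counts[b][d] + alpha) / total for d in range(1, max_dist + 1)
        }
    return posterior


# Toy (pause_ms, dependency distance) pairs, invented for illustration only.
train = [(10, 1), (20, 1), (30, 1), (120, 1), (150, 2), (300, 2), (400, 3), (350, 3)]
post = fit_posterior(train)
```

A parser could multiply such probabilities into the score of each candidate dependency structure, while the baseline condition would simply omit them.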
ABSTRACT
We describe a novel hierarchical duration model for speech recognition. The modelling scheme is based on the ANGIE framework, a flexible unified sublexical representation for speech applications. Our duration model captures contextual factors that influence duration of sublexical units at multiple linguistic levels simultaneously, using both relative and absolute duration information. The modelling procedure involves a normalization scheme which produces a new measure for relative speaking rate at a word level. This may be used to explore phenomena in speech timing and we present studies on secondary effects of speaking rate here. This duration model demonstrates its ability to aid speech recognition in phonetic recognition experiments where it has yielded a relative improvement of up to 7.7%. In word spotting, a study employing duration as a post-processor in disambiguating between 2 acoustically similar keywords reduces relative error by 68%. Furthermore, a fully integrated duration model in an ANGIE-based word spotter improves performance by 21.5%. All gains are over and above any gains realized from standard phone duration models present in the baseline system. All experiments were conducted in the ATIS domain, using continuous spontaneous speech.
ABSTRACT
This paper presents a speech-to-speech translator from German to English. Besides the traditional processing steps, it takes advantage of acoustically detected prosodic phrase boundaries and focus. The prosodic phrase boundaries reduce the search space during syntactic parsing and rule out analysis trees during semantic parsing. The prosodic focus facilitates a "shallow" translation based on the best word chain in cases where the deep analysis fails.
ABSTRACT
This paper investigates to what extent statements, Wh-questions, Yes/No-questions and declarative questions in Dutch can be automatically discriminated on the basis of global and local F0 parameters. Global parameters were the slope and mean pitch of upper and lower trend lines that were fitted through F0 curves; local parameters were onset and offset F0 of a terminal question-marking pitch rise. Results indicate that women mark the interrogative status of a sentence more often and perceptually more saliently. Generally, global downtrend parameters are better predictors of sentence type than parameters of the final rise.
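The global parameters described above can be sketched as follows. The abstract does not state how the trend lines were fitted, so this example assumes one plausible procedure: an ordinary least-squares line through the local F0 maxima (upper line) and another through the local minima (lower line). The function names and the synthetic contour are hypothetical.

```python
def fit_line(points):
    """Least-squares line through (time, f0) points.

    Returns (slope, mean_pitch). The mean pitch is the mean of the anchor
    values, which equals the fitted line's mean over the anchor times.
    Needs at least two points with distinct times.
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in points)
    return sxy / sxx, mean_f


def trend_lines(times, f0):
    """Fit upper and lower trend lines through local F0 peaks and valleys."""
    idx = range(1, len(f0) - 1)
    peaks = [(times[i], f0[i]) for i in idx if f0[i - 1] < f0[i] > f0[i + 1]]
    valleys = [(times[i], f0[i]) for i in idx if f0[i - 1] > f0[i] < f0[i + 1]]
    return fit_line(peaks), fit_line(valleys)


# Synthetic declining contour, invented for illustration (time in s, F0 in Hz).
times = [i / 10 for i in range(9)]
f0 = [180, 220, 175, 210, 170, 200, 165, 190, 160]
(upper_slope, upper_mean), (lower_slope, lower_mean) = trend_lines(times, f0)
```

Negative slopes on both lines correspond to a global downtrend; the four values could then serve as features for a sentence-type classifier.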
ABSTRACT
This paper presents work on the acquisition of the prosodic knowledge that will be incorporated in the Word Prosody agent of a distributed speech understanding system (MICRO). The multiagent architecture of MICRO, based on holistic-analytic double processing, is first described. MICRO takes a rather new view of prosody. This group of agents quickly produces information that will be used by the analytic pathway (acoustic-phonetic analysis, lexical access, syntactic and semantic analysis, ...) as anchor points or for filtering or sorting lexical hypotheses. We discuss the role of the Word Prosody agent in this architecture and the requirements this imposes on its design. We then present experiments carried out to decipher the prosodic encoding of word boundaries and lexical categories.