Session W2C Prosody and Speech Recognition/Understanding

Chairperson Jan van Santen Bell Labs Lucent, USA

ESTIMATING PROSODIC WEIGHTS IN A SYNTACTIC-RHYTHMICAL PREDICTION SYSTEM

Authors: Philippe Langlais

LIA, 339 chemin des Meinajaries, BP 1228, 84911 Avignon Cedex 9, France langlais@univ-avignon.fr

Volume 3 pages 1467 - 1470

ABSTRACT

This paper concerns the study of information derived from the melodic, temporal and intensity characteristics of the material to be recognized by a speech recognition system for French. More precisely, it describes experiments we carried out at the suprasegmental level with a system that performs automatic correlation between prosodic labels and the linguistic organization of a message to decode. First, an overview of the system is given, along with the results of experiments carried out to determine which prosodic indexes are best suited for syntactic and rhythmical prediction.

A0015.pdf

SYNTACTIC INFORMATION CONTAINED IN PROSODIC FEATURES OF JAPANESE UTTERANCES

Authors: Kazuhiko Ozeki, Kazuyuki Kousaka, and Yujie Zhang

The University of Electro-Communications 1-5-1 Chofugaoka, Chofu, Tokyo, 182 Japan {ozeki, kousaka, zhang}@achilleus.cs.uec.ac.jp

Volume 3 pages 1471 - 1474

ABSTRACT

This paper is concerned with measuring the amount of syntactic information contained in prosodic features of Japanese utterances. Five prosodic features are employed, and the statistical relationship between those features and the inter-phrase dependency distance is estimated by using training data. Then parsing experiments are conducted in two different ways: one utilizing the posterior distribution of the inter-phrase dependency distance given the prosodic feature values, and the other without using such information. It is shown that a significant improvement in parsing accuracy is attained by utilizing the prosodic information, and that the duration of the pause between adjacent phrases is more effective than prosodic features related to the fundamental frequency and the power.

A0450.pdf

HIERARCHICAL DURATION MODELLING FOR SPEECH RECOGNITION USING THE ANGIE FRAMEWORK

Authors: Grace Chung and Stephanie Seneff

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 USA http://www.sls.lcs.mit.edu, mailto:{graceyc, seneff}@mit.edu

Volume 3 pages 1475 - 1478

ABSTRACT

We describe a novel hierarchical duration model for speech recognition. The modelling scheme is based on the ANGIE framework, a flexible unified sublexical representation for speech applications. Our duration model captures contextual factors that influence the duration of sublexical units at multiple linguistic levels simultaneously, using both relative and absolute duration information. The modelling procedure involves a normalization scheme which produces a new measure of relative speaking rate at the word level. This may be used to explore phenomena in speech timing, and we present studies on secondary effects of speaking rate here. The duration model demonstrates its ability to aid speech recognition in phonetic recognition experiments, where it has yielded a relative improvement of up to 7.7%. In word spotting, a study employing duration as a post-processor to disambiguate between two acoustically similar keywords reduces relative error by 68%. Furthermore, a fully integrated duration model in an ANGIE-based word spotter improves performance by 21.5%. All gains are over and above any gains realized from standard phone duration models present in the baseline system. All experiments were conducted in the ATIS domain, using continuous spontaneous speech.

A0585.pdf

ON THE USE OF PROSODY IN A SPEECH-TO-SPEECH TRANSLATOR

Authors: Volker Strom (1), Anja Elsner (1), Wolfgang Hess (1), Walter Kasper (4), Alexandra Klein (2), Hans Ulrich Krieger (4), Jorg Spilker (3), Hans Weber (3), Gunther Gorz (3)

e-mail: vst@asl1.ikp.uni-bonn.de (1) Institute of Communications Research and Phonetics (IKP), University of Bonn, (2) University of Wien, Austrian Research Institute of Artificial Intelligence (3) University of Erlangen-Nurnberg, Computer Science Institute (AI) (4) German Research Center for AI, DFKI GmbH, Saarbrucken

Volume 3 pages 1479 - 1482

ABSTRACT

In this paper a speech-to-speech translator from German to English is presented. Besides the traditional processing steps, it takes advantage of acoustically detected prosodic phrase boundaries and focus. The prosodic phrase boundaries reduce the search space during syntactic parsing and rule out analysis trees during semantic parsing. The prosodic focus facilitates a "shallow" translation based on the best word chain in cases where the deep analysis fails.

A0733.pdf

AUTOMATIC RECOGNITION OF SENTENCE TYPE FROM PROSODY IN DUTCH

Authors: Vincent J. van Heuven*, Judith Haan** and Jos J.A. Pacilly*

*Phonetics Laboratory, Department of Linguistics and Holland Institute of Generative Linguistics, Leiden University Cleveringaplaats 1, PO Box 9515, 2300 RA Leiden, The Netherlands **Department of General Linguistics and Dialectology, Centre for Language Studies, Nijmegen University Erasmusplein 1, PO Box 9103, 6500 HD Nijmegen, The Netherlands

Volume 3 pages 1483 - 1486

ABSTRACT

This paper investigates to what extent statements, Wh-questions, Yes/No-questions and declarative questions in Dutch can be automatically discriminated on the basis of global and local F0 parameters. Global parameters were the slope and mean pitch of upper and lower trend lines that were fitted through F0 curves; local parameters were onset and offset F0 of a terminal question-marking pitch rise. Results indicate that women mark the interrogative status of a sentence more often and perceptually more saliently. Generally, global downtrend parameters are better predictors of sentence type than parameters of the final rise.

A0749.pdf

AUTOMATIC WORD DEMARCATION BASED ON PROSODY

Authors: Paul Munteanu, Bertrand Caillaud, Jean-François Serignat, Geneviève Caelen-Haumont

Laboratoire CLIPS/IMAG, CNRS, Université Joseph Fourier, INPG 38041 Grenoble CEDEX 9, France Tel : +33 4 76 51 45 26, Fax : +33 4 76 44 66 75

Volume 3 pages 1487 - 1490

ABSTRACT

This paper presents work on the acquisition of the prosodic knowledge that will be incorporated in the Word Prosody agent of a distributed speech understanding system (MICRO). The multi-agent architecture of MICRO, based on wholistic analytic double processing, is first described. MICRO takes a rather new view of prosody. This group of agents quickly produces information that will be used by the analytic pathway (acoustic-phonetic analysis, lexical access, syntactic and semantic analysis, ...) as anchor points or for filtering or sorting lexical hypotheses. We discuss the role of the Word Prosody agent in this architecture and the requirements this imposes on its design. Then, we present some experiments conducted to decipher the prosodic encoding of word boundaries and lexical categories.

A0910.pdf
