Session Th4A Pronunciation Models

Chairperson: Jean Paul Haton, CRIN/CNRS-INRIA, France

AUTOMATIC RULE-BASED GENERATION OF WORD PRONUNCIATION NETWORKS

Authors: Nick Cremelie and Jean-Pierre Martens

ELIS, University of Gent, St.-Pietersnieuwstraat 41, B-9000 Gent (Belgium) E-mail: cremelie@elis.rug.ac.be

Volume 5 pages 2459 - 2462

ABSTRACT

In this paper a method for generating word pronunciation networks for speech recognition is proposed. The networks incorporate different acceptable pronunciation variants for each word. These variants are determined by applying pronunciation rules to the standard pronunciation of the words. Instead of a manual search, an automatic learning procedure is used to compose a sensible set of rules. The learning algorithm compares the standard pronunciation of each utterance in a training corpus with its auditory transcription (i.e. 'how should it be pronounced' versus 'how was it actually pronounced'). It is shown that the latter transcription can be constructed with the assistance of a speech recognizer. Experimental results on a Dutch database and on TIMIT demonstrate that the pronunciation networks reduce the word error rate significantly.

A0099.pdf

CREATING USER DEFINED NEW VOCABULARIES FOR VOICE DIALING

Authors: José María Elvira, Juan Carlos Torrecilla, Javier Caminero.

Speech Technology Group Telefónica Investigación y Desarrollo, Emilio Vargas 6, 28043 Madrid, Spain e-mail: (chema, jcarlos, jcam)@craso.tid.es

Volume 5 pages 2463 - 2466

ABSTRACT

This paper introduces a new approach for the generation of phonetic transcriptions for voice dialing applications, where on-line construction of user vocabularies is mandatory. The proposed method allows adaptive selection of new transcriptions and requires far fewer speech utterances for system training than other approaches. The new approach is compared with other classical approaches and shows a clear improvement in performance and efficiency.

A0167.pdf

AUTOMATIC GENERATION OF CONTEXT-DEPENDENT PRONUNCIATIONS

Authors: Ravishankar, M. and Eskenazi, M.

School of Computer Science Carnegie Mellon University, Pittsburgh, PA-15213, USA. Tel. +1 412 268 3344, FAX: +1 412 268 5576, E-mail: rkm@cs.cmu.edu

Volume 5 pages 2467 - 2470

ABSTRACT

We describe experiments in modelling the dynamics of fluent speech in which word pronunciations are modified by neighbouring context. Based on all-phone decoding of large volumes of training data, we automatically derive new word pronunciations and context-dependent transformation rules for phone sequences. In contrast to existing techniques, the rules can be applied even to words not in the training set, and across word boundaries, thus modelling context-dependent behavior. We use the technique on the Wall Street Journal (WSJ) training data and apply the new pronunciations and rules to WSJ and broadcast news tests. The changes correct a significant portion of the errors they could potentially correct. However, the transformations introduce a comparable number of new errors, indicating that perhaps stronger constraints on the application of such rules are needed.
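One simple way to picture how context-dependent transformation rules might be collected from decoded data is to align each canonical phone sequence with its decoded counterpart and count the differing spans together with their neighbouring phones. The Python sketch below does this with the standard-library `difflib.SequenceMatcher`; the alignment method, the rule tuple layout, and the function name `count_rules` are assumptions for illustration only and are not the procedure described in the abstract.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_rules(pairs):
    """Count candidate rules of the form (left, canonical_span, observed_span, right).

    `pairs` is an iterable of (canonical, observed) phone lists. The left and
    right contexts are the canonical phones adjacent to the changed span,
    with "#" standing for the start or end of the sequence.
    """
    counts = Counter()
    for canonical, observed in pairs:
        matcher = SequenceMatcher(a=canonical, b=observed, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue  # identical stretches yield no rule
            left = canonical[i1 - 1] if i1 > 0 else "#"
            right = canonical[i2] if i2 < len(canonical) else "#"
            rule = (left, tuple(canonical[i1:i2]), tuple(observed[j1:j2]), right)
            counts[rule] += 1
    return counts


if __name__ == "__main__":
    # Illustrative data: "did you" realised with a merged /jh/ across the word boundary.
    data = [
        (["d", "ih", "d", "y", "uw"], ["d", "ih", "jh", "uw"]),
        (["d", "ih", "d", "y", "uw"], ["d", "ih", "jh", "uw"]),
    ]
    for rule, n in count_rules(data).most_common():
        print(n, rule)
```

Because the sequences here span two words, the counted rule crosses the word boundary. A practical system would additionally need a frequency or reliability threshold before accepting a rule, which is one place where stronger constraints on rule application could be imposed.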

A0600.pdf

AUTOMATIC GENERATION OF A PRONUNCIATION DICTIONARY BASED ON A PRONUNCIATION NETWORK

Authors: Toshiaki Fukada and Yoshinori Sagisaka

ATR Interpreting Telecommunications Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan. Tel: +81 774 95 1301, FAX: +81 774 95 1308, E-mail: fukada@itl.atr.co.jp

Volume 5 pages 2471 - 2474

ABSTRACT

In this paper, we propose a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations (alternative pronunciations) from the canonical pronunciation. This method can generate multiple forms of alternative pronunciations using the pronunciation network for words that only occur a few times in the database and even for unseen words. Experimental results on spontaneous speech show that the automatically-derived pronunciation dictionaries give consistently higher recognition rates and require less computational time for recognition than a conventional dictionary.

A0798.pdf

WHAT IS WRONG WITH THE LEXICON – AN ATTEMPT TO MODEL PRONUNCIATIONS PROBABILISTICALLY

Authors: Uwe Jost, Henrik Heine and Gunnar Evermann

University of Hamburg, Vogt-Kölln-Str. 30, D-22527 Hamburg. E-mail: jost|heine|3everman @informatik.uni-hamburg.de

Volume 5 pages 2475 - 2478

ABSTRACT

We motivate the integration of a probabilistic pronunciation model into a system for recognizing spontaneous speech and propose a possible architecture of such a model. In order to develop an environment for experiments, a simplified version employing constrained phone recognition and discrete syllable-size HMM subword units was implemented and evaluated. Although the results are still significantly worse than those achieved by our "conventional" word recognizer, they are encouraging given that the experimental system is only a coarse approximation of the proposed approach.

A0832.pdf

Lexical Tuning Based On Triphone Confidence Estimation

Authors: K.L. Markey and W. Ward

Berdy Medical Systems, 4909 Pearl East Circle, Suite 202, Boulder, Colorado, USA 80301. Tel. 303-417-1603, FAX 303-417-1662, E-mail: markey@berdy.com; Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, USA 15213. Tel. 303-442-8807, FAX 303-417-1662, E-mail: whw@cs.cmu.edu

Volume 5 pages 2479 - 2482

ABSTRACT

We propose and test a practical means of finding poor pronunciations and missing variants for large lexicons. We do so by statistically assessing the confidence of each phone in each pronunciation and comparing it with the statistical distribution of the same confidence metric for corresponding phones over the entire training corpus. A phone is targeted for correction for each word in which its mean score is significantly less than the phone's mean score over the entire training corpus. Neighboring phones are also reviewed for their contribution to the target phone's poor score. Thus far, we have experimented with this technique by manually correcting the pronunciation. In experiments with Wall Street Journal and dictated physical examination corpora, word error rates were reduced commensurate with the number of dictionary entries whose pronunciations were corrected as a result of this process.
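The selection criterion described above (a phone's mean score within one word being significantly lower than its mean over the whole corpus) can be sketched as follows. The abstract does not state which significance test or confidence metric is used, so the one-sided z-score on the per-word mean, the threshold of -2.0, the input tuple layout, and the name `flag_phones` are all assumptions for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def flag_phones(observations, z_threshold=-2.0):
    """Flag (word, position, phone) slots whose mean score is unusually low.

    `observations` is an iterable of (word, position, phone, score) tuples,
    where `score` is any per-phone confidence value (higher means better).
    A slot is flagged when the z-score of its mean, measured against the
    corpus-wide distribution for the same phone, falls below `z_threshold`.
    """
    by_phone = defaultdict(list)  # all scores for a phone across the corpus
    by_slot = defaultdict(list)   # scores for a phone at one position in one word
    for word, position, phone, score in observations:
        by_phone[phone].append(score)
        by_slot[(word, position, phone)].append(score)

    flagged = []
    for (word, position, phone), scores in by_slot.items():
        corpus_mean = mean(by_phone[phone])
        corpus_std = pstdev(by_phone[phone])
        if corpus_std == 0:
            continue  # no variation in the corpus, nothing to compare against
        z = (mean(scores) - corpus_mean) / (corpus_std / sqrt(len(scores)))
        if z < z_threshold:
            flagged.append((word, position, phone, z))
    return sorted(flagged, key=lambda item: item[3])  # worst slots first


if __name__ == "__main__":
    # Illustrative scores: /t/ scores well in "ten" but poorly in "often".
    obs = [("ten", 0, "t", s) for s in (0.8, 0.9, 0.85, 0.9, 0.8, 0.9, 0.85, 0.9)]
    obs += [("often", 2, "t", s) for s in (0.2, 0.3, 0.25, 0.2)]
    for word, position, phone, z in flag_phones(obs):
        print(word, position, phone, round(z, 2))
```

In the illustrative data only the /t/ in "often" is flagged, which would point a lexicon editor at a candidate variant without that phone. The review of neighboring phones mentioned in the abstract is not modelled here.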

A0896.pdf