ABSTRACT
In this paper a method for generating word pronunciation networks for speech recognition is proposed. The networks incorporate different acceptable pronunciation variants for each word. These variants are determined by applying pronunciation rules to the standard pronunciation of the words. Instead of a manual search, an automatic learning procedure is used to compose a sensible set of rules. The learning algorithm compairs the standard pronunciation of each utterance in a training corpus with its auditory transcription (i.e. 'how should it be pronounced' versus 'how was it actually pronounced'). It is shown that the latter transcription can be constructed with the assistance of a speech recognizer. Experimental results on a Dutch database and on TIMIT demonstrate that the pronunciation networks reduce the word error rate significantly.
ABSTRACT
This paper introduces a new approach for generation of phonetic transcriptions for voice dialing applications. where on-line construction of user vocabularies is mandatory. The proposed method allows adaptive selection of new transcriptions requiring much less speech utterances for system training than other approaches. The new approach is compared to other classical approaches showing a clear improvement on performance and efficiency.
ABSTRACT
We describe experiments in modelling the dynamics of fluent speech in which word pronunciations are modified by neighbouring context. Based on all-phone decoding of large volumes of training data, we automatically derive new word pronunciation, and context-dependent transformation rules for phone sequences. In contrast to existing techniques, the rules can be applied even to words not in the training set, and across word boundaries, thus modelling context-dependent behavior. We use the technique on the Wall Street Journal (WSJ) training data and apply the new pronunciations and rules to WSJ and broadcast news tests. The changes correct a significant portion of the errors they could potentially correct. But the transformations introduce a comparable number of new errors, indicating that perhaps stronger constraints on the application of such rules are needed.
ABSTRACT
In this paper, we propose a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations (alternative pronunciations) from the canonical pronunciation. This method can generate multiple forms of alternative pronunciations using the pronunciation network for words that only occur a few times in the database and even for unseen words. Experimental results on spontaneous speech show that the automatically-derived pronunciation dictionaries give consistently higher recognition rates and require less computational time for recognition than a conventional dictionary.
ABSTRACT
We motivate the integration of a probabilistic pronunciation model into a system for recognizing spontaneous speech and propose a possible architecture of such a model. In order to develop an environment for experiments, a simplified version employing constrained phone recognition and discrete syllable-size HMM subword units was implemented and evaluated. Although the results are still significantly worse than those achieved by our "conventional" word recognizer, they are encouraging given that the experimental system is only a coarse approximation of the proposed approach.
ABSTRACT
We propose and test a practical means of finding poor pronunciations and missing variants for large lexicons. We do so by statistically assessing the confidence of each phone in each pronunciation and comparing it with the statistical distribution of the same confidence metric for corresponding phones over the entire training corpus. A phone is targeted for correction for each word in which its mean score is significantly less than the phone's mean score over the entire training corpus. Neighboring phones are also reviewed for their contribution to the target phone's poor score. Thus far, we have experimented with this technique by manually correcting the pronunciation. In experiments with Wall Street Journal and dictated physical examination corpora, word error rates were reduced commensurate with the number of dictionary entries whose pronunciations were corrected as result of this process.