Session Th2A Hybrid Systems for ASR

Chairperson Shigeki Sagayama NTT Human Interface Labs, Japan

MATCHING TRAINING AND TESTING CRITERIA IN HYBRID SPEECH RECOGNITION SYSTEMS

Authors: Xin Tu, Yonghong Yan, Ron Cole

E-mail: (xintu/yan/cole)@cse.ogi.edu Center for Spoken Language Understanding, Oregon Graduate Institute P.O. Box 91000, Portland, OR 97291-1000, USA

Volume 4 pages 1943 - 1946

ABSTRACT

Inconsistency between training and testing criteria is a drawback of the hybrid artificial neural network and hidden Markov model (ANN/HMM) approach to speech recognition. This paper presents an effective method to address this problem by modifying the feedforward neural network training paradigm. Word errors are explicitly incorporated in the training procedure to achieve improved word recognition accuracy. Experiments on a continuous digit database show a reduction in word error rate of more than 17% using the proposed method.

A0108.pdf

CONTEXT INDEPENDENT AND CONTEXT DEPENDENT HYBRID HMM/ANN SYSTEMS FOR VOCABULARY INDEPENDENT TASKS

Authors: S. Dupont, C. Ris, O. Deroo, V. Fontaine, J.M. Boite & L. Zanoni

Faculté Polytechnique de Mons, TCTS, 31 Bld. Dolez, B-7000 Mons, Belgium. Email: {dupont,ris,deroo,fontaine,boite,zanoni}@tcts.fpms.ac.be

Volume 4 pages 1947 - 1950

ABSTRACT

In this paper, hybrid HMM/ANN systems are used to model context dependent phones. In order to reduce the number of parameters as well as to better capture the dynamics of the phonetic segments, we combine (context dependent) diphone models with context independent phone models. Transitions from phone to phone are modeled as generalized context dependent distributions, while phonetic units are context independent models trained on the less coarticulated middle part of each phone. Words are thus modeled as a sequence of probability distributions alternately representing the middle part of the phonemes and the transitions from phone to phone. A single neural network is used to estimate both context independent phone probabilities and generalized context dependent diphone (phone to phone transition) probabilities. The resulting systems are compared to classical context independent phone-based HMM/ANN systems with the same number of parameters. The Phonebook isolated word database has been used for training the systems. Testing is done on small (75 words), medium (600 words) and large (8000 words) lexicons. Test words were not present in the training vocabulary.
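
The word model described in this abstract can be pictured with a small sketch. It only shows the alternation between phone-middle units and phone-to-phone transition units. The unit naming (`mid(...)`, `trans(...)`), the function name, and the choice to omit transitions at word boundaries are assumptions made for illustration; the clustering of contexts into generalized classes mentioned in the abstract is not shown.

```python
def word_to_units(phones):
    """Expand a phone sequence into alternating middle and transition units.

    Hypothetical naming: "mid(p)" stands for the context independent
    middle part of phone p, "trans(p->q)" for the transition from p to q.
    """
    units = []
    for i, phone in enumerate(phones):
        units.append(f"mid({phone})")
        # A transition unit is added between each pair of adjacent phones.
        if i + 1 < len(phones):
            units.append(f"trans({phone}->{phones[i + 1]})")
    return units


# Example with an illustrative three-phone word.
example_units = word_to_units(["k", "ae", "t"])
```

A word of N phones yields N middle units and N - 1 transition units under these assumptions.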

A0213.pdf

ESTIMATION OF GLOBAL POSTERIORS AND FORWARD-BACKWARD TRAINING OF HYBRID HMM/ANN SYSTEMS

Authors: J. Hennebert(5,2), C. Ris(1), H. Bourlard(3,2), S. Renals(4) and N. Morgan(2)

(1) TCTS, FPMs, B-7000 Mons, Belgium (2) ICSI, Berkeley CA 94704, USA (3) IDIAP, 1920 Martigny, Switzerland (4) Computer Science, University of Sheffield, Sheffield S1 4DP, UK (5) CIRC, EPFL, 1015 Lausanne, Switzerland

Volume 4 pages 1951 - 1954

ABSTRACT

The results of our research presented in this paper are two-fold. First, an estimation of global posteriors is formalized in the framework of hybrid HMM/ANN systems. It is shown that hybrid HMM/ANN systems, in which the ANN part estimates local posteriors, can be used to model global model posteriors. This formalization provides us with a clear theory in which both REMAP and "classical" Viterbi trained hybrid systems are unified. Second, a new forward-backward training of hybrid HMM/ANN systems is derived from the previous formulation. Comparisons of performance between Viterbi and forward-backward hybrid systems are presented and discussed.
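
As background for the contrast between Viterbi and forward-backward training, the sketch below runs the textbook forward-backward recursions and returns per-frame state occupation probabilities. It is a generic illustration, not the derivation in the paper. The assumptions are that `emissions[t][s]` holds a non-negative per-frame score for state `s` (in a hybrid system this would typically be derived from the ANN outputs), that `transitions` and `initial` are given, and that sequences are short enough to skip numerical scaling.

```python
def forward_backward(emissions, transitions, initial):
    """Return gamma[t][s], the probability of state s at frame t given all frames."""
    num_frames = len(emissions)
    num_states = len(initial)
    alpha = [[0.0] * num_states for _ in range(num_frames)]
    beta = [[0.0] * num_states for _ in range(num_frames)]

    # Forward pass.
    for s in range(num_states):
        alpha[0][s] = initial[s] * emissions[0][s]
    for t in range(1, num_frames):
        for s in range(num_states):
            incoming = sum(alpha[t - 1][r] * transitions[r][s]
                           for r in range(num_states))
            alpha[t][s] = emissions[t][s] * incoming

    # Backward pass.
    for s in range(num_states):
        beta[num_frames - 1][s] = 1.0
    for t in range(num_frames - 2, -1, -1):
        for r in range(num_states):
            beta[t][r] = sum(transitions[r][s] * emissions[t + 1][s] * beta[t + 1][s]
                             for s in range(num_states))

    # Normalize alpha * beta per frame to obtain occupation probabilities.
    gamma = []
    for t in range(num_frames):
        unnormalized = [alpha[t][s] * beta[t][s] for s in range(num_states)]
        total = sum(unnormalized)
        gamma.append([value / total for value in unnormalized])
    return gamma
```

A Viterbi alignment would instead assign each frame to exactly one state, whereas the values returned here are soft and sum to one per frame.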

A0233.pdf

CONFIDENCE MEASURES FOR HYBRID HMM/ANN SPEECH RECOGNITION

Authors: Gethin Williams and Steve Renals

Dept. of Computer Science, University of Sheffield, Sheffield S1 4DP, UK {g.williams,s.renals}@dcs.shef.ac.uk

Volume 4 pages 1955 - 1958

ABSTRACT

In this paper we introduce four acoustic confidence measures which are derived from the output of a hybrid HMM/ANN large vocabulary continuous speech recognition system. These confidence measures, based on local posterior probability estimates computed by an ANN, are evaluated at both phone and word levels, using the North American Business News corpus.
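
To make the idea of a confidence measure built from local posteriors concrete, here is a minimal sketch of two generic scores computed over the frames of a hypothesized segment: the duration-normalized log posterior of the hypothesized label, and the average per-frame entropy of the posterior distribution. These are illustrative choices and are not claimed to be the four measures introduced in the paper; the function names and the data layout (`posteriors[t][k]` as the ANN estimate for class `k` at frame `t`) are assumptions.

```python
import math


def mean_log_posterior(posteriors, label, start, end):
    """Average log posterior of `label` over frames start..end-1 (higher is more confident)."""
    frames = posteriors[start:end]
    return sum(math.log(frame[label]) for frame in frames) / len(frames)


def mean_entropy(posteriors, start, end):
    """Average per-frame entropy over frames start..end-1 (lower is more confident)."""
    frames = posteriors[start:end]
    total = 0.0
    for frame in frames:
        # Zero-probability classes contribute nothing to the entropy.
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(frames)
```

A word-level score could be obtained by applying the same functions to the frame range covered by the word, under the same assumptions.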

A0237.pdf

ENSEMBLE METHODS FOR CONNECTIONIST ACOUSTIC MODELLING

Authors: G.D. Cook, S.R. Waterhouse, A.J. Robinson

Cambridge University Engineering Department Trumpington Street, Cambridge, UK.

Volume 4 pages 1959 - 1962

ABSTRACT

In this paper we investigate a number of ensemble methods for improving the performance of connectionist acoustic models for large vocabulary continuous speech recognition. We discuss boosting, a data selection technique which results in an ensemble of models, and mixtures-of-experts. These techniques have been applied to multi-layer perceptron acoustic models used to build a hybrid connectionist-HMM speech recognition system. We present results on a number of ARPA benchmark tasks, and show that the ensemble methods lead to considerable improvements in recognition accuracy.
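
The combination step shared by such ensembles can be sketched as a weighted sum of the per-frame class posteriors produced by each model. In a mixtures-of-experts setting the weights would come from a gating network; here they are passed in directly, and uniform weights reduce to plain averaging. The function name and data layout are assumptions for illustration, and the training procedures themselves (boosting, gating network estimation) are not shown.

```python
def combine_posteriors(expert_posteriors, gate_weights):
    """Weighted combination of class posteriors from several models for one frame."""
    if len(expert_posteriors) != len(gate_weights):
        raise ValueError("one weight is required per expert")
    weight_sum = sum(gate_weights)
    num_classes = len(expert_posteriors[0])
    combined = [0.0] * num_classes
    for posterior, weight in zip(expert_posteriors, gate_weights):
        for k, p in enumerate(posterior):
            # Weights are renormalized so the result is again a distribution.
            combined[k] += (weight / weight_sum) * p
    return combined
```

For example, two models with equal weights and outputs `[0.8, 0.2]` and `[0.4, 0.6]` combine to approximately `[0.6, 0.4]`.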

A0400.pdf

IMPROVING PERFORMANCE ON SWITCHBOARD BY COMBINING HYBRID HME/HMM AND MIXTURE OF GAUSSIANS ACOUSTIC MODELS

Authors: Jurgen Fritsch, Michael Finke

{fritsch,finkem}@ira.uka.de Interactive Systems Laboratories University of Karlsruhe --- Germany Carnegie Mellon University --- USA

Volume 4 pages 1963 - 1966

ABSTRACT

This paper presents results of our efforts on combining standard mixture of Gaussians acoustic modeling [10] with a context-dependent hybrid connectionist HME/HMM architecture [3, 4] for the Switchboard corpus. Using a score normalization scheme which is independent of the stream's modeling paradigm and adaptive methods for combining multiple probability distributions, we achieve a relative decrease in word error rate of 3.5% and 9.3%, compared to each of the single stream systems. As opposed to multiple acoustic streams based on mixture of Gaussians, the integration of hybrid NN/HMM based modeling appears to be advantageous, since the differences in modeling techniques and training algorithms allow different aspects of the speech signal to be captured. Small dependence among emission probability estimates is considered essential for potential gains in interpolated systems.
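
One common way to interpolate two acoustic streams is a weighted sum of their per-state log scores. The sketch below shows that operation with a single fixed weight. It is an assumption-laden illustration: the abstract describes adaptive combination and a score normalization scheme whose details are not given here, so the fixed `weight_a`, the function name, and the premise that both inputs are already normalized log scores are all hypothetical.

```python
def interpolate_log_scores(log_scores_a, log_scores_b, weight_a):
    """Log-domain interpolation of two streams' per-state scores for one frame."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(log_scores_a) != len(log_scores_b):
        raise ValueError("both streams must score the same set of states")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b
            for a, b in zip(log_scores_a, log_scores_b)]
```

With `weight_a` set to 1.0 or 0.0 the result falls back to a single stream, which gives a simple baseline when tuning the weight.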

A0931.pdf