ICASSP '98
Abstract - SP14
SP14.1
Discriminative Training of Hidden Markov Models Using a Classification Measure Criterion
C. Chesta,
A. Girardi,
P. Laface (Politecnico di Torino, Italy);
M. Nigra (CSELT Torino, Italy)
This paper proposes the optimization of a non-standard objective function in the framework of Maximum Mutual Information Estimation (MMIE). In contrast with classical MMIE, where only misrecognized training utterances contribute to the optimization process, the proposed function naturally embeds the contributions of near-miss classifications, because it combines the probabilities of the competing models nonlinearly, with the combination tuned by a single parameter. This corrective training procedure has been applied to an isolated word recognition task, yielding significant performance improvements over both Maximum Likelihood Estimation and MMIE.
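The abstract does not give the exact objective, so the sketch below only illustrates the general idea of a single-parameter nonlinear combination of competing models. It assumes the generalized log-mean-exp form familiar from MCE training (not necessarily the authors' function): a large `eta` makes the combination track the best competitor, a small `eta` averages all competitors, and a sigmoid of the resulting margin lets near-misses contribute a nonzero loss. All function names are hypothetical.

```python
import math

def competitor_score(competitor_lls, eta):
    """(1/eta) * log(mean(exp(eta * l))) over competitor log-likelihoods l.

    Assumed form, for illustration: large eta approaches the single best
    competitor, small eta approaches the plain average of all competitors.
    """
    top = max(competitor_lls)  # shift by the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (l - top)) for l in competitor_lls) / len(competitor_lls)
    return top + math.log(mean_exp) / eta

def misclassification_measure(correct_ll, competitor_lls, eta):
    """Positive when the combined competitors outscore the correct model."""
    return competitor_score(competitor_lls, eta) - correct_ll

def smoothed_error(d, alpha=1.0):
    """Numerically stable sigmoid of d: a near-miss (d slightly below 0)
    still produces a nonzero loss, unlike a hard misrecognition count."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-alpha * d))
    z = math.exp(alpha * d)
    return z / (1.0 + z)

# Hypothetical log-likelihoods: the correct word narrowly beats its best rival.
competitors = [-10.0, -12.0, -30.0]
d = misclassification_measure(-9.8, competitors, eta=100.0)
print(d, smoothed_error(d))
```

Sweeping `eta` between these extremes is what makes one parameter control how strongly the weaker competitors influence training.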
SP14.2
A Discriminant Measure for Model Complexity Estimation
L. Bahl,
M. Padmanabhan (IBM, USA)
We present a new discriminant measure that can be used to assess the "goodness" of the acoustic models in a speech recognition system and to identify their shortcomings. We have used this measure to adapt the complexity of the acoustic model. Speech recognition systems generally model phones or sub-phonetic units with Gaussian mixtures, where the number of mixture components is chosen by a simple rule of thumb. The new measure selects the number of mixture components more objectively and improves error performance.
SP14.3
Natural Number Recognition Using MCE Trained Inter-Word Context Dependent Acoustic Models
M. Gandhi,
J. Jacob (Lucent Technologies, USA)
For applications that require number recognition, research has focused largely on connected digit recognizers. In this paper, we introduce an acoustic model topology for natural number recognition that applies minimum classification error (MCE) training to inter-word context dependent models of the head-body-tail (HBT) type. Experimental results on natural number applications involving dollar amounts and U.S. telephone numbers show that HBT models reduce string error rates on natural number data by as much as 25% relative to context independent whole-word models. In addition, for speech input consisting strictly of connected digits, the increase in string error rate is negligible when a natural number telephone grammar is used instead of a connected digit telephone grammar. This should help natural number recognition systems gain wider acceptance, because recognition accuracy is maintained while the user interface becomes more natural and flexible.
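The abstract does not detail the HBT topology. Assuming the usual head-body-tail scheme (taken from the general HBT literature, not confirmed by this abstract), each word is split into a head conditioned on the preceding word, a context independent body, and a tail conditioned on the following word. The hypothetical helper below expands a word string into such units, with silence as the context at utterance boundaries.

```python
def hbt_units(words, silence="sil"):
    """Expand a word string into head-body-tail unit names (hypothetical naming).

    head: depends on the preceding word (inter-word left context)
    body: context independent
    tail: depends on the following word (inter-word right context)
    """
    padded = [silence] + list(words) + [silence]
    units = []
    for i in range(1, len(padded) - 1):
        prev_w, word, next_w = padded[i - 1], padded[i], padded[i + 1]
        units.append(f"{prev_w}-{word}.head")
        units.append(f"{word}.body")
        units.append(f"{word}.tail+{next_w}")
    return units

print(hbt_units(["one", "two"]))
```

Under this assumption, a vocabulary of V words plus silence needs at most V(V+1) head models, V(V+1) tail models and V bodies, which is why the inter-word context is confined to the short word edges.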
SP14.4
Deterministically Annealed Design of Speech Recognizers and its Performance on Isolated Letters
A. Rao,
K. Rose,
A. Gersho (University of California, Santa Barbara, USA)
We address the general problem of HMM-based speech recognizer design, and in particular the problem of isolated letter recognition in the presence of background noise. The standard design method, based on maximum likelihood (ML), is known to perform poorly on isolated letter recognition. The more recent minimum classification error (MCE) approach directly targets the ultimate design criterion and offers substantial improvements over the ML method. However, the standard MCE method relies on gradient descent optimization, which is prone to getting trapped in shallow local minima. In this paper, we propose to overcome this difficulty with a powerful optimization method based on deterministic annealing (DA). The DA method minimizes a randomized MCE cost subject to a constraint on the level of entropy, which is gradually relaxed. It can be derived from either information-theoretic or statistical physics principles. DA has low implementation complexity and outperforms both standard ML and the gradient descent based MCE algorithm by a factor of 1.5 to 2.0 on the benchmark CSLU spoken letter database. Furthermore, the gains are maintained under a variety of background noise conditions.
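To make the "randomized MCE cost" concrete, the sketch below assumes a Gibbs (softmax) randomized decision rule with temperature `T` (an assumption; the paper's exact parameterization may differ). It only evaluates the expected 0-1 cost and the decision entropy for fixed, hypothetical discriminant scores at several temperatures; the parameter optimization that DA performs at each temperature is omitted. As `T` falls, the entropy shrinks and the randomized cost approaches the hard classification error that MCE ultimately targets.

```python
import math

def gibbs_probs(scores, temperature):
    """Randomized decision rule: P(j | x) proportional to exp(score_j / T)."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def randomized_cost(all_scores, labels, temperature):
    """Average expected 0-1 cost and average decision entropy (in nats)."""
    cost, entropy = 0.0, 0.0
    for scores, label in zip(all_scores, labels):
        p = gibbs_probs(scores, temperature)
        cost += 1.0 - p[label]  # probability of choosing a wrong class
        entropy -= sum(q * math.log(q) for q in p if q > 0.0)
    n = len(labels)
    return cost / n, entropy / n

def hard_error(all_scores, labels):
    """0-1 error of the deterministic argmax classifier (the T -> 0 limit)."""
    wrong = sum(1 for s, y in zip(all_scores, labels)
                if max(range(len(s)), key=s.__getitem__) != y)
    return wrong / len(labels)

# Hypothetical discriminant scores for three tokens and their true classes.
scores = [[2.0, 1.0, 0.0], [0.5, 1.5, 0.2], [1.0, 0.0, 3.0]]
labels = [0, 0, 2]
for T in (10.0, 1.0, 0.1, 0.01):
    cost, ent = randomized_cost(scores, labels, T)
    print(f"T={T:<5} cost={cost:.4f} entropy={ent:.4f}")
print("hard error:", hard_error(scores, labels))
```

At high temperature the cost surface is smooth, which is what lets an annealing schedule avoid the shallow local minima that plain gradient descent on the hard-limited cost can fall into.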
SP14.5
Speaker Adaptation for Hybrid MMI/Connectionist Speech Recognition Systems
J. Rottland,
C. Neukirchen,
G. Rigoll (Duisburg University, Germany)
In this paper we present a new adaptation technique for our hybrid large vocabulary continuous speech recognition system. Most adaptation approaches reestimate the HMM parameters. In our approach, however, we train a speaker independent continuous speech recognizer, keep its HMM parameters fixed, and train a second network that transforms the features of the adaptation data to fit those HMM parameters. Because fewer parameters have to be estimated, this approach performs well even with a small amount of adaptation data. On the Wall Street Journal (WSJ) task, it achieves a relative improvement in recognition rate of 16.5%.
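The idea of adapting the features instead of the HMMs can be illustrated with a much simpler stand-in for the paper's transformation network. The sketch below swaps in a per-dimension affine map fit by least squares to the means of the fixed speaker independent states, and it assumes a frame-to-state alignment of the adaptation data is already available. Only two parameters per feature dimension are estimated, however many HMM states there are. The function names and the least-squares criterion are assumptions, not the authors' training method.

```python
def fit_feature_transform(features, targets):
    """Per-dimension affine map y = a * x + b, fit by least squares.

    features: adaptation frames from the new speaker (list of vectors).
    targets:  for each frame, the mean vector of the fixed speaker
              independent HMM state it is aligned to (alignment assumed given).
    Returns one (a, b) pair per feature dimension; the HMMs are never touched.
    """
    n, dim = len(features), len(features[0])
    params = []
    for d in range(dim):
        xs = [f[d] for f in features]
        ts = [t[d] for t in targets]
        mx, mt = sum(xs) / n, sum(ts) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
        a = sxt / sxx if sxx > 0 else 1.0  # fall back to identity scaling
        params.append((a, mt - a * mx))
    return params

def apply_transform(params, frame):
    """Map one speaker frame into the space the fixed HMMs were trained on."""
    return [a * x + b for (a, b), x in zip(params, frame)]
```

A neural network, as in the paper, can represent nonlinear mappings, but the design choice is the same: the small transform absorbs the speaker mismatch while the large HMM parameter set stays fixed.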
SP14.6
Maximum Mutual Information Based Reduction Strategies for Cross-Correlation Based Joint Distributional Modeling
J. Bilmes (ICSI, USA)
In maximum-likelihood based speech recognition systems, it is important to accurately estimate the joint distribution of feature vectors given a particular acoustic model. In previous work, we showed that accuracy on this task can be boosted by modeling the joint distribution of time-localized feature vectors together with statistics that relate those feature vectors to their surrounding context. In this work, we evaluate information-preserving reduction strategies for those statistics. We claim that the statistics corresponding to spectro-temporal loci in speech with relatively large mutual information are the most useful for estimating the information contained in the joint distribution of feature vectors. Furthermore, we claim that such statistics are the most likely to generalize. Using an EM algorithm to compute the mutual information between pairs of points in the time-frequency grid, we verify these hypotheses with both overlap plots and speech recognition word error results.
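The quantity being ranked is the mutual information between two points of the time-frequency grid, for example band `a` at frame t and band `b` at frame t + lag, pooled over all frames. The paper estimates it with an EM algorithm; the sketch below swaps in a simple histogram (plug-in) estimator purely to show what is being measured. The helper names, the equal-width binning and the spectrogram layout (a list of frames of band values) are all assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI estimate (nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def quantize(values, n_bins):
    """Map real values to equal-width integer bins between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def grid_pair_mi(spectrogram, band_a, band_b, lag, n_bins=8):
    """MI between band_a at frame t and band_b at frame t + lag (lag >= 0)."""
    a = [frame[band_a] for frame in spectrogram[: len(spectrogram) - lag]]
    b = [frame[band_b] for frame in spectrogram[lag:]]
    return mutual_information(quantize(a, n_bins), quantize(b, n_bins))
```

Computing `grid_pair_mi` over many (band, band, lag) triples and keeping the largest values corresponds to the selection idea in the abstract; histogram estimates are biased for small samples, which is one reason to prefer a model-based estimator such as the EM approach the paper uses.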
SP14.7
Experiments of HMM Adaptation for Hands-free Connected Digit Recognition
D. Giuliani,
M. Matassoni,
M. Omologo,
P. Svaizer (ITC-IRST, Italy)
We investigate hands-free connected digit recognition in a noisy office environment. An array of six omnidirectional microphones and a corresponding time delay compensation module provide a beamformed signal as input to a Hidden Markov Model (HMM) based recognizer. Two phone HMM adaptation techniques are considered for reducing the mismatch between training and test conditions. Adaptation material and test material were collected in two different sessions. Results show that a digit accuracy close to 98% can be achieved when the talker is 1.5 m from the array, compared with 99.5% accuracy obtained using a close-talking microphone.
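The beamformed input described above can be illustrated with the simplest delay-and-sum scheme. This is a sketch under stated assumptions: delays are whole samples, each channel's delay is estimated against a reference channel by maximizing a plain cross-correlation sum (swapped in here; the abstract does not say which delay estimator the authors used), and the aligned channels are averaged. Function names are hypothetical.

```python
def estimate_delay(reference, channel, max_lag):
    """Integer lag (in samples) that maximizes the raw cross-correlation sum."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(reference[t] * channel[t + lag]
                  for t in range(len(reference)) if 0 <= t + lag < len(channel))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def delay_and_sum(channels, delays):
    """Align each channel by its delay (samples it lags the reference) and average.

    Samples shifted in from outside a channel are treated as zero.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d
            acc += ch[idx] if 0 <= idx < n else 0.0
        out.append(acc / len(channels))
    return out

# Toy example: the second microphone hears the same signal 2 samples later.
s = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
lag = estimate_delay(s, mic2, max_lag=3)
print(lag, delay_and_sum([s, mic2], [0, lag]))
```

Averaging time-aligned channels reinforces the talker's signal while uncorrelated noise partly cancels, which is the motivation for feeding the recognizer a beamformed signal rather than a single distant microphone.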
SP14.8
Task Independent Minimum Confusibility Training for Continuous Speech Recognition
A. Nogueiras-Rodríguez,
J. Mariño (Universitat Politecnica de Catalunya, Spain)
In this paper, a task independent discriminative training framework for continuous speech recognition based on subword units is presented. Rather than optimising a task independent figure of merit, such as the phone classification or recognition rate, we focus on reducing the number of errors the system commits once a task is defined. This consideration leads to a segmental approach based on minimising the confusibility over short chains of subword units. Using this framework, the string error rate for recognition of digit strings of unknown length can be reduced by 32% with task independent phone-like units.