Large Vocabulary Continuous Speech Recognition 4

ICSLP'98 Proceedings

High Resolution Decision Tree based Acoustic Modeling beyond CART

Authors:

Wu Chou, Bell Labs., Lucent Technologies (USA)
Wolfgang Reichl, Bell Labs., Lucent Technologies (USA)

Page (NA) Paper number 607

Abstract:

In this paper, an m-level optimal subtree based phonetic decision tree clustering algorithm is described. Unlike prior approaches, the proposed m-level optimal subtree generates log-likelihood estimates using multiple-mixture Gaussians for phonetic decision tree based state tying. This provides a more accurate model of the log-likelihood variations in node splitting, and it is consistent with the acoustic-space partition induced by the set of phonetic questions applied during decision tree state tying. To reduce the algorithmic complexity, a caching scheme based on previous search results is also described. It significantly speeds up the construction of the m-level optimal subtree without degrading recognition performance, making the proposed approach suitable for large vocabulary speech recognition tasks. Experimental results on a standard (Wall Street Journal) speech recognition task indicate that the proposed m-level optimal subtree approach outperforms the conventional approach of using single-mixture Gaussians in phonetic decision tree based state tying.
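
The quantity at the heart of decision-tree state tying is the log-likelihood gain of a node split. The sketch below illustrates the conventional single-Gaussian criterion that the paper improves upon (its m-level subtree replaces this single-Gaussian score with a multiple-mixture estimate); it is a minimal one-dimensional illustration, and all names are ours, not the authors'.

```python
import math

def node_loglik(samples):
    # ML log likelihood of samples under a single 1-D Gaussian;
    # the paper's m-level subtree replaces this single-Gaussian
    # score with a multiple-mixture estimate.
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def split_gain(samples, question):
    # Log-likelihood gain from splitting a node with a yes/no
    # phonetic question (modelled here as any boolean predicate).
    yes = [x for x in samples if question(x)]
    no = [x for x in samples if not question(x)]
    if not yes or not no:
        return 0.0
    return node_loglik(yes) + node_loglik(no) - node_loglik(samples)

data = [0.1, 0.2, 0.15, 2.0, 2.1, 1.9]
print(split_gain(data, lambda x: x > 1.0) > 0)  # prints True: bimodal data rewards the split
```

The tree is grown greedily: at each node, the question with the largest gain is chosen, and splitting stops when the gain falls below a threshold.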

SL980607.PDF (From Author) SL980607.PDF (Rasterized)



Unsupervised Training of a Speech Recognizer Using TV Broadcasts

Authors:

Thomas Kemp, ISL, University of Karlsruhe (Germany)
Alex Waibel, ISL, University of Karlsruhe (Germany)

Page (NA) Paper number 758

Abstract:

Current speech recognition systems require large amounts of expensive transcribed data for parameter estimation. In this work we describe experiments aimed at training a speech recognizer without transcriptions. The experiments were carried out on untranscribed TV newscast recordings, which were automatically segmented into segments of similar acoustic background condition. We develop a training scheme in which a recognizer is bootstrapped using very little transcribed data and then improved using new, untranscribed speech. We show that it is necessary to use a confidence measure to judge the recognizer's initial transcriptions before using them. Larger improvements can be achieved if the number of parameters in the system is increased as more data becomes available. We also show that the beneficial effect of unsupervised training is not subsumed by MLLR adaptation on the hypotheses. Using the described methods, we found that the untranscribed data yields roughly one third of the improvement obtained with transcribed material.
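
The bootstrap scheme described above can be sketched as a confidence-gated self-training loop. The recognizer interface (`decode`, `retrain`) and the stub class below are hypothetical stand-ins for illustration, not the authors' system:

```python
def unsupervised_train(recognizer, untranscribed_utts, threshold=0.8):
    # One bootstrap round: decode untranscribed speech, keep only
    # hypotheses whose confidence clears the threshold, retrain on
    # them, and report how many utterances survived the gate.
    selected = []
    for utt in untranscribed_utts:
        hyp, conf = recognizer.decode(utt)
        if conf >= threshold:  # the confidence measure gates the data
            selected.append((utt, hyp))
    recognizer.retrain(selected)
    return len(selected)

class StubRecognizer:
    # Toy stand-in for a real recognizer, for illustration only.
    def decode(self, utt):
        return utt.upper(), (0.9 if len(utt) > 3 else 0.5)
    def retrain(self, labeled_data):
        self.trained_on = labeled_data

rec = StubRecognizer()
print(unsupervised_train(rec, ["hello", "hi", "newscast"]))  # prints 2
```

In the paper's setting the loop would be repeated as more untranscribed broadcast audio arrives, growing the model as the selected pool grows.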

SL980758.PDF (From Author) SL980758.PDF (Rasterized)



A New Method to Achieve Fast Acoustic Matching for Speech Recognition

Authors:

Clark Z. Lee, INRS-Telecommunications (Canada)
Douglas O'Shaughnessy, INRS-Telecommunications (Canada)

Page (NA) Paper number 208

Abstract:

Large vocabulary continuous speech recognition based on hidden Markov models often faces a trade-off between accuracy and speed. A new method is proposed in this article in which complex models are retained for high accuracy while speed is gained by exploiting similarities among acoustic matches. These similarities rest on an assumption that we refer to as the look-phone-context property. Using this property, the number of acoustic matches can be substantially reduced in the course of scoring all possible phonetic transcriptions of the recognition hypotheses. Experiments on the speaker-independent Wall Street Journal task show that a fast-response system can be achieved without compromising accuracy.
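
The look-phone-context property itself is not spelled out in the abstract, but the general idea of reusing acoustic match scores across hypotheses can be illustrated with a cache keyed on a phone and its context. The scoring function below is a placeholder, not the authors' method:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def acoustic_score(phone, left_ctx, right_ctx, frame):
    # Placeholder for an expensive HMM state-likelihood evaluation;
    # the cache ensures that identical (phone, context, frame)
    # requests arising in different hypotheses are computed once.
    return (hash((phone, left_ctx, right_ctx, frame)) % 1000) / 1000.0

acoustic_score("ae", "k", "t", 12)
acoustic_score("ae", "k", "t", 12)       # served from the cache
print(acoustic_score.cache_info().hits)  # prints 1
```

Since many recognition hypotheses share triphones over the same frames, even this naive memoization removes a large fraction of acoustic evaluations.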

SL980208.PDF (From Author) SL980208.PDF (Rasterized)



Improved Parameter Tying for Efficient Acoustic Model Evaluation in Large Vocabulary Continuous Speech Recognition

Authors:

Jacques Duchateau, Katholieke Universiteit Leuven - ESAT (Belgium)
Kris Demuynck, Katholieke Universiteit Leuven - ESAT (Belgium)
Dirk Van Compernolle, Lernout and Hauspie Speech Products (Belgium)
Patrick Wambacq, Katholieke Universiteit Leuven - ESAT (Belgium)

Page (NA) Paper number 161

Abstract:

In an HMM-based large vocabulary continuous speech recognition system, the evaluation of context dependent acoustic models is very time consuming. In semi-continuous HMMs, a state is modelled as a mixture of elementary, generally Gaussian, probability density functions. Observation probability calculations for these states can be made faster by reducing the size of the Gaussian mixture used to model them. In this paper, we propose different criteria to decide which Gaussians should remain in the mixture for a state and which ones can be removed. The performance of the criteria is compared on context dependent tied-state models using the WSJ recognition task. Our novel criterion, which removes a Gaussian from a state if it is estimated from too little acoustic data, outperforms the other described criteria.
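
The winning criterion above, dropping Gaussians trained on too little data, can be sketched as an occupancy-count filter over a mixture. The function name and the count threshold are illustrative, not from the paper:

```python
def prune_mixture(weights, counts, min_count=50.0):
    # Drop Gaussians whose soft occupancy count (frames assigned to
    # them in training) falls below a floor, then renormalize the
    # remaining mixture weights so they again sum to one. In practice
    # a guard is needed so at least one component always survives.
    kept = [(w, c) for w, c in zip(weights, counts) if c >= min_count]
    total = sum(w for w, _ in kept)
    return [w / total for w, _ in kept]

# The 30-frame Gaussian is removed; the rest are renormalized.
print(prune_mixture([0.5, 0.3, 0.2], [400.0, 30.0, 170.0]))
```

Smaller mixtures mean fewer Gaussian evaluations per frame, which is where the speed-up in observation probability calculation comes from.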

SL980161.PDF (From Author) SL980161.PDF (Rasterized)



A New Look at HMM Parameter Tying for Large Vocabulary Speech Recognition

Authors:

Ananth Sankar, SRI International (USA)

Page (NA) Paper number 193

Abstract:

Most current state-of-the-art large-vocabulary continuous speech recognition (LVCSR) systems are based on state-clustered hidden Markov models (HMMs). Typical systems use thousands of state clusters, each represented by a Gaussian mixture model with a few tens of Gaussians. In this paper, we show that models with far more parameter tying, like phonetically tied mixture (PTM) models, give better performance in terms of both recognition accuracy and speed. In particular, we achieved between a 5% and 10% improvement in word error rate, while cutting the number of Gaussian distance computations in half, on three different Wall Street Journal (WSJ) test sets, by using a PTM system with 38 phone-class state clusters as compared to a state-clustered system with 937 state clusters. For both systems, the total number of Gaussians was fixed at about 30,000. This result is of real practical significance, as it shows that a conceptually simpler PTM system can be faster and more accurate than current state-of-the-art state-clustered HMM systems.
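
The comparison fixes the total Gaussian budget and changes only how it is shared across clusters, which a back-of-the-envelope calculation makes concrete (the per-cluster averages are implied by the abstract's figures, not stated in it):

```python
# Same ~30,000-Gaussian budget, divided two ways (numbers from the abstract).
total_gaussians = 30000
per_state_cluster = total_gaussians / 937  # conventional state-clustered system
per_phone_class = total_gaussians / 38     # PTM system: much larger shared mixtures
print(round(per_state_cluster), round(per_phone_class))  # prints 32 789
```

The PTM system thus trades many small, cluster-specific mixtures for a few very large shared ones, which is what enables halving the Gaussian distance computations.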

SL980193.PDF (From Author) SL980193.PDF (Rasterized)



Factor Analysis Invariant to Linear Transformations of Data

Authors:

Ramesh A. Gopinath, IBM T. J. Watson Research (USA)
Bhuvana Ramabhadran, IBM T. J. Watson Research (USA)
Satya Dharanipragada, IBM T. J. Watson Research (USA)

Page (NA) Paper number 397

Abstract:

Modeling data with Gaussian distributions is an important statistical problem. To obtain robust models, one imposes constraints on the means and covariances of these distributions. Constrained ML modeling implies the existence of optimal feature spaces where the constraints are more valid. This paper introduces one such constrained ML modeling technique, called factor analysis invariant to linear transformations (FACILT), which is essentially factor analysis in optimal feature spaces. FACILT is a generalization of several existing methods for modeling covariances. This paper presents an EM algorithm for FACILT modeling.
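
Factor analysis constrains each covariance to a low-rank-plus-diagonal form, which is where the robustness comes from: far fewer free parameters per Gaussian than a full covariance. A quick count, with dimensions chosen for illustration only (not taken from the paper):

```python
# Covariance parameters per Gaussian: full d x d covariance versus
# the factor-analyzed form L L^T + Psi, where L is a d x k loading
# matrix and Psi is diagonal. Dimensions here are illustrative.
d, k = 39, 5
full_cov = d * (d + 1) // 2   # symmetric full covariance
fa_cov = d * k + d            # loading matrix plus diagonal
print(full_cov, fa_cov)       # prints 780 234
```

With fewer parameters to estimate, each Gaussian needs less training data for a stable fit, at the cost of restricting the covariance structure.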

SL980397.PDF (From Author) SL980397.PDF (Rasterized)
