Robust Speech Processing in Adverse Environments 2


Performance Improvements Through Combining Phone- And Syllable-Scale Information In Automatic Speech Recognition

Authors:

Su-Lin Wu, International Computer Science Institute (USA)
Brian E.D. Kingsbury, International Computer Science Institute (USA)
Nelson Morgan, International Computer Science Institute (USA)
Steven Greenberg, International Computer Science Institute (USA)

Page (NA) Paper number 854

Abstract:

Combining knowledge derived from both syllable-length (100-250 ms) and phone-length (40-100 ms) intervals in the automatic speech recognition process can yield performance superior to that obtained using information derived from a single time scale alone. The gains are particularly pronounced for reverberant test conditions that were not represented in the training set. In the present study, phone- and syllable-based systems are combined at three distinct levels of the recognition process: the frame, the syllable and the entire utterance. Each strategy successfully integrates the complementary strengths of the individual systems, yielding a significant improvement in accuracy on a small-vocabulary, naturally spoken, telephone speech corpus. The syllable-level combination outperformed the other two methods under both relatively pristine and moderately reverberant acoustic conditions, yielding a 20-40% relative improvement over the baseline.
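
The abstract does not spell out the frame-level combination rule. A common scheme for merging two per-frame posterior streams of this kind is a weighted log-domain average followed by renormalization; the Python sketch below illustrates the idea, with the weight, array shapes and function name being illustrative assumptions rather than the authors' exact method.

    import numpy as np

    def combine_frame_posteriors(p_phone, p_syll, w=0.5, eps=1e-10):
        # Merge two per-frame posterior streams by weighted log-domain
        # averaging, then renormalize each frame to sum to one.
        # p_phone, p_syll: (n_frames, n_classes) posteriors from the
        # phone-scale and syllable-scale systems (shapes assumed).
        log_p = w * np.log(p_phone + eps) + (1.0 - w) * np.log(p_syll + eps)
        p = np.exp(log_p)
        return p / p.sum(axis=1, keepdims=True)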

SL980854.PDF (From Author) SL980854.PDF (Rasterized)



Predictive Adaptation and Compensation for Robust Speech Recognition

Authors:

Arun C. Surendran, Bell Labs, Lucent Technologies (USA)
Chin-Hui Lee, Bell Labs, Lucent Technologies (USA)

Page (NA) Paper number 859

Abstract:

Earlier work on parametric modeling of distortions for robust speech recognition has focused on estimating the distortion parameter, using maximum likelihood and other techniques, as a point in the parameter space, and treating this estimate as if it were the true value in a plug-in maximum a posteriori (MAP) decoder. This approach is deficient in most real environments, where for many reasons the value of the distortion parameter varies significantly. In this paper we introduce an approach which combines the power of parametric transformation and Bayesian prediction to solve this problem. Instead of approximating the distortion parameter with a point estimate, we average over its variation, thus taking the distribution of the parameter into consideration as well. This approach provides more robust performance than the conventional maximum-likelihood approach. It also provides the solution that minimizes the overall error given the distribution of the parameter. We present results to demonstrate the robustness and effectiveness of the predictive approach.
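
As a toy illustration of the predictive idea (not the paper's actual decoder), the sketch below contrasts a plug-in log-likelihood, which fixes the distortion parameter at a point estimate, with a Monte Carlo approximation of the predictive log-likelihood, which averages the likelihood over an assumed Gaussian prior on the parameter. The scalar-Gaussian model and all names are illustrative assumptions.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    def plugin_loglik(x, theta_hat, sigma=1.0):
        # Plug-in decoding: treat the point estimate of the distortion
        # (here a simple additive bias) as if it were the true value.
        return norm.logpdf(x, loc=theta_hat, scale=sigma).sum()

    def predictive_loglik(x, theta_mean, theta_std, sigma=1.0, n=1000):
        # Predictive decoding: Monte Carlo average of the likelihood
        # over the assumed prior distribution of the distortion.
        thetas = rng.normal(theta_mean, theta_std, size=n)
        ll = norm.logpdf(x[:, None], loc=thetas[None, :], scale=sigma).sum(axis=0)
        return np.logaddexp.reduce(ll) - np.log(n)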

SL980859.PDF (From Author) SL980859.PDF (Rasterized)



Influence of the Speaking Style and the Noise Spectral Tilt on the Lombard Reflex and Automatic Speech Recognition

Authors:

Jean-Claude Junqua, Speech Technology Laboratory (USA)
Steven Fincke, Speech Technology Laboratory (USA)
Ken Field, Speech Technology Laboratory (USA)

Page (NA) Paper number 374

Abstract:

To study the Lombard reflex, more realistic databases representing real-world conditions need to be recorded and analyzed. In this paper we 1) propose a procedure for recording Lombard data that closely approximates realistic conditions and 2) compare two sets of experiments: one in which subjects communicate with a device while listening to noise through open-ear headphones, and one in which subjects read a list. By studying acoustic correlates of the Lombard reflex and performing off-line speaker-independent recognition experiments, we show that the communication factor affects the Lombard reflex. We also show evidence that several types of noise, differing mainly in their spectral tilt, induce different acoustic changes. This result reinforces the notion that it is difficult to separate the speaker from the environmental stressor (in this case the noise) when studying the Lombard reflex.
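
The abstract does not define its measure of spectral tilt. One common definition, used here purely for illustration, is the slope of a straight-line fit to the log-magnitude spectrum against log-frequency:

    import numpy as np

    def spectral_tilt_db_per_octave(frame, sr, fmin=100.0, fmax=4000.0):
        # Fit a line to the log-magnitude spectrum over log2(frequency);
        # the slope (dB/octave) is one plausible tilt measure, not
        # necessarily the one used in the paper.
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
        band = (freqs >= fmin) & (freqs <= fmax)
        log_f = np.log2(freqs[band])
        log_mag_db = 20.0 * np.log10(spec[band] + 1e-12)
        slope, _ = np.polyfit(log_f, log_mag_db, 1)
        return slope  # more negative = steeper high-frequency roll-off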

SL980374.PDF (From Author) SL980374.PDF (Rasterized)



Data-driven PMC and Bayesian Learning Integration for Fast Model Adaptation in Noisy Conditions

Authors:

Stefano Crafa, CSELT (Italy)
Luciano Fissore, CSELT (Italy)
Claudio Vair, CSELT (Italy)

Page (NA) Paper number 1140

Abstract:

In this paper, we present an integration of Data-Driven Parallel Model Combination (DPMC) and Bayesian learning into a fast and accurate framework which can be easily integrated into standard training and recognition systems. The original DPMC technique has been enhanced so that, unlike the original method, it requires no modification of the acoustic models. Bayesian learning is used to specialize a general noisy-speech model (the a priori model) to the target acoustic environment, with the DPMC-generated observations serving as adaptation data. Thanks to these innovations, the proposed method achieves better performance than the original DPMC while consuming far fewer computational resources.
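
The abstract gives the overall recipe (DPMC synthesizes noisy observations, which then drive Bayesian adaptation of the a priori model) without the update equations. A minimal sketch of MAP re-estimation of Gaussian means from such synthetic data might look as follows; the hard component assignments and the prior weight tau are simplifying assumptions, not the paper's formulation.

    import numpy as np

    def map_adapt_means(prior_means, obs, assignments, tau=10.0):
        # MAP update of Gaussian means: interpolate between the prior
        # mean and the sample mean of the (DPMC-generated) adaptation
        # data, weighted by the prior count tau vs. the observed count.
        new_means = prior_means.copy()
        for k in range(prior_means.shape[0]):
            x = obs[assignments == k]
            if len(x):
                new_means[k] = (tau * prior_means[k] + x.sum(axis=0)) / (tau + len(x))
        return new_means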

SL981140.PDF (From Author) SL981140.PDF (Rasterized)



Improving The Noise And Spectral Robustness Of An Isolated-Word Recognizer Using An Auditory-Model Front End

Authors:

Martin Hunke, San Francisco State University, San Francisco, CA (USA)
Meeran Hyun, San Francisco State University, San Francisco, CA (USA)
Steve Love, Meridian Speech Technology (USA)
Thomas Holton, San Francisco State University, San Francisco, CA (USA)

Page (NA) Paper number 715

Abstract:

In this study, the performance of an auditory-model feature-extraction 'front end' was assessed in an isolated-word speech recognition task using a common hidden Markov model (HMM) 'back end', and compared with that of other front-end feature representations, including mel-frequency cepstral coefficients (MFCC) and two variants (J- and L-) of the relative spectral amplitude (RASTA) technique. The recognition task was performed in the presence of varying levels and types of additive noise and spectral distortion, using standard HMM whole-word models with the Bellcore Digit database as the corpus. While all front ends achieved comparable recognition performance on clean speech, the auditory-model front end generally performed significantly better than the other methods in recognition tasks involving background noise or spectral distortion. Training HMMs on speech processed by the auditory-model or L-RASTA front end in one type of noise also improved recognition performance in other kinds of noise. This 'cross-training' effect did not occur with the MFCC or J-RASTA front ends.
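
For readers unfamiliar with RASTA processing: its core operation is a band-pass filtering of each log-spectral trajectory over time, which suppresses slowly varying channel effects. A minimal sketch using the standard RASTA filter coefficients follows; the J- and L-RASTA variants evaluated in the paper differ mainly in the compression applied before this filtering.

    import numpy as np
    from scipy.signal import lfilter

    def rasta_filter(log_spec):
        # Band-pass filter each log critical-band trajectory over time
        # (axis 0 = frames). Coefficients are the classic RASTA filter
        # of Hermansky and Morgan.
        num = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
        den = np.array([1.0, -0.98])
        return lfilter(num, den, log_spec, axis=0)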

SL980715.PDF (From Author) SL980715.PDF (Rasterized)



A Model for Speech Reverberation and Intelligibility Restoring Filters

Authors:

Owen P. Kenny, Defence Science and Technology Organization (Australia)
Douglas J. Nelson, Department of Defence (USA)

Page (NA) Paper number 863

Abstract:

The problem of removing channel effects from speech has generally been attacked by attempting to recover a time-varying filter which inverts the entire channel impulse response. We show that human listeners are insensitive to many channel conditions and that the human ear seems to respond primarily to discontinuities in the channel. Based on these observations, a partial equalization is proposed in which the channel effects to which the ear is sensitive can be removed, without full inversion of the channel. In addition, it is shown that it is possible to build filters of arbitrary length which neither reduce speech intelligibility nor produce annoying artifacts.
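
One plausible reading of the partial-equalization idea, offered only as an illustration since the abstract does not give the authors' filter design, is to invert the minimum-phase (magnitude) component of the channel while leaving its all-pass component, to which the ear may be less sensitive, untouched. A cepstrum-based construction of such an inverse:

    import numpy as np

    def min_phase_inverse(channel_ir, n_fft=4096, eps=1e-8):
        # Build the minimum-phase inverse of a channel's magnitude
        # response via the real cepstrum. Illustrative only; not the
        # authors' actual design.
        H = np.fft.fft(channel_ir, n_fft)
        cep = np.fft.ifft(np.log(np.abs(H) + eps)).real
        fold = np.zeros(n_fft)
        fold[0] = -cep[0]                      # negated: we want 1/|H|
        fold[1:n_fft // 2] = -2.0 * cep[1:n_fft // 2]
        fold[n_fft // 2] = -cep[n_fft // 2]
        return np.fft.ifft(np.exp(np.fft.fft(fold))).real

Applying np.convolve(speech, min_phase_inverse(h)) would then flatten the channel's magnitude response without attempting to invert its phase.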

SL980863.PDF (From Author) SL980863.PDF (Rasterized)
