Robust Speech Processing in Adverse Environments 1

Robust Speech Recognition Using HMM's with Toeplitz State Covariance Matrices

Authors:

William J.J. Roberts, Defence Science Technology Organisation (Australia)
Yariv Ephraim, George Mason University (USA)

Page (NA) Paper number 141

Abstract:

Hidden Markov modeling of speech waveforms is studied and applied to speech recognition of clean and noisy signals. Signal vectors in each state are assumed Gaussian with zero mean and a Toeplitz covariance matrix. This model allows short signal vectors and is therefore useful for speech signals with rapidly changing second-order statistics. It can also be straightforwardly adapted to noisy signals, especially when the noise is additive and independent of the signal. Since no closed-form solution exists for the maximum likelihood estimate of the Toeplitz covariance matrices, an expectation-maximization procedure was used and efficiently implemented. HMM's with Toeplitz as well as asymptotically Toeplitz (e.g., circulant, autoregressive) covariance matrices are studied theoretically and experimentally. While all of these matrices provide similar performance asymptotically, they differ significantly when the frame length is finite. Recognition results are provided for clean and noisy signals at 0-30 dB SNR.
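As a rough illustration of the state model described above (not the authors' implementation), the sketch below builds a Toeplitz state covariance by averaging the sample covariance along its diagonals, a simple heuristic stand-in for the maximum-likelihood estimate that the paper obtains via expectation-maximization, and then evaluates the zero-mean Gaussian log-likelihood of short signal frames.

```python
# A rough sketch, not the authors' EM implementation: impose the Toeplitz
# structure by averaging diagonals, then score frames under a zero-mean Gaussian.
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_project(S):
    """Average each diagonal of a sample covariance to impose Toeplitz structure."""
    p = S.shape[0]
    r = np.array([np.mean(np.diag(S, k)) for k in range(p)])
    return toeplitz(r)

def state_loglik(frames, C):
    """Log-likelihood of zero-mean Gaussian frames (rows) under covariance C."""
    p = C.shape[0]
    _, logdet = np.linalg.slogdet(C)
    Cinv = np.linalg.inv(C)
    quad = np.einsum('ij,jk,ik->i', frames, Cinv, frames)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + quad)

# Toy usage with a hypothetical frame length of 16 samples
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 16))
C = toeplitz_project(frames.T @ frames / len(frames))
print(state_loglik(frames[:3], C))
```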

SL980141.PDF (From Author) SL980141.PDF (Rasterized)

Modeling of Output Probability Distribution to Improve Small Vocabulary Speech Recognition in Adverse Environments

Authors:

David Thambiratnam, Queensland University of Technology (Australia)
Sridha Sridharan, Queensland University of Technology (Australia)

Page (NA) Paper number 308

Abstract:

This paper presents a solution to the adverse-environment, open-microphone problem that uses the information stored in HMM output probability distributions to obtain a confidence measure for the recognition results. This information can also be used to perform a secondary classification and further improve recognition. The system was tested on data from the TI46 database corrupted by noise from the NOISEX-92 database, as well as on real-world data, and shows promising results.
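The abstract does not spell out the confidence measure itself; the sketch below is one plausible reading (a hypothetical likelihood-gap score, not necessarily the authors' measure) of how HMM output scores can be used to accept or reject a recognition result in an open-microphone setting.

```python
# Hypothetical likelihood-gap confidence, not necessarily the paper's measure:
# accept the best-scoring word only if it clearly beats the runner-up.
def recognize_with_rejection(log_liks, threshold=2.0):
    """log_liks: dict word -> (per-frame-normalized) log-likelihood.
    Returns (word or None, confidence)."""
    ranked = sorted(log_liks.items(), key=lambda kv: kv[1], reverse=True)
    (best_word, best_score), (_, second_score) = ranked[0], ranked[1]
    confidence = best_score - second_score
    return (best_word if confidence >= threshold else None), confidence

# Hypothetical scores for a small vocabulary
scores = {"yes": -41.2, "no": -44.0, "stop": -58.7}
print(recognize_with_rejection(scores))  # ('yes', ~2.8)
```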

SL980308.PDF (From Author) SL980308.PDF (Rasterized)

Robust and Compact Multilingual Word Recognizers Using Features Extracted from a Phoneme Similarity Front-End

Authors:

Philippe Morin, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Ted H. Applebaum, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Robert Boman, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Yi Zhao, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Jean-Claude Junqua, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)

Page (NA) Paper number 402

Abstract:

In this paper we characterize the sensitivity of two speaker-dependent isolated word recognizers to several kinds of variability and distortion, namely noise, channel, distance to the microphone, and target language. Both recognizers use a phoneme similarity acoustic front-end as a rich representation of speech from which reliable features are extracted. A cross-correlation test showed that a phoneme similarity front-end is more robust to variability and distortions (especially intra-speaker variability) than an LPC cepstral front-end. The first recognizer (Condor) uses a frame-based approach, while the second (Pasha) uses the phoneme similarity information contained in a small number of speech segments. The two recognition methods are presented with special emphasis on the robustness improvements and computational trade-offs that have been made. Experimental results are reported for car noise at different speeds, speakerphone versus handset input in an office environment, and several target languages. Recognition accuracy greater than 94% was achieved in a car environment at 60 mph (Condor), and recognition accuracy greater than 95% was achieved for speakerphone input at a distance of 50 cm in an office environment.
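As a loose illustration of the general idea of a phoneme similarity front-end (the mapping below is an assumption for illustration, not STL's actual implementation), each cepstral frame can be scored against per-phoneme reference templates to produce a vector of similarities that the word models then consume.

```python
# Assumed illustration of a phoneme similarity front-end, not the paper's
# front-end: score each cepstral frame against per-phoneme reference templates.
import numpy as np

def phoneme_similarity(frames, templates):
    """frames: (T, d) cepstral vectors; templates: (P, d) phoneme references.
    Returns a (T, P) array of similarities in (0, 1] via a Gaussian kernel."""
    d2 = ((frames[:, None, :] - templates[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / d2.mean())   # crude global normalization, for illustration

rng = np.random.default_rng(1)
frames = rng.standard_normal((50, 12))     # hypothetical 12-dim cepstra, 50 frames
templates = rng.standard_normal((40, 12))  # e.g. ~40 phoneme classes
print(phoneme_similarity(frames, templates).shape)  # (50, 40)
```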

SL980402.PDF (From Author) SL980402.PDF (Rasterized)

An Effect of Adaptive Beamforming on Hands-Free Speech Recognition Based on 3-D Viterbi Search

Authors:

Takeshi Yamada, Graduate School of Information Science, Nara Institute of Science and Technology (Japan)
Satoshi Nakamura, Graduate School of Information Science, Nara Institute of Science and Technology (Japan)
Kiyohiro Shikano, Graduate School of Information Science, Nara Institute of Science and Technology (Japan)

Page (NA) Paper number 484

Abstract:

To integrate microphone array processing into speech recognition, we have proposed a speech recognition algorithm based on a 3-D Viterbi search, which localizes a target talker using the likelihood of HMMs (Hidden Markov Models) while performing speech recognition. The performance of the 3-D Viterbi search method depends on the SNR (signal-to-noise ratio) improvement obtained by the beamforming technique. This paper proposes a novel method based on an adaptive beamforming technique instead of the delay-and-sum beamformer used in our previous study. Speaker-dependent isolated-word recognition experiments were carried out on real-environment data to evaluate the effect of the adaptive beamformer. The results show that the adaptive beamformer drastically improves recognition performance both for a fixed-position talker and for a moving talker.
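The abstract does not name a specific adaptive beamformer; the sketch below shows MVDR-style per-frequency adaptive weights as one common choice (an assumption for illustration, not necessarily the beamformer used in the paper), in contrast to the fixed delay-and-sum beamformer of the earlier study.

```python
# Assumed MVDR-style adaptive weights: pass the look direction undistorted
# while minimizing output noise power at each frequency bin.
import numpy as np

def mvdr_weights(R, d):
    """R: (M, M) noise spatial covariance at one frequency bin (Hermitian);
    d: (M,) steering vector toward the target talker.
    Returns w = R^-1 d / (d^H R^-1 d)."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Toy example: 4-microphone array with an arbitrary (hypothetical) steering vector
M = 4
rng = np.random.default_rng(2)
noise = rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200))
R = noise @ noise.conj().T / noise.shape[1]        # estimated noise covariance
d = np.exp(-1j * 2 * np.pi * 0.1 * np.arange(M))   # steering toward assumed direction
w = mvdr_weights(R, d)
print(abs(w.conj() @ d))  # ~1: distortionless response in the look direction
```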

SL980484.PDF (From Author) SL980484.PDF (Rasterized)

Coherence-based Subband Decomposition for Robust Speech and Speaker Recognition in Noisy and Reverberant Rooms

Authors:

Joaquin González-Rodríguez, DIAC- Universidad Politecnica de Madrid (Spain)
Santiago Cruz-Llanas, DIAC- Universidad Politecnica de Madrid (Spain)
Javier Ortega-García, DIAC- Universidad Politecnica de Madrid (Spain)

Page (NA) Paper number 64

Abstract:

In this paper, the acoustic characteristics of sound fields in enclosed rooms are studied in the joint presence of speech and noise, in order to design a broadband microphone array system capable of coping with both coherent and diffuse noise. Several state-of-the-art speech enhancement array structures are presented and compared with our new system in terms of correct word recognition rates on a simple command-and-control task. The proposed structure, based on a broadband subband-nested array, performs real-time estimation of the spatial coherence to determine the coherent or diffuse nature of each subband and applies different filters in each case; it also improves the classical Wiener post-filter, typically used for diffuse noise suppression, so that coherent noises are properly cancelled. The results obtained with a 15-channel simultaneous recording database under different reverberation and noise conditions show better performance than previously proposed structures.
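A minimal sketch of the coherence test follows (illustrative only; the microphone spacing, margin, and analysis parameters are assumptions, not the authors' nested-array design): the measured magnitude-squared coherence between two microphones is compared with the theoretical diffuse-field coherence to flag each subband as coherent or diffuse.

```python
# Illustrative coherence test: compare measured magnitude-squared coherence
# with the theoretical diffuse-field coherence for each frequency bin.
import numpy as np
from scipy.signal import coherence

def classify_subbands(x1, x2, fs, mic_dist, c=343.0, margin=0.2):
    """Returns (freqs, is_coherent) where is_coherent[k] is True when the
    measured coherence clearly exceeds the diffuse-field model."""
    f, Cxy = coherence(x1, x2, fs=fs, nperseg=512)
    diffuse = np.sinc(2 * f * mic_dist / c) ** 2   # np.sinc(x) = sin(pi x)/(pi x)
    return f, Cxy > np.minimum(diffuse + margin, 0.95)

# Toy usage: a shared 1 kHz tone (coherent) buried in independent noise
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(3)
tone = np.sin(2 * np.pi * 1000 * t)
x1 = tone + rng.standard_normal(fs)
x2 = tone + rng.standard_normal(fs)
f, coh = classify_subbands(x1, x2, fs, mic_dist=0.05)
print(f[coh])  # should include bins near 1 kHz
```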

SL980064.PDF (From Author) SL980064.PDF (Rasterized)

A Minimax Search Algorithm for CDHMM based Robust Continuous Speech Recognition

Authors:

Hui Jiang, The University of Tokyo (Japan)
Keikichi Hirose, The University of Tokyo (Japan)
Qiang Huo, The University of Hong Kong (China)

Page (NA) Paper number 693

Abstract:

In this paper, we propose a novel implementation of a minimax decision rule for continuous-density hidden Markov model (CDHMM) based robust speech recognition. By combining the idea of the minimax decision rule with a normal Viterbi search, we derive a recursive minimax search algorithm in which the minimax decision rule is repeatedly applied to determine the partial paths during the search procedure. Because the search is intrinsically recursive, the proposed method can easily be extended to continuous speech recognition. Experimental results on Japanese isolated digits and TIDIGITS, where the mismatch between training and testing conditions is caused by additive white Gaussian noise, show the viability and efficiency of the proposed minimax search algorithm.
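The sketch below is a simplified, frame-wise stand-in for the recursive minimax search (not the paper's exact algorithm): each emission score is replaced by its worst case over a small uncertainty set of emission models before the usual log-domain Viterbi maximization over partial paths.

```python
# Simplified minimax-flavored Viterbi: take the worst case over candidate
# mismatch models per frame, then maximize over partial paths as usual.
import numpy as np

def minimax_viterbi(log_A, log_pi, emission_sets):
    """log_A: (S, S) log transition probs; log_pi: (S,) initial log probs;
    emission_sets: list over time of (K, S) log-emission scores from K
    candidate mismatch models per state. Returns the best state path."""
    T, S = len(emission_sets), len(log_pi)
    worst = np.stack([e.min(axis=0) for e in emission_sets])  # minimax emission, (T, S)
    delta = log_pi + worst[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A       # scores[i, j]: come from state i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + worst[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: 2 states, 3 hypothetical mismatch models per frame
rng = np.random.default_rng(4)
log_A = np.log([[0.8, 0.2], [0.3, 0.7]])
log_pi = np.log([0.6, 0.4])
emissions = [rng.standard_normal((3, 2)) for _ in range(10)]
print(minimax_viterbi(log_A, log_pi, emissions))
```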

SL980693.PDF (From Author) SL980693.PDF (Rasterized)
