Robust Speech Processing in Adverse Environments 3

Linear and Nonlinear Speech Feature Analysis for Stress Classification

Authors:

Guojun Zhou, Robust Speech Processing Lab, Duke Univ. (USA)
John H.L. Hansen, Robust Speech Processing Lab, Duke Univ. (USA)
James F. Kaiser, Robust Speech Processing Lab, Duke Univ. (USA)

Page (NA) Paper number 840

Abstract:

Many stressful environments, such as aircraft cockpits or high-workload tasks that induce stress or emotion, can degrade the performance of speech recognition systems. To address this, we investigate a number of linear and nonlinear features and processing methods for stressed speech classification. The linear features include properties of pitch, duration, intensity, glottal source, and the vocal tract spectrum. Nonlinear processing is based on our newly proposed Teager Energy Operator (TEO) speech feature, which incorporates frequency-domain critical-band filters and properties of the resulting TEO autocorrelation envelope. In this study, we employ Bayesian hypothesis testing and a hidden Markov model processor as classification methods. Evaluations focused on speech produced under loud and angry speaking styles and under the Lombard effect, drawn from the SUSAS database. Results using ROC curves and EER-based detection show that pitch is the best of the five linear features for stress classification, while the new nonlinear TEO-based feature outperforms the best linear feature by +5.2%, with a reduction in classification rate variability from 8.66 to 3.90.
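
As a rough illustration of the nonlinear feature described above, the Python sketch below computes the discrete Teager Energy Operator, Psi[x(n)] = x(n)^2 - x(n-1)x(n+1), and a simple area measure over the normalized autocorrelation of the TEO profile for a single band signal. The critical-band filtering that precedes this step in the paper is not reproduced; the pre-filtered band signal and the max_lag parameter are illustrative assumptions, not values from the paper.

    import numpy as np

    def teager_energy(x):
        # Discrete Teager Energy Operator: Psi[x(n)] = x(n)^2 - x(n-1)*x(n+1).
        x = np.asarray(x, dtype=float)
        return x[1:-1] ** 2 - x[:-2] * x[2:]

    def teo_autocorr_area(band_signal, max_lag=200):
        # Crude area measure over the normalized TEO autocorrelation for
        # one critical-band output (the paper's frequency-domain
        # filterbank is assumed to have been applied already).
        teo = teager_energy(band_signal)
        teo = teo - teo.mean()
        ac = np.correlate(teo, teo, mode="full")[len(teo) - 1:]
        ac = ac / (ac[0] + 1e-12)      # normalize so r(0) = 1
        lags = min(max_lag, len(ac))
        return np.abs(ac[:lags]).sum() / lags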

SL980840.PDF (From Author) SL980840.PDF (Rasterized)



Speech Feature Modeling for Robust Stressed Speech Recognition

Authors:

Sahar E. Bou-Ghazale, Rockwell (USA)
John H.L. Hansen, Robust Speech Processing Lab, Duke Univ. (USA)

Page (NA) Paper number 918

Abstract:

It is well known that the performance of speech recognition algorithms degrades in adverse environments where a speaker is under stress, emotion, or the Lombard effect. This study evaluates the effectiveness of traditional features for recognition of speech under stress and formulates new features which are shown to improve stressed speech recognition. The focus is on formulating robust features that are less dependent on the speaking conditions, rather than on applying compensation or adaptation techniques. The stressed speaking styles considered are simulated angry and loud speech, Lombard effect speech, and noisy actual stressed speech from the SUSAS database. In addition, this study investigates the immunity of the LP and FFT power spectra to the presence of stress. Our results show that, while the FFT power spectrum is more immune to noise, the LP power spectrum is more robust than the FFT to stress, as well as to a combination of noisy and stressful conditions. Two alternative frequency partitioning methods (M-MFCC, ExpoLog) are proposed and compared with traditional MFCC features for stressed speech recognition. It is shown that the alternative filterbank frequency partitions are more effective for recognition of speech under both simulated and actual stressed conditions.
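
An alternative frequency partition enters an MFCC-style front end only through the warping that spaces the triangular filters. The sketch below shows that mechanism with the standard mel scale as a placeholder pair of warping functions; the paper's actual M-MFCC and ExpoLog warpings are not reproduced here, and the filter count and FFT size are illustrative assumptions.

    import numpy as np

    def mel(f):
        # Standard mel warping (placeholder; an M-MFCC or ExpoLog
        # warping would be swapped in here).
        return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

    def mel_inv(m):
        # Inverse of the warping above.
        return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

    def warped_filterbank(n_filters, n_fft, sr, warp, warp_inv):
        # Triangular filters spaced uniformly on the warped axis; the
        # choice of warp/warp_inv is the frequency partition.
        lo, hi = warp(0.0), warp(sr / 2.0)
        centers = warp_inv(np.linspace(lo, hi, n_filters + 2))
        bins = np.floor((n_fft + 1) * centers / sr).astype(int)
        fbank = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(1, n_filters + 1):
            l, c, r = bins[i - 1], bins[i], bins[i + 1]
            for k in range(l, c):
                fbank[i - 1, k] = (k - l) / max(c - l, 1)
            for k in range(c, r):
                fbank[i - 1, k] = (r - k) / max(r - c, 1)
        return fbank

    # Example: a 24-filter partition for 8 kHz speech, 512-point FFT.
    fbank = warped_filterbank(24, 512, 8000, mel, mel_inv)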

SL980918.PDF (From Author) SL980918.PDF (Rasterized)



Combining Articulatory and Acoustic Information for Speech Recognition in Noisy and Reverberant Environments

Authors:

Katrin Kirchhoff, University of Bielefeld (Germany)

Page (NA) Paper number 873

Abstract:

Robust speech recognition under varying acoustic conditions may be achieved by exploiting multiple sources of information in the speech signal. In addition to an acoustic signal representation, we use an articulatory representation consisting of pseudo-articulatory features as an additional information source. Hybrid ANN/HMM recognizers using either of these representations are evaluated on a continuous numbers recognition task (OGI Numbers95) under clean, reverberant, and noisy conditions. An error analysis of preliminary recognition results shows that the different representations produce qualitatively different errors, which suggests combining both representations. We investigate various combination possibilities at the phoneme estimation level and show that significant improvements can be achieved under all three acoustic conditions.
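
A minimal sketch of combination at the phoneme estimation level, assuming each hybrid recognizer's ANN emits per-frame phoneme posteriors: a weighted log-linear merge of the two streams. This is one common combination rule among the several possibilities the paper investigates, not necessarily the best-performing one reported.

    import numpy as np

    def combine_posteriors(p_acoustic, p_artic, w=0.5):
        # Weighted geometric (log-linear) merge of two posterior streams
        # of shape (frames, phonemes); w is an assumed stream weight,
        # not a value from the paper.
        log_p = w * np.log(p_acoustic + 1e-10) \
              + (1.0 - w) * np.log(p_artic + 1e-10)
        p = np.exp(log_p)
        return p / p.sum(axis=-1, keepdims=True)   # renormalize per frame

The merged posteriors would then stand in for the single-stream scores during HMM decoding in the hybrid recognizer.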

SL980873.PDF (From Author) SL980873.PDF (Rasterized)



Improving Speaker Identification Performance in Reverberant Conditions using Lip Information

Authors:

Timothy Wark, Speech Research Laboratory, QUT (Australia)
Sridha Sridharan, Speech Research Laboratory, QUT (Australia)

Page (NA) Paper number 294

Abstract:

This paper considers the improvement of speaker identification performance in reverberant conditions using additional lip information. Automatic speaker identification (ASI) using speech characteristics alone can be highly successful; however, problems occur when training and testing conditions are mismatched. In particular, we find that ASI performance drops dramatically when the system is trained on anechoic speech but tested on reverberant speech. Previous work [1][2] has shown that speaker-dependent information can be extracted from the static and dynamic qualities of moving lips. Given that lip information is unaffected by reverberation, we choose to fuse this additional information with the speech data. We propose a new method for estimating confidence levels to allow adaptive fusion of the audio and visual data. Identification results are presented for increasing levels of artificially reverberated data, where lip information is shown to provide excellent ASI performance improvement.
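
A minimal sketch of confidence-weighted score fusion, assuming per-speaker scores from the audio and lip classifiers. The paper's own confidence estimation method is not reproduced here; the weight audio_conf below is a hypothetical stand-in (e.g., low when the audio is heavily reverberant).

    import numpy as np

    def fuse_scores(audio_scores, lip_scores, audio_conf):
        # Convex combination of per-speaker scores; audio_conf in [0, 1]
        # is a placeholder for the paper's estimated confidence level.
        a = float(np.clip(audio_conf, 0.0, 1.0))
        return a * np.asarray(audio_scores) + (1.0 - a) * np.asarray(lip_scores)

    def identify(audio_scores, lip_scores, audio_conf):
        # Pick the speaker with the highest fused score.
        return int(np.argmax(fuse_scores(audio_scores, lip_scores, audio_conf)))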

SL980294.PDF (From Author) SL980294.PDF (Rasterized)
