Robust Speech Processing in Adverse Environments 5

Auditory Modeling Techniques For Robust Pitch Extraction And Noise Reduction

Authors:

Piero Cosi, Institute of Phonetics and Dialectology, National Research Council (Italy)
Stefano Pasquin, University of Padua, Electronic Engineering Department (Italy)
Enrico Zovato, University of Padua, Electronic Engineering Department (Italy)

Page (NA) Paper number 1053

Abstract:

A novel method for robust pitch extraction, based on the correlogram output of Lyon's cochlear model, is described. The autocorrelation lag at which the signals of the cochlear channels share the same periodicity can be computed, thus tracking how the pitch of the input signal varies over time. In the case of stationary noise, a spectral-subtraction-like technique built in the correlogram domain, named 'correlogram subtraction', is applied to enhance the signal before computing its fundamental frequency. Finally, a correction algorithm based on an 'island-driven' strategy, which works on zones of the signal with stable pitch values, is used to refine the pitch estimate. This method of pitch extraction is extremely reliable, even at a signal-to-noise ratio of 0 dB. The same subtraction technique, with some new filter-bank energy-based modifications, is also used to re-synthesize, by an inversion strategy, a clean version of a noisy input signal. The quality of the re-synthesized signal is quite promising, encouraging us to pursue this technique as a new signal enhancement scheme in the future.
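
A rough illustration of the correlogram idea follows (a sketch only, not the authors' implementation: a Butterworth band-pass filter bank stands in for Lyon's cochlear model, and neither the correlogram subtraction nor the island-driven correction is included). Each channel is autocorrelated, the normalised autocorrelations are summed, and the pitch is read off the peak of the summary correlogram.

    # Simplified correlogram pitch sketch; all parameters are assumptions.
    import numpy as np
    from scipy.signal import butter, lfilter

    def correlogram_pitch(x, fs, n_channels=16, fmin=100.0, fmax=4000.0,
                          f0_min=60.0, f0_max=400.0):
        # Log-spaced band-pass channels roughly mimic a cochlear filter bank.
        edges = np.geomspace(fmin, fmax, n_channels + 1)
        channels = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype='band')
            channels.append(lfilter(b, a, x))

        max_lag = int(fs / f0_min)
        min_lag = int(fs / f0_max)
        summary = np.zeros(max_lag + 1)
        for y in channels:
            # FFT-based autocorrelation of each channel, kept up to max_lag.
            spec = np.fft.rfft(y, n=2 * len(y))
            ac = np.fft.irfft(spec * np.conj(spec))[:max_lag + 1]
            if ac[0] > 0:
                summary += ac / ac[0]

        # The lag at which the channels share the same periodicity shows up
        # as the main peak of the summary correlogram.
        lag = min_lag + int(np.argmax(summary[min_lag:]))
        return fs / lag

    # Example: a 150 Hz square wave should come out close to 150 Hz.
    fs = 16000
    t = np.arange(fs) / fs
    print(round(correlogram_pitch(np.sign(np.sin(2 * np.pi * 150 * t)), fs), 1))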

SL981053.PDF (From Author) SL981053.PDF (Rasterized)

1053_01.WAV
(was: 1053_01.WAV)
Italian word /lavan'daja/ ('washerwoman') pronounced by a male speaker in clean conditions.
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits per sample, mono, Windows PCM
Creating Application: Windows Sound Utilities
Creating OS: Windows 95/NT
1053_03.WAV
(was: 1053_03.WAV)
Italian word /lavan'daja/ ('washerwoman') reconstructed, by the correlogram subtraction technique described in the paper, from the corresponding noisy signal (0 dB SNR).
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits per sample, mono, Windows PCM
Creating Application: Windows Sound Utilities
Creating OS: Windows 95/NT
1053_02.WAV
(was: 1053_02.WAV)
Italian word /lavan'daja/ ('washerwoman') pronounced by a male speaker in noisy conditions (0 dB SNR).
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits per sample, mono, Windows PCM
Creating Application: Windows Sound Utilities
Creating OS: Windows 95/NT

Wavelet Transform-based Speech Enhancement

Authors:

Eliathamby Ambikairajah, Athlone Institute of Technology (Ireland)
Graham Tattersall, University of East Anglia (U.K.)
Andrew Davis, BT Laboratories (U.K.)

Page (NA) Paper number 140

Abstract:

This paper describes a speech enhancement system based on a novel combination of a fast wavelet transform structure and Wiener filtering in the wavelet domain. The specific application of interest is the enhancement of speech when a cellular phone is used inside a moving vehicle. Subjective tests carried out on speech with additive vehicle noise at a signal-to-noise ratio of 10 dB indicate that the wavelet transform-based Wiener filtering approach works well. In particular, the technique was compared with several other common enhancement methods, such as thresholding in the wavelet domain, FFT-based Wiener filtering, and spectral subtraction, and was found to outperform them.
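
A minimal sketch of the underlying idea, assuming PyWavelets and a leading noise-only segment for the noise estimate (the paper's fast wavelet transform structure and test conditions are not reproduced): decompose the noisy signal, apply a per-subband Wiener gain, and reconstruct.

    # Wavelet-domain Wiener filtering sketch; wavelet choice, depth and the
    # per-subband (rather than per-coefficient) gains are assumptions.
    import numpy as np
    import pywt

    def wavelet_wiener(noisy, noise_only, wavelet='db4', level=5):
        coeffs = pywt.wavedec(noisy, wavelet, level=level)
        noise_coeffs = pywt.wavedec(noise_only, wavelet, level=level)
        enhanced = []
        for c, n in zip(coeffs, noise_coeffs):
            noise_var = np.mean(n ** 2)                        # subband noise power
            sig_var = max(np.mean(c ** 2) - noise_var, 1e-12)  # subband speech power
            gain = sig_var / (sig_var + noise_var)             # Wiener gain for this subband
            enhanced.append(gain * c)
        # Reconstruction; the output may be a sample longer than the input
        # because of boundary padding inside the transform.
        return pywt.waverec(enhanced, wavelet)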

SL980140.PDF (From Author) SL980140.PDF (Rasterized)

A Practical Perceptual Frequency Autoregressive HMM Enhancement System

Authors:

Beth Logan, University of Cambridge (U.K.)
Tony Robinson, University of Cambridge (U.K.)

Page (NA) Paper number 1083

Abstract:

We have previously developed an adaptive speech enhancement scheme. This models speech and noise using perceptual-frequency, or 'warped', autoregressive HMMs (AR-HMMs) and estimates the clean speech and noise parameters within this framework. In this paper, we investigate the use of our system as a front-end to a clean MFCC recognition system. We make two main modifications to our scheme. First, we use MMSE spectral estimators rather than time-domain estimators for enhancement. Second, for computational reasons, we form the estimators using non-warped AR-HMMs. To avoid mismatch when converting between warped and non-warped models, we use parallel models. Results are presented for small and medium vocabulary tasks. On the simple task, we approach the performance of a matched system when language model information is included. On the second task, we are unable to incorporate a language model due to modelling deficiencies in AR-HMMs; however, we still demonstrate substantial improvements over the baseline results.
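
The estimators themselves are not given in the abstract; purely as a hypothetical illustration of an HMM-based MMSE-style spectral gain, the fragment below weights state-conditional Wiener gains by the state posteriors. It only gestures at the warped AR-HMM machinery the paper describes.

    # Heavily reduced, hypothetical HMM-based spectral gain: each state
    # contributes a Wiener-style gain, weighted by its posterior probability.
    import numpy as np

    def mmse_spectral_gain(posteriors, speech_psd, noise_psd):
        # posteriors: (n_states,) state posteriors for the current frame.
        # speech_psd, noise_psd: (n_states, n_bins) state-conditional power spectra.
        wiener = speech_psd / (speech_psd + noise_psd)
        return posteriors @ wiener   # per-bin gain, applied before the MFCC front-end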

SL981083.PDF (From Author) SL981083.PDF (Rasterized)

An Effective Quality Evaluation Protocol For Speech Enhancement Algorithms

Authors:

John H.L. Hansen, Robust Speech Processing Lab, Duke University (USA)
Bryan L. Pellom, Robust Speech Processing Lab, Duke University (USA)

Page (NA) Paper number 917

Abstract:

Much progress has been made in speech enhancement algorithm formulation in recent years. However, while researchers in the speech coding and recognition communities have standard criteria for comparing algorithm performance, similar standards do not exist for researchers in speech enhancement. This paper discusses the necessary ingredients for an effective speech enhancement evaluation. We propose that researchers use the TIMIT core test set (192 sentences) together with a set of noise files, and a combination of objective measures and subjective testing for broad and fine phone-level quality assessment. Evaluation results include overall objective speech quality scores, measure histograms, and phoneme-class and individual-phone scores. The reported results are meant to illustrate specific ways of detailing quality assessment for an enhancement algorithm.
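
As one example of the kind of objective measure such a protocol can report (the paper's exact measure set is not reproduced here), the fragment below computes frame-based segmental SNR between clean and processed signals; the frame length and clamping limits are conventional choices, not taken from the paper.

    # Segmental SNR: frame-wise SNR, clamped to a fixed range and averaged.
    import numpy as np

    def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
        n = min(len(clean), len(processed)) // frame_len * frame_len
        c = np.asarray(clean[:n], dtype=float).reshape(-1, frame_len)
        p = np.asarray(processed[:n], dtype=float).reshape(-1, frame_len)
        num = np.sum(c ** 2, axis=1)
        den = np.sum((c - p) ** 2, axis=1) + 1e-12
        snr = 10.0 * np.log10(num / den + 1e-12)
        return float(np.mean(np.clip(snr, lo, hi)))   # clamp outliers, then average

Phoneme-class and individual-phone scores of the kind reported in the paper could then be obtained by averaging the frame values that fall inside each labelled segment.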

SL980917.PDF (From Author) SL980917.PDF (Rasterized)

An Adaptive Beamforming Microphone Array System Using A Blind Deconvolution

Authors:

Jin-Nam Park, Department of Computer Science, Kumamoto University (Japan)
Tsuyoshi Usagawa, Department of Computer Science, Kumamoto University (Japan)
Masanao Ebata, Department of Computer Science, Kumamoto University (Japan)

Page (NA) Paper number 418

Abstract:

This paper proposes an adaptive microphone array using blind deconvolution. The method achieves signal enhancement by combining beamforming based on blind deconvolution, synchronized summation, and the DSA method. The proposed method improves estimation performance through the iterative operation of blind deconvolution, using a cost function based on the coherence function.
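
A minimal sketch of the synchronized-summation component alone, assuming simultaneously recorded channels of equal length; the blind deconvolution stage and the coherence-based cost function are not reproduced here.

    # Synchronized summation (delay-and-sum): delays are estimated by
    # cross-correlation against a reference microphone, then the aligned
    # channels are averaged.
    import numpy as np
    from scipy.signal import correlate

    def synchronized_sum(mics):
        # mics: (n_mics, n_samples) array of simultaneously recorded channels.
        ref = np.asarray(mics[0], dtype=float)
        n = len(ref)
        out = ref.copy()
        for m in mics[1:]:
            xc = correlate(m, ref, mode='full')   # peak location gives the relative delay
            delay = int(np.argmax(xc)) - (n - 1)
            out += np.roll(m, -delay)             # align each channel to the reference
        return out / len(mics)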

SL980418.PDF (From Author) SL980418.PDF (Rasterized)

0418_01.WAV
(was: 0418_4bA.wav)
Performance measured by the mean value of coherence in simulation: Case 1
File type: Sound File
Format: Sound File: WAV
Tech. description: 44.1 kHz, 16 bits per sample, mono
Creating Application: Unknown
Creating OS: Unix 2.0.32
0418_02.WAV
(was: 0418_4bB.wav)
Performance measured by the mean value of coherence in simulation: Case 2
File type: Sound File
Format: Sound File: WAV
Tech. description: 44.1 kHz, 16 bits per sample, mono
Creating Application: Unknown
Creating OS: Unix 2.0.32
0418_03.WAV
(was: 0418_5bA.wav)
Performance measured by the mean value of coherence in the experiment: Case 1
File type: Sound File
Format: Sound File: WAV
Tech. description: 44.1 kHz, 16 bits per sample, mono
Creating Application: Unknown
Creating OS: Unix 2.0.32
0418_04.WAV
(was: 0418_5bB.wav)
Performance measured by the mean value of coherence in the experiment: Case 2
File type: Sound File
Format: Sound File: WAV
Tech. description: 44.1 kHz, 16 bits per sample, mono
Creating Application: Unknown
Creating OS: Unix 2.0.32

Speech Enhancement Using Critical Band Spectral Subtraction

Authors:

Latchman Singh, Queensland University of Technology (Australia)
Sridha Sridharan, Queensland University of Technology (Australia)

Page (NA) Paper number 1134

Abstract:

This paper proposes a new technique for the enhancement of speech corrupted by broadband noise. The technique exploits the human auditory system's inability to distinguish between individual frequency components within critical frequency bands. Spectral subtraction is used, with the spectrum treated as critical frequency bands rather than individual frequency components. The proposed technique is compared with the existing spectral subtraction technique using both subjective and objective speech assessment measures. The results indicate a significant increase in intelligibility and quality.
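
A minimal sketch of the critical-band idea, assuming standard Bark band edges and a leading noise-only segment for the noise estimate; the over-subtraction factor and spectral floor are arbitrary choices, not taken from the paper.

    # Critical-band spectral subtraction sketch: STFT bins are grouped into
    # Bark bands and a per-band noise estimate is subtracted.
    import numpy as np
    from scipy.signal import stft, istft

    # Standard Bark band edges; bins above the last edge form one extra band.
    BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                     1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                     6400, 7700]

    def critical_band_subtraction(noisy, fs, noise_frames=10, alpha=2.0, floor=0.02):
        f, _, Z = stft(noisy, fs=fs, nperseg=512)
        power = np.abs(Z) ** 2
        noise = np.mean(power[:, :noise_frames], axis=1)   # estimate from leading frames
        bands = np.digitize(f, BARK_EDGES_HZ)              # map each bin to a band
        gain = np.ones_like(power)
        for b in np.unique(bands):
            idx = bands == b
            band_pow = power[idx].sum(axis=0)              # per-frame power in the band
            band_noise = noise[idx].sum()
            sub = np.maximum(band_pow - alpha * band_noise, floor * band_pow)
            gain[idx] = np.sqrt(sub / (band_pow + 1e-12))  # one gain per band and frame
        _, enhanced = istft(Z * gain, fs=fs, nperseg=512)
        return enhanced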

SL981134.PDF (From Author) SL981134.PDF (Rasterized)
