ICSLP'98 Proceedings
Auditory Modeling Techniques For Robust Pitch Extraction And Noise Reduction
Authors:
Piero Cosi, Institute of Phonetics and Dialectology, National Research Council (Italy)
Page (NA) Paper number 1053
Abstract:
A novel method for robust pitch extraction, based on the correlogram output of Lyon's cochlear model, is described. The autocorrelation lag at which the signals of the cochlear channels share the same periodicity can be computed, thus tracking how the pitch of the input signal varies over time. In the case of stationary noise, a kind of spectral-subtraction technique built in the correlogram domain, named 'correlogram subtraction', is applied to enhance the signal before computing its fundamental frequency. Finally, a correction algorithm based on an 'island-driven' strategy, working on zones of the signal with stable pitch values, is used to refine the pitch estimate. This method of pitch extraction is extremely reliable, even at a signal-to-noise ratio of 0 dB. The same subtraction technique, with some new filter-bank energy-based modifications, is also considered for re-synthesizing, by an inversion strategy, a clean version of a noisy input signal. The quality of the re-synthesized signal is quite promising, leading us to try, in the future, to use this technique as a new signal enhancement scheme.
1053_01.WAV (was: 1053_01.WAV) | Italian word /lavan'daja/ ('washerwoman') pronounced by a male speaker in clean condition. | File type: Sound File | Format: Sound File: WAV | Tech. description: 16 kHz, 16 bits per sample, mono, Windows PCM | Creating application: Windows Sound Utilities | Creating OS: Windows 95/NT
1053_03.WAV (was: 1053_03.WAV) | Italian word /lavan'daja/ ('washerwoman') reconstructed, by the correlogram subtraction technique described in the paper, from the corresponding noisy signal (0 dB SNR). | File type: Sound File | Format: Sound File: WAV | Tech. description: 16 kHz, 16 bits per sample, mono, Windows PCM | Creating application: Windows Sound Utilities | Creating OS: Windows 95/NT
1053_02.WAV (was: 1053_02.WAV) | Italian word /lavan'daja/ ('washerwoman') pronounced by a male speaker in a noisy condition (0 dB SNR). | File type: Sound File | Format: Sound File: WAV | Tech. description: 16 kHz, 16 bits per sample, mono, Windows PCM | Creating application: Windows Sound Utilities | Creating OS: Windows 95/NT
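The pitch-tracking idea in the abstract above, finding the autocorrelation lag at which all channels share the same periodicity, can be illustrated with a minimal sketch. It is not the paper's implementation: the channels here are assumed to be already band-filtered sample lists (standing in for the output of Lyon's cochlear model), and the function names, the 60 to 400 Hz search range, and the per-channel normalisation are illustrative assumptions.

```python
import math


def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation r[k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def summary_pitch(channels, fs, f_min=60.0, f_max=400.0):
    """Estimate pitch (Hz) from the lag that maximises the summed autocorrelation.

    channels: list of equal-length sample lists, one per (assumed) filter channel.
    fs: sampling rate in Hz.
    f_min, f_max: hypothetical pitch search range, not taken from the paper.
    """
    min_lag = int(fs / f_max)
    max_lag = int(fs / f_min)
    summary = [0.0] * (max_lag + 1)
    for ch in channels:
        r = autocorrelation(ch, max_lag)
        if r[0] > 0.0:
            # Normalise by r[0] so every channel carries equal weight.
            for k in range(max_lag + 1):
                summary[k] += r[k] / r[0]
    # The lag shared by all channels shows up as the largest summary peak.
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: summary[k])
    return fs / best_lag
```

Applied frame by frame, the returned value gives a pitch track. The correlogram subtraction and island-driven correction steps described in the abstract are not modelled in this sketch.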
Eliathamby Ambikairajah, Athlone Institute of Technology (Ireland)
Graham Tattersall, University of East Anglia (U.K.)
Andrew Davis, BT Laboratories (U.K.)
This paper describes a speech enhancement system using a novel combination of a Fast Wavelet Transform structure and "Wiener filtering" in the wavelet domain. The specific application of interest is the enhancement of speech when a cellular phone is used within a moving vehicle. Subjective tests carried out using speech with additive vehicle noise at a signal-to-noise ratio of 10 dB indicate that the wavelet transform-based Wiener filtering approach works well. In particular, the technique was compared with several other common enhancement methods (thresholding applied in the wavelet domain, FFT-based Wiener filtering, and spectral subtraction) and was found to outperform them.
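As a rough illustration of Wiener-style filtering in the wavelet domain, the sketch below applies a per-coefficient gain to the detail coefficients of a Haar decomposition. The Haar wavelet, the gain rule `max(c^2 - noise_var, 0) / c^2`, the choice to leave the final approximation band untouched, and all function names are assumptions made for the example; the abstract does not specify the paper's transform structure or gain estimator.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def wiener_gain(coeff, noise_var):
    """Estimated signal power divided by observed power, clipped at zero."""
    power = coeff * coeff
    if power <= 0.0:
        return 0.0
    return max(power - noise_var, 0.0) / power


def wavelet_wiener(x, noise_var, levels=3):
    """Attenuate detail coefficients; len(x) must be divisible by 2**levels."""
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append([wiener_gain(c, noise_var) * c for c in d])
    # Rebuild from the coarsest level back to the finest.
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx
```

Because the transform is orthonormal, white noise has the same variance in every subband, which is why a single `noise_var` is used here. Coloured noise such as vehicle noise would need a per-band estimate.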
Beth Logan, University of Cambridge (U.K.)
Tony Robinson, University of Cambridge (U.K.)
We have previously developed an adaptive speech enhancement scheme. This models speech and noise using perceptual frequency or 'warped' autoregressive HMMs (AR-HMMs) and estimates the clean speech and noise parameters within this framework. In this paper, we investigate the use of our system as a front-end to a clean MFCC recognition system. We make two main modifications to our scheme. First, we use MMSE spectral rather than time domain estimators for enhancement. Second, for computational reasons, we form estimators using non-warped AR-HMMs. To avoid mismatch when converting between warped and non-warped models, we use parallel models. Results are presented for small and medium vocabulary tasks. On the simple task, we approach the performance of a matched system when language model information is included. On the second task, we are unable to incorporate a language model due to modelling deficiencies in AR-HMMs. However, we still demonstrate substantial improvements over baseline results.
John H.L. Hansen, Robust Speech Processing Lab; Duke Univ. (USA)
Bryan L. Pellom, Robust Speech Processing Lab; Duke Univ. (USA)
Much progress has been made in speech enhancement algorithm formulation in recent years. However, while researchers in the speech coding and recognition communities have standard criteria for algorithm performance comparison, similar standards do not exist for researchers in speech enhancement. This paper discusses the necessary ingredients for an effective speech enhancement evaluation. We propose that researchers use the evaluation core test set of TIMIT (192 sentences), with a set of noise files, and a combination of objective measures and subjective testing for broad and fine phone-level quality assessment. Evaluation results include overall objective speech quality measure scores, measure histograms, and phoneme class and individual phone scores. The reported results are meant to illustrate specific ways of detailing quality assessment for an enhancement algorithm.
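One widely used objective quality measure is the segmental signal-to-noise ratio. The sketch below shows how such a measure might be computed per frame and averaged. Whether segmental SNR is among the measures used in the paper is an assumption, as are the frame length and the clipping limits of -10 dB and 35 dB.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    Frames with zero reference energy are skipped; each frame score is
    clipped to [lo, hi] so that a few extreme frames do not dominate.
    """
    scores = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal = sum(c * c for c in ref)
        error = sum((c - e) ** 2 for c, e in zip(ref, est))
        if signal == 0.0:
            continue
        snr = hi if error == 0.0 else 10.0 * math.log10(signal / error)
        scores.append(min(max(snr, lo), hi))
    return sum(scores) / len(scores) if scores else float("nan")
```

Collecting the per-frame scores instead of averaging them would give the kind of measure histogram the abstract mentions, and grouping frames by phone label would give phone-level scores.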
Jin-Nam Park, Department of Computer Science, Kumamoto University (Japan)
Tsuyoshi Usagawa, Department of Computer Science, Kumamoto University (Japan)
Masanao Ebata, Department of Computer Science, Kumamoto University (Japan)
This paper proposes an adaptive microphone array using blind deconvolution. The method achieves signal enhancement by combining beamforming that uses blind deconvolution, synchronized summation, and the DSA method. The proposed method improves estimation performance through iterative blind deconvolution using a cost function based on the coherency function.
0418_01.WAV (was: 0418_4bA.wav) | The performance by mean value of coherence in simulation: Case 1 | File type: Sound File | Format: Sound File: WAV | Tech. description: 44.1 kHz, 16 bit, mono | Creating application: Unknown | Creating OS: Unix 2.0.32
0418_02.WAV (was: 0418_4bB.wav) | The performance by mean value of coherence in simulation: Case 2 | File type: Sound File | Format: Sound File: WAV | Tech. description: 44.1 kHz, 16 bit, mono | Creating application: Unknown | Creating OS: Unix 2.0.32
0418_03.WAV (was: 0418_5bA.wav) | The performance as a mean value of coherence in experiment: Case 1 | File type: Sound File | Format: Sound File: WAV | Tech. description: 44.1 kHz, 16 bit, mono | Creating application: Unknown | Creating OS: Unix 2.0.32
0418_04.WAV (was: 0418_5bB.wav) | The performance as a mean value of coherence in experiment: Case 2 | File type: Sound File | Format: Sound File: WAV | Tech. description: 44.1 kHz, 16 bit, mono | Creating application: Unknown | Creating OS: Unix 2.0.32
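The synchronized summation step named in the microphone-array abstract can be pictured as a delay-and-sum operation: each channel is time-aligned to the source and the aligned channels are averaged. The sketch below assumes the per-microphone delays are already known as non-negative integer sample counts. In the paper they would come from the blind deconvolution stage, which is not modelled here, and the function name is hypothetical.

```python
def synchronized_sum(channels, delays):
    """Align each channel by its delay (in samples) and average them.

    channels: list of equal-length sample lists, one per microphone.
    delays: non-negative integer arrival delay of the source at each microphone.
    Returns the aligned average, shortened by the largest delay.
    """
    out_len = len(channels[0]) - max(delays)
    out = [0.0] * out_len
    for ch, d in zip(channels, delays):
        for i in range(out_len):
            # Sample i of the source arrives at this microphone at index i + d.
            out[i] += ch[i + d]
    return [v / len(channels) for v in out]
```

After alignment the source adds coherently across channels while uncorrelated noise averages down, which is the usual motivation for this kind of summation.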
Latchman Singh, Queensland University of Technology (Australia)
Sridha Sridharan, Queensland University of Technology (Australia)
This paper proposes a new technique for the enhancement of speech corrupted by broadband noise. The technique exploits the human auditory system's inability to distinguish between individual frequency components within critical frequency bands. Spectral subtraction is used, and the spectrum is treated as a set of critical frequency bands rather than individual frequency components. The proposed technique is compared with the existing spectral subtraction technique, using both subjective and objective speech assessment measures. The results indicate a significant increase in intelligibility and quality.
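A minimal sketch of spectral subtraction carried out per critical band rather than per frequency bin is given below. The band grouping uses the Zwicker and Terhardt Bark approximation with one band per integer Bark value. That choice, the spectral floor of 0.01, and the function names are assumptions for illustration, not details taken from the paper.

```python
import math


def bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_band_subtract(noisy_power, noise_power, fs, floor=0.01):
    """Subtract noise power per critical band and share one gain within each band.

    noisy_power, noise_power: power spectra over bins spanning 0..fs/2 inclusive
    (at least two bins). Returns the enhanced power spectrum.
    """
    n_bins = len(noisy_power)
    bands = {}
    for k in range(n_bins):
        f = k * (fs / 2.0) / (n_bins - 1)
        bands.setdefault(int(bark(f)), []).append(k)
    out = [0.0] * n_bins
    for bins in bands.values():
        y = sum(noisy_power[k] for k in bins)
        d = sum(noise_power[k] for k in bins)
        # One gain per band; the floor limits over-subtraction artefacts.
        gain = max(1.0 - d / y, floor) if y > 0.0 else floor
        for k in bins:
            out[k] = gain * noisy_power[k]
    return out
```

Sharing one gain across a band smooths bin-to-bin fluctuations of the estimate, which is one plausible reading of treating the spectrum as critical bands.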