Speech Enhancement


Multi-Channel Speech Enhancement in a Car Environment using Wiener Filtering and Spectral Subtraction

Authors:

Joerg Meyer, University of Bremen (Germany)
Klaus Uwe Simmer, University of Bremen (Germany)

Volume 2, Page 1167

Abstract:

This paper presents a multichannel algorithm for speech enhancement in hands-free telephone systems for cars. The algorithm exploits the special noise characteristics of fast-moving cars: the incoherence of the noise allows the use of adaptive Wiener filtering at frequencies above a theoretically determined cutoff, while below this frequency a smoothed spectral subtraction (SSS) is used to obtain improved noise suppression. The algorithm yields better noise reduction, with significantly less distortion and artificial noise, than spectral subtraction or Wiener filtering alone.
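
The low-frequency branch of such a hybrid is, at its core, spectral subtraction. As a rough illustration (not the paper's SSS variant; the frame length, oversubtraction factor, and spectral floor are assumed parameter names), a plain magnitude-domain spectral subtraction can be sketched as:

```python
import numpy as np

def spectral_subtraction(noisy, noise_psd, frame_len=256, over_sub=1.0, floor=0.01):
    """Magnitude-domain spectral subtraction with overlap-add.

    noisy: 1-D noisy signal; noise_psd: estimated noise power spectrum of
    length frame_len // 2 + 1 (e.g. averaged over noise-only frames).
    """
    hop = frame_len // 2
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        power = np.abs(spec) ** 2
        # subtract the noise estimate, keeping a spectral floor to limit musical noise
        clean_power = np.maximum(power - over_sub * noise_psd, floor * power)
        gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
        out[start:start + frame_len] += np.fft.irfft(gain * spec) * window
    return out
```

A smoothed variant would additionally average the gains over time before applying them; the spectral floor here is the usual guard against musical noise.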

ic971167.pdf




Weighted Matching Algorithms and Reliability in Noise Cancelling by Spectral Subtraction

Authors:

Nestor Becerra Yoma, CCIR/University of Edinburgh (U.K.)
Fergus McInnes, CCIR/University of Edinburgh (U.K.)
Mervyn Jack, CCIR/University of Edinburgh (U.K.)

Volume 2, Page 1171

Abstract:

This paper addresses the problem of speech recognition with signals corrupted by additive noise at moderate SNR. A technique based on spectral subtraction and noise cancellation reliability weighting in acoustic pattern matching algorithms is studied. A model for additive noise is proposed and used to compute the variance of the hidden clean signal information and the reliability of the spectral subtraction process. The results presented in this paper show that properly weighting the information provided by static parameters can substantially reduce the error rate.

ic971171.pdf




HMM-Based Speech Enhancement Using Harmonic Modeling

Authors:

Michael E. Deisher, Intel (U.S.A.)
Andreas S. Spanias, ASU (U.S.A.)

Volume 2, Page 1175

Abstract:

This paper describes a technique for reduction of non-stationary noise in electronic voice communication systems. Removal of noise is needed in many such systems, particularly those deployed in harsh mobile or otherwise dynamic acoustic environments. The proposed method employs state-based statistical models of both speech and noise, and is thus capable of tracking variations in noise during sustained speech. This work extends the hidden Markov model (HMM) based minimum mean square error (MMSE) estimator to incorporate a ternary voicing state, and applies it to a harmonic representation of voiced speech. Noise reduction during voiced sounds is thereby improved. Performance is evaluated using speech and noise from standard databases. The extended algorithm is demonstrated to improve speech quality as measured by informal preference tests and objective measures, to preserve speech intelligibility as measured by informal Diagnostic Rhyme Tests, and to improve the performance of a low bit-rate speech coder and a speech recognition system when used as a pre-processor.

ic971175.pdf




Model Based Speech Pause Detection

Authors:

Bruce L. McKinley, Signal Processing Consultants (U.S.A.)
Gary H. Whipple, U.S. Department of Defense (U.S.A.)

Volume 2, Page 1179

Abstract:

This paper presents two new algorithms for robust speech pause detection (SPD) in noise. Our approach formulates SPD as a statistical decision theory problem for the optimal detection of noise-only segments, using the framework of model-based speech enhancement (MBSE). The advantages of this approach are that it performs well in high-noise conditions, all necessary information is already available in MBSE, and no other features need to be computed. The first algorithm is based on a maximum a posteriori probability (MAP) test and the second on a Neyman-Pearson test. These tests make use of the spectral distance between the input vector and the composite spectral prototypes of the speech and noise models, as well as the probabilistic framework of the hidden Markov model. The algorithms are evaluated and shown to perform well against different types of noise at various SNRs.
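
The paper's tests operate on HMM spectral prototypes; a deliberately simplified stand-in shows the general shape of a Neyman-Pearson style pause test, with Gaussian frame models and a hypothetical variance parameterization:

```python
import numpy as np

def pause_detect(frames, noise_var, speech_var, threshold=0.0):
    """Frame-wise Gaussian log-likelihood-ratio test for noise-only frames.

    frames: (n_frames, frame_len) array. Each sample is modeled as zero-mean
    Gaussian with variance noise_var (pause) or noise_var + speech_var
    (speech); a frame is declared a pause when the per-sample log-likelihood
    ratio of speech vs. pause falls below the threshold.
    """
    energy = np.mean(frames ** 2, axis=1)
    v0, v1 = noise_var, noise_var + speech_var
    # per-sample log-likelihood ratio of the speech hypothesis over the pause hypothesis
    llr = 0.5 * (np.log(v0 / v1) + energy * (1.0 / v0 - 1.0 / v1))
    return llr < threshold
```

In a Neyman-Pearson setting the threshold is chosen to fix the false-alarm rate rather than set to zero.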

ic971179.pdf




Integrated Speech Enhancement and Coding in the Time-Frequency Domain

Authors:

Andrzej Drygajlo, LTS-DE, EPFL (Switzerland)
Benito Carnero, LTS-DE, EPFL (Switzerland)

Volume 2, Page 1183

Abstract:

This paper addresses the problem of merging speech enhancement and coding in the context of auditory modeling. The noisy signal is first processed by a fast wavelet packet transform algorithm to obtain an auditory spectrum, from which a rough masking model is estimated. This model is then used to refine a subtractive-type enhancement algorithm. The enhanced speech coefficients are then encoded in the same time-frequency transform domain, using masking-threshold constraints for the quantization noise. The advantage of the proposed method is that both enhancement and coding are performed on the transform coefficients, without additional FFT processing.

ic971183.pdf




Quality Enhancement Of Narrowband CELP-Coded Speech Via Wideband Harmonic Re-synthesis

Authors:

Cheung-Fat Chan, City University of Hong Kong (Hong Kong)
Wai-Kwong Hui, City University of Hong Kong (Hong Kong)

Volume 2, Page 1187

Abstract:

Results for improving the quality of narrowband CELP-coded speech by enhancing the pitch periodicity and by regenerating the highband components of speech spectra are reported. Multiband excitation (MBE) analysis is applied to enhance the pitch periodicity by re-synthesizing the speech signal using a harmonic synthesizer. The highband magnitude spectra are regenerated by matching to lowband spectra using a trained wideband spectral codebook. Information about the voiced/unvoiced (V/UV) excitation in the highband is derived from a training procedure and recovered using the matched lowband index. Simulation results indicate that the quality of the wideband enhanced speech is significantly improved over the narrowband CELP-coded speech.
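
The codebook matching step can be illustrated with a hypothetical nearest-neighbour lookup, where each lowband codeword is paired with a trained highband spectrum (names and shapes are assumptions, not the paper's codebook design):

```python
import numpy as np

def regenerate_highband(lowband_spec, low_codebook, high_codebook):
    """Nearest-neighbour highband regeneration from paired codebooks.

    low_codebook[i] pairs with high_codebook[i]; the observed lowband
    spectrum is matched to its closest lowband codeword and the paired
    highband magnitude spectrum is returned along with the matched index.
    """
    dists = np.sum((low_codebook - lowband_spec) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return high_codebook[idx], idx
```

The matched index would also select the trained highband V/UV information mentioned in the abstract.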

ic971187.pdf




Speech Enhancement using CSS-based Array Processing

Authors:

Futoshi Asano, ETL (Japan)
Satoru Hayamizu, ETL (Japan)

Volume 2, Page 1191

Abstract:

A method is proposed for recovering the LPC spectrum from a microphone array input signal corrupted by ambient noise. The method is based on the CSS (coherent subspace) method, which was designed for DOA (direction of arrival) estimation of broadband array input signals. The noise energy is reduced in the subspace domain by the maximum likelihood method. To further improve the noise reduction, the noise-dominant subspace is eliminated by projection; this is effective when the SNR is low and classification of noise and signal in the subspace domain is difficult. Simulation results show that some small formants, which cannot be estimated by the conventional delay-and-sum beamformer, are well estimated by the proposed method.
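
For contrast, the conventional delay-and-sum beamformer used as the baseline can be sketched as follows, assuming known per-microphone steering delays applied as frequency-domain phase shifts:

```python
import numpy as np

def delay_and_sum(mic_signals, delays, fs):
    """Delay-and-sum beamformer with frequency-domain steering.

    mic_signals: (n_mics, n_samples) array; delays: per-microphone delay in
    seconds toward the look direction. Each channel is advanced by its delay
    (a phase shift per frequency bin) and the aligned channels are averaged.
    """
    n_mics, n = mic_signals.shape
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    acc = np.zeros(len(freqs), dtype=complex)
    for m in range(n_mics):
        spec = np.fft.rfft(mic_signals[m])
        # advance channel m by delays[m] seconds to align the wavefronts
        acc += spec * np.exp(2j * np.pi * freqs * delays[m])
    return np.fft.irfft(acc / n_mics, n)
```

Signals from the look direction add coherently while incoherent noise averages down, which is exactly the behaviour the CSS-based method improves upon at low SNR.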

ic971191.pdf




Co-channel Speaker Separation Using Constrained Nonlinear Optimization

Authors:

Daniel S. Benincasa, Rome Laboratory (U.S.A.)
Michael I. Savic, Rensselaer Polytechnic Institute (U.S.A.)

Volume 2, Page 1195

Abstract:

This paper describes a technique to separate the speech of two speakers recorded over a single channel. The main focus of this research is to separate overlapping voiced speech signals using constrained nonlinear optimization. Based on the assumption that voiced speech can be modeled as a slowly varying vocal tract filter driven by a quasi-periodic train of impulses, the speech waveform is represented as a sum of sine waves with time-varying amplitude, frequency and phase. The unknown parameters of our speech model are the amplitudes, frequencies and phases of the harmonics of both speech signals. Using constrained nonlinear optimization, we determine, on a frame-by-frame basis, the parameters that provide the least mean square error (LMSE) between the original co-channel speech signal and the sum of the reconstructed speech signals.

ic971195.pdf




A Contextual Blind Separation of Delayed and Convolved Sources

Authors:

Te-Won Lee, Max-Planck-Society (U.S.A.)
Reinhold Orglmeister, Berlin University of Technology (Germany)

Volume 2, Page 1199

Abstract:

We present a new method to tackle the problem of separating mixtures of real sources which have been convolved and time-delayed under real-world conditions. To this end, we learn two sets of parameters: one to unmix the mixtures and one to estimate the true density function. Solutions are discussed for feedback and feedforward architectures. Since the quality of separation depends on the modeling of the underlying density, we propose different methods to more closely approximate the density function using some context. The proposed density estimation achieves separation of a wider class of sources. Furthermore, we employ FIR polynomial matrix techniques in the frequency domain to invert a true-phase mixing system. The significance of the new method is demonstrated by the successful separation of two speakers, and of music and speech, recorded with two microphones in a reverberating room.

ic971199.pdf




Segregation of Concurrent Speech: an Application of the Reassigned Spectrum

Authors:

Georg F. Meyer, Keele University (U.K.)
Fabrice Plante, Liverpool University (U.K.)
Frederic Berthommier, ICP Grenoble (France)

Volume 2, Page 1203

Abstract:

Modulation maps provide an effective method for segregating voiced speech sounds from competing background activity. The maps are constructed by computing modulation spectra in a bank of auditory filters. Target spectra are recovered by sampling the modulation spectra at the first five multiples of the fundamental frequency of the target sound. If the modulation spectra are computed using a conventional DFT, windows of 200 ms duration are necessary. Using the reassigned spectrum, a new time-frequency representation, the window size can be reduced to 50 ms with minimal loss of performance. The algorithm is tested on a 'double vowel' identification task that has been used extensively in psychophysical experiments.
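
Sampling a spectrum at the first five multiples of the fundamental, as the recovery step does within each auditory filter, can be illustrated with a plain DFT (the paper's modulation-spectrum and reassignment machinery is omitted; the function and parameter names are illustrative):

```python
import numpy as np

def harmonic_sample(frame, f0, fs, n_harmonics=5):
    """Sample a frame's DFT magnitude at the first n_harmonics multiples of f0."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # nearest DFT bin for each harmonic of the fundamental
    bins = [np.argmin(np.abs(freqs - k * f0)) for k in range(1, n_harmonics + 1)]
    return spec[bins]
```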

ic971203.pdf




Enhancement of esophageal speech by injection noise rejection

Authors:

Hector Raul Javkin, PTI-STL (U.S.A.)
Michael Galler, PTI-STL (U.S.A.)
Nancy Niedzielski, PTI-STL (U.S.A.)

Volume 2, Page 1207

Abstract:

Esophageal speakers, who produce a voice source by bringing about a vibration of the esophageal superior sphincter, must insufflate the esophagus with an air injection gesture before every utterance, creating an air reservoir to drive the vibration. The resulting noise is generally undesired by the speakers. This paper describes a method for the automatic recognition and rejection of the injection noise which occurs in esophageal speech.

ic971207.pdf




Real-Time Digital Speech Processing Strategies For The Hearing Impaired

Authors:

Neeraj Magotra, University of New Mexico (U.S.A.)
Sudheer Sirivara, University of New Mexico (U.S.A.)

Volume 2, Page 1211

Abstract:

This paper deals with digital processing of speech as it pertains to the hearing impaired, in particular the development of a true real-time digital hearing aid. The system, based on the Texas Instruments TMS320C3x, implements frequency shaping, noise reduction, interaural time delay, amplitude compression and various timing options, and also provides a testbed for future development. The device, referred to as the DIgital Programmable Hearing Aid (DIPHA), uses a wide bandwidth (up to 16 kHz) and is fully programmable, permitting various speech processing algorithms to be implemented and tested on hearing-impaired subjects in the real world as well as in the laboratory.

ic971211.pdf




Iterative-Batch And Sequential Algorithms For Single Microphone Speech Enhancement

Authors:

Sharon Gannot, Tel-Aviv University (Israel)
David Burshtein, Tel-Aviv University (Israel)
Ehud Weinstein, Tel-Aviv University (Israel)

Volume 2, Page 1215

Abstract:

Speech quality and intelligibility may deteriorate significantly in the presence of background noise, especially when the speech signal is subject to subsequent processing. In this paper we present a class of Kalman-filter-based speech enhancement algorithms with some extensions, modifications, and improvements. The first algorithm employs the estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and noise signals; the enhanced speech signal is obtained as a byproduct of the parameter estimation algorithm. The second algorithm is a sequential, computationally efficient, gradient-descent algorithm. We discuss various topics concerning the practical implementation of these algorithms. An experimental study using real speech and noise signals is provided to compare these algorithms with alternative speech enhancement algorithms, and to compare the performance of the iterative and sequential algorithms.

ic971215.pdf




Kalman filtering for low distortion speech enhancement in mobile communication

Authors:

Patrik Sörqvist, Ericsson (Sweden)
Peter Händel, Ericsson (Sweden)
Björn Ottersten, KTH (Sweden)

Volume 2, Page 1219

Abstract:

This paper presents a model-based approach to noise suppression for speech contaminated by additive noise. A Kalman-filter-based speech enhancement system is presented and its performance is investigated in detail. It is shown that, with a novel speech parameter estimation algorithm, it is possible to achieve 10 dB of noise suppression while maintaining high overall audible quality.
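
The general shape of Kalman-filter speech enhancement, with clean speech modeled as an AR(p) process observed in additive white noise, can be sketched as below; this is the textbook state-space formulation, not this paper's specific parameter estimator:

```python
import numpy as np

def kalman_enhance(noisy, ar_coeffs, q, r):
    """Kalman filtering of noisy speech under an AR(p) clean-speech model.

    State: the last p clean samples. ar_coeffs are a_1..a_p in
    s[n] = sum_k a_k * s[n-k] + w[n], with process noise w ~ N(0, q) and
    additive observation noise ~ N(0, r).
    """
    p = len(ar_coeffs)
    F = np.zeros((p, p))            # companion-form transition matrix
    F[0, :] = ar_coeffs
    F[1:, :-1] = np.eye(p - 1)
    H = np.zeros(p)
    H[0] = 1.0                      # we observe the newest sample plus noise
    Q = np.zeros((p, p))
    Q[0, 0] = q
    x = np.zeros(p)
    P = np.eye(p)
    out = np.empty(len(noisy))
    for n, y in enumerate(noisy):
        x = F @ x                   # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H + r           # innovation variance
        K = P @ H / S               # Kalman gain
        x = x + K * (y - H @ x)     # update with the noisy observation
        P = P - np.outer(K, H @ P)
        out[n] = x[0]
    return out
```

In practice the AR coefficients and noise variances are unknown and must be estimated frame by frame, which is where the paper's parameter estimation algorithm comes in.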

ic971219.pdf
