Speech Enhancement I

Chair: Hynek Hermansky, Oregon Graduate Institute of Science, USA

An Energy-Constrained Signal Subspace Method for Speech Enhancement and Recognition in Colored Noise

Authors:

Jun Huang, University of Illinois, Urbana-Champaign (U.S.A.)
Yunxin Zhao, University of Illinois, Urbana-Champaign (U.S.A.)

Volume 1, Page 377, Paper number 2173

Abstract:

An energy-constrained signal subspace (ECSS) method is proposed for speech enhancement and recognition under additive colored noise. The key idea is to match the short-time energy of the enhanced speech signal to an unbiased estimate of the short-time energy of the clean speech, which proves very effective for improving the estimation of noise-like, low-energy segments of the speech signal. The colored noise is modelled by an autoregressive (AR) process. A modified covariance method is used to estimate the AR parameters of the colored noise, and a prewhitening filter is constructed from the estimated parameters. The performance of the proposed algorithm was evaluated on the TI46 digit database and the TIMIT continuous speech database. The ECSS method was found to significantly improve the signal-to-noise ratio (SNR) and word recognition accuracy (WRA) for isolated digits and continuous speech under various SNR conditions.
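The abstract names two concrete ingredients that can be sketched in a few lines: an AR fit to a noise-only segment (a plain Yule-Walker fit stands in here for the paper's modified covariance method) yielding a prewhitening FIR filter, and the energy constraint that rescales each enhanced frame to the unbiased clean-energy estimate E_clean = E_noisy - N*sigma^2. Function names and the frame-based usage are assumptions, not the paper's interface:

```python
import numpy as np

def estimate_noise_ar(noise, order=4):
    """Fit an AR model to a noise-only segment via the Yule-Walker
    equations; returns prewhitening FIR coefficients [1, -a1, ..., -ap]."""
    n = len(noise)
    r = np.correlate(noise, noise, "full")[n - 1:] / n
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def energy_constrain(enhanced, noisy, noise_var):
    """Rescale an enhanced frame so its short-time energy matches an
    unbiased estimate of the clean energy: E_clean = E_noisy - N*sigma^2."""
    n = len(noisy)
    target = max(np.sum(noisy ** 2) - n * noise_var, 1e-10)
    current = max(np.sum(enhanced ** 2), 1e-10)
    return enhanced * np.sqrt(target / current)
```

Prewhitening the noisy signal with `np.convolve(noisy, estimate_noise_ar(noise_sample))` reduces the colored-noise case to the white-noise case before any subspace filtering; the energy constraint is then applied frame by frame to the filter output.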

ic982173.pdf (From Postscript)




Removal of Noise from Speech Using the Dual EKF Algorithm

Authors:

Eric A Wan, Oregon Graduate Institute (U.S.A.)
Alex T Nelson, Oregon Graduate Institute (U.S.A.)

Volume 1, Page 381, Paper number 2119

Abstract:

Noise reduction for speech signals has applications ranging from speech enhancement for cellular communications, to front ends for speech recognition systems. A neural network based time-domain method called Dual Extended Kalman Filtering (Dual EKF) is presented for removing nonstationary and colored noise from speech. This paper describes the algorithm and provides a set of experimental results.
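The paper's Dual EKF pairs an extended Kalman filter over a neural-network speech model with a second filter over the network weights. As an illustration only, the same dual idea can be shown in its simplest linear form: an AR(2) signal model whose coefficients are themselves tracked by a second Kalman filter. All names, the AR(2) choice, and the noise settings are assumptions, not the paper's algorithm:

```python
import numpy as np

def dual_kf_denoise(y, r_meas, q_sig=1.0, q_w=1e-4):
    """For each sample, run two coupled linear Kalman filters: one estimates
    the AR(2) signal state given the current coefficient estimates, the other
    estimates the AR coefficients given the current state estimates."""
    w = np.zeros(2)           # AR coefficient estimates
    Pw = np.eye(2)
    x = np.zeros(2)           # signal state [s_t, s_{t-1}]
    Px = np.eye(2)
    H = np.array([1.0, 0.0])
    s_hat = np.zeros(len(y))
    for t, yt in enumerate(y):
        h = x.copy()          # regressors [s_{t-1}, s_{t-2}] for the weight filter
        # --- weight filter: random-walk predict, update with y[t] ---
        Pw = Pw + q_w * np.eye(2)
        kw = Pw @ h / (h @ Pw @ h + r_meas + q_sig)
        w = w + kw * (yt - h @ w)
        Pw = Pw - np.outer(kw, h @ Pw)
        # --- signal filter using the updated coefficients ---
        F = np.array([[w[0], w[1]], [1.0, 0.0]])
        x_pred = F @ x
        Pp = F @ Px @ F.T + np.diag([q_sig, 0.0])
        kx = Pp @ H / (H @ Pp @ H + r_meas)
        x = x_pred + kx * (yt - x_pred[0])
        Px = Pp - np.outer(kx, H @ Pp)
        s_hat[t] = x[0]
    return s_hat, w
```

Replacing the linear AR predictor with a neural network turns both updates into extended (linearized) Kalman filters, which is the step the paper takes.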

ic982119.pdf (From Postscript)




Combined Wiener and Coherence Filtering in Wavelet Domain For Microphone Array Speech Enhancement

Authors:

Djamila Mahmoudi, Swiss Federal Institute of Technology, Lausanne (Switzerland)
Andrzej Drygajlo, Swiss Federal Institute of Technology, Lausanne (Switzerland)

Volume 1, Page 385, Paper number 1927

Abstract:

Wiener-filter-based postfiltering has shown its usefulness in microphone array speech enhancement systems. In our earlier work, we developed a postfilter in the wavelet domain that achieved better performance than algorithms developed in the Fourier domain; the multi-resolution, multi-rate analysis also provides considerable computational savings. This contribution shows that the coherence function, calculated with the wavelet transform between the beamforming output signal and the microphone reference output signal, provides relevant and exploitable information for further noise suppression. Thus, nonlinear coherence filtering and a Wiener filter are combined in the wavelet transform domain to improve the performance of the Wiener-filter-based postfilter, especially during speech pauses. Evaluations of the new algorithm confirm that speech quality is indeed improved, with significantly reduced distortions. Finally, the results of the objective measures are presented.
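The combination can be sketched with a one-level Haar transform and batch per-band statistics (the paper uses a full wavelet analysis with running estimates, so this is only a toy illustration; all names are assumptions). Each subband gain is the product of a Wiener term, from the beamformer band power and a noise estimate, and the squared coherence between the beamformer output and the reference microphone, which drops toward zero during pauses:

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar analysis: approximation and detail bands."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2)
    out[1::2] = (a - d) / np.sqrt(2)
    return out

def coherence_wiener(beam, ref, noise_var):
    """Per subband: Wiener gain times squared coherence between the
    beamformer output and the reference microphone signal."""
    bands = []
    for b, r in zip(haar_dwt(beam), haar_dwt(ref)):
        pb, pr = np.mean(b * b), np.mean(r * r)
        wiener = max(pb - noise_var, 0.0) / pb if pb > 0 else 0.0
        msc = np.mean(b * r) ** 2 / (pb * pr + 1e-12)  # squared coherence
        bands.append(b * wiener * msc)
    return haar_idwt(*bands)
```

On coherent speech both factors stay near one and the signal passes; on incoherent noise-only input both factors collapse, which is the extra suppression during pauses the abstract describes.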

ic981927.pdf (From Postscript)




Speech Enhancement in a Bayesian Framework

Authors:

Gaafar M.K. Saleh, Cambridge University (U.K.)
Mahesan Niranjan, Cambridge University (U.K.)

Volume 1, Page 389, Paper number 5192

Abstract:

We present an approach for the enhancement of speech signals corrupted by additive white Gaussian noise. The speech enhancement problem is treated as a signal estimation problem within a Bayesian framework. The conventional all-pole speech production model is assumed to govern the behaviour of the clean speech signal. The additive noise level and the all-pole model gain are automatically inferred during the speech enhancement process. The strength of the Bayesian approach developed in this paper lies in its ability to perform speech enhancement without the usual requirement of estimating the level of the corrupting noise from ``silence'' segments of the corrupted signal. The performance of the Bayesian approach is compared with that of the Lim & Oppenheim framework, whose iterative nature it shares. A significant quality improvement is obtained over the Lim & Oppenheim framework.
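For reference, the Lim & Oppenheim-style baseline that the paper compares against (not the paper's Bayesian inference itself) can be sketched as an iteration that alternately fits an all-pole model to the current estimate and Wiener-filters the noisy signal; names, the autocorrelation-method fit, and parameter values are assumptions:

```python
import numpy as np

def iterative_wiener(noisy, noise_var, order=10, n_iter=3):
    """Lim & Oppenheim-style iteration: fit an all-pole model to the
    current estimate, Wiener-filter the original noisy signal in the
    frequency domain with the resulting AR spectrum, repeat."""
    est = noisy.copy()
    n = len(noisy)
    Y = np.fft.rfft(noisy)
    f = np.fft.rfftfreq(n)                        # cycles per sample
    for _ in range(n_iter):
        # autocorrelation-method AR fit to the current estimate
        r = np.correlate(est, est, "full")[n - 1:] / n
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.linalg.solve(R + 1e-8 * np.eye(order), r[1:order + 1])
        g2 = r[0] - a @ r[1:order + 1]            # prediction-error power
        # AR power spectrum and Wiener gain
        A = 1 - sum(a[k] * np.exp(-2j * np.pi * f * (k + 1)) for k in range(order))
        Ps = np.maximum(g2, 1e-10) / np.abs(A) ** 2
        H = Ps / (Ps + noise_var)
        est = np.fft.irfft(H * Y, n)
    return est
```

Note that this baseline needs `noise_var` as an input; removing exactly that requirement, by inferring the noise level and model gain within the iteration, is the contribution the abstract claims.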

ic985192.pdf (Scanned)




Speech Enhancement for Bandlimited Speech

Authors:

David A Heide, Naval Research Laboratory (U.S.A.)
George S Kang, Naval Research Laboratory (U.S.A.)

Volume 1, Page 393, Paper number 1022

Abstract:

Throughout the history of telecommunication, speech has rarely been transmitted with its full analog bandwidth (0 to 8 kHz or more) because of limitations in channel bandwidth. This impaired legacy continues in tactical voice communication. The passband of a voice terminal is typically 0 to 4 kHz, so high-frequency speech components (4 to 8 kHz) are removed prior to transmission. As a result, speech intelligibility suffers, particularly for low-data-rate vocoders. In this paper, we describe our speech-processing technique, which permits some of the upperband speech components to be translated into the passband of the vocoder. According to our test results, speech intelligibility is improved by as much as three to four points, even for the excellent, recently developed Department of Defense standard Mixed Excitation Linear Prediction (MELP) 2.4 kb/s vocoder. Note that speech intelligibility is improved without expanding the transmission bandwidth or compromising interoperability with other users.
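The abstract does not specify how the translation is performed, but the general idea of moving upperband energy into the passband can be sketched in the frequency domain; the function name, the fixed 4 kHz shift, and the mixing gain are assumptions for illustration only:

```python
import numpy as np

def translate_upperband(x, fs, shift_hz=4000.0, gain=0.25):
    """Shift 4-8 kHz energy down by shift_hz and mix it, attenuated, into
    the 0-4 kHz band before the signal is limited to the 4 kHz channel."""
    n = len(x)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    shift_bins = int(round(shift_hz * n / fs))
    Y = X.copy()
    idx = np.nonzero((freqs >= 4000.0) & (freqs < 8000.0))[0]
    Y[idx - shift_bins] += gain * X[idx]   # fold upperband into the passband
    Y[freqs >= 4000.0] = 0.0               # simulate the 4 kHz channel limit
    return np.fft.irfft(Y, n)
```

A 5 kHz component thus survives the 4 kHz channel as an attenuated 1 kHz image, which is the sense in which upperband cues reach the vocoder without any extra transmission bandwidth.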

ic981022.pdf (From Postscript)




A Novel Psychoacoustically Motivated Audio Enhancement Algorithm Preserving Background Noise Characteristics

Authors:

Stefan N.A. Gustafsson, IND, RWTH Aachen (Germany)
Peter J. Jax, IND, RWTH Aachen (Germany)
Peter Vary, IND, RWTH Aachen (Germany)

Volume 1, Page 397, Paper number 1183

Abstract:

In this paper we propose an algorithm for the reduction of noise in audio signals. In contrast to several previous approaches, we do not try to remove the noise completely; instead, our goal is to preserve a pre-defined amount of the original noise in the processed signal. This is accomplished by exploiting the masking properties of the human auditory system. Speech and noise distortions are considered separately. The spectral weighting rule, adapted using only estimates of the masking threshold and the noise power spectral density, has been designed to guarantee complete masking of the distortions of the residual noise. Simulation results confirm that no audible artifacts are left in the processed signal, while speech distortions are comparable to those caused by conventional noise reduction techniques. Audio demonstrations are available from http://www.ind.rwth-aachen.de.
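The core mechanism, a spectral gain that never falls below a floor so that a pre-defined fraction of the original noise survives with its character intact, can be sketched as follows; the paper adapts the floor from a masking-threshold estimate, whereas this sketch uses a fixed floor in dB, and the function name is an assumption:

```python
import numpy as np

def noise_floor_gain(noisy_psd, noise_psd, floor_db=-10.0):
    """Per-bin spectral gains clamped to a floor: wherever the Wiener gain
    would dip lower (e.g. in speech pauses), the floor passes an attenuated
    but otherwise unmodified copy of the background noise."""
    g_floor = 10.0 ** (floor_db / 20.0)
    wiener = np.maximum(noisy_psd - noise_psd, 0.0) / np.maximum(noisy_psd, 1e-12)
    return np.maximum(wiener, g_floor)
```

Because the residual noise in low-gain bins is just the original noise scaled by a constant, its spectral shape is preserved, which is what "preserving background noise characteristics" amounts to.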

ic981183.pdf (From Postscript)




Speech Enhancement based on a Voiced-Unvoiced Speech Model

Authors:

Zenton Goh, Nanyang Technological University (Singapore)
Kah-Chye Tan, Nanyang Technological University (Singapore)
B.T.G. Tan, National University of Singapore (Singapore)

Volume 1, Page 401, Paper number 1050

Abstract:

In this work, we attempt to refine the methods based on autoregressive (AR) modelling for speech enhancement [1,2]. AR modelling, a key strategy of the methods reported in [1,2], is known to represent unvoiced speech well but is not quite appropriate for voiced speech, which is quite periodic in nature. Here, we incorporate a speech model that satisfactorily describes voiced speech, unvoiced speech, and silence (i.e., pauses between speech utterances) into the enhancement framework developed in [1,2], and specifically devise an algorithm for computing the optimal estimate of the clean speech in the minimum-mean-square-error sense. We also present the methods we use for estimating the model parameters and describe the complete enhancement procedure. Performance assessments based on spectrogram plots, objective measures, and informal subjective listening tests all indicate that our method gives consistently good results.
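A framework that switches between voiced, unvoiced, and silence models needs a per-frame decision of that kind; the paper's own parameter estimation is model-based, but a crude stand-in decision using short-time energy and zero-crossing rate (thresholds and the function name are assumptions) looks like this:

```python
import numpy as np

def classify_frame(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Crude three-way frame label from short-time energy and
    zero-crossing rate: periodic voiced speech crosses zero rarely,
    noise-like unvoiced speech crosses often, silence has low energy."""
    energy = np.mean(frame ** 2)
    if energy < energy_thresh:
        return "silence"
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0  # crossings per sample
    return "unvoiced" if zcr > zcr_thresh else "voiced"
```

In a switching enhancer, each label would select the corresponding signal model (periodic, AR, or noise-only) before the MMSE estimate is computed for that frame.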

ic981050.pdf (From Postscript)




Enhancement of Reverberant Speech Using LP Residual

Authors:

Bayya Yegnanarayana, Indian Institute of Technology, Madras (India)
Philkhana Satyanarayana Murthy, Indian Institute of Technology, Madras (India)
Carlos Avendano, Oregon Graduate Institute of Science (U.S.A.)
Hynek Hermansky, Oregon Graduate Institute of Science (U.S.A.)

Volume 1, Page 405, Paper number 1433

Abstract:

In this paper we propose a new method for processing speech degraded by reverberation. The method is based on the analysis of short (2 ms) segments of data to enhance the regions of the speech signal having a high Signal to Reverberant component Ratio (SRR). The short-segment analysis shows that the SRR differs across segments of speech. The processing method involves identifying and manipulating the linear prediction (LP) residual in three different regions of the speech signal: the high-SRR region, the low-SRR region, and the region containing only the reverberant component. A weighting function is derived to modify the LP residual, and the weighted residual samples are used to excite the time-varying all-pole LP filter to obtain perceptually enhanced speech.
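The pipeline of inverse filtering, residual weighting, and all-pole resynthesis can be sketched as below. The paper derives its weighting function from the SRR measured on 2 ms segments; here a normalized short-time envelope of the residual serves as a crude SRR proxy, and all names, the LP order, and the window length are assumptions:

```python
import numpy as np

def lp_coeffs(x, order=10):
    """Autocorrelation-method linear prediction coefficients a_1..a_p."""
    n = len(x)
    r = np.correlate(x, x, "full")[n - 1:] / n
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-8 * np.eye(order), r[1:order + 1])

def enhance_residual(x, order=10, win=16, alpha=1.0):
    """Inverse-filter to get the LP residual, weight it by a normalized
    short-time envelope (emphasizing strong-excitation, high-SRR regions
    and suppressing the reverberant tail), then re-excite the all-pole
    filter to resynthesize the speech."""
    a = lp_coeffs(x, order)
    # inverse filter: e[n] = x[n] - sum_k a_k x[n-k]
    res = x.copy()
    for k in range(order):
        res[k + 1:] -= a[k] * x[:len(x) - k - 1]
    # short-time envelope of the residual, normalized to [0, 1]
    env = np.convolve(res ** 2, np.ones(win) / win, mode="same")
    res *= (env / (env.max() + 1e-12)) ** alpha
    # resynthesize: y[n] = e_w[n] + sum_k a_k y[n-k]
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = res[n]
        for k in range(order):
            if n - k - 1 >= 0:
                acc += a[k] * y[n - k - 1]
        y[n] = acc
    return y
```

With the weights all set to one, the resynthesis exactly inverts the inverse filter and returns the input; shrinking the residual only in low-envelope regions is what leaves the vocal-tract shaping intact while thinning the reverberant excitation.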

ic981433.pdf (From Postscript)
