ICASSP '98 Main Page

Abstract - SP12

SP12.1	An Energy-Constrained Signal Subspace Method for Speech Enhancement and Recognition in Colored Noise J. Huang, Y. Zhao (University of Illinois, Urbana-Champaign, USA) An energy-constrained signal subspace (ECSS) method is proposed for speech enhancement and recognition under an additive colored noise condition. The key idea is to match the short-time energy of the enhanced speech signal to the unbiased estimate of the short-time energy of the clean speech, which proves very effective for improving the estimation of the noise-like, low-energy segments of the speech signal. The colored noise is modelled by an autoregressive (AR) process. A modified covariance method is used to estimate the AR parameters of the colored noise, and a prewhitening filter is constructed based on the estimated parameters. The performance of the proposed algorithm was evaluated using the TI46 digit database and the TIMIT continuous speech database. It was found that the ECSS method can significantly improve the signal-to-noise ratio (SNR) and word recognition accuracy (WRA) for isolated digits and continuous speech under various SNR conditions.
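The prewhitening step described in SP12.1 can be illustrated with a small sketch. It is not the authors' implementation: it assumes a noise-only segment is available, and the function names, the AR order of 2, and the synthetic test noise are all hypothetical choices made for illustration. The estimator minimises the summed forward and backward prediction error energy, which is the usual definition of the modified covariance method.

```python
import random


def modified_covariance_ar(x, order):
    """Estimate a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p by
    minimising the summed forward and backward prediction error energy."""
    p, n_samples = order, len(x)
    # c[i][j]: forward plus backward lag products over n = p .. N-1.
    c = [[sum(x[n - i] * x[n - j] + x[n - p + i] * x[n - p + j]
              for n in range(p, n_samples))
          for j in range(p + 1)] for i in range(p + 1)]
    # Normal equations: sum_j c[i][j] * a_j = -c[i][0] for i = 1..p,
    # stored as an augmented matrix and solved by Gaussian elimination.
    m = [c[i][1:] + [-c[i][0]] for i in range(1, p + 1)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, p):
            factor = m[row][col] / m[col][col]
            for k in range(col, p + 1):
                m[row][k] -= factor * m[col][k]
    a = [0.0] * p
    for i in range(p - 1, -1, -1):
        tail = sum(m[i][k] * a[k] for k in range(i + 1, p))
        a[i] = (m[i][p] - tail) / m[i][i]
    return a


def prewhiten(x, a):
    """Apply the FIR filter A(z): y[n] = x[n] + sum_k a_k x[n-k]."""
    out = []
    for n in range(len(x)):
        acc = x[n]
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc += coeff * x[n - k]
        out.append(acc)
    return out


def lag1_correlation(x):
    """Normalised lag-1 autocorrelation; close to 0 for white noise."""
    num = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    return num / sum(v * v for v in x)


# Synthetic colored noise from a known AR(2) model (illustrative values).
random.seed(0)
true_a = [-1.2, 0.5]
noise = [0.0, 0.0]
for _ in range(4000):
    drive = random.gauss(0.0, 1.0)
    noise.append(drive - true_a[0] * noise[-1] - true_a[1] * noise[-2])
noise = noise[2:]

a_hat = modified_covariance_ar(noise, 2)
whitened = prewhiten(noise, a_hat)
print(a_hat, lag1_correlation(noise), lag1_correlation(whitened))
```

In an enhancement system the same filter would be applied to the noisy speech so that the noise component becomes approximately white before the subspace decomposition; how the abstract's method inverts the filter afterwards is not stated there and is left out of this sketch.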
SP12.2	Removal of Noise from Speech Using the Dual EKF Algorithm E. Wan, A. Nelson (Oregon Graduate Institute, USA) Noise reduction for speech signals has applications ranging from speech enhancement for cellular communications to front ends for speech recognition systems. A neural-network-based time-domain method called Dual Extended Kalman Filtering (Dual EKF) is presented for removing nonstationary and colored noise from speech. This paper describes the algorithm and provides a set of experimental results.
SP12.3	Combined Wiener and Coherence Filtering in Wavelet Domain For Microphone Array Speech Enhancement D. Mahmoudi, A. Drygajlo (Swiss Federal Institute of Technology, Lausanne, Switzerland) Wiener filter based postfiltering has shown its usefulness in microphone array speech enhancement systems. In our earlier work, we developed a postfilter in the wavelet domain where better performance has been obtained compared to the algorithms developed in the Fourier domain. Furthermore, considerable computational savings are provided thanks to the multi-resolution and multi-rate analysis. This contribution shows that the coherence function, calculated between the beamforming output signal and the microphone reference output signal using the wavelet transform, provides relevant and exploitable information for further noise suppression. Thus, nonlinear coherence filtering and a Wiener filter are combined in the wavelet transform domain to improve the performance of the Wiener filter based postfilter, especially during pauses. Evaluations of the new algorithm confirm that speech quality is indeed improved with significantly reduced distortions. Finally, the results of the objective measures are presented.
SP12.4	Speech Enhancement in a Bayesian Framework G. Saleh, M. Niranjan (Cambridge University Engineering Department, UK) We present an approach for the enhancement of speech signals corrupted by additive white Gaussian noise. The speech enhancement problem is treated as a signal estimation problem within a Bayesian framework. The conventional all-pole speech production model is assumed to govern the behaviour of the clean speech signal. The additive noise level and all-pole model gain are automatically inferred during the speech enhancement process. The strength of the Bayesian approach developed in this paper lies in its ability to perform speech enhancement without the usual requirement of estimating the level of the corrupting noise from "silence" segments of the corrupted signal. The performance of the Bayesian approach is compared to that of the Lim & Oppenheim framework, with which it shares an iterative structure. A significant quality improvement is obtained over the Lim & Oppenheim framework.
SP12.5	Speech Enhancement for Bandlimited Speech D. Heide, G. Kang (Naval Research Lab, USA) Throughout the history of telecommunication, speech has rarely been transmitted with its full analog bandwidth (0 to 8 kHz or more) due to limitations in channel bandwidth. This impaired legacy continues with tactical voice communication. The passband of a voice terminal is typically 0 to 4 kHz. Hence, high-frequency speech components (4 to 8 kHz) are removed prior to transmission. As a result, speech intelligibility suffers, particularly for low-data-rate vocoders. In this paper, we describe our speech-processing technique, which permits some of the upperband speech components to be translated into the passband of the vocoder. According to our test results, speech intelligibility is improved by as much as three to four points even for the recently developed and excellent Department of Defense-standard Mixed Excitation Linear Predictor (MELP) 2.4 kb/s vocoder. Note that speech intelligibility is improved without expanding the transmission bandwidth or compromising interoperability with others.
SP12.6	A Novel Psychoacoustically Motivated Audio Enhancement Algorithm Preserving Background Noise Characteristics S. Gustafsson, P. Jax, P. Vary (IND, RWTH Aachen, Germany) In this paper we propose an algorithm for reduction of noise in audio signals. In contrast to several previous approaches we do not try to achieve a complete removal of the noise, but instead our goal is to preserve a pre-defined amount of the original noise in the processed signal. This is accomplished by exploiting the masking properties of the human auditory system. The speech and noise distortions are considered separately. The spectral weighting rule, adapted by utilizing only estimates of the masking threshold and the noise power spectral density, has been designed to guarantee complete masking of distortions of the residual noise. Simulation results confirm that no audible artifacts are left in the processed signal, while speech distortions are comparable to those caused by conventional noise reduction techniques. Audio demonstrations are available from http://www.ind.rwth-aachen.de.
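One way to read the weighting rule in SP12.6 is sketched below. It is an interpretation derived from the abstract's description, not the rule as published. Assume the goal is to leave the noise attenuated by a fixed factor (called `noise_floor` here, a hypothetical parameter). If a bin is scaled by a gain H, the noise in the output differs from the desired residual by (H - noise_floor) times the noise, and that difference stays inaudible as long as its power does not exceed the masking threshold. The largest such gain, capped at 1, distorts the speech least.

```python
import math


def noise_preserving_gain(masking_threshold, noise_psd, noise_floor=0.1):
    """Per-bin spectral gain H with (H - noise_floor)^2 * noise_psd <= threshold.

    masking_threshold and noise_psd are sequences of per-bin power values;
    noise_floor is the desired amplitude attenuation of the residual noise.
    All three names are illustrative assumptions, not taken from the paper.
    """
    gains = []
    for threshold, noise in zip(masking_threshold, noise_psd):
        if noise <= 0.0:
            gains.append(1.0)  # no noise estimated in this bin: leave it untouched
            continue
        # Largest gain whose residual-noise distortion is still masked.
        gains.append(min(1.0, noise_floor + math.sqrt(threshold / noise)))
    return gains


# Three bins with unit noise power: unmasked, partly masked, fully masked.
gains = noise_preserving_gain([0.0, 0.04, 4.0], [1.0, 1.0, 1.0], noise_floor=0.1)
print(gains)
```

With no masking the gain falls back to the noise floor, so the background noise keeps its character at a reduced level. With strong masking the gain reaches 1 and the bin is passed unchanged.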
SP12.7	Speech Enhancement based on a Voiced-Unvoiced Speech Model Z. Goh, K. Tan (Centre for Signal Processing, Singapore); B. Tan (National University of Singapore, Singapore) In this work, we attempt to refine the methods based on autoregressive (AR) modeling for speech enhancement [1,2]. AR modeling, a key strategy of the methods reported in [1,2], is known to represent unvoiced speech well but is less appropriate for voiced speech, which is largely periodic in nature. Here, we incorporate a speech model that satisfactorily describes voiced speech, unvoiced speech, and silence (i.e., pauses between speech utterances) into the enhancement framework developed in [1,2], and specifically devise an algorithm for computing the optimal estimate of the clean speech in the minimum-mean-square-error sense. We also present the methods we use for estimating the model parameters and give a description of the complete enhancement procedure. Performance assessment based on spectrogram plots, objective measures, and informal subjective listening tests all indicate that our method gives consistently good results.
SP12.8	Enhancement of Reverberant Speech Using LP Residual B. Yegnanarayana, P. Satyanarayana Murthy (Indian Institute of Technology, Madras, India); C. Avendano, H. Hermansky (Oregon Graduate Institute of Science, USA) In this paper we propose a new method of processing speech degraded by reverberation. The method is based on analysis of short (2 ms) segments of data to enhance the regions in the speech signal having a high Signal to Reverberant component Ratio (SRR). The short-segment analysis shows that the SRR varies across segments of speech. The processing method involves identifying and manipulating the linear prediction residual in three different regions of the speech signal, namely high-SRR regions, low-SRR regions, and regions containing only the reverberation component. A weighting function is derived to modify the LP residual. The weighted residual samples are used to excite the time-varying LP all-pole filter to obtain perceptually enhanced speech.
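The residual-weighting chain in SP12.8 (LP analysis, weighting of the residual, all-pole resynthesis) can be sketched as follows. This is a simplified illustration under stated assumptions: it processes a single frame with one fixed LP filter rather than a time-varying one, uses the autocorrelation method with the Levinson-Durbin recursion, and takes the weights as given. The abstract's SRR-based weighting function is not reproduced, and the weights in the demo are arbitrary placeholders.

```python
import math
import random


def lpc(x, order):
    """Autocorrelation-method LP coefficients [1, a_1, ..., a_p] (Levinson-Durbin)."""
    r = [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lp_residual(x, a):
    """Inverse filter: e[n] = sum_j a_j x[n-j], with a_0 = 1."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def synthesize(residual, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{j>=1} a_j y[n-j]."""
    y = []
    for n, e in enumerate(residual):
        y.append(e - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y


def enhance(x, weights, order=4):
    """Weight the LP residual sample by sample, then resynthesize."""
    a = lpc(x, order)
    weighted = [w * e for w, e in zip(weights, lp_residual(x, a))]
    return synthesize(weighted, a)


# Illustrative frame: two sinusoids plus a little noise (not real speech).
random.seed(1)
frame = [math.sin(0.3 * n) + 0.5 * math.sin(0.9 * n) + 0.1 * random.gauss(0.0, 1.0)
         for n in range(200)]
# Placeholder weights: keep the first half, attenuate the second half.
weights = [1.0] * 100 + [0.2] * 100
out = enhance(frame, weights)
print(len(out))
```

With all weights set to 1 the chain reproduces the input up to rounding error, so any change in the output comes from the weighting alone.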
