ICASSP '98
Abstract - AE3

AE3.1
Dichotic Presentation of Speech Signal with Critical Band Filtering for Improving Speech Perception
D. Chaudhari,
P. Pandey (Indian Institute of Technology, Bombay, India)
Spread of spectral masking along the cochlear partition is one of the major factors contributing to the relatively poor speech reception in cases of hearing impairment of sensorineural origin. We have carried out an experimental evaluation of splitting speech into two signals on the basis of frequency and presenting them dichotically over the two ears to increase speech intelligibility. In this scheme, the input speech signal is filtered into two signals by a bank of critical band filters, with the odd-numbered critical bands presented to one ear and the even-numbered ones to the other. Thus the effect of spectral masking on speech information in the cochlea is reduced, and the dichotically presented signals are perceptually integrated in the auditory cortex. The speech signal was quantized with 12-bit resolution and processed by critical band filters with centre frequencies ranging from 150 Hz to 4.80 kHz. The scheme was evaluated using normal-hearing subjects, with sensorineural loss simulated by adding white noise to the speech signal as a masker at different SNRs. Listening tests were carried out to record stimulus-response confusion matrices. Test stimuli consisted of twelve English consonants in vowel-consonant-vowel and consonant-vowel contexts with the vowel /a/. The relative improvement in recognition scores was about 15%. Improvement in speech reception was contributed by the features of voicing, place, and manner.
AE3.2
A Realtime Robust Adaptive Microphone Array Controlled by an SNR Estimate
O. Hoshuyama (NEC Corporation, Japan);
B. Begasse (INSA);
A. Sugiyama,
A. Hirano (NEC Corporation, Japan)
A robust adaptive microphone array (RAMA) using a new adaptation-mode control (AMC) method and its evaluation in hardware are presented. The adaptation of the RAMA is controlled based on an SNR (signal-to-noise ratio) estimate using the output powers of the fixed beamformer and the adaptive blocking matrix. The RAMA is implemented on a multi-DSP realtime signal-processing system with a C compiler. Simulation results with real acoustic data show that the AMC based on the SNR estimate causes less breathing noise than the conventional AMC and obtains a 1.0-point higher score on a 5-point mean opinion score scale. Evaluation on the realtime signal-processing system demonstrates that the noise reduction achieved by the RAMA is over 12 dB even in reverberant environments.
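The adaptation-mode control idea — adapt the canceller only when an SNR estimate indicates the input is noise-dominated — can be illustrated with a toy NLMS canceller. The short-term power ratio used below is an assumption standing in for the paper's fixed-beamformer / blocking-matrix power ratio.

```python
import numpy as np

def snr_gated_nlms(d, x, n_taps=16, mu=0.5, snr_thresh=1.0):
    """Toy NLMS noise canceller with adaptation-mode control: the filter
    adapts only when a short-term power-ratio estimate says the primary
    input d is noise-dominated relative to the noise reference x.
    (In the paper the estimate comes from beamformer outputs; the raw
    channel powers here are a stand-in.)"""
    w = np.zeros(n_taps)
    e = np.zeros(len(d))
    for n in range(n_taps - 1, len(d)):
        xv = x[n - n_taps + 1:n + 1][::-1]
        e[n] = d[n] - w @ xv
        snr_est = (d[n - n_taps + 1:n + 1] ** 2).mean() / \
                  ((x[n - n_taps + 1:n + 1] ** 2).mean() + 1e-12)
        if snr_est < snr_thresh:            # noise-dominated: keep adapting
            w += mu * e[n] * xv / (xv @ xv + 1e-12)
    return e, w
```

Freezing the weights when the estimated SNR is high is what prevents the target signal from being cancelled — the failure mode ("breathing noise") that the SNR-based AMC is reported to reduce.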
AE3.3
Automatic Classification of Environmental Noise Events by Hidden Markov Models
P. Gaunard,
C. Mubikangiey,
C. Couvreur,
V. Fontaine (Faculte Polytechnique de Mons, Belgium)
The automatic classification of environmental noise sources from their acoustic signatures, recorded at the microphone of a noise monitoring system (NMS), is currently an active subject of research. This paper shows how hidden Markov models (HMMs) can be used to build an environmental noise recognition system based on a time-frequency analysis of the noise signal. The performance of the proposed HMM-based approach is evaluated experimentally for the classification of five types of noise events (car, truck, moped, aircraft, train). The HMM-based approach is found to outperform previously proposed classifiers based on the average spectrum of noise events, with more than 95% correct classification. For comparison, a classification test performed with human listeners on the same data shows that the best HMM-based classifier outperforms the "average" human listener, who achieves only 91.8% correct classification for the same task.
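Classification by competing HMMs — score the observation sequence under one trained model per noise class and pick the most likely — can be sketched with the forward algorithm. The two-state discrete-observation models below are purely illustrative, not the paper's trained models.

```python
import numpy as np

def hmm_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (pi: initial probs, A: transitions, B: emissions), computed with
    the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        ll += np.log(s)
        alpha = alpha / s
    return ll

def classify(obs, models):
    """Pick the noise class whose HMM assigns the sequence the highest
    likelihood (models: name -> (pi, A, B))."""
    return max(models, key=lambda name: hmm_loglik(obs, *models[name]))
```

In the paper the observations would be quantized time-frequency features of the noise signal rather than the toy symbols used here.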
AE3.4
On the Use of Explicit Speech Modeling in Microphone Array Applications
M. Brandstein (Harvard University, USA)
This paper addresses the limitations of current approaches to distant-talker speech acquisition and advocates the development of techniques which explicitly incorporate the nature of the speech signal (e.g. statistical non-stationarity, method of production, pitch, voicing, formant structure, and source radiator model) into a multi-channel context. The goal is to combine the advantages of spatial filtering achieved through beamforming with knowledge of the desired time-series attributes. The potential utility of such an approach is demonstrated through the application of a multi-channel version of the Dual Excitation speech model.
AE3.5
Construction of a Joint Peak-Interval Histogram Using Higher-Order Cumulant-Based Inverse Filtering
S. Lei,
R. Hamernik (Plattsburgh State University, USA)
Conventional metrics used to quantify signals in noise/hearing research rely primarily on time-averaged energy and spectral analyses. Such metrics, while appropriate for Gaussian-distributed waveforms, are of limited value in the more complex sound environments encountered in industrial/military settings, whose waveforms are non-Gaussian and nonstationary. Recent research has shown that metrics incorporating the temporal characteristics of a waveform are needed to evaluate hazardous acoustic environments for purposes of hearing conservation. The joint peak-interval histogram is a prospective candidate for such an application. This paper shows that the joint peak-interval histogram can be obtained from an estimate of the temporal pattern of a complex noise waveform by using higher-order cumulant-based inverse filtering.
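The histogram construction itself (though not the cumulant-based inverse filtering that precedes it in the paper) can be sketched: locate peaks, pair each peak's amplitude with the interval since the previous peak, and bin the pairs jointly.

```python
import numpy as np

def joint_peak_interval_histogram(x, fs, thresh=0.0, amp_bins=8, int_bins=8):
    """Locate positive local maxima of x above `thresh`, then jointly
    histogram each peak's amplitude with the interval (in seconds)
    since the previous peak. Only the histogram step is shown; the
    paper first recovers the temporal pattern via higher-order
    cumulant-based inverse filtering, which is not reproduced here."""
    idx = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]) &
                         (x[1:-1] > thresh)) + 1
    amps = x[idx][1:]                 # amplitude of each peak (after the first)
    intervals = np.diff(idx) / fs     # interval since the previous peak
    H, amp_edges, int_edges = np.histogram2d(amps, intervals,
                                             bins=(amp_bins, int_bins))
    return H, amp_edges, int_edges
```

Unlike a time-averaged spectrum, the joint histogram preserves how peak levels and their spacings co-occur, which is the temporal information the abstract argues is needed for non-Gaussian industrial noise.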
AE3.6
Classification of Audio Signals Using Statistical Features on Time and Wavelet Transform Domains
T. Lambrou,
P. Kudumakis,
R. Speller,
M. Sandler,
A. Linney (University College London, UK)
This paper presents a study on musical signal classification, using wavelet transform analysis in conjunction with statistical pattern recognition techniques. A comparative evaluation is carried out between different wavelet analysis architectures in terms of their classification ability, as well as between different classifiers. We seek to establish which statistical measurements clearly distinguish between the three different musical styles of rock, piano, and jazz. Our preliminary results suggest that the features collected by the adaptive splitting wavelet transform technique performed better than the other wavelet-based techniques, achieving an overall classification accuracy of 91.67% using either the Minimum Distance Classifier or the Least Squares Minimum Distance Classifier. Such a system can play a useful part in multimedia applications which require content-based search, classification, and retrieval of audio signals, as defined in MPEG-7.
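The feature pipeline — wavelet subbands, statistical moments per subband, minimum-distance classification — can be sketched with a plain Haar decomposition; the paper's actual wavelet architectures, features, and audio classes are not reproduced here.

```python
import numpy as np

def haar_subbands(x, levels=3):
    """Multi-level Haar wavelet decomposition: a list of detail bands
    plus the final approximation (a simple stand-in for the wavelet
    architectures compared in the paper)."""
    a = np.asarray(x, float)
    bands = []
    for _ in range(levels):
        a = a[:len(a) - len(a) % 2]
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation
        bands.append(d)
    bands.append(a)
    return bands

def features(x):
    """Mean, std, skewness, and kurtosis of each subband."""
    f = []
    for b in haar_subbands(x):
        m, s = b.mean(), b.std() + 1e-12
        z = (b - m) / s
        f += [m, s, (z ** 3).mean(), (z ** 4).mean()]
    return np.array(f)

def min_distance_classify(x, prototypes):
    """Minimum Distance Classifier: nearest class prototype in feature
    space (prototypes: name -> feature vector)."""
    f = features(x)
    return min(prototypes, key=lambda c: np.linalg.norm(f - prototypes[c]))
```

The Minimum Distance Classifier named in the abstract reduces to exactly this nearest-prototype rule once each class is summarized by a mean feature vector.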
AE3.7
Personal Computer Software Vowel Training Aid for the Hearing Impaired
A. Zimmer,
B. Dai,
S. Zahorian (Old Dominion University, USA)
A vowel training aid for hearing-impaired persons, running on a Windows-based multimedia personal computer, has been developed. The system provides two main displays which give visual feedback for vowels spoken in isolation and in short word contexts. Feature extraction methods and neural network processing techniques provide a high degree of accuracy for speaker-independent vowel training. The system typically provides correct classification of over 85% of steady-state vowels spoken by adult male, adult female, and child (both genders combined) speakers. Similar classification accuracy is also observed for vowels spoken in short words. Low cost and good performance make this system potentially useful for speech training at home.
AE3.8
Optimal Truncation Time for Matched Filter Array Processing
D. Rabinkin (CAIP Center, Rutgers University, USA);
D. Macomber (SEAS, University of Pennsylvania, USA);
R. Renomeron,
J. Flanagan (CAIP Center, Rutgers University, USA)
Matched filter array (MFA) processing has been shown to improve signal-to-noise ratio (SNR) for array speech capture in reverberant environments. However, under non-optimum conditions, MFA processing is computationally costly and may produce little improvement, or even subjective quality degradation, compared with simple time delay compensation (TDC). Appropriate truncation of the MFA filter bank is shown to reduce the computational burden without significantly reducing the capture SNR. This work attempts to find an optimal truncation time with respect to room size, wall absorption, and the number of microphones used in the system. Simulations were conducted to evaluate MFA performance as a function of truncation length as these parameters were varied in situations typical of teleconferencing applications. It was demonstrated that judicious MFA truncation allows a reduction in computational load without sacrificing capture SNR.
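The core MFA operation — filter each microphone signal with the time-reversed, truncated impulse response of its channel and sum across the array — can be sketched as follows. The impulse responses below are illustrative toys, not simulated room responses, and all microphone signals are assumed to have equal length.

```python
import numpy as np

def mfa_output(mics, impulse_responses, trunc):
    """Matched filter array: convolve each (equal-length) mic signal
    with the time-reversed, truncated impulse response of its channel,
    then sum. `trunc` is the truncation length in samples -- the
    parameter whose optimal value the paper studies as a function of
    room size, absorption, and microphone count."""
    out = None
    for x, h in zip(mics, impulse_responses):
        g = h[:trunc][::-1]                 # truncated matched filter
        y = np.convolve(x, g)
        out = y if out is None else out + y
    return out
```

With trunc = 1 the scheme degenerates to a gain-weighted sum (akin to simple delay compensation); longer truncation lengths recover more of the reverberant energy at the cost of proportionally more computation, which is the trade-off the paper quantifies.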
AE3.9
Multi-Microphone Noise Cancellation For Improvement of Hearing Aid Performance
P. Shields,
D. Campbell (University of Paisley, Scotland, UK)
A scheme for binaural pre-processing of speech signals for input to a standard linear hearing aid has been investigated. The system is based on that of Toner & Campbell**, who applied the Least Mean Squares (LMS) algorithm in sub-bands to speech signals from various acoustic environments and signal-to-noise ratios (SNRs). The processing scheme attempts to take advantage of the multiple inputs to perform noise cancellation. The use of sub-bands enables a diverse processing mechanism to be employed: the wide-band signal is split into smaller frequency-limited sub-bands, which can subsequently be processed according to their signal characteristics. The results of a large-scale series of intelligibility tests are presented, from experiments in which acoustic speech and noise data, generated using simulated and real-room acoustics, were tested on hearing-impaired volunteers. ** Toner, E., Campbell, D.R., (1993), 'Speech Enhancement using Sub-Band Intermittent Adaptation', Speech Communication, 12, 253-259.
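The sub-band structure can be sketched: split both the primary and reference inputs into bands, run an LMS canceller independently in each band, and recombine the band outputs. The ideal two-band FFT split below is a crude stand-in for the filter bank of the Toner & Campbell scheme.

```python
import numpy as np

def lms(d, x, n_taps=8, mu=0.05):
    """Plain LMS adaptive noise canceller: returns the error signal,
    i.e. the primary input d with the x-correlated noise removed."""
    w = np.zeros(n_taps)
    e = np.zeros(len(d))
    for n in range(n_taps - 1, len(d)):
        xv = x[n - n_taps + 1:n + 1][::-1]
        e[n] = d[n] - w @ xv
        w += mu * e[n] * xv
    return e

def band_split(x, fs, edge):
    """Ideal two-band split at `edge` Hz via FFT masking (a stand-in
    for a proper sub-band filter bank)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    lo, hi = X.copy(), X.copy()
    lo[f >= edge] = 0
    hi[f < edge] = 0
    return np.fft.irfft(lo, len(x)), np.fft.irfft(hi, len(x))

def subband_lms(d, x, fs, edge=1000):
    """Run LMS independently in each sub-band and recombine."""
    d_lo, d_hi = band_split(d, fs, edge)
    x_lo, x_hi = band_split(x, fs, edge)
    return lms(d_lo, x_lo) + lms(d_hi, x_hi)
```

Running the adaptation per band is what lets each sub-band be processed (or left unprocessed) according to its own signal characteristics, as the abstract describes.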
AE3.10
Novel Brick-Wall Filters Based on the Auditory System
A. Biswas (Speech & Hearing Sciences, Indiana University, USA)
A novel class of narrow band filters is presented that offers strong rejection of out-of-band noise and a flat top at the peak. The filter is unconventional in that its output is a series of spikes, like the action potentials in the auditory nerve. The product of its time and frequency windows is less than unity even using 100% output cutoff points. A bank of such filters has been used to compute a spectrogram-like display that has none of the streaks seen in conventional spectrograms. This model differs from Lyon's cochlear model in several areas, particularly energy management and damping factor requirements. Its Q factor depends on the signal intensity, and it produces acoustic cubic distortion products similar to the auditory system; however, it is basically linear with a very wide dynamic range, and its dynamics can be analyzed using linear filter theory.