ICASSP '98 Main Page
|
Abstract - SPEC-NNSP |
 |
SPEC-NNSP.1
|
Recovering Depth from Stereo Using ART Neural Networks
S. Markogiannakis,
E. Manolakos (Northeastern University, USA)
One of the long-standing problems in passive stereo vision is that of constructing an accurate range map using only two images that provide two views of the same world scene. It amounts to identifying pairs of corresponding pixels that are associated with the same point of the real world. We introduce ART-1 neural networks as a primitive for effectively addressing all aspects of the challenging stereo correspondence problem. Using a multi-pass approach, it is possible to gradually increase the density of matched points, while false matches are filtered out by requiring close agreement between disparity estimates in a neighborhood. In the end, a reasonably dense disparity map is obtained, to the extent that it allows scene reconstruction by interpolation. Our scheme was tested on random dot stereograms and on artificial and real-world scenes. In all cases, scene reconstructions are shown to be quite realistic.
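The false-match filtering step can be illustrated with a minimal sketch. It is not the authors' ART-1 scheme; it only shows the idea of keeping a disparity estimate when it closely agrees with the estimates around it. The function name `filter_matches`, the median rule, and the parameters `radius`, `tol`, and `min_neighbors` are assumptions made for illustration.

```python
from statistics import median

def filter_matches(disparity, radius=1, tol=1.0, min_neighbors=2):
    """Keep a disparity estimate only if it agrees with its matched neighbors.

    disparity: 2-D list of numbers; None marks a pixel with no match yet.
    A value survives when at least `min_neighbors` matched neighbors exist
    and it lies within `tol` of their median (an assumed agreement rule).
    """
    rows, cols = len(disparity), len(disparity[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d = disparity[r][c]
            if d is None:
                continue
            # Collect matched neighbors inside the (2*radius+1)^2 window.
            neighbors = [
                disparity[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c) and disparity[rr][cc] is not None
            ]
            if len(neighbors) >= min_neighbors and abs(d - median(neighbors)) <= tol:
                out[r][c] = d
    return out

# Toy sparse disparity map with one outlier in the center.
sparse = [
    [2, 2, None],
    [2, 9, 2],
    [None, 2, 2],
]
cleaned = filter_matches(sparse)
```

In the toy map the central estimate of 9 disagrees with its neighbors and is discarded, while the consistent estimates of 2 are kept. In a multi-pass setting, such a filter would run after each matching pass.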
|
SPEC-NNSP.2
|
Nonlinear Acoustic Echo Cancellation Using a Hammerstein Model
L. Ngia,
J. Sjöberg (Chalmers University of Technology, Sweden)
In hands-free telephone or video conference applications, there is an acoustic feedback coupling between the loudspeaker and the microphone, which creates the acoustic echo. Linear acoustic echo cancellers (AECs) are commonly used to remove this echo. However, they are unable to effectively cancel nonlinear distortions. This paper employs a Hammerstein model to describe the acoustic channel of a nonlinear system concatenated with a linear faded echo path. A feed-forward neural network is used to model the static nonlinearity, and a Finite Impulse Response (FIR) structure is used to model the linear dynamic system. The resulting nonlinear model is applied to real data collected in an anechoic chamber, and it performs slightly better than linear models. Although the improvement is small, the results offer some interesting insights into the characteristics of a loudspeaker's nonlinearities and their effect on the performance of an AEC.
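The Hammerstein structure, a memoryless nonlinearity followed by a linear FIR filter, can be sketched as follows. This is only an illustration of the model shape: the abstract uses a feed-forward neural network for the static nonlinearity, whereas the sketch substitutes `math.tanh` as a stand-in, and the tap values and signal are made up.

```python
import math

def hammerstein(x, fir_taps, nonlinearity=math.tanh):
    """Hammerstein model: static nonlinearity, then an FIR filter.

    y[n] = sum_k fir_taps[k] * nonlinearity(x[n - k])
    `nonlinearity` is a stand-in (tanh) for the learned static map.
    """
    v = [nonlinearity(s) for s in x]  # memoryless nonlinear stage
    y = []
    for n in range(len(v)):
        acc = 0.0
        for k, h in enumerate(fir_taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += h * v[n - k]
        y.append(acc)
    return y

# Hypothetical far-end signal (a single impulse) and a two-tap echo path.
far_end = [0.0, 1.0, 0.0, 0.0]
echo_estimate = hammerstein(far_end, fir_taps=[0.5, 0.25])
```

An echo canceller built on this model would subtract `echo_estimate` from the microphone signal. Fitting the nonlinearity and the taps to data is outside the scope of the sketch.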
|
SPEC-NNSP.3
|
Noise Reduction and Speech Enhancement via Temporal Anti-Hebbian Learning
M. Girolami (University of Paisley, Scotland, UK)
Temporal extensions of both linear and nonlinear anti-Hebbian learning have been shown to be suited to the problem of blind separation of sources from their convolved mixtures. This paper presents a generalized form of anti-Hebbian learning for a partially connected recurrent network based on the maximum likelihood estimation principle. Inspired by features of the binaural unmasking effect, the network and its associated online adaptation are applied to the enhancement of speech that is corrupted by interfering noise, competing speech, and reverberation. Graded simulations based on speech corrupted with increasingly complex levels of reverberation are reported. It is shown that for high levels of reverberation the proposed method compares favorably with classical adaptive filter approaches to speech enhancement in real acoustic environments.
|
SPEC-NNSP.4
|
A High Quality Text-to-Speech System Composed of Multiple Neural Networks
O. Karaali,
G. Corrigan,
N. Massey,
C. Miller,
O. Schnurr,
A. Mackie (Motorola, USA)
While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel with the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.
|
SPEC-NNSP.5
|
Fraud Detection in Communications Networks Using Neural and Probabilistic Methods
M. Taniguchi,
M. Haft,
J. Hollmen,
V. Tresp (Siemens AG, Corporate Technology, Germany)
Fraud detection refers to the attempt to detect illegitimate usage of a communications network. Three methods to detect fraud are presented. Firstly, a feed-forward neural network based on supervised learning is used to learn a discriminative function to classify subscribers using summary statistics. Secondly, a Gaussian mixture model is used to model the probability density of subscribers' past behavior so that the probability of current behavior can be calculated to detect any deviations from the past behavior. Lastly, Bayesian networks are used to describe the statistics of a particular user and the statistics of different fraud scenarios. The Bayesian networks can be used to infer the probability of fraud given the subscribers' behavior. The data features are derived from toll tickets. The experiments show that the methods detect over 85% of the fraudsters in our testing set without causing false alarms.
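The second method, scoring current behavior against a density fitted to past behavior, can be sketched in one dimension. The mixture parameters, the feature (daily call minutes), and the threshold below are all hypothetical values chosen for illustration. They are not taken from the paper, and fitting the mixture (for example by EM) is omitted.

```python
import math

def gmm_log_density(x, weights, means, stds):
    """Log density of a 1-D Gaussian mixture evaluated at x."""
    total = 0.0
    for w, mu, sd in zip(weights, means, stds):
        z = (x - mu) / sd
        total += w * math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return math.log(total) if total > 0.0 else float("-inf")

# Hypothetical subscriber profile: daily call minutes with two usage modes.
WEIGHTS, MEANS, STDS = [0.7, 0.3], [10.0, 40.0], [3.0, 5.0]
LOG_DENSITY_THRESHOLD = -10.0  # assumed cut-off; would be tuned on held-out data

def is_abnormal(minutes):
    """Flag behavior that is very unlikely under the subscriber's past profile."""
    return gmm_log_density(minutes, WEIGHTS, MEANS, STDS) < LOG_DENSITY_THRESHOLD

typical_day = is_abnormal(12.0)
suspicious_day = is_abnormal(200.0)
```

A day close to either usage mode receives a high log density and is not flagged, while a value far from both modes falls below the threshold.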
|
SPEC-NNSP.6
|
Neural Vision System and Applications in Image Processing and Analysis
L. Guan,
S. Perry,
R. Romagnoli,
H. Wong,
H. Kong (University of Sydney, Australia)
We present a computer vision system based on an integrated neural network architecture. In the low-level vision subsystem, a network of networks (a biologically inspired network) is used to recursively perform filtering, segmentation, and edge detection. In the intermediate and high levels, hierarchically structured arrays of self-organizing tree maps (SOTM), an extension of the popular self-organizing map, are used to carry out image and feature analysis. The system has been applied to solve a number of real-world problems. Some interesting and encouraging results will be reported.
|
SPEC-NNSP.7
|
Combining Time-Delayed Decorrelation and ICA: Towards Solving the Cocktail Party Problem
T. Lee (The Salk Institute, CNL, USA);
A. Ziehe (GMD, FIRST, Germany);
R. Orglmeister (Berlin University of Technology, Germany);
T. Sejnowski (The Salk Institute, CNL, USA)
We present methods to separate blindly mixed signals recorded in a room. The learning algorithm is based on information maximization in a single-layer neural network. We focus on the implementation of the learning algorithm and on issues that arise when separating speakers in room recordings. We used an infomax approach in a feedforward neural network implemented in the frequency domain using the polynomial filter matrix algebra technique. Fast convergence was achieved by using a time-delayed decorrelation method as a preprocessing step. Under minimum-phase mixing conditions, this preprocessing step was sufficient for the separation of signals. These methods successfully separated a recorded voice with music in the background (cocktail party problem). Finally, we discuss problems that arise in real-world recordings and their potential solutions.
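The time-delayed decorrelation idea can be sketched for the simplest case: two channels with instantaneous (not convolutive) mixing. This is a deliberately reduced illustration, not the frequency-domain method described in the abstract. The signals are whitened, and then a rotation is found that diagonalizes their covariance at a time lag. All function names, the lag of one sample, and the sinusoidal test sources are assumptions.

```python
import math

def remove_mean(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def cov2(a, b, lag=0):
    """Entries (m00, m01, m11) of the symmetrized 2x2 covariance at `lag`."""
    n = len(a) - lag
    def c(u, v):
        return sum(u[t] * v[t + lag] for t in range(n)) / n
    return c(a, a), 0.5 * (c(a, b) + c(b, a)), c(b, b)

def eig_sym2(m00, m01, m11):
    """Rotation angle and eigenvalues that diagonalize a symmetric 2x2 matrix."""
    theta = 0.5 * math.atan2(2.0 * m01, m00 - m11)
    cs, sn = math.cos(theta), math.sin(theta)
    lam0 = cs * cs * m00 + 2.0 * cs * sn * m01 + sn * sn * m11
    lam1 = sn * sn * m00 - 2.0 * cs * sn * m01 + cs * cs * m11
    return theta, lam0, lam1

def rotate(a, b, theta):
    """Project the channel pair onto the axes rotated by theta."""
    cs, sn = math.cos(theta), math.sin(theta)
    return ([cs * x + sn * y for x, y in zip(a, b)],
            [-sn * x + cs * y for x, y in zip(a, b)])

def tdd_separate(x0, x1, lag=1):
    """Two-channel time-delayed decorrelation for instantaneous mixtures."""
    x0, x1 = remove_mean(x0), remove_mean(x1)
    # Step 1: whiten (rotate onto principal axes, scale to unit variance).
    theta, lam0, lam1 = eig_sym2(*cov2(x0, x1))
    p0, p1 = rotate(x0, x1, theta)
    z0 = [v / math.sqrt(lam0) for v in p0]
    z1 = [v / math.sqrt(lam1) for v in p1]
    # Step 2: rotate so the lagged covariance becomes diagonal as well.
    phi, _, _ = eig_sym2(*cov2(z0, z1, lag))
    return rotate(z0, z1, phi)

def corr(a, b):
    """Pearson correlation, used here only to inspect the result."""
    a, b = remove_mean(a), remove_mean(b)
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Hypothetical sources with different temporal structure, mixed instantaneously.
n = 2000
s0 = [math.sin(2.0 * math.pi * 0.01 * t) for t in range(n)]
s1 = [math.sin(2.0 * math.pi * 0.13 * t) for t in range(n)]
x0 = [a + 0.6 * b for a, b in zip(s0, s1)]
x1 = [0.5 * a + b for a, b in zip(s0, s1)]
y0, y1 = tdd_separate(x0, x1)
```

The approach relies on the sources having different autocorrelations at the chosen lag. Each output should then align with one source, up to sign and ordering, which `corr` can be used to check. Room recordings involve convolutive mixing, which this sketch does not handle.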
|
|