Bert de Vries, David Sarnoff Research Center (U.S.A.)
A method for adaptive (on-line) pruning and constructing a (layered) computational network is introduced. The dimensions of the network are updated for every newly available sample, which makes this technique highly suitable for tracking nonstationary sources. This method extends the work on predictive least squares by Rissanen [1] and Wax [2] to an adaptive updating scheme. The algorithm is demonstrated by an application to adaptive prediction of exchange rates.
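As an illustration of the idea, and not the authors' exact algorithm, the sketch below runs recursive-least-squares predictors of several candidate orders in parallel and, at every new sample, selects the order with the smallest accumulated predictive squared error; the exponential forgetting used to make the criterion adaptive, and all names and parameters, are assumptions.

```python
import numpy as np

def rls_update(w, P, x, d, lam=0.99):
    """One recursive-least-squares step; also returns the a-priori error."""
    e = d - w @ x                          # predictive (a-priori) error
    k = P @ x / (lam + x @ P @ x)          # gain vector
    w = w + k * e
    P = (P - np.outer(k, x @ P)) / lam
    return w, P, e

def adaptive_pls_order(signal, max_order=8, lam=0.99):
    """Track the best AR order with an adaptive PLS-style criterion."""
    ws = [np.zeros(m) for m in range(1, max_order + 1)]
    Ps = [np.eye(m) * 100.0 for m in range(1, max_order + 1)]
    pls = np.zeros(max_order)              # accumulated predictive errors
    orders = []
    for t in range(max_order, len(signal)):
        for i, m in enumerate(range(1, max_order + 1)):
            x = signal[t - m:t][::-1]      # past samples as regressor
            ws[i], Ps[i], e = rls_update(ws[i], Ps[i], x, signal[t], lam)
            pls[i] = lam * pls[i] + e ** 2 # forgetting makes it adaptive
        orders.append(1 + int(np.argmin(pls)))
    return orders

rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):                    # AR(2) source
    x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2] + 0.1 * rng.standard_normal()
print(adaptive_pls_order(x)[-5:])          # should settle near order 2
```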
Fabio M. Frattale Mascioli, University of Rome (Italy)
Giuseppe Martinelli, University of Rome (Italy)
Antonello Rizzi, University of Rome (Italy)
We propose a constructive method, inspired by Simpson's Min-Max technique, for obtaining fuzzy neural networks. It adopts a cost function that depends on a single network parameter. This feature allows us to apply a simple unimodal search to determine this parameter, and hence the architecture of the optimal net. The algorithm compares favorably with other methods when applied to real classification problems. Due to the adopted fuzzy membership functions, it is particularly well suited to cases where the classes overlap strongly (for instance, biological data). Results in this regard are reported in the paper.
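A minimal sketch of the flavor of such a method: Simpson-style min-max hyperboxes are grown with their maximum edge length capped by a single parameter theta, and theta is chosen by a unimodal (golden-section) search over a cost. The membership function, the error-plus-complexity cost, and all parameters below are simplified stand-ins, not the paper's definitions.

```python
import numpy as np

def train_minmax(X, y, theta):
    """Grow hyperboxes; theta caps each box's edge length."""
    boxes = []                                   # entries: [vmin, vmax, label]
    for x, c in zip(X, y):
        for b in boxes:
            vmin, vmax, lab = b
            if lab != c:
                continue
            lo, hi = np.minimum(vmin, x), np.maximum(vmax, x)
            if np.all(hi - lo <= theta):         # expansion allowed
                b[0], b[1] = lo, hi
                break
        else:
            boxes.append([x.copy(), x.copy(), c])
    return boxes

def classify(boxes, x):
    """Label of the box with the highest (simplified) membership."""
    best, label = -np.inf, None
    for vmin, vmax, lab in boxes:
        d = np.sum(np.maximum(0, vmin - x) + np.maximum(0, x - vmax))
        if 1.0 / (1.0 + d) > best:
            best, label = 1.0 / (1.0 + d), lab
    return label

def cost(theta, X, y, alpha=0.01):
    """Hypothetical cost: training error plus a complexity penalty."""
    boxes = train_minmax(X, y, theta)
    err = np.mean([classify(boxes, x) != c for x, c in zip(X, y)])
    return err + alpha * len(boxes)

def golden_search(f, a, b, tol=1e-2):
    """Unimodal (golden-section) search for the single net parameter."""
    g = (np.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

rng = np.random.default_rng(1)
X = rng.random((200, 2))
y = (X[:, 0] + 0.3 * rng.standard_normal(200) > 0.5).astype(int)  # overlap
print("selected theta:", round(golden_search(lambda t: cost(t, X, y), 0.05, 1.0), 3))
```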
Paulo J.S.G. Ferreira, University of Aveiro (Portugal)
The aim of this paper is to discuss a nonlinear approximation problem relevant to the approximation of data by radial-basis-function neural networks. The approximation is based on superpositions of translated Gaussians. The method used enables us to give explicit approximations and error bounds. New connections between this problem and sampling theory are revealed, but the method departs radically from those commonly used to obtain sampling results, since (i) it applies to signals that are not band-limited, and possibly even discontinuous; (ii) the sampling knots (the centers of the radial-basis functions) need not be equidistant; and (iii) the basic approximation building block is the Gaussian, not the usual sinc kernel. The results offer an answer to the following problem: how complex must a neural network be in order to approximate a given signal to within a prescribed accuracy? They show that $O(1/N)$ accuracy is possible with a network of $N$ basis functions.
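One simple explicit scheme in this spirit, offered as a hedged sketch rather than the paper's construction, is normalized quasi-interpolation by translated Gaussians at non-equidistant knots; the width choice h = 2/N below is an assumption.

```python
import numpy as np

def gaussian_quasi_interp(knots, values, x, h):
    """f_hat(x) = sum_k f(x_k) G((x-x_k)/h) / sum_k G((x-x_k)/h)."""
    d = x[:, None] - knots[None, :]           # pairwise differences x - x_k
    G = np.exp(-(d / h) ** 2)
    return (G @ values) / G.sum(axis=1)

rng = np.random.default_rng(2)
f = lambda t: np.sign(t - 0.5) + 0.3 * np.sin(8 * t)  # discontinuous target
x = np.linspace(0.02, 0.98, 400)
for N in (50, 200, 800):
    knots = np.sort(rng.random(N))            # non-equidistant centers
    fh = gaussian_quasi_interp(knots, f(knots), x, h=2.0 / N)
    print(N, "mean abs error:", np.mean(np.abs(fh - f(x))))
```

The target is deliberately discontinuous and not band-limited; the mean absolute error still shrinks roughly like 1/N, consistent with the accuracy claim above.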
Ajit V. Rao, UCSB (U.S.A.)
David J. Miller, Pennsylvania State University (U.S.A.)
Kenneth Rose, UCSB (U.S.A.)
Allen Gersho, UCSB (U.S.A.)
A new and effective design method is presented for statistical regression functions that belong to the class of mixture models. The class includes the hierarchical mixture of experts (HME) and the normalized radial basis functions (NRBF). Design algorithms based on the maximum likelihood (ML) approach, which emphasize a probabilistic description of the model, have attracted much interest in HME and NRBF models. However, their design objective is mismatched to the original squared-error regression cost and the algorithms are easily trapped by poor local minima on the cost surface. In this paper, we propose an extension of the deterministic annealing (DA) method for the design of mixture-based regression models. We construct a probabilistic framework, but unlike the ML method, we directly optimize the squared-error regression cost, while avoiding poor local minima. Experimental results show that the DA method outperforms standard design methods for both HME and NRBF regression models.
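A toy illustration of the DA idea, not the authors' algorithm: soft data-to-expert associations are drawn from a Gibbs distribution at temperature T, the experts are re-estimated by weighted averaging, and T is annealed toward zero. The piecewise-constant experts, the cooling schedule, and all parameters are simplifying assumptions.

```python
import numpy as np

def da_mixture_regression(x, y, J=6, T0=1.0, Tmin=1e-3, alpha=0.9, iters=20):
    """Toy DA fit: soft assignments at temperature T, annealed toward 0."""
    rng = np.random.default_rng(0)
    mu = rng.choice(x, J)                 # expert centers (1-D inputs)
    out = np.full(J, y.mean())            # constant expert outputs
    T = T0
    while T > Tmin:
        for _ in range(iters):
            # E-step: Gibbs association probabilities at temperature T
            d = (y[:, None] - out[None, :]) ** 2 \
                + (x[:, None] - mu[None, :]) ** 2
            p = np.exp(-(d - d.min(axis=1, keepdims=True)) / T)
            p /= p.sum(axis=1, keepdims=True)
            # M-step: weighted re-estimation of centers and outputs
            w = p.sum(axis=0) + 1e-12
            mu = (p * x[:, None]).sum(axis=0) / w
            out = (p * y[:, None]).sum(axis=0) / w
        T *= alpha                        # cooling schedule
    return mu, out

rng = np.random.default_rng(3)
x = np.sort(rng.random(300))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(300)
mu, out = da_mixture_regression(x, y)
print(np.round(mu, 2), np.round(out, 2))
```

At high T every expert sees all the data (one global minimum); as T falls, the associations harden gradually, which is how DA sidesteps the poor local minima mentioned above.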
Lars Kai Hansen, IMM, DTU, Lyngby (Denmark)
Jan Larsen, IMM, DTU, Lyngby (Denmark)
Torben Fog, IMM, DTU, Lyngby (Denmark)
This paper addresses the problem of generalization error estimation in neural networks. A new early-stopping criterion based on a Bootstrap estimate of the generalization error is suggested. The estimate does not require the network to be trained to the minimum of the cost function, as methods based on asymptotic theory do. Moreover, unlike methods based on cross-validation, which require data to be left out for testing and thereby bias the estimate, the Bootstrap technique has no such disadvantage. The potential of the suggested technique is demonstrated on various time-series problems.
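The sketch below conveys the flavor of bootstrap-based early stopping, with a simple out-of-bag squared-error estimate standing in for the paper's estimator; the linear models, learning rate, and patience rule are all assumptions.

```python
import numpy as np

def bootstrap_early_stop(X, y, B=20, lr=0.05, max_epochs=500, patience=10):
    """Train B copies on bootstrap resamples by gradient descent; the mean
    out-of-bag squared error serves as the generalization-error estimate."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    idx = [rng.integers(0, n, n) for _ in range(B)]      # resample indices
    oob = [np.setdiff1d(np.arange(n), i) for i in idx]   # out-of-bag points
    W = rng.standard_normal((B, d)) * 0.01               # B linear models
    best, best_epoch, hist = np.inf, 0, []
    for epoch in range(max_epochs):
        for b in range(B):
            Xb, yb = X[idx[b]], y[idx[b]]
            W[b] -= lr * 2 * Xb.T @ (Xb @ W[b] - yb) / n
        est = np.mean([np.mean((X[oob[b]] @ W[b] - y[oob[b]]) ** 2)
                       for b in range(B)])
        hist.append(est)
        if est < best:
            best, best_epoch = est, epoch
        elif epoch - best_epoch >= patience:             # estimate rising
            break
    return best_epoch, hist

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 20))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(100)
stop, hist = bootstrap_early_stop(X, y)
print("suggested stopping epoch:", stop)
```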
Mohamed Ibnkahla, ENSEEIHT (France)
In many neural network applications to signal processing, the back-propagation (BP) algorithm is used for training. Recently, several authors have analyzed the behavior of the BP algorithm and studied its properties. However, the influence of the number of layers on the performance and convergence behavior of the BP algorithm remains poorly understood. The paper investigates this problem by studying a simplified multi-layer neural network used for adaptive filtering. The analysis is based on the derivation of recursions for the mean weight update, which can be used to predict the weights and the mean squared error over time. The paper also shows the effects of the algorithm step size and the initial weight values on the algorithm's behavior. Computer simulations display good agreement between the actual behavior and the predictions of the theoretical model. The properties of the BP algorithm are illustrated through several simulation examples and compared to those of the classical LMS algorithm.
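An illustrative Monte Carlo comparison in this spirit, not the paper's analysis: LMS and BP for a minimal two-layer structure, y = w2*tanh(w1*x), are run on the same system-identification task and their squared errors averaged over independent trials. The step size, initial weights, and target system are assumptions and can be varied to reproduce the effects discussed.

```python
import numpy as np

def run_trial(rng, mu=0.05, n=2000):
    """Identify d = 0.8*x + noise with LMS and a tiny two-layer network."""
    x = rng.standard_normal(n)
    d = 0.8 * x + 0.05 * rng.standard_normal(n)
    w_lms, w1, w2 = 0.0, 0.1, 0.1
    e_lms, e_bp = np.zeros(n), np.zeros(n)
    for t in range(n):
        e = d[t] - w_lms * x[t]                # LMS update
        w_lms += mu * e * x[t]
        e_lms[t] = e * e
        h = np.tanh(w1 * x[t])                 # forward pass
        e2 = d[t] - w2 * h
        g1 = e2 * w2 * (1 - h * h) * x[t]      # BP gradient w.r.t. w1
        g2 = e2 * h                            # BP gradient w.r.t. w2
        w1 += mu * g1
        w2 += mu * g2
        e_bp[t] = e2 * e2
    return e_lms, e_bp

rng = np.random.default_rng(5)
trials = [run_trial(rng) for _ in range(50)]
mse_lms = np.mean([t[0] for t in trials], axis=0)   # mean learning curves
mse_bp = np.mean([t[1] for t in trials], axis=0)
print("final MSE  LMS:", mse_lms[-200:].mean(), " BP:", mse_bp[-200:].mean())
```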
Xiao Liu, University of Maryland (U.S.A.)
Tülay Adalı, University of Maryland (U.S.A.)
The recurrent canonical piecewise-linear (RCPL) network is applied to nonlinear blind equalization by generalizing Donoho's minimum entropy deconvolution approach. We first study the approximation ability of the canonical piecewise-linear (CPL) network and CPL-based distribution learning for blind equalization. We then generalize these conclusions to the RCPL network. We show that nonlinear blind equalization can be achieved by matching the distribution of the channel input with that of the RCPL equalizer output. A new blind equalizer structure is constructed by combining the RCPL network with decision feedback. We discuss the application of various cost functions to RCPL-based equalization and present experimental results that demonstrate the successful application of the RCPL network to blind equalization.
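As a hedged sketch of the ingredients only, the code below trains a linear front end followed by a canonical piecewise-linear output stage on a BPSK signal through a dispersive channel. The constant-modulus cost stands in for the distribution-matching costs discussed in the paper, and the recurrent/decision-feedback part is omitted; the channel, breakpoints, and step size are assumptions.

```python
import numpy as np

def cpl(z, a, b, c):
    """Canonical piecewise-linear map: a[0] + a[1]*z + sum_i c[i]*|z-b[i]|."""
    return a[0] + a[1] * z + np.sum(c * np.abs(z - b))

rng = np.random.default_rng(6)
n, taps = 20000, 7
s = rng.choice([-1.0, 1.0], n)                     # BPSK channel input
r = np.convolve(s, [1.0, 0.4, 0.2])[:n]            # dispersive channel
r += 0.01 * rng.standard_normal(n)

w = np.zeros(taps); w[taps // 2] = 1.0             # linear front end
a = np.array([0.0, 1.0])                           # CPL output stage
b = np.linspace(-1.5, 1.5, 4)                      # fixed breakpoints
c = np.zeros(4)
mu = 1e-3
for t in range(taps, n):
    x = r[t - taps:t][::-1]
    z = w @ x
    y = cpl(z, a, b, c)
    g = 4 * y * (y * y - 1)                        # d/dy of (y^2 - 1)^2
    dz = a[1] + np.sum(c * np.sign(z - b))         # dy/dz through the CPL
    w -= mu * g * dz * x
    a -= mu * g * np.array([1.0, z])
    c -= mu * g * np.abs(z - b)
print("mean |y| over last 1000 symbols:",
      np.mean([abs(cpl(r[t - taps:t][::-1] @ w, a, b, c))
               for t in range(n - 1000, n)]))
```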
Jong-Min Park, University of Wisconsin-Madison (U.S.A.)
An adaptive on-line learning method is presented that facilitates pattern classification by using active sampling to identify the optimal decision boundary of a stochastic oracle with a minimum number of training samples. The strategy of sampling at the current estimate of the decision boundary is shown to be optimal in the sense that the probability of convergence toward the true decision boundary at each step is maximized, offering theoretical justification for the popular strategy of category-boundary sampling used by many query learning algorithms. Convergence in distribution is analyzed using a Markov chain model.
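For a one-dimensional stochastic oracle, sampling at the current boundary estimate corresponds to probabilistic bisection: query at the posterior median and update by Bayes' rule. The sketch below illustrates this strategy; the grid size, oracle accuracy p, and step count are assumptions, not the paper's setting.

```python
import numpy as np

def probabilistic_bisection(oracle, p=0.8, grid=2000, steps=60):
    """Query at the current boundary estimate (posterior median) each step;
    the oracle answers correctly with probability p."""
    theta = np.linspace(0, 1, grid)
    post = np.full(grid, 1.0 / grid)          # posterior over the boundary
    for _ in range(steps):
        cdf = np.cumsum(post)
        x = theta[np.searchsorted(cdf, 0.5)]  # current boundary estimate
        ans = oracle(x)                       # noisy label at the query
        like = np.where(theta > x, p if ans == 1 else 1 - p,
                                   1 - p if ans == 1 else p)
        post *= like
        post /= post.sum()
    return theta[np.argmax(post)]

rng = np.random.default_rng(7)
true_boundary = 0.37
def oracle(x, p=0.8):
    label = 1 if x < true_boundary else 0     # class 1 left of the boundary
    return label if rng.random() < p else 1 - label

print("estimate:", probabilistic_bisection(oracle))
```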
Daniel T. Davis, University of Washington (U.S.A.)
Jenq-Neng Hwang, University of Washington (U.S.A.)
Inverse problems have often been considered ill-posed, i.e., the statement of the problem does not thoroughly constrain the solution space. In this paper we take advantage of this freedom by adding informative constraints to the problem solution using Bayesian methodology. Remote sensing problems afford opportunities for the inclusion of ground truth information, prior probabilities, noise distributions, and other informative constraints within a Bayesian probabilistic framework. We apply Bayesian methods to a synthetic remote sensing problem, showing that the performance is superior to that of a previously published method of iterative inversion of neural networks. We also show that ground truth information, naturally included through Bayesian modeling, provides a significant performance improvement.
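A minimal sketch of MAP inversion in this spirit, not the paper's method: a data misfit under a Gaussian noise model is combined with a Gaussian prior and minimized by finite-difference gradient descent on a black-box forward model. The toy forward model and all parameters are assumptions.

```python
import numpy as np

def map_invert(y_obs, forward, x0, sigma_n=0.1, mu_p=0.0, sigma_p=1.0,
               lr=1e-3, steps=2000):
    """MAP inversion: data misfit plus Gaussian prior penalty, minimized
    with finite-difference gradient descent."""
    J = lambda v: (np.sum((forward(v) - y_obs) ** 2) / sigma_n ** 2
                   + np.sum((v - mu_p) ** 2) / sigma_p ** 2)
    x = np.array(x0, dtype=float)
    eps = 1e-5
    for _ in range(steps):
        g = np.array([(J(x + eps * e) - J(x - eps * e)) / (2 * eps)
                      for e in np.eye(len(x))])   # numerical gradient
        x -= lr * g
    return x

rng = np.random.default_rng(8)
forward = lambda x: np.array([x[0] ** 2 + x[1], np.sin(x[0]) + x[1] ** 2])
x_true = np.array([0.5, -0.3])
y_obs = forward(x_true) + 0.01 * rng.standard_normal(2)
print("MAP estimate:", np.round(map_invert(y_obs, forward, [0.0, 0.0]), 3))
```

Ground truth enters naturally here through the prior terms (mu_p, sigma_p); tightening the prior around known values is the mechanism by which the Bayesian formulation improves on unconstrained iterative inversion.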
Xinhua Zhuang, University of Missouri (U.S.A.)
Yunxin Zhao, University of Illinois (U.S.A.)
This paper is theoretical. We present sufficient and "almost" necessary conditions for learning the compatibility coefficients in relaxation labeling, whose satisfaction guarantees that each desired sample labeling becomes consistent and that each ambiguous or erroneous input sample labeling is attracted to the corresponding desired sample labeling. The derived learning conditions are parallel and based on local information. In fact, they are organized unit-wise as linear inequalities, so perceptron-like algorithms can be used to solve them efficiently with finite convergence.
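Because the conditions take the form of linear inequalities, a perceptron-style correction scheme applies directly. The sketch below solves a strictly feasible homogeneous system A w > 0 by repeatedly adding the violated rows; it is a generic illustration of the finite-convergence idea, not the paper's unit-wise formulation.

```python
import numpy as np

def perceptron_inequalities(A, max_sweeps=10000):
    """Find w with A @ w > 0 by perceptron-style corrections; terminates in
    finitely many sweeps whenever the system is strictly feasible."""
    w = np.zeros(A.shape[1])
    for _ in range(max_sweeps):
        violated = A @ w <= 0
        if not violated.any():
            return w
        w += A[violated].sum(axis=0)          # correct all violated rows
    raise RuntimeError("no strictly feasible solution found")

rng = np.random.default_rng(9)
w_true = rng.standard_normal(5)
A = rng.standard_normal((40, 5))
A *= np.sign(A @ w_true)[:, None]             # make A @ w_true > 0 feasible
print("all inequalities satisfied:",
      np.all(A @ perceptron_inequalities(A) > 0))
```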
Fa-Long Luo, University of Erlangen-Nuremberg (Germany)
Rolf Unbehauen, University of Erlangen-Nuremberg (Germany)
This paper proposes a generalized nonlinear minor component analysis algorithm. First, we prove that, with appropriate nonlinear functions, the proposed algorithm can adaptively extract the minor component. Then we discuss how to choose the related nonlinear functions so as to guarantee the desired convergence. Furthermore, we show that all other available minor component analysis algorithms are special cases of the proposed generalized algorithm. Finally, a complex-valued version of the proposed algorithm is given for wider applicability. In addition, the proposed minor component analysis algorithm can also be used to extract the principal component by simply reversing the sign of the corresponding terms.
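A hedged sketch of a nonlinear Oja-type rule of this kind, with tanh as one admissible nonlinearity and per-step normalization added for stability: sign = -1 extracts the minor component, while sign = +1 extracts the principal component, illustrating the sign reversal mentioned above. This is an illustration under stated assumptions, not the paper's exact algorithm.

```python
import numpy as np

def nonlinear_mca(X, f=np.tanh, eta=0.01, sign=-1.0):
    """Nonlinear Hebbian-type rule: sign=-1 for the minor component,
    sign=+1 for the principal component."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for x in X:
        y = w @ x
        w += sign * eta * f(y) * (x - y * w)   # Oja-style with nonlinearity
        w /= np.linalg.norm(w)                 # keep w on the unit sphere
    return w

rng = np.random.default_rng(10)
C = np.diag([5.0, 2.0, 0.1])                   # known covariance
X = rng.standard_normal((20000, 3)) @ np.sqrt(C)
print("minor    :", np.round(nonlinear_mca(X, sign=-1.0), 2))  # ~ ±e3
print("principal:", np.round(nonlinear_mca(X, sign=+1.0), 2))  # ~ ±e1
```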