Vasily G. Moshnyaga, Kyoto University (Japan)
Keikichi Tamaru, Kyoto University (Japan)
This paper proposes a novel array architecture for full-search block-matching motion estimation. The design effort focuses on transforming the array computation so as to minimize memory and I/O costs while satisfying the highest throughput requirements. Compared with existing architectures, it provides feasible solutions for the HDTV picture format with half the memory requirements, a minimal I/O pin count, and 100% processor utilization. The architecture features simple, regular interconnects and is well suited to VLSI implementation.
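For readers unfamiliar with the underlying computation, the following is a minimal software sketch of full-search block matching, not the array architecture itself. The sum-of-absolute-differences (SAD) cost, the function names, and the rule of skipping out-of-frame candidates are assumptions made for illustration; the abstract does not specify them.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Return (dx, dy, cost) minimising SAD over all displacements in [-p, p].

    Every candidate is evaluated, which is what makes the search "full".
    Ties are resolved in favour of the first candidate in scan order.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Assumption: candidates falling outside the reference frame are skipped.
            inside = (0 <= by + dy and by + dy + n <= h and
                      0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Each block requires (2p+1)^2 candidate evaluations of n^2 absolute differences, which is the workload that dedicated arrays of this kind are designed to parallelize.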
Sung Bum Pan, Sogang University (Korea)
Rae-Hong Park, Sogang University (Korea)
This paper proposes new systolic array architectures for computation of the 1-D discrete wavelet transform (DWT). The proposed systolic array consists of $L$ processing element (PE) arrays, where $L$ denotes the number of levels. The proposed PE array computes only the product terms that are required for further computation in higher levels, and the outputs of the lowpass and highpass filters are computed in alternate clock cycles. Therefore, the proposed architectures can compute the one-level DWT using a single architecture. Note that the proposed architectures do not require extra processing units, whereas the conventional architectures need them. The time and hardware cost required to compute the DWT using the proposed systolic arrays are comparable to those of the conventional architectures. The proposed architectures can also be applied to subband decomposition by simply changing the filter coefficients.
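As a point of reference, a multi-level 1-D DWT can be sketched in software as below. This is only an illustration of the decimated filter bank that such arrays compute, not the systolic design. The function names, the periodic boundary extension, and the correlation-style indexing are assumptions of the sketch, and the input length is assumed to be divisible by 2 at every level.

```python
def dwt_level(x, h, g):
    """One DWT level with lowpass filter h and highpass filter g.

    Only every second output is computed, i.e. only the samples that
    survive decimation and are needed by the next level.
    """
    n = len(x)
    approx, detail = [], []
    for k in range(0, n, 2):
        # Assumption: the signal is extended periodically at the boundary.
        approx.append(sum(h[i] * x[(k + i) % n] for i in range(len(h))))
        detail.append(sum(g[i] * x[(k + i) % n] for i in range(len(g))))
    return approx, detail


def dwt(x, h, g, levels):
    """Apply `levels` stages; each stage re-filters the previous lowpass output.

    Returns the final approximation and the list of detail signals,
    ordered from the first (finest) level to the last (coarsest).
    """
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = dwt_level(approx, h, g)
        details.append(d)
    return approx, details
```

With unnormalised Haar filters `h = [0.5, 0.5]` and `g = [0.5, -0.5]`, a call such as `dwt([4, 2, 6, 8], h, g, 2)` yields pairwise averages and differences at each level. Supplying different coefficients gives a general subband decomposition.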
Seung Soo Chae, Sogang University (Korea)
Sung Bum Pan, Sogang University (Korea)
Gi Hun Lee, Sogang University (Korea)
Rae-Hong Park, Sogang University (Korea)
Byung-Uk Lee, Sogang University (Korea)
This paper proposes hardware architectures for adaptive equalizers applicable to both quadrature amplitude modulation (QAM) and vestigial sideband modulation (VSB) systems. It presents digitization methods for QAM and VSB systems that require a low hardware cost while achieving performance comparable to that of the algorithm employing floating-point operations. To reduce the hardware cost of high definition television (HDTV) equalizers, we also propose a pipelined architecture that processes some parallel parts sequentially.
Stephen Oh, Texas Instruments (U.S.A.)
Bill Priest, Texas Instruments (U.S.A.)
Darel Linebarger, University of Texas (U.S.A.)
Balaji Raghothaman, University of Texas (U.S.A.)
We present an empirical analysis of the fast affine projection (FAP) algorithm for use in an acoustic echo cancellation application on a fixed-point DSP processor. We also introduce a modified FAP algorithm that was developed based on our FAP study. Our analysis shows that the modified FAP algorithm is more robust and provides more consistent performance than the LMS algorithm. The new FAP algorithm is also numerically efficient and easy to implement on a fixed-point DSP processor.
Katsushige Matsubara, Univ. Tokyo Metropolitan (Japan)
Kiyoshi Nishikawa, Univ. Tokyo Metropolitan (Japan)
Hitoshi Kiya, Univ. Tokyo Metropolitan (Japan)
This paper proposes an adaptive algorithm, which can be pipelined, as an extension of the delayed least mean square (DLMS) adaptive algorithm. The proposed algorithm achieves high throughput with less degradation of the convergence characteristic than the DLMS algorithm. An architecture for pipelined implementation of the proposed algorithm is considered, and based on this, the conditions for the implementation are derived. An efficient implementation of the architecture with less hardware is also considered.
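For context, the baseline DLMS algorithm that this work extends can be sketched as follows. This is the standard delayed update w(n+1) = w(n) + mu * e(n-D) * u(n-D), not the proposed extension, whose details the abstract does not give. The function name, the zero initial weights, and the zero-padding of the input are assumptions of the sketch.

```python
from collections import deque


def dlms(x, d, num_taps, mu, delay):
    """Delayed LMS: the weight update at time n uses the error and the
    tap-input vector from time n - delay. With delay = 0 this reduces
    to ordinary LMS.

    x: input samples, d: desired samples, mu: step size.
    Returns the final weights and the list of a-priori errors.
    """
    w = [0.0] * num_taps
    pending = deque()  # (error, tap-input vector) pairs awaiting their update
    errors = []
    for n in range(len(x)):
        # Tap-input vector [x(n), x(n-1), ...], zero-padded before the start.
        u = [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]
        y = sum(wi * ui for wi, ui in zip(w, u))
        e = d[n] - y
        errors.append(e)
        pending.append((e, u))
        if len(pending) > delay:
            # Apply the update that was computed `delay` samples ago.
            e_old, u_old = pending.popleft()
            w = [wi + mu * e_old * ui for wi, ui in zip(w, u_old)]
    return w, errors
```

The delay is what allows the error computation and the weight update to be placed in different pipeline stages; the price is a tighter bound on the step size and slower convergence, which is the degradation the abstract refers to.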
David Peavey, Delfin Systems (U.S.A.)
Tokunbo Ogunfunmi, EE Dept. Santa Clara University, Santa Clara (U.S.A.)
A new technique for obtaining high-performance, low-power radio direction finding (RDF) using a single receiver is presented. For man-portable applications, multichannel systems consume too much power, are too expensive, and are too heavy to be carried easily by a single individual. Most single-channel systems are not accurate enough or do not provide the capability to listen while direction finding (DF) is being performed. By employing feedback in a pseudo-Doppler system via a vector modulator in the IF of a single receiver, together with an adaptive algorithm to control it, the accuracy of a pseudo-Doppler system can be raised to that of an interferometer-based system without the expense of a multichannel receiver. The system also maintains audio listen-through while direction finding is being performed, all with a single inexpensive low-power receiver. The use of these techniques provides performance not attainable by other single-channel methods.
Zheng-She Liu, University of Florida (U.S.A.)
Jian Li, University of Florida (U.S.A.)
As one of the key steps in the feature extraction of targets consisting of both trihedral and dihedral corner reflectors via synthetic aperture radar, this paper studies the problem of estimating the parameters of a single dihedral corner reflector. The data model of the problem and the Cramér-Rao bounds (CRBs) for the parameter estimates of the data model are presented. Two algorithms, the FFTB (fast Fourier transform based) algorithm and the NLS (non-linear least squares) algorithm, are devised to estimate the model parameters. Numerical examples show that the parameter estimates obtained with both algorithms approach the CRBs as the signal-to-noise ratio increases. The parameter estimates obtained with the NLS algorithm start to achieve the CRB at a lower SNR than those with the FFTB algorithm, while the latter algorithm is computationally more efficient.
Chaitali Sengupta, ECE Dept, Rice University (U.S.A.)
Joseph Cavallaro, ECE Dept, Rice University (U.S.A.)
Behnaam Aazhang, ECE Dept, Rice University (U.S.A.)
This paper addresses the problem of tracking the covariance matrix eigenstructure, based on SVD (Singular Value Decomposition) updating, of a time-varying data matrix formed from received vectors. This problem occurs frequently in signal processing applications such as adaptive beamforming, direction finding, spectral estimation, etc. As this problem needs to be solved in real time, it is natural to look for a parallel algorithm so that computation time can be reduced by distributing the work among a number of processing units. This paper proposes a parallel scheme for SVD updating that can be implemented on a fixed sized array of off-the-shelf processors, to get speedups close to the number of processors used.
Bruno Haller, ETHZ (Switzerland)
Matthias Streiff, ETHZ (Switzerland)
Urs Fleisch, ETHZ (Switzerland)
Reto Zimmermann, ETHZ (Switzerland)
In this paper we present the practical hardware implementation of a systolic array for performing recursive least-squares minimisation via orthogonal matrix triangularisation. This is an extremely demanding task for high speed, real-time operation such as required in many modern adaptive antenna, radar, and sonar systems. Since the underlying Givens rotations can be efficiently computed by the CORDIC algorithm, we have implemented a dedicated CORDIC processor element (CPE) in an ASIC. All the required calculations are carried out by a network of these small and simple circuits, which are suitable for constructing a high performance systolic array, either based on MCM technology or as a macro-cell building block for a very highly integrated single chip solution. The design of an adaptive antenna signal processor is described in a top-down manner, from the proposed algorithm down to the bit-level details of the realised component.
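To illustrate why CORDIC suits Givens rotations, here is a minimal floating-point sketch of its two modes. It emulates the shift-and-add iterations in software and is not the bit-level CPE design. The function names, the iteration count, and the use of floating point instead of fixed-point arithmetic are assumptions of the sketch, and vectoring is assumed to start from x >= 0.

```python
import math


def _gain(iterations):
    """Accumulated scale factor of the CORDIC micro-rotations."""
    g = 1.0
    for i in range(iterations):
        g *= math.sqrt(1.0 + 4.0 ** -i)
    return g


def cordic_vector(x, y, iterations=24):
    """Vectoring mode: drive y to zero, returning (magnitude, angle).

    Corresponds to computing the Givens rotation that annihilates y.
    """
    angle = 0.0
    for i in range(iterations):
        s = 2.0 ** -i  # a right shift by i bits in hardware
        if y >= 0.0:
            x, y = x + y * s, y - x * s  # clockwise micro-rotation
            angle += math.atan(s)
        else:
            x, y = x - y * s, y + x * s  # counter-clockwise micro-rotation
            angle -= math.atan(s)
    return x / _gain(iterations), angle


def cordic_rotate(x, y, angle, iterations=24):
    """Rotation mode: rotate (x, y) clockwise by `angle`.

    Corresponds to applying a previously computed Givens rotation to
    the remaining element pairs of the two rows.
    """
    z = angle
    for i in range(iterations):
        s = 2.0 ** -i
        if z >= 0.0:
            x, y = x + y * s, y - x * s
            z -= math.atan(s)
        else:
            x, y = x - y * s, y + x * s
            z += math.atan(s)
    g = _gain(iterations)
    return x / g, y / g
```

In a triangular QR array, boundary cells would typically use the vectoring mode and internal cells the rotation mode, so a single processor element type can serve both roles.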
José C. Alves, INESC/FEUP (Portugal)
André Puga, INESC/FEUP (Portugal)
Luís Corte-Real, INESC/FEUP (Portugal)
José S. Matos, INESC/FEUP (Portugal)
Higher-order statistics extend the analysis methods for non-linear systems and non-Gaussian signals beyond those based on the autocorrelation and power spectrum. The main drawback of their use in real-time applications is the high complexity of their estimation, due to the large number of arithmetic operations. This paper presents an experimental vector architecture for the estimation of the higher-order moments. The processor's core is a pipelined multiply-accumulate unit that receives four data vectors and computes in parallel the moment taps up to the fourth order. The design of a custom cache memory organization and address generation circuits has led to more than 11 operations per clock cycle. The architecture was modeled and simulated in Verilog and is presently being implemented in XILINX field-programmable gate arrays (FPGAs) and one custom integrated circuit for the multiply-accumulate unit.
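The kind of computation involved can be sketched in software as follows: one pass over the data accumulates a moment tap of each order from one to four, reusing the lower-order product at every step. This is an illustrative sketch only. The function name, the restriction to non-negative lags, and the normalisation by the number of accumulated terms are assumptions, not details taken from the paper.

```python
def moment_taps(x, t1, t2, t3):
    """Estimate one moment tap of each order 1..4 in a single pass:

        m1           = mean of x(n)
        m2(t1)       = mean of x(n) x(n+t1)
        m3(t1,t2)    = mean of x(n) x(n+t1) x(n+t2)
        m4(t1,t2,t3) = mean of x(n) x(n+t1) x(n+t2) x(n+t3)

    Lags are assumed non-negative. All four averages run over the same
    index range so that the partial products can be shared.
    """
    n_terms = len(x) - max(t1, t2, t3)
    acc = [0.0, 0.0, 0.0, 0.0]
    for n in range(n_terms):
        p1 = x[n]
        p2 = p1 * x[n + t1]  # reuse the lower-order product
        p3 = p2 * x[n + t2]
        p4 = p3 * x[n + t3]
        acc[0] += p1
        acc[1] += p2
        acc[2] += p3
        acc[3] += p4
    return [a / n_terms for a in acc]
```

Each sample costs three multiplications and four accumulations per lag triple, and a full estimate repeats this over many lag combinations, which is the arithmetic load that motivates a dedicated multiply-accumulate pipeline.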
Karsten Fanghänel, UniBw Hamburg (Germany)
Kuno Köllmann, UniBw Hamburg (Germany)
Hans Christoph Zeidler, UniBw Hamburg (Germany)
Ralf Pleßmann, C. Plath (Germany)
Karl-Ragmar Riemschneider, C. Plath (Germany)
In this paper, the use of digital stochastic computing is proposed to realize a recurrent network for blind separation of undelayed, linearly superposed signals into their original components. The stochastic representation of signals allows the design of very simple digital processing elements that implement all the operations necessary for the algorithm of Herault and Jutten. A first hardware implementation has been designed, and the experimental results are presented.
Jeff Bilmes, ICSI / UC Berkeley (U.S.A.)
Krste Asanovic, ICSI / UC Berkeley (U.S.A.)
Chee-whye Chin, ICSI / UC Berkeley (U.S.A.)
Jim Demmel, ICSI / UC Berkeley (U.S.A.)
We introduce PHiPAC, a coding methodology for developing portable high-performance numerical libraries in ANSI C. Using this methodology, we have developed code for optimized matrix multiply routines. These routines can achieve over 90% of peak performance on a variety of current workstations, and are often faster than vendor-supplied optimized libraries. We then describe the bunch-mode back-propagation algorithm and how it can use the PHiPAC derived matrix multiply routines. Using a set of plots, we investigate the tradeoffs between bunch size, convergence rate, and training speed using a standard speech recognition data set and show how use of the PHiPAC routines can lead to a significantly faster back-propagation learning algorithm.