DSP Architectures and Implementations


A Memory Efficient Array Architecture for Full-Search Block Matching Algorithm

Authors:

Vasily G. Moshnyaga, Kyoto University (Japan)
Keikichi Tamaru, Kyoto University (Japan)

Volume 5, Page 4109

Abstract:

This paper proposes a novel array architecture for full-search block matching motion estimation. The design focuses on transforming the array computation so as to minimize memory and I/O costs while meeting the highest throughput requirements. Compared with existing architectures, it provides feasible solutions for the HDTV picture format with half the memory, a minimal I/O pin count and 100% processor utilization. The architecture has regular, simple interconnects and is well suited to VLSI implementation.
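As a point of reference for what such an array computes, the following Python sketch runs full-search block matching in software. It assumes the sum of absolute differences (SAD) as the matching criterion, which the abstract does not state; the names `sad`, `full_search`, block size `n` and search range `p` are hypothetical. It says nothing about the proposed memory organization.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] and return
    ((dx, dy), sad) for the best match; candidates outside `ref` are skipped."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = 0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

For every block, the search evaluates (2p + 1)^2 candidate displacements of n^2 absolute differences each, which is the workload that full-search array processors are built to sustain at HDTV rates.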

ic974109.pdf


New Systolic Arrays for Computation of the 1-D Discrete Wavelet Transform

Authors:

Sung Bum Pan, Sogang University (Korea)
Rae-Hong Park, Sogang University (Korea)

Volume 5, Page 4113

Abstract:

This paper proposes new systolic array architectures for computing the 1-D discrete wavelet transform (DWT). The proposed systolic array consists of $L$ processing element (PE) arrays, where $L$ denotes the number of levels. Each PE array computes only the product terms that are required for further computation at higher levels, and the outputs of the lowpass and highpass filters are computed in alternate clock cycles. Therefore, both filter outputs of a one-level DWT can be computed using a single PE array. Note that the proposed architectures do not require the extra processing units that conventional architectures need. The computation time and hardware cost of the proposed systolic arrays are comparable to those of conventional architectures. The proposed architectures can also be applied to subband decomposition simply by changing the filter coefficients.
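A software sketch of the $L$-level dyadic DWT may help in reading the description above. It is an illustration under stated assumptions, not the authors' systolic schedule: Haar filters serve only as example coefficients, the signal is extended periodically, and `analysis_level` and `dwt` are hypothetical names. Skipping the odd-indexed outputs mirrors the idea of computing only the products that survive decimation.

```python
def analysis_level(x, h, g):
    """One DWT level with periodic extension. Only the even-indexed filter
    outputs are formed, so no products are spent on samples that 2:1
    decimation would discard."""
    n = len(x)
    low, high = [], []
    for i in range(0, n, 2):
        low.append(sum(c * x[(i + m) % n] for m, c in enumerate(h)))
        high.append(sum(c * x[(i + m) % n] for m, c in enumerate(g)))
    return low, high


def dwt(x, h, g, levels):
    """Multilevel dyadic DWT: the lowpass output of each level is the input
    of the next, loosely mirroring one PE array per level."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = analysis_level(approx, h, g)
        details.append(d)
    return approx, details
```

Passing different `h` and `g` coefficient lists is the software analogue of reusing the structure for subband decomposition by changing the filter coefficients.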

ic974113.pdf


Efficient VLSI Architectures of Adaptive Equalizers for QAM/VSB Transmission

Authors:

Seung Soo Chae, Sogang University (Korea)
Sung Bum Pan, Sogang University (Korea)
Gi Hun Lee, Sogang University (Korea)
Rae-Hong Park, Sogang University (Korea)
Byung-Uk Lee, Sogang University (Korea)

Volume 5, Page 4117

Abstract:

This paper proposes hardware architectures for adaptive equalizers applicable to both quadrature amplitude modulation (QAM) and vestigial sideband (VSB) modulation systems. It presents digitization methods for QAM and VSB systems that require low hardware cost while achieving performance comparable to that of the algorithm using floating-point operations. To further reduce the hardware cost of high definition television (HDTV) equalizers, we also propose a pipelined architecture that processes some parallel parts sequentially.
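The comparison between a digitized equalizer and its floating-point counterpart can be illustrated with a small model. This sketch assumes a real-valued, LMS-trained FIR equalizer and plain rounding to `frac_bits` fractional bits; the abstract names neither the adaptation rule nor the number format, so both are assumptions, as are the function names.

```python
def quantize(v, frac_bits):
    """Round v to the nearest multiple of 2**-frac_bits (a fixed-point grid)."""
    scale = 1 << frac_bits
    return round(v * scale) / scale


def lms_equalizer(rx, training, num_taps, mu, frac_bits=None):
    """Train a real-valued LMS FIR equalizer against known training symbols.
    With frac_bits set, the error and every tap are rounded after each update
    to mimic a fixed-point datapath; frac_bits=None is the floating-point
    reference."""
    if frac_bits is None:
        q = lambda v: v
    else:
        q = lambda v: quantize(v, frac_bits)
    w = [0.0] * num_taps
    errors = []
    for n in range(num_taps - 1, len(rx)):
        window = rx[n - num_taps + 1 : n + 1][::-1]   # newest sample first
        y = sum(wi * xi for wi, xi in zip(w, window))
        e = q(training[n] - y)
        w = [q(wi + mu * e * xi) for wi, xi in zip(w, window)]
        errors.append(e)
    return w, errors
```

Running both variants on the same received samples and comparing their residual errors is one simple way to judge whether a chosen word length is adequate.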

ic974117.pdf


A Fast Affine Projection Algorithm for an Acoustic Echo Canceller using a Fixed-Point DSP Processor

Authors:

Stephen Oh, Texas Instruments (U.S.A.)
Bill Priest, Texas Instruments (U.S.A.)
Darel Linebarger, University of Texas (U.S.A.)
Balaji Raghothaman, University of Texas (U.S.A.)

Volume 5, Page 4121

Abstract:

We present an empirical analysis of the fast affine projection (FAP) algorithm for acoustic echo cancellation on a fixed-point DSP processor. We also introduce a modified FAP algorithm developed from this study. Our analysis shows that the modified FAP algorithm is more robust and provides more consistent performance than the LMS algorithm. The new FAP algorithm is also numerically efficient and easy to implement on a fixed-point DSP processor.
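For readers unfamiliar with affine projection, the sketch below is the plain, direct-form affine projection algorithm in floating point. It is neither the FAP recursion nor the authors' modified FAP: it solves the small order-by-order system directly at every sample, which is the cost that fast variants avoid. All names and the regularization `delta` are illustrative assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def apa(x, d, num_taps, order, mu=0.5, delta=1e-3):
    """Direct-form affine projection filter of projection order `order`:
    w += mu * X (X^T X + delta I)^-1 e, with X = [x_n, ..., x_{n-order+1}]."""
    w = [0.0] * num_taps

    def regressor(n):
        # Input vector [x(n), x(n-1), ...], zero before the first sample.
        return [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]

    errors = []
    for n in range(order - 1, len(x)):
        cols = [regressor(n - k) for k in range(order)]
        e = [d[n - k] - dot(w, c) for k, c in enumerate(cols)]
        gram = [[dot(ci, cj) + (delta if i == j else 0.0) for j, cj in enumerate(cols)]
                for i, ci in enumerate(cols)]
        g = solve(gram, e)
        w = [wi + mu * sum(g[k] * cols[k][i] for k in range(order))
             for i, wi in enumerate(w)]
        errors.append(e[0])
    return w, errors
```

With `order=1` the update reduces to (regularized) normalized LMS; larger orders project onto more past input vectors, which speeds convergence for colored inputs such as speech.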

ic974121.pdf


A New Pipelined Architecture of the LMS Algorithm without Degradation of Convergence Characteristics

Authors:

Katsushige Matsubara, Univ. Tokyo Metropolitan (Japan)
Kiyoshi Nishikawa, Univ. Tokyo Metropolitan (Japan)
Hitoshi Kiya, Univ. Tokyo Metropolitan (Japan)

Volume 5, Page 4125

Abstract:

This paper proposes a pipelinable adaptive algorithm as an extension of the delayed least mean square (DLMS) adaptive algorithm. The proposed algorithm achieves high throughput with less degradation of the convergence characteristics than the DLMS algorithm. An architecture for pipelined implementation of the proposed algorithm is considered, and from it the conditions for implementation are derived. An efficient implementation of the architecture that requires less hardware is also considered.
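The baseline that the proposed algorithm extends is conventional DLMS, in which the coefficient update uses an error that is `delay` samples old so that the error-feedback path can be pipelined. The sketch below models only that baseline in floating point; the authors' extension, which reduces the convergence penalty, is not shown, and the names are hypothetical.

```python
from collections import deque


def dlms(x, d, num_taps, mu, delay):
    """Delayed LMS: the update at time n uses the error and input vector
    from time n - delay. delay=0 reduces to ordinary LMS."""
    w = [0.0] * num_taps
    pending = deque()   # (error, input vector) pairs waiting `delay` samples
    errors = []
    for n in range(len(x)):
        xv = [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, xv))
        errors.append(e)
        pending.append((e, xv))
        if len(pending) > delay:
            # Oldest entry is from time n - delay: apply its stale gradient.
            e_old, x_old = pending.popleft()
            w = [wi + mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w, errors
```

As the delay grows, the usable step size generally shrinks, which is the kind of convergence degradation the paper aims to reduce.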

ic974125.pdf


The Single Channel Interferometer Using A Pseudo-Doppler Direction Finding System

Authors:

David Peavey, Delfin Systems (U.S.A.)
Tokunbo Ogunfunmi, EE Dept. Santa Clara University, Santa Clara (U.S.A.)

Volume 5, Page 4129

Abstract:

A new technique for high-performance, low-power radio direction finding (RDF) using a single receiver is presented. For man-portable applications, multichannel systems consume too much power, are too expensive, and are too heavy to be carried easily by a single individual. Most single-channel systems are either not accurate enough or cannot provide listen-through while direction finding (DF) is being performed. By employing feedback in a pseudo-Doppler system, via a vector modulator in the IF of a single receiver together with an adaptive algorithm that controls it, the accuracy of a pseudo-Doppler system can be raised to that of an interferometer-based system without the expense of a multichannel receiver. The system also maintains audio listen-through while direction finding is being performed, all with a single inexpensive, low-power receiver. These techniques provide performance not attainable by other single-channel methods.

ic974129.pdf


Feature Extraction of a Single Dihedral Reflector from SAR Data

Authors:

Zheng-She Liu, University of Florida (U.S.A.)
Jian Li, University of Florida (U.S.A.)

Volume 5, Page 4133

Abstract:

As one of the key steps in extracting features of targets consisting of both trihedral and dihedral corner reflectors from synthetic aperture radar (SAR) data, this paper studies the problem of estimating the parameters of a single dihedral corner reflector. The data model and the Cramér-Rao bounds (CRBs) on its parameter estimates are presented. Two algorithms, the FFTB (fast Fourier transform based) algorithm and the NLS (nonlinear least squares) algorithm, are devised to estimate the model parameters. Numerical examples show that the estimates obtained with both algorithms approach the CRBs as the signal-to-noise ratio (SNR) increases. The NLS estimates begin to achieve the CRB at a lower SNR than the FFTB estimates, while the FFTB algorithm is computationally more efficient.

ic974133.pdf


Solving the SVD Updating Problem for Subspace Tracking on a Fixed Sized Linear Array of Processors

Authors:

Chaitali Sengupta, ECE Dept, Rice University (U.S.A.)
Joseph Cavallaro, ECE Dept, Rice University (U.S.A.)
Behnaam Aazhang, ECE Dept, Rice University (U.S.A.)

Volume 5, Page 4137

Abstract:

This paper addresses the problem of tracking the eigenstructure of the covariance matrix of a time-varying data matrix formed from received vectors, based on SVD (singular value decomposition) updating. This problem occurs frequently in signal processing applications such as adaptive beamforming, direction finding and spectral estimation. Because the problem must be solved in real time, it is natural to look for a parallel algorithm that reduces computation time by distributing the work among a number of processing units. This paper proposes a parallel scheme for SVD updating that can be implemented on a fixed-size array of off-the-shelf processors, achieving speedups close to the number of processors used.

ic974137.pdf


Hardware Implementation of a Systolic Antenna Array Signal Processor Based on CORDIC Arithmetic

Authors:

Bruno Haller, ETHZ (Switzerland)
Matthias Streiff, ETHZ (Switzerland)
Urs Fleisch, ETHZ (Switzerland)
Reto Zimmermann, ETHZ (Switzerland)

Volume 5, Page 4141

Abstract:

In this paper we present the practical hardware implementation of a systolic array for performing recursive least-squares minimisation via orthogonal matrix triangularisation. This is an extremely demanding task for the high-speed, real-time operation required in many modern adaptive antenna, radar and sonar systems. Since the underlying Givens rotations can be efficiently computed by the CORDIC algorithm, we have implemented a dedicated CORDIC processor element (CPE) in an ASIC. All the required calculations are carried out by a network of these small, simple circuits, which are suitable for building a high-performance systolic array, either in MCM technology or as a macro-cell building block for a highly integrated single-chip solution. The design of an adaptive antenna signal processor is described top-down, from the proposed algorithm to the bit-level details of the realised component.
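Because the array is built from Givens rotations evaluated by CORDIC, a floating-point sketch of the two CORDIC modes may clarify the division of labor: vectoring mode zeroes one element and records the rotation directions, and rotation mode replays those directions on the remaining columns, much as boundary and internal cells do in a QR systolic array. Real CPE hardware uses fixed-point shifts and adds; the iteration count, the explicit gain correction and all names are assumptions of this sketch.

```python
def cordic_gain(iterations):
    """Magnitude growth K = prod sqrt(1 + 2^(-2i)) of the micro-rotations."""
    k = 1.0
    for i in range(iterations):
        k *= (1 + 2.0 ** (-2 * i)) ** 0.5
    return k


def cordic_vector(x, y, iterations=24):
    """Vectoring mode: drive y to ~0 with shift-and-add micro-rotations.
    Returns K * sqrt(x^2 + y^2), a pre-rotation flag and the directions."""
    flip = x < 0
    if flip:                      # rotate by 180 degrees into the right half-plane
        x, y = -x, -y
    dirs = []
    for i in range(iterations):
        d = -1 if y > 0 else 1    # rotate toward the x axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return x, flip, dirs


def cordic_rotate(x, y, flip, dirs):
    """Rotation mode: replay recorded micro-rotations on another (x, y) pair."""
    if flip:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x, y


def givens_cordic(row_a, row_b, iterations=24):
    """Annihilate row_b[0] against row_a[0] and rotate the other columns;
    results are divided by the CORDIC gain."""
    k = cordic_gain(iterations)
    r, flip, dirs = cordic_vector(row_a[0], row_b[0], iterations)
    new_a, new_b = [r / k], [0.0]
    for a, b in zip(row_a[1:], row_b[1:]):
        ra, rb = cordic_rotate(a, b, flip, dirs)
        new_a.append(ra / k)
        new_b.append(rb / k)
    return new_a, new_b
```

For rows `[3, 1]` and `[4, 2]`, the exact Givens rotation gives `[5, 2.2]` and `[0, 0.4]`; the CORDIC version agrees to within the angular resolution of the last micro-rotation.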

ic974141.pdf


A Vector Architecture for Higher-Order Moments Estimation

Authors:

José C. Alves, INESC/FEUP (Portugal)
André Puga, INESC/FEUP (Portugal)
Luís Corte-Real, INESC/FEUP (Portugal)
José S. Matos, INESC/FEUP (Portugal)

Volume 5, Page 4145

Abstract:

Higher-order statistics extend the analysis methods for non-linear systems and non-Gaussian signals beyond those based on the autocorrelation and power spectrum. The main drawback to their use in real-time applications is the high complexity of their estimation, which requires a large number of arithmetic operations. This paper presents an experimental vector architecture for estimating higher-order moments. The processor's core is a pipelined multiply-accumulate unit that receives four data vectors and computes in parallel the moment taps up to fourth order. A custom cache memory organization and dedicated address generation circuits allow more than 11 operations per clock cycle. The architecture was modeled and simulated in Verilog and is currently being implemented in XILINX field-programmable gate arrays (FPGAs) plus one custom integrated circuit for the multiply-accumulate unit.
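The moment taps mentioned above can be written out directly. This sketch is a plain sample estimator for non-negative lags that averages over the samples available for each tap; that normalization is an assumption the abstract does not specify, and the names are hypothetical. The product over up to four samples is the work that the pipelined multiply-accumulate unit performs in parallel.

```python
def moment(x, lags):
    """Sample moment E[x(n) x(n+l1) ... x(n+lk)] for non-negative lags
    (order k + 1), averaged over every n for which all samples exist."""
    count = len(x) - max(lags, default=0)
    acc = 0.0
    for n in range(count):
        prod = x[n]
        for lag in lags:
            prod *= x[n + lag]
        acc += prod          # one multiply-accumulate chain per moment tap
    return acc / count


def third_order_taps(x, max_lag):
    """Third-order taps m3(t1, t2) for 0 <= t1 <= t2 <= max_lag; taps with
    t1 > t2 equal m3(t2, t1), so they are not recomputed."""
    return {(t1, t2): moment(x, [t1, t2])
            for t2 in range(max_lag + 1) for t1 in range(t2 + 1)}
```

Even this small grid grows quadratically with `max_lag` for third order (and cubically for fourth order), each tap needing a pass over the data, which illustrates why estimation cost dominates real-time use.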

ic974145.pdf


Parallel Bit-Stream Neurohardware for Blind Separation of Sources

Authors:

Karsten Fanghänel, UniBw Hamburg (Germany)
Kuno Köllmann, UniBw Hamburg (Germany)
Hans Christoph Zeidler, UniBw Hamburg (Germany)
Ralf Pleßmann, C. Plath (Germany)
Karl-Ragmar Riemschneider, C. Plath (Germany)

Volume 5, Page 4149

Abstract:

In this paper, digital stochastic computing is proposed for realizing a recurrent network that blindly separates undelayed, linearly superposed signals into their original components. The stochastic representation of signals allows very simple digital processing elements to implement all the operations required by the Herault-Jutten algorithm. A first hardware implementation has been designed, and experimental results are presented.
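The core idea of stochastic computing can be shown in a few lines. This sketch uses the common bipolar encoding, in which a value v in [-1, 1] becomes a random bit stream with P(1) = (v + 1) / 2, so that a single XNOR gate multiplies two independent streams. The abstract does not say which encoding the hardware uses, so the encoding and the names here are assumptions.

```python
import random


def encode_bipolar(v, length, rng):
    """Bipolar stochastic stream for v in [-1, 1]: each bit is 1 with
    probability (v + 1) / 2, independently of all other bits."""
    p = (v + 1) / 2
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode_bipolar(bits):
    """Estimate v from the fraction of ones: v = 2 * P(1) - 1."""
    return 2 * sum(bits) / len(bits) - 1


def multiply_bipolar(a_bits, b_bits):
    """Multiply two independent bipolar streams with one XNOR per bit pair:
    P(out = 1) = pa*pb + (1-pa)(1-pb) = (a*b + 1) / 2."""
    return [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]
```

Accuracy improves only as roughly one over the square root of the stream length, which is the price paid for such simple processing elements.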

ic974149.pdf


Using PHiPAC to Speed Error Back-Propagation Learning

Authors:

Jeff Bilmes, ICSI / UC Berkeley (U.S.A.)
Krste Asanovic, ICSI / UC Berkeley (U.S.A.)
Chee-whye Chin, ICSI / UC Berkeley (U.S.A.)
Jim Demmel, ICSI / UC Berkeley (U.S.A.)

Volume 5, Page 4153

Abstract:

We introduce PHiPAC, a coding methodology for developing portable high-performance numerical libraries in ANSI C. Using this methodology, we have developed optimized matrix multiply routines. These routines can achieve over 90% of peak performance on a variety of current workstations and are often faster than vendor-supplied optimized libraries. We then describe the bunch-mode back-propagation algorithm and how it can use the PHiPAC-derived matrix multiply routines. Using a set of plots, we investigate the tradeoffs among bunch size, convergence rate and training speed on a standard speech recognition data set, and show how the PHiPAC routines can lead to a significantly faster back-propagation learning algorithm.
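Bunch-mode training can be sketched as replacing per-pattern vector operations with matrix-matrix products over a bunch of B patterns, with one weight update per bunch. The network shape (one tanh hidden layer, linear output, squared error) is an assumption, and the pure-Python `matmul` merely stands in for a tuned routine; it is not PHiPAC code or a PHiPAC interface.

```python
import math


def matmul(a, b):
    """Plain triple-loop matrix multiply; a tuned GEMM would replace this."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def bunch_step(x, t, w1, w2, lr):
    """One bunch-mode update for a 1-hidden-layer net (tanh hidden, linear out).
    x: B x n_in inputs, t: B x n_out targets. Returns new weights and the
    bunch loss measured before the update."""
    bsz = len(x)
    h = [[math.tanh(v) for v in row] for row in matmul(x, w1)]        # forward 1
    y = matmul(h, w2)                                                 # forward 2
    d2 = [[yv - tv for yv, tv in zip(yr, tr)] for yr, tr in zip(y, t)]
    back = matmul(d2, transpose(w2))                                  # backprop
    d1 = [[bv * (1 - hv * hv) for bv, hv in zip(br, hr)] for br, hr in zip(back, h)]
    g2 = matmul(transpose(h), d2)                                     # gradients
    g1 = matmul(transpose(x), d1)
    w2 = [[w - lr / bsz * g for w, g in zip(wr, gr)] for wr, gr in zip(w2, g2)]
    w1 = [[w - lr / bsz * g for w, g in zip(wr, gr)] for wr, gr in zip(w1, g1)]
    loss = sum(v * v for row in d2 for v in row) / (2 * bsz)
    return w1, w2, loss
```

With B = 1 every product degenerates to matrix-vector work and the update becomes per-pattern; larger bunches spend more of the time in matrix multiply, where tuned routines pay off, while possibly changing the convergence rate per pattern.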

ic974153.pdf
