Leilei Song, University of Minnesota (U.S.A.)
Keshab K. Parhi, University of Minnesota (U.S.A.)
This paper presents a low-area finite field divider using dual basis representation. The divider is based on a division algorithm that solves the Discrete Wiener-Hopf Equation by Gauss-Jordan elimination. The hardware complexity of the matrix generation part is reduced dramatically, from $O(m^2)$ to $O(m)$. When the divider is used as a building block in a larger system, further hardware savings can be achieved through sub-structure sharing techniques.
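A minimal Python sketch of the underlying idea, division as the solution of a linear system over GF(2) by Gauss-Jordan elimination, is given below. It assumes a polynomial (standard) basis and uses the AES field polynomial $x^8+x^4+x^3+x+1$ (0x11B) purely as an example; the paper itself works in the dual basis, where the system takes its Wiener-Hopf form. The helper names `gf2m_mul` and `gf2m_div_gauss_jordan` are hypothetical. The dense matrix built here has $m^2$ entries; the paper's $O(m)$ matrix generation relies on its dual-basis formulation, which this sketch does not reproduce.

```python
# Hypothetical sketch: polynomial basis, not the dual basis used in the paper.


def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a and b in GF(2^m); poly is the field polynomial incl. the x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce modulo the field polynomial
    return result


def gf2m_div_gauss_jordan(a: int, b: int, m: int, poly: int) -> int:
    """Return c such that b * c = a in GF(2^m), via Gauss-Jordan elimination over GF(2)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    # Column j of the multiplication-by-b matrix is b * x^j (shift and reduce).
    cols, col = [], b
    for _ in range(m):
        cols.append(col)
        col <<= 1
        if (col >> m) & 1:
            col ^= poly
    # Augmented rows as bitmasks: bit j holds entry (i, j), bit m holds bit i of a.
    rows = []
    for i in range(m):
        row = sum(((cols[j] >> i) & 1) << j for j in range(m))
        row |= ((a >> i) & 1) << m
        rows.append(row)
    # Gauss-Jordan: pivot on each column and clear it from every other row.
    for p in range(m):
        pivot = next(r for r in range(p, m) if (rows[r] >> p) & 1)  # exists since b != 0
        rows[p], rows[pivot] = rows[pivot], rows[p]
        for r in range(m):
            if r != p and (rows[r] >> p) & 1:
                rows[r] ^= rows[p]
    # The coefficient block is now the identity, so bit m of row i is c_i.
    return sum(((rows[i] >> m) & 1) << i for i in range(m))


AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, chosen only as an example field
print(hex(gf2m_div_gauss_jordan(0x01, 0x53, 8, AES_POLY)))  # 0xca, the inverse of 0x53
```

Every row operation is an XOR of bit vectors, which is why this formulation maps naturally onto simple gate-level hardware.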
Wolfram Drescher, Technical University of Dresden (Germany)
Kay Bachmann, Technical University of Dresden (Germany)
Gerhard P. Fettweis, Technical University of Dresden (Germany)
This paper examines the implementation of Finite Field arithmetic (multiplication, division, and exponentiation) for any standard basis $GF(2^m)$ with $m \le 8$ on a DSP datapath. We show how the cells and the interconnection structure of a typical binary multiplier unit can be exploited for the Finite Field operations by adding only a small amount of logic. At the algorithm level, we derive division and exponentiation from multiplication, and we present a simple scheme for implementing all operations on a processor datapath.
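The following Python sketch illustrates, under our own assumptions, how exponentiation and division can be derived from multiplication alone: exponentiation by square-and-multiply, and division as $a/b = a \cdot b^{2^m-2}$, which holds because $b^{2^m-1}=1$ for every nonzero $b$. The bit-serial multiply stands in for the reused multiplier cells; the function names and example polynomials are illustrative and are not taken from the paper.

```python
# Hypothetical sketch: all operations reduce to one GF(2^m) multiply routine.


def gf_mul(a: int, b: int, m: int, poly: int) -> int:
    """Shift-and-add multiplication in GF(2^m), reducing modulo poly (incl. x^m)."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def gf_pow(a: int, e: int, m: int, poly: int) -> int:
    """Square-and-multiply exponentiation built only on gf_mul."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a, m, poly)
        a = gf_mul(a, a, m, poly)  # squaring
        e >>= 1
    return result


def gf_div(a: int, b: int, m: int, poly: int) -> int:
    """a / b = a * b^(2^m - 2), valid because b^(2^m - 1) = 1 for nonzero b."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    return gf_mul(a, gf_pow(b, (1 << m) - 2, m, poly), m, poly)


# The same routines serve any m <= 8 by changing m and poly, e.g. GF(2^4) with x^4 + x + 1.
print(gf_pow(0b0010, 5, 4, 0b10011))  # x^5 = x^2 + x, printed as 6
print(hex(gf_div(0x01, 0x53, 8, 0x11B)))
```

This naive form reuses a single multiplier at the cost of on the order of $2m$ sequential multiplications per division.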
Seunghyeon Nahm, Seoul National University (Korea)
Wonyong Sung, Seoul National University (Korea)
This paper describes a new direction sequence generation method for the circular CORDIC algorithm. The conventional approach uses an angle computation algorithm to control the direction of rotation in the form of a sign sequence, and this sign generation is the bottleneck in fast implementations. The proposed method reduces the number of sequential computations by employing a new angle representation model and by linearizing the arctangent function for small angles. The direction sequence can be generated with about one third of the iterative computations required by the conventional algorithm, which reduces the hardware requirements by a similar proportion. The algorithm is especially attractive when pipelining is not allowed because of feedback control, as in phase tracking applications. A VLSI implementation example for a high-speed quadrature demodulator is also discussed.
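The sketch below contrasts the conventional sign generation, where each direction $d_i$ depends on the residual angle left by step $i-1$, with one hypothetical way to exploit the small-angle approximation $\arctan(2^{-i}) \approx 2^{-i}$: after $k$ sequential steps, the remaining signs are read directly from the binary expansion of the residual angle. Because $\arctan(2^{-i}) = 2^{-i} - 2^{-3i}/3 + \dots$, the approximation error becomes negligible once $3k$ is about $n$, which is consistent with the "one third" figure above. This is our own illustration, not the authors' angle representation model; the function names are hypothetical and floating point stands in for fixed-point hardware.

```python
import math

# Hypothetical illustration (not the authors' method): floating point stands in
# for fixed-point hardware, and all function names are our own.


def directions_sequential(theta: float, n: int) -> list[int]:
    """Conventional scheme: sign d_i depends on the residual angle after step i-1."""
    z, dirs = theta, []
    for i in range(n):
        d = 1 if z >= 0 else -1
        dirs.append(d)
        z -= d * math.atan(2.0 ** -i)
    return dirs


def directions_linearized(theta: float, n: int, k: int) -> list[int]:
    """k sequential steps, then the remaining n-k signs are read from the residual
    angle, assuming atan(2^-i) ~= 2^-i for i >= k."""
    dirs = directions_sequential(theta, k)
    z = theta - sum(d * math.atan(2.0 ** -i) for i, d in enumerate(dirs))
    # Solve z ~= sum_{i=k}^{n-1} d_i 2^-i with d_i = 2 b_i - 1 in {-1, +1}:
    # then sum_i b_i 2^-i = (z + s) / 2, where s = sum_{i=k}^{n-1} 2^-i.
    s = sum(2.0 ** -i for i in range(k, n))
    lsb = 2.0 ** -(n - 1)
    q = round((z + s) / 2 / lsb)
    q = max(0, min(q, (1 << (n - k)) - 1))  # clamp to the n-k available bits
    dirs += [2 * ((q >> (n - 1 - i)) & 1) - 1 for i in range(k, n)]
    return dirs


def rotate(dirs: list[int]) -> tuple[float, float]:
    """Shift-and-add rotations of (1/K, 0); the result approximates (cos, sin)."""
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(len(dirs)))
    x, y = 1.0 / gain, 0.0
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x, y


theta, n, k = 0.7, 24, 8  # k = n / 3 sequential steps
print(rotate(directions_sequential(theta, n)))
print(rotate(directions_linearized(theta, n, k)))
print(math.cos(theta), math.sin(theta))
```

Only the first $k$ signs form a serial dependency chain in the second scheme; the rest are available in parallel once the residual angle is known.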
Chieh-Chih Li, Industrial Research Institute (Taiwan)
Sau-Gee Chen, Nat. Chiao Tung University (Taiwan)
In this work, a fast radix-4 redundant CORDIC algorithm with variable scale factor is proposed. The algorithm includes an on-line scale factor decomposition algorithm that transforms the complicated variable scale factor into a sequence of simple shift-and-add operations and performs the variable scale factor compensation in the same fashion. Moreover, the on-line decomposition algorithm itself can be realized with simple and fast hardware. The new CORDIC algorithm requires only 0.8n iterations, the smallest number among all CORDIC algorithms and only about two thirds of the rotations needed by the best existing redundant algorithms (hybrid radix-2 and radix-4). The new algorithm therefore achieves fast rotation iterations together with high-speed, low-overhead scale factor compensation, a combination that is hard to attain with existing algorithms. The on-line scale factor compensation can also be applied to existing on-line CORDIC algorithms.
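As a rough illustration of the two ingredients, the Python sketch below assumes the common radix-4 formulation in which iteration $i$ rotates with shift $4^{-i}$ and digit $\sigma_i \in \{-2,\dots,2\}$, so the scale factor $K=\prod_i \sqrt{1+\sigma_i^2 4^{-2i}}$ changes with the digit sequence. It then approximates $1/K$ by a product of shift-and-add factors $(1 + c_j 2^{-j})$ with $c_j \in \{-1,0,1\}$, chosen greedily in the log domain. This off-line greedy decomposition is a hypothetical stand-in for the paper's on-line algorithm, and the function names are ours.

```python
import math

# Hypothetical sketch: assumed radix-4 scale factor model plus a greedy (off-line)
# shift-and-add decomposition; not the paper's on-line algorithm.


def radix4_scale_factor(sigmas: list[int], start: int = 1) -> float:
    """K = prod_i sqrt(1 + sigma_i^2 * 4^(-2i)) for digits sigma_i in {-2, ..., 2}.
    Zero digits contribute a factor of 1, which is why K is variable."""
    return math.prod(math.sqrt(1.0 + (s * 4.0 ** -i) ** 2)
                     for i, s in enumerate(sigmas, start=start))


def shift_add_factors(target: float, terms: int) -> list[int]:
    """Greedily pick c_j in {-1, 0, 1} so that prod_j (1 + c_j * 2^-j) ~= target."""
    residual = math.log(target)
    coeffs = []
    for j in range(1, terms + 1):
        # Choose the factor whose logarithm best cancels what is left.
        c = min((-1, 0, 1), key=lambda cand: abs(residual - math.log1p(cand * 2.0 ** -j)))
        coeffs.append(c)
        residual -= math.log1p(c * 2.0 ** -j)
    return coeffs


def apply_factors(value: float, coeffs: list[int]) -> float:
    """Compensation as shift-and-add steps: v <- v + c_j * (v >> j)."""
    for j, c in enumerate(coeffs, start=1):
        value += c * value * 2.0 ** -j
    return value


sigmas = [2, -1, 0, 1, 0, -2, 1]  # example digit sequence (hypothetical)
k = radix4_scale_factor(sigmas)
coeffs = shift_add_factors(1.0 / k, terms=24)
print(k, coeffs, apply_factors(k, coeffs))  # last value should be close to 1.0
```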
Jun P. Ma, University of Minnesota (U.S.A.)
Keshab K. Parhi, University of Minnesota (U.S.A.)
Ed F. Deprettere, Delft University of Technology (The Netherlands)
CORDIC-based IIR digital filters possess desirable properties for VLSI implementation, such as local connections, regularity, and good finite word-length behavior, but they cannot be pipelined at finer levels (such as the bit or multi-bit level) because of their feedback loops. In this paper, a pipelining method for CORDIC-based IIR digital filters is proposed that uses constrained filter design methods and the polyphase decomposition technique. With this method, the filter sample rate can be increased to any desired level.
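Why a denominator expressed in powers of $z^{-M}$ enables finer pipelining can be seen from the classic look-ahead identity for a first-order section, $\frac{1}{1-az^{-1}} = \frac{\sum_{k=0}^{M-1} a^k z^{-k}}{1-a^M z^{-M}}$: the recursion then depends only on $y[n-M]$, so the loop holds $M$ delays that can be redistributed as pipeline registers. The Python sketch below checks this equivalence numerically in direct form. It is a simplified stand-in, not the paper's CORDIC-based constrained design or its polyphase structure; the function names are hypothetical.

```python
# Hypothetical sketch of the look-ahead identity in direct form (not CORDIC-based).


def first_order_iir(x: list[float], a: float) -> list[float]:
    """Original recursion y[n] = a*y[n-1] + x[n]: a single delay in the feedback loop."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y


def look_ahead_iir(x: list[float], a: float, m: int) -> list[float]:
    """Same filter written as (sum_{k<m} a^k z^-k) / (1 - a^m z^-m)."""
    # Non-recursive numerator: a feed-forward FIR section, pipelinable at will.
    v = [sum(a ** k * x[n - k] for k in range(m) if n >= k) for n in range(len(x))]
    # Recursive part: y[n] depends on y[n - m], so the loop contains m delays.
    y: list[float] = []
    for n in range(len(x)):
        feedback = a ** m * y[n - m] if n >= m else 0.0
        y.append(v[n] + feedback)
    return y


x = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75, 0.3]
print(first_order_iir(x, 0.9))
print(look_ahead_iir(x, 0.9, m=4))  # identical up to rounding
```

In this first-order case the pole moves from $a$ to $a^M$, which stays inside the unit circle when $|a|<1$; the price is $M-1$ extra feed-forward numerator taps.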
Chris J. Myers, University of Utah (U.S.A.)
Hao Zheng, University of Utah (U.S.A.)
We present an efficient asynchronous VLSI architecture for calculating running maximum or minimum values over a sliding window. Running maximums and minimums are useful in many signal and image processing tasks. Our architecture performs the calculation using the MAXLIST algorithm. To take advantage of the wide delay variations caused by data dependencies and operating conditions, an asynchronous approach is taken to achieve higher performance and lower power. Simulation results demonstrate that our asynchronous architecture is significantly faster than existing and potential synchronous architectures.
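For readers unfamiliar with the task, the Python sketch below computes a running maximum over a sliding window with the well-known monotonic-deque method. This is a swapped-in software technique for illustration and is not necessarily the MAXLIST algorithm used in the paper; a running minimum follows by negating the samples. In this software analogue the work per sample also depends on the data (how many candidates are discarded).

```python
from collections import deque

# Illustrative sketch: monotonic-deque running maximum, not necessarily MAXLIST.


def running_max(samples: list[int], window: int) -> list[int]:
    """Maximum of each length-`window` sliding window over `samples`."""
    candidates: deque = deque()  # indices whose values decrease from front to back
    out = []
    for n, value in enumerate(samples):
        # Drop candidates that can never be the maximum again (data-dependent work).
        while candidates and samples[candidates[-1]] <= value:
            candidates.pop()
        candidates.append(n)
        # Drop the front candidate once it slides out of the window.
        if candidates[0] <= n - window:
            candidates.popleft()
        if n >= window - 1:
            out.append(samples[candidates[0]])
    return out


print(running_max([1, 3, 2, 5, 4, 1, 0], window=3))  # [3, 5, 5, 5, 4]
```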
Wolfgang Wilhelm, RWTH Aachen (Germany)
Tobias Noll, RWTH Aachen (Germany)
A systematic mapping approach leading to efficient VLSI architectures for FIR filters over a wide range of system parameters is presented. The approach is divided into two steps. In the first step, the folding technique is applied at the bit level. In the second step, the free parameters of this technique are fixed according to guidelines derived from design strategies for efficient VLSI architectures. For many applications, this approach leads to reduced hardware complexity compared with state-of-the-art techniques. In addition, the regularity and scalability of the resulting architectures keep the design effort small. To demonstrate the efficiency and flexibility of the approach, a new class of efficient time-shared FIR filters for adaptive equalization and a new class of efficient matched filters for rapid code acquisition in spread spectrum receivers are presented.
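To make bit-level folding concrete, the Python sketch below computes an FIR output by time-multiplexing over coefficient bit-planes: in each step only single-bit products (AND gates in hardware) are summed, and the partial sum is shifted into place. Here the folding factor simply equals the coefficient word length and the coefficients are assumed unsigned; these choices, and the function names, are illustrative assumptions rather than the parameter guidelines derived in the paper.

```python
# Hypothetical sketch: bit-level folding with one coefficient bit-plane per step.


def fir_direct(x: list[int], h: list[int]) -> list[int]:
    """Reference: y[n] = sum_k h[k] * x[n-k] with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k)
            for n in range(len(x))]


def fir_bit_level_folded(x: list[int], h: list[int], coeff_bits: int) -> list[int]:
    """Same result, computed one coefficient bit-plane per (time-shared) step.
    Assumes unsigned coefficients below 2**coeff_bits."""
    y = []
    for n in range(len(x)):
        acc = 0
        for b in range(coeff_bits):  # folding: reuse one adder tree per bit-plane
            # Single-bit products: add x[n-k] only where bit b of h[k] is set.
            partial = sum(x[n - k] for k in range(len(h))
                          if n >= k and (h[k] >> b) & 1)
            acc += partial << b      # weight the bit-plane by 2^b
        y.append(acc)
    return y


x = [3, -1, 4, 1, -5, 9, 2]
h = [5, 0, 7, 2]  # unsigned 3-bit coefficients
print(fir_direct(x, h) == fir_bit_level_folded(x, h, coeff_bits=3))  # True
```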