VLSI Building Blocks

Chair: John V. McCanny, The Queen's University of Belfast, UK

A 35uW 1.1V Gate Array 8x8 IDCT Processor for Video-Telephony

Authors:

Roberto Rambaldi, Universita di Bologna (Italy)
Alessandro Uguzzoni, Universita di Bologna (Italy)
Roberto Guerrieri, Universita di Bologna (Italy)

Volume 5, Page 2993, Paper number 1503

Abstract:

We have designed and fabricated a low-power IC that performs the inverse 8x8 DCT according to the CCITT precision specifications and is suitable for portable video communication devices. Several design techniques have been used to reduce power, such as a fast algorithm, an architecture that can exploit input signal correlation, and a large amount of parallelism. The chip is fabricated in a triple-metal 0.5 um gate array CMOS technology. The maximum throughput is 400 Kpix/s at 1.1 V and 27 Mpix/s at 3.3 V, and the measured power consumption is 35 uW for typical image sequences in color QCIF format at 10 frames/s with a 1.1 V power supply.

ic981503.pdf (From Postscript)
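
As a rough illustration of the transform this chip computes, the sketch below implements a separable 8x8 inverse DCT in floating-point Python. It is not the authors' fast algorithm or architecture. The zero-skipping in `idct_1d` is only an assumed, simplified stand-in for the idea of exploiting input correlation (many coefficients of typical video blocks are zero), and all function names are hypothetical.

```python
import math

N = 8


def _scale(k):
    # Orthonormal DCT-II scale factor.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# BASIS[k][n] = scale(k) * cos((2n + 1) * k * pi / (2N))
BASIS = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]


def idct_1d(coeffs):
    """Inverse 8-point DCT that skips zero coefficients (assumed shortcut)."""
    if not any(coeffs):
        return [0.0] * N  # an all-zero row or column needs no arithmetic
    return [sum(coeffs[k] * BASIS[k][n] for k in range(N) if coeffs[k])
            for n in range(N)]


def dct_1d(samples):
    """Forward 8-point DCT, used here only to check the inverse."""
    return [sum(samples[n] * BASIS[k][n] for n in range(N)) for k in range(N)]


def transpose(block):
    return [list(row) for row in zip(*block)]


def idct_8x8(block):
    """Row pass followed by column pass (separable 2D inverse DCT)."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d(c) for c in transpose(rows)]
    return transpose(cols)


def dct_8x8(block):
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in transpose(rows)]
    return transpose(cols)
```

A block whose only nonzero coefficient is the DC term reconstructs to a flat block, and in that case seven of the eight row transforms are skipped entirely.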

Discrete Cosine Transform Generator for VLSI Synthesis

Authors:

Jill K Hunter, The Queen's University of Belfast (Northern Ireland)
John V. McCanny, The Queen's University of Belfast (Northern Ireland)

Volume 5, Page 2997, Paper number 1053

Abstract:

A generator for the automated design of Discrete Cosine Transform (DCT) cores is presented. It can be used to rapidly create silicon circuits from a high-level specification, and the resulting circuits compare very favourably with existing designs. The DCT cores produced are scaleable in terms of point size as well as input/output and coefficient wordlengths, which provides a high degree of flexibility. An example 8-point 1D DCT design occupies less than 0.92 mm2 when implemented in a 0.35 um double-level-metal CMOS technology and can be clocked at 100 MHz.

ic981053.pdf (From Postscript)

A New Architecture for In-Memory Image Convolution

Authors:

Vasily G. Moshnyaga, Kyoto University (Japan)
Kazuhiro Suzuki, Kyoto University (Japan)
Keikichi Tamaru, Kyoto University (Japan)

Volume 5, Page 3001, Paper number 2443

Abstract:

A new memory-based architecture for real-time image convolution with variable kernels is proposed. The architecture exploits the highest possible bandwidth inherent in memory and achieves the fine-grain parallelism of computations inside the memory. Unlike existing approaches, the architecture ensures convolution with very large kernels under the real time constraints of video applications. It does not require external memory banks or large I/O count and features single chip VLSI implementation.

ic982443.pdf (From Postscript)

Reconfigurable Hardware for Efficient Implementation of Programmable FIR Filters

Authors:

Tracy C Denk, Lucent Technologies (U.S.A.)
Chris J Nicol, Lucent Technologies (U.S.A.)
Patrik Larsson, Lucent Technologies (U.S.A.)
Kamran Azadet, Lucent Technologies (U.S.A.)

Volume 5, Page 3005, Paper number 2535

Abstract:

We present the architecture of a programmable FIR filter for use in DSP and communication applications. A filter with this architecture is capable of running a wide variety of single-rate and multirate filtering algorithms with low latency. Flexibility is achieved by distributed register files that store input data and filter coefficients. The functionality of the filter is programmed by a set of pipelined control signals that are independent of the filter length. We demonstrate how to generate these control signals for a variety of configurations. In addition to its flexibility, the architecture is scalable, modular, and has no broadcast signals, making it ideally suited for VLSI implementations.

ic982535.pdf (From Postscript)

Low Power FIR Filter Realization with Differential Coefficients and Input

Authors:

Tian-Sheuan Chang, National Chiao-Tung University (Taiwan)
Chein-Wei Jen, National Chiao-Tung University (Taiwan)

Volume 5, Page 3009, Paper number 2260

Abstract:

Most FIR filter realizations use the inputs and coefficients directly to compute the convolution. In this paper, we present a low-power, high-speed FIR filter design that uses the first-order difference between inputs and various orders of differences between coefficients. The design first reformulates the FIR operations in terms of these differences at the algorithm level. Then, at the architecture level, we adopt the DA architecture to exploit the probability distribution so that power consumption can be reduced further. The design is applied to an example FIR filter to quantify the energy savings and speedup. It shows lower power consumption than the previous design with comparable performance.

ic982260.pdf (From Postscript)
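
The reformulation with coefficient differences can be illustrated with a small sketch. It covers only first-order coefficient differences, not the input differences, higher-order differences, or DA architecture described in the abstract, and the function names are hypothetical. It relies on the identity c[k]*x[n-k] = c[k-1]*x[n-k] + (c[k]-c[k-1])*x[n-k], where the first term is the product that tap k-1 already computed for the previous output sample. Only the (typically small) differences then need to be multiplied.

```python
def fir_direct(coeffs, samples):
    """Reference FIR filter: y[n] = sum_k c[k] * x[n-k], zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def fir_differential(coeffs, samples):
    """FIR filter using first-order coefficient differences.

    prev[k] holds c[k] * x[n-1-k] from the previous output sample, so the
    product for tap k is prev[k-1] + (c[k] - c[k-1]) * x[n-k].
    """
    taps = len(coeffs)
    deltas = [coeffs[k] - coeffs[k - 1] for k in range(1, taps)]
    prev = [0] * taps    # stored products from the previous sample
    delay = [0] * taps   # delay[k] = x[n-k]
    out = []
    for x in samples:
        delay = [x] + delay[:-1]
        cur = [coeffs[0] * x]  # only tap 0 uses the full coefficient
        for k in range(1, taps):
            cur.append(prev[k - 1] + deltas[k - 1] * delay[k])
        out.append(sum(cur))
        prev = cur
    return out
```

With integer coefficients and samples the two functions agree exactly. For smooth coefficient sets such as `[3, 5, 6, 5, 3]` the differences `[2, 1, -1, -2]` have smaller magnitude than the coefficients, which is the property this kind of scheme depends on.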

A New Approach to Data Conversion: Direct Analog-to-Residue Converter

Authors:

Damu Radhakrishnan, Nanyang Technological University (Singapore)
Adimathara P Preethy, Nanyang Technological University (Singapore)

Volume 5, Page 3013, Paper number 1893

Abstract:

A novel design of a direct analog-to-residue converter is presented in this paper. The design makes use of two successive approximation analog-to-digital (A/D) converters, a few modulo adders and a small look-up table. One of the digital-to-analog converters is modified to generate outputs which are weighted by a constant factor, and one of the comparators is replaced by a difference amplifier. The look-up table needed is a very small percentage of the entire chip area and is shown to be only 840 bytes for a 36-bit residue number system converter.

ic981893.pdf (From Postscript)

Low Power Signal Processing Architectures Using Residue Arithmetic

Authors:

Manish Bhardwaj, Siemens Components Pte Ltd (Singapore)
Arjun Balaram, Siemens Components Pte Ltd (Singapore)

Volume 5, Page 3017, Paper number 2409

Abstract:

Recent trends like increasing frequencies, larger die sizes and demand for greater portability make power reduction a hard taskmaster. It is acknowledged that the greatest returns come from optimisations at the architectural and technology levels. In this paper, we present, for the first time, residue architectures that reduce power by more than 70% without changes in technology. This reduction is achieved without sacrificing performance and with a minimal sacrifice in area (less than 60%). The key to such low-power solutions is trading off the speed gained by parallelism for lower power. Existing proposals that achieve similar trade-offs demand an area increase of more than a factor of two and also increase control complexity. Other benefits of using residue arithmetic for low power are a significant reduction in peak current and increased design locality. The role of the number of computations per forward (or reverse) conversion in determining the power characteristics of the system is also analysed and explained. The effectiveness of the methodology is illustrated using a system that extracts a 256-point FFT of the input signal.

ic982409.pdf (Scanned)
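
The parallelism that residue arithmetic offers can be seen in a minimal sketch of a residue number system. The moduli set (7, 8, 9) and all function names are assumptions chosen for illustration and do not come from the paper. Each residue channel works on a narrow word independently of the others, with one forward conversion per operand and one reverse conversion (here via the Chinese Remainder Theorem) per result.

```python
from math import prod

# Assumed moduli set: pairwise coprime, dynamic range 7 * 8 * 9 = 504.
MODULI = (7, 8, 9)


def to_residues(x, moduli=MODULI):
    """Forward conversion: integer -> tuple of residues."""
    return tuple(x % m for m in moduli)


def rns_mac(acc, a, b, moduli=MODULI):
    """Multiply-accumulate acc + a * b, done independently in each channel."""
    return tuple((r + p * q) % m for r, p, q, m in zip(acc, a, b, moduli))


def from_residues(residues, moduli=MODULI):
    """Reverse conversion via the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


def rns_dot(xs, ys, moduli=MODULI):
    """Dot product with a single reverse conversion at the end."""
    acc = to_residues(0, moduli)
    for x, y in zip(xs, ys):
        acc = rns_mac(acc, to_residues(x, moduli), to_residues(y, moduli), moduli)
    return from_residues(acc, moduli)
```

The result is correct only while the true value stays inside the dynamic range (below 504 for this moduli set). The more multiply-accumulate steps are performed per conversion, the smaller the share of conversion overhead.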

Designing Efficient Residue Arithmetic Based VLSI Correlators

Authors:

Aniruddha A Deodhar, Siemens Components Pte Ltd (Singapore)
Manish Bhardwaj, Siemens Components Pte Ltd (Singapore)
C. T Clarke, Submetrics (U.K.)
T Srikanthan, Nanyang Technological University (Singapore)

Volume 5, Page 3021, Paper number 2408

Abstract:

The most important reason for the lack of commercial residue arithmetic (RA) based systems is not the "slow" and area-consuming reverse conversion, but the absence of research that explores the system-level trade-offs of such arithmetic in actual VLSI implementations. These system-level issues are: the choice of the moduli set, the effect of moduli imbalance on the resulting VLSI implementation, the choice of the reverse and forward converters, the use of lookup versus computation for modular operations, the system characteristics that indicate RA suitability and, finally, typical VLSI area and performance figures. This paper addresses these concerns by presenting novel RA architectures for VLSI correlators employed in radio astronomy and ultrasonic blood flow measurement. A state-of-the-art, high-performance (80-100 MHz), RA-based correlator ASIC was successfully fabricated as a result of this research.

ic982408.pdf (Scanned)

Pipelined Cordic Based QRD-MVDR Adaptive Beamforming

Authors:

Jun Ma, University of Minnesota (U.S.A.)
Keshab K. Parhi, University of Minnesota (U.S.A.)
Ed F. Deprettere, Delft University of Technology (The Netherlands)

Volume 5, Page 3025, Paper number 1324

Abstract:

Cordic-based QRD-MVDR adaptive beamforming algorithms possess desirable properties for VLSI implementation, such as regularity and good finite-wordlength behavior. However, they suffer from a speed limitation due to the presence of recursive operations. In this paper, a fine-grain pipelined Cordic-based QRD-MVDR adaptive beamforming algorithm is developed using the matrix lookahead technique. The proposed architecture can operate at arbitrarily high sample rates and consists only of Givens rotations, which can be mapped onto a Jacobi-specific dataflow processor. It requires a complexity of $O(M(p^2+Kp))$ Givens rotations per sample time, where p is the number of antenna elements, K is the number of look direction constraints, and M is the pipelining level.

ic981324.pdf (From Postscript)
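
For readers unfamiliar with the building block, the sketch below shows how a single Givens rotation can be computed with Cordic shift-and-add micro-rotations. It illustrates only the basic rotation, not the pipelined lookahead algorithm of the paper. The iteration count, the floating-point arithmetic, and the function names are assumptions made for the example. In vectoring mode the pair (x, y) is rotated until y is zero, and the recorded micro-rotation directions are then replayed on other pairs in the same two rows.

```python
import math

ITERATIONS = 24  # assumed number of micro-rotations

# Constant Cordic gain accumulated over all micro-rotations.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_vectoring(x, y):
    """Rotate (x, y) onto the positive x axis.

    Returns the radius and the recorded steps (flip flag, directions).
    """
    flip = x < 0
    if flip:  # pre-rotate by 180 degrees to stay in the convergence range
        x, y = -x, -y
    directions = []
    for i in range(ITERATIONS):
        d = -1 if y > 0 else 1  # choose the direction that drives y to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        directions.append(d)
    return x / GAIN, (flip, directions)


def cordic_rotate(x, y, steps):
    """Apply previously recorded micro-rotations to another pair (x, y)."""
    flip, directions = steps
    if flip:
        x, y = -x, -y
    for i, d in enumerate(directions):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x / GAIN, y / GAIN
```

Vectoring on (3, 4) gives a radius close to 5. Replaying the same steps on (1, 2) gives approximately (2.2, 0.4), which matches the Givens rotation with c = 3/5 and s = 4/5.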

A Systolic VLSI Implementation of Kalman-Filter-Based Algorithms for Signal Reconstruction

Authors:

Daniel Massicotte, Universite du Quebec a Trois-Rivieres (Canada)

Volume 5, Page 3029, Paper number 1895

Abstract:

The problem of improving the performance of VLSI implementations of Kalman-based algorithms for real-time signal reconstruction is discussed. A systolic approach is proposed to develop an architecture expressly for this application. The implemented algorithms are based on the steady-state version of the Kalman filter, which performs well across a broad range of applications, although the use of a co-processor for the Kalman gain is allowed. We show that the autoregressive model of Kalman filtering is particularly well adapted to parallel processing and well suited for implementation. Although the architecture is intended to improve signal reconstruction, it also supports other applications that require a similar autoregressive model of Kalman filtering. The performance of the systolic architecture is validated by comparison with Motorola's general-purpose DSP56002 digital signal processor for real-world spectrometric signal reconstruction.