ICASSP '98 Main Page

Abstract - VLSI1

VLSI1.1	A 35uW 1.1V Gate Array 8x8 IDCT Processor for Video-Telephony
R. Rambaldi, A. Uguzzoni, R. Guerrieri (Universita' di Bologna, Italy)
We have designed and fabricated a low-power IC that performs the 8x8 inverse DCT according to the CCITT precision specifications and is suitable for portable video communication devices. Several design techniques have been used to reduce power, such as a fast algorithm, an architecture that can exploit input signal correlation, and a large amount of parallelism. The chip is fabricated in a triple-metal 0.5um gate array CMOS technology. The maximum throughput is 400 Kpix/s at 1.1V and 27 Mpix/s at 3.3V. The measured power consumption is 35 uW for typical image sequences in color QCIF format at 10 frames/sec with a 1.1 V power supply.
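For readers unfamiliar with the transform itself, the following is a minimal reference sketch of an 8x8 inverse DCT computed as two separable 1D passes. It assumes the orthonormal DCT scaling and uses floating point; it is not the fast algorithm or the low-power architecture of the chip described above, and the function names are hypothetical.

```python
import math

N = 8  # block size assumed for this sketch


def idct_1d(coeffs):
    """Orthonormal inverse DCT of a length-N sequence (direct O(N^2) form)."""
    out = []
    for n in range(N):
        total = 0.0
        for k in range(N):
            # Orthonormal scaling: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            total += scale * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(total)
    return out


def idct_8x8(block):
    """2D inverse DCT: transform every row, then every column."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds pixel (r, c), so transpose back to row-major order.
    return [[cols[c][r] for c in range(N)] for r in range(N)]


# A block with only a DC coefficient reconstructs to a flat block of value DC / 8.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 80.0
flat = idct_8x8(dc_only)
```

A hardware implementation would replace the inner loops with a fast factorization and fixed-point arithmetic sized to meet the precision specification.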
VLSI1.2	Discrete Cosine Transform Generator for VLSI Synthesis
J. Hunter, J. McCanny (The Queen's University of Belfast, N. Ireland)
A generator for the automated design of Discrete Cosine Transform (DCT) cores is presented. It can be used to rapidly create silicon circuits from a high-level specification, and the resulting circuits compare very favourably with existing designs. The DCT cores produced are scalable in terms of point size as well as input/output and coefficient wordlengths, which provides a high degree of flexibility. An example 8-point 1D DCT design occupies less than 0.92 mm^2 when implemented in a 0.35um double-level-metal CMOS technology and can be clocked at a rate of 100 MHz.
VLSI1.3	A New Architecture for In-Memory Image Convolution
V. Moshnyaga, K. Suzuki, K. Tamaru (Kyoto University, Japan)
A new memory-based architecture for real-time image convolution with variable kernels is proposed. The architecture exploits the highest possible bandwidth inherent in memory and achieves fine-grain parallelism of computations inside the memory. Unlike existing approaches, the architecture supports convolution with very large kernels under the real-time constraints of video applications. It requires neither external memory banks nor a large I/O count, and it permits a single-chip VLSI implementation.
VLSI1.4	Reconfigurable Hardware for Efficient Implementation of Programmable FIR Filters
T. Denk, C. Nicol, P. Larsson, K. Azadet (Lucent Technologies, USA)
We present the architecture of a programmable FIR filter for use in DSP and communication applications. A filter with this architecture is capable of running a wide variety of single-rate and multirate filtering algorithms with low latency. Flexibility is achieved by distributed register files that store input data and filter coefficients. The functionality of the filter is programmed by a set of pipelined control signals that are independent of the filter length. We demonstrate how to generate these control signals for a variety of configurations. In addition to its flexibility, the architecture is scalable, modular, and has no broadcast signals, making it ideally suited for VLSI implementations.
VLSI1.5	Low Power FIR Filter Realization with Differential Coefficients and Input
T. Chang, C. Jen (National Chiao-Tung University, Taiwan, ROC)
Most FIR filter realizations use the inputs and coefficients directly to compute the convolution. In this paper, we present a low-power, high-speed FIR filter design that uses the first-order difference between inputs and various orders of differences between coefficients. The design first reformulates the FIR operations in terms of these differences at the algorithm level. Then, at the architecture level, we adopt the distributed arithmetic (DA) architecture to exploit the probability distribution of the data so that power consumption can be reduced further. The design is applied to an example FIR filter to quantify the energy savings and speedup. It shows lower power consumption than the previous design at comparable performance.
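The algorithm-level idea of working with differences can be illustrated with a small sketch. It shows only the first-order input-difference identity, y[n] = y[n-1] + sum_k c[k] (x[n-k] - x[n-k-1]), and checks it against a direct FIR. The coefficient differences and the DA architecture mentioned in the abstract are not modelled, and the function names are hypothetical.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: y[n] = sum_k c[k] * x[n-k], with x[m] = 0 for m < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def fir_input_difference(coeffs, samples):
    """Same filter, computed from first-order input differences.

    Each output is the previous output plus the filtered difference signal.
    When neighbouring samples are correlated the differences are small,
    which is the property a low-power datapath would try to exploit.
    """
    if not samples:
        return []
    # diffs[n] = x[n] - x[n-1], taking x[-1] = 0.
    diffs = [samples[0]] + [samples[n] - samples[n - 1] for n in range(1, len(samples))]
    out = []
    y_prev = 0
    for n in range(len(samples)):
        delta = sum(c * diffs[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        y_prev += delta
        out.append(y_prev)
    return out


coeffs = [1, 3, 3, 1]
samples = [100, 101, 103, 102, 104, 105]
assert fir_input_difference(coeffs, samples) == fir_direct(coeffs, samples)
```

With integer data the two forms agree exactly; in fixed-point hardware the running accumulation would need enough word length to avoid drift.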
VLSI1.6	A New Approach to Data Conversion: Direct Analog-to-Residue Converter
D. Radhakrishnan, A. Preethy (Nanyang Technological University, Singapore)
A novel design of a direct analog-to-residue converter is presented in this paper. The design makes use of two successive-approximation analog-to-digital (A/D) converters, a few modulo adders and a small look-up table. One of the digital-to-analog converters is modified to generate outputs that are weighted by a constant factor, and one of the comparators is replaced by a difference amplifier. The look-up table occupies a very small percentage of the entire chip area and is shown to be only 840 bytes for a 36-bit residue number system converter.
VLSI1.7	Low Power Signal Processing Architectures Using Residue Arithmetic
M. Bhardwaj, A. Balaram (Siemens Components Pte Ltd, Singapore)
Recent trends like increasing frequencies, larger die sizes and demand for greater portability make power reduction a hard taskmaster. It is acknowledged that the greatest returns come from optimisations at the architectural and technology level. In this paper, we present, for the first time, residue architectures that reduce power by more than 70% without changes in technology. This reduction is achieved without sacrificing performance and with minimal sacrifice in area (less than 60%). The key to such low power solutions is trading off the speed gained by parallelism for lower power. Existing proposals that achieve similar trade-offs demand an area increase of more than a factor of two and also increase control complexity. Other benefits of using residue arithmetic for low power are the significant reduction in peak current and increased design locality. The role of the number of computations per forward (or reverse) conversion in determining the power characteristics of the system is also analysed and explained. The effectiveness of the methodology is illustrated using a system that extracts a 256-point FFT of the input signal.
VLSI1.8	Designing Efficient Residue Arithmetic Based VLSI Correlators
A. Deodhar, M. Bhardwaj (Siemens Components Pte Ltd, Singapore); C. Clarke (Submetrics, UK); T. Srikanthan (Nanyang Technological University, Singapore)
The most important reason for the lack of commercial residue arithmetic (RA) based systems is not the "slow" and area-consuming reverse conversion, but the absence of research that explores the system-level trade-offs of such arithmetic in actual VLSI implementations. These system-level issues include the choice of the moduli set, the effect of moduli imbalance on the resulting VLSI implementation, the choice of the reverse and forward converters, the use of lookup versus computation for modular operations, the system characteristics that indicate RA suitability and, finally, typical VLSI area and performance figures. This paper addresses these concerns by presenting novel RA architectures for VLSI correlators employed in radio-astronomy and ultrasonic blood flow measurement. A state-of-the-art, high-performance (80-100 MHz), RA-based correlator ASIC was successfully fabricated as a result of this research.
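The pieces named above (moduli set, forward converter, per-modulus operations, reverse converter) can be sketched in software as follows. The moduli set is a hypothetical choice made only for illustration, the reverse conversion uses the textbook Chinese Remainder Theorem, and inputs are assumed non-negative; none of this reflects the specific architectures of the paper. The three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
MODULI = (251, 253, 255, 256)  # hypothetical pairwise-coprime moduli set

DYNAMIC_RANGE = 1
for m in MODULI:
    DYNAMIC_RANGE *= m  # results are unique only below this product


def to_residues(value):
    """Forward conversion: one small residue per modulus."""
    return tuple(value % m for m in MODULI)


def from_residues(residues):
    """Reverse conversion via the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = DYNAMIC_RANGE // m
        # pow(partial, -1, m) is the modular inverse of partial modulo m.
        total += r * partial * pow(partial, -1, m)
    return total % DYNAMIC_RANGE


def rns_correlate(a, b):
    """Multiply-accumulate carried out independently in each residue channel."""
    acc = [0] * len(MODULI)
    for x, y in zip(a, b):
        rx, ry = to_residues(x), to_residues(y)
        for i, m in enumerate(MODULI):
            acc[i] = (acc[i] + rx[i] * ry[i]) % m
    return from_residues(acc)


a = [3, 1, 4, 1, 5, 9, 2, 6]
b = [2, 7, 1, 8, 2, 8, 1, 8]
assert rns_correlate(a, b) == sum(x * y for x, y in zip(a, b))
```

Each channel works on narrow operands with no carries between channels, which is where the parallelism comes from; the single reverse conversion at the end is amortized over all the multiply-accumulate steps.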
VLSI1.9	Pipelined CORDIC Based QRD-MVDR Adaptive Beamforming
J. Ma, K. Parhi (University of Minnesota, USA); E. Deprettere (Delft University of Technology, The Netherlands)
CORDIC based QRD-MVDR adaptive beamforming algorithms possess desirable properties for VLSI implementation, such as regularity and good finite-word-length behavior. However, these algorithms suffer from a speed limitation due to the recursive operations they contain. In this paper, a fine-grain pipelined CORDIC based QRD-MVDR adaptive beamforming algorithm is developed using the matrix lookahead technique. The proposed architecture can operate at arbitrarily high sample rates and consists of only Givens rotations, which can be mapped onto a Jacobi-specific dataflow processor. It requires a complexity of O(M(p^2+Kp)) Givens rotations per sample time, where p is the number of antenna elements, K is the number of look direction constraints, and M is the pipelining level.
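As background on the basic building block, the sketch below shows CORDIC in vectoring mode, which rotates a pair (x, y) onto the x axis using only shift-and-add style micro-rotations. This is the operation a Givens rotation needs in order to annihilate one element. It is a floating-point illustration with a hypothetical function name and iteration count, not the pipelined or lookahead architecture of the paper.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Rotate (x, y) onto the positive x axis with CORDIC micro-rotations.

    Returns (magnitude, angle), where angle is the original vector's angle
    in radians, defined modulo 2*pi.
    """
    angle = 0.0
    if x < 0:
        # Pre-rotate by pi so the vector lies in the right half-plane,
        # where the micro-rotation sequence is guaranteed to converge.
        x, y = -x, -y
        angle = math.pi
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        d = -1 if y > 0 else 1  # choose the direction that drives y toward 0
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)  # each micro-rotation scales the vector
    return x / gain, angle


magnitude, theta = cordic_vectoring(3.0, 4.0)
```

In hardware the arctangent values and the overall gain are constants fixed by the iteration count, so only shifts, additions and a table of angles are required.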
VLSI1.10	A Systolic VLSI Implementation of Kalman-Filter-Based Algorithms for Signal Reconstruction
D. Massicotte (Universite du Quebec a Trois-Rivieres, Canada)
The problem of improving the performance of VLSI implementations of Kalman-based algorithms for real-time signal reconstruction is discussed. A systolic approach is proposed to develop an architecture expressly for this application. The implemented algorithms are based on the steady-state version of the Kalman filter, which performs well across a broad field of specific applications, but the use of a co-processor for the Kalman gain is allowed. We show that the autoregressive model of Kalman filtering is particularly adapted to parallel processing and is well suited for implementation. Although the architecture is intended to improve signal reconstruction, it can also serve other applications that require a similar autoregressive model of Kalman filtering. The performance of the systolic architecture is validated by comparison with Motorola's general-purpose DSP56002 digital signal processor for real-world spectrometric signal reconstruction.
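To illustrate what "steady-state version" means, here is a minimal scalar sketch: the gain is computed once by iterating the Riccati recursion to convergence, and the filter then runs with that fixed gain. The scalar state model, the parameter names (a, q, r) and the function names are assumptions made for illustration; the signal reconstruction algorithms and the systolic architecture of the paper are not modelled.

```python
def steady_state_gain(a, q, r, iterations=1000):
    """Iterate the scalar Riccati recursion until the Kalman gain settles.

    a: state transition coefficient, q: process noise variance,
    r: measurement noise variance (all hypothetical model parameters).
    """
    p = q
    gain = 0.0
    for _ in range(iterations):
        p_pred = a * a * p + q          # predicted error variance
        gain = p_pred / (p_pred + r)    # Kalman gain for this step
        p = (1.0 - gain) * p_pred       # updated error variance
    return gain


def steady_state_filter(measurements, a, gain, x0=0.0):
    """Run the filter with a precomputed, constant gain."""
    estimates = []
    x = x0
    for z in measurements:
        x_pred = a * x
        x = x_pred + gain * (z - x_pred)  # correct the prediction with the innovation
        estimates.append(x)
    return estimates


gain = steady_state_gain(a=1.0, q=0.01, r=1.0)
estimates = steady_state_filter([5.0] * 200, a=1.0, gain=gain)
```

Because the gain is constant, the per-sample work reduces to a fixed multiply-accumulate pattern, which is the kind of regular computation that maps well onto parallel hardware.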
