ICASSP '98 Main Page
Abstract - VLSI3
VLSI3.1
|
Software Pipelining of Nested Loops for Real-Time DSP Applications
J. Wang (Speech Recognition Software, Nortel Montreal Lab., Canada);
B. Su (William Paterson University of New Jersey, USA)
Modern DSP processors incorporate instruction-level parallelism (ILP), and exploiting that ILP within DSP applications is a challenge. Software pipelining is an efficient technique for exposing ILP in loop programs and is widely used for current microprocessors. Recently it has been used in DSP compilers, but only for innermost loops. This paper proposes a new approach that extends software pipelining from the innermost loop to the entire loop nest in DSP applications. For a perfectly nested loop, after applying any existing software pipelining approach to the innermost loop, we use the so-called pipelining-dovetailing transformation to extend software pipelining to the outer loops. We also present a transformation that converts a non-perfect nested loop into a perfect one. These transformations have been verified on nested loops selected from the DSP compiler challenge C code, and preliminary results are presented.
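As a rough illustration of software pipelining for an innermost loop only (not the pipelining-dovetailing transformation of the paper), the sketch below lists which stage of which iteration issues in each cycle. The function name, the stage names, and the initiation interval of one cycle are assumptions made for the example.

```python
def software_pipeline_schedule(n_iters, stages):
    """Return, for each cycle, the (stage, iteration) pairs issued together.

    Assumes an initiation interval of one cycle: iteration i starts at
    cycle i, and its s-th stage runs at cycle i + s.
    """
    n_cycles = n_iters + len(stages) - 1
    schedule = []
    for cycle in range(n_cycles):
        issued = []
        for s, name in enumerate(stages):
            iteration = cycle - s
            if 0 <= iteration < n_iters:
                issued.append((name, iteration))
        schedule.append(issued)
    return schedule


# Hypothetical three-stage loop body executed for four iterations.
for cycle, ops in enumerate(software_pipeline_schedule(4, ["load", "mul", "store"])):
    print(cycle, ops)
```

The first and last cycles form the prologue and epilogue; the middle cycles are the steady-state kernel, in which one stage from each of three different iterations issues in parallel.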
|
VLSI3.2
|
Improving the Throughput of Flexible-Precision DSPs via Algorithm Transformation
M. Aggarwal,
N. Shanbhag,
N. Ahuja (University of Illinois, USA)
This paper presents a systematic technique to improve the throughput of signal/image processing algorithms implemented on flexible-precision hardware. Many image/signal processing algorithms need only 8-16 bit precision, while available DSPs offer much higher precision (32-bit). A significant performance gain can be obtained if multiple low-precision computations are performed in one cycle of a high-precision DSP. We propose a framework based on the algorithm transformation techniques of unfolding and retiming to systematically map low-precision algorithms onto high-precision DSPs. The throughput improvement obtained by this framework is linearly related to the ratio of the precision provided by the processor to that required by the algorithm. The efficacy of the technique is demonstrated on an IIR filter. We also establish theoretical bounds on the maximum throughput achievable with the proposed methodology.
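The premise that several low-precision operations can share one high-precision operation can be sketched as follows. This is only an illustration of lane packing under assumed parameters (two 16-bit lanes in a 32-bit word, unsigned 8-bit operands); it is not the unfolding-and-retiming framework of the paper, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # models a 32-bit datapath
LANE = 0xFFFF        # assumed 16-bit lane width


def pack2(hi, lo):
    """Pack two unsigned values into the upper and lower 16-bit lanes."""
    return ((hi & LANE) << 16) | (lo & LANE)


def unpack2(word):
    """Split a 32-bit word back into its two 16-bit lanes."""
    return (word >> 16) & LANE, word & LANE


def packed_add(x, y):
    """One 32-bit addition that performs two lane additions at once.

    Correct only while each lane sum fits in 16 bits, so that no carry
    crosses from the lower lane into the upper one.
    """
    return (x + y) & MASK32


# Two 8-bit additions (200 + 55 and 100 + 27) in a single 32-bit add.
print(unpack2(packed_add(pack2(200, 100), pack2(55, 27))))
```

With 8-bit operands each lane sum is at most 510, so the lanes never interfere and the single addition yields both results.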
|
VLSI3.3
|
Loop Scheduling Algorithms for Power Reduction
Z. Yu,
F. Chen,
E. Sha (University of Notre Dame, USA)
The increasing demand for portable computing has made power consumption one of the most critical parameters in the execution of loops, which constitute most of the computation in scientific applications. Reducing the schedule length is usually considered to conflict with reducing power. This paper presents a novel loop pipelining approach that reduces power consumption while also reducing the schedule length. Power consumption is measured by the transition activity between operands of successive operations. Both initial scheduling and loop scheduling across iterations try to reduce the transition activity at the inputs to the functional units. A series of experiments shows that our method achieves considerable reductions in both power dissipation and schedule length.
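One common way to quantify transition activity is the Hamming distance between operands that arrive back to back at the same functional-unit input. The sketch below assumes that metric, which the abstract does not spell out, and uses made-up operand values to show how the order of operations changes the count.

```python
def transition_activity(operands):
    """Total bit flips at one functional-unit input for an operand sequence.

    Assumes non-negative integers and counts the Hamming distance between
    each pair of successive operands.
    """
    return sum(bin(a ^ b).count("1") for a, b in zip(operands, operands[1:]))


# The same four hypothetical operands issued in two different orders.
order_a = [0b0000, 0b1111, 0b0001, 0b1110]
order_b = [0b0000, 0b0001, 0b1111, 0b1110]

print(transition_activity(order_a), transition_activity(order_b))
```

A scheduler that is free to choose between the two orders would prefer the second, since fewer input bits toggle between successive operations.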
|
VLSI3.4
|
Performance Evaluation of Register Allocator for the Advanced DSP of TMS320C80
J. Kim (Seoul National University, Korea);
G. Short (Texas Instruments, UK)
PPCA is an assembly language-level register allocator and instruction compactor for the Advanced DSPs (ADSPs) of the TMS320C80 digital signal processor. It was developed to ease the implementation of time-critical ADSP assembly programs that make heavy use of the ADSP features optimized for multimedia and image computing applications. PPCA takes as input ADSP assembly operations with symbolic variables. It then allocates the ADSP's physical registers to the symbolic variables and rearranges the operations into a highly parallelized, compact format. In this paper, we evaluate the performance of PPCA's register allocation capability using an extensive image computing library for the TMS320C80. We present the basic algorithm of PPCA's register allocation module and describe the performance evaluation approach used. The results show that PPCA achieves essentially optimal register allocation for the test cases based on the image computing library functions.
|
VLSI3.5
|
Low-Power Reconfigurable Signal Processing via Dynamic Algorithm Transformations (DAT)
M. Goel,
N. Shanbhag (University of Illinois, USA)
This paper presents dynamic algorithm transformations (DAT) for the systematic design of reconfigurable computing engines. These techniques allow dynamic alteration of algorithm properties in response to input non-stationarities. The input is modeled as a set of states with an underlying probability distribution, P_S. For each input state s, a signal monitoring algorithm (SMA) computes a power-optimal configuration for the signal processing algorithm (SPA) block. A fraction \alpha of the SPA block is hardwired and the remaining 1-\alpha is reconfigurable. Similarly, the SMA block computation is partitioned into a fraction \beta for the memory and the remaining 1-\beta for the datapath. For the given input state distribution, the optimal values of \alpha (\alpha_opt) and \beta (\beta_opt) are determined. It is shown that, for frequency-selective filtering, power savings of 35%-45% can be achieved by a DAT-based reconfigurable system compared with a traditional design based on the worst-case scenario.
|
VLSI3.6
|
Pipelined Hogenauer CIC Filters Using Field-Programmable Logic and Residue Number System
A. Garcia (University of Granada, Spain);
U. Meyer-Baese,
F. Taylor (University of Florida, USA)
Field-Programmable Logic (FPL) is on the verge of revolutionizing digital signal processing (DSP) in the manner that programmable DSP microprocessors did nearly two decades ago. While FPL densities and performance have steadily improved to the point where some DSP solutions can be integrated into a single FPL chip, FPL devices still have limited use in high-precision, high-bandwidth applications. In this paper it is shown that in such cases the residue number system (RNS) can be an enabling technology. The design of a high-decimation-rate digital filter is presented which demonstrates the RNS-FPL synergy.
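Why RNS suits a CIC filter can be seen in a small behavioural model: the filter uses only additions and subtractions, so each residue channel can run independently in narrow modular arithmetic, with the wide result recovered afterwards by the Chinese remainder theorem. The sketch below is not the pipelined FPL design of the paper; the moduli, stage count, decimation rate, and function names are assumptions chosen for illustration, and it needs Python 3.8 or later.

```python
from math import prod

MODULI = (256, 255, 253, 251)  # assumed pairwise-coprime channel moduli


def cic_decimate(x, R, N, modulus=None):
    """N-stage CIC decimator with rate change R and differential delay 1.

    If modulus is given, every addition and subtraction wraps modulo it.
    """
    wrap = (lambda v: v % modulus) if modulus else (lambda v: v)
    integ = [0] * N      # integrator states (input sample rate)
    delay = [0] * N      # comb delay elements (output sample rate)
    out = []
    for n, sample in enumerate(x):
        v = wrap(sample)
        for i in range(N):
            integ[i] = wrap(integ[i] + v)
            v = integ[i]
        if (n + 1) % R == 0:             # keep every R-th integrator output
            for i in range(N):
                previous = delay[i]
                delay[i] = v
                v = wrap(v - previous)
            out.append(v)
    return out


def crt_signed(residues, moduli):
    """Rebuild a signed integer from its residues (Chinese remainder theorem)."""
    M = prod(moduli)
    total = sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(residues, moduli))
    value = total % M
    return value - M if value > M // 2 else value


signal = [3, -1, 4, 1, -5, 9, 2, -6] * 4
reference = cic_decimate(signal, R=4, N=3)
channels = [cic_decimate(signal, R=4, N=3, modulus=m) for m in MODULI]
rns_result = [crt_signed(res, MODULI) for res in zip(*channels)]
print(rns_result == reference)
```

Each channel here needs at most 8 bits, while the reconstructed dynamic range is the product of the moduli; the comparison holds as long as the true filter output stays within that range.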
|
VLSI3.7
|
Synthesis of Folded, Pipelined Architectures for Multi-Dimensional Multirate Systems
V. Sundararajan,
K. Parhi (University of Minnesota, USA)
Motivated by the need for designing efficient architectures for two-dimensional discrete wavelet transforms (DWTs), this paper presents a novel multi-dimensional (MD) folding transformation technique which can be used to synthesize control circuits for pipelined architectures for a specific class of multirate MD digital signal processing (DSP) algorithms. Although a multirate MD DSP algorithm contains decimators and expanders which change the effective sample rate of an MD discrete-time signal, MD folding time-multiplexes the algorithm to hardware in such a manner that the resulting synchronous architecture requires only a single clock signal for the clocking of the datapath. Feasibility constraints are derived for folding a 2-D data-flow graph (DFG) onto a given set of hardware functional units according to a specified schedule. Area/power efficient architectures are derived for 1-4 level 2-D DWTs with 18.5%-23.3% savings in storage area.
|
VLSI3.8
|
Minimization of Data Address Computation Overhead in DSP Programs
B. Wess,
M. Gotschlich (University of Technology, Vienna, Austria)
Digital signal processors (DSPs) provide dedicated data address generation units (AGUs) with multiple register files. These units allow data memory access by indirect addressing with automatic address modification. Typically, both linear and modulo addressing are supported. There is no address computation overhead if the next address is within the auto-modify range. Often, this range can be adapted to the application by assigning static values to modify registers. In this paper, we discuss optimized data memory address generation in DSP programs. Here the goal is to minimize data address computation and register initialization costs by optimizing data memory layout, address register assignment, and auto-modify range. The investigated combinatorial optimization problems can have an extremely large solution space. However, experimental results indicate that random neighbourhood sampling by simulated annealing can produce highly optimized solutions.
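The effect of memory layout on address computation overhead can be illustrated with a toy cost model. The sketch assumes a single address register and an auto-modify range of plus or minus one, and it searches layouts exhaustively rather than by simulated annealing, which is practical only for a handful of variables. It leaves out the register assignment and modify-range choices that the paper also optimizes, and the names and access sequence are hypothetical.

```python
from itertools import permutations


def address_cost(accesses, layout, modify_range=1):
    """Count accesses whose address step exceeds the auto-modify range.

    Each such step is assumed to cost one explicit address computation.
    """
    addresses = [layout[v] for v in accesses]
    return sum(1 for a, b in zip(addresses, addresses[1:])
               if abs(b - a) > modify_range)


def best_layout(accesses, modify_range=1):
    """Exhaustively search all layouts (stand-in for simulated annealing)."""
    variables = sorted(set(accesses))
    best = None
    for perm in permutations(range(len(variables))):
        layout = dict(zip(variables, perm))
        cost = address_cost(accesses, layout, modify_range)
        if best is None or cost < best[0]:
            best = (cost, layout)
    return best


accesses = ["a", "c", "b", "d", "a", "d"]           # hypothetical access sequence
naive = {"a": 0, "b": 1, "c": 2, "d": 3}            # declaration-order layout

print(address_cost(accesses, naive))
print(best_layout(accesses))
```

For this sequence the declaration-order layout needs four explicit address updates, while the best layout needs one, because the access pattern forms a cycle that no linear placement can cover completely.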
|