Design Automation for DSP

Home


An upper bound of the throughput of multirate multiprocessor schedules

Authors:

Rainer Schoenen, ISS, RWTH Aachen (Germany)
Vojin E. Zivojnovic, ISS, RWTH Aachen (Germany)
Heinrich Meyr, ISS, RWTH Aachen (Germany)

Volume 1, Page 655

Abstract:

Multirate Dataflow Graphs are used for modelling iterative computations, allowing concurrency and arbitrary data rates at ports. This model is often used for signal processing algorithms. For static scheduling the iteration period bound represents the final barrier for the computation speed, the approximation of which is often the goal of an implementation. For the singlerate case (SR-DFG), where all rates are one, an explicit bound exists and is subject of many published papers. This work presents a bound for the multirate case, which reduces to the known bound if applied to an SR-DFG. Assumptions made are a vectorized execution and a blocked schedule that organizes multiple iterations inside one period (also called execution cycle). The influence of characteristic properties in the multirate case is emphasized.

ic970655.pdf

ic970655.pdf

TOP



Minimizing The Number Of Operations In DSP Computations

Authors:

Inki Hong, UCLA (U.S.A.)
Miodrag Potkonjak, UCLA (U.S.A.)

Volume 1, Page 659

Abstract:

Reduction of the number of operations optimizes the important design metrics such as area, cost, throughput, and power consumption for both custom ASIC and programmable processor implementations. We propose a novel technique to minimize the number of operations in DSP computations. The first step of the approach logically partitions a computation into strongly connected components. The second step optimizes each component separately. In the third step the components are merged to further optimize. Finally, the components are scheduled to minimize memory consumption. The effectiveness of our approach is demonstrated on real-life examples.

ic970659.pdf

ic970659.pdf

TOP



BEEHIVE: An Adaptive, Distributed, Embedded Signal Processing Environment

Authors:

Shahram Famorzadeh, Georgia Institute of Technology (U.S.A.)
Vijay K. Madisetti, Georgia Institute of Technology (U.S.A.)
Thomas Egolf, Georgia Institute of Technology (U.S.A.)
Tuongvu Nguyen, Georgia Institute of Technology (U.S.A.)

Volume 1, Page 663

Abstract:

We propose an open signal processing system design and implementation environment, BEEHIVE, that allows application developers to rapidly compose and debug functional specifications in a networked, distributed computing environment, and then later migrate the application (transparently) onto an embedded, distributed, computing hardware/software platform, with the capability to reconfigure (adaptively) the resources assigned to the application to meet the dynamic real-time requirements of the implementation. Recent developments in the area of virtual machines; broker-based, distributed, transportable computing; object-oriented programming methodologies, Java and its real-time extensions; reconfigurable and programmable hardware; approximate algorithms; adaptive-load and resource-management algorithms, are harnessed in this operating environment.

ic970663.pdf

ic970663.pdf

TOP



On Objective Function Selection in List Scheduling Algorithms for Digital Signal Processing Applications

Authors:

Jan Jonsson, Chalmers University of Technology (Sweden)
Jonas Vasell, Chalmers University of Technology (Sweden)

Volume 1, Page 667

Abstract:

In this paper we discuss the choice of objective function in list scheduling algorithms for scheduling data flow graphs onto multiprocessor architectures. A majority of the list scheduling algorithms used in practice utilize a global strategy wherein actor static levels are used for making scheduling decisions. When fine-grain DSP applications such as FIR or elliptical filters need to be scheduled on architectures that consist of commodity part processors and a general interconnection network whose interprocessor communication cost cannot be ignored, a traditional list scheduling algorithm is in many cases not the best choice. In an experimental study we compare these global strategies to local strategies that utilize load balancing. The study reveals that global strategies suffer from flaws that could cause local strategies to yield more than 10% shorter schedule lengths on the average. In particular we find that a novel Earliest Finish Time (EFT) strategy exhibits very good performance.

ic970667.pdf

ic970667.pdf

TOP



VLSI High Level Synthesis of Fast Exact Least Mean Square Algorithms based on Fast FIR filters

Authors:

Jean Philippe Diguet, University of Rennes, ENSSAT (France)
Olivier Sentieys, University of Rennes, ENSSAT (France)
Daniel Chillet, University of Rennes, ENSSAT (France)
Jean Luc Philippe, University of Rennes, ENSSAT (France)

Volume 1, Page 671

Abstract:

This paper relates experiences of algorithmic transformations in High Level Synthesis, in the area of acoustic echo cancellation. The processing and memory units are automatically designed for various equivalent LMS algorithms, in the FIR case, with important computational load. The results obtained with different filter lengths, give an accurate prototyping of new fast versions of the LMS algorithm. It also show that a theoretical arithmetic reduction must be correlated to the associated increase of memory requirements.

ic970671.pdf

ic970671.pdf

TOP



Hierarchical VHDL Libraries for DSP ASIC Design

Authors:

John McCanny, Queen's University Belfast (Northern Ireland)
Douglas Ridge, ISS Ltd. (Northern Ireland)
Yi Hu, ISS Ltd. (Northern Ireland)
Jill Hunter, Queen's University Belfast (Northern Ireland)

Volume 1, Page 675

Abstract:

Methods are presented for the rapid design of DSP ASICs based on the use of hierarchical VHDL libraries. These are portable across many silicon foundries and allow complex DSP silicon systems to be developed in a fraction of the time normally required. Resulting designs are highly competitive with ones created using conventional methods. The approach is illustrated by its application to ADPCM codec and DCT cores.

ic970675.pdf

ic970675.pdf

TOP



DSP QUANT: Design, Validation, And Applications Of DSP Hard Real-Time Benchmark

Authors:

Chunho Lee, UCLA (U.S.A.)
Darko Kirovski, UCLA (U.S.A.)
Inki Hong, UCLA (U.S.A.)
Miodrag Potkonjak, UCLA (U.S.A.)

Volume 1, Page 679

Abstract:

Although the undeniable importance of high quality, efficient and effective DSP synthesis benchmark has been firmly and widely established, until now the emphasis of benchmarking has been restricted on assembling individual examples. In this paper we introduce the ``ideal candidate benchmark methodology'' which poses the development of the benchmark as well as defines a statistical and optimization problem. We first outline the goals and requirements relevant for the benchmark development. After discussing the computational complexity of the benchmark selection problem, we present a simulated annealing-based algorithm for solving this computationally intractable optimization task. Using this approach from 150 examples we select 12 examples for the new DSP Quant benchmark for DSP hard Real-Time applications. The DSP benchmark is statistically validated, and its application to the analysis and development of system-level synthesis algorithms is demonstrated.

ic970679.pdf

ic970679.pdf

TOP



Constructing Memory Layouts for Address Generation Units Supporting Offset 2 Access

Authors:

Bernhard Wess, Vienna University of Technology (Austria)
Martin Gotschlich, Vienna University of Technology (Austria)

Volume 1, Page 683

Abstract:

We present an efficient memory layout generation algorithm for digital signal processors (DSPs) which takes advantage of indirect addressing modes with auto-modify operations. Previously proposed algorithms are optimized with respect to offset 1 access (auto-increment and decrement by 1). Our algorithm is based on a heuristic since the problem of generating optimum memory layouts is NP-complete. However, this algorithm produces optimum results if a bandwidth 2 layout exists for a given program variable access sequence. It is verified by experimental results that our technique achieves significant improvements over existing techniques.

ic970683.pdf

ic970683.pdf

TOP



Modulo-Addressing Utilization in Automatic Software Synthesis for Digital Signal Processors

Authors:

Markus Willems, ISS, RWTH Aachen (Germany)
Holger Keding, ISS, RWTH Aachen (Germany)
Vojin E. Zivojnovic, ISS, RWTH Aachen (Germany)
Heinrich Meyr, ISS, RWTH Aachen (Germany)

Volume 1, Page 687

Abstract:

Digital Signal Processors (DSPs) have become key components for the implementation of digital signal processing systems. With DSPs moving into new application domains and the increasing complexity of modern DSP architectures, efficient programming support receives major interest. Therefore, an optimizing compiler becomes a must for future DSP-architectures. Todays DSP compilers result in significant overheads both in memory consumption and program execution time compared to hand-written assembly code. This is mainly due to an inefficient compiler support of the DSP specific architectural features, such as the modulo-addressing capability which is an enabeling feature for a large class of DSP algorithms. Within this paper we analyze why existing compilers fail short in supporting the modulo-addressing mode and present a compiler concept that allows the efficient utilization of this feature. We describe how an advanced compiler optimization strategy allows a near optimum support of the modulo-addressing mode, and point out why this concept is favorable to DSP-specific language extensions.

ic970687.pdf

ic970687.pdf

TOP



Cooperative register assignment and code compaction for digital signal processors with irregular datapaths

Authors:

Werner Kreuzer, Vienna University of Technology (Austria)
Bernhard Wess, Vienna University of Technology (Austria)

Volume 1, Page 691

Abstract:

We address the phase ordering problem of code compaction and register assignment in a data flow graph compiler. During register assignment, we take into account the instruction-level parallelism available. Symbolic variables in straight-line code are allocated to register set/memory location pairs which maximally preserve the freedom available for code compaction. Whenever necessary, spill code is inserted during final register assignment and scheduled during code compaction. Register assignment is performed taking into account its impact on code compaction. This strategy results in final code of high quality.

ic970691.pdf

ic970691.pdf

TOP



Optimization of Embedded DSP Programs Using Post-pass Data-flow Analysis

Authors:

Ashok Sudarsanam, Princeton University (U.S.A.)
Sharad Malik, Princeton University (U.S.A.)
Steven Tjiang, Synopsys, Inc. ATG (U.S.A.)
Stan Liao, Synopsys, Inc. ATG (U.S.A.)

Volume 1, Page 695

Abstract:

We investigate the problem of code generation for DSP systems on a chip. Such systems devote a limited quantity of silicon to program ROM, so application software must be maximally dense. Additionally, the software must be written so as to meet various high-performance constraints, which may include hard real-time constraints. Unfortunately, current compiler technology is unable to generate dense, high-performance code for DSPs, whose architectures are highly irregular. Consequently, designers often resort to programming application software in assembly -- a time-consuming, error-prone, and non-portable task. Thus, DSP compiler technology must be improved substantially. We describe some optimizations that significantly improve the quality of compiler-generated code. Our optimizations are applied globally and even across procedure calls. Additionally, they are applied to the machine-dependent assembly representation of the source program. Our target architecture is the Texas Instruments' TMS320C25 DSP.

ic970695.pdf

ic970695.pdf

TOP



Code Positioning to Reduce Instruction Cache Misses in Signal Processing Applications on Multimedia RISC Processors

Authors:

Hans-Joachim Stolberg, University of Hannover (Germany)
Masao Ikekawa, NEC Corp. (Japan)
Ichiro Kuroda, NEC Corp. (Japan)

Volume 1, Page 699

Abstract:

Real-time operation of signal processing applications on multimedia RISC processors is often limited by high instruction cache miss rates of direct-mapped caches. In this paper, a heuristic approach is presented which reduces high instruction cache miss rates in direct-mapped caches by code positioning. The proposed algorithm rearranges functions in memory based on trace data so as to minimize cache line conflicts. Moreover, a new method to extract potential cache misses from trace data is introduced which enables accurate cache behavior analysis and greatly enhances code positioning efficiency. Application of code positioning to an MPEG-1 video decoder implementation on the V830 multimedia RISC processor reduced instruction cache refill cycles by 66--98 %. The proposed code positioning algorithm does not require hardware modifications; it can easiliy be integrated in an object linker to automate the optimization process.

ic970699.pdf

ic970699.pdf

TOP



Code Generation By Using Integer-Controlled Dataflow Graph

Authors:

Takashi Miyazaki, NEC (Japan)
Edward A. Lee, EECS, UCB (U.S.A.)

Volume 1, Page 703

Abstract:

Integer-Controlled Dataflow (IDF) and its code generation applications in Ptolemy are presented. In IDF graphs, which specify data processing systems, data token flow is controlled by integer control tokens and states of actors at run-time. The firing order of actors (schedule) is determined at compile-time, however, the actors are conditionally activated at run-time. This static schedule contributes to effective simulation of systems. IDF supports code generation. This enables code generation from program graphs that include conditional jumps, loops and repetitions, and greatly improves the practical usability of the program synthesis in Ptolemy.

ic970703.pdf

ic970703.pdf

TOP



Fixed-Point C Compiler for TMS320C50 Digital Signal Processor

Authors:

Jiyang Kang, Seoul National University (Korea)
Wonyong Sung, Seoul National University (Korea)

Volume 1, Page 707

Abstract:

A fixed-point C compiler is developed for convenient and efficient programming of TMS320C50 fixed-point digital signal processor. This compiler supports the `fix' data type that can have an individual integer word-length according to the range of a variable. It can add or subtract two data having different integer word-lengths by automatically inserting shift operations. The accuracy of fixed-point multiply operation is significantly increased by storing the upper part of the multiplied double-precision result instead of keeping the lower part as conducted in the integer multiplication. Several target specific code optimization techniques are employed to improve the compiler efficiency. The empirical results show that the execution speed of a fixed-point C program is much, about an order of magnitude, faster than that of a floating-point C program in a fixed-point digital signal processor.

ic970707.pdf

ic970707.pdf

TOP