Chair: J. Rabaey, University of California at Berkeley, USA
Daniel Martin, Siemens Microelectronics Inc. (U.S.A.)
Robert E. Owen, Data/Time International (U.S.A.)
Digital signal processors are paired with microcontrollers in many applications. Various attempts have been made to combine the two processor functions in one architecture, but two conflicts have remained unresolved: different choices of speed and size in the data and program memory hierarchies, and different real-time control needs. This paper reviews the basic processing requirements of digital signal processing (DSP) and of controllers, and shows how a new 32-bit RISC architecture resolves these conflicts and integrates the two functions seamlessly into one processor core. This is confirmed with a detailed FIR filter example. The major innovations in this Tricore architecture are a novel memory organization, variable instruction word sizes, and multiple instruction issue.
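For readers unfamiliar with the FIR filter benchmark mentioned above, the following is a minimal sketch of a direct-form FIR filter in Python. It is not the paper's code and says nothing about how the kernel is scheduled on the processor; the function name `fir_filter` and the zero-padding of samples before the start of the input are assumptions made for illustration.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before the start of the input are treated as zero (assumption).
    """
    out = []
    for n in range(len(samples)):
        acc = 0  # accumulator for the multiply-accumulate (MAC) loop
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out
```

The inner loop is one multiply-accumulate per tap, which is why FIR filtering is a common yardstick for DSP-style throughput.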
Lionel Lacassagne, LIS/Electronique-Informatique-Applications (EIA) (France)
Frantz Lohier, LIS/Electronique-Informatique-Applications (EIA) (France)
Patrick Garda, Laboratoire des Instruments et Systemes (France)
This paper presents a real-time software implementation of the Canny-Deriche optimal edge detectors on RISC and high-performance DSP processors. For each type of architecture, the most effective algorithmic and programming optimization techniques are described. We show that real-time operation is achieved for 256x256 images on RISC processors and for 512x512 images on state-of-the-art DSPs. These results outperform the best software and FPGA implementations of optimal edge detectors.
Eri Murata, NEC Corporation (Japan)
Masao Ikekawa, NEC Corporation (Japan)
Ichiro Kuroda, NEC Corporation (Japan)
This paper presents an implementation of a fast two-dimensional Inverse Discrete Cosine Transform (IDCT) with multimedia instructions for a software MPEG-2 decoder. IDCT algorithms for sparse blocks, which eliminate the calculation for zero coefficients, are realized using multimedia instructions. To reduce the cycle count for the IDCT, an adaptive control method for these IDCT algorithms, based on the bit rate and picture type, is proposed and its performance is described. In the implementation of a software MPEG-2 decoder, the execution time for the IDCT is reduced to 10% of that of the original C program by using MMX instructions. Moreover, using the proposed adaptive control, it can be further reduced to 76%.
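The idea of skipping zero coefficients in a sparse block can be sketched in plain Python as follows. This is only an illustration of the principle using the standard 8x8 IDCT definition; it does not use multimedia instructions, it does not model the adaptive control based on bit rate and picture type, and the names `BASIS` and `idct_sparse` are hypothetical.

```python
import math

N = 8  # block size used by MPEG-2

def _c(u):
    # Normalization factor of the standard 8x8 DCT definition
    return 1 / math.sqrt(2) if u == 0 else 1.0

# BASIS[u][x] = C(u) * cos((2x + 1) * u * pi / 16), precomputed once
BASIS = [[_c(u) * math.cos((2 * x + 1) * u * math.pi / (2 * N))
          for x in range(N)] for u in range(N)]

def idct_sparse(coeffs):
    """2-D IDCT that accumulates only the nonzero coefficients.

    coeffs[u][v] is the coefficient for vertical frequency u and
    horizontal frequency v; the result is indexed as out[x][y].
    """
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            f = coeffs[u][v]
            if f == 0:
                continue  # zero coefficient: contributes nothing, skip it
            for x in range(N):
                row_term = 0.25 * f * BASIS[u][x]
                for y in range(N):
                    out[x][y] += row_term * BASIS[v][y]
    return out
```

The work done is proportional to the number of nonzero coefficients, so a block with only a DC term costs one pass over the 64 output pixels instead of 64 passes.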
Yossi Shain, Associative Computing Ltd. (Israel)
Avidan Akerib, Associative Computing Ltd. (Israel)
Rutie Adar, Associative Computing Ltd. (Israel)
This paper discusses an associative processor architecture designed to meet the demands of real-time image processing applications. In a single chip, this architecture provides thousands of processors, one for each pixel, in the form of associative memory. This paper focuses on a generic, proprietary associative processor architecture and discusses implementing the discrete cosine transform (DCT) using processors based on this architecture. Associative Computing Ltd. has developed a commercial associative chip based on this architecture, and while the DCT implementation discussed refers to future generations based on this architecture, reference is made throughout to the Company's present processor. Processors based on our associative architecture can process the large amounts of data typically required in real-time imaging applications at a lower cost-performance ratio than conventional processors. The scalable nature of the memory-based processor architecture allows developers to rapidly increase processing power without altering the fundamental processor or system architecture. The underlying technologies used in the Company's present processor can significantly facilitate the development of associative processing as an alternative to conventional processing for video applications, including compression and video editing.
Milos Ercegovac, University of California, Los Angeles (U.S.A.)
Darko Kirovski, University of California, Los Angeles (U.S.A.)
George Mustafa, University of California, Los Angeles (U.S.A.)
Miodrag Potkonjak, University of California, Los Angeles (U.S.A.)
Modern image and video processing applications are characterized by a unique combination of arithmetic and computational features: fixed-point arithmetic, a variety of short data types, a high degree of instruction-level parallelism, strict timing constraints, high computational requirements, and high cost sensitivity. The current generation of behavioral synthesis tools does not address this type of application well. In this paper we explore the potential of using multiple-precision arithmetic units to effectively support the implementation of image and video processing applications as application-specific integrated circuits. A new architectural scheme for collaborative addition of sets of variable-precision data is proposed, as well as an allocation and assignment methodology for multiple-precision arithmetic units. Experimental results indicate strong advantages of the proposed approach.
Joon Seok Kim, Yonsei University (Korea)
Sun Kook Yoo, Yonsei University (Korea)
Sung Wook Park, Yonsei University (Korea)
Nam Hoon Jung, Yonsei University (Korea)
Woo Suk Ko, Yonsei University (Korea)
Keun Sup Lee, Yonsei University (Korea)
Dae Hee Youn, Yonsei University (Korea)
Recent audio CODEC (Coding/Decoding) algorithms are complex combinations of several coding techniques, and can be divided into DSP tasks, controller tasks and mixed tasks. The traditional DSP processor has been designed for fast processing of DSP tasks only, not for controller and mixed tasks. This paper presents a new architecture that achieves high throughput on both controller and mixed tasks of such algorithms while maintaining high performance for DSP tasks. The proposed processor, YSP-3, operates four functional units (a multiplier, two ALUs, and a load/store unit) in parallel via a 4-issue superscalar instruction structure. The performance of YSP-3 has been evaluated through the implementation of common DSP algorithms and an AC-3 decoder.
Hiroyuki Okuhata, Osaka University (Japan)
Morgan H Miki, Osaka University (Japan)
Takao Onoye, Osaka University (Japan)
Isao Shirakawa, Osaka University (Japan)
A VLSI implementation of a low-power DSP dedicated to the G.723.1 low-bitrate speech codec is described. Several DSP microarchitectural features are devised, chiefly dual multiply-accumulators, rounding and saturation mechanisms, and two-banked on-chip memory. The proposed DSP architecture has been integrated in a total area of 7.75 mm^2 using a 0.35 um CMOS technology, and operates at 10 MHz while dissipating 45 mW from a single 3 V supply.
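As a rough illustration of what rounding and saturation mechanisms do in a fixed-point multiply-accumulate datapath, here is a minimal Python sketch. The Q15 input format, the 32-bit accumulator width, and the function names are assumptions chosen for illustration; the abstract does not specify the chip's number formats.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed two's-complement integer."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))

def mac_q15(acc, a, b):
    """acc += a * b with Q15 inputs (Q30 product), saturated to 32 bits.

    The 32-bit accumulator width is an assumption for this sketch.
    """
    return saturate(acc + a * b, 32)

def round_q30_to_q15(acc):
    """Round a Q30 accumulator to Q15 (add half an LSB, then shift),
    saturating the result to 16 bits."""
    return saturate((acc + (1 << 14)) >> 15, 16)
```

For example, multiplying 0.5 by 0.5 in Q15 (16384 * 16384) and rounding back to Q15 yields 8192, which is 0.25; an accumulation that would exceed the 32-bit range is clamped instead of wrapping around.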
John R Sacha, The Pennsylvania State University (U.S.A.)
Mary Jane Irwin, The Pennsylvania State University (U.S.A.)
In low-power VLSI design, fixed-point number representations are standard. For some signal processing applications, however, achieving sufficient dynamic range with fixed point may lead to computations that use more precision than necessary. In such cases, trading precision for dynamic range through the use of floating-point and logarithmic number system representations can potentially provide power savings. This is demonstrated for a subband speech coding application using architectural-level capacitance modeling.
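The trade between precision and dynamic range can be made concrete with a back-of-the-envelope calculation. The sketch below compares the ratio of largest to smallest nonzero magnitude for a fixed-point word and for a small floating-point format. It models neither power nor capacitance, and the floating-point format (normalized values only, no denormals or reserved exponent codes) and the function names are assumptions for illustration.

```python
import math

def fixed_point_range_db(bits):
    """Dynamic range of a signed two's-complement integer word, in dB."""
    largest = 2 ** (bits - 1) - 1
    smallest = 1  # one least significant bit
    return 20 * math.log10(largest / smallest)

def floating_point_range_db(exp_bits, man_bits):
    """Dynamic range of a normalized float, in dB.

    Assumes all 2**exp_bits exponent codes are usable and there are
    no denormals (simplifying assumptions for this sketch).
    """
    largest = (2 - 2 ** -man_bits) * 2 ** (2 ** exp_bits - 1)
    smallest = 1.0  # mantissa 1.0 at the lowest exponent
    return 20 * math.log10(largest / smallest)

def relative_step(man_bits):
    """Relative spacing between adjacent values of a normalized float."""
    return 2 ** -man_bits
```

Under these assumptions a 12-bit float (1 sign, 5 exponent, 6 mantissa bits) covers a wider dynamic range than a 16-bit fixed-point word, at the price of a coarser relative step of 2^-6.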
Wolfram K Drescher, Dresden University of Technology (Germany)
Menno Mennenga, Dresden University of Technology (Germany)
Gerhard P Fettweis, Dresden University of Technology (Germany)
This paper examines architectural issues for a domain-specific digital signal processor (DS-DSP) capable of fast decoding of block codes. Until now, common processors could not be employed for this task in real-time systems because they lack architectural and arithmetical support. We proposed solutions to the arithmetical problem in previous work. In this paper we focus on the implementation of different block decoding algorithms on a new DS-DSP architecture. The paper also contains benchmarks of our architecture for selected codes and compares our DS-DSP to common digital signal processors (DSPs) and dedicated logic solutions.
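To give a sense of what decoding a block code involves, the sketch below implements syndrome decoding for the Hamming(7,4) code in Python. This small code is chosen purely for illustration; the abstract does not say which codes were benchmarked, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6, 7.
    """
    c = [0] * 8  # index 0 unused so list indices match bit positions
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]

def hamming74_decode(received):
    """Correct up to one bit error and return the 4 data bits."""
    syndrome = 0
    for pos, bit in enumerate(received, start=1):
        if bit:
            syndrome ^= pos  # XOR of the positions holding a 1
    corrected = list(received)
    if syndrome:
        corrected[syndrome - 1] ^= 1  # syndrome is the error position
    return [corrected[2], corrected[4], corrected[5], corrected[6]]
```

The decoder is dominated by bitwise XOR operations over code positions rather than multiply-accumulates, which hints at why conventional DSP datapaths are a poor fit for this class of task.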
Inki Hong, University of California, Los Angeles (U.S.A.)
Miodrag Potkonjak, University of California, Los Angeles (U.S.A.)
Recently, numerous watermarking-based techniques for intellectual property protection of DSP artifacts, such as images, compressed and uncompressed audio and video data, and text documents, have been proposed. However, the applicability of all techniques proposed until now is limited to digital data, and they either implicitly or explicitly exploit the imperfection of human perception of audio and video. We propose the first watermarking technique for protecting the intellectual property of DSP designs. The essence of the technique is the use of additional synthesis constraints to encode the authorship signature. The constraints are selected in such a way that they result in minimal hardware overhead while embedding a signature that is unique and difficult to detect and remove. The technique is applicable to all levels of the design process, from algorithm, system and behavioral synthesis to logic synthesis and physical design. The technique is illustrated on a set of DSP design examples at all levels of the design process.