DSP Processors

Very Long Instruction Word Architectures for Digital Signal Processing

Authors:

Jon Mellott, HSDAL, University of Florida (U.S.A.)
Fred Taylor, HSDAL, University of Florida (U.S.A.)

Volume 1, Page 583

Abstract:

Due to advances in semiconductor processing technology, unprecedented levels of system integration are now possible in digital signal processing systems. MIMD/multicomputer architectures used for parallel digital signal processing applications are not always efficient and are difficult to program. Very long instruction word (VLIW) processors are uniquely suited to digital signal processing applications because they can efficiently exploit opportunities for both fine- and coarse-grained parallelism without the overhead of MIMD/multicomputer approaches. A flexible, high-level language programming environment has been developed to support this processor paradigm.

ic970583.pdf

A Novel 32 Bit RISC Architecture Unifying RISC and DSP

Authors:

Christoph Baumhof, Hyperstone Electronics (Germany)
Frank Müller, Hyperstone Electronics (Germany)
Otto Müller, Hyperstone Electronics (Germany)
Manfred Schlett, Hyperstone Electronics (Germany)

Volume 1, Page 587

Abstract:

A novel 32-bit RISC architecture is presented that serves as the basis of both a powerful general-purpose microprocessor and a 16/32-bit fixed-point DSP processor. This unification of RISC and DSP was not achieved by simply combining a microprocessor core with a DSP core; instead, a new concept for implementing DSP processors was developed. The presented architecture proves that a DSP processor can be implemented strictly according to the RISC design philosophy. Besides basic 16-bit fixed-point functionality, the architecture implements a set of DSP instructions that allow common DSP algorithms to be mapped efficiently onto the processor.

ic970587.pdf

A Dual-Issue RISC Processor for Multimedia Signal Processing

Authors:

Hisakazu Sato, Mitsubishi Electric Corporation (Japan)
Edgar Holmann, Mitsubishi Electric Corporation (Japan)
Toyohiko Yoshida, Mitsubishi Electric Corporation (Japan)
Masahito Matsuo, Mitsubishi Electric Corporation (Japan)
Toru Kengaku, Mitsubishi Electric Corporation (Japan)

Volume 1, Page 591

Abstract:

This paper presents the architecture of D10V, a newly developed dual-issue RISC processor that combines high-throughput signal processing capability with the flexibility required for general-purpose applications. To achieve adequate signal processing performance, the processor operates a MAC unit and a memory access unit in parallel, and the memory access unit supports two-word data accesses. As the results of several benchmarks illustrate, the D10V competes favorably with conventional DSPs and in some instances outperforms them.

ic970591.pdf

A processor-coprocessor architecture for high end video applications

Authors:

Elmar Maas, Braunschweig University of Technology (Germany)
Dirk Herrmann, Braunschweig University of Technology (Germany)
Rolf Ernst, Braunschweig University of Technology (Germany)
Peter Rüffer, Braunschweig University of Technology (Germany)
Sieghard Hasenzahl, Philips (Germany)
Martin Seitz, Philips (Germany)

Volume 1, Page 595

Abstract:

High-end video applications are still implemented in hardware consisting of many components. Integrating these components on a single IC is difficult because they are typically low-volume products that often require customization, e.g. in studio applications, and such customization is easier at the board level than in an integrated system. Using hardware parameters for customization can partly overcome the flexibility problem, but at additional hardware cost. Low cost can be achieved by changing the architecture paradigm to a processor-coprocessor system. This, however, requires careful design space exploration, since the performance target is beyond the reach of current DSP processors while flexibility is still required. The paper presents the application of high-level synthesis and novel hardware-software co-synthesis tools to design space exploration. It is shown that completely different algorithms can be mapped to the same target system at much lower cost than with current approaches.

ic970595.pdf

An MPEG-2 Encoder Architecture Based on a Single-Chip Dedicated LSI with a control MPU

Authors:

Yasushi Ooi, NEC Corp. (Japan)
Osamu Ohnishi, NEC Corp. (Japan)
Yutaka Yokoyama, NEC Corp. (Japan)
Yoichi Katayama, NEC Corp. (Japan)
Masayuki Mizuno, NEC Corp. (Japan)
Masakazu Yamashina, NEC Corp. (Japan)
Hideto Takano, NEC Corp. (Japan)
Naoya Hayashi, NEC Corp. (Japan)
Ichiro Tamitani, NEC Corp. (Japan)

Volume 1, Page 599

Abstract:

This paper describes an MPEG-2 encoder architecture based on a hard-wired LSI with a control MPU. All basic functions of MPEG-2 MP@ML video compression are integrated in the dedicated LSI. For motion estimation, a horizontally subsampled diamond search is employed as a simplified first search step. It reduces the number of operations to 20% of those required by full search, with an estimated SNR degradation of only 0.1 dB. To help achieve a single-memory interface, a pair of 81 MHz, 16 Mb SDRAMs is used as both the frame buffer and the code buffer. The data bandwidth between the SDRAMs and the LSI is kept below 94% of the maximum data rate. Jobs assigned to the control MPU need to be executed less frequently than macroblock coding tasks, which helps reduce the required MPU performance to about 7 MIPS.

ic970599.pdf

An Efficient and Reconfigurable VLSI Architecture for Different Block Matching Motion Estimation Algorithms

Authors:

Xiao-Dong Zhang, University of Science and Technology (China)
Chi-ying Tsui, HKUST (Hong Kong)

Volume 1, Page 603

Abstract:

This paper describes a VLSI architecture that can be reconfigured to support both the Full Search Block-Matching algorithm and the 3-step Hierarchical Search Block-Matching algorithm. Using a reconfigurable register-mux array and a parameterizable adder tree, the 2-D array architecture provides efficient real-time motion estimation for many video applications. We also propose a memory architecture and an associated switching network to solve the simultaneous data access problem.
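
As a rough software illustration of the two search modes, the Python sketch below uses the sum of absolute differences (SAD) as the matching cost, evaluates every candidate in a +/-p window (full search), and separately runs the classic three-step search with step sizes 4, 2 and 1. The SAD criterion, the block and window sizes, and the helper names `sad`, `full_search` and `three_step_search` are assumptions for illustration; the paper's hierarchical search and its hardware dataflow may differ in detail.

```python
import random


def sad(cur, ref, by, bx, dy, dx, n):
    """Sum of absolute differences between the n x n block of `cur` at (by, bx)
    and the block of `ref` displaced by (dy, dx)."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def full_search(cur, ref, by, bx, n, p):
    """Evaluate every displacement in [-p, p] x [-p, p]; return (cost, dy, dx)."""
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            cost = sad(cur, ref, by, bx, dy, dx, n)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def three_step_search(cur, ref, by, bx, n, step=4):
    """Classic three-step search: test a 3 x 3 pattern around the current best
    vector, then halve the step (4, 2, 1), reaching displacements up to +/-7."""
    cy, cx = 0, 0
    best_cost = sad(cur, ref, by, bx, 0, 0, n)
    while step >= 1:
        ny, nx = cy, cx
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cost = sad(cur, ref, by, bx, cy + dy, cx + dx, n)
                if cost < best_cost:
                    best_cost, ny, nx = cost, cy + dy, cx + dx
        cy, cx = ny, nx
        step //= 2
    return best_cost, cy, cx


# Usage: the current frame is the reference frame shifted by (4, 4) pixels.
random.seed(0)
SIZE = 32
ref = [[random.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
cur = [[ref[(y + 4) % SIZE][(x + 4) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]
print(full_search(cur, ref, 12, 12, 8, 7))     # 15 x 15 = 225 SAD evaluations
print(three_step_search(cur, ref, 12, 12, 8))  # 1 + 3 x 9 = 28 SAD evaluations
```

The full search always finds the minimum-SAD vector in the window, while the three-step search checks far fewer candidates and can miss the optimum when the cost surface is not unimodal; supporting both modes in one array lets the same hardware trade accuracy for operation count.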

ic970603.pdf

An Operation-Saving VLSI Geometry Engine Core

Authors:

Konstantina Karagianni, University of Patras (Greece)
George Diamantakos, University of Patras (Greece)
Vassilis Paliouras, University of Patras (Greece)
Thanos Stouraitis, University of Patras (Greece)

Volume 1, Page 607

Abstract:

A floating-point geometry engine core is introduced in this paper. The proposed core is optimized for performing 3-D geometric transformations, including hardware evaluation of the sin(x) and cos(x) functions. The architecture exploits the structure of the transformation matrices, thus reducing the number of floating-point operations required per transformation. VLSI chip implementation issues for this architecture are also discussed.
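
The abstract does not spell out which matrix structure the core exploits. One common source of savings, shown here only as an assumed example, is that any composition of rotations and translations in homogeneous coordinates has the fixed last row [0, 0, 0, 1]. Transforming a point then needs 9 multiplications and 9 additions instead of the 16 and 12 of a generic 4 x 4 matrix-vector product. The helper names below are hypothetical.

```python
import math


def rotate_z(theta):
    """Homogeneous 4 x 4 rotation about the z axis (uses sin and cos)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def translate(tx, ty, tz):
    """Homogeneous 4 x 4 translation."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Compose two 4 x 4 transforms (apply b first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_general(m, p):
    """Generic matrix-vector product: 16 multiplies and 12 additions."""
    v = (p[0], p[1], p[2], 1.0)
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def transform_affine(m, p):
    """Exploits the fixed last row [0, 0, 0, 1]: w stays 1 and the translation
    column needs no multiply, so only 9 multiplies and 9 additions."""
    return [m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3]
            for i in range(3)]


# Usage: rotate (1, 0, 0) by 90 degrees about z, then translate by (1, 2, 3).
m = matmul(translate(1.0, 2.0, 3.0), rotate_z(math.pi / 2))
print(transform_affine(m, (1.0, 0.0, 0.0)))  # approximately [1.0, 3.0, 3.0]
```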

ic970607.pdf

The FFT Butterfly Operation in 4 Processor Cycles on a 24 Bit Fixed-point DSP with a Pipelined Multiplier

Authors:

Martin Grajcar, University of Passau (Germany)
Bernhard Sick, University of Passau (Germany)

Volume 1, Page 611

Abstract:

Most existing Digital Signal Processors (DSPs) are optimized for fast and efficient computation of the Fast Fourier Transform (FFT). However, only two available floating-point DSPs perform the FFT butterfly operation in 4 processor cycles, and no fixed-point DSP has been designed that way. This paper describes DAISY, a new 24-bit fixed-point DSP that executes the butterfly in 4 cycles even though it uses a two-stage pipelined multiplier. Pipelining the multiplication makes it possible to reduce the processor cycle time significantly.
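
For reference, the radix-2 decimation-in-time butterfly that these cycle counts refer to computes X = A + W*B and Y = A - W*B. That is one complex multiplication (four real multiplications) plus complex additions and subtractions. The Python sketch below models the butterfly in 24-bit Q1.23 fixed point with a scaling of 1/2 per stage. The word format, the scaling and the truncating multiply are assumptions. The sketch does not reflect how DAISY schedules these operations into 4 cycles.

```python
import cmath

Q = 23          # fractional bits: Q1.23, assumed for a 24-bit signed word
ONE = 1 << Q


def to_q(x):
    """Quantize a real number in [-1, 1) to a saturated Q1.23 integer."""
    return max(-ONE, min(ONE - 1, round(x * ONE)))


def qmul(a, b):
    """Fixed-point multiply; the arithmetic right shift truncates toward -inf."""
    return (a * b) >> Q


def butterfly(a, b, w):
    """Radix-2 DIT butterfly on (re, im) pairs: returns ((a + w*b)/2, (a - w*b)/2).
    The 1/2 scaling keeps the outputs inside the 24-bit range."""
    ar, ai = a
    br, bi = b
    wr, wi = w
    tr = qmul(wr, br) - qmul(wi, bi)    # four real multiplies form t = w * b
    ti = qmul(wr, bi) + qmul(wi, br)
    return ((ar + tr) >> 1, (ai + ti) >> 1), ((ar - tr) >> 1, (ai - ti) >> 1)


def quantize(z):
    """Complex float -> (re, im) pair of Q1.23 integers."""
    return to_q(z.real), to_q(z.imag)


# Usage: compare against a floating-point reference butterfly.
a, b = 0.25 + 0.1j, 0.2 - 0.3j
w = cmath.exp(-2j * cmath.pi / 8)       # twiddle factor W_8^1
x, y = butterfly(quantize(a), quantize(b), quantize(w))
print([v / ONE for v in x], (a + w * b) / 2)
print([v / ONE for v in y], (a - w * b) / 2)
```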

ic970611.pdf

New Unified VLSI Architectures for Computing DFT and Other Transforms

Authors:

Shen-Fu Hsiao, CIE, NSYSU (Taiwan)
Chung-Yi Yen, CIE, NSYSU (Taiwan)

Volume 1, Page 615

Abstract:

Fast computation of the DFT (Discrete Fourier Transform) and other popular transforms is essential in high-speed DSP applications. This paper proposes new architectures with low hardware cost and high throughput. The new architectures are well suited to VLSI implementation, since they are regular and require far fewer complex multipliers than recently proposed approaches. Furthermore, the same architectures can be used to compute a variety of frequently used transforms.

ic970615.pdf

Half-Rate GSM Vocoder Implementation On A Dual MAC Digital Signal Processor

Authors:

Mohit K. Prasad, Lucent Technologies (U.S.A.)
Paul D'Arcy, Lucent Technologies (U.S.A.)
Arup Gupta, Lucent Technologies (U.S.A.)
Marc S. Diamondstein, Lucent Technologies (U.S.A.)
Hosahalli R. Srinivas, Lucent Technologies (U.S.A.)

Volume 1, Page 619

Abstract:

The Global System for Mobile Communications (GSM) uses a 13 kbps vocoder whose output expands to 22.8 kbps after channel coding. To increase user capacity, the half-rate channel has a gross rate of 11.4 kbps, and the half-rate vocoder operates at 5.6 kbps. The computational requirements of a half-rate vocoder and other necessary services called for an entirely new digital signal processing architecture geared toward 1-D signal and speech processing. The architecture is characterized by a Very Long Instruction Word (VLIW) and two multiply-accumulate (MAC) units. Further hardware enhancements allow an efficient implementation of the half-rate GSM vocoder. This paper describes the architecture and compares the vocoder performance with existing implementations.
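
The following arithmetic assumes the standard 20 ms GSM speech frame, a figure the abstract does not state. On that basis, the quoted rates translate into the following per-frame bit budgets. The half-rate gross channel carries exactly half the bits of the full-rate one, leaving 228 - 112 = 116 bits per frame for channel coding.

```latex
\begin{aligned}
\text{full-rate speech:} &\quad 13\,\text{kbit/s} \times 20\,\text{ms} = 260\ \text{bits/frame} \\
\text{full-rate gross:}  &\quad 22.8\,\text{kbit/s} \times 20\,\text{ms} = 456\ \text{bits/frame} \\
\text{half-rate speech:} &\quad 5.6\,\text{kbit/s} \times 20\,\text{ms} = 112\ \text{bits/frame} \\
\text{half-rate gross:}  &\quad 11.4\,\text{kbit/s} \times 20\,\text{ms} = 228\ \text{bits/frame}
\end{aligned}
```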

ic970619.pdf

VLSI Implementation of an Area-Efficient Architecture for the Viterbi Algorithm

Authors:

Carlos Cabrera, University of Santiago de Compostela (Spain)
Montserrat Boo, University of Santiago de Compostela (Spain)
Javier Bruguera, University of Santiago de Compostela (Spain)

Volume 1, Page 623

Abstract:

The Viterbi algorithm is widely used in communications and signal processing. Recently, several area-efficient architectures for this algorithm have been proposed. Area-efficient architectures trade speed for area by mapping the N states of the trellis describing the Viterbi algorithm onto P processing elements, where N > P. This paper presents a practical VLSI implementation of an area-efficient architecture for evaluating the Viterbi algorithm. The implemented architecture consists of only two processing elements and the routing network needed to process all the states of the trellis over different cycles. The architecture has been integrated in a chip using 0.7 micron CMOS technology, occupying an area of 9 sq. mm.
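
The Python sketch below illustrates the state-to-PE folding. It decodes a rate-1/2, constraint-length-3 convolutional code with generators 7 and 5 (octal), chosen only as an example, and runs each trellis step's add-compare-select (ACS) operations as N / P sequential "cycles" of P processing elements. It is a behavioral model under those assumptions, not the chip's routing network. With P = 2 and N = 4 states, each trellis step takes two cycles.

```python
K = 3                      # constraint length (assumed example code)
N = 1 << (K - 1)           # number of trellis states
G = (0b111, 0b101)         # generator polynomials 7 and 5 (octal)
MASK = (1 << (K - 2)) - 1


def parity(x):
    return bin(x).count("1") & 1


def branch_output(state, u):
    """Encoder output bits for input u when the encoder is in `state`."""
    reg = (u << (K - 1)) | state
    return tuple(parity(reg & g) for g in G)


def encode(bits):
    state, out = 0, []
    for u in bits + [0] * (K - 1):       # zero tail returns the encoder to state 0
        out.append(branch_output(state, u))
        state = (u << (K - 2)) | (state >> 1)
    return out


def viterbi_decode(received, num_pe=2):
    """Hard-decision Viterbi decoding with the N states folded onto num_pe PEs."""
    inf = float("inf")
    metric = [0] + [inf] * (N - 1)       # decoding starts in state 0
    history = []
    for r in received:
        new_metric, survivor = [inf] * N, [0] * N
        for cycle in range(N // num_pe):     # N / P cycles per trellis step
            for pe in range(num_pe):         # these P PEs run in parallel in hardware
                ns = cycle * num_pe + pe
                u = ns >> (K - 2)            # input bit that leads into state ns
                for b in (0, 1):
                    ps = ((ns & MASK) << 1) | b           # one of two predecessors
                    bm = sum(x != y for x, y in zip(branch_output(ps, u), r))
                    if metric[ps] + bm < new_metric[ns]:  # add-compare-select
                        new_metric[ns], survivor[ns] = metric[ps] + bm, ps
        metric = new_metric
        history.append(survivor)
    state, bits = 0, []                  # trace back from state 0 (zero tail)
    for survivor in reversed(history):
        bits.append(state >> (K - 2))
        state = survivor[state]
    bits.reverse()
    return bits[:len(history) - (K - 1)]  # drop the tail bits


msg = [1, 0, 1, 1, 0, 0, 1, 0]
code = encode(msg)
code[3] = (1 - code[3][0], code[3][1])   # inject one channel bit error
print(viterbi_decode(code) == msg)       # True
```

Changing `num_pe` only changes how many cycles each trellis step is spread over, not the decoded result, which is the speed-for-area trade described above.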