Chair: W. Gass, Texas Instruments, USA
Frans Sijstermans, Philips Research (The Netherlands)
Evert Jan Pol, Philips Research (The Netherlands)
Bram Riemens, Philips Research (The Netherlands)
Kees Vissers, Philips Research (The Netherlands)
Selliah Rathnam, Philips Semiconductors (U.S.A.)
Gert Slavenburg, Philips Semiconductors (U.S.A.)
It is widely recognized that fine-grain parallelism can greatly enhance a processor's performance for signal processing applications. For this reason, future-generation TriMedia processors will combine VLIW and subword parallelism in a single CPU. In this article, we present a snapshot of the new CPU's design process: the outlines are clear, but fine-tuning is still ongoing. We present the design flow and the "workbench" that the designers use for further tuning.
Kouhei Nadehara, NEC Corporation (Japan)
Hanno Lieske, University of Hannover (Germany)
Ichiro Kuroda, NEC Corporation (Japan)
This paper presents the V830R/AV, a low-power, 32-bit RISC microprocessor with a 64-bit "single-instruction multiple-data" multimedia coprocessor, and its MPEG-2 video decoding performance. The coprocessor performs four multimedia-oriented 16-bit operations per clock, such as multiply-accumulate with symmetric rounding and saturation, and accelerates the computationally intensive procedures of video decoding; an 8x8 IDCT is performed in 201 clocks. The processor employs a Concurrent Rambus DRAM interface and facilities for explicit software control of cache behavior to speed up the large number of memory accesses required for motion compensation. The 200-MHz V830R/AV processor with 600-Mbyte/sec Concurrent Rambus DRAMs decodes MPEG-2 MP@ML video in real time (30 frames/sec).
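To make the idea of four 16-bit operations on one 64-bit word concrete, the following is a minimal sketch of a packed multiply with rounding and saturation. It is not the V830R/AV instruction set: the function names, the lane order, the Q15 shift of 15 bits, and the reading of "symmetric rounding" as round-half-away-from-zero are all assumptions made for illustration.

```python
LANES = 4                      # four lanes per 64-bit word
BITS = 16                      # each lane is a signed 16-bit value
MASK = (1 << BITS) - 1
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def unpack(word):
    """Split a 64-bit word into four signed 16-bit lanes (lane 0 = low bits)."""
    lanes = []
    for i in range(LANES):
        v = (word >> (i * BITS)) & MASK
        lanes.append(v - (1 << BITS) if v & 0x8000 else v)
    return lanes


def pack(lanes):
    """Pack four signed 16-bit values into one 64-bit word."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & MASK) << (i * BITS)
    return word


def saturate(v):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, v))


def round_symmetric(v, shift):
    """Shift right by `shift` bits, rounding half away from zero (assumed meaning)."""
    if shift == 0:
        return v
    half = 1 << (shift - 1)
    magnitude = (abs(v) + half) >> shift
    return magnitude if v >= 0 else -magnitude


def packed_mul_round_sat(a, b, shift=15):
    """Lane-wise multiply of two packed words, then round and saturate each lane."""
    products = [x * y for x, y in zip(unpack(a), unpack(b))]
    return pack([saturate(round_symmetric(p, shift)) for p in products])
```

Rounding on the magnitude treats positive and negative products identically, which avoids the bias of a plain arithmetic shift. The last lane of a product such as (-32768) x (-32768) would overflow 16 bits after the shift, so saturation clamps it to 32767.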
Ray Simar Jr, Texas Instruments (U.S.A.)
Continuing dramatic improvements in semiconductor manufacturing processes are enabling radical new signal-processing architectures at the chip level. The development of these new architectures must be coupled with clearly defined target applications, a thorough analysis of applicable signal processing algorithms, and significant advancements in code-generation technology. The TMS320C6x development program involved the codevelopment of the VelociTI architecture, a new code-generation capability, and a large set of representative benchmarks.
Song Wu, Texas Instruments (U.S.A.)
Xiaolin Lu, Texas Instruments (U.S.A.)
Walter Chen, Texas Instruments (U.S.A.)
A high-performance, general-purpose digital signal processor (DSP) provides a cost-efficient solution for broadband Digital Subscriber Line (DSL) transceivers. A DSL modem with a transmission throughput between 400 kbps and 2 Mbps, operating over most existing telephone subscriber loops, has been implemented on a single TI TMS320c548 DSP for consumer multimedia applications such as Internet access. Except for the analog front end (AFE), all the Discrete Multitone Modem (DMT) algorithms are implemented in DSP software. The DSP-based software DSL modem also provides a convenient interface to the Microsoft point-to-point protocol (PPP) for network access.
Mladen Berekovic, Laboratorium für Informationstechnologie (Germany)
Rainer Frase, Laboratorium für Informationstechnologie (Germany)
Peter Pirsch, Laboratorium für Informationstechnologie (Germany)
This paper proposes a new array architecture for MPEG-4 image compositing. The emerging MPEG-4 standard for multimedia applications allows script-based compositing of audiovisual scenes from multiple audio and visual objects at the decoder side. A coprocessor architecture is presented that works in parallel with an MPEG-4 video and audio decoder and performs computation- and bandwidth-intensive low-level tasks for image compositing. The processor consists of an SIMD array of 16 DSPs to reach the processing power required for real-time image warping, alpha blending, and 3D rendering tasks. A programmable architecture allows processing resources to be adapted to the specific needs of different tasks and applications. The processor has an object-oriented cache architecture with a 2D virtual address space (e.g. for textures) that allows concurrent and conflict-free access to shared data objects for all 16 DSPs. I/O-intensive tasks in particular, such as texture mapping, alpha blending, image warping, z-buffering, and shading algorithms, benefit from shared memory caches and the possibility of preloading data before it is accessed.
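Alpha blending, one of the compositing tasks named above, can be sketched per sample as follows. This is a generic integer formulation for 8-bit samples, not the coprocessor's implementation; the function name, the 0 to 255 alpha range, and the rounding constant are assumptions for illustration.

```python
def alpha_blend(fg, bg, alpha):
    """Blend 8-bit foreground samples over background samples.

    Each output is round((alpha * fg + (255 - alpha) * bg) / 255),
    where alpha = 255 means fully opaque foreground.
    """
    if not (len(fg) == len(bg) == len(alpha)):
        raise ValueError("fg, bg and alpha must have the same length")
    return [
        (a * f + (255 - a) * b + 127) // 255   # +127 rounds to nearest
        for f, b, a in zip(fg, bg, alpha)
    ]
```

Every output sample depends only on the samples at the same position, so the loop maps naturally onto a SIMD array: each processing element can blend its own slice of the image independently.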
Yasuhiro Nunomura, Mitsubishi Electric Corporation (Japan)
Toru Shimizu, Mitsubishi Electric Corporation (Japan)
Kazunori Saitoh, Mitsubishi Electric Corporation (Japan)
Koji Tsuchihashi, Mitsubishi Electric Corporation (Japan)
The M32R/D is a 32-bit microprocessor with large-capacity on-chip DRAM. It consists of a 32-bit RISC CPU, a 32-bit x 16-bit multiply-accumulate unit (MAC), either 1 Mbyte or 2 Mbytes of DRAM, a 4-Kbyte cache memory, and a memory controller. The CPU, DRAM, and cache memory are connected via a 128-bit, 66.6-MHz internal bus, yielding high performance and low power dissipation. The chip can serve a wide range of applications and thus gives system designers great flexibility. For instance, a portable multimedia system can be realized with only three chips: an M32R/D chip, an I/O ASIC chip, and a programming ROM. This means that a total system solution can be achieved at lower cost with higher performance. Personal digital assistants (PDAs) and digital still cameras are examples of such systems.
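The bus figures quoted above imply a peak internal bandwidth that is easy to work out. The sketch below assumes one full-width transfer per clock cycle, which the abstract does not state; the function name and parameters are hypothetical.

```python
def peak_bandwidth_bytes_per_sec(bus_width_bits, clock_hz, transfers_per_clock=1):
    """Upper bound on bus throughput: bytes per transfer times transfers per second."""
    return (bus_width_bits // 8) * clock_hz * transfers_per_clock


# 128-bit bus at 66.6 MHz, assuming one transfer per clock (an assumption).
bw = peak_bandwidth_bytes_per_sec(128, 66.6e6)
print(f"{bw / 1e6:.1f} Mbyte/s")
```

Under that assumption the peak comes to roughly 1 Gbyte/s. Sustained throughput would be lower, depending on DRAM access timing and cache behavior.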