HIGH PERFORMANCE AND COST EFFECTIVE
MEMORY ARCHITECTURE FOR AN HDTV DECODER LSI

Tetsuro Takizawa, Junji Tajime and Hidenobu Harasaki

C&C Media Research Laboratories, NEC Corporation
1-1, Miyazaki 4-chome, Miyamae-ku, Kawasaki 216-8555 JAPAN
Email: takizawa@ccm.cl.nec.co.jp

ABSTRACT
This paper proposes an efficient memory mapping and a frame memory compression method for an HDTV decoder LSI using Direct Rambus™ DRAM (DRDRAM). DRDRAM is employed to achieve the high memory bandwidth required for HDTV decoding at minimum memory cost. The proposed memory mapping achieves memory bandwidth sufficient for HDTV decoding even in the worst case, and no costly line buffers are required in the LSI for format conversion. The proposed frame memory compression method halves the memory cost and achieves HDTV decoding with a single 64 Mb DRDRAM chip without loss of memory access efficiency. Simulation results show that SNR degradation is 0.1 to 2 dB in the worst frame and that no visible degradation is perceived except for a resolution chart sequence.

1. INTRODUCTION
Digital broadcasting employs various picture formats. The largest of them is HDTV, whose size is 1920 x 1080 pixels at 60 Hz interlace scan. The picture size for HDTV is almost six times greater than that for SDTV. Therefore, HDTV decoder LSIs must operate at high speed and need wide memory bandwidth. On the other hand, a narrow memory bus and a minimum number of memory devices are preferred for cost sensitive consumer products such as TVs and STBs. Previous HDTV decoder LSIs [1][2], which employ conventional memory devices such as Synchronous DRAM and Concurrent Rambus™ DRAM, cannot satisfy such demands. The authors investigate a memory architecture for an HDTV decoder LSI based on the MPEG2 MP@HL standard. Direct Rambus™ DRAM [3][4] combined with an optimized memory mapping satisfies these demands.

As MPEG2 employs inter-frame prediction in its encoding, decoders need to store previous and next frame pictures in their frame memory. Therefore, an HDTV decoder requires more than 10 MB of memory capacity (i.e. two 64 Mb DRAM chips are required). Reducing the required memory capacity to 8 MB or less halves the memory cost. For this purpose, we introduce a frame memory compression method. Whereas previous approaches to reducing memory requirements [5][6][7] mainly focus on picture quality, our investigation also addresses actual memory mapping and bandwidth consumption.

The HDTV decoder LSI investigated in the paper is under development.

2. MEMORY ARCHITECTURE
An HDTV decoder requires 10 to 12 MB of memory capacity. 64 Mb or 128 Mb DRAM chips will be the most cost effective in 1999 and 2000. Consequently, it is necessary to achieve high bandwidth with one or two DRAM chips. Direct Rambus™ DRAM (DRDRAM) is the only device that meets this need. Our investigations are based on the 64 Mb DRDRAM because of its market availability.

A 64 Mb DRDRAM consists of sixteen banks of 512 pages each. Each page has a capacity of 1 KB. The peak bandwidth of DRDRAM is 1.6 GB/s, obtained with a 16 bit data bus that transfers data on both edges of a 400 MHz clock. The following restrictions apply when accessing DRDRAM.

1. The minimum access unit is 16 bytes.
2. A single 16 byte access followed by an access to a different page incurs 10 ns of overhead.
3. An access followed by an access to a different page in the same bank, or to any page in an adjacent bank, incurs 70 ns of overhead.

Accesses to adjacent banks incur overhead because sense amps are shared between them. To cope with these restrictions, an appropriate memory mapping needs to be designed based on access localities.
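The restrictions above can be expressed as a small timing model. The following Python sketch is illustrative only: the function names are hypothetical, the figures come from the text, and it assumes that the 10 ns and 70 ns overheads do not accumulate (only the larger applies) and that banks 0 and 15 are not neighbours.

```python
# Idealized DRDRAM timing model built from the figures quoted in the text.
ACCESS_UNIT = 16                      # bytes, restriction 1
PEAK_BYTES_PER_US = 2 * 2 * 400       # 2 byte bus x 2 clock edges x 400 MHz = 1.6 GB/s
UNIT_NS = ACCESS_UNIT * 1000 // PEAK_BYTES_PER_US   # bus time of one 16 byte unit
SINGLE_UNIT_OVERHEAD_NS = 10          # restriction 2
BANK_CONFLICT_OVERHEAD_NS = 70        # restriction 3


def transfer_ns(num_bytes):
    """Bus time for a request, rounded up to whole 16 byte units."""
    units = -(-num_bytes // ACCESS_UNIT)   # ceiling division
    return units * UNIT_NS


def access_ns(num_bytes, page, bank, next_page, next_bank):
    """Bus time plus the overhead caused by the access that follows.

    Assumption: when both restrictions 2 and 3 could apply, only the
    70 ns overhead is charged.
    """
    t = transfer_ns(num_bytes)
    if bank == next_bank and page == next_page:
        return t                                   # same page: no overhead
    if abs(bank - next_bank) <= 1:
        return t + BANK_CONFLICT_OVERHEAD_NS       # same or adjacent bank
    if num_bytes <= ACCESS_UNIT:
        return t + SINGLE_UNIT_OVERHEAD_NS         # lone 16 byte access
    return t
```

With this model a 256 byte luminance write takes 160 ns when the next access goes to a non-adjacent bank and 230 ns when it goes to the same or an adjacent bank.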

The following are the major accesses in MPEG2 decoding. Other accesses account for less than 5 %.
1. Write accesses of decoded macro blocks (MBW).
2. Read accesses of reference pictures for motion compensation (MCR).
3. Read accesses for displaying (DISP).

MBW accesses are two dimensional block accesses. A macro block consists of a 16 x 16 luminance block and two 8 x 8 chrominance blocks. All MBW accesses are aligned to macro block boundaries (dotted lines in Figure 1). Although MCR accesses are also two dimensional block accesses, they vary in size (at most 17 x 17 for a luminance block and 9 x 9 for a chrominance block) and may not be aligned to the boundaries, so a block may extend over up to four macro blocks. DISP accesses are one dimensional raster accesses. All DISP accesses are also aligned to the boundaries horizontally.

Figure 1: Memory accesses in MPEG2 decoding (1. Macro Block Write, 2. Reference Picture Read, 3. Display Read)

We propose a page mapping shown in Figure 2 and bank mapping shown in Figure 3. Four horizontally (and vertically for chrominance) successive macro blocks are mapped into the same page. Then, sixteen successive pages are mapped into the same row of 16 banks (B0 to B15).

The principle of the memory mapping is to place data that are likely to be accessed successively in the same page. The whole of a macro block is mapped into the same page for MBW access efficiency. Four horizontally successive macro blocks are mapped into the same page because a single DISP access extends over these macro blocks. Furthermore, luminance and chrominance data are mapped separately because their addresses are unrelated in MCR accesses.

A single DISP access extends over several pages when the access length is more than 64 bytes. Therefore, horizontally adjacent pages are mapped into non-adjacent banks. Moreover, a single MCR access may extend over four pages that are adjacent horizontally and vertically. Therefore, all four of these adjacent pages are mapped into non-adjacent banks as shown in Figure 4 (e.g. Banks 0, 2, 13 and 15 are non-adjacent banks). To store a 1920 x 1080 picture, the mapping shown in Figure 3 is repeated 9 times vertically for luminance and 5 times for chrominance. Conversion from a macro block position to a physical memory address is achieved by simple logic circuitry.
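As a rough illustration of the page mapping and of the non-adjacency condition, consider the following Python sketch. The function names are hypothetical, and the grouping of chrominance macro blocks as 4 wide by 2 tall is an assumption chosen because it fills a 1 KB page. The actual bank assignment of Figure 3 is not reproduced here.

```python
PAGE_BYTES = 1024
LUMA_MB_BYTES = 16 * 16        # one 16 x 16 luminance block
CHROMA_MB_BYTES = 2 * 8 * 8    # two 8 x 8 chrominance blocks


def luma_page(mb_x, mb_y):
    """(page column, page row) of a luminance macro block.

    Four horizontally successive macro blocks share one page.
    """
    return mb_x // 4, mb_y


def chroma_page(mb_x, mb_y):
    """(page column, page row) of a chrominance macro block.

    Assumption: 4 macro blocks wide by 2 tall per page.
    """
    return mb_x // 4, mb_y // 2


def pairwise_non_adjacent(banks):
    """True if no two banks in the list are identical or neighbours."""
    ordered = sorted(banks)
    return all(b - a > 1 for a, b in zip(ordered, ordered[1:]))
```

For example, pairwise_non_adjacent([0, 2, 13, 15]) holds for the four banks named in the text, so an MCR access that touches those four pages never pays the 70 ns overhead.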

Figure 2: Mapping pixels into a page

Figure 3: Mapping pages into banks

Figure 4: Adjacent pages are mapped to non-adjacent banks

We estimate the actual worst case memory utilization ratio of the proposed mapping when decoding 1920 x 1080 pictures at 60 Hz interlace scan (30 frames per second).

For MBW accesses, 256 bytes for luminance and 128 bytes for chrominance are accessed in each macro block. In the best case, in which the next access addresses a non-adjacent bank, each macro block access is processed in 160 ns for luminance and 80 ns for chrominance. In the worst case, in which the next access addresses the same or an adjacent bank, each access is processed in 230 ns and 150 ns respectively because of the 70 ns overhead. As there are 8160 macro blocks in an HDTV picture, MBW accesses are processed in 3.10 ms for a picture and 93.0 ms for 30 frames (1 second) in the worst case. This corresponds to a bus utilization ratio of 9.30 % (100 % means full bandwidth).
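The MBW figures can be reproduced with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text. The variable names are illustrative, and the macro block count assumes the 1080 lines are rounded up to 68 macro block rows.

```python
import math

UNIT_NS = 10            # 16 bytes at 1.6 GB/s
OVERHEAD_NS = 70        # same or adjacent bank
FRAMES_PER_SECOND = 30

# 120 macro block columns x 68 macro block rows
macro_blocks = (1920 // 16) * math.ceil(1080 / 16)

luma_ns = 256 // 16 * UNIT_NS      # 256 byte luminance write
chroma_ns = 128 // 16 * UNIT_NS    # 128 byte chrominance write

# Worst case: both writes are followed by a conflicting access.
worst_mb_ns = (luma_ns + OVERHEAD_NS) + (chroma_ns + OVERHEAD_NS)

picture_ms = macro_blocks * worst_mb_ns / 1e6
second_ms = picture_ms * FRAMES_PER_SECOND
utilization_percent = second_ms / 1000 * 100
```

Rounded to the precision used in the text, this gives 8160 macro blocks, 3.10 ms per picture, 93.0 ms per second and 9.30 % utilization.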

The worst case for MCR accesses occurs during B picture decoding when all motion vectors have half pel accuracy. In this case, MCR accesses take 1840 ns for each macro block. Calculated in the same way as above, MCR accesses take 15.01 ms for a picture and 450.4 ms for 30 frames in the worst case. This corresponds to a bus utilization ratio of 45.04 %.

For DISP accesses, chrominance must be processed through multi-tap vertical filters for 4:2:0 to 4:2:2 format conversion. As line buffers increase LSI cost (e.g. a line buffer for HDTV occupies about 0.625 mm² at 0.25 µm technology), filters without line buffers are desirable. When five tap vertical filters without line buffers are assumed, DISP accesses take 11.18 ms for a picture and 335.4 ms for 30 frames in the worst case. This corresponds to a bus utilization ratio of 33.54 %.

The total bus utilization ratio of all these accesses amounts to 87.88 %. This means that the bus bandwidth is sufficient for HDTV decoding even in the worst case without costly line buffers in the LSI. Figure 5 compares the worst case and average bandwidth of the proposed mapping with those of a straightforward mapping without bank swapping. As indicated in the figure, the worst case of the straightforward mapping exceeds the maximum bandwidth of DRDRAM.
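The worst case budget can be tallied as follows. This Python sketch simply sums the per-second bus times quoted in the text and cross-checks the MCR figure from its per macro block time. The dictionary and variable names are illustrative.

```python
# Worst case bus time per second of video, in ms, as quoted in the text.
busy_ms_per_second = {"MBW": 93.0, "MCR": 450.4, "DISP": 335.4}

# One second is 1000 ms, so the utilization in percent is ms / 10.
utilization = {name: ms / 1000 * 100 for name, ms in busy_ms_per_second.items()}
total_percent = sum(utilization.values())
headroom_percent = 100 - total_percent

# Cross-check of the MCR entry: 8160 macro blocks x 1840 ns x 30 frames.
mcr_ms_per_second = 8160 * 1840 * 30 / 1e6
```

The total stays below 100 %, which is the condition for the decoder to keep up with the worst case, and the remaining headroom covers the other accesses that account for less than 5 %.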

3. FRAME MEMORY COMPRESSION

As HDTV decoders require over 10 MB of frame memory, which is also used for storing the bit stream, two 64 Mb DRDRAM chips are required. To reduce memory cost, we introduce a frame memory compression method which compresses decoded pictures before storing them in frame memory and decompresses them after reading them back. The method enables the LSI to decode HDTV with a single 64 Mb DRDRAM chip.

We employ one dimensional differential PCM with non-linear quantization as the basic algorithm. This rather simple approach is advantageous because it can be processed at high speed with compact logic and because the required compression ratio is relatively small. 5 bit quantization is selected to achieve the target compression ratio. However, for memory access efficiency, some pixels are quantized into 4 bits.
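To make the basic algorithm concrete, the following Python sketch shows a generic one dimensional DPCM coder with a 32 entry (5 bit) non-linear quantizer. It is not the LSI's actual design: the quantizer table is hypothetical, the first pixel of each line is assumed to be stored uncompressed, and the 4 bit case is not modelled.

```python
# Hypothetical 5 bit table: fine steps near zero, coarse steps for large
# differences. The table actually used in the LSI is not given in the text.
LEVELS = [-255, -180, -120, -85, -63, -47, -35, -26,
          -19, -14, -10, -7, -5, -3, -2, -1,
          0, 1, 2, 3, 5, 7, 10, 14,
          19, 26, 35, 47, 63, 85, 120, 180]


def clip(value):
    """Keep a reconstructed pixel inside the 8 bit range."""
    return max(0, min(255, value))


def quantize(diff):
    """Return the 5 bit code of the table entry closest to diff."""
    return min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - diff))


def encode_line(pixels):
    """Encode one line as (raw first pixel, list of 5 bit codes)."""
    first = pixels[0]
    prediction = first
    codes = []
    for pixel in pixels[1:]:
        code = quantize(pixel - prediction)
        codes.append(code)
        # Predict from the reconstructed value so that the encoder and
        # the decoder stay in step and errors do not accumulate.
        prediction = clip(prediction + LEVELS[code])
    return first, codes


def decode_line(first, codes):
    """Rebuild a line from the raw first pixel and the codes."""
    pixels = [first]
    for code in codes:
        pixels.append(clip(pixels[-1] + LEVELS[code]))
    return pixels
```

Smooth regions, whose differences fall on the fine steps of the table, are reconstructed exactly, while a sharp edge is approached over a few pixels. This is consistent with degradation being visible mainly on a resolution chart.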

Figure 6 shows the memory formats within a page for the method. As indicated in the figure, a page includes 6 macro blocks for luminance and 12 macro blocks for chrominance. Figure 7 shows the mapping of pages into banks. Adjacent pages are mapped into non-adjacent banks to avoid access overhead during MCR accesses. The total size of frame memory is reduced to 8 MB, and HDTV decoding with a single 64 Mb DRDRAM chip is accomplished.
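The effect on capacity can be estimated with a simple page count. The Python sketch below is an idealized estimate, not the layout of Figures 6 and 7: it ignores padding at picture edges, assumes 8 chrominance macro blocks per page in the uncompressed mapping, and assumes three frame buffers. The names are illustrative.

```python
PAGE_BYTES = 1024
PAGES_PER_CHIP = 16 * 512    # 16 banks x 512 pages of 1 KB = 64 Mb
MACRO_BLOCKS = 8160          # per 1920 x 1080 picture
FRAME_BUFFERS = 3            # assumption, not stated in the text


def pages_per_frame(luma_mbs_per_page, chroma_mbs_per_page):
    """Pages needed for one picture, ignoring edge padding."""
    luma = -(-MACRO_BLOCKS // luma_mbs_per_page)      # ceiling division
    chroma = -(-MACRO_BLOCKS // chroma_mbs_per_page)
    return luma + chroma


uncompressed_pages = pages_per_frame(4, 8)    # mapping of Section 2
compressed_pages = pages_per_frame(6, 12)     # mapping of Figure 6

# Average bit budget per luminance pixel when 6 macro blocks share a page.
bits_per_luma_pixel = PAGE_BYTES * 8 / (6 * 16 * 16)

fits_uncompressed = FRAME_BUFFERS * uncompressed_pages <= PAGES_PER_CHIP
fits_compressed = FRAME_BUFFERS * compressed_pages <= PAGES_PER_CHIP
```

Under these assumptions, three uncompressed frames need more pages than one chip provides, whereas three compressed frames fit and leave room for the bit stream. The page count per frame drops to two thirds, and the average budget is about 5.33 bits per pixel.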

Memory access efficiency improves by 50 % for MBW and DISP accesses when three adjacent macro blocks are accessed successively. For MCR accesses, the worst case access efficiency is the same as the original worst case (all blocks extend over four macro blocks). Consequently, total memory access efficiency improves even in the worst case.

Table 1 shows simulation results for 12 streams of 120 frames each, giving the SNR of normally decoded pictures and the SNR loss in the worst frame. All streams are encoded by TM5 at 22 Mbps with M = 3 and N = 15. According to a subjective evaluation performed by 20 experts and non-experts, picture quality degradation is not perceived except for stream #12, which is a resolution chart sequence.

4. CONCLUSION

This paper has proposed an efficient memory mapping for an HDTV decoder LSI using Direct Rambus™ DRAM. The proposed memory mapping achieves memory bandwidth sufficient for HDTV decoding even in the worst case. It reduces LSI cost because no line buffers are required for format conversion with multi-tap vertical filters.

The paper has also presented a frame memory compression method to achieve further memory cost reduction. The method enables HDTV decoding with a single 64 Mb DRDRAM chip. It achieves efficient memory accesses, and no visible degradation is perceived in subjective evaluation except for a resolution chart sequence.

5. REFERENCES


