Chair: Schuyler R. Quackenbush, AT&T Labs - Research, USA
Thippur V Sreenivas, Fraunhofer Institute for Integrated Circuits (Germany)
Martin Dietz, Fraunhofer Institute for Integrated Circuits (Germany)
This paper describes experiments to reduce the load of side information in the MPEG AAC scheme using vector quantization (VQ) methods. The VQ replaces the existing differential and entropy coding of the scale factors. Various types of VQ are considered, such as sub-vector/product VQ, multi-stage VQ and tree-structured VQ, which offer advantages in AAC applications such as scalability. However, because VQ is a lossy compression scheme, the psychoacoustic sensitivity to its losses is very important. This sensitivity is studied using objective measures such as the noise-to-mask ratio (NMR) and listening tests to make proper choices for the VQ design.
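As a rough illustration of why a multi-stage VQ lends itself to scalability, the following minimal sketch quantizes a short vector of scale factors in two stages, each stage coding the residual left by the previous one. The codebooks, the vector length and the function names are hypothetical and chosen only for the example; they are not the codebooks or the design used in the paper.

```python
def nearest(codebook, vec):
    """Index of the codeword with the smallest squared error to vec."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def msvq_encode(stages, vec):
    """Multi-stage VQ: each stage quantizes the residual of the previous stage."""
    residual = list(vec)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def msvq_decode(stages, indices, n_stages=None):
    """Sum the selected codewords; decoding fewer stages gives a coarser result."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in list(zip(stages, indices))[:n_stages]:
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

# Hypothetical toy codebooks (not trained on real AAC scale factors).
stage1 = [[40, 40, 40, 40], [50, 48, 46, 44], [60, 55, 50, 45]]
stage2 = [[0, 0, 0, 0], [1, -1, 1, -1], [-2, 0, 2, 0], [2, 1, 0, -1]]
stages = [stage1, stage2]

scale_factors = [52, 49, 46, 43]
indices = msvq_encode(stages, scale_factors)
coarse = msvq_decode(stages, indices, n_stages=1)  # base layer only
fine = msvq_decode(stages, indices)                # both stages
```

Decoding only the first index yields a usable coarse approximation, and the second index refines it, which is the property a scalable bitstream can exploit.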
Arouggou Jbira, Ecole Nationale Superieure des Telecommunications (France)
Nicolas Moreau, Ecole Nationale Superieure des Telecommunications (France)
Przemyslaw Dymarski, Technical University of Warsaw (Poland)
A 64 kbps coder of wideband (15 kHz) monophonic audio signals is described. Its structure is based on the transform coded excitation scheme, adopted to 7 kHz band signals. Significant modifications are proposed that reduce the delay while keeping an almost transparent quality of speech and music, equivalent to that provided by the MPEG-1 Layer II audio standard at the same bit rate. Algorithmic delay has been reduced to 17 ms, approximately 1/3 the delay of the MPEG coder.
Tomas Gänsler, Lund University (Sweden)
Peter Eneroth, Lund University (Sweden)
Stereophonic acoustic echo cancellation has been found more difficult than echo cancellation in mono due to a high correlation between the two audio channels. Different methods to decorrelate the channels have been proposed so that the stereophonic echo canceller identifies the true echo paths and its convergence rate increases. In this paper it is shown that the use of a perceptual audio coder effectively reduces the correlation between the channels and thus convergence to the true echo paths is ensured. Furthermore, in those frequency regions where the quantization noise introduced by the encoder is below the global perceptual masking threshold, an extra amount of inaudible noise can be added to the channels. The channel correlation is thereby further decreased and the solution is stabilized. In subband audio coders with high frequency resolution, only minor modifications are needed in the decoder.
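The decorrelating effect of independent noise can be checked numerically with a very simplified model. The sketch below assumes two channels that are scaled copies of one source (so they start out perfectly correlated) and adds independent white noise to each; the gains, the noise level and the use of white rather than perceptually shaped noise are assumptions made for the example, not details from the paper.

```python
import math
import random

def correlation(x, y):
    """Zero-lag correlation coefficient between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

rng = random.Random(0)  # fixed seed so the example is repeatable
source = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Hypothetical scalar "room gains": both channels carry the same source.
left = [1.0 * s for s in source]
right = [0.7 * s for s in source]

# Independent noise per channel, standing in for coder noise (level is illustrative).
noise_level = 0.1
left_noisy = [v + rng.gauss(0.0, noise_level) for v in left]
right_noisy = [v + rng.gauss(0.0, noise_level) for v in right]

rho_before = correlation(left, right)
rho_after = correlation(left_noisy, right_noisy)
```

Even a small amount of independent noise pulls the correlation coefficient below one, and the reduction is larger in the quieter channel, where the noise is stronger relative to the signal.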
Toshio Irino, ATR Human Information Processing Research Labs. (Japan)
Masashi Unoki, Japan Advanced Institute of Science and Technology (Japan)
A time-varying analysis/synthesis auditory filterbank has been developed using a new implementation of the "gammachirp", which has been shown to be an excellent function for the asymmetric, level-dependent auditory filter. It is shown that the gammachirp filter can be implemented as a combination of a gammatone filter and an IIR asymmetric compensation filter, which largely reduces the computational cost of time-varying filtering. The gammachirp filterbank is designed using a linear gammatone filterbank and a bank of time-varying asymmetric compensation filters controlled by the sound pressure level estimated at the output of the filterbank. Since the inverse filter of the asymmetric compensation filter is always stable, it is possible to resynthesize signals from time-varying, level-dependent auditory representations. The resynthesis error is determined only by the linear analysis/synthesis gammatone filterbank. The proposed filterbank is applicable to various types of signal processing required to model human auditory filtering. URL: http://www.hip.atr.co.jp/~irino/.
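For orientation, the gammachirp impulse response is commonly written as g(t) = a t^(n-1) exp(-2 pi b ERB(fr) t) cos(2 pi fr t + c ln t + phi) for t > 0, and it reduces to the gammatone when the chirp parameter c is zero. The sketch below evaluates that closed form directly; it does not implement the gammatone plus IIR compensation filter structure described in the abstract. The default parameter values and the 16 kHz sample rate are illustrative assumptions.

```python
import math

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammachirp(t, fr, n=4, b=1.019, c=-2.0, a=1.0, phi=0.0):
    """Gammachirp impulse response at time t (seconds); zero for t <= 0.

    c is the chirp parameter; the value -2.0 is only an illustrative default.
    """
    if t <= 0.0:
        return 0.0
    envelope = a * t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fr) * t)
    return envelope * math.cos(2.0 * math.pi * fr * t + c * math.log(t) + phi)

def gammatone(t, fr, n=4, b=1.019, a=1.0, phi=0.0):
    """Gammatone impulse response: the gammachirp with the chirp term removed."""
    return gammachirp(t, fr, n, b, 0.0, a, phi)

# Sample the response of a 1 kHz channel at a hypothetical 16 kHz sample rate.
sample_rate = 16000.0
impulse_response = [gammachirp(k / sample_rate, 1000.0) for k in range(400)]
```

The c ln t term is what makes the filter's amplitude spectrum asymmetric, which is the part the compensation filter has to supply when the linear gammatone is kept fixed.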
Simon D Boland, Queensland University of Technology (Australia)
Mohamed Deriche, Queensland University of Technology (Australia)
This paper examines a new method for coding high quality digital audio signals based on a combination of Linear Predictive Coding (LPC) and the Discrete Wavelet Transform (DWT). In this method, a linear predictor is first used to model each audio frame. Then, the prediction error is analyzed using the DWT. The LPC coefficients and DWT coefficients are quantized using a novel bit allocation scheme which minimizes the overall quantization error with respect to the masking threshold. The proposed coder is capable of delivering near-transparent audio signal quality at encoding bitrates of around 90-96 kb/s. Objective and subjective results suggest that the proposed coder operating at 90-96 kb/s has a performance comparable to that of the MPEG layer II codec operating at 128 kb/s.
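The two analysis steps (predict each frame, then transform the prediction error) can be sketched as follows. This is a minimal illustration under stated assumptions: the predictor is found with the autocorrelation method and the Levinson-Durbin recursion, the wavelet is a single-level Haar transform, and the test frame is a synthetic two-tone signal. The paper does not specify these choices here, and the quantization and bit allocation stages are omitted.

```python
import math

def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame (zero outside the frame)."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x))) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns a[1..order] for x_hat[n] = sum a[k] x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:]

def lpc_residual(x, a):
    """Prediction error e[n] = x[n] - sum_k a[k] x[n-1-k], with x = 0 before the frame."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for lo, hi in zip(approx, detail):
        out += [(lo + hi) / s, (lo - hi) / s]
    return out

def energy(x):
    return sum(v * v for v in x)

# Synthetic 256-sample frame: two sinusoids (illustrative only).
frame = [math.sin(0.3 * n) + 0.5 * math.sin(0.8 * n) for n in range(256)]
coeffs = levinson(autocorr(frame, 4), 4)
residual = lpc_residual(frame, coeffs)
approx, detail = haar_dwt(residual)
rebuilt = haar_idwt(approx, detail)
```

The residual carries much less energy than the frame itself, and because the Haar transform is orthonormal the wavelet coefficients preserve that energy, so any quantization error budget can be reasoned about in the transform domain.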
Olivier Van Der Vrecken, BaBel Technologies sa (Belgium)
Laurent Hubaut, TCTS - Faculté Polytechnique de Mons (Belgium)
Florence Coulon, TCTS - Faculté Polytechnique de Mons (Belgium)
This paper presents an audio coding system which uses filter banks to decompose the audio signal, in the frequency domain, into constant-width subbands. Each subband is compressed separately by means of a CELP coder. In order to obtain high audio quality, psychoacoustic models dynamically allocate the number of bits needed in each subband. Particular care has been taken in designing the filter banks in order to limit the delay and the computational cost of the system. We have implemented several filter banks and tested their influence on the perceptual quality of the output audio signal. Finally, we show that the proposed coder is capable of delivering excellent audio signal quality at bit rates of 50-60 kbit/s.
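One common way to drive a dynamic allocation from a psychoacoustic model is a greedy loop that keeps giving bits to the subband whose noise is furthest above its masking threshold. The sketch below assumes that rule, a gain of about 6.02 dB per bit, and made-up signal-to-mask ratios; the allocation rule actually used with the CELP subband coders is not described in the abstract.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.02):
    """Greedy allocation driven by the noise-to-mask ratio (NMR) of each subband.

    smr_db: signal-to-mask ratio per subband in dB (from a psychoacoustic model).
    Returns the bits per subband and the resulting NMR per subband in dB.
    """
    bits = [0] * len(smr_db)
    nmr = list(smr_db)  # with zero bits the noise is as large as the signal
    for _ in range(total_bits):
        worst = max(range(len(nmr)), key=lambda i: nmr[i])
        bits[worst] += 1
        nmr[worst] -= db_per_bit  # each extra bit lowers the noise by about 6 dB
    return bits, nmr

# Hypothetical signal-to-mask ratios for four subbands, and a 6-bit budget.
smr = [18.0, 6.0, -3.0, 12.0]
bits, nmr = allocate_bits(smr, 6)
```

With these numbers the subband whose signal already lies below the masking threshold receives no bits, and the budget is just enough to push the noise under the threshold everywhere else.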
Paolo Prandoni, EPFL, Lausanne (Switzerland)
Martin Vetterli, EPFL, Lausanne (Switzerland)
A data transmission framework is proposed to embed digital data into an audio signal in a perceptually undetectable or almost undetectable way. The resulting signal can be reproduced as is with no loss of acoustic quality; the embedded data can be exactly retrieved at the decoder. The transmission process exploits the perceptual redundancy of the audio signal to conceal the acoustic impact of the embedded data; encoding of side information is used to inform the receiver of the time-varying structure of the masking properties of the audio signal. A sample implementation is described with a throughput of the order of 30 kbit/sec over CD-quality audio.
Yasuyuki Nakajima, KDD Co. Ltd. (Japan)
Hiromasa Yanagihara, KDD Co. Ltd. (Japan)
Akio Yoneyama, KDD Co. Ltd. (Japan)
Masaru Sugano, KDD Co. Ltd. (Japan)
Previously, once audio data had been compressed, scaling its bit rate required transcoding, in which the data is decoded and then re-encoded. Manipulating coded data has therefore been complex and time-consuming. In this paper, we describe three algorithms for bit rate scaling in the coded MPEG data domain. The first is a bandwidth limitation method that cuts higher frequency components until the target data rate is met. The other two use a re-quantization process in which the quantization step in each subband is modified. One of these applies a psychoacoustic model derived from the bit allocation information in the bitstream in order to improve bit rate scaling efficiency. The simulation results show that re-quantization provides very high conversion efficiency, and that sound quality nearly equal to that of direct coding can be obtained by applying the psychoacoustic model. It is also shown that scaling is about six times faster than the transcoding method.
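The bandwidth limitation and re-quantization ideas can be sketched on a simplified frame model. The code below assumes a frame described only by a per-subband bit allocation (bits per sample) with a fixed number of samples per subband, and it ignores headers, scale factors and other side information; the function names, the frame layout and the numbers are hypothetical and do not reproduce the MPEG bitstream syntax or the paper's algorithms in detail.

```python
import math

def limit_bandwidth(alloc, samples_per_band, target_bits):
    """Drop the highest subbands one by one until the frame fits the target size."""
    out = list(alloc)
    top = len(out) - 1
    while top >= 0 and sum(out) * samples_per_band > target_bits:
        out[top] = 0  # this subband is no longer transmitted
        top -= 1
    return out

def requantize(levels, old_step, new_step):
    """Map quantizer indices from old_step to a coarser new_step without full decoding."""
    return [int(math.floor(q * old_step / new_step + 0.5)) for q in levels]

# Hypothetical frame: 8 subbands, 12 samples each, bits per sample per subband.
alloc = [6, 5, 4, 4, 3, 2, 2, 2]
scaled_alloc = limit_bandwidth(alloc, samples_per_band=12, target_bits=250)

# Hypothetical quantizer indices in one subband, re-quantized with a doubled step.
coarse_levels = requantize([7, -3, 2, 0], old_step=1.0, new_step=2.0)
```

Both operations work on values that are already present in the coded frame, which is why they avoid the synthesis and analysis filtering that dominates the cost of decoding and re-encoding.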