Wen-Whei Chang, National Chiao-Tung University (Taiwan)
De-Yu Wang, National Chiao-Tung University (Taiwan)
Li-Wei Wang, National Chiao-Tung University (Taiwan)
Most LPC-based audio coders employ simplistic noise-shaping operations to perform psychoacoustic control of quantization noise. In this paper, we report on new approaches to exploiting perceptual masking in the design of adaptive quantization of LPC excitation parameters. Due to its localized spectral sensitivity, sinusoidal excitation representation is preferred to spectrally flat signals for use in excitation modeling. Simulation results indicate that the proposed multisinusoid excited coder can deliver high quality audio reproduction at the rate of 72 kb/s.
Xiang Wei, University of Central Lancashire (U.K.)
Martyn J. Shaw, University of Central Lancashire (U.K.)
Martin R. Varley, University of Central Lancashire (U.K.)
Current audio compression schemes are capable of reducing the per channel bit rate of high quality audio signals from 16 bits per sample to around 2-4 bits per sample. In these schemes, knowledge of psychoacoustics is utilised and a uniform or nonuniform frequency decomposition method is used. In this paper we derive the optimum bit allocation to achieve the highest perceptual quality under a fixed bit rate, for an arbitrarily decomposed, critically sampled, filter bank. The resultant optimum bit allocation gives rise to a shaped reconstruction noise floor approximately parallel to the masking threshold level. Perceptual coding gain is defined and should be maximized for an optimum decomposition performed by the filter bank. Optimum band splitting is discussed and it is pointed out that decomposition in the manner of critical band splitting does not lead to optimal performance.
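The idea of a noise floor that runs parallel to the masking threshold can be illustrated with a minimal sketch. It assumes the textbook high-resolution model, in which the quantization noise power in band k is proportional to var_k * 2^(-2*b_k), and equal-width bands; it is not the authors' derivation. The names `allocate_bits`, `noise_to_mask`, `signal_var` and `mask_thr` are hypothetical. The sketch ignores the practical constraints that bit counts be non-negative integers.

```python
import math

def allocate_bits(signal_var, mask_thr, avg_bits):
    """Real-valued bits per band that equalise the noise-to-mask ratio.

    Assumes noise_k ~ signal_var[k] * 2**(-2 * bits[k]) (high-resolution model).
    """
    # Half the log2 signal-to-mask ratio of each band.
    half_log_smr = [0.5 * math.log2(s / m) for s, m in zip(signal_var, mask_thr)]
    mean_term = sum(half_log_smr) / len(half_log_smr)
    # Bands with a higher signal-to-mask ratio get more bits; the average stays fixed.
    return [avg_bits + h - mean_term for h in half_log_smr]

def noise_to_mask(signal_var, mask_thr, bits):
    """Noise-to-mask ratio per band under the same assumed noise model."""
    return [s * 2.0 ** (-2.0 * b) / m for s, m, b in zip(signal_var, mask_thr, bits)]

signal_var = [16.0, 4.0, 1.0, 0.25]   # illustrative band variances
mask_thr = [1.0, 1.0, 0.25, 0.25]     # illustrative masking thresholds
bits = allocate_bits(signal_var, mask_thr, avg_bits=3.0)
print(bits)                                       # [4.0, 3.0, 3.0, 2.0]
print(noise_to_mask(signal_var, mask_thr, bits))  # equal in every band
```

With these illustrative numbers the noise-to-mask ratio is the same in all four bands, so the noise floor is a scaled copy of the masking threshold.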
Karine Hay, ENST-Br, Dept. SC. (France)
S. Saoudi, ENST-Br, Dept. SC. (France)
L. Mainard, CCETT, Service RCS/SDA (France)
A new method for coding generic audio signals at 64 kbit/s in the 20-15000 Hz bandwidth with a low delay is presented. It combines subband coding, a Low Delay CELP algorithm and cascaded filterbanks. Our earlier work showed that, when using an equal bit rate in each subband, the resulting audio quality was not adequate. We propose here a new technique based on lattice quantization to avoid the search complexity of statistical vector quantization. It allows adaptive bit rate allocation in each subband. Experimental results assessing the validity of the proposed method are also presented.
Aki Härmä, Helsinki University of Technology (Finland)
Unto K. Laine, Helsinki University of Technology (Finland)
Matti Karjalainen, Helsinki University of Technology (Finland)
Bark-scale warped linear prediction (WLP) is a very promising core for a monophonic perceptual audio codec. In this paper the WLP scheme is extended to process complex-valued signals (CWLP). Three different methods of converting a stereo signal to one complex-valued signal are introduced. The philosophy behind the coding scheme is to integrate some aspects of modern wideband audio coding (e.g. perceptual criteria and stereo signal processing) into one computational element in order to find a more holistic and economical way of processing.
William Kurt Dobson, U.S. Robotics (U.S.A.)
Jiankan Jack Yang, U.S. Robotics (U.S.A.)
Kevin J. Smart, U.S. Robotics (U.S.A.)
Feng Kathy Guo, U.S. Robotics (U.S.A.)
This paper presents an audio coder for real-time multimedia applications. To achieve high quality at low bit rates, the audio coder uses a wavelet packet decomposition to transform the audio data into the wavelet domain, and a psychoacoustic model is used to minimize quantization noise. The wavelet packet decomposition tree structures were chosen to closely mimic the critical bands in a psychoacoustic model. Instead of determining the masking thresholds in the Fourier domain, the wavelet coefficients are used to drive the psychoacoustic model directly. Most of the standard industrial sampling frequencies are supported by this coder. An efficient bit rate control scheme was designed so that the audio coder operates at virtually any desired bit rate. The audio coder achieves near perceptually lossless quality at or below 80 kb/s for most audio sources. Real-time encoding/decoding is possible using only a fraction of a Pentium or faster CPU.
Yuichiro Takamizawa, NEC Corporation (Japan)
Masahiro Iwadare, NEC Corporation (Japan)
Akihiko Sugiyama, NEC Corporation (Japan)
This paper proposes a tonal component coding algorithm for a codec that employs a transform followed by Huffman coding, such as MPEG-2 Audio NBC (Non-Backward Compatible). After the input audio signal is mapped onto the frequency domain, the proposed algorithm withdraws local maximum components that degrade coding efficiency. This withdrawal increases the flatness of the spectrum and improves the efficiency of Huffman coding. The withdrawn components are encoded separately as side information. When the frequency resolution of the time/frequency mapping is high, this algorithm works more effectively since local maximum samples appear more frequently with such a mapping. Simulation results show that this algorithm achieves as much as 11% bit reduction per frame and improves the coding efficiency in 41% of all the audio frames.
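The withdrawal step described above might look roughly like the following sketch. It is not the authors' algorithm: the detection rule (a coefficient whose magnitude exceeds `ratio` times the mean magnitude of its two neighbours), the replacement value (that neighbour mean, with the original sign) and the names `withdraw_peaks` and `restore_peaks` are all assumptions made for illustration.

```python
import math

def withdraw_peaks(coeffs, ratio=4.0):
    """Split a spectrum into a flattened residual and a list of withdrawn peaks.

    A coefficient is treated as a peak when its magnitude is a strict local
    maximum and exceeds `ratio` times the mean magnitude of its neighbours
    (an assumed rule, chosen only for illustration).
    """
    residual = list(coeffs)
    side_info = []
    for k in range(1, len(coeffs) - 1):
        left, mid, right = abs(coeffs[k - 1]), abs(coeffs[k]), abs(coeffs[k + 1])
        neighbour_mean = 0.5 * (left + right)
        if mid > left and mid > right and mid > ratio * neighbour_mean:
            side_info.append((k, coeffs[k]))                      # sent separately
            residual[k] = math.copysign(neighbour_mean, coeffs[k])  # flattened bin
    return residual, side_info

def restore_peaks(residual, side_info):
    """Decoder side: put the withdrawn coefficients back in place."""
    restored = list(residual)
    for k, value in side_info:
        restored[k] = value
    return restored

spectrum = [1.0, 1.2, 9.0, 0.8, 1.1, -7.5, 1.0]   # illustrative coefficients
residual, side_info = withdraw_peaks(spectrum)
print(side_info)                                   # [(2, 9.0), (5, -7.5)]
print(restore_peaks(residual, side_info) == spectrum)  # True
```

In this sketch the residual would go to the transform coder's usual quantization and Huffman stage, while `side_info` stands in for the separately encoded side information.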
Roch Lefebvre, University of Sherbrooke (Canada)
Claude Laflamme, University of Sherbrooke (Canada)
In this paper, we present a new approach to shaping the coding noise in speech and audio coders. The approach, called Spectral Amplitude Warping (SAW), consists essentially of pre- and post-processing stages that apply a non-linear transformation to the short-term spectrum of the signal before and after encoding. Since SAW can be viewed as a separate entity from the coder, the noise shaping capability of an existing coder can be improved without modifying the coder itself. Using SAW as a pre- and post-process to the G.722 wideband speech coding standard, an informal listening test found that the quality of the 64 kb/s operating mode can be achieved at only 48 kb/s. The price to be paid is additional delay.
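The pre-/post-processing structure can be sketched as follows. The abstract does not specify the non-linear transformation, so this example assumes a simple power law on the magnitude of each short-term spectral bin, with the phase left unchanged; the exponent `alpha` and the function names are hypothetical. The transform stages that would produce and consume the bins are omitted.

```python
import cmath

def warp_spectrum(bins, exponent):
    """Raise each bin magnitude to `exponent`, keeping the phase unchanged."""
    return [cmath.rect(abs(x) ** exponent, cmath.phase(x)) for x in bins]

def saw_preprocess(bins, alpha=0.5):
    """Before the coder: compress the spectral magnitudes (assumed power law)."""
    return warp_spectrum(bins, alpha)

def saw_postprocess(bins, alpha=0.5):
    """After the decoder: apply the inverse power law to expand the magnitudes."""
    return warp_spectrum(bins, 1.0 / alpha)

spectrum = [4 + 0j, 0 + 9j, -16 + 0j]      # illustrative short-term spectrum
warped = saw_preprocess(spectrum)          # magnitudes become 2, 3, 4
recovered = saw_postprocess(warped)        # approximately the original bins
print([round(abs(x), 6) for x in warped])
print(max(abs(a - b) for a, b in zip(recovered, spectrum)) < 1e-9)
```

Because both stages sit outside the coder, an existing codec could be placed between `saw_preprocess` and `saw_postprocess` without modification, which is the property the abstract highlights.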
Carlos A. Serantes, Universidad de Vigo (Spain)
Antonio S. Pena, Universidad de Vigo (Spain)
Nuria González-Prelcic, Universidad de Vigo (Spain)
A new bit assignment algorithm is presented. Its goals are simultaneous assignment across all subbands within a few steps of an iterative calculation, the use of memory to speed up convergence, and the accommodation of a deformable error curve. The basis of the algorithm is discussed, along with other considerations that are likely to arise in practice. Finally, an example of its performance is given.
Daniele Cadel, Cefriel (Italy)
Giorgio Parladori, Alcatel Telecom (Italy)
The target of this work is high-quality audio coding at low bit rates. It is shown how Pyramid Vector Coding (PVC) can conveniently replace the classical Huffman coding technique in audio compression systems, while also giving an advantage in the bit allocation procedure. The compression performance can be further improved by fixing an upper limit on the values of the vector components.
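One way to see why a pyramid codebook can simplify bit allocation is that its size, and hence the length of a fixed-length index, is known in advance. The sketch below assumes the codebook is the set of integer vectors of dimension `dim` whose absolute values sum to `k` (the construction used in Fischer's pyramid vector quantizer); whether the paper uses exactly this codebook is an assumption, and the function names are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def pyramid_size(dim, k):
    """Number of integer vectors of length `dim` whose absolute values sum to `k`."""
    if k == 0:
        return 1          # only the all-zero vector
    if dim == 0:
        return 0          # no coordinates left to carry the remaining pulses
    # Standard recurrence for counting points on the L1 "pyramid" surface.
    return (pyramid_size(dim - 1, k)
            + pyramid_size(dim - 1, k - 1)
            + pyramid_size(dim, k - 1))

def index_bits(dim, k):
    """Bits needed for a fixed-length index into the pyramid codebook."""
    return (pyramid_size(dim, k) - 1).bit_length()

print(pyramid_size(2, 2))   # 8
print(pyramid_size(3, 2))   # 18
print(index_bits(3, 2))     # 5
```

Under this assumption the cost of coding a vector depends only on `dim` and `k`, not on the vector's contents, in contrast to a variable-length Huffman code.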
Karine Gosse, ENST Paris (France)
François Moreau de Saint-Martin, CCETT (France)
Xavier Durot, CCETT (France)
Pierre Duhamel, ENST Paris (France)
Jean-Bernard Rault, CCETT (France)
The design of filter banks for source coding purposes classically relies on the perfect reconstruction (PR) property. However, several recent studies have shown that taking the quantization noise into account in the design can yield a noticeable reduction of the mean square reconstruction error. The purpose of this study is to show that perceptual improvement can also be obtained in the particular context of audio coding by relaxing the PR constraint. In this context, the mean square error is no longer relevant, and we define a new perceptual distortion criterion, the MPE (Mean Perceptual Error), which makes use of a simplified ear model. The synthesis filters are then optimized so as to minimize this MPE. Finally, this MMPE (Minimum MPE) filter bank is included in an audio coding scheme. Compared with the corresponding PR filter bank-based scheme by means of the POM (Perceptual Objective Measure), the resulting scheme shows improved audio quality.
Simon Boland, Queensland University of Technology (Australia)
Mohamed Deriche, Queensland University of Technology (Australia)
In this paper, we propose a new combined harmonic-wavelet representation for audio in which a harmonic analysis-synthesis scheme is first used to approximate each audio frame as a sum of several sinusoids. Then, the difference between the original signal and the reconstructed harmonic signal is analyzed using a wavelet filtering scheme. After each step (harmonic analysis and wavelet filtering), the parameters are quantized and encoded. Compared to previously proposed methods, our audio coder uses different harmonic analysis-synthesis and wavelet filtering schemes. We use the Total Least Squares (TLS)-Prony algorithm for the harmonic analysis scheme, and an M-band wavelet transform for analyzing the residual. Altogether, our proposed coder is capable of delivering excellent audio signal quality at encoder bitrates of 60-70 kb/s.
Wolfgang J. Klippel, Dresden (Germany)
A weakly nonlinear plant can be linearized and will track an input signal if the plant is preceded by a nonlinear controller which approximates the inverse of the plant's transfer function. Present techniques for adjusting the controller adaptively to the plant require an additional nonlinear adaptive filter to perform a separate system identification. Straightforward update algorithms cannot directly update the filter parameters in the controller because the transfer function of the plant might cause instabilities in the adaptive process. This problem is overcome by applying additional linear filtering to the nonlinear state vector and/or error signal. Novel filtered-A and filtered-E modifications of the stochastic gradient based methods are presented which are capable of updating generic as well as special block-oriented nonlinear filter architectures.