Chair: R. Chen, Voxware Inc, USA
Hironori Ito, NEC Corporation (Japan)
Masahiro Serizawa, NEC Corporation (Japan)
Kazunori Ozawa, NEC Corporation (Japan)
Toshiyuki Nomura, NEC Corporation (Japan)
This paper proposes a speech codec for the ETSI Adaptive Multi-Rate (AMR) standard, based on Multi-Pulse-based CELP (MP-CELP) coding and convolutional coding algorithms. The codec operates at several speech coding rates while maintaining a fixed gross rate, covering both speech and channel coding, in the Full-Rate (FR) and Half-Rate (HR) channel modes. MP-CELP makes it easy to change the speech coding rate by controlling parameters such as the number of pulses. Subjective tests show that, when an optimal coding rate is selected for each channel condition, the proposed AMR codec in the FR channel mode outperforms the Enhanced FR codec, and the proposed codec in the HR channel mode gives coding quality comparable to that of the Full-Rate codec. T-tests on the test results also show that the proposed speech codec meets about 80% of seventeen requirements selected from the AMR standard study report. The proposed codec is therefore a promising candidate for the AMR standard.
Janne Vainio, Nokia Research Center (Finland)
Hannu Mikkola, Nokia Research Center (Finland)
Kari Järvinen, Nokia Research Center (Finland)
Petri Haavisto, Nokia Research Center (Finland)
This paper describes a multi-rate codec family developed as a potential candidate for the GSM Adaptive Multi Rate (AMR) codec standard. The codec family consists of the GSM Enhanced Full Rate (EFR) codec [1] and lower bit-rate extensions thereof. Its members, or modes, differ in how the bit rate is partitioned between source coding and error protection. All the source codecs use the same ACELP (Algebraic Code Excited Linear Prediction) method that is used in the GSM EFR codec. The codec operates at gross bit rates of 22.8 kbit/s in the GSM full rate (FR) channel and 11.4 kbit/s in the GSM half rate (HR) channel. In the full rate channel, the codec provides improved error robustness over the GSM EFR codec. It extends wireline quality (equal to or better than G.726-32 ADPCM) to poor channel error conditions with low C/I ratios of 7 dB or even below. When operated in the half rate channel, the codec provides improved channel capacity while still providing wireline quality at high C/I ratios above 16-19 dB.
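The bit-rate partitioning described in this abstract comes down to simple arithmetic: whatever part of the fixed gross channel rate is not spent on source coding is left for error protection. The following is a minimal sketch under stated assumptions. Only the 22.8 and 11.4 kbit/s gross rates come from the abstract; 12.2 kbit/s is the GSM EFR source rate; the function name and the other source rates are hypothetical and chosen purely for illustration.

```python
# Gross channel rates quoted in the abstract (kbit/s).
GROSS_RATE_KBPS = {"FR": 22.8, "HR": 11.4}


def protection_rate(channel: str, source_rate_kbps: float) -> float:
    """Return the kbit/s left for error protection in the given channel mode."""
    gross = GROSS_RATE_KBPS[channel]
    if source_rate_kbps > gross:
        raise ValueError("source rate exceeds the gross channel rate")
    # Round to suppress floating-point noise in the subtraction.
    return round(gross - source_rate_kbps, 3)


# 12.2 kbit/s is the GSM EFR source rate; 8.0 and 6.0 are hypothetical modes.
for channel, source in [("FR", 12.2), ("FR", 8.0), ("HR", 6.0)]:
    print(channel, source, protection_rate(channel, source))
```

Lowering the source rate within a channel mode leaves more of the gross rate for protection, which is the trade-off a multi-rate family of this kind can exploit under poor channel conditions.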
Roar Hagen, Ericsson Radio Systems AB (Sweden)
Erik Ekudden, Ericsson Radio Systems AB (Sweden)
Bjorn Johansson, Ericsson Radio Systems AB (Sweden)
W. Bastiaan Kleijn, Royal Institute of Technology (Sweden)
In CELP, the use of codebooks whose entries have only a few non-zero samples provides high speech quality and facilitates fast computation. With decreasing bit rate, the intervals between the pulses increase, and the quality of the reconstructed signal begins to suffer from a particular type of artifact, which is strongest for noise-like segments. In this paper we describe experiments which show that the perceived artifacts are mainly concentrated at frequencies above 3 kHz, which is consistent with our understanding of auditory theory. Our analysis leads to simple strategies to eliminate the artifacts, even at lower bit rates. We describe both a non-adaptive and an adaptive post-processing method to remove the artifacts. The methods are demonstrated to be efficient when used in the ACELP algorithm. A closed-loop method for ACELP is also described.
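To make the sparsity concrete, a codebook entry with only a few non-zero samples can be represented by its pulse positions and signs. The sketch below is illustrative only: the subframe length, pulse positions, and function names are hypothetical and are not taken from the paper.

```python
def sparse_excitation(length, positions, signs):
    """Build an excitation vector that is zero except for signed unit pulses."""
    if len(positions) != len(signs):
        raise ValueError("positions and signs must have the same length")
    vec = [0.0] * length
    for pos, sign in zip(positions, signs):
        vec[pos] += float(sign)
    return vec


def mean_pulse_interval(length, n_pulses):
    """Average spacing between pulses, in samples, for a given pulse count."""
    return length / n_pulses


# Hypothetical 40-sample subframe: fewer pulses means wider gaps between them.
entry = sparse_excitation(40, [0, 13, 27], [1, -1, 1])
print(sum(1 for x in entry if x != 0.0), "non-zero samples out of", len(entry))
print(mean_pulse_interval(40, 10), mean_pulse_interval(40, 2))
```

Spending fewer bits on the excitation generally means fewer pulses per subframe, so the average interval grows, which is the regime in which the abstract reports the artifact.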
Hong Kook Kim, Samsung Advanced Institute of Technology (Korea)
In this paper, we propose an adaptive encoding method for the fixed codebook in CELP coders and implement an adaptive fixed code excited linear prediction (AF-CELP) speech coder. AF-CELP exploits the fact that the fixed codebook contribution to the speech signal is periodic, as is the adaptive codebook (or pitch filter) contribution. By modeling the fixed codebook with the pitch lag and the gain from the adaptive codebook, AF-CELP can be implemented at low bit rates as well as with low complexity. Listening tests show that a 6.4 kbit/s AF-CELP has quality comparable to that of the 8 kbit/s CS-ACELP.
Kazunori Ozawa, NEC Corporation (Japan)
Masahiro Serizawa, NEC Corporation (Japan)
This paper proposes an MP-CELP (Multi-Pulse-based CELP) speech coder at 6.4 kb/s with a 10 ms frame. In MP-CELP, the amplitudes or signs of the multi-pulse excitation are vector quantized (VQ) simultaneously. A combined search over multiple pulse location candidates and the VQ codebook remarkably improves the quantization performance. To improve speech quality in background noise conditions, an adaptive pulse location restriction method is developed. The subjective evaluation results show that the speech quality of 6.4 kb/s MP-CELP is higher than that of G.726 at 32 kb/s and is equivalent to that of 6.3 kb/s G.723.1 with a 30 ms frame in clean speech and tandem conditions. In background noise conditions, the adaptive pulse location restriction improves the MOS by 0.9. The resulting speech quality is equivalent to that of G.723.1, but still falls short of 24 kb/s G.726, except in the interfering talker condition.
Juergen Schnitzler, Aachen University of Technology (Germany)
This paper describes a wideband (7 kHz) speech compression scheme operating at a bit rate of 13.0 kbit/s, i.e. 0.8 bit per sample. We apply a split-band (SB) technique, where the 0-6 kHz band is critically subsampled and coded by an ACELP approach. The high frequency signal components (6-7 kHz) are generated by an improved High-Frequency-Resynthesis (HFR) at the decoder such that no additional information has to be transmitted. In informal listening tests, the subjective speech quality was rated to be comparable to the CCITT G.722 wideband codec at 48 kbit/s.
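The quoted figure of 0.8 bit per sample follows from the bit rate if the wideband signal is sampled at 16 kHz, the usual rate for 7 kHz speech. The sampling rate is an assumption here, since the abstract does not state it, and the function name is illustrative.

```python
def bits_per_sample(bit_rate_bps, sample_rate_hz):
    """Average number of coded bits spent on each input sample."""
    return bit_rate_bps / sample_rate_hz


# 13.0 kbit/s from the abstract; the 16 kHz sampling rate is assumed.
print(bits_per_sample(13_000, 16_000))

# G.722 at 48 kbit/s, the comparison point, under the same assumption.
print(bits_per_sample(48_000, 16_000))
```

Under this assumption the proposed coder spends 0.8125 bit per sample, which the abstract rounds to 0.8, against 3 bits per sample for G.722 at 48 kbit/s.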
Kazuhito Koishida, Tokyo Institute of Technology (Japan)
Gou Hirabayashi, Tokyo Institute of Technology (Japan)
Keiichi Tokuda, Nagoya Institute of Technology (Japan)
Takao Kobayashi, Tokyo Institute of Technology (Japan)
This paper proposes a wideband CELP coder using frequency warping. Instead of linear prediction, the proposed coder adopts mel-generalized cepstral analysis and encodes the full band of the speech signal on a warped frequency scale. It is shown that the subjective quality of the proposed coder at 16 kbit/s is better than that of ITU-T G.722 at 64 kbit/s. Furthermore, the proposed coder shows a much smaller difference in performance between male and female speakers than the conventional CELP coder. These results indicate that frequency warping contributes substantially to the improvement in subjective quality for wideband speech coding.
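A warped frequency scale of this kind is commonly obtained from the phase response of a first-order all-pass filter. The sketch below assumes that form. The abstract does not give the warping function or its parameter, so the function name and the value alpha = 0.42 (often quoted as a mel-like setting at 16 kHz sampling) are illustrative assumptions, not the authors' configuration.

```python
import math


def warp_frequency(omega, alpha):
    """Map omega in [0, pi] to a warped frequency in [0, pi].

    Uses the phase response of a first-order all-pass filter with
    warping parameter alpha, where |alpha| < 1.
    """
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


# alpha = 0.42 is an assumed value, used here only for illustration.
ALPHA = 0.42
for k in range(5):
    omega = k * math.pi / 4
    print(f"{omega:.3f} -> {warp_frequency(omega, ALPHA):.3f}")
```

With a positive alpha the mapping stretches the low-frequency region and compresses the high-frequency region while keeping the end points 0 and pi fixed, so more of the model's resolution goes to low frequencies.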
Anil W Ubale, University of California, Santa Barbara (U.S.A.)
Allen Gersho, University of California, Santa Barbara (U.S.A.)
A novel low-delay wideband speech coder, called Multi-band CELP (MB-CELP), achieves a delay of about 10 ms by exploiting time-domain correlations with a two-stage linear prediction scheme. A low-order forward-adaptive LP stage models the coarse shape of the input spectrum, and a high-order backward-adaptive LP stage models its fine structure. A conditional pitch prediction method improves the performance of the coder for speech without degrading music performance. A multi-band bank of off-line filtered codebooks generates the excitation signal. A 24 kbps version of the coder has nine multi-band codebooks with nonuniform bandwidths. Subjective comparison tests show that this coder outperforms the G.722 coder at a bit rate of 56 kbps.