Tero Honkanen, NRC (Finland)
Janne Vainio, NRC (Finland)
Kari Järvinen, NRC (Finland)
Petri Haavisto, NRC (Finland)
Redwan Salami, USH (Canada)
Claude Laflamme, USH (Canada)
Jean-Pierre Adoul, USH (Canada)
In this paper, we describe the enhanced full rate (EFR) speech codec that has recently been standardised for the North American TDMA digital cellular system (IS-136). The EFR codec, specified in the IS-641 standard, has been jointly developed by Nokia and the University of Sherbrooke. The codec consists of 7.4 kbit/s speech (source) coding and 5.6 kbit/s channel coding (error protection), resulting in a 13.0 kbit/s gross bit-rate in the channel. Speech coding is based on the ACELP algorithm (Algebraic Code Excited Linear Prediction). The codec offers speech quality close to that of wireline telephony (G.726 32 kbit/s ADPCM used as a wireline reference) and provides a substantial improvement over the quality of the current speech channel. The improved speech quality is achieved not only in error-free conditions, but also in typical cellular operating conditions including transmission errors, environmental noise, and tandeming of speech codecs.
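The rate budget quoted in this abstract can be checked with simple arithmetic. The sketch below is illustrative only: the function name `rate_budget` is hypothetical, and the two rates are those stated in the abstract, expressed in bit/s so that the sum stays exact.

```python
def rate_budget(source_bps, channel_bps):
    """Return the gross rate and the share of it spent on error protection."""
    gross_bps = source_bps + channel_bps
    protection_share = channel_bps / gross_bps
    return gross_bps, protection_share

# 7.4 kbit/s speech coding plus 5.6 kbit/s channel coding, as in the abstract.
gross, share = rate_budget(7400, 5600)
print(f"gross rate: {gross / 1000:.1f} kbit/s")
print(f"error protection share: {share:.1%}")
```

Under these figures, a little over two fifths of the gross channel rate is devoted to error protection.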
Lei Zhang, SFU (Canada)
Tian Wang, SFU (Canada)
Vladimir Cuperman, SFU (Canada)
This paper presents a variable-rate CELP codec which achieves good communications speech quality at an average rate of about 3 kb/s. The codec operates as a source-controlled variable rate coder with rates of 4.9 kb/s for voiced and transition sounds, 3.0 kb/s for unvoiced sounds and 670 b/s for silent frames. New techniques used in the codec include prediction of the fixed codebook target vector and joint optimization of the adaptive and fixed codebook search. The prediction of the fixed codebook target vector is based on fixed codebook selections in previous subframes and a running estimate for the fundamental frequency. Informal subjective testing (MOS) indicates that the proposed codec, at an average rate of less than 3.2 kb/s, achieves better quality than fixed rate standard codecs with rates in the range 4 to 4.8 kb/s.
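The average rate of a source-controlled variable-rate coder is the sum of the per-class rates weighted by how often each frame class occurs. The sketch below uses the three rates from the abstract; the frame mix is a hypothetical assumption chosen for illustration, not a figure reported by the authors, and the names `RATES_KBPS` and `average_rate` are likewise illustrative.

```python
# Per-class rates from the abstract, in kb/s.
RATES_KBPS = {"voiced_or_transition": 4.9, "unvoiced": 3.0, "silence": 0.67}

def average_rate(class_fractions, rates=RATES_KBPS):
    """Weighted average bit rate for the given frame-class fractions."""
    total = sum(class_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class fractions must sum to 1")
    return sum(rates[name] * frac for name, frac in class_fractions.items())

# Hypothetical frame mix (an assumption, not a figure from the paper).
example_mix = {"voiced_or_transition": 0.45, "unvoiced": 0.15, "silence": 0.40}
print(f"average rate: {average_rate(example_mix):.2f} kb/s")
```

With this assumed mix the average lands just under 3 kb/s, which shows how a large share of silent frames pulls the average well below the 4.9 kb/s peak rate.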
Mustapha Bouraoui, SGS-THOMSON Microelectronics (France)
François Bill Druilhe, SGS-THOMSON Microelectronics (France)
Gang Feng, ICP (France)
There is an increasing need for low cost, fully integrated digital phone systems including telephony functions, fax, hands-free and answering machines. For the latter feature, a high quality, low bit rate speech coder is recommended. Its complexity must remain reasonable for it to stay competitive in this product range. Recent advances in CELP speech coding have shown the feasibility of this concept for this kind of consumer application. A 4.8 kbps Hamming Code Excited Linear Prediction (HCELP) coder with an algebraic codebook structure is proposed in this paper. It features a very fast search algorithm which has been evaluated to be 3 times faster than usual algebraic codebook search procedures. Quality evaluation yielded satisfactory results. Implementation aspects and the integration of the coder in an Advanced Telephone Set are also detailed.
Chul-Hong Kwon, Digicom (Korea)
Chong-Kwan Un, KAIST (Korea)
Below 4.8 kbits/s, CELP coders in general suffer from two kinds of perceptually important degradation. One is noise between adjacent harmonics of the output speech - inter-harmonic noise - which results in roughness in voiced sound. The other is poor reproduction of the speech signal at high frequencies - high frequency mismatch. To remedy these degradations, we propose in this paper an improved weighting function which utilizes the spectral weighting methodology and also takes into account the periodic character of voiced sound. The function can adapt to variation of pitch by itself without any pitch estimation in voiced sound; it is also applicable to all speech segments without any voiced/unvoiced discrimination algorithm. Simulation results show that the performance of the CELP coder with the proposed weighting function is better than that of the conventional CELP coder.
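For context, the spectral weighting used in conventional CELP coders is commonly written as W(z) = A(z/g1) / A(z/g2), where A(z) is the LPC inverse filter and each A(z/g) is obtained by scaling the k-th coefficient by g to the power k. The sketch below shows only this conventional form, not the improved function proposed in the paper, whose details are not given in the abstract. The function names, the default factors and the example polynomial are assumptions for illustration.

```python
def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma): scale the k-th coefficient by gamma**k."""
    return [a_k * gamma ** k for k, a_k in enumerate(lpc)]

def weighting_filter(lpc, gamma_num=1.0, gamma_den=0.8):
    """Numerator and denominator of W(z) = A(z/gamma_num) / A(z/gamma_den)."""
    return bandwidth_expand(lpc, gamma_num), bandwidth_expand(lpc, gamma_den)

# Hypothetical second-order LPC polynomial A(z) = 1 - 1.2 z^-1 + 0.5 z^-2.
num, den = weighting_filter([1.0, -1.2, 0.5])
print("numerator:  ", num)
print("denominator:", den)
```

This conventional filter shapes noise according to the formant envelope only; it carries no pitch information, which is the gap the abstract says the proposed function addresses.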
Pasi Ojala, NRC (Finland)
This paper presents a source controlled variable-rate CELP type speech codec. First, a voice activity detection block distinguishes active speech frames from silence and background noise. The active speech is further classified into voiced and unvoiced frames. The voiced frames have variable bit-rate pitch-lag quantization based on the characteristics of the speech, whereas the unvoiced frames are coded without pitch information. A variable bit-rate fixed codebook excitation with a variable number of excitation pulses is determined for each speech frame. The performance of the linear analysis part of the codec as well as the input speech characteristics determine the excitation bit-rate. The average bit-rate of the codec is around 7.0 kbit/s for active speech, and the overall bit-rate ranges from 0 to 7.85 kbit/s. The described variable-rate codec produces toll quality speech equal to that of the 32 kbit/s ADPCM (G.726) standard.
Erdal Paksoy, Texas Instruments (U.S.A.)
Alan V. McCree, Texas Instruments (U.S.A.)
Vishu Viswanathan, Texas Instruments (U.S.A.)
In general, a variable rate coder can obtain the same speech quality as a fixed rate coder, while reducing the average bit rate. We have developed a variable-rate multimodal speech coder with an average bit rate of 3 kb/s for a speech activity factor of 80% and quality comparable to the GSM full rate coder. The coder has four coding modes and uses a robust classification method involving the pitch gain, zero crossings, and a peakiness measure. The coder also employs a novel gain-matched analysis-by-synthesis technique for very low rate coding of unvoiced frames and an improved noise-level-dependent postfilter. This paper describes the details of our algorithm and presents the results from subjective listening tests.
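Two of the classification features named in this abstract can be computed directly from a frame of samples. The sketch below is a minimal illustration, not the authors' classifier: the abstract does not define its peakiness measure, so the ratio of RMS value to mean absolute value used here is an assumed (though common) definition, and the function names are hypothetical.

```python
import math

def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))

def peakiness(frame):
    """RMS value divided by mean absolute value (assumed definition)."""
    if not frame:
        return 0.0
    mean_abs = sum(abs(x) for x in frame) / len(frame)
    if mean_abs == 0:
        return 0.0
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms / mean_abs

# A frame dominated by one pulse is "peakier" than an alternating one.
print(zero_crossings([1, -1, 1, -1]), peakiness([1, -1, 1, -1]))
print(zero_crossings([0, 0, 0, 4]), peakiness([0, 0, 0, 4]))
```

A high zero-crossing count suggests noise-like (unvoiced) content, while a high peakiness value suggests isolated pulses such as onsets; how the coder combines these with the pitch gain is not specified in the abstract.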
Kazunori Mano, NTT Human Interface Labs. (Japan)
This paper describes the design of a toll-quality 4-kbit/s speech coder based on phase-adaptive PSI-CELP. This adaptation method not only gives pitch periodicity to the random excitation but also synchronizes the basic point of the stored random vector with the pitch phase. We further improve the proposed coder by introducing a backward gain prediction scheme. In a subjective evaluation experiment, there is no significant difference between the quality of the ITU-T G.726 32-kbit/s coder and that of the proposed 4-kbit/s coder under the conditions of normal and low input levels and tandem connection for clean speech. In a noisy environment, the MOS results of an ACR test also show no significant difference between the G.726 and 4-kbit/s coders.
Soon Y. Kwon, TNI (U.S.A.)
Hochong Park, SEC (Korea)
Hyokang Chang, ComBasis (U.S.A.)
This paper describes "BI-CELP: baseline and implied CELP," a high quality speech coding method based on a code excited linear prediction (CELP) model whose excitation vectors are combined from two codebooks, one from the baseline codebook and the other from the implied codebook. In this method the index of the baseline codebook is coded and transmitted to the receiver, while the index of the implied codebook is extracted from the synthesized speech. This method has been applied to a lower rate voice coder at 8 Kbit/s to produce high quality voice comparable to that of the 16 Kbit/s G.728 LD-CELP. The performance of the 8 Kbit/s BI-CELP coder is measured in terms of SNRseg and MOS. The average SNRseg is 12.14 dB, which is 0.6 dB higher than that of the 8 Kbit/s G.729 CS-ACELP. The MOS for the quiet input is 3.8, which is 0.02 higher than that of G.729 CS-ACELP. The BI-CELP algorithm is implemented in real-time on a single TMS320C31 with 27 MIPS of CPU.
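SNRseg, the objective measure quoted in this abstract, is the average of per-frame signal-to-noise ratios expressed in dB. The sketch below is a minimal version under stated assumptions: the frame length is a free parameter, frames with zero signal energy or zero error are skipped, and no per-frame clamping is applied. The abstract does not say which of these conventions the authors used, and the function name is hypothetical.

```python
import math

def segmental_snr(original, decoded, frame_len):
    """Mean of per-frame SNRs in dB; frames with no signal or no error are skipped."""
    snrs = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        ref = original[start:start + frame_len]
        out = decoded[start:start + frame_len]
        signal = sum(s * s for s in ref)
        noise = sum((s - d) ** 2 for s, d in zip(ref, out))
        if signal > 0 and noise > 0:
            snrs.append(10.0 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frames with measurable SNR")
    return sum(snrs) / len(snrs)

# Toy signal: every sample is reproduced with a 10% amplitude error.
print(round(segmental_snr([1.0] * 4, [0.9] * 4, frame_len=2), 3))
```

Because the average is taken over dB values rather than over energies, quiet frames count as much as loud ones, which is why SNRseg tends to track perceived quality better than a single global SNR.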
Jayesh Patel, DSPSE (U.S.A.)
Pitch predictors are successfully used in Linear Prediction Analysis-by-Synthesis (LPAS) coders to model periodicity in speech. Various pitch predictor structures have been investigated and used in LPAS coders. In most low bit-rate LPAS coder designs, single-tap or three-tap pitch predictors are commonly used. Higher prediction gain can be achieved by using additional taps. A 5-tap pitch predictor is rarely used in low bit-rate speech coders because of the high complexity and the bandwidth required to encode the additional tap gains. This paper describes a technique for reducing the complexity and bandwidth requirement of a 5-tap pitch predictor.
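A multi-tap pitch predictor estimates each sample from a small window of past samples centred one pitch lag earlier, and the coder then works on the prediction residual. The sketch below is a generic illustration of that structure for 1, 3 or 5 taps; it is not the complexity-reduction technique of the paper, and the function name, argument layout and zero-history convention are assumptions.

```python
def pitch_prediction_residual(signal, lag, taps):
    """Residual e[n] = x[n] - sum_k taps[k] * x[n - lag + k - half].

    `taps` has odd length (1, 3 or 5); the centre tap sits at delay `lag`.
    Samples before the start of `signal` are treated as zero.
    """
    if len(taps) % 2 == 0:
        raise ValueError("taps must have odd length")
    half = len(taps) // 2
    if lag <= half:
        raise ValueError("lag must exceed the half-width of the predictor")
    residual = []
    for n, x_n in enumerate(signal):
        prediction = 0.0
        for k, b_k in enumerate(taps):
            idx = n - lag + k - half
            if idx >= 0:
                prediction += b_k * signal[idx]
        residual.append(x_n - prediction)
    return residual

# A perfectly periodic toy signal is cancelled once one full period is available.
periodic = [1, 2, 3, 4] * 3
print(pitch_prediction_residual(periodic, lag=4, taps=[0, 0, 1, 0, 0]))
```

The cost the abstract refers to is visible in the inner loop: each extra tap adds a multiply-accumulate per sample, and each extra gain is another parameter that has to be quantized and transmitted.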
Hong Kook Kim, SAIT (Korea)
Yong Duk Cho, SAIT (Korea)
Moo Young Kim, SAIT (Korea)
Sang Ryong Kim, SAIT (Korea)
This paper proposes a new 4 kbit/s speech coder based on the CELP structure with 45 ms total codec delay. The main features of the coder are the renewal codebook for the excitation signal and the linked split-vector quantizer for the LSPs, which enable the coder to achieve high quality speech at a low bit rate. In addition, formant enhancement of the spectral envelope and harmonic recovery in transient regions are introduced to reduce buzzy and hoarse sounds, respectively. In an intensive listening test with intermediate reference system (IRS) speech, we obtained subjective quality comparable to 32 kbit/s ADPCM (ITU Recommendation G.726) at a nominal speech input level of -26 dB overload.
Kari Järvinen, NRC (Finland)
Janne Vainio, NRC (Finland)
Pekka Kapanen, NRC (Finland)
Tero Honkanen, NRC (Finland)
Petri Haavisto, NRC (Finland)
Redwan Salami, USH (Canada)
Claude Laflamme, USH (Canada)
Jean-Pierre Adoul, USH (Canada)
This paper describes the GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system. The GSM EFR codec has been jointly developed by Nokia and the University of Sherbrooke. It provides speech quality at least equivalent to that of a wireline telephony reference (32 kbit/s ADPCM). The EFR codec uses 12.2 kbit/s for speech coding and 10.6 kbit/s for error protection. Speech coding is based on the ACELP algorithm (Algebraic Code Excited Linear Prediction). The codec provides substantial quality improvement compared to the existing GSM full rate and half rate codecs. The old GSM codecs lag behind wireline quality even in error-free channel conditions, while the EFR codec provides wireline quality not only for error-free conditions but also for the most typical error conditions. With the EFR codec, wireline quality is also sustained in the presence of background noise and in tandem connections (mobile to mobile calls).
Redwan Salami, University of Sherbrooke (Canada)
Claude Laflamme, University of Sherbrooke (Canada)
Bruno Bessette, University of Sherbrooke (Canada)
Jean-Pierre Adoul, University of Sherbrooke (Canada)
This paper describes the recently adopted ITU-T Recommendation G.729 Annex A (G.729A) for encoding speech signals at 8 kbit/s with low complexity. G.729A has been selected as the standard speech coding algorithm for multimedia digital simultaneous voice and data (DSVD). G.729A is bitstream interoperable with G.729; i.e., speech coded with G.729A can be decoded with G.729, and vice versa. Like G.729, it uses the CS-ACELP algorithm with 10 ms frames. However, several algorithmic changes have been introduced into G.729 which resulted in a 50% drop in its complexity, enabling a DSP implementation with a complexity of about 10-12 MIPS. This paper describes the algorithmic changes which have been introduced in order to achieve the low complexity goal while meeting the terms of reference. Subjective tests have been performed by ITU-T in both the selection phase and the characterization phase, and the results showed that the performance of G.729A is equivalent to both G.729 and G.726 at 32 kbit/s in most operating conditions; however, it is slightly worse in the case of three tandems and in the presence of background noise. A breakdown of the complexities of both G.729 and G.729A is given at the end of the paper.
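The bit rate and frame length stated in this abstract fix the number of bits in each coded frame. The short sketch below works this out; the helper name `bits_per_frame` is hypothetical and the inputs are the 8 kbit/s rate and 10 ms frame length quoted above.

```python
def bits_per_frame(bit_rate_bps, frame_ms):
    """Number of bits available for one frame at the given bit rate."""
    bits, remainder = divmod(bit_rate_bps * frame_ms, 1000)
    if remainder:
        raise ValueError("bit rate and frame length do not give a whole number of bits")
    return bits

print(bits_per_frame(8000, 10))  # 80
```

Since G.729A is bitstream interoperable with G.729, both coders must produce frames of this same size, so the complexity reduction has to come from how the parameters are searched rather than from how they are packed.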