Session W2D Wideband Speech Coding

Chairperson: Jean Pierre Martens, Univ. of Gent, Belgium

A 16-kbit/s Wideband Speech Codec Scalable with G.729

Authors: A. Kataoka, S. Kurihara, S. Sasaki, and S. Hayashi

NTT Human Interface Labs., 3-9-11, Midori-cho, Musashino-shi, Tokyo 180, Japan. Tel. +81 422 59 4707, FAX: +81 422 60 7811, E-mail: kata@splab.hil.ntt.co.jp

Volume 3 pages 1491 - 1494

ABSTRACT

A scalable wideband speech codec is proposed to improve flexibility in telecommunication networks. The coder is scalable with G.729 (the ITU 8-kbit/s standard). Its decoder can process the incoming bitstream at three bit rates (8, 12, and 16 kbit/s) and provide a choice of speech types (wideband and telephone-band). The codec has a split-band structure in which both bands are coded by analysis-by-synthesis techniques. This paper proposes two types of scalable codec: a separate one and a composite one. It also proposes a new method for predicting pitch (an additional adaptive codebook) that maintains scalability with the G.729 codec. Subjective testing for wideband speech showed that the quality of the proposed codec at 16 kbit/s is equivalent to that of the 64-kbit/s G.722, and at 12 kbit/s is better than that of the 48-kbit/s G.722. Testing further demonstrated that the 8-kbit/s coder provides high quality for telephone-band speech.
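As a rough illustration of the three-rate scalability described in this abstract, the sketch below maps a received frame size to the highest decodable rate and its speech type. It is not the authors' implementation. The 10 ms frame length is taken from G.729 and is assumed here to apply at all three rates, the bitstream is assumed to be embedded (a lower-rate frame is a prefix of a higher-rate frame), and the names `bits_per_frame` and `decodable_rate` are hypothetical.

```python
# Minimal sketch of rate selection in a scalable decoder.
# Assumptions (not stated in the abstract): 10 ms frames at every rate,
# and an embedded bitstream in which lower-rate frames are prefixes.

FRAME_MS = 10  # G.729 uses 10 ms frames; assumed for the 12 and 16 kbit/s modes too

# Rates and speech types as reported in the abstract.
SPEECH_TYPE = {8: "telephone-band", 12: "wideband", 16: "wideband"}


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Bits in one frame: kbit/s multiplied by ms gives bits."""
    return rate_kbps * frame_ms


def decodable_rate(received_bits, frame_ms=FRAME_MS):
    """Return the highest supported rate whose frame fits in received_bits,
    or None if even the 8 kbit/s core frame is incomplete."""
    best = None
    for rate in sorted(SPEECH_TYPE):
        if bits_per_frame(rate, frame_ms) <= received_bits:
            best = rate
    return best


if __name__ == "__main__":
    for bits in (79, 80, 130, 160):
        rate = decodable_rate(bits)
        kind = SPEECH_TYPE.get(rate, "undecodable")
        print(bits, rate, kind)
```

Under these assumptions, a network node could drop the trailing bits of a frame to lower the rate without re-encoding, which is the kind of flexibility the abstract refers to.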

A0109.pdf

COMPARISON OF AUDITORY MASKING MODELS FOR SPEECH CODING

Authors: M. Lynch, E. Ambikairajah and A. Davis*

Speech Research Group, Department of Electronic Engineering, Regional Technical College, Athlone, Ireland. * BT Laboratories, Martlesham Heath, Ipswich IP5 7RE, U.K. Tel.: +353 902 24542, FAX: +353 902 24493, E-mail: eambi@server1.rtc-athlone.ie

Volume 3 pages 1495 - 1498

ABSTRACT

In this paper, various auditory masking models recently developed for audio coding are compared and evaluated for telephone-bandwidth speech coding applications. Four such models are outlined and their performance is evaluated using a Wavelet Packet Transform-based subband coder. The models are compared on the basis of the resulting perceptual speech quality and bit-rate requirements. Results show that masking models 3 and 4 outlined in this paper provide near-transparent quality at the lowest bit rates.

A0579.pdf

WIDEBAND SPEECH CODING BASED ON THE MBE STRUCTURE

Authors: A. Amodio and G. Feng

Institut de la Communication Parlée, UPRESA 5009, INPG/ENSERG/Université Stendhal, B.P. 25, 38040 GRENOBLE CEDEX 09, FRANCE. Tel. +33 (0)4 76 82 41 20, FAX: +33 (0)4 76 82 43 35, E-mail: amodio@icp.grenet.fr

Volume 3 pages 1499 - 1502

ABSTRACT

This paper deals with adapting the MBE coder, initially developed for the telephone band, to wideband speech. Because the quality and bit-rate constraints differ between a wideband and a telephone-band coder, and because the signal characteristics of the two bands also differ, the coder structure must be reconsidered. Several improvements are proposed, some of which were already proposed for the telephone band, such as phonetic classification of the frames and multi-harmonic modelling of the spectrum. To reach good quality, especially for high-frequency voices, we also propose to model and synthesize, as part of the signal, the initial error between the synthetic and original spectra.

A0670.pdf

Perceptual Filter Comparisons for Wideband and FM Bandwidth Audio Coders

Authors: Marcos Perreau Guimaraes (1) , Nicolas Moreau (2) , Madeleine Bonnet (1)

(1) Universite Rene Descartes-Paris 5, UFR de Mathematiques et Informatique, 45 rue des Saints Peres, 75270 Paris Cedex 06, email: perm,bonnet@math-info.univ-paris5.fr (2) ENST/SIG, 46 rue Barrault, 75634 Paris Cedex 13, email: moreau@sig.enst.fr

Volume 3 pages 1503 - 1506

ABSTRACT

High quality music coders commonly use auditory masked thresholds to account for the characteristics of the human ear. Perceptual filters (based upon linear signal prediction used in speech coders) are compared to filters using masked thresholds. Using listening tests, we have noticed that the second method does not provide better perceptual results. A natural way of proceeding would be to define a better psychoacoustical model. However, an intermediate method is presented here which allows additional degrees of freedom in a standard technique. The roots of the whitening filter are treated individually.

A1101.pdf

Wideband Coding of Speech using Neural Network Gain Adaptation

Authors: Cheung-Fat Chan and Man-Tak Chu

Department of Electronic Engineering, City University of Hong Kong, 83, Tat Chee Avenue, Kowloon, HONG KONG. Email: eecfchan@cityu.edu.hk

Volume 3 pages 1507 - 1510

ABSTRACT

In this paper, a high-quality wideband speech coder is proposed. The coding structure resembles that of an LD-CELP coder; however, several novel improvements are made. The gain adapter for the stochastic codebook is driven by a neural network and updates the excitation gain in a sample-by-sample fashion. The purpose of incorporating a neural network is to exploit both the intra- and inter-frame correlation of the speech signal in a non-linear manner. A psychoacoustic model, instead of a simple perceptual weighting filter, is used to shape the quantization noise. Simulation results show that the proposed coder can achieve transparent coding of wideband speech at 16 kbps.

A1140.pdf

WIDEBAND-SPEECH APVQ CODING FROM 16 TO 32 KBPS

Authors: Josep M. Salavedra

Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Campus Nord UPC, Mòdul D5, Gran Capità s/n, 08034 BARCELONA, SPAIN. Phone: +34.3.4016440. Telefax: +34.3.4016447. E-mail: mia@gps.tsc.upc.es

Volume 3 pages 1511 - 1514

ABSTRACT

This paper describes a coding scheme for broadband speech (sampling frequency 16 kHz). We present a wideband speech encoder called APVQ (Adaptive Predictive Vector Quantization). It combines Subband Coding, Vector Quantization and Adaptive Prediction, as shown in Fig. 1. The speech signal is split into 16 subbands by means of a QMF filter bank, so every subband is 500 Hz wide. This APVQ encoder can be seen either as a vectorial extension of a conventional ADPCM encoder or as a scalar Subband AVPC encoder [1],[3]. In this scheme, a signal vector is formed with one sample of the normalized prediction error signal coming from different subbands, and it is then vector quantized. The prediction error signal is normalized by its gain, and the normalized prediction error signal is the input of the VQ; an adaptive Gain-Shape VQ is therefore considered. This APVQ encoder combines the advantages of Scalar Prediction and those of Vector Quantization. We evaluate wideband speech coding in the range from 1 to 2 bits/sample.
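The figures quoted in this abstract can be checked with a little arithmetic: a 16 kHz sampling rate gives an 8 kHz band, which 16 equal subbands divide into 500 Hz slices, and 1 to 2 bits/sample corresponds to the 16 to 32 kbps of the title. The sketch below also illustrates the gain-shape split in its simplest form. The function names are hypothetical, and the use of the RMS value as the gain is an assumption, since the abstract does not say how the gain is computed.

```python
import math

SAMPLE_RATE_HZ = 16_000  # from the abstract
NUM_SUBBANDS = 16        # from the abstract


def subband_width_hz(sample_rate_hz, num_subbands):
    """Width of each subband when the Nyquist band is split evenly."""
    return (sample_rate_hz / 2) / num_subbands


def bit_rate_kbps(bits_per_sample, sample_rate_hz):
    """Total bit rate in kbit/s for a given average bits per sample."""
    return bits_per_sample * sample_rate_hz / 1000


def gain_shape_split(vector):
    """Split a vector into a scalar gain and a unit-RMS shape.
    Assumption: the gain is the RMS value; the paper may define it differently."""
    gain = math.sqrt(sum(x * x for x in vector) / len(vector))
    if gain == 0.0:
        return 0.0, list(vector)
    return gain, [x / gain for x in vector]


if __name__ == "__main__":
    print(subband_width_hz(SAMPLE_RATE_HZ, NUM_SUBBANDS))  # subband width in Hz
    print(bit_rate_kbps(1, SAMPLE_RATE_HZ), bit_rate_kbps(2, SAMPLE_RATE_HZ))
    gain, shape = gain_shape_split([3.0, 4.0])
    print(gain, shape)
```

In a gain-shape quantizer of this kind, only the shape would be matched against the VQ codebook, while the gain is handled separately.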

A1359.pdf