ABSTRACT
A scalable wideband speech codec is proposed to improve flexibility in telecommunication networks. The coder is scalable with G.729 (the ITU-T 8-kbit/s standard): its decoder can process the incoming bitstream at three bit rates (8, 12, and 16 kbit/s) and offers a choice of speech bandwidths (wideband and telephone-band). The codec has a split-band structure in which both bands are coded by analysis-by-synthesis techniques. This paper proposes two types of scalable codec, a separate one and a composite one, as well as a new pitch-prediction method (an additional adaptive codebook) that maintains scalability with the G.729 codec. Subjective tests on wideband speech showed that the quality of the proposed codec at 16 kbit/s is equivalent to that of G.722 at 64 kbit/s, and at 12 kbit/s is better than that of G.722 at 48 kbit/s. Further testing demonstrated that the 8-kbit/s coder provides high quality for telephone-band speech.
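The decoder's ability to operate on the same stream at 8, 12, or 16 kbit/s suggests an embedded layered bitstream, where each frame is a core layer plus enhancement layers and any prefix is decodable. A minimal sketch of that idea follows; the layer sizes, byte packing, and function names are illustrative assumptions, not details of the actual G.729-based format:

```python
# Hypothetical embedded frame: an 8-kbit/s core layer plus two 4-kbit/s
# enhancement layers. For a 10-ms frame these rates give
# 80 + 40 + 40 bits = 10 + 5 + 5 bytes.
CORE_BYTES, ENH1_BYTES, ENH2_BYTES = 10, 5, 5

def pack_frame(core: bytes, enh1: bytes, enh2: bytes) -> bytes:
    """Concatenate layers so that any prefix is itself decodable."""
    assert (len(core), len(enh1), len(enh2)) == (CORE_BYTES, ENH1_BYTES, ENH2_BYTES)
    return core + enh1 + enh2

def truncate_for_rate(frame: bytes, rate_kbps: int) -> bytes:
    """A decoder (or network node) keeps only the layers its rate allows."""
    keep = {8: CORE_BYTES,
            12: CORE_BYTES + ENH1_BYTES,
            16: CORE_BYTES + ENH1_BYTES + ENH2_BYTES}[rate_kbps]
    return frame[:keep]

frame = pack_frame(b"\x01" * 10, b"\x02" * 5, b"\x03" * 5)
assert truncate_for_rate(frame, 8) == b"\x01" * 10
assert truncate_for_rate(frame, 16) == frame
```

The point of the embedded layout is that rate reduction requires no transcoding: a gateway can simply drop trailing bytes of each frame.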
ABSTRACT
In this paper, various auditory masking models recently developed for audio coding are compared and evaluated for telephone-bandwidth speech coding applications. Four such models are outlined, and their performance is evaluated using a wavelet-packet-transform-based subband coder. The models are compared on the basis of the resulting perceptual speech quality and bit-rate requirements. Results show that masking models 3 and 4, as outlined in this paper, provide near-transparent quality at the lowest bit rates.
ABSTRACT
This paper deals with the adaptation to wideband of the MBE coder, which was initially developed for the telephone band. Since the quality and bit-rate constraints of a wideband coder differ from those of a telephone-band coder, and since the signal characteristics of the two bands also differ, the coder structure must be reconsidered. Several improvements are proposed, some of which have already been proposed for the telephone band, such as phonetic classification of the frames and multi-harmonic modelling of the spectrum. To reach good quality, especially for high-frequency voices, we also propose modelling and synthesizing, as part of the signal, the initial error between the synthetic and original spectra.
ABSTRACT
High-quality music coders commonly use auditory masked thresholds to account for the characteristics of the human ear. Perceptual filters (based upon the linear signal prediction used in speech coders) are compared to filters derived from masked thresholds. Listening tests showed that the second method does not provide better perceptual results. A natural way of proceeding would be to define a better psychoacoustic model. However, an intermediate method is presented here that allows additional degrees of freedom within a standard technique: the roots of the whitening filter are treated individually.
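Treating the roots of the whitening filter individually can be sketched as per-root bandwidth expansion: factor A(z), scale the radius of each root by its own factor, and rebuild the coefficients. The sketch below is an assumption about the general mechanism, not the paper's specific rule for choosing the factors:

```python
import numpy as np

def move_roots(a: np.ndarray, radius_scale) -> np.ndarray:
    """Scale the radius of each root of the whitening filter
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p individually.

    `a` holds the coefficients [1, a1, ..., ap]; `radius_scale` gives one
    real factor per root (a single uniform factor recovers classical
    bandwidth expansion). Conjugate root pairs must share a factor so
    the rebuilt coefficients stay real.
    """
    roots = np.roots(a)
    scaled = np.array([s * r for r, s in zip(roots, radius_scale)])
    return np.poly(scaled).real

# Example: a stable 2nd-order filter with a complex-conjugate root pair.
a = np.array([1.0, -1.2, 0.8])          # roots at 0.6 +/- 0.663j, |r| < 1
b = move_roots(a, [0.9, 0.9])           # pull both roots toward the origin
assert np.all(np.abs(np.roots(b)) < 1)  # still minimum phase
```

Moving individual roots changes the bandwidth of individual spectral peaks of 1/A(z), which is the extra freedom the abstract refers to.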
ABSTRACT
In this paper, a high-quality wideband speech coder is proposed. The coding structure resembles an LD-CELP coder, but several novel improvements are made. The gain adapter for the stochastic codebook is driven by a neural network and updates the excitation gain in a sample-by-sample fashion. The purpose of incorporating a neural network is to exploit both the intra- and inter-frame correlation of the speech signal in a non-linear manner. A psychoacoustic model, instead of a simple perceptual weighting filter, is used to shape the quantization noise. Simulation results show that the proposed coder can achieve transparent coding of wideband speech at 16 kbit/s.
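A backward-adaptive gain adapter of the kind described can be sketched as a small network predicting the next excitation log-gain from previously quantized log-gains, so encoder and decoder stay synchronized without side information. The network size, weights, and input window below are purely illustrative, not the paper's trained model:

```python
import math

def tiny_mlp(x, W1, b1, w2, b2):
    """One-hidden-layer network with tanh units (weights are illustrative)."""
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    return sum(wi * hi for wi, hi in zip(w2, h)) + b2

def adapt_gain(past_log_gains, W1, b1, w2, b2):
    """Predict the next excitation log-gain from past quantized log-gains.

    Because the inputs are quantized values the decoder also has, it can
    run the same prediction, sample by sample, with no transmitted gain.
    """
    return tiny_mlp(past_log_gains, W1, b1, w2, b2)

# Hypothetical fixed weights; a real coder would train these offline.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
b1 = [0.0, 0.1]
w2 = [0.8, -0.6]
b2 = 0.0
g = adapt_gain([1.0, 0.9, 0.8], W1, b1, w2, b2)
```

The non-linear hidden layer is what lets such an adapter capture intra- and inter-frame gain correlation beyond what a linear backward predictor (as in standard LD-CELP) can express.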
ABSTRACT
This paper describes a coding scheme for wideband speech (16-kHz sampling frequency). We present a wideband speech encoder called APVQ (Adaptive Predictive Vector Quantization), which combines subband coding, vector quantization, and adaptive prediction, as represented in Fig. 1. The speech signal is split into 16 subbands by means of a QMF filter bank, so each subband is 500 Hz wide. The APVQ encoder can be seen either as a vectorial extension of a conventional ADPCM encoder or as a scalar subband AVPC encoder [1], [3]. In this scheme, the signal vector is formed from one sample of the normalized prediction-error signal of each subband and is then vector quantized. The prediction-error signal is normalized by its gain, and the normalized signal is the input of the VQ; an adaptive gain-shape VQ is therefore considered. The APVQ encoder combines the advantages of scalar prediction with those of vector quantization. We evaluate wideband speech coding in the range from 1 to 2 bits per sample.
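The gain-shape quantization step can be sketched as follows: the vector of subband prediction-error samples is split into a gain and a unit-energy shape, and the shape is matched against a codebook. Taking the RMS as the gain and using a toy four-dimensional codebook are assumptions for illustration; the paper's adaptive gain and 16-dimensional vectors are not reproduced here:

```python
import math

def gain_shape_vq(vec, shape_codebook):
    """Split a prediction-error vector into a gain (its RMS, assumed here)
    and a normalized shape, then pick the nearest codebook shape."""
    gain = math.sqrt(sum(x * x for x in vec) / len(vec)) or 1.0
    shape = [x / gain for x in vec]
    best = min(range(len(shape_codebook)),
               key=lambda i: sum((s - c) ** 2
                                 for s, c in zip(shape, shape_codebook[i])))
    return gain, best

# One sample from each of four hypothetical subbands forms the vector.
codebook = [[1.0, 1.0, 1.0, 1.0],
            [1.0, -1.0, 1.0, -1.0],
            [2.0, 0.0, 0.0, 0.0]]
gain, idx = gain_shape_vq([0.5, -0.5, 0.5, -0.5], codebook)
assert idx == 1 and abs(gain - 0.5) < 1e-12
```

Normalizing before quantization is what makes one shape codebook serve all signal levels, while the per-vector gain carries the level information separately.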