Topics in Speech Coding I


Encoding of Speech Spectral Parameters Using the Adaptive Quantization Methods

Authors:

Insung Lee, Chungbuk National University (Korea)
Hong Chae Woo, Taegu University (Korea)

Volume 2, Page 1335

Abstract:

Efficient quantization methods for line spectrum pairs (LSP) that offer good performance, low complexity, and low memory requirements are proposed. The adaptive quantization method, which exploits the ordering property of LSP parameters, is used in a scalar quantizer and in a vector-scalar hybrid quantizer. The maximum quantization range of each LSP parameter is varied adaptively according to the quantized value of the previous-order LSP parameter. To maintain transparent speech quality, the proposed scalar quantization algorithm needs 31 bits/frame, which is 3 bits fewer than the conventional scalar quantization method with interframe prediction. The improved vector-scalar quantizer achieves an average spectral distortion of 1 dB using 26 bits/frame. The performance of the proposed quantization methods is also evaluated in the presence of channel errors.
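
The ordering-based range adaptation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the uniform quantizer, the function names, the per-parameter upper bounds, and the bit allocation are all assumptions chosen for illustration. The only idea taken from the abstract is that the range used for each LSP depends on the quantized value of the previous one.

```python
def quantize_lsp_adaptive(lsp, bits, upper_bounds):
    """Quantize ordered LSP frequencies one at a time.

    Assumption: each LSP is coded by a uniform quantizer whose lower range
    edge is the previously quantized LSP (0.0 for the first one) and whose
    upper edge is a fixed, hypothetical bound. upper_bounds must be
    strictly increasing so that every range has positive width.
    """
    indices, quantized = [], []
    prev = 0.0
    for value, b, hi in zip(lsp, bits, upper_bounds):
        levels = 2 ** b
        step = (hi - prev) / levels
        # Pick the cell containing the value, clipped to the valid index range.
        idx = max(0, min(levels - 1, int((value - prev) / step)))
        # Reconstruct at the cell midpoint; this becomes the next lower edge.
        prev = prev + (idx + 0.5) * step
        indices.append(idx)
        quantized.append(prev)
    return indices, quantized


def dequantize_lsp_adaptive(indices, bits, upper_bounds):
    """Decoder: rebuilds the same adaptive ranges from the indices alone."""
    quantized = []
    prev = 0.0
    for idx, b, hi in zip(indices, bits, upper_bounds):
        step = (hi - prev) / (2 ** b)
        prev = prev + (idx + 0.5) * step
        quantized.append(prev)
    return quantized


if __name__ == "__main__":
    # Hypothetical 4-parameter frame in Hz; real coders typically use more.
    lsp = [310.0, 620.0, 1150.0, 1800.0]
    bits = [3, 3, 4, 4]
    upper = [800.0, 1500.0, 2500.0, 3500.0]
    idx, q = quantize_lsp_adaptive(lsp, bits, upper)
    print(idx, q)
```

Because each reconstruction lies strictly above the previous one, the decoded LSPs in this sketch stay ordered by construction, and no bits are spent on values below the previous quantized parameter.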

ic971335.pdf



Optimal Transformation of LSP Parameters Using Neural Network

Authors:

Hai Le Vu, BME HIT (Hungary)
László Lois, BME HIT (Hungary)

Volume 2, Page 1339

Abstract:

In this paper, the intraframe correlation properties of Line Spectrum Pair (LSP) parameters are used to develop an efficient encoding algorithm based on the Karhunen-Loeve (KL) transformation. An important nonuniform statistical characteristic of LSP frequencies is investigated. Based on this nonuniform property, neural-network-based techniques for generating the transform vectors via system training are studied. Using a Principal Component Analysis (PCA) network to decorrelate the LSP coefficients, we show that these new approaches achieve distortion as good as or better than that of other methods for speech analysis-synthesis.

ic971339.pdf




Speech spectrum representation and coding using multigrams with distance

Authors:

Jan Cernocký, ESIEE (France)
Geneviève Baudoin, ESIEE (France)
Gérard Chollet, ENST (France)

Volume 2, Page 1343

Abstract:

Multigrams allow us to split a string of symbols into a stream of variable-length sequences. Because direct application of this method to vector-quantized speech spectra fails, we develop an extension of the method called modified multigrams, or multigrams with distance. The algorithm for training the modified multigram dictionary is presented along with experimental results. We found a significant improvement in the rate/distortion ratio in comparison to vector quantization with small codebooks. The method is less suitable for precise spectrum representation, and we see its application instead in speech segmentation or in very low bit rate coding.

ic971343.pdf




Incorporating Perception Into LSF Quantization - Some Experiments

Authors:

Ronald P. Cohn, U.S. DoD (U.S.A.)
John S. Collura, U.S. DoD (U.S.A.)

Volume 2, Page 1347

Abstract:

In the context of vector quantization (VQ) of the line spectrum frequency (LSF) parameters, we determine experimentally a spectral distribution of quantization error perceived to be "balanced", i.e., error at all frequencies contributing equally, on average, to the perceived distortion. Quantizers which have a balanced distribution should outperform those which don't, given the same number of bits. We examine the spectral error distributions produced by various weighted Euclidean distance measures in the LSF domain and develop one which produces a quantizer having an approximately balanced distribution. This quantizer's performance is compared with that of others having different error distributions.
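
A weighted Euclidean distance measure in the LSF domain, of the general kind this abstract examines, can be sketched as a codebook search. The weights, the two-entry codebook, and the function names below are hypothetical and are not taken from the paper. The sketch only shows how the choice of weights changes which codeword is selected, and therefore where the quantization error ends up.

```python
def weighted_sq_distance(x, y, w):
    """Weighted squared Euclidean distance between two LSF vectors."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))


def nearest_codeword(x, codebook, w):
    """Index of the codeword closest to x under the weighted distance."""
    return min(range(len(codebook)),
               key=lambda i: weighted_sq_distance(x, codebook[i], w))


if __name__ == "__main__":
    # Toy 2-dimensional example with made-up values.
    x = [1.0, 1.0]
    codebook = [[1.0, 3.0], [2.5, 1.0]]
    print(nearest_codeword(x, codebook, [1.0, 1.0]))  # unweighted search
    print(nearest_codeword(x, codebook, [4.0, 1.0]))  # first LSF emphasized
```

With equal weights the search accepts a larger error in the first component, while emphasizing the first component moves the error to the second one. This is the mechanism by which a weighting shapes the spectral error distribution.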

ic971347.pdf




Predictive VQ for Noisy Channel Spectrum Coding: AR or MA?

Authors:

Jan Skoglund, Chalmers University of Technology (Sweden)
Jan Lindén, Chalmers University of Technology (Sweden)

Volume 2, Page 1351

Abstract:

In this paper, the performance of different predictive vector quantization (PVQ) structures is studied and compared for different degrees of channel noise. Predictive quantization schemes with an auto-regressive (AR) decoder structure are compared with schemes that employ a moving average (MA) decoder. For noisy channels MA prediction performs better than AR. It is shown here that a combination of a PVQ scheme (AR or MA) and a memoryless VQ outperforms both types of traditional predictive quantizer schemes in noiseless as well as noisy channels.
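
One commonly cited reason for the robustness of MA prediction, which the abstract itself does not spell out, is that a channel error in an MA decoder affects only as many frames as the predictor has taps, whereas in an AR decoder it is fed back and decays gradually. The following minimal Python sketch shows this for scalar, first-order decoders. The predictor coefficients, frame count, and function names are assumptions made for illustration and do not come from the paper.

```python
def ar_decode(residuals, a):
    """First-order AR decoder: the output is fed back into the prediction."""
    out, prev = [], 0.0
    for e in residuals:
        prev = a * prev + e
        out.append(prev)
    return out


def ma_decode(residuals, b):
    """First-order MA decoder: the prediction uses only past residuals."""
    out, prev_e = [], 0.0
    for e in residuals:
        out.append(e + b * prev_e)
        prev_e = e
    return out


def error_trace(decode, coeff, n_frames=8, hit=2, delta=1.0):
    """Absolute decoder output error caused by corrupting one residual."""
    clean = [0.0] * n_frames
    corrupted = list(clean)
    corrupted[hit] += delta
    return [abs(c - d)
            for c, d in zip(decode(corrupted, coeff), decode(clean, coeff))]


if __name__ == "__main__":
    print("AR:", error_trace(ar_decode, 0.5))
    print("MA:", error_trace(ma_decode, 0.5))
```

In this toy setting the AR error halves every frame but never vanishes within the window, while the MA error is exactly zero two frames after the corrupted one.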

ic971351.pdf




Efficient Encoding of Mel-Generalized Cepstrum for CELP Coders

Authors:

Kazuhito Koishida, P&I Lab., Tokyo Institute of Technology (Japan)
Takao Kobayashi, P&I Lab., Tokyo Institute of Technology (Japan)
Satoshi Imai, P&I Lab., Tokyo Institute of Technology (Japan)
Keiichi Tokuda, Nagoya Institute of Technology (Japan)

Volume 2, Page 1355

Abstract:

In this paper, the performance of several algorithms for the quantization of the mel-generalized cepstral coefficients is studied. First, the objective and subjective performance of two-stage vector quantization (VQ) is measured. It is shown that subjective quality for the mel-generalized cepstral coefficients is higher than that for LSP. Secondly, interframe prediction is introduced in the encoding of mel-generalized cepstral coefficients. By utilizing interframe moving average (MA) prediction, the mel-generalized cepstral coefficients can be encoded more efficiently than LSP in terms of cepstral distortion. Finally, we implement a CELP coder based on mel-generalized cepstral analysis in which mel-generalized cepstral coefficients are quantized using MA prediction. This coder has higher objective quality than conventional CELP.

ic971355.pdf




A Candidate Coder for the ITU-T's New Wideband Speech Coding Standard

Authors:

Juin-Hwey Chen, Voxware Inc. (U.S.A.)

Volume 2, Page 1359

Abstract:

This paper presents AT&T's candidate coder for the ITU-T's new wideband speech coding standard at 16, 24 and 32 kb/s. This coder achieves high speech quality with a low coder complexity. The basic idea of the coder is to perform closed-loop pitch prediction on perceptually weighted speech, and then quantize the prediction residual using perceptually based transform coding techniques. A first version of the coder based on DFT was thoroughly tested and submitted to the ITU-T in February 1996, and it was selected as one of two surviving candidates to advance to the next phase. A revised version based on MDCT was later submitted in October 1996. Both versions are described in this paper.

ic971359.pdf



Perceptual Speech Coding Using Time and Frequency Masking Constraints

Authors:

Benito Carnero, LTS-DE, EPFL (Switzerland)
Andrzej Drygajlo, LTS-DE, EPFL (Switzerland)

Volume 2, Page 1363

Abstract:

This paper presents a new wide-band speech coding system based on a fast wavelet packet transform algorithm as well as a formulation of temporal and spectral psychoacoustic models of masking. The proposed FFT-like overlapped block orthogonal transform allows us to approximate the auditory critical band decomposition in an efficient manner, which is a major advantage over previous approaches that used uniform filter banks. As a result of such a decomposition, the perceptually tuned time-frequency structure of the original speech signal is preserved. This allows us to make use of the temporal and spectral properties of the human auditory system to decrease the average bit rate of the encoder, while perceptually hiding the quantization error.

ic971363.pdf




A Multi-band CELP Wideband Speech Coder

Authors:

Anil Ubale, UCSB (U.S.A.)
Allen Gersho, Pennsylvania State University (U.S.A.)

Volume 2, Page 1367

Abstract:

A novel low-delay wideband speech coder, called Multi-band CELP (MB-CELP), overcomes the major obstacles usually associated with the two traditional CELP approaches to wideband speech coding, namely fullband CELP and split-band CELP. The new MB-CELP coder employs a multi-band bank of off-line filtered excitation codebooks, fullband linear prediction synthesis, and minimization of the error between the original and synthesized speech signals over the full frequency range. A 16 kbps version of the MB-CELP coder with two equal bands is described in this paper. Subjective comparison test results show that this coder performs better than the G.722 coder at a bit rate of 48 kbps.

ic971367.pdf




A Design of Transform Coder for Both Speech and Audio Signals at 1 bit/sample

Authors:

Takehiro Moriya, NTT Human Interface Labs. (Japan)
Naoki Iwakami, NTT Human Interface Labs. (Japan)
Akio Jin, NTT Human Interface Labs. (Japan)
Kazunaga Ikeda, NTT Human Interface Labs. (Japan)
Satoshi Miki, NTT Human Interface Labs. (Japan)

Volume 2, Page 1371

Abstract:

This paper proposes a speech and audio coder which operates at 1 bit/sample, namely an 8 kbit/s coder for 8 kHz sampling or a 16 kbit/s coder for 16 kHz sampling. The basic structure is inherited from the TwinVQ (Transform domain Weighted Interleave Vector Quantization) high-quality audio coding scheme. A periodic component extraction scheme is newly added to the quantization of the MDCT coefficients. This scheme is found to be effective for reducing distortion and improving robustness against channel errors. Quality for music signals at 8 kbit/s is better than that of G.729 at the same bit rate, while it is worse for clean speech. Quality at 16 kbit/s is comparable to or better than that of G.722 at 48 kbit/s.

ic971371.pdf




Speech Quality Assessment of Compounded Digital Telecommunication Systems

Authors:

Kim Tilgaard Petersen, Tele Danmark A/S (Denmark)
Steffen Duus Hansen, Technical University of Denmark (Denmark)
John Aasted Sorensen, Technical University of Denmark (Denmark)

Volume 2, Page 1375

Abstract:

Digital telecommunication networks may involve multiple public switched telephone networks (PSTN), cellular and mobile systems, and, to some extent, satellite systems. Most of these networks contain non-linear speech coders and other speech algorithms which may degrade the overall end-to-end quality of speech. An important problem is how to assess the speech quality of such compounded systems. The object of this paper is to describe the first stage of the construction of a proposed three-layer model for speech quality assessment. A subjective test of the speech quality of 16 different compounded transmission paths (mixtures of PCM, GSM full and half rate, DECT, CELP, LD CELP, FS10-16) is carried out by 40 subjects using 21 different rating scales. The main result of this paper is the set of test results, which leads to the definition of four main perceptual dimensions to be used in the second layer of the proposed model.

ic971375.pdf




Performance Assessment of Tandem Connection of Cellular and Satellite-Mobile Coders

Authors:

Simão F. Campos Neto, COMSAT (U.S.A.)
Franklin L. Corcoran, COMSAT (U.S.A.)
Ara Karahisar, Teleglobe (Canada)

Volume 2, Page 1379

Abstract:

In the near future, 16 and 8 kbit/s toll- or near-toll low-rate codecs are expected to be used together with 32 kbit/s digital circuit multiplication equipment, providing speech compression and digital speech interpolation. Additionally, a growing proportion of international calls originate from different digital cellular/satellite mobile (C/SM) systems. Knowledge of the end-to-end voice quality of tandem connections is fundamental in the planning of international circuits. Previous studies assessed the tandem performance of cellular codecs and the fixed network; however, satellite-mobile systems were not included. This paper presents a subjective evaluation of the voice quality of tandem connections of C/SM codecs in seven basic scenarios. The study concludes that, if voice quality cannot be compromised, the number of codecs used in tandem should be minimized and network capacity has to be increased for a given traffic load. In extreme cases, calls originating from C/SM terminals should be transmitted using clear channels.

ic971379.pdf




The Consequences of Linguistic Perception on Low Rate Speech Coding

Authors:

John J. Parry, University of Wollongong (Australia)
Ian S. Burnett, University of Wollongong (Australia)

Volume 2, Page 1383

Abstract:

This paper considers the effect of languages and linguistic perception on low rate speech coding. Current algorithms exploit the redundancies of speech, but these redundancies are not common across all languages. Similarly, speech coder evaluation techniques do not take into account the nuances of linguistic perception across languages. This paper illustrates some of the linguistic sensitivities experienced by low-rate coders and explores approaches to low-rate coder design. This is achieved through an evaluation of cross-language spectral distortion measures which account for specific linguistic peculiarities influencing linguistic perception.

ic971383.pdf




Using a Quantitative Psychoacoustical Signal Representation for Objective Speech Quality Measurement

Authors:

Martin Hansen, University of Oldenburg (Germany)
Birger Kollmeier, University of Oldenburg (Germany)

Volume 2, Page 1387

Abstract:

This paper describes the application of a quantitative psychoacoustical signal preprocessing model for objective speech quality measurement. The preprocessing is applied to transform the original and the distorted speech signal to an internal representation which is thought of as the information that is accessible to higher neural stages of perception. From a comparison of these internal representations a quality measure can be derived that shows a high correlation to the subjective MOS data of various test data bases. The inherent parameters of the preprocessing model were derived directly from psychoacoustical data independent of the present study. The detection thresholds of codec-like distortions obtained in a psychoacoustical experiment could also be predicted by the model. This indicates that the internal representation contains the relevant information for detecting perceivable differences. It provides evidence for a direct relation between speech quality and detectability of a distortion.

ic971387.pdf
