Topics in Speech Coding I

Chair: R. Salami, University of Sherbrooke, Canada

A Two Stage Hybrid Embedded Speech/Audio Coding Structure

Authors:

Volume 1, Page 337, Paper number 1805

Abstract:

A two stage hybrid embedded speech/audio coding structure is proposed. The structure uses a speech coder as a core to provide the minimal bitrate and an acceptable performance on speech inputs. The second stage is a transform coder using an MDCT and perceptual coding principles. This stage is itself embedded both in complexity and bitrate, and provides various levels of enhancement of the core output, particularly for general audio signals like music. Informal A-B comparison tests show that the performance of the structure at 16 kb/s is between that of the GSM Enhanced Full Rate coder at 12.2 kb/s and the G.728 LD-CELP coder at 16 kb/s.

ic981805.pdf (From Postscript)

A Bitrate and Bandwidth Scalable CELP Coder

Authors:

Toshiyuki Nomura, NEC Corp - C&C Media Research Labs (Japan)
Masahiro Iwadare, NEC Corp - C&C Media Research Labs (Japan)
Masahiro Serizawa, NEC Corp - C&C Media Research Labs (Japan)
Kazunori Ozawa, NEC Corp - C&C Media Research Labs (Japan)

Volume 1, Page 341, Paper number 1469

Abstract:

This paper proposes a bitrate and bandwidth scalable CELP speech coder. The proposed coder is based on multi-pulse-based CELP coding and consists of a bitrate scalable base-band coder and a bandwidth extension tool. The bitrate scalable base-band CELP coder employs multi-stage excitation coding based on an embedded-coding approach. The multi-pulse excitation codebook at each stage is adaptively produced depending on the selected excitation signal at the previous stage. The bandwidth scalability is realized by bandwidth conversion from base-band CELP parameters to those for wideband, without the widely used subband structure. The bandwidth conversion simultaneously improves base-band coding quality and expands the bandwidth. The comparison test results show that the bitrate scalable coder is equivalent in speech quality to the fixed-bitrate CELP coder at the same bitrate for narrowband speech. In the MOS tests, the proposed 16 kbit/s coder with the bandwidth scalability achieves coding quality equivalent to ITU-T G.722 at 56 kbit/s.

ic981469.pdf (From Postscript)

Nonlinear Prediction with Neural Nets in ADPCM

Authors:

Marcos Faundez-Zanuy, Escola Universitaria Politecnica de Mataro (Spain)
Francesc Vallverdu, Signal Theory & Communications Dept. (Spain)
Enric Monte, Signal Theory & Communications Dept. (Spain)

Volume 1, Page 345, Paper number 1182

Abstract:

In recent years there has been a growing interest in nonlinear speech models. Several works have been published revealing the better performance of nonlinear techniques, but little attention has been dedicated to the implementation of the nonlinear model in real applications. This work focuses on the behaviour of a nonlinear predictive model based on neural nets in a speech waveform coder. Our novel scheme obtains an improvement in SEGSNR between 1 and 2 dB for an adaptive quantization ranging from 2 to 5 bits.
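
For readers unfamiliar with the SEGSNR figure quoted above, the following is a minimal sketch of one common definition of segmental SNR: the per-frame SNR in dB, averaged over frames. The function name, the default frame length, and the choice to skip frames with zero signal or zero error energy are illustrative assumptions, not details taken from the paper.

```python
import math


def segmental_snr(original, decoded, frame_len=160):
    """Average per-frame SNR in dB between two equal-length sample sequences."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have the same length")
    frame_snrs = []
    # Only complete frames are evaluated; a trailing partial frame is ignored.
    for start in range(0, len(original) - frame_len + 1, frame_len):
        ref = original[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(s * s for s in ref)
        noise_energy = sum((s - d) ** 2 for s, d in zip(ref, dec))
        # Assumption: silent or perfectly reconstructed frames are skipped.
        if signal_energy > 0.0 and noise_energy > 0.0:
            frame_snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not frame_snrs:
        raise ValueError("no frame with non-zero signal and error energy")
    return sum(frame_snrs) / len(frame_snrs)
```

Because each frame contributes equally regardless of its level, this measure weights quiet speech segments more heavily than a single SNR computed over the whole utterance.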

ic981182.pdf (Scanned)

Mach1: Nonuniform Time-Scale Modification of Speech

Authors:

Michele M Covell, Interval Research Corporation (U.S.A.)
Margaret M Withgott, Electric Planet (U.S.A.)
Malcolm G. Slaney, Interval Research Corporation (U.S.A.)

Volume 1, Page 349, Paper number 1515

Abstract:

We propose a new approach to nonuniform time compression, called Mach1, designed to mimic the natural timing of fast speech. At identical overall compression rates, listener comprehension for Mach1-compressed speech increased between 5 and 31 percentage points over that for linearly compressed speech, and response times dropped by 15%. For rates between 2.5 and 4.2 times real time, there was no significant comprehension loss with increasing Mach1 compression rates. In A-B preference tests, Mach1-compressed speech was chosen 95% of the time. This paper describes the Mach1 technique and our listener-test results. Audio examples can be found on http://www.interval.com/papers/1997-061/.

ic981515.pdf (From Postscript)

Speech Compression Based on Exact Modeling and Structured Total Least Norm Optimization

Authors:

Philippe Lemmerling, Katholieke Universiteit Leuven (Belgium)
Ioannis Dologlou, Katholieke Universiteit Leuven (Belgium)
Sabine Van Huffel, Katholieke Universiteit Leuven (Belgium)

Volume 1, Page 353, Paper number 1693

Abstract:

We present a new speech coding algorithm, based on an all-pole model of the vocal tract. Whereas current Auto Regressive (AR) based modeling techniques (e.g. CELP, LPC-10) minimize a prediction error, our approach determines the closest signal (in L2 norm) that exactly satisfies an all-pole model. Each frame is then encoded by storing the parameters of the complex damped exponentials deduced from the all-pole model and its initial conditions. Decoding is performed by adding the complex damped exponentials based on the transmitted parameters. The new algorithm is demonstrated on a speech signal. The quality is compared with that of a standard coding algorithm at comparable compression ratios, by using the segmental Signal-to-Noise Ratio (SNR).
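
The decoding step described above, summing damped exponentials rebuilt from transmitted parameters, might be sketched as follows. The parameterization (amplitude, phase, damping factor, and normalized frequency per component) and the function name are assumptions made for illustration. Each real damped cosine here stands for a complex-conjugate pair of damped exponentials; the paper's actual parameter set and quantization are not reproduced.

```python
import math


def decode_frame(components, length):
    """Rebuild a frame as a sum of damped cosines.

    components: iterable of (amplitude, phase, damping, frequency) tuples,
    with frequency in cycles per sample and damping >= 0 (hypothetical layout).
    """
    frame = []
    for n in range(length):
        sample = 0.0
        for amplitude, phase, damping, frequency in components:
            envelope = amplitude * math.exp(-damping * n)
            sample += envelope * math.cos(2.0 * math.pi * frequency * n + phase)
        frame.append(sample)
    return frame
```

For example, `decode_frame([(1.0, 0.0, math.log(2.0), 0.0)], 3)` yields a pure decay that halves at every sample.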

ic981693.pdf (From Postscript)

Gender Adapted Speech Coding

Authors:

David F Marston, Ensigma Ltd (U.K.)

Volume 1, Page 357, Paper number 1729

Abstract:

Speech coders that are optimized to the characteristics of a particular set of speakers will outperform a speech coder which caters for all speakers, provided that the speaker using it is one of that particular set. This paper describes how speech coders that are optimized to either male or female speech can be an improvement over unoptimised coders. The improvements are in bit-rate reduction, speech quality and robustness. A reliable gender identifier is described, which would be practical for the most demanding applications, achieving 95% accuracy after 1 second of speech. The improvements in terms of gender specific speech coding are shown in LSF quantisation with bit-saving, and pitch detection with both bit-saving and robustness.

ic981729.pdf (Scanned)

On Nonlinear Utilization of Intervector Dependency in Vector Quantization

Authors:

Mikael Skoglund, Royal Institute of Technology (Sweden)
Jan Skoglund, Chalmers University of Technology (Sweden)

Volume 1, Page 361, Paper number 2029

Abstract:

This paper presents an approach to vector quantization of sources exhibiting intervector dependency. We present the optimal decoder based on a collection of received indices. We also present the optimal encoder for such decoding. The optimal decoder can be implemented as a table look-up decoder; however, the size of the decoder codebook grows very fast with the size of the collection of utilized indices. This leads us to introduce a method for storing an approximation to the set of optimal decoder vectors, based on linear mapping of a block code vector quantization. In this approach a heavily reduced set of parameters is employed to represent the codebook. Furthermore, we illustrate that the proposed scheme has an interpretation as nonlinear predictive quantization. Numerical results indicate high gain over memoryless coding and memory quantization based on linear predictive coding. The results also show that the sub-optimal approach performs close to the optimal.

ic982029.pdf (From Postscript)

A Voice Activity Detector Employing Soft Decision Based Noise Spectrum Adaptation

Authors:

Jongseo Sohn, Seoul National University (Korea)
Wonyong Sung, Seoul National University (Korea)

Volume 1, Page 365, Paper number 2228

Abstract:

In this paper, a voice activity detector (VAD) for variable rate speech coding is decomposed into two parts, a decision rule and a background noise statistic estimator, which are analysed separately by applying a statistical model. A robust decision rule is derived from the generalized likelihood ratio test by assuming that the noise statistics are known a priori. To estimate the time-varying noise statistics, allowing for the occasional presence of the speech signal, a novel noise spectrum adaptation algorithm using the soft decision information of the proposed decision rule is developed. The algorithm is robust, especially for the time-varying noise such as babble noise.
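
The two parts named above, a likelihood-ratio decision rule and a soft-decision noise update, might look roughly like the sketch below. It assumes a complex Gaussian model per frequency bin with a maximum-likelihood SNR estimate, which gives the per-bin log likelihood ratio gamma - 1 - log(gamma), where gamma is the ratio of observed power to noise power. The function names, the threshold, the prior ratio, and the smoothing constant are all hypothetical; the abstract does not give the paper's exact formulas, so this should not be read as the authors' algorithm.

```python
import math


def bin_log_likelihood_ratios(power, noise):
    """Per-bin log likelihood ratio gamma - 1 - log(gamma), gamma = power / noise."""
    ratios = []
    for x, lam in zip(power, noise):
        gamma = max(x / lam, 1.0)  # clamp: the implied speech variance cannot be negative
        ratios.append(gamma - 1.0 - math.log(gamma))
    return ratios


def is_speech(power, noise, threshold=1.0):
    """Hard decision: compare the mean per-bin log likelihood ratio with a threshold."""
    ratios = bin_log_likelihood_ratios(power, noise)
    return sum(ratios) / len(ratios) > threshold


def soft_noise_update(power, noise, prior_ratio=1.0, smoothing=0.9):
    """Update the noise spectrum, weighting each bin by its noise-only probability."""
    updated = []
    ratios = bin_log_likelihood_ratios(power, noise)
    for x, lam, llr in zip(power, noise, ratios):
        # Posterior probability of "noise only"; the exponent is capped to avoid overflow.
        p_noise = 1.0 / (1.0 + prior_ratio * math.exp(min(llr, 700.0)))
        expected_noise = p_noise * x + (1.0 - p_noise) * lam
        updated.append(smoothing * lam + (1.0 - smoothing) * expected_noise)
    return updated
```

The point of the soft weighting is that the noise estimate keeps adapting in every frame, but bins that look strongly like speech contribute almost nothing, so occasional speech does not corrupt the noise spectrum.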

ic982228.pdf (From Postscript)

Towards a Synergistic Multistage Speech Coder

Authors:

Manohar N Murthi, University of California, San Diego (U.S.A.)
Bhaskar D. Rao, University of California, San Diego (U.S.A.)

Volume 1, Page 369, Paper number 2243

Abstract:

In this paper, we propose some new modeling techniques that provide a more synergistic approach to multistage time-domain speech compression. In particular, we propose a new error criterion for determining all-pole filters, and a unique method for jointly coding the pulse information in excitation vectors. The new error criterion for determining all-pole filters is based upon minimizing the sum of the residual signal's absolute values raised to a power less than one. It is shown to be a desirable cost function for yielding residual signals that are more sparse, and consequently better suited for multistage compression than Linear Prediction residuals. Statistical reasons supporting the new criterion are also provided. Furthermore, exploiting the properties of, and the relationship between, the Linear Prediction and Minimum Variance spectra, we propose a novel parameter set for jointly coding the excitation vector's pulse position, sign, and gain information.
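
To illustrate why a power below one favours sparse residuals, the sketch below evaluates the cost described above (the sum of absolute residual values raised to a power p) for a given set of predictor coefficients. It only evaluates the criterion; it does not perform the minimization over filter coefficients, and the function names and the zero-history convention are assumptions for illustration.

```python
def prediction_residual(signal, coeffs):
    """Residual e[n] = s[n] - sum_k coeffs[k-1] * s[n-k], assuming zero history before n = 0."""
    residual = []
    for n, sample in enumerate(signal):
        predicted = sum(
            a_k * signal[n - k]
            for k, a_k in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(sample - predicted)
    return residual


def power_cost(residual, p):
    """Sum of absolute residual values raised to the power p."""
    return sum(abs(e) ** p for e in residual)
```

Two residuals with the same energy, `[2, 0, 0, 0]` and `[1, 1, 1, 1]`, have the same cost for p = 2, but for p = 0.5 the sparse one costs about 1.41 against 4 for the dense one, so a minimizer of the p < 1 criterion prefers the sparse residual.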

ic982243.pdf (From Postscript)

Robust Speech Decoding: Can Error Concealment Be Better Than Error Correction?

Authors:

Tim Fingscheidt, Aachen University of Technology (Germany)
Peter Vary, Aachen University of Technology (Germany)
Jesus A. Andonegui, Aachen University of Technology (Germany)

Volume 1, Page 373, Paper number 1207

Abstract:

Digital speech transmission systems use source coding to reduce the bit rate and channel coding to correct transmission errors. Furthermore, in periods of very poor channel quality, error concealment of residual bit errors becomes necessary as channel decoding fails. However, if the channel is clear, channel coding would not be required at all and the speech quality could be improved by allowing a higher bit rate for source encoding. Usually a compromise is made between speech quality in the case of a clear channel and error robustness in the case of poor channel quality. This paper addresses the problem of a joint optimization of error concealment and source/channel coding. Under the premise of a minimum mean square error criterion for signal reconstruction, it turns out that error concealment instead of error correction may be the best choice if the source coder, by reducing the bit rate less aggressively, leaves sufficient residual parameter correlations.