Speech Coding at Low Bit Rate

Efficient Mixed Excitation Models in LPC Based Prototype Interpolation Speech Coders

Authors:

Charalampos Papanastasiou, University of Manchester (U.K.)
Costas Xydeas, University of Manchester (U.K.)

Volume 2, Page 1555

Abstract:

This paper presents a new and efficient method for modelling voiced, mixed excitation spectra in Sinusoidal Coding (SC) and Prototype Interpolation Coding (PIC) systems. Speech harmonics are classified as weak-voiced or strong-voiced simply by examining the short-term residual magnitude spectrum. This information is encoded effectively in terms of fixed-width frequency bands and is used to control sets of periodic and random sine wave oscillators, which model the short-term mixed excitation nature of speech. In this way, periodic and random signal energy can be mixed on a per-harmonic basis. The proposed method has been used in a 2.4 kbit/s speech coder whose recovered speech quality is better than that of the 4.8 kbit/s DoD standard.
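A minimal Python sketch of the synthesis side, assuming the per-band voicing flags have already been decoded. Each harmonic of `f0_hz` is assigned to a fixed-width band of `band_hz` Hz; strong-voiced bands drive oscillators with a coherent (zero) phase, while weak-voiced bands get a random phase. The function name, the random-phase realisation of the "random" oscillators and the unit amplitudes are illustrative assumptions, not the authors' design.

```python
import math
import random

def mixed_excitation(f0_hz, band_voiced, band_hz, fs_hz, n_samples, seed=0):
    """Sum of harmonic sine-wave oscillators for one frame.

    Harmonics in strong-voiced bands use phase 0 (periodic oscillators);
    harmonics in weak-voiced bands use a random phase, fixed for the frame
    (an assumed stand-in for the paper's random oscillators).
    """
    rng = random.Random(seed)
    oscillators = []  # (angular frequency in rad/sample, phase)
    k = 1
    while k * f0_hz < fs_hz / 2:
        freq = k * f0_hz
        # Fixed-width band index; harmonics past the last band use the last flag.
        band = min(int(freq // band_hz), len(band_voiced) - 1)
        phase = 0.0 if band_voiced[band] else rng.uniform(0.0, 2.0 * math.pi)
        oscillators.append((2.0 * math.pi * freq / fs_hz, phase))
        k += 1
    return [sum(math.cos(w * n + p) for w, p in oscillators)
            for n in range(n_samples)]
```

With every band voiced, all harmonics add in phase once per pitch period; marking bands as weak-voiced spreads that energy out, which is the per-harmonic mixing the abstract describes.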

ic971555.pdf

High Quality Split-Band LPC Vocoder Operating at Low Bit Rates

Authors:

Ian Atkinson, University of Surrey Centre for Satellite Eng. Research (U.K.)
Suat Yeldener, University of Surrey Centre for Satellite Eng. Research (U.K.)
Ahmet Kondoz, University of Surrey Centre for Satellite Eng. Research (U.K.)

Volume 2, Page 1559

Abstract:

LPC-based speech coders operating at bit rates below 3.0 kbits/sec are usually associated with buzzy or metallic artefacts in the synthetic speech. These are mainly attributable to the simplifying assumptions about the excitation source that are usually required to maintain such low bit rates. In this paper a new LPC vocoder is presented which splits the LPC excitation into two frequency bands using a variable cut-off frequency. The lower band represents the voiced parts of speech, whilst the upper band represents unvoiced speech. In doing so, the coder's performance during both mixed-voicing speech and speech containing acoustic noise is greatly improved, producing soft, natural-sounding speech. The paper also describes new parameter determination and quantisation techniques vital to the operation of this coder at such low bit rates.
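A hedged Python sketch of one way to build a split-band excitation frame in the DFT domain: unit-magnitude harmonic lines below the cut-off and flat, random-phase noise above it, followed by a (deliberately naive) inverse DFT. The function name, unit magnitudes and DFT-domain construction are assumptions for illustration; the abstract does not specify how the two bands are generated or how the cut-off is chosen.

```python
import cmath
import math
import random

def split_band_excitation(f0_hz, cutoff_hz, fs_hz, n_fft, seed=0):
    """Return (spectrum, real time-domain frame) for one excitation frame."""
    rng = random.Random(seed)
    half = n_fft // 2
    bin_hz = fs_hz / n_fft
    cutoff_bin = int(round(cutoff_hz / bin_hz))
    spec = [0j] * n_fft
    # Lower (voiced) band: a line at the bin nearest each harmonic of f0.
    k = 1
    while k * f0_hz < cutoff_hz:
        spec[int(round(k * f0_hz / bin_hz))] = 1.0 + 0j
        k += 1
    # Upper (unvoiced) band: every bin has unit magnitude and a random phase.
    for b in range(cutoff_bin, half):
        spec[b] = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    # Hermitian symmetry so that the inverse DFT is real-valued.
    for b in range(1, half):
        spec[n_fft - b] = spec[b].conjugate()
    # Naive O(N^2) inverse DFT, kept dependency-free for the sketch.
    frame = [sum(spec[b] * cmath.exp(2j * math.pi * b * n / n_fft)
                 for b in range(n_fft)).real / n_fft
             for n in range(n_fft)]
    return spec, frame
```

Moving `cutoff_hz` up or down shifts energy between the periodic and noise-like parts, which is the role the variable cut-off plays in the coder.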

ic971559.pdf

1462_a.wav: example of coded speech at 2.5 kbits/sec.

Non-linear Techniques for Pitch and Waveform Enhancement in PWI Coders

Authors:

Hui Li, University of Leeds (U.K.)
Gordon B. Lockhart, University of Leeds (U.K.)

Volume 2, Page 1563

Abstract:

Two non-linear interpolation techniques are introduced for enhancing speech reproduction in Prototype Waveform Interpolation (PWI) and similar encoders. A Temporal Differential Rate (TDR) vector is used to characterise the non-uniform evolution of pitch cycle temporal structure during interpolation. Experimental results show a clear improvement in the accuracy of decoded pitch cycle lengths and in the reproduction of periodicity in general. It is also shown that waveform reproduction can be significantly improved by vector quantising sets of Optimal Combination Coefficients (OCC) aimed at maximising the similarity between interpolated and target signal segments. OCC derived from both time-domain waveform similarity and frequency-domain spectral envelope similarity are tested. Subjective assessment suggests a general preference for the non-linear interpolation methods, and the scheme using frequency-domain-derived OCC with perceptual weighting was the most preferred.
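To make the idea of combination coefficients concrete, here is a small Python sketch assuming the interpolated segment is a weighted sum `a*prev + b*nxt` of the two surrounding prototypes and that "similarity" means minimum squared time-domain error. Both the two-prototype form and the least-squares criterion are assumptions; the paper's OCC definition, its frequency-domain variant and the vector quantisation step are not reproduced here.

```python
def optimal_combination(prev, nxt, target):
    """Least-squares weights (a, b) so that a*prev + b*nxt best matches target.

    Solves the 2x2 normal equations
        [r00 r01] [a]   [c0]
        [r01 r11] [b] = [c1]
    by Cramer's rule.
    """
    r00 = sum(p * p for p in prev)
    r11 = sum(q * q for q in nxt)
    r01 = sum(p * q for p, q in zip(prev, nxt))
    c0 = sum(p * t for p, t in zip(prev, target))
    c1 = sum(q * t for q, t in zip(nxt, target))
    det = r00 * r11 - r01 * r01
    if abs(det) < 1e-12:
        raise ValueError("prototypes are (nearly) linearly dependent")
    a = (c0 * r11 - c1 * r01) / det
    b = (c1 * r00 - c0 * r01) / det
    return a, b
```

In a coder, a set of such `(a, b)` pairs per frame would be what gets vector-quantised, replacing the fixed linear ramp of conventional interpolation.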

ic971563.pdf

Multi-Prototype Waveform Coding Using Frame-by-Frame Analysis-by-Synthesis

Authors:

Ian S. Burnett, University of Wollongong (Australia)
Duong H. Pham, University of Wollongong (Australia)

Volume 2, Page 1567

Abstract:

A new mechanism for using Analysis-by-Synthesis techniques in low rate Waveform Interpolation based coders is introduced. The algorithm, implemented as part of a Multi-Prototype Waveform coder, exploits the high quality speech produced by interpolating unquantised speech-domain Prototype Waveforms. In the new scheme, a frame of Prototype Waveforms is quantised using two sets of codebook searches, one representing the slowly evolving prototype shape and the other the rapid, noisy components. The scheme offers performance advantages over the previous open-loop Multi-Prototype Waveform coder, particularly when perceptual weighting is incorporated in the search. Reductions in search complexity and the use of the scheme for quantisation at higher rates are also considered. This results in a generalised Analysis-by-Synthesis Waveform Interpolation architecture with closed-loop optimisation of all Prototype Waveform properties.
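The two codebook searches can be illustrated with a generic sequential gain-shape search under a weighted squared error, sketched below in Python. This is a simplified stand-in, not the paper's frame-by-frame closed-loop procedure: the "shape" codebook is searched first, and the "noise" codebook is then searched on what remains. Codebook contents, the per-sample `weights` (standing in for perceptual weighting) and the function names are hypothetical.

```python
def best_entry(codebook, target, weights):
    """Return (index, gain, error) minimising sum(w * (t - g*c)^2),
    using the optimal gain g for each codebook entry."""
    best = None
    for i, c in enumerate(codebook):
        num = sum(w * t * x for w, t, x in zip(weights, target, c))
        den = sum(w * x * x for w, x in zip(weights, c))
        if den == 0:
            continue
        g = num / den
        err = sum(w * (t - g * x) ** 2 for w, t, x in zip(weights, target, c))
        if best is None or err < best[2]:
            best = (i, g, err)
    return best

def two_stage_search(shape_cb, noise_cb, target, weights):
    """Search the slowly evolving shape first, then the noisy component
    on the weighted residual left by the first stage."""
    i, g1, _ = best_entry(shape_cb, target, weights)
    residual = [t - g1 * x for t, x in zip(target, shape_cb[i])]
    j, g2, err = best_entry(noise_cb, residual, weights)
    return (i, g1), (j, g2), err
```

A joint search over both codebooks would find a lower error but costs far more; the sequential form shows one way search complexity can be reduced.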

ic971567.pdf

Multiband Prototype Waveform Analysis for Very Low Bit Rate Speech Coding

Authors:

Khashayar Yaghmaie, University of Surrey (U.K.)
Ahmet Kondoz, University of Surrey (U.K.)

Volume 2, Page 1571

Abstract:

Prototype waveform interpolation is one of the most efficient compression techniques for coding the speech signal at bit rates below 4 kb/s. Most PWI coders employ prototype waveforms of the linear predictive residual signal for coding purposes. In the latest PWI systems, decomposition methods are used to separate the voiced and unvoiced components of the prototype waveforms prior to coding. This has resulted in high quality speech at very low bit rates. This paper presents a novel combination of Multiband voicing analysis and PWI coding, in which the Multiband analysis is exploited to identify the voiced and unvoiced spectral components of the prototype waveforms of the original speech signal. To produce high-quality synthetic speech, the energy variation of the original signal is recovered by transmitting its energy envelope. The result is a high-quality, low-complexity coder operating at 2.55 kb/s.
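A minimal Python sketch of transmitting and re-applying an energy envelope, assuming the envelope is the RMS value of fixed-length subframes and the decoder simply rescales each synthetic subframe to match it. The subframe length, the RMS measure and the function names are assumptions; the abstract does not say how the envelope is computed or quantised.

```python
import math

def energy_envelope(frame, subframe_len):
    """RMS value of each complete subframe of the original signal."""
    env = []
    for start in range(0, len(frame) - subframe_len + 1, subframe_len):
        seg = frame[start:start + subframe_len]
        env.append(math.sqrt(sum(x * x for x in seg) / subframe_len))
    return env

def apply_envelope(synth, env, subframe_len):
    """Rescale each synthetic subframe so its RMS matches the transmitted value."""
    out = []
    for k, target_rms in enumerate(env):
        seg = synth[k * subframe_len:(k + 1) * subframe_len]
        rms = math.sqrt(sum(x * x for x in seg) / subframe_len)
        scale = target_rms / rms if rms > 0 else 0.0
        out.extend(x * scale for x in seg)
    return out
```

Because interpolated prototypes tend to smooth out energy changes within a frame, re-imposing a subframe envelope restores onsets and decays that interpolation alone would blur.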

ic971571.pdf

A Formant Vocoder based on Mixtures of Gaussians

Authors:

Parham Zolfaghari, Cambridge University (U.K.)
Tony Robinson, Cambridge University (U.K.)

Volume 2, Page 1575

Abstract:

This paper describes a new low bit-rate formant vocoder. The formant parameters are represented by Gaussian mixture distributions, which are estimated from the discrete Fourier transform (DFT) magnitude spectrum of the speech signal. A voiced/unvoiced classification mechanism has been developed based on the harmonic nature of each formant in the DFT spectrum modulated by the Gaussian Mixture distribution. Using a magnitude-only sinusoidal synthesiser, intelligible synthetic speech has been obtained. Vector quantisation of the vocal tract parameters enables this formant vocoder to operate at bit-rates down to 1248 bps.
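One plausible way to estimate Gaussian mixture parameters from a DFT magnitude spectrum is to treat the spectrum as a histogram over frequency and run weighted expectation-maximisation (EM), as in this Python sketch. Treating magnitudes as EM weights, the caller-supplied initial means and width, and the function name are assumptions, not the authors' estimator; each fitted component's mean, variance and weight then plays the role of one formant's centre frequency, bandwidth and relative level.

```python
import math

def fit_spectral_gmm(freqs, mags, init_means, init_std, n_iter=50, min_var=1.0):
    """Weighted EM for a 1-D Gaussian mixture over frequency.

    freqs: bin centre frequencies (Hz); mags: non-negative magnitudes used
    as weights. Returns (means, variances, priors).
    """
    k = len(init_means)
    means = list(init_means)
    variances = [init_std ** 2] * k
    priors = [1.0 / k] * k
    total = sum(mags)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency bin.
        resp = []
        for f in freqs:
            dens = [priors[j] / math.sqrt(2.0 * math.pi * variances[j])
                    * math.exp(-(f - means[j]) ** 2 / (2.0 * variances[j]))
                    for j in range(k)]
            s = sum(dens) or 1e-300
            resp.append([d / s for d in dens])
        # M-step: magnitude-weighted updates of means, variances and priors.
        for j in range(k):
            w = [m * r[j] for m, r in zip(mags, resp)]
            wsum = sum(w) or 1e-300
            means[j] = sum(wi * f for wi, f in zip(w, freqs)) / wsum
            var = sum(wi * (f - means[j]) ** 2 for wi, f in zip(w, freqs)) / wsum
            variances[j] = max(var, min_var)
            priors[j] = wsum / total
    return means, variances, priors
```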

ic971575.pdf

Natural Quality Variable-Rate Spectral Speech Coding Below 3.0 kbps

Authors:

Engin Erzin, Lucent Technologies (U.S.A.)
Arun Kumar, UC, Santa Barbara (U.S.A.)
Allen Gersho, UC, Santa Barbara (U.S.A.)

Volume 2, Page 1579

Abstract:

We propose new techniques for natural-quality variable-rate spectral speech coding at an average rate of 2.2 kbps for dialog speech and 2.8 kbps for monolog speech. The coder models the Fourier spectrum of each frame and builds on recent enhancements to the classical multiband excitation (MBE) approach. New techniques for robust pitch estimation and tracking, for efficient quantization of voiced and unvoiced spectra, and for encoding partial phase information are the key features that improve quality over earlier spectral vocoders. Reported subjective results show that the coder is very close in quality to the ITU-T G.723.1 algorithm at 5.3 kbps.

ic971579.pdf

2118_a.wav Female-1 original
2118_b.wav Female-1 coded
2118_c.wav Female-2 original
2118_d.wav Female-2 coded
2118_e.wav Male-1 original
2118_f.wav Male-1 coded
2118_g.wav Male-2 original
2118_h.wav Male-2 coded

A New 2-kbit/s Speech Coder Based on Normalized Pitch Waveform

Authors:

Yuusuke Hiwasaki, NTT Human Interface Labs (Japan)
Kazunori Mano, NTT Human Interface Labs (Japan)

Volume 2, Page 1583

Abstract:

Speech coding at very low bitrates is useful for purposes such as voice communication over computer networks. However, CELP coders find it difficult to maintain high quality at around 2.0 kbit/s. In this paper, a speech coding model called `normalized pitch waveform' and its quantization scheme are presented, aiming at effective compression of `voiced' speech. Listening tests show that efficient, high-quality coding has been achieved at 2.0 kbit/s, less than half the bitrate of FS1016 (4.8 kbit/s). Furthermore, this paper discusses a disadvantage of the normalized pitch waveform and presents an alternative method using a non-normalized pitch waveform. Encoding of a transitional `mixed' state between the `voiced' and `unvoiced' states is discussed for further improvement.
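As an illustration only, the Python sketch below assumes that "normalizing" a pitch waveform means resampling each pitch cycle to a fixed number of samples, so that cycles of different lengths can share one quantizer; the decoder resamples back to the transmitted pitch period. That interpretation, the linear interpolation, the periodic wrap-around and the function name are all assumptions rather than the authors' definition.

```python
def resample_linear(cycle, out_len):
    """Linearly interpolate one pitch cycle onto out_len points.

    The cycle is treated as periodic, so the last output samples
    interpolate between the final input sample and the first one.
    """
    n = len(cycle)
    out = []
    for i in range(out_len):
        pos = i * n / out_len  # position in input-sample units
        k = int(pos)
        frac = pos - k
        a = cycle[k % n]
        b = cycle[(k + 1) % n]
        out.append(a + frac * (b - a))
    return out
```

Usage: `normalized = resample_linear(cycle, 64)` at the encoder and `resample_linear(decoded, pitch_period)` at the decoder, where 64 is an arbitrary fixed length chosen for the example.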

ic971583.pdf

A Comparison of the New 2400 bps MELP Federal Standard with Other Standard Coders

Authors:

Mary A. Kohler, U.S. D.o.D. (U.S.A.)

Volume 2, Page 1587

Abstract:

In 1996, the U.S. Department of Defense Digital Voice Processing Consortium (DDVPC) selected Texas Instruments' mixed excitation linear prediction (MELP) algorithm as the recommended new federal standard for 2400 bps voice communications. The algorithm selection process involved quality, intelligibility, communicability, and recognizability testing in many acoustic noise, error, and tandem conditions. Algorithm complexity was also measured. This paper compares the performance scores, diagnostic information, and complexity of MELP to the 4800 bps federal standard (FS1016) code excited linear prediction (CELP) algorithm, the 16 kbps continuously variable slope delta modulation (CVSD) algorithm, and the venerable federal standard (FIPS Pub. 137) 2400 bps linear predictive coding (LPC-10) algorithm.

ic971587.pdf

MELP: The New Federal Standard at 2400 bps

Authors:

Lynn M. Supplee, Department of Defense (U.S.A.)
Ronald P. Cohn, Department of Defense (U.S.A.)
John S. Collura, Department of Defense (U.S.A.)
Alan V. McCree, Texas Instruments (U.S.A.)

Volume 2, Page 1591

Abstract:

This paper describes the new U.S. Federal Standard at 2400 bps. The Mixed Excitation Linear Prediction (MELP) coder was chosen by the DoD Digital Voice Processing Consortium to replace the existing 2400 bps Federal Standard FS 1015 (LPC-10). This new standard provides equal or improved performance over the 4800 bps Federal Standard FS 1016 (CELP) at a rate equivalent to LPC-10. The MELP coder is based on the traditional LPC model, but includes additional features to improve its performance.

ic971591.pdf

Using a Perception-Based Frequency Scale in Waveform Interpolation

Authors:

Jes Thyssen, AT&T Labs (U.S.A.)
Bastiaan Kleijn, AT&T Labs (U.S.A.)
Roar Hagen, AT&T Labs (U.S.A.)

Volume 2, Page 1595

Abstract:

In speech coding it is important to focus the coding effort on the perceptually important features of the speech signal. This paper describes new quantization techniques which take advantage of current knowledge of human perception in speech coders. The new procedures exploit the frequency-dependent frequency resolution of the human auditory system. The methods are applied to the waveform interpolation (WI) coder, and their effectiveness is confirmed with experimental results. The principles described in the paper are not restricted to the WI coder, but are also applicable to many other speech coding algorithms.
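A small Python sketch of what a perception-based frequency scale can mean in practice: choosing frequencies that are uniformly spaced on a warped scale, so low frequencies (where hearing resolves finer detail) are sampled more densely than high ones. The mel formula `2595 * log10(1 + f/700)` is used here purely as an example warping; the paper's actual scale and how it is applied inside the WI quantizers are not specified in the abstract.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale approximation (chosen here as an example warping)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def perceptual_grid(n_points, f_max_hz):
    """n_points frequencies from 0 Hz to f_max_hz, uniformly spaced in mel."""
    m_max = hz_to_mel(f_max_hz)
    return [mel_to_hz(m_max * i / (n_points - 1)) for i in range(n_points)]
```

Sampling a spectral magnitude on such a grid before quantization spends more parameters where the ear is most sensitive, which is the general principle the abstract describes.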

ic971595.pdf

Very low complexity interpolative speech coding at 1.2 to 2.4 kbps

Authors:

Yair Shoham, Bell Laboratories, Lucent Technologies (U.S.A.)

Volume 2, Page 1599

Abstract:

The recently introduced waveform interpolation (WI) coders provide good-quality speech at low rates but may be too complex for commercial use. This paper proposes new approaches to low-complexity WI speech coding at rates of 1.2 and 2.4 kbps. The proposed coders are 4 to 5 times faster than previously reported ones. At 2.4 kbps, the complexity is about 7.5 and 2.5 MFLOPS for the encoder and decoder, respectively. At 1.2 kbps, the complexity is about 6 and 2.3 MFLOPS for the encoder and decoder, respectively. Informal subjective evaluation shows that, at 2.4 kbps, the quality is close to that of the high-complexity coders. The quality does not degrade significantly at 1.2 kbps and is considered sufficient for messaging applications.

ic971599.pdf

Modified Multiband Excitation Model at 2400 bps

Authors:

Michele Jamrozik, Clemson University (U.S.A.)
John Gowdy, Clemson University (U.S.A.)

Volume 2, Page 1603

Abstract:

This paper presents the Modified Multiband Excitation (MMBE) Model used for speech coding. In many MBE model coders, speech quality is degraded when incorrect voicing decisions are made, particularly for high-pitched female speakers. The MMBE addresses this issue with a modified voiced/unvoiced decision algorithm and a more robust pitch estimate. The listening quality of speech produced using the MMBE model is superior to that of the FS-1016 CELP coder and at least comparable with that of the MELP coder chosen as the new 2400 bps Federal Standard.

ic971603.pdf

Variable Bit Rate MBELP Speech Coding Via V/UV Distribution Dependent Spectral Quantization

Authors:

Eric W.M. Yu, City University of Hong Kong (Hong Kong)
Cheung-Fat Chan, City University of Hong Kong (Hong Kong)

Volume 2, Page 1607

Abstract:

A variable bit rate multiband excited linear predictive speech coder is proposed in this paper. The speech signal is compressed at bit rates ranging from 0.88 kbps to 2.6 kbps, depending on the mode of operation and the optimum V/UV transition frequency. An average bit rate of 1.24 kbps is achieved. The proposed coder improves speech quality by splitting non-stationary speech segments for analysis. The V/UV distribution of a short-time speech spectrum is represented efficiently by a V/UV transition frequency obtained by closed-loop minimisation. Depending on this transition frequency, the spectral envelope is quantized at a variable bit rate through embedded differential predictive scalar and vector quantization of the LSP parameters. The proposed spectral quantization scheme achieves a spectral distortion comparable to that of a fixed 24-bit 2-dimensional differential scalar quantization scheme.
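A hedged Python sketch of selecting a single V/UV transition frequency by minimising an error criterion. It assumes that, for each fixed-width band, the cost of modelling that band as voiced (`err_voiced`) and as unvoiced (`err_unvoiced`) is already known, for example from synthesising each option and comparing with the original. Bands below the transition are treated as voiced and bands above it as unvoiced. The cost inputs, band width and function name are hypothetical; the abstract does not describe the paper's actual closed-loop criterion.

```python
def best_transition(err_voiced, err_unvoiced, band_hz):
    """Return (transition_hz, total_cost) minimising
    sum(err_voiced[b] for b < c) + sum(err_unvoiced[b] for b >= c)
    over c = 0..n (c = number of bands modelled as voiced)."""
    n = len(err_voiced)
    best_c = 0
    best_cost = sum(err_unvoiced)  # c = 0: every band unvoiced
    cost = best_cost
    for c in range(1, n + 1):
        # Moving band c-1 from the unvoiced side to the voiced side.
        cost += err_voiced[c - 1] - err_unvoiced[c - 1]
        if cost < best_cost:
            best_c, best_cost = c, cost
    return best_c * band_hz, best_cost
```

The running-sum update evaluates all `n + 1` candidate transitions in linear time, so a single transition frequency (rather than one flag per band) can be chosen cheaply and then used to select the spectral quantization mode.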