Speech Coding 1

Authors:

Takashi Masuko, Precision and Intelligence Laboratory, Tokyo Inst. of Tech. (Japan)
Keiichi Tokuda, Department of Computer Science, Nagoya Inst. of Tech. (Japan)
Takao Kobayashi, Interdisciplinary Graduate School of Science and Engineering, Tokyo Inst. of Tech. (Japan)

Page (NA) Paper number 777

Abstract:

This paper describes a speaker adaptation technique for a phonetic vocoder based on hidden Markov models (HMMs). In the vocoder, the encoder performs phoneme recognition and transmits phoneme indexes and state durations to the decoder, and the decoder synthesizes speech using an HMM-based speech synthesis technique. One of the main problems of this vocoder is that the voice characteristics of the synthetic speech depend on the HMMs used in the decoder and are therefore fixed regardless of the input speaker. To overcome this problem, we adapt the HMMs to the input speech by transmitting transfer vectors, which carry information on the mismatch between the input speech and the HMMs. The results of subjective tests show that the performance of the proposed vocoder without quantization of transfer vectors is comparable to that of a speaker-dependent vocoder.
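The idea of a transfer vector can be illustrated with a minimal sketch. It assumes, purely for illustration, that the mismatch is a mean offset between the input frames aligned to an HMM state and that state's mean vector, and that the decoder adds this offset back to the mean. The abstract does not specify how the transfer vectors are computed, so this is not necessarily the scheme used in the paper, and the function names `transfer_vector` and `adapt_mean` are hypothetical.

```python
def transfer_vector(frames, state_mean):
    """Average offset between observed frames and a state mean.

    Assumption for illustration only: the mismatch is modelled as a
    simple bias vector, one value per spectral parameter.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    return [sum(f[d] - state_mean[d] for f in frames) / n
            for d in range(len(state_mean))]


def adapt_mean(state_mean, transfer):
    """Decoder side: shift the state mean by the received vector."""
    return [m + t for m, t in zip(state_mean, transfer)]


# Hypothetical two-dimensional example.
frames = [[1.0, 2.0], [3.0, 4.0]]
mean = [1.0, 1.0]
bias = transfer_vector(frames, mean)
adapted = adapt_mean(mean, bias)
```

In this toy case the adapted mean equals the average of the input frames, which is what a pure bias correction is expected to produce.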

SL980777.PDF (From Author) SL980777.PDF (Rasterized)

Sound files (all WAV, 11.025 kHz upsampled from 10 kHz, signed 16-bit, mono, linear; created with sox-10 on SunOS 4.1.4):

0777_01.WAV	synthesized from original spectral parameters
0777_02.WAV	coded speech using speaker dependent models
0777_03.WAV	coded speech using speaker independent models without adaptation
0777_04.WAV	coded speech using adapted models without quantization of transfer vectors
0777_05.WAV	coded speech using adapted models with quantization of transfer vectors

ITU-T G.729 Extension At 6.4 kbps

Authors:

E. Ekudden, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
R. Hagen, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
B. Johansson, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
S. Hayashi, NTT Human Interface Labs (Japan)
A. Kataoka, NTT Human Interface Labs (Japan)
S. Kurihara, NTT Human Interface Labs (Japan)

Page (NA) Paper number 808

Abstract:

This paper describes the 6.4 kbit/s CS-ACELP coder being standardized as Annex D to ITU-T G.729. The coder is based on the same building blocks as the 8 kbit/s G.729 coder to facilitate low-complexity extensions to G.729 in terms of additional memory usage. It is fully switchable with the 8 kbit/s coder and provides additional flexibility to existing and emerging G.729 applications. The fixed codebook is a 2-pulse algebraic codebook. The adaptive codebook quantization has been changed and a new conjugate structure gain codebook is used. To compensate for the sparser algebraic codebook, an adaptive post-processing technique is used to enhance the quality of unvoiced speech and background noise. Subjective tests have indicated that the coder has a performance close to that of G.729, and equivalent to that of G.723.1 at 6.3 kbit/s for speech.
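As a rough arithmetic illustration of the two rates, the per-frame bit budget can be worked out from the bit rate. The 10 ms frame length of G.729 is assumed here (it is not stated in the abstract), and the helper name `bits_per_frame` is hypothetical.

```python
def bits_per_frame(bit_rate_bps, frame_ms=10.0):
    """Bits available per frame at a given rate.

    frame_ms defaults to 10 ms, the G.729 frame length (an assumption
    not stated in the abstract).
    """
    return bit_rate_bps * frame_ms / 1000.0


full_rate = bits_per_frame(8000)   # 8 kbit/s G.729
annex_d = bits_per_frame(6400)     # 6.4 kbit/s extension
saving = full_rate - annex_d       # bits removed per frame
```

Under that assumption the extension has 16 fewer bits per frame than the 8 kbit/s coder, which is consistent with the abstract's mention of a sparser fixed codebook and changed quantization.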

SL980808.PDF (From Author) SL980808.PDF (Rasterized)

Adaptive Transformation for Segmented Parametric Speech Coding

Authors:

Damith J. Mudugamuwa, Royal Melbourne Institute of Technology (Australia)
Alan B. Bradley, Royal Melbourne Institute of Technology (Australia)

Page (NA) Paper number 107

Abstract:

In voice coding applications where there is no constraint on the encoding delay, segment coding techniques can be used to reduce the data rate. For low data rate linear predictive coding schemes, increasing the encoding delay makes it possible to exploit long-term temporal stationarities on an interframe basis, thus reducing the transmission bandwidth or storage needs of the speech signal. Transform coding has previously been applied in low data rate speech coding to exploit both the interframe and the intraframe correlation [1][6][8]. This paper investigates the potential of an adaptive transformation scheme for a segmented parametric speech representation. The problem of transform quantization is formulated and a solution methodology is proposed. The potential benefit of the proposed adaptive transformation scheme is discussed in the context of segmented LSPs.

SL980107.PDF (Scanned)

Speech Enhancement Using STC-Based Bandwidth Extension

Authors:

Julien Epps, University of New South Wales (Australia)
W. Harvey Holmes, University of New South Wales (Australia)

Page (NA) Paper number 711

Abstract:

Telephone speech is typically bandlimited to 4 kHz, resulting in a 'muffled' quality. Coding speech with bandwidth greater than 4 kHz reduces this distortion, but requires a higher bit rate to avoid other types of distortion. An alternative to coding wider bandwidth speech is to exploit correlation between the 0-4 kHz and 4-8 kHz speech bands to re-synthesize wideband speech from narrowband speech. This paper presents a method for re-synthesizing narrowband coded speech using sinusoidal transform coding (STC), modified codebook mapping and a novel method for the synthesis of highband unvoiced components. Informal listening test results indicate that this method produces a significant quality improvement in speech which has been coded using narrowband standards.
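The codebook mapping step mentioned above can be sketched in its plain, unmodified form: a narrowband envelope vector is matched to its nearest entry in a narrowband codebook, and the paired entry of a highband codebook is used as the highband envelope estimate. This is a generic illustration only; the paper's modified mapping is not described in the abstract, and the function name, codebooks, and values below are hypothetical.

```python
def map_highband(narrow_vec, narrow_codebook, high_codebook):
    """Return (index, highband entry) for the nearest narrowband entry.

    Plain nearest-neighbour codebook mapping with squared Euclidean
    distance; the two codebooks are assumed to be paired by index.
    """
    if len(narrow_codebook) != len(high_codebook):
        raise ValueError("codebooks must have the same number of entries")

    def dist2(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    idx = min(range(len(narrow_codebook)),
              key=lambda i: dist2(narrow_vec, narrow_codebook[i]))
    return idx, high_codebook[idx]


# Toy paired codebooks (made-up values, for illustration only).
narrow_cb = [[0.0, 0.0], [1.0, 1.0]]
high_cb = [[0.5], [0.9]]
index, high_env = map_highband([0.9, 1.2], narrow_cb, high_cb)
```

No extra bits need to be transmitted for this lookup, since it relies only on the correlation between the two bands captured when the codebooks are trained.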

SL980711.PDF (From Author) SL980711.PDF (Rasterized)

Performance And Optimization Of The SEEVOC Algorithm

Authors:

Weihua Zhang, ioWave, Inc. (USA)
W. Harvey Holmes, The University of New South Wales, School of Electrical Engineering (Australia)

Page (NA) Paper number 1128

Abstract:

The Spectral Envelope Estimation Vocoder (SEEVOC) is a successful spectral envelope estimation method that plays an important role in low bit rate speech coding based on the sinusoidal model. This paper investigates the properties and limitations of the SEEVOC algorithm, and shows that the required accuracy for the initial pitch estimate is greater than commonly supposed. It also generalizes and optimizes the SEEVOC algorithm by choice of the search range parameters a and b. Rules for the optimum choice of a and b are derived, based on both theoretical analysis and experimental results. The effects of noise on the SEEVOC algorithm are also investigated. Experimental results show that the SEEVOC algorithm performs better for voiced speech in the presence of noise than linear prediction (LP) analysis.
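The peak-picking idea behind SEEVOC can be sketched as follows. The sketch assumes that a and b scale the pitch estimate to give the lower and upper offsets of each search window measured from the previously selected peak, with a = 0.5 and b = 1.5 as the conventional choice; the abstract does not define a and b, so this reading is an assumption. As a simplification, the largest spectral sample in each window is taken rather than a true local peak, and the function name `seevoc_peaks` is hypothetical.

```python
import math


def seevoc_peaks(magnitudes, bin_hz, f0_hz, a=0.5, b=1.5):
    """Pick one envelope sample per search window.

    magnitudes[k] is the spectral magnitude at frequency k * bin_hz.
    Each window spans [prev + a*f0, prev + b*f0], where prev is the
    frequency of the previously selected sample (0 Hz at the start).
    """
    if not (0 < a < b) or f0_hz <= 0 or bin_hz <= 0:
        raise ValueError("require 0 < a < b and positive f0_hz, bin_hz")
    peaks = []
    prev_hz = 0.0
    n = len(magnitudes)
    while True:
        lo = math.ceil((prev_hz + a * f0_hz) / bin_hz)
        hi = math.floor((prev_hz + b * f0_hz) / bin_hz)
        if lo >= n or hi < lo:
            break  # window is empty or past the end of the spectrum
        hi = min(hi, n - 1)
        k = max(range(lo, hi + 1), key=lambda i: magnitudes[i])
        peaks.append((k * bin_hz, magnitudes[k]))
        prev_hz = k * bin_hz
    return peaks


# Toy spectrum: harmonics of a 100 Hz pitch on a 10 Hz grid.
spectrum = [0.1] * 40
spectrum[10], spectrum[20], spectrum[30] = 1.0, 0.8, 0.6
picked = seevoc_peaks(spectrum, 10.0, 100.0)
```

Because each window is anchored to the previous peak rather than to a fixed harmonic grid, a pitch error shifts only the window width, not its anchor; the envelope is then obtained by interpolating between the selected samples.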
