Speech Coding 1

ICSLP'98 Proceedings

A Very Low Bit Rate Speech Coder Using HMM With Speaker Adaptation

Authors:

Takashi Masuko, Precision and Intelligence Laboratory, Tokyo Inst. of Tech. (Japan)
Keiichi Tokuda, Department of Computer Science, Nagoya Inst. of Tech. (Japan)
Takao Kobayashi, Interdisciplinary Graduate School of Science and Engineering, Tokyo Inst. of Tech. (Japan)

Page (NA) Paper number 777

Abstract:

This paper describes a speaker adaptation technique for an HMM-based phonetic vocoder. In the vocoder, the encoder performs phoneme recognition and transmits phoneme indexes and state durations to the decoder, and the decoder synthesizes speech using an HMM-based speech synthesis technique. One of the main problems of this vocoder is that the voice characteristics of the synthetic speech depend on the HMMs used in the decoder, and are therefore fixed regardless of the input speaker. To overcome this problem, we adapt the HMMs to the input speech by transmitting transfer vectors, which carry information on the mismatch between the input speech and the HMMs. Subjective test results show that the performance of the proposed vocoder without quantization of the transfer vectors is comparable to that of a speaker-dependent vocoder.
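As a rough illustration of the adaptation idea, the sketch below treats a transfer vector as a single offset added to the decoder's HMM state means, estimated as the average difference between input frames and the means of the states they are aligned to. This is an assumption made for illustration only: the abstract does not say how transfer vectors are computed or applied, and the function names and toy data are hypothetical.

```python
def estimate_transfer_vector(frames, state_means, alignment):
    """Average offset between input frames and the means of their aligned states.

    Hypothetical helper: a global mean-shift estimate, not the paper's method.
    """
    dim = len(frames[0])
    total = [0.0] * dim
    for frame, state in zip(frames, alignment):
        mean = state_means[state]
        for d in range(dim):
            total[d] += frame[d] - mean[d]
    return [t / len(frames) for t in total]


def adapt_means(state_means, transfer):
    """Shift every state mean by the transfer vector."""
    return [[m + t for m, t in zip(mean, transfer)] for mean in state_means]


# Toy two-state model with 2-dimensional spectral parameters (made-up values).
state_means = [[1.0, 0.0], [2.0, 1.0]]
frames = [[1.5, 0.25], [1.5, 0.25], [2.5, 1.25]]
alignment = [0, 0, 1]  # state index for each input frame

transfer = estimate_transfer_vector(frames, state_means, alignment)
adapted = adapt_means(state_means, transfer)
print(transfer)  # [0.5, 0.25]
print(adapted)   # [[1.5, 0.25], [2.5, 1.25]]
```

In a vocoder of this kind, only the transfer vector would need to be sent in addition to the phoneme indexes and durations, which is why its quantization matters for the bit rate.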

SL980777.PDF (From Author) SL980777.PDF (Rasterized)

Sound files (all WAV: 11.025 kHz, upsampled from 10 kHz; 16-bit signed, mono, linear; created with sox-10 on SunOS 4.1.4):

0777_01.WAV: synthesized from original spectral parameters
0777_02.WAV: coded speech using speaker dependent models
0777_03.WAV: coded speech using speaker independent models without adaptation
0777_04.WAV: coded speech using adapted models without quantization of transfer vectors
0777_05.WAV: coded speech using adapted models with quantization of transfer vectors


ITU-T G.729 Extension At 6.4 kbps

Authors:

E. Ekudden, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
R. Hagen, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
B. Johansson, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
S. Hayashi, NTT Human Interface Labs (Japan)
A. Kataoka, NTT Human Interface Labs (Japan)
S. Kurihara, NTT Human Interface Labs (Japan)

Page (NA) Paper number 808

Abstract:

This paper describes the 6.4 kbit/s CS-ACELP coder being standardized as Annex D to ITU-T G.729. The coder is based on the same building blocks as the 8 kbit/s G.729, so that extending G.729 requires little additional memory. It is fully switchable with the 8 kbit/s coder and provides additional flexibility to existing and emerging G.729 applications. The fixed codebook is a 2-pulse algebraic codebook. The adaptive codebook quantization has been changed and a new conjugate-structure gain codebook is used. To compensate for the sparser algebraic codebook, an adaptive post-processing technique is used to enhance the quality of unvoiced speech and background noise. Subjective tests have indicated that, for speech, the coder performs close to G.729 and equivalently to G.723.1 at 6.3 kbit/s.
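To make the rate reduction concrete, the short calculation below converts the two bit rates into bits per frame. The 10 ms frame length is taken from G.729 itself rather than from the abstract, and the helper name is hypothetical.

```python
FRAME_MS = 10  # G.729 frame length in milliseconds (from the standard, not the abstract)


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of bits available to encode one frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


print(bits_per_frame(8000))  # 80
print(bits_per_frame(6400))  # 64
```

Under that assumption the 6.4 kbit/s mode has 16 fewer bits per frame than the 8 kbit/s coder, which is consistent with the abstract's move to a sparser 2-pulse fixed codebook and a changed gain and adaptive codebook quantization.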

SL980808.PDF (From Author) SL980808.PDF (Rasterized)


Adaptive Transformation for Segmented Parametric Speech Coding

Authors:

Damith J. Mudugamuwa, Royal Melbourne Institute of Technology (Australia)
Alan B. Bradley, Royal Melbourne Institute of Technology (Australia)

Page (NA) Paper number 107

Abstract:

In voice coding applications where there is no constraint on the encoding delay, segment coding techniques can be used to reduce the data rate. For low data rate linear predictive coding schemes, increasing the encoding delay allows one to exploit long-term temporal stationarity on an interframe basis, thus reducing the transmission bandwidth or storage needs of the speech signal. Transform coding has previously been applied in low data rate speech coding to exploit both the interframe and the intraframe correlation [1][6][8]. This paper investigates the potential of an adaptive transformation scheme for a segmented parametric speech representation. The problem of transform quantization is formulated and a solution methodology is proposed. The potential benefit of the proposed adaptive transformation scheme is discussed in the context of segmented LSPs.
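The benefit of transforming a parameter track across frames can be seen with a fixed transform. The sketch below applies an orthonormal DCT-II along time to one slowly varying parameter trajectory and measures how much of the energy lands in the first few coefficients. It uses a fixed DCT, not the adaptive transformation studied in the paper, and the trajectory values are made up.

```python
import math


def dct2_orthonormal(x):
    """Orthonormal DCT-II of a sequence (energy preserving)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        out.append(scale * acc)
    return out


def energy_fraction(coeffs, keep):
    """Share of total energy carried by the first `keep` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:keep]) / total


# Hypothetical trajectory of one LSP over an 8-frame segment (normalized frequency).
track = [0.30, 0.31, 0.32, 0.33, 0.33, 0.34, 0.35, 0.36]
coeffs = dct2_orthonormal(track)
print(energy_fraction(coeffs, 2))
```

When neighbouring frames are highly correlated, nearly all of the energy sits in the lowest coefficients, so the remaining ones can be quantized coarsely or dropped at the cost of a full segment of delay.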

SL980107.PDF (Scanned)


Speech Enhancement Using STC-Based Bandwidth Extension

Authors:

Julien Epps, University of New South Wales (Australia)
W. Harvey Holmes, University of New South Wales (Australia)

Page (NA) Paper number 711

Abstract:

Telephone speech is typically bandlimited to 4 kHz, resulting in a 'muffled' quality. Coding speech with bandwidth greater than 4 kHz reduces this distortion, but requires a higher bit rate to avoid other types of distortion. An alternative to coding wider bandwidth speech is to exploit correlation between the 0-4 kHz and 4-8 kHz speech bands to re-synthesize wideband speech from narrowband speech. This paper presents a method for re-synthesizing narrowband coded speech using sinusoidal transform coding (STC), modified codebook mapping and a novel method for the synthesis of highband unvoiced components. Informal listening test results indicate that this method produces a significant quality improvement in speech which has been coded using narrowband standards.
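Codebook mapping of the kind mentioned above can be sketched as a paired lookup: a narrowband envelope is matched to its nearest entry in a narrowband codebook, and the highband envelope stored at the same index is returned. This is plain codebook mapping under assumed toy codebooks, not the modified mapping the paper proposes, and all names and values are hypothetical.

```python
def nearest_index(vector, codebook):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )


def map_to_highband(nb_envelope, nb_codebook, hb_codebook):
    """Return the highband envelope paired with the closest narrowband entry."""
    return hb_codebook[nearest_index(nb_envelope, nb_codebook)]


# Toy paired codebooks (made-up values): entry i of each describes the same sound class.
nb_codebook = [[1.0, 0.8, 0.2], [0.3, 0.5, 0.9]]  # 0-4 kHz envelope shapes
hb_codebook = [[0.1, 0.05], [0.6, 0.4]]           # matching 4-8 kHz envelope shapes

print(map_to_highband([0.4, 0.4, 1.0], nb_codebook, hb_codebook))  # [0.6, 0.4]
```

The paired codebooks would normally be trained on wideband speech so that each index captures the correlation between the two bands.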

SL980711.PDF (From Author) SL980711.PDF (Rasterized)


Performance And Optimization Of The SEEVOC Algorithm

Authors:

Weihua Zhang, ioWave, Inc. (USA)
W. Harvey Holmes, The University of New South Wales, School of Electrical Engineering (Australia)

Page (NA) Paper number 1128

Abstract:

The Spectral Envelope Estimation Vocoder (SEEVOC) is a successful spectral envelope estimation method that plays an important role in low bit rate speech coding based on the sinusoidal model. This paper investigates the properties and limitations of the SEEVOC algorithm, and shows that the required accuracy for the initial pitch estimate is greater than commonly supposed. It also generalizes and optimizes the SEEVOC algorithm by choice of the search range parameters a and b. Rules for the optimum choice of a and b are derived, based on both theoretical analysis and experimental results. The effects of noise on the SEEVOC algorithm are also investigated. Experimental results show that the SEEVOC algorithm performs better for voiced speech in the presence of noise than linear prediction (LP) analysis.
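The peak-picking step at the heart of SEEVOC can be sketched as follows. The abstract does not define a and b; here they are assumed to be the fractions of the pitch estimate f0 that bound each search window relative to the previously selected peak, with a = 0.5 and b = 1.5 as the classic setting. The sketch also simplifies the algorithm by taking the largest spectral sample in each window rather than testing for a true local peak, and the function name is hypothetical.

```python
import math


def seevoc_peaks(magnitudes, bin_hz, f0, a=0.5, b=1.5):
    """Select one envelope sample per search window [prev + a*f0, prev + b*f0].

    Simplified sketch: picks the largest sample in each window.
    """
    peaks = []
    prev_hz = 0.0
    last_bin = len(magnitudes) - 1
    while True:
        lo = int(math.ceil((prev_hz + a * f0) / bin_hz))
        hi = min(int(math.floor((prev_hz + b * f0) / bin_hz)), last_bin)
        if lo > hi:
            break  # window is empty or lies beyond the spectrum
        k = max(range(lo, hi + 1), key=lambda i: magnitudes[i])
        peaks.append((k * bin_hz, magnitudes[k]))
        prev_hz = k * bin_hz  # next window is placed relative to this peak
    return peaks


# Synthetic magnitude spectrum: 10 Hz bins, harmonics of 100 Hz on a flat floor.
bin_hz = 10.0
magnitudes = [1.0 if (k % 10 == 0 and k > 0) else 0.1 for k in range(64)]

peaks = seevoc_peaks(magnitudes, bin_hz, f0=100.0)
print([f for f, _ in peaks])  # [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
```

Because each window is anchored on the previous pick, the envelope samples follow the harmonics even when f0 is only approximate; widening or narrowing the window through a and b trades tolerance to pitch error against the risk of picking the wrong peak.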

SL981128.PDF (From Author) SL981128.PDF (Rasterized)
