Speech Coding 2

ICSLP'98 Proceedings

Towards a Unified Model for Low Bit-Rate Speech Coding Using a Recognition-Synthesis Approach

Authors:

Wendy J. Holmes, DERA (U.K.)

Paper number 553

Abstract:

This paper proposes a recognition-synthesis approach to speech coding which uses a formant trajectory model for both recognition and synthesis. It is argued that this unified approach to coding has the potential to achieve low data rates whilst preserving speech quality and paralinguistic information. A simple coding scheme is described which establishes the principles of this approach. Formant analysis is applied to the input speech, and the formant features are input to a linear-trajectory segmental hidden Markov model recognizer to locate segment boundaries. The formant parameters for each segment are coded using a linear trajectory description, and used to drive a parallel-formant synthesizer to reproduce the utterance at the receiver. The coding method has been tested on utterances from a variety of speakers. In the current system, which has not yet been optimised for coding efficiency, speech is typically coded at 600-1000 bits/s with good intelligibility, whilst preserving speaker characteristics.
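The linear trajectory description of a segment can be illustrated with a short sketch. It is not the paper's coder: it assumes a single formant track sampled once per frame, segment boundaries already supplied by the recognizer, and an ordinary least-squares line per segment. The function names and the (length, midpoint, slope) triple are hypothetical choices made for this illustration, and no quantization is applied.

```python
def fit_linear_trajectory(values):
    """Least-squares line through one segment of formant values.

    Returns (midpoint, slope): the fitted value at the segment centre
    and the change per frame.
    """
    n = len(values)
    if n == 0:
        raise ValueError("segment must contain at least one frame")
    centre = (n - 1) / 2.0
    mean = sum(values) / n
    denom = sum((t - centre) ** 2 for t in range(n))
    if denom == 0:
        return mean, 0.0  # a single frame has no slope
    num = sum((t - centre) * (v - mean) for t, v in enumerate(values))
    return mean, num / denom


def synthesize_trajectory(midpoint, slope, n_frames):
    """Rebuild the per-frame values of one segment from its line."""
    centre = (n_frames - 1) / 2.0
    return [midpoint + slope * (t - centre) for t in range(n_frames)]


def code_segments(track, boundaries):
    """Encode a formant track as one (length, midpoint, slope) per segment.

    boundaries lists the first frame of each segment plus the end frame
    (assumed here to come from the recognizer).
    """
    coded = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        midpoint, slope = fit_linear_trajectory(track[start:end])
        coded.append((end - start, midpoint, slope))
    return coded


def decode_segments(coded):
    """Receiver side: expand each triple back into per-frame values."""
    track = []
    for n_frames, midpoint, slope in coded:
        track.extend(synthesize_trajectory(midpoint, slope, n_frames))
    return track
```

Each segment is then represented by three numbers regardless of its duration, which is the source of the data-rate reduction; a real coder would also have to quantize these values and carry the remaining synthesizer controls.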

SL980553.PDF (From Author) SL980553.PDF (Rasterized)


On the Significance of Temporal Masking in Speech Coding

Authors:

Jan Skoglund, Chalmers University of Technology, Department of Signals and Systems (Sweden)
W. Bastiaan Kleijn, Royal Institute of Technology, Department of Speech, Music and Hearing (Sweden)

Paper number 747

Abstract:

This paper addresses the issue of masking of noise in voiced speech. First, we examine the audibility of cyclostationary narrow-band noise added to voiced speech generated by synthetic excitation. Varying the temporal location of noise within a pitch cycle corresponds to varying its phase spectrum. Using this fact, we find that a phase change of the noise in the high frequency region is more perceptible for a low-pitched sound than for a high-pitched sound. We propose a pitch-dependent temporal weighting function and we show experimentally that it is beneficial to the quantization of pitch-cycle waveforms.
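The link between temporal location and phase spectrum follows from the DFT shift theorem, and can be checked numerically. The sketch below idealizes one pitch cycle as a periodic frame of 16 samples and uses a made-up two-sample burst in place of the narrow-band noise; the frame length, burst values, and shift are assumptions for illustration only, and the paper's weighting function is not reproduced here.

```python
import cmath


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_shift(x, delay):
    """Delay x by `delay` samples, wrapping around within the cycle."""
    n = len(x)
    return [x[(t - delay) % n] for t in range(n)]


cycle_len = 16   # samples in one idealized pitch cycle (assumed)
delay = 5        # how far the burst is moved within the cycle (assumed)

burst = [0.0] * cycle_len
burst[2], burst[3] = 1.0, -0.5   # toy burst, not real noise

spectrum = dft(burst)
shifted_spectrum = dft(circular_shift(burst, delay))

# Magnitudes are unchanged by the shift.
magnitude_change = max(
    abs(abs(a) - abs(b)) for a, b in zip(spectrum, shifted_spectrum)
)

# Each bin k is rotated by exactly -2*pi*k*delay/cycle_len radians.
rotation_error = max(
    abs(shifted_spectrum[k]
        - spectrum[k] * cmath.exp(-2j * cmath.pi * k * delay / cycle_len))
    for k in range(cycle_len)
)
```

Both `magnitude_change` and `rotation_error` come out at rounding-error level, so the position of the burst within the cycle is carried entirely by the phase spectrum.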

SL980747.PDF (From Author) SL980747.PDF (Rasterized)


Waveform Interpolation Coding With Pitch-Spaced Subbands

Authors:

W. Bastiaan Kleijn, KTH (Royal Institute of Technology) (Sweden)
Huimin Yang, Tsinghua University (China)
Ed F. Deprettere, Delft University of Technology (The Netherlands)

Paper number 1069

Abstract:

We present new waveform-interpolation coding procedures which allow perfect reconstruction of the speech signal from the unquantized parameter set. Instead of using adaptive parameter extraction methods, we combine a time warping of the original signal with nonadaptive parameter extraction methods. The new coding structure has good performance at low bit rates and provides convergence to the original waveform with increasing rate.
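One simple way to picture the time warping step is a piecewise-linear warp that stretches or compresses every pitch cycle to the same number of samples, so that later processing can use fixed, nonadaptive framing. The sketch below is only an illustration under strong assumptions: pitch marks are given, each cycle is resampled by linear interpolation with its endpoints preserved, and the function names are hypothetical. It is not the warping used in the paper and does not by itself give perfect reconstruction.

```python
def resample_linear(cycle, new_len):
    """Linearly interpolate one cycle to new_len samples.

    The first and last samples of the cycle are preserved.
    """
    n = len(cycle)
    if n < 2 or new_len < 2:
        raise ValueError("need at least two samples on both sides")
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1)   # position in the source cycle
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * cycle[lo] + frac * cycle[hi])
    return out


def warp_to_constant_pitch(signal, pitch_marks, cycle_len):
    """Map every pitch cycle of `signal` onto cycle_len samples.

    pitch_marks lists the first sample of each cycle plus the end index
    (assumed to come from a separate pitch tracker).
    """
    return [
        resample_linear(signal[start:end], cycle_len)
        for start, end in zip(pitch_marks[:-1], pitch_marks[1:])
    ]
```

After warping, every cycle has the same length, so a fixed transform or filter bank can be applied across cycles without tracking the pitch period.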

SL981069.PDF (From Author) SL981069.PDF (Rasterized)


An Improved Decomposition Method For WI Using IIR Wavelet Filter Banks

Authors:

Nicola R. Chong, University of Wollongong (Australia)
Ian S. Burnett, University of Wollongong (Australia)
Joe F. Chicharo, University of Wollongong (Australia)

Paper number 142

Abstract:

In this paper, we present an alternative characteristic waveform (CW) decomposition mechanism for the Waveform Interpolation (WI) paradigm based on the Pitch Synchronous Wavelet Transform (PSWT). In this technique, IIR filters replace the conventional FIR filters of the PSWT, offering computational and spectral magnitude performance advantages, in addition to significant delay reductions. Previously, the PSWT has only incorporated filter banks with slowly reacting FIR wavelet filters. While these filters possess the desirable properties of linear phase and design simplicity, they incur a large delay that increases exponentially with increasing resolution. The progression to IIR filter banks gives rise to a multi-resolution decomposition mechanism, beneficial for real-time applications, such as speech coding, where delay is an important issue.
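The exponential growth of FIR delay with resolution can be seen from a short calculation. The sketch assumes a dyadic analysis tree that reuses the same symmetric (linear-phase) FIR filter of odd length at every level, with downsampling by two between levels; the filter length used in the example is arbitrary and not taken from the paper. In the PSWT the filtering runs across successive pitch cycles, so the delay would be counted in pitch cycles rather than samples.

```python
def fir_tree_delay(filter_len, levels):
    """Analysis delay of a dyadic tree of linear-phase FIR filters.

    A symmetric FIR filter of length filter_len delays its input by
    (filter_len - 1) / 2 samples at the rate it runs at. The filter at
    level j (counting from 0) runs at 1 / 2**j of the input rate, so its
    delay counts 2**j times as much when measured at the input rate.
    """
    if filter_len < 1 or levels < 0:
        raise ValueError("filter_len must be >= 1 and levels >= 0")
    per_stage = (filter_len - 1) / 2
    return sum(per_stage * 2 ** j for j in range(levels))


# Delay for an assumed 9-tap filter at increasing resolution.
delays = [fir_tree_delay(9, levels) for levels in range(1, 5)]
```

The sum equals (filter_len - 1) / 2 * (2**levels - 1), so each additional level of resolution roughly doubles the total delay.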

SL980142.PDF (From Author) SL980142.PDF (Rasterized)
