Speech Coding 3


A New Linear Predictive Method for Compression of Speech Signals

Authors:

Paavo Alku, University of Turku (Finland)
Susanna Varho, University of Turku (Finland)

Page (NA) Paper number 3

Abstract:

A new linear predictive method is presented in this study. The method, Linear Prediction with Linear Extrapolation (LPLE), reformulates the computation of linear prediction by grouping the samples preceding x(n) into consecutive pairs (x(n-2i), x(n-2i+1)). Each pair determines a regression line whose value at time instant n is used as a data sample in the prediction. The optimal LPLE predictor is obtained by minimizing the squared prediction error using the autocorrelation method. The rationale for the new method is that LPLE yields an all-pole filter of order 2p when the number of unknowns in the normal equations equals p, which makes the new all-pole modeling method attractive for speech coding applications. Preliminary experiments show that LPLE models speech spectra more accurately than conventional linear prediction when only a very small number of prediction parameters is available, i.e., when the spectral information of the speech signal must be heavily compressed.
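
The abstract describes the predictor only in words, so the following NumPy sketch shows one plausible reading of it: the regression line through each pair (x(n-2i), x(n-2i+1)) is extrapolated to time n, and the predictor weights are then fitted by ordinary least squares (the paper itself uses the autocorrelation method). Function names and the test signal are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lple_coefficients(x, p):
    """Fit an LPLE-style predictor: each pair (x[n-2i], x[n-2i+1]) defines a
    regression line whose value extrapolated to time n is used as one data
    sample of the predictor.  The exact extrapolation and windowing of the
    paper may differ; this is a plain least-squares sketch."""
    x = np.asarray(x, dtype=float)
    N, start = len(x), 2 * p                    # need samples back to x[n-2p]
    D = np.empty((p, N - start))
    for i in range(1, p + 1):
        a = x[start - 2 * i:N - 2 * i]          # x[n-2i]
        b = x[start - 2 * i + 1:N - 2 * i + 1]  # x[n-2i+1]
        D[i - 1] = b + (2 * i - 1) * (b - a)    # line through the pair, evaluated at time n
    target = x[start:]
    coeffs, *_ = np.linalg.lstsq(D.T, target, rcond=None)  # minimise squared prediction error
    return coeffs

# Example: a 4-parameter LPLE predictor fitted to a smoothed noise signal.
rng = np.random.default_rng(0)
sig = np.convolve(rng.standard_normal(2000), np.ones(8) / 8, mode="same")
print(lple_coefficients(sig, p=4))
```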

SL980003.PDF (From Author) SL980003.PDF (Rasterized)



Hierarchical Temporal Decomposition: A Novel Approach To Efficient Compression Of Spectral Characteristics Of Speech

Authors:

Shahrokh Ghaemmaghami, School of Electrical & Electronic Systems Engineering, Queensland University of Technology, Brisbane (Australia)
Mohamed Deriche, School of Electrical & Electronic Systems Engineering, Queensland University of Technology, Brisbane (Australia)
Sridha Sridharan, School of Electrical & Electronic Systems Engineering, Queensland University of Technology, Brisbane (Australia)

Page (NA) Paper number 673

Abstract:

The authors propose a new approach to Temporal Decomposition (TD) of characteristic speech parameters for very low rate coding applications. The method models the articulatory dynamics with a hierarchical error minimization algorithm that does not rely on Singular Value Decomposition. It is also much faster than conventional TD and can be implemented in real time. The proposed method offers high flexibility in meeting the desired coding requirements, such as compression ratio, accuracy, delay, and computational complexity, and can code spectral parameters at 1000-1200 b/s with high fidelity and an algorithmic delay of less than 150 ms.
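
As context for readers unfamiliar with TD, the sketch below illustrates only the generic temporal decomposition signal model, a small set of event vectors weighted by overlapping event functions, and the reconstruction error that any TD estimator (SVD-based or hierarchical) tries to minimise. The triangular event functions and all names are illustrative assumptions; the authors' hierarchical algorithm is not reproduced.

```python
import numpy as np

def td_reconstruct(Phi, A):
    """Temporal decomposition models a frames-by-parameters matrix Y as
    Y ~ Phi @ A, where the rows of A are event vectors and the columns of
    Phi are their slowly varying, overlapping event functions."""
    return Phi @ A

def td_error(Y, Phi, A):
    """Mean squared reconstruction error that a TD estimator minimises."""
    return float(np.mean((Y - td_reconstruct(Phi, A)) ** 2))

# Toy example: 100 frames of 10 spectral parameters, 8 triangular event functions.
frames, params, events = 100, 10, 8
n = np.arange(frames)[:, None]
centers = np.linspace(0, frames - 1, events)
Phi = np.clip(1.0 - np.abs(n - centers) / (frames / events), 0.0, 1.0)
A = np.random.default_rng(1).uniform(0.1, np.pi, size=(events, params))
Y = td_reconstruct(Phi, A) + 0.01 * np.random.default_rng(2).standard_normal((frames, params))
print(td_error(Y, Phi, A))
```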

SL980673.PDF (From Author) SL980673.PDF (Rasterized)



Speech Intelligibility Testing for New Technologies

Authors:

Susan L. Hura, Lucent Technologies (USA)

Page (NA) Paper number 42

Abstract:

There are several tests of speech intelligibility currently available which employ a variety of methods. The most appropriate method for testing intelligibility of speech transmitted via telephony is a forced choice task in which listeners hear speech samples and identify what they hear from among a set of alternatives displayed onscreen. This methodology allows tests to be run quickly and scored automatically. A major flaw in existing forced-choice intelligibility tests is the use of unfamiliar words, nonwords, and proper names along with common words. A stimulus set that is mixed in this way may introduce response biases into the test and therefore produce results that are less predictive of actual intelligibility performance. The Intelligibility of Familiar Items Test (IFIT) ameliorates several methodological flaws found in earlier tests. The IFIT uses a stimulus set composed of high familiarity real English words and tests consonants in initial and final word position and vowels in word medial position.

SL980042.PDF (From Author) SL980042.PDF (Rasterized)



Efficient Quantization Of LSF Parameters Based on Temporal Decomposition

Authors:

Sung Joo Kim, KAIST (Korea)
Sangho Lee, KAIST (Korea)
Woo Jin Han, KAIST (Korea)
Yung Hwan Oh, KAIST (Korea)

Page (NA) Paper number 469

Abstract:

In this paper, we present a restricted temporal decomposition method for LSF parameters. The event vectors estimated by this method preserve the ordering property of LSF parameters so that they can be quantized efficiently. Experimental results show that the interpolated LSF parameters can be quantized transparently at a rate of 753 bps. As an application of the proposed method, we also design an LPC vocoder at 996 bps. According to a listening test, the speech reconstructed by our vocoder has reasonable quality compared with the 2400 bps LPC10e.
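
The ordering property mentioned above is the usual LSF constraint 0 < lsf_1 < ... < lsf_p < pi. The sketch below, using purely hypothetical values, shows that constraint and why linear interpolation between two ordered event vectors also yields ordered frames; it does not reproduce the restricted temporal decomposition or the quantiser itself.

```python
import numpy as np

def is_ordered_lsf(lsf):
    """LSF ordering property: 0 < lsf_1 < lsf_2 < ... < lsf_p < pi."""
    lsf = np.asarray(lsf, dtype=float)
    return bool(lsf[0] > 0 and lsf[-1] < np.pi and np.all(np.diff(lsf) > 0))

def interpolate_lsf(event_a, event_b, weight):
    """Linear interpolation between two event LSF vectors.  A convex
    combination of two strictly increasing vectors is strictly increasing,
    so ordered event vectors give ordered interpolated frames."""
    return (1.0 - weight) * np.asarray(event_a) + weight * np.asarray(event_b)

# Hypothetical 10th-order event vectors (radians).
ev1 = np.linspace(0.20, 2.90, 10)
ev2 = np.linspace(0.25, 3.00, 10)
frame = interpolate_lsf(ev1, ev2, 0.4)
print(is_ordered_lsf(frame))   # True
```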

SL980469.PDF (From Author) SL980469.PDF (Rasterized)



A Sinusoidal Harmonic Vocoder at 1.2 kbps Using Auditory Perceptual Characteristics

Authors:

Minoru Kohata, Chiba Institute of Technology (Japan)

Page (NA) Paper number 37

Abstract:

In this paper, a very low bit rate speech coder at 1.2 kbps is proposed. Like the LPC vocoder, it requires only a few types of information (power, pitch, and spectral information), but its quality is far superior. In the proposed vocoder, the synthesized speech quality is improved based on auditory perceptual characteristics. The synthesis method is harmonic coding, using sinusoids whose frequencies are multiples of the fundamental frequency; the amplitudes of the sinusoids are adaptively modulated using Gammatone filters as a perceptual weighting filter. The sinusoids' phases are also adjusted so as to maximize the perceptual quality. In order to reduce the total bit rate to 1.2 kbps, a new segment coder for spectral information (LSP coefficients) using DP matching is also proposed. The quality of the synthesized speech is considerably improved compared with that of the simple LPC vocoder, according to MOS and preference tests.
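
A minimal sketch of the harmonic synthesis step described above, a sum of sinusoids at multiples of the fundamental frequency. The Gammatone-based perceptual weighting of the amplitudes, the phase optimisation and the DP-matching segment coder are not modelled; the sampling rate, F0 and decaying amplitudes are illustrative assumptions.

```python
import numpy as np

def harmonic_synthesis(f0, amplitudes, phases, duration, fs=8000):
    """Synthesise a voiced segment as a sum of sinusoids whose frequencies
    are multiples of the fundamental frequency f0 (Hz)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    phases = np.asarray(phases, dtype=float)
    t = np.arange(int(duration * fs)) / fs
    k = np.arange(1, len(amplitudes) + 1)[:, None]        # harmonic numbers
    parts = amplitudes[:, None] * np.cos(2 * np.pi * f0 * k * t + phases[:, None])
    return parts.sum(axis=0)

# Example: 30 ms of a 120 Hz voiced sound built from 20 decaying harmonics.
amps = 1.0 / np.arange(1, 21)
segment = harmonic_synthesis(120.0, amps, np.zeros(20), duration=0.03)
print(segment.shape)   # (240,)
```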

SL980037.PDF (Scanned)



A 16 Kbit/s Wideband CELP Coder Using MEL-Generalized Cepstral Analysis and its Subjective Evaluation

Authors:

Kazuhito Koishida, University of California, Santa Barbara (USA)
Gou Hirabayashi, Toshiba Corporation (Japan)
Keiichi Tokuda, Nagoya Institute of Technology (Japan)
Takao Kobayashi, Tokyo Institute of Technology (Japan)

Page (NA) Paper number 904

Abstract:

We have proposed a wideband CELP coder, called MGC-CELP, which provides high quality speech by utilizing mel-generalized cepstral (MGC) analysis instead of linear prediction (LP). In this paper, we investigate the performance of the wideband MGC-CELP coder at 16 kbit/s in terms of short-term predictor order, i.e., the order of the MGC analysis. Subjective tests show that the MGC-CELP coder with a 20th-order predictor gives better performance than ITU-T G.722 at 64 kbit/s. It is also found that the MGC-CELP coder with a 12th-order predictor achieves quality comparable to the 64 kbit/s G.722, and outperforms the 16 kbit/s conventional CELP coder using 20th-order LP analysis under the same conditions.

SL980904.PDF (From Author) SL980904.PDF (Rasterized)



Comparison Of Spectral Estimation Techniques For Low Bit-Rate Speech Coding

Authors:

D.J. Molyneux, School of Engineering, University of Manchester (U.K.)
C.I. Parris, Ensigma Ltd. (U.K.)
X.Q. Sun, Voxware Inc. (USA)
B.M.G. Cheetham, School of Engineering, University of Manchester (U.K.)

Page (NA) Paper number 946

Abstract:

Many low bit-rate speech coders represent the spectral envelope by an all-pole digital filter whose coefficients are calculated by a form of linear prediction (LP) analysis. The lower the bit rate, the more critical the accuracy of the spectral analysis becomes for achieving good-quality speech. This paper compares four known techniques: a technique based on cubic spline interpolation, DAP, MVDR, and iterative all-pole modelling. First, the accuracy obtained for artificial and real speech spectra is assessed for each technique by calculating the degree of spectral distortion with reference to the spectral envelope sampled at the pitch harmonics. Then, each technique is used to characterise the spectral amplitudes generated by a 2.4 kb/s multi-band excitation (MBE) coder. Results show that significantly better spectral accuracy is obtained using DAP. However, listening tests on MBE-encoded speech indicate that the advantage of DAP over the other techniques is not strongly perceptible.
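
The accuracy measure described above can be illustrated with a short NumPy sketch: an RMS log-spectral distortion evaluated only at the pitch-harmonic frequencies. The FFT grid, sampling rate and the two one-pole test envelopes are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def harmonic_sd_db(envelope_a, envelope_b, harmonics_hz, fs=8000, nfft=512):
    """RMS log-spectral distortion (dB) between two magnitude envelopes of
    length nfft//2 + 1, evaluated only at the given pitch-harmonic frequencies."""
    bins = np.round(np.asarray(harmonics_hz) / fs * nfft).astype(int)
    diff_db = 20.0 * np.log10(envelope_a[bins] / envelope_b[bins])
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Two hypothetical one-pole envelopes on a 257-bin grid, 150 Hz fundamental.
omega = np.linspace(0, np.pi, 257)
env1 = 1.0 / np.abs(1 - 0.90 * np.exp(-1j * omega))
env2 = 1.0 / np.abs(1 - 0.85 * np.exp(-1j * omega))
harmonics = np.arange(1, 20) * 150.0
print(harmonic_sd_db(env1, env2, harmonics))
```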

SL980946.PDF (From Author) SL980946.PDF (Rasterized)



Low Bit Rate Coding for Speech and Audio Using Mel Linear Predictive Coding (MLPC) Analysis

Authors:

Yoshihisa Nakatoh, Matsushita Electric Industrial Co., Ltd. (Japan)
Takeshi Norimatsu, Matsushita Electric Industrial Co., Ltd. (Japan)
Ah Heng Low, Faculty of Engineering, Shinshu University (Malaysia)
Hiroshi Matsumoto, Faculty of Engineering, Shinshu University (Japan)

Page (NA) Paper number 1100

Abstract:

This paper proposes a low bit rate coding method for speech and audio using a new analysis method named MLPC (Mel-LPC analysis). In MLPC analysis, the spectral envelope is estimated on a mel- or bark-frequency scale so as to improve the spectral resolution in the low frequency band. The analysis requires only about a two-fold increase in computation over standard LPC analysis. Our coding algorithm using MLPC analysis consists of five key parts: time-frequency transformation, inverse filtering by the MLPC spectral envelope, power normalization, perceptual weighting estimation, and multi-stage VQ. In subjective experiments, we investigated the performance of MLPC analysis through paired comparison tests between MLPC and standard LPC analysis used for inverse filtering. At all bit rates, almost all listeners judged the sound decoded with MLPC analysis to be superior to that decoded with LPC, and the difference is especially large at low bit rates.
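
Mel-warped LPC is usually obtained by replacing the unit delay with a first-order all-pass filter; the sketch below shows only the resulting frequency mapping and how it stretches the low-frequency band. The warping factor 0.42 is a commonly quoted approximation of the mel scale at 16 kHz, assumed here rather than taken from the paper.

```python
import numpy as np

def warp_frequency(omega, alpha=0.42):
    """Frequency mapping of the first-order all-pass warping used for
    mel/bark-scale LPC: low frequencies are stretched (finer resolution),
    high frequencies compressed."""
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

# Share of the linear vs. warped frequency axis below 1 kHz and 6 kHz at fs = 16 kHz.
fs = 16000.0
for f in (1000.0, 6000.0):
    omega = 2 * np.pi * f / fs
    print(f, omega / np.pi, warp_frequency(omega) / np.pi)
```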

SL981100.PDF (From Author) SL981100.PDF (Rasterized)



Comparison Study on VQ Codevector Index Assignment

Authors:

Jeng-Shyang Pan, National Kaohsiung Institute of Technology (Taiwan)
Chin-Shiuh Shieh, National Kaohsiung Institute of Technology (Taiwan)
Shu-Chuan Chu, University of South Australia (Australia)

Page (NA) Paper number 31

Abstract:

Vector quantization is a popular technique in low bit rate coding of speech signals. The transmitted codevector index is highly sensitive to channel noise, but the resulting channel distortion can be reduced by organizing the codevector indices suitably. Several index assignment algorithms are compared in this study. Among them, the index allocation algorithm proposed by Wu and Barba is the fastest but yields the worst channel distortion, while the proposed parallel tabu search algorithm achieves the lowest channel distortion.
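
The quantity being minimised can be made concrete with a short sketch: the expected extra distortion when a transmitted codevector index suffers a single bit error on a binary symmetric channel, compared across index assignments. Equiprobable codevectors and single-bit errors are simplifying assumptions, and the toy codebook is not from the paper.

```python
import numpy as np

def expected_channel_distortion(codebook, assignment, bit_error_rate=0.01):
    """Expected extra squared-error distortion caused by single-bit index
    errors on a binary symmetric channel.  assignment[i] is the channel
    index assigned to codevector i; codevectors are assumed equiprobable."""
    codebook = np.asarray(codebook, dtype=float)
    n, bits = len(codebook), int(np.log2(len(codebook)))
    inverse = np.empty(n, dtype=int)
    inverse[assignment] = np.arange(n)           # channel index -> codevector
    total = 0.0
    for i in range(n):                           # transmitted codevector i
        tx = int(assignment[i])
        for b in range(bits):                    # flip one of its index bits
            j = inverse[tx ^ (1 << b)]           # codevector actually decoded
            total += bit_error_rate * np.sum((codebook[i] - codebook[j]) ** 2)
    return total / n

# Compare the identity assignment with a random permutation on a toy codebook.
rng = np.random.default_rng(3)
cb = rng.standard_normal((16, 4))                # 16 codevectors, dimension 4
print(expected_channel_distortion(cb, np.arange(16)),
      expected_channel_distortion(cb, rng.permutation(16)))
```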

SL980031.PDF (From Author) SL980031.PDF (Rasterized)



Using Linguistic Knowledge To Improve The Design Of Low-Bit Rate LSF Quantisation

Authors:

John J. Parry, University of Wollongong (Australia)
Ian S. Burnett, University of Wollongong (Australia)
Joe F. Chicharo, University of Wollongong (Australia)

Page (NA) Paper number 137

Abstract:

In this paper we investigate an alternative approach to the design of low bit-rate (LBR) quantisation that incorporates phonetic information into the structure of Line Spectral Frequency (LSF) codebooks. In prior work, vector quantisation (VQ) has treated speech as a generic stochastic process; speech signals can, however, be described in terms of phonetic segments and linguistic rules. A trained LSF codebook, like the phonetic inventory of a language, is a static description of the spectral behaviour of speech. Since clear relationships exist between phonetic segments and LSFs, the structure of an LSF codebook can be analysed in terms of phonetic segments. The investigation leads to the conclusion that phonetic information can be usefully employed in codebook training, giving gains in perceptual performance and reductions in bit rate.

SL980137.PDF (From Author) SL980137.PDF (Rasterized)



Transform Coding of LSF Parameters Using Wavelets

Authors:

Davor Petrinović, Faculty of EE and C, University of Zagreb (Croatia)

Page (NA) Paper number 1114

Abstract:

A method of inter-frame transform coding of Line Spectrum Frequencies (LSF) using the Discrete Wavelet Transform is presented in this paper. Each component of the LSFs (or of their linear transform) is treated separately and decomposed into a set of subband signals using a nonuniform filter bank. The subband signals are quantized and coded independently. With an appropriate choice of the mother wavelet, the subband signal with the lowest rate comprises most of the LSF waveform energy. The filter bank effectively decorrelates the input signal, enabling more efficient quantization of the subband signals. A suitable weighted Euclidean distance measure in the wavelet domain is proposed, which defines an optimal static or dynamic bit allocation for the subband signals. It is shown that the average bit rate for coding the DCT-transformed LSFs can be reduced by 0.9 bits per vector component using a very simple wavelet. The total delay due to the inter-frame coding is only 90 ms, which is acceptable even for medium bit rate speech coders.
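
To make the inter-frame idea concrete, the sketch below runs a plain Haar filter-bank analysis over one LSF component's trajectory across frames and shows that the low-rate approximation subband carries most of the energy for a smooth trajectory. The paper's non-uniform filter bank, mother wavelet, quantisers and bit allocation are not reproduced; Haar and the test trajectory are illustrative assumptions.

```python
import numpy as np

def haar_analysis(trajectory, levels=2):
    """Orthonormal Haar analysis of one LSF component's trajectory across
    frames: repeatedly split into a half-rate approximation and a detail
    subband.  Returns (approximation, [detail_level1, detail_level2, ...])."""
    approx, details = np.asarray(trajectory, dtype=float), []
    for _ in range(levels):
        if len(approx) % 2:                         # pad to an even length
            approx = np.append(approx, approx[-1])
        pairs = approx.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    return approx, details

# A smoothly varying 2nd LSF over 64 frames (radians): the approximation
# subband holds nearly all of the energy, the details very little.
frames = np.arange(64)
lsf2 = 0.9 + 0.05 * np.sin(2 * np.pi * frames / 32)
approx, details = haar_analysis(lsf2, levels=3)
print(float(np.sum(approx ** 2)), [float(np.sum(d ** 2)) for d in details])
```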

SL981114.PDF (From Author) SL981114.PDF (Rasterized)



Source Controlled Variable Bit-Rate Speech Coder Based On Waveform Interpolation

Authors:

F. Plante, Dept. Electrical Engineering & Electronics, Liverpool University (U.K.)
B.M.G. Cheetham, Dept. Electrical Engineering & Electronics, Liverpool University (U.K.)
D. Marston, Ensigma Ltd (U.K.)
P.A. Barrett, BT Laboratories (U.K.)

Page (NA) Paper number 848

Abstract:

This paper describes a source-controlled variable bit-rate (SC-VBR) speech coder based on the concept of prototype waveform interpolation. The coder uses a four-mode classification: silence, voiced, unvoiced and transition. These modes are detected after the speech has been decomposed into slowly evolving (SEW) and rapidly evolving (REW) waveforms. A voicing activity detector (VAD), the relative levels of the SEW and REW, and the cross-correlation coefficient between characteristic waveform segments are used to make the classification. The encoding of the SEW components is improved using gender adaptation. In tests using conversational speech, the SC-VBR coder allows a compression factor of around 3. The VBR coder was evaluated against a fixed-rate 4.6 kbit/s PWI coder for clean and noisy speech, and was found to perform better for male speech and for noisy speech.
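
The SEW/REW decomposition mentioned above can be sketched as low-pass filtering of the sequence of characteristic waveforms along the frame axis, with the residual forming the REW; the ratio of their energies is one of the cues used for the mode decision. A simple moving average stands in for the actual filtering, and the toy waveforms are illustrative assumptions.

```python
import numpy as np

def sew_rew_split(char_waveforms, span=5):
    """Split a (frames x samples) array of characteristic waveforms into a
    slowly evolving waveform (low-pass along the frame axis) and a rapidly
    evolving waveform (the residual)."""
    kernel = np.ones(span) / span
    sew = np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"),
                              0, char_waveforms)
    return sew, char_waveforms - sew

# Toy example: 40 frames of 80-sample waveforms, a stable voiced-like shape
# plus a noisy unvoiced-like component.
rng = np.random.default_rng(4)
phase = np.linspace(0, 2 * np.pi, 80, endpoint=False)
cw = np.sin(phase)[None, :] + 0.3 * rng.standard_normal((40, 80))
sew, rew = sew_rew_split(cw)
print(float(np.sum(sew ** 2) / np.sum(rew ** 2)))   # SEW/REW energy ratio
```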

SL980848.PDF (From Author) SL980848.PDF (Rasterized)



Improving Speaker Recognisability In Phonetic Vocoders

Authors:

Carlos M. Ribeiro, INESC (Portugal)
Isabel M. Trancoso, INESC (Portugal)

Page (NA) Paper number 448

Abstract:

Phonetic vocoding is one of the methods for coding speech below 1000 bit/s. The transmitter stage includes a phone recogniser whose output index is transmitted together with prosodic information such as duration, energy and pitch variation. This type of coder does not transmit spectral speaker characteristics, so speaker recognisability becomes a major problem. In our previous work, we adapted a speaker modification strategy to minimise this problem, modifying a codebook to match the spectral characteristics of the input speaker at the cost of transmitting the LSP averages computed for vowel and glide phones. This paper presents new codebook generation strategies, with gender dependence and interpolation frames, that lead to better speaker recognisability and speech quality. Relative to our previous work, some effort was also devoted to deriving more efficient quantisation methods for the speaker-specific information, which considerably reduce the average bit rate without quality degradation.
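
One plausible form of the codebook modification described above is sketched here: every LSP codevector is shifted by the difference between the input speaker's transmitted LSP average (computed over vowel and glide phones) and the codebook's own average. The actual adaptation rule and the new generation strategies of the paper are likely more elaborate; names and values are hypothetical.

```python
import numpy as np

def adapt_codebook(codebook, speaker_lsp_mean, codebook_lsp_mean):
    """Shift every LSP codevector towards the input speaker's average LSP
    (a sketch of mean-based speaker adaptation, not the paper's exact rule)."""
    shift = np.asarray(speaker_lsp_mean) - np.asarray(codebook_lsp_mean)
    adapted = np.asarray(codebook) + shift        # broadcast over codevectors
    return np.sort(adapted, axis=1)               # keep each vector ordered

# Hypothetical 10th-order LSP codebook with 8 entries (radians).
rng = np.random.default_rng(5)
cb = np.sort(rng.uniform(0.1, 3.0, size=(8, 10)), axis=1)
speaker_mean = cb.mean(axis=0) + 0.02             # slightly "higher" speaker
print(adapt_codebook(cb, speaker_mean, cb.mean(axis=0))[0])
```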

SL980448.PDF (From Author) SL980448.PDF (Rasterized)

Sound files (WAV, 16 bit, mono, 8000 Hz):
0448_01.WAV
0448_02.WAV
0448_03.WAV
0448_04.WAV
0448_05.WAV
