Signal Processing and Speech Analysis 2


Improving Pitch Estimation with Short Duration Speech Samples

Authors:

William A. Ainsworth, Department of Communication & Neuroscience, Keele University (U.K.)
Charles R. Day, Department of Communication & Neuroscience, Keele University (U.K.)
Georg F. Meyer, Department of Communication & Neuroscience, Keele University (U.K.)

Page (NA) Paper number 512

Abstract:

Hermes' subharmonic summation (SHS) pitch determination algorithm is an effective technique for extracting the percept of pitch from human speech. Effective determination of the pitch in a passage of speech is believed to be fundamental for higher-level speech processing applications such as speech or speaker recognition. Of particular interest is the need to extract pitch from speech in less than ideal conditions, e.g. in the presence of noise or using very short analysis windows. In an attempt to deliver accurate pitch estimates from relatively short analysis windows, this paper describes an evaluation of two forms of the SHS procedure: in the first, FFT-SHS, the procedure uses the conventional Fast Fourier Transform (FFT) in its spectral analysis step; in the second, RAFT-SHS, the ReAssigned Fourier Transform (RAFT) technique is used instead of the FFT.
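
As a rough illustration of the FFT-SHS variant discussed above, the following Python sketch estimates F0 from a single short frame by summing geometrically weighted spectral amplitudes at the harmonics of each candidate F0. The window length, zero-padding, harmonic count, decay factor and search range are illustrative assumptions, not values taken from the paper.

```python
# A minimal FFT-SHS-style sketch: F0 is chosen to maximise the sum of
# geometrically weighted spectral amplitudes at its harmonics. Window,
# zero-padding, harmonic count, decay factor and search range are assumptions.
import numpy as np

def shs_pitch(frame, fs, f0_min=60.0, f0_max=400.0, n_harmonics=8, decay=0.84):
    """Estimate F0 of one short frame by subharmonic summation."""
    windowed = frame * np.hanning(len(frame))
    n_fft = 4 * len(frame)                       # zero-pad for finer bin spacing
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))

    candidates = np.arange(f0_min, f0_max, 1.0)  # 1 Hz candidate grid
    scores = np.zeros_like(candidates)
    for i, f0 in enumerate(candidates):
        for n in range(1, n_harmonics + 1):
            # Nearest FFT bin to the n-th harmonic of the candidate F0.
            bin_idx = int(round(n * f0 * n_fft / fs))
            if bin_idx < len(spectrum):
                scores[i] += (decay ** (n - 1)) * spectrum[bin_idx]
    return candidates[np.argmax(scores)]

# Example: a 25 ms frame of a synthetic 150 Hz "voiced" signal.
fs = 16000
t = np.arange(int(0.025 * fs)) / fs
frame = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 6))
print(shs_pitch(frame, fs))                      # expected near 150 Hz
```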

SL980512.PDF (From Author) SL980512.PDF (Rasterized)

An Instantaneous-Frequency-Based Pitch Extraction Method for High-Quality Speech Transformation: Revised TEMPO in the STRAIGHT-Suite

Authors:

Hideki Kawahara, Wakayama University/ATR/CREST (Japan)
Alain de Cheveigné, Paris 7 University/CNRS (France)
Roy D. Patterson, CNBH University of Cambridge (U.K.)

Page (NA) Paper number 659

Abstract:

A new source information extraction algorithm is proposed to provide a reliable source signal for an extremely high-quality speech analysis, modification, and transformation system called STRAIGHT-suite (Speech Transformation and Representation based on Adaptive Interpolation of weiGHTed spectrogram). The proposed method makes use of instantaneous frequencies in harmonic components based on their reliability. A performance evaluation is conducted using a simultaneous EGG (Electroglottograph) recording as the reference signal. The error variance for F0 extraction using the proposed algorithm is shown to be about 1/3 that of the previous F0 extraction method used in STRAIGHT-suite, although the previous algorithm is still competitive with conventional F0 extraction methods.
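
The sketch below is not the revised TEMPO algorithm itself; it only illustrates, under simplifying assumptions, the general notion of reading F0 off the instantaneous frequency of the band-limited fundamental component via the analytic signal. The band limits, filter order, smoothing length and test signal are assumptions.

```python
# Not the revised TEMPO algorithm; a minimal illustration of tracking F0 as
# the instantaneous frequency of the band-limited fundamental component.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def instantaneous_f0(x, fs, band=(60.0, 200.0), order=4, smooth_s=0.025):
    """Track F0 as the (smoothed) instantaneous frequency of the fundamental."""
    # Isolate the band assumed to contain the fundamental component.
    b, a = butter(order, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    fundamental = filtfilt(b, a, x)

    # Instantaneous frequency = derivative of the analytic-signal phase / (2*pi).
    phase = np.unwrap(np.angle(hilbert(fundamental)))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)

    # Light smoothing to suppress ripple caused by residual harmonic leakage.
    kernel = np.ones(int(smooth_s * fs)) / int(smooth_s * fs)
    return np.convolve(inst_freq, kernel, mode="same")

# Example: a tone gliding from 120 Hz to 160 Hz with two weaker harmonics.
fs = 16000
t = np.arange(fs) / fs                            # one second of samples
f0_track = np.linspace(120.0, 160.0, len(t))
glide_phase = 2 * np.pi * np.cumsum(f0_track) / fs
x = np.sin(glide_phase) + 0.3 * np.sin(2 * glide_phase) + 0.2 * np.sin(3 * glide_phase)
print(instantaneous_f0(x, fs)[fs // 2])           # roughly 140 Hz at the midpoint
```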

SL980659.PDF (From Author) SL980659.PDF (Rasterized)

Speaker-Independent Speech Recognition Using Micro Segment Spectrum Integration

Authors:

Kiyoaki Aikawa, NTT Human Interface Laboratories, Speech and Acoustics Laboratories (Japan)

Page (NA) Paper number 262

Abstract:

This paper proposes a new spectral estimation method for automatic speech recognition. The spectrum estimated with a conventional data window of around 30 ms shows harmonic structure in the voiced portions of speech. For female voices with high F0, the harmonic frequency interval is often comparable to the formant frequency interval, which results in spectral estimation error. The new idea is to estimate the spectrum by taking the Lp norm of the time series of spectra obtained from very short speech segments. The new method, called micro-segment spectrum integration, provides (1) precise spectral estimation that is not affected by harmonic structure, and (2) noise robustness through the suppression of noisy speech segments. Phoneme recognition experiments demonstrate that the micro-segment spectrum integration method outperforms conventional spectral estimation methods.
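
A hedged sketch of the micro-segment idea as described above: magnitude spectra of very short, overlapping sub-segments of a frame are combined across time with an Lp-style (power-mean) integration per frequency bin. Segment length, hop, exponent p and FFT size are illustrative assumptions.

```python
# Micro-segment spectrum integration, sketched under assumed parameters:
# the frame spectrum is the per-bin Lp (power-mean) integration over spectra
# of very short sub-segments, which smooths out harmonic ripple.
import numpy as np

def micro_segment_spectrum(frame, fs, seg_ms=4.0, hop_ms=1.0, p=2.0, n_fft=512):
    """Combine short-segment spectra with an Lp-style norm across segments."""
    seg_len = int(seg_ms * fs / 1000)
    hop = int(hop_ms * fs / 1000)
    window = np.hanning(seg_len)

    # Magnitude spectra of overlapping micro-segments within the frame.
    spectra = []
    for start in range(0, len(frame) - seg_len + 1, hop):
        seg = frame[start:start + seg_len] * window
        spectra.append(np.abs(np.fft.rfft(seg, n=n_fft)))
    spectra = np.array(spectra)                   # shape: (n_segments, n_bins)

    # Lp-style (power-mean) integration across time for every frequency bin.
    integrated = np.mean(spectra ** p, axis=0) ** (1.0 / p)
    return integrated, np.fft.rfftfreq(n_fft, d=1.0 / fs)

# Example: a 30 ms voiced-like frame at 16 kHz with F0 = 250 Hz.
fs = 16000
t = np.arange(int(0.030 * fs)) / fs
frame = sum(np.sin(2 * np.pi * 250 * k * t) / k for k in range(1, 8))
spec, freqs = micro_segment_spectrum(frame, fs)
print(freqs[np.argmax(spec)])   # peak of the smoothed envelope (harmonics unresolved)
```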

SL980262.PDF (From Author) SL980262.PDF (Rasterized)

On Robust Speech Analysis Based On Time-Varying Complex AR Model

Authors:

Keiichi Funaki, Hokkaido University (Japan)
Yoshikazu Miyanaga, Hokkaido University (Japan)
Koji Tochinai, Hokkaido University (Japan)

Page (NA) Paper number 1001

Abstract:

We have previously developed time-varying complex AR (TV-CAR) parameter estimation based on minimizing the mean square error (MMSE) for the analytic speech signal. Although the MMSE approach is commonly and successfully applied to various parameter estimation problems, such as conventional LPC, it is well known that, in the context of speech analysis, an MMSE method easily suffers from biased and inaccurate spectrum estimation due to the non-Gaussian nature of the glottal excitation for voiced speech. This paper presents a robust parameter estimation algorithm for the TV-CAR model based on Huber's robust M-estimation approach, and two kinds of robust algorithms are derived: a Newton-type algorithm and a weighted least squares (WLS) algorithm. Preliminary experiments with a synthetic signal generated by glottal source model excitation and with natural speech uttered by a female speaker demonstrate that the time-varying complex AR method is sufficiently robust against the non-Gaussian nature of the glottal source excitation, owing to the improved resolution in the frequency domain.
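
The sketch below is not the authors' TV-CAR estimator; it only illustrates the Huber-weighted, iteratively reweighted least-squares idea on an ordinary time-invariant, real-valued AR model. Model order, Huber threshold, iteration count and the synthetic excitation are assumptions.

```python
# Huber-weighted iteratively reweighted least squares (IRLS) for an ordinary
# AR model, as a stand-in illustration of robust M-estimation; not the
# time-varying complex AR (TV-CAR) estimator of the paper.
import numpy as np

def robust_ar(x, order=10, c=1.345, n_iter=10):
    """Fit AR coefficients with Huber-weighted least squares."""
    # Build the linear prediction problem X a ~ y (past samples predict the next).
    X = np.array([x[i:i + order][::-1] for i in range(len(x) - order)])
    y = x[order:]

    a = np.linalg.lstsq(X, y, rcond=None)[0]          # ordinary LS start
    for _ in range(n_iter):
        r = y - X @ a
        s = np.median(np.abs(r)) / 0.6745 + 1e-12     # robust scale (MAD)
        u = np.abs(r) / s
        w = np.minimum(1.0, c / np.maximum(u, 1e-12)) # Huber weights
        sw = np.sqrt(w)
        a = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return a

# Example: AR(2) process with occasional impulsive "glottal-like" excitation.
rng = np.random.default_rng(0)
n = 2000
e = rng.normal(size=n)
e[::80] += 8.0                                        # sparse large pulses
x = np.zeros(n)
for t in range(2, n):
    x[t] = 1.5 * x[t - 1] - 0.8 * x[t - 2] + e[t]
print(robust_ar(x, order=2))                          # approximately [1.5, -0.8]
```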

SL981001.PDF (Scanned)

Spectral Basis Functions from Discriminant Analysis

Authors:

Hynek Hermansky, Oregon Graduate Institute Of Science And Technology (USA)
Narendranath Malayath, Oregon Graduate Institute Of Science And Technology (USA)

Page (NA) Paper number 616

Abstract:

This work examines the Karhunen-Loeve Transform (KLT) and Linear Discriminant Analysis (LDA) as means of designing optimized spectral basis functions for projecting the critical-band, auditory-like spectrum.
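
As a hedged illustration of data-driven spectral bases in the spirit of this abstract, the sketch below derives KLT basis vectors from the covariance of spectral feature vectors and LDA basis vectors from class-labelled vectors. The synthetic data, the number of bands and the class count are assumptions.

```python
# KLT and LDA spectral bases from data, sketched with synthetic vectors
# standing in for critical-band spectra; parameters are assumptions.
import numpy as np

def klt_basis(spectra, n_basis):
    """Top eigenvectors of the data covariance matrix (Karhunen-Loeve transform)."""
    centered = spectra - spectra.mean(axis=0)
    cov = centered.T @ centered / (len(spectra) - 1)
    eigval, eigvec = np.linalg.eigh(cov)           # ascending eigenvalues
    return eigvec[:, ::-1][:, :n_basis]            # columns = basis vectors

def lda_basis(spectra, labels, n_basis):
    """Leading generalized eigenvectors of between- vs. within-class scatter."""
    dim = spectra.shape[1]
    mean_all = spectra.mean(axis=0)
    Sw = np.zeros((dim, dim))
    Sb = np.zeros((dim, dim))
    for c in np.unique(labels):
        xc = spectra[labels == c]
        mc = xc.mean(axis=0)
        Sw += (xc - mc).T @ (xc - mc)
        Sb += len(xc) * np.outer(mc - mean_all, mc - mean_all)
    eigval, eigvec = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigval.real)[::-1]
    return eigvec[:, order[:n_basis]].real

# Example: 20 "critical-band" dimensions, 3 synthetic phone-like classes.
rng = np.random.default_rng(1)
class_means = rng.normal(size=(3, 20))
spectra = np.vstack([rng.normal(loc=m, size=(100, 20)) for m in class_means])
labels = np.repeat([0, 1, 2], 100)
print(klt_basis(spectra, 4).shape, lda_basis(spectra, labels, 2).shape)
```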

SL980616.PDF (From Author) SL980616.PDF (Rasterized)
