Signal Processing and Speech Analysis 3

Determination of Articulatory Positions from Speech Acoustics by Applying Dynamic Articulatory Constraints

Authors:

Shin Suzuki, NTT Basic Research Laboratories (Japan)
Takesi Okadome, NTT Basic Research Laboratories (Japan)
Masaaki Honda, NTT Basic Research Laboratories (Japan)

Page (NA) Paper number 130

Abstract:

A method for determining articulatory parameters from speech acoustics is presented. The method is based on a search of an articulatory-acoustic codebook designed from simultaneously observed articulatory motions and speech acoustics. The codebook search employs dynamic constraints on acoustic as well as articulatory behavior. There are two constraints: the use of spectral segments in the codebook search, and the use of the smoothness of articulatory trajectories in the articulatory parameter path search. The articulatory parameters are determined by selecting the articulatory code vector in the codebook that minimizes a weighted distance measure combining the segmental spectral distance and the squared distance between successive articulatory parameters. Experimental results show that the rms error between estimated and observed articulatory parameters was about 2.0 mm on average, and that the articulatory features for vowels and consonants are recovered well.
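
The path-search step lends itself to a simple dynamic-programming sketch. The following Python fragment is a hedged illustration, not the authors' implementation: the codebook arrays, the weight alpha, and all shapes are assumptions made for the example.

    import numpy as np

    def estimate_articulatory_path(spectra, cb_spectra, cb_artic, alpha=0.1):
        """spectra: (T, D) observed spectral segments;
        cb_spectra: (K, D) codebook spectral entries;
        cb_artic: (K, P) corresponding articulatory code vectors.
        Returns a (T, P) articulatory trajectory."""
        T = len(spectra)
        # Acoustic cost of assigning each codebook entry to each frame.
        acoustic = ((spectra[:, None, :] - cb_spectra[None, :, :]) ** 2).sum(-1)
        # Smoothness penalty: squared distance between successive
        # articulatory code vectors (the second constraint above).
        jump = ((cb_artic[:, None, :] - cb_artic[None, :, :]) ** 2).sum(-1)
        cost = acoustic[0].copy()
        back = np.zeros((T, len(cb_artic)), dtype=int)
        for t in range(1, T):
            total = cost[:, None] + alpha * jump   # (previous K, current K)
            back[t] = total.argmin(axis=0)
            cost = total.min(axis=0) + acoustic[t]
        # Trace back the minimum-cost path through the codebook.
        path = [int(cost.argmin())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return cb_artic[np.array(path[::-1])]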

SL980130.PDF (From Author) SL980130.PDF (Rasterized)

Recognizing Emotions in Speech Using Short-term and Long-term Features

Authors:

Yang Li, University of Illinois at Urbana-Champaign (USA)
Yunxin Zhao, University of Illinois at Urbana-Champaign (USA)

Page (NA) Paper number 379

Abstract:

The acoustic characteristics of speech are influenced by the speaker's emotional state. In this study, we attempted to recognize the emotional state of individual speakers using speech features extracted from short-time analysis frames as well as features representing entire utterances. Principal component analysis was used to assess the importance of individual features in representing emotional categories. Three classification methods were used: vector quantization, artificial neural networks, and Gaussian mixture density models. Classifications using short-term features only, long-term features only, and both short-term and long-term features were conducted. The best recognition performance, 62% accuracy, was achieved by the Gaussian mixture density method with both short-term and long-term features.
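
A minimal sketch of the best-performing configuration, as I read the abstract: one Gaussian mixture model per emotion, with each utterance represented by a single vector concatenating short-term statistics and long-term features. The feature extraction, dimensions, and mixture size here are placeholder assumptions, not the paper's setup.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_emotion_gmms(features_by_class, n_components=8):
        """features_by_class: dict mapping an emotion label to an (N, D)
        array of per-utterance feature vectors."""
        return {label: GaussianMixture(n_components).fit(X)
                for label, X in features_by_class.items()}

    def classify(gmms, x):
        # Assign the utterance to the emotion whose mixture gives the
        # highest log-likelihood for its feature vector x (shape (D,)).
        return max(gmms, key=lambda label: gmms[label].score(x[None, :]))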

SL980379.PDF (From Author) SL980379.PDF (Rasterized)

PeriphEar: A Nonlinear Active Model of the Auditory Periphery

Authors:

Arnaud Robert, CIRC Group, Swiss Federal Institute of Technology, Lausanne (Switzerland)
Jan Eriksson, Physiology Department, University of Lausanne (Switzerland)

Page (NA) Paper number 748

Abstract:

This paper describes a phenomenological model of the auditory periphery consisting of a bank of nonlinear, time-varying parallel filters. Realistic filter shapes are obtained with the all-pole gammatone filter (APGF), which provides both a good approximation of the far more complex wave-propagation and cochlear-mechanics models and a very simple implementation. The model also includes active, distributed feedback that controls the damping parameter of the APGF. As a result, the model reproduces several observed phenomena, including compression and two-tone suppression. It is now used to study responses to complex stimuli in models of auditory nerve and cochlear nucleus neurons, and to provide a physiologically plausible front-end for speech analysis.
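
As a rough illustration of the filter structure, one APGF channel can be sketched as a cascade of identical complex-conjugate pole pairs, with the pole radius exposed as the damping parameter the feedback loop would adjust. This Python sketch uses textbook pole placement and an assumed order; it is not the authors' exact design, and the gain is left unnormalized.

    import numpy as np
    from scipy.signal import lfilter

    def apgf_channel(x, fc, fs, bandwidth, order=4, damping=1.0):
        """Filter signal x through one all-pole gammatone channel at fc (Hz)."""
        # Pole radius shrinks as damping grows, widening the filter;
        # this is the parameter an active feedback signal could control.
        r = np.exp(-2 * np.pi * damping * bandwidth / fs)
        theta = 2 * np.pi * fc / fs
        # Denominator of a two-pole resonator with poles at r*exp(+/- j*theta).
        a = [1.0, -2 * r * np.cos(theta), r * r]
        y = x
        for _ in range(order):   # cascade of identical pole pairs: all-pole
            y = lfilter([1.0], a, y)
        return y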

SL980748.PDF (From Author) SL980748.PDF (Rasterized)

The Voicing Feature for Stop Consonants: Acoustic Phonetic Analyses and Automatic Speech Recognition Experiments

Authors:

Padma Ramesh, Bell Labs - Lucent Technologies (USA)
Partha Niyogi, Bell Labs - Lucent Technologies (USA)

Page (NA) Paper number 881

Abstract:

We examine the distinctive feature [voice], which separates voiced from unvoiced sounds, for the case of stop consonants. We conduct acoustic-phonetic analyses on a large database and demonstrate superior separability using a temporal measure, voice onset time (VOT), rather than spectral measures. We describe several algorithms for automatically estimating VOT from continuous speech and compare them on a speech recognition task, reducing error rates by as much as 53 percent over a baseline HMM-based system.
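
One common way to estimate VOT, sketched below under assumptions of my own (band edges, thresholds, 5 ms frames): find the burst as a sharp rise in high-frequency energy, find voicing onset as the appearance of low-frequency energy, and take the difference. The paper compares several such algorithms; this is not any particular one of them.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def estimate_vot(x, fs, frame=0.005):
        hop = int(frame * fs)
        hi = sosfilt(butter(4, 3000, 'highpass', fs=fs, output='sos'), x)
        lo = sosfilt(butter(4, 500, 'lowpass', fs=fs, output='sos'), x)
        e_hi = np.array([np.sum(hi[i:i + hop] ** 2)
                         for i in range(0, len(x) - hop, hop)])
        e_lo = np.array([np.sum(lo[i:i + hop] ** 2)
                         for i in range(0, len(x) - hop, hop)])
        burst = int(np.argmax(np.diff(e_hi)))        # sharpest HF energy rise
        voiced = burst + int(np.argmax(e_lo[burst:] > 0.1 * e_lo.max()))
        return (voiced - burst) * frame              # VOT in seconds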

SL980881.PDF (From Author) SL980881.PDF (Rasterized)

Wavelet-Based Energy Binning Cepstral Features for Automatic Speech Recognition

Authors:

Sankar Basu, IBM T.J. Watson Research Center (USA)
Stéphane Maes, IBM T.J. Watson Research Center (USA)

Page (NA) Paper number 982

Abstract:

Speech production models, coding methods, and text-to-speech technology often lead to modulation models that represent speech signals by primary components: amplitude- and phase-modulated sine functions. Parallels between properties of the wavelet transform of primary components and algorithmic representations of speech derived from auditory nerve models such as the EIH lead to the introduction of synchrosqueezing measures. On the other hand, in automatic speech (and speaker) recognition, cepstral features have become the quasi-universal acoustic characterization of speech utterances. This paper analyzes the cepstral representation in the context of the synchrosqueezed representation, the "wastrum". It discusses energy-accumulation-derived wastra as opposed to classical MEL- and LPC-derived cepstra; in the former method, the primary components and formants play a primary role. Recognition results are presented on the Wall Street Journal database using the IBM continuous decoder.
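
The energy-binning idea can be loosely illustrated by analogy with MEL cepstra: accumulate per-band energies from a wavelet-style decomposition, log-compress, and decorrelate with a DCT. The synchrosqueezing step and the authors' actual "wastrum" definition are not reproduced in this sketch; it only shows the binning-plus-cepstrum pipeline.

    import numpy as np
    from scipy.fftpack import dct

    def wavelet_binned_cepstra(subband_energies, n_ceps=13):
        """subband_energies: (T, B) frame-by-band energies accumulated
        from a wavelet (or other constant-Q) decomposition."""
        log_e = np.log(subband_energies + 1e-10)   # log compression
        # DCT across bands yields cepstrum-like coefficients.
        return dct(log_e, type=2, axis=1, norm='ortho')[:, :n_ceps]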

SL980982.PDF (From Author) SL980982.PDF (Rasterized)

Articulatory Analysis using a Codebook for Articulatory based Low Bit-Rate Speech Coding

Authors:

Carlos Silva, Dept. de Electrónica Industrial - Universidade do Minho (Portugal)
Samir Chennoukh, Center for Computer Aids for Industrial Productivity (CAIP), Rutgers University (USA)

Page (NA) Paper number 899

Abstract:

Fundamental to the success of articulatory-based speech coding is the mapping from acoustics to an articulatory description. Because this mapping is not unique, the ambiguity of the articulatory trajectories is resolved with a forward dynamic network based on articulatory continuity criteria. In this paper, we present new results on the forward dynamic network used to estimate articulatory trajectories when an improved articulatory codebook is used for acoustic-to-articulatory mapping. The improvement in codebook design is based on a new model that provides more detail on the vocal tract area function and on a more appropriate sampling of the articulatory parameters according to the articulatory-acoustic relation.
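
The continuity criterion can be sketched as follows: keep the N best codebook matches per frame and pick, by dynamic programming, the candidate sequence with the least total articulatory movement. This stands in for the paper's forward dynamic network, whose exact structure the abstract does not give; all names and shapes are illustrative.

    import numpy as np

    def smooth_trajectory(candidates):
        """candidates: (T, N, P) array, N candidate articulatory vectors
        per frame. Returns the (T, P) path minimizing the summed squared
        frame-to-frame jumps."""
        T, N, _ = candidates.shape
        cost = np.zeros(N)
        back = np.zeros((T, N), dtype=int)
        for t in range(1, T):
            jump = ((candidates[t - 1][:, None, :]
                     - candidates[t][None, :, :]) ** 2).sum(-1)
            total = cost[:, None] + jump           # (previous N, current N)
            back[t] = total.argmin(axis=0)
            cost = total.min(axis=0)
        idx = [int(cost.argmin())]
        for t in range(T - 1, 0, -1):
            idx.append(int(back[t, idx[-1]]))
        idx.reverse()
        return candidates[np.arange(T), idx]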

SL980899.PDF (From Author) SL980899.PDF (Rasterized)

0899_01.WAV
(was: 0899_1.wav)
Speech file of the original sentence "Where are you?"
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits, mono, signed linear encoding.
Creating Application: sox
Creating OS: Linux
0899_02.WAV
(was: 0899_2.wav)
Speech file of the mimic result of the sentence "Where are you?" using our improved codebook.
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits, mono, signed linear encoding.
Creating Application: sox
Creating OS: Linux
0899_03.WAV
(was: 0899_3.wav)
Speech file of the mimic result of the sentence "Where are you?" using our old codebook.
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits, mono, signed linear encoding.
Creating Application: sox
Creating OS: Linux
0899_04.PDF
(was: 0899.gif)
Spectrogram of the sentence "Where are you?" using our old codebook.
File type: Image File
Format: Image: GIF
Tech. description: None
Creating Application: XV
Creating OS: Linux
