Authors:
Shin Suzuki, NTT Basic Research Laboratories (Japan)
Takesi Okadome, NTT Basic Research Laboratories (Japan)
Masaaki Honda, NTT Basic Research Laboratories (Japan)
Page (NA) Paper number 130
Abstract:
A method for determining articulatory parameters from speech acoustics
is presented. The method is based on a search of an articulatory-acoustic
codebook which is designed from simultaneous observation data of articulatory
motions and speech acoustics. The codebook search employs dynamic constraints
on acoustic as well as articulatory behavior. There are two constraints:
the use of spectral segments in the codebook search, and the use of the
smoothness of articulatory trajectories in the articulatory parameter
path search. The articulatory parameters are determined by selecting the
articulatory code vector in the codebook that minimizes a weighted
distance measure combining the segmental spectral distance and the
squared distance between successive articulatory parameters. Experimental
results show that the rms error between the estimated and observed
articulatory parameters was about 2.0 mm on average and that the
articulatory features for vowels and consonants were recovered well.
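As a rough illustration of the kind of path search described above, the
following Python sketch selects, for each frame, the codebook entry that
minimizes a spectral distance plus a weighted squared distance between
successive articulatory code vectors. The function name, the plain
Euclidean distances, and the single smoothness weight are illustrative
assumptions, not the authors' implementation.

import numpy as np

def search_articulatory_path(spectral_frames, code_spectra, code_artic, w_smooth=1.0):
    # spectral_frames: (T, F) observed spectra; code_spectra: (K, F) codebook spectra;
    # code_artic: (K, D) codebook articulatory vectors.
    spectral_frames = np.asarray(spectral_frames)
    code_spectra = np.asarray(code_spectra)
    code_artic = np.asarray(code_artic)
    T, K = len(spectral_frames), len(code_spectra)
    # Acoustic cost of assigning codebook entry k to frame t (squared spectral distance).
    acoustic = ((spectral_frames[:, None, :] - code_spectra[None, :, :]) ** 2).sum(axis=2)
    # Smoothness cost between every pair of articulatory code vectors.
    trans = ((code_artic[:, None, :] - code_artic[None, :, :]) ** 2).sum(axis=2)
    cost = acoustic[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + w_smooth * trans          # (previous entry, next entry)
        backptr[t] = np.argmin(total, axis=0)
        cost = total[backptr[t], np.arange(K)] + acoustic[t]
    # Trace back the minimum-cost path and return the articulatory trajectory.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return code_artic[path[::-1]]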
Authors:
Yang Li, University of Illinois at Urbana-Champaign (USA)
Yunxin Zhao, University of Illinois at Urbana-Champaign (USA)
Page (NA) Paper number 379
Abstract:
The acoustic characteristics of speech are influenced by speakers'
emotional status. In this study, we attempted to recognize the emotional
status of individual speakers by using speech features that were extracted
from short-time analysis frames as well as speech features that represented
entire utterances. Principal component analysis was used to analyze
the importance of individual features in representing emotional categories.
Three classification methods were used: vector quantization, artificial
neural networks, and Gaussian mixture density models. Classifications
were conducted using short-term features only, long-term features only,
and both short-term and long-term features. The best recognition performance
of 62% accuracy was achieved by using the Gaussian mixture density
method with both short-term and long-term features.
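The following Python sketch illustrates the general classification scheme
(one Gaussian mixture model per emotion over combined short-term and
long-term features, with PCA applied to the feature set). The use of
scikit-learn and the specific feature layout are assumptions made for
illustration; they are not the paper's setup.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def train_emotion_models(features_by_class, n_components=4, n_pca=10):
    # features_by_class: dict mapping emotion label -> (N, D) matrix; each row
    # concatenates short-term feature statistics and long-term utterance features.
    all_feats = np.vstack(list(features_by_class.values()))
    pca = PCA(n_components=n_pca).fit(all_feats)      # analyze / reduce feature dimensions
    gmms = {label: GaussianMixture(n_components=n_components).fit(pca.transform(X))
            for label, X in features_by_class.items()}
    return pca, gmms

def classify_utterance(pca, gmms, feature_vector):
    # Choose the emotion whose mixture model gives the highest log-likelihood.
    z = pca.transform(np.asarray(feature_vector).reshape(1, -1))
    return max(gmms, key=lambda label: gmms[label].score(z))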
Authors:
Arnaud Robert, CIRC Group, Swiss Federal Institute of Technology, Lausanne (Switzerland)
Jan Eriksson, Physiology Department, University of Lausanne (Switzerland)
Page (NA) Paper number 748
Abstract:
This paper describes a phenomenological model of the auditory periphery
which consists of a bank of nonlinear time-varying parallel filters.
Realistic filter shapes are obtained with the all-pole gammatone filter
(APGF) which provides both a good approximation of the far more complex
wave-propagation or cochlear mechanics models and a very simple implementation.
The model also includes an active, distributed feedback that controls
the damping parameter of the APGF. As a result, the model reproduces
several observed phenomena including compression and two-tone suppression.
It is now used to study responses to complex stimuli in models of the
auditory nerve and cochlear nucleus neurons, and to provide a physiologically
plausible front-end for speech analysis.
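A minimal sketch of one all-pole gammatone channel is given below: a
cascade of identical two-pole resonators whose damping parameter is the
quantity a feedback loop would adjust. The pole placement, the
damping-to-bandwidth scaling, and the gain normalization are illustrative
choices rather than the authors' model, and the active feedback itself is
omitted.

import numpy as np
from scipy.signal import lfilter

def apgf(signal, fs, fc, damping=0.2, order=4):
    # 'order' cascaded identical two-pole resonators at centre frequency fc;
    # larger 'damping' gives a broader, lower-gain response.
    bw = 2 * np.pi * damping * fc                      # pole bandwidth in rad/s (illustrative scaling)
    pole = np.exp((-bw + 1j * 2 * np.pi * fc) / fs)    # digital pole via impulse invariance
    a = [1.0, -2.0 * pole.real, abs(pole) ** 2]        # denominator of one all-pole section
    gain = abs(np.polyval(a, np.exp(1j * 2 * np.pi * fc / fs)))  # unity gain at fc per section
    y = np.asarray(signal, dtype=float)
    for _ in range(order):
        y = lfilter([gain], a, y)
    return y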
Authors:
Padma Ramesh, Bell Labs - Lucent Technologies (USA)
Partha Niyogi, Bell Labs - Lucent Technologies (USA)
Page (NA) Paper number 881
Abstract:
We examine the distinctive feature [voice] that separates the voiced
from the unvoiced sounds for the case of stop consonants. We conduct
acoustic phonetic analyses on a large database and demonstrate the
superior separability obtained with a temporal measure (voice onset time,
VOT) rather than with spectral measures. We describe several algorithms
for automatically estimating the VOT from continuous speech and compare
them on a speech recognition task, where they reduce error rates by as
much as 53 percent over a baseline HMM-based system.
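As a rough illustration only (not one of the paper's algorithms), a VOT
estimate can be formed by locating the burst as a sharp rise in
high-frequency energy and the voicing onset as the first sustained rise in
low-frequency energy after it. The band edges, threshold, and frame length
in the sketch below are arbitrary assumptions.

import numpy as np

def estimate_vot(x, fs, frame_dur=0.005):
    # Frame the signal, then track energy in a high band (burst) and a low band (voicing).
    hop = int(frame_dur * fs)
    frames = np.lib.stride_tricks.sliding_window_view(np.asarray(x, float), hop)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(hop), axis=1))
    freqs = np.fft.rfftfreq(hop, 1.0 / fs)
    hi = spec[:, freqs > 3000].sum(axis=1)             # burst energy band
    lo = spec[:, freqs < 1000].sum(axis=1)             # voicing energy band
    burst = int(np.argmax(np.diff(hi)))                # frame with the sharpest high-band rise
    strong = lo[burst:] > 0.5 * lo.max()               # frames with strong low-band energy
    voicing = burst + int(np.argmax(strong))           # first such frame after the burst
    return (voicing - burst) * frame_dur               # VOT in seconds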
Authors:
Sankar Basu, IBM T.J. Watson Research Center (USA)
Stéphane Maes, IBM T.J. Watson Research Center (USA)
Page (NA) Paper number 982
Abstract:
Speech production models, coding methods, and text-to-speech technology
often lead to the introduction of modulation models that represent speech
signals by primary components, namely amplitude- and phase-modulated sine
functions. Parallels between properties of the wavelet transform of
primary components and algorithmic representations of speech signals
derived from auditory nerve models such as the EIH lead to the
introduction of synchrosqueezing measures. In automatic speech (and
speaker) recognition, on the other hand, cepstral features have become the
quasi-universal acoustic characterization of speech utterances. This paper
analyses the cepstral representation in the context of the synchrosqueezed
representation, the wastrum. It discusses energy-accumulation-derived
wastra as opposed to classical MEL- and LPC-derived cepstra; in the
former, the primary components and formants play a central role.
Recognition results are presented on the Wall Street Journal database
using the IBM continuous decoder.
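The wastrum itself is not specified in this abstract. As a loose,
STFT-based analogue only, the sketch below reassigns spectral energy to
instantaneous-frequency bins (a synchrosqueezing-like step) and then takes
a DCT of the log energies to form cepstrum-like coefficients; all
parameter choices are illustrative and this is not the authors' method.

import numpy as np
from scipy.fft import dct

def squeezed_cepstrum(x, fs, n_fft=512, hop=128, n_cep=13):
    # Short-time Fourier transform of the signal.
    win = np.hanning(n_fft)
    frames = np.lib.stride_tricks.sliding_window_view(np.asarray(x, float), n_fft)[::hop]
    X = np.fft.rfft(frames * win, axis=1)                        # (frames, bins)
    k = np.arange(X.shape[1])
    # Instantaneous frequency per bin from the phase advance between frames.
    expected = 2 * np.pi * k * hop / n_fft                       # nominal phase advance per hop
    dev = np.angle(X[1:] * np.conj(X[:-1]) * np.exp(-1j * expected))
    inst_f = np.clip(k * fs / n_fft + dev * fs / (2 * np.pi * hop), 0, fs / 2)
    # Reassign ("squeeze") magnitudes onto instantaneous-frequency bins.
    centres = np.linspace(0, fs / 2, X.shape[1])
    S = np.zeros((X.shape[0] - 1, X.shape[1]))
    for t in range(S.shape[0]):
        idx = np.clip(np.searchsorted(centres, inst_f[t]), 0, S.shape[1] - 1)
        np.add.at(S[t], idx, np.abs(X[t + 1]))
    # Cepstrum-like coefficients of the log squeezed spectrum.
    return dct(np.log(S + 1e-8), axis=1, norm='ortho')[:, :n_cep]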
Authors:
Carlos Silva, Dept. de Electrónica Industrial - Universidade do Minho (Portugal)
Samir Chennoukh, Center for Computer Aids for Industrial Productivity (CAIP), Rutgers University (USA)
Page (NA) Paper number 899
Abstract:
Fundamental to the success of articulatory-based speech coding is the
mapping from acoustics to an articulatory description. As the mapping is
not unique, the non-uniqueness of the articulatory trajectories is
resolved using a forward dynamic network based on articulatory continuity
criteria. In this paper, we present new results on the forward dynamic
network used to estimate articulatory trajectories with an improved
articulatory codebook for acoustic-to-articulatory mapping. The
improvement in the codebook design is based on a new model that provides
more detail on the vocal tract area function, and on articulatory
parameter sampling that better follows the articulatory-acoustic relation.
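As a hedged illustration of sampling a codebook according to the
articulatory-acoustic relation (the paper's actual design procedure is not
given in this abstract), the Python sketch below draws coarse articulatory
samples, maps them through a hypothetical forward model standing in for an
area-function-based synthesizer, and adds extra entries where small
articulatory perturbations cause large acoustic changes.

import numpy as np

def build_codebook(forward_model, low, high, n_init=1000, n_refine=1000, seed=0):
    # forward_model: hypothetical articulatory-vector -> spectral-vector function
    # (an assumption for illustration, not the paper's model).
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    artic = rng.uniform(low, high, size=(n_init, len(low)))     # coarse articulatory samples
    acoust = np.array([forward_model(a) for a in artic])
    # Crude sensitivity threshold: typical acoustic spread of the initial samples.
    threshold = np.median(np.linalg.norm(acoust - acoust.mean(axis=0), axis=1))
    step = 0.05 * (high - low)
    for _ in range(n_refine):
        i = rng.integers(len(artic))
        probe = np.clip(artic[i] + rng.normal(0.0, step), low, high)
        probe_acoust = forward_model(probe)
        # Densify where a small articulatory step causes a large acoustic change.
        if np.linalg.norm(probe_acoust - acoust[i]) > threshold:
            artic = np.vstack([artic, probe])
            acoust = np.vstack([acoust, probe_acoust])
    return artic, acoust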
0899_01.WAV
(was: 0899_1.wav)
| Speech file of the original sentence "Where are you?"
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits, mono, signed linear encoding.
Creating Application: sox
Creating OS: Linux
|
0899_02.WAV
(was: 0899_2.wav)
| Speech file of the mimic result of the sentence "Where are you?"
using our improved codebook.
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits, mono, signed linear encoding.
Creating Application: sox
Creating OS: Linux
|
0899_03.WAV
(was: 0899_3.wav)
| Speech file of the mimic result of the sentence "Where are you?"
using our old codebook.
File type: Sound File
Format: Sound File: WAV
Tech. description: 16 kHz, 16 bits, mono, signed linear encoding.
Creating Application: sox
Creating OS: Linux
|
0899_04.PDF
(was: 0899.gif)
| Spectrogram of the sentence "Where are you?" using our old codebook.
File type: Image File
Format: Image : GIF
Tech. description: None
Creating Application: XV
Creating OS: Linux
|