Session Th4D Vocal Tract Analysis

Chairperson: Andrea Paoloni, Fondazione Ugo Bordoni, Italy

ON USING FRACTAL FEATURES OF SPEECH SOUNDS IN AUTOMATIC SPEECH RECOGNITION

Authors: Petros Maragos (1) and Alexandros Potamianos (2)

(1) Institute for Language & Speech Processing, Margari 22, Athens 11525, GREECE; and School of E.C.E., Georgia Institute of Technology, Atlanta, GA 30332, USA. (2) AT&T Labs-Research, 180 Park Ave, P.O. Box 971, Florham Park, NJ 07932-0971, U.S.A.

Volume 5 pages 2531 - 2534

ABSTRACT

The dynamics of airflow during speech production often result in some degree of turbulence. In this paper, we quantify the geometry of speech turbulence, as reflected in the fragmentation of the time signal, by using fractal models. We describe an efficient algorithm for estimating the short-time fractal dimension of speech signals based on multiscale morphological filtering and discuss its potential for phonetic classification. We also report experimental results on using the short-time fractal dimension of speech signals at multiple scales as additional features in an automatic speech recognition system based on hidden Markov models (HMMs); these features provide a modest improvement in recognition performance.
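The idea of measuring fractal dimension by multiscale morphological filtering can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a flat 3-sample structuring element, processes the whole signal rather than short-time frames, and takes the dimension as the least-squares slope of log(area / s^2) against log(1 / s), where the area is the gap between the dilated and eroded signal at scale s. The function names and the `max_scale` parameter are hypothetical.

```python
import math
import random


def dilate(x):
    """Flat dilation by a 3-sample window (local maximum, clipped at the edges)."""
    n = len(x)
    return [max(x[max(i - 1, 0):min(i + 2, n)]) for i in range(n)]


def erode(x):
    """Flat erosion by a 3-sample window (local minimum, clipped at the edges)."""
    n = len(x)
    return [min(x[max(i - 1, 0):min(i + 2, n)]) for i in range(n)]


def fractal_dimension(x, max_scale=10):
    """Estimate the fractal dimension of a 1-D signal from its morphological cover.

    Assumption: repeating the 3-sample dilation/erosion s times stands in
    for a structuring element of half-width s samples.
    """
    upper, lower = list(x), list(x)
    log_inv_scale, log_area = [], []
    for s in range(1, max_scale + 1):
        upper = dilate(upper)
        lower = erode(lower)
        area = sum(u - l for u, l in zip(upper, lower))
        if area <= 0.0:
            raise ValueError("signal is constant; cover area is zero")
        log_inv_scale.append(math.log(1.0 / s))
        log_area.append(math.log(area / (s * s)))
    # Least-squares slope of log_area against log_inv_scale.
    mx = sum(log_inv_scale) / len(log_inv_scale)
    my = sum(log_area) / len(log_area)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_scale, log_area))
    den = sum((a - mx) ** 2 for a in log_inv_scale)
    return num / den


rng = random.Random(0)
ramp = [0.001 * n for n in range(1000)]       # smooth curve
noise = [rng.random() for _ in range(1000)]   # highly fragmented signal
print(fractal_dimension(ramp), fractal_dimension(noise))
```

The smooth ramp gives an estimate close to 1, while the noise signal gives a clearly larger value, which is the kind of contrast such a feature is meant to capture. At these finite scales the estimate also depends on amplitude scaling and on the range of scales used.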

A0186.pdf

DYNAMIC CONSTRAINT WEIGHTING IN THE CONTEXT OF ARTICULATORY PARAMETER ESTIMATION

Authors: Hywel B. Richards (1), John S. Bridle (1), Melvyn J. Hunt (1) and John S. Mason (2)

(1) Dragon Systems UK Ltd, Millbank, Stoke Road, Bishops Cleeve, CHELTENHAM, GL52 4RW, UK. (2) Department of Electrical & Electronic Engineering, University of Wales Swansea, SWANSEA, SA2 8PP, UK. email: hywel@dragon.co.uk

Volume 5 pages 2535 - 2538

ABSTRACT

This paper describes a cross-validation method to determine the appropriate weight with which dynamic constraints should be applied when estimating vocal tract shapes from speech. This data-dependent method can estimate the weighting without the need for separate prior knowledge of the source and noise statistics. The principles are first demonstrated on a simple one-dimensional system analogous to speech production. As the data here is synthetic, the statistics are known, and so the success of the method can be objectively assessed. Next, the same principles are extended to real speech to improve estimation of vocal tract shape trajectories.
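A toy version of the one-dimensional case may help fix ideas. The sketch below is an assumption-laden illustration, not the method of the paper: the dynamic constraint is taken to be a penalty on first differences of the estimated trajectory, the held-out folds are interleaved samples, and the candidate weights are arbitrary. The names `smooth`, `cv_error` and `choose_weight` are hypothetical.

```python
import math
import random


def smooth(y, mask, w):
    """Minimise sum_t mask[t]*(y[t]-x[t])^2 + w*sum_t (x[t]-x[t-1])^2.

    The normal equations are tridiagonal and are solved with the Thomas
    algorithm. Requires w > 0 and at least one observed sample.
    """
    n = len(y)
    diag = [(1.0 if mask[t] else 0.0) + w * ((t > 0) + (t < n - 1))
            for t in range(n)]
    rhs = [y[t] if mask[t] else 0.0 for t in range(n)]
    off = -w  # common value of the sub- and super-diagonal
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for t in range(1, n):  # forward elimination
        denom = diag[t] - off * c[t - 1]
        c[t] = off / denom
        d[t] = (rhs[t] - off * d[t - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for t in range(n - 2, -1, -1):  # back substitution
        x[t] = d[t] - c[t] * x[t + 1]
    return x


def cv_error(y, w, folds=5):
    """Mean squared prediction error on held-out samples for weight w."""
    total = 0.0
    for k in range(folds):
        mask = [t % folds != k for t in range(len(y))]
        x = smooth(y, mask, w)
        total += sum((y[t] - x[t]) ** 2 for t in range(len(y)) if not mask[t])
    return total / len(y)


def choose_weight(y, candidates):
    """Pick the candidate weight with the lowest cross-validation error."""
    return min(candidates, key=lambda w: cv_error(y, w))


# Synthetic data with known statistics: a slow sinusoid plus white noise.
rng = random.Random(1)
truth = [math.sin(2.0 * math.pi * t / 100.0) for t in range(400)]
y = [v + rng.gauss(0.0, 0.3) for v in truth]
weights = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best = choose_weight(y, weights)
print("selected weight:", best)
```

Because the data are synthetic, the trajectory estimated with the selected weight can be compared directly with `truth`, which mirrors the objective assessment described in the abstract. No source or noise statistics are passed to `choose_weight`; only the observations are used.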

A0294.pdf

ESTIMATION OF VOCAL TRACT FRONT CAVITY RESONANCE IN UNVOICED FRICATIVE SPEECH

Authors: Minkyu Lee and Donald G. Childers

Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, U.S.A.

Volume 5 pages 2539 - 2542

ABSTRACT

The purpose of this paper is to study the effect of the front cavity resonance and the vocal tract area function on the quality of synthesized unvoiced speech. Prior experiments have shown that unvoiced speech is closely related to the vocal tract front cavity resonance. The noise source is located near the vocal tract constriction, and the front cavity serves as a spectral shaping filter. An algorithm is proposed to estimate front cavity resonances, from which the effective length of the vocal tract front cavity can be calculated. These parameters are used to construct a simple vocal tract area function, and unvoiced speech is generated using an articulatory synthesizer. The effects of the front cavity length and the back cavity shape on the perception of unvoiced fricatives are then investigated.
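The abstract does not state how the effective length is obtained from the resonance. One common textbook approximation, used here purely as an assumption, treats the front cavity as a uniform tube closed at the constriction and open at the lips, so that its lowest resonance F satisfies L = c / (4 F). The function name and the default sound speed of 35000 cm/s are illustrative choices.

```python
def front_cavity_length_cm(resonance_hz, sound_speed_cm_s=35000.0):
    """Quarter-wavelength estimate of front cavity length.

    Assumes a uniform tube, closed at the constriction and open at the
    lips, with no end correction for lip radiation.
    """
    if resonance_hz <= 0.0:
        raise ValueError("resonance must be positive")
    return sound_speed_cm_s / (4.0 * resonance_hz)


# A front cavity resonance of 3500 Hz maps to a 2.5 cm cavity.
print(front_cavity_length_cm(3500.0))  # prints 2.5
```

Under this model, a higher resonance implies a shorter front cavity, which is consistent with fricatives produced further forward in the mouth having energy at higher frequencies.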

A0414.pdf

A SOFTWARE TOOL TO STUDY PORTUGUESE VOWELS

Authors: Antonio Teixeira (1), Francisco Vaz (1) and Jose Carlos Principe (2)

(1) Dep. Electronica e Telecomunicacoes/INESC, Universidade de Aveiro, Campus Universitario, 3810 AVEIRO, Portugal. Tel. +351 34 370 500, Fax: +351 34 370 541, E-mail: {ajst,fvaz}@inesca.pt (2) Department of Electrical Engineering, University of Florida, CSE 444, Gainesville, FL 32611, USA. E-mail: principe@synapse.ee.ufl.edu

Volume 5 pages 2543 - 2546

ABSTRACT

We are developing a software system to support the study of Portuguese vowel production. The tool is an articulatory synthesizer with a graphical user interface. The synthesizer combines a sagittal articulatory model, derived from the Mermelstein model, with a frequency-domain simulation of the electrical analog of the acoustic tube. The user can easily define the nasal tract configuration. The system includes optimization by simulated annealing to perform acoustic-to-articulatory mapping. In this paper we present the system under development, its current state and future perspectives. Preliminary experiments with Portuguese vowels gave good results.

A0607.pdf

POST-SYNCHRONIZATION VIA FORMANT-TO-AREA MAPPING OF ASYNCHRONOUSLY RECORDED SPEECH SIGNALS AND AREA FUNCTIONS

Authors: J. Schoentgen and S. Ciocea

Laboratory of Experimental Phonetics, Institute of Modern Languages and Phonetics, CP110, Université Libre de Bruxelles, Av. F. D. Roosevelt, 50, B-1050 Brussels, Belgium. Tel. +32 2 650 2010, Fax: 32 2 650 2007, E-mail: jschoent@ulb.ac.be

Volume 5 pages 2547 - 2550

ABSTRACT

The article presents a method of post-synchronization, that is, the matching of an area function model to a measured area function by means of formant-to-area mapping. The objective of post-synchronization is to compute a model which is as near as possible to a measured area function and whose eigenfrequencies are identical to the corresponding measured formant frequencies. Different types of acoustic models and constraints are examined. Results show that the best map is obtained with a lossless acoustic model, corrected for lip radiation and wall vibration losses, together with minimization of the Euclidean distance between the geometrically fitted and the formant-to-area mapped area function models. The differences between measured and mapped area functions are gauged by means of the dynamic length warping distance.

A0637.pdf

GEOMETRICALLY AND ACOUSTICALLY OPTIMIZED CODEBOOK FOR UNIQUE MAPPING FROM FORMANTS TO VOCAL-TRACT SHAPE

Authors: Z. L. Yu and P. C. Ching

Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. E-mail: zlyu@ee.cuhk.edu.hk, pcching@ee.cuhk.edu.hk

Volume 5 pages 2551 - 2554

ABSTRACT

A method is proposed to generate, by direct acoustic calculation, a codebook with distributed formant targets and a unique geometric-acoustic mapping from formants to vocal-tract shape. Geometric and acoustic constraints are applied to both the vocal-tract model parameters and the calculated acoustic features to eliminate unacceptable values from the initial codebook, which is usually extremely large. The vocal-tract length is used as an additional parameter to model the vocal tract, and a restriction on the vocal-tract length based on measured data is employed. A geometric and acoustic optimization scheme is devised to cluster the constrained codebook into a uniquely mapped codebook of reduced size. The codebook generated by this method is precise and robust and provides a satisfactory solution to the inverse speech production problem.

A0771.pdf