ABSTRACT
The dynamics of airflow during speech production often result in some degree of turbulence, small or large. In this paper, we quantify the geometry of speech turbulence, as reflected in the fragmentation of the time signal, by using fractal models. We describe an efficient algorithm for estimating the short-time fractal dimension of speech signals based on multiscale morphological filtering and discuss its potential for phonetic classification. We also report experimental results on using the short-time fractal dimension of speech signals at multiple scales as additional features in an automatic speech recognition system based on hidden Markov models (HMMs), which provides a modest improvement in speech recognition performance.
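As a rough illustration of the idea in this abstract, the sketch below estimates a fractal dimension from the area of a multiscale morphological cover: the signal is repeatedly dilated and eroded with a flat 3-sample structuring element, the area A(s) between the two envelopes is measured at each scale s, and the dimension is taken as 2 minus the least-squares slope of log A(s) against log s. This is one common formulation and not necessarily the authors' exact algorithm; the function names, the frame length, the hop size, and the maximum scale are all assumptions made for the example.

```python
import math
import random


def dilate3(x):
    """Flat 3-sample dilation (running maximum); edge samples are replicated."""
    n = len(x)
    return [max(x[max(i - 1, 0)], x[i], x[min(i + 1, n - 1)]) for i in range(n)]


def erode3(x):
    """Flat 3-sample erosion (running minimum); edge samples are replicated."""
    n = len(x)
    return [min(x[max(i - 1, 0)], x[i], x[min(i + 1, n - 1)]) for i in range(n)]


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def fractal_dimension(frame, max_scale=8):
    """Estimate the fractal dimension of one frame (max_scale must be >= 2)."""
    upper, lower = list(frame), list(frame)
    log_s, log_a = [], []
    for s in range(1, max_scale + 1):
        # Applying the 3-sample operator s times equals a flat element of width 2s+1.
        upper, lower = dilate3(upper), erode3(lower)
        area = sum(u - l for u, l in zip(upper, lower))
        if area <= 0.0:
            return 1.0  # constant frame: treat as a smooth curve
        log_s.append(math.log(s))
        log_a.append(math.log(area))
    # A(s) ~ c * s^(2 - D), so D = 2 - slope of log A versus log s.
    return 2.0 - _slope(log_s, log_a)


def short_time_fractal_dimension(signal, frame_len=240, hop=80, max_scale=8):
    """Frame-by-frame fractal dimension; frame_len and hop are assumed values."""
    return [
        fractal_dimension(signal[i:i + frame_len], max_scale)
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


random.seed(0)
ramp = [0.01 * i for i in range(400)]            # smooth signal
noise = [random.random() for _ in range(400)]    # highly fragmented signal
d_ramp = fractal_dimension(ramp)
d_noise = fractal_dimension(noise)
```

With these inputs the smooth ramp yields a value close to 1 and the noise a value much closer to 2, which is the kind of contrast that would separate, say, vowels from strongly turbulent fricatives.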
ABSTRACT
This paper describes a cross-validation method to determine the appropriate weight with which dynamic constraints should be applied when estimating vocal tract shapes from speech. This data-dependent method can estimate the weighting without the need for separate prior knowledge of the source and noise statistics. The principles are first demonstrated on a simple one-dimensional system analogous to speech production. As the data here is synthetic, the statistics are known, and so the success of the method can be objectively assessed. Next, the same principles are extended to real speech to improve estimation of vocal tract shape trajectories.
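The following toy sketch shows how cross-validation can select a constraint weight on a synthetic one-dimensional problem where the truth is known. It is not the authors' method: a first-difference smoothness penalty stands in for the dynamic constraint, and the signal, the noise level, the fold count, and the weight grid are all assumptions. For each candidate weight w, some samples are held out, the trajectory is estimated from the rest by minimizing the data error plus w times the sum of squared first differences, and the held-out samples are used to score the estimate.

```python
import math
import random


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def smooth(y, mask, w):
    """Minimize sum mask[i]*(y[i]-x[i])^2 + w * sum (x[i]-x[i-1])^2 over x."""
    n = len(y)
    diag = [mask[i] + w * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = [-w] * n
    rhs = [mask[i] * y[i] for i in range(n)]
    return solve_tridiagonal(off, diag, off, rhs)


def cv_error(y, w, folds=5):
    """Mean squared prediction error on samples held out of the fit."""
    n, err = len(y), 0.0
    for f in range(folds):
        mask = [0.0 if i % folds == f else 1.0 for i in range(n)]
        x = smooth(y, mask, w)
        err += sum((y[i] - x[i]) ** 2 for i in range(f, n, folds))
    return err / n


def choose_weight(y, grid):
    """Pick the weight with the lowest cross-validation error."""
    return min(grid, key=lambda w: cv_error(y, w))


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Synthetic data: the true trajectory is known, so success can be checked.
random.seed(1)
truth = [math.sin(2.0 * math.pi * t / 100.0) for t in range(200)]
noisy = [v + random.gauss(0.0, 0.3) for v in truth]
grid = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best_w = choose_weight(noisy, grid)
estimate = smooth(noisy, [1.0] * len(noisy), best_w)
```

The selection step uses only the noisy observations; the known truth is used afterwards to check that the chosen weight brings the estimate closer to the true trajectory than the raw data.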
ABSTRACT
The purpose of this paper is to study the effect of the front cavity resonance and the vocal tract area function on the quality of synthesized unvoiced speech. Prior experiments have shown that unvoiced speech is closely related to the resonance of the vocal tract front cavity. The noise source is located near the vocal tract constriction, and the front cavity serves as a spectral shaping filter. An algorithm is proposed to estimate front cavity resonances, from which the effective length of the vocal tract front cavity can be calculated. These parameters are used to construct a simple vocal tract area function. Unvoiced speech is generated using an articulatory synthesizer, and the effects of the front cavity length and back cavity shape on the perception of unvoiced fricatives are investigated.
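The abstract does not state how the effective length is derived from the resonance. A common textbook approximation, assumed here, treats the front cavity as a uniform tube closed at the constriction and open at the lips, whose lowest resonance is F = c / (4L). The sketch below inverts that relation; the speed of sound value and the function names are assumptions for illustration.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, humid air


def front_cavity_length_cm(resonance_hz, c=SPEED_OF_SOUND_CM_PER_S):
    """Effective length of a quarter-wavelength tube with the given lowest resonance."""
    if resonance_hz <= 0.0:
        raise ValueError("resonance must be positive")
    return c / (4.0 * resonance_hz)


def front_cavity_resonance_hz(length_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """Lowest resonance of a tube closed at one end and open at the other."""
    if length_cm <= 0.0:
        raise ValueError("length must be positive")
    return c / (4.0 * length_cm)


# A 3500 Hz front cavity resonance corresponds to a 2.5 cm effective length.
length = front_cavity_length_cm(3500.0)
```

Under this model a shorter front cavity gives a higher resonance, which is consistent with the general observation that fricatives articulated nearer the lips have their spectral energy concentrated at higher frequencies.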
ABSTRACT
We are developing a software system to support the study of Portuguese vowel production. The tool is an articulatory synthesizer with a graphical user interface. The synthesizer is composed of a sagittal articulatory model derived from the Mermelstein model and a frequency-domain simulation of the electrical analog of the acoustic tube. The user can easily define the nasal tract configuration. The system includes optimization by simulated annealing to perform acoustic-to-articulatory mapping. In this paper we present the system under development, its current state, and future perspectives. Preliminary experiments with Portuguese vowels gave good results.
ABSTRACT
The article presents a method of post-synchronization, that is, the matching, by means of formant-to-area mapping, of an area function model to a measured area function. The objective of post-synchronization is to compute a model that is as near as possible to a measured area function and whose eigenfrequencies are identical to the corresponding measured formant frequencies. Different types of acoustic models and constraints are examined. Results show that the best map is obtained with a lossless acoustic model, corrected for lip radiation and wall vibration losses, combined with minimization of the Euclidean distance between the geometrically fitted and the formant-to-area mapped area function models. The differences between measured and mapped area functions are gauged by means of the dynamic length warping distance.
ABSTRACT
A method is proposed to generate a codebook with distributed formant targets and a unique geometric-acoustic mapping from formants to vocal-tract shape by direct acoustic calculation. Geometric and acoustic constraints are applied to both the vocal-tract model parameters and the calculated acoustic features to eliminate unacceptable values from the initial codebook, which is usually extremely large. The vocal-tract length is used as an additional parameter to model the vocal tract, and a restriction on the vocal-tract length based on measured data is employed. A geometric and acoustic optimization scheme is devised to cluster the constrained codebook into a uniquely mapped codebook of reduced size. The codebook generated by this method is precise and robust and provides a satisfactory solution to the inverse speech production problem.