ABSTRACT
We are working on acoustic-to-articulatory inversion using Maeda's model. The purpose of this work is to adapt the model to a new speaker. The adaptation quality is assessed by verifying that vowels uttered by the speaker lie inside the vocalic space defined by the model. With this aim in view, we acquired a series of MR images for eleven oral French vowels (/i, e, ɛ, y, ø, œ, a, ɑ, ɔ, o, u/). The adaptation may include modifications of: the scale factors for the pharynx and the mouth cavity, the wall of the vocal tract, and the coefficients for the calculation of the area function from a sagittal shape. The scale factors were determined by superimposing Maeda's model on the MR images. The wall was obtained by calculating the mean value of the exterior contours of the vocal tract in the image series. As some discrepancies between natural and synthetic vowels remained, the wall contour was iteratively optimized by means of formant sensitivity functions calculated for each section of the vocal tract. The inversion is carried out by means of a table-lookup procedure constrained by the smoothness of the articulatory trajectories.
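To make the final step concrete, here is a minimal sketch of a table-lookup inversion with a smoothness constraint, written as generic Python: a precomputed codebook pairs articulatory parameter vectors with formant vectors, each acoustic frame keeps its k acoustically closest codebook entries, and a Viterbi-style search picks the articulatory path with the smallest combined acoustic error and frame-to-frame jump. All names and the cost weighting are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def invert(formants, cb_params, cb_formants, k=20, w=1.0):
        """Table-lookup inversion with a smoothness constraint.
        formants: (T, F) array of measured formant frames.
        cb_params/cb_formants: (N, P) and (N, F) codebook arrays, assumed
        precomputed from the speaker-adapted model. Keeps the k acoustically
        closest entries per frame, then a Viterbi search minimizes acoustic
        error plus w times the articulatory jump between frames."""
        T = len(formants)
        cand = [np.argsort(np.linalg.norm(cb_formants - f, axis=1))[:k]
                for f in formants]
        ac = [np.linalg.norm(cb_formants[c] - f, axis=1)
              for c, f in zip(cand, formants)]
        cost, back = [ac[0]], []
        for t in range(1, T):
            jump = np.linalg.norm(cb_params[cand[t]][:, None, :] -
                                  cb_params[cand[t - 1]][None, :, :], axis=2)
            total = cost[-1][None, :] + w * jump   # (k_now, k_prev)
            back.append(np.argmin(total, axis=1))
            cost.append(ac[t] + np.min(total, axis=1))
        path = [int(np.argmin(cost[-1]))]          # backtrack the best path
        for b in reversed(back):
            path.append(int(b[path[-1]]))
        path.reverse()
        return np.array([cb_params[cand[t][j]] for t, j in enumerate(path)])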
ABSTRACT
A 2D biomechanical model of the tongue is used to simulate movement sequences and speech signals in vowel-to-vowel transitions. The analysis focuses on how central commands and biomechanics can interact and influence the physical speech signals. In particular, it is shown how complex velocity profiles can be explained by the biomechanics, and how the low-pass filtering effect of the biomechanics can give an account of the vocalic reduction phenomenon that is observed during speech production.
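The low-pass account of vocalic reduction can be illustrated with a toy second-order plant: when piecewise-constant target commands alternate faster than the plant's time constant, the trajectory undershoots the targets. The sketch below uses an arbitrary critically damped mass-spring system, not the parameters of the biomechanical tongue model.

    import numpy as np

    def track(targets, hold, dt=0.001, tau=0.05, zeta=1.0):
        """Critically damped second-order plant driven by piecewise-constant
        targets held for `hold` seconds each. Short holds (fast speech) leave
        no time to settle, so the articulator undershoots the targets."""
        wn = 1.0 / tau
        x, v, out = targets[0], 0.0, []
        for tgt in targets:
            for _ in range(int(hold / dt)):
                a = wn ** 2 * (tgt - x) - 2.0 * zeta * wn * v
                v += a * dt
                x += v * dt
                out.append(x)
        return np.array(out)

    # an /a-i-a/-like alternation at a slow and a fast command rate
    slow = track([0.0, 1.0, 0.0], hold=0.25)
    fast = track([0.0, 1.0, 0.0], hold=0.06)
    print("peak, slow rate: %.2f" % slow.max())   # close to the target 1.0
    print("peak, fast rate: %.2f" % fast.max())   # clearly undershoots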
ABSTRACT
A model of the sagittal-plane motion of the tongue, jaw, hyoid bone and larynx is presented, based on the λ version of the equilibrium point hypothesis. The focus is on the organization of control signals underlying vocal tract motions. A number of muscle synergies or `basic motions' of the system are identified. It is shown that systematic sources of variation in an X-ray database of midsagittal vocal tract motions can be accounted for with six independent commands, each corresponding to a direction of articulator motion. It is further shown that hyoid position and orientation can be predicted from the application of other vocal tract commands and need not be explicitly controlled. The dynamics of individual commands are also assessed. It is shown that dynamic effects are not negligible in speech-like movements because of the different dynamic behavior of soft and bony structures.
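A generic way to check that a small number of independent commands explains most of the variation in such data is a linear factor decomposition of the articulatory frames. The following PCA-style sketch on synthetic data illustrates only that bookkeeping; the authors' commands are defined within the equilibrium-point model, not extracted by SVD.

    import numpy as np

    rng = np.random.default_rng(0)
    # synthetic stand-in for the X-ray data: 500 midsagittal frames of 40
    # coordinates generated from 6 latent commands plus measurement noise
    latent = rng.standard_normal((500, 6))
    frames = latent @ rng.standard_normal((6, 40)) \
             + 0.05 * rng.standard_normal((500, 40))

    centered = frames - frames.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2 / np.sum(s ** 2)
    print("variance explained by 6 components: %.1f%%" % (100 * var[:6].sum()))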
ABSTRACT
Magnetic Resonance Imaging (MRI) has been used to measure the shape of the vocal tract during speech in several recent studies. Its safety to the subject, high-quality imaging of soft tissue, and the ability to select relatively thin imaging planes at any angle are significant advantages over other imaging methods used for speech research. The most significant disadvantage is the long exposure time. As a result, most studies have focused on obtaining high-resolution images of the vocal tract volume for static sounds, such as vowels [1], fricatives [5, 6], nasals, the closed phase of plosives [7] and liquids [3, 7]. In this paper we describe our method of obtaining MR images of a moving vocal tract, in which we post-synchronise the MR data using a recorded speech signal and thus reconstruct the images without using the MR machine's built-in processing.
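The post-synchronization idea can be sketched as follows: the utterance is repeated many times, every raw MR acquisition carries a timestamp, the recorded audio gives the onset of each repetition, and acquisitions are binned by their phase within the utterance so that each bin reconstructs one frame of the motion sequence. The code below is a schematic of that binning step only; the names, onset detection, and image reconstruction are placeholders, not the authors' pipeline.

    import numpy as np

    def bin_acquisitions(acq_times, onsets, utt_dur, n_frames):
        """Assign each time-stamped MR acquisition to a phase bin within the
        repeated utterance; `onsets` are repetition start times detected in
        the recorded audio. Each bin is later reconstructed into one image."""
        bins = [[] for _ in range(n_frames)]
        onsets = np.asarray(onsets)
        for i, t in enumerate(acq_times):
            k = np.searchsorted(onsets, t, side="right") - 1
            if k < 0:
                continue
            phase = (t - onsets[k]) / utt_dur     # position within utterance
            if 0.0 <= phase < 1.0:                # discard inter-trial gaps
                bins[int(phase * n_frames)].append(i)
        return bins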
ABSTRACT
In this paper, vocal tract and orofacial motions are measured during speech production in order to demonstrate that vocal tract motion can be used to estimate its orofacial counterpart. The inversion, i.e. estimating vocal tract behavior from orofacial motion, is also possible, but to a lesser extent. The numerical results showed that vocal tract motion accounted for 96% of the total variance observed in the joint system, whereas orofacial motion accounted for 77%. This analysis is part of a wider study in which a dynamical model is being developed to express vocal tract and orofacial motions as a function of muscle activity. This model, currently implemented through multilinear second-order autoregressive techniques, is described briefly. Finally, the strong direct influence that vocal tract and facial motions have on the energy of the speech acoustics is exemplified.
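The variance figures correspond, in spirit, to fitting a linear estimator between the two motion channels and measuring how much of one channel the other accounts for. Below is a minimal sketch on synthetic data with hypothetical dimensions and names; the authors' model additionally includes second-order autoregressive dynamics driven by muscle activity.

    import numpy as np

    rng = np.random.default_rng(1)
    T = 2000
    tract = rng.standard_normal((T, 12))         # vocal tract coordinates
    face = tract @ rng.standard_normal((12, 9)) \
           + 0.5 * rng.standard_normal((T, 9))   # partly driven facial data

    # least-squares linear map tract -> face and variance accounted for
    W, *_ = np.linalg.lstsq(tract, face, rcond=None)
    resid = face - tract @ W
    vaf = 1.0 - (resid ** 2).sum() / ((face - face.mean(0)) ** 2).sum()
    print("facial variance accounted for by the tract: %.1f%%" % (100 * vaf))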
ABSTRACT
A global inversion procedure from the acoustic signal to motor commands is presented here, based on a postural target invariance hypothesis. Using a model of vowel production, dynamic motor commands were inferred for a vowel sequence pronounced under different levels of emphatic stress and at different speech rates. The results make it possible to assign a prosodic role to the dynamic parameters of the model and thus to discriminate between slow vs. fast or stressed vs. unstressed utterances. The reliability of the results was assessed by computing the sensitivity of the model around the inferred motor commands and by running perceptual tests on the synthetic stimuli generated from these values.
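The sensitivity computation can be sketched as a finite-difference probe of the production model around the inferred commands. In the sketch below, `model` is a generic stand-in for the vowel-production model, which the abstract does not specify.

    import numpy as np

    def sensitivity(model, commands, eps=1e-3):
        """Finite-difference sensitivity of the model's acoustic output to a
        perturbation of each motor command, at the inferred command values."""
        base = np.asarray(model(commands))
        sens = np.empty(len(commands))
        for i in range(len(commands)):
            c = np.array(commands, dtype=float)
            c[i] += eps
            sens[i] = np.linalg.norm(np.asarray(model(c)) - base) / eps
        return sens   # large values flag commands the output is fragile to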