Time as a Factor in the Acoustic Variation of Schwa
Authors: William J. Barry, University of the Saarland (Germany)
Page (NA) Paper number 554
Abstract: Schwa is commonly regarded phonologically as an unspecified vowel, and phonetically as "targetless", i.e., as the product of its context. Methods used in studies investigating the issue are discussed, and it is argued that the evidence for targetlessness is unconvincing. An experiment is presented in which schwa is produced in symmetrical vowel and consonant contexts under varying speech rate conditions. It is shown that the contextual influence on schwa depends on its duration, a result that is not compatible with the concept of a targetless vowel. This conclusion is discussed in relation to the possible phonological status of the target specification.
0969_01.PDF (was: 0969.jpg) | The file 0969.jpg contains a view of the poster presented at the
conference, and referred to in the paper. 24 vowel schemes are
displayed chronologically with relations among them indicated.
A 3-D bar diagram of the 'distances' calculated between them
is also included. Updated versions of this file will be made
available at http://www.suntiger.ee.up.ac.za/hendrik/icslp5
File type: Image | Format: JPEG | Tech. description: None | Creating Application: Corel Draw 8 | Creating OS: MS Windows 98
Véronique Lecuit, Université Libre de Bruxelles (Belgium)
Didier Demolin, Université Libre de Bruxelles (Belgium)
This paper presents a study of the relationship between subglottal pressure (SGP) and the intensity of the speech signal for sustained French oral vowels with controlled pitch. The corpus is based on a male and a female subject; the SGP is measured directly by tracheal puncture. The results show that the relationship between intensity and subglottal pressure varies depending on whether one considers the vowel or the pitch parameter. This experiment is useful in the framework of speech production model design. These preliminary results emphasize the need to investigate thoroughly the relationship between all the parameters involved, and to do so on abundant and accurate experimental data.
Alain Soquet, Laboratoire de Phonétique Expérimentale, Université Libre de Bruxelles (Belgium)
Véronique Lecuit, Laboratoire de Phonétique Expérimentale, Université Libre de Bruxelles (Belgium)
Thierry Metens, Unité O.R.L. Hôpital Erasme (Belgium)
Bruno Nazarian, Laboratoire de Phonétique Expérimentale, Université Libre de Bruxelles (Belgium)
Didier Demolin, Laboratoire de Phonologie Expérimentale, Université Libre de Bruxelles (Belgium)
Magnetic Resonance Imaging techniques are uniquely attractive in their ability to provide an extensive body of information on the vocal tract geometry. Once the images are acquired, they must be further processed in order to segment the airway from the surrounding tissues, so as to locate the air passage. This problem has been addressed in several ways in the literature. In this paper, we carry out a comparative study of different approaches applied to the same body of data in order to assess their accuracy. It is shown that the different methods present small average errors but large error distributions.
Sorin Dusan, University of Waterloo, E&CE, Waterloo, Ontario N2L 3G1 (Canada)
Li Deng, University of Waterloo, E&CE, Waterloo, Ontario N2L 3G1 (Canada)
Recovering vocal tract shapes from the speech signal is a well-known inversion problem: the transformation from the articulatory system to speech acoustics must be reversed. Most past studies of this problem have focused on vowels, and no general method has proved effective for recovering vocal tract shapes for all classes of speech sounds. In this paper we describe our attempt at speech inverse mapping using mel-frequency cepstrum coefficients to represent the acoustic parameters of the speech signal. An inversion method is developed based on Kalman filtering and a dynamic-system model describing articulatory motion. This method uses an articulatory-acoustic codebook derived from Maeda's articulatory model.
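The core of the Kalman-filtering step described above can be sketched in a much-reduced form: a scalar random-walk state (one articulatory parameter) smoothed from noisy observations, standing in for codebook-derived estimates. The function name and the noise variances `q` and `r` are illustrative assumptions, not the authors' implementation, which operates on the full multi-parameter state of Maeda's model.

```python
def kalman_smooth_1d(obs, q=0.01, r=0.25):
    """Scalar Kalman filter: random-walk state, noisy observations.

    A toy stand-in for the paper's dynamic-system model: the hidden
    state is one articulatory parameter, and each observation is a
    noisy estimate of it (e.g. looked up from an articulatory-acoustic
    codebook). q is the process-noise variance, r the observation-noise
    variance.
    """
    x, p = obs[0], 1.0  # initial state estimate and its variance
    out = [x]
    for z in obs[1:]:
        p = p + q                # predict (random-walk dynamics)
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # correct with the observation
        p = (1 - k) * p
        out.append(x)
    return out
```

With a small `q` relative to `r`, the filter trusts the dynamics more than the observations, so jittery codebook estimates are smoothed into a plausible articulator trajectory.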
John H. Esling, University of Victoria (Canada)
Jocelyn Clayards, University of Victoria (Canada)
Jerold A. Edmondson, University of Texas at Arlington (USA)
Qiu Fuyuan, University of Texas at Arlington (USA)
Jimmy G. Harris, Seattle, Washington (USA)
One difficulty in the physical observation of articulatory structures in the pharynx is the reliability of measurements for comparative purposes. The objective of the present approach is to review and explore techniques of identifying and measuring degrees of adjustment of the pharyngeal articulatory mechanism in the production of auditorily controlled settings. Anatomical landmarks are identified on computer images transferred from laryngoscopic videotapes, and measurements are taken of dimensions defined by the configuration of the pharyngeal articulators. Comparisons are taken across linguistically contrastive consonants and vowels, where the activity of the laryngeal sphincter accompanies tongue retraction and larynx raising, and implications are drawn for the measurement of voice quality settings. Measurements of the "tense," raised-larynx series of Yi (Nosu) are illustrated.
Janice Fon, The Ohio State University (USA)
This study investigates the variance and invariance in speech rate as a reflection of conceptual planning and cognitive rhythm. A four-frame comic strip was used to elicit speech, and low-pass smoothing was done afterwards to filter out high-frequency noise. Results showed that subjects invariably planned narration in terms of story plots. Variance lies in the way subjects synchronize the planning and execution stages. Some tended to start the execution stage before the planning stage ended, while others were inclined to speak only after macroplanning was done. Lexical retrieval failure is one of the main causes of disruption to the story-plot-based temporal cycle.
0842_01.PDF (was: 0842.jpg) | In order to elicit speech, a four-frame comic strip with no
dialogue was chosen from Shuangxiangpao, a very famous comic series
in Taiwan [IMAGE 0842.JPG]. Subjects were seated in a sound-treated
room and were told to study the comic strip and retell the story
afterwards. Recordings were made individually with a SONY TCM-5000EV
recorder and a SONY ECM-G3M super-directional microphone.
Transcriptions were done afterwards in terms of intonation units (IUs)
following the discourse analysis tradition. File type: Image | Format: JPEG | Tech. description: Unknown | Creating Application: Unknown | Creating OS: Unknown
Masako Fujimoto, Department of Cognitive Sciences, Graduate School of Medicine, University of Tokyo (Japan)
Emi Murano, Department of Speech Physiology, Graduate School of Medicine, University of Tokyo (Japan)
Seiji Niimi, Department of Speech Physiology, Graduate School of Medicine, University of Tokyo (Japan)
Shigeru Kiritani, Department of Cognitive Sciences, Graduate School of Medicine, University of Tokyo (Japan)
The correspondence between the glottal opening gesture pattern and vowel devoicing in Japanese was examined using photoglottography (PGG), with special reference to the pattern of glottal gesture overlap and blending into the neighboring vowel. The results showed that during /CiC/ sequences most tokens demonstrated either a single glottal opening with a devoiced vowel or a double glottal opening with a voiced vowel, as generally expected. Some tokens, however, showed a double glottal opening with a devoiced vowel, or a single glottal opening with a partially voiced vowel. From the viewpoint of a gestural overlap analysis of vowel devoicing, an intermediate degree of gestural overlap may explain the cases in which the vowel was devoiced yet showed a double opening phase. Nevertheless, the presence of a partially voiced vowel with a single opening phase clearly shows the complexity of vowel devoicing in Japanese, since utterances with partially voiced vowels can exhibit two different patterns of glottal opening (single phase and double phase) in the PGG analysis.
Yukiko Fujisawa, Toyohashi University of Technology (Japan)
Nobuaki Minematsu, Toyohashi University of Technology (Japan)
Seiichi Nakagawa, Toyohashi University of Technology (Japan)
While English word accent is linguistically similar to Japanese word accent, the two differ acoustically. This suggests that Japanese learners tend to generate English word accent in a Japanese rather than an English manner. We propose two methods for automatically detecting the generated word accent and evaluating how it is generated. Using context-sensitive HMMs, stressed and unstressed syllables were modeled separately according to their structure and their position in a word. In the matching process, weighting factors were applied to several likelihood scores derived from different acoustic parameters. The optimal combination of the factors for detection can be thought to reflect each speaker's own manner of accent generation. Analysis of the optimal combination showed different tendencies between Japanese and native speakers, which largely accorded with findings in previous studies on English teaching.
Shunichi Ishihara, Japan Centre (Asian Studies), and Phonetics Laboratory, Department of Linguistics (Arts), The Australian National University (Australia)
Data collected from Japanese and English showed that both phonetically fully voiced and (partially) devoiced allophones of /d/ have very similar perturbatory effects on the F0 of the following vowel. It is considered, therefore, that the phonetic voicing of /d/ (periodicity during the closure) is not clearly correlated with lower levels of F0 on the following vowel. Although the F0 perturbation may be caused by some aspect of the production of the preceding stop that is not necessarily manifested in actual vocal cord vibration, this result leaves open the possibility that speakers deliberately control the F0 of the following vowel as an additional cue to the phonological difference between voiceless and voiced stop consonants.
Daniel Jurafsky, University of Colorado, Boulder (USA)
Alan Bell, University of Colorado, Boulder (USA)
Eric Fosler-Lussier, University of California at Berkeley (USA)
Cynthia Girand, University of Colorado, Boulder (USA)
William Raymond, University of Colorado, Boulder (USA)
The causes of pronunciation reduction in 8458 occurrences of ten frequent English function words were examined in a four-hour sample of conversations from the Switchboard corpus. Using ordinary linear and logistic regression models, we examined the length of the words, the form of their vowel (basic, full, or reduced), and, for words with a final obstruent, whether that obstruent was present or deleted. For all of these we found strong, independent effects of speaking rate, predictability, the form of the following word, and following disfluencies symptomatic of planning problems. The results bear on issues in speech recognition, models of speech production, and conversational analysis.
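The logistic-regression part of the analysis above can be illustrated with a minimal sketch: a binary reduction outcome (e.g. final-obstruent deletion) regressed on a single standardized predictor such as speaking rate. The data here are synthetic and the plain stochastic-gradient fit is an illustrative assumption, not the authors' statistical software or the Switchboard data.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression by per-sample gradient ascent on the
    log-likelihood. Returns [intercept, coef_1, ..., coef_d]."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(reduced)
            err = yi - p                    # gradient of log-likelihood
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

# Synthetic illustration: deletion probability rises with speaking rate.
random.seed(0)
X = [[random.gauss(0, 1)] for _ in range(400)]  # standardized rate
y = [1 if random.random() < 1 / (1 + math.exp(-2 * x[0])) else 0 for x in X]
w = fit_logistic(X, y)
```

A positive fitted coefficient on the rate predictor corresponds to the kind of "strong, independent effect of speaking rate" the abstract reports; real analyses would include all predictors jointly and use a standard statistics package.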
Hee-Sun Kim, Stanford University (USA)
This study reports a case of consonant-induced durational compensation within a higher phonological unit in Korean. The main finding is that longer duration in one consonant is partly compensated by shorter duration in even non-adjacent consonants. The experimental results revealed that the shortening in non-adjacent consonants is a subsidiary compensation process that maintains constant duration at a domain larger than the syllable. Based on these observations, an attempt at modeling speech timing is presented from the perspective that the speech production system is a process with multiple simultaneous tasks, whose optimal output is obtained by a compromise between their objectives.
Keisuke Mori, Information Processing Educational and Research Institute, Kyushu Kyoritu University (Japan)
Yorinobu Sonoda, Faculty of Engineering, Kumamoto University (Japan)
Quantitative knowledge of articulatory characteristics is necessary for understanding the dynamics of speech production. Accordingly, observations of the shape of the mouth are expected to provide useful data for the study of articulatory behavior in speech production. This paper describes characteristic changes in the shape of the mouth on the basis of processed image data taken by a high-speed video recorder, and reports recognition tests that jointly use the articulatory behavior of the lips and the sound pattern during speech. The speech materials were nonsense words of the form /eCVCe/ (V: a, i, u, e, o; C: p, b, m). The subjects were four adult males, all native speakers of Japanese. In consonant recognition, the consonant was more closely related to the lip-shape pattern than to the formant pattern. These results show the effect of the consonants (/p/, /b/, and /m/) on the middle vowel of the utterance.
Kunitoshi Motoki, Hokkai-Gakuen University (Japan)
Hiroki Matsuzaki, Hokkai-Gakuen University (Japan)
For the representation of acoustic characteristics of three-dimensional vocal-tract shapes, it is necessary to consider the effects of higher-order modes. This paper proposes an acoustic model of the vocal-tract which incorporates the coupling of the higher-order modes, including both propagative and evanescent modes. A cascaded structure of acoustic tubes connected asymmetrically is introduced as a physical approximation of the vocal-tract. The acoustic characteristics, which are dependent not only on the vocal-tract area function but also on the vocal-tract configuration, can be investigated by the proposed model. Preliminary results of numerical computations for relatively simple configurations suggest that additional resonances at frequencies above 4.3 kHz are formed by the propagative higher-order modes, while those at frequencies below 3 kHz are influenced by the evanescent higher-order modes. These results are also confirmed by the FEM simulations.
Takuya Niikawa, Osaka Electro-Communication University (Japan)
Masafumi Matsumura, Osaka Electro-Communication University (Japan)
Takashi Tachimura, Osaka University (Japan)
Takeshi Wada, Osaka University (Japan)
This paper deals with the estimation of aspirated airflow in a three-dimensional vocal tract during fricative consonant phonation using the Finite Element Method (FEM). The shape of the 3-D vocal tract during phonation of the fricative consonant /s/ is reconstructed from 32 coronal Magnetic Resonance (MR) images. MR images of the dental crown were obtained using a dental crown plate that contains a small amount of water. A 3-D FEM vocal tract model is formed with 28,686 elements and 7,010 nodes, and a rigid wall constitutes the vocal tract wall. Results showed that the flow rate was high in the narrow space between the upper central incisors and the tongue surface. An electric equivalent circuit for fricative consonant phonation was designed in consideration of the location of the noise source.
Takesi Okadome, NTT Basic Research Laboratories (Japan)
Tokihiko Kaburagi, NTT Basic Research Laboratories (Japan)
Masaaki Honda, NTT Basic Research Laboratories (Japan)
The method proposed here produces trajectories of articulatory movements based on a kinematic triphone model and the minimum-jerk model. The kinematic triphone model, constructed from articulatory data obtained with a magnetic sensor system, is characterized by three kinematic features for a triphone and the intervals between successive phonemes in the triphone. After extracting a kinematic feature for each phoneme in a given sentence, the method computes, for each point on the articulator, the minimum-jerk trajectory, i.e., the trajectory that minimizes the time integral of the squared magnitude of the point's jerk; this requires only linear computation. The method predicts both the qualitative features and the quantitative details observed experimentally.
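For the special case of a point-to-point movement that starts and ends at rest, minimizing the time integral of squared jerk has a well-known closed-form solution (the quintic of Flash and Hogan, 1985), which gives a feel for the trajectories involved. This sketch is only that special case under stated boundary conditions, not the paper's triphone-constrained formulation; the function name and sampling are illustrative.

```python
def minimum_jerk(x0, xf, n=101):
    """Minimum-jerk trajectory from x0 to xf in normalized time.

    For boundary conditions of zero velocity and zero acceleration at
    both ends, the jerk-minimizing path is the quintic polynomial
    x(tau) = x0 + (xf - x0) * (10 tau^3 - 15 tau^4 + 6 tau^5),
    sampled here at n points for tau in [0, 1].
    """
    xs = []
    for i in range(n):
        tau = i / (n - 1)  # normalized time
        s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5  # smooth S-curve
        xs.append(x0 + (xf - x0) * s)
    return xs
```

The resulting S-shaped, bell-velocity profile is the smoothness criterion the trajectory-formation method builds on, with the triphone features supplying the via-point constraints in the full model.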
Chilin Shih, Bell Labs - Lucent Technologies (USA)
Bernd Möbius, Bell Labs - Lucent Technologies (USA)
We present a study of the voicing profiles of consonants in Mandarin Chinese and German. The voicing profile is defined as the frame-by-frame voicing status of a speech sound in continuous speech. We are particularly interested in discrepancies between the phonological voicing status of a speech sound and its actual phonetic realization in connected speech. We further examine the contextual factors that cause voicing variations and test the cross-language validity of these factors. The result can be used to improve speech synthesis, and to refine phone models to enhance the performance of automatic speech segmentation and recognition.
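A voicing profile as defined above — the frame-by-frame voicing status of a sound — can be aggregated across tokens of different durations by time-normalizing each token. The sketch below does this with nearest-frame resampling; the function name, bin count, and resampling scheme are illustrative assumptions, not the paper's procedure.

```python
def voicing_profile(tokens, bins=10):
    """Average frame-by-frame voicing across tokens of one phone.

    Each token is a list of per-frame booleans (True = voiced frame,
    e.g. from a pitch tracker). Tokens are time-normalized to a fixed
    number of bins so profiles of different durations can be averaged;
    each bin then holds the proportion of tokens voiced at that
    normalized time point.
    """
    profile = [0.0] * bins
    for frames in tokens:
        n = len(frames)
        for b in range(bins):
            # nearest-frame resampling onto the normalized time axis
            i = min(n - 1, int(b * n / bins))
            profile[b] += 1.0 if frames[i] else 0.0
    return [p / len(tokens) for p in profile]
```

A phonologically voiced stop that devoices late in closure would show a profile sloping from 1.0 toward 0.0, which is the kind of phonology/phonetics discrepancy the study quantifies.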
Andrew J. Lundberg, Johns Hopkins University, Department of Computer Science (USA)
Maureen Stone, University of Maryland Medical School, Division of Otolaryngology (USA)
This paper presents a method for reconstructing 3D tongue surfaces during speech from 6 cross-sectional contours. The method reduces the dimensionality of the tongue surface and maintains highly accurate reproduction of local deformation features. This modification is an essential step if multi-plane tongue movements are to be reconstructed practically into tongue surface movements. Six cross-sectional contours were used to reconstruct 3D tongue surfaces and these were compared to reconstructions from 60 contours. The best set of 6 cross-sectional contours was determined from an optimized set of 6 midsagittal points. These points had been optimized to predict the midsagittal contour. Errors and reconstruction coverage for the midsagittal optimization were comparable to those resulting from an optimization over the entire surface, indicating this was an adequate method for calculating a sparse data set for use in reconstructing 3D tongue surface behavior.
Yasushi Terao, Tokoha Gakuen College (Japan)
Tadao Murata, Kyusyu Institute of Technology (Japan)
In the present study, we discuss how the articulability of two consecutive morae plays an important role in the stage at which sound exchange errors occur. Our assumption is based on the analysis of Japanese sound exchange error data collected from the spontaneous speech of adults and infants. We hypothesized that /dara/ in /kadara/ (error form) is more articulable than /rada/ in /karada/ (correct form). Three experiments were carried out to confirm this hypothesis. Phonological/phonetic characteristics of the unit were shown through the results of the experiments and some related observations.
Anne Vilain, ICP (France)
Christian Abry, ICP (France)
Pierre Badin, ICP (France)
A new articulatory model, GENTIANE, elaborated from an X-ray film of a corpus of VCV sequences performed by a skilled French speaker, enabled us to analyse the coarticulation of the main consonant types in vowel contexts from a degrees-of-freedom approach. The data displayed overall coarticulatory versatility, except for an absolute invariance of the labio-dental constriction point. For consonant types recruiting the tongue, the variance explained by the degrees of freedom of the model evidenced specific compensation strategies: tongue tip compensation betokened the common coronal status of the dental plosive and the post-alveolar fricative, whereas tongue dorsum compensation signalled the dorsal nature of the velar plosive.
Masahiko Wakumoto, ATR HIP & 1st Dept. of OMFS Showa Univ. (Japan)
Shinobu Masaki, ATR HIP (Japan)
Kiyoshi Honda, ATR HIP (Japan)
Toshikazu Ohue, R&D Dept. Nitta Co. Ltd. (Japan)
This paper describes a new method for measuring the tongue-palatal contact pressure using a thin pressure sensor and its application for speech research. The new pressure sensor is composed of thin pressure sensitive ink whose electrical resistance is proportional to the physical forces applied to the sensor. Several sensors were arranged on the surface of the palatal plate. This setup was used to measure the tongue pressure toward the hard palate during closure for Japanese stop consonants [t] and [d]. Results obtained from 10 Japanese subjects showed the tongue-palatal contact pressure for [t] to be stronger than that for [d]. In addition, the sensors placed on the non-contact area showed no pressure change, indicating negligible effects of intra-oral air pressure during consonantal closure.
Sandra P. Whiteside, University of Sheffield (U.K.)
Rosemary A. Varley, University of Sheffield (U.K.)
Contemporary psycholinguistic models suggest that there may be two possible routes in phonetic encoding: a 'direct' route which uses stored syllabic units, and an 'indirect' route which relies on the on-line assembly of sub-syllabic units. The computationally more efficient direct route is likely to be used for high frequency words, whereas the indirect route is most likely to be used for novel or low frequency words. This paper presents some acoustic evidence that suggests that there may be dual routes operating in phonetic encoding. The data reported suggest that a group of normal speakers may be employing different routes in the phonetic encoding of high and low frequency words elicited via a repetition task. The evidence is presented and discussed within the framework of a dual-route hypothesis, and in light of other acoustic evidence reported in the literature.
Brigitte Zellner, IMM, Lettres, UNIL, 1015 Lausanne (Switzerland)
Many phonetic studies have shown that changes in speech rate have numerous effects at various levels of the temporal structure. This observation is reinforced by a verification with speech synthesis: changing the number of syllables per second is not a satisfactory way of creating natural-sounding fast or slow synthetic speech. A systematic comparison of sentences read at two speech rates by a highly fluent French speaker allows a ranking of the various mechanisms used to slow down speech. Pausing and producing additional syllables transform the phonological structure of utterances, since they impede interlexical binding. It is claimed that knowing the degree of this interlexical binding allows a better characterisation of speech rate changes, and thus a better generation of synthetic rhythm. Finally, the expected relationship between the lengthening of speech units and pausing was not confirmed in our results. This suggests that the theory on slowing down speech needs revision.