Authors:
Peter Blamey, The University of Melbourne (Australia)
Julia Sarant, Bionic Ear Institute (Australia)
Tanya Serry, Bionic Ear Institute (Australia)
Roger Wales, The University of Melbourne (Australia)
Christopher James, The University of Melbourne (Australia)
Johanna Barry, The University of Melbourne (Australia)
Graeme M. Clark, The University of Melbourne (Australia)
M. Wright, Children's Cochlear Implant Centre (Australia)
R. Tooher, Children's Cochlear Implant Centre (Australia)
C. Psarros, Children's Cochlear Implant Centre (Australia)
G. Godwin, Children's Cochlear Implant Centre (Australia)
M. Rennie, Children's Cochlear Implant Centre (Australia)
T. Meskin, Children's Cochlear Implant Centre (Australia)
Page (NA) Paper number 248
Abstract:
Fifty-seven children with impaired hearing aged 4-12 years were evaluated
with speech perception and language measures as the first stage of
a longitudinal study. The Clinical Evaluation of Language Fundamentals
(CELF) and Peabody Picture Vocabulary Test (PPVT) were used to evaluate
the children's spoken language. Regression analyses indicated that
scores on both tests were significantly correlated with chronological
age, but delayed relative to children with normal hearing. Performance
increased at 45% of the rate expected for children with normal hearing
for the CELF, and 62% for the PPVT. Perception scores were not significantly
correlated with chronological age, but were highly correlated with
results on the PPVT and CELF. The data suggest a complex relationship
whereby hearing impairment reduces speech perception, which slows language
development, which has a further adverse effect on speech perception.
Authors:
Catia Cucchiarini, A2RT, University of Nijmegen (The Netherlands)
Helmer Strik, A2RT, University of Nijmegen (The Netherlands)
Louis Boves, A2RT, University of Nijmegen (The Netherlands)
Page (NA) Paper number 752
Abstract:
This paper describes an experiment aimed at determining whether native
and non-native speakers of Dutch significantly differ on a number of
quantitative measures related to fluency and whether these measures
can be successfully employed to predict fluency scores. Read speech
of 20 native and 60 non-native speakers of Dutch was scored for fluency
by nine experts and was then analyzed by means of an automatic speech
recognizer in order to calculate nine quantitative measures of speech
quality that are known to be related to perceived fluency. The results
show that the natives' scores on the fluency ratings and on the quantitative
measures significantly differ from those of the non-natives, with the
native speakers being considered more fluent. Furthermore, it appears
that quantitative variables such as rate of speech, phonation-time
ratio, number of pauses, and mean length of runs are able to predict
fluency scores with a high degree of accuracy.
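The four predictor variables named above can be computed directly from a pause-annotated segmentation of the kind an automatic speech recognizer produces. The sketch below shows one plausible formulation; the function name, units, and exact definitions are illustrative assumptions, not the authors' specifications.

```python
# One plausible formulation of the quantitative fluency measures named
# above, computed from a pause-annotated segmentation of an utterance.
# Definitions and units are illustrative assumptions, not the authors'.

def fluency_measures(segments, n_syllables):
    """segments: list of (kind, duration_in_seconds), kind in {'speech', 'pause'}."""
    speech_time = sum(d for k, d in segments if k == "speech")
    pause_time = sum(d for k, d in segments if k == "pause")
    total_time = speech_time + pause_time
    runs = [d for k, d in segments if k == "speech"]  # uninterrupted speech stretches
    return {
        "rate_of_speech": n_syllables / total_time,        # syllables per second
        "phonation_time_ratio": speech_time / total_time,  # share of time spent speaking
        "number_of_pauses": sum(1 for k, _ in segments if k == "pause"),
        "mean_length_of_runs": sum(runs) / len(runs),      # seconds per run
    }
```

Under this formulation, a highly fluent reading yields a high phonation-time ratio and long mean runs, while frequent pausing lowers both.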
Authors:
Paul Dalsgaard, Center for PersonKommunikation (Denmark)
Ove Andersen, Center for PersonKommunikation (Denmark)
William J. Barry, Universität des Saarlandes (Germany)
Page (NA) Paper number 482
Abstract:
The focus of this paper is to formulate an approach to merging phonemes
across languages and to evaluate the resulting cross-language merged
speech units on the basis of the traditional acoustic-phonetic descriptions
of the phonemes. The methodology is based on the belief that some phonemes
across a set of languages may be similar enough to be equated, in contrast
to traditional phonology, which treats the phonemes of one language independently
of the phonemes of another. The identification of cross-language
speech units is performed by an iterative data-driven procedure, which
merges acoustically similar phonemes from within one language as well
as across languages. The paper interprets a number of merged speech
units on the basis of articulatory descriptions.
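An iterative, bottom-up merging of the kind described can be sketched as agglomerative clustering of phone models. In the toy version below, each model is reduced to a mean feature vector and the distance is Euclidean; both are simplifying assumptions standing in for the authors' acoustic models and merging criterion.

```python
# A stand-in for the iterative data-driven merging procedure: repeatedly
# merge the closest pair of phone models (represented here by mean feature
# vectors) until no pair is closer than a distance threshold. Model
# representation and distance measure are simplifying assumptions.

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def merge_phonemes(models, threshold):
    """models: {'language:phoneme': mean feature vector}."""
    merged = {(name,): vec for name, vec in models.items()}
    while len(merged) > 1:
        p, q = min(((p, q) for p in merged for q in merged if p < q),
                   key=lambda pq: dist(merged[pq[0]], merged[pq[1]]))
        if dist(merged[p], merged[q]) > threshold:
            break
        # Average the two models, weighted by how many units each has absorbed.
        vec = [(a * len(p) + b * len(q)) / (len(p) + len(q))
               for a, b in zip(merged[p], merged[q])]
        merged[p + q] = vec
        del merged[p], merged[q]
    return merged
```

Each surviving key is a cross-language speech unit: a tuple of the original phonemes, possibly from different languages, that were judged acoustically similar enough to be equated.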
Authors:
Robert Eklund, Telia Research AB (Sweden)
Elizabeth Shriberg, SRI International (USA)
Page (NA) Paper number 805
Abstract:
We report results from a cross-language study of disfluencies (DFs)
in Swedish and American English human-machine and human-human dialogs.
The focus is on comparisons not directly affected by differences in
overall rates since these could be associated with task details. Rather,
we focus on differences suggestive of how speakers utilize DFs in the
different languages, including: relative rates of the use of hesitation
forms, the location of hesitations, and surface characteristics of
DFs. Results suggest that although the languages differ in some respects
(such as the ability to insert filled pauses within `words'), in many
analyses the languages show similar behavior. Such results provide
suggestions for cross-linguistic DF modeling in both theoretical and
applied fields.
Authors:
Horacio Franco, SRI International (USA)
Leonardo Neumeyer, SRI International (USA)
Page (NA) Paper number 764
Abstract:
Our proposed paradigm for automatic assessment of pronunciation quality
uses hidden Markov models (HMMs) to generate phonetic segmentations
of the student's speech. From these segmentations, we use the HMMs
to obtain spectral match and duration scores. In this work we focus
on the problem of mapping different machine scores to obtain an accurate
prediction of the grades that a human expert would assign to the pronunciation.
We discuss the application of different approaches based on minimum
mean square error (MMSE) estimation and Bayesian classification. We
investigate the characteristics of the different mappings as well as
the effects of the prior distribution of grades in the calibration
database. We finally suggest a simple method to extrapolate mappings
from one language to another.
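The linear case of such a score-to-grade mapping can be written in a few lines. The sketch below fits the straight line that minimizes mean squared error between one machine score and the human grades on calibration data; the paper compares several richer mappings, including Bayesian classification, so this illustrates only the simplest variant.

```python
# A minimal sketch of the linear case: the mapping g_hat = a*x + b from a
# single machine score x to a human grade, chosen to minimize the mean
# squared error over the calibration data. (Only linear MMSE is shown;
# the nonlinear and Bayesian mappings discussed in the paper differ.)

def linear_mmse(scores, grades):
    n = len(scores)
    mx = sum(scores) / n
    mg = sum(grades) / n
    cov = sum((x - mx) * (g - mg) for x, g in zip(scores, grades)) / n
    var = sum((x - mx) ** 2 for x in scores) / n
    a = cov / var          # slope
    b = mg - a * mx        # intercept
    return a, b
```

Note that the intercept depends directly on the mean grade of the calibration set, which is one way the prior distribution of grades in the calibration database affects the resulting mapping.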
Authors:
Petra Geutner, Universitaet Karlsruhe (Germany)
Michael Finke, Carnegie Mellon University (USA)
Alex Waibel, Carnegie Mellon University (USA)
Page (NA) Paper number 771
Abstract:
High OOV-rates are one of the most prevalent problems for languages
with a rapid vocabulary growth, e.g. when transcribing Serbo-Croatian
and German broadcast news. Hypothesis-Driven-Lexical-Adaptation (HDLA)
has been shown to decrease high OOV-rates significantly by using morphology-based
linguistic knowledge. This paper introduces another approach to dynamically
adapt a recognition lexicon to the utterance to be recognized. Instead
of morphological knowledge about word stems and inflection endings,
distance measures based on Levenshtein distance are used. Results based
on phoneme and grapheme distances will be presented. Our distance-based
approach requires no expert knowledge about a specific language and
no definition of complex grammar rules. Instead, grapheme sequences
or the phoneme representation of words are sufficient to apply our
HDLA-algorithm easily to any new language. With our proposed technique
OOV-rates were decreased by more than half from 8.7% to 4%, thereby
also improving recognition performance by an absolute 4.1% from 29.5%
to 25.4% word error rate.
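The distance measure on which this adaptation relies is standard edit distance over symbol sequences. The sketch below is a generic implementation, not the authors' code; it applies equally to grapheme strings and phoneme lists.

```python
# Levenshtein (edit) distance between two symbol sequences: the minimum
# number of insertions, deletions, and substitutions needed to turn one
# into the other, computed with the usual dynamic-programming recurrence.

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, 1):
        cur = [i]
        for j, sb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (sa != sb)))    # substitution
        prev = cur
    return prev[-1]
```

For example, two inflected German forms of the same stem, "spielen" and "spielte", are at grapheme distance 2, so either can be pulled into an adapted lexicon as a close neighbor of the other without any morphological analysis.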
Authors:
Chul-Ho Jo, Kyoto University (Japan)
Tatsuya Kawahara, Kyoto University (Japan)
Shuji Doshita, Kyoto University (Japan)
Masatake Dantsuji, Kyoto University (Japan)
Page (NA) Paper number 741
Abstract:
We propose an effective application of speech recognition to foreign
language pronunciation learning. The objective of our system is to
detect pronunciation errors and provide diagnostic feedback through
speech processing and recognition methods. Automatic pronunciation
error detection is applied to two kinds of mispronunciation: mistakes
and linguistic inheritance from the learner's first language. The correlation
between automatic detection and human judgement demonstrates its reliability.
For feedback guidance on an erroneous phone, we set up classifiers for
the well-recognized articulatory features, place of articulation and
manner of articulation, in order to identify the cause of incorrect
articulation and provide guidance on how to correct the mispronunciation.
Authors:
Roger Ho-Yin Leung, Chinese University of Hong Kong (Hong Kong)
Hong C. Leung, Chinese University of Hong Kong (Hong Kong)
Page (NA) Paper number 229
Abstract:
In this paper, the lexical characteristics of two Chinese dialects
and American English are explored. Different lexical representations
are investigated, including the tonal syllables, base syllables, phonemes,
and the broad phonetic classes. Multiple measurements are made, such
as coverage, uniqueness, and cohort sizes. Our results are based on
lexicons of 44K and 52K words in Chinese and English obtained from
the CallHome Corpus and the COMLEX Corpus, respectively. We have found
that the set of the most frequent 4,000 words has coverage of 92% and
77% for Chinese and English, respectively. The phonetic representation
uniquely specifies 85%, 87% and 93% of the lexicon for Mandarin, Cantonese,
and English, respectively. While the three languages appear quite different
when they are described by their full phoneme sets, their characteristics
are more similar when they are represented in terms of broad phonetic
classes.
Authors:
Sharlene Liu, Nuance (USA)
Sean Doyle, General Magic (USA)
Allen Morris, Soft Gam (USA)
Farzad Ehsani, Sehda (USA)
Page (NA) Paper number 847
Abstract:
We study the effects of modeling tone in Mandarin speech recognition.
Including the neutral tone, there are 5 tones in Mandarin and these
tones are syllable-level phenomena. A direct acoustic manifestation
of tone is the fundamental frequency (f0). We will report on the effect
of f0 on the acoustic recognition accuracy of a Mandarin recognizer.
In particular, we put f0, its first derivative (f0'), and its second
derivative (f0'') in separate streams of the feature vector. Stream
weights are adjusted to investigate the individual effects of f0, f0',
and f0'' on recognition accuracy. Our results show that incorporating
the f0 feature negatively impacted accuracy, whereas f0' increased
accuracy and f0'' seemed to have no effect.
Authors:
Duncan Markham, Deakin University (Australia)
Page (NA) Paper number 424
Abstract:
Tests of foreign accent usually treat native listeners as reliable
providers of accentedness ratings, and pay too little heed to task-specific
effects on non-native speakers' performance. This paper details a number
of factors which in fact influence native listeners' perceptions, and
the native-like behaviour of non-native speakers' productions, based
on the results of a large study of phonetic performance in second language
learners. Listeners were observed to vary, at times considerably, in
their perception of accent depending on context and type of stimulus,
and at times showed distinctly idiosyncratic scoring patterns. Listeners'
reactions to speaker voice pathology, mixed dialect pronunciation,
and artefacts of read speech are discussed, and the effects of using
different types of scoring system are examined.
Authors:
Michael F. McTear, University of Ulster (Ireland)
Eamonn A. O'Hare, St Mark's High School (Ireland)
Page (NA) Paper number 546
Abstract:
This paper reports on an exploratory study in which a group of second
year secondary school pupils with reading ages ranging from 8.3 to
12.9 performed a set of tasks using the IBM VoiceType dictation package
in order to determine the benefits of voice dictation for classroom
use. The results showed that pupils with varying reading ages could
dictate at comparable speeds and often with similar degrees of accuracy.
Homophones were almost never a source of error in the texts produced
with voice dictation, as compared with the children's handwritten texts.
The implications of these findings for the use of dictation software
in the classroom and for further studies of the potential of voice
dictation for improving children's spelling and composition skills
are discussed.
Authors:
Kazuo Nakayama, Yamagata University (Japan)
Kaoru Tomita-Nakayama, Yamagata University (Japan)
Page (NA) Paper number 446
Abstract:
We investigated English spoken word recognition by adult Japanese speakers.
We found that accurate recognition of the first syllable played an important
role in recognizing a word correctly, which implied that recognition performance
could be enhanced by time-scale expansion and/or dynamic range compression,
since the beginning of a word is often too short for the listener to recognize
correctly. In the first experiment, we found that listeners had difficulty
recognizing both isolated words and extracted words, especially when the
word did not begin with a strong syllable. In the second experiment, the
extracted words and the corresponding time-scale expanded words were presented.
The results indicated that the expanded words were better recognized, and
that time-scale modification of the extracted words did not reduce intelligibility
even at expansion ratios around 2.00, as was clear from the improvement
in recognition.
Authors:
Anne-Marie Öster, Department of Speech, Music and Hearing, KTH (Sweden)
Page (NA) Paper number 256
Abstract:
Teaching strategies and positive results from training of both perception
and production of spoken Swedish with 13 immigrants are reported. The
learners participated in six training sessions lasting thirty minutes,
twice a week. The training had a positive effect on the L2-speakers'
perception and production of individual Swedish sounds, stress, intonation
and rhythm. The positive results were obtained through auditorily and
visually contrastive feedback provided by a PC running the IBM SpeechViewer
software. Skill-building modules were used together with the Speech
Patterning Module "Pitch and Loudness", which displays the speech signal
as graphical curves and diagrams. A split screen offered a comparison
of the student's production with a correct model by the teacher. Pitch
and loudness were displayed either separately or combined.
Authors:
Dominiek Sandra, University of Antwerp (Belgium)
Steven Gillis, University of Antwerp (Belgium)
Page (NA) Paper number 1106
Abstract:
Children of three different ages (five, eight, and ten years old) were
asked to syllabify a list of auditorily presented words. The list
composition was such that the effect of different knowledge sources
on the children's intuitive syllabification could be assessed: the
relative importance of language-universal versus language-specific
phonological constraints, the effect of morphological complexity, and
the effect of orthographic knowledge. The results indicate that five-year-old
children are already aware of language-specific constraints and
are sensitive to the phonological distinction between continuant and
non-continuant consonants. Literate children (eight and ten years old)
are influenced in their syllabification behavior by their orthographic
knowledge, i.e. once children have reached the literate stage it is
difficult for them to separate phonological and orthographic knowledge
in this phonological task. Finally, children in all three age groups
did not syllabify singulars differently than phonologically closely
matched plurals.
Authors:
Ayako Shirose, Department of Cognitive Sciences, Graduate school of Medicine, University of Tokyo (Japan)
Haruo Kubozono, Kobe University (Japan)
Shigeru Kiritani, University of Tokyo (Japan)
Page (NA) Paper number 1107
Abstract:
This paper reports the results of research on the process of acquisition
of Japanese compound accent rules by children aged 4-5. The results
reveal: 1) Children acquire general rules before they acquire lexically
idiosyncratic rules. 2) Children failed to retain the accent of the
second element, and instead placed an incorrect accent on the penultimate
foot. This result suggests that children acquire accent placement on
the penultimate foot prior to retaining the lexical accent of the second
element. We discuss a similarity between this result and a constraint-reranking
phenomenon in adult phonology. 3) The syllable, which plays an important
role in adults' compound accent (CA) rules, does not contribute to the
CA rules in children's phonology. We assume that children have not yet
acquired sufficient understanding of the syllable for it to contribute
to the CA rules.
Authors:
Lydia K.H. So, The University of Hong Kong (Hong Kong)
Zhou Jing, The University of Hong Kong (Hong Kong)
Page (NA) Paper number 956
Abstract:
This paper reports the phoneme repertoires and phonological error patterns
of 600 Chinese-speaking children aged 2;0 to 7;0. The findings support
the hypotheses that phonological acquisition is influenced by the ambient
language and the mother tongue.
Authors:
Kaoru Tomita-Nakayama, Yamagata University (Japan)
Kazuo Nakayama, Yamagata University (Japan)
Masayuki Misaki, Matsushita Electrical Industries Co. Ltd. (Japan)
Page (NA) Paper number 180
Abstract:
This study demonstrated that time-scale expansion of speech with constant
pitch (henceforth, expanded speech) enhanced speech recognition by Japanese
learners of English, in contrast to previous studies in which time-scale
expanded speech did not contribute to speech recognition, chiefly because
of severe distortion of the original speech and pitch change. Experiments
were administered with stimuli of original normal speech and the corresponding
expanded speech. The results showed that the expanded speech stimuli were
intelligible to many of the subjects. Our hypotheses are that expanded
speech enhances listeners' speech processing and also enables listeners
to call virtual memory capacity into play for on-line speech processing,
effects that are more apparent in a longer stimulus. Expanded speech worked
well for most subjects; other prescriptions should be prepared for the
remaining subjects, for whom expanded speech alone was not very effective.
Authors:
Volker Warnke, University of Erlangen (Germany)
Elmar Nöth, University of Erlangen (Germany)
Jan Buckow, University of Erlangen (Germany)
Stefan Harbeck, University of Erlangen (Germany)
Heinrich Niemann, University of Erlangen (Germany)
Page (NA) Paper number 316
Abstract:
In this paper, we present a bootstrap training approach for language
model (LM) classifiers. By training class-dependent LMs and running them
in parallel, LMs can serve as classifiers for any kind of symbol sequence,
e.g., word or phoneme sequences, for tasks like topic spotting or language
identification (LID). Irrespective of the particular symbol sequence used
for an LM classifier, the LM is trained on a manually labeled training
set for each class, obtained from not necessarily cooperative speakers.
Therefore, we have to face some erroneous labels and deviations from
the originally intended class specification. Both facts can worsen
classification. It might therefore be better not to use all utterances
for training but to automatically select those utterances that improve
recognition accuracy; this can be done by a bootstrap procedure. We
present the results achieved with our best approach on the VERBMOBIL
corpus for the tasks of dialog act classification and LID.
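The classification scheme itself, setting aside the bootstrap selection, can be sketched compactly: one class-dependent LM per class, run in parallel, with the sequence assigned to the class whose LM scores it highest. The bigram models and add-one smoothing below are simplifications standing in for the authors' models.

```python
# A toy version of parallel LM classification: train one class-dependent
# bigram LM per class; a symbol sequence is assigned to the class whose
# LM gives it the highest log-probability. Add-one smoothing is a
# simplification; the authors' models and training set-up differ.
import math
from collections import defaultdict

def train_bigram(sequences):
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(["<s>"] + seq, seq):  # bigrams incl. sentence start
            counts[a][b] += 1
    return counts

def logprob(counts, seq, vocab_size):
    lp = 0.0
    for a, b in zip(["<s>"] + seq, seq):
        c = counts.get(a, {})
        lp += math.log((c.get(b, 0) + 1) / (sum(c.values()) + vocab_size))
    return lp

def classify(lms, seq, vocab_size):
    return max(lms, key=lambda cls: logprob(lms[cls], seq, vocab_size))
```

The same machinery applies unchanged whether the symbols are words, phonemes, or dialog-act-labeled tokens, which is why one framework covers both topic spotting and LID.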
Authors:
Sandra P. Whiteside, University of Sheffield (U.K.)
Jeni Marshall, University of Sheffield (U.K.)
Page (NA) Paper number 154
Abstract:
Voice onset time (VOT) is a key temporal feature in spoken language.
There is some evidence to suggest that there are sex differences in
VOT patterns. The cause of these sex differences could be attributed
to sexual dimorphism of the vocal apparatus. There is also some evidence
to suggest that phonetic sex differences could also be attributed to
learned stylistic and linguistic factors. This study reports on an
investigation into the VOT patterns for /p b t d/ in a group of thirty
children aged 7 (n=10), 9 (n=10) and 11 (n=10) years, with equal numbers
of girls (n=5) and boys (n=5) in each age group. Age and sex differences
were examined in the VOT data. Age, sex and age-by-sex interactions
were found. The results are presented and discussed.
Authors:
Sandra P. Whiteside, University of Sheffield (U.K.)
Carolyn Hodgson, University of Sheffield (U.K.)
Page (NA) Paper number 155
Abstract:
The process of the development of fine motor speech skills co-occurs
with the maturation of the vocal apparatus. This brief study presents
some acoustic phonetic characteristics of the speech of twenty pre-adolescent
(6-, 8- and 10-year-olds) boys and girls. The speech data were elicited
via a picture-naming task. Both age and sex differences in the acoustic
phonetic characteristics of selected vowels and consonants are examined.
The acoustic phonetic characteristics that were investigated included
formant frequency values and coarticulation (or gestural overlap) patterns.
Age, sex and age-by-sex differences for the acoustic phonetic characteristics
are presented and discussed for the data with reference to speech development
and the sexual dimorphism of the vocal apparatus.
Authors:
Lisa-Jane Brown, Department of Human Communication Science, Sheffield University, Claremont Crescent, Sheffield (U.K.)
John Locke, Department of Human Communication Science, Sheffield University, Claremont Crescent, Sheffield (U.K.)
Peter Jones, Sheffield (Hallam) University (U.K.)
Sandra P. Whiteside, Department of Human Communication Science, Claremont Crescent, Sheffield (U.K.)
Page (NA) Paper number 791
Abstract:
The atypical linguistic processing and cognitive development of previously
institutionalised, adopted Romanian children are being researched using
a neurolinguistic theory of development. Of particular concern is
the Critical Period Hypothesis, which holds that language capacity can
only develop in response to relevant stimulation during a pre-determined
period in childhood. The research impetus derives from the need to
understand the course of first language acquisition in children who
have suffered extreme deprivation at an early age. The purpose of this
paper is to attempt to analyse what these children can tell us about
the potential for language development in the face of such deprived
circumstances. In order to examine this, a theory of neurolinguistic
development will be applied to the case study of a formerly institutionalised
Romanian child, Maria. A key question will be addressed: Has Maria's
early deprivation set for her an irreversible path in terms of attaining
normal language development?
Authors:
Geoff Williams, SOAS, University of London/RMS Inc (USA)
Mark Terry, RMS Inc (USA)
Jonathan Kaye, SOAS, University of London (U.K.)
Page (NA) Paper number 622
Abstract:
This paper proposes a novel architecture for language-independent ASR
based on government phonology (GP). We use experimental data to show
that phoneme-based recognisers perform poorly on languages other than
the original target, rendering such systems inadequate for multi-lingual
speech recognition, a result we attribute to the inadequacy of the
phoneme as a linguistic unit. In the proposed GP model, recognition
targets are a small set of sub-segmental primes, or "elements", found
in all languages, which have been previously shown to be robustly detected
in a language-independent manner. Well-formedness constraints are captured
by simple parameter settings which can be easily encoded as rules and
applied as top-down constraints in a speech recogniser. Hence, given
a set of trained element detectors, a recogniser for any given language
can in principle be rapidly built by selection of the appropriate lexicon
and constraints. We describe the design of experimental architectures
for our GP-based system.
Authors:
Claudio Zmarich, CNR-Istituto di Fonetica e Dialettologia, Padova (Italy)
Roberta Lanni, CNR-Istituto di Fonetica e Dialettologia, Padova (Italy)
Page (NA) Paper number 1004
Abstract:
This single case study aims to combine the auditory assessment method
with the precision offered by the instrumental measurement of acoustic
characteristics, in order to investigate the phonetic aspect of early
speech development, namely babbling and early words. While general
progress may be gauged by the increasing prevalence of CV syllables
within the global repertory of utterances, the aspects that best reveal
the influence of a target language include the frequency of occurrence
of vowel types, especially when classified along the front-back dimension,
in combination with an expansion and refinement of phonotactic possibilities.
Further, acoustic and articulatory evidence reveals an initial tendency
for more control of the height dimension than of the front/back dimension.
The patterns of C-V associations suggest that the child
develops from a babbling phase characterized by the overwhelming prevalence
of front articulations, to a phase characterized by the presence of
the first words, where the patterns predicted by the MacNeilage and
Davis theory occur, perhaps owing to the presence of the same patterns
in the target lexicon.
Authors:
Roland Kuhn, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Jean-Claude Junqua, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Philip D. Martzen, Panasonic Technologies Inc., Speech Technology Laboratory (USA)
Page (NA) Paper number 304
Abstract:
Building on earlier work, we show how a set of binary decision trees
can be trained to generate an ordered list of possible pronunciations
from a spelled word. Training is carried out on a database consisting
of spelled words paired with their pronunciations (in a particular
language). We show how phonotactic information can be learned by a
second set of decision trees, which reorder the multiple pronunciations
generated by the first set. The paper defines the ``inclusion'' metric
for scoring phoneticizers that generate multiple pronunciations. Experimental
results employing this metric indicate that phonotactic reordering
yields a slight improvement when only the top pronunciation is retained,
and a large improvement when more than one hypothesis is retained.
Isolated-word recognition results which show good performance for automatically-generated
pronunciations are given.
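The abstract does not give the exact definition of the ``inclusion'' metric, but one plausible reading for a phoneticizer that outputs a ranked pronunciation list is the fraction of test words whose reference pronunciation appears among the top n hypotheses. The sketch below implements that reading; the function name and data layout are illustrative assumptions.

```python
# One plausible reading of an inclusion-style score: the fraction of test
# words whose reference pronunciation appears among the top n generated
# hypotheses. (The paper's exact definition may differ; this is an
# illustration only.)

def inclusion(hypotheses, references, n):
    """hypotheses: {word: ranked list of pronunciations};
       references: {word: correct pronunciation}."""
    hits = sum(1 for word, ref in references.items()
               if ref in hypotheses.get(word, [])[:n])
    return hits / len(references)
```

Under this reading, retaining more hypotheses can only raise the score, which is consistent with the abstract's observation that keeping more than one pronunciation yields a large improvement.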