Authors:
Yolanda Blanco, Universidad Publica de Navarra (Spain)
Maria Cuellar, Universidad Publica de Navarra (Spain)
Arantxa Villanueva, Universidad Publica de Navarra (Spain)
Fernando Lacunza, Universidad Publica de Navarra (Spain)
Rafael Cabeza, Universidad Publica de Navarra (Spain)
Beatriz Marcotegui, Universidad Publica de Navarra (Spain)
Paper number 1127
Abstract:
This paper presents SIVHA, a high-quality Spanish speech synthesis
system for severely disabled persons, controlled by their eye movements.
The system follows the user's eye gaze across the screen and
constructs the text from the selected words. Once the user considers
the message complete, its synthesis can be requested. The system is
divided into three modules: the first determines the point on the screen
the user is looking at, the second is an interface for constructing the
sentences, and the third is the synthesis itself.
Authors:
C.G. de Bruijn, University of Sheffield (U.K.)
Sandra P. Whiteside, University of Sheffield (U.K.)
P.A. Cudd, University of Sheffield (U.K.)
D. Syder, University of Sheffield (U.K.)
K.M. Rosen, KTH (Sweden)
L. Nord, KTH (Sweden)
Paper number 501
Abstract:
Literature and individual reports contain indications that the use
of speech-recognition-based human-computer interfaces could potentially
lead to vocal fatigue, or even to symptoms associated with dysphonia.
As more and more people opt for a speech-driven computer interface
as an alternative input method to the keyboard, and these speech recognition
systems become more widely used in both home and office environments,
it has become necessary to assess any potential risks of voice damage.
This study reports on ongoing research that investigates acoustic
changes in the voice after use of a discrete speech recognition system.
Acoustic analyses were carried out on two Swedish users of such a system.
So far, for one of the users, two of the acoustic parameters under
investigation that could be indicators of vocal fatigue show a significant
difference between measurements taken directly before and after use of the
speech recognition system.
Authors:
Robert Alexander Fearn, School of Physics, University of New South Wales (Australia)
Paper number 666
Abstract:
Cochlear implants were initially distributed in Western countries.
The speech processing strategies that have been developed to drive
the implants provide detailed information about the spectral envelope
and transients. This information is necessary to identify the phonemes
of English and other Western languages. Cochlear implants have more
recently been distributed in Eastern countries where tonal languages
such as Mandarin and Cantonese are spoken. In addition to spectral
envelope and transient information, tonal languages require finer resolution
of pitch to distinguish different words. Current speech processing strategies
provide the cochlear implant user with only relatively low pitch resolution.
This study investigates the importance of voice pitch (F0)
in the identification of phonetically identical Cantonese words that vary
in pitch. Simulations of speech processing strategies were performed
with a normal-hearing subject, and the results suggest that F0 is very
important for the correct identification of tonal words.
Authors:
Karin Brunnegaard, Dept. of Logopedics and Phoniatrics, Göteborg University, Göteborg (Sweden)
Katja Laakso, Dept. of Logopedics and Phoniatrics, Göteborg University, Göteborg (Sweden)
Lena Hartelius, Dept. of Logopedics and Phoniatrics, Göteborg University, Göteborg (Sweden)
Elisabeth Ahlsén, Dept. of Linguistics, Göteborg University, Göteborg (Sweden)
Paper number 496
Abstract:
This study describes the development of a test battery to assess high-level
language function in Swedish and reports the test performance
of a group of 9 individuals with multiple sclerosis (MS). The test
battery included tasks such as repetition of long sentences, understanding
of complicated logico-grammatical sentences, naming famous people,
resolving ambiguities, recreating sentences, understanding metaphors,
making inferences, and defining words. The MS group included individuals
with self-reported language problems as well as individuals without
any such problems. Their performance was compared with that of a group of
7 control subjects using a Kruskal-Wallis one-way ANOVA, which indicated
significantly different total mean scores. Post hoc analysis with Mann-Whitney
U-tests revealed that the group with self-reported language problems
had significantly lower mean scores when compared to control subjects
and to MS subjects without self-reported language problems. None of
the language difficulties were detected by a standard aphasia test.
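As a rough illustration of the statistical comparison described above, the following Python sketch runs a Kruskal-Wallis test followed by Mann-Whitney U post hoc tests; the group sizes follow the abstract, but the score values are placeholders, not the study's data.

# Hedged sketch of the reported analysis pipeline, with made-up score arrays:
# a Kruskal-Wallis test across the three groups followed by pairwise
# Mann-Whitney U tests as post hoc comparisons.
from scipy.stats import kruskal, mannwhitneyu

controls       = [92, 88, 90, 95, 87, 91, 89]   # 7 control subjects (placeholder scores)
ms_no_problems = [86, 84, 88, 85]               # MS, no self-reported problems (placeholder)
ms_problems    = [70, 65, 72, 68, 74]           # MS, self-reported problems (placeholder)

h, p = kruskal(controls, ms_no_problems, ms_problems)
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.4f}")

for name, (a, b) in [("MS w/ problems vs controls", (ms_problems, controls)),
                     ("MS w/ problems vs MS w/o", (ms_problems, ms_no_problems))]:
    u, p = mannwhitneyu(a, b, alternative="two-sided")
    print(f"Mann-Whitney U ({name}): U={u:.1f}, p={p:.4f}")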
Authors:
Shizuo Hiki, Graduate School of Human Sciences, Waseda University (Japan)
Kazuya Imaizumi, Graduate School of Human Sciences, Waseda University (Japan)
Yumiko Fukuda, Research Institute, National Rehabilitation Center for the Disabled (Japan)
Paper number 1091
Abstract:
The resolution of the fundamental frequency of speech required for
the design of a cochlear implant speech processor is investigated,
with special regard to transmitting voice pitch information in Asian
languages. Clinical application of the cochlear implant has spread
rapidly in recent years to Asian countries where a variety of languages
are spoken whose voice pitch information differs from that of English
and other European languages. The perceptually acceptable area and required
resolution of duration and fundamental frequency are estimated on a
two-dimensional chart consisting of logarithmic time and frequency
scales, based on the typical voice pitch contours of Japanese word
accent and Chinese syllabic tone. As a result, it is shown that much
finer quantization and time sampling of the change in fundamental frequency
are required than for the sentence intonation and emphasis common to
other languages. It is also shown that the amount of information conveyed
by the combined use of lipreading with a cochlear implant is not sufficient
to supplement the voice pitch information. A possible way of transmitting
such voice pitch information, by transmitting the speech waveform
directly to the auditory area of the cortex, where the waveform is
reconstructed and voice pitch is extracted, is discussed.
Authors:
Aileen K. Ho, Psychology Dept, Monash University (Australia)
John L. Bradshaw, Psychology Dept, Monash University (Australia)
Robert Iansek, Geriatric Research Unit, Kingston Centre (Australia)
Robin J. Alfredson, Dept of Mechanical Engineering, Monash University (Australia)
Paper number 11
Abstract:
Past studies on Parkinsonian speech have generally examined the parameters
of speech separately. Thus volume and suprasegmental duration have
largely been described independently of each other, on the assumption
that the two measures are not related. This assumption was tested by manipulating
intensity and examining the corresponding effect on duration. Twelve
Parkinson's disease (PD) patients and twelve normal healthy controls
read under three conditions: as softly as possible, as loudly
as possible, and with no volume instruction (at normal volume). Total
Duration of reading (with pauses), and Net Duration (without pauses)
were examined. For Net Duration, both groups were similar, and did
not vary across volume conditions. PD patients, however, demonstrated
decreased Total Duration as speech volume was increased. The abnormal
Parkinsonian relationship is suggestive of a trade-off between the
two parameters in order to achieve adequately loud reading, and may
be explained by increased attention associated with increased effort
when speaking louder.
Authors:
Cheol-Woo Jo, Changwon National University (Korea)
Dae-Hyun Kim, Changwon National University (Korea)
Paper number 691
Abstract:
In this paper, a method for analyzing pathological speech signals using
the wavelet transform is suggested. The pathological speech signals are
taken from a commercially available pathological voice database and analyzed
by the suggested method. Normal speech signals from the same database are
analyzed as well. The results are then compared to find the differences
between normal and pathological speech. A three-level wavelet transform is used.
Normalized energy ratios between the levels and normalized peak-to-peak
values are used as parameters. As a result, it was possible to distinguish
between normal and pathological speech signals.
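A minimal Python sketch of the kind of analysis described above (not the authors' implementation; the wavelet family, frame length, and toy signals are assumptions):

# Hedged sketch: three-level wavelet decomposition of a voice frame, with
# normalized band-energy ratios and normalized peak-to-peak values as features.
import numpy as np
import pywt

def wavelet_voice_features(frame, wavelet="db4", level=3):
    """Return (normalized energy per band, normalized peak-to-peak per band)."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)   # [cA3, cD3, cD2, cD1]
    energy = np.array([np.sum(c ** 2) for c in coeffs])
    energy_ratio = energy / energy.sum()                  # normalized energy ratios
    p2p = np.array([c.max() - c.min() for c in coeffs])
    p2p_norm = p2p / (np.abs(frame).max() + 1e-12)        # normalize by frame amplitude
    return energy_ratio, p2p_norm

# Toy usage: a clean synthetic vowel versus a noise-perturbed one, standing in
# for normal and pathological voice samples from a database.
fs = 16000
t = np.arange(fs) / fs
normal = np.sin(2 * np.pi * 120 * t)
pathological = normal + 0.3 * np.random.randn(len(t))
print(wavelet_voice_features(normal)[0])
print(wavelet_voice_features(pathological)[0])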
Authors:
Shigeyoshi Kitazawa, Department of Computer Science, Faculty of Information, Shizuoka University (Japan)
Hiroyuki Kirihata, Department of Computer Science, Faculty of Information, Shizuoka University (Japan)
Tatsuya Kitamura, Department of Computer Science, Faculty of Information, Shizuoka University (Japan)
Paper number 1036
Abstract:
This paper describes a speech coding strategy for a cochlear implant
system, assuming a Nucleus Cochlear Implant receiver-stimulator. The speech
processor converts input speech into a series of stimulation electrode
positions and stimulation current intensities. This process can be
optimized by decomposing the acoustic signal into a given
set of impulse responses corresponding to the set of electrode channels.
An error minimization algorithm can find an optimal stimulation sequence
that minimizes the distortion of the transferred speech and maximizes the
transferred phonological information as well as sound quality. Re-synthesized
sound quality was qualitatively evaluated. Environmental sounds can
also be recognized with this method.
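The abstract does not specify the error minimization algorithm; the following Python sketch illustrates one possible reading, in which a short segment is decomposed into a non-negative combination of time-shifted channel impulse responses via non-negative least squares. The impulse-response shapes, channel count, and solver choice are assumptions.

# Hedged sketch, not the paper's algorithm: choose pulse intensities that
# minimize the reconstruction error of an acoustic segment from a dictionary
# of time-shifted channel impulse responses.
import numpy as np
from scipy.optimize import nnls

n_channels, n_samples, pulse_len = 8, 256, 32
rng = np.random.default_rng(0)

# Assumed impulse responses: decaying sinusoids at channel-specific frequencies.
t = np.arange(pulse_len)
irs = [np.exp(-t / 10.0) * np.sin(2 * np.pi * (0.05 + 0.03 * k) * t)
       for k in range(n_channels)]

# Dictionary: each column is one impulse response placed at one time shift.
shift_list = list(range(0, n_samples - pulse_len, 8))
D = np.zeros((n_samples, n_channels * len(shift_list)))
cols, col = [], 0
for k, ir in enumerate(irs):
    for s in shift_list:
        D[s:s + pulse_len, col] = ir
        cols.append((k, s))
        col += 1

x = rng.standard_normal(n_samples)   # stand-in acoustic segment
a, err = nnls(D, x)                  # non-negative stimulation intensities
print("reconstruction error:", err)
print("active (channel, time) pulses:", [cols[i] for i in np.nonzero(a > 1e-3)[0]])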
Authors:
Eva Agelfors, CTT/TMH, KTH (Sweden)
Jonas Beskow, CTT/TMH, KTH (Sweden)
Martin Dahlquist, CTT/TMH, KTH (Sweden)
Björn Granström, CTT/TMH, KTH (Sweden)
Magnus Lundeberg, CTT/TMH, KTH (Sweden)
Karl-Erik Spens, CTT/TMH, KTH (Sweden)
Tobias Öhman, CTT/TMH, KTH (Sweden)
Paper number 362
Abstract:
In the Teleface project, the possibility of using a synthetic face as
a visual telephone communication aid for hearing-impaired persons is
evaluated. An earlier study, NH, involved a group of normal-hearing persons.
This paper describes the results of two multimodal intelligibility
tests with hearing-impaired persons, in which the additional information
provided by a synthetic as well as a natural face is evaluated. In
a first round with hearing-impaired persons, HI:1, twelve subjects
were presented with VCV syllables and "everyday sentences" together
with a questionnaire. The intelligibility score for the VCV syllables
presented as audio alone was 30%. With a synthetic face added, the
score improved to 55%; with the natural face instead, it was
58%. In a second round, HI:2, fifteen hearing-impaired persons were
presented with the sentence material and a questionnaire. The audio
track was filtered to simulate telephone bandwidth. The intelligibility
score for the audio only condition was 57% correctly identified keywords.
Together with a synthetic face it was 66% and with a natural face 83%.
Answers in the questionnaires were collected and analysed. The general
subjective rating of the synthetic face was positive, and the subjects
would like to use such an aid if it were available.
Authors:
Lois Martin, La Trobe University (Australia)
John Bench, La Trobe University (Australia)
Paper number 1006
Abstract:
The ability to understand speech in 27 hearing-impaired children was
assessed using the BKB/A Picture Related Sentence test for children.
The mean sentence score for the group was 72% (range 18-100%). Language
scores (CELF-R) and Verbal Scale IQ (WISC-R) scores were significantly
below the norm (72.8 and 89.2, respectively). Performance Scale IQ
scores were slightly above the norm (106.3). Sentence scores were
significantly correlated with language scores (r = 0.49). Further investigation
showed that the predictability of language scores could be improved
when sensation level was taken into account. Sensation level was negatively
correlated with language scores (r = - 0.51), demonstrating that children
with better language abilities perceived speech at relatively lower
intensity levels. The observed sensation levels from the group were
compared with the expected levels for normally hearing children. This
difference measure yielded a correlation coefficient of - 0.73 with
language scores.
Authors:
Oleg P. Skljarov, Research Institute of Ear, Throat, Nose and Speech (Russia)
Paper number 1149
Abstract:
An attempt to understand experimental data on the segmentation of a speech
signal by the voiced/unvoiced principle has led us to a hypothesis
of a pair of logistic dependences between the durations of these segments.
The segmentation was carried out with the help of a computer program
working in quasi-real time. The hypothesis of a logistic recurrent
dependence for the sequence of segment durations allowed us to conclude
that this sequence has a quasi-rhythmical organization. With
the help of the proposed recurrent dependences, it is possible to explain
statistical peculiarities of the speech behaviour of stutterers in comparison
with normal speech behaviour. These logistic dependences were confirmed
by direct experimental data. An assumption about the origins of the observed
rhythm is made: these origins are hidden at the level of the control of
speech production and perception. It is shown that the chaotic nature
of the proposed dynamics of the formation of large-scale temporal structure
makes it possible to introduce the concept of information in a
natural way.
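A minimal illustration of a logistic recurrence for normalized segment durations; the exact form and parameter values used by the author are not given in the abstract, so this is only a generic logistic map sketch in Python.

# Hedged illustration only: iterate x_{k+1} = r * x_k * (1 - x_k) on
# normalized durations; r near 3.2 gives a near-periodic ("quasi-rhythmical")
# sequence, while r near 3.9 gives a chaotic one.
def logistic_sequence(x0, r, n):
    """Iterate the logistic map for n steps, x in (0, 1)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

print(logistic_sequence(0.4, 3.2, 10))
print(logistic_sequence(0.4, 3.9, 10))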
Authors:
Ali-Asghar Soltani-Farani, University of Surrey (U.K.)
Edward H.S. Chilton, University of Surrey (U.K.)
Robin Shirley, University of Surrey (U.K.)
Paper number 438
Abstract:
Visual perception of speech through spectrogram reading has long been
a subject of research, as an aid for the deaf or hearing impaired.
Attributing the lack of success in this type of visual aid mainly to
the static form of information presented by the spectrograms, this
paper proposes a system of dynamic visualisation for speech sounds.
This system samples a highly resolved, auditory-based spectrogram with
a window of 20 milliseconds duration and, exploiting the periodicity
of the input sound, produces a phase-locked sequence of images.
This sequence is then animated at a rate of 50 images per second to
produce a movie-like image displaying both the time-varying and time-independent
information of the underlying sound. Results of several preliminary
experiments for evaluation of the potential usefulness of the system
for the deaf, undertaken by normal-hearing subjects, support the quick
learning and persistence of the gestures for small sets of single words
and motivate further investigations.
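A rough Python sketch of the frame-slicing idea alone, cutting a conventional spectrogram into 20 ms image frames at 50 frames per second; the auditory-based analysis and the phase-locked (pitch-synchronous) alignment described above are omitted, and all parameters and signals are assumptions.

# Hedged sketch: slice a spectrogram into 20 ms images, one per 1/50 s.
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # toy signal

f, times, S = spectrogram(x, fs=fs, nperseg=256, noverlap=192)  # fine time resolution
frame_dur = 0.020                                               # 20 ms per image
frames, start = [], 0.0
while start + frame_dur <= times[-1]:
    idx = (times >= start) & (times < start + frame_dur)
    frames.append(S[:, idx])                                    # one image per 20 ms slice
    start += frame_dur                                          # i.e. 50 images per second
print(f"{len(frames)} image frames; first frame shape {frames[0].shape}")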
Authors:
Rosemary A. Varley, University of Sheffield (U.K.)
Sandra P. Whiteside, University of Sheffield (U.K.)
Paper number 151
Abstract:
Contemporary psycholinguistic models suggest that there may be dual
routes operating in phonetic encoding: a direct route which uses stored
syllabic units, and an indirect route which relies on the on-line assembly
of sub-syllabic units. The more computationally efficient direct route
is more likely to be used for high frequency words, while the indirect
route is most likely to be used for novel or low frequency words. We
suggest that the acquired neurological disorder of apraxia of speech
(AOS) provides a window onto speech encoding mechanisms and that the
disorder represents an impairment of direct-route encoding mechanisms
and, therefore, a reliance on indirect mechanisms. We report an investigation
of the production of high- and low-frequency words across three subject
groups: non-brain-damaged controls (NBDC, N=3), brain-damaged controls
(BDC, N=3), and speakers with AOS (N=4). The results are presented and
discussed within the dual-route phonetic encoding hypothesis.
Authors:
M.F. Cheesman, University of Western Ontario (Canada)
K.L. Smilsky, University of Western Ontario (Canada)
T.M. Major, University of Western Ontario (Canada)
F. Lewis, University of Western Ontario (Canada)
L.M. Boorman, University of Western Ontario (Canada)
Paper number 841
Abstract:
A sample of 209 adults ranging from 20 to 79 years of age was studied
to measure speech communication profiles as a function of age in persons
who did not identify themselves as hearing impaired. The study was
conducted in order to evaluate age-related speech perception abilities
and communication profiles in a population who do not present for hearing
assessment and who are not included in census statistics as having
hearing problems. Audiometric assessment, demographic and hearing
history self-reports, speech reception thresholds, consonant discrimination
in quiet and noise, and the Communication Profile for the
Hearing Impaired (CPHI) were the instruments used to develop speech
communication profiles. Hearing performance decreased with increasing
age. However, despite self-reports of no hearing impairment, many subjects
over age 50 had audiometric thresholds that indicated hearing impairment.
The responses to the CPHI were correlated with audiometric thresholds,
but also with the age of the respondent when hearing thresholds had
been controlled statistically. A comparison of CPHI responses from
this study with those of two other samples from clinical populations revealed
only slightly different patterns of behaviour in the present sample
when confronted with communication difficulties.