ICSLP'98 Proceedings
A Speechreading Aid Based on Phonetic ASR
Authors:
Paul Duchnowski, Massachusetts Institute of Technology (USA)
Page (NA), Paper number 589
Abstract: Manual Cued Speech (MCS) is an effective method of communication for the deaf and hearing-impaired. We describe our work on assessing the feasibility of automatic determination and presentation of cues without intervention by the speaker. The conclusions of this study are applied to the design and implementation of a prototype automatic cueing system that uses HMM-based automatic speech recognition software to identify the cues in real time. We also describe the features of our cue display that enhance its effectiveness, such as the style of the cue images and the timing of their transitions. Our experiments show that keyword reception by experienced MCS users improves significantly with the use of our system (66%) relative to speechreading alone (35%) on low-context sentences.
0589_01.MPG | The manually cued sentence "The old castle passed from the duke to the king." | File type: Video | File format: MPEG | Tech. description: 30 frames/second, 320 x 240 frame size | Creating application: mpeg_encode | Creating OS: Linux
0589_02.MPG | Automatically cued (discrete cues) sentence "The loss and two wins were fair games." | File type: Video | File format: MPEG | Tech. description: 30 frames/second, 320 x 240 frame size | Creating application: mpeg_encode | Creating OS: Linux
0589_03.MPG | Automatically cued (dynamic cues) sentence "The kite may fly on this windy day." | File type: Video | File format: MPEG | Tech. description: 30 frames/second, 320 x 240 frame size | Creating application: mpeg_encode | Creating OS: Linux
Jan Nouza, Technical University of Liberec (Czech Republic)
The paper describes a new version of a visual feedback aid for speech training. The aid is a PC-based speech processing system that visualizes the incoming signal and its most relevant parameters (such as volume, pitch, timing, and spectrum) and compares them to utterances recorded by reference speakers. The goal is to help the person being trained identify the most severe deviations in his or her pronunciation. Learning through visual comparison is supported by displaying multiple reference utterances, attaching phonetic labels to both the reference speakers' and the trainee's speech, indicating the areas with larger deviations in any of the displayed features, and offering a simple tutoring assessment of the trainee's attempts. The system was aimed primarily at hearing-impaired users, but its features also make it well suited to learning and practicing foreign-language pronunciation. The latter possibility was verified in an experiment in which a group of subjects tried to learn the pronunciation of a few words in a foreign language exotic to them.
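As a rough illustration of the comparison step described above, here is a minimal sketch that computes one of the displayed parameters (a short-time volume contour in dB) and flags frames where the trainee deviates from a reference by more than a threshold. The frame size, the threshold, and the linear-interpolation alignment are all assumptions; the actual system presumably uses proper time alignment and compares more parameters than volume:

```python
import numpy as np

def volume_contour(signal, sr, frame_ms=20):
    """Short-time RMS level in dB, one of the parameters the aid displays.
    'signal' is a 1-D NumPy array of samples; frame size is an assumption."""
    n = int(sr * frame_ms / 1000)
    frames = signal[: len(signal) // n * n].reshape(-1, n)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12   # avoid log of zero
    return 20 * np.log10(rms)

def deviation_regions(trainee_db, reference_db, threshold_db=6.0):
    """Frame-wise deviation after stretching the reference contour to the
    trainee's length; linear interpolation stands in for the real system's
    (unspecified) time alignment."""
    ref = np.interp(np.linspace(0, 1, len(trainee_db)),
                    np.linspace(0, 1, len(reference_db)), reference_db)
    dev = np.abs(trainee_db - ref)
    return dev, dev > threshold_db   # boolean mask marks frames to highlight
```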
Ichiro Maruyama, Telecommunications Advancement Organization (TAO) of Japan (Japan)
Yoshiharu Abe, Mitsubishi Electric Corporation / TAO (Japan)
Takahiro Wakao, TAO (Japan)
Eiji Sawamura, TAO (Japan)
Terumasa Ehara, NHK Science and Technical Research Laboratories / TAO (Japan)
Katsuhiko Shirai, Waseda University / TAO (Japan)
This paper describes a method of automatically synchronizing TV news speech with its captions. A news item consists of sentences and often has a corresponding computerized text that can be used as a caption. We have developed a new word spotter based on phonetic HMMs. In this word spotter, the word sequences before and after a synchronization point are concatenated, and scoring is based on the state at the synchronization point. The detection accuracy of the proposed method is shown to be superior to that of a conventional method using no word sequence pairs. Model configurations are presented for detection failures, an announcer's misstatements and restatements, and erroneous transcriptions. A 100% detection rate with no false alarms is achieved by combining multiple word sequence pairs in series, and a 100% detection rate with few false alarms is obtained by using the model configurations for misstatements or erroneous transcriptions.
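One ingredient of this approach, forming the concatenated before/after word sequences around each candidate synchronization point from the caption text, can be sketched as follows. The two-word context width is a made-up parameter, and the HMM scoring at the synchronization-point state is not shown:

```python
def sync_pair_targets(caption_words, context=2):
    """For each candidate synchronization point (a word boundary), build the
    concatenated before/after word sequence the spotter would search for.
    The context width of 2 words per side is an assumed parameter."""
    targets = []
    for k in range(context, len(caption_words) - context + 1):
        before = caption_words[k - context:k]   # words just before the point
        after = caption_words[k:k + context]    # words just after the point
        targets.append((k, before + after))     # sync point lies between them
    return targets

caption = "the old castle passed from the duke to the king".split()
for k, words in sync_pair_targets(caption)[:3]:
    print(k, " ".join(words))
```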
Aileen K. Ho, Department of Psychology, Monash University (Australia)
John L. Bradshaw, Department of Psychology, Monash University (Australia)
Robert Iansek, Geriatric Research Unit, Kingston Centre (Australia)
Robin J. Alfredson, Department of Mechanical Engineering, Monash University (Australia)
This study investigated the ability to regulate speech volume in a group of six volume-impaired idiopathic Parkinson's disease (PD) patients and their age- and sex-matched controls. Participants were asked to read under three conditions: as softly as possible, as loudly as possible, and at normal volume (no volume instruction). The stimuli consisted of a target sentence, easily read in one breath, embedded in a short paragraph of text. Mean volume and volume over time (intensity slope) for the target sentence were obtained. For all three conditions, the patients' speech volume was lower than the controls' by a constant amount. Patients also showed a significantly greater reduction of volume (a negative intensity slope) towards the end of the sentence, especially in the loud condition. The findings indicate that patients with Parkinsonian hypophonic dysarthria have significant difficulty maintaining speech volume, in addition to generating inadequate overall speech volume.
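The two measures reported here, mean volume and intensity slope, can be sketched as follows. The frame size is an assumption, and a real analysis would exclude pauses and use a calibrated intensity level:

```python
import numpy as np

def mean_volume_and_slope(signal, sr, frame_ms=50):
    """Mean level (dB) and intensity slope (dB per second) for one sentence.
    'signal' is a 1-D NumPy array of samples; the frame size is an
    assumption, not the study's analysis setting."""
    n = int(sr * frame_ms / 1000)
    frames = signal[: len(signal) // n * n].reshape(-1, n)
    level_db = 20 * np.log10(np.sqrt((frames ** 2).mean(axis=1)) + 1e-12)
    t = np.arange(len(level_db)) * frame_ms / 1000.0    # frame times, seconds
    slope = np.polyfit(t, level_db, 1)[0]               # least-squares slope
    return level_db.mean(), slope
```

A negative returned slope corresponds to the volume decay toward the end of the sentence reported for the patients.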