Chair: T. Chen, Carnegie Mellon University (U.S.A.)
Yoshinao Aoki, Hokkaido University (Japan)
Ricardo Mitsumori, Hokkaido University (Japan)
Jincan Li, Hokkaido University (Japan)
Alexander Burger, Langweid (Germany)
In this paper we propose a method for sign language communication between different languages, such as Japanese-Korean and Japanese-Portuguese, using CG animation of sign language based on the intelligent image communication method. For this purpose, sign language animation is produced using gesture data or text data expressing sign language. In the production process of the CG animation of sign language, MATLAB and the LIFO language are used: MATLAB is useful for three-dimensional signal processing of gestures and for displaying animation of sign language, while the LIFO language, a descendant of the LISP and FORTH language families, was developed and used to produce live CG animations, resulting in a high-speed interactive system for designing and displaying sign language animations. A simple experiment was conducted to translate Japanese sign language into Korean and Portuguese sign languages using the developed CG animation system.
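The abstract does not give implementation details of the text-mediated translation step. As a minimal sketch, assuming sign utterances are represented as sequences of gloss labels and that a bilingual gloss dictionary is available (both assumptions for illustration, not taken from the paper):

    # Hypothetical gloss-level translation step for sign language CG animation.
    # The gloss representation and dictionary below are illustrative assumptions.

    JSL_TO_KSL = {            # hypothetical Japanese-to-Korean gloss dictionary
        "HELLO": "ANNYEONG",
        "THANK_YOU": "GOMAPDA",
    }

    def translate_glosses(glosses, dictionary):
        """Map source-language sign glosses to target-language glosses."""
        return [dictionary.get(g, g) for g in glosses]  # pass unknowns through

    def animate(glosses):
        """Stand-in for the CG animation renderer described in the paper."""
        for g in glosses:
            print(f"rendering sign: {g}")

    animate(translate_glosses(["HELLO", "THANK_YOU"], JSL_TO_KSL))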
Shrikanth Narayanan, AT&T Labs (U.S.A.)
Mani Subramaniam, AT&T Labs (U.S.A.)
Benjamin Stern, AT&T Labs (U.S.A.)
Barbara Hollister, AT&T Labs (U.S.A.)
Chih-mei Lin, AT&T Labs (U.S.A.)
The relationship between objective speech recognition performance measures and perceived performance is analyzed and modeled using data obtained from a voice-dialing trial with 798 AT&T customers. The ability of these models to predict user perception and overall demand for such voice-enabled services is discussed.
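The abstract does not specify the model family. One plausible, minimal sketch is a logistic regression from an objective measure to a binary satisfaction judgment; the feature, model choice, and data below are illustrative assumptions, not the authors' method:

    # Sketch: relating an objective recognition measure to perceived
    # performance with logistic regression. Data is synthetic.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 798                                  # trial size from the abstract
    error_rate = rng.uniform(0.0, 0.5, n)    # hypothetical objective measure
    # Synthetic ground truth: satisfaction falls as errors rise.
    satisfied = (rng.random(n) > error_rate * 1.5).astype(int)

    model = LogisticRegression().fit(error_rate.reshape(-1, 1), satisfied)
    print("P(satisfied | 10% errors) =", model.predict_proba([[0.10]])[0, 1])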
Raul Fernandez, MIT Media Lab (U.S.A.)
Rosalind W. Picard, MIT Media Lab (U.S.A.)
In this work, motivated by human-machine interaction and the potential use that human-computer interfaces can make of knowledge about the affective state of a user, we investigate the problem of sensing and recognizing typical affective experiences that arise when people communicate with computers. In particular, we address the problem of detecting "frustration" in human-computer interfaces. We first sense biophysiological correlates of internal affective states, then stochastically model the resulting biological time series with Hidden Markov Models to obtain user-dependent recognition systems that learn affective patterns from a set of training data. Labeling criteria used to classify the data are discussed, and generalization of the results to a set of unobserved data is evaluated. Recognition results significantly better than chance are reported for 21 of 24 subjects.
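A minimal sketch of the HMM modeling step, assuming one Gaussian HMM per affect class over pre-extracted physiological feature vectors; the hmmlearn library, the two-feature layout, and the synthetic data are assumptions for illustration, not the authors' implementation:

    # Sketch: per-class Gaussian HMMs over physiological time series,
    # classified by maximum log-likelihood. Features are hypothetical
    # (e.g. skin conductance, muscle tension); data is synthetic.
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)
    frustrated = rng.normal(1.0, 0.3, size=(200, 2))   # training series
    neutral = rng.normal(0.0, 0.3, size=(200, 2))

    models = {}
    for label, series in [("frustrated", frustrated), ("neutral", neutral)]:
        m = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
        m.fit(series)                  # one HMM per affective class
        models[label] = m

    test = rng.normal(1.0, 0.3, size=(50, 2))          # unseen segment
    print(max(models, key=lambda k: models[k].score(test)))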
Gavin A Smith, NTT Basic Research Laboratories (Japan)
Hiroshi Murase, NTT Basic Research Laboratories (Japan)
Kunio Kashino, NTT Basic Research Laboratories (Japan)
This paper discusses a method for searching quickly through broadcast audio data to detect and locate known sounds using reference templates, based on the active search algorithm and histogram modeling of zero-crossing features. Active search reduces the number of candidate matches between the reference and test templates by up to 36 times compared with exhaustive search, while still remaining optimal. Computation is further reduced by using computationally inexpensive zero-crossing features. The method is robust against added white noise down to a 20 dB signal-to-noise ratio, and against digitization noise.
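A minimal sketch of histogram-based active search over a zero-crossing feature stream; the window length, threshold, and quantization are illustrative assumptions. The safe skip width follows from the fact that shifting a window of N frames by one frame changes a normalized histogram intersection by at most 1/N:

    # Sketch: active search with zero-crossing histograms.
    import math
    import numpy as np

    def zc_histogram(frames, n_bins=8):
        """Quantize per-frame zero-crossing counts into a normalized histogram."""
        bins = np.minimum(frames, n_bins - 1)
        hist = np.bincount(bins, minlength=n_bins).astype(float)
        return hist / hist.sum()

    def active_search(reference, stream, window_len, theta=0.9):
        """Return window start positions whose histogram matches the reference."""
        ref_hist = zc_histogram(reference)
        hits, pos = [], 0
        while pos + window_len <= len(stream):
            sim = np.minimum(ref_hist, zc_histogram(stream[pos:pos + window_len])).sum()
            if sim >= theta:
                hits.append(pos)
                pos += 1
            else:
                # Safe skip: similarity cannot reach theta any sooner.
                pos += max(1, math.ceil((theta - sim) * window_len))
        return hits

    stream = np.random.default_rng(0).integers(0, 8, size=1000)
    reference = stream[400:440]                        # plant a known sound
    print(active_search(reference, stream, window_len=40))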
David C Abberley, Sheffield University (U.K.)
Steve J Renals, Sheffield University (U.K.)
Gary D Cook, Cambridge University (U.K.)
This paper describes a spoken document retrieval (SDR) system combining the Abbot large vocabulary continuous speech recognition (LVCSR) system, developed by Cambridge University, Sheffield University and SoftSound, with the PRISE information retrieval engine developed by NIST. The system was constructed to enable us to participate in the TREC-6 Spoken Document Retrieval experimental evaluation. Our key aims in this work were to produce a complete system for the SDR task, to investigate the effect of a word error rate of 30-50% on retrieval performance, and to investigate the integration of LVCSR and word spotting in a retrieval task.
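As an illustrative sketch of retrieval over errorful recognizer output, TF-IDF with cosine ranking stands in for the PRISE engine, which is not reproduced here; the transcripts and query are invented, with deliberate recognition errors to show how a high word error rate can cause missed term matches:

    # Sketch: ranking hypothetical ASR transcripts against a text query.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    transcripts = [                  # hypothetical LVCSR output with errors
        "the prime minister spoke about the economy and taxis",   # "taxes"
        "storm warnings were issued along the cost",              # "coast"
        "the football match ended in a penalty shoot out",
    ]

    vectorizer = TfidfVectorizer()
    index = vectorizer.fit_transform(transcripts)

    # "taxes" fails to match the misrecognized "taxis", but the remaining
    # query terms still rank the first document highest.
    query = vectorizer.transform(["prime minister economy taxes"])
    scores = cosine_similarity(query, index)[0]
    for i in scores.argsort()[::-1]:
        print(int(i), round(float(scores[i]), 3))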