Development of CAI System Employing Synthesized Speech Responses
Authors:
Tsubasa Shinozaki, NTT Human Interface Labs. (Japan)
Page (NA) Paper number 409
Abstract: This paper proposes a Computer Assisted Instruction (CAI) system that teaches students how to write Japanese characters. The most important feature of the system is its use of synthesized speech to interact with users. The CAI system has a video display tablet interface: a user traces the pattern of a character with the tablet pen, and the trace is shown on the display as it is drawn. When the trace line strays outside the pattern, the system immediately outputs synthesized speech to correct the error. To design strategies for generating instructions, the behavior and instruction messages of a human teacher were recorded and analyzed. One of the most interesting features of the system is a function that changes the "personality" of the teacher, for example a strict teacher, a friendly teacher, or a short-tempered teacher. The experimental results confirmed that the proposed system can convey a particular impression using synthesized speech.
0409_01.WAV (was: 0409_01.wav) | 0409_02.WAV (was: 0409_02.wav) | 0409_03.WAV (was: 0409_03.wav)
The instruction messages (fourteen in total) of each set are synthesized speech whose prosodic parameters were manually determined to express a good, bad, or neutral mood, respectively (good mood [SOUND 0409_1.WAV], bad mood [SOUND 0409_2.WAV], neutral mood [SOUND 0409_3.WAV]).
File type: Sound | File format: WAV | Tech. description: 11025 Hz, 16-bit, mono, PCM | Creating application: Sound Forge 4.0 | Creating OS: Windows 95
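The tracing check described in the abstract (spoken correction when the pen strays outside the character pattern) can be sketched as a point-to-polyline distance test. The function names and the pixel tolerance below are illustrative assumptions, not the paper's actual implementation:

```python
from math import hypot

# Hypothetical sketch of the tracing check: a pen sample is "off the
# pattern" when its distance to the nearest stroke segment exceeds a
# tolerance, which would trigger a spoken correction.

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def off_pattern(sample, stroke, tolerance=5.0):
    """True when the pen sample is more than `tolerance` pixels away
    from every segment of the character stroke (a polyline)."""
    return all(point_segment_distance(sample, a, b) > tolerance
               for a, b in zip(stroke, stroke[1:]))

stroke = [(0, 0), (100, 0)]           # a single horizontal stroke
print(off_pattern((50, 3), stroke))   # within tolerance -> False
print(off_pattern((50, 12), stroke))  # strayed 12 px    -> True
```

In a full system this check would run on every pen sample, and a sustained off-pattern run (rather than a single noisy sample) would trigger the synthesized instruction message.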
Andreas Kellner, Philips Research Laboratories Aachen (Germany)
Bernhard Rueber, Philips Research Laboratories Aachen (Germany)
Hauke Schramm, Philips Research Laboratories Aachen (Germany)
Directory assistance systems are among the most challenging applications of speech recognition. Today, complete automation of the service fails because of the insufficient accuracy of current speech recognizers, which are simply not able to differentiate between the hundreds of thousands or even millions of different names occurring in large cities. In this paper, we show that this situation can be remedied by systematically combining all available knowledge sources (last names, first names, street names, partly including their spelled versions) in a statistically optimal way. Specially designed confidence measures for N-best lists are proposed to detect misrecognized turns. Applying these techniques in a hierarchical setup is judged to be the enabling step for automating large-scale directory assistance. In first experiments, we are able, for example, to service 72% of the inquiries for a database of 1.3 million entries with a remaining error rate of only 6% (or 62% with an error rate of 2%).
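One common way to build a confidence measure over an N-best list is to normalise the top hypothesis' recognition score against all competing entries, giving a posterior-like value, and to flag a turn as likely misrecognized when that value falls below a threshold. The sketch below illustrates this general idea; it is not the paper's exact formulation, and the score scale and threshold are assumptions:

```python
from math import exp

def nbest_confidence(log_scores, scale=1.0):
    """Posterior-style confidence of the best entry in an N-best list,
    computed as a softmax over the (log-domain) recognition scores."""
    m = max(log_scores)
    # Subtract the max before exponentiating for numerical stability.
    weights = [exp(scale * (s - m)) for s in log_scores]
    return weights[log_scores.index(m)] / sum(weights)

def accept_turn(log_scores, threshold=0.6):
    """Accept the turn only when the best hypothesis clearly dominates."""
    return nbest_confidence(log_scores) >= threshold

print(accept_turn([-10.0, -30.0, -32.0]))  # clear winner      -> True
print(accept_turn([-10.0, -10.2, -10.4]))  # close competitors -> False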
Bruce Buntschuh, AT&T Labs - Research (USA)
Candace A. Kamm, AT&T Labs - Research (USA)
Giuseppe Di Fabbrizio, AT&T Labs - Research (USA)
Alicia Abella, AT&T Labs - Research (USA)
Mehryar Mohri, AT&T Labs - Research (USA)
Shrikanth Narayanan, AT&T Labs - Research (USA)
I. Zeljkovic, AT&T Labs - Research (USA)
R.D. Sharp, AT&T Labs - Research (USA)
Jeremy H. Wright, AT&T Labs - Research (USA)
S. Marcus, International Asset Systems Ltd (USA)
J. Shaffer, Mississippi State University (USA)
R. Duncan, Mississippi State University (USA)
J.G. Wilpon, AT&T Labs - Research (USA)
VPQ (Voice Post Query) is a dialog system for spoken access to information in the AT&T personnel database (>120,000 entries). An explicit design goal is for the initial interaction with the system to be rather unconstrained, relying on tighter, prompt-constrained dialog only when absolutely necessary. The purpose of VPQ is to explore and exploit the capabilities of state-of-the-art speech recognition systems for this high-perplexity task, and to develop the natural language understanding and dialog control components necessary for effective and efficient user interactions. The VPQ task includes simple interactions, where the initial request is unambiguous and the system's response provides the desired information or completes a call to the requested person, as well as more complex interactions where ambiguities or errors require dialog-driven resolution. This paper highlights the inherent challenges of this task, the major components of the system, the rationale for their design, and how they perform.
John Choi, AT&T Labs - Research (USA)
Don Hindle, AT&T Labs - Research (USA)
Julia Hirschberg, AT&T Labs - Research (USA)
Ivan Magrin-Chagnolleau, AT&T Labs - Research (USA)
Christine H. Nakatani, AT&T Labs - Research (USA)
Fernando Pereira, AT&T Labs - Research (USA)
Amit Singhal, AT&T Labs - Research (USA)
Steve Whittaker, AT&T Labs - Research (USA)
SCAN (Speech Content based Audio Navigator) is a spoken document retrieval system that integrates speaker-independent, large-vocabulary speech recognition with information retrieval to support query-based retrieval of information from speech archives. Initial development focused on applying SCAN to the broadcast news domain. This paper provides an overview of the system, including a description of its graphical user interface, which incorporates machine-generated speech transcripts to provide local contextual navigation and random access for browsing large speech databases.
Javier Ferreiros, GTH-IEL-UPM (Spain)
José Colás, GTH-IEL-UPM (Spain)
Javier Macías-Guarasa, GTH-IEL-UPM (Spain)
Alejandro Ruiz, GTH-IEL-UPM (Spain)
José Manuel Pardo, GTH-IEL-UPM (Spain)
In this paper we present a speech understanding system that accepts continuous speech sentences as input to command a HIFI set. The string of words obtained from the recogniser is sent to the understanding system, which tries to fill in a set of frames specifying the triplet (SUBSYSTEM, PARAMETER, VALUE). All circumstances (understanding incompleteness, HIFI set status, result of the command execution) are confirmed back to the user via a text-to-speech system with messages generated from patterns of substitutable concepts. The understanding engine is based on semantic-like tagging, including a "garbage" tag, and on context-dependent rules for meaning extraction. The system allows the application developer to follow the reasoning process (as every understanding rule has an associated concept pattern), spoken aloud by the speech generation module. The concepts used for speech generation are randomly substituted with alternative expressions having the same meaning, giving the response speech a certain degree of naturalness.
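The tag-and-fill scheme in the abstract can be sketched as a lexicon lookup that assigns semantic-like tags (with a "garbage" tag for unknown words) and fills the (SUBSYSTEM, PARAMETER, VALUE) frame. The tag inventory and lexicon below are assumptions for illustration, not the paper's actual rules:

```python
# Hypothetical lexicon mapping recognised words to frame slots.
LEXICON = {
    "cd": ("SUBSYSTEM", "CD_PLAYER"),
    "tuner": ("SUBSYSTEM", "TUNER"),
    "volume": ("PARAMETER", "VOLUME"),
    "up": ("VALUE", "INCREASE"),
    "down": ("VALUE", "DECREASE"),
}

def understand(words):
    """Tag each word and fill the command frame.
    Words absent from the lexicon receive the "garbage" tag and are
    simply ignored when filling slots."""
    frame = {"SUBSYSTEM": None, "PARAMETER": None, "VALUE": None}
    for w in words:
        slot, value = LEXICON.get(w.lower(), ("GARBAGE", w))
        if slot in frame:
            frame[slot] = value
    return frame

print(understand("please turn the cd volume up".split()))
# -> {'SUBSYSTEM': 'CD_PLAYER', 'PARAMETER': 'VOLUME', 'VALUE': 'INCREASE'}
```

A slot left at `None` (understanding incompleteness) is exactly the situation the paper's system confirms back to the user through the text-to-speech module.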
Lori F. Lamel, LIMSI/CNRS (France)
Samir Bennacef, LIMSI/CNRS (France)
Jean-Luc Gauvain, LIMSI/CNRS (France)
Herve Dartigues, SNCF (France)
Jean-Noel Temem, SNCF (France)
In this paper we report on a series of user trials carried out to assess the performance and usability of the MASK prototype kiosk. The aim of the ESPRIT Multimodal Multimedia Service Kiosk (MASK) project was to pave the way for more advanced public service applications with user interfaces employing multimodal, multimedia input and output. The prototype kiosk was developed, in close collaboration with the French Railways (SNCF) and the Ergonomics group at UCL, after an analysis of the technological requirements in the context of users and the tasks they perform in carrying out travel enquiries. The time to complete a transaction with the MASK kiosk is reduced by about 30% compared to that required for the standard kiosk, and the success rate is 85% for novices and 94% for users familiar with the system. In addition to meeting or exceeding the performance goals set at the project outset in terms of success rate, transaction time, and user satisfaction, the MASK kiosk was judged to be user-friendly and simple to use.