Speech Technology Applications and Human-Machine Interface 2


Development of CAI System Employing Synthesized Speech Responses

Authors:

Tsubasa Shinozaki, NTT Human Interface Labs. (Japan)
Masanobu Abe, NTT Human Interface Labs. (Japan)

Page (NA) Paper number 409

Abstract:

This paper proposes a Computer-Assisted Instruction (CAI) system that teaches students how to write Japanese characters. The most important feature of the system is its use of synthesized speech to interact with users. The CAI system has a video-display tablet interface: the user traces a character pattern with the tablet pen, and the tracing is shown on the display as it is drawn. Whenever the trace strays outside the pattern, the system immediately outputs synthesized speech to correct the error. To design strategies for generating instructions, the behavior and instruction messages of a human teacher were recorded and analyzed. One of the most interesting features of the system is a function that changes the "personality" of the teacher, for example to a strict, a friendly, or a short-tempered teacher. The experimental results confirm that the proposed system can convey a particular impression using synthesized speech.
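
As a rough illustration only: the paper determines the prosodic parameters manually per message, but a mood-switching mechanism of this general shape could select per-mood prosodic overrides. The parameter names and values below are invented for the sketch, not taken from the system.

# Hypothetical Python sketch: per-mood prosodic overrides for the
# instruction messages. Values are illustrative assumptions, not the
# manually tuned settings from the paper.
MOOD_PROSODY = {
    "good":    {"f0_scale": 1.15, "rate_scale": 1.05, "pause_scale": 0.9},
    "neutral": {"f0_scale": 1.00, "rate_scale": 1.00, "pause_scale": 1.0},
    "bad":     {"f0_scale": 0.90, "rate_scale": 0.95, "pause_scale": 1.2},
}

def instruction_prosody(message, mood):
    """Attach mood-dependent prosodic overrides to one instruction message."""
    return {"text": message, **MOOD_PROSODY[mood]}

# The same correction rendered as a good-, neutral-, and bad-mood teacher:
for mood in ("good", "neutral", "bad"):
    print(instruction_prosody("Your stroke is outside the pattern.", mood))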

SL980409.PDF (From Author) SL980409.PDF (Rasterized)

0409_01.WAV
(was: 0409_01.wav)
The fourteen instruction messages of this set are synthesized speech whose prosodic parameters were manually set to express a good mood; the bad-mood and neutral-mood sets are 0409_02.WAV and 0409_03.WAV.
File type: Sound File
Format: WAV
Tech. description: 11025 Hz, 16-bit, mono, PCM
Creating Application: Sound Forge 4.0
Creating OS: Windows 95
0409_02.WAV
(was: 0409_02.wav)
The fourteen instruction messages of this set are synthesized speech whose prosodic parameters were manually set to express a bad mood; the good-mood and neutral-mood sets are 0409_01.WAV and 0409_03.WAV.
File type: Sound File
Format: WAV
Tech. description: 11025 Hz, 16-bit, mono, PCM
Creating Application: Sound Forge 4.0
Creating OS: Windows 95
0409_03.WAV
(was: 0409_03.wav)
The fourteen instruction messages of this set are synthesized speech whose prosodic parameters were manually set to express a neutral mood; the good-mood and bad-mood sets are 0409_01.WAV and 0409_02.WAV.
File type: Sound File
Format: WAV
Tech. description: 11025 Hz, 16-bit, mono, PCM
Creating Application: Sound Forge 4.0
Creating OS: Windows 95



Using Combined Decisions and Confidence Measures for Name Recognition in Automatic Directory Assistance Systems

Authors:

Andreas Kellner, Philips Research Laboratories Aachen (Germany)
Bernhard Rueber, Philips Research Laboratories Aachen (Germany)
Hauke Schramm, Philips Research Laboratories Aachen (Germany)

Page (NA) Paper number 454

Abstract:

Directory assistance systems are among the most challenging applications of speech recognition. Today, complete automation of the service fails because current speech recognizers lack the accuracy to differentiate among the hundreds of thousands, or even millions, of different names occurring in large cities. In this paper, we show that this situation can be remedied by systematically combining all available knowledge sources (last names, first names, street names, partly including their spelled versions) in a statistically optimal way. Specially designed confidence measures for N-best lists are proposed to detect misrecognized turns. Applying these techniques in a hierarchical setup is judged to be the enabling step for automating large-scale directory assistance. In initial experiments we are able, for example, to serve 72% of the inquiries for a database of 1.3 million entries with a remaining error rate of only 6% (or 62% with an error rate of 2%).
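
As an aside, a generic posterior-style confidence measure over an N-best list can be sketched as follows. The paper's specially designed measures are not reproduced here; the softmax normalization and the 0.7 threshold are assumptions chosen purely to illustrate rejecting low-confidence turns in a hierarchical setup.

import math

def nbest_confidence(log_scores):
    """Softmax-normalized score of the top hypothesis (best score first).
    A generic stand-in for the paper's N-best confidence measures."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    return exps[0] / sum(exps)

def automate_turn(log_scores, threshold=0.7):
    """Automate only confident turns; route the rest to a human operator,
    trading coverage against residual error."""
    return nbest_confidence(log_scores) >= threshold

print(automate_turn([-120.0, -128.5, -131.0]))  # clear winner -> True
print(automate_turn([-120.0, -120.4, -120.9]))  # ambiguous   -> False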

SL980454.PDF (From Author) SL980454.PDF (Rasterized)



VPQ: A Spoken Language Interface to Large Scale Directory Information

Authors:

Bruce Buntschuh, AT&T Labs - Research (USA)
Candace A. Kamm, AT&T Labs - Research (USA)
Giuseppe Di Fabbrizio, AT&T Labs - Research (USA)
Alicia Abella, AT&T Labs - Research (USA)
Mehryar Mohri, AT&T Labs - Research (USA)
Shrikanth Narayanan, AT&T Labs - Research (USA)
I. Zeljkovic, AT&T Labs - Research (USA)
R.D. Sharp, AT&T Labs - Research (USA)
Jeremy H. Wright, AT&T Labs - Research (USA)
S. Marcus, International Asset Systems Ltd (USA)
J. Shaffer, Mississippi State University (USA)
R. Duncan, Mississippi State University (USA)
J.G. Wilpon, AT&T Labs - Research (USA)

Page (NA) Paper number 877

Abstract:

VPQ (Voice Post Query) is a dialog system for spoken access to information in the AT&T personnel database (>120,000 entries). An explicit design goal is for the initial interaction with the system to be rather unconstrained, relying on tighter, prompt-constrained dialog only when absolutely necessary. The purpose of VPQ is to explore and exploit the capabilities of state-of-the-art speech recognition systems for this high-perplexity task, and to develop the natural language understanding and dialog control components necessary for effective and efficient user interactions. The VPQ task includes simple interactions, where the initial request is unambiguous and the system's response provides the desired information or completes a call to the requested person, as well as more complex interactions where ambiguities or errors require dialog-driven resolution. This paper highlights the inherent challenges of this task, the major components of the system, the rationale for their design, and how they perform.
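
A minimal sketch of the stated design goal (unconstrained first, tighter prompts only on failure or ambiguity) might look like the following; the directory data and function names are hypothetical, not VPQ's actual components.

def lookup(query):
    """Stand-in directory lookup; VPQ's real backend is the AT&T
    personnel database with >120,000 entries."""
    directory = {
        "john smith": ["John Smith (Dept. A)", "John Smith (Dept. B)"],
        "mary jones": ["Mary Jones (Dept. C)"],
    }
    return directory.get(query.lower(), [])

def handle_request(utterance):
    matches = lookup(utterance)
    if len(matches) == 1:
        return "Connecting you to %s." % matches[0]
    if not matches:
        # recognition or coverage failure -> tighter, prompt-constrained turn
        return "Prompt: please say the last name, then the first name."
    # ambiguity -> dialog-driven resolution
    return "Prompt: I found several matches (%s). Which one?" % "; ".join(matches)

print(handle_request("John Smith"))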

SL980877.PDF (From Author) SL980877.PDF (Rasterized)



SCAN - Speech Content Based Audio Navigator: A System Overview

Authors:

John Choi, AT&T Labs - Research (USA)
Don Hindle, AT&T Labs - Research (USA)
Julia Hirschberg, AT&T Labs - Research (USA)
Ivan Magrin-Chagnolleau, AT&T Labs - Research (USA)
Christine H. Nakatani, AT&T Labs - Research (USA)
Fernando Pereira, AT&T Labs - Research (USA)
Amit Singhal, AT&T Labs - Research (USA)
Steve Whittaker, AT&T Labs - Research (USA)

Page (NA) Paper number 604

Abstract:

SCAN (Speech Content based Audio Navigator) is a spoken document retrieval system integrating speaker-independent, large-vocabulary speech recognition with information retrieval to support query-based retrieval of information from speech archives. Initial development focused on applying SCAN to the broadcast news domain. This paper provides an overview of the system, including a description of its graphical user interface, which incorporates machine-generated speech transcripts to provide local contextual navigation and random access for browsing large speech databases.
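
To make the retrieval side concrete, a toy tf-idf ranking over machine-generated transcripts could be sketched as below. SCAN's actual information-retrieval component is not reproduced here, and the three transcripts are invented stand-ins for recognizer output.

import math
from collections import Counter

transcripts = {  # invented stand-ins for recognizer output on news stories
    "story1": "the senate passed the budget bill today",
    "story2": "heavy storms moved across the midwest",
    "story3": "the budget debate continued in the senate",
}

def tfidf_rank(query, docs):
    """Rank documents by a simple tf-idf score for the query terms."""
    toks = {d: t.split() for d, t in docs.items()}
    df = Counter(w for t in toks.values() for w in set(t))
    n = len(docs)
    def score(d):
        tf = Counter(toks[d])
        return sum(tf[w] * math.log(n / df[w]) for w in query.split() if w in df)
    return sorted(docs, key=score, reverse=True)

print(tfidf_rank("senate budget", transcripts))  # budget stories rank first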

SL980604.PDF (From Author) SL980604.PDF (Rasterized)



Controlling a HIFI With a Continuous Speech Understanding System

Authors:

Javier Ferreiros, GTH-IEL-UPM (Spain)
José Colás, GTH-IEL-UPM (Spain)
Javier Macías-Guarasa, GTH-IEL-UPM (Spain)
Alejandro Ruiz, GTH-IEL-UPM (Spain)
José Manuel Pardo, GTH-IEL-UPM (Spain)

Page (NA) Paper number 988

Abstract:

In this paper we present a speech understanding system that accepts continuous-speech sentences as input to command a HIFI set. The string of words obtained from the recogniser is sent to the understanding system, which tries to fill in a set of frames specifying the triplet (SUBSYSTEM, PARAMETER, VALUE). All circumstances (incomplete understanding, the HIFI set's status, the result of command execution) are confirmed back to the user via a text-to-speech system, with messages generated from concept patterns whose concepts can be substituted. The understanding engine is based on semantic-like tagging, including a "garbage" tag, and on context-dependent rules for meaning extraction. The system allows the application developer to follow the reasoning process, since every understanding rule has an associated concept pattern that is spoken by the speech generation module. The concepts used for speech generation are randomly substituted with alternative expressions of the same meaning, giving the response speech a degree of naturalness.
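
As an illustration of frame filling with a garbage tag, the following sketch tags words into the (SUBSYSTEM, PARAMETER, VALUE) triplet and discards the rest; the lexicon and the first-match rule are invented examples, not the paper's actual tag set or rules.

# Hypothetical Python sketch: semantic-like tagging into a
# (SUBSYSTEM, PARAMETER, VALUE) frame; unmapped words get the garbage tag.
LEXICON = {
    "cd": ("SUBSYSTEM", "cd_player"), "tuner": ("SUBSYSTEM", "tuner"),
    "volume": ("PARAMETER", "volume"), "track": ("PARAMETER", "track"),
    "up": ("VALUE", "increase"), "down": ("VALUE", "decrease"),
}

def fill_frame(words):
    frame = {"SUBSYSTEM": None, "PARAMETER": None, "VALUE": None}
    for w in words:
        tag, meaning = LEXICON.get(w.lower(), ("GARBAGE", None))
        if tag != "GARBAGE" and frame[tag] is None:
            frame[tag] = meaning  # first hit fills the slot; rest is garbage
    return frame

print(fill_frame("please turn the cd volume up a bit".split()))
# -> {'SUBSYSTEM': 'cd_player', 'PARAMETER': 'volume', 'VALUE': 'increase'}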

SL980988.PDF (From Author) SL980988.PDF (Rasterized)



User Evaluation of the MASK Kiosk

Authors:

Lori F. Lamel, LIMSI/CNRS (France)
Samir Bennacef, LIMSI/CNRS (France)
Jean-Luc Gauvain, LIMSI/CNRS (France)
Herve Dartigues, SNCF (France)
Jean-Noel Temem, SNCF (France)

Page (NA) Paper number 85

Abstract:

In this paper we report on a series of user trials carried out to assess the performance and usability of the MASK prototype kiosk. The aim of the ESPRIT Multimodal Multimedia Service Kiosk (MASK) project was to pave the way for more advanced public-service applications with user interfaces employing multimodal, multimedia input and output. The prototype kiosk was developed in close collaboration with the French Railways (SNCF) and the Ergonomics group at UCL, after an analysis of the technological requirements in the context of users and the tasks they perform in carrying out travel enquiries. The time to complete a transaction with the MASK kiosk is about 30% shorter than with the standard kiosk, and the success rate is 85% for novices and 94% for users familiar with the system. In addition to meeting or exceeding the performance goals set at the project's outset in terms of success rate, transaction time, and user satisfaction, the MASK kiosk was judged user-friendly and simple to use.

SL980085.PDF (From Author) SL980085.PDF (Rasterized)
