Multimodal Spoken Language Processing 3


Referential Features and Linguistic Indirection in Multimodal Language

Authors:

Sharon L. Oviatt, Oregon Graduate Institute (USA)
Karen Kuhn, AlTech (USA)

Paper number 48

Abstract:

The present report outlines differences between multimodal and unimodal communication patterns in linguistic features associated with ease of dialogue tracking and ambiguity resolution. A simulation method was used to collect data while participants used spoken, pen-based, or multimodal input during spatial tasks with a dynamic system. Users' linguistic constructions were analyzed for differences in the rates of reference, co-reference, definite and indefinite referring expressions, and deictic terms. Differences in the prevalence of linguistic indirection were also summarized. Results indicate that spoken language contains substantially higher levels of referring and co-referring expressions, as well as linguistic indirection, than multimodal language communicated by the same users completing the same tasks. In contrast, multimodal language not only has fewer referential expressions and relatively little anaphora; it also lacks the regular use of determiners observed in spoken definite and indefinite noun phrases. In addition, multimodal language is distinctive in its high levels of deictic reference. Implications of these findings for the relative ease of natural language processing in speech-only versus multimodal systems are discussed.
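
The contrast can be made concrete with a small sketch (illustrative only: the word lists, example utterances, and function below are not taken from the paper) that tallies the kinds of referential features the study compares, showing how a multimodal rendering of the same request can drop determiners in favor of a deictic term plus a pen gesture.

    import re
    from collections import Counter

    # Small, illustrative word lists; the paper's coding scheme is richer.
    DEICTIC = {"this", "that", "these", "those", "here", "there"}
    DEFINITE = {"the"}
    INDEFINITE = {"a", "an"}

    def referential_profile(utterances):
        """Count deictic terms and definite/indefinite determiners
        in a list of transcribed utterances."""
        counts = Counter()
        for utt in utterances:
            for token in re.findall(r"[a-z']+", utt.lower()):
                if token in DEICTIC:
                    counts["deictic"] += 1
                elif token in DEFINITE:
                    counts["definite"] += 1
                elif token in INDEFINITE:
                    counts["indefinite"] += 1
        return counts

    # A spoken vs. a multimodal rendering of the same request.
    spoken = ["add a dock to the east side of the island"]
    multimodal = ["add dock here"]  # accompanied by a pen gesture
    print(referential_profile(spoken))      # Counter({'definite': 2, 'indefinite': 1})
    print(referential_profile(multimodal))  # Counter({'deictic': 1})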

SL980048.PDF (From Author) SL980048.PDF (Rasterized)



Multimodal Language Processing

Authors:

Michael Johnston, Oregon Graduate Institute (USA)

Paper number 893

Abstract:

Multimodal interfaces enable more natural and effective human-computer interaction by providing multiple channels through which input or output may pass. To realize their full potential, they need to support not just input from multiple modes but synchronized integration of those modes. This paper describes a multimodal language processing architecture that allows multimodal integration strategies to be stated declaratively in a unification-based grammar formalism. The architecture is currently deployed in a working system supporting interaction with dynamic maps using speech and pen, but the approach is more general and extends to a wide variety of other potential multimodal interfaces.
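
As a rough illustration of the integration idea (a minimal sketch: the feature names and command frames are invented, and the paper's unification-based formalism is considerably richer), speech can supply a command frame with an unfilled location that a pen gesture then fills in by unification.

    def unify(a, b):
        """Unify two feature structures represented as nested dicts.
        Returns the merged structure, or None on a feature clash."""
        if a is None or b is None:
            return None
        if not isinstance(a, dict) or not isinstance(b, dict):
            return a if a == b else None  # atomic values must match
        merged = dict(a)
        for key, value in b.items():
            if key in merged:
                result = unify(merged[key], value)
                if result is None:
                    return None  # clash: integration fails
                merged[key] = result
            else:
                merged[key] = value
        return merged

    # Spoken input supplies the command; pen input supplies the location.
    speech = {"cmd": "create", "object": {"type": "flood_zone"}}
    gesture = {"object": {"location": (412, 87)}}
    print(unify(speech, gesture))
    # {'cmd': 'create', 'object': {'type': 'flood_zone', 'location': (412, 87)}}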

SL980893.PDF (From Author) SL980893.PDF (Rasterized)



Implementation of Coordinative Nodding Behavior on Spoken Dialogue Systems

Authors:

Jun-ichi Hirasawa, NTT Basic Research Laboratories (Japan)
Noboru Miyazaki, NTT Basic Research Laboratories (Japan)
Mikio Nakano, NTT Basic Research Laboratories (Japan)
Takeshi Kawabata, NTT Basic Research Laboratories (Japan)

Paper number 158

Abstract:

This paper proposes a mechanism that contributes to the implementation of a spoken dialogue system with which a user can communicate effortlessly. In a dialogue, exchanges between participants promote the establishment of shared information, and this leads to effortless communication; this is called "dialogue coordination". In particular, revealing the respondent's internal state, such as through nodding and back-channel feedback, promotes the establishment of shared information. This is called "manifestation", one aspect of coordinative behavior, and a mechanism for handling it is introduced. In human-human dialogue, the listener's manifestative behavior often occurs during the speaker's utterance; systems using conventional speech recognition technologies, however, cannot respond until the utterance is complete. To solve this problem, the proposed mechanism, ISTAR protocol transmission, uses intermediate speech recognition results without waiting for the end of the speaker's utterance, realizing a system with flexible manifestative behavior.
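
The core idea of acting on intermediate recognition results can be sketched as follows (the partial-hypothesis stream and the nodding threshold are invented for illustration; this is not the ISTAR protocol itself).

    def simulated_partial_results():
        """Stand-in for a recognizer's intermediate hypotheses."""
        yield "I'd like"
        yield "I'd like to book"
        yield "I'd like to book a room for Tuesday"

    def backchannel(partials, min_new_words=2):
        """Nod whenever the partial hypothesis has grown by at least
        min_new_words since the last acknowledgement, instead of
        waiting for the final recognition result."""
        acknowledged = 0
        for hypothesis in partials:
            n_words = len(hypothesis.split())
            if n_words - acknowledged >= min_new_words:
                print(f"[nod] after hearing: {hypothesis!r}")
                acknowledged = n_words

    backchannel(simulated_partial_results())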

SL980158.PDF (From Author) SL980158.PDF (Rasterized)

0158_01.MOV
(was: 0158.MOV)
The manifestative behavior described in the paper was implemented; a dialogue with this implementation is shown in [MOVIE 0158.MOV] on the CD-ROM.
File type: Video File
Format: Quicktime
Tech. description: Unknown
Creating Application: Unknown
Creating OS: Unknown



Use of Non-Verbal Information in Communication Between Human and Robot

Authors:

Masao Yokoyama, Waseda Univ. (currently at Toshiba) (Japan)
Kazumi Aoyama, Waseda Univ. (Japan)
Hideaki Kikuchi, Waseda Univ. (Japan)
Katsuhiko Shirai, Waseda Univ. (Japan)

Paper number 491

Abstract:

In this research, we consider the use of non-verbal information in human-robot dialogue, with the aim of bringing the communicative ability of robots closer to that of human beings. This paper describes an analysis of the output timing of non-verbal information in interactive dialogue between human beings. We also analyzed the influence of output timing by controlling it in dialogue with a CG robot. As a result, we clarify the strength of the constraints on, and the naturalness of, various types of non-verbal information. We also confirm that the appropriate output timing for non-verbal information is at the start of utterances, the same as in human-human dialogue. With this timing, non-verbal information made speaker changes smooth for the CG robot.

SL980491.PDF (From Author) SL980491.PDF (Rasterized)



What You See is (Almost) What You Hear: Design Principles For User Interfaces For Accessing Speech Archives

Authors:

Steve Whittaker, AT&T Labs-Research (USA)
John Choi, AT&T Labs-Research (USA)
Julia Hirschberg, AT&T Labs-Research (USA)
Christine H. Nakatani, AT&T Labs-Research (USA)

Paper number 1002

Abstract:

Despite the recent growth and potential utility of speech archives, we currently lack tools for effective archival access. Previous research on searching textual archives has assumed that the system's goal should be to retrieve sets of relevant documents, leaving users to visually scan those documents to identify relevant information. However, in previous work we showed that in accessing real speech archives, it is insufficient merely to retrieve "document" sets [9,10]: users experience huge problems of local navigation when attempting to extract relevant information from within speech "documents". These studies also show that users address these problems by taking handwritten notes, which both detail the content of the speech and serve as indices to help access relevant regions of the archive. From these studies we derive a new principle for the design of speech access systems: What You See Is (Almost) What You Hear. We present a new user interface to a broadcast news archive designed on that principle.
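
A minimal sketch of the principle (data structures and file names are invented for illustration): if each displayed transcript segment carries its time alignment, selecting text on screen indexes directly into the corresponding region of the audio.

    from dataclasses import dataclass

    @dataclass
    class Segment:
        text: str
        start: float  # seconds into the audio
        end: float

    transcript = [
        Segment("In local news, the city council met today.", 0.0, 4.2),
        Segment("The council approved the new transit budget.", 4.2, 9.8),
    ]

    def play_span(audio_file, start, end):
        # Placeholder: a real system would seek into and play the audio.
        print(f"playing {audio_file} from {start:.1f}s to {end:.1f}s")

    def on_select(segment_index):
        """Clicking a displayed segment plays the aligned audio region."""
        seg = transcript[segment_index]
        play_span("news_story.wav", seg.start, seg.end)

    on_select(1)  # -> playing news_story.wav from 4.2s to 9.8s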

SL981002.PDF (From Author) SL981002.PDF (Rasterized)

1002_01.PDF
(was: 1002_01.gif)
A screen dump of the user interface.
File type: Image File
Format: GIF
Tech. description: Unknown
Creating Application: Unknown
Creating OS: Unknown
