Referential Features and Linguistic Indirection in Multimodal Language
Authors:
Sharon L. Oviatt, Oregon Graduate Institute (USA)
Page (NA) Paper number 48
Abstract: The present report outlines differences between multimodal and unimodal communication patterns in linguistic features associated with ease of dialogue tracking and ambiguity resolution. A simulation method was used to collect data while participants used spoken, pen-based, or multimodal input during spatial tasks with a dynamic system. Users' linguistic constructions were analyzed for differences in the rates of reference, co-reference, definite and indefinite referring expressions, and deictic terms. Differences in the prevalence of linguistic indirection were also summarized. Results indicate that spoken language contains substantially higher levels of referring and co-referring expressions, as well as linguistic indirection, compared with multimodal language communicated by the same users completing the same task. In contrast, multimodal language not only has fewer referential expressions and relatively little anaphora, it also specifically lacks the regular use of determiners observed in spoken definite and indefinite noun phrases. In addition, multimodal language is distinct in its high levels of deictic reference. Implications of these findings are discussed for the relative ease of natural language processing for speech-only versus multimodal systems.
0158_01.MOV (was: 0158.MOV) | This design of the manifestative behavior was implemented, and the dialogue for this implementation is shown in [MOVIE 0158.MOV] on the CD-ROM. File type: Video File Format: QuickTime Tech. description: Unknown Creating Application: Unknown Creating OS: Unknown |
Masao Yokoyama, Waseda Univ., currently working at TOSHIBA (Japan)
Kazumi Aoyama, Waseda Univ. (Japan)
Hideaki Kikuchi, Waseda Univ. (Japan)
Katsuhiko Shirai, Waseda Univ. (Japan)
In this research, we consider the use of non-verbal information in human-robot dialogue to bring the communication ability of robots closer to that of human beings. This paper describes an analysis of the output timing of non-verbal information in interactive dialogue between human beings. Moreover, we analyzed the influence of output timing by controlling it in dialogue with a CG robot. As a result, we clarify the strength of constraint and the naturalness of various types of non-verbal information. We also confirm that the appropriate output timing of non-verbal information is at the start of utterances, the same as in human-human dialogue. As a result, non-verbal information made speaker changes smooth for the CG robot.
Steve Whittaker, AT&T Labs-Research (USA)
John Choi, AT&T Labs-Research (USA)
Julia Hirschberg, AT&T Labs-Research (USA)
Christine H. Nakatani, AT&T Labs-Research (USA)
Despite the recent growth and potential utility of speech archives, we currently lack tools for effective archival access. Previous research on search of textual archives has assumed that the system goal should be to retrieve sets of relevant documents, leaving users to visually scan through those documents to identify relevant information. However, in previous work we show that in accessing real speech archives, it is insufficient only to retrieve "document" sets [9,10]. Users experience huge problems of local navigation in attempting to extract relevant information from within speech "documents". These studies also show that users address these problems by taking handwritten notes. These notes both detail the content of the speech and serve as indices that help access relevant regions of the archive. From these studies we derive a new principle for the design of speech access systems: What You See Is (Almost) What You Hear. We present a new user interface to a broadcast news archive, designed on that principle.
1002_01.PDF (was: 1002_01.gif) | A screen dump of the user interface. File type: Image File Format: GIF Tech. description: Unknown Creating Application: Unknown Creating OS: Unknown |