Authors:
Kuansan Wang, Microsoft Research (USA)
Paper number 888
Abstract:
This paper reports our progress in building a mixed-initiative,
goal-oriented dialogue system for human-machine interaction. The
dialogue model embraces the spirit of the so-called plan-based
approach in that the dialogue flow is not statically authored but
dynamically generated by the system as a natural outcome of the
semantic evaluation process.
With multimodal applications in mind, the dialogue system is designed
in an event-driven architecture commonly seen at the core of
a graphical user interface (GUI) environment. In the same manner that
GUI events are handled by graphical objects, the proposed dialogue
model assigns dialogue events to semantic objects that encapsulate
the knowledge for handling events under various discourse contexts.
So far, we have found that four types of events, namely dialogue
object instantiation, semantic evaluation, dialogue repair, and discourse
binding, are sufficient for a wide range of applications.
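
As a rough sketch of the event dispatch this abstract describes, the
following Python fragment routes the four event types to handlers on a
semantic object, in the GUI-like style the authors mention; all class,
method, and event names are illustrative assumptions, not the paper's
API.

    from enum import Enum, auto

    class DialogueEvent(Enum):
        INSTANTIATION = auto()        # dialogue object instantiation
        SEMANTIC_EVALUATION = auto()
        DIALOGUE_REPAIR = auto()
        DISCOURSE_BINDING = auto()

    class SemanticObject:
        """Encapsulates the knowledge for handling events in context."""
        def handle(self, event, context):
            # Dispatch the event the way a GUI routes events to widgets.
            handlers = {
                DialogueEvent.INSTANTIATION: self.on_instantiate,
                DialogueEvent.SEMANTIC_EVALUATION: self.on_evaluate,
                DialogueEvent.DIALOGUE_REPAIR: self.on_repair,
                DialogueEvent.DISCOURSE_BINDING: self.on_bind,
            }
            return handlers[event](context)

        def on_instantiate(self, context): pass
        def on_evaluate(self, context): pass
        def on_repair(self, context): pass
        def on_bind(self, context): pass
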
Authors:
Cosmin Popovici, I.C.I. - Institutul de Cercetari in Informatica (Romania)
Paolo Baggia, CSELT - Centro Studi e Laboratori Telecomunicazioni (Italy)
Pietro Laface, Politecnico di Torino (Italy)
Loreta Moisa, Politecnico di Torino (Italy)
Paper number 552
Abstract:
This paper exploits the broad concept of dialogue predictions by linking
a point in a human-machine dialogue with a specific language model
which is used during the recognition of the next user utterance. The
idea is to cluster several dialogue contexts into a class and to create
for each class a specific language model. We present an automatic
algorithm, based on the minimal decrease of mutual information, that
clusters the dialogue contexts. Moreover, the algorithm is able to
select an appropriate number of classes, one that gives a good
trade-off between the mutual information and the amount of training
data. The automatic classification therefore allows the fully
automatic creation of context-dependent language models for a spoken
dialogue system.
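
The clustering criterion can be pictured with a small greedy sketch:
repeatedly merge the pair of context classes whose merge loses the
least mutual information between class and next word. The counting
scheme and the fixed stopping rule below are our assumptions (the
paper additionally selects the number of classes automatically), so
this is an illustration, not the authors' algorithm.

    import math
    from itertools import combinations

    def mutual_information(counts):
        # counts maps (context_class, word) -> occurrence count.
        total = sum(counts.values())
        pc, pw = {}, {}
        for (c, w), n in counts.items():
            pc[c] = pc.get(c, 0) + n
            pw[w] = pw.get(w, 0) + n
        return sum((n / total) * math.log2(n * total / (pc[c] * pw[w]))
                   for (c, w), n in counts.items())

    def merge(counts, a, b):
        # Merge class b into class a.
        merged = {}
        for (c, w), n in counts.items():
            key = (a if c == b else c, w)
            merged[key] = merged.get(key, 0) + n
        return merged

    def cluster_contexts(counts, n_classes):
        classes = {c for c, _ in counts}
        while len(classes) > n_classes:
            # Pick the merge that keeps mutual information highest,
            # i.e. the minimal decrease of mutual information.
            _, a, b = max((mutual_information(merge(counts, a, b)), a, b)
                          for a, b in combinations(sorted(classes), 2))
            counts = merge(counts, a, b)
            classes.discard(b)
        return counts
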
Authors:
Ganesh N. Ramaswamy, I.B.M. Research Center (USA)
Jan Kleindienst, I.B.M. Research Center (USA)
Paper number 612
Abstract:
In this paper, we propose a trainable system that can automatically
identify the command boundaries in a conversational natural language
user interface. The proposed solution makes the conversational interface
much more user-friendly and allows the user to speak naturally and
continuously in a hands-free manner. The main ingredient of the system
is the maximum entropy identification model, which is trained using
data that has all the correct command boundaries marked. During training,
a set of features and their weights are selected iteratively using
the training data. The features consist of words and phrases, as well
as their positions relative to the potential command boundaries. Decoding
is done by examining the product of the weights for the features that
are present. We also propose several enhancements to the approach,
such as combining it with a more effective language model at the speech
recognition stage to generate additional tokens for the identification
model. We conducted several experiments to evaluate the proposed approach
and report the results.
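
The decoding step described above, a product of maximum entropy
feature weights at each candidate boundary, can be sketched as
follows; the feature encoding, window size, and threshold are our
illustrative choices, not the paper's.

    def boundary_score(weights, active_features):
        # Product of the weights of the features present at this point.
        score = 1.0
        for f in active_features:
            score *= weights.get(f, 1.0)   # unseen features stay neutral
        return score

    def extract_features(tokens, i, window=2):
        # Words and their positions relative to candidate boundary i.
        return [(tokens[j], j - i) for j in range(max(0, i - window),
                                                  min(len(tokens), i + window))]

    def find_boundaries(tokens, weights, threshold=1.0):
        return [i for i in range(1, len(tokens))
                if boundary_score(weights, extract_features(tokens, i)) > threshold]
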
Authors:
Massimo Poesio, University of Edinburgh, HCRC (U.K.)
Andrei Mikheev, Harlequin (U.K.)
Paper number 606
Abstract:
Recognizing the dialogue act(s) performed by means of an utterance
involves combining top-down expectations about the next likely 'move'
in a dialogue with bottom-up information extracted from the speech
signal. We compared two ways of generating expectations: one which
makes the expectations depend only on the previous act (as in a bigram
model), and one which also takes into account the fact that individual
dialogue acts play a role as part of larger conversational structures
('games'). Our models were built by training on the HCRC MapTask
corpus using the LTG implementation of maximum entropy estimation.
We achieved an accuracy of 38.6% using bigrams and 50.6% when taking
game structure into account; adding information about speaker change
resulted in accuracies of 41.8% with bigrams and 54% with game
structure. These
results indicate that exploiting game structure does lead to improved
expectations.
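
The two ways of generating expectations compare roughly as below,
where simple relative frequencies stand in for the paper's maximum
entropy estimation; the tuple encoding of turns and the flag names
are our assumptions.

    from collections import Counter, defaultdict

    def train_expectations(dialogues, use_games=False, use_speaker_change=False):
        # dialogues: lists of (act, game, speaker) tuples per conversation.
        counts = defaultdict(Counter)
        for turns in dialogues:
            prev_act, prev_spk = '<start>', None
            for act, game, spk in turns:
                ctx = (prev_act,)                  # bigram context
                if use_games:
                    ctx += (game,)                 # enclosing game label
                if use_speaker_change:
                    ctx += (spk != prev_spk,)      # did the speaker change?
                counts[ctx][act] += 1
                prev_act, prev_spk = act, spk
        # Normalize counts into P(next act | context).
        return {ctx: {a: n / sum(c.values()) for a, n in c.items()}
                for ctx, c in counts.items()}
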
Authors:
Paul C. Constantinides, Carnegie Mellon University (USA)
Scott Hansma, Carnegie Mellon University (USA)
Chris Tchou, Carnegie Mellon University (USA)
Alexander I. Rudnicky, Carnegie Mellon University (USA)
Paper number 637
Abstract:
Frame-based approaches to spoken language interaction work well for
limited tasks such as information access, given that the goal of the
interaction is to construct a correct query and then execute it. More complex
tasks, however, can benefit from more active system participation.
We describe two mechanisms that provide this: a modified stack that
allows the system to track multiple topics, and form-specific schemas
that allow the system to deal with tasks involving the completion of
multiple forms. Domain-dependent schemas specify system behavior and
are executed by a domain-independent engine. We describe implementations
for a personal calendar system and for an air travel planning system.
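
The modified stack can be pictured as one in which a topic raised
again later in the dialogue is promoted to the top rather than
forcing everything above it to be popped; the class below is an
illustrative reading of that mechanism, not the authors' code.

    class TopicStack:
        def __init__(self):
            self._topics = []

        def push(self, topic):
            self._topics.append(topic)

        def current(self):
            return self._topics[-1] if self._topics else None

        def promote(self, topic):
            # Returning to an earlier topic moves it to the top of the
            # stack instead of discarding the topics opened since.
            if topic in self._topics:
                self._topics.remove(topic)
            self._topics.append(topic)

        def pop(self):
            return self._topics.pop()
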
Authors:
Gregory Aist, Language Technologies Institute, Carnegie Mellon University (USA)
Paper number 928
Abstract:
Turn taking in spoken language systems has generally been push-to-talk
or strict alternation (user speaks, system speaks, user speaks, ...),
with some systems, such as telephone-based ones, handling barge-in
(interruption by the user). In this paper we describe our time-sensitive
conversational architecture for turn taking, which allows not only
alternating turns and barge-in but other conversational behaviors as
well. This
architecture allows backchanneling, prompting the user by taking more
than one turn if necessary, and overlapping speech. The architecture
is implemented in a Reading Tutor that listens to children read aloud
and helps them. We extended this architecture to allow the Reading
Tutor to interrupt the student based on a mistake the student has not
self-corrected ("content-driven interruption"). To the best of our knowledge, the
Reading Tutor is thus the first spoken language system to intentionally
interrupt the user based on the content of the utterance.
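
A content-driven interruption policy of the kind described here might
look like the loop below: if a reading mistake is not self-corrected
within a grace period, the system takes the turn. The callables, the
prompt, and the grace period are all our illustrative assumptions.

    import time

    def content_driven_interruption(hypotheses, speak, is_uncorrected_mistake,
                                    grace_seconds=2.0):
        noticed = None
        for hyp in hypotheses:                  # partial recognition results
            if is_uncorrected_mistake(hyp):
                noticed = noticed or time.time()
                if time.time() - noticed > grace_seconds:
                    speak("Let's go back and try that word again.")
                    noticed = None              # turn taken; reset
            else:
                noticed = None                  # reader self-corrected
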