Authors:
Masahiro Araki, Kyoto University (Japan)
Shuji Doshita, Kyoto University (Japan)
Page (NA) Paper number 729
Abstract:
In this paper, we propose a robust processing model of spoken dialogue.
Our dialogue model is a cognitive process model which (1) integrates
stepwise processing from utterance understanding to response generation,
(2) specifies the interactions between the processing of each step
and a two-level dialogue management mechanism, and (3) identifies the
possible errors caused by speech recognition errors and specifies methods
for recovering from them. We also examined the validity of this model
using a new evaluation paradigm: system-to-system dialogue with linguistic
noise. This evaluation shows that the proposed cognitive process model
is robust in situations with relatively low recognition error rates.
Authors:
Tom Brøndsted, Center for PersonKommunikation (Denmark)
Bo Nygaard Bai, Center for PersonKommunikation (Denmark)
Jesper Østergaard Olsen, Center for PersonKommunikation (Denmark)
Page (NA) Paper number 811
Abstract:
The paper describes the platform for building spoken language systems
being designed and implemented within the EU-language engineering project
REWARD. The platform collects and streamlines a set of software tools
such that they together constitute the basic modules needed to enable
dialogue developers to establish new dialogue applications with only
minimal knowledge outside their own field of experience and within
a minimum amount of time. The system differs from other platforms
in that non-expert users have been strongly involved in the design phase.
Authors:
Matthew Bull, Human Communication Research Centre, University of Edinburgh (U.K.)
Matthew Aylett, Human Communication Research Centre, University of Edinburgh (U.K.)
Page (NA) Paper number 790
Abstract:
This paper presents a context-based analysis of the intervals between
different speakers' utterances in a corpus of task-oriented dialogue
(the Human Communication Research Centre's Map Task Corpus). In the
analysis, we assessed the relationship between inter-speaker intervals
and various contextual factors, such as the effects of eye contact,
the presence of conversational game boundaries, the category of move
in an utterance, and the degree of experience with the task in hand.
The results of the analysis indicated that the main factors which
gave rise to significant differences in inter-speaker intervals were
those which related to decision-making and planning - the greater the
amount of planning, the greater the inter-speaker interval. Differences
between speakers were also found to be significant, although this effect
did not necessarily interact with all other effects. These results
provide unique and useful data for improving the effectiveness of dialogue
systems.
Authors:
Sarah Davies, Human Communication Research Centre, University of Edinburgh (U.K.)
Massimo Poesio, Human Communication Research Centre, University of Edinburgh (U.K.)
Page (NA) Paper number 813
Abstract:
In this paper we report on the development of a spoken dialogue system
for computer aided language learning (CALL), and explore some of the
issues involved in the incorporation of a corrective feedback module.
We initially developed a small prototype system, and tested it for
usability with visiting students of English as a foreign language.
In the light of the positive results we obtained for this, we began
to develop a more advanced system, with the aim of investigating how
spoken dialogue systems might best be tailored to help language learning.
The issue we focussed on was the kind of feedback on errors which might
be most useful to the learner. We show the types of feedback we have
considered, and highlight some of the problems associated with providing
different types of feedback.
Authors:
Laurence Devillers, LIMSI/CNRS (France)
Helene Bonneau-Maynard, LIMSI/CNRS (France)
Page (NA) Paper number 378
Abstract:
In this paper, we describe the evaluation of the dialog management
and response generation strategies being developed for retrieval of
tourist information, selected as a common domain for the ARC-AUPELF-B2-action.
Comparing and evaluating different strategies is a difficult task,
which often remains unexplored, because in most cases evaluation approaches
require a unified database structure and efficient integration of data
from several disparate sources and forms. To avoid this problem, we
implemented two dialog strategy versions within the same general platform.
We investigate qualitative and quantitative criteria for evaluation
of these dialog control strategies: in particular, by testing the efficiency
of our system with and without automatic mechanisms for guiding the
user via suggestive prompts. An evaluation phase with 32 naive and
experienced subjects was carried out to assess the utility of guiding
the user. The experiments show that user guidance is appropriate for
novices and appreciated by all users.
Authors:
Sadaoki Furui, Tokyo Institute of Technology (Japan)
Koh'ichiro Yamaguchi, Tokyo Institute of Technology (Japan)
Page (NA) Paper number 36
Abstract:
This paper introduces a paradigm for designing multimodal dialogue
systems. The task of an example system is to retrieve particular
information about different shops in the Tokyo Metropolitan area, such
as their names, addresses, and phone numbers. The system accepts speech
and screen touching as input, and presents retrieved information on
a screen display. The speech recognition part is modeled by the FSN
(finite state network) consisting of keywords and fillers, both of
which are implemented by the DAWG (directed acyclic word-graph) structure.
The number of keywords is 306, consisting of district names and business
names. The fillers accept roughly 100,000 non-keywords/phrases occurring
in spontaneous speech. A variety of dialogue strategies are designed
and evaluated based on an objective cost function having a set of actions
and states as parameters. Expected dialogue cost is calculated for
each strategy, and the best strategy is selected according to the keyword
recognition accuracy.
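The strategy selection described above can be sketched as an expected-cost comparison. The cost model, action costs, and probabilities below are hypothetical illustrations, not the paper's actual cost function:

```python
# Hypothetical expected-cost model for choosing a dialogue strategy.
# Costs and probabilities are illustrative, not from the paper.

def expected_cost(p_correct, confirm_cost, repair_cost, base_cost):
    """Expected cost of one exchange: the base action plus any
    confirmation, with a repair penalty paid whenever the keyword
    is misrecognized."""
    return base_cost + confirm_cost + (1.0 - p_correct) * repair_cost

def best_strategy(strategies, p_correct):
    """Select the strategy with the lowest expected dialogue cost at
    a given keyword recognition accuracy."""
    return min(strategies,
               key=lambda s: expected_cost(p_correct, s["confirm"],
                                           s["repair"], s["base"]))

strategies = [
    {"name": "always_confirm", "base": 1.0, "confirm": 1.0, "repair": 1.0},
    {"name": "never_confirm",  "base": 1.0, "confirm": 0.0, "repair": 4.0},
]

# With accurate recognition, skipping confirmation is cheaper; with
# poor recognition, confirming every keyword becomes the better choice.
print(best_strategy(strategies, 0.95)["name"])  # never_confirm
print(best_strategy(strategies, 0.60)["name"])  # always_confirm
```

The same comparison extends to any number of candidate strategies once their per-action costs are estimated.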
Authors:
Dinghua Guan, Institute of Acoustics, Chinese Academy of Sciences (China)
Min Chu, Institute of Acoustics, Chinese Academy of Sciences (China)
Quan Zhang, Institute of Acoustics, Chinese Academy of Sciences (China)
Jian Liu, Institute of Acoustics, Chinese Academy of Sciences (China)
Xiangdong Zhang, Institute of Acoustics, Chinese Academy of Sciences (China)
Page (NA) Paper number 245
Abstract:
This paper gives a brief introduction to the five-year research
project "Man-Computer Dialogue System in Chinese", which was supported
by the Chinese Academy of Sciences. The project is carried out in two
steps. In the first step, research was undertaken separately by several
research groups on core areas such as speech recognition, speech
synthesis, language understanding, and the dialogue organizing module.
In the second step, all techniques are assembled to form a demo dialogue
system for travel information inquiry. The current state of all the
above core areas and some evaluation results are discussed in the first
part of this paper, and the framework of the travel information inquiry
system is presented in the second part.
Authors:
Kate S. Hone, ICL Institute of Information Technology, University of Nottingham (U.K.)
David Golightly, ICL Institute of Information Technology, University of Nottingham (U.K.)
Page (NA) Paper number 519
Abstract:
An experiment was conducted to investigate the effects of vocabulary
constraints and syntax on human interactions with a speech interactive
system. Three dialogue styles for a telephone banking application,
all using constrained vocabularies, were compared: yes/no, menu and
query prompts. These styles differ both in the degree of vocabulary
constraint, and in how that constraint is communicated to the user.
It was found that, although it involved more dialogue steps, the yes/no
interaction style was the most effective in terms of both task completion
rates and performance time. The query strategy was least preferred
by users.
Authors:
Tatsuya Iwase, University of Tokyo (Japan)
Nigel Ward, University of Tokyo (Japan)
Page (NA) Paper number 224
Abstract:
To make human-computer dialog as `natural' as human-human dialog requires
paying attention to the timing of utterances. This is done with reference
to responses from the listener, in particular back-channel feedback,
questions and mumbles. On the basis of corpus analysis, we built a
direction-giving dialog system which adjusts the pace of the dialog
using only prosodic information; no word recognition is used. We devised
a method to evaluate how naturally a dialog system using only prosodic
information can talk to a human. To evaluate the naturalness of the
dialogs produced by our system, we ran three experiments with 10 subjects
each. The system accomplished natural dialog, and most subjects were
not aware that it was a computer. This fact, that
reasonably good performance was obtained by paying attention to prosodic
information alone, indicates the utility of using prosody in producing
appropriate timing in dialog. This confirms a commonly held belief.
Authors:
Annika Flycht-Eriksson, Department of Computer and Information Science, Linköping University (Sweden)
Arne Jönsson, Department of Computer and Information Science, Linköping University (Sweden)
Page (NA) Paper number 479
Abstract:
Spatial reasoning plays an important role in many spoken dialogue systems.
One application area where it is especially important is timetable
information for local bus traffic. Users of such systems often request
information based on vague spatial descriptions and a usable system
must be able to handle this. We have extended a dialogue system with
abilities to transform vague spatial expressions into a form that can
be used to access the information base. In our approach we use the
power of a Geographical Information System (GIS) for spatial reasoning.
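A minimal sketch of how a vague spatial expression such as "near the hospital" might be resolved into concrete stops via a GIS-style distance query; the stop names, coordinates, and radius below are invented for illustration:

```python
import math

# Hypothetical data: bus stops and landmarks as 2-D coordinates (km).
STOPS = {"Central Station": (0.0, 0.0),
         "Hospital North": (0.3, 0.4),
         "Airport": (5.0, 1.0)}
LANDMARKS = {"hospital": (0.0, 0.5)}

def stops_near(landmark, radius=1.0):
    """Resolve a vague expression like 'near the hospital' into the
    stops lying within a chosen radius of the landmark."""
    lx, ly = LANDMARKS[landmark]
    return sorted(name for name, (x, y) in STOPS.items()
                  if math.hypot(x - lx, y - ly) <= radius)

print(stops_near("hospital"))  # ['Central Station', 'Hospital North']
```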
Authors:
Candace A. Kamm, AT&T Labs - Research (USA)
Diane J. Litman, AT&T Labs - Research (USA)
Marilyn A. Walker, AT&T Labs - Research (USA)
Page (NA) Paper number 883
Abstract:
One challenge for current spoken dialogue systems is how to make the
limitations of the system (vocabulary, grammar, and application domain)
apparent to users. This study explored the use of a 4-minute tutorial
session to acquaint novice users with features of a spoken dialogue
system for accessing email. On three scenario-based tasks, novice users
who had the tutorial had task completion times and user satisfaction
ratings that were comparable to those of expert users. Novices who
did not experience the tutorial had significantly longer task completion
times on the initial task, but similar completion times to the tutorial
group on the final task. User satisfaction ratings of the no-tutorial
group were consistently lower than the ratings of the other two groups.
Evaluation using the PARADISE framework indicated that perceived task
completion, mean recognition score, and number of help requests were
significant predictors of user satisfaction with the system.
Authors:
Takeshi Kawabata, NTT Basic Research Laboratories (Japan)
Page (NA) Paper number 143
Abstract:
This paper proposes a new dialogue management architecture for human-machine
speech communication systems. In our daily speech communication, incremental,
non-deterministic and quick-response behaviors are required for effortless
information interchange. Emergent computational architectures, proposed
in the robot control domain, are promising for enabling such features.
The dialogue manager (ECL-DIALOG) consists of multiple "phrase pattern"
detectors as input sensors. The CFG-driven phrase detectors search
for phrase patterns in user utterances and generate numerous emergent
slot-filling signals. The system integrates them according to their
"phrase pattern" priorities and updates the current task-completion
context. When a slot value is updated, the system generates an appropriate
response. For example, when the system finds a new slot value in the
user's utterance, it generates a chiming-in utterance, "yeah". When
the context slot is replaced by a different value, "Tuesday", that has
a lower priority, the system asks for confirmation: "On Tuesday?".
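The priority-based slot update behaviour described above can be sketched roughly as follows; this is an illustrative reconstruction, not the ECL-DIALOG implementation, and the slot names and priorities are invented:

```python
# Illustrative priority-based slot update; slot names, priorities, and
# responses are invented examples.

def update_slot(context, slot, value, priority):
    """Fill a slot from an emergent phrase-detector signal and return
    the system's reaction: acknowledge a new or higher-priority value,
    but ask for confirmation before a lower-priority signal overwrites
    an already-filled slot."""
    current = context.get(slot)
    if current is None:
        context[slot] = (value, priority)
        return "yeah"                    # back-channel for a new value
    cur_value, cur_priority = current
    if value != cur_value and priority < cur_priority:
        return "On %s?" % value          # tentative replacement: confirm first
    context[slot] = (value, priority)
    return "yeah"

context = {}
print(update_slot(context, "day", "Monday", priority=2))   # yeah
print(update_slot(context, "day", "Tuesday", priority=1))  # On Tuesday?
```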
Authors:
Tadahiko Kumamoto, Communications Research Laboratory, MPT of Japan (Japan)
Akira Ito, Yamagata University (Japan)
Page (NA) Paper number 493
Abstract:
Many researchers have been developing natural language dialogue systems
as a human-friendly man-machine interface. However, the human factors
in man-machine dialogue, in particular how people talk with a dialogue
system, are not yet well understood. We analyzed, at the utterance
and dialogue levels, the 141 dialogues which our dialogue system had
in DiaLeague '97, the second dialogue contest in which a dialogue system
engaged in a dialogue with a human in order to solve a specific problem.
For the analyses at the utterance level,
we investigated the users' speaking styles, the richness of the users'
utterances in a variety of surface patterns, and the influence of the
system's utterance pattern on the users' utterance. For the analyses
at the dialogue level, we investigated the instances of confusion observed
in the 141 dialogues and we also show how the users behaved when the
confusion occurred.
Authors:
Michael F. McTear, University of Ulster (U.K.)
Page (NA) Paper number 545
Abstract:
The development of a spoken dialogue system is a complex process involving
the integration of several component technologies. Various toolkits
and authoring environments have been produced that provide assistance
with this process. This paper reports on several projects involving
CSLU's RAD (Rapid Application Developer) and critically evaluates the
applicability of state transition diagrams for modelling different
types of spoken dialogue. State transition methods have been recommended
for dialogues that involve well-structured tasks that can be mapped
directly onto a dialogue structure. However, other significant factors
to be considered include the structure of the information to be transacted
and the need for verification of the user's input as determined by
the system's level of recognition accuracy. Examples of different types
of dialogue are presented together with recommendations concerning
the advantages and disadvantages of state transition based dialogue
control.
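A state transition dialogue of the well-structured kind discussed above can be sketched as a small graph; the states, prompts, and input categories below are invented examples, not taken from RAD:

```python
# Invented example of state-transition dialogue control: each state
# maps to a prompt and a table of transitions keyed by the category
# of the recognized user input.

DIALOGUE = {
    "ask_from": ("Where are you travelling from?", {"city": "ask_to"}),
    "ask_to":   ("Where are you travelling to?",   {"city": "confirm"}),
    "confirm":  ("Shall I book that journey?",     {"yes": "done",
                                                    "no": "ask_from"}),
}

def run(inputs, state="ask_from"):
    """Walk the dialogue graph, consuming one classified input per
    turn; unrecognized inputs re-enter the same state (re-prompt)."""
    prompts = []
    for user_input in inputs:
        prompt, transitions = DIALOGUE[state]
        prompts.append(prompt)
        state = transitions.get(user_input, state)
        if state == "done":
            break
    return prompts, state

prompts, final_state = run(["city", "city", "yes"])
print(final_state)  # done
```

This directness is what makes the approach attractive for well-structured tasks, and what becomes limiting when verification needs depend on recognition accuracy.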
Authors:
Michio Okada, ATR Media Integration & Communications Research Laboratories (Japan)
Noriko Suzuki, ATR Media Integration & Communications Research Laboratories (Japan)
Jacques Terken, IPO, Center for Research on User-System Interaction, Eindhoven University of Technology (The Netherlands)
Page (NA) Paper number 801
Abstract:
In this paper, we present a general framework and architecture for
maintaining dialogue coordination in spoken dialogue systems, in which
intended behaviors and goals are incrementally performed during the
course of maintaining dialogue coordination. The dialogue structure
emerges as a result of the interaction between the user and the dialogue
system. The key feature of this design is the use of multiple
situated-agents for coordinating communicative acts that are realized
as a hierarchy of autonomous behaviors by using a subsumption architecture.
In this architecture it should be noted that the lower-level behaviors
act autonomously for maintaining the dialogue coordination and are
linked to the specifications from higher-level behaviors for dialogue
management. In order to make the behavior of the system social, in
general, the maintaining of dialogue coordination takes priority over
the realization of intended goals of the system as a dialogue participant.
We introduce an under-specification strategy for controlling the preference
of the concurrent behaviors. This is in contrast to the classical,
top-down approach to dialogue coordination.
Authors:
Xavier Pouteau, IPO, Center for Research on User-System Interaction (The Netherlands)
Luis Arévalo, Robert Bosch GmbH, Corporate R&D (Germany)
Page (NA) Paper number 368
Abstract:
In this paper, we report the significant results of a fully-implemented
voice operated dialogue system, and particularly its main component:
the Dialogue Manager (DM). Like other interfaces, spoken interfaces
require careful design, implying a good analysis of the users' needs
throughout the dialogue. The VODIS project has led to the design and
development of a spoken interface for the control of car equipment.
Given the workload caused by the task of driving the vehicle, spoken
communication provides a potentially safe and efficient mode of operating
the car equipment. We present the main characteristics of the task
model specified during the design stage, and show how its specific
features related to spoken communication allowed us to implement a
robust dialogue.
Authors:
Daniel Willett, Gerhard-Mercator-University, Duisburg (Germany)
Arno Römer, Gerhard-Mercator-University, Duisburg (Germany)
Jörg Rottland, Gerhard-Mercator-University, Duisburg (Germany)
Gerhard Rigoll, Gerhard-Mercator-University, Duisburg (Germany)
Page (NA) Paper number 524
Abstract:
In this paper, we present the basic design principles and architecture
of a dialogue system for scheduling appointments. This mixed-initiative
dialogue system integrates an automatic speaker-independent speech
recognition engine for continuously spoken German, a speech synthesizer
and a scheduler database application to build up a scheduler that is
purely driven by natural continuous speech and thus does not need
any visual display device. With these properties, it is a prototype
for a speech-driven palm-size computer application and could be integrated
into miniature computers that come with no display device at all.
Authors:
Chung-Hsien Wu, National Cheng Kung University (Taiwan)
Gwo-Lang Yan, National Cheng Kung University (Taiwan)
Chien-Liang Lin, National Cheng Kung University (Taiwan)
Page (NA) Paper number 219
Abstract:
In a spoken dialogue system, the intention is the most important component
for speech understanding. In this paper, we propose a corpus-based
hidden Markov model (HMM) to model the intention of a sentence. Each
intention is represented by a sequence of word segment categories determined
by a task-specific lexicon and a corpus. In the training procedure,
five intention HMMs are defined, each representing one intention.
In the intention identification process, the phrase sequence is fed
to each intention HMM. Given a speech utterance, the Viterbi algorithm
is used to find the most likely intention sequence. The
intention HMM considers not only the phrase frequency but also the
syntactic and semantic structure in a phrase sequence. In order to
evaluate the proposed method, a spoken dialogue model for air travel
information service is investigated. The experiments were carried out
using a test database from 25 speakers (15 male and 10 female). There
are 120 dialogues, which contain 725 sentences in the test database.
The experimental results show that the correct response rate reaches
about 80.3% using the intention HMMs.
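The intention identification step can be illustrated with a toy version of the per-intention HMM scoring; the states, categories, and probabilities below are invented, and the real system's lexicon and training differ:

```python
import math

def viterbi_log_prob(obs, states, start, trans, emit):
    """Log-probability of the best state path for one observation
    sequence under one HMM (standard Viterbi recursion)."""
    v = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    for o in obs[1:]:
        v = {s: max(v[r] + math.log(trans[r][s]) for r in states)
                + math.log(emit[s][o]) for s in states}
    return max(v.values())

def identify_intention(obs, intention_hmms):
    """Score the word-segment-category sequence under each intention
    HMM and return the highest-scoring intention."""
    return max(intention_hmms,
               key=lambda name: viterbi_log_prob(obs, *intention_hmms[name]))

# Two toy intentions over the categories "city" and "date".
states = ("s1", "s2")
start = {"s1": 0.9, "s2": 0.1}
trans = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.4, "s2": 0.6}}
hmms = {
    "ask_flight": (states, start, trans,
                   {"s1": {"city": 0.7, "date": 0.3},
                    "s2": {"city": 0.2, "date": 0.8}}),
    "ask_fare":   (states, start, trans,
                   {"s1": {"city": 0.3, "date": 0.7},
                    "s2": {"city": 0.8, "date": 0.2}}),
}

print(identify_intention(["city", "city", "date"], hmms))  # ask_flight
```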
Authors:
Peter Wyard, BT Labs (U.K.)
Gavin Churcher, BT Labs (U.K.)
Page (NA) Paper number 556
Abstract:
This paper describes a Wizard of Oz (WOZ) system that allows the realistic
simulation of a multimodal spoken language system. A Wizard protocol
has been drawn up which means that the WOZ system will simulate the
limitations of an automatic system rather than allow the user to engage
in the full range of human-human dialogue. In support of this protocol
is a sophisticated Wizard response panel and underlying response generation
functionality. This enables the Wizard to respond to complex multimodal
inputs in near real-time. The chosen application is a 3D retail service,
in which users can select furnishings from a database according to
colour, pattern, fabric type, etc., transfer furnishings to objects
in a virtual showroom, ask about prices and matching of fabrics, etc.
The system includes a "virtual assistant", i.e. a synthetic persona
which speaks the verbal system output. Users make their input by a
combination of fluent speech and touchscreen input. The paper describes
a formal trial carried out with the WOZ system, and discusses the results.
Authors:
Yen-Ju Yang, Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan (Taiwan)
Lin-Shan Lee, Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan (Taiwan)
Page (NA) Paper number 528
Abstract:
This paper presents a syllable-based Chinese spoken dialogue system
for telephone directory services primarily trained with a corpus. It
integrates automatic phrase extraction, robust phrase spotting, statistics-based
semantic parsing by phrase-concept joint language model as well as
concept-based dialogue model, and intention identification by probabilistic
finite state network to form a speech intention estimator. By applying
the proposed techniques, the concept sequence conveyed in the user's
spoken sentence, i.e. the speaker's intention, can be identified with
the maximum a-posteriori (MAP) probability based on intra- and
inter-sentence considerations. This approach is convenient to train
from a given corpus and flexible to port to different dialogue tasks.
Incorporating a mixed-initiative, goal-oriented dialogue manager, we
have successfully developed a dialogue system for telephone directory
service. Very promising results have been obtained in on-line tests.
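The MAP selection of a concept sequence, combining an intra-sentence phrase-concept score with an inter-sentence concept-transition score, can be sketched as follows; all concepts, phrases, and probabilities are invented for illustration:

```python
import itertools
import math

# Toy phrase-given-concept emissions and concept-transition probabilities.
CONCEPTS = ("NAME", "DEPT")
EMIT = {"NAME": {"chang": 0.8, "computer science": 0.2},
        "DEPT": {"chang": 0.1, "computer science": 0.9}}
TRANS = {"<s>":  {"NAME": 0.5, "DEPT": 0.5},
         "NAME": {"NAME": 0.2, "DEPT": 0.8},
         "DEPT": {"NAME": 0.7, "DEPT": 0.3}}

def sequence_score(seq, phrases, emit, trans, prev="<s>"):
    """Joint log-score: inter-sentence concept transitions combined
    with intra-sentence phrase-given-concept emissions."""
    score = 0.0
    for concept, phrase in zip(seq, phrases):
        score += math.log(trans[prev][concept]) + math.log(emit[concept][phrase])
        prev = concept
    return score

def map_concept_sequence(phrases, concepts=CONCEPTS, emit=EMIT, trans=TRANS):
    """Exhaustively score every concept labelling and return the MAP one."""
    return max(itertools.product(concepts, repeat=len(phrases)),
               key=lambda seq: sequence_score(seq, phrases, emit, trans))

print(map_concept_sequence(["chang", "computer science"]))  # ('NAME', 'DEPT')
```

A real system would replace the exhaustive search with dynamic programming, but the argmax being computed is the same.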
Authors:
Hiroyuki Yano, Communications Research Laboratory (Japan)
Akira Ito, Faculty of Engineering, Yamagata University (Japan)
Page (NA) Paper number 719
Abstract:
Analysis was made of disagreement expressions in dialogues recorded
in a cooperative task experiment. A disagreement expression is defined
as the latter of two consecutive utterances when it shows disagreement
with the former. Subjects used two types of disagreement expressions:
disagreement with the partner's utterance and disagreement with their
own. These were classified
into three subtypes according to part of speech: conjunction, interjection,
and content word. The role of disagreement expressions in cooperative
tasks was examined. It was found that subjects used disagreement expressions
suited to the occasion in order to maintain good relations with their partners.
It was concluded that using expressions that disagree with one's own
previous utterance is an effective strategy for expressing an opinion
for which one lacks adequate evidence and for eliciting utterances
from one's partner.