ABSTRACT
A methodology for creating and managing an integrated database for spoken dialog systems is proposed. Using the example of a telecommunication service application, details of organizing, maintaining, and visualizing the dialog system data are presented. Examples illustrating the use of the unified database structure for dialog reproduction and performance evaluation are provided.
ABSTRACT
This paper presents a case study analyzing the results of an on-going trial of a prototype mixed-initiative spoken dialog system for telephony control and messaging. System usage and performance data were captured at three points in time. Information from multiple data sources, including spoken utterances, system call logs, speech recognizer output, and subjective surveys was evaluated to determine the relationship between aspects of system performance and user perceptions of the system. This report provides several examples using these data sources in combination to identify key areas to focus on in modifying the system, application, and/or user interface in order to significantly improve system usability and user satisfaction.
ABSTRACT
Given the limitations of speech recognition for the development of user-system dialogues in real applications, robustness is a primary objective. In this paper, we describe the most essential characteristics of the Dialogue Manager of a voice-controlled driver information system, mainly showing how its design has been driven by the characteristics of voice in such a dialogue. We present the main methods used by the Dialogue Manager to strike an effective balance between robustness and efficiency. We illustrate them with examples from the first implementation of the system.
ABSTRACT
Speech repairs introduce much noise into spoken language processing. Properly correcting speech repairs can help the speech recognizer avoid textual errors, and prevent interpretation errors during subsequent processing. Because the task of repair processing cannot be deferred to the later stages (word segmentation, part-of-speech tagging, and sentence parsing), this paper employs acoustic and prosodic cues to correct Chinese repetition and addition repairs. The experimental results show that a precision rate of 93.87% (76.09%) and a recall rate of 90.65% (70%) can be achieved for correcting Chinese repetition (addition) repairs.
ABSTRACT
Natural language interaction requires dialogue models that allow for efficient and robust human-computer interaction. Most systems today use some kind of speech-act-based dialogue model. While successful in a number of applications, these models have known limitations, from both linguistic and computational points of view, which has led a number of researchers to suggest using the dialogue participants' goals/intentions to model the dialogue. In this paper we suggest that amending speech-act-based models with sophisticated domain knowledge makes it possible to extend their applicability. Two kinds of domain knowledge are identified: one is the Domain Model, a structure of the discourse 'world'; the other is the Conceptual Model, which contains domain-specific general information about the concepts and their relationships in the domain. These extensions have been utilized in the LINLIN dialogue manager, and the paper presents results from customizing the dialogue manager to two different applications.
ABSTRACT
This paper reports experimental results comparing a mixed-initiative to a system-initiative dialog strategy in the context of a personal voice email agent. To independently test the effects of dialog strategy and user expertise, users interact with either the system-initiative or the mixed-initiative agent to perform three successive tasks which are identical for both agents. We report performance comparisons across agent strategies as well as over tasks. This evaluation utilizes and tests the PARADISE evaluation framework, and discusses the performance function derivable from the experimental data.
ABSTRACT
Discourse markers, also known as cue words, are used extensively in human-human task-oriented dialogs to signal the structure of the discourse. Previous work showed their importance in monologues for marking discourse structure, but little attention has been paid to their importance in spoken dialog systems. This paper investigates what discourse markers signal about the upcoming speech, and when they tend to be used in task-oriented dialog. We demonstrate that there is a high correlation between specific discourse markers and specific conversational moves, between discourse marker use and adjacency pairs, and between discourse markers and the speaker's orientation to information presented in the prior turn.
ABSTRACT
This paper describes our initial implementation of a system to provide world-wide weather information over the telephone. The information is gathered from several different sites on the Web, preprocessed, and cached locally into a relational database to make access both fast and selective. Our natural language tools, originally developed for processing user queries, are used here for understanding content, and for subsequently translating it into languages other than English. The system is operational, and we have been collecting data from real users via a toll-free number. We report here on an initial evaluation both of the full system in English and of the quality of the responses in German.
ABSTRACT
This paper demonstrates some aspects of a plan processor which is a subcomponent of the dialogue module of Verbmobil. We describe how we transfer results from the research area of grammar extraction for the semi-automatic acquisition of plan operators for turn classes. We exploit statistical knowledge acquired while learning the grammar and incorporate top-down predictions to enhance the correct analysis of the turn classes described. A first evaluation shows a relative recognition rate of around 70% on unseen data.
ABSTRACT
Pragmatically important information, such as dialogue acts that describe the illocution of an utterance, depends in traditional processing approaches on error-prone syntactic/semantic processing. We present a statistically based method for dialogue act classification that takes word strings as input. An experimental evaluation shows that this method can be successfully used to determine dialogue acts. The overall recognition rate in the experiments is in the range of 65%-67% for German test data, and 74% for an experiment with English dialogues.
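The abstract does not specify the statistical model; a minimal sketch of one way to classify dialogue acts directly from word strings is a smoothed naive-Bayes unigram classifier. The training pairs, act labels, and utterances below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (word string, dialogue act) pairs.
TRAIN = [
    ("hello good morning", "GREET"),
    ("good day hello", "GREET"),
    ("does monday suit you", "SUGGEST"),
    ("how about tuesday afternoon", "SUGGEST"),
    ("yes monday is fine", "ACCEPT"),
    ("yes that suits me fine", "ACCEPT"),
]

def train(pairs):
    """Collect unigram counts per dialogue act."""
    word_counts = defaultdict(Counter)
    act_counts = Counter()
    vocab = set()
    for words, act in pairs:
        act_counts[act] += 1
        for w in words.split():
            word_counts[act][w] += 1
            vocab.add(w)
    return word_counts, act_counts, vocab

def classify(utterance, word_counts, act_counts, vocab):
    """Pick the act maximizing P(act) * prod P(word | act), add-one smoothed."""
    total = sum(act_counts.values())
    best_act, best_score = None, float("-inf")
    for act, n in act_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[act].values()) + len(vocab)
        for w in utterance.split():
            score += math.log((word_counts[act][w] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act

wc, ac, vocab = train(TRAIN)
print(classify("hello there", wc, ac, vocab))
```

A real system would of course train on annotated corpora of the kind described in the abstract rather than on toy pairs.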
ABSTRACT
This paper deals with a problem not yet deeply studied: the interaction of user goals. A situation of multiple goals occurs as soon as the user utters a new goal while the previous one has not yet been solved. We propose an algorithm to identify the kind of multiple goals according to the task state and to the goals themselves. We define ten strategies to process those situations. Three meta-strategies order the strategies relevant for given situations. The system checks the preconditions of strategies to make sure they can be triggered. When a strategy is applied, the system updates the dialogue history and the task state. Some strategies push a goal onto a stack and pop it when the first processed goal is fully reached.
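The stack-based family of strategies mentioned at the end of the abstract can be sketched as follows; the class and method names are mine, not the paper's, and the ten strategies and their preconditions are not modeled here.

```python
# Sketch: a strategy suspends the unresolved current goal on a stack
# when the user utters a new one, and restores it once the interrupting
# goal is fully reached.

class GoalManager:
    def __init__(self):
        self.current = None
        self.stack = []  # suspended goals

    def utter_goal(self, goal):
        """User utters a new goal; suspend the unresolved current one."""
        if self.current is not None:
            self.stack.append(self.current)
        self.current = goal

    def goal_reached(self):
        """Current goal fully reached: resume the most recently suspended goal."""
        done = self.current
        self.current = self.stack.pop() if self.stack else None
        return done

gm = GoalManager()
gm.utter_goal("book flight")
gm.utter_goal("check weather")       # interrupts the unfinished first goal
assert gm.goal_reached() == "check weather"
assert gm.current == "book flight"   # popped back as the active goal
```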
ABSTRACT
Conventional spoken dialogue systems are based on goal-oriented techniques (8). The recent expansion of application fields such as cyberspace, the Internet, etc. necessitates the creation of new interaction styles between humans and autonomous agents. Interaction with autonomous agents creates new possibilities for spontaneous conversation in spoken dialogue systems. Within this context, we regard spontaneous, informal chatting behavior as one aspect of spoken dialogue (4)(5). According to this view, an essential property of chatting is the emergence of topics and goals situated within the context of interactions among participants rather than as the result of explicit goals. In this paper, we propose a spoken dialogue system with chatting properties and illustrate sample chatting between a human and a virtual interface agent called Talking Eye using a prototype system.
ABSTRACT
We present a generic template for spoken dialogue systems integrating speech recognition and synthesis with 'higher-level' natural language dialogue modelling components. The generic model is abstracted from a number of real application systems targeted at very different domains. Our research aim in developing this generic template is to investigate a new approach to the evaluation of Dialogue Management Systems. Rather than attempting to measure accuracy/speed of output, we propose principles for the evaluation of the underlying theoretical linguistic model of Dialogue Management in a given system, in terms of how well it fits our generic template for Dialogue Management Systems. This is a measure of 'genericness' or 'application-independence' of a given system, which can be used to moderate accuracy/speed scores in comparisons of very unlike DMSs serving different domains. This relates to (but is orthogonal to) Dialogue Management Systems evaluation in terms of naturalness and similar measurable metrics; it follows more closely emerging qualitative evaluation techniques for NL grammatical parsing schemes.
ABSTRACT
This paper proposes and mathematically analyzes an interactive strategy to recover from misrecognition of utterances including multiple information items through a short conversation with a speaker. First, the speech recognizer in a dialogue system recognizes an utterance and evaluates the reliability of each item contained in it. The dialogue system accepts only those items whose reliability is high, while it rejects the items which are unreliably recognized, or confirms their content. Given the performance of the recognizer, the paper derives two quantities, P_ac and N, which describe the performance of the dialogue system using this interactive strategy: P_ac is the probability that all information items included in the user's utterance are conveyed to the system correctly, and N is the average number of turns taken between the user and the system until all the items are accepted.
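The paper's actual derivation of P_ac and N is not reproduced in the abstract; the following Monte Carlo sketch estimates both quantities under a simplified model of my own: each turn, every still-pending item is independently accepted correctly with probability AC, falsely accepted with probability AW, and otherwise rejected and re-asked the next turn.

```python
import random

AC, AW, K = 0.7, 0.1, 3   # illustrative per-item probabilities, K items

def simulate(trials=200_000, seed=0):
    """Estimate P_ac (all items conveyed correctly) and N (mean turns)."""
    rng = random.Random(seed)
    ok, turns_total = 0, 0
    for _ in range(trials):
        pending, all_correct, turns = K, True, 0
        while pending:
            turns += 1
            still = 0
            for _ in range(pending):
                u = rng.random()
                if u < AC:
                    pass                 # accepted correctly: item done
                elif u < AC + AW:
                    all_correct = False  # false accept: item done but wrong
                else:
                    still += 1           # rejected: retry next turn
            pending = still
        ok += all_correct
        turns_total += turns
    return ok / trials, turns_total / trials

p_ac, n = simulate()
```

Under these assumptions each item ends up correct with probability AC/(AC+AW), so the estimate of P_ac should be close to (0.7/0.8)**3 ≈ 0.67; the paper's own model may differ in how confirmations are counted as turns.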
ABSTRACT
The reliability of automatic speech recognition systems depends mainly on the local perplexity of the language to be recognised. In the framework of vocal command dialogue systems, we propose an approach based on pragmatics, mainly through a precise treatment of referential expressions, which we use in order to dynamically reduce the local perplexity that the recognition process is confronted with. Therefore, we take into account not only the left context of the current hypothesis but also the state of the application. The article justifies the architecture we propose, describes the treatments, and shows the resulting reduction of perplexity when using contextual information as compared to that obtained when using only semantic information. Keywords: vocal command system - natural language - pragmatics - language perplexity - reference calculus
ABSTRACT
In this paper we present the Linguistic Analysis Component of a Spoken Dialogue System designed for robustness and flexibility. The dialogue takes place in the Greek language through the public telephone network and is performed in two different applications. The analysis is based on Island Parsing, Pattern Matching and Frame-based Representation techniques. The main knowledge sources are a Semantic Network and Frame-Slot structures thoroughly connected with each other. Simple bigram grammar rules have also been used to assist the parsing process as well as to evaluate the recognition output.
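One common way bigram grammar rules can evaluate recognition output is to rank competing hypotheses by their smoothed bigram log-probability; the sketch below illustrates the idea with invented English sentences in place of the Greek application data.

```python
import math
from collections import Counter

# Toy training corpus standing in for application utterances.
CORPUS = [
    "i want a ticket to athens",
    "i want to travel to athens",
    "a ticket to patras please",
]

bigrams, unigrams, vocab = Counter(), Counter(), set()
for sent in CORPUS:
    words = ["<s>"] + sent.split()
    vocab.update(words)
    for a, b in zip(words, words[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def score(hyp):
    """Add-one smoothed bigram log-probability of a hypothesis string."""
    words = ["<s>"] + hyp.split()
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(vocab)))
        for a, b in zip(words, words[1:])
    )

# The well-formed hypothesis outscores the garbled recognizer output.
hyps = ["i want a ticket to athens", "i font a ticket too athens"]
best = max(hyps, key=score)
```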
ABSTRACT
The capability profiles of commercial automatic speech recognition (ASR) systems are rapidly improving in terms of vocabulary size, noise robustness and user population. Most contemporary applications of ASR use interfaces relying solely on the speech mode of interaction (over telephone channels for example). Many applications will, however, benefit from using speech input in conjunction with other interaction devices such as trackballs, keyboards and touch-screens. In this paper, we present an interface modelling approach based on a critical path analysis of the interface design. The approach has been developed to model multi-modal interactions using combinations of input devices. Degradation of unit performances allows the effects of environmental factors on the overall interface performance to be predicted. The model is verified by comparison with experimental trials carried out on a number of multi-modal applications. It is demonstrated that the model is able to predict the main performance metric (task completion time) to within 10% of the experimental values.
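The core of a critical path analysis is that predicted task completion time is the longest dependency-ordered chain of unit interactions; the task names, durations, and dependency structure below are invented to illustrate the computation, not taken from the paper.

```python
# Critical-path sketch: earliest finish of each unit task, where a task
# starts only after all of its predecessors have finished.

def completion_time(durations, deps):
    """Overall completion time = max earliest-finish over all tasks."""
    finish = {}
    def ef(task):
        if task not in finish:
            start = max((ef(d) for d in deps.get(task, [])), default=0.0)
            finish[task] = start + durations[task]
        return finish[task]
    return max(ef(t) for t in durations)

# e.g. speaking a command while moving a trackball in parallel;
# the slower (speech) branch dominates the predicted time.
durations = {"speak": 1.8, "recognize": 0.9, "point": 1.2, "confirm": 0.5}
deps = {"recognize": ["speak"], "confirm": ["recognize", "point"]}
t = completion_time(durations, deps)   # speak -> recognize -> confirm
```

Degrading a unit duration (say, slower recognition in noise) and recomputing gives the predicted effect of environmental factors on the whole interface.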
ABSTRACT
We analyze what functions as a YES response and a NO response for different yes/no questions. This problem is surprisingly complex: respondents do not always produce overt yes or no lexical items in response to a yes/no question. In addition, when respondents don't include a clear yes or no word, they may mean to communicate a clear YES or NO meaning, or something else. We find that the classification of yes/no questions described in (Carletta et al., 1995) for the Edinburgh map task corpus correlates well with whether a response will be a bare yes or no, a yes or no plus additional speech, or just speech without an overt yes or no. Correlation with responses described simply as "direct" or "indirect" is less good. We also find that, under the three-way categorization, the strength of a question's expectation for a YES response predicts the form of the response.
ABSTRACT
The Dialogue Model Learning Environment supports an engineering-oriented approach towards dialogue modelling for a spoken-language interface. A major step towards dialogue models is to know about the basic units that are used to construct a dialogue model and their possible sequences. In contrast to many other approaches, the set of dialogue acts is not predefined by any theory or manually during the engineering process, but is learned from data that are available in an advised spoken dialogue system. The architecture is outlined and the approach is applied to the domain of appointment scheduling. Even though based on a word correctness of about 70%, the predictability of dialogue acts in Dia-MoLE turns out to be comparable to human-assigned dialogue acts.
ABSTRACT
This paper describes some studies on the effect of the system vocabulary on the lexical choices of the users. There are many theories about human-human dialogues that could be useful in the design of spoken dialogue systems. This paper gives an overview of some of these theories and reports the results from two experiments that examine one of these theories, namely lexical entrainment. The first experiment was a small Wizard-of-Oz test that simulated a tourist information system with a speech interface, and the second experiment simulated a system with speech recognition that controlled a questionnaire about people's plans for their vacation. Both experiments show that the subjects mostly adapt their lexical choices to the system questions. Only in less than 5% of the cases did they use an alternative main verb in the answer. These results encourage us to investigate the possibility of adding an adaptive language model to the speech recognizer in our dialogue system, in which the probabilities of the words used in the system questions are increased.
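The adaptation proposed in the last sentence can be sketched at the unigram level; the boost factor and renormalization scheme here are my assumptions, not the authors' design.

```python
# Sketch: before recognizing the user's answer, raise the probability of
# words that occurred in the system's question, then renormalize so the
# language model still sums to one.

def adapt(lm, system_question, boost=3.0):
    """Multiply probabilities of question words by `boost`, renormalize."""
    q_words = set(system_question.lower().split())
    raised = {w: p * (boost if w in q_words else 1.0) for w, p in lm.items()}
    z = sum(raised.values())
    return {w: p / z for w, p in raised.items()}

# Toy unigram model over candidate main verbs (values invented).
lm = {"depart": 0.2, "leave": 0.2, "arrive": 0.2, "stay": 0.2, "go": 0.2}
adapted = adapt(lm, "When do you depart ?")
assert adapted["depart"] > adapted["leave"]   # entrained verb is now favored
```

A deployed recognizer would apply the same idea to its n-gram model per dialogue state rather than to a flat unigram table.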