ABSTRACT
We report the results of three experiments using the errorful output of a large vocabulary continuous speech recognition (LVCSR) system as the input to a statistical information retrieval (IR) system. Our goal is to allow a user to speak, rather than type, query terms into an IR engine and still obtain relevant documents. The purpose of these experiments is to test whether IR systems are robust to errors introduced into the query terms by the speech recognizer. If the information from the correctly recognized words in the search query outweighs the misinformation from the incorrectly recognized words, the relevant documents will still be retrieved. This paper presents evidence that speech-driven IR can be effective, although with reduced precision. We also find that longer spoken queries produce higher-precision retrieval than shorter queries. For queries containing many (50-60) search terms and a recognizer word error rate (WER) of 27.9%, the precision at 30 documents retrieved is degraded by only 11.1%. At roughly the same WER, however, we find that queries shorter than 10-15 words suffer more than a 30% loss of precision.
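As a quick illustration of the retrieval metric used above, the following sketch (not from the paper; the precision values are hypothetical) computes precision at k and the relative degradation between a typed and a spoken query:

```python
# Minimal sketch: precision at k and its relative degradation when a
# spoken query replaces a typed one. All numbers below are invented.

def precision_at_k(retrieved_ids, relevant_ids, k=30):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    return sum(1 for d in top_k if d in relevant_ids) / k

def relative_degradation(p_text, p_speech):
    """Relative precision loss of the spoken query vs. the typed query."""
    return (p_text - p_speech) / p_text

# Example: a long spoken query losing ~11.1% precision at 30 documents.
p_text, p_speech = 0.60, 0.5334
print(f"{relative_degradation(p_text, p_speech):.1%}")  # -> 11.1%
```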
ABSTRACT
This paper describes a design and feasibility study for a large-scale automatic directory information system with a scalable architecture. The current demonstrator, called PADIS-XL, operates in real time and handles a database of a medium-size German city with 130,000 listings. The system uses a new technique of taking a combined decision on the joint probability over multiple dialogue turns, and a dialogue strategy that progressively restricts the search space with every dialogue turn. During the course of the dialogue, the last name of the desired subscriber must be spelled out. The spelling recognizer permits continuous spelling and uses a context-free grammar to parse common spelling expressions. This paper describes the system architecture, our maximum a-posteriori (MAP) decision rule, the spelling grammar, and the dialogue strategy. We give results on the SPEECHDAT and SIETILL databases for recognition of first names by spelling and for jointly deciding on the spelled and the spoken name. In a 35,000-name setup, the joint decision reduced name-recognition errors by 31%.
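The exact MAP decision rule is given in the paper itself; the following is only a rough sketch of the underlying idea, with invented scores, showing how per-turn log-likelihoods for the spelled and the spoken rendering of a name could be summed before a single joint decision is taken:

```python
import math

# Illustrative sketch only, not the PADIS-XL rule: accumulate per-turn
# log-likelihoods for each candidate name and decide jointly, instead of
# committing to a 1-best hypothesis at every turn.

def joint_map_decision(candidates, turn_scores):
    """
    candidates : names still consistent with the dialogue so far
    turn_scores: one dict per turn, mapping name -> log P(obs | name)
    Returns the name maximizing prior + summed log-likelihood.
    """
    def joint_log_prob(name):
        prior = math.log(1.0 / len(candidates))   # uniform prior assumed
        return prior + sum(scores.get(name, float("-inf"))
                           for scores in turn_scores)
    return max(candidates, key=joint_log_prob)

# Two turns: a spoken-name pass and a spelled-name pass (scores invented).
spoken  = {"meier": -4.1, "mayer": -4.3, "maier": -4.2}
spelled = {"meier": -2.0, "mayer": -5.5, "maier": -2.4}
print(joint_map_decision(["meier", "mayer", "maier"], [spoken, spelled]))
# -> "meier": the spelled evidence disambiguates the near-homophones
```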
ABSTRACT
This paper presents and discusses a strategy for mixed-initiative dialogue management within a home banking application. The strategy tries to utilise the guidance of system-directed dialogues while accommodating user-initiated focus shifts through the inclusion of short-cuts in the dialogue. The paper reports on two experiments, one with a simulated speech recogniser (WOZ) and the second with a fully automated system. Both experiments show that users make use of the short-cuts, even when not informed of their existence. A tendency towards user habituation is also demonstrated.
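A minimal sketch of the short-cut idea, with hypothetical slot names for a banking task (not taken from the paper): the system prompts for slots in a fixed, system-directed order, but accepts any slot value the user volunteers, so the user can jump ahead of the script:

```python
# Hypothetical slot names; the parsing of utterances into slot/value
# pairs is assumed to be done elsewhere.

SLOT_ORDER = ["account", "amount", "date"]

def update_slots(slots, parsed_utterance):
    """Fill every slot the utterance mentions, not just the prompted one."""
    for slot, value in parsed_utterance.items():
        if slot in slots and slots[slot] is None:
            slots[slot] = value
    return slots

def next_prompt(slots):
    for slot in SLOT_ORDER:
        if slots[slot] is None:
            return f"Please state the {slot}."
    return None  # all slots filled, dialogue can conclude

slots = {s: None for s in SLOT_ORDER}
# User answers the account prompt but also volunteers the amount:
update_slots(slots, {"account": "savings", "amount": "200 DM"})
print(next_prompt(slots))  # -> "Please state the date." (amount skipped)
```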
ABSTRACT
Is speech a useful input modality for applications where the user has easy access to a full-size keyboard and mouse? This study shows that a well-designed speech interface can be more effective than a standard desktop application's traditional interface. Subjects are able to build a set of three spreadsheet tables 50% faster using a spoken dialog interface, and they report significantly greater enjoyment in using that interface. However, these advantages cannot be achieved by simply bolting a speech recognition system onto an application's existing interface. We found that this latter approach led to an insignificant 4% increase in efficiency and a devastating 64% increase in errors compared to the standard keyboard and mouse interface. In short, speech-based interfaces have the potential to substantially improve our interactions with computers, but they require significant interface redesign to take advantage of the unique properties of speech.
ABSTRACT
Much work has been done in dialogue modeling for Human-Computer Interaction. Problems arise in situations where disambiguation of highly ambiguous database output is necessary. We propose to model the task rather than the dialogue itself. Furthermore, we propose underspecified representations to represent the relevant data and to serve as a basis for generating clarification questions that guide the user efficiently toward his communicative goal. In this paper, we establish a connection between underspecified representations, viewed as representations of disjunctions, and clarification questions. Our approach to clarifying dialogues differs from other approaches in that the form of the clarification dialogues is entirely determined by the domain modeling and by the underspecified representations.
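To make the connection concrete, here is a small illustrative sketch (the attribute names and the selection heuristic are our own, not the paper's): ambiguous database hits are collapsed into an underspecified attribute/value structure, and a clarification question is generated from an attribute whose values still vary:

```python
# Sketch only: underspecified representation as {attribute: value set},
# with the disjunction over values turned into a clarification question.

def underspecify(rows):
    """Collapse a set of database rows into {attribute: observed values}."""
    attrs = {}
    for row in rows:
        for attr, value in row.items():
            attrs.setdefault(attr, set()).add(value)
    return attrs

def clarification_question(rows):
    attrs = underspecify(rows)
    ambiguous = {a: v for a, v in attrs.items() if len(v) > 1}
    if not ambiguous:
        return None  # output is unambiguous, no clarification needed
    # Simple heuristic: ask about the attribute with the most variation.
    attr = max(ambiguous, key=lambda a: len(ambiguous[a]))
    options = " or ".join(sorted(ambiguous[attr]))
    return f"Do you mean the one with {attr} {options}?"

rows = [{"name": "Mueller", "street": "Hauptstr."},
        {"name": "Mueller", "street": "Ringstr."}]
print(clarification_question(rows))
# -> "Do you mean the one with street Hauptstr. or Ringstr.?"
```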
ABSTRACT
A user encounters a wide variety of forms on the World Wide Web (WWW). In this paper, we describe the design of a speech interface that can be used over the web to fill forms. This presents several problems, for example: communicating with the speech recognizer, parsing one or more forms embedded in text, generating appropriate language models and dictionary entries, and presenting appropriate information (responses and queries) to the user. Many database and non-database retrieval tasks can be viewed as form-filling tasks. Goddeau [2] also describes a form-based dialogue manager for spoken language understanding tasks. This supports our belief that a speech interface for forms is an important first step in the design of distributed spoken language systems that can assist the user in problem-solving activities.
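As an illustration of the form-parsing step, the sketch below (hypothetical; real systems of the period used custom parsers and recognizer-specific model formats) extracts the fields of an HTML form, from which per-field language models and dictionary entries could then be built:

```python
# Uses only the Python standard library HTML parser.
from html.parser import HTMLParser

class FormFieldExtractor(HTMLParser):
    """Collect (name, type) pairs for input elements inside <form> tags."""
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            self.fields.append((a.get("name", "?"), a.get("type", tag)))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

html = '<form><input name="city" type="text">' \
       '<select name="airline"></select></form>'
p = FormFieldExtractor()
p.feed(html)
print(p.fields)  # -> [('city', 'text'), ('airline', 'select')]
```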