ABSTRACT
We report the results of three experiments using the errorful output of a large vocabulary continuous speech recognition (LVCSR) system as the input to a statistical information retrieval (IR) system. Our goal is to allow a user to speak, rather than type, query terms into an IR engine and still obtain relevant documents. The purpose of these experiments is to test whether IR systems are robust to errors introduced into the query terms by the speech recognizer. If the information from the correctly recognized words in the search query outweighs the misinformation from the incorrectly recognized words, the relevant documents will still be retrieved. This paper presents evidence that speech-driven IR can be effective, although with reduced precision. We also find that longer spoken queries produce higher-precision retrieval than shorter queries. For queries containing many (50-60) search terms and a recognizer word error rate (WER) of 27.9%, the precision at 30 documents retrieved is degraded by only 11.1%. At roughly the same WER, however, we find that queries shorter than 10-15 words suffer more than a 30% loss of precision.
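As a quick illustration of the retrieval metric used above, the following sketch (not from the paper; the precision values are hypothetical) computes precision at k and the relative degradation between a typed and a spoken query:

```python
# Minimal sketch: precision at k and its relative degradation when a
# spoken query replaces a typed one. All numbers below are invented.

def precision_at_k(retrieved_ids, relevant_ids, k=30):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    return sum(1 for d in top_k if d in relevant_ids) / k

def relative_degradation(p_text, p_speech):
    """Relative precision loss of the spoken query vs. the typed query."""
    return (p_text - p_speech) / p_text

# Example: a long spoken query losing ~11.1% precision at 30 documents.
p_text, p_speech = 0.60, 0.5334
print(f"{relative_degradation(p_text, p_speech):.1%}")  # -> 11.1%
```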
ABSTRACT
This paper describes a design and feasibility study for a large-scale automatic directory information system with a scalable architecture. The current demonstrator, called PADIS-XL, operates in real time and handles a database of a medium-size German city with 130,000 listings. The system uses a new technique of taking a combined decision on the joint probability over multiple dialogue turns, and a dialogue strategy that progressively restricts the search space with every dialogue turn. During the course of the dialogue, the last name of the desired subscriber must be spelled out. The spelling recognizer permits continuous spelling and uses a context-free grammar to parse common spelling expressions. This paper describes the system architecture, our maximum a-posteriori (MAP) decision rule, the spelling grammar, and the dialogue strategy. We give results on the SPEECHDAT and SIETILL databases for recognition of first names by spelling and for jointly deciding on the spelled and the spoken name. In a 35,000-name setup, the joint decision reduced name-recognition errors by 31%.
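The exact MAP decision rule is given in the paper itself; the following is only a rough sketch of the underlying idea, with invented scores, showing how per-turn log-likelihoods for the spelled and the spoken rendering of a name could be summed before a single joint decision is taken:

```python
import math

# Illustrative sketch only, not the PADIS-XL rule: accumulate per-turn
# log-likelihoods for each candidate name and decide jointly, instead of
# committing to a 1-best hypothesis at every turn.

def joint_map_decision(candidates, turn_scores):
    """
    candidates : names still consistent with the dialogue so far
    turn_scores: one dict per turn, mapping name -> log P(obs | name)
    Returns the name maximizing prior + summed log-likelihood.
    """
    def joint_log_prob(name):
        prior = math.log(1.0 / len(candidates))   # uniform prior assumed
        return prior + sum(scores.get(name, float("-inf"))
                           for scores in turn_scores)
    return max(candidates, key=joint_log_prob)

# Two turns: a spoken-name pass and a spelled-name pass (scores invented).
spoken  = {"meier": -4.1, "mayer": -4.3, "maier": -4.2}
spelled = {"meier": -2.0, "mayer": -5.5, "maier": -2.4}
print(joint_map_decision(["meier", "mayer", "maier"], [spoken, spelled]))
# -> "meier": the spelled evidence disambiguates the near-homophones
```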
ABSTRACT
This paper presents and discusses a strategy for mixed-initiative dialogue management within a home banking application. The strategy tries to utilise the guidance of system-directed dialogues while accommodating user-initiated focus shifts through the inclusion of short-cuts in the dialogue. The paper reports on two experiments, one with a simulated speech recogniser (WOZ) and the second with a fully automated system. Both experiments show that users make use of the short-cuts, even when not informed of their existence. A tendency towards user habituation is also demonstrated.
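A minimal sketch of the short-cut idea, with hypothetical slot names for a banking task (not taken from the paper): the system prompts for slots in a fixed, system-directed order, but accepts any slot value the user volunteers, so the user can jump ahead of the script:

```python
# Hypothetical slot names; the parsing of utterances into slot/value
# pairs is assumed to be done elsewhere.

SLOT_ORDER = ["account", "amount", "date"]

def update_slots(slots, parsed_utterance):
    """Fill every slot the utterance mentions, not just the prompted one."""
    for slot, value in parsed_utterance.items():
        if slot in slots and slots[slot] is None:
            slots[slot] = value
    return slots

def next_prompt(slots):
    for slot in SLOT_ORDER:
        if slots[slot] is None:
            return f"Please state the {slot}."
    return None  # all slots filled, dialogue can conclude

slots = {s: None for s in SLOT_ORDER}
# User answers the account prompt but also volunteers the amount:
update_slots(slots, {"account": "savings", "amount": "200 DM"})
print(next_prompt(slots))  # -> "Please state the date." (amount skipped)
```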
ABSTRACT
Is speech a useful input modality for applications where the user has easy access to a full-size keyboard and mouse? This study shows that a well-designed speech interface can be more effective than a standard desktop application's traditional interface. Subjects are able to build a set of three spreadsheet tables 50% faster using a spoken dialog interface, and they report significantly greater enjoyment in using that interface. However, these advantages cannot be achieved by simply bolting a speech recognition system onto an application's existing interface. We found that this latter approach led to an insignificant 4% increase in efficiency and a devastating 64% increase in errors compared to the standard keyboard and mouse interface. In short, speech-based interfaces have the potential to substantially improve our interactions with computers, but they require significant interface redesign to take advantage of the unique properties of speech.
ABSTRACT
Much work has been done in dialogue modeling for Human-Computer Interaction. Problems arise in situations where disambiguation of highly ambiguous database output is necessary. We propose to model the task rather than the dialogue itself. Furthermore, we propose underspecified representations to represent the relevant data and to serve as a basis for generating clarification questions that guide the user efficiently toward his communicative goal. In this paper, we establish a connection between underspecified representations, viewed as representations of disjunctions, and clarification questions. Our approach to clarifying dialogues differs from other approaches in that the form of the clarification dialogues is entirely determined by the domain modeling and by the underspecified representations.
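To make the connection concrete, here is a small illustrative sketch (the attribute names and the selection heuristic are our own, not the paper's): ambiguous database hits are collapsed into an underspecified attribute/value structure, and a clarification question is generated from an attribute whose values still vary:

```python
# Sketch only: underspecified representation as {attribute: value set},
# with the disjunction over values turned into a clarification question.

def underspecify(rows):
    """Collapse a set of database rows into {attribute: observed values}."""
    attrs = {}
    for row in rows:
        for attr, value in row.items():
            attrs.setdefault(attr, set()).add(value)
    return attrs

def clarification_question(rows):
    attrs = underspecify(rows)
    ambiguous = {a: v for a, v in attrs.items() if len(v) > 1}
    if not ambiguous:
        return None  # output is unambiguous, no clarification needed
    # Simple heuristic: ask about the attribute with the most variation.
    attr = max(ambiguous, key=lambda a: len(ambiguous[a]))
    options = " or ".join(sorted(ambiguous[attr]))
    return f"Do you mean the one with {attr} {options}?"

rows = [{"name": "Mueller", "street": "Hauptstr."},
        {"name": "Mueller", "street": "Ringstr."}]
print(clarification_question(rows))
# -> "Do you mean the one with street Hauptstr. or Ringstr.?"
```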
ABSTRACT
A user encounters a wide variety of forms on the World Wide Web (WWW). In this paper, we describe the design of a speech interface that can be used over the web to fill forms. This presents several problems, for example: communicating with the speech recognizer, parsing one or more forms embedded in text, generating appropriate language models and dictionary entries, and presenting appropriate information (responses and queries) to the user. Many database and non-database retrieval tasks can be viewed as form-filling tasks. Goddeau [2] also describes a form-based dialogue manager for spoken language understanding tasks. This supports our belief that a speech interface for forms is an important first step in the design of distributed spoken language systems that can assist the user in problem-solving activities.
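As an illustration of the form-parsing step, the sketch below (hypothetical; real systems of the period used custom parsers and recognizer-specific model formats) extracts the fields of an HTML form, from which per-field language models and dictionary entries could then be built:

```python
# Uses only the Python standard library HTML parser.
from html.parser import HTMLParser

class FormFieldExtractor(HTMLParser):
    """Collect (name, type) pairs for input elements inside <form> tags."""
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            self.fields.append((a.get("name", "?"), a.get("type", tag)))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

html = '<form><input name="city" type="text">' \
       '<select name="airline"></select></form>'
p = FormFieldExtractor()
p.feed(html)
print(p.fields)  # -> [('city', 'text'), ('airline', 'select')]
```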