ABSTRACT
A methodology for creating and managing an integrated database for spoken dialog systems is proposed. Using the example of a telecommunication service application, details of organizing, maintaining, and visualizing the dialog system data are presented. Examples illustrating the use of the unified database structure for dialog reproduction and performance evaluation are provided.
ABSTRACT
This paper presents a case study analyzing the results of an on-going trial of a prototype mixed-initiative spoken dialog system for telephony control and messaging. System usage and performance data were captured at three points in time. Information from multiple data sources, including spoken utterances, system call logs, speech recognizer output, and subjective surveys was evaluated to determine the relationship between aspects of system performance and user perceptions of the system. This report provides several examples using these data sources in combination to identify key areas to focus on in modifying the system, application, and/or user interface in order to significantly improve system usability and user satisfaction.
ABSTRACT
Given the limitations of speech recognition for the development of user-system dialogues in real applications, robustness is a primary objective. In this paper, we describe the most essential characteristics of the Dialogue Manager of a voice-controlled driver information system, mainly showing how its design has been driven by the characteristics of voice in such a dialogue. We present the main methods used by the Dialogue Manager to strike an effective balance between robustness and efficiency. We illustrate them with examples from the first implementation of the system.
ABSTRACT
Speech repairs introduce much noise into spoken language processing. Properly correcting speech repairs can help the speech recognizer avoid textual errors, and prevent interpretation errors during subsequent processing. Because the task of repair processing cannot be deferred to the later stages (word segmentation, part-of-speech tagging, and sentence parsing), this paper employs acoustic and prosodic cues to correct Chinese repetition and addition repairs. The experimental results show that a precision rate of 93.87% (76.09%) and a recall rate of 90.65% (70%) can be achieved for correcting Chinese repetition (addition) repairs.
ABSTRACT
Natural language interaction requires dialogue models that allow for efficient and robust human-computer interaction. Most systems today use some kind of speech-act-based dialogue model. While successful in a number of applications, these models have known limitations, from both linguistic and computational points of view, which has led a number of researchers to suggest using the dialogue participants' goals/intentions to model the dialogue. In this paper we suggest that amending speech-act-based models with sophisticated domain knowledge makes it possible to extend their applicability. Two kinds of domain knowledge are identified: one is the Domain Model, a structure of the discourse 'world'; the other is the Conceptual Model, which contains domain-specific general information about the concepts and their relationships in the domain. These extensions have been utilized in the LINLIN dialogue manager, and the paper presents results from customizing the dialogue manager to two different applications.
ABSTRACT
This paper reports experimental results comparing a mixed-initiative to a system-initiative dialog strategy in the context of a personal voice email agent. To independently test the effects of dialog strategy and user expertise, users interact with either the system-initiative or the mixed-initiative agent to perform three successive tasks which are identical for both agents. We report performance comparisons across agent strategies as well as over tasks. This evaluation utilizes and tests the PARADISE evaluation framework, and discusses the performance function derivable from the experimental data.
ABSTRACT
Discourse markers, also known as cue words, are used extensively in human-human task-oriented dialogs to signal the structure of the discourse. Previous work showed their importance in monologues for marking discourse structure, but little attention has been paid to their importance in spoken dialog systems. This paper investigates what discourse markers signal about the upcoming speech, and when they tend to be used in task-oriented dialog. We demonstrate that there is a high correlation between specific discourse markers and specific conversational moves, between discourse marker use and adjacency pairs, and between discourse markers and the speaker's orientation to information presented in the prior turn.
ABSTRACT
This paper describes our initial implementation of a system to provide world-wide weather information over the telephone. The information is gathered from several different sites on the Web, preprocessed, and cached locally into a relational database to make access both fast and selective. Our natural language tools, originally developed for processing user queries, are used here for understanding content, and for subsequently translating it into languages other than English. The system is operational, and we have been collecting data from real users via a toll-free number. We report here on an initial evaluation both of the full system in English and of the quality of the responses in German.
ABSTRACT
This paper demonstrates some aspects of a plan processor which is a subcomponent of the dialogue module of Verbmobil. We describe how we transfer results from the research area of grammar extraction for the semi-automatic acquisition of plan operators for turn classes. We exploit statistical knowledge acquired while learning the grammar and incorporate top-down predictions to enhance the correct analysis of the turn classes described. A first evaluation shows a relative recognition rate of around 70% on unseen data.
ABSTRACT
Pragmatically important information, such as dialogue acts that describe the illocution of an utterance, depends in traditional processing approaches on error-prone syntactic/semantic processing. We present a statistically based method for dialogue act classification that takes word strings as input. An experimental evaluation shows that this method can be successfully used to determine dialogue acts. The overall recognition rate in the experiments is in the range of 65%-67% for German test data, and 74% for an experiment with English dialogues.
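The abstract does not specify the statistical model; a minimal sketch of one way to classify dialogue acts directly from word strings is a smoothed naive-Bayes unigram classifier. The training pairs, act labels, and utterances below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (word string, dialogue act) pairs.
TRAIN = [
    ("hello good morning", "GREET"),
    ("good day hello", "GREET"),
    ("does monday suit you", "SUGGEST"),
    ("how about tuesday afternoon", "SUGGEST"),
    ("yes monday is fine", "ACCEPT"),
    ("yes that suits me fine", "ACCEPT"),
]

def train(pairs):
    """Collect unigram counts per dialogue act."""
    word_counts = defaultdict(Counter)
    act_counts = Counter()
    vocab = set()
    for words, act in pairs:
        act_counts[act] += 1
        for w in words.split():
            word_counts[act][w] += 1
            vocab.add(w)
    return word_counts, act_counts, vocab

def classify(utterance, word_counts, act_counts, vocab):
    """Pick the act maximizing P(act) * prod P(word | act), add-one smoothed."""
    total = sum(act_counts.values())
    best_act, best_score = None, float("-inf")
    for act, n in act_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[act].values()) + len(vocab)
        for w in utterance.split():
            score += math.log((word_counts[act][w] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act

wc, ac, vocab = train(TRAIN)
print(classify("hello there", wc, ac, vocab))
```

A real system would of course train on annotated corpora of the kind described in the abstract rather than on toy pairs.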
ABSTRACT
This paper deals with a problem not yet deeply studied: the interaction of user goals. A situation of multiple goals occurs as soon as the user utters a new goal while the previous one has not yet been solved. We propose an algorithm to identify the kind of multiple goals according to the task state and to the goals themselves. We define ten strategies to process those situations. Three meta-strategies order the strategies relevant for given situations. The system checks the preconditions of strategies to make sure they can be triggered. When a strategy is applied, the system updates the dialogue history and the task state. Some strategies push a goal onto a stack and pop it when the first processed goal is fully reached.
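The stack-based family of strategies mentioned at the end of the abstract can be sketched as follows; the class and method names are mine, not the paper's, and the ten strategies and their preconditions are not modeled here.

```python
# Sketch: a strategy suspends the unresolved current goal on a stack
# when the user utters a new one, and restores it once the interrupting
# goal is fully reached.

class GoalManager:
    def __init__(self):
        self.current = None
        self.stack = []  # suspended goals

    def utter_goal(self, goal):
        """User utters a new goal; suspend the unresolved current one."""
        if self.current is not None:
            self.stack.append(self.current)
        self.current = goal

    def goal_reached(self):
        """Current goal fully reached: resume the most recently suspended goal."""
        done = self.current
        self.current = self.stack.pop() if self.stack else None
        return done

gm = GoalManager()
gm.utter_goal("book flight")
gm.utter_goal("check weather")       # interrupts the unfinished first goal
assert gm.goal_reached() == "check weather"
assert gm.current == "book flight"   # popped back as the active goal
```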
ABSTRACT
Conventional spoken dialogue systems are based on goal-oriented techniques (8). The recent expansion of application fields such as cyberspace, the Internet, etc. necessitates the creation of new interaction styles between humans and autonomous agents. Interaction with autonomous agents creates new possibilities for spontaneous conversation in spoken dialogue systems. Within this context, we regard spontaneous, informal chatting behavior as one aspect of spoken dialogue (4)(5). According to this view, an essential property of chatting is the emergence of topics and goals situated within the context of interactions among participants rather than as the result of explicit goals. In this paper, we propose a spoken dialogue system with chatting properties and illustrate sample chatting between a human and a virtual interface agent called Talking Eye using a prototype system.
ABSTRACT
We present a generic template for spoken dialogue systems integrating speech recognition and synthesis with 'higher-level' natural language dialogue modelling components. The generic model is abstracted from a number of real application systems targeted at very different domains. Our research aim in developing this generic template is to investigate a new approach to the evaluation of Dialogue Management Systems. Rather than attempting to measure accuracy/speed of output, we propose principles for the evaluation of the underlying theoretical linguistic model of Dialogue Management in a given system, in terms of how well it fits our generic template for Dialogue Management Systems. This is a measure of 'genericness' or 'application-independence' of a given system, which can be used to moderate accuracy/speed scores in comparisons of very unlike DMSs serving different domains. This relates to (but is orthogonal to) Dialogue Management Systems evaluation in terms of naturalness and similar measurable metrics; it follows more closely emerging qualitative evaluation techniques for NL grammatical parsing schemes.
ABSTRACT
This paper proposes and mathematically analyzes an interactive strategy to recover from misrecognition of utterances including multiple information items through a short conversation with a speaker. First, the speech recognizer in a dialogue system recognizes an utterance and evaluates the reliability of each item contained in it. The dialogue system accepts only those items whose reliability is high, while it rejects the items which are unreliably recognized, or confirms their content. Given the performance of the recognizer, the paper derives two quantities, P_ac and N, which describe the performance of the dialogue system using this interactive strategy: P_ac is the probability that all information items included in the user's utterance are conveyed to the system correctly, and N is the average number of turns taken between the user and the system until all the items are accepted.
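The paper's actual derivation of P_ac and N is not reproduced in the abstract; the following Monte Carlo sketch estimates both quantities under a simplified model of my own: each turn, every still-pending item is independently accepted correctly with probability AC, falsely accepted with probability AW, and otherwise rejected and re-asked the next turn.

```python
import random

AC, AW, K = 0.7, 0.1, 3   # illustrative per-item probabilities, K items

def simulate(trials=200_000, seed=0):
    """Estimate P_ac (all items conveyed correctly) and N (mean turns)."""
    rng = random.Random(seed)
    ok, turns_total = 0, 0
    for _ in range(trials):
        pending, all_correct, turns = K, True, 0
        while pending:
            turns += 1
            still = 0
            for _ in range(pending):
                u = rng.random()
                if u < AC:
                    pass                 # accepted correctly: item done
                elif u < AC + AW:
                    all_correct = False  # false accept: item done but wrong
                else:
                    still += 1           # rejected: retry next turn
            pending = still
        ok += all_correct
        turns_total += turns
    return ok / trials, turns_total / trials

p_ac, n = simulate()
```

Under these assumptions each item ends up correct with probability AC/(AC+AW), so the estimate of P_ac should be close to (0.7/0.8)**3 ≈ 0.67; the paper's own model may differ in how confirmations are counted as turns.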
ABSTRACT
The reliability of automatic speech recognition systems depends mainly on the local perplexity of the language to be recognised. In the framework of vocal command dialogue systems, we propose an approach based on pragmatics, mainly through a precise treatment of referential expressions, which we use in order to dynamically reduce the local perplexity that the recognition process is confronted with. Therefore, we take into account not only the left context of the current hypothesis but also the state of the application. The article justifies the architecture we propose, describes the treatments, and shows the resulting reduction of perplexity when using contextual information as compared to that obtained when using only semantic information. Keywords: vocal command system - natural language - pragmatics - language perplexity - reference calculus
ABSTRACT
In this paper we present the Linguistic Analysis Component of a Spoken Dialogue System designed for robustness and flexibility. The dialogue takes place in the Greek language through the public telephone network and is performed in two different applications. The analysis is based on Island Parsing, Pattern Matching and Frame-based Representation techniques. The main knowledge sources are a Semantic Network and Frame-Slot structures thoroughly connected with each other. Simple bigram grammar rules have also been used to assist the parsing process as well as to evaluate the recognition output.
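One common way bigram grammar rules can evaluate recognition output is to rank competing hypotheses by their smoothed bigram log-probability; the sketch below illustrates the idea with invented English sentences in place of the Greek application data.

```python
import math
from collections import Counter

# Toy training corpus standing in for application utterances.
CORPUS = [
    "i want a ticket to athens",
    "i want to travel to athens",
    "a ticket to patras please",
]

bigrams, unigrams, vocab = Counter(), Counter(), set()
for sent in CORPUS:
    words = ["<s>"] + sent.split()
    vocab.update(words)
    for a, b in zip(words, words[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def score(hyp):
    """Add-one smoothed bigram log-probability of a hypothesis string."""
    words = ["<s>"] + hyp.split()
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(vocab)))
        for a, b in zip(words, words[1:])
    )

# The well-formed hypothesis outscores the garbled recognizer output.
hyps = ["i want a ticket to athens", "i font a ticket too athens"]
best = max(hyps, key=score)
```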
ABSTRACT
The capability profiles of commercial automatic speech recognition (ASR) systems are rapidly improving in terms of vocabulary size, noise robustness and user population. Most contemporary applications of ASR use interfaces relying solely on the speech mode of interaction (over telephone channels for example). Many applications will, however, benefit from using speech input in conjunction with other interaction devices such as trackballs, keyboards and touch-screens. In this paper, we present an interface modelling approach based on a critical path analysis of the interface design. The approach has been developed to model multi-modal interactions using combinations of input devices. Degradation of unit performances allows the effects of environmental factors on the overall interface performance to be predicted. The model is verified by comparison with experimental trials carried out on a number of multi-modal applications. It is demonstrated that the model is able to predict the main performance metric (task completion time) to within 10% of the experimental values.
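The core of a critical path analysis is that predicted task completion time is the longest dependency-ordered chain of unit interactions; the task names, durations, and dependency structure below are invented to illustrate the computation, not taken from the paper.

```python
# Critical-path sketch: earliest finish of each unit task, where a task
# starts only after all of its predecessors have finished.

def completion_time(durations, deps):
    """Overall completion time = max earliest-finish over all tasks."""
    finish = {}
    def ef(task):
        if task not in finish:
            start = max((ef(d) for d in deps.get(task, [])), default=0.0)
            finish[task] = start + durations[task]
        return finish[task]
    return max(ef(t) for t in durations)

# e.g. speaking a command while moving a trackball in parallel;
# the slower (speech) branch dominates the predicted time.
durations = {"speak": 1.8, "recognize": 0.9, "point": 1.2, "confirm": 0.5}
deps = {"recognize": ["speak"], "confirm": ["recognize", "point"]}
t = completion_time(durations, deps)   # speak -> recognize -> confirm
```

Degrading a unit duration (say, slower recognition in noise) and recomputing gives the predicted effect of environmental factors on the whole interface.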
ABSTRACT
We analyze what functions as a YES response and a NO response for different yes/no questions. This problem is surprisingly complex: respondents do not always produce overt yes or no lexical items in response to a yes/no question. In addition, when respondents don't include a clear yes or no word, they may mean to communicate a clear YES or NO meaning, or something else. We find that the classification of yes/no questions described in (Carletta et al., 1995) for the Edinburgh map task corpus correlates well with whether a response will be a bare yes or no, a yes or no plus additional speech, or just speech without an overt yes or no. Correlation with responses described simply as "direct" or "indirect" is less good. We also find that, under the three-way categorization, the strength of a question's expectation for a YES response predicts the form of the response.
ABSTRACT
The Dialogue Model Learning Environment supports an engineering-oriented approach towards dialogue modelling for a spoken-language interface. A major step towards dialogue models is to know about the basic units that are used to construct a dialogue model and their possible sequences. In contrast to many other approaches, the set of dialogue acts is not predefined by any theory or manually during the engineering process, but is learned from data that are available in an advised spoken dialogue system. The architecture is outlined and the approach is applied to the domain of appointment scheduling. Even though based on a word correctness of about 70%, the predictability of dialogue acts in Dia-MoLE turns out to be comparable to human-assigned dialogue acts.
ABSTRACT
This paper describes some studies on the effect of the system vocabulary on the lexical choices of the users. There are many theories about human-human dialogues that could be useful in the design of spoken dialogue systems. This paper gives an overview of some of these theories and reports the results from two experiments that examine one of these theories, namely lexical entrainment. The first experiment was a small Wizard-of-Oz test that simulated a tourist information system with a speech interface, and the second experiment simulated a system with speech recognition that controlled a questionnaire about people's plans for their vacation. Both experiments show that the subjects mostly adapt their lexical choices to the system questions. Only in less than 5% of the cases did they use an alternative main verb in the answer. These results encourage us to investigate the possibility of adding an adaptive language model to the speech recognizer in our dialogue system, in which the probabilities of the words used in the system questions are increased.
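The adaptation proposed in the last sentence can be sketched at the unigram level; the boost factor and renormalization scheme here are my assumptions, not the authors' design.

```python
# Sketch: before recognizing the user's answer, raise the probability of
# words that occurred in the system's question, then renormalize so the
# language model still sums to one.

def adapt(lm, system_question, boost=3.0):
    """Multiply probabilities of question words by `boost`, renormalize."""
    q_words = set(system_question.lower().split())
    raised = {w: p * (boost if w in q_words else 1.0) for w, p in lm.items()}
    z = sum(raised.values())
    return {w: p / z for w, p in raised.items()}

# Toy unigram model over candidate main verbs (values invented).
lm = {"depart": 0.2, "leave": 0.2, "arrive": 0.2, "stay": 0.2, "go": 0.2}
adapted = adapt(lm, "When do you depart ?")
assert adapted["depart"] > adapted["leave"]   # entrained verb is now favored
```

A deployed recognizer would apply the same idea to its n-gram model per dialogue state rather than to a flat unigram table.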