Spoken Language Models and Dialog 2

An Event Driven Model for Dialogue Systems

Authors:

Kuansan Wang, Microsoft Research (USA)

Paper number 888

Abstract:

This paper reports our progress in building a mixed-initiative, goal-oriented dialogue system for human-machine interaction. The dialogue model embraces the spirit of the so-called plan-based approach in that the dialogue flow is not statically authored but dynamically generated by the system as a natural outcome of the semantic evaluation process. With multimodal applications in mind, the dialogue system is designed around an event-driven architecture of the kind commonly found at the core of a graphical user interface (GUI) environment. In the same manner that GUI events are handled by graphical objects, the proposed dialogue model assigns dialogue events to semantic objects that encapsulate the knowledge for handling events under various discourse contexts. So far we have found that four types of events, namely dialogue object instantiation, semantic evaluation, dialogue repair, and discourse binding, are sufficient for a wide range of applications.
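
The event-handling analogy lends itself to a compact sketch. The following is a minimal illustration, not the authors' implementation, of routing the four event types to semantic objects in the way a GUI routes events to widgets; the class names, event payloads, and slot-filling behaviour are assumptions made for the example.

```python
from enum import Enum, auto


class DialogueEvent(Enum):
    INSTANTIATION = auto()        # dialogue object instantiation
    SEMANTIC_EVALUATION = auto()  # semantic evaluation of a parsed utterance
    REPAIR = auto()               # dialogue repair (e.g. retracting a slot)
    DISCOURSE_BINDING = auto()    # binding an unfilled slot from discourse context


class SemanticObject:
    """Encapsulates the knowledge for handling events in its discourse context."""

    def __init__(self, name):
        self.name = name
        self.slots = {}

    def handle(self, event, payload, discourse):
        if event is DialogueEvent.SEMANTIC_EVALUATION:
            self.slots.update(payload)                 # fill slots from the parse
        elif event is DialogueEvent.REPAIR:
            self.slots.pop(payload, None)              # retract a disputed slot
        elif event is DialogueEvent.DISCOURSE_BINDING:
            self.slots.setdefault(payload, discourse.get(payload))  # resolve from context


class Dispatcher:
    """Routes dialogue events to semantic objects, as a GUI routes events to widgets."""

    def __init__(self):
        self.objects, self.discourse = {}, {}

    def dispatch(self, target, event, payload=None):
        if event is DialogueEvent.INSTANTIATION:
            self.objects[target] = SemanticObject(target)
        else:
            self.objects[target].handle(event, payload, self.discourse)


if __name__ == "__main__":
    d = Dispatcher()
    d.discourse["date"] = "tomorrow"   # established earlier in the dialogue
    d.dispatch("flight_query", DialogueEvent.INSTANTIATION)
    d.dispatch("flight_query", DialogueEvent.SEMANTIC_EVALUATION, {"destination": "Boston"})
    d.dispatch("flight_query", DialogueEvent.DISCOURSE_BINDING, "date")
    print(d.objects["flight_query"].slots)  # {'destination': 'Boston', 'date': 'tomorrow'}
```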

SL980888.PDF (From Author) SL980888.PDF (Rasterized)


Automatic Classification of Dialogue Contexts for Dialogue Predictions

Authors:

Cosmin Popovici, I.C.I. - Institutul de Cercetari in Informatica (Romania)
Paolo Baggia, CSELT - Centro Studi e Laboratori Telecomunicazioni (Italy)
Pietro Laface, Politecnico di Torino (Italy)
Loreta Moisa, Politecnico di Torino (Italy)

Paper number 552

Abstract:

This paper exploits the broad concept of dialogue predictions by linking a point in a human-machine dialogue with a specific language model that is used during the recognition of the next user utterance. The idea is to cluster several dialogue contexts into a class and to create a specific language model for each class. We present an automatic algorithm that clusters the dialogue contexts based on the minimal decrease of mutual information. Moreover, the algorithm is able to estimate an appropriate number of classes, one that gives a good trade-off between the mutual information and the amount of training data. The automatic classification therefore allows the fully automatic creation of context-dependent language models for a spoken dialogue system.
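
As a rough illustration of the clustering criterion, the sketch below (my own reconstruction, not the authors' algorithm) greedily merges the pair of dialogue-context classes whose merge causes the smallest decrease in mutual information between the class and the words of the following user utterance; the toy counts and the stopping rule are assumptions.

```python
import math
from collections import Counter


def mutual_information(count):
    """I(C;W) from joint counts: count[c][w] = times word w followed context class c."""
    total = sum(sum(ws.values()) for ws in count.values())
    pc = {c: sum(ws.values()) / total for c, ws in count.items()}
    pw = Counter()
    for ws in count.values():
        for w, n in ws.items():
            pw[w] += n / total
    mi = 0.0
    for c, ws in count.items():
        for w, n in ws.items():
            pcw = n / total
            mi += pcw * math.log(pcw / (pc[c] * pw[w]))
    return mi


def merge_once(count):
    """Merge the pair of classes with the minimal decrease of I(C;W)."""
    classes = list(count)
    base = mutual_information(count)
    best = None
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            merged = {c: Counter(ws) for c, ws in count.items() if c not in (a, b)}
            merged[a + "+" + b] = Counter(count[a]) + Counter(count[b])
            loss = base - mutual_information(merged)
            if best is None or loss < best[0]:
                best = (loss, merged)
    return best  # (MI loss, new clustering); stop merging once the loss grows too large


if __name__ == "__main__":
    # toy joint counts: dialogue context -> words of the next user utterance
    counts = {
        "ask_date": Counter({"monday": 5, "tomorrow": 3}),
        "ask_day": Counter({"monday": 4, "tuesday": 2}),
        "ask_city": Counter({"rome": 6, "turin": 4}),
    }
    loss, clustering = merge_once(counts)
    print(round(loss, 4), sorted(clustering))
```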

SL980552.PDF (From Author) SL980552.PDF (Rasterized)


Automatic Identification of Command Boundaries in a Conversational Natural Language User Interface

Authors:

Ganesh N. Ramaswamy, I.B.M. Research Center (USA)
Jan Kleindienst, I.B.M. Research Center (USA)

Paper number 612

Abstract:

In this paper, we propose a trainable system that can automatically identify the command boundaries in a conversational natural language user interface. The proposed solution makes the conversational interface much more user-friendly and allows the user to speak naturally and continuously in a hands-free manner. The main ingredient of the system is the maximum entropy identification model, which is trained using data in which all the correct command boundaries are marked. During training, a set of features and their weights are selected iteratively using the training data. The features consist of words and phrases, as well as their positions relative to the potential command boundaries. Decoding is done by examining the product of the weights of the features that are present. We also propose several enhancements to the approach, such as combining it with a more effective language model at the speech recognition stage to generate additional tokens for the identification model. We conducted several experiments to evaluate the proposed approach, and the results are described.
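
To make the decoding rule concrete, here is a small sketch, not the authors' trained system, of scoring a candidate command boundary by the product of the weights of the features that fire around it; the feature templates (word, position relative to the boundary) follow the abstract, while the window size, weights, and example utterance are invented.

```python
def boundary_score(tokens, position, weights):
    """Score the candidate boundary before tokens[position] as the product of the
    weights of the (word, relative position) features present around it."""
    score = 1.0
    for offset in (-2, -1, 1, 2):  # two words on either side of the boundary
        idx = position + offset if offset < 0 else position + offset - 1
        if 0 <= idx < len(tokens):
            score *= weights.get((tokens[idx], offset), 1.0)  # absent features contribute 1
    return score


if __name__ == "__main__":
    # hypothetical weights selected during training (> 1 favours a boundary, < 1 disfavours one)
    weights = {
        ("please", -1): 3.0,  # "please" just before the candidate point favours a split
        ("the", -1): 0.4,     # a dangling article just before it disfavours one
    }
    tokens = "open the window please close the door".split()
    for pos in range(1, len(tokens)):
        print(f"boundary before {tokens[pos]!r}: {boundary_score(tokens, pos, weights):.2f}")
```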

SL980612.PDF (From Author) SL980612.PDF (Rasterized)


The Predictive Power of Game Structure in Dialogue Act Recognition: Experimental Results Using Maximum Entropy Estimation

Authors:

Massimo Poesio, University of Edinburgh, HCRC (U.K.)
Andrei Mikheev, Harlequin (U.K.)

Paper number 606

Abstract:

Recognizing the dialogue act(s) performed by means of an utterance involves combining top-down expectations about the next likely 'move' in a dialogue with bottom-up information extracted from the speech signal. We compared two ways of generating expectations: one which makes the expectations depend only on the previous act (as in a bigram model), and one which also takes into account the fact that individual dialogue acts play a role as part of larger conversational structures ('games'). Our models were built by training over the HCRC MapTask corpus using the LTG implementation of maximum entropy estimation. We achieved an accuracy of 38.6% using bigrams and 50.6% taking game structure into account; adding information about speaker change resulted in an accuracy of 41.8% with bigrams and 54% with game structure. These results indicate that exploiting game structure does lead to improved expectations.
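
The contrast between the two kinds of expectations can be sketched as follows; this illustrates the idea only, not the HCRC/LTG maximum entropy setup, and the act labels and toy counts are invented.

```python
from collections import Counter, defaultdict


def train(turns):
    """turns: list of (game, act) pairs in order; returns the two expectation tables."""
    bigram = defaultdict(Counter)      # counts for P(act | previous act)
    game_model = defaultdict(Counter)  # counts for P(act | previous act, current game)
    prev = "<start>"
    for game, act in turns:
        bigram[prev][act] += 1
        game_model[(prev, game)][act] += 1
        prev = act
    return bigram, game_model


def expect(table, key):
    dist = table[key]
    total = sum(dist.values()) or 1
    return {act: n / total for act, n in dist.items()}


if __name__ == "__main__":
    # toy MapTask-style data: (conversational game, dialogue act)
    turns = [("instruct", "instruct"), ("instruct", "acknowledge"), ("instruct", "instruct"),
             ("query", "query-yn"), ("query", "acknowledge"), ("query", "reply-y")]
    bigram, game_model = train(turns)
    print(expect(bigram, "acknowledge"))                    # expectation from the bigram alone
    print(expect(game_model, ("acknowledge", "query")))     # sharper once the game is known
```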

SL980606.PDF (From Author) SL980606.PDF (Rasterized)


A Schema Based Approach To Dialog Control

Authors:

Paul C. Constantinides, Carnegie Mellon University (USA)
Scott Hansma, Carnegie Mellon University (USA)
Chris Tchou, Carnegie Mellon University (USA)
Alexander I. Rudnicky, Carnegie Mellon University (USA)

Paper number 637

Abstract:

Frame-based approaches to spoken language interaction work well for limited tasks such as information access, given that the goal of the interaction is to construct a correct query and then execute it. More complex tasks, however, can benefit from more active system participation. We describe two mechanisms that provide this: a modified stack that allows the system to track multiple topics, and form-specific schemas that allow the system to deal with tasks that involve the completion of multiple forms. Domain-dependent schemas specify system behavior and are executed by a domain-independent engine. We describe implementations for a personal calendar system and for an air travel planning system.
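
A minimal sketch of the two mechanisms, written from the abstract rather than from the CMU implementation, is given below; the schema contents, slot names, and prompting strategy are assumptions for the air travel example.

```python
# Hypothetical domain-dependent schema: an ordered list of forms and their slots.
AIR_TRAVEL_SCHEMA = [
    ("outbound_leg", ["origin", "destination", "date"]),
    ("return_leg", ["return_date", "return_time"]),
]


class SchemaEngine:
    """Domain-independent engine: walks a schema and prompts for whatever is missing,
    while a topic stack tracks which form the conversation is currently about."""

    def __init__(self, schema):
        self.schema = schema
        self.forms = {name: dict.fromkeys(slots) for name, slots in schema}
        self.topic_stack = [schema[0][0]]  # most recently discussed topic on top

    def hear(self, form, slot, value):
        """Integrate a recognized slot value; a new form becomes the current topic."""
        self.forms[form][slot] = value
        if self.topic_stack[-1] != form:
            self.topic_stack.append(form)

    def next_prompt(self):
        """Ask about the current topic first, then fall back to any incomplete form."""
        order = list(reversed(self.topic_stack)) + [name for name, _ in self.schema]
        for form in order:
            for slot, value in self.forms[form].items():
                if value is None:
                    return f"What is the {slot} for the {form}?"
        return "All forms are complete."


if __name__ == "__main__":
    engine = SchemaEngine(AIR_TRAVEL_SCHEMA)
    print(engine.next_prompt())                          # asks for the outbound origin
    engine.hear("outbound_leg", "origin", "Pittsburgh")
    engine.hear("return_leg", "return_date", "Friday")   # user switches topics
    print(engine.next_prompt())                          # stays with the return leg
```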

SL980637.PDF (From Author) SL980637.PDF (Rasterized)


Expanding A Time-Sensitive Conversational Architecture For Turn-Taking To Handle Content-Driven Interruption

Authors:

Gregory Aist, Language Technologies Institute, Carnegie Mellon University (USA)

Paper number 928

Abstract:

Turn-taking in spoken language systems has generally been push-to-talk or strict alternation (user speaks, system speaks, user speaks, ...), with some systems, such as telephone-based systems, handling barge-in (interruption by the user). In this paper we describe our time-sensitive conversational architecture for turn-taking, which allows not only alternating turns and barge-in but other conversational behaviors as well. This architecture allows backchanneling, prompting the user by taking more than one turn if necessary, and overlapping speech. The architecture is implemented in a Reading Tutor that listens to children read aloud and helps them. We extended this architecture to allow the Reading Tutor to interrupt the student based on a mistake that the student does not self-correct ("content-driven interruption"). To the best of our knowledge, the Reading Tutor is thus the first spoken language system to intentionally interrupt the user based on the content of the utterance.
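
To illustrate the content-driven part of the decision, here is a small sketch, not the Reading Tutor's code, of interrupting only when a misread word stays uncorrected over a few following words; the word alignment, window size, and example sentence are assumptions.

```python
SELF_CORRECTION_WINDOW = 3  # words the reader gets to fix a mistake on their own


def should_interrupt(target_words, heard_words, window=SELF_CORRECTION_WINDOW):
    """Return the misread target word if it stays uncorrected, else None (keep listening)."""
    pos = 0          # next expected target word
    mistake = None   # (target index, heard index) of the pending misreading
    for i, heard in enumerate(heard_words):
        if pos < len(target_words) and heard == target_words[pos]:
            pos += 1
        elif mistake and heard == target_words[mistake[0]]:
            pos = mistake[0] + 1   # reader went back and corrected it: stand down
            mistake = None
        elif mistake is None:
            mistake = (pos, i)     # first misreading: give the reader a chance
            pos += 1
        if mistake and i - mistake[1] >= window:
            return target_words[mistake[0]]   # uncorrected long enough: tutor takes the turn
    return None


if __name__ == "__main__":
    target = "the quick brown fox jumps over the lazy dog".split()
    print(should_interrupt(target, "the quack brown fox jumps over".split()))  # -> 'quick'
    print(should_interrupt(target, "the quack quick brown fox".split()))       # -> None
```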

SL980928.PDF (From Author) SL980928.PDF (Rasterized)
