Authors:
Stephanie Seneff, MIT Laboratory for Computer Science (USA)
Ed Hurley, MIT Laboratory for Computer Science (USA)
Raymond Lau, MIT Laboratory for Computer Science (USA)
Christine Pao, MIT Laboratory for Computer Science (USA)
Philipp Schmid, MIT Laboratory for Computer Science (USA)
Victor Zue, MIT Laboratory for Computer Science (USA)
Page (NA) Paper number 1153
Abstract:
GALAXY, first introduced at ICSLP-94, is a client-server architecture for
accessing on-line information using spoken dialogue. It has served for
several years as our group's testbed for developing human language
technologies. Recently, we have initiated a significant
redesign of its architecture to enable many researchers to develop
their own applications, using either exclusively their own servers
or intermixing them with servers developed by others. This redesign
was motivated in part by the designation of GALAXY as the prototype
reference architecture for the new DARPA Communicator Program. The new
architecture, GALAXY-II, makes use of a scripting
language for flow control to provide flexible interaction among the
servers, and a set of libraries to support rapid prototyping of new
servers. In this paper, we describe the new architecture in some detail,
and report on the current status of its development.
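The idea of script-driven flow control among servers can be illustrated
with a minimal sketch. It is not the GALAXY-II scripting language or its
API: the rule format, the server names (recognizer, parser, dialogue), and
the frame keys are all hypothetical, chosen only to show a central hub
consulting a script to decide which server to call next.

```python
# Hypothetical sketch: a hub routes a shared frame between servers
# according to a script of rules. Names and rule format are assumptions.

def run_hub(script, servers, frame):
    """Fire the first applicable rule repeatedly until no rule applies."""
    frame = dict(frame)
    fired = True
    while fired:
        fired = False
        for needs, produces, server in script:
            # A rule applies when its inputs exist and its output does not.
            if all(k in frame for k in needs) and produces not in frame:
                frame[produces] = servers[server](frame)
                fired = True
                break
    return frame

# Each rule: (required frame keys, key to produce, server to call).
script = [
    (["waveform"], "text", "recognizer"),
    (["text"], "meaning", "parser"),
    (["meaning"], "reply", "dialogue"),
]

# Stand-in servers; real ones would be separate processes.
servers = {
    "recognizer": lambda f: "weather in boston",
    "parser": lambda f: {"topic": "weather", "city": f["text"].split()[-1]},
    "dialogue": lambda f: "Looking up " + f["meaning"]["topic"]
                          + " for " + f["meaning"]["city"],
}

result = run_hub(script, servers, {"waveform": b"..."})
```

In this sketch, changing the interaction among servers means editing the
script list rather than the servers themselves, which is the kind of
flexibility the abstract attributes to a scripting approach.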
Authors:
Grace Chung, MIT Laboratory for Computer Science (USA)
Stephanie Seneff, MIT Laboratory for Computer Science (USA)
Page (NA) Paper number 603
Abstract:
This paper explores some issues in designing conversational systems
with integrated higher level constraints. We experiment with a configuration
that combines a context-dependent acoustic front-end, using MIT's SUMMIT
recognizer, with ANGIE, a hierarchical framework that models word substructure
and phonological processes, and with TINA, a trainable probabilistic
natural language (NL) model. Working in the Jupiter weather domain,
we develop a computationally tractable system which incorporates higher
level linguistic, prosodic and phonological constraints together in
the second pass of a two-pass strategy. Experiments are evaluated using
a new understanding performance metric, and the new integrated system
achieves up to 17.1% relative reduction in understanding error and
15.4% reduction in word error. In addition, we investigate the possibilities
of a two-pass system which relies on the first stage for pruning based
on syllable-level constraints, and applies linguistic and prosodic knowledge
largely at the second stage.
Authors:
Kenney Ng, MIT Laboratory for Computer Science (USA)
Page (NA) Paper number 1088
Abstract:
In this paper, we investigate a number of robust indexing and retrieval
methods in an effort to improve spoken document retrieval performance
in the presence of speech recognition errors. In particular, we examine
expanding the original query representation to include confusable terms;
developing a new document-query retrieval measure based on approximate
matching that is less sensitive to recognition errors; expanding the
document representation to include multiple recognition hypotheses;
modifying the original query using automatic relevance feedback to
include new terms found in the top ranked documents; and combining
information from multiple subword unit representations. We study the
different methods individually and then explore the effects of combining
them. Experiments on radio broadcast news data show that using a combination
of these methods can improve retrieval performance by over 20%.
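One of the methods listed above, automatic relevance feedback, can be
sketched as follows. This is a toy illustration under stated assumptions,
not the paper's implementation: the inner-product scoring, the fixed
weight for added terms, and the three tiny word-level documents are all
hypothetical.

```python
# Hypothetical sketch of automatic (pseudo) relevance feedback:
# rank documents, then add frequent terms from the top-ranked ones
# to the query. Scoring and weights are illustrative assumptions.
from collections import Counter

def score(query, doc):
    """Inner product of query term weights and document term counts."""
    return sum(w * doc.get(t, 0) for t, w in query.items())

def rank(query, docs):
    """Document ids by decreasing score, ties broken by id."""
    return sorted(docs, key=lambda d: (-score(query, docs[d]), d))

def feedback(query, docs, top_n=1, new_terms=1, weight=0.5):
    """Return a copy of the query expanded with terms from top documents."""
    pooled = Counter()
    for d in rank(query, docs)[:top_n]:
        pooled.update(docs[d])
    # Most frequent terms first, alphabetical among equal counts.
    candidates = sorted(
        (t for t in pooled if t not in query),
        key=lambda t: (-pooled[t], t),
    )
    expanded = dict(query)
    for t in candidates[:new_terms]:
        expanded[t] = weight
    return expanded

# Toy collection standing in for recognized broadcast news stories.
docs = {
    "d1": Counter({"storm": 2, "boston": 1, "snow": 2}),
    "d2": Counter({"snow": 1, "forecast": 1}),
    "d3": Counter({"election": 3}),
}
query = {"storm": 1.0}
expanded = feedback(query, docs)
```

Here the original query matches only d1. After feedback adds a term drawn
from d1, the related document d2 receives a nonzero score, even though it
shares no term with the original query.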