Authors:
Richard Sproat, Bell Laboratories, Lucent Technologies (USA)
Jan P.H. van Santen, Bell Laboratories, Lucent Technologies (USA)
Paper number 41
Abstract:
Most work on sense disambiguation presumes that one knows beforehand
--- e.g. from a thesaurus --- a set of polysemous terms. But published
lists invariably give only partial coverage. For example, the English
word "tan" has several obvious senses, but one may overlook its use as an
abbreviation for "tangent". In this paper, we present an algorithm for identifying
interesting polysemous terms and measuring their degree of polysemy,
given an unlabeled corpus. The algorithm involves: (i) collecting
all terms within a k-term window of the target term; (ii) computing
the inter-term distances of the contextual terms, and reducing the
multi-dimensional distance space to two dimensions using standard methods;
(iii) converting the two-dimensional representation into radial coordinates
and using isotonic/antitonic regression to compute the degree to which
the distribution deviates from a single-peak model. The amount of deviation
is the proposed polysemy index.
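The three steps above lend themselves to a compact illustration. Below is a
minimal sketch, assuming cosine distances between co-occurrence vectors as
the inter-term distance, scikit-learn's MDS as the "standard method" for the
two-dimensional reduction, and its IsotonicRegression for the
isotonic/antitonic fits; the angular binning and the final normalization are
illustrative choices rather than the authors' exact formulation.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import pairwise_distances

def polysemy_index(context_vectors, n_bins=16):
    """context_vectors: one co-occurrence vector per term found within a
    k-term window of the target term (step (i), collected elsewhere)."""
    # (ii) inter-term distances of the contextual terms, reduced to 2-D.
    dist = pairwise_distances(context_vectors, metric="cosine")
    xy = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)

    # (iii) radial coordinates: bin the angular mass around the centroid.
    xy = xy - xy.mean(axis=0)
    theta = np.arctan2(xy[:, 1], xy[:, 0])
    counts, _ = np.histogram(theta, bins=n_bins, range=(-np.pi, np.pi))

    # Best single-peak model: isotonic up to a candidate peak, antitonic
    # after it; keep the candidate with the smallest squared error.
    x = np.arange(n_bins, dtype=float)
    best_err = np.inf
    for peak in range(n_bins):
        up = IsotonicRegression(increasing=True).fit_transform(
            x[:peak + 1], counts[:peak + 1])
        down = IsotonicRegression(increasing=False).fit_transform(
            x[peak:], counts[peak:])
        fit = np.concatenate([up[:-1], down])
        best_err = min(best_err, float(np.sum((counts - fit) ** 2)))

    # Deviation from the best single-peak model is the proposed polysemy
    # index; the normalization below is illustrative.
    return best_err / max(float(np.sum(counts ** 2)), 1.0)
```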
Authors:
Julia Fischer, Chair for Pattern Recognition, University of Erlangen-Nuremberg (Germany)
Juergen Haas, Chair for Pattern Recognition, University of Erlangen-Nuremberg (Germany)
Elmar Nöth, Chair for Pattern Recognition, University of Erlangen-Nuremberg (Germany)
Heinrich Niemann, Chair for Pattern Recognition, University of Erlangen-Nuremberg (Germany)
Frank Deinzer, Chair for Pattern Recognition, University of Erlangen-Nuremberg (Germany)
Paper number 369
Abstract:
In this paper we present an innovative approach to speech understanding
based on a fine-grained knowledge representation automatically compiled
from a semantic network, and on iterative optimization. Besides allowing
efficient exploitation of parallelism, the approach provides any-time
capability, since a (sub-)optimal solution is available after each
iteration step. We apply this approach to a real-world task: a dialog
system that answers queries about the German train timetable. To speed up
the search for the best interpretation of an utterance, we use statistical
methods such as neural networks, n-grams, and classification trees,
trained on application-relevant utterances collected over the public
telephone network. Currently, the real-time factor for interpreting the
user's initial utterance is 0.7.
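The any-time behaviour can be pictured with a short sketch: an iterative
optimizer over candidate interpretations that can be interrupted after any
iteration and still return its best (sub-)optimal solution so far. The
scoring function and the neighbourhood move below are placeholders, not the
authors' actual semantic-network search.

```python
def iterative_understanding(initial, score, neighbours, max_iters=100):
    """Yield the best interpretation found so far after every iteration,
    so the caller may stop whenever its time budget is exhausted."""
    best = current = initial
    for _ in range(max_iters):
        # One optimization step: move to the best-scoring neighbouring
        # interpretation (in the paper, scores would draw on neural
        # networks, n-grams, and classification trees).
        candidates = neighbours(current)
        if not candidates:
            break
        current = max(candidates, key=score)
        if score(current) > score(best):
            best = current
        # Any-time property: `best` is a valid (sub-)optimal solution here.
        yield best
```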
Authors:
Akito Nagai, Information Technology R&D Center, MITSUBISHI Electric Corporation (Japan)
Yasushi Ishikawa, Information Technology R&D Center, MITSUBISHI Electric Corporation (Japan)
Paper number 1023
Abstract:
We have proposed a method of concept-driven semantic interpretation
based on general semantic knowledge of conceptual dependency. In our
approach, a concept is a unit of semantic interpretation and an utterance
is regarded as a sequence of concepts that convey an intention. However,
a considerable number of accepted results were not syntactically meaningful.
This is because the order in which linguistic features occurred in
the sequence of concepts was not taken into account in constructing
the whole meaning from the concepts: only semantic constraints were used
to attain linguistic robustness. Therefore, we introduce a statistical
language model that calculates the plausibility of a sequence of concepts
from the point of view of the order in which shallow linguistic features
occur. Experimental results on speech understanding for 1000-word-vocabulary
spontaneous speech show that the proposed method significantly improves
system performance.
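To make the role of the statistical language model concrete, here is a
minimal sketch that scores the plausibility of a concept sequence from the
order of its shallow linguistic features. The bigram form and the add-one
smoothing are illustrative assumptions; the abstract does not commit to this
exact model.

```python
from collections import Counter
import math

class ConceptBigramLM:
    def __init__(self, training_sequences):
        # training_sequences: lists of shallow-feature tags, one tag per
        # concept in an utterance (the tag set itself is hypothetical).
        self.unigrams = Counter()
        self.bigrams = Counter()
        for seq in training_sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams)

    def log_plausibility(self, seq):
        # Sum of add-one-smoothed bigram log-probabilities over the order
        # in which the concept features occur.
        padded = ["<s>"] + list(seq) + ["</s>"]
        logp = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.unigrams[prev] + self.vocab_size
            logp += math.log(num / den)
        return logp
```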
Authors:
José Colás, Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica - E. T. S. I. Telecomunicación - Universidad Politécnica de Madrid (Spain)
Javier Ferreiros, Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica - E. T. S. I. Telecomunicación - Universidad Politécnica de Madrid (Spain)
Juan Manuel Montero, Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica - E. T. S. I. Telecomunicación - Universidad Politécnica de Madrid (Spain)
Julio Pastor, Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica - E. T. S. I. Telecomunicación - Universidad Politécnica de Madrid (Spain)
Ascensión Gallardo, Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica - E. T. S. I. Telecomunicación - Universidad Politécnica de Madrid (Spain)
José Manuel Pardo, Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica - E. T. S. I. Telecomunicación - Universidad Politécnica de Madrid (Spain)
Paper number 1095
Abstract:
For limited-domain tasks (e.g. airline reservation, database retrieval,
etc.), many robust understanding systems, designed for both speech and
text input, have been implemented [1][3][5][6][10] based on the Stochastic
Conceptual Finite-State paradigm (Semantic Network) or the CHRONUS
paradigm [11] (Conceptual Hidden Representation of Natural Unconstrained
Speech), which establish relations between conceptual entities through a
probabilistic graph-like structure. Using this kind of grammar to model
semantic information presents limitations, which we analysed while
implementing a flexible architecture for a robust information retrieval
system based on the same paradigm [1]. We have tried to solve some of
these limitations by integrating a set of conceptual probabilistic and
non-probabilistic grammars. This integration allows greater complexity in
the functionality of the application, such as applying non-SQL functions
to the results of SQL queries in order to retrieve information not
explicitly included in the database, and translating certain natural
spoken sentences that would otherwise produce difficult embedded queries
(and therefore allowing more natural queries), with fewer restrictions on
the relative positions of inter-concept relationships.
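One of the capabilities listed above, applying a non-SQL function to the
result of an SQL query, can be illustrated with a small sketch. The schema,
table name, and the derived trip duration are hypothetical examples, not the
authors' actual retrieval system.

```python
import sqlite3

def query_with_postprocessing(db_path, origin, destination):
    # Ordinary SQL retrieval over a (hypothetical) timetable table.
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT departure_minutes, arrival_minutes "
        "FROM timetable WHERE origin = ? AND destination = ?",
        (origin, destination),
    ).fetchall()
    conn.close()
    # Non-SQL post-processing: derive a duration that is not stored
    # explicitly in the database, so the system can answer questions
    # the schema alone cannot.
    return [
        {"departure": dep, "arrival": arr, "duration": arr - dep}
        for dep, arr in rows
    ]
```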
Authors:
Todd Ward, IBM (USA)
Salim Roukos, IBM (USA)
Chalapathy Neti, IBM (USA)
Jerome Gros, formerly of IBM (USA)
Mark Epstein, IBM (USA)
Satya Dharanipragada, IBM (USA)
Paper number 400
Abstract:
In this paper we describe our initial efforts in building a natural
language understanding (NLU) system across multiple languages. The
system allows users to switch languages seamlessly in a single session
without requiring any switch in the speech recognition system. Context-dependence
is maintained across utterances, even when the user changes languages.
Towards this end we have begun building a universal speech recognizer
for English and French. We experiment with a common phonology
for both French and English with a novel mechanism to handle language
dependent variations. Our best results so far show about 5% relative
performance degradation for English relative to a unilingual English
system and a 9% relative degradation in French relative to a unilingual
French system. The NLU system uses the same statistical understanding
algorithms for each language, making system development, maintenance,
and portability far easier than with systems custom-built for each
language.
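The idea of a common phonology with language-dependent variation can be
sketched as a shared phone inventory in which some symbols map to
language-specific realizations. The symbols below are invented examples, not
the recognizer's actual phone set.

```python
# Shared phone symbol -> language-specific realization used by the
# acoustic models; symbols without an entry fall back to the shared unit.
COMMON_PHONES = {
    "AA": {"en": "AA", "fr": "AA"},                 # identical in both languages
    "R":  {"en": "R_retroflex", "fr": "R_uvular"},  # language-dependent variant
    "UN": {"fr": "UN_nasal"},                       # used by only one language
}

def realize(pronunciation, language):
    """Map a common-phonology pronunciation to language-specific units."""
    return [COMMON_PHONES.get(p, {}).get(language, p) for p in pronunciation]

# Example: the same lexicon entry realized for each language.
print(realize(["R", "AA"], "en"))  # ['R_retroflex', 'AA']
print(realize(["R", "AA"], "fr"))  # ['R_uvular', 'AA']
```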
Authors:
Andreas Stolcke, SRI International (USA)
Elizabeth Shriberg, SRI International (USA)
Rebecca Bates, Boston University (USA)
Mari Ostendorf, Boston University (USA)
Dilek Hakkani, SRI International (USA)
Madelaine Plauche, SRI International (USA)
Gökhan Tür, SRI International (USA)
Yu Lu, SRI International (USA)
Paper number 59
Abstract:
We study the problem of detecting linguistic events at interword boundaries,
such as sentence boundaries and disfluency locations, in speech transcribed
by an automatic recognizer. Recovering such events is crucial to facilitate
speech understanding and other natural language processing tasks.
Our approach is based on a combination of prosodic cues modeled by
decision trees, and word-based event N-gram language models. Several
model combination approaches are investigated. The techniques are
evaluated on conversational speech from the Switchboard corpus. Model
combination is shown to yield a significant improvement over the
individual knowledge sources.
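One common way to combine such knowledge sources is a weighted log-linear
interpolation of the prosodic decision-tree posterior and the event N-gram
posterior at each word boundary; the paper investigates several combination
schemes, and this sketch shows only one plausible instance.

```python
import math

def combine_posteriors(p_prosody, p_lm, weight=0.5):
    """Combined posterior that a boundary carries the event (e.g. a sentence
    boundary or disfluency), given the two model posteriors."""
    eps = 1e-12  # guard against log(0)
    log_event = (weight * math.log(p_prosody + eps)
                 + (1.0 - weight) * math.log(p_lm + eps))
    log_none = (weight * math.log(1.0 - p_prosody + eps)
                + (1.0 - weight) * math.log(1.0 - p_lm + eps))
    # Renormalize over the two outcomes (event vs. no event).
    return math.exp(log_event) / (math.exp(log_event) + math.exp(log_none))
```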