Authors:
Kazuhiro Arai, NTT Human Interface Laboratories (Japan)
Jeremy H. Wright, AT&T Laboratories-Research (USA)
Giuseppe Riccardi, AT&T Laboratories-Research (USA)
Allen L. Gorin, AT&T Laboratories-Research (USA)
Paper number 63
Abstract:
A new method is proposed for automatically acquiring Fragments to understand
fluent speech. The goal of this method is to generate a collection
of Fragments, each representing a set of syntactically and semantically
similar phrases. First, phrases frequently observed in the training
set are selected as candidates. Each candidate phrase has three associated
probability distributions: over following contexts, preceding contexts,
and associated semantic actions. The similarity between candidate phrases
is measured by applying the Kullback-Leibler distance to these three
probability distributions. Candidate phrases that are close in all
three distances are clustered into a Fragment. Salient sequences of
these Fragments are then automatically acquired and exploited by a
spoken language understanding system to classify calls in AT&T's ``How
May I Help You?'' task. The experimental results show that introducing
the Fragments yields average and maximum improvements in call-type
classification performance of 2.2% and 2.8%, respectively.
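As a rough illustration of the clustering criterion (a minimal Python sketch
with assumed names and a single shared threshold; the paper's exact distance
combination is not reproduced), candidate phrases can be compared via a
symmetrized Kullback-Leibler distance on each of the three distributions:

    import math

    def sym_kl(p, q, eps=1e-12):
        """Symmetrized Kullback-Leibler distance between two discrete
        distributions given as {event: probability} dicts."""
        events = set(p) | set(q)
        return sum(p.get(e, eps) * math.log(p.get(e, eps) / q.get(e, eps)) +
                   q.get(e, eps) * math.log(q.get(e, eps) / p.get(e, eps))
                   for e in events)

    def same_fragment(a, b, threshold):
        """Cluster two candidate phrases only if they are close in all
        three distributions: following contexts, preceding contexts,
        and associated semantic actions."""
        return all(sym_kl(a[k], b[k]) < threshold
                   for k in ("following", "preceding", "semantics"))
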
Authors:
Tom Brøndsted, Center for PersonKommunikation (Denmark)
Paper number 810
Abstract:
The paper describes the concepts behind a subgrammar design tool being
developed within the EU-funded language-engineering project REWARD.
The tool is a sub-component of a general platform for designing spoken
language systems and addresses dialogue designers who are non-experts
in natural language processing and speech technology. Nevertheless, the tool
interfaces to a powerful, "professional" unification grammar formalism
that is interpreted by a corresponding natural language parser and that,
in a derived finite-state approximation, is used to constrain
speech recognition. The tool performs some basic intersection and unification
operations on feature sets to generate template-like rules complying
with the unification grammar formalism.
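As a minimal analogy to these operations (assumed representation: flat
feature sets as Python dicts; the tool's actual unification grammar formalism
is typed and recursive), unification can be sketched as follows:

    def unify(fs1, fs2):
        """Unify two flat feature sets (dicts); return None on a value
        clash.  Real unification grammars use recursive, typed feature
        structures, so this is only an analogy."""
        result = dict(fs1)
        for feat, val in fs2.items():
            if feat in result and result[feat] != val:
                return None                      # clash: unification fails
            result[feat] = val
        return result

    # Example: agreement features of a determiner and a noun
    print(unify({"num": "sg", "case": "nom"}, {"num": "sg", "gender": "n"}))
    # -> {'num': 'sg', 'case': 'nom', 'gender': 'n'}
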
Authors:
Bob Carpenter, Lucent Technologies Bell Laboratories (USA)
Jennifer Chu-Carroll, Lucent Technologies Bell Laboratories (USA)
Paper number 76
Abstract:
We have developed a domain-independent, automatically trained call
router which directs customer calls based on their response to an open-ended
``How may I direct your call?'' query. Routing behavior is trained
from a corpus of transcribed and hand-routed calls and then carried
out using vector-based information retrieval techniques. Terms consist
of sequences of morphologically reduced content words. Documents representing
routing destinations consist of weighted term frequencies derived from
calls to that destination in the training corpus. In this paper, we
evaluate our approach in the context of a large financial services
call center with thousands of possible customer activities and dozens
of routing destinations. We evaluate the system's performance on ambiguous
and unambiguous calls when given either accurate transcriptions or
fairly noisy real-time speech recognizer output. We conclude that
in a highly complex call center, our system performs at roughly the
same level of accuracy as human operators.
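A minimal sketch of such vector-based routing (names and weighting are
simplifications, not the paper's exact term weighting or morphological
processing):

    import math
    from collections import Counter

    def cosine(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        norm = (math.sqrt(sum(x * x for x in u.values())) *
                math.sqrt(sum(x * x for x in v.values())))
        return dot / norm if norm else 0.0

    def route(caller_terms, destination_vectors):
        """Pick the destination whose weighted term-frequency vector is
        closest, in cosine similarity, to the caller's term vector."""
        query = Counter(caller_terms)
        return max(destination_vectors,
                   key=lambda d: cosine(query, destination_vectors[d]))
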
Authors:
Debajit Ghosh, Nuance Communications (USA)
David Goddeau, Compaq Cambridge Research Laboratory (USA)
Paper number 167
Abstract:
One approach to spoken language understanding converts a transcribed
utterance into a semantic representation, which is then interpreted
to produce a response. This can be accomplished with conventional
parsing technology given a syntactic grammar and semantic composition
rules. However, constructing such a grammar can be difficult and time-consuming.
An alternative approach is to learn the rules from translated examples.
This eliminates the need for knowledge engineering but requires the
collection and annotation of the examples, which can be equally difficult.
This research investigates using semantic information to learn syntax
automatically. After describing a semantic parsing mechanism that parses
utterances based on meaning, we illustrate a grammar induction technique
that uses the results of semantic parsing to create syntactic rules. We
also present syntactic parsing experiments that use these rules
in two domains. The learned grammar covers 98% of semantically valid
utterances in its original domain and 85% in a different domain.
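One way to picture the rule-creation step, as a rough sketch under assumed
tree representations rather than the authors' algorithm, is to read syntactic
productions directly off the bracketings the semantic parser assigns:

    def rules_from_tree(tree, rules=None):
        """Collect context-free productions from a parse tree given as
        (label, [children]) tuples; leaves are plain strings."""
        if rules is None:
            rules = set()
        label, children = tree
        rules.add((label, tuple(c if isinstance(c, str) else c[0]
                                for c in children)))
        for c in children:
            if not isinstance(c, str):
                rules_from_tree(c, rules)
        return rules

    # A semantically derived analysis of "show flights to boston"
    t = ("REQUEST", [("SHOW", ["show"]),
                     ("FLIGHTS", ["flights", ("DEST", ["to", "boston"])])])
    print(rules_from_tree(t))
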
Authors:
Yasuyuki Kono, Kansai Research Laboratories, Toshiba Corporation (Japan)
Takehide Yano, Kansai Research Laboratories, Toshiba Corporation (Japan)
Munehiko Sasajima, Kansai Research Laboratories, Toshiba Corporation (Japan)
Paper number 169
Abstract:
This paper presents a new parsing algorithm, BTH, which is capable
of efficiently parsing a keyword lattice that contains a large number
of false alarms. The BTH parser runs without unfolding the given keyword
lattice, so that it can efficiently obtain a set of word sequences
acceptable to the given grammar as the parsing result. The algorithm
has been implemented on Windows-based PCs and tested by applying
it to a car navigation task of practical scale.
The results show promise for implementing spontaneous
speech understanding of sentence utterances in next-generation car
navigation systems.
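The BTH algorithm itself is not reproduced here, but the general idea of
parsing a lattice without unfolding it can be sketched as propagating
grammar states along lattice arcs (assumed interfaces; an acyclic lattice
with topologically numbered nodes):

    def parse_lattice(arcs, n_nodes, grammar_step, start_state, accepting):
        """Propagate grammar states over a keyword lattice without
        enumerating paths.  arcs: (from_node, to_node, word) triples over
        an acyclic lattice whose nodes are numbered in topological order,
        from 0 to n_nodes - 1.  grammar_step(state, word) returns the next
        grammar state or None; false-alarm keywords simply fail to extend
        any state."""
        states = {0: {start_state: [[]]}}  # node -> grammar state -> sequences
        for src, dst, word in sorted(arcs):
            for s, seqs in list(states.get(src, {}).items()):
                nxt = grammar_step(s, word)
                if nxt is not None:
                    bucket = states.setdefault(dst, {}).setdefault(nxt, [])
                    bucket.extend(seq + [word] for seq in seqs)
        return states.get(n_nodes - 1, {}).get(accepting, [])
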
Authors:
Susanne Kronenberg, University of Bielefeld (Germany)
Franz Kummert, University of Bielefeld (Germany)
Paper number 515
Abstract:
The model presented here for parsing spoken German offers a way to
process discontinuous constructions incrementally. Typical constructions
in task-oriented dialogs are extrapositions to the right. These are
defined as syntactic constructions in which a constituent is extraposed
into the Nachfeld. Based on the assumption that the extraposed constituent
is not part of the source sentence, the model works by coordinating
the syntactic information given by the source sentence and the extraposed
constituent, completing the extraposed constituent into a whole sentence.
To this end, the standard LR(1) parser is extended by two additional
actions. The parsing strategy derives the first part of the
construction until the extraposed constituent is reached. The two
new actions then enable the parser to proceed, using the syntactic
information of the source sentence to complete the extraposed constituent
into a sentence of its own. The repeated use of these actions guarantees
that every intended reading is derived.
Authors:
Bor-Shen Lin, National Taiwan University (China)
Berlin Chen, Academia Sinica (China)
Hsin-Min Wang, Academia Sinica (China)
Lin-Shan Lee, National Taiwan University and Academia Sinica (China)
Paper number 449
Abstract:
It has been relatively difficult to develop natural language parsers
for spoken dialog systems, not only because of possible recognition
errors, pauses, hesitations, out-of-vocabulary words, and grammatically
incorrect sentence structures, but also because of the great effort required
to develop a general enough grammar with satisfactory coverage and
flexibility to handle different applications. In this paper, a new
hierarchical graph-based search scheme with layered structure is presented,
which is shown to provide more robust and flexible spontaneous speech
understanding for spoken dialog systems.
Authors:
Yasuhisa Niimi, Kyoto Institute of Technology (Japan)
Noboru Takinaga, Kyoto Institute of Technology (Japan)
Takuya Nishimoto, Kyoto Institute of Technology (Japan)
Paper number 1138
Abstract:
This paper presents an approach to extracting dialog acts
and topics from utterances in a spoken dialog system. Two knowledge
sources are used to describe the dialog history. One is a transition
network of dialog acts and the other is a tree of topics which might
appear in domain communications. Dialog acts and topics are extracted
through bottom-up and top-down analyses. Bottom-up candidates are
decided by applying a set of specially designed rules to the semantic
representation of an utterance, and top-down candidates by using the
current state of the dialog history. The logical AND of the bottom-up
and top-down candidate sets is taken to decide the dialog act and topic
of an utterance. This method was examined on a corpus of fourteen
dialogs comprising 335 utterances. Correct extraction rates were 85%
for the topic and 82% for the dialog act.
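The combination step amounts to a set intersection; a minimal sketch (the
rule sets and dialog-history model themselves are the paper's contribution
and are not shown):

    def decide(bottom_up, top_down):
        """Take the logical AND (set intersection) of bottom-up and
        top-down candidates to fix the dialog act and topic."""
        return (bottom_up["acts"] & top_down["acts"],
                bottom_up["topics"] & top_down["topics"])

    print(decide({"acts": {"request", "confirm"}, "topics": {"fare"}},
                 {"acts": {"request"}, "topics": {"fare", "schedule"}}))
    # -> ({'request'}, {'fare'})
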
Authors:
Harry Printz, IBM (USA)
Paper number 1129
Abstract:
Maximum entropy / minimum divergence modeling is a powerful technique
for constructing probability models, which has been applied to a variety
of problems in natural language processing. A maximum entropy / minimum
divergence (MEMD) model is built from a base model, and a set of feature
functions, also called simply features, whose empirical expectations
on some training corpus are known. A fundamental difficulty with this
technique is that while there are typically millions of features that
could be incorporated into a given model, in general it is not computationally
feasible, or even desirable, to use them all. Thus some means must
be devised for determining each feature's predictive power, also known
as its gain. Once the gains are known, the features can be ranked according
to their utility, and only the most gainful ones retained. This paper
presents a new algorithm for computing feature gain that is fast, accurate
and memory-efficient.
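For orientation, in the simplified unconditional case with a single binary
feature the gain has a closed form: the binary Kullback-Leibler divergence
between the empirical and base rates. This is a textbook special case, not
the paper's algorithm, which must handle conditional models and millions of
candidate features:

    import math

    def single_feature_gain(p_emp, q_base):
        """Log-likelihood gain (nats per event) of adding one binary
        feature to a base model in an unconditional setting; p_emp and
        q_base are the feature's empirical and base-model expectations,
        both strictly between 0 and 1."""
        p, q = p_emp, q_base
        return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

    print(single_feature_gain(0.30, 0.10))   # feature fires 3x its base rate
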
Authors:
Giuseppe Riccardi, AT&T-Labs Research (USA)
Allen L. Gorin, AT&T-Labs Research (USA)
Paper number 111
Abstract:
Stochastic language models for speech recognition have traditionally
been designed and evaluated in order to optimize word accuracy. In
this work, we present a novel framework for training stochastic language
models by optimizing two different criteria appropriate for speech
recognition and language understanding. First, the language entropy
and "salience" measure are used for learning the "relevant" spoken
language features (phrases). Second, a novel algorithm for training
stochastic finite state machines is presented which incorporates the
acquired phrase structure into a single stochastic language model.
Third, we show the benefit of our framework with an end-to-end
evaluation of a large vocabulary spoken language system for call routing.
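The effect of incorporating acquired phrases can be pictured with a toy
preprocessing step (purely illustrative; the paper builds phrases into a
single stochastic finite-state model rather than rewriting training text):

    def apply_phrases(tokens, phrases):
        """Greedily replace acquired multi-word phrases (longest listed
        first) with compound tokens, so a language model can treat each
        phrase as a single unit."""
        out, i = [], 0
        while i < len(tokens):
            for p in phrases:
                if tokens[i:i + len(p)] == list(p):
                    out.append("_".join(p))
                    i += len(p)
                    break
            else:
                out.append(tokens[i])
                i += 1
        return out

    phrases = [("how", "may", "i", "help", "you"), ("calling", "card")]
    print(apply_phrases("hi how may i help you".split(), phrases))
    # -> ['hi', 'how_may_i_help_you']
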
Authors:
Carol Van Ess-Dykema, U.S. Department of Defense (USA)
Klaus Ries, Carnegie Mellon University (USA)
Paper number 787
Abstract:
In order to improve Large Vocabulary Continuous Speech Recognition
(LVCSR) systems, it is essential to discover exactly how our current
systems are underperforming. The major intellectual tool for solving
this problem is error analysis: careful investigation of just which
factors are contributing to errors in the recognizers. This paper
presents our observations of the effects that discourse (i.e., dialog)
modeling has on LVCSR system performance. As our title indicates,
we emphasize the recognition error analysis methodology we developed
and what it showed us as opposed to emphasizing development of the
discourse model itself. In the first analysis of our output data,
we focused on errors that could be eliminated by Dialog Act discourse
tagging using Dialog Act-specific language models. In a second analysis,
we manipulated the parameterization of the Dialog Act-specific language
models, enabling us to acquire evidence of the constraints these models
introduced. The word error rate did not significantly decrease since
the error rate in the largest category of Dialog Acts, namely Statements,
did not significantly decrease. We did, however, observe significant
error reduction in the less frequently occurring Dialog Acts, and we
report on the characteristics of the error corrections. We discovered
that discourse models can introduce simple syntactic constraints and
that they are most sensitive to parts of speech.
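A rough sketch of how Dialog Act-specific language models can be applied to
recognizer output (hypothetical interfaces, not the actual systems used in
the paper):

    import math

    def rescore(hypotheses, dialog_act, lm_by_act, lm_general, lam=0.7):
        """Pick the recognizer hypothesis favored by a Dialog Act-specific
        language model interpolated with a general one; each lm(h) returns
        the log-probability of word string h."""
        def score(h):
            p_act = math.exp(lm_by_act[dialog_act](h))
            p_gen = math.exp(lm_general(h))
            return math.log(lam * p_act + (1 - lam) * p_gen)
        return max(hypotheses, key=score)
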
Authors:
Kazuya Takeda, Nagoya University (Japan)
Atsunori Ogawa, Nagoya University (Japan)
Fumitada Itakura, Nagoya University (Japan)
Paper number 456
Abstract:
The relationship between the optimal value of the word insertion penalty
and the entropy of the language is discussed, based on the hypothesis that
the optimal word insertion penalty compensates for the difference between
the probability given by a language model and the true probability. It is
shown that the optimal word insertion penalty can be calculated as the
difference between the test-set entropy of the given language model and
the true entropy of the given test-set sentences. The correctness of this
idea is confirmed through recognition experiments, in which the entropy
of a given set of sentences is estimated from two different language
models and the word insertion penalty is optimized for each language model.
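Under this hypothesis the optimal penalty takes a simple form; a minimal
sketch in bits per word, with illustrative numbers:

    import math

    def test_set_entropy(word_logprobs):
        """Test-set entropy in bits/word from an LM's per-word natural-log
        probabilities on the test sentences."""
        return -sum(word_logprobs) / (len(word_logprobs) * math.log(2))

    def optimal_penalty(lm_test_entropy, true_entropy):
        """Optimal word insertion penalty (bits/word), per the hypothesis:
        test-set entropy of the LM minus true entropy of the sentences."""
        return lm_test_entropy - true_entropy

    # Example: LM test-set entropy 8.5 bits/word, true entropy 6.2 bits/word
    print(optimal_penalty(8.5, 6.2))   # -> 2.3 bits/word
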
Authors:
Shu-Chuan Tseng, LiLi, University of Bielefeld (Germany)
Paper number 993
Abstract:
This paper presents results of a corpus-based analysis of speech repairs,
investigating repair signals which mark the existence of possible repairs.
Dividing speech repairs into three parts (the erroneous part, the editing
term, and the correction), this paper provides empirical evidence supporting
the notion that speech repairs are produced in a rather regular syntactic
pattern. Phrases seem to play a particular role in the production of
speech repairs, as phrasal boundaries frequently correspond to boundaries
within or around repairs. Related acoustic-prosodic features highlighting
the internal structure of repairs, including F0, duration, and tonal
patterns are also examined and discussed with respect to specific syntactic
patterns.
Authors:
Francisco J. Valverde-Albacete, Dpto.Tec.Comms. Univ.Carlos III de Madrid (Spain)
José Manuel Pardo, Gr.Tec.Habla. Dep.Ing.Electrónica. UPM (Spain)
Paper number 587
Abstract:
In this paper, we present a new language model that includes some of
the most promising techniques for overcoming linguistic inadequacy
(POS tagging and refining; hierarchical, locally conditioned grammars;
parallel modelling of the acoustic and linguistic domains) and
some of our own: language modelling as language parsing, and a better
integration of the whole process with the acoustic model, resulting
in a richer educt from the language modelling process. We are building
this model for a translation into Spanish of the DARPA RM task, maintaining
the same 1k-word vocabulary and some 1000 sentences.
Authors:
Jeremy H. Wright, AT&T Labs - Research (USA)
Allen L. Gorin, AT&T Labs - Research (USA)
Alicia Abella, AT&T Labs - Research (USA)
Paper number 385
Abstract:
We describe a procedure for contextual interpretation of spoken sentences
within dialogs. Task structure is represented in a graphical form,
enabling the interpreter algorithm to be efficient and task-independent.
Recognized spoken input may consist either of a single sentence with
utterance-verification scores, or of a word lattice with arc weights.
A confidence model is used throughout and all inferences are probability-weighted.
The interpretation consists of a probability for each class and for
each auxiliary information label needed for task completion. Anaphoric
references are permitted.
Authors:
Yoshimi Suzuki, Yamanashi University (Japan)
Fumiyo Fukumoto, Yamanashi University (Japan)
Yoshihiro Sekiguchi, Yamanashi University (Japan)
Paper number 739
Abstract:
In this paper, we propose a keyword extraction method for the dictation
of radio news that spans several domains. In our method, newspaper
articles that are automatically classified into suitable domains are
used to calculate feature vectors. The feature vectors capture
term-domain interdependence and are used to select a suitable domain
for each part of the radio news. Keywords are then extracted using the
selected domain. Keyword extraction experiments showed that our method
is effective for radio news.
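A rough sketch of the domain-selection step (illustrative; the paper's
term-domain feature weighting differs):

    import math
    from collections import Counter

    def select_domain(segment_words, domain_vectors):
        """Choose the domain whose term-domain feature vector best matches
        one news segment (cosine similarity); keywords are then extracted
        using the selected domain's vector."""
        query = Counter(segment_words)

        def cos(u, v):
            dot = sum(u[t] * v.get(t, 0.0) for t in u)
            n = (math.sqrt(sum(x * x for x in u.values())) *
                 math.sqrt(sum(x * x for x in v.values())))
            return dot / n if n else 0.0

        return max(domain_vectors, key=lambda d: cos(query, domain_vectors[d]))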