Spoken Language Understanding Systems 4

ICSLP'98 Proceedings

Grammar Fragment Acquisition using Syntactic and Semantic Clustering

Authors:

Kazuhiro Arai, NTT Human Interface Laboratories (Japan)
Jeremy H. Wright, AT&T Laboratories-Research (USA)
Giuseppe Riccardi, AT&T Laboratories-Research (USA)
Allen L. Gorin, AT&T Laboratories-Research (USA)

Page (NA) Paper number 63

Abstract:

A new method is proposed for automatically acquiring Fragments to understand fluent speech. The goal of this method is to generate a collection of Fragments, each representing a set of syntactically and semantically similar phrases. First, phrases frequently observed in the training set are selected as candidates. Each candidate phrase has three associated probability distributions: of following contexts, of preceding contexts, and of associated semantic actions. The similarity between candidate phrases is measured by applying the Kullback-Leibler distance to these three probability distributions. Candidate phrases that are close in all three distances are clustered into a Fragment. Salient sequences of these Fragments are then automatically acquired and exploited by a spoken language understanding system to classify calls in AT&T's ``How May I Help You?'' task. Experimental results show that introducing the Fragments yields average and maximum improvements in call-type classification performance of 2.2% and 2.8%, respectively.
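The clustering criterion described in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation; the distributions and the threshold idea in the comments are invented toy values, and in practice a symmetrised KL distance would be computed for each of the three distributions.

```python
import math

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) over dicts of probabilities."""
    return sum(p[k] * math.log((p[k] + eps) / (q.get(k, 0.0) + eps))
               for k in p if p[k] > 0)

def sym_kl(p, q):
    """Symmetrised KL distance, so the comparison is order-independent."""
    return kl(p, q) + kl(q, p)

# Toy distributions of following contexts for two candidate phrases.
follow_a = {"please": 0.6, "thanks": 0.4}
follow_b = {"please": 0.5, "thanks": 0.5}
d = sym_kl(follow_a, follow_b)

# Two phrases would be merged into one Fragment only if they are close
# in all three distances (following contexts, preceding contexts, and
# semantic actions), e.g. d below some threshold for each distribution.
```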

SL980063.PDF (From Author) SL980063.PDF (Rasterized)



Non-Expert Access to Unification Based Speech Understanding

Authors:

Tom Brøndsted, Center for PersonKommunikation (Denmark)

Page (NA) Paper number 810

Abstract:

The paper describes the concepts behind a sub-grammar design tool being developed within the EU-funded language-engineering project REWARD. The tool is a sub-component of a general platform for designing spoken language systems and addresses dialogue designers who are non-experts in natural language processing and speech technology. Nevertheless, the tool interfaces to a powerful, "professional" unification grammar formalism that is interpreted by a corresponding natural language parser and, in a derived finite-state approximation form, is used to constrain speech recognition. The tool performs basic intersection and unification operations on feature sets to generate template-like rules complying with the unification grammar formalism.

SL980810.PDF (From Author) SL980810.PDF (Rasterized)



Natural Language Call Routing: A Robust, Self-Organizing Approach

Authors:

Bob Carpenter, Lucent Technologies Bell Laboratories (USA)
Jennifer Chu-Carroll, Lucent Technologies Bell Laboratories (USA)

Page (NA) Paper number 76

Abstract:

We have developed a domain-independent, automatically trained call router which directs customer calls based on their response to an open-ended ``How may I direct your call?'' query. Routing behavior is trained from a corpus of transcribed and hand-routed calls and then carried out using vector-based information retrieval techniques. Terms consist of sequences of morphologically reduced content words. Documents representing routing destinations consist of weighted term frequencies derived from calls to that destination in the training corpus. In this paper, we evaluate our approach in the context of a large financial services call center with thousands of possible customer activities and dozens of routing destinations. We evaluate the system's performance on ambiguous and unambiguous calls when given either accurate transcriptions or fairly noisy real-time speech recognizer output. We conclude that in a highly complex call center, our system performs at roughly the same level of accuracy as human operators.
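The vector-based routing idea can be sketched as follows. The destinations, term counts, and caller terms here are hypothetical; the actual system uses weighted term frequencies from a large corpus of transcribed calls and morphologically reduced content-word sequences as terms.

```python
import math
from collections import Counter

# Toy "documents": term frequencies for two routing destinations,
# built from (hypothetical) transcribed, hand-routed training calls.
destinations = {
    "billing": Counter({"bill": 5, "charge": 3, "payment": 2}),
    "loans":   Counter({"loan": 6, "rate": 2, "mortgage": 4}),
}

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(caller_terms):
    """Pick the destination whose term vector is closest to the query."""
    query = Counter(caller_terms)
    return max(destinations, key=lambda d: cosine(query, destinations[d]))

best = route(["charge", "bill"])  # content words from the caller's reply
```

With these toy counts the query about a charge on a bill lands on the "billing" destination; ambiguity handling and noisy-recognizer input are beyond this sketch.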

SL980076.PDF (From Author) SL980076.PDF (Rasterized)



Automatic Grammar Induction from Semantic Parsing

Authors:

Debajit Ghosh, Nuance Communications (USA)
David Goddeau, Compaq Cambridge Research Laboratory (USA)

Page (NA) Paper number 167

Abstract:

One approach to spoken language understanding converts a transcribed utterance into a semantic representation, which is then interpreted to produce a response. This can be accomplished with conventional parsing technology given a syntactic grammar and semantic composition rules. However, constructing such a grammar can be difficult and time-consuming. An alternative approach is to learn the rules from translated examples. This eliminates the need for knowledge engineering but requires the collection and annotation of the examples, which can be just as difficult. This research investigates using semantic information to learn syntax automatically. After describing a semantic parsing mechanism for parsing utterances based on meaning, we illustrate a grammar induction technique which uses the results of semantic parsing to create syntactic rules. We also present experiments which use these rules for syntactic parsing in two domains. The learned grammar covers 98% of semantically valid utterances in its original domain and 85% in a different domain.

SL980167.PDF (From Author) SL980167.PDF (Rasterized)



BTH: An Efficient Parsing Algorithm for Word-Spotting

Authors:

Yasuyuki Kono, Kansai Research Laboratories, Toshiba Corporation (Japan)
Takehide Yano, Kansai Research Laboratories, Toshiba Corporation (Japan)
Munehiko Sasajima, Kansai Research Laboratories, Toshiba Corporation (Japan)

Page (NA) Paper number 169

Abstract:

This paper presents a new parsing algorithm, BTH, which is capable of efficiently parsing a keyword lattice that contains a large number of false alarms. The BTH parser runs without unfolding the given keyword lattice, so it can efficiently obtain, as the parse result, the set of word sequences acceptable to the given grammar. The algorithm has been implemented on Windows-based PCs and tested on a car-navigation task of practically relevant scale. The results show promise for implementing spontaneous sentence-level speech understanding in next-generation car navigation systems.

SL980169.PDF (From Author) SL980169.PDF (Rasterized)



Syntax Coordination: Interaction of Discourse and Extrapositions

Authors:

Susanne Kronenberg, University of Bielefeld (Germany)
Franz Kummert, University of Bielefeld (Germany)

Page (NA) Paper number 515

Abstract:

The model presented here for parsing spoken German offers a way to process discontinuous constructions incrementally. Typical constructions in task-oriented dialogs are extrapositions to the right, defined as syntactic constructions in which a constituent is extraposed into the Nachfeld. Based on the assumption that the extraposed constituent is not part of the source sentence, the model coordinates the syntactic information given by the source sentence and the extraposed constituent in order to complete the extraposed constituent into a whole sentence. To this end, the standard LR(1) parser is extended by two additional actions. The parsing strategy derives the first part of the construction until the extraposed constituent is reached. The two new actions then enable the parser to proceed by using the syntactic information of the source sentence to complete the extraposed constituent into a sentence of its own. The repeated use of these actions guarantees that every intended reading is derived.

SL980515.PDF (From Author) SL980515.PDF (Rasterized)



Hierarchical Tag-Graph Search for Spontaneous Speech Understanding in Spoken Dialog Systems

Authors:

Bor-Shen Lin, National Taiwan University (China)
Berlin Chen, Academia Sinica (China)
Hsin-Min Wang, Academia Sinica (China)
Lin-Shan Lee, National Taiwan University and Academia Sinica (China)

Page (NA) Paper number 449

Abstract:

It has been relatively difficult to develop natural language parsers for spoken dialog systems, not only because of possible recognition errors, pauses, hesitations, out-of-vocabulary words, and grammatically incorrect sentence structures, but also because of the great effort required to develop a grammar general enough, with satisfactory coverage and flexibility, to handle different applications. In this paper, a new hierarchical graph-based search scheme with a layered structure is presented, which is shown to provide more robust and flexible spontaneous speech understanding for spoken dialog systems.

SL980449.PDF (From Author) SL980449.PDF (Rasterized)



Extraction of the Dialog Act and the Topic From Utterances in a Spoken Dialog System

Authors:

Yasuhisa Niimi, Kyoto Institute of Technology (Japan)
Noboru Takinaga, Kyoto Institute of Technology (Japan)
Takuya Nishimoto, Kyoto Institute of Technology (Japan)

Page (NA) Paper number 1138

Abstract:

This paper presents an approach to extracting dialog acts and topics from utterances in a spoken dialog system. Two knowledge sources are used to describe the dialog history: a transition network of dialog acts, and a tree of topics which might appear in communications in the domain. Dialog acts and topics are extracted through combined bottom-up and top-down analyses. Bottom-up candidates are determined by applying a set of specially designed rules to the semantic representation of an utterance; top-down candidates are determined from the current state of the dialog history. The logical AND of the bottom-up and top-down candidates decides the dialog act and topic of an utterance. The method was evaluated on a corpus of fourteen dialogs comprising 335 utterances; correct extraction rates were 85% for the topic and 82% for the dialog act.
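The combination step the abstract describes is essentially a set intersection. The candidate sets below are invented for illustration; in the actual system the bottom-up set comes from rules applied to the utterance's semantic representation and the top-down set from the transition network and topic tree.

```python
# Hypothetical candidate dialog acts; names are illustrative only.
bottom_up_acts = {"request", "confirm"}   # from rules on the semantic form
top_down_acts  = {"confirm", "inform"}    # from the dialog-history state

# The logical AND (set intersection) of the two candidate sets
# decides the dialog act; topic candidates are combined the same way.
decided = bottom_up_acts & top_down_acts
```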

SL981138.PDF (From Author) SL981138.PDF (Rasterized)



Fast Computation of Maximum Entropy / Minimum Divergence Feature Gain

Authors:

Harry Printz, IBM (USA)

Page (NA) Paper number 1129

Abstract:

Maximum entropy / minimum divergence modeling is a powerful technique for constructing probability models, which has been applied to a variety of problems in natural language processing. A maximum entropy / minimum divergence (MEMD) model is built from a base model, and a set of feature functions, also called simply features, whose empirical expectations on some training corpus are known. A fundamental difficulty with this technique is that while there are typically millions of features that could be incorporated into a given model, in general it is not computationally feasible, or even desirable, to use them all. Thus some means must be devised for determining each feature's predictive power, also known as its gain. Once the gains are known, the features can be ranked according to their utility, and only the most gainful ones retained. This paper presents a new algorithm for computing feature gain that is fast, accurate and memory-efficient.

SL981129.PDF (From Author)



Stochastic Language Models for Speech Recognition and Understanding

Authors:

Giuseppe Riccardi, AT&T-Labs Research (USA)
Allen L. Gorin, AT&T-Labs Research (USA)

Page (NA) Paper number 111

Abstract:

Stochastic language models for speech recognition have traditionally been designed and evaluated to optimize word accuracy. In this work, we present a novel framework for training stochastic language models by optimizing two different criteria appropriate for speech recognition and language understanding. First, language entropy and a "salience" measure are used to learn the "relevant" spoken language features (phrases). Second, a novel algorithm for training stochastic finite state machines is presented which incorporates the acquired phrase structure into a single stochastic language model. Third, we show the benefit of our framework with an end-to-end evaluation of a large-vocabulary spoken language system for call routing.

SL980111.PDF (From Author) SL980111.PDF (Rasterized)



Linguistically Engineered Tools for Speech Recognition Error Analysis

Authors:

Carol Van Ess-Dykema, U.S. Department of Defense (USA)
Klaus Ries, Carnegie Mellon University (USA)

Page (NA) Paper number 787

Abstract:

In order to improve Large Vocabulary Continuous Speech Recognition (LVCSR) systems, it is essential to discover exactly how our current systems are underperforming. The major intellectual tool for solving this problem is error analysis: careful investigation of just which factors are contributing to errors in the recognizers. This paper presents our observations of the effects that discourse (i.e., dialog) modeling has on LVCSR system performance. As our title indicates, we emphasize the recognition error analysis methodology we developed and what it showed us, as opposed to emphasizing the development of the discourse model itself. In a first analysis of our output data, we focused on errors that could be eliminated by Dialog Act discourse tagging using Dialog Act-specific language models. In a second analysis, we manipulated the parameterization of the Dialog Act-specific language models, enabling us to acquire evidence of the constraints these models introduced. The word error rate did not decrease significantly, because the error rate in the largest category of Dialog Acts, namely Statements, did not decrease significantly. We did, however, observe significant error reduction in the less frequently occurring Dialog Acts, and we report on the characteristics of the error corrections. We discovered that discourse models can introduce simple syntactic constraints and that they are most sensitive to parts of speech.

SL980787.PDF (From Author) SL980787.PDF (Rasterized)



Estimating Entropy of a Language from Optimal Word Insertion Penalty

Authors:

Kazuya Takeda, Nagoya University (Japan)
Atsunori Ogawa, Nagoya University (Japan)
Fumitada Itakura, Nagoya University (Japan)

Page (NA) Paper number 456

Abstract:

The relationship between the optimal value of the word insertion penalty and the entropy of the language is discussed, based on the hypothesis that the optimal word insertion penalty compensates for the difference between the probability given by a language model and the true probability. It is shown that the optimal word insertion penalty can be calculated as the difference between the test-set entropy of the given language model and the true entropy of the given test-set sentences. The correctness of this idea is confirmed through recognition experiments, in which the entropy of the given set of sentences is estimated from two different language models and the word insertion penalty is optimized for each language model.
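The claimed relationship can be illustrated with a toy calculation. The per-sentence log-probabilities and word count below are invented, and the true distribution is assumed known, which it never is in practice; the point is only the arithmetic of "penalty equals entropy gap."

```python
def entropy_per_word(log2_probs, n_words):
    """Cross-entropy per word (bits) from total log2 probabilities."""
    return -sum(log2_probs) / n_words

# Hypothetical per-sentence log2 probabilities under a language model
# and under the (assumed-known) true distribution of the test set.
lm_logps   = [-42.0, -37.5]   # log2 P_LM(sentence)
true_logps = [-35.0, -31.5]   # log2 P_true(sentence)
n = 20                        # total words in the test set

h_lm   = entropy_per_word(lm_logps, n)    # test-set entropy of the LM
h_true = entropy_per_word(true_logps, n)  # true entropy of the sentences

# Per the paper's hypothesis, the optimal word insertion penalty
# compensates for the gap between the two entropies.
optimal_wip = h_lm - h_true
```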

SL980456.PDF (From Author) SL980456.PDF (Rasterized)



A Linguistic Analysis of Repair Signals in Co-operative Spoken Dialogues

Authors:

Shu-Chuan Tseng, LiLi, University of Bielefeld (Germany)

Page (NA) Paper number 993

Abstract:

This paper presents results of a corpus-based analysis of speech repairs, investigating repair signals which mark the existence of possible repairs. Dividing speech repairs into three parts (the erroneous part, the editing term, and the correction), this paper provides empirical evidence supporting the notion that speech repairs are produced in a rather regular syntactic pattern. Phrases seem to play a particular role in the production of speech repairs, as phrasal boundaries frequently correspond to boundaries within or around repairs. Related acoustic-prosodic features highlighting the internal structure of repairs, including F0, duration, and tonal patterns, are also examined and discussed with respect to specific syntactic patterns.

SL980993.PDF (From Author) SL980993.PDF (Rasterized)



A Hierarchical Language Model for CSR

Authors:

Francisco J. Valverde-Albacete, Dpto.Tec.Comms. Univ.Carlos III de Madrid (Spain)
José Manuel Pardo, Gr.Tec.Habla. Dep.Ing.Electrónica. UPM (Spain)

Page (NA) Paper number 587

Abstract:

In this paper, we present a new language model that includes some of the most promising techniques for overcoming linguistic inadequacy (POS tagging and refinement; hierarchical, locally conditioned grammars; parallel modelling of the acoustic and linguistic domains) and some of our own: language modelling as language parsing, and a tighter integration of the whole process with the acoustic model, resulting in a richer educt from the language modelling process. We are building this model for a Spanish translation of the DARPA RM task, maintaining the same 1k-word vocabulary and some 1000 sentences.

SL980587.PDF (From Author) SL980587.PDF (Rasterized)



Spoken Language Understanding Within Dialogs Using a Graphical Model of Task Structure

Authors:

Jeremy H. Wright, AT&T Labs - Research (USA)
Allen L. Gorin, AT&T Labs - Research (USA)
Alicia Abella, AT&T Labs - Research (USA)

Page (NA) Paper number 385

Abstract:

We describe a procedure for contextual interpretation of spoken sentences within dialogs. Task structure is represented in a graphical form, enabling the interpreter algorithm to be efficient and task-independent. Recognized spoken input may consist either of a single sentence with utterance-verification scores, or of a word lattice with arc weights. A confidence model is used throughout and all inferences are probability-weighted. The interpretation consists of a probability for each class and for each auxiliary information label needed for task completion. Anaphoric references are permitted.

SL980385.PDF (From Author) SL980385.PDF (Rasterized)



Keyword Extraction of Radio News using Domain Identification based on Categories of an Encyclopedia

Authors:

Yoshimi Suzuki, Yamanashi University (Japan)
Fumiyo Fukumoto, Yamanashi University (Japan)
Yoshihiro Sekiguchi, Yamanashi University (Japan)

Page (NA) Paper number 739

Abstract:

In this paper, we propose a keyword extraction method for the dictation of radio news covering several domains. In our method, newspaper articles automatically classified into suitable domains are used to calculate feature vectors. The feature vectors capture term-domain interdependence and are used to select a suitable domain for each part of the radio news. Keywords are then extracted using the selected domain. Keyword extraction experiments showed that our method is effective for radio news.

SL980739.PDF (From Author) SL980739.PDF (Rasterized)
