Session T1A: Keyword and Topic Spotting

Chairperson: Joseph Mariani, LIMSI-CNRS, France



KEY-PHRASE SPOTTING USING AN INTEGRATED LANGUAGE MODEL OF N-GRAMS AND FINITE-STATE GRAMMAR

Authors: Qiguang Lin, Dave Lubensky, Michael Picheny, P. Srinivasa Rao

IBM Watson Research Center, Human Language Technologies Group, P.O. Box 218, Yorktown Heights, NY 10598, USA. Email: qlin@watson.ibm.com

Volume 1 pages 255 - 258

ABSTRACT

This paper describes a new algorithm for key-phrase spotting applications. The algorithm consists of three processes. The first is to synergistically integrate N-grams with finite-state grammars (FSG), the two conventional language models (LM) for speech recognition: all the key phrases to be spotted are covered by the FSG component of the recognizer's LM, while the N-grams are used for decoding the surrounding non-key phrases. The second is selective weighting, in which weighting parameters independently control the triggering and completion of the FSG on top of the N-grams. The third is a word confirmation and rejection logic that determines whether to accept or reject a hypothesized key phrase. The proposed algorithm has been favorably evaluated in two separate experiments. In these experiments, only the FSG part of the LM needs to be updated for different application tasks, while the N-gram part can remain unchanged.
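
To make the integration idea concrete, here is a minimal Python sketch, not the authors' implementation: key phrases are matched by a toy FSG (a flat phrase list), everything else is scored by a toy bigram LM, and two hypothetical weights stand in for the paper's trigger and completion controls.

```python
# Illustrative sketch (not the paper's algorithm or data): combining an FSG
# for key phrases with a bigram LM for the surrounding non-key words.
KEY_PHRASES = [("collect", "call"), ("calling", "card")]   # toy FSG as phrase list
BIGRAM_LOGP = {("i", "want"): -1.2, ("want", "to"): -0.9,
               ("to", "make"): -1.5, ("make", "a"): -1.1}   # toy bigram scores
UNSEEN_LOGP = -6.0          # back-off score for unseen bigrams
TRIGGER_WEIGHT = -2.0       # penalty for entering the FSG (controls false triggers)
COMPLETION_WEIGHT = +1.0    # bonus for completing a whole key phrase

def score(words):
    """Return a combined log score; key phrases are matched by the FSG,
    everything else falls back to the bigram LM."""
    total, i = 0.0, 0
    while i < len(words):
        phrase = next((p for p in KEY_PHRASES
                       if tuple(words[i:i + len(p)]) == p), None)
        if phrase:                                   # FSG branch
            total += TRIGGER_WEIGHT + COMPLETION_WEIGHT
            i += len(phrase)
        else:                                        # N-gram branch
            if i > 0:
                total += BIGRAM_LOGP.get((words[i - 1], words[i]), UNSEEN_LOGP)
            i += 1
    return total

print(score("i want to make a collect call".split()))
```

Because the key phrases live only in the FSG part, swapping in a new application task would mean replacing KEY_PHRASES while the bigram scores stay untouched, which mirrors the claim in the abstract.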

A0007.pdf



EFFICIENT METHODS FOR DETECTING KEYWORDS IN CONTINUOUS SPEECH

Authors: Jochen Junkawitsch (1), Günther Ruske (2), Harald Höge (1)

(1) Siemens AG, Otto-Hahn-Ring 6, D-81730 Munich, Germany; (2) Institute for Human-Machine Communication, Munich University of Technology, Germany. Email: Jochen.Junkawitsch@mchp.siemens.de

Volume 1 pages 259 - 262

ABSTRACT

This paper describes our work on algorithms for detecting keywords in continuous speech. Two different approaches to defining confidence measures are introduced; an advantage of these definitions is that they can be computed theoretically, without elaborate tuning. Moreover, two distinct decoding algorithms that incorporate these confidence measures into the search procedure are presented. One is a new way of detecting keywords in continuous speech using the standard Viterbi algorithm without modeling the non-keyword parts of the utterance; the other is a further development of the algorithm described in [1], which likewise does not require modeling the non-keyword parts.
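
The following Python sketch illustrates the general idea of a filler-free keyword detector with a length-normalized confidence; the acoustic scores, the threshold, and the particular normalization are invented for the example and are not the confidence measures defined in the paper.

```python
# Illustrative sketch: run a left-to-right Viterbi pass for the keyword model
# only and turn the best path score into a per-frame confidence, so no
# explicit non-keyword (filler) model is needed.
import numpy as np

def keyword_confidence(frame_logp):
    """frame_logp: (T, S) array of per-frame log-likelihoods for the S states
    of the keyword HMM. Returns the best per-frame score of any alignment that
    traverses the states left to right (a simple confidence measure)."""
    T, S = frame_logp.shape
    dp = np.full((T, S), -np.inf)
    dp[0, 0] = frame_logp[0, 0]
    for t in range(1, T):
        for s in range(S):
            stay = dp[t - 1, s]
            advance = dp[t - 1, s - 1] if s > 0 else -np.inf
            dp[t, s] = max(stay, advance) + frame_logp[t, s]
    return dp[-1, -1] / T           # length normalization

rng = np.random.default_rng(0)
scores = rng.normal(-2.0, 0.5, size=(30, 5))   # fake acoustic log-likelihoods
conf = keyword_confidence(scores)
print("detected" if conf > -2.2 else "rejected", round(conf, 3))
```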

A0036.pdf



PROVIDING SUBLEXICAL CONSTRAINTS FOR WORD SPOTTING WITHIN THE ANGIE FRAMEWORK

Authors: Raymond Lau and Stephanie Seneff

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. http://www.sls.lcs.mit.edu, email: {raylau, seneff}@mit.edu

Volume 1 pages 263 - 266

ABSTRACT

We describe our recent work in implementing a word-spotting system based on the ANGIE framework and the effects of varying the nature of the sublexical constraints placed upon the word-spotter's filler model. ANGIE is a framework for modelling speech in which the morphological and phonological substructures of words are jointly characterized by a context-free grammar and are represented in a multi-layered hierarchical structure. In this representation, the upper layers capture syllabification, morphology, and stress, the preterminal layer represents phonemics, and the bottom terminal categories are the phones. ANGIE provides a flexible framework in which we can explore the effects of sublexical constraints within a word-spotting environment. Our experiments with spotting city names in ATIS validate the intuition that increasing the constraints present in the model improves performance, from 85.3 FOM for a phone bigram to 89.3 FOM for a word lexicon. They also empirically strengthen our belief that ANGIE provides a feasible framework for various speech recognition tasks, of which word-spotting is one.
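
As a rough illustration of the kind of layered representation the abstract describes, here is a small Python sketch; the layer names follow the abstract, but the example word, its decomposition, and the phone-set labels are invented, and the real ANGIE grammar is considerably richer.

```python
# Illustrative sketch of a multi-layer sublexical hierarchy (invented example,
# not an actual ANGIE lexical entry).
WORD = {
    "word": "boston",
    "morphology": ["bos+", "ton"],                 # stressed root + second unit
    "syllables": [
        {"stress": "primary",    "phonemes": ["b", "aa", "s"]},
        {"stress": "unstressed", "phonemes": ["t", "ax", "n"]},
    ],
}

def phonemes(entry):
    """Flatten the hierarchy down to the preterminal (phonemic) layer."""
    return [p for syl in entry["syllables"] for p in syl["phonemes"]]

def phone_bigrams(phones):
    """The weakest filler constraint compared in the experiments: phone bigrams."""
    return list(zip(phones, phones[1:]))

print(phonemes(WORD))                 # ['b', 'aa', 's', 't', 'ax', 'n']
print(phone_bigrams(phonemes(WORD)))
```

The point of the sketch is only to show the spectrum of constraint the experiments vary: a filler model can be as loose as phone bigrams over the flattened phonemic layer or as tight as whole entries from a word lexicon.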

A0037.pdf



Usefulness of phonetic parameters in a rejection procedure of an HMM-based speech recognition system

Authors: K. Bartkova & D. Jouvet

France Télécom, CNET/DIH/RCP, Technopole Anticipa, 2 Avenue P. Marzin, 22307 Lannion Cedex, France. Tel: (33)-2-96-05-10-58, Fax: (33)-2-96-05-35-30, e-mail: bartkova@lannion.cnet.fr

Volume 1 pages 267 - 270

ABSTRACT

The aim of this paper is to study the efficiency of sound duration, degree of voicing, and sound energy in a rejection procedure of an automatic speech recognition system. The three parameters were modelled using statistical models estimated on vocabulary words, out-of-vocabulary words, and noise tokens. The rejection of out-of-vocabulary words and noises depends on the score obtained by comparing the probabilities given by the different models. However, such an approach also causes false rejections (rejections of vocabulary words). A trade-off is therefore necessary between the false rejection rate and the false alarm rate on out-of-vocabulary words and noise tokens. The degree of voicing turned out to be the most efficient parameter for rejecting noise tokens; it reduced the HMM false acceptance rate from 6.3% down to 2.3% for the same false rejection rate (9%). The duration parameter provided better performance for laboratory data, reducing the error rate on French numbers from 3.1% to 1.5% for a 5% false rejection rate.
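
A minimal sketch of this kind of rejection rule, using only the voicing parameter: the Gaussian statistics, the threshold, and the single-parameter setup are assumptions for illustration, not the paper's actual models, which also use duration and energy.

```python
# Illustrative sketch (invented models): accept or reject a hypothesis by
# comparing the likelihood of a phonetic parameter (degree of voicing) under
# a "vocabulary word" model versus an "out-of-vocabulary / noise" model.
import math

def gauss_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

# Hypothetical statistics estimated on vocabulary words vs. noise tokens.
VOCAB_MODEL = (0.65, 0.15)   # (mean, std) of voicing degree for vocabulary words
NOISE_MODEL = (0.20, 0.20)   # (mean, std) for noise / out-of-vocabulary tokens

def accept(voicing_degree, threshold=0.0):
    """Accept iff the log-likelihood ratio favors the vocabulary-word model.
    Raising the threshold trades more false rejections for fewer false alarms."""
    llr = (gauss_logpdf(voicing_degree, *VOCAB_MODEL)
           - gauss_logpdf(voicing_degree, *NOISE_MODEL))
    return llr > threshold

print(accept(0.70))   # strongly voiced segment -> accepted
print(accept(0.15))   # mostly unvoiced segment -> rejected
```

Moving the threshold up or down is exactly the trade-off the abstract mentions between the false rejection rate and the false alarm rate.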

A0131.pdf



Keyword Spotting Using F0 Contour Matching

Authors: Yoichi Yamashita (1), Riichiro Mizoguchi (2)

(1) Dept. of Computer Science, Ritsumeikan University, 1-1-1 Noji-Higashi, Kusatsu-shi, Shiga 525-77, Japan, yama@cs.ritsumei.ac.jp; (2) I.S.I.R., Osaka University, 8-1 Mihogaoka, Ibaraki-shi, Osaka 567, Japan, miz@ei.sanken.osaka-u.ac.jp

Volume 1 pages 271 - 274

ABSTRACT

This paper describes keyword spotting using prosodic information as well as phonemic information. A Japanese word has its own F0 contour based on its lexical accent type, and this F0 contour is preserved in sentences. The prosodic dissimilarity between a keyword and the input speech is measured by DP matching of F0 contours. The phonemic score is calculated with a conventional HMM technique. A total score based on these two measures is used for detecting keywords. The F0 contour of the keyword is smoothed using an F0 model. An evaluation test was carried out on recorded speech from a TV news program. The introduction of prosodic information reduces false alarms by 30% to 50% over a wide range of detection rates.
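
A brief Python sketch of the combination step: DP (DTW) matching of F0 contours plus a phonemic score. The contours, the combination weight, and the length normalization are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch: prosodic dissimilarity by DP matching of F0 contours,
# combined with an HMM phonemic score into a single detection score.
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming alignment cost between two F0 contours."""
    n, m = len(a), len(b)
    dp = np.full((n + 1, m + 1), np.inf)
    dp[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i, j] = cost + min(dp[i - 1, j], dp[i, j - 1], dp[i - 1, j - 1])
    return dp[n, m] / (n + m)        # length-normalized dissimilarity

def total_score(phonemic_score, f0_keyword, f0_input, prosody_weight=0.5):
    """Combine the phonemic score with the prosodic dissimilarity; the weight
    is a hypothetical tuning parameter, not taken from the paper."""
    return phonemic_score - prosody_weight * dtw_distance(f0_keyword, f0_input)

keyword_f0 = np.array([120, 150, 180, 160, 130], float)     # toy accent contour (Hz)
segment_f0 = np.array([118, 148, 175, 158, 135, 128], float)
print(round(total_score(-10.0, keyword_f0, segment_f0), 3))
```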

A0653.pdf



A Frame and Segment Based Approach for Topic Spotting

Authors: E. Nöth, S. Harbeck, H. Niemann, V. Warnke

Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung, Martensstr. 3, 91058 Erlangen, Germany. Tel: +49 9131 / 857888, Fax: +49 9131 / 303811, noeth@informatik.uni-erlangen.de, http://www5.informatik.uni-erlangen.de

Volume 1 pages 275 - 278

ABSTRACT

In this paper we present a new approach to topic spotting based on subword units (phonemes and feature vectors) instead of words. Topics are classified by running topic-dependent polygram language models over these symbol sequences and choosing the one with the best score. We trained and tested the two methods on three different corpora. The first is part of a media corpus which contains data from TV shows on three different topics (IDS), the second is part of the Switchboard corpus, and the third is a collection of human-machine dialogs about train timetable information (the EVAR corpus). The results on Switchboard are compared with phoneme-based approaches developed at CRIM (Montreal) and DRA (Malvern) and are presented as ROC curves; the results on IDS and EVAR are compared with a word-based approach and presented as confusion tables. We show that surprisingly little recognition accuracy is lost when going from word-based to subword-based topic spotting.
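
The core decision rule is easy to sketch: score a phoneme sequence with one language model per topic and pick the best. In the toy Python example below, simple add-alpha bigrams stand in for the paper's polygram models, and the phoneme sequences and topic names are invented.

```python
# Illustrative sketch (toy data, bigrams standing in for polygrams): topic
# spotting by scoring a phoneme sequence with topic-dependent language models.
from collections import Counter
import math

def train_bigram_lm(sequences, alpha=1.0):
    """Very small add-alpha bigram model over phoneme symbols."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        unigrams.update(seq[:-1])
        bigrams.update(zip(seq, seq[1:]))
    V = len(vocab)
    def logprob(seq):
        return sum(math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * V))
                   for a, b in zip(seq, seq[1:]))
    return logprob

# Hypothetical phoneme sequences for two topics.
TOPIC_DATA = {
    "timetable": [list("tstntrntsn"), list("tntsrtnst")],
    "weather":   [list("vrmzvnrmv"), list("mvrnzmvrm")],
}
topic_lms = {t: train_bigram_lm(seqs) for t, seqs in TOPIC_DATA.items()}

def spot_topic(phoneme_seq):
    """Decide for the topic whose language model gives the best score."""
    return max(topic_lms, key=lambda t: topic_lms[t](phoneme_seq))

print(spot_topic(list("tsntntsr")))   # -> "timetable" on this toy data
```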

A0699.pdf
