Lexical Modeling and Topic Spotting

Chair: Joseph Picone, Mississippi State University, USA



An Automatic Method for Learning a Japanese Lexicon for Recognition of Spontaneous Speech

Authors:

Laura Mayfield Tomokiyo, Carnegie Mellon University (U.S.A.)
Klaus Ries, Universitaet Karlsruhe (Germany)

Volume 1, Page 305, Paper number 2305

Abstract:

When developing a speech recognition system, one must start by deciding what the units to be recognized should be. This is for the most part a straightforward choice in the case of word-based languages such as English, but becomes an issue even in handling languages with a complex compounding system like German; with an agglutinative language like Japanese, which provides no spaces in written text, the choice is not at all obvious. Once an appropriate unit has been determined, the problem of consistently segmenting transcriptions of training data must be addressed. This paper describes a method for learning a lexicon from a training corpus which contains no word-level segmentation, applied to the problem of building a Japanese speech recognition system. We show not only that one can satisfactorily segment transcribed training data automatically, avoiding human error, but also that our system, when trained on the automatically segmented corpus, shows a significant improvement in recognition performance.
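
The abstract does not detail the learning algorithm, but the flavor of deriving word-like units from unsegmented text can be illustrated with a minimal frequency-driven pair-merging sketch in Python (the merge criterion, stopping rule, and all names here are assumptions for illustration, not the authors' method):

```python
from collections import Counter

def learn_units(corpus, n_merges=50):
    """Greedy pair-merging sketch: start from single characters and
    repeatedly merge the most frequent adjacent pair into a new unit."""
    segmented = [list(utt) for utt in corpus]   # one character per unit
    lexicon = {ch for utt in segmented for ch in utt}
    for _ in range(n_merges):
        pairs = Counter()
        for utt in segmented:
            pairs.update(zip(utt, utt[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:                            # stop when no pair repeats
            break
        merged = a + b
        lexicon.add(merged)
        # Re-segment every utterance with the new unit.
        for i, utt in enumerate(segmented):
            out, j = [], 0
            while j < len(utt):
                if j + 1 < len(utt) and utt[j] == a and utt[j + 1] == b:
                    out.append(merged)
                    j += 2
                else:
                    out.append(utt[j])
                    j += 1
            segmented[i] = out
    return lexicon, segmented
```

On a toy corpus such as learn_units(["watashiwagakuseidesu", "gakuseidesuka"]), recurring character sequences would gradually be promoted to lexicon units, and the second return value is the consistently re-segmented training text.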

ic982305.pdf (Scanned)




Acoustics-Only Based Automatic Phonetic Baseform Generation

Authors:

Bhuvana Ramabhadran, IBM T.J. Watson Research Center (U.S.A.)
Lalit R. Bahl, IBM T.J. Watson Research Center (U.S.A.)
Peter V. DeSouza, IBM T.J. Watson Research Center (U.S.A.)
Mukund Padmanabhan, IBM T.J. Watson Research Center (U.S.A.)

Volume 1, Page 309, Paper number 2275

Abstract:

Phonetic baseforms are the basic recognition units in most speech recognition systems. These baseforms are usually determined by linguists once a vocabulary is chosen and not modified thereafter. However, several applications, such as name dialing, require that the user be able to add new words to the vocabulary. These new words are often names, or task-specific jargon, that have user-specific pronunciations. This paper describes a novel method for generating phonetic transcriptions (baseforms) of words based on acoustic evidence alone. It does not require either the spelling or any prior acoustic representation of the new word, is vocabulary independent, and does not have any linguistic constraints (pronunciation rules). Our experiments demonstrate the high decoding accuracies obtained when baseforms deduced using this approach are incorporated into our speech recognizer. Also, the error rates on the added words were found to be comparable to or better than those obtained when the baseforms were derived by hand.
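
The paper's decoder is not reproduced here, but the core idea of reading a baseform off acoustic evidence alone can be sketched as a Viterbi search over a fully connected phone loop. A simplified, hypothetical version in Python/NumPy (the frame scores and phone bigram are toy inputs, not IBM's models):

```python
import numpy as np

def decode_baseform(frame_logprobs, bigram, stay_logprob=-0.1):
    """Viterbi over a fully connected phone loop.
    frame_logprobs: (T, P) per-frame acoustic log-likelihoods per phone.
    bigram: (P, P) phone-transition log-probabilities.
    Returns the duration-collapsed best phone sequence."""
    T, P = frame_logprobs.shape
    trans = bigram.copy()
    np.fill_diagonal(trans, stay_logprob)   # self-loop keeps the same phone
    delta = frame_logprobs[0].copy()        # flat phone prior assumed
    back = np.zeros((T, P), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + trans     # scores[i, j]: phone i -> phone j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + frame_logprobs[t]
    state = int(delta.argmax())
    path = [state]
    for t in range(T - 1, 0, -1):
        state = int(back[t, state])
        path.append(state)
    path.reverse()
    # Collapse frame-level repeats into a phone string (the baseform).
    return [p for k, p in enumerate(path) if k == 0 or p != path[k - 1]]
```

Collapsing the frame-level path yields a phone string that can be installed in the dictionary as the new word's baseform; a fuller system would presumably combine evidence from several sample utterances of the word.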

ic982275.pdf (From Postscript)




Pronunciation Modelling Using a Hand-Labelled Corpus for Conversational Speech Recognition

Authors:

William J. Byrne, Johns Hopkins University (U.S.A.)
Michael Finke, Carnegie Mellon University (U.S.A.)
Sanjeev P. Khudanpur, Johns Hopkins University (U.S.A.)
John McDonough, Johns Hopkins University (U.S.A.)
Harriet Nock, Cambridge University (U.K.)
Michael D. Riley, AT&T Labs - Research (U.S.A.)
Murat Saraclar, Johns Hopkins University (U.S.A.)
Charles Wooters, US Department of Defense (U.S.A.)
George Zavaliagkos, BBN (U.S.A.)

Volume 1, Page 313, Paper number 2380

Abstract:

Accurately modelling pronunciation variability in conversational speech is an important component of an automatic speech recognition system. We describe some of the projects undertaken in this direction during and after WS97, the Fifth LVCSR Summer Workshop, held at Johns Hopkins University, Baltimore, in July-August, 1997. We first illustrate a use of hand-labelled phonetic transcriptions of a portion of the Switchboard corpus, in conjunction with statistical techniques, to learn alternatives to canonical pronunciations of words. We then describe the use of these alternate pronunciations in an automatic speech recognition system. We demonstrate that the improvement in recognition performance from pronunciation modelling persists as the system is enhanced with better acoustic and language models.
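
As a rough illustration of learning alternatives from hand-labelled data, one can align canonical baseforms against observed surface pronunciations and estimate phone rewrite probabilities. A minimal sketch (plain edit-distance alignment and the function names are assumptions, not the workshop's statistical machinery):

```python
from collections import Counter, defaultdict

def align(canon, surface):
    """Levenshtein alignment returning (canonical, surface) phone pairs;
    None marks an insertion or a deletion."""
    n, m = len(canon), len(surface)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1): D[i][0] = i
    for j in range(1, m + 1): D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i-1][j-1] + (canon[i-1] != surface[j-1])
            D[i][j] = min(sub, D[i-1][j] + 1, D[i][j-1] + 1)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + (canon[i-1] != surface[j-1]):
            pairs.append((canon[i-1], surface[j-1])); i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i-1][j] + 1:
            pairs.append((canon[i-1], None)); i -= 1        # deletion
        else:
            pairs.append((None, surface[j-1])); j -= 1      # insertion
    return pairs[::-1]

def rewrite_probs(pairs_corpus):
    """Estimate P(surface phone | canonical phone) from aligned examples."""
    counts = defaultdict(Counter)
    for canon, surface in pairs_corpus:
        for c, s in align(canon, surface):
            counts[c][s] += 1
    return {c: {s: n / sum(cnt.values()) for s, n in cnt.items()}
            for c, cnt in counts.items()}
```

For example, align("dh ih s".split(), "d ih s".split()) pairs canonical /dh/ with surface /d/, so over many hand-labelled examples rewrite_probs would assign P(d | dh) its relative frequency, from which alternate pronunciations can be generated.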

ic982380.pdf (From Postscript)




The Use of Accent-Specific Pronunciation Dictionaries in Acoustic Model Training

Authors:

Jason J. Humphries, Cambridge University (U.K.)
Philip C. Woodland, Cambridge University (U.K.)

Volume 1, Page 317, Paper number 1969

Abstract:

Speech recognition systems are increasingly being built to cover an ever wider range of speaker accents. However, electronically available pronunciation dictionaries (PDs) specific to these accents often do not exist and would be time-consuming and expensive to build by hand. This paper explores the use of pronunciation modelling for the synthesis of accent-specific PDs directly from acoustic data, and their use in acoustic model training. It is shown that this is particularly effective when the amount of acoustic data from the new accent region is insufficient to build a new recogniser, and it is necessary to retrain an existing system: a further 15% reduction in word error rate can be achieved over and above the 20% reduction resulting from acoustic model retraining alone. This paper also presents an empirical evaluation of an American English PD which has been synthesised from a British English PD.
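
One simple way to picture synthesizing an accent-specific PD is to push each canonical baseform through learned phone rewrite probabilities and keep only the most probable variants. A hypothetical sketch (the rule format and pruning scheme are assumptions, not necessarily the authors' models):

```python
from itertools import product

def accent_variants(baseform, rules, max_variants=3, floor=0.01):
    """Expand a canonical baseform into accent-specific variants using
    per-phone rewrite probabilities (rules: phone -> {phone_or_None: prob},
    where None means the phone is deleted). Keeps the most probable few."""
    options = []
    for ph in baseform:
        alts = rules.get(ph, {ph: 1.0})
        opts = [(p, pr) for p, pr in alts.items() if pr >= floor] or [(ph, 1.0)]
        options.append(opts)
    variants = {}
    for combo in product(*options):
        pron = tuple(p for p, _ in combo if p is not None)
        prob = 1.0
        for _, pr in combo:
            prob *= pr
        variants[pron] = variants.get(pron, 0.0) + prob
    return sorted(variants.items(), key=lambda kv: -kv[1])[:max_variants]
```

With rules = {"t": {"t": 0.7, "dx": 0.3}}, accent_variants(["b", "ah", "t", "er"], rules) proposes both "b ah t er" and the flapped "b ah dx er", mimicking one step of a British-to-American mapping.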

ic981969.pdf (From Postscript)




Specific Language Modelling for New-Word Detection in Continuous-Speech Recognition

Authors:

Rachida El-Meliani, INRS-Telecommunications (Canada)
Douglas O'Shaughnessy, INRS-Telecommunications (Canada)

Volume 1, Page 321, Paper number 1264

Abstract:

The objective of this work is to allow the INRS continuous-speech recognizer to accurately process new words and incorporate them into the vocabulary. Until now, only a few new-word detectors have been reported, all of them defining an acoustic filler model distinct from the models used to represent vocabulary words. In this paper, unlike other researchers, we define several designs that use strictly lexical fillers and a single process to perform speech recognition, new-word detection and new-word phonetic transcription. Moreover, we propose four different types of language models, differing in how they use the limited information we gathered on new words. The best combinations are found to differ from the ones we obtained for keyword spotting.
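
The strictly-lexical-filler idea can be caricatured in a few lines: the vocabulary is augmented with syllable-sized filler entries, and any maximal run of fillers in the decoded output is hypothesized to be one new word whose phonetic transcription is the concatenation of the filler phones. A sketch under that assumption (the data layout is invented for illustration):

```python
def detect_new_words(decoded):
    """decoded: list of (token, phones, is_filler) triples from a single
    decoding pass. Each maximal run of filler tokens becomes one new-word
    hypothesis carrying its own phonetic transcription."""
    hypotheses, run = [], []
    for token, phones, is_filler in decoded + [("", "", False)]:  # sentinel
        if is_filler:
            run.append(phones)
        elif run:
            hypotheses.append(" ".join(run))
            run = []
    return hypotheses

# e.g. [("the", "dh ah", False), ("FIL", "k ae", True),
#       ("FIL", "t ow", True), ("is", "ih z", False)] -> ["k ae t ow"]
```

Because detection and transcription fall out of the same decoding pass, no separate acoustic filler model is needed, which is the contrast the abstract draws with earlier detectors.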

ic981264.pdf (From Postscript)




Phonetic Recognition for Spoken Document Retrieval

Authors:

Kenney Ng, MIT (U.S.A.)
Victor W. Zue, MIT (U.S.A.)

Volume 1, Page 325, Paper number 1841

Abstract:

This paper describes the development and application of a phonetic recognition system to the task of spoken document retrieval. The recognizer is used to generate phonetic transcriptions of the speech messages, which are then processed to produce subword unit representations for indexing and retrieval. Subword units are used as an alternative to word units generated by either keyword spotting or word recognition. We first investigate the use of different acoustic and language models in the speech recognizer in an effort to improve phonetic recognition performance. Then we examine a variety of subword unit indexing terms and measure their ability to perform effective spoken document retrieval. Finally, we look at some simple robust indexing and retrieval methods that take into account the characteristics of the recognition errors in an attempt to improve retrieval performance.
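
In its simplest form, the indexing-and-retrieval half of such a system reduces to extracting overlapping phone n-grams from the recognizer's output and scoring documents with tf-idf. A minimal sketch (the n-gram order, weighting scheme, and names are assumptions, not the paper's exact terms):

```python
import math
from collections import Counter

def ngrams(phones, n=3):
    """Overlapping phone n-grams used as subword indexing terms."""
    return [" ".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def build_index(docs, n=3):
    """docs: {doc_id: phone list from the phonetic recognizer}."""
    tf = {d: Counter(ngrams(p, n)) for d, p in docs.items()}
    df = Counter(t for c in tf.values() for t in c)
    N = len(docs)
    idf = {t: math.log(N / df[t]) for t in df}
    return tf, idf

def retrieve(query_phones, tf, idf, n=3):
    """Rank documents by tf-idf overlap with the (phonetic) query."""
    q = Counter(ngrams(query_phones, n))
    scores = {d: sum(q[t] * c[t] * idf.get(t, 0.0) ** 2 for t in q)
              for d, c in tf.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because matching happens at the phone n-gram level, a query can hit a document even when the recognizer never saw the query word in its vocabulary, which is the motivation for subword units over word-based indexing.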

ic981841.pdf (From Postscript)




Topic Extraction with Multiple Topic-Words in Broadcast-News Speech

Authors:

Katsutoshi Ohtsuki, NTT (Japan)
Tatsuo Matsuoka, NTT (Japan)
Shoichi Matsunaga, NTT (Japan)
Sadaoki Furui, Tokyo Institute of Technology (Japan)

Volume 1, Page 329, Paper number 1668

Abstract:

This paper reports on topic extraction in Japanese broadcast-news speech. Using continuous speech recognition, we studied the extraction of several topic-words from broadcast news. A combination of multiple topic-words represents the content of the news; this is a more detailed and more flexible approach than using a single word or a single category. A topic-extraction model gives the degree of relevance between each topic-word and each word in an article, and topic-words with high total relevance scores over all the words in an article are extracted. We trained the topic-extraction model on five years of newspaper text, using the frequency of topic-words taken from headlines and of words in articles. The degree of relevance between topic-words and words in articles is calculated on the basis of statistical measures, i.e., mutual information or the chi-square value. In topic-extraction experiments on recognized broadcast-news speech, we extracted five topic-words from the 10-best hypotheses using a chi-square-based model and found that 76.6% of them agreed with topic-words chosen by human subjects.
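
A hedged sketch of the scoring side: relevance between a headline topic-word and an article word can be a chi-square value computed from article-level co-occurrence counts, and a new article's topic-words are those with the highest relevance summed over the article's (recognized) words. The contingency-table convention and function names below are assumptions:

```python
def chi_square(a, b, c, d):
    """2x2 chi-square: a = articles containing both the topic-word (in the
    headline) and the word, b / c = articles with only one of them,
    d = articles with neither."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def score_topics(article_words, relevance, k=5):
    """relevance[topic][word] holds the precomputed chi-square value.
    Sum relevance over the recognized article words; return the top k."""
    totals = {t: sum(rel.get(w, 0.0) for w in article_words)
              for t, rel in relevance.items()}
    return sorted(totals.items(), key=lambda kv: -kv[1])[:k]
```

Swapping chi_square for pointwise mutual information over the same counts gives the paper's other relevance measure; the extraction step itself is unchanged.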

ic981668.pdf (From Postscript)




A Hidden Markov Model Approach to Text Segmentation and Event Tracking

Authors:

Jonathan P. Yamron, Dragon Systems Inc. (U.S.A.)
Ira Carp, Dragon Systems Inc. (U.S.A.)
Larry Gillick, Dragon Systems Inc. (U.S.A.)
Steve Lowe, Dragon Systems Inc. (U.S.A.)
Paul Van Mulbregt, Dragon Systems Inc. (U.S.A.)

Volume 1, Page 333, Paper number 2335

Abstract:

Continuing progress in the automatic transcription of broadcast speech via speech recognition has raised the possibility of applying information retrieval techniques to the resulting (errorful) text. For these techniques to be easily applicable, it is highly desirable that the transcripts be segmented into stories. This paper introduces a general methodology based on HMMs and on classical language modeling techniques for automatically inferring story boundaries and for retrieving stories relating to a specific event. In this preliminary work, we report some highly promising results on accurate text. Future work will apply these techniques to errorful transcripts.
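
The abstract's HMM can be pictured with topics as hidden states, unigram language models as emission distributions over sentences, and a fixed topic-switch penalty; Viterbi decoding then places a story boundary wherever the best state sequence changes topic. A minimal sketch (the penalty value, smoothing, and state inventory are assumptions, not Dragon's model):

```python
import math

def segment(sentences, topic_lms, switch_logprob=-5.0):
    """sentences: list of word lists; topic_lms: list of {word: prob}.
    Returns the sentence indices at which a story boundary is hypothesized."""
    def emit(lm, words):
        # Log-likelihood of a sentence under a (floored) unigram topic LM.
        return sum(math.log(lm.get(w, 1e-6)) for w in words)
    K = len(topic_lms)
    delta = [emit(topic_lms[k], sentences[0]) for k in range(K)]
    back = []
    for words in sentences[1:]:
        prev, ptr, delta = delta, [], []
        for k in range(K):
            # Stay in the same topic for free, or pay a penalty to switch.
            cands = [prev[j] + (0.0 if j == k else switch_logprob)
                     for j in range(K)]
            best = max(range(K), key=lambda j: cands[j])
            ptr.append(best)
            delta.append(cands[best] + emit(topic_lms[k], words))
        back.append(ptr)
    state = max(range(K), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

Making the switch penalty harsher yields fewer, longer stories; event tracking then amounts to asking which topic state a segment was decoded in.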

ic982335.pdf (From Postscript)
