ICASSP '98 Main Page

Abstract - SP25

SP25.1	Improved Lexical Tree Search for Large Vocabulary Speech Recognition
S. Ortmanns, A. Eiden, H. Ney (RWTH-Aachen, Germany)
This paper describes some extensions to the language model (LM) look-ahead pruning approach which is integrated into the time-synchronous beam search algorithm. The search algorithm is based on a lexical prefix tree in combination with a word-conditioned dynamic search space organization for handling trigram language models in a one-pass strategy. In particular, we study several LM look-ahead pruning techniques. Further, we improve the efficiency of this look-ahead technique by exploiting subtree dominance. This method avoids the computation of redundant subtrees within the copies of the lexical prefix tree and thus reduces the memory requirements of the search algorithm. In addition, we present a pruning criterion depending on the state index. The experimental results on the 20000-word NAB'94 task (ARPA North American Business Corpus) indicate that the computational effort can be reduced to 3.3 times real time on an ALPHA 5000 PC without a significant loss in recognition accuracy.
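
The general idea of LM look-ahead on a lexical prefix tree can be sketched as follows. This is only an illustration under simplifying assumptions, not the authors' implementation: it uses unigram log-probabilities instead of the word-conditioned trigram setup described above, and the toy lexicon, the scores, and the names `lookahead_table` and `prune` are all hypothetical.

```python
import math

def lookahead_table(lexicon, lm_logprob):
    """Map each phoneme prefix to the best LM log-probability of any word below it."""
    table = {}
    for word, phones in lexicon.items():
        score = lm_logprob[word]
        # Every prefix of a pronunciation corresponds to a node of the prefix tree.
        for end in range(len(phones) + 1):
            prefix = tuple(phones[:end])
            table[prefix] = max(table.get(prefix, -math.inf), score)
    return table

def prune(hypotheses, table, beam):
    """Keep hypotheses whose acoustic score plus look-ahead lies within `beam` of the best."""
    combined = {prefix: acoustic + table[prefix]
                for prefix, acoustic in hypotheses.items()}
    best = max(combined.values())
    return {prefix: hypotheses[prefix]
            for prefix, total in combined.items()
            if total >= best - beam}

# Hypothetical toy lexicon and unigram log-probabilities (assumptions for illustration).
lexicon = {"cat": ["k", "ae", "t"], "cab": ["k", "ae", "b"], "dog": ["d", "ao", "g"]}
lm_logprob = {"cat": math.log(0.5), "cab": math.log(0.1), "dog": math.log(0.4)}
table = lookahead_table(lexicon, lm_logprob)

# Two partial hypotheses with equal acoustic scores; the look-ahead separates them.
survivors = prune({("k",): -1.0, ("d",): -1.0}, table, beam=0.1)
```

With a narrow beam, the look-ahead lets the search discard a tree branch before any word end is reached, which is where the LM score would otherwise first become available.
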
SP25.2	Efficient Search with Posterior Probability Estimates in HMM-Based Speech Recognition
D. Willett, C. Neukirchen, G. Rigoll (Duisburg University, Germany)
In this paper we present the methods we developed to estimate posterior probabilities for HMM states in continuous and discrete HMM-based speech recognition systems, and several ways to speed up decoding by using these posterior probability estimates. The proposed pruning techniques are State Deactivation Pruning (SDP), similar to an approach proposed for hybrid recognition systems, and a novel posteriori-based lookahead technique, Posteriori Lookahead Pruning (PLP), that evaluates future posteriors in order to exclude unlikely HMM states as early as possible during search. By applying the proposed methods we managed to vastly reduce the decoding time consumed by our time-synchronous Viterbi decoder for recognition systems based on the Verbmobil and the Wall Street Journal databases with hardly any additional search error.
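
One plausible reading of posterior-based state deactivation is sketched below. The abstract does not specify the estimator, so this example simply assumes that per-frame state likelihoods and state priors are available and applies Bayes' rule; the function names and the threshold value are hypothetical.

```python
import math

def state_posteriors(log_likelihoods, log_priors):
    """Normalize likelihood times prior over all states for one frame (Bayes' rule)."""
    joint = {s: log_likelihoods[s] + log_priors[s] for s in log_likelihoods}
    peak = max(joint.values())
    # Log-sum-exp keeps the normalization numerically stable.
    norm = peak + math.log(sum(math.exp(v - peak) for v in joint.values()))
    return {s: math.exp(v - norm) for s, v in joint.items()}

def active_states(posteriors, threshold):
    """Deactivate every state whose posterior falls below the threshold."""
    return {s for s, p in posteriors.items() if p >= threshold}

# Hypothetical frame with three states and uniform priors.
frame_likelihoods = {"s1": math.log(0.7), "s2": math.log(0.29), "s3": math.log(0.01)}
uniform_priors = {s: math.log(1.0 / 3.0) for s in frame_likelihoods}
posteriors = state_posteriors(frame_likelihoods, uniform_priors)
kept = active_states(posteriors, threshold=0.05)
```

States outside `kept` would be skipped by the decoder for that frame, which is the source of the speed-up; choosing the threshold trades decoding time against search error.
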
SP25.3	Improved Search Strategy for Large Vocabulary Continuous Mandarin Speech Recognition
T. Ho, K. Yang, K. Huang (National Taiwan University, Taiwan, ROC); L. Lee (Inst. of Info Sci, Academia Sinica, Taiwan, ROC)
This paper presents a new search strategy for large vocabulary continuous Mandarin speech recognition that takes the special structure of the Chinese language into account. The strategy consists of a forward and a backward pass, between which a high-quality syllable lattice is generated to bridge the syllable-level and word-level decoding processes. In the forward pass, considering the small number of syllables in the Chinese language, a frame-synchronous stack decoder is used to integrate the high-order syllable N-gram language model, so as to generate a very accurate and compact syllable lattice. In the backward pass, considering the special monosyllabic wording structure of the Chinese language, the search space for the word-level decoding is expanded dynamically from the syllable lattice, and the best word sequence is extracted based on the knowledge provided by the word pronunciation lexicon and the word N-gram language model. In preliminary experiments on a speaker-adaptive continuous speech recognition task, this strategy reduced the character error rate by more than 20% compared with a previous system using a syllable-aligned lattice approach.
SP25.4	Time-First Search for Large Vocabulary Speech Recognition
T. Robinson (SoftSound, UK); J. Christie (Cambridge University, UK)
This paper describes a new search technique for large vocabulary speech recognition based on a stack decoder. Considerable memory savings are achieved with the combination of a tree-based lexicon and a new search technique. The search proceeds time-first, that is, partial path hypotheses are extended into the future in the inner loop and a tree walk over the lexicon is performed as an outer loop. Partial word hypotheses are grouped based on language model state. The stack maintains information about groups of hypotheses, and whole groups are extended by one word to form new stack entries. An implementation of a one-pass decoder employing a 65,000-word lexicon and a disk-based trigram language model is described. Real-time operation is achieved with a small search error, a search space of about 5 Mbyte and a total memory usage of about 35 Mbyte.
SP25.5	Improving Vocabulary Independent HMM Decoding Results by Using the Dynamically Expanding Context
M. Kurimo (Helsinki University of Technology, Finland)
A method is presented to correct phoneme strings produced by a vocabulary independent speech recognizer. The method first extracts the N best matching result strings using mixture density hidden Markov models (HMMs) trained by neural networks. Then the strings are corrected by the rules generated automatically by the Dynamically Expanding Context (DEC). Finally, the corrected string candidates and the extra alternatives proposed by the DEC are ranked according to the likelihood score of the best HMM path to generate the obtained string. The experiments show that N need not be very large and that the method is able to decrease recognition errors even on test data that has no words in common with the training data of the speech recognizer.
SP25.6	Development of Robust Speech Recognition Middleware on Microprocessor
N. Hataoka, H. Kokubo, Y. Obuchi, A. Amano (Central Research Laboratory, Hitachi Ltd., Japan)
We have developed speech recognition middleware on a RISC microprocessor, with processing functions that are robust against environmental noise and speaker differences. The speech recognition middleware enables developers and users to use a speech recognition process for many possible speech applications, such as car navigation systems and handheld PCs. In this paper, we report on implementation issues of the speech recognition process in microprocessor middleware and propose robust noise handling functions using ANC (Adaptive Noise Cancellation) and noise adaptive models. We also propose a new speaker adaptation algorithm, in which the relationships among HMM (Hidden Markov Model) transfer vectors are provided as a set of pre-trained interpolation coefficients. Experimental evaluations on 1000-word vocabulary speech recognition showed promising results for both the proposed noise handling methods and the proposed speaker adaptation method.
SP25.7	Mandarin Telephone Speech Recognition for Automatic Telephone Number Directory Service
Y. Wang, S. Chen (National Chiao-Tung University, Taiwan, ROC)
This paper discusses an HMM-based Mandarin telephone speech recognition method for implementing a prototype system of automatic telephone number directory service. It adopted the GPD/MCE training algorithm to train the HMM models for 100 final-dependent syllable initials and 40 syllable finals. The SBR method was used to compensate for speaker and channel effects. In addition, an RNN-based pre-classification scheme was employed to speed up the recognition search. A syllable recognition rate of 53.7% was achieved. This method was then used to implement an isolated-word recognizer for the prototype system to discriminate 1922 names of bank and insurance companies. Word recognition rates of 94.8% for top-1 and 97.9% for top-3 were achieved.
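
For readers unfamiliar with the top-1 and top-3 figures, a top-N recognition rate is conventionally the fraction of utterances whose correct word appears among the recognizer's N best candidates. The sketch below shows that calculation on made-up data; the function name and the toy candidate lists are hypothetical and unrelated to the paper's results.

```python
def top_n_rate(ranked_candidates, references, n):
    """Fraction of utterances whose reference is among the first n ranked candidates."""
    hits = sum(reference in candidates[:n]
               for candidates, reference in zip(ranked_candidates, references))
    return hits / len(references)

# Hypothetical recognizer output: one ranked candidate list per utterance.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
references = ["a", "a", "a", "b"]

top1 = top_n_rate(ranked, references, 1)
top3 = top_n_rate(ranked, references, 3)
```

By construction the rate cannot decrease as N grows, which is why a top-3 figure is always at least as high as the corresponding top-1 figure.
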
SP25.8	Design and Implementation of an Auto-attendant System for the T.U.C. Campus Using Speech Recognition
K. Koumpis, V. Digalakis (Technical University of Crete, Greece); H. Murveit (Nuance Communications, USA)
We present an auto-attendant system, which is based on a statistical speech recognizer and has been developed for the Technical University of Crete (TUC) campus. Auto-attendants allow remote callers to reach a person or department by simply speaking an appropriate name. This is the first speech recognition system in Greece operating in continuous speech and speaker-independent modes, and we describe our approaches for handling several phenomena specific to the Greek language. The high recognition accuracy of the engine supports several hundred names. Evaluation on our database yielded more than 97.5% name retrieval for a dictionary of 350 names of persons and services.
SP25.9	Name Dialing Using Final User Defined Vocabularies in Mobile (GSM & TACS) and Fixed Telephone Networks
J. Elvira, J. Torrecilla (Telefonica I + D, Spain)
This work presents the results obtained in the evaluation of a new approach for generating phonetic transcriptions for name dialing applications in different telephone networks and with temporal variations. In this kind of application, on-line construction of user vocabularies is mandatory. The proposed method allows adaptive selection of new transcriptions and requires far fewer speech utterances for system training than other approaches. The new approach is evaluated using data from different telephone networks (PSTN, GSM and TACS networks) and from utterances recorded at different times (recordings made over a period of two months).
SP25.10	The RWTH Large Vocabulary Continuous Speech Recognition System
H. Ney, L. Welling, S. Ortmanns, K. Beulen, F. Wessel (RWTH Aachen, Germany)
In this paper, we present an overview of the RWTH Aachen large vocabulary continuous speech recognizer. The recognizer is based on continuous density hidden Markov models and a time-synchronous left-to-right beam search strategy. Experimental results on the ARPA Wall Street Journal (WSJ) corpus verify the effects of several system components, namely linear discriminant analysis, vocal tract normalization, pronunciation lexicon and cross-word triphones, on the recognition performance.
