ABSTRACT
A major challenge in speech recognition based on acoustic subword units is creating a lexicon which is robust to inter- and intra-speaker variations. In this paper we present two different approaches for incorporating simple word-level linguistic knowledge into the labelling step of the training procedure. The proposed systems also utilise a scheme for combined optimisation of baseforms and subword models. For the TI46 database, these methods are shown to greatly improve the performance compared to an acoustic subword based speech recogniser employing unsupervised labelling, and they are found to perform as well as systems utilising whole-word models and context independent phoneme models.
ABSTRACT
The performance of the Philips system for large vocabulary continuous speech recognition has been improved significantly by crossword N-phone modelling, enhanced clustering of HMM-states during training, consistent handling of untrained HMM-states during decoding and a new efficient crossword N-phone M-gram decoding strategy. We report word error rate reductions of up to 18% on various ARPA test sets as compared to our best within-word triphone system, based on Laplacian densities, Viterbi decoding and filterbank-LDA features. The following two issues are addressed: a) Transformation of a tree-organized bigram beam-search decoder into an efficient tree-organized decoder capable of handling long-span acoustic contexts as well as long-span language model contexts. b) State-clustering and generalizing of unseen contexts for the case of Laplacian emission probability density functions.
ABSTRACT
This paper explores the modelling of inter-frame dependence as a means of improving the performance of HMMs. More specifically, a model based on the IFD-HMM (Ming & Smith, 1996) that assumes a dependency upon both succeeding and preceding frames is proposed. The means by which a dependency upon succeeding frames might be integrated into an HMM framework are explored, and a mathematical outline of the proposed extension is given. The results of various tests aimed at exploring the consequences of introducing succeeding frame dependencies are included. It was found that a dependency upon succeeding frames enabled dynamic spectral information, not found in the preceding frames, to be usefully employed, resulting in a significant increase in recognition accuracy. Additionally, it was shown that modelling of the dynamic spectral information (using time-lag sequences) was at least as important as improved modelling of the instantaneous spectra (using multiple mixtures).
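To make the extension concrete, one way such a bidirectional dependency could enter the state emission term is as a mixture over lag-conditional densities (notation mine, not necessarily that of the IFD-HMM papers):

```latex
b_j(o_t) \;=\; \sum_{l \in \mathcal{L}} w_{jl}\, p\big(o_t \mid o_{t+l},\, s_j\big),
\qquad \mathcal{L} \subset \{\dots,-2,-1\} \cup \{1,2,\dots\}
```

Here negative lags condition on preceding frames and positive lags on succeeding frames; the original IFD-HMM uses preceding lags only, and the proposed extension adds the succeeding ones.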
ABSTRACT
The vast majority of work in continuous speech recognition uses phoneme-like units as the basic recognition component. The work presented here investigates the practicability of syllable-like units as the building blocks for recognition. A phonetically annotated telephony database is analysed at the syllable level, and a set of syllable-based HMMs are built. Refinements including the introduction of syllable-level bigram probabilities, word- and syllable-level insertion penalties, and the investigation of different model topologies are found to improve recogniser performance. It is found that the syllable-based recogniser gives recognition accuracies of over 60%, which compares with 35% as the baseline accuracy for monophone recognition. It is envisaged that practical applications of syllable recognition could be in a hybrid system, where the most common syllable HMMs would be used in conjunction with whole-word and phoneme models.
ABSTRACT
In this paper we present a new approach for a generalized tying of mixture components in continuous mixture-density HMM-based speech recognition systems. With an iterative pruning and splitting procedure for the mixture components, this approach offers a very accurate and detailed representation of the acoustic space while keeping the number of parameters reasonably small in favor of robust parameter estimation and fast decoding. Contrary to other approaches, it does not require a strict clustering of the pdfs into subsets that share their mixture components, so it is capable of providing more general and flexible types of mixture tying. We applied the new approach to a semi-continuous HMM (SCHMM) system for the Resource Management task, improved its recognition performance by 12%, and substantially accelerated the decoding owing to a much faster likelihood computation.
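As an illustration of the iterative pruning-and-splitting idea, a single pass over a mixture might look as follows. This is a generic heuristic sketch, not the paper's exact procedure; thresholds and the perturbation rule are assumptions.

```python
import numpy as np

def prune_and_split(weights, means, prune_thresh=1e-3, eps=0.2):
    # Remove components whose weight has collapsed (pruning).
    keep = weights > prune_thresh
    weights, means = weights[keep].copy(), means[keep].copy()
    # Split the dominant component into two perturbed copies (splitting).
    k = np.argmax(weights)
    offset = eps * np.sqrt(means.var(axis=0) + 1e-8)
    split_mean = means[k] + offset
    means[k] = means[k] - offset
    means = np.vstack([means, split_mean])
    weights[k] /= 2.0
    weights = np.append(weights, weights[k])
    return weights / weights.sum(), means
```

Iterating such passes, with re-estimation in between, lets the component count adapt to the acoustic space instead of being fixed in advance.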
ABSTRACT
In this paper several modifications of two methods for parameter reduction of Hidden Markov Models by state tying are described. The two methods are a data-driven, bottom-up clustering of triphone states [3, 9] and a top-down method that grows decision trees for triphone states [2, 10]. We investigate several aspects of state tying, such as the possible reduction of the word error rate by state tying, the consequences of different distance measures for the data-driven approach, and modifications of the original decision tree approach such as node merging. The tests were performed on the test corpora for the 5 000 word vocabulary of the WSJ November 92 task and on the evaluation corpora for the 3 000 word VERBMOBIL '95 task. State tying reduced the word error rate by 14% for the WSJ task and by 5% for the VERBMOBIL task.
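For the bottom-up variant, a typical data-driven distance is the log-likelihood loss incurred by pooling two single-Gaussian states, weighted by their occupancy counts. A sketch of one such measure follows (one of several plausible choices, not necessarily among those compared in the paper):

```python
import numpy as np

def merge_cost(n1, mu1, var1, n2, mu2, var2):
    # Pooled ML estimates (diagonal Gaussians) for the merged state.
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    var = (n1 * (var1 + (mu1 - mu) ** 2) + n2 * (var2 + (mu2 - mu) ** 2)) / n
    # Drop in total log-likelihood caused by the merge; greedy bottom-up
    # tying repeatedly merges the pair with the smallest cost.
    return 0.5 * (n * np.log(var).sum()
                  - n1 * np.log(var1).sum()
                  - n2 * np.log(var2).sum())
```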
ABSTRACT
In [1], we described how to improve Semi-Continuous Density Hidden Markov Models (SC-HMMs) to be as fast as Continuous Density HMMs (CD-HMMs), whilst outperforming them on large vocabulary recognition tasks with context independent models. In this paper, we extend our work with SC-HMMs to context dependent modelling. We propose a novel node splitting criterion in an approach with phonetic decision trees. It is based on a distance measure between the Gaussian mixture probability density functions (pdfs) as used in the final tied-state SC-HMMs, in contrast with other criteria, which are based on simplified pdfs to keep the algorithm complexity manageable. Results on the ARPA Resource Management task show that the proposed criterion outperforms two of these criteria with simplified pdfs.
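Since no closed form exists for distances between full Gaussian mixtures, a criterion of this kind is often approximated, for instance by a Monte-Carlo KL divergence between the two mixture pdfs. The sketch below shows that standard approximation only; it does not reproduce the paper's exact measure.

```python
import numpy as np

def gmm_logpdf(x, w, mu, var):
    # Log-density of a diagonal-covariance GMM at points x (n x d).
    log_comp = (-0.5 * (((x[:, None, :] - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
    return np.logaddexp.reduce(log_comp, axis=1)

def mc_kl(wp, mup, varp, wq, muq, varq, n=1000, seed=0):
    # Monte-Carlo estimate of KL(p || q): sample from p, compare log-densities.
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(wp), size=n, p=wp)
    x = mup[comp] + np.sqrt(varp[comp]) * rng.standard_normal((n, mup.shape[1]))
    return float(np.mean(gmm_logpdf(x, wp, mup, varp)
                         - gmm_logpdf(x, wq, muq, varq)))
```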
ABSTRACT
A technique for predicting triphones by concatenation of diphone or monophone models is studied. The models are connected using linear interpolation between endpoints of piece-wise linear parameter trajectories. Three types of spectral representation are compared: formants, filter amplitudes and cepstrum coefficients. The proposed technique lowers the spectral distortion of the phones for all three representations when different speakers are used for training and evaluation. The average error of the created triphones is lower in the filter and cepstrum domains than for formants; this is attributed to limitations of the Analysis-by-Synthesis formant tracking algorithm. A small improvement with the proposed technique is achieved for all representations in the task of reordering N-best sentence recognition candidate lists.
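A minimal sketch of the concatenation step, assuming each model is stored as a frames-by-dimensions parameter trajectory (the blend length is my choice, not taken from the paper):

```python
import numpy as np

def concat_with_interpolation(traj_a, traj_b, n_blend=5):
    # Join two piece-wise linear parameter trajectories by linearly
    # interpolating between the last frame of traj_a and the first of traj_b.
    alphas = np.linspace(0.0, 1.0, n_blend + 2)[1:-1, None]
    blend = (1 - alphas) * traj_a[-1] + alphas * traj_b[0]
    return np.vstack([traj_a, blend, traj_b])
```

The same routine applies whether the endpoint vectors hold formants, filter amplitudes or cepstrum coefficients; only the parameter space changes.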
ABSTRACT
This paper deals with the choice of suitable subword units (SWU) for an HMM-based speech recognition system. Using demisyllables (including phonemes) as base units, an inventory of domain-specific, larger-sized subword units, so-called macro-demisyllables (MDS), is created. A quality measure for the automatic decomposition of all single words into subword units is presented which takes into account the trainability of the chosen units. To create the whole inventory, an iterative procedure is applied with respect to the predefined quality measure. Each MDS is represented by a dedicated HMM. By tying the densities of specific phonemes, only the number of mixture coefficients and transitions increases in comparison to the original phoneme models. Recognition experiments within the German Verbmobil evaluation 1996 show that the new simple MDS models are as powerful as standard triphone models, although our MDS models are so far context-independent.
ABSTRACT
The aim of the research described in this paper is to overcome the modeling limitation of conventional hidden Markov models. We present a segmental model that consists of two elements. The first is a nonparametric representation of both the mean and variance trajectories, which describes the local dynamics. The second element is some parameterized transformation (e.g., random shift) of the trajectory that is global to the segment and models long-term variations such as speaker identity.
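In a form consistent with this description (notation mine), an observation at relative position t/T within a segment of length T could be written as:

```latex
y_t \;=\; \mu\!\left(\tfrac{t}{T}\right) \;+\; b \;+\; \epsilon_t,
\qquad b \sim \mathcal{N}\!\big(0, \sigma_b^2\big),
\quad \epsilon_t \sim \mathcal{N}\!\Big(0, \Sigma\!\left(\tfrac{t}{T}\right)\Big)
```

where the nonparametric trajectories mu(.) and Sigma(.) capture the local dynamics and the segment-global random shift b models long-term variation such as speaker identity.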
ABSTRACT
Recently, we have developed a probabilistic framework for segment-based speech recognition that represents the speech signal as a network of segments and associated feature vectors [2]. Although in general, each path through the network does not traverse all segments, we argued that each path must account for all feature vectors in the network. We then demonstrated an efficient search algorithm that uses a single additional model to account for segments that are not traversed. In this paper, we present two new extensions to our framework. First, we replace our acoustic segmentation algorithm with "segmentation by recognition," a probabilistic algorithm that can combine multiple contextual constraints towards hypothesizing only the most likely segments. Second, we generalize our framework to "near-miss modeling" and describe a search algorithm that can efficiently use multiple models to enforce contextual constraints across all segments in a network. We report experiments in phonetic recognition on the TIMIT corpus in which we achieve a diphone context-dependent error rate of 26.6% on the NIST core test set over 39 classes. This is a 12.8% reduction in error rate from our best previously reported result.
ABSTRACT
An approach to speech recognition using syllables as basic modelling units is compared to a state-of-the-art system employing phonemes. The technological framework is a hybrid HMM-ANN recognition system applied on small to medium vocabulary recognition tasks. Although the number of units to be classified nearly doubles, it is shown that the syllable can outperform the phoneme slightly but significantly in terms of unit classification capability, measured as frame error rate. Comparing the overall system performance (measured in word error rate), the phoneme-based system still performs clearly better for continuous speech tasks, while the syllable-based system is superior for isolated word recognition tasks on cross-database tests. This suggests the need for further work on the understanding of the interaction of knowledge sources at the frame, word and sentence levels in current recognition systems.
ABSTRACT
Segment-based speech recognition systems have been proposed in recent years to overcome some of the deficiencies of current state-of-the-art HMM-based systems. In this paper, we present a segmental speech recogniser where the speech trajectory segments are modelled using their mean, variance and shape. The shape is chosen from a codebook of global vector-quantised trajectories, obtained from uniformly segmented training utterances. Experiments were done for a speaker-dependent isolated word recognition application under different noise environments. The results show that this segment-based approach outperforms HMM-based speech recognition systems under similar test conditions. In adverse noise conditions, up to 34% error rate reduction was achieved.
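A sketch of the shape-selection step, assuming segments are resampled to the codebook's fixed length and normalised for mean and scale (the distance and normalisation here are illustrative choices of mine):

```python
import numpy as np

def best_shape(segment, codebook):
    # segment: frames x dims; codebook: K x n x dims of VQ trajectory shapes.
    n = codebook.shape[1]
    idx = np.linspace(0, len(segment) - 1, n).round().astype(int)
    shape = segment[idx] - segment.mean(axis=0)   # remove the segment mean
    shape = shape / max(shape.std(), 1e-8)        # remove the overall scale
    dists = ((codebook - shape) ** 2).sum(axis=(1, 2))
    return int(np.argmin(dists))                  # index of the closest shape
```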
ABSTRACT
Continuous Speech Recognition (CSR) systems usually include large sets of context dependent units to model contextual variations in the pronunciation of phones. The goal of this work was to obtain adequate sets of sub-lexical models by using acoustic information alone, excluding any prior phonological knowledge. At each iteration of a classical Viterbi training scheme, each acoustic model was split into a set of more accurate models. This approach was evaluated on a Spanish acoustic-phonetic decoding task. The experimental results showed that it produces recognition rates similar to those of classical triphones.
ABSTRACT
In this paper we introduce the demiphone as a contextual phonetic unit for continuous speech recognition. A phone is divided into two parts: a left demiphone that accounts for the left-side coarticulation and a right demiphone that copes with the right-side context. This new unit discards the dependence between the effects of the two side contexts, but provides better training of the transition between phones. The demiphone can be seen as a heuristic clustering of states that allows a more smoothed training of hidden Markov models and additionally supplies a simple way to create unseen triphones. We report experimental evidence that demiphones outperform the usual combination of triphones, right-side and left-side biphones, and monophones.
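The unit inventory follows mechanically from the phone labels; a sketch in HTK-style triphone notation (the notation is assumed, not taken from the paper):

```python
def triphone_to_demiphones(left, phone, right):
    # Split the triphone left-phone+right into a left demiphone (covering the
    # left-context half of the phone) and a right demiphone (right-context half).
    return f"{left}-{phone}", f"{phone}+{right}"

# An unseen triphone such as "s-ih+t" can be assembled from "s-ih" and "ih+t",
# each of which may be well trained from other triphones sharing that half.
```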
ABSTRACT
Aiming at robust speech recognition, we have proposed a framework for "phonological concept formation," which is the task of acquiring an efficient representation of phonemes from spoken word samples without using any transcriptions except for the lexical classification of the words. In order to implement this task, we propose the "piecewise linear segment lattice (PLSL)" model for phoneme representation. The structure of this model is a lattice of segments, each of which is represented as regression coefficients of feature vectors within the segment. In order to organize phone models, operations including division, concatenation, blocking and clustering are applied to the models. Feasibility of the method is discussed with experimental results for isolated word recognition. The recognition rate is improved by applying these operations.
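A minimal reading of "regression coefficients of feature vectors within the segment" is a per-dimension straight-line fit over the segment's frames; the sketch below is my simplification of that idea, not the PLSL model itself.

```python
import numpy as np

def segment_regression(frames):
    # frames: frames x dims. Fit x_d(t) = a_d + b_d * t per dimension and
    # return the concatenated intercepts and slopes as the segment descriptor.
    t = np.arange(len(frames), dtype=float)
    t = t - t.mean()                                   # centre the time axis
    denom = max((t ** 2).sum(), 1e-8)
    slope = (t[:, None] * (frames - frames.mean(axis=0))).sum(axis=0) / denom
    return np.concatenate([frames.mean(axis=0), slope])
```

Division, concatenation, blocking and clustering would then operate on lattices of such segment descriptors.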
ABSTRACT
Most state-of-the-art speech recognizers benefit from some kind of context information in their acoustic modeling [1][2][3]. The most common approach to context clustering is a divisive method that iteratively builds decision trees [4][5]. The problem of when to stop growing the tree is usually solved by choosing the maximum number of resulting models that can be supported by the available training data and/or computer memory and CPU power. In this paper we propose a new algorithm that not only offers an optimized stopping criterion, but also uses a likelihood-based distance measure that optimizes the likelihood of unseen training data at every split of a decision tree node. We evaluate our algorithm on the Wall Street Journal task, and show that it outperforms an algorithm using an entropy-based distance measure.
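The core quantity in a likelihood-based splitting criterion is the log-likelihood gain of a candidate question; a sketch assuming a single diagonal Gaussian per node (which may differ from the paper's exact estimator):

```python
import numpy as np

def gaussian_loglik(x):
    # Total log-likelihood of data x (frames x dims) under one ML-estimated
    # diagonal Gaussian, using the standard identity for the ML fit.
    var = x.var(axis=0) + 1e-8
    return -0.5 * len(x) * (np.log(2 * np.pi * var) + 1).sum()

def split_gain(x_yes, x_no):
    # Gain of splitting a node's frames by a question; tree growing greedily
    # picks the question maximising this at each node.
    x_all = np.vstack([x_yes, x_no])
    return gaussian_loglik(x_yes) + gaussian_loglik(x_no) - gaussian_loglik(x_all)
```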
ABSTRACT
We present a novel method for recovering articulator movements from speech acoustics based on a constrained form [9] of a hidden Markov model. The model attempts to explain sequences of high dimensional data using smooth and slow trajectories in a latent variable space. The key insight is that this continuity constraint, when applied to speech, helps to solve the "ill-posed" problem of acoustic to articulatory mapping. By working with sequences of spectra rather than looking only at individual spectra, it is possible to choose between competing articulatory configurations for any given spectrum by selecting the configuration "closest" to those at nearby times. We present results of applying this algorithm to recover articulator movements from acoustics using data from the Wisconsin X-ray microbeam project [3]. We find that the recovered traces are highly correlated with the measured articulator movements under a single linear transform. Such recovered traces have the potential to be used for speech recognition, an application we are currently investigating.
ABSTRACT
In this work we describe several approaches to determining an effective set of subword units for modeling spoken Greek. We tried to form a concrete set of basic units capable of giving a unique phonetic transcription for every input utterance. The results of an extensive set of experiments showed that the use of units longer than phonemes can lead to a significant improvement in a system's performance. Three sets of subword units were finally formed, according to the way we combined the 42 phonemes of the Greek language. All three approaches showed better results than the baseline phoneme-based system, and the most effective proved to be the second approach, in which we used two-phoneme combinations of the types non-vowel/vowel and non-vowel/non-vowel. The phoneme recognition rate of the system increased by almost 9% (reaching 78.65%) in the best case compared to the baseline system.
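A sketch of the second approach's unit formation, with a toy vowel set and a greedy left-to-right pairing rule (both are assumptions of mine, for illustration only):

```python
VOWELS = {"a", "e", "i", "o", "u"}   # illustrative vowel set

def to_units(phonemes):
    # Merge non-vowel/vowel and non-vowel/non-vowel pairs into two-phoneme
    # units; lone vowels (and a trailing phoneme) stay as single units.
    units, i = [], 0
    while i < len(phonemes):
        if i + 1 < len(phonemes) and phonemes[i] not in VOWELS:
            units.append(phonemes[i] + phonemes[i + 1])
            i += 2
        else:
            units.append(phonemes[i])
            i += 1
    return units

# to_units(list("krima")) -> ['kr', 'i', 'ma']   (illustrative only)
```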
ABSTRACT
The problem addressed in this paper is enhancing a continuous speech recognizer's robustness to noise. For this purpose, the acoustic signal is filtered into several spectral bands, and independent recognition is performed in each band. The system then recombines the results given by each recognizer and delivers a unique solution. The main advantage of this method is that it considers the signal only in the bands which are relevant, ignoring spectral bands which are corrupted by noise. We are developing a speaker-independent continuous speech recognizer based on this principle.
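A sketch of the recombination step for one hypothesis, using a weighted sum of per-band log-likelihoods (one common rule in multi-band recognition; the system's actual rule may differ):

```python
import numpy as np

def recombine(band_loglikes, weights=None):
    # band_loglikes: one log-likelihood per spectral-band recognizer.
    # Down-weighting a band is how a noise-corrupted band can be ignored.
    band_loglikes = np.asarray(band_loglikes, dtype=float)
    if weights is None:
        weights = np.ones_like(band_loglikes) / len(band_loglikes)
    return float(np.dot(weights, band_loglikes))
```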
ABSTRACT
This paper presents a two-stage procedure, based on the Fisher criterion and automatic classification trees, for designing acoustic parameters (APs) that target phonetic features in the speech signal. This procedure and a subset of the TIMIT training set were used to develop acoustic parameters for the phonetic features: sonorant, syllabic, strident, palatal, alveolar, labial and velar. Results on a subset of the TIMIT test set show that the developed parameters achieve correct phonetic-feature classification rates in the 90% range, with the exception of stop-consonant place of articulation (labial, alveolar and velar), where correct classification is about 73%. Furthermore, it is shown that by basing the acoustic parameters on relative measures (e.g. an acoustic parameter that measures energy in a frequency band relative to energy in the same band at another time instant), the effect of interspeaker variability (e.g. gender) on the parameters is reduced.
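The first stage can be illustrated by the Fisher criterion for a scalar candidate AP over the two phonetic-feature classes (a standard form of the criterion; the paper's exact usage may differ):

```python
import numpy as np

def fisher_ratio(x_pos, x_neg):
    # Between-class scatter over within-class scatter for one candidate AP,
    # given its values on positive and negative phonetic-feature examples;
    # higher ratios indicate APs that separate the classes better.
    m1, m2 = x_pos.mean(), x_neg.mean()
    v1, v2 = x_pos.var(), x_neg.var()
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-8)
```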