ICSLP'98 Neural Networks, Fuzzy and Evolutionary Methods 1

A Comparison of Thai Speech Recognition Systems Using Hidden Markov Model, Neural Network, and Fuzzy-Neural Network
Authors:
Visarut Ahkuputra, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Somchai Jitapunkul, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Nutthacha Jittiwarangkul, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Ekkarit Maneenoi, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Sawit Kasuriya, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Page (NA) Paper number 283
Abstract: Recognition of the ten isolated Thai numerals from zero to nine and of 60 Thai polysyllabic words is compared across four recognition techniques: Neural Network, Modified Backpropagation Neural Network, Fuzzy-Neural Network, and Hidden Markov Model. A 15-state left-to-right discrete hidden Markov model combined with vector quantization has been studied and compared with a multilayer perceptron neural network trained with error backpropagation and with modified backpropagation, and also with a fuzzy-neural network of the same configuration. The recognition error rates on Thai isolated numerals using the conventional neural network, the modified neural network, the fuzzy-neural network, and the hidden Markov model are 26.97 percent, 22.00 percent, 8.50 percent, and 15.75 percent, respectively.
SL980283.PDF (From Author) SL980283.PDF (Scanned)

Phoneme Recognition with Statistical Modeling of the Prediction Error of Neural Networks
Authors:
Felix Freitag, UPC (Spain)
Enric Monte, UPC (Spain)
Page (NA) Paper number 455
Abstract: This paper presents a speech recognition system which incorporates predictive neural networks. The neural networks are used to predict observation vectors of speech. The prediction error vectors are modeled on the state level by Gaussian densities, which provide the local similarity measure for the Viterbi algorithm during recognition. The system is evaluated on a continuous speech phoneme recognition task. Compared with an HMM reference system, the proposed system obtained better results in the speech recognition experiments.
SL980455.PDF (From Author) SL980455.PDF (Rasterized)

Neural Network Based Pronunciation Modeling With Applications To Speech Recognition
Authors:
Toshiaki Fukada, ATR-ITL (Japan)
Takayoshi Yoshimura, Nagoya Institute of Technology (Japan)
Yoshinori Sagisaka, ATR-ITL (Japan)
Page (NA) Paper number 658
Abstract: We propose a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations (realized pronunciations) from canonical pronunciations. This method can generate multiple forms of realized pronunciations using the pronunciation network. Experimental results on spontaneous speech show that the automatically derived pronunciation dictionary gives consistently higher recognition rates than a conventional dictionary.
SL980658.PDF (From Author) SL980658.PDF (Rasterized)

A Comparative Study of OCON and MLP Architectures for Phoneme Recognition
Authors:
Stephen J. Haskey, Loughborough University (U.K.)
Sekharajit Datta, Loughborough University (U.K.)
Page (NA) Paper number 568
Abstract: In this paper a comparative study between One-Class-One-Network (OCON) and Multi-Layered Perceptron (MLP) neural networks for vowel phoneme recognition is presented. The OCON architecture, first proposed by I. C. Jou et al., is similar in design to a conventional feed-forward MLP, except that each class has its own dedicated sub-network containing a single output node. Conventional MLPs usually consist of fully-connected nodes, which not only result in a large number of weighted connections but also create the problem of cross-class interference. Using vowel phoneme data from the DARPA TIMIT corpus of read speech, MLP and OCON architectures were trained, and the relative effects on recognition and convergence rates during both intra- and inter-class adaptation were tested. The OCON showed an increase in the convergence rate of 273% and an improvement of adapted recognition rates against the MLP of over 12%.
SL980568.PDF (From Author) SL980568.PDF (Rasterized)

Evaluation and Integration of Neural-Network Training Techniques for Continuous Digit Recognition
Authors:
John-Paul Hosom, Oregon Graduate Institute of Science and Technology (OGI) (USA)
Ronald A. Cole, Oregon Graduate Institute of Science and Technology (OGI) (USA)
Piero Cosi, Institute of Phonetics -- C. N. R. (Italy)
Page (NA) Paper number 613
Abstract: This paper describes a set of experiments on neural-network training and search techniques that, when combined, have resulted in a 54% reduction in error on the continuous digits recognition task. The best system had word-level accuracy of 97.52% on a test set of the OGI 30K Numbers corpus, which contains naturally-produced continuous digit strings recorded over telephone channels. Experiments investigated effects of the feature set, the amount of data used for training, the type of context-dependent categories to be recognized, the values for duration limits, and the type of grammar. The experiments indicate that the grammar and duration limits had a greater effect on recognition accuracy than the output categories, cepstral features, or a 50% increase in the amount of training data.
SL980613.PDF (From Author) SL980613.PDF (Rasterized)

Hierarchical Neural Networks (HNN) for Chinese Continuous Speech Recognition
Authors:
Ying Jia, Lab of Interactive Information Systems, Institute of Acoustics, Chinese Academy of Science (China)
Limin Du, Lab of Interactive Information Systems, Institute of Acoustics, Chinese Academy of Science (China)
Ziqiang Hou, Institute of Acoustics, Chinese Academy of Science (China)
Page (NA) Paper number 415
Abstract: To integrate the hierarchical structure of discrimination between all HMM states for Chinese Initials and Finals, we construct Hierarchical Neural Networks (HNN), which differ from Jordan's HME through extensions such as a more complex parameterization of the gate and/or expert and a dimension-reduced expert network. With these extensions, we can reuse pre-trained simple node networks in a hierarchical structure (HNN) and fine-tune them jointly with the Generalized Expectation Maximization (GEM) algorithm. The proposed HNNs were used within hybrid HMM-ANN models to estimate the posterior probabilities of HMM states. Instead of using a large monolithic neural network, the HNN system can be trained in a short time compared with an MLP estimator and results in a speed-up in decoding time over conventional systems. We have applied the proposed hybrid HMM-HNN method to the recognition of Chinese continuous speech, achieving a promising word error rate of 26.4%.
SL980415.PDF (From Author) SL980415.PDF (Rasterized)

Neural Network Motivation for Segmental Distribution
Authors:
Eric Keller, University of Lausanne (Switzerland)
Page (NA) Paper number 937
Abstract: Feature representations mediating between acoustic input and symbolic representation promise to reduce learning time needed for automatic speech signal segmentation. Experiments are reported that circumscribe simple acoustic inputs and appropriate feature sets for neural network training.
Stable and compatible solutions for English and French were identified.
SL980937.PDF (From Author) SL980937.PDF (Rasterized)

Combining Connectionist Multi-Band and Full-Band Probability Streams for Speech Recognition Of Natural Numbers
Authors:
Nikki Mirghafori, ICSI & UC Berkeley (USA)
Nelson Morgan, ICSI & UC Berkeley (USA)
Page (NA) Paper number 1150
Abstract: Multi-band automatic speech recognition is a new and exploratory area of speech recognition which has been getting much attention in the research community. It has been shown that multi-band ASR reduces word error in noisy conditions, particularly in the case of narrow band noise. In this work we show that multi-band ASR could be used to improve the speech recognition accuracy of natural numbers for clean speech when the multi-band (MB) information stream is used in addition to the full-band (FB) one. We also observe that a similar combination method significantly reduces the error rate on reverberant speech. Finally, we analyze the error patterns of the full-band and multi-band paradigms to understand why the combination of the two streams is effective.
SL981150.PDF (From Author) SL981150.PDF (Rasterized)

Initial Speech Recognition Results Using The Multinet Architecture
Authors:
Ednaldo B. Pizzolato, University of Essex (U.K.)
T. Jeff Reynolds, University of Essex (U.K.)
Page (NA) Paper number 821
Abstract: Multinet is a connectionist architecture designed for certain difficult multi-class pattern classification tasks. These are characterised by very large input feature spaces, rendering a monolithic classifier impractical. The architecture consists of a layer with at least one primary 'detector' for each class, followed by a combining net which estimates the posterior probabilities for all classes. Typically primary detectors only input a subset of the input features. Thus the architecture decomposes classification in two ways: by class and by factoring of the input space dimensions.
Multinet incorporates the ideas of Modular Neural Networks and Ensembles. In this paper, we investigate the use of Multinet on standard HMM and hybrid HMM-NN systems that we run on the same tasks. The value and potential of the Multinet approach are shown by detailing successive improvements to the Multinet system, which are easily obtained because of the modularity of the architecture.
SL980821.PDF (From Author) SL980821.PDF (Rasterized)

Selection of the Optimal Structure of the Continuous HMM Using the Genetic Algorithm
Authors:
Tomio Takara, University of the Ryukyus (Japan)
Yasushi Iha, University of the Ryukyus (Japan)
Itaru Nagayama, University of the Ryukyus (Japan)
Page (NA) Paper number 1066
Abstract: Hidden Markov models (HMMs) are widely used for automatic speech recognition because they have a powerful algorithm for estimating the model's parameters and also achieve high performance. Once a structure of the model is given, the model's parameters are obtained automatically by feeding training data. However, there is still an unresolved problem with the HMM, i.e. how to design an optimal HMM structure. In answer to this problem, we previously proposed the application of a genetic algorithm (GA) to search for such an optimal structure, and we showed this method to be effective for isolated word recognition. However, the test of this method was restricted to discrete HMMs. In this paper, we propose a new application of the GA to the continuous HMM (CHMM), which is thought to be more effective than the discrete HMM. We report the results of our experiment showing the effectiveness of the genetic algorithm in automatic speech recognition.
SL981066.PDF (From Author) SL981066.PDF (Rasterized)

A Proposed Decision Rule For Speaker Recognition Based On Fuzzy C-Means Clustering
Authors:
Dat Tran, University of Canberra (Australia)
Michael Wagner, University of Canberra (Australia)
Tu Van Le, University of Canberra (Australia)
Page (NA) Paper number 797
Abstract: In vector quantisation (VQ) based speaker recognition, the minimum overall average distortion rule is used as a criterion to assign a given sequence of acoustic vectors to a speaker model known as a codebook. An alternative decision rule based on fuzzy c-means clustering is proposed in this paper. A set of membership functions associated with vectors for codebooks is defined as discriminant functions, and the maximum overall average membership function rule is stated. The theoretical analysis and the experimental results show that this rule can be used in both speaker identification and speaker verification, and that it is more effective than the minimum overall average distortion rule.
SL980797.PDF (From Author) SL980797.PDF (Rasterized)

Fuzzy Gaussian Mixture Models For Speaker Recognition
Authors:
Dat Tran, University of Canberra (Australia)
Tu Van Le, University of Canberra (Australia)
Michael Wagner, University of Canberra (Australia)
Page (NA) Paper number 798
Abstract: A fuzzy clustering based modification of Gaussian mixture models (GMMs) for speaker recognition is proposed. In this modification, fuzzy mixture weights are introduced by redefining the distances used in the fuzzy c-means (FCM) functionals. Their reestimation formulas are proved by minimising the FCM functionals. The experimental results show that the fuzzy GMMs can be used in speaker recognition and are more effective than the GMMs in tests on the TI46 database.
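
The two decision rules contrasted in paper 797 (minimum overall average distortion versus maximum overall average membership) can be illustrated with a small sketch. This is not the authors' formulation: it assumes squared Euclidean distortion to the nearest codeword and the standard fuzzy c-means membership formula applied across codebooks. The function names, the fuzziness exponent `m`, and the toy codebooks are all hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def distortion(x, codebook):
    """Distortion of x against a codebook: distance to the nearest codeword."""
    return min(sq_dist(x, c) for c in codebook)


def min_average_distortion(vectors, codebooks):
    """Classical VQ rule: choose the codebook with the smallest average distortion."""
    avg = {s: sum(distortion(x, cb) for x in vectors) / len(vectors)
           for s, cb in codebooks.items()}
    return min(avg, key=avg.get), avg


def max_average_membership(vectors, codebooks, m=2.0, eps=1e-12):
    """Fuzzy rule (assumed form): choose the codebook with the largest average membership."""
    totals = {s: 0.0 for s in codebooks}
    for x in vectors:
        # eps keeps the ratio finite when a vector coincides with a codeword.
        d = {s: distortion(x, cb) + eps for s, cb in codebooks.items()}
        for s in codebooks:
            # Standard FCM membership: u_s = 1 / sum_j (d_s / d_j)^(1/(m-1))
            totals[s] += 1.0 / sum((d[s] / d[j]) ** (1.0 / (m - 1.0)) for j in d)
    avg = {s: t / len(vectors) for s, t in totals.items()}
    return max(avg, key=avg.get), avg


# Toy data (hypothetical): two speaker codebooks and a short test sequence.
codebooks = {"spk_a": [(0.0, 0.0), (1.0, 0.0)], "spk_b": [(3.0, 3.0), (4.0, 3.0)]}
vectors = [(0.1, 0.1), (0.9, -0.1), (0.5, 0.2)]

vq_choice, vq_scores = min_average_distortion(vectors, codebooks)
fcm_choice, fcm_scores = max_average_membership(vectors, codebooks)
print(vq_choice, fcm_choice)
```

In this sketch the memberships of each vector sum to one across codebooks, so a vector that lies far from every codebook contributes less decisively than under the raw distortion average.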
SL980798.PDF (From Author) SL980798.PDF (Rasterized)

A New Strategy of Fuzzy-Neural Network for Thai Numeral Speech Recognition
Authors:
Chai Wutiwiwatchai, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Somchai Jitapunkul, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Visarut Ahkuputra, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Ekkarit Maneenoi, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Sudaporn Luksaneeyanawin, Department of Linguistics, Chulalongkorn University (Thailand)
Page (NA) Paper number 349
Abstract: In this research, a new strategy for a Fuzzy-Neural Network system is proposed for Thai numeral speech recognition. Instead of using fuzzy membership input with class membership desired-output during the training procedure, as proposed in several studies, we used fuzzy membership input with fundamental binary desired-output. This can reduce the misunderstood training, decrease the training time, and also improve the recognition ability. The system was tested on Thai ten-numeral speech (0-9) recognition. The error rate on the speaker-independent test was 9.2%, compared with a 14% error rate for the conventional neural network system, while the error rate of the system using class membership desired-output is quite high because of misunderstood training.
SL980349.PDF (From Author) SL980349.PDF (Scanned)

Thai Polysyllabic Word Recognition Using Fuzzy-Neural Network
Authors:
Chai Wutiwiwatchai, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Somchai Jitapunkul, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Visarut Ahkuputra, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Ekkarit Maneenoi, Department of Electrical Engineering, Chulalongkorn University (Thailand)
Sudaporn Luksaneeyanawin, Department of Linguistics, Chulalongkorn University (Thailand)
Page (NA) Paper number 350
Abstract: In this research, a Fuzzy-Neural Network (fuzzy-NN) model is proposed for speaker-independent Thai polysyllabic word recognition. Various fuzzy membership functions on linguistic properties were used to convert exact features extracted from input speech to fuzzy membership values. The fuzzy membership values were arranged into a new input vector for a Multilayer Perceptron (MLP) neural network. Binary desired outputs were used during training. Seventy Thai words were used for system evaluation: the ten numerals plus single-syllable, double-syllable, and triple-syllable words, 20 words in each group. To improve recognition accuracy, the detected number of syllables and tonal level were used for speech preclassification. The Pi fuzzy membership function provided the best recognition accuracy, ahead of the Trapezoidal and Triangular functions. Under an optimal condition, the achieved recognition error rates were 5.6% on the dependent test and 6.7% on the independent test, which were 3.3% and 3.4% lower, respectively, than those of the conventional Neural Network system.
SL980350.PDF (From Author) SL980350.PDF (Rasterized)
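
As a rough illustration of the three membership function shapes named in the abstract of paper 350 (Pi, Trapezoidal, Triangular), the sketch below shows one common way to define them and to turn exact feature values into fuzzy membership values. The abstract does not give the parameterization used in the paper, so the Pi function here assumes the widely used Pal-Mitra form, and the `fuzzify` helper, the class centers, and the radius are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def pi_function(x, center, radius):
    """Pi-shaped membership (assumed Pal-Mitra form): 1 at center, 0.5 at radius/2, 0 beyond radius."""
    r = abs(x - center) / radius
    if r >= 1.0:
        return 0.0
    if r <= 0.5:
        return 1.0 - 2.0 * r * r
    return 2.0 * (1.0 - r) ** 2


def fuzzify(features, centers, radius):
    """Replace each exact feature by its Pi memberships in every linguistic class."""
    return [pi_function(x, c, radius) for x in features for c in centers]


# Hypothetical example: two normalized features, three classes ("low", "medium", "high").
fuzzy_input = fuzzify([0.2, 0.9], centers=(0.0, 0.5, 1.0), radius=0.5)
print(len(fuzzy_input))  # three membership values per feature
```

Each exact feature expands into one membership value per linguistic class, so the input vector of the network grows by a factor equal to the number of classes.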