Authors:
Tuan Pham, Faculty of Information Sciences & Engineering, University of Canberra (Australia)
Michael Wagner, Faculty of Information Sciences & Engineering, University of Canberra (Australia)
Page (NA) Paper number 953
Abstract:
Similarity or likelihood normalization techniques are important for
speaker verification systems as they help to alleviate the variations
in the speech signals. In the conventional normalization, the a priori
probabilities of the cohort speakers are assumed to be equal. From
this standpoint, we apply the theory of fuzzy measure and fuzzy integral
to combine the likelihood values of the cohort speakers in which the
assumption of equal a priori probabilities is relaxed. This approach
replaces the conventional normalization term by the fuzzy integral
which acts as a non-linear fusion of the similarity measures of an
utterance assigned to the cohort speakers. We illustrate the performance
of the proposed approach by testing the speaker verification system
with both the conventional and the fuzzy algorithms using the commercial
speech corpus TI46. The results in terms of the equal error rates
show that the speaker verification system using the fuzzy integral
is more flexible and more favorable than the conventional normalization
method.
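
The abstract does not say which fuzzy integral or fuzzy measure the authors use. As an illustration only, the sketch below assumes a Sugeno integral over a Sugeno lambda-measure, with cohort similarity scores already scaled to [0, 1] and one density per cohort speaker standing in for the unequal a priori weights. The function names and the scaling are assumptions, not taken from the paper.

```python
import math

def solve_lambda(densities):
    """Solve prod(1 + lam * g_i) = 1 + lam for the Sugeno lambda (lam > -1).

    Assumes at least two densities, each strictly between 0 and 1.
    """
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities are already additive

    def f(lam):
        return math.prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0 + 1e-9, -1e-9  # the root lies in (-1, 0)
    else:
        lo, hi = 1e-9, 1.0           # the root is positive; grow hi until f > 0
        while f(hi) < 0.0:
            hi *= 2.0
    for _ in range(200):             # plain bisection on the sign change
        mid = 0.5 * (lo + hi)
        if (f(lo) < 0.0) == (f(mid) < 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sugeno_integral(scores, densities):
    """Fuse cohort scores in [0, 1] with a Sugeno integral (assumed form)."""
    lam = solve_lambda(densities)
    # Visit cohort speakers from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    g_cum, best = 0.0, 0.0
    for i in order:
        # Measure of the set of speakers visited so far.
        g_cum = g_cum + densities[i] + lam * g_cum * densities[i]
        best = max(best, min(scores[i], g_cum))
    return best

def equal_prior_term(scores):
    """Conventional normalization term: plain average over the cohort."""
    return sum(scores) / len(scores)
```

Under these assumptions, a claimed speaker's score would be compared against `sugeno_integral(cohort_scores, densities)` rather than `equal_prior_term(cohort_scores)`. Changing the densities changes how strongly each cohort speaker influences the normalization term.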
Authors:
Hiroshi Shimodaira, Japan Advanced Institute of Science and Technology (Japan)
Jun Rokui, Japan Advanced Institute of Science and Technology (Japan)
Mitsuru Nakai, Japan Advanced Institute of Science and Technology (Japan)
Page (NA) Paper number 795
Abstract:
A novel method to prevent the over-fitting effect and improve the generalization
performance of the Minimum Classification Error (MCE) / Generalized
Probabilistic Descent (GPD) learning is proposed. The MCE/GPD method,
which is one of the newest discriminative-learning approaches proposed
by Katagiri and Juang in 1992, results in better recognition performance
in various areas of pattern recognition than the maximum-likelihood
(ML) based approach where a posteriori probabilities are estimated.
Despite its superiority in recognition performance, it still suffers
from the problem of over-fitting to the training samples, as do
other learning algorithms. In the present study, a regularization
technique is applied to the MCE method to overcome this problem.
Feed-forward neural networks are employed as a recognition platform
to evaluate the recognition performance of the proposed method. Recognition
experiments are conducted on several kinds of datasets. The proposed
method shows better generalization performance than the original one.
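
The abstract does not name the regularizer. The sketch below assumes the common smoothed MCE loss (a sigmoid of a misclassification measure) and adds an L2 weight penalty purely as an example of a regularization term. The parameters `eta`, `alpha`, and `lam`, and the function names, are illustrative assumptions.

```python
import math

def mce_loss(scores, label, eta=1.0, alpha=1.0):
    """Smoothed MCE loss for one sample.

    scores: discriminant values, one per class (e.g. network outputs).
    label:  index of the correct class.
    """
    competitors = [s for k, s in enumerate(scores) if k != label]
    # (1/eta) * log of the mean of exp(eta * g_j) over the wrong classes,
    # shifted by the maximum for numerical stability.
    m = max(competitors)
    mean_exp = sum(math.exp(eta * (s - m)) for s in competitors) / len(competitors)
    soft_max = m + math.log(mean_exp) / eta
    d = -scores[label] + soft_max  # d > 0 means the sample is misclassified
    return 1.0 / (1.0 + math.exp(-alpha * d))

def regularized_objective(batch_scores, labels, weights, lam=1e-3):
    """Mean MCE loss plus an assumed L2 penalty on the model weights."""
    data_term = sum(mce_loss(s, y) for s, y in zip(batch_scores, labels)) / len(labels)
    penalty = lam * sum(w * w for w in weights)
    return data_term + penalty
```

With `lam = 0` this reduces to the plain MCE objective. Larger values trade training-set fit for smaller weights, which is one standard way to limit over-fitting.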
Authors:
Tetsuro Kitazoe, Miyazaki University (Japan)
Tomoyuki Ichiki, Miyazaki University (Japan)
Sung-Ill Kim, Miyazaki University (Japan)
Page (NA) Paper number 965
Abstract:
The equation of neural nets for stereo vision is applied to speech
recognition. We use the Coupled Pattern Recognition (CPR) equation, which
has been shown to organize depth perception very well through competition
and cooperation. We construct a Gaussian probability density function (pdf)
for each phoneme from a number of training data. The input data to
be recognized are compared to the pdfs, and similarity measures
are obtained for each phoneme. The CPR equation develops neuron activities
by receiving the similarity measures as input. Recognition is achieved
when the activities arrive at a stable state. The recognition rate
for 25 Japanese phonemes averages 74.75%, compared with
71.53% for a Hidden Markov Model. A further technical improvement is applied
to our neuron model by dividing the data of a phoneme into two parts, one
for the former frames and the other for the latter frames. This yields a
remarkable improvement, with an average recognition rate of 79.79%.
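
The sketch below illustrates only the pdf step described above: fitting one Gaussian per phoneme and scoring an input frame against each. It assumes diagonal covariances and uses the log-density as the similarity measure; both choices, and the function names, are assumptions. The CPR dynamics themselves are not reproduced, because the abstract does not give the equation.

```python
import math

def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from training frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a degenerate dimension cannot divide by zero.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_pdf(x, model):
    """Log-density of frame x under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def similarities(x, models):
    """Similarity of frame x to every phoneme model, keyed by phoneme."""
    return {phoneme: log_pdf(x, model) for phoneme, model in models.items()}
```

In the scheme the abstract describes, values like these would be fed to the CPR equation as input. Here they are simply returned as a dictionary.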
Authors:
Julie Ngan, Institute for Signal and Information Processing, Mississippi State University (USA)
Aravind Ganapathiraju, Institute for Signal and Information Processing, Mississippi State University (USA)
Joseph Picone, Institute for Signal and Information Processing, Mississippi State University (USA)
Page (NA) Paper number 384
Abstract:
Proper noun pronunciation generation is a particularly challenging
problem in speech recognition since a large percentage of proper nouns
often defy typical letter-to-sound conversion rules. In this paper,
we present decision tree methods which outperform neural network techniques.
Using the decision tree method, we have achieved an overall error rate
of 45.5%, which is a 35% reduction over the previous techniques. Our
best system is a binary decision tree that uses a context length of
3 and employs information gain ratio as the splitting rule.
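
The splitting rule named above can be sketched as follows. This is a generic information gain ratio for a binary split, as used in C4.5-style trees, and not the authors' implementation. The function names and the representation of a split as a list of booleans are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, goes_left):
    """Information gain ratio of a binary split.

    labels:    target classes, e.g. the phoneme assigned to each letter.
    goes_left: one boolean per sample, True if it falls in the left branch.
    """
    left = [y for y, flag in zip(labels, goes_left) if flag]
    right = [y for y, flag in zip(labels, goes_left) if not flag]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return info_gain / split_info
```

A tree builder would evaluate `gain_ratio` for each candidate question about the letter context (for instance, a test on a neighbouring letter) and keep the question with the highest value.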