Authors:
William J.J. Roberts, Defence Science Technology Organisation (Australia)
Yariv Ephraim, George Mason University (USA)
Page (NA) Paper number 141
Abstract:
Hidden Markov modeling of speech waveforms is studied and applied to
speech recognition of clean and noisy signals. Signal vectors in each
state are assumed Gaussian with zero mean and a Toeplitz covariance
matrix. This model allows short signal vectors and thus is useful for
speech signals with rapidly changing second order statistics. It can
also be straightforwardly adapted to noisy signals especially when
the noise is additive and independent of the signal. Since no closed
form solution exists for the maximum likelihood estimate of the Toeplitz
covariance matrices, an expectation-maximization procedure was used
and efficiently implemented. HMMs with Toeplitz as well as asymptotically
Toeplitz (e.g., circulant, autoregressive) covariance matrices are
theoretically and experimentally studied. While asymptotically all
of these matrices provide similar performance, they differ significantly
when the frame length is finite. Recognition results are provided for
clean and noisy signals at 0-30 dB SNR.
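
The per-state model described in this abstract (zero-mean Gaussian signal vectors with a Toeplitz covariance matrix) can be illustrated with a minimal sketch. It only evaluates the log-likelihood of one signal vector; it is not the authors' expectation-maximization procedure, and the function names and the autocovariance argument `r` are assumptions made for illustration.

```python
import math

def toeplitz(r):
    """Build the symmetric Toeplitz matrix whose (i, j) entry is r[|i - j|]."""
    n = len(r)
    return [[r[abs(i - j)] for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l

def gaussian_loglik(x, r):
    """Log-density of x under N(0, R), where R = toeplitz(r) (hypothetical helper)."""
    n = len(x)
    l = cholesky(toeplitz(r))
    # Solve L z = x by forward substitution; then x^T R^{-1} x = z^T z.
    z = []
    for i in range(n):
        z.append((x[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i])
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

Because the covariance is fully determined by the first row `r`, a state with frame length n needs only n parameters, which is one reason short signal vectors remain practical under this model.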
Authors:
David Thambiratnam, Queensland University of Technology (Australia)
Sridha Sridharan, Queensland University of Technology (Australia)
Page (NA) Paper number 308
Abstract:
This paper presents a solution to the adverse-environment, open-microphone
problem by using the information stored in HMM output probability
distributions to obtain a confidence measure of the results. This information
can also be used to perform a secondary classification and improve
recognition results. The system was tested on data from the TI46 database
that had been corrupted by noise from the NOISEX-92 database, as well
as on real-world data, and shows promising results.
Authors:
Philippe Morin, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Ted H. Applebaum, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Robert Boman, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Yi Zhao, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Jean-Claude Junqua, Panasonic Technologies, Inc. / Speech Technology Laboratory (USA)
Page (NA) Paper number 402
Abstract:
In this paper we characterize the sensitivity of two speaker-dependent
isolated word recognizers toward several kinds of variability and distortions;
namely noise, channels, distance to microphone and target language.
Both recognizers use a phoneme similarity acoustic front-end as a rich
representation for speech from which reliable features are extracted.
A cross-correlation test showed that a phoneme similarity front-end
is more robust to variability and distortions (especially intra-speaker
variability) than an LPC cepstral front-end. The first recognizer (Condor)
uses a frame-based approach while the second (Pasha) uses the phoneme
similarity information contained in a small number of speech segments.
The two recognition methods are presented with a special emphasis on
the robustness improvements and computational trade-offs that have
been made. Experimental results are reported for car noise at different
speeds, speakerphone versus handset input in an office environment
and several target languages. Recognition accuracy greater than 94%
was achieved in a car environment at 60 mph (Condor) and recognition
accuracy greater than 95% was achieved for speakerphone input at a
distance of 50 cm in an office environment.
Authors:
Takeshi Yamada, Graduate School of Information Science, Nara Institute of Science and Technology (Japan)
Satoshi Nakamura, Graduate School of Information Science, Nara Institute of Science and Technology (Japan)
Kiyohiro Shikano, Graduate School of Information Science, Nara Institute of Science and Technology (Japan)
Page (NA) Paper number 484
Abstract:
To integrate the microphone array processing into speech recognition,
we have proposed a speech recognition algorithm based on 3-D Viterbi
search, which localizes a target talker considering the likelihood
of HMMs (Hidden Markov Models) while performing speech recognition.
The performance of the 3-D Viterbi search method depends on the improvement
of the SNR (Signal to Noise Ratio) by the beamforming technique. This
paper proposes a novel method based on an adaptive beamforming technique
instead of the delay-and-sum beamformer used in our previous study.
The speaker-dependent isolated-word recognition experiments were carried
out on real environment data to evaluate the effect of the adaptive
beamformer. These results showed that the adaptive beamformer drastically
improves the recognition performance both for a fixed-position talker
and for a moving talker.
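
For orientation, the delay-and-sum beamformer named in this abstract as the earlier baseline can be sketched as follows. This is a minimal illustration assuming integer sample delays that are already known for the target direction; the function name and arguments are hypothetical, and the adaptive beamformer the paper proposes is not shown.

```python
def delay_and_sum(channels, delays):
    """Average the channels after advancing channel m by delays[m] samples.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   integer arrival delay (in samples) of the target at each microphone.
    Samples that would fall outside a channel are treated as zero.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = min(len(c) for c in channels)
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d  # compensate the propagation delay of this microphone
            if 0 <= idx < len(ch):
                acc += ch[idx]
        out.append(acc / len(channels))
    return out
```

Aligning the channels makes the target add coherently while uncorrelated noise averages down, which is the SNR improvement the 3-D Viterbi search relies on.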
Authors:
Joaquin González-Rodríguez, DIAC- Universidad Politecnica de Madrid (Spain)
Santiago Cruz-Llanas, DIAC- Universidad Politecnica de Madrid (Spain)
Javier Ortega-García, DIAC- Universidad Politecnica de Madrid (Spain)
Page (NA) Paper number 64
Abstract:
In this paper, the acoustic characteristics of sound fields in enclosed
rooms are studied in the joint presence of speech and noise, in order
to design a broadband microphone array system capable of coping with
both coherent and diffuse noises. Several state-of-the-art speech enhancement
array structures are presented and compared to our new system in terms
of correct word recognition rates in a simple command and control task.
The proposed structure, based on a broadband subband-nested array,
performs real-time estimation of the spatial coherence to determine
whether each subband is coherent or diffuse, and applies a different
filter in each case. It also improves the classical Wiener post-filter,
typically used for diffuse noise suppression, so that coherent noise
is properly cancelled. The results obtained with a 15-channel
simultaneous recording database in different reverberation and noise
conditions show better performance than other structures previously
proposed.
Authors:
Hui Jiang, The University of Tokyo (Japan)
Keikichi Hirose, The University of Tokyo (Japan)
Qiang Huo, The University of Hong Kong (China)
Page (NA) Paper number 693
Abstract:
In this paper, we propose a novel implementation of a minimax decision
rule for continuous density hidden Markov model based robust speech
recognition. By combining the idea of the minimax decision rule with
a normal Viterbi search, we derive a recursive minimax search algorithm,
where the minimax decision rule is repetitively applied to determine
the partial paths during the search procedure. Because of its intrinsic
nature of a recursive search, the proposed method can be easily extended
to perform continuous speech recognition. Experimental results on Japanese
isolated digits and TIDIGITS, where the mismatch between training and
testing conditions is caused by additive white Gaussian noise, show
the viability and efficiency of the proposed minimax search algorithm.
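
As background for this abstract, the "normal Viterbi search" that the minimax rule is combined with can be sketched in the log domain as below. This is plain Viterbi only; the minimax step applied to partial paths is not specified in the abstract and is not reproduced here. The argument names are assumptions made for illustration.

```python
def viterbi(log_init, log_trans, log_emit):
    """Return (best log-score, best state path) for one observation sequence.

    log_init[s]:     log prior of starting in state s.
    log_trans[p][s]: log probability of moving from state p to state s.
    log_emit[t][s]:  log-likelihood of frame t in state s.
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t - 1][s] is the best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path
```

The recursion keeps one best partial path per state at each frame, which is the point where a per-step decision rule such as the one described above would be applied.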