Authors:
William A. Ainsworth, Department of Communication & Neuroscience, Keele University (U.K.)
Charles R. Day, Department of Communication & Neuroscience, Keele University (U.K.)
Georg F. Meyer, Department of Communication & Neuroscience, Keele University (U.K.)
Page (NA) Paper number 512
Abstract:
Hermes' Subharmonic Summation (SHS) pitch determination algorithm
is an effective technique for extracting the percept of pitch from
human speech. Effective determination of the pitch in a passage of
speech is believed to be fundamental for higher level speech processing
applications such as speech or speaker recognition. Of particular
interest is the need to extract pitch from speech in less than ideal
conditions, e.g. in the presence of noise or when using very short analysis
windows. In an attempt to deliver accurate pitch estimates from relatively
short analysis windows, this paper describes an evaluation of two forms
of the SHS procedure: in one case, FFT-SHS, the procedure uses the
conventional Fast Fourier Transform (FFT) in its spectral analysis
step; in the second case, RAFT-SHS, the ReAssigned Fourier Transform
(RAFT) technique is used instead of the FFT.
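
The subharmonic summation idea can be illustrated with a minimal sketch. This is not the authors' FFT-SHS or RAFT-SHS implementation: it assumes a linear frequency axis, a Hann window, a direct evaluation of the spectrum at each harmonic, and a geometric harmonic weight of 0.84 (a commonly quoted value, treated here as an assumption). The names magnitude_at and shs_pitch are hypothetical.

```python
import cmath
import math


def magnitude_at(frame, fs, freq):
    """Spectral magnitude of the Hann-windowed frame at one frequency (Hz)."""
    n = len(frame)
    acc = 0j
    for t, x in enumerate(frame):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * t / n)  # periodic Hann window
        acc += w * x * cmath.exp(-2j * math.pi * freq * t / fs)
    return abs(acc)


def shs_pitch(frame, fs, candidates, n_harmonics=5, compression=0.84):
    """Return the candidate F0 whose weighted harmonic sum is largest.

    Assumed score: sum over k of compression**(k-1) * |X(k * f0)|,
    restricted to harmonics below the Nyquist frequency.
    """
    best_f0, best_score = None, -1.0
    for f0 in candidates:
        score = 0.0
        for k in range(1, n_harmonics + 1):
            if k * f0 >= fs / 2:
                break
            score += compression ** (k - 1) * magnitude_at(frame, fs, k * f0)
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0


# Synthetic test frame: 50 ms of a 200 Hz tone with five equal harmonics.
fs = 8000
frame = [
    sum(math.cos(2 * math.pi * 200 * k * t / fs) for k in range(1, 6))
    for t in range(400)
]
estimate = shs_pitch(frame, fs, range(100, 401, 10))
```

The decaying weights are what let the true fundamental beat its octave neighbours: a candidate at half the pitch only matches on its even harmonics, and a candidate at twice the pitch misses every other partial.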
Authors:
Hideki Kawahara, Wakayama University/ATR/CREST (Japan)
Alain de Cheveigné, Paris 7 University/CNRS (France)
Roy D. Patterson, CNBH University of Cambridge (U.K.)
Page (NA) Paper number 659
Abstract:
A new source information extraction algorithm is proposed to provide
a reliable source signal for an extremely high-quality speech analysis,
modification, and transformation system called STRAIGHT-suite (Speech
Transformation and Representation based on Adaptive Interpolation of
weiGHTed spectrogram). The proposed method makes use of instantaneous
frequencies in harmonic components based on their reliability. A performance
evaluation is conducted using a simultaneous EGG (Electroglottograph)
recording as the reference signal. The error variance for F0 extraction
using the proposed algorithm is shown to be about 1/3 that of the previous
F0 extraction method used in STRAIGHT-suite, although the previous
algorithm is still competitive with conventional F0 extraction methods.
Authors:
Kiyoaki Aikawa, NTT Human Interface Laboratories, Speech and Acoustics Laboratories (Japan)
Page (NA) Paper number 262
Abstract:
This paper proposes a new spectral estimation method for automatic
speech recognition. The spectrum estimated with the conventional data
window of around 30 ms shows harmonic structure in the voiced portions
of speech data. The harmonic frequency interval is often comparable
to the formant frequency interval for female voices with high F0, which
results in spectral estimation error. The new idea is to estimate the
spectrum by taking the Lp norm of the time series of the spectrum obtained
from a very short speech segment. The new method, called micro-segment
spectrum integration, provides (1) precise spectral estimation not
affected by harmonic structure, and (2) noise-robustness by suppressing
noisy speech segments. Phoneme recognition experiments demonstrate
that the micro-segment spectrum integration method outperforms conventional
spectral estimation methods.
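
As a rough illustration of the Lp-norm integration step, the following sketch splits a signal into very short segments, computes a magnitude spectrum for each, and combines them per frequency bin with an Lp norm over time. The segment length, hop size, value of p, lack of windowing, and lack of normalization are all assumptions, since the abstract does not specify them; magnitude_spectrum and micro_segment_spectrum are hypothetical names.

```python
import cmath
import math


def magnitude_spectrum(segment):
    """Magnitude of the DFT of one short segment (bins 0 .. n/2)."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(segment)))
        for k in range(n // 2 + 1)
    ]


def micro_segment_spectrum(signal, seg_len, hop, p):
    """Per-bin Lp norm across the time series of short-segment spectra."""
    spectra = [
        magnitude_spectrum(signal[start:start + seg_len])
        for start in range(0, len(signal) - seg_len + 1, hop)
    ]
    n_bins = len(spectra[0])
    return [
        sum(s[k] ** p for s in spectra) ** (1.0 / p)
        for k in range(n_bins)
    ]


# Two identical micro-segments, each containing a unit impulse (flat spectrum).
integrated = micro_segment_spectrum([1, 0, 0, 0, 1, 0, 0, 0],
                                    seg_len=4, hop=4, p=2)
```

With p = 1 the result is a plain sum of the segment spectra, while larger p lets the strongest segments dominate each bin, which is one plausible reading of how low-energy noisy segments could be suppressed.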
Authors:
Keiichi Funaki, Hokkaido University (Japan)
Yoshikazu Miyanaga, Hokkaido University (Japan)
Koji Tochinai, Hokkaido University (Japan)
Page (NA) Paper number 1001
Abstract:
We have previously developed a time-varying complex AR (TV-CAR) parameter
estimation method based on minimizing the mean square error (MMSE) for the
analytic speech signal. Although the MMSE approach is commonly and successfully
applied to various parameter estimation problems, such as conventional LPC, it
is well known that in speech analysis an MMSE method easily suffers from biased
and inaccurate spectrum estimation due to the non-Gaussian nature of the glottal
excitation of voiced speech. This paper offers
a robust parameter estimation algorithm for the TV-CAR model by applying
Huber's robust M-estimation approach; two kinds of robust algorithms
are derived: a Newton-type algorithm and a weighted least squares (WLS)
algorithm. Preliminary experiments with a synthetic signal generated
by glottal source model excitation and natural speech uttered by a female
speaker demonstrate that the time-varying complex AR method is sufficiently
robust against the non-Gaussian nature of glottal source excitation owing
to the improved resolution in the frequency domain.
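
The weighted least squares form of Huber's M-estimation can be sketched on a much simpler stand-in problem. The code below is not the TV-CAR algorithm: it fits a single real-valued coefficient a in y = a * u, assumes a fixed Huber threshold c with no residual scale estimate, and uses the hypothetical names huber_weight and irls_fit.

```python
def huber_weight(residual, c):
    """Huber weight: 1 inside the threshold, c / |r| outside it."""
    r = abs(residual)
    return 1.0 if r <= c else c / r


def least_squares_fit(u, y, weights=None):
    """Weighted least squares estimate of a in y = a * u."""
    if weights is None:
        weights = [1.0] * len(u)
    num = sum(w * ui * yi for w, ui, yi in zip(weights, u, y))
    den = sum(w * ui * ui for w, ui in zip(weights, u))
    return num / den


def irls_fit(u, y, c=1.0, n_iter=50):
    """Iteratively reweighted least squares with Huber weights."""
    a = least_squares_fit(u, y)  # start from the ordinary MMSE solution
    for _ in range(n_iter):
        weights = [huber_weight(yi - a * ui, c) for ui, yi in zip(u, y)]
        a = least_squares_fit(u, y, weights)
    return a


# Data follow y = 0.5 * u except for one large outlier in the last sample.
u = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.5, 1.0, 1.5, 2.0, 12.5]
a_ls = least_squares_fit(u, y)
a_robust = irls_fit(u, y)
```

In this toy case the ordinary least squares estimate is pulled strongly toward the outlier, while the reweighted estimate stays much closer to 0.5. In practice the residuals would normally be scaled by a robust spread estimate before the threshold is applied.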
Authors:
Hynek Hermansky, Oregon Graduate Institute of Science and Technology (USA)
Narendranath Malayath, Oregon Graduate Institute of Science and Technology (USA)
Page (NA) Paper number 616
Abstract:
This work examines the Karhunen-Loève Transform and Linear Discriminant
Analysis as means of designing optimized spectral bases for the projection
of the critical-band auditory-like spectrum.