Authors:
Wendy J. Holmes, DERA (U.K.)
Page (NA) Paper number 553
Abstract:
This paper proposes a recognition-synthesis approach to speech coding
which uses a formant trajectory model for both recognition and synthesis.
It is argued that this unified approach to coding has the potential
to achieve low data rates whilst preserving speech quality and paralinguistic
information. A simple coding scheme is described which establishes
the principles of this approach. Formant analysis is applied to the
input speech, and the formant features are input to a linear-trajectory
segmental hidden Markov model recognizer to locate segment boundaries.
The formant parameters for each segment are coded using a linear trajectory
description, and used to drive a parallel-formant synthesizer to reproduce
the utterance at the receiver. The coding method has been tested on
utterances from a variety of speakers. In the current system, which
has not yet been optimised for coding efficiency, speech is typically
coded at 600-1000 bits/s with good intelligibility, whilst preserving
speaker characteristics.
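
The linear trajectory description of a segment can be illustrated with a minimal sketch. It assumes a single formant frequency track sampled once per frame, and segment boundaries supplied by hand; in the paper the boundaries come from the segmental HMM recognizer. The function names, the midpoint-plus-slope parameterisation and the least-squares fit are illustrative assumptions, not the author's coding scheme.

```python
def fit_linear_trajectory(values):
    """Least-squares line through one formant track within a segment.

    Returns (midpoint_value, slope_per_frame). This parameterisation is an
    assumption made for illustration only.
    """
    n = len(values)
    if n == 0:
        raise ValueError("segment must contain at least one frame")
    mean_y = sum(values) / n
    if n == 1:
        return mean_y, 0.0
    mean_t = (n - 1) / 2.0
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    # The least-squares line passes through (mean_t, mean_y).
    return mean_y, num / den


def reconstruct(mid, slope, n):
    """Rebuild the n-frame trajectory from its midpoint value and slope."""
    mean_t = (n - 1) / 2.0
    return [mid + slope * (t - mean_t) for t in range(n)]


def encode_track(track, boundaries):
    """Code each segment as (duration_in_frames, midpoint_value, slope).

    boundaries: list of (start, end) frame indices, end exclusive.
    """
    return [(end - start, *fit_linear_trajectory(track[start:end]))
            for start, end in boundaries]


# Hypothetical formant track (Hz) with two hand-placed segments.
track = [500.0, 510.0, 520.0, 530.0, 700.0, 700.0, 700.0]
codes = encode_track(track, [(0, 4), (4, 7)])
decoded = [v for n, mid, slope in codes for v in reconstruct(mid, slope, n)]
```

Each segment is then represented by three numbers per formant regardless of its duration, which is where the data-rate saving of a segment-based description comes from.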
Authors:
Jan Skoglund, Chalmers University of Technology, Department of Signals and Systems (Sweden)
W. Bastiaan Kleijn, Royal Institute of Technology, Department of Speech, Music and Hearing (Sweden)
Page (NA) Paper number 747
Abstract:
This paper addresses the issue of masking of noise in voiced speech.
First, we examine the audibility of cyclostationary narrow-band noise
added to voiced speech generated by synthetic excitation. Varying
the temporal location of noise within a pitch cycle corresponds to
varying its phase spectrum. Using this fact, we find that a phase
change of the noise in the high frequency region is more perceptible
for a low-pitched sound than for a high-pitched sound. We propose
a pitch-dependent temporal weighting function and we show experimentally
that it is beneficial to the quantization of pitch-cycle waveforms.
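
The link between temporal location within a pitch cycle and the phase spectrum can be checked numerically. The sketch below assumes a single pitch cycle treated as periodic, with a short deterministic burst standing in for the paper's cyclostationary narrow-band noise. The cycle length, burst values and shift are hypothetical. Moving the burst by d samples leaves the magnitude spectrum unchanged and multiplies the k-th DFT coefficient by exp(-j*2*pi*k*d/N).

```python
import cmath
import math


def dft(x):
    """Direct DFT of a short sequence (adequate for one pitch cycle)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_shift(x, d):
    """Delay x by d samples, wrapping around within the pitch cycle."""
    n = len(x)
    return [x[(t - d) % n] for t in range(n)]


cycle_length = 16                              # hypothetical pitch period in samples
burst = [1.0, -0.8, 0.5, -0.2] + [0.0] * (cycle_length - 4)
shift = 5                                      # hypothetical relocation within the cycle

original_spectrum = dft(burst)
shifted_spectrum = dft(circular_shift(burst, shift))

# Largest change in magnitude across all bins (zero up to rounding error).
max_magnitude_change = max(
    abs(abs(a) - abs(b)) for a, b in zip(original_spectrum, shifted_spectrum))

# Largest deviation from the predicted linear-phase rotation.
max_phase_model_error = max(
    abs(b - a * cmath.exp(-2j * math.pi * k * shift / cycle_length))
    for k, (a, b) in enumerate(zip(original_spectrum, shifted_spectrum)))
```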
Authors:
W. Bastiaan Kleijn, KTH (Royal Institute of Technology) (Sweden)
Huimin Yang, Tsinghua University (China)
Ed F. Deprettere, Delft University of Technology (The Netherlands)
Page (NA) Paper number 1069
Abstract:
We present new waveform-interpolation coding procedures which allow
perfect reconstruction of the speech signal from the unquantized parameter
set. Instead of using adaptive parameter extraction methods, we combine
a time warping of the original signal with nonadaptive parameter extraction
methods. The new coding structure has good performance at low bit rates
and provides convergence to the original waveform with increasing rate.
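
One way to picture how time warping permits nonadaptive parameter extraction is to map every pitch cycle onto a fixed number of samples, so that a fixed-length analysis can be applied to each cycle. The sketch below is only an illustration under stated assumptions: pitch marks are taken as given, and linear interpolation is used as the warping, which is not exactly invertible and so does not by itself give the perfect reconstruction the abstract describes. All names are hypothetical and the abstract does not specify the authors' warping.

```python
def resample_cycle(cycle, target_len):
    """Linearly interpolate one pitch cycle onto target_len samples.

    The first and last samples of the cycle are preserved.
    """
    n = len(cycle)
    if n < 2 or target_len < 2:
        raise ValueError("need at least two samples in and out")
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)   # fractional index into cycle
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * cycle[lo] + frac * cycle[hi])
    return out


def warp_to_constant_pitch(signal, pitch_marks, target_len):
    """Cut the signal at the pitch marks and warp each cycle to target_len."""
    return [resample_cycle(signal[a:b], target_len)
            for a, b in zip(pitch_marks, pitch_marks[1:])]


# Hypothetical signal with one 4-sample cycle followed by one 6-sample cycle.
signal = [float(v) for v in range(10)]
cycles = warp_to_constant_pitch(signal, [0, 4, 10], target_len=5)
```

After warping, every row of `cycles` has the same length, so the same fixed analysis can be applied to each row without adapting to the local pitch.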
Authors:
Nicola R. Chong, University of Wollongong (Australia)
Ian S. Burnett, University of Wollongong (Australia)
Joe F. Chicharo, University of Wollongong (Australia)
Page (NA) Paper number 142
Abstract:
In this paper, we present an alternative characteristic waveform (CW)
decomposition mechanism for the Waveform Interpolation (WI) paradigm
based on the Pitch Synchronous Wavelet Transform (PSWT). In this technique,
IIR filters replace the conventional FIR filters of the PSWT, offering
computational and spectral magnitude performance advantages, in addition
to significant delay reductions. Previously, the PSWT has incorporated only
filter banks with slowly reacting FIR wavelet filters. While these
filters offer linear phase and design simplicity, they incur a large
delay that increases exponentially with increasing resolution. The
progression to IIR filter banks gives
rise to a multi-resolution decomposition mechanism, beneficial for
real-time applications, such as speech coding, where delay is an important
issue.
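
The exponential growth of delay with resolution can be made concrete with a rough calculation. The sketch assumes a dyadic tree-structured filter bank in which every stage uses a linear-phase FIR filter of the same length and each stage runs at half the rate of the previous one. The filter length is hypothetical, and only the analysis-side delay of the deepest branch is counted. If the filtering runs across successive pitch cycles, the unit of delay is pitch cycles rather than speech samples.

```python
def fir_tree_analysis_delay(filter_length, levels):
    """Approximate analysis delay of the deepest branch, in input-rate units.

    A linear-phase FIR filter of length N delays by (N - 1) / 2 samples at
    its own rate; stage j (counting from 0) runs at 1 / 2**j of the input
    rate, so its delay is scaled by 2**j in input-rate units.
    """
    per_stage = (filter_length - 1) / 2
    return sum(per_stage * 2 ** j for j in range(levels))


# Hypothetical 9-tap linear-phase filter, one to four decomposition levels.
delays = [fir_tree_analysis_delay(9, levels) for levels in range(1, 5)]
```

Under these assumptions the delay equals (N - 1) / 2 * (2**levels - 1), so each extra level of resolution roughly doubles it.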