Authors:
Paavo Alku, University of Turku (Finland)
Susanna Varho, University of Turku (Finland)
Page (NA) Paper number 3
Abstract:
A new linear predictive method is presented in this study. The method,
Linear Prediction with Linear Extrapolation (LPLE), reformulates the
computation of linear prediction by grouping the samples preceding
x(n) into consecutive pairs (i.e., x(n-2i), x(n-2i+1)). Each of these
pairs determines a regression line, the value of which at time instant
n is used as a data sample in the prediction. The optimal
LPLE-predictor is obtained by minimizing the square of the prediction
error using the autocorrelation method. The rationale for the new method
is the fact that LPLE yields an all-pole filter of order 2p when the
number of unknowns in the normal equations equals p. Therefore the
new all-pole modeling method can be used in speech coding applications.
Preliminary experiments in the present study show that LPLE models
speech spectra more accurately than conventional linear prediction
when only a very small number of prediction parameters can be used
to compress the spectral information of speech signals.
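As a rough illustration of the construction described above (a sketch
based on the abstract, not necessarily the authors' implementation):
the line through the pair (x(n-2i), x(n-2i+1)), extrapolated to time n,
gives the i-th data sample, so a predictor with p weights reaches back
over the 2p most recent samples.

    # Sketch (assumed from the abstract, not the authors' code): form the p
    # extrapolated data samples and combine them into a prediction of x[n].
    import numpy as np

    def lple_data_samples(x, n, p):
        # y_i = value at time n of the line through (n-2i, x[n-2i]) and (n-2i+1, x[n-2i+1])
        y = np.empty(p)
        for i in range(1, p + 1):
            a, b = x[n - 2 * i], x[n - 2 * i + 1]
            y[i - 1] = b + (2 * i - 1) * (b - a)
        return y

    def lple_predict(x, n, weights):
        # weights would be obtained by minimizing the squared prediction error
        # (autocorrelation method); expanding y_i shows the resulting all-pole
        # filter spans x[n-1]..x[n-2p], i.e. order 2p with only p unknowns.
        return float(np.dot(weights, lple_data_samples(x, n, len(weights))))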
Authors:
Shahrokh Ghaemmaghami, School of Electrical & Electronic Systems Engineering, Queensland University of Technology, Brisbane (Australia)
Mohamed Deriche, School of Electrical & Electronic Systems Engineering, Queensland University of Technology, Brisbane (Australia)
Sridha Sridharan, School of Electrical & Electronic Systems Engineering, Queensland University of Technology, Brisbane (Australia)
Page (NA) Paper number 673
Abstract:
The authors propose a new approach to Temporal Decomposition (TD) of
characteristic parameters of speech for very low rate coding applications.
The method models articulatory dynamics using a hierarchical
error-minimization algorithm that does not use Singular Value Decomposition.
It is also much faster than conventional TD and could be implemented
in real time. The proposed method offers high flexibility in meeting
coding requirements such as compression ratio, accuracy, delay, and
computational complexity. It can be used for coding spectral parameters
at rates of 1000-1200 b/s with high fidelity and an algorithmic delay
of less than 150 msec.
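The underlying representation is the classical temporal-decomposition
model, in which each parameter track is approximated as a weighted sum
of overlapping event functions; the sketch below shows only this generic
reconstruction, not the paper's hierarchical error-minimization algorithm.

    # Generic TD reconstruction (classical model, not the authors' hierarchical
    # algorithm): parameter tracks ~ sum over events of target vectors weighted
    # by their event (interpolation) functions.
    import numpy as np

    def td_reconstruct(targets, event_functions):
        # targets: (K, D) array of event target vectors
        # event_functions: (K, N) array of event functions over N frames
        # returns an (N, D) approximation of the original parameter tracks
        return event_functions.T @ targets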
Authors:
Susan L. Hura, Lucent Technologies (USA)
Page (NA) Paper number 42
Abstract:
There are several tests of speech intelligibility currently available
which employ a variety of methods. The most appropriate method for
testing intelligibility of speech transmitted via telephony is a forced
choice task in which listeners hear speech samples and identify what
they hear from among a set of alternatives displayed onscreen. This
methodology allows tests to be run quickly and scored automatically.
A major flaw in existing forced-choice intelligibility tests is the
use of unfamiliar words, nonwords, and proper names along with common
words. A stimulus set that is mixed in this way may introduce response
biases into the test and therefore produce results that are less predictive
of actual intelligibility performance. The Intelligibility of Familiar
Items Test (IFIT) ameliorates several methodological flaws found in
earlier tests. The IFIT uses a stimulus set composed of high familiarity
real English words and tests consonants in initial and final word position
and vowels in word medial position.
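Scoring such a forced-choice test reduces to counting matches between
the presented item and the alternative selected onscreen; a minimal,
purely illustrative version (not the IFIT software) is sketched below.

    # Minimal forced-choice scoring sketch (illustrative, not the IFIT tool):
    # intelligibility is the proportion of trials in which the selected
    # alternative matches the presented item.
    def score_forced_choice(trials):
        # trials: iterable of (presented_item, selected_item) pairs
        trials = list(trials)
        correct = sum(1 for presented, selected in trials if presented == selected)
        return correct / len(trials) if trials else 0.0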
Authors:
Sung Joo Kim, KAIST (Korea)
Sangho Lee, KAIST (Korea)
Woo Jin Han, KAIST (Korea)
Yung Hwan Oh, KAIST (Korea)
Page (NA) Paper number 469
Abstract:
In this paper, we present a restricted temporal decomposition method
for LSF parameters. The event vectors estimated by this method preserve
the ordering property of LSF parameters so that they can be quantized
efficiently. Experimental results show that interpolated LSF parameters
can be quantized transparently at a rate of 753 bps. We also design
an LPC vocoder at 996 bps as an application of the proposed method. According
to a listening test, the reconstructed speech of our vocoder has reasonable
quality compared with that of the 2400 bps LPC10e coder.
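The ordering property referred to above is the usual LSF constraint
0 < w_1 < w_2 < ... < w_p < pi; a simple per-frame check (illustrative
only, not the authors' quantizer) is sketched below.

    # Illustrative check of the LSF ordering property that the estimated
    # event vectors are said to preserve: 0 < w_1 < w_2 < ... < w_p < pi.
    import numpy as np

    def lsf_ordering_ok(lsf_frame):
        w = np.asarray(lsf_frame, dtype=float)
        return bool(w[0] > 0.0 and w[-1] < np.pi and np.all(np.diff(w) > 0.0))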
Authors:
Minoru Kohata, Chiba Institute of Technology (Japan)
Page (NA) Paper number 37
Abstract:
In this paper, a new very low bit-rate speech coder operating at 1.2
kbps is proposed. Like the LPC vocoder, it requires only a few types
of information (power, pitch, and spectral information), but its quality
is far superior.
In the proposed vocoder, the synthesized speech quality is improved
based on auditory perceptual characteristics. The synthesis method
is a form of harmonic coding, using sinusoids whose frequencies are
multiples of the fundamental frequency and whose amplitudes are adaptively
modulated using Gammatone filters as a perceptual weighting
filter. The sinusoids' phases are also adjusted so as to maximize
the perceptual quality. In order to reduce the total bit rate to 1.2
kbps, a new segment coder for spectral information (LSP coefficients)
using DP matching is also proposed. The quality of the synthesized
speech is considerably improved compared with that of the simple LPC
vocoder, according to MOS and preference tests.
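The harmonic synthesis described above amounts to summing sinusoids at
integer multiples of the fundamental frequency; a bare-bones version,
which omits the gammatone-based amplitude modulation and the perceptual
phase adjustment, is sketched below.

    # Bare-bones harmonic synthesis sketch (omits the paper's gammatone-based
    # amplitude modulation and perceptual phase adjustment).
    import numpy as np

    def harmonic_synthesis(f0, amps, phases, fs, num_samples):
        n = np.arange(num_samples)
        s = np.zeros(num_samples)
        for k, (a, ph) in enumerate(zip(amps, phases), start=1):
            s += a * np.sin(2.0 * np.pi * k * f0 * n / fs + ph)
        return s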
Authors:
Kazuhito Koishida, University of California, Santa Barbara (USA)
Gou Hirabayashi, Toshiba Corporation (Japan)
Keiichi Tokuda, Nagoya Institute of Technology (Japan)
Takao Kobayashi, Tokyo Institute of Technology (Japan)
Page (NA) Paper number 904
Abstract:
We have proposed a wideband CELP coder, called MGC-CELP, which provides
high quality speech by utilizing mel-generalized cepstral (MGC) analysis
instead of linear prediction (LP). In this paper, we investigate the
performance of the wideband MGC-CELP coder at 16 kbit/s in terms of
short-term predictor order, i.e., order of MGC analysis. Subjective
tests show that the MGC-CELP coder with a predictor of order 20 gives
better performance than ITU-T G.722 at 64 kbit/s. It is also found
that the MGC-CELP coder with a 12th-order predictor achieves quality comparable
to the 64 kbit/s G.722, and outperforms the 16 kbit/s conventional
CELP coder using 20th-order LP analysis under the same conditions.
Authors:
D.J. Molyneux, School of Engineering, University of Manchester (U.K.)
C.I. Parris, Ensigma Ltd. (U.K.)
X.Q. Sun, Voxware Inc. (USA)
B.M.G. Cheetham, School of Engineering, University of Manchester (U.K.)
Page (NA) Paper number 946
Abstract:
Many low bit-rate speech coders represent the spectral envelope by
an all-pole digital filter whose coefficients are calculated by a form
of linear prediction (LP) analysis. The lower the bit-rate, the more
critical will be the accuracy of the spectral analysis for achieving
good quality speech. This paper compares four known techniques: a technique
based on cubic spline interpolation, DAP, MVDR, and iterative all-pole
modelling. First, the accuracy obtained for artificial and real speech
spectra is assessed for each technique by calculating the degree of
spectral distortion with reference to the spectral envelope sampled
at the pitch-harmonics. Then, each technique is used to characterise
the spectral amplitudes generated by a 2.4 kb/s multi-band excitation
(MBE) coder. Results show that significantly better spectral accuracy
is obtained using DAP. However, listening tests on MBE-encoded speech
indicate that the advantage of DAP over the other techniques is not
strongly perceptible.
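The accuracy assessment described above compares each fitted envelope
with the reference envelope sampled at the pitch harmonics; one common
way to express this (an assumed form, the paper's exact measure may
differ) is an RMS log-spectral distortion over the harmonic amplitudes.

    # RMS log-spectral distortion (in dB) over the pitch-harmonic amplitudes;
    # an assumed form of the distortion measure, not necessarily the paper's.
    import numpy as np

    def harmonic_spectral_distortion(reference_amps, model_amps):
        ref_db = 20.0 * np.log10(np.asarray(reference_amps, dtype=float))
        mod_db = 20.0 * np.log10(np.asarray(model_amps, dtype=float))
        return float(np.sqrt(np.mean((ref_db - mod_db) ** 2)))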
Authors:
Yoshihisa Nakatoh, Matsushita Electric Industrial Co., Ltd. (Japan)
Takeshi Norimatsu, Matsushita Electric Industrial Co., Ltd. (Japan)
Ah Heng Low, Faculty of Engineering, Shinshu University (Malaysia)
Hiroshi Matsumoto, Faculty of Engineering, Shinshu University (Japan)
Page (NA) Paper number 1100
Abstract:
This paper proposes a low bit rate coding method for speech and audio
using a new analysis method named MLPC (Mel-LPC analysis). In MLPC
analysis a spectrum envelope is estimated on a mel- or bark-frequency
scale, so as to improve the spectral resolution in the low frequency
band. This analysis is accomplished with about a two-fold increase in
computation over standard LPC analysis. Our coding algorithm using
MLPC analysis consists of five key parts: time frequency transformation,
inverse filtering by MLPC spectrum envelope, power normalization, perceptual
weighting estimation, and multi-stage VQ. In subjective experiments,
we investigated the performance of MLPC analysis through paired-comparison
tests between MLPC analysis and standard LPC analysis in inverse filtering.
At all bit rates, almost all listeners judged the sound decoded with
MLPC analysis to be superior to that decoded with LPC analysis; the
difference is especially large at low bit rates.
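Frequency warping onto a mel or Bark scale in this kind of analysis is
commonly realised with a first-order all-pass substitution; the mapping
below is that standard bilinear warping, included as an assumption since
the paper's exact formulation is not given in the abstract.

    # Standard first-order all-pass (bilinear) frequency warping, commonly used
    # to approximate a mel/Bark scale (an assumption; the paper's exact form
    # may differ): W(w) = w + 2*arctan(alpha*sin(w) / (1 - alpha*cos(w))).
    import numpy as np

    def warp_frequency(omega, alpha=0.42):
        # alpha around 0.42 is a common choice for a mel-like warp at 16 kHz sampling
        omega = np.asarray(omega, dtype=float)
        return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))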
Authors:
Jeng-Shyang Pan, National Kaohsiung Institute of Technology (Taiwan)
Chin-Shiuh Shieh, National Kaohsiung Institute of Technology (Taiwan)
Shu-Chuan Chu, University of South Australia (Australia)
Page (NA) Paper number 31
Abstract:
Vector quantization is a popular technique for low bit rate coding of
speech signals. The transmitted codevector index is highly sensitive
to channel noise. The channel distortion can be reduced by organizing
the codevector indices suitably. Several index assignment algorithms
are studied comparatively. Among them, the index allocation algorithm
proposed by Wu and Barba is the fastest, but it yields the worst channel
distortion. The proposed parallel tabu search algorithm achieves the
best channel-distortion performance.
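The cost that index assignment algorithms of this kind typically minimise
is the expected extra distortion incurred when a transmitted index is
corrupted; a common single-bit-error approximation with equiprobable
indices is sketched below (a generic formulation, not necessarily the
exact cost used in the paper).

    # Expected channel distortion under a single-bit-error approximation with
    # equiprobable indices (generic formulation, not necessarily the paper's cost).
    import numpy as np

    def expected_channel_distortion(codebook, bit_error_prob, num_bits):
        # codebook: (2**num_bits, dim) codevectors stored in assigned-index order
        total = 0.0
        for i, ci in enumerate(codebook):
            for b in range(num_bits):
                j = i ^ (1 << b)                      # index received after one bit flip
                total += bit_error_prob * np.sum((ci - codebook[j]) ** 2)
        return total / len(codebook)                  # averaged over all indices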
Authors:
John J. Parry, University of Wollongong (Australia)
Ian S. Burnett, University of Wollongong (Australia)
Joe F. Chicharo, University of Wollongong (Australia)
Page (NA) Paper number 137
Abstract:
In this paper we investigate an alternative approach to the design
of low-bit rate (LBR) quantisation. This approach incorporates phonetic
information into the structure of Line Spectral Frequency (LSF) codebooks.
In prior work vector quantisation (VQ) has been used to quantise stochastic
processes. Speech signals can, however, be described in terms of phonetic
segments and linguistic rules. A trained LSF codebook, like the phonetic
inventory of a language, is a static description of spectral behaviour
of speech. As clear relationships exist between phonetic segments and
LSFs, the structure of an LSF codebook can be analysed in terms of the
phonetic segments. The investigation leads to the conclusion that phonetic
information can be usefully employed in codebook training in terms
of perceptual performance and bit-rate reductions.
Authors:
Davor Petrinović, Faculty of EE and C, University of Zagreb (Croatia)
Page (NA) Paper number 1114
Abstract:
A method of inter-frame transform coding of Line Spectrum Frequencies
(LSF) using the Discrete Wavelet Transform is presented in this paper.
Each component of the LSFs (or of their linear transform) is treated
separately and is decomposed into a set of subband signals using a
nonuniform filter bank. The subband signals are quantized and coded
independently. With an appropriate choice of the mother Wavelet, the
subband signal with the lowest rate comprises most of the LSF waveform
energy. The filter bank effectively decorrelates the input signal, enabling
more efficient
quantization of the subband signals. A suitable weighted Euclidean
distance measure in the Wavelet domain is proposed, defining optimal
static or dynamic bit allocation of the subband signals. It is shown
that the average bit rate for coding of the DCT transformed LSFs can
be reduced by 0.9 bits per vector component by using a very simple
Wavelet. The total delay due to the inter-frame coding is only 90 ms,
which is acceptable even for medium bit rate speech coders.
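As a toy illustration of the subband decomposition described above, a
single-level Haar split of one LSF component's trajectory separates it
into a low-rate approximation (expected to carry most of the waveform
energy) and a detail signal; the paper itself uses a nonuniform filter
bank and a specific mother Wavelet, so this is only a sketch.

    # Toy single-level Haar split of one LSF component's trajectory (a sketch;
    # the paper uses a nonuniform filter bank and a specific mother wavelet).
    import numpy as np

    def haar_split(track):
        x = np.asarray(track, dtype=float)
        x = x[: len(x) - (len(x) % 2)]                # truncate to even length for pairing
        approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-rate subband (most energy expected here)
        detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail subband
        return approx, detail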
Authors:
F. Plante, Dept. Electrical Engineering & Electronics, Liverpool University (U.K.)
B.M.G. Cheetham, Dept. Electrical Engineering & Electronics, Liverpool University (U.K.)
D. Marston, Ensigma Ltd (U.K.)
P.A. Barrett, BT Laboratories (U.K.)
Page (NA) Paper number 848
Abstract:
This paper describes a source controlled variable bit-rate (SC-VBR)
speech coder based on the concept of prototype waveform interpolation.
The coder uses a four-mode classification: silence, voiced, unvoiced
and transition. These modes are detected after the speech has been
decomposed into slowly evolving (SEW) and rapidly evolving (REW) waveforms.
Voicing activity detection (VAD), the relative levels of the SEW and
REW, and the cross-correlation coefficient between characteristic waveform
segments are used to make the classification. The encoding of the SEW
components is improved using gender adaptation. In tests on conversational
speech, the SC-VBR coder achieves a compression factor of around 3. The VBR
coder was evaluated against a fixed-rate 4.6 kbit/s PWI coder for clean
speech and noisy speech and was found to perform better for male speech
and for noisy speech.
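In waveform-interpolation coding the SEW/REW split referred to above is
typically obtained by low-pass filtering the sequence of characteristic
waveforms along the frame axis, the remainder being the REW; a crude
moving-average stand-in for that low-pass filter is sketched below
(illustrative, not the authors' filter).

    # Crude SEW/REW split: low-pass the characteristic-waveform sequence along
    # the frame axis to get the SEW; the remainder is the REW. A simple moving
    # average stands in for a proper low-pass filter (illustrative only).
    import numpy as np

    def sew_rew_split(char_waveforms, window=5):
        cw = np.asarray(char_waveforms, dtype=float)   # shape: (num_frames, samples_per_cycle)
        kernel = np.ones(window) / window
        sew = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, cw)
        rew = cw - sew
        return sew, rew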
Authors:
Carlos M. Ribeiro, INESC (Portugal)
Isabel M. Trancoso, INESC (Portugal)
Page (NA) Paper number 448
Abstract:
Phonetic vocoding is one of the methods for coding speech below 1000
bit/s. The transmitter stage includes a phone recogniser whose output
index is transmitted together with prosodic information such as duration,
energy and pitch variation. This type of coder does not transmit spectral
speaker characteristics, and speaker recognisability thus becomes a
major problem. In our previous work, we adapted a speaker modification
strategy to minimise this problem, modifying a codebook to match the
spectral characteristics of the input speaker. This is done at the
cost of transmitting the LSP averages computed for vowel and glide
phones. This paper presents new codebook generation strategies, with
gender dependence and interpolation frames, that lead to better speaker
recognisability and speech quality. Relative to our previous work,
some effort was also devoted to deriving more efficient quantization
methods for the speaker-specific information, which considerably reduced
the average bit rate without quality degradation.
Sound examples: 0448_01.WAV, 0448_02.WAV, 0448_03.WAV, 0448_04.WAV,
0448_05.WAV (16 bit, mono, 8000 Hz WAV files).