A split-band vocoder in which the LP excitation is split into voiced and unvoiced frequency bands is presented. Doing so greatly improves the coder's performance for both mixed-voicing speech and speech containing acoustic noise, producing soft, natural-sounding speech. In addition, a variable-rate version that achieves an average rate of 1.4 kb/s is detailed. Fixed-point real-time implementation of this coder is also discussed.
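The voiced/unvoiced split of the excitation can be illustrated with a minimal Python sketch. It assumes a single voicing cutoff frequency per frame, 8 kHz sampling and a 160-sample frame; the function name and its parameters are hypothetical, and the abstract does not give the paper's actual band-splitting rule, gain handling or LP synthesis stage.

```python
import math
import random


def split_band_excitation(pitch_hz, cutoff_hz, n=160, fs=8000.0, seed=0):
    """Excitation with pitch harmonics below cutoff_hz and random-phase noise above it.

    Hypothetical sketch: one voicing cutoff per frame, unit-amplitude harmonics.
    """
    rng = random.Random(seed)
    out = [0.0] * n
    # Voiced band: cosines at multiples of the pitch frequency below the cutoff.
    k = 1
    while k * pitch_hz < cutoff_hz:
        w = 2.0 * math.pi * k * pitch_hz / fs
        for t in range(n):
            out[t] += math.cos(w * t)
        k += 1
    # Unvoiced band: one random-phase cosine per DFT bin from the cutoff up to fs/2.
    # The gain matches the power per Hz of the harmonic part.
    noise_gain = math.sqrt((fs / n) / pitch_hz)
    for b in range(1, n // 2):
        if b * fs / n < cutoff_hz:
            continue
        phase = rng.uniform(0.0, 2.0 * math.pi)
        w = 2.0 * math.pi * b / n
        for t in range(n):
            out[t] += noise_gain * math.cos(w * t + phase)
    return out


# Example: 120 Hz pitch, voiced up to 1.5 kHz, noise-like above.
excitation = split_band_excitation(120.0, 1500.0)
```

In a vocoder this excitation would then drive the LP synthesis filter, which is not shown.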
ABSTRACT
The effects of packet loss on delta modulation (DM) coded speech can be mitigated either by using an embedded DM (EDM) system or by using a tree-search interpolator. This paper provides theoretical and experimental results for EDM coding of autoregressive sources under random error conditions. For the tree interpolation, we explore the benefits of delayed decoding by using an interpolative DM code generator to form a tree of sample possibilities given the remaining adjacent samples.
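For reference, a basic linear delta modulator can be sketched in Python as follows. It is the plain one-bit-per-sample coder on which DM schemes build, not the embedded DM or the tree-search interpolator; the fixed step size is an assumption (practical coders usually adapt it).

```python
def dm_encode(samples, step):
    """Linear delta modulation: one bit per sample, 1 means the staircase goes up."""
    bits, approx = [], 0.0
    for x in samples:
        bit = 1 if x >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits


def dm_decode(bits, step):
    """Integrate the bit stream with the same staircase as the encoder."""
    out, approx = [], 0.0
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out


ramp = [0.01 * t for t in range(50)]
decoded = dm_decode(dm_encode(ramp, 0.05), 0.05)
```

Because the decoder is a running integrator, a lost bit offsets every later sample, which is why packet loss is so damaging for DM.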
ABSTRACT
In this paper, optimal transformation and quantization of Line Spectrum Pair (LSP) parameters are carried out. Based on the interframe and intraframe correlation properties of the LSPs, the Karhunen-Loeve (KL) transformation is implemented with a Principal Component Analysis (PCA) neural network. The spectral sensitivities of the LSPs and of the transformed coefficients are investigated in order to design better scalar and vector quantizers for these coefficients. Using the PCA network with spectral-sensitivity-guided quantizers, we show that this new approach yields distortion as low as or lower than that of other methods used in speech coding.
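A PCA neural network learns KL basis vectors with a Hebbian rule. The minimal Python sketch below uses a single linear neuron trained with Oja's rule, which converges to the first principal direction (the first KL basis vector) of zero-mean data. The abstract does not say which PCA network, learning rate or number of components the authors used, so these are assumptions; multi-component networks (for example Sanger's generalized Hebbian algorithm) extend the same idea.

```python
import math
import random


def oja_first_component(data, lr=0.01, epochs=50, seed=0):
    """First principal direction of zero-mean data via Oja's Hebbian learning rule."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(len(data[0]))]
    for _ in range(epochs):
        for x in data:
            y = sum(wi * xi for wi, xi in zip(w, x))  # linear neuron output
            # Hebbian term y*x minus the decay y^2*w keeps |w| close to 1.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


# Zero-mean 2-D toy data stretched along the (1, 1) direction.
toy = [(a + 0.1 * b, a - 0.1 * b) for a in (-1.0, -0.5, 0.5, 1.0) for b in (-1.0, 1.0)]
w1 = oja_first_component(toy)
```

Projecting a (mean-removed) LSP vector onto `w1` gives its first KL coefficient.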
Linear predictive coding (LPC) parameters are widely used in various
speech coding applications for representing the spectral envelope information
of speech. Transparent quantization of the LPC parameters (average spectral
distortion of 1 dB) can be achieved at 24 bits/frame using the split vector
LPC quantizer (SVLPC) which quantizes 10-dimensional line spectral frequency
(LSF) vectors in two parts. However, SVLPC suffers from a high computational
complexity in quantizing each part (one of dimension 4 and the other of
dimension 6) using independent codebooks of size 4096 (corresponding to
a rate of 12 bits/part). This limits the practical real-time application
of the coder. In this paper, we reduce the computational complexity of
the split vector quantizer by 2 orders of magnitude using the fast K-dimensional
(K-d) tree search algorithm under the bucket-Voronoi intersection (BVI)
search framework. This is of significant importance in rendering the SVLPC
amenable for practical real-time coding applications.
A0074.pdf
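The k-d tree idea can be illustrated with a minimal Python sketch of a standard k-d tree with branch-and-bound nearest-neighbour search over one 4-dimensional, 4096-entry split-VQ codebook. This is plain k-d tree search, not the bucket-Voronoi intersection (BVI) variant used in the paper, and the random codebook is a stand-in for a trained one.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Node = (point, split_axis, left_subtree, right_subtree); None for an empty tree."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Exact nearest neighbour; returns (squared_distance, codevector)."""
    if node is None:
        return best
    point, axis, left, right = node
    d = sq_dist(point, target)
    if best is None or d < best[0]:
        best = (d, point)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the current best.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


rng = random.Random(0)
codebook = [tuple(rng.random() for _ in range(4)) for _ in range(4096)]  # 12 bits
tree = build_kdtree(codebook)
dist, codevector = nearest(tree, (0.2, 0.4, 0.6, 0.8))
```

The search returns the same codevector as an exhaustive search, but typically computes far fewer distances because whole subtrees are pruned.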
The quantization of linear prediction coefficients (LPC) is an important
aspect in low bit rate speech coding. In this work, we introduce a new
approach, which exploits the temporal dependencies in the line spectral
frequencies (LSF). We approximate each LSF track using expansion into wavelet
basis functions. As the LSFs vary fairly smoothly over time,
the tracks are well approximated by interpolation. By vector quantizing the resulting
wavelet expansion coefficients, the interpolated LSF tracks can be quantized
with a distortion of 0.91 dB using only 15.6 bits per 20 ms update (780
bits per second). This is about 4 bits per update less than the results
obtained with previously described procedures.
A0292.pdf
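The principle of representing an LSF track by a few wavelet expansion coefficients can be sketched in Python with the orthonormal Haar wavelet, the simplest choice. The abstract does not name the wavelet basis, and a smoother wavelet would fit slowly varying LSF tracks better; the track values below are hypothetical. (At 20 ms per update there are 50 updates per second, so 15.6 bits per update corresponds to 15.6 × 50 = 780 bits/s.)

```python
import math


def haar_forward(x):
    """Orthonormal Haar transform: [approximation, details from coarse to fine]."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a, details = list(x), []
    while len(a) > 1:
        pairs = range(len(a) // 2)
        s = [(a[2 * i] + a[2 * i + 1]) / math.sqrt(2.0) for i in pairs]
        d = [(a[2 * i] - a[2 * i + 1]) / math.sqrt(2.0) for i in pairs]
        details = d + details          # coarser details go in front
        a = s
    return a + details


def haar_inverse(c):
    """Invert haar_forward level by level, from the coarsest scale up."""
    a, pos = list(c[:1]), 1
    while pos < len(c):
        d = c[pos:pos + len(a)]
        a = [v for s, dd in zip(a, d)
             for v in ((s + dd) / math.sqrt(2.0), (s - dd) / math.sqrt(2.0))]
        pos += len(d)
    return a


def approximate_track(track, keep):
    """Keep only the `keep` coarsest coefficients of an LSF track and reconstruct."""
    c = haar_forward(track)
    return haar_inverse(c[:keep] + [0.0] * (len(c) - keep))


# Hypothetical 8-frame track of one LSF (radians).
track = [0.30, 0.31, 0.33, 0.36, 0.38, 0.39, 0.39, 0.38]
smooth = approximate_track(track, 4)
```

In a coder, the retained coefficients (rather than the per-frame LSFs) would be vector quantized.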
In this paper a novel DCT prototype interpolation synthesis process
is presented and used to model the input speech signal. The compression
efficiency of the DCT, when applied to prototype pitch segments, leads to
1.7/2.4 kb/s DCT-PIC systems which can deliver decoded speech of high communication
quality.
A0300.pdf
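The energy compaction that DCT-based prototype coding relies on can be illustrated with a short Python sketch: take the orthonormal DCT-II of one prototype pitch segment, keep only the first few coefficients, and invert. The segment values and the number of retained coefficients are illustrative assumptions; the abstract does not describe how the coefficients are quantized or how prototypes of different lengths are handled.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def idct_ii(c):
    """Inverse of the orthonormal DCT-II (a DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def compress_prototype(segment, keep):
    """Zero all but the first `keep` DCT coefficients of a pitch segment and invert."""
    c = dct_ii(segment)
    return idct_ii(c[:keep] + [0.0] * (len(c) - keep))


# Hypothetical 40-sample prototype pitch cycle.
prototype = [math.exp(-i / 8.0) * math.cos(2 * math.pi * i / 20) for i in range(40)]
approx = compress_prototype(prototype, 12)
```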
ABSTRACT
This paper describes an improved RP-VSELP (IRP-VSELP) speech coder. RP-VSELP is classified as a fast VSELP since it produces speech quality comparable to that of VSELP with much lower system complexity. The new coder proposed in this paper adds features such as a fast codebook search obtained by employing backward filtering, and pitch-adaptive regular pulse excitation. Owing to these new features, the proposed method not only reduces the complexity of the original RP-VSELP but also improves speech quality. In objective and subjective tests, IRP-VSELP outperformed the reference RP-VSELP coder. Simulation results are presented to verify the performance of the proposed method.
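Backward filtering speeds up a codebook search because the correlation term x^T H c equals (H^T x)^T c: the target x is filtered backwards once per subframe, after which each codevector needs only a dot product, which is especially cheap for sparse, pulse-like codevectors. The Python sketch below shows this standard CELP-style search, assuming a truncated impulse response `h` and a toy codebook of single pulses; it is not the IRP-VSELP codebook structure itself.

```python
def filter_response(h, c):
    """y = H c, H being the lower-triangular Toeplitz matrix of impulse response h."""
    n = len(c)
    return [sum(h[i - j] * c[j] for j in range(i + 1) if i - j < len(h)) for i in range(n)]


def backward_filter(h, x):
    """d = H^T x: the target filtered backwards, computed once per subframe."""
    n = len(x)
    return [sum(h[i - j] * x[i] for i in range(j, n) if i - j < len(h)) for j in range(n)]


def search_codebook(x, h, codebook):
    """Return (index, gain) maximising (d . c)^2 / |H c|^2 over the codebook."""
    d = backward_filter(h, x)
    best_idx, best_gain, best_score = -1, 0.0, -1.0
    for idx, c in enumerate(codebook):
        # Correlation via d: only the non-zero pulses of c contribute.
        corr = sum(d[i] * ci for i, ci in enumerate(c) if ci != 0.0)
        # Energy of the filtered codevector; in practice this is often precomputed.
        energy = sum(v * v for v in filter_response(h, c))
        if energy > 0.0 and corr * corr / energy > best_score:
            best_idx, best_gain, best_score = idx, corr / energy, corr * corr / energy
    return best_idx, best_gain


h = [1.0, 0.5, 0.25]                               # assumed truncated impulse response
pulses = [[1.0 if i == p else 0.0 for i in range(8)] for p in (0, 3, 5)]
target = [0.0, 0.0, 0.0, 2.0, 1.0, 0.5, 0.0, 0.0]  # 2 * (H times the pulse at 3)
index, gain = search_codebook(target, h, pulses)
```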
The multiband excitation (MBE) vocoder represents the speech signal with
a pitch, band magnitudes, and a voiced/unvoiced (V/UV) decision for each
spectral band. In the conventional MBE model, the model parameters are estimated
sequentially in two steps. The pitch and band magnitudes are first estimated
under the assumption of a voiced speech model by analysis-by-synthesis (AbS)
in the frequency domain, and then the V/UV decisions are made. However, the
spectrum synthesized under this assumption may have large spectral distortion if
the speech frame is strongly unvoiced, as in transient regions. In this
paper, we propose a joint estimation method which estimates and decides all
the model parameters in the AbS loop. For this, voiced and unvoiced speech models
for each band are used during the analysis procedure. After estimating
the parameters with the two speech models, the model for each band is selected
so as to produce the smaller spectral estimation error. By analyzing the short-time
spectrum and the long-time spectrogram, it is shown that the speech reproduced
by the proposed model is superior to that of the conventional one.
Informal listening tests also confirm the superiority
of the proposed model.
A0339.pdf
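The final per-band selection step of such a joint estimation can be sketched in Python. Assumed here: magnitude spectra are given on DFT bins, the voiced-model spectrum has already been synthesized by AbS, and the unvoiced model is a flat per-band level fitted by least squares. The paper's actual models and error measure may differ.

```python
def select_band_models(orig_mag, voiced_mag, band_edges):
    """Per-band V/UV choice by the smaller squared spectral error.

    orig_mag: original magnitude spectrum (one value per bin).
    voiced_mag: magnitude spectrum synthesized with the voiced (harmonic) model.
    The unvoiced model is a flat magnitude per band, fitted by least squares (band mean).
    """
    decisions = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = orig_mag[lo:hi]
        level = sum(band) / len(band)                     # best flat (noise-like) fit
        err_uv = sum((m - level) ** 2 for m in band)
        err_v = sum((m - v) ** 2 for m, v in zip(band, voiced_mag[lo:hi]))
        decisions.append("V" if err_v <= err_uv else "UV")
    return decisions


orig = [1.0, 0.0, 1.0, 0.0, 0.5, 0.5, 0.5, 0.5]      # harmonic-like, then flat
voiced = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
decisions = select_band_models(orig, voiced, [0, 4, 8])
```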
ABSTRACT
The distance measure is of great importance both in the construction of a vector quantizer for LSP parameters and in the coding phase. Because of its complexity, the meaningful spectral distance is seldom used for quantization. Weighted squared Euclidean distances are mathematically more tractable and are commonly used, and significant differences can be found in the performance of different distances. The aim of this paper is to study different distance measures used in the field of LSP coding. A new weighted Euclidean distance will be proposed that not only replaces the spectral distance but also estimates its exact value well. The use of squared distances will be justified as well. In a real-time application, the weights often cannot be calculated from the input vector; the computation must instead be done from the codewords, before coding. This causes some problems for split vector quantization and multi-stage vector quantization. Some solutions will be given at the end of this paper.
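As an illustration of a weighted squared Euclidean distance for LSPs, the Python sketch below uses the widely used inverse-harmonic-mean weights, which emphasize closely spaced LSPs (near formant peaks). This is not the new distance proposed in the paper. The `codeword_weights` option mirrors the real-time constraint described above: the weights are taken from each codeword, so they can be computed once before coding. LSPs are assumed to be in radians in (0, π), and the codebook is a toy example.

```python
import math


def ihm_weights(lsp):
    """Inverse-harmonic-mean weights: large where neighbouring LSPs are close together."""
    ext = [0.0] + list(lsp) + [math.pi]
    return [1.0 / (ext[i] - ext[i - 1]) + 1.0 / (ext[i + 1] - ext[i])
            for i in range(1, len(ext) - 1)]


def weighted_sq_distance(x, y, w):
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))


def vq_encode(x, codebook, codeword_weights=None):
    """Index of the closest codeword.

    With codeword_weights=None the weights come from the input vector x;
    otherwise codeword_weights[i] (precomputed before coding) is used for codeword i.
    """
    wx = ihm_weights(x)
    dists = [weighted_sq_distance(x, c, wx if codeword_weights is None else codeword_weights[i])
             for i, c in enumerate(codebook)]
    return min(range(len(codebook)), key=dists.__getitem__)


codebook = [[0.3, 0.8, 1.5], [0.4, 1.2, 2.0], [0.6, 1.0, 2.6]]
precomputed = [ihm_weights(c) for c in codebook]  # done once, offline
idx = vq_encode([0.35, 0.85, 1.6], codebook, precomputed)
```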
ABSTRACT
This paper addresses the problem of very-low-rate compression of digitized wideband speech signals for storage. It concentrates on applications where the text transcription of the speech corpus is available and where high quality of the recovered speech is required. Following the problem statement, the unique features of the task are analysed and possible methods of implementation are discussed. As a result, a novel speech compression technique is proposed, and its general structure and characteristics are presented. The new compression technique, hybrid speech compression, takes full advantage of the available text transcription. It uses an optimum balance of Text To Speech (TTS) synthesis technology and dynamic speech conversion to yield a data stream comprising the original text enriched with prosodic features and conversion control information. The proposed speech compression method aims to achieve an extremely low data rate while preserving the high quality of the compressed wideband speech.
In this paper the possibilities of channel-error protection for transmission
of CELP-coded speech over highly disturbed channels without additional
bits for error-control are discussed. Algorithms are given which do not
require explicit channel models and work without additional delay and almost
no additional complexity. Temporal and mutual dependencies of the speech
codec parameters are exploited for channel-error detection and parameter
extrapolation at the decoder. The algorithms are optimized by informal
listening tests rather than by maximization of a mathematically tractable
measure.
A0440.pdf
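A much simplified Python sketch of exploiting the temporal dependency of one codec parameter: an implausibly large frame-to-frame jump is treated as a channel error and the value is extrapolated from the last trusted one. The threshold `max_jump` and the attenuation factor `decay` are hypothetical tuning constants (the paper tunes its algorithms by informal listening), and the mutual dependencies between different parameters are not modeled.

```python
def conceal(values, max_jump, decay=0.9):
    """Replace implausible frame-to-frame jumps by an attenuated previous value.

    values: one decoded codec parameter per frame (e.g. a gain).
    max_jump, decay: hypothetical tuning constants.
    """
    out, prev = [], None
    for v in values:
        if prev is not None and abs(v - prev) > max_jump:
            v = decay * prev          # treat as a channel error and extrapolate
        out.append(v)
        prev = v
    return out


gains = conceal([1.0, 1.1, 9.0, 1.2], max_jump=2.0)
```

The sketch needs no extra bits and no look-ahead, in line with the constraints stated in the abstract.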
ABSTRACT
This paper presents some improvements to the mixed Harmonic and Stochastic eXcitation (HSX) algorithm for low bit rate speech coding (around 2.4 kbit/s). The main issue is modeling the excitation signal so as to improve the quality of the synthesized speech without increasing either the bit rate or the complexity. The pitch tracking algorithm is revised to increase robustness and reduce complexity, and the voicing analysis algorithm is also refined. Informal listening to the synthesized speech at 2.4 kbit/s shows a significant improvement.
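The abstract does not detail the revised HSX pitch tracker. As a generic point of reference, a normalized-autocorrelation pitch estimator, which also yields a voicing strength, can be sketched in Python. The lag range and the test tone are assumptions; multiples of the true period score almost as well, which is one reason real trackers add pitch-doubling checks and smoothing across frames.

```python
import math


def pitch_and_voicing(frame, min_lag, max_lag):
    """Lag with the highest normalized autocorrelation, and that correlation value.

    A value near 1 suggests strongly voiced speech; near 0, noise-like speech.
    """
    best_lag, best_r = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[lag:], frame[:-lag]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        r = num / den if den > 0.0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# A 200 Hz tone at an assumed 8 kHz sampling rate has a 40-sample period.
frame = [math.sin(2 * math.pi * t / 40) for t in range(320)]
lag, voicing = pitch_and_voicing(frame, 20, 70)
```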
This paper describes a phonetic vocoding scheme which relies on speaker
adaptation to capture important speaker characteristics. These are typically
lost in phonetic vocoders which transmit only information about the phones
which are recognized, together with some prosodic information. In our scheme,
however, additional speaker characteristics are transmitted in vowel regions
(average values of LSP coefficients for each phone). This additional information
yielded promising speaker recognizability in informal listening
tests, while still achieving a rather low average bit rate, suitable for
many transmission and storage applications. This work extends our previous
phonetic vocoding scheme described in [5]. The vocoder is now fully quantized
and the number of transmitted parameters has been significantly reduced.
A0656.pdf
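One reading of "average values of LSP coefficients for each phone" is a mean LSP vector per recognized vowel segment. The Python sketch below computes that from frame-level LSP vectors and phone labels; the label format and the vowel set are hypothetical, and quantization of the means is not shown.

```python
def vowel_segment_means(frames, labels, vowels):
    """Mean LSP vector for each run of consecutive frames labelled with the same vowel.

    frames: list of LSP vectors, one per frame; labels: phone label per frame.
    Returns a list of (phone, mean_vector) in time order.
    """
    out, i = [], 0
    while i < len(labels):
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1                      # extend the current phone segment
        if labels[i] in vowels:
            seg = frames[i:j]
            out.append((labels[i], [sum(col) / len(seg) for col in zip(*seg)]))
        i = j
    return out


frames = [[0.1, 0.2], [0.3, 0.4], [1.0, 1.0], [0.5, 0.5]]
labels = ["a", "a", "s", "a"]
means = vowel_segment_means(frames, labels, vowels={"a", "e", "i", "o", "u"})
```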
ABSTRACT
This paper deals with the coding of spectral envelope parameters for very low bit rate speech coding (below 500 bps). To obtain sufficient intelligibility, segmental techniques are necessary, and variable dimension vector quantization is one of them. We propose a new interpretation of previously published research by Chou-Lookabaugh [2] and Cernocky-Baudoin-Chollet [4,6] on the quantization of variable-length sequences of spectral vectors, named respectively Variable to Variable length Vector Quantization (VVVQ) and Multigrams Quantization (MGQ). This interpretation gives a meaning to the Lagrange multiplier used in the optimization criterion of VVVQ, and should allow new developments such as new models of the probability density of the source. We have also studied the influence of limiting the delay introduced by the method, and found that a maximal delay of 400 ms is generally sufficient. Finally, we propose introducing long sequences into the segmental codebook by linear interpolation of shorter ones.
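The Lagrangian criterion D + λR can be made concrete with a small dynamic-programming sketch in Python that segments a sequence of spectral vectors into variable-length codewords. Each codeword carries an assumed bit cost (for example, -log2 of its probability); the toy codebook and λ value are illustrative. The sketch decides over the whole sequence at once, whereas a delay-limited coder would commit to decisions within a bounded window.

```python
def segment_quantize(seq, codebook, lam):
    """Minimise D + lam * R over segmentations of seq into codebook sequences.

    seq: list of spectral vectors.
    codebook: list of (codeword_sequence, bits); codeword_sequence is a list of vectors.
    Returns (chosen codeword indices in time order, total Lagrangian cost).
    """
    n, inf = len(seq), float("inf")
    cost = [0.0] + [inf] * n          # cost[t]: best cost of coding seq[:t]
    back = [None] * (n + 1)           # back[t]: codeword ending at frame t
    for t in range(1, n + 1):
        for idx, (cw, bits) in enumerate(codebook):
            length = len(cw)
            if length > t or cost[t - length] == inf:
                continue
            dist = sum(sum((a - b) ** 2 for a, b in zip(u, v))
                       for u, v in zip(seq[t - length:t], cw))
            total = cost[t - length] + dist + lam * bits
            if total < cost[t]:
                cost[t], back[t] = total, idx
    if cost[n] == inf:
        return None, inf              # no segmentation possible with this codebook
    path, t = [], n
    while t > 0:
        path.append(back[t])
        t -= len(codebook[back[t]][0])
    return path[::-1], cost[n]


toy_seq = [[0.0], [0.0], [1.0]]
toy_codebook = [([[0.0]], 4), ([[0.0], [0.0]], 5), ([[1.0]], 4)]
path, total = segment_quantize(toy_seq, toy_codebook, lam=1.0)
```

Raising λ makes bits more expensive and so favours fewer, longer segments at the price of higher distortion.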
Temporal Decomposition (TD) is an efficient technique for modeling speech
spectral evolution through orthogonalization of the matrix of spectral
parameters which reduces the amount of spectral information in TD-based
speech coding. We have shown in earlier work that "event" functions can
be approximated by fixed-width Gaussian functions with a minor degradation
in the reconstructed speech, leading to further bit-rate reduction in such
systems. In this paper, through perceptually-based spectral distortion
measurement, we show the impact of event shape on the speech quality,
and propose a new composite function and discuss its effect on the coder
performance using different combinations of spectral parameters in event
detection and speech synthesis.
A0764.pdf
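The TD synthesis model, in which each frame of spectral parameters is a weighted sum of target vectors a_k with event functions φ_k(n), can be sketched in Python using fixed-width Gaussian events. The centers, width and targets are illustrative, and the new composite event function proposed in the paper is not reproduced here.

```python
import math


def gaussian_event(center, width, n_frames):
    """Fixed-width Gaussian event function phi(n) centred on frame `center`."""
    return [math.exp(-0.5 * ((n - center) / width) ** 2) for n in range(n_frames)]


def td_reconstruct(targets, centers, width, n_frames):
    """y(n) = sum_k a_k * phi_k(n) for target vectors a_k and Gaussian events phi_k."""
    events = [gaussian_event(c, width, n_frames) for c in centers]
    dim = len(targets[0])
    return [[sum(a[d] * phi[n] for a, phi in zip(targets, events)) for d in range(dim)]
            for n in range(n_frames)]


# Two hypothetical events over 10 frames.
spectra = td_reconstruct([[0.3, 0.9], [0.5, 1.4]], centers=[2.0, 7.0], width=1.5, n_frames=10)
```

With a fixed width, each event is described by its center alone, so only the targets and the centers vary from event to event.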
A new phase coding algorithm is introduced in this paper, which works
in the pitch-cycle waveform domain. It provides accurate phase coding at
low bit cost. Its performance is analyzed inside a multiband excitation
coder with improved onset representation. In this context, the introduction
of original phase information by means of the proposed coding algorithm
provides noticeable quality improvement without increasing the total bit
rate of the coder.
A0772.pdf
Four female native speakers of Modern Greek listened to 465 synthetic vowel tokens with F1 frequencies ranging from 250 to 800 Hz and F2 frequencies ranging from 900 to 2900 Hz in 50 Hz steps. They were asked to identify each stimulus as one of the five vowels of Modern Greek or to reject it if they thought it could not be a vowel of their language. The subjects rejected about 64 percent of the tokens as impossible vowels. The remaining points were plotted in an F1 by F2 space with the codes assigned by each subject, and in a composite space where only the points identified with the same response by at least three subjects were used. The results replicated those of Hawks and Fourakis [1], except that the code for the vowel [e] was assigned to many more points than the codes for the other vowels.
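The composite-space criterion (keep a stimulus only if at least three listeners gave it the same vowel label) can be written as a short Python function. The data layout, with stimuli keyed by (F1, F2) in Hz and a rejection recorded as `None`, is a hypothetical choice.

```python
from collections import Counter


def consensus_points(responses, min_agree=3):
    """Map each stimulus to its majority vowel label if at least min_agree listeners agree.

    responses: {(f1_hz, f2_hz): [label or None, ...]}, None meaning "not a vowel".
    """
    kept = {}
    for stimulus, labels in responses.items():
        votes = Counter(lab for lab in labels if lab is not None)
        if votes:
            label, count = votes.most_common(1)[0]
            if count >= min_agree:
                kept[stimulus] = label
    return kept


resp = {(300, 2300): ["i", "i", "i", "e"], (500, 1500): ["e", "a", None, None]}
composite = consensus_points(resp)
```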
This paper describes the Mixed Multi-Band Excitation (MMBE) coder used for
low bit-rate speech coding. In MBE coders, there are significant differences
in the fine structure between the original and the synthetic spectrum.
They are mainly due to the exclusive partition of voiced and unvoiced regions
in the frequency domain and to the decision procedure based on an experimental
threshold. The MMBE uses a frequency domain mixture function (FDMF) to overcome
these drawbacks of the MBE coder. Also, two analysis methods, which do
not need any threshold-based decision procedure, are presented. The
performance evaluation results show that the 2.6 kbps MMBE coder reduces
the average spectral distortion by a clear margin compared to the 2.9 kbps
MBE coder. The computational load of the proposed coder is sufficiently
small for a real-time implementation on a modern DSP chip.
A0860.pdf
In the digital mobile radio system GSM (Global System for Mobile Communications)
there is a need for reducing the subjective effects of residual bit errors
by error concealment techniques. Because the standard does
not specify these algorithms bit-exactly, there is room for new solutions
to improve the decoding process. This contribution presents a new approach
for optimum estimation of speech codec parameters [7] applied to the GSM
system. It requires a soft-output channel decoder (e.g. the soft-output Viterbi
algorithm, SOVA [8]) providing bit reliability information for the
proposed parameter estimation process. Additionally, a priori knowledge
about the residual redundancy in the sequence of codec parameters is exploited.
The new method includes an inherent muting mechanism leading to a graceful
degradation of speech quality in case of adverse transmission conditions.
If the channel is error free, bit exactness as required by the GSM standard
is preserved.
A1185.pdf
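The core of soft-decision parameter estimation can be sketched in Python. Bit reliabilities from a soft-output decoder, given here as log-likelihood ratios LLR = log(P(b=1)/P(b=0)) (an assumed sign convention), are turned into a posteriori probabilities for every quantizer index, which then weight the codebook entries to form an MMSE estimate. The natural-binary index mapping and the 2-bit codebook are assumptions; the `prior` argument is where residual redundancy would enter, for example transition probabilities conditioned on the previous frame's index.

```python
import math


def bit_prob_one(llr):
    """P(b = 1) from LLR = log(P(b=1) / P(b=0)) (assumed sign convention)."""
    return 1.0 / (1.0 + math.exp(-llr))


def mmse_estimate(llrs, codebook, prior=None):
    """MMSE parameter estimate sum_i P(i | soft bits) * codebook[i].

    Index bits are assumed to be natural binary, most significant bit first;
    prior[i] is the a priori probability of index i (uniform if omitted).
    """
    n_bits = len(llrs)
    if len(codebook) != 2 ** n_bits:
        raise ValueError("codebook size must be 2 ** len(llrs)")
    p1 = [bit_prob_one(llr) for llr in llrs]
    prior = prior or [1.0 / len(codebook)] * len(codebook)
    post = []
    for i in range(len(codebook)):
        p = prior[i]
        for b in range(n_bits):
            bit = (i >> (n_bits - 1 - b)) & 1
            p *= p1[b] if bit else 1.0 - p1[b]
        post.append(p)
    total = sum(post)
    return sum(p * c for p, c in zip(post, codebook)) / total


gain_codebook = [-3.0, -1.0, 1.0, 3.0]
reliable = mmse_estimate([20.0, -20.0], gain_codebook)   # close to 1.0
unreliable = mmse_estimate([0.0, 0.0], gain_codebook)    # falls back to 0.0
```

With completely unreliable bits and a uniform prior, the estimate falls back to the codebook mean (zero for this symmetric codebook), which illustrates how such an estimator can mute gracefully rather than output a wrong codebook entry.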
This paper describes two aspects of a linear predictive coding (LPC)
vocoder developed for operation on wide-band speech. The method for encoding
the LPC parameters, based on the use of an adaptive predictor, is presented
together with an extension to the vocoder model which enables it
to operate on speech sampled at 16 kHz rather than 8 kHz. Good-quality operation
on wide-band speech is achieved with an increase in bit rate of about 500
bits/s. Diagnostic rhyme test (DRT) results demonstrate the improvement
in intelligibility gained through coding speech at the higher sample rate.
A1333.pdf
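Predictive coding of LPC parameters with an adaptive predictor can be sketched in Python as follows. The sketch assumes mean-removed LSF-like vectors, a single scalar prediction coefficient adapted by normalized LMS, and uniform scalar quantization of the prediction residual; the abstract does not specify the predictor structure, parameter set or quantizers, so all of these are assumptions. Because the adaptation uses only decoded quantities, the decoder tracks the encoder's predictor without side information.

```python
def _reconstruct(alpha, prev, idx, step, mu):
    """Shared by encoder and decoder: rebuild a frame and adapt alpha (NLMS)."""
    pred = [alpha * p for p in prev]
    y = [pi + k * step for pi, k in zip(pred, idx)]
    energy = sum(p * p for p in prev)
    if energy > 0.0:
        # The prediction error on decoded data drives the update, so both sides agree.
        err = [yi - pi for yi, pi in zip(y, pred)]
        alpha += mu * sum(e * p for e, p in zip(err, prev)) / energy
    return y, alpha


def lsf_encode(frames, step, mu=0.5):
    """Quantization indices of the prediction residual, one list per frame."""
    alpha, prev, indices = 0.0, [0.0] * len(frames[0]), []
    for x in frames:
        pred = [alpha * p for p in prev]
        idx = [round((xi - pi) / step) for xi, pi in zip(x, pred)]  # quantized residual
        prev, alpha = _reconstruct(alpha, prev, idx, step, mu)
        indices.append(idx)
    return indices


def lsf_decode(indices, step, mu=0.5):
    alpha, prev, out = 0.0, [0.0] * len(indices[0]), []
    for idx in indices:
        prev, alpha = _reconstruct(alpha, prev, idx, step, mu)
        out.append(prev)
    return out


# Hypothetical mean-removed 2-dimensional parameter frames.
frames = [[0.1, -0.2], [0.12, -0.18], [0.11, -0.21], [0.09, -0.2]]
indices = lsf_encode(frames, 0.01)
decoded = lsf_decode(indices, 0.01)
```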