Topics in Speech Coding II

Chair: Alan V. McCree, Texas Instruments, USA

Use of the Pitch Synchronous Wavelet Transform as a New Decomposition Method for WI

Authors:

Nicola R Chong, University of Wollongong (Australia)
Ian S Burnett, University of Wollongong (Australia)
Joe F Chicharo, University of Wollongong (Australia)
Mark M Thomson, Motorola Australian Research Centre (Australia)

Volume 1, Page 513, Paper number 1343

Abstract:

A new characteristic waveform decomposition method based on wavelets is proposed for the Waveform Interpolation (WI) paradigm. In WI, pitch-cycle waveforms are filtered in the evolution domain to decompose the signal into two waveform surfaces, one characterising voiced speech and a second representing unvoiced speech. The slow roll-off of FIR filters, however, leads to a significant inter-relationship between the decomposed surfaces. Here we present the Pitch Synchronous Wavelet Transform (PSWT) as an alternative decomposition mechanism. Filtering is again performed in the evolutionary waveform domain, producing characteristic surfaces at several resolutions. This multi-scale characterisation allows more flexible quantisation of parameters, especially at rates higher than the 2.4 kb/s of WI. In the wavelet filter bank, the FIR filters are replaced by causal, stable IIR filters, which achieve significant delay reductions over their FIR counterparts. Furthermore, IIR filters track the dynamic aspects of the evolutionary surfaces faster, overcoming problems in the current WI decomposition.
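As a rough illustration of filtering along the evolution axis, the sketch below applies a one-level Haar analysis/synthesis pair across successive pitch cycles, assuming the cycles are already aligned and normalised to a common length. The Haar pair is an FIR stand-in chosen for brevity. It is not the paper's IIR wavelet filter bank, and the function names are hypothetical.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_evolution_split(cycles):
    """One-level Haar analysis along the evolution (cycle-index) axis.

    cycles: list of equal-length pitch cycles (assumed already aligned and
    length-normalised), with an even number of cycles.
    Returns (approx, detail), each holding len(cycles) // 2 cycles.
    """
    if len(cycles) % 2:
        raise ValueError("an even number of cycles is required")
    approx, detail = [], []
    for a, b in zip(cycles[0::2], cycles[1::2]):
        # Sum of neighbouring cycles: slowly evolving (voiced-like) part.
        approx.append([SQRT_HALF * (x + y) for x, y in zip(a, b)])
        # Difference of neighbouring cycles: rapidly evolving (unvoiced-like) part.
        detail.append([SQRT_HALF * (x - y) for x, y in zip(a, b)])
    return approx, detail


def haar_evolution_merge(approx, detail):
    """Inverse of haar_evolution_split (perfect reconstruction)."""
    cycles = []
    for lo, hi in zip(approx, detail):
        cycles.append([SQRT_HALF * (x + y) for x, y in zip(lo, hi)])
        cycles.append([SQRT_HALF * (x - y) for x, y in zip(lo, hi)])
    return cycles


if __name__ == "__main__":
    cws = [[1.0, 0.5, -0.5], [1.0, 0.5, -0.5], [0.9, 0.6, -0.4], [1.1, 0.4, -0.6]]
    lo, hi = haar_evolution_split(cws)
    print(hi[0])  # identical neighbouring cycles -> [0.0, 0.0, 0.0]
```

Repeating the split on the approximation cycles would give the coarser resolutions that the multi-scale description refers to.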

ic981343.pdf (From Postscript)

A 2.4 KBPS Variable Bit Rate ADP-CELP Speech Coder

Authors:

Masahiro Oshikiri, Kansai Research Laboratories, Toshiba Corporation (Japan)
Masami Akamine, Kansai Research Laboratories, Toshiba Corporation (Japan)

Volume 1, Page 517, Paper number 1549

Abstract:

This paper presents a variable bit rate ADP-CELP (Adaptive Density Pulse Code Excited Linear Prediction) coder that selects one of four coding structures in each frame based on short-time speech characteristics. To improve speech quality and reduce the average bit rate, we have developed a speech/non-speech classification method, based on spectrum envelope variation, that is robust to background noise. In addition, we propose an efficient pitch lag coding technique. The technique interpolates the pitch lags of consecutive frames and quantizes a vector of relative pitch lags, consisting of the differences between an estimated pitch lag and a target pitch lag in each of several subframes. The average bit rate of the proposed coder was approximately 2.4 kbps for speech sources with an activity factor of 60%. Our subjective testing indicates that the quality of the proposed coder exceeds that of the Japanese digital cellular standard at 3.45 kbps.
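The pitch lag coding idea can be sketched as follows. The sketch assumes linear interpolation between the previous and current frame lags and a small, purely hypothetical codebook of relative-lag vectors searched by squared error. The abstract specifies neither the interpolation rule, the codebook, nor the subframe count, so all of these are assumptions.

```python
def interpolate_lags(prev_lag, curr_lag, n_sub):
    """Estimated lag for each subframe by linear interpolation (assumed rule)."""
    return [prev_lag + (curr_lag - prev_lag) * (k + 1) / n_sub for k in range(n_sub)]


def quantize_relative_lags(targets, prev_lag, curr_lag, codebook):
    """Vector-quantize the target-minus-estimate lag offsets across subframes.

    Returns (codebook index, decoded subframe lags).
    """
    estimates = interpolate_lags(prev_lag, curr_lag, len(targets))
    relative = [t - e for t, e in zip(targets, estimates)]

    def sq_err(vec):
        return sum((r - c) ** 2 for r, c in zip(relative, vec))

    best = min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))
    decoded = [e + c for e, c in zip(estimates, codebook[best])]
    return best, decoded


# Hypothetical 4-subframe codebook of relative-lag vectors.
CODEBOOK = [[0, 0, 0, 0], [1, 0, -1, 0], [-1, 0, 1, 0], [1, 1, 1, 1]]

if __name__ == "__main__":
    idx, lags = quantize_relative_lags([43, 44, 45, 48], 40, 48, CODEBOOK)
    print(idx, lags)  # 1 [43.0, 44.0, 45.0, 48.0]
```

Only the codebook index needs to be transmitted for the subframes, since the decoder can repeat the interpolation from lags it already knows.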

ic981549.pdf (Scanned)

Multiple Source MOS Evaluation of a Flexible Low-Rate Vocoder

Authors:

Richard L Zinser, GE Corporate Research and Development (U.S.A.)
Mark L Grabb, GE Corporate Research and Development (U.S.A.)
Steven R Koch, GE Corporate Research and Development (U.S.A.)

Volume 1, Page 521, Paper number 2059

Abstract:

This paper describes the design and MOS performance of a family of low-rate, low-complexity speech coding algorithms known as Time Domain Voicing Cutoff (TDVC). TDVC is a predictive coding algorithm that employs a single transition frequency dividing voiced and unvoiced excitation. It provides the voicing flexibility of a frequency-domain algorithm with lower complexity and rate overhead. Several algorithm variants were MOS-tested using three distinct sets of source material. The results are discussed in terms of performance for each of the three sources, and demonstrate that the choice of source material has a great impact on both vocoder scoring and ranking.
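To make the single-transition-frequency idea concrete, the sketch below builds one frame of mixed excitation as a sum of pitch harmonics. Harmonics below the transition frequency share a zero phase (pulse-like, voiced) and harmonics above it get random phases (noise-like, unvoiced). This harmonic formulation is an illustrative stand-in, not the TDVC time-domain implementation, and the function and parameter names are hypothetical.

```python
import math
import random


def cutoff_excitation(f0, fc, fs, n, seed=0):
    """One frame of excitation with a single voicing transition frequency fc.

    Harmonics of f0 below fc use zero phase (coherent, pulse-like);
    harmonics at or above fc use random phases (noise-like).
    Returns (samples, number_of_voiced_harmonics).
    """
    rng = random.Random(seed)
    n_harm = int((fs / 2) // f0)  # harmonics up to the Nyquist frequency
    phases = []
    for k in range(1, n_harm + 1):
        phases.append(0.0 if k * f0 < fc else rng.uniform(0.0, 2.0 * math.pi))
    samples = [
        sum(math.cos(2.0 * math.pi * k * f0 * t / fs + phases[k - 1])
            for k in range(1, n_harm + 1))
        for t in range(n)
    ]
    n_voiced = sum(1 for k in range(1, n_harm + 1) if k * f0 < fc)
    return samples, n_voiced


if __name__ == "__main__":
    x, nv = cutoff_excitation(f0=100.0, fc=1000.0, fs=8000.0, n=80)
    print(nv)  # 9 voiced harmonics (100 Hz to 900 Hz)
```

A single cutoff needs only one voicing parameter per frame, which is where the low rate overhead mentioned above comes from.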

ic982059.pdf (From Postscript)

Techniques for Improving Sinusoidal Transform Vocoders

Authors:

Wen-Wei Chang, National Chiao-Tung University (Taiwan)
De-Yu Wang, National Chiao-Tung University (Taiwan)

Volume 1, Page 525, Paper number 1121

Abstract:

This paper presents quality enhancements for sinusoidal transform coders (STC) through the development of new parametric models. First, we explore the benefits of using the Bark spectrum in the perceptual coding of the sine-wave amplitudes. According to our results, this approach provides a uniform perceptual fit across the spectrum. To improve the accuracy of the phase representation, noncausal all-pole modeling of the vocal system is also discussed. Experimental results indicate that the new parametric models allow the STC to improve both the phase accuracy and the quality of the synthetic speech.
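A rough sketch of Bark-domain grouping of sine-wave amplitudes follows. It uses the Zwicker and Terhardt Hz-to-Bark approximation and averages the harmonic amplitudes within each integer Bark band. Both the choice of formula and the band-averaging rule are assumptions, not the paper's quantiser.

```python
import math


def hz_to_bark(f):
    """Zwicker and Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def bark_band_averages(f0, amps):
    """Average the amplitudes of harmonics k*f0 (k = 1, 2, ...) per integer Bark band.

    Returns a dict {band index: mean amplitude}, ordered by band.
    """
    bands = {}
    for k, a in enumerate(amps, start=1):
        band = int(hz_to_bark(k * f0))
        bands.setdefault(band, []).append(a)
    return {b: sum(v) / len(v) for b, v in sorted(bands.items())}


if __name__ == "__main__":
    # Low harmonics land in separate bands; high harmonics share bands,
    # so fewer parameters describe the upper part of the spectrum.
    print(bark_band_averages(200.0, [1.0] * 19))
```

Because Bark bands widen with frequency, the grouping spends resolution where hearing is most frequency-selective, which is one way to obtain a perceptually uniform fit.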

ic981121.pdf (Scanned)

Pitch-Synchronous Subband Representation of the Linear-Prediction Residual of Speech

Authors:

Huimin Yang, Tsinghua University (China)
W. Bastiaan Kleijn, KTH, Royal Institute of Technology (Sweden)

Volume 1, Page 529, Paper number 1992

Abstract:

In this paper, the characteristic waveform (CW) used in the waveform interpolation (WI) speech coder is interpreted as a pitch-synchronous subband representation (PSSR) of the speech. The inconsistency of the method, using the Gabor transform or the cosine-modulated lapped transform. Perfect reconstruction of the speech is then guaranteed. Instead of using a time-varying transform, the speech signal is time-warped, and pitch-synchronous operation is achieved by a time-invariant transform. Since the PSSR has the same physical meaning as the CW used in the WI speech coder, the coding efficiency can be expected to be similar at low rates, while the exact reconstruction property will lead to better quality at higher rates.
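The time-warping step can be sketched as resampling each pitch cycle, delimited by pitch marks, to a fixed number of samples by linear interpolation. After warping, every block of n samples holds exactly one cycle, so a fixed block transform of size n operates pitch-synchronously. The pitch marks, the linear-interpolation warp, and the function name are assumptions for illustration. The paper's Gabor or cosine-modulated lapped transforms are not reproduced here.

```python
def warp_cycles(x, marks, n):
    """Resample each cycle x[marks[i]:marks[i+1]] to n samples (linear interpolation).

    x: speech samples; marks: increasing pitch-mark indices (assumed given).
    Returns the concatenated warped signal, one cycle per block of n samples.
    """
    out = []
    for s, e in zip(marks[:-1], marks[1:]):
        step = (e - s) / n  # original samples per warped sample
        for i in range(n):
            p = s + i * step
            i0 = int(p)
            i1 = min(i0 + 1, len(x) - 1)
            frac = p - i0
            out.append((1.0 - frac) * x[i0] + frac * x[i1])
    return out


if __name__ == "__main__":
    ramp = list(range(101))
    warped = warp_cycles(ramp, [0, 40, 100], 20)
    print(len(warped))  # 40: two cycles of 20 samples each
```

Since every cycle now has the same length, a single time-invariant transform replaces a transform whose length would otherwise follow the pitch.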

ic981992.pdf (From Postscript)

Robust Voicing Estimation with Dynamic Time Warping

Authors:

Tian Wang, University of California, Santa Barbara (U.S.A.)
Vladimir Cuperman, University of California, Santa Barbara (U.S.A.)

Volume 1, Page 533, Paper number 2224

Abstract:

This paper presents a robust voicing estimation algorithm for low bit rate harmonic speech coding. The algorithm is based on waveform time-warping followed by spectral matching with voiced and unvoiced local spectral models. The objective of time-warping is to reduce the effect of pitch variations on the voicing decision. Several adaptive techniques are used to improve the flexibility and robustness of the conventional spectral matching algorithm. The new voicing algorithm is evaluated objectively by comparing its output with manually estimated voicing values. Subjective tests of a sinusoidal coder using the new voicing algorithm show significantly better performance than standard spectral matching in both clean and noisy environments.
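The abstract does not detail the warping procedure. For orientation only, the classic dynamic time warping recurrence, which finds the minimum-cost monotonic alignment between two sequences, is sketched below. The absolute-difference local cost and the function name are assumptions, and the sketch does not show how the paper applies warping to speech waveforms.

```python
def dtw_distance(a, b):
    """Classic DTW: minimum cumulative |a[i] - b[j]| over monotonic alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = best cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allowed steps: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: stretching absorbs the repeat
```

The example shows the property that matters for voicing: a local stretch in time, such as a small pitch change, costs nothing once the sequences are aligned.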

ic982224.pdf (From Postscript)

A Simplified Version of the ITU Algorithm for Objective Measurement of Speech Codec Quality

Authors:

Stephen D Voran, Institute for Telecommunication Sciences (U.S.A.)

Volume 1, Page 537, Paper number 1764

Abstract:

ITU-T Recommendation P.861 describes an objective speech quality assessment algorithm for speech codecs. This algorithm transforms codec input and output speech signals into a perceptual domain, compares them, and generates a noise disturbance value, which can be used to estimate perceived speech quality. The performance of this algorithm can be judged by the correlation between those estimates and actual listener opinions from formal subjective listening tests. We show that significant simplifications can be made to the P.861 algorithm with minimal effect on its performance. Specifically, for the portions of the algorithm under study here, 64% of the floating-point operations can be eliminated with only a 3.5% decrease in average correlation with listener opinions. The resulting simplified algorithm may offer a practical new objective function to drive parameter selection, excitation searches, and bit allocation in speech and audio coders.
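Because performance is judged by the correlation between objective estimates and listener opinions, a minimal Pearson correlation helper is sketched below. The function name is hypothetical, and the sketch computes only the figure of merit, not the P.861 perceptual transform itself.

```python
import math


def pearson(x, y):
    """Pearson correlation between objective estimates x and subjective scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # A disturbance value rises as quality falls, so raw values correlate negatively.
    print(pearson([0.1, 0.4, 0.9], [4.2, 3.1, 1.8]))
```

A correlation near 1 in magnitude means the objective measure ranks and spaces the test conditions the way listeners do.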

ic981764.pdf (From Postscript)

Performance of the Modified Bark Spectral Distortion as an Objective Speech Quality Measure

Authors:

Wonho Yang, Temple University (U.S.A.)
Majid Benbouchta, Temple University (U.S.A.)
Robert Yantorno, Temple University (U.S.A.)

Volume 1, Page 541, Paper number 2461

Abstract:

The Modified Bark Spectral Distortion (MBSD) was previously proposed as an objective speech quality measure. The MBSD takes into account the noise masking threshold so that only audible distortions are used in the calculation of the distortion measure. Preliminary simulation results showed an improvement of the MBSD over the conventional BSD. In this paper, the performance of the MBSD is reported in terms of frame size, speech class, and spectral region. The performance of the MBSD is not very sensitive to the frame size. Its performance for voiced speech is almost the same as for non-silent speech. The high-frequency region appears to play an important role in human perception of speech quality.
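A simplified sketch of the masking idea follows. Per Bark band, the loudness difference between the original and coded frames contributes only when it exceeds that band's noise masking threshold. The inputs are assumed to be precomputed per-band loudness values and thresholds, the indicator rule is an assumption, and the weighting in the published MBSD may differ. The function name is hypothetical.

```python
def masked_bark_distortion(loud_orig, loud_coded, mask_thresh):
    """Sum of audible per-band loudness differences for one frame.

    A band contributes |difference| only when that difference exceeds
    its noise masking threshold (assumed indicator rule).
    """
    total = 0.0
    for lx, ly, t in zip(loud_orig, loud_coded, mask_thresh):
        diff = abs(lx - ly)
        if diff > t:  # treated as audible distortion
            total += diff
    return total


if __name__ == "__main__":
    # Differences 0.5, 2.0 and 1.0 against a threshold of 1.0: only 2.0 counts.
    print(masked_bark_distortion([1.0, 3.0, 2.0], [0.5, 1.0, 1.0], [1.0, 1.0, 1.0]))
```

Setting every threshold to zero reduces the sum to an unmasked, BSD-like absolute loudness difference, which makes the effect of the masking step easy to compare.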

ic982461.pdf (From Postscript)

Application of Meddis' Inner Hair-Cell Model to The Prediction of Subjective Speech-Quality

Authors:

Markus Hauenstein, University of Kiel (Germany)

Volume 1, Page 545, Paper number 2561

Abstract:

This paper demonstrates how an instrumental speech-quality measure based on the comparison of auditory-nerve firing patterns can be constructed. Results from four available subjective tests show that the mean opinion scores (MOS) estimated by the objective measure are in good agreement with the subjectively obtained results.
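To show what an auditory-nerve firing pattern means here, the sketch below integrates the Meddis inner hair-cell transmitter model with a forward-Euler step and returns a per-sample firing probability. The parameter values are ones commonly quoted for that model and should be treated as assumptions, as should the input scaling and the 16 kHz sampling rate. A complete quality measure would also need a cochlear filter bank and a comparison of the resulting patterns, both omitted here. The function name is hypothetical.

```python
import math


def meddis_firing_prob(signal, fs=16000.0):
    """Per-sample firing probability from the Meddis inner hair-cell model.

    signal: stimulus samples (arbitrary units; scaling is an assumption).
    Parameter values are commonly quoted ones (assumed, not from the paper).
    """
    A, B, g = 5.0, 300.0, 2000.0               # membrane permeability parameters
    y, l, r, x = 5.05, 2500.0, 6580.0, 66.31   # replenish, loss, reuptake, reprocess rates
    M, h = 1.0, 50000.0                        # max free transmitter, firing scale
    dt = 1.0 / fs

    def permeability(s):
        return g * (s + A) / (s + A + B) if s + A > 0.0 else 0.0

    # Start at the silent steady state so the spontaneous rate is constant.
    k0 = permeability(0.0)
    q = y * M / (y + k0 * l / (l + r))  # free transmitter in the cell
    c = k0 * q / (l + r)                # transmitter in the synaptic cleft
    w = r * c / x                       # reprocessing store

    probs = []
    for s in signal:
        k = permeability(s)
        dq = y * (M - q) + x * w - k * q
        dc = k * q - l * c - r * c
        dw = r * c - x * w
        q, c, w = q + dq * dt, c + dc * dt, w + dw * dt
        probs.append(h * c * dt)        # firing probability for this sample
    return probs


if __name__ == "__main__":
    fs = 16000.0
    tone = [100.0 * math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(1600)]
    silent = meddis_firing_prob([0.0] * 1600, fs)
    driven = meddis_firing_prob(tone, fs)
    print(sum(silent) / len(silent), sum(driven) / len(driven))
```

In a measure of this kind, the original and coded signals would each be turned into such patterns, and the distance between the patterns would be mapped to an estimated MOS.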