ICASSP '98 Main Page
|
Abstract - SP16 |
 |
SP16.1
|
Use of the Pitch Synchronous Wavelet Transform as a New Decomposition Method for WI
N. Chong,
I. Burnett,
J. Chicharo (University of Wollongong, Australia);
M. Thomson (Motorola Australia, Australia)
A new characteristic waveform decomposition method based on wavelets is proposed for the Waveform Interpolation (WI) paradigm. In WI, pitch-cycle waveforms are filtered in the evolution domain to decompose the signal into two waveform surfaces, one characterising voiced speech and a second representing unvoiced speech. The slow roll-off of FIR filters leads, however, to a significant inter-relationship between the decomposed surfaces. Here we present the Pitch Synchronous Wavelet Transform (PSWT) as an alternative decomposition mechanism. Filtering is again performed in the evolutionary waveform domain, producing characteristic surfaces at several resolutions. This multi-scale characterisation leads to more flexible quantisation of parameters, especially at rates higher than WI's 2.4 kb/s. FIR filters are replaced in the wavelet filter bank by causal, stable IIR filters, which achieve significant delay reductions over their FIR counterparts. Furthermore, IIR filters track the dynamic aspects of the evolutionary surfaces faster, overcoming problems existing in the current WI decomposition.
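The evolution-domain filtering that the abstract describes for conventional WI can be sketched roughly as follows. This is only an illustration under stated assumptions: the pitch cycles are taken to be already aligned and normalised to a common length, a short moving average stands in for the FIR low-pass filter, and the names `decompose` and `taps` are hypothetical. It is not the PSWT proposed in the paper.

```python
def decompose(cw_surface, taps=5):
    """Split a surface of pitch cycles into a slowly and a rapidly evolving part.

    cw_surface: list of pitch cycles, each a list of samples of equal length
    (assumed already aligned and length-normalised).
    taps: length of the moving-average window along the evolution axis
    (an assumed stand-in for a real FIR low-pass filter).
    """
    n_cycles = len(cw_surface)
    length = len(cw_surface[0])
    half = taps // 2
    slow = []
    for i in range(n_cycles):
        # Average the same sample position across neighbouring cycles,
        # i.e. filter along the evolution axis, not along time within a cycle.
        window = cw_surface[max(0, i - half):min(n_cycles, i + half + 1)]
        slow.append([sum(c[k] for c in window) / len(window) for k in range(length)])
    # The rapidly evolving surface is whatever the low-pass filter removed.
    rapid = [[cw_surface[i][k] - slow[i][k] for k in range(length)]
             for i in range(n_cycles)]
    return slow, rapid
```

With this split, a perfectly periodic input lands entirely in the slow surface, while cycle-to-cycle fluctuation ends up in the rapid surface.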
|
SP16.2
|
A 2.4 KBPS Variable Bit Rate ADP-CELP Speech Coder
M. Oshikiri,
M. Akamine (Kansai Research Laboratories, Toshiba Corporation, Japan)
This paper presents a variable bit rate ADP-CELP (Adaptive Density Pulse Code Excited Linear Prediction) coder that selects one of four coding structures in each frame based on short-time speech characteristics. To improve speech quality and reduce the average bit rate, we have developed a speech/non-speech classification method using spectrum envelope variation, which is robust to background noise. In addition, we propose an efficient pitch lag coding technique. The technique interpolates the pitch lags of consecutive frames and quantizes a vector of relative pitch lags, each being the difference between an estimated pitch lag and a target pitch lag, over multiple subframes. The average bit rate of the proposed coder was approximately 2.4 kbps for speech sources with an activity factor of 60%. Our subjective testing indicates that the quality of the proposed coder exceeds that of the Japanese digital cellular standard at a rate of 3.45 kbps.
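One way to picture the pitch lag coding step is the sketch below. It assumes linear interpolation between the previous and current frame lags and leaves out the vector quantizer entirely; the function names and the subframe count are hypothetical, and the abstract does not state which interpolation rule the authors use.

```python
def estimated_subframe_lags(prev_lag, curr_lag, n_subframes):
    """Estimate one pitch lag per subframe by linear interpolation
    from the previous frame's lag to the current frame's lag (assumed rule)."""
    step = curr_lag - prev_lag
    return [prev_lag + step * (j + 1) / n_subframes for j in range(n_subframes)]


def relative_lags(target_lags, prev_lag, curr_lag):
    """Vector of differences between target and estimated subframe lags.

    This vector is what would be passed on to a quantizer (not shown here).
    """
    estimates = estimated_subframe_lags(prev_lag, curr_lag, len(target_lags))
    return [t - e for t, e in zip(target_lags, estimates)]
```

When the pitch evolves smoothly, the relative lags stay close to zero, which is what makes the vector cheap to quantize.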
|
SP16.3
|
Multiple Source MOS Evaluation of a Flexible Low-Rate Vocoder
R. Zinser,
M. Grabb,
S. Koch (GE Corporate Research and Development, USA)
This paper describes the design and MOS performance of a family of low rate, low complexity speech coding algorithms known as Time Domain Voicing Cutoff (TDVC). TDVC is a predictive coding algorithm that employs a single transition frequency dividing voiced and unvoiced excitation. It provides the voicing flexibility of a frequency domain algorithm with lower complexity and rate overhead. A number of algorithm variants were MOS tested using three distinct sets of source material. The results are discussed in terms of performance for each of the three sources, and demonstrate that choice of source material has a great impact on both vocoder scoring and ranking.
|
SP16.4
|
Techniques for Improving Sinusoidal Transform Vocoders
W. Chang,
D. Wang (National Chiao-Tung University, Taiwan, ROC)
This paper presents quality enhancements for sinusoidal transform coders (STC) via the development of new parametric models. First explored are the benefits of the Bark spectrum for use in the design of perceptual coding of the sine-wave amplitudes. According to our results, the proposed approach provides a uniform perceptual fit across the spectrum. To enhance the accuracy of phase representation, noncausal all-pole modeling of the vocal system is also discussed. Experimental results indicate that the use of the new parametric models allows the STC to improve the phase accuracy as well as the synthetic speech quality.
|
SP16.5
|
Pitch-Synchronous Subband Representation of the Linear-Prediction Residual of Speech
H. Yang,
W. Kleijn (KTH, Royal Institute of Technology, Sweden)
In this paper, the characteristic waveform (CW) used in the waveform interpolation (WI) speech coder is interpreted as a pitch-synchronous subband representation (PSSR) of the speech. The inconsistency of the method, using the Gabor transform or the cosine modulated lapped transform. Perfect reconstruction of the speech is then guaranteed. Instead of using a time-varying transform, the speech signal is time-warped and pitch-synchronized operation is achieved by a time-invariant transform. Since the PSSR has the same physical meaning as that of the CW used in the WI speech coder, the coding efficiency can be expected to be similar at low rates, while the exact reconstruction property will lead to better quality at higher rates.
|
SP16.6
|
Robust Voicing Estimation with Dynamic Time Warping
T. Wang,
V. Cuperman (University of California, Santa Barbara, USA)
This paper presents a robust voicing estimation algorithm for low bit rate harmonic speech coding. The algorithm is based on waveform time-warping followed by spectral matching based on voiced and unvoiced local spectral models. The objective of time warping is to reduce the effect of pitch variations on the voicing decision. Several adaptive techniques are used to improve the flexibility and robustness of the conventional spectral matching algorithm. An objective evaluation of the new voicing algorithm is obtained by comparison with manually estimated voicing values. Subjective tests of a sinusoidal coder using the new voicing algorithm show significantly better performance than standard spectral matching in both clean and noisy environments.
|
SP16.7
|
A Simplified Version of the ITU Algorithm for Objective Measurement of Speech Codec Quality
S. Voran (Institute for Telecommunication Sciences, USA)
ITU-T Recommendation P.861 describes an objective speech quality assessment algorithm for speech codecs. This algorithm transforms codec input and output speech signals into a perceptual domain, compares them, and generates a noise disturbance value, which can be used to estimate perceived speech quality. The performance of this algorithm can be judged by the correlation between those estimates and actual listener opinions from formal subjective listening tests. We show that significant simplifications can be made to the P.861 algorithm with minimal effect on its performance. Specifically, for the portions of the algorithm under study here, 64% of the floating point operations can be eliminated with only a 3.5% decrease in average correlation to listener opinions. The resulting simplified algorithm may offer a practical new objective function to drive parameter selections, excitation searches, and bit-allocations in speech and audio coders.
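The performance criterion mentioned above, correlation between objective estimates and listener opinions, can be computed along these lines. The sketch assumes the plain Pearson correlation coefficient; the abstract does not say which correlation measure or which score mapping the author uses, and the function name `pearson` is illustrative.

```python
import math


def pearson(estimates, opinions):
    """Pearson correlation between objective estimates and listener scores."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_o = sum(opinions) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimates, opinions))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_o = sum((o - mean_o) ** 2 for o in opinions)
    return cov / math.sqrt(var_e * var_o)
```

A coefficient near +1 or -1 indicates that the objective value tracks listener opinion closely; the sign only depends on whether the objective value rises or falls with quality.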
|
SP16.8
|
Performance of the Modified Bark Spectral Distortion Measure as an Objective Speech Quality Measure
W. Yang,
M. Benbouchta,
R. Yantorno (Temple University, USA)
The Modified Bark Spectral Distortion (MBSD), used as an objective speech quality measure, was presented previously. The MBSD measure takes into account the noise masking threshold in order to use only audible distortions in the calculation of the distortion measure. Preliminary simulation results have shown improvement of the MBSD over the conventional BSD. In this paper, performance of the MBSD is reported in terms of frame sizes, speech classes, and spectral regions. The performance of the MBSD is not very sensitive to the frame size. The performance of the MBSD for voiced speech is almost the same as for non-silent speech. The high frequency region appears to play an important role in human perception of speech quality.
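The idea of counting only audible distortions can be illustrated with the sketch below. It is not the MBSD formula from the paper: the per-band inputs, the absolute-difference distortion, and the name `audible_distortion` are assumptions chosen to show the gating by a masking threshold.

```python
def audible_distortion(ref_loudness, coded_loudness, masking_threshold):
    """Sum per-band loudness differences, ignoring masked (inaudible) bands.

    All three arguments are per-band sequences of equal length for one frame
    (hypothetical inputs; a real measure would derive them from Bark spectra).
    """
    total = 0.0
    for ref, coded, threshold in zip(ref_loudness, coded_loudness, masking_threshold):
        diff = abs(ref - coded)
        # A difference at or below the masking threshold is treated as
        # inaudible and does not contribute to the distortion.
        if diff > threshold:
            total += diff
    return total
```

For example, with per-band differences of 0.5, 0.1, and 2.0 and a threshold of 0.2 in every band, only the first and third bands contribute.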
|
SP16.9
|
Application of Meddis' Inner Hair-Cell Model to The Prediction of Subjective Speech-Quality
M. Hauenstein (University of Kiel, Germany)
This paper demonstrates how an instrumental speech-quality measure based on the comparison of auditory-nerve firing-patterns can be constructed. Four available subjective tests prove that the mean opinion scores (MOS) estimated by the objective measure are in good agreement with the subjectively obtained results.
|