ICASSP '98
Abstract - SP18

SP18.1
Enhanced Harmonic Coding of Speech with Frequency Domain Transition Modeling
C. Li,
V. Cuperman (University of California, Santa Barbara, USA)
A major source of audible distortion in current low-bit-rate harmonic speech coding algorithms is the ineffective modeling of transitional speech signals such as onsets and plosives. This paper introduces a new method of modeling transitional speech based on a frequency-domain approach. The approach uses a modified harmonic model capable of producing non-periodic pulse sequences, in conjunction with a closed-loop analysis-by-synthesis scheme for parameter estimation and quantization. The structure of a speech coding system based on this model is outlined. The proposed approach is shown to outperform transition encoding based on a standard CELP algorithm at rates of 4-8 kb/s.
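As background for the modified harmonic model described above, a minimal sketch of baseline harmonic synthesis (a frame built as a sum of sinusoids at multiples of the pitch frequency) is shown below; the function name, sampling rate, and frame length are illustrative assumptions, not details from the paper.

```python
import numpy as np

def synthesize_harmonics(f0, amps, fs=8000, n=160):
    # One synthetic frame as a sum of sinusoids at multiples of f0.
    # amps[k-1] is the amplitude of harmonic k; harmonics at or above
    # the Nyquist frequency are dropped.
    t = np.arange(n) / fs
    frame = np.zeros(n)
    for k, a in enumerate(amps, start=1):
        if k * f0 >= fs / 2:
            break
        frame += a * np.cos(2 * np.pi * k * f0 * t)
    return frame

# 100 Hz fundamental with three decaying harmonic amplitudes
frame = synthesize_harmonics(100.0, [1.0, 0.5, 0.25])
```

A harmonic coder transmits only `f0` and the (quantized) amplitudes, which is what makes these schemes attractive at low bit rates.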
SP18.2
Combined Harmonic and Waveform Coding of Speech at Low Bit Rates
E. Shlomot,
V. Cuperman,
A. Gersho (University of California, Santa Barbara, USA)
In this paper we present a new approach to speech coding that combines frequency-domain harmonic coding for periodic and "noise-like" unvoiced segments of speech with a time-domain waveform coder for transition signals. This hybrid coder requires special handling of the boundary between voiced and transition segments. We outline the details of a 4 kb/s hybrid coder and present subjective quality test results for this coder.
SP18.3
A Mixed Sinusoidally Excited Linear Prediction Coder at 4 Kb/s and below
S. Yeldener (COMSAT, USA);
J. De Martin,
V. Viswanathan (Texas Instruments, USA)
There is currently a great deal of interest in the development of speech coding algorithms capable of delivering toll quality at 4 kb/s and below. For synthesizing high-quality speech, accurate representation of the voiced portions of speech is essential. At bit rates of 4 kb/s and below, conventional Code Excited Linear Prediction (CELP) is unlikely to provide the appropriate degree of periodicity. It has been shown that good-quality low-bit-rate speech coding can be obtained with frequency-domain techniques such as Sinusoidal Transform Coding (STC), Multi-Band Excitation (MBE), Mixed Excitation Linear Prediction (MELP), and Multi-Band LPC (MB-LPC) vocoders. In this paper, a speech coding algorithm based on an improved version of MB-LPC is presented. The main features of this algorithm are a multi-stage time/frequency pitch estimation and an improved mixed voicing representation. An efficient quantization scheme for the spectral amplitudes of the excitation, called Formant Weighted Vector Quantization, is also used. This improved coder, called Mixed Sinusoidally Excited Linear Prediction (MSELP), yields an unquantized model whose speech quality is better than that of 32 kb/s ADPCM. Initial efforts towards a fully quantized 4 kb/s coder, although not yet successful in achieving the toll-quality goal, have produced good output speech quality.
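The mixed voicing representation mentioned above builds on the general idea of mixed excitation: harmonic content below a voicing cutoff frequency and noise above it. The sketch below illustrates only that general idea; the band split, cutoff value, and all parameters are assumptions for illustration, not the paper's scheme.

```python
import numpy as np

def mixed_excitation(f0, cutoff, fs=8000, n=160, seed=0):
    # Harmonics of f0 strictly below the voicing cutoff...
    t = np.arange(n) / fs
    voiced = np.zeros(n)
    k = 1
    while k * f0 < cutoff:
        voiced += np.cos(2 * np.pi * k * f0 * t)
        k += 1
    # ...plus white noise restricted to frequencies above the cutoff
    # (crude high-pass: zero out the low-frequency FFT bins).
    noise = np.random.default_rng(seed).standard_normal(n)
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spec[freqs < cutoff] = 0
    unvoiced = np.fft.irfft(spec, n)
    return voiced + unvoiced

exc = mixed_excitation(100.0, 1000.0)
```

In an actual MB-LPC-style coder the cutoff (or per-band voicing decisions) would be estimated from the input signal rather than fixed.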
SP18.4
A 1.7 kb/s MELP Coder with Improved Analysis and Quantization
A. McCree,
J. De Martin (Texas Instruments, USA)
This paper describes our new Mixed Excitation Linear Predictive (MELP) coder designed for very low bit rate applications. This new coder, through algorithmic improvements and enhanced quantization techniques, produces better speech quality at 1.7 kb/s than the new U.S. Federal Standard MELP coder at 2.4 kb/s. Key features of the coder are an improved pitch estimation algorithm and a Line Spectral Frequencies (LSF) quantization scheme that requires only 21 bits per frame. With channel coding, this new MELP coder is capable of maintaining good speech quality even in severely degraded channels, at a total bit rate of only 3 kb/s.
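The abstract does not detail how the 21-bit-per-frame LSF budget is met, but such budgets are commonly achieved with split or multi-stage vector quantization (e.g. three 7-bit sub-codebooks would total 21 bits). The sketch below shows a generic split-VQ encoder with toy illustrative codebooks, not the authors' actual scheme.

```python
import numpy as np

def split_vq_encode(lsf, codebooks):
    # Split VQ: partition the LSF vector into sub-vectors and match
    # each one to its nearest codeword (minimum squared error).
    indices, start = [], 0
    for cb in codebooks:                 # cb has shape (codebook_size, sub_dim)
        dim = cb.shape[1]
        sub = lsf[start:start + dim]
        err = np.sum((cb - sub) ** 2, axis=1)
        indices.append(int(np.argmin(err)))
        start += dim
    return indices

# toy 2+2 split with two-entry codebooks (illustrative values only)
codebooks = [np.array([[0.2, 0.4], [0.5, 0.9]]),
             np.array([[1.2, 1.6], [2.0, 2.4]])]
indices = split_vq_encode(np.array([0.48, 0.85, 2.1, 2.3]), codebooks)
```

Only the codeword indices are transmitted, so the bit cost per frame is the sum of log2(codebook size) over the sub-vectors.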
SP18.5
A New Approach to Modeling Excitation in Very Low-Rate Speech Coding
S. Ghaemmaghami,
M. Deriche (Queensland University of Technology - Signal Processing Research Center, Australia)
A new method for the two-band approximation of excitation signals in an LPC model, intended to improve speech naturalness in very-low-rate coding, is proposed. Based on a simplified Multi-Band Excitation model, the method accurately determines the degree of periodicity using the concept of Instantaneous Frequency (IF) estimation in the frequency domain. The harmonic structure in the spectrum of the LPC residual, within individual bands, is identified using the flatness of the IF as a criterion for pitch and voicing detection. On this basis, the excitation is modelled by combining a predefined periodic signal in the lower band with a random signal in the higher band. It is shown that this considerably improves the naturalness of reconstructed speech in very-low-rate coding compared with that obtained using traditional binary excitation [1]. The performance of the technique is also reported for Temporal Decomposition (TD) based coding at 800 b/s.
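The IF-flatness cue can be illustrated with a simple sketch: estimate the instantaneous frequency of a band from the phase of its analytic signal and measure its spread, which is small for a near-periodic band and large for an aperiodic one. The construction below is a generic illustration under that assumption, not the authors' exact estimator.

```python
import numpy as np

def if_flatness(band, fs=8000):
    # Analytic signal via the frequency-domain Hilbert construction:
    # double the positive-frequency bins, zero the negative ones.
    n = len(band)
    spec = np.fft.fft(band)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    analytic = np.fft.ifft(spec * h)
    # Instantaneous frequency = derivative of the unwrapped phase.
    phase = np.unwrap(np.angle(analytic))
    inst_f = np.diff(phase) * fs / (2.0 * np.pi)
    # Spread of the IF: near zero for a periodic band, large for noise.
    return float(np.std(inst_f))

fs = 8000
t = np.arange(256) / fs
tone = np.cos(2 * np.pi * 250.0 * t)                    # periodic band
noise = np.random.default_rng(0).standard_normal(256)   # aperiodic band
```

Thresholding this spread per band yields a voiced/unvoiced decision of the kind the abstract describes.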
SP18.6
A Spectrally Mixed Excitation Vocoder (SMX) with Robust Parameter Determination
C. Yong Duk,
K. Moo Young,
K. Sang Ryong (Samsung Advanced Institute of Technology, Korea)
Sinusoidal speech coders have been widely studied for low-bit-rate coding around 4 kbit/s. However, errors in the estimation of the sinusoidal model parameters can seriously degrade the speech quality; in general, these errors are caused by various types of speech signal or by background noise. In this paper we propose a sinusoidal speech coder with robust parameter determination methods: a spectro-temporal autocorrelation method for robust pitch determination, a frequency-shifting method for robust voicing-level measurement, and a residual-spectrum magnitude coding method for spectral magnitude compensation. Experimental results demonstrate the robustness of the proposed techniques, and an informal listening test of the synthesized speech confirms the effectiveness of the incorporated schemes.
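The spectro-temporal autocorrelation method is not detailed in the abstract; the sketch below shows only the plain time-domain autocorrelation baseline that such methods refine: pick the lag maximizing the normalized autocorrelation within a plausible pitch range. All parameter values are illustrative assumptions.

```python
import numpy as np

def autocorr_pitch(frame, fs=8000, f_lo=60.0, f_hi=400.0):
    # Search lags corresponding to pitch frequencies in [f_lo, f_hi]
    # and return the pitch whose lag maximizes the normalized
    # autocorrelation of the (zero-mean) frame.
    frame = frame - np.mean(frame)
    lo, hi = int(fs / f_hi), int(fs / f_lo)
    energy = float(np.dot(frame, frame))
    best_lag, best_r = lo, -np.inf
    for lag in range(lo, hi + 1):
        r = float(np.dot(frame[:-lag], frame[lag:])) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

t = np.arange(320) / 8000.0
pitch = autocorr_pitch(np.cos(2 * np.pi * 200.0 * t))
```

A spectro-temporal variant additionally exploits autocorrelation in the spectral domain to resist pitch-halving and noise errors, which is the robustness issue the paper targets.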
SP18.7
Segmental Vocoder - Going Beyond the Phonetic Approach
J. Cernocky (FEI VUT Brno, Czech Republic);
G. Baudoin (ESIEE Paris, France);
G. Chollet (ENST Paris, France)
This paper addresses the problem of very-low-bit-rate segmental speech coding. The basic units are found automatically in the training database using temporal decomposition, vector quantization, and multigrams, and are modelled by HMMs. The coding is based on recognition and synthesis. In single-speaker tests, we obtained intelligible and natural-sounding speech at a mean rate of 211.2 b/s. Finally, future extensions of our scheme (diphone-like synthesis and speaker adaptation), as well as the possible use of automatically derived units in recognition, are discussed.
SP18.8
A Very Low Bit Rate Speech Coder Using HMM-Based Speech Recognition/Synthesis
K. Tokuda (Nagoya Institute of Technology, Japan);
T. Masuko (Tokyo Institute of Technology, Japan);
J. Hiroi (Nagoya Institute of Technology, Japan);
T. Kobayashi (Tokyo Institute of Technology, Japan);
T. Kitamura (Nagoya Institute of Technology, Japan)
This paper presents a very low bit rate speech coder based on hidden Markov models (HMMs). The encoder carries out phoneme recognition and transmits phoneme indexes, state durations, and pitch information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indexes, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMM using an ML-based speech parameter generation technique. Finally, synthetic speech is obtained by exciting the MLSA (Mel Log Spectrum Approximation) filter, whose coefficients are given by the mel-cepstral coefficients, according to the pitch information. A subjective listening test shows that the performance of the proposed coder at about 150 bit/s (for test data including 26% silence) is comparable to that of a VQ-based vocoder at 400 bit/s (= 8 bit/frame x 50 frame/s), without pitch quantization in either coder.
|
< Previous Abstract - SP17 |
SP19 - Next Abstract > |
|