ICSLP'98 Proceedings
A Very Low Bit Rate Speech Coder Using HMM With Speaker Adaptation
Authors:
Takashi Masuko, Precision and Intelligence Laboratory, Tokyo Inst. of Tech. (Japan)
Page (NA) Paper number 777
Abstract: This paper describes a speaker adaptation technique for a phonetic vocoder based on HMMs. In the vocoder, the encoder performs phoneme recognition and transmits phoneme indexes and state durations to the decoder, and the decoder synthesizes speech using an HMM-based speech synthesis technique. One of the main problems of this vocoder is that the voice characteristics of the synthetic speech depend on the HMMs used in the decoder, and are therefore fixed regardless of the input speaker. To overcome this problem, we adapt the HMMs to the input speech by transmitting transfer vectors, which carry information on the mismatch between the input speech and the HMMs. The results of subjective tests show that the performance of the proposed vocoder without quantization of the transfer vectors is comparable to that of a speaker-dependent vocoder.
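The abstract does not say how the transfer vectors are estimated or applied. As a rough illustration only, the Python sketch below assumes the simplest form: each transfer vector is the offset between the mean of the input frames aligned to an HMM state (for example by the encoder's recognition pass) and that state's output mean, and the decoder adds the offset back to the mean. The function names, the per-state granularity, and leaving states with no aligned frames unadapted are assumptions, not the paper's method; quantization of the vectors is omitted.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(frames: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty sequence of equal-length vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def transfer_vectors(aligned: Dict[int, List[Vector]],
                     hmm_means: Dict[int, Vector]) -> Dict[int, Vector]:
    """Encoder side (hypothetical): for each state with aligned input frames,
    the transfer vector is the mean of those frames minus the state mean."""
    tv: Dict[int, Vector] = {}
    for state, frames in aligned.items():
        if frames:  # states with no aligned frames get no transfer vector
            m = mean_vector(frames)
            tv[state] = [x - mu for x, mu in zip(m, hmm_means[state])]
    return tv


def adapt_means(hmm_means: Dict[int, Vector],
                tv: Dict[int, Vector]) -> Dict[int, Vector]:
    """Decoder side (hypothetical): shift each state mean by its received
    transfer vector; states without one are left unchanged."""
    adapted: Dict[int, Vector] = {}
    for state, mu in hmm_means.items():
        shift = tv.get(state, [0.0] * len(mu))
        adapted[state] = [m_d + s_d for m_d, s_d in zip(mu, shift)]
    return adapted


if __name__ == "__main__":
    hmm_means = {0: [0.0, 0.0], 1: [1.0, 1.0]}
    aligned = {0: [[0.5, 0.25], [0.5, 0.75]], 1: []}
    tv = transfer_vectors(aligned, hmm_means)
    print(tv)                          # {0: [0.5, 0.5]}
    print(adapt_means(hmm_means, tv))  # {0: [0.5, 0.5], 1: [1.0, 1.0]}
```

Only the offsets need to be transmitted, which is why the bit cost of adaptation depends on how finely the vectors are quantized.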
Multimedia files (all WAV sound files: 11.025 kHz, upsampled from 10 kHz; signed short (16-bit); mono; linear; created with sox-10 on SunOS 4.1.4):
0777_01.WAV (was: 0777_01.wav) | synthesized from original spectral parameters
0777_02.WAV (was: 0777_02.wav) | coded speech using speaker-dependent models
0777_03.WAV (was: 0777_03.wav) | coded speech using speaker-independent models without adaptation
0777_04.WAV (was: 0777_04.wav) | coded speech using adapted models without quantization of transfer vectors
0777_05.WAV (was: 0777_05.wav) | coded speech using adapted models with quantization of transfer vectors
E. Ekudden, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
R. Hagen, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
B. Johansson, Audio and Visual Technology Research, Ericsson Radio Systems AB (Sweden)
S. Hayashi, NTT Human Interface Labs (Japan)
A. Kataoka, NTT Human Interface Labs (Japan)
S. Kurihara, NTT Human Interface Labs (Japan)
This paper describes the 6.4 kbit/s CS-ACELP coder being standardized as Annex D to ITU-T G.729. The coder is based on the same building blocks as the 8 kbit/s G.729 coder, so that it can be added to G.729 as a low-complexity extension requiring little additional memory. It is fully switchable with the 8 kbit/s coder and provides additional flexibility to existing and emerging G.729 applications. The fixed codebook is a 2-pulse algebraic codebook. The adaptive codebook quantization has been changed, and a new conjugate-structure gain codebook is used. To compensate for the sparser algebraic codebook, an adaptive post-processing technique is used to enhance the quality of unvoiced speech and background noise. Subjective tests have indicated that, for speech, the coder has a performance close to that of G.729 and equivalent to that of G.723.1 at 6.3 kbit/s.
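As a worked example of the rate and excitation structure, the sketch below assumes that Annex D keeps G.729's 10 ms frame and 5 ms (40 samples at 8 kHz) subframe, so 6.4 kbit/s corresponds to 64 bits per frame versus 80 bits at 8 kbit/s. The `two_pulse_codevector` helper is hypothetical: it only shows what a 2-pulse algebraic codevector looks like (two signed unit pulses in an otherwise zero subframe) and does not reproduce the Annex D pulse-position tracks or bit allocation.

```python
from typing import List

FRAME_MS = 10        # G.729 frame length in ms (assumed unchanged in Annex D)
SUBFRAME_LEN = 40    # 5 ms subframe at 8 kHz sampling


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """kbit/s multiplied by ms gives bits, e.g. 6.4 kbit/s * 10 ms = 64 bits."""
    return round(rate_kbps * frame_ms)


def two_pulse_codevector(pos1: int, sign1: int, pos2: int, sign2: int,
                         length: int = SUBFRAME_LEN) -> List[float]:
    """Hypothetical helper: a sparse excitation vector holding two signed unit
    pulses. Positions are unrestricted here; the standard limits each pulse to
    its own track of allowed positions, which this sketch does not model."""
    for pos in (pos1, pos2):
        if not 0 <= pos < length:
            raise ValueError(f"pulse position {pos} outside subframe")
    for sign in (sign1, sign2):
        if sign not in (-1, 1):
            raise ValueError("pulse sign must be +1 or -1")
    c = [0.0] * length
    c[pos1] += sign1  # if both pulses share a position, their signs add
    c[pos2] += sign2
    return c


if __name__ == "__main__":
    print(bits_per_frame(8.0), bits_per_frame(6.4))  # 80 64
    c = two_pulse_codevector(3, +1, 17, -1)
    print([i for i, v in enumerate(c) if v != 0.0])  # [3, 17]
```

With only two nonzero samples per subframe, the excitation is much sparser than in the 8 kbit/s coder, which is the sparsity the adaptive post-processing is meant to compensate for.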
Damith J. Mudugamuwa, Royal Melbourne Institute of Technology (Australia)
Alan B. Bradley, Royal Melbourne Institute of Technology (Australia)
In voice coding applications where there is no constraint on the encoding delay, segment coding techniques can be used to reduce the data rate. For low-data-rate linear predictive coding schemes, increasing the encoding delay allows long-term temporal stationarities to be exploited on an interframe basis, reducing the transmission bandwidth or storage needed for the speech signal. Transform coding has previously been applied in low-data-rate speech coding to exploit both interframe and intraframe correlation [1][6][8]. This paper investigates the potential of an adaptive transformation scheme for a segmented parametric speech representation. The problem of transform quantization is formulated and a solution methodology is proposed. The potential benefit of the proposed adaptive transformation scheme is discussed in the context of segmented LSPs.
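The abstract does not describe the adaptive transform itself. To illustrate only the underlying idea of interframe transform coding over a segment of LSP vectors, the sketch below substitutes a fixed orthonormal DCT-II applied along time to each LSP trajectory and keeps only the first `keep` coefficients; an adaptive scheme would instead choose the transform per segment, and quantization is omitted. The function names and the truncation rule are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dct_ii(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def idct_ii(c: Sequence[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def code_segment(lsp_frames: Sequence[Sequence[float]], keep: int) -> Matrix:
    """lsp_frames: N frames x P LSPs. Transform each LSP trajectory over the
    N frames, zero all but the first `keep` coefficients, and reconstruct."""
    n_frames, order = len(lsp_frames), len(lsp_frames[0])
    if not 1 <= keep <= n_frames:
        raise ValueError("keep must be between 1 and the segment length")
    recon = [[0.0] * order for _ in range(n_frames)]
    for p in range(order):
        coeffs = dct_ii([frame[p] for frame in lsp_frames])
        coeffs = coeffs[:keep] + [0.0] * (n_frames - keep)
        for t, v in enumerate(idct_ii(coeffs)):
            recon[t][p] = v
    return recon
```

Slowly varying trajectories concentrate their energy in the low-order coefficients, so a segment of N frames can be described by fewer than N values per LSP; this is the interframe redundancy the abstract refers to.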
Julien Epps, University of New South Wales (Australia)
W. Harvey Holmes, University of New South Wales (Australia)
Telephone speech is typically bandlimited to 4 kHz, resulting in a 'muffled' quality. Coding speech with a bandwidth greater than 4 kHz reduces this distortion, but requires a higher bit rate to avoid other types of distortion. An alternative to coding wider-bandwidth speech is to exploit the correlation between the 0-4 kHz and 4-8 kHz speech bands to re-synthesize wideband speech from narrowband speech. This paper presents a method for re-synthesizing wideband speech from narrowband coded speech using sinusoidal transform coding (STC), modified codebook mapping, and a novel method for synthesizing the highband unvoiced components. Informal listening test results indicate that this method produces a significant quality improvement for speech that has been coded using narrowband standards.
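For orientation, the sketch below shows plain nearest-neighbour codebook mapping, not the paper's modified variant, whose details the abstract does not give. It assumes two jointly trained, index-aligned codebooks: narrowband feature vectors and the highband envelopes observed alongside them. The feature representation and the toy codebooks in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest_index(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to x."""
    best, best_d = 0, math.inf
    for i, cw in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(cw, x))
        if d < best_d:
            best, best_d = i, d
    return best


def map_highband(nb_features: Vector, nb_codebook: Sequence[Vector],
                 hb_codebook: Sequence[Vector]) -> Vector:
    """Plain codebook mapping: return the highband entry paired with the
    nearest narrowband codeword (codebooks are assumed index-aligned)."""
    if len(nb_codebook) != len(hb_codebook):
        raise ValueError("codebooks must have the same number of entries")
    return list(hb_codebook[nearest_index(nb_codebook, nb_features)])


if __name__ == "__main__":
    nb_codebook = [[0.0, 0.0], [1.0, 1.0]]            # toy narrowband codewords
    hb_codebook = [[0.1, 0.1, 0.1], [0.8, 0.6, 0.4]]  # paired toy highband envelopes
    print(map_highband([0.9, 1.2], nb_codebook, hb_codebook))  # [0.8, 0.6, 0.4]
```

The mapping only needs the decoded narrowband signal, so no extra bits are transmitted; the estimated highband envelope would then drive the synthesis of the 4-8 kHz components.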
Weihua Zhang, ioWave, Inc. (USA)
W. Harvey Holmes, The University of New South Wales, School of Electrical Engineering (Australia)
The Spectral Envelope Estimation Vocoder (SEEVOC) is a successful spectral envelope estimation method that plays an important role in low bit rate speech coding based on the sinusoidal model. This paper investigates the properties and limitations of the SEEVOC algorithm and shows that the initial pitch estimate must be more accurate than is commonly supposed. It also generalizes and optimizes the SEEVOC algorithm through the choice of the search range parameters a and b. Rules for the optimum choice of a and b are derived from both theoretical analysis and experimental results. The effects of noise on the SEEVOC algorithm are also investigated. Experimental results show that, for voiced speech in the presence of noise, the SEEVOC algorithm performs better than linear prediction (LP) analysis.
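The SEEVOC peak search can be sketched as follows, assuming a magnitude spectrum sampled on FFT bins and an initial pitch estimate `f0_bin` expressed in bins. The first peak is the largest sample in [a·f0, b·f0]; after a peak at bin k, the next one is the largest sample in [k + a·f0, k + b·f0]; the envelope is then interpolated between the peaks. The defaults a = 0.5 and b = 1.5 and the log-domain linear interpolation are illustrative assumptions, not the optimum values derived in the paper.

```python
import bisect
import math
from typing import List, Sequence


def seevoc_peaks(mag: Sequence[float], f0_bin: float,
                 a: float = 0.5, b: float = 1.5) -> List[int]:
    """Pick one spectral peak per pitch-spaced search interval.

    mag: magnitude spectrum samples (bin 0 = DC)
    f0_bin: initial pitch estimate in FFT bins
    a, b: search range parameters (assumed defaults, 0 < a < b)
    """
    if f0_bin <= 0 or not 0 < a < b:
        raise ValueError("need f0_bin > 0 and 0 < a < b")
    peaks: List[int] = []
    lo, hi = a * f0_bin, b * f0_bin  # first interval: [a*f0, b*f0]
    while lo < len(mag):
        start = int(math.ceil(lo))
        stop = min(int(math.floor(hi)), len(mag) - 1)
        if start > stop:
            break
        k = max(range(start, stop + 1), key=lambda i: mag[i])
        peaks.append(k)
        lo, hi = k + a * f0_bin, k + b * f0_bin  # next interval, relative to k
    return peaks


def seevoc_envelope(mag: Sequence[float], peaks: Sequence[int]) -> List[float]:
    """Linear interpolation of log magnitude between peaks; the end values are
    held constant outside the first and last peak. Peak magnitudes must be > 0."""
    if not peaks:
        raise ValueError("no peaks to interpolate")
    logs = [math.log(mag[k]) for k in peaks]
    env = []
    for i in range(len(mag)):
        if i <= peaks[0]:
            env.append(mag[peaks[0]])
        elif i >= peaks[-1]:
            env.append(mag[peaks[-1]])
        else:
            j = bisect.bisect_right(peaks, i)  # peaks[j-1] <= i < peaks[j]
            k0, k1 = peaks[j - 1], peaks[j]
            t = (i - k0) / (k1 - k0)
            env.append(math.exp((1 - t) * logs[j - 1] + t * logs[j]))
    return env


if __name__ == "__main__":
    mag = [1.0] * 101
    for h in range(1, 11):
        mag[10 * h] = 5.0  # harmonics every 10 bins
    print(seevoc_peaks(mag, f0_bin=10))  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

Because each interval is sized by `f0_bin`, the choice of a and b trades robustness to pitch-estimate error against the risk of picking two peaks from the same harmonic or skipping one, which is the trade-off the paper's optimization addresses.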