ICASSP '98

Abstract - SP28


 
SP28.1

   
A Variational Approach for Estimating Vocal Tract Shapes from Speech Signal
Y. Laprie, B. Mathieu  (LORIA, France)
This paper presents a novel approach to recovering articulatory trajectories from the speech signal using a variational calculus method and Maeda's articulatory model. The acoustic-to-articulatory mapping is generally assessed by a double criterion: the acoustic proximity of results to acoustic data and the smoothness of articulatory trajectories. Most existing methods are unable to exploit the two criteria simultaneously, or at least at the same level. In contrast, our variational calculus approach combines the two criteria simultaneously and ensures global acoustic and articulatory consistency without further optimization. This method gives rise to an iterative process that optimizes a startup solution given by an improved lookup algorithm. Codebooks generated with an articulatory model show nonuniform sampling of the acoustic space due to nonlinearities of the acoustic-to-articulatory mapping. We therefore designed an improved lookup algorithm that builds realistic articulatory trajectories which are not necessarily defined throughout the speech signal.
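The double criterion described above can be illustrated with a minimal sketch (not the authors' implementation, which uses variational calculus with Maeda's model): a combined cost of acoustic proximity to lookup-table targets plus a smoothness penalty on the trajectory, minimized here by plain gradient descent on a toy one-parameter track.

```python
# Minimal sketch of a combined acoustic-proximity + smoothness cost.
# The quadratic pull toward hypothetical "targets" stands in for the real
# acoustic distance; lam weights the smoothness of the articulatory track.

def refine_trajectory(start, targets, lam=1.0, step=0.1, iters=500):
    """Minimize sum((x_t - target_t)^2) + lam * sum((x_{t+1} - x_t)^2)."""
    x = list(start)
    n = len(x)
    for _ in range(iters):
        grad = [0.0] * n
        for t in range(n):
            grad[t] += 2.0 * (x[t] - targets[t])          # acoustic proximity
            if t > 0:
                grad[t] += 2.0 * lam * (x[t] - x[t - 1])  # smoothness (left)
            if t < n - 1:
                grad[t] += 2.0 * lam * (x[t] - x[t + 1])  # smoothness (right)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x

# A jagged startup solution is pulled into a smooth, monotone trajectory.
smoothed = refine_trajectory([0.0, 5.0, 0.0, 5.0], [0.0, 4.0, 1.0, 5.0], lam=2.0)
```

Because both terms enter one objective, the smoothing and the acoustic fit trade off jointly instead of being applied in separate passes.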
 
SP28.2

   
Speech Intelligibility in the Presence of Cross-Channel Spectral Asynchrony
T. Arai, S. Greenberg  (Int'l Computer Science Institute, USA)
The spectrum of spoken sentences was partitioned into quarter-octave channels, and the onset of each channel was shifted in time relative to the others so as to desynchronize spectral information across the frequency axis. Human listeners are remarkably tolerant of cross-channel spectral asynchrony induced in this fashion. Speech intelligibility remains relatively unimpaired until the average asynchrony spans three or more phonetic segments. Such perceptual robustness is correlated with the magnitude of the low-frequency (3-6 Hz) modulation spectrum and thus highlights the importance of syllabic segmentation and analysis for robust processing of spoken language. High-frequency channels (>1.5 kHz) play a particularly important role when the spectral asynchrony is large enough to significantly reduce the power in the low-frequency modulation spectrum (analogous to acoustic reverberation), and may thereby account for the deterioration of speech intelligibility among the hearing impaired under real-world conditions of acoustic interference such as background noise and reverberation.
 
SP28.3

   
Acoustic Breathiness Measures in the Description of Pathologic Voices
M. Fröhlich, D. Michaelis, H. Strube  (Drittes Physikalisches Institut Göttingen, Germany)
One important perceptual attribute of voice quality is breathiness. Since breathiness is generally regarded as being caused by glottal air leakage, acoustic measures related to breathiness may be used to distinguish between different physiological phonation conditions for pathological voices. Seven "breathiness features" described in the literature plus one self-developed measure (the glottal-to-noise excitation ratio, GNE) are compared for their ability to distinguish between different well-defined pathological phonation mechanisms. It is found that only GNE allows a distinction between all the pathological groups and both the normal and aphonic reference groups. Furthermore, GNE is among the measures showing the most significant distinctions between the different pathologic phonation mechanism groups. Therefore GNE should be given preference over the other features in the independent assessment of glottal air leakage or "breathiness" for moderately or highly disturbed voices.
 
SP28.4

   
Automatic Estimation of Formant and Voice Source Parameters Using a Subspace-Based Algorithm
C. Yang, H. Kasuya  (Utsunomiya University, Japan)
An automatic method is proposed to jointly estimate formant and voice source parameters from a speech signal. A Rosenberg-Klatt model is used to approximate the voicing source waveform for voiced speech, whereas a white noise signal is assumed for unvoiced speech. The vocal tract characteristic is represented by an IIR filter. The formant and anti-formant values are calculated from the IIR filter coefficients, which are estimated using a subspace-based system identification algorithm, while an exhaustive search procedure is applied to obtain the optimal source parameter values, with an error criterion introduced in the frequency domain. An experiment has been performed to examine the performance of the proposed method with natural speech. The results show that the source parameters, such as open and closure instants, estimated by the method are in good agreement with those defined on the electroglottograph signals, and that the estimated formant values are also accurate.
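The final step of mapping a fitted all-pole filter to formant values can be sketched as follows (the coefficients here are a toy example, not the paper's subspace estimates): formant frequencies follow from the angles of the complex poles of the AR polynomial.

```python
# Sketch: formant frequencies from the poles of an all-pole vocal tract
# filter. The 2-formant filter below is constructed artificially; a real
# system would obtain the coefficients from the speech signal.
import numpy as np

def formants_from_ar(a, fs):
    """a = [1, a1, ..., ap] AR polynomial coefficients; returns formants in Hz."""
    poles = np.roots(a)
    # keep one pole of each conjugate pair, reasonably close to the unit circle
    poles = [p for p in poles if p.imag > 0 and abs(p) > 0.7]
    return sorted(np.angle(p) * fs / (2 * np.pi) for p in poles)

# Toy filter with poles placed at 500 Hz and 1500 Hz, fs = 8 kHz.
fs = 8000.0
roots = [0.97 * np.exp(sign * 1j * 2 * np.pi * f / fs)
         for f in (500, 1500) for sign in (1, -1)]
a = np.poly(roots).real
print(formants_from_ar(a, fs))   # recovers frequencies near 500 and 1500 Hz
```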
 
SP28.5

   
Estimating the Speaking Rate by Vowel Detection
T. Pfau, G. Ruske  (Inst. for Human-Machine-Communication, TU Muenchen, Germany)
We present a new feature-based method for estimating the speaking rate by detecting vowels in continuous speech. The features used are the modified loudness and the zero-crossing rate, both of which are calculated in the standard preprocessing unit of our speech recognition system. As vowels generally correspond to syllable nuclei, the feature-based vowel rate is comparable to an estimate of the lexically based syllable rate. The vowel detector presented is tested on the spontaneously spoken German Verbmobil task and is evaluated using manually transcribed data. The lowest vowel error rate (including insertions) on the defined test set is 22.72% on average over all vowels. Additionally, correlation coefficients between our estimates and reference rates are calculated. These coefficients reach up to 0.796 and are therefore comparable to those for lexically based measures (such as the phone rate) on other tasks. The accuracy is sufficient to use our measurement for speaking rate adaptation.
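The idea of combining an energy-like feature with the zero-crossing rate can be sketched as follows (thresholds and features are invented for illustration; the authors use a tuned modified-loudness feature): frames with high energy and low zero-crossing rate are flagged as vocalic, and distinct vocalic runs per second give a speaking-rate estimate.

```python
# Sketch of vowel-rate estimation from frame energy and zero-crossing rate.
# Thresholds are hypothetical, not the paper's tuned values.
import math

def zero_crossing_rate(frame):
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)) / len(frame)

def energy(frame):
    return sum(s * s for s in frame) / len(frame)

def vowel_rate(frames, frame_dur, e_thresh=0.01, z_thresh=0.2):
    """Count distinct vocalic runs per second of signal."""
    vocalic = [energy(f) > e_thresh and zero_crossing_rate(f) < z_thresh
               for f in frames]
    onsets = sum(1 for prev, cur in zip([False] + vocalic, vocalic)
                 if cur and not prev)
    return onsets / (len(frames) * frame_dur)

# Synthetic frames: vowel-like = strong low-frequency sine (low ZCR),
# consonant-like = weak rapidly alternating signal (high ZCR).
vowel = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
noise = [0.001 * (-1) ** i for i in range(100)]
frames = [noise, vowel, vowel, noise, vowel, noise]
rate = vowel_rate(frames, frame_dur=0.1)   # 2 vocalic runs in 0.6 s
```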
 
SP28.6

   
A New Algorithm for Incorporating Acoustic Constraints into the Inverse Speech Problem
J. DeLucia, F. Kochman  (Center for Communications Research, USA)
We describe a new noniterative algorithm that generates the unique area function determined by the vocal tract length, the lip radius, and the spectral pair consisting of poles of the transfer function and zeros of the input impedance function. Our analysis is restricted to the class of piecewise-constant area functions defined on an even number of equal length intervals. The resulting algorithm involves fewer floating point operations per evaluation than the analogous method of Paige and Zue [4]. A method which uses a corpus of X-ray data is discussed for setting the higher order unobservable pole/zero frequencies.
 
SP28.7

   
Cascade Recursive Least Squares with Subsection Adaptation for AR Parameter Estimation
G. Zakaria, L. Beex  (Virginia Tech, USA)
We propose the adaptive cascade recursive least squares (CRLS-SA) algorithm for the estimation of linear prediction, or AR model, coefficients. The CRLS-SA algorithm features low computational complexity since each section is adapted independently of the other sections. It is shown here that the CRLS-SA algorithm can yield AR coefficient estimates closer to the true values, for some known signals, than the widely used autocorrelation method. CRLS-SA converges faster to the true values of the model, which is critically important for estimation from short data records. While the computational effort of CRLS-SA is a factor of 3 to 4 higher than that of the autocorrelation method, the improvement in performance makes it a viable alternative for a number of applications.
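For orientation, a plain (non-cascade) recursive least squares AR estimator is sketched below; the paper's contribution is the cascade structure with subsection adaptation, which this single-section baseline does not attempt.

```python
# Sketch: standard RLS estimation of AR(p) coefficients, x[n] ~ w @ past.
# This is the textbook RLS recursion, not CRLS-SA itself.
import numpy as np

def rls_ar(x, p=2, lam=0.999, delta=100.0):
    w = np.zeros(p)              # AR coefficient estimates
    P = np.eye(p) * delta        # inverse correlation matrix
    for n in range(p, len(x)):
        u = np.array(x[n - p:n][::-1])    # past samples, most recent first
        k = P @ u / (lam + u @ P @ u)     # gain vector
        e = x[n] - w @ u                  # a priori prediction error
        w = w + k * e
        P = (P - np.outer(k, u @ P)) / lam
    return w

# Toy AR(2) process with known coefficients [1.5, -0.7].
rng = np.random.default_rng(0)
x = [0.0, 0.0]
for _ in range(3000):
    x.append(1.5 * x[-1] - 0.7 * x[-2] + 0.1 * rng.standard_normal())
w = rls_ar(x)   # estimates approach [1.5, -0.7]
```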
 
SP28.8

   
Spectral Stability Based Event Localizing Temporal Decomposition
A. Nandasena, M. Akagi  (Japan Advanced Institute of Science and Technology, Japan)
In this paper a new approach to temporal decomposition (TD) of speech, called "Spectral Stability Based Event Localizing Temporal Decomposition" and abbreviated SBEL-TD, is presented. The original method of TD proposed by Atal is known to have the drawbacks of high computational cost and instability in the number and locations of events [1]. In SBEL-TD, event localization is performed based on a maximum spectral stability criterion, which overcomes the event instability problem of Atal's method. SBEL-TD also avoids the computationally costly singular value decomposition routine used in Atal's method, resulting in a computationally simpler TD algorithm. Simulation results show that an average spectral distortion of about 1.5 dB can be achieved with LSFs as the spectral parameters. We have also shown that the temporal pattern of the speech excitation parameters can be well described using the SBEL-TD technique.
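The spectral-stability idea can be sketched in miniature (this is an illustration, not the SBEL-TD algorithm itself): events are placed at frames where the spectral parameter track changes least, i.e. at local minima of the frame-to-frame parameter distance.

```python
# Sketch: locate "events" at frames of maximal spectral stability,
# taken here as local minima of consecutive-frame parameter distance.

def event_frames(params):
    """params: list of spectral parameter vectors (e.g. LSFs), one per frame."""
    dist = [sum((a - b) ** 2 for a, b in zip(params[i], params[i + 1])) ** 0.5
            for i in range(len(params) - 1)]
    return [i for i in range(1, len(dist) - 1)
            if dist[i] < dist[i - 1] and dist[i] <= dist[i + 1]]

# Two near-stationary stretches joined by a rapid transition: the event
# lands inside the second stable stretch.
track = [[0.0], [0.0], [0.1], [1.0], [2.0], [2.0], [2.1]]
print(event_frames(track))
```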
 
