ICASSP '98 Main Page
Abstract - SP17 |
 |
SP17.1
|
Classification of Speech Under Stress Based on Features Derived from the Nonlinear Teager Energy Operator
G. Zhou,
J. Hansen,
J. Kaiser (Duke University, USA)
Studies have shown that distortion introduced by stress or emotion can severely reduce speech recognition accuracy. Techniques for detecting or assessing the presence of stress could help neutralize stressed speech and improve the robustness of speech recognition systems. Although some acoustic variables derived from linear speech production theory have been investigated as indicators of stress, they are not consistent. In this paper, three new features derived from the nonlinear Teager Energy Operator (TEO) are investigated for stress assessment and classification. It is believed that TEO based features are better able to reflect the nonlinear airflow structure of speech production under adverse stressful conditions. The proposed features outperform stress classification using traditional pitch by +22.5% for the Normalized TEO Autocorrelation Envelope Area feature (TEO-Auto-Env), and by +28.8% for the TEO based Pitch feature (TEO-Pitch). Overall neutral/stress classification rates are also more consistent for TEO based features (TEO-Auto-Env: standard deviation = 5.15, TEO-Pitch: standard deviation = 7.83) than for pitch (standard deviation = 23.40). In addition, evaluation results using actual emergency aircraft cockpit stressed speech from NATO show that TEO-Auto-Env works best for stress assessment.
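For readers unfamiliar with the operator, the following is a minimal sketch of the standard discrete-time Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It shows only the raw operator, not the TEO-Auto-Env or TEO-Pitch features of the paper; the function name and the test tone are assumptions made for illustration.

```python
import math

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Returns values for the interior samples n = 1 .. len(x)-2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Illustrative input (assumed, not from the paper): a pure tone.
amplitude, omega = 2.0, 0.3  # omega in radians per sample
tone = [amplitude * math.cos(omega * n) for n in range(64)]
energy = teager_energy(tone)

# For a pure tone the operator output is constant: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
```

For a single sinusoid the output depends on both amplitude and frequency, which is why the operator is often described as tracking the energy of the source that generates the signal rather than the signal's squared amplitude alone.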
|
SP17.2
|
Improved Robustness for Speech Recognition Under Noisy Conditions Using Correlated Parallel Model Combination
J. Hung,
J. Shen,
L. Lee (Institute of Information Science, Academia Sinica, Taiwan, ROC)
The parallel model combination (PMC) technique has been shown to achieve very good performance for speech recognition under noisy conditions. In this approach, the speech signal and the noise are assumed uncorrelated during modeling. In this paper, a new correlated PMC is proposed by properly estimating and modeling the nonzero correlation between the speech signal and the noise. Preliminary experimental results show that this correlated PMC can provide significant improvements over the original PMC in terms of both the model differences and the recognition accuracies. Error rate reduction on the order of 14% can be achieved.
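The role of the uncorrelated assumption can be illustrated with a small sketch. It uses the general identity that the power of a sum of two signals is S + N + 2*rho*sqrt(S*N), where rho is their correlation coefficient; setting rho = 0 recovers the usual log-add combination of log-spectral means. The function name and the scalar rho are assumptions for illustration, not the estimator proposed in the paper.

```python
import math

def combine_log_means(mu_speech, mu_noise, rho=0.0):
    """Combine log-spectral means of clean speech and noise, per channel.

    rho = 0 gives the uncorrelated log-add combination; a nonzero rho
    (hypothetical correlation coefficient in [-1, 1]) adds a cross term.
    """
    combined = []
    for ms, mn in zip(mu_speech, mu_noise):
        s, n = math.exp(ms), math.exp(mn)
        power = s + n + 2.0 * rho * math.sqrt(s * n)
        if power <= 0.0:
            raise ValueError("combined power must be positive")
        combined.append(math.log(power))
    return combined

# Hypothetical single-channel example: speech power 4, noise power 1.
uncorrelated = combine_log_means([math.log(4.0)], [math.log(1.0)])
correlated = combine_log_means([math.log(4.0)], [math.log(1.0)], rho=0.5)
```

With these illustrative values the combined power is 5 when rho = 0 and 7 when rho = 0.5, so ignoring a positive correlation underestimates the noisy-speech mean.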
|
SP17.3
|
Multi-Resolution Cepstral Features for Phoneme Recognition Across Speech Sub-Bands
P. McCourt,
S. Vaseghi,
N. Harte (Queen's University of Belfast, N. Ireland)
Multi-resolution sub-band cepstral features strive to exploit discriminative cues in localised regions of the spectral domain by supplementing the full bandwidth cepstral features with sub-band cepstral features derived from several levels of sub-band decomposition. Multi-resolution feature vectors, formed by concatenation of the sub-band cepstral features into an extended feature vector, are shown to yield better performance than conventional MFCCs for phoneme recognition on the TIMIT database. Possible strategies for the recombination of partial recognition scores from independent multi-resolution sub-band models are explored. By exploiting the sub-band variations in signal to noise ratio for linearly weighted recombination of the log likelihood probabilities, we obtained improved phoneme recognition performance in broadband noise compared to MFCC features. This is an advantage over a purely sub-band approach using nonlinear recombination, which is robust only to narrow band noise.
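The idea of linearly weighted recombination can be sketched as follows. The abstract does not give the weighting rule, so this example assumes weights proportional to each band's linear SNR, normalized to sum to one; the function name and all numbers are hypothetical.

```python
def recombine_log_likelihoods(subband_loglik, subband_snr_db):
    """Linearly weighted sum of per-sub-band log likelihoods.

    Weights are proportional to the linear SNR of each band and sum to 1
    (an assumed weighting rule; the paper's exact rule is not given here).
    """
    linear_snr = [10.0 ** (snr / 10.0) for snr in subband_snr_db]
    total = sum(linear_snr)
    weights = [s / total for s in linear_snr]
    score = sum(w * ll for w, ll in zip(weights, subband_loglik))
    return score, weights

# Hypothetical two-band example: band 0 is clean (20 dB), band 1 is noisy (0 dB).
score, weights = recombine_log_likelihoods([-10.0, -50.0], [20.0, 0.0])
```

Under this assumed rule the noisy band contributes about 1% of the combined score, so an unreliable log likelihood from that band has little influence on the decision.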
|
SP17.4
|
Improved Model Parameter Compensation Method for Noise-Robust Speech Recognition
Y. Chang,
Y. Chung,
S. Park (LGIC, Korea)
In this paper we study model parameter compensation methods for noise-robust speech recognition based on CDHMM. First, we propose a modified PMC method in which the adjustment term in the model parameter adaptation is varied depending on the mixture components of the HMM to obtain a more reliable model. A state-dependent association factor that controls the average parameter variability of the Gaussian mixtures and the variability of the respective mixtures is used to find the final optimum model parameters. Second, we propose a re-estimation solution for the environmental variables, additive noise and spectral tilt, based on the expectation-maximization (EM) algorithm in the cepstral domain. The approach is based on the vector Taylor series (VTS) approximation. In our experiments on speaker-independent isolated Korean word recognition, the modified PMC shows better performance than Gales' PMC, and the proposed VTS is superior to both of them.
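The Taylor-series idea behind VTS can be sketched on the simplest case: a single log-spectral channel with additive noise only, where noisy speech is y = x + log(1 + exp(n - x)). The example below linearizes this function around an expansion point. It is a scalar illustration under that assumption; it omits the spectral tilt term, the cepstral-domain transform, and the EM re-estimation described in the abstract, and the function names are hypothetical.

```python
import math

def noisy_log_spectrum(x, n):
    """Log-spectral noisy speech y for clean speech x and additive noise n."""
    return x + math.log1p(math.exp(n - x))

def first_order_vts(x0, n0, x, n):
    """First-order Taylor expansion of noisy_log_spectrum around (x0, n0)."""
    dy_dx = 1.0 / (1.0 + math.exp(n0 - x0))  # partial derivative w.r.t. x
    dy_dn = 1.0 - dy_dx                      # partial derivative w.r.t. n
    return noisy_log_spectrum(x0, n0) + dy_dx * (x - x0) + dy_dn * (n - n0)

# Hypothetical expansion point and a nearby evaluation point.
x0, n0 = math.log(4.0), math.log(1.0)
exact = noisy_log_spectrum(x0 + 0.05, n0 - 0.05)
approx = first_order_vts(x0, n0, x0 + 0.05, n0 - 0.05)
```

Because the expansion is linear in x and n, a Gaussian model of clean speech maps to an approximately Gaussian model of noisy speech, which is what makes closed-form re-estimation tractable.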
|
SP17.5
|
A Fuzzy Logic-Based Speech Detection Algorithm for Communications in Noisy Environments
A. Cavallaro,
F. Beritelli,
S. Casale (University of Catania, Italy)
In the field of mobile communications, correct Voice Activity Detection (VAD) is crucial for perceived speech quality, the reduction of co-channel interference, and power consumption in portable equipment. This paper shows that a valid alternative for the activity decision problem is to use methodologies such as fuzzy logic, which are suitable for problems requiring approximate rather than exact solutions, and which can be presented through descriptive or qualitative expressions. The proposed Fuzzy Voice Activity Detector (FVAD) uses the same set of parameters adopted by the VAD in Annex B to ITU-T G.729 and a set of six fuzzy rules automatically extracted through supervised learning. Objective and listening tests confirm a significant improvement over traditional methods, above all at low signal-to-noise ratios.
|
SP17.6
|
Subband Based Classification of Speech under Stress
R. Sarikaya,
J. Gowdy (Clemson University, USA)
This study proposes a new set of feature parameters based on subband analysis of the speech signal for classification of speech under stress. The new speech features are Scale Energy (SE), Autocorrelation-Scale-Energy (ACSE), Subband based cepstral parameters (SC), and Autocorrelation-SC (ACSC). The parameters' ability to capture different stress types is compared to widely used Mel-scale cepstrum based representations: Mel-frequency cepstral coefficients (MFCC) and Autocorrelation-Mel-scale (AC-Mel). Next, a feedforward neural network is formulated for speaker-dependent stress classification of 10 stress conditions: Angry, Clear, Cond50/70, Fast, Loud, Lombard, Neutral, Question, Slow, and Soft. The classification algorithm is evaluated using a previously established stressed speech database (SUSAS)[4]. Subband based features are shown to achieve +7.3% and +9.1% increases in classification rates over the MFCC based parameters for the ungrouped and grouped stress closed vocabulary test scenarios, respectively. Moreover, the average scores of the new features across the simulations are +8.6% and +13.6% higher than those of the MFCC based features for the ungrouped and grouped stress test scenarios, respectively.
|
SP17.7
|
Separation of Spontaneous and Non-Spontaneous Speech
O. Kenny (Defense Science & Technology Organization, Australia);
D. Nelson,
J. Bodenschatz,
H. McMonagle (Department of Defense, Fort Meade, MD, USA)
This paper outlines and compares three methods for automatically classifying spontaneous and non-spontaneous speech and presents experimental results comparing the performance of the methods. All three methods are based on an analysis of the probability distributions of prosodic features extracted from the speech signal. The first method uses an expansion of the probability distribution in terms of the statistical moments. The second method is an application of a modified Hellinger's method, and the third method is based on a measure of the non-Gaussianity of the data.
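Two of the quantities named above have well-known textbook forms, sketched here for orientation. The example computes the standard (unmodified) Hellinger distance between two discrete distributions and the excess kurtosis of a sample, which is one common measure of non-Gaussianity. The abstract does not specify the paper's modification or its particular non-Gaussianity measure, so both choices are assumptions.

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def excess_kurtosis(samples):
    """Fourth standardized moment minus 3; zero for a Gaussian distribution."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in samples) / n  # fourth central moment
    return m4 / (m2 * m2) - 3.0

# Hypothetical histograms of a prosodic feature for two speech classes.
distance = hellinger_distance([0.7, 0.2, 0.1], [0.2, 0.3, 0.5])
```

In a classifier built on such measures, the feature distribution of an unknown utterance would be compared against reference distributions for each class; the details of that comparison are not given in the abstract.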
|
SP17.8
|
Robust Features Derived from Temporal Trajectory Filtering for Speech Recognition under the Corruption of Additive and Convolutional Noises
K. Yuo,
H. Wang (National Tsing Hua University, Taiwan, ROC)
This paper presents a novel method using robust features for speech recognition when the speech signal is corrupted by additive and convolutional noises. The method is conceptually simple and easy to implement. The additive noise and the convolutional noise are removed by temporal trajectory filtering in the autocorrelation domain and the cepstral domain, respectively. A multi-speaker isolated digit recognition task is conducted to demonstrate the effectiveness of these robust features. The cases of channel-filtered speech corrupted by additive white noise and by colored noise are tested. Experimental results show that significant improvements can be achieved compared with some traditional features.
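A fixed convolutional channel appears as a constant offset on each cepstral coefficient, so filtering the temporal trajectory of each coefficient to remove its constant component removes the channel. The sketch below shows the simplest such filter, cepstral mean subtraction. It is offered as a familiar instance of the idea, not as the filter designed in the paper, and the data are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient temporal mean from a sequence of cepstral vectors."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / num_frames for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]

# Hypothetical 2-coefficient cepstra; a fixed channel adds a constant offset.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
channel = [0.5, -0.25]
filtered = [[c + h for c, h in zip(frame, channel)] for frame in clean]

# After mean subtraction, the channel offset no longer affects the features.
normalized_clean = cepstral_mean_subtraction(clean)
normalized_filtered = cepstral_mean_subtraction(filtered)
```

More selective trajectory filters pass a band of modulation frequencies instead of removing only the constant component, but the principle of operating along the time axis of each coefficient is the same.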