ICASSP '98 Abstracts - NNSP2


 
NNSP2.1

   
Speech Coding with Nonlinear Local Prediction Model
N. Ma, G. Wei  (SCUT, P R China)
A new signal processing method based on a nonlinear local prediction model (NLLP) is presented and applied to speech coding. With the same implementation, speech coding based on the NLLP gives improved performance compared with reference versions of the ITU-T G.728 standard and a linear local scheme. The computational effort of the NLLP analysis does not exceed that of conventional linear prediction (LP), and the NLLP yields better prediction performance than both LP and linear local prediction.
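
The abstract does not spell out the NLLP model itself. A minimal sketch of one common form of nonlinear local prediction is given below, assuming the usual construction: embed past samples as delay vectors, select the nearest past contexts, and fit an affine predictor on those neighbours only. All names and parameter values are illustrative, not taken from the paper.

    import numpy as np

    def local_predict(x, p=10, k=20):
        """Predict the last sample of x from its p predecessors with a
        local model: find the k past delay vectors closest to the current
        context and fit a least-squares affine predictor on them alone."""
        N = len(x)
        H = np.array([x[n - p:n] for n in range(p, N - 1)])  # past contexts
        y = x[p:N - 1]                                       # their targets
        query = x[N - 1 - p:N - 1]        # context of the sample to predict
        d = np.linalg.norm(H - query, axis=1)
        idx = np.argsort(d)[:k]           # k nearest neighbours
        A = np.hstack([H[idx], np.ones((k, 1))])             # affine term
        w, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        return np.append(query, 1.0) @ w

The fit itself is a (p+1)-dimensional least-squares problem, the same size as a conventional LP solve; the extra work is only the neighbour search.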
 
NNSP2.2

   
Parametric Subspace Modelling of Speech Transitions
K. Reinhard, M. Niranjan  (Cambridge University Engineering Department, UK)
In this paper we report on an attempt to capture segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carry much discriminant information that is only crudely modelled by traditional approaches such as HMMs. In approaches such as recurrent neural networks there is the hope, but no convincing demonstration, that such transitional information can be captured. We start from the very different position of explicitly capturing the trajectory of short-time spectral parameter vectors on a subspace in which the temporal sequence information is preserved (Time-Constrained Principal Component Analysis). On this subspace, we attempt a parametric modelling of the trajectory and compute a distance metric to classify diphones. Much of the discriminant information is retained in this subspace, as illustrated on the isolated transitions /bee/, /dee/ and /gee/.
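
As a rough illustration of the subspace idea only (not the paper's exact TCPCA construction or its parametric trajectory model), the sketch below projects a sequence of short-time spectral vectors onto its leading principal directions while preserving temporal order, and compares two trajectories with a crude resampled Euclidean distance. The function names and the distance are illustrative assumptions.

    import numpy as np

    def subspace_trajectory(frames, dim=2):
        """Project a (T, D) sequence of short-time spectral vectors onto
        its first `dim` principal directions; row order (time) is kept."""
        X = frames - frames.mean(axis=0)
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return X @ Vt[:dim].T                # (T, dim) trajectory

    def trajectory_distance(a, b, n=50):
        """Crude trajectory metric: resample both trajectories to n points
        by linear interpolation, then take the mean Euclidean deviation."""
        def resample(t):
            s = np.linspace(0, len(t) - 1, n)
            i = np.floor(s).astype(int)
            j = np.minimum(i + 1, len(t) - 1)
            f = (s - i)[:, None]
            return (1 - f) * t[i] + f * t[j]
        return np.mean(np.linalg.norm(resample(a) - resample(b), axis=1))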
 
NNSP2.3

   
A Neural Architecture for Computing Acoustic-Phonetic Invariants
E. Tsiang  (Monowave Corporation, USA)
The proposed neural architecture consists of an analytic lower net and a synthetic upper net; this paper focuses on the upper net. The lower net performs a 2-D multiresolution wavelet decomposition of an initial spectral representation to yield a multichannel representation of local frequency modulations at multiple scales. From this representation, the upper net synthesizes increasingly complex features, resulting in a set of acoustic observables at the top layer with multiscale context dependence. The upper net also provides invariance under frequency shifts and under dilations of tone and time intervals by building these transformations into the architecture. Application of this architecture to the recognition of gross and fine phonetic categories in continuous speech from diverse speakers shows that it provides high accuracy and strong generalization from modest amounts of training data.
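
A minimal sketch of the kind of 2-D multiresolution wavelet decomposition the lower net performs, assuming the PyWavelets package and a (frequency, time) spectral input; the wavelet choice and the packaging of the output are illustrative, not the paper's.

    import pywt

    def multiscale_modulations(spectrogram, levels=3):
        """Decompose a (freq, time) spectral map into an approximation and
        per-scale detail bands; the detail triples respond to local
        modulations along frequency, along time, and along both jointly."""
        coeffs = pywt.wavedec2(spectrogram, wavelet='db2', level=levels)
        approx, details = coeffs[0], coeffs[1:]
        # Each entry of `details` is an (LH, HL, HH) triple at one scale.
        return approx, details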
 
NNSP2.4

   
Simplified Neural Network Architectures for a Hybrid Speech Recognition System with Small Vocabulary Size
H. Sedarat, R. Khadem  (Stanford University, USA);   H. Franco  (SRI International, USA)
Recent studies suggest that a hybrid speech recognition system based on a hidden Markov model (HMM), with a neural network (NN) subsystem as the estimator of the state-conditional observation probability, may have advantages over conventional HMMs that use Gaussian mixture models for the observation probabilities. The HMM and NN modules are typically treated as separate entities in a hybrid system. This paper, however, suggests that a priori knowledge of the HMM structure can be beneficial in the design of the NN subsystem. A case of isolated word recognition is studied to demonstrate that a substantially simplified NN can be achieved for a structured HMM by applying Bayesian factorization and pre-classification. The results indicate performance similar to that obtained with the classical approach, with a much less complex NN structure.
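
The hybrid systems this abstract builds on turn network state posteriors into HMM observation scores via Bayes' rule, p(x|q)/p(x) = P(q|x)/P(q); since p(x) is constant across states at each frame, the ratio can be used directly in decoding. A minimal sketch of that standard step (not of the paper's simplified architecture itself):

    import numpy as np

    def scaled_likelihoods(posteriors, priors, eps=1e-10):
        """Convert NN state posteriors P(q | x_t), shape (T, Q), into
        scaled likelihoods p(x_t | q) / p(x_t) by dividing out the state
        priors P(q), shape (Q,)."""
        return posteriors / np.maximum(priors, eps)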
 
NNSP2.5

   
Hidden Neural Networks: Application to Speech Recognition
S. Riis  (Technical University of Denmark, Denmark)
In this paper we evaluate the Hidden Neural Network HMM/NN hybrid presented at last year's ICASSP on two speech recognition benchmark tasks: (1) task-independent isolated word recognition on the PHONEBOOK database, and (2) recognition of broad phoneme classes in continuous speech from the TIMIT database. It is shown that Hidden Neural Networks (HNNs) with far fewer parameters than conventional HMMs and other hybrids can obtain comparable performance, and for the broad-class task it is illustrated how the HNN can be applied as a purely transition-based system, in which acoustic context-dependent transition probabilities are estimated by neural networks.
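
Under the transition-based reading of the abstract, all of the modelling sits in context-dependent transition probabilities, and decoding is an ordinary Viterbi pass over per-frame transition matrices. The sketch below is a generic illustration of that idea, not the HNN implementation; the uniform start and the shape conventions are assumptions.

    import numpy as np

    def viterbi_transitions(log_trans):
        """Viterbi decoding for a purely transition-based model:
        log_trans[t, i, j] = log P(q_t = j | q_{t-1} = i, acoustic context),
        e.g. produced frame by frame by a neural network."""
        T, S, _ = log_trans.shape
        delta = np.zeros(S)                          # uniform (log) start
        psi = np.zeros((T, S), dtype=int)
        for t in range(T):
            scores = delta[:, None] + log_trans[t]   # (from i, to j)
            psi[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0)
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t, path[-1]]))
        return path[::-1]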
 
NNSP2.6

   
An MRNN-Based Method for Continuous Mandarin Speech Recognition
Y. Liao, S. Chen  (National Chiao-Tung University, Taiwan, ROC)
A new MRNN-based method for continuous Mandarin speech recognition is proposed. The system uses five RNNs to accomplish separate subtasks and then combines them to solve the overall problem: two RNNs discriminate the two sub-syllable groups of 100 RFD initials and 39 CI finals, two RNNs generate dynamic weighting functions for integrating the sub-syllable scores, and one RNN detects syllable boundaries. All RNN modules are combined using a delayed-decision Viterbi search. The method differs from the ANN/HMM hybrid approach in using ANNs not only for sub-syllable discrimination but also for temporal structure modelling of the speech signal. The system is trained with a three-stage training method embedding the MCE/GPD algorithms. In addition, a fast recognition method using multi-level pruning is proposed. Experimental results showed that the method outperforms the HMM approach in both recognition accuracy and computational complexity.
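
The abstract does not give the rule by which the weighting RNNs combine the two discriminator streams. One plausible minimal form, assumed here purely for illustration, blends the per-frame initial and final log-scores with weights in [0, 1]:

    import numpy as np

    def combine_streams(log_initial, log_final, w):
        """Blend two (T, C) sub-syllable score streams with per-frame
        weights w, shape (T,), here assumed to come from the weighting
        RNNs; w = 1 trusts the initial discriminator entirely."""
        w = np.clip(w, 0.0, 1.0)[:, None]
        return w * log_initial + (1.0 - w) * log_final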
 
NNSP2.7

   
An Off-Line Working Speech Recognition System Employing a Compound Neural Network and Fuzzy Logic
L. Zhou  (Beijing University of Posts & Telecommunications, Dept. of Radio Engineering, P R China)
This paper introduces a speech recognition hardware system that operates off-line. A new compound neural network structure is proposed, and fuzzy logic is adopted to implement the system, enabling speaker-independent, real-time speech recognition in real environments with heavy noise.
 
NNSP2.8

   
An Analysis of Data Fusion Methods for Speaker Verification
K. Farrell  (T-NETIX Inc., USA);   R. Ramachandran  (Rowan University, USA);   R. Mammone  (Rutgers University, USA)
In this paper, we analyze the diversity of information provided by several modeling approaches for speaker verification. This information is used to facilitate the fusion of the individual results into an overall result that is more accurate than any individual model. The modeling methods evaluated are the neural tree network (NTN), Gaussian mixture model (GMM), hidden Markov model (HMM), and dynamic time warping (DTW). With the exception of DTW, all methods use subword-based approaches. The phrase-level scores of each modeling approach are used for combination. Several data fusion methods are evaluated for combining the model results, including the linear and log opinion pool approaches along with voting. The results of this analysis have been integrated into a system that has been tested on several databases collected in landline and cellular environments. We have found that the linear and log opinion pool methods consistently reduce the error rate relative to using the models individually.
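
The two opinion pools named in the abstract have simple closed forms, shown below in a generic sketch; the weights and the tie-breaking rule are illustrative, and the paper applies these to phrase-level verification scores rather than generic class posteriors.

    import numpy as np

    def linear_opinion_pool(p, w):
        """Weighted arithmetic mean of per-model class probabilities.
        p: (models, classes); w: (models,) summing to 1."""
        return w @ p

    def log_opinion_pool(p, w, eps=1e-12):
        """Normalised weighted geometric mean: exponentiate the weighted
        average of log-probabilities, then renormalise."""
        q = np.exp(w @ np.log(np.maximum(p, eps)))
        return q / q.sum()

    def majority_vote(p):
        """Each model votes for its top class; ties go to the lowest index."""
        votes = np.bincount(p.argmax(axis=1), minlength=p.shape[1])
        return int(votes.argmax())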
 
