Session T2C Feature Estimation I

Chairperson: Paul Daalsgard, University of Aalborg, Denmark



ACOUSTIC PARAMETERS OPTIMISED FOR RECOGNITION OF PHONETIC FEATURES

Authors: Anya Varnich Hansen

Center for PersonKommunikation, Aalborg University, DK-9220 Aalborg, Denmark. Tel. +45 96358671, FAX: +45 98151583, E-mail: avh@cpk.auc.dk

Volume 1 pages 397 - 400

ABSTRACT

Speaker variability is a major problem in today's state-of-the-art speech recognition systems. Parameterisation of speech in terms of Acoustic Parameters (APs) motivated by phonetic feature theory has been shown to be more robust to speaker variability than cepstral coefficients when tested on the task of broad-class recognition [1]. APs have also been successfully applied to the identification of semivowels [2,3]. The aim of the present study is to investigate the use of APs for phoneme recognition. An extended set of features is used to distinguish between all phonemes in the TIMIT database, and APs related to the extended feature set are taken from the literature. A separability measure is calculated to investigate the importance of the suggested APs for the separation of phonemes and feature classes. Results show that the APs that are the most important for separating classes of phonetic features are also the most important for separating phoneme classes. This indicates that phonemes can be recognised on the basis of phonetic features captured by the use of APs. However, much work still needs to be done to understand and reliably extract all of the acoustic correlates of the phonetic features applied.

A0387.pdf



HETEROGENEOUS ACOUSTIC MEASUREMENTS FOR PHONETIC CLASSIFICATION

Authors: Andrew K. Halberstadt and James R. Glass

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 USA. E-mail: {drew, jrg}@sls.lcs.mit.edu

Volume 1 pages 401 - 404

ABSTRACT

In this paper we describe our recent efforts to improve acoustic-phonetic modeling by developing sets of heterogeneous, phone-class-specific measurements, and combining these diverse measurements into a probabilistic classification framework. We first describe a baseline classifier using homogeneous measurements. After comparing selected sub-tasks to known human performance, we define sets of phone-class-specific measurements which improve within-class classification performance. Subsequently, we combine these heterogeneous measurements into an overall context-independent classification framework. We report on a series of phonetic classification experiments using the TIMIT acoustic-phonetic corpus. Our overall framework achieves 79.0% accuracy on the NIST core test set.

A0555.pdf



CEPSTRAL-TIME MATRICES AND LDA FOR IMPROVED CONNECTED DIGIT AND SUB-WORD RECOGNITION ACCURACY

Authors: Ben Milner

ben@saltfarm.bt.co.uk Speech Technology Unit, BT Laboratories, Martlesham Heath, Suffolk, UK.

Volume 1 pages 405 - 408

ABSTRACT

Previous work has shown that good accuracy improvements can be made for isolated word recognition using cepstral-time matrices as the speech feature instead of the more conventional MFCC-based speech feature augmented with higher order cepstrum. This work extends the performance improvements to UK English connected digit strings and to a sub-word based town names task. Experimental results are presented for a range of cepstral-time matrix widths, from a stack width of 3 up to 13 MFCC frames. In addition, a variety of columns are selected from the cepstral-time matrix for use as the final speech feature. Tests show that the optimal implementation of the cepstral-time matrix varies according to the specific recognition task. Finally, the technique of linear discriminant analysis (LDA) is applied to cepstral-time matrices and is shown to improve recognition performance while also reducing the size of the final speech feature. Three different implementations of LDA are described and are demonstrated on isolated digit and sub-word tasks.
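As an illustration of the stacking described in the abstract, the sketch below builds a cepstral-time matrix from a stack of MFCC frames by taking a cosine transform along the time axis of each cepstral coefficient and keeping only the first few columns. It is a minimal sketch, not the paper's implementation: the function name `cepstral_time_matrix`, the DCT-II form and the 1/width scaling are all assumptions made for this example.

```python
import math


def cepstral_time_matrix(frames, n_columns):
    """Build a cepstral-time matrix from a stack of MFCC frames.

    frames    -- list of MFCC vectors; len(frames) is the stack width
    n_columns -- number of temporal (DCT) columns to keep

    Assumed form: a DCT-II along time for each cepstral index, scaled by
    1/width so that column 0 is the time average of that coefficient.
    """
    width = len(frames)
    n_ceps = len(frames[0])
    matrix = []
    for k in range(n_ceps):
        row = []
        for n in range(n_columns):
            total = 0.0
            for t in range(width):
                basis = math.cos(math.pi * n * (2 * t + 1) / (2 * width))
                total += frames[t][k] * basis
            row.append(total / width)
        matrix.append(row)
    return matrix


# Hypothetical stack of 3 identical frames with 2 cepstral coefficients each.
stack = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
ctm = cepstral_time_matrix(stack, n_columns=2)
# Column 0 holds the time average; column 1 is (numerically) zero
# because the stacked frames do not change over time.
```

Under these assumptions, choosing which columns to keep corresponds to the column selection mentioned in the abstract, and the stack width corresponds to `len(frames)`.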

A0660.pdf



DATA-DRIVEN DESIGN OF RASTA-LIKE FILTERS

Authors: Sarel van Vuuren (1) and Hynek Hermansky (2)

1,2 Oregon Graduate Institute of Science and Technology, Portland, Oregon, USA 2 International Computer Science Institute, Berkeley, California, USA email: sarelv@ee.ogi.edu, hynek@ee.ogi.edu

Volume 1 pages 409 - 412

ABSTRACT

We describe the use of Linear Discriminant Analysis (LDA) for data-driven automatic design of RASTA-like filters. LDA applied to rather long segments of time trajectories of critical-band energies yields FIR filters to be applied to these time trajectories in the feature extraction module. Frequency responses of the first three discriminant vectors are in principle consistent with the ad hoc designed RASTA, delta and double-delta filters. On a connected digit task, the new features outperform the original RASTA processing.
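The sketch below illustrates only the last step the abstract mentions: using a vector of filter taps as an FIR filter on the time trajectory of one critical-band energy. It does not perform the LDA itself. The function name `apply_fir`, the edge handling (repeating the first and last samples) and the example taps are assumptions for this illustration; the taps shown are a simple delta-like filter, not a discriminant vector from the paper.

```python
def apply_fir(trajectory, taps):
    """Filter a time trajectory with centred FIR taps.

    Each output sample is the inner product of the taps with the window
    of the trajectory centred on that sample. Samples beyond either end
    are replaced by the nearest edge value (an assumed padding choice).
    """
    half = len(taps) // 2
    last = len(trajectory) - 1
    filtered = []
    for t in range(len(trajectory)):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = min(max(t + k - half, 0), last)  # clamp to valid range
            acc += tap * trajectory[idx]
        filtered.append(acc)
    return filtered


# Hypothetical log-energy trajectory of one critical band (a linear ramp)
# and hypothetical delta-like taps.
band_energy = [0.0, 1.0, 2.0, 3.0, 4.0]
delta_like = [-0.5, 0.0, 0.5]
print(apply_fir(band_energy, delta_like))  # [0.5, 1.0, 1.0, 1.0, 0.5]
```

In a data-driven design of the kind described, the taps would instead come from a discriminant vector, with one such filter applied to every band's trajectory.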

A0897.pdf



EVALUATING FEATURE SET PERFORMANCE USING THE F-RATIO AND J-MEASURES

Authors: Simon Nicholson*, Ben Milner** and Stephen Cox*

*University of East Anglia, Norwich, Norfolk, UK **BT Laboratories, Martlesham Heath, Suffolk, UK. ben@saltfarm.bt.co.uk sjc@sys.uea.ac.uk

Volume 1 pages 413 - 416

ABSTRACT

Several methods of measuring the class separability in a feature space used to model speech sounds are described. A simple one-dimensional feature space is considered first, where class discrimination is measured using the F-ratio. Using a conventional feature set comprising static, velocity and acceleration MFCCs, a ranking of the discriminative ability of each coefficient is made for both a digit and an alphabet vocabulary. These rankings are shown to be quite similar for the two vocabularies. Discrimination measures are extended to multi-dimensional feature spaces using the J-measures. It is postulated that high correlation exists between feature sets which have a good measured class discrimination and those which give good recognition accuracy. Experiments are presented which measure this correlation and use it to predict recognition accuracy for a given set of features. These estimates are shown to be accurate for previously unseen combinations of features. A brief analysis of the effect of linear discriminant analysis on the feature space is made using these measures of separability. It is shown that LDA and separability measures are closely linked.
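For a single scalar feature, the F-ratio is commonly defined as the variance of the class means divided by the average within-class variance. The sketch below computes it under that definition, which is assumed here and may differ in detail from the paper's. The function name `f_ratio` and the toy data are hypothetical.

```python
from statistics import fmean, pvariance


def f_ratio(samples_by_class):
    """F-ratio of one scalar feature.

    samples_by_class -- dict mapping a class label to a list of feature
                        values observed for that class

    Returns (variance of class means) / (mean within-class variance),
    using population variances throughout (an assumed convention).
    """
    class_means = [fmean(values) for values in samples_by_class.values()]
    within = [pvariance(values) for values in samples_by_class.values()]
    return pvariance(class_means) / fmean(within)


# Hypothetical values of one coefficient for two classes.
feature = {"one": [0.0, 2.0], "two": [10.0, 12.0]}
print(f_ratio(feature))  # 25.0
```

A larger value means the class means are spread widely relative to the scatter inside each class, so ranking coefficients by this value gives a ranking of the kind described above.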

A0905.pdf



ROBUST SPEECH PARAMETERS LOCATED IN THE FREQUENCY DOMAIN

Authors: J. Hernando and C. Nadeu

Universitat Politecnica de Catalunya, Barcelona, Spain. E-mail: javier@gps.tsc.upc.es

Volume 1 pages 417 - 420

ABSTRACT

In this paper, two ways of obtaining more robust spectral parameters are explored. Firstly, a hybridization of the LP and filter-bank approaches is considered, which is capable of improving recognition results for both noisy and clean speech in CDHMM digit recognition. Secondly, better performance may also be achieved by replacing the cepstral coefficients with a recently proposed set of parameters located in the frequency domain, which are obtained by a simple filtering of the log band energies.

A1355.pdf
