Authors:
James Droppo, Microsoft Research (USA)
Alex Acero, Microsoft Research (USA)
Paper number 71
Abstract:
A maximum a posteriori (MAP) framework for computing pitch tracks as
well as voicing decisions is presented. The proposed algorithm creates
a time-pitch energy distribution, based on predictable energy, that
improves on the normalized cross-correlation. A large database
is used to evaluate the algorithm's performance against two standard
solutions, using glottal closure instants (GCI) obtained from electroglottogram
(EGG) signals as a reference. The new MAP algorithm exhibits higher
pitch accuracy and better voiced/unvoiced discrimination.
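As context for the comparison above, the following is a minimal Python
sketch (not the authors' code; all parameter values are illustrative) of
the normalized cross-correlation score that MAP-style pitch trackers
typically build on:

import numpy as np

def ncc_frame(frame, lag_min, lag_max):
    """Normalized cross-correlation of one frame over candidate pitch lags."""
    scores = np.zeros(lag_max - lag_min + 1)
    for i, lag in enumerate(range(lag_min, lag_max + 1)):
        a, b = frame[:-lag], frame[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12
        scores[i] = np.dot(a, b) / denom
    return scores

fs = 16000
t = np.arange(0, 0.04, 1.0 / fs)
frame = np.sin(2 * np.pi * 120 * t)             # synthetic 120 Hz "voiced" frame
scores = ncc_frame(frame, fs // 400, fs // 60)  # search the 60-400 Hz range
f0 = fs / (np.argmax(scores) + fs // 400)
print("estimated F0: %.1f Hz" % f0)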
Authors:
Dekun Yang, Keele University (U.K.)
Georg F. Meyer, Keele University (U.K.)
William A. Ainsworth, Keele University (U.K.)
Paper number 511
Abstract:
This paper presents a method for segregating and recognizing concurrent
vowels based on the amplitude modulation spectrum. Vowel segregation
is accomplished by F0-guided grouping of harmonic components encoded
in the amplitude modulation spectrum while vowel recognition is achieved
by classifying the segregated vowel spectrum. The main features of the
method are (1) the reassignment technique is employed to obtain a
high-resolution amplitude modulation spectrum, and (2) Fisher's linear discriminant
analysis is used to improve the performance of vowel classification.
The method is tested on a double-vowel identification task and some
preliminary results are provided.
Authors:
Eloi Batlle, Universitat Politecnica de Catalunya (Spain)
Climent Nadeu, Universitat Politecnica de Catalunya (Spain)
José A.R. Fonollosa, Universitat Politecnica de Catalunya (Spain)
Paper number 473
Abstract:
In this paper we study various decorrelation methods for the features
used in speech recognition and we compare the performance of each one
by running several tests with a speech database. First we study
Principal Components Analysis (PCA). PCA extracts the dimensions
along which the data vary the most, and thus it allows us to reduce
the dimension of the data points without significant loss of performance.
The second transform we study is the Discrete Cosine Transform (DCT).
As will be shown, it is an approximation of PCA. By applying this
transform to filter-bank energy (FBE) parameters we obtain the MFCC coefficients.
A further step is taken with the Linear Discriminant Analysis (LDA),
which not only reduces the dimensionality of the problem but also
discriminates among classes to reduce the confusion error. The last
method we study is Frequency Filtering (FF). This method consists of
a linear filtering of the frequency sequence of the log FBE that both
decorrelates and equalizes the variance of the coefficients.
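To make the DCT step above concrete, here is a minimal Python sketch
(illustrative toy data, not the paper's experimental setup) of
decorrelating log filter-bank energies with a type-II DCT to obtain
cepstral coefficients:

import numpy as np
from scipy.fftpack import dct

def fbe_to_mfcc(log_fbe, num_ceps=13):
    # A DCT along the frequency axis approximately decorrelates the bands.
    return dct(log_fbe, type=2, axis=-1, norm='ortho')[..., :num_ceps]

log_fbe = np.log(np.random.rand(100, 24) + 1e-6)  # 100 frames, 24 bands (toy)
mfcc = fbe_to_mfcc(log_fbe)
print(mfcc.shape)  # (100, 13)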
Authors:
Marie-José Caraty, Laboratoire d'Informatique de Paris 6 (France)
Claude Montacié, Laboratoire d'Informatique de Paris 6 (France)
Paper number 1142
Abstract:
To deal with artifacts in the observation measurements produced by
usual speech processing, we propose to extend the representation of the
speech signal by taking a sequence of sets of observations instead of
a simple sequence of observations. A set of observations is computed
from temporal Multi-Resolution (MR) analysis. This method is designed
to adapt to any usual mode and technique of analysis. Its originality
is to take into account two main variations in the analysis: the center
of the frame and the duration of the frame. In speech processing,
multi-resolution analysis has many applications. MR analysis is a basic
representation for locating the stationary and non-stationary parts of
speech from the inertia computation, and for selecting the best
representative observation from a centroid or generalized centroid.
Preliminary experiments are presented. The first consists of MR analysis
of portions of French and American-English speech databases (BREF80 and
TIMIT, respectively), using inertia as a criterion for locating
stationary and non-stationary parts of the speech signal. The second
concerns the computation of phoneme prototypes for the two databases.
Finally, some perspectives are discussed.
Authors:
Steve Cassidy, SHLRC, Macquarie University (Australia)
Catherine Watson, SHLRC, Macquarie University (Australia)
Paper number 664
Abstract:
As part of a long-term project to develop speech recognition systems
for young computer users, specifically children aged between 6 and
11 years, this paper presents a preliminary investigation into the
classification of children's vowels. In earlier studies of adult speech
we found that dynamic or time-varying cues were useful in classifying
diphthongal vowels but provided no advantage for monophthongs if duration
is included as an additional cue. In this study we investigate whether
dynamic cues (modelled by Discrete Cosine Transform coefficients) are
present to a greater or lesser extent in children's vowels. Our hypothesis
is that some of the observed variability in children's vowels may be
due to systematic time-varying features. We found that the children's
monophthong data was better separated by a combination of DCT coefficients
and vowel duration than by the formant data sampled at the vowel midpoint
plus duration. This result contrasts with our finding on Australian
adult data, where modelling the formant trajectory was necessary only
for separating the diphthongs.
Authors:
Johan de Veth, A2RT, Dept. of Language & Speech, University of Nijmegen (The Netherlands)
Louis Boves, A2RT, Dept. of Language & Speech, University of Nijmegen (The Netherlands)
Paper number 359
Abstract:
Phase-corrected RASTA is a new technique for channel normalization
that consists of classical RASTA filtering followed by a phase correction
operation. In this manner, the channel bias is as effectively removed
as with classical RASTA, without introducing a left context dependency.
The performance of the phase-corrected RASTA channel normalization
technique was evaluated for a continuous speech recognition task.
Using context-independent hidden Markov models we found that phase-corrected
RASTA reduces the best-sentence word error rate (WER) by 23% compared
to classical RASTA. For context-dependent models, phase-corrected RASTA
reduces WER by 15%.
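For reference, a minimal Python sketch of the classical RASTA band-pass
filtering that the phase correction builds on (standard published
coefficients; the paper's phase correction operation itself is not
reproduced here, and the causal form below delays the output by several
frames, which is the kind of phase effect the paper addresses):

import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spec):
    # Classical RASTA: band-pass each log-energy trajectory along time.
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    a = np.array([1.0, -0.98])
    return lfilter(b, a, log_spec, axis=0)

log_spec = np.log(np.random.rand(200, 20) + 1e-6)  # 200 frames x 20 bands (toy)
print(rasta_filter(log_spec).shape)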
Authors:
Satya Dharanipragada, IBM TJ Watson Research Center (USA)
Ramesh A. Gopinath, IBM TJ Watson Research Center (USA)
Bhaskar D. Rao, University of California, San Diego (USA)
Paper number 590
Abstract:
Fixed-rate feature extraction which is used in most current speech
recognizers is equivalent to sampling the feature trajectories at a
uniform rate. Often this sampling rate is well below the Nyquist rate
and thus leads to distortions in the sampled feature stream due to
aliasing. In this paper we explore various techniques, ranging from
simple cepstral and spectral smoothing to filtering and data-driven
dimensionality expansion using Linear Discriminant Analysis (LDA),
to counter aliasing and the variable rate nature of information in
speech signals. Smoothing in the spectral domain results in a reduction
in the variance of the short term spectral estimates which directly
translates to reduction in the variances of the Gaussians in the acoustic
models. With these techniques we obtain modest improvements, both in
word error rate and robustness to noise, on large vocabulary speech
recognition tasks.
Authors:
Limin Du, Institute of Acoustics, Chinese Academy of Sciences (China)
Kenneth Noble Stevens, Dept Electrical Engineering and Computer Science, Massachusetts Institute of Technology (USA)
Paper number 302
Abstract:
A knowledge-based approach to automatically detecting nasal landmarks
(/m/, /n/, and /ng/) from the speech waveform is developed. Two acoustic
characteristics are combined to construct the nasal landmark detector:
the Fn1 locus, calculated on each frame of the speech waveform as the
mass center of the spectrum amplitude in the vicinity of the lowest
spectral prominence between 150 and 1000 Hz, and the A23 locus,
calculated on the same frame as a band energy between 1000 and 3000 Hz.
The detector fires at the instants of closure and release of the nasal
murmur. Experimental observations on the acoustic characteristics of
Fn1 and A23, together with nasal consonant landmark detection results
on a VCV database, are also presented.
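A hedged Python sketch of the two per-frame measures named above; it
simplifies Fn1 to a mass centre over the whole 150-1000 Hz band rather
than around the lowest spectral prominence, and all settings are
illustrative rather than the authors':

import numpy as np

def frame_measures(frame, fs):
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    low = (freqs >= 150) & (freqs <= 1000)
    mid = (freqs >= 1000) & (freqs <= 3000)
    fn1 = np.sum(freqs[low] * spec[low]) / (np.sum(spec[low]) + 1e-12)  # Fn1-like
    a23 = 10 * np.log10(np.sum(spec[mid] ** 2) + 1e-12)                 # A23-like
    return fn1, a23

fs = 16000
frame = np.random.randn(512)  # stand-in for one 32 ms speech frame
print(frame_measures(frame, fs))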
Authors:
Thierry Dutoit, Faculte Polytechnique de Mons (Belgium)
Juergen Schroeter, AT&T Labs - Research (USA)
Paper number 520
Abstract:
Software engineering for research and development in the area of signal
processing is by no means unimportant. For speech processing, in particular,
it should be a priority: given the intrinsic complexity of text-to-speech
or recognition systems, there is little hope of doing state-of-the-art
research without solid and extensible code. This paper describes a
simple and efficient methodology for the design of maximally reusable
and extensible software components for speech and signal processing.
The resulting programming paradigm allows software components to be
advantageously combined with each other in a way that recalls the concept
of hardware plug-and-play, without the need for incorporating complex
schedulers to control data flows. It has been successfully used for
the design of a software library for high-level speech processing systems
at AT&T Labs, as well as for several other large-scale software
projects.
Authors:
Alexandre Girardi, NAIST - Nara Institute of Science and Technology (Japan)
Kiyohiro Shikano, NAIST - Nara Institute of Science and Technology (Japan)
Satoshi Nakamura, NAIST - Nara Institute of Science and Technology (Japan)
Paper number 687
Abstract:
In speaker-independent speech recognition, one problem we often face
is an insufficient training database. This problem is even more serious
for children's databases. Moreover, when adult data is used in place of
children's data, it is affected by differences in pitch and spectral
frequency stretch that degrade recognition. In this paper, as an
approach to this problem, we applied the STRAIGHT-TEMPO algorithm to
morph adult data towards children's data, both to construct more robust
HMM acoustic models and to study the effect of a combined change in the
pitch and spectral frequency stretch of the original utterances in the
database. Using the morphed database, we analyzed the improvement in
recognition rate that can be obtained compared with non-morphed data.
Authors:
Laure Charonnat, ENSSAT (France)
Michel Guitton, ENSSAT (France)
Joel Crestel, ENSSAT (France)
Gerome Allée, ENSSAT (France)
Paper number 1119
Abstract:
This paper describes a hyperbaric speech processing algorithm combining
a restoration of the formant positions with a correction of the pitch.
The pitch is corrected using a time-scale modification algorithm
combined with an oversampling module. This operation not only shifts
the fundamental frequency but also shifts the other frequencies of the
signal. This shift, as well as the formant shift due to the hyperbaric
environment, is corrected by the formant restoration module, which is
based on the linear speech production model.
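To illustrate why a combined time-scale/oversampling operation shifts
all frequencies alike, here is a small Python sketch (illustrative, not
the paper's implementation) of the resampling half; the time-scale
modification stage that restores the original duration is omitted:

import numpy as np
from scipy.signal import resample

def frequency_scale(x, factor):
    # Scale every frequency of x by `factor`; duration changes by 1/factor.
    return resample(x, int(round(len(x) / factor)))

fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
x = np.sin(2 * np.pi * 200 * t)   # 200 Hz tone
y = frequency_scale(x, 0.8)       # now a 160 Hz tone, but 25% longer
print(len(x), len(y))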
Authors:
Juana M. Gutiérrez-Arriola, Grupo de Tecnología del Habla- IEL- UPM (Spain)
Yung-Sheng Hsiao, Mind Machine Interaction Center. Electronic and Computer Engineer Department. University of Florida (USA)
Juan Manuel Montero, Grupo de Tecnología del Habla- IEL- UPM (Spain)
José Manuel Pardo, Grupo de Tecnología del Habla- IEL- UPM (Spain)
Donald G. Childers, Mind Machine Interaction Center. Electronic and Computer Engineer Department. UF (USA)
Paper number 468
Abstract:
This paper describes a voice conversion system based on parameter transformation.
Voice conversion is the process of making one person's voice (the
"source") sound like another person's (the "target"). We present a
voice conversion scheme consisting of three stages. First, an analysis
is performed on the natural speech to obtain the acoustical parameters:
the voiced and unvoiced regions, the glottal source model, pitch,
energy, formants, and bandwidths. Once these parameters
have been obtained for two different speakers they are transformed
using linear functions. Finally the transformed parameters are synthesized
by means of a formant synthesizer. Experiments will show that this
scheme is effective in transforming speaker individuality. It will
also be shown that the transformation from one speaker to another
cannot be a single function; it has to be divided into several
functions, each transforming a certain part of the speech signal.
Segmentation based on spectral stability divides the sentence into
parts, and a transformation function is applied to each segment.
Authors:
Jilei Tian, Nokia Research Center (Finland)
Ramalingam Hariharan, Nokia Research Center (Finland)
Kari Laurila, Nokia Research Center (Finland)
Paper number 325
Abstract:
Part of the problem in noise-robust speech recognition can be attributed
to poor acoustic modeling and the use of inappropriate features. It is
known that the human auditory system is superior to the best speech
recognizer currently available. Hence, in this paper, we propose a
new two-stream feature extractor that incorporates some of the key
functions of the peripheral auditory subsystem. To enhance noise robustness,
the input is divided into low-pass and high-pass channels to form so-called
static and dynamic streams. These two streams are independently processed
and recombined to produce a single stream, containing 13 feature vector
components, with improved linguistic information. Speaker-dependent
isolated-word recognition tests using the proposed front-end produced
average error rate reductions of 39% and 17% over all noisy environments,
compared to standard Mel Frequency Cepstral Coefficient (MFCC) front-ends
with 13 (statics only) and 26 (statics and deltas) feature vector
components, respectively.
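One plausible reading of the static/dynamic stream split, sketched in
Python with illustrative filters (the paper's actual front-end differs
in its details): low-pass and high-pass each feature trajectory along
time before the streams are processed and recombined.

import numpy as np
from scipy.signal import butter, filtfilt

def split_streams(traj, frame_rate=100.0, cutoff_hz=8.0, order=4):
    b_lo, a_lo = butter(order, cutoff_hz / (frame_rate / 2), btype='low')
    b_hi, a_hi = butter(order, cutoff_hz / (frame_rate / 2), btype='high')
    return filtfilt(b_lo, a_lo, traj, axis=0), filtfilt(b_hi, a_hi, traj, axis=0)

traj = np.random.randn(300, 13)   # 3 s of 13-dim features at 100 frames/s (toy)
static, dynamic = split_streams(traj)
print(static.shape, dynamic.shape)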
Authors:
Andrew K. Halberstadt, MIT Laboratory for Computer Science (USA)
James R. Glass, MIT Laboratory for Computer Science (USA)
Paper number 396
Abstract:
This paper addresses the problem of acoustic phonetic modeling. First,
heterogeneous acoustic measurements are chosen in order to maximize
the acoustic-phonetic information extracted from the speech signal
in preprocessing. Second, classifier systems are presented for successfully
utilizing high-dimensional acoustic measurement spaces. The techniques
used for achieving these two goals can be broadly categorized as hierarchical,
committee-based, or a hybrid of these two. This paper presents committee-based
and hybrid approaches. In context-independent classification and context-dependent
recognition on the TIMIT core test set using 39 classes, the system
achieved error rates of 18.3% and 24.4%, respectively. These error
rates are the lowest we have seen reported on these tasks. In addition,
experiments with a telephone-based weather information word recognition
task led to word error rate reductions of 10-16%.
Authors:
Naomi Harte, The Queen's University of Belfast (Ireland)
Saeed Vaseghi, The Queen's University of Belfast (Ireland)
Ben Milner, BT Research Laboratories (U.K.)
Paper number 259
Abstract:
This paper encompasses the approaches of segmental modelling and the
use of dynamic features in addressing the constraints of the IID assumption
in standard HMMs. Phonetic features are introduced which capture the
transitional dynamics across a phoneme unit via a DCT transformation
of a variable length segment. Alongside this, the use of a hybrid
phoneme model is proposed. Classification experiments demonstrate
the potential of these features and this model to match the performance
of standard HMMs. The extension of these features to full recognition
is explored, and details of a novel recognition framework are presented
alongside preliminary results. Lattice rescoring based on these models
and features is also explored. This reduces the set of segmentations
considered, allowing a more detailed exploration of the nature of the
model and features and the challenges in using the proposed recognition
strategy.
Authors:
Hynek Hermansky, Oregon Graduate Institute Of Science And Technology (USA)
Sangita Sharma, Oregon Graduate Institute Of Science And Technology (USA)
Paper number 615
Abstract:
This work proposes a radically different set of features for ASR, in which
TempoRAl Patterns of spectral energies are used in place of the conventional
spectral patterns. The approach has several inherent advantages, among
them robustness to stationary or slowly varying disturbances.
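A minimal sketch, under stated assumptions, of what such a temporal
pattern (TRAP-style) feature looks like in Python; the 101-frame
(roughly one second) context window is illustrative:

import numpy as np

def trap_features(band_energy, context=50):
    # One long temporal trajectory of a single band, centred on each frame.
    padded = np.pad(band_energy, context, mode='edge')
    return np.stack([padded[i:i + 2 * context + 1]
                     for i in range(len(band_energy))])

band_energy = np.random.randn(400)       # one band's log-energy trajectory (toy)
print(trap_features(band_energy).shape)  # (400, 101)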
Authors:
John N. Holmes, Consultant (U.K.)
Paper number 351
Abstract:
Both for robust fundamental frequency (F0) measurement and to provide
a degree of voicing indication, a new algorithm has been developed
based on multi-channel autocorrelation analysis. The speech is filtered
into eight separate frequency bands, representing the lowest 500 Hz
and seven overlapping band-pass channels each about 1000 Hz wide.
The outputs of all the band-pass channels are full-wave rectified and
band-pass filtered between 50 Hz and 500 Hz. Autocorrelation functions
are calculated for the signals from all eight channels, and these functions
are used both for the F0 measurement and for the voicing indication.
Optional dynamic programming is provided to maximize the continuity
of position of the correlation peaks selected for fundamental period
measurement. The algorithm has been designed for implementation on a
16-bit integer DSP, using less than 4 MIPS of processing power and 1500
words of data memory.
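A hedged Python sketch of the per-channel processing described above
(band-pass, full-wave rectify, band-limit the envelope to 50-500 Hz,
autocorrelate); the filter designs are illustrative stand-ins, not the
algorithm's fixed-point DSP implementation:

import numpy as np
from scipy.signal import butter, lfilter

def channel_acf(x, fs, band, max_lag):
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype='band')
    env = np.abs(lfilter(b, a, x))                       # full-wave rectified
    b2, a2 = butter(2, [50 / (fs / 2), 500 / (fs / 2)], btype='band')
    env = lfilter(b2, a2, env)                           # envelope band-limiting
    acf = np.correlate(env, env, mode='full')[len(env) - 1:]
    return acf[:max_lag] / (acf[0] + 1e-12)

fs = 8000
x = np.random.randn(fs // 4)   # 250 ms of toy "speech"
print(channel_acf(x, fs, (1000, 2000), max_lag=fs // 50).shape)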
Authors:
John F. Holzrichter, Lawrence Livermore National Laboratory (USA)
Gregory C. Burnett, Lawrence Livermore National Laboratory (USA)
Todd J. Gable, Lawrence Livermore National Laboratory (USA)
Lawrence C. Ng, Lawrence Livermore National Laboratory (USA)
Paper number 1064
Abstract:
Experiments have been conducted using a variety of very low power EM
sensors that measure articulator motions occurring in two frequency
bands, 1 Hz to 20 Hz and 70 Hz to 7 kHz. They enable noise-free estimates
of a voiced excitation function, accurate pitch measurements, generalized
transfer function descriptions, and detection of vocal articulator
motions.
Authors:
Jia-Lin Shen, Institute of Information Science, Academia Sinica (Taiwan)
Jeih-Weih Hung, Institute of Information Science, Academia Sinica (Taiwan)
Lin-Shan Lee, Institute of Information Science, Academia Sinica (Taiwan)
Paper number 232
Abstract:
This paper presents an entropy-based algorithm for accurate and robust
endpoint detection for speech recognition under noisy environments.
Instead of using conventional energy-based features, a spectral entropy
measure is developed to identify the speech segments accurately. Experimental
results show that this algorithm outperforms the energy-based algorithms
in both detection accuracy and recognition performance under noisy
environments, with an average error rate reduction of more than 16%.
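A minimal Python sketch of the spectral-entropy feature (thresholding
and smoothing logic omitted): a flat, noise-like spectrum yields high
entropy, while a frame with clear spectral structure yields low entropy.

import numpy as np

def spectral_entropy(frame):
    spec = np.abs(np.fft.rfft(frame)) ** 2
    p = spec / (np.sum(spec) + 1e-12)        # normalize to a distribution
    return -np.sum(p * np.log(p + 1e-12))

noise = np.random.randn(256)
tone = np.sin(2 * np.pi * 0.05 * np.arange(256))  # strongly structured frame
print(spectral_entropy(noise), spectral_entropy(tone))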
Authors:
Jia-Lin Shen, Institute of Information Science, Academia Sinica (Taiwan)
Wen-Liang Hwang, Institute of Information Science, Academia Sinica (Taiwan)
Paper number 447
Abstract:
This paper presents a study on statistical integration of temporal
filter banks for robust speech recognition using linear discriminant
analysis (LDA). The temporal properties of stationary features were
first captured and represented using a bank of well-defined temporal
filters. These derived temporal features are then integrated and
compressed using the LDA technique. Experimental results show that
the recognition performance can be significantly improved both in clean
and in noisy environments.
Authors:
Dorota J. Iskra, University of Birmingham (U.K.)
William H. Edmondson, University of Birmingham (U.K.)
Paper number 778
Abstract:
The alternative approach to speech recognition proposed here is based
on pseudo-articulatory representations (PARs), which can be described
as approximations of distinctive features, and aims to establish a
mapping between them and their acoustic specifications (in this case
cepstral coefficients). This mapping, which is used as the basis for
recognition, is first established for vowels. It is obtained using multiple
regression analysis after all the vowels have been described in terms
of phonetic features and an average cepstral vector has been calculated
for each of them. Based on this vowel model, the PAR values are calculated
for consonants. At this point recognition is performed using a brute-force
search mechanism to derive PAR trajectories and subsequently dynamic
programming to obtain a phone sequence. The results are not as good
as when hidden Markov modelling is used, but they are very promising
given the early stage of the experiments and the novelty of the approach.
Authors:
Hiroyuki Kamata, Meiji University (Japan)
Akira Kaneko, Meiji University (Japan)
Yoshihisa Ishida, Meiji University (Japan)
Paper number 1016
Abstract:
We propose a new method for emphasizing the periodicity of the voice
waveform using chaotic neurons, and a practical method to detect the
fundamental frequency of the human voice. The chaotic neuron is a kind
of nonlinear recursive mapping proposed in the field of nonlinear
theory and is usually used to generate chaotic signals. Considered from
the standpoint of linear signal processing, however, the chaotic neuron
can be interpreted as a first-order positive-feedback IIR digital
filter, and it therefore imposes a spectral slope on the spectrum of
the input speech signal. In this study, we tune the chaotic neuron to
amplify the low-frequency components so as to emphasize the fundamental
frequency component. As a result, the spectral peaks due to the formants
are cancelled, and the spectral peak corresponding to the fundamental
frequency of voiced speech can be detected easily. In addition, a
nonlinear function with a dead band is included in the feedback loop
of the chaotic neuron; as a consequence, the noise components
of unvoiced speech are not amplified.
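A sketch of the linear-filter reading of the chaotic neuron given
above: a first-order positive-feedback recursion with a dead-band
nonlinearity in the loop. The gain and dead-band width are illustrative
values, not the authors' tuning:

import numpy as np

def dead_band(v, width=0.05):
    # Zero out small feedback values so low-level noise is not amplified.
    return np.where(np.abs(v) > width, v, 0.0)

def chaotic_neuron_filter(x, k=0.9):
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = x[n] + k * dead_band(y[n - 1])   # first-order positive feedback
    return y

fs = 8000
t = np.arange(0, 0.2, 1.0 / fs)
x = 0.1 * np.sin(2 * np.pi * 100 * t)   # low-frequency (F0-like) input
print(np.max(np.abs(chaotic_neuron_filter(x))))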
Authors:
Simon King, University of Edinburgh (U.K.)
Todd Stephenson, University of Edinburgh (U.K.)
Stephen Isard, University of Edinburgh (U.K.)
Paul Taylor, University of Edinburgh (U.K.)
Alex Strachan, University of Edinburgh (U.K.)
Paper number 557
Abstract:
We describe a speech recogniser which uses a speech production-motivated
phonetic-feature description of speech. We argue that this is a natural
way to describe the speech signal and offers an efficient intermediate
parameterisation for use in speech recognition. We also propose to
model this description at the syllable rather than phone level. The
ultimate goal of this work is to generate syllable models whose parameters
explicitly describe the trajectories of the phonetic features of the
syllable. We hope to move away from Hidden Markov Models (HMMs) of
context-dependent phone units. As a step towards this, we present a
preliminary system which consists of two parts: recognition of the
phonetic features from the speech signal using a neural network; and
decoding of the feature-based description into phonemes using HMMs.
Authors:
Jacques Koreman, University of the Saarland, Institute of Phonetics (Germany)
Bistra Andreeva, University of the Saarland, Institute of Phonetics (Germany)
William J. Barry, University of the Saarland, Institute of Phonetics (Germany)
Paper number 549
Abstract:
The hidden Markov modelling experiments presented in this paper show
that consonant identification results can be improved substantially
if a neural network is used to extract linguistically relevant information
from the acoustic signal before applying hidden Markov modelling. The
neural network - or in this case a combination of two Kohonen networks
- takes 12 mel-frequency cepstral coefficients, overall energy and
the corresponding delta parameters as input and outputs distinctive
phonetic features, like [±uvular] and [±plosive].
Not only does this preprocessing of the data lead to better consonant
identification rates, the confusions that occur between the consonants
are less severe from a phonetic viewpoint, as is demonstrated. One
reason for the improved consonant identification is that the acoustically
variable consonant realisations can be mapped onto identical phonetic
features by the neural network. This makes the input to hidden Markov
modelling more homogeneous and improves consonant identification. Furthermore,
by using phonetic features the neural network helps the system to focus
on linguistically relevant information in the acoustic signal.
[Figure 0549_01: Consonant confusion matrix for the hidden Markov modelling experiment using the mapping of acoustic parameters onto phonetic features.]
[Figure 0549_02: Consonant confusion matrix for the hidden Markov modelling experiment using acoustic parameters directly as input.]
Authors:
Hisao Kuwabara, Teikyo University of Science & Technology (Japan)
Paper number 34
Abstract:
Investigations have been made into the perceptual properties of CV
syllables excised from continuous speech spoken at three different
speaking rates. Fifteen short Japanese sentences were spoken by a male
speaker at 1) a fast speaking rate, 2) a normal rate, and 3) a slow
rate. The results reveal that individual syllables do not carry enough
phonetic information to be correctly identified, especially in the fast
speech. The average syllable identification rate is 35% for the fast
speech, 59% for the normal, and 86% for the slow. It has been found
that syllable perception depends almost entirely on consonant
identification.
Authors:
Joohun Lee, Dong-Ah Broadcasting College (Korea)
Ki Yong Lee, Soongsil University (Korea)
Paper number 296
Abstract:
In this paper, to estimate time-varying speech parameters with a
non-Gaussian excitation source, we use a robust sequential estimator
(RSE) based on the t-distribution and introduce a forgetting factor.
By using the RSE based on a t-distribution with a small number of
degrees of freedom, we can efficiently alleviate the effects of
outliers and obtain better parameter estimation performance. Moreover,
thanks to the forgetting factor, the proposed algorithm can estimate
accurate parameters under rapid variation of the speech signal.
Authors:
Christopher John Long, Loughborough University (U.K.)
Sekharajit Datta, Loughborough University (U.K.)
Paper number 802
Abstract:
In this paper, a new feature extraction methodology based on Wavelet
Transforms is examined which, unlike some conventional parameterisation
techniques, is flexible enough to cope with the broadly differing characteristics
of typical speech signals. A training phase is involved during which
the final classifier is invoked to associate a cost function (a proxy
for misclassification) with a given resolution. The subspaces are
then searched and pruned to provide a wavelet basis best suited to
the classification problem. Comparative results are given illustrating
some improvement over the Short-Time Fourier Transform using two differing
subclasses of speech.
Authors:
Hiroshi Matsumoto, Dept. of Electrical & Electronic Eng., Faculty of Eng., Shinshu University (Japan)
Yoshihisa Nakatoh, Multimedia Development Center, Matsushita Electric Industrial Co., Ltd. (Japan)
Yoshinori Furuhata, Dept. of Electrical & Electronic Eng., Faculty of Eng., Shinshu University (Japan)
Paper number 47
Abstract:
This paper proposes a simple and efficient time domain technique to
estimate an all-pole model on a mel-frequency axis (Mel-LPC), i.e.,
the bilinear-transformed all-pole model of Strube. Autocorrelation
coefficients on the mel-frequency axis are derived exactly, without any
approximation, by computing cross-correlation coefficients between the
speech signal and an all-pass-filtered version of it. This method
requires only twice the computational cost of conventional linear
prediction analysis. The recognition performance of mel-cepstral
parameters obtained by Mel-LPC analysis
is compared with those of conventional LP mel-cepstra and the mel-frequency
cepstrum coefficients (MFCC) through gender-dependent phoneme and word
recognition tests. The results show that the Mel-LPC cepstrum attains
a significant improvement in recognition accuracy over conventional
LP mel-cepstrum, and gives slightly higher accuracy for male speakers
and slightly lower accuracy for female speakers than MFCC.
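A hedged Python sketch of the warped-autocorrelation idea described
above: lag m is obtained by correlating the signal with its m-fold
all-pass-filtered copy, and the warped LP coefficients follow from the
usual normal equations. The warping factor alpha depends on the
sampling rate; the value below is illustrative:

import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def warped_autocorr(x, order, alpha=0.35):
    r = np.zeros(order + 1)
    y = x.copy()
    r[0] = np.dot(x, y)
    for m in range(1, order + 1):
        y = lfilter([-alpha, 1.0], [1.0, -alpha], y)   # one all-pass stage
        r[m] = np.dot(x, y)
    return r

def mel_lpc(x, order=12, alpha=0.35):
    r = warped_autocorr(x, order, alpha)
    return solve_toeplitz((r[:-1], r[:-1]), r[1:])     # warped LP coefficients

x = np.random.randn(400)   # one analysis frame (toy data)
print(mel_lpc(x))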
Authors:
Philip McMahon, The Queen's University of Belfast (Ireland)
Paul McCourt, The Queen's University of Belfast (Ireland)
Saeed Vaseghi, The Queen's University of Belfast (Ireland)
Paper number 315
Abstract:
This paper explores possible strategies for the recombination of independent
multi-resolution sub-band based recognisers. The multi-resolution
approach is based on the premise that additional cues for phonetic
discrimination may exist in the spectral correlates of a particular
sub-band, but not in another. Weights are derived via discriminative
training using the 'Minimum Classification Error' (MCE) criterion on
log-likelihood scores. Using this criterion the weights for correct
and competing classes are adjusted in opposite directions, thus conveying
the sense of enforcing separation of confusable classes. Discriminative
re-combination is shown to provide significant improvements for both phone
classification and continuous recognition tasks on the TIMIT database.
Weighted recombination of independent multi-resolution sub-band models
is also shown to provide robustness improvements in broadband noise.
Authors:
Yoram Meron, University of Tokyo (Japan)
Keikichi Hirose, University of Tokyo (Japan)
Paper number 416
Abstract:
Our goal is to develop a singing synthesis system, in which "singing
units" are automatically extracted from existing musical recordings
of a singer accompanied by a musical instrument (piano). This paper
concentrates on the problem of separating the singer from the accompaniment.
Existing separation methods require the knowledge of the exact frequencies
of the signals to be separated, and are prone to degrading the quality
of the separated signals. In this paper, we use the framework of the
sinusoidal modeling approach. We suggest the use of further sources
of information, available to the specific task: advance knowledge of
the music score, knowledge about the piano sound, and a relatively
large database of the piano and singer signals, which is used to build
a model of the piano sound. Results show that using musical score
information and piano note modeling can improve separation quality.
Authors:
Nobuaki Minematsu, Toyohashi Univ. of Tech. (Japan)
Seiichi Nakagawa, Toyohashi Univ. of Tech. (Japan)
Paper number 52
Abstract:
First, the correlation between spectral variations and F0 changes
within a vowel is analyzed, and the variations are compared to VQ
distortions calculated in a five-vowel space. It is shown that an F0
change of approximately half an octave produces a spectral variation
comparable to the VQ distortion when the codebook size equals the
number of vowels. Next, a model to predict the variations of the
cepstral coefficients caused by F0 changes is built using multivariate
regression analysis. Experiments show that the frame generated by the
model has a remarkably small distance to the target frame. Furthermore,
the model is evaluated separately as a spectral envelope predictor for
a given F0 and as a mapping function between feature subspaces. While
the models must be built phoneme- and speaker-dependently for the
former role, adequate selection of parameters enables speaker- and
phoneme-independent models to work effectively in the latter role.
Authors:
Partha Niyogi, Bell Labs - Lucent Technologies (USA)
Partha Mitra, Bell Labs - Lucent Technologies (USA)
Man Mohan Sondhi, Bell Labs - Lucent Technologies (USA)
Paper number 665
Abstract:
We consider the problem of detecting stop consonants in continuously
spoken speech. We pose the problem as one of finding the optimal filter
(linear or non-linear) that operates on a particular appropriately
chosen representation. We discuss the performance of several variants
of a canonical stop detector and consider its implications for human
and machine speech recognition.
Authors:
Climent Nadeu, Universitat Politècnica de Catalunya (Spain)
Fèlix Galindo, Universitat Politècnica de Catalunya (Spain)
Jaume Padrell, Universitat Politècnica de Catalunya (Spain)
Paper number 1135
Abstract:
Many speech recognition systems use logarithmic filter-bank energies
or a linear transformation of them to represent the speech signal.
Usually, each of those energies is routinely computed as a weighted
average of the periodogram samples that lie in the corresponding frequency
band. In this work, we attempt to gain insight into the statistical
properties of the frequency-averaged periodogram (FAP), of which those
energies are samples. We show that the FAP is statistically and
asymptotically equivalent to a multiwindow estimator that arises from
Thomson's optimization approach and uses orthogonal sinusoids as
windows. The FAP and other multiwindow estimators are tested in a
speech recognition application, observing the influence of several
design factors. In particular, a technique that is computationally as
simple as the FAP, and which is equivalent to using multiple cosine
windows, appears to be an alternative worth taking into consideration.
Authors:
Munehiro Namba, Meiji University (Japan)
Yoshihisa Ishida, Meiji University (Japan)
Paper number 55
Abstract:
In this paper, a wavelet-transform-domain realization of the blind
equalization technique termed EVA is applied to speech analysis. The
conventional linear prediction problem can be viewed as a constrained
blind equalization problem. Because EVA does not impose any restriction
on the probability distribution of the input (the glottal excitation),
the principal features of speech can be effectively separated from a
short stretch of speech. Computational complexity could be a problem,
but the proposed implementation in the wavelet transform domain
promotes faster convergence in the analysis of speech signals.
Authors:
Steve Pearson, Panasonic Technologies, Inc./Speech Technology Lab (USA)
Paper number 647
Abstract:
This paper presents a class of methods for automatically extracting
formant parameters from speech. The methods rely on an iterative optimization
algorithm. It was found that formant parameter data derived with these
methods were less prone to discontinuity errors than data from conventional
methods.
Also, experiments were conducted that demonstrated that these methods
are capable of better accuracy in formant estimation than LPC, especially
for the first formant. In some cases, the analytic (non-iterative)
solution has been derived, making real time applications feasible.
The main target that we have been pursuing is text-to-speech (TTS)
conversion. These methods are being used to automatically analyze a
concatenation database, without the need for a tuning phase to fix
errors. In addition, they are instrumental in realizing high quality
pitch tracking, and pitch epoch marking.
Authors:
António J. Araújo, FEUP/INESC (Portugal)
Vitor C. Pera, FEUP (Portugal)
Marcio N. Souza, UFRJ (Brazil)
Paper number 319
Abstract:
For a real-time application of an automatic speech recognition system,
hardware acceleration can be the key to reducing execution time.
Vector quantization is an important task that a recognizer based on
discrete hidden Markov models must perform. Due to the amount of floating
point operations executed, the vector quantizer is an excellent candidate
to be accelerated by customized hardware. The design, implementation,
and results of a hardware solution based on field-programmable gate
array devices are presented.
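For concreteness, the computation being accelerated is the nearest-codeword
search of vector quantization; a toy-sized Python sketch:

import numpy as np

def quantize(features, codebook):
    # Squared Euclidean distances, (frames x codewords), then nearest index.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d, axis=1)

codebook = np.random.randn(256, 12)   # 256 codewords of dimension 12
features = np.random.randn(50, 12)    # 50 input frames
print(quantize(features, codebook)[:10])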
Authors:
Hartmut R. Pfitzinger, Department of Phonetics, University of Munich (Germany)
Paper number 523
Abstract:
This investigation focuses on deriving local speech rate, which
differs from syllable rate and from phone rate, directly from the
speech signal. Since local speech rate modifies acoustic cues (e.g.
transitions), phones, and even words, it is one of the most important
prosodic cues. Our local speech rate estimation method is based on a
linear combination of the syllable rate and the phone rate, since this
investigation strongly suggests that neither the syllable rate nor the
phone rate on its own represents the speech rate sufficiently. Our
results show
(a) that perceptual local speech rate correlates better with local
syllable rate than with local phone rate (r=0.81>r=0.73), (b) that
the linear combination of both is well-correlated with perceptual local
speech rate (r=0.88), and (c) that it is now possible to calculate
the perceptual local speech rate with the aid of automatic phone boundary
detectors and syllable nuclei detectors directly from the speech signal.
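The linear combination can be fitted by ordinary least squares; a small
Python sketch with synthetic stand-in data (the coefficients below are
invented for illustration, not the paper's):

import numpy as np

syl_rate = np.random.rand(100) * 6        # syllables/s, toy values
phone_rate = np.random.rand(100) * 15     # phones/s, toy values
perceived = 0.6 * syl_rate + 0.2 * phone_rate + np.random.randn(100) * 0.3

A = np.column_stack([syl_rate, phone_rate, np.ones_like(syl_rate)])
coef, *_ = np.linalg.lstsq(A, perceived, rcond=None)
print("a=%.2f b=%.2f c=%.2f" % tuple(coef))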
Authors:
Solange Rossato, Institut de la Communication Parlée de Grenoble (France)
Gang Feng, Institut de la Communication Parlée de Grenoble (France)
Rafaël Laboissière, Institut de la Communication Parlée de Grenoble (France)
Paper number 540
Abstract:
For nasal vowels, a gesture as simple as the lowering of the velum
produces complex acoustic spectra. However, we still find a relative
simplicity in the perceptual space; nasality is perceived easily. In
this preliminary study, we use a statistical method to recover the
gesture of the velum. In order to reduce the extreme variability of
nasal vowels,
we introduced a simulation based on Maeda's model instead of using
a natural speech signal. In previous studies, nasality was supposed to
increase either with the size of the nasal area or with the area ratio
between the nasal and oral tracts at the extremity of the velum. In
this work, both types of data are considered and analyzed with linear
and non-linear tools. Finally, statistical inference is described and
results
are given for various areas of the nasal tract entrance and for various
area ratios. The results show that velar port area is correctly estimated
for small values while area ratio is a better parameter when velar
port area increases.
Authors:
Guenther Ruske, Institute for Human-Machine-Communication, Technical University of Munich (Germany)
Robert Faltlhauser, Institute for Human-Machine-Communication, Technical University of Munich (Germany)
Thilo Pfau, Institute for Human-Machine-Communication, Technical University of Munich (Germany)
Paper number 100
Abstract:
Speech recognition systems based on hidden Markov models (HMM) favourably
apply a linear discriminant analysis transform (LDA) which yields low-dimensional
and uncorrelated feature components. However, since the distributions
in the HMM states are usually modeled by Gaussian mixture densities,
a description by second-order moments is no longer adequate. To address
this, we introduce a new "extended linear discriminant analysis"
transform (ELDA), which starts from conventional LDA. The ELDA transform
is derived by use of a gradient descent optimization procedure based
on a "minimum classification error" (MCE) principle, which is applied
to the original high-dimensional pattern space. The transform matrix,
the best fitting prototype of the correct class (i.e. HMM state) and
the nearest rival are adapted. We developed a method which additionally
updates all prototypes by a separate maximum likelihood (ML) estimation
step. This prevents means and covariances that remain mostly unaffected
by the MCE procedure from diverging step by step.
Authors:
Ara Samouelian, University Of Wollongong (Australia)
Jordi Robert-Ribes, Digital Media Information Systems, CSIRO Mathematical and Information Sciences (Australia)
Mike Plumpe, Microsoft Corporation (USA)
Paper number 620
Abstract:
Speech processing can be of great help for indexing and archiving TV
broadcast material. Broadcasting station standards will soon be
digital, and there will be a huge increase in the use of speech
processing techniques
for maintaining the archives as well as accessing them. We present
an application of information theory to the classification and automatic
labelling of TV broadcast material into speech, music and noise. We
use information theory to construct a decision tree from several different
TV programs and then apply it to a different set of TV programs. We
present classification results on training and test data sets. The
frame-level correct classification rate for training data was 95.5%,
while for test data it ranged from 60.4% to 84.5%, depending on TV
program type. At the segment level, the correct recognition rate and
accuracy on training data were 100% and 95.1%, respectively, while for
test data the percent correct ranged from 80% to 100% and the percent
accuracy ranged from 64.7% to 100%.
Authors:
Jean Schoentgen, Université Libre de Bruxelles (Belgium)
Alain Soquet, Université Libre de Bruxelles (Belgium)
Véronique Lecuit, Université Libre de Bruxelles (Belgium)
Sorin Ciocea, Université Libre de Bruxelles (Belgium)
Paper number 1104
Abstract:
The objective is to present a formalism which offers a framework for
several articulatory models and notions such as targets, gestures and
the quantal principle of speech production. The formalism is based
on coupled differential equations that relate the vocal tract shape
to its eigenfrequencies. The shape of the vocal tract is described
either directly by means of an area function model or indirectly via
an articulatory model. Possible synergetic relations between phonetic
gestures or targets and the quantal principle of speech production
are discussed.
Authors:
Youngjoo Suh, ETRI (Korea)
Kyuwoong Hwang, ETRI (Korea)
Oh-Wook Kwon, ETRI (Korea)
Jun Park, ETRI (Korea)
Paper number 638
Abstract:
We propose a new approach to improving the performance of speech
recognizers by utilizing acoustic-phonetic knowledge sources. We use
the unvoiced, voiced, and silence (UVS) group information of the input
speech signal in a conventional speech recognizer. We extract the UVS
information using a recurrent neural network (RNN), generate a
rule-based score, and then add the score representing the UVS
information to the conventional spectral feature-driven score in the
search module. Experimental results showed that the approach reduces
errors by 9% in a 5000-word Korean spontaneous speech recognition
domain.
Authors:
C. William Thorpe, National Voice Centre (Australia)
Paper number 244
Abstract:
A subtractive deconvolution algorithm is described which allows one
to separate a voiced speech signal into two components, representing
the time-invariant and dynamic parts of the signal respectively. The
resulting dynamic component can be encoded at a lower data rate than
can the original speech signal. Results are presented which validate
the utility of decomposing the speech waveform into these two components,
and demonstrate the ability of the algorithm to represent speech signals
at a reduced data rate.
Authors:
Hesham Tolba, INRS-Telecommunications (Canada)
Douglas O'Shaughnessy, INRS-Telecommunications (Canada)
Paper number 343
Abstract:
This study presents a novel technique to reconstruct the missing frequency
bands of bandlimited telephone speech signals. This technique is based
on the Amplitude and Frequency Modulation (AM-FM) model, which models
the speech signal as the sum of N successive AM-FM signals. Based on
a least-mean-square error criterion, each AM-FM signal is modified
using an iterative algorithm in order to regenerate the high-frequency
AM-FM signals. These modified signals are then combined in order to
reconstruct the broadband speech signal. Experiments were conducted
using speech signals extracted from the NTIMIT database. These
experiments demonstrate the ability of the algorithm to recover speech,
as assessed by a comparison between the original and synthesized speech
and by informal listening tests.
Authors:
Chang-Sheng Yang, Utsunomiya University (Japan)
Hideki Kasuya, Utsunomiya University (Japan)
Paper number 1143
Abstract:
In this paper, a high quality pole-zero speech analysis technique is
proposed. The speech production process is represented by a source-filter
model. A Rosenberg-Klatt model is used to approximate a voicing source
waveform for voiced speech, whereas white noise is assumed for unvoiced speech.
The vocal tract transfer function is represented by a pole-zero filter.
For voiced speech, parameters of the source model are jointly estimated
with those of the vocal tract filter. A combined algorithm is developed
to estimate the vocal tract parameters, i.e., formants and anti-formants
which are calculated from the poles and zeros of the filter. In this
algorithm, poles are estimated using a subspace method, while
zeros are estimated from the amplitude spectrum. For unvoiced speech,
an AR model is assumed, which can be solved by LPC analysis. An experiment
using synthesized nasal sounds shows that the poles and zeros are estimated
quite accurately.
Authors:
Fang Zheng, Tsinghua University (China)
Zhanjiang Song, Tsinghua University (China)
Ling Li, Tsinghua University (China)
Wenjian Yu, Tsinghua University (China)
Fengzhou Zheng, Tsinghua University (China)
Wenhu Wu, Tsinghua University (China)
Paper number 171
Abstract:
The Line Spectrum Pair (LSP) representation, based on the principle of
linear predictive coding (LPC), plays a very important role in speech
synthesis and has many interesting properties. Several well-known
speech compression/decompression algorithms, including code-excited
linear predictive coding (CELP), are based on LSP analysis, where the
information loss or prediction errors are often very small due to the
LSP's characteristics. Unfortunately, no satisfactory distance measure
has so far been available for LSPs that would allow these features to
be used in speech recognition applications. In this paper, the
principle of LSP analysis is first studied, and then several distance
measures for LSPs are proposed which describe very well the difference
between two groups of different LSP parameters. Experimental results
are also given to show the efficiency of the proposed distance
measures.
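As a hedged illustration of the kind of measure at issue (not one of
the paper's proposed measures), a weighted Euclidean LSP distance that
emphasises closely spaced LSP pairs, which correspond to spectral peaks:

import numpy as np

def lsp_distance(lsp_a, lsp_b):
    gaps = np.diff(lsp_a, prepend=0.0)     # narrow gaps flag formant regions
    w = 1.0 / (gaps + 1e-3)
    return np.sqrt(np.sum(w * (lsp_a - lsp_b) ** 2))

lsp_a = np.sort(np.random.rand(10)) * np.pi   # toy 10th-order LSP frequencies
lsp_b = np.sort(np.random.rand(10)) * np.pi
print(lsp_distance(lsp_a, lsp_b))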