Session Th4B Auditory Modelling and Psychoacoustics

Chairperson William Ainsworth Keele Univ., UK


IMPROVING OF AMPLITUDE MODULATION MAPS FOR F0-DEPENDENT SEGREGATION OF HARMONIC SOUNDS

Authors: Frederic BERTHOMMIER* & Georg MEYER+

* Institut de la Communication Parlée/INPG 46, Av. Félix Viallet 38031 Grenoble CEDEX, FRANCE bertho@icp.grenet.fr + Department of Computer Science Keele University Keele, Staffs. ST5 5BG, UK georg@cs.keele.ac.uk

Volume 5 pages 2483 - 2486

ABSTRACT

The AM-map model [1] can be improved by adding two supplementary integration stages: the pooled map and the identification map. The pooled map's representation corresponds to a systematic bottom-up grouping of the first harmonics extracted at the level of the primary AM map. The identification map's representation corresponds to a classification of spectra segregated along the pitch axis. This labelling allows selection, at the pooled map level, of the two salient vowels according to the distribution of energy across the pitch axis. The selected labels are those associated with the highest peaks. During this selection stage, F0s are not given. Simulations show that the model is able to separate spectra according to F0 differences. The model therefore predicts qualitatively (1) the ability of listeners to segregate concurrent vowels, and (2) the effects of vowel duration and relative level on segregation performance.
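The selection step described in the abstract (keeping the labels attached to the highest energy peaks along the pitch axis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_two_labels`, the list-based representation of the pooled map's energy profile, the simple local-maximum rule, and the toy data are all assumptions made for illustration.

```python
def select_two_labels(energy, labels):
    """Return the labels found at the two highest local peaks of `energy`.

    `energy[i]` is a hypothetical energy value for pitch channel i, and
    `labels[i]` is the hypothetical vowel label assigned to that channel.
    """
    # A channel is a local peak if it rises above its left neighbour
    # and is at least as high as its right neighbour.
    peaks = [
        i for i in range(1, len(energy) - 1)
        if energy[i] > energy[i - 1] and energy[i] >= energy[i + 1]
    ]
    # Rank peaks by energy, highest first, and keep the top two.
    peaks.sort(key=lambda i: energy[i], reverse=True)
    return [labels[i] for i in peaks[:2]]


# Toy data (invented): energy across eight pitch channels and their labels.
energy = [0.1, 0.8, 0.2, 0.1, 0.5, 0.3, 0.9, 0.2]
labels = ["a", "i", "i", "u", "e", "e", "o", "o"]
selected = select_two_labels(energy, labels)
```

Note that no F0 value is passed in: the sketch picks labels purely from the shape of the energy distribution, which mirrors the abstract's statement that F0s are not given during selection.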

A0112.pdf


PSYCHOPHYSICAL EVALUATION OF PSOLA: NATURAL VERSUS SYNTHETIC SPEECH

Authors: R. Kortekaas and A. Kohlrausch

IPO Center for Research on User-System Interaction P.O. Box 513 - 5600 MB Eindhoven - The Netherlands E-mail: kortekaa@ipo.tue.nl kohlraus@ipo.tue.nl

Volume 5 pages 2487 - 2490

ABSTRACT

This paper presents the results of psychophysical experiments dealing with pitch-marker positioning within the Pitch Synchronous OverLap and Add (PSOLA) framework. Sustained natural vowels were PSOLA-modified in fundamental frequency. The experiments were aimed at determining the auditory sensitivity to (1) deterministic shifts of either all or single pitch markers within a sequence, and (2) random shifts of all pitch markers ("jitter"). For deterministic shifts of all pitch markers, the results were in reasonable agreement with results obtained previously for synthetic formant signals. For deterministic shifts of single pitch markers, thresholds depended on position in the sequence. Detection thresholds for jittered shifts were comparable to thresholds for detecting jitter in pulse trains. The ranking of the thresholds for these three conditions indicated that the auditory system is more sensitive to dynamic (modulation) cues than to static (timbral) cues arising from shifts in pitch-marker positioning.
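The three marker manipulations named in the abstract can be sketched as operations on a list of pitch-marker times. This is an illustrative sketch only: the function names, the use of seconds as the time unit, the regular marker spacing, and the Gaussian jitter distribution are assumptions, since the abstract does not specify how the shifts were generated.

```python
import random


def regular_markers(f0_hz, n, start_s=0.0):
    """Hypothetical helper: n pitch markers spaced one period (1/f0) apart."""
    period = 1.0 / f0_hz
    return [start_s + k * period for k in range(n)]


def shift_all(markers, delta_s):
    """Deterministic shift of every marker by the same offset."""
    return [t + delta_s for t in markers]


def shift_single(markers, index, delta_s):
    """Deterministic shift of one marker; all others stay in place."""
    out = list(markers)
    out[index] += delta_s
    return out


def jitter(markers, sd_s, seed=0):
    """Random shift of every marker (Gaussian jitter is an assumption)."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, sd_s) for t in markers]


# Toy example: five markers at F0 = 100 Hz (10 ms period).
markers = regular_markers(100.0, 5)
all_shifted = shift_all(markers, 0.002)
one_shifted = shift_single(markers, 2, 0.001)
jittered = jitter(markers, 0.0005, seed=1)
```

A uniform shift leaves the intervals between markers unchanged, whereas a single-marker shift or jitter alters them. This is one way to picture the abstract's contrast between static and dynamic cues.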

A0319.pdf


Perception of noised words by normal children and children with speech and language impairments

Authors: V.V. Lublinskaja, I.V. Koroleva, A.N. Kornev, E.V. Iagounova

I.P. Pavlov Institute of Physiology 199034, St.-Petersburg, nab. Makarova, 6 E-mail: chi@physiology.spb.su Research Institute of Ear, Throat, Nose and Speech Pathology 198103, St.-Petersburg, Bronneckaya, 9 E-mail: vigarb@thewall.ioffe.rssi

Volume 5 pages 2491 - 2494

ABSTRACT

The recognition of noised words by 4- to 7-year-old children with normal speech (NS) and with speech and language impairments (SLI) was studied. It was shown that children in both groups made more errors and had longer reaction times than adults. Moreover, SLI children performed worse than NS children. In both groups, older (> 5 years) children recognized noised words better than younger (< 5 years) ones. NS children perceived words acquired at an early age with fewer errors than words acquired at a later age. The relations between the development of speech perception, noise resistance and speech production are discussed.

A0452.pdf


MODELLING THE PERCEPTION OF SIMULTANEOUS SEMI-VOWELS

Authors: G.F. Meyer (1) and W.A. Ainsworth (2)

Human and Machine Perception Research Centre Dept of Computer Science (1) and Dept of Communication and Neuroscience (2) Keele University Keele, Staffs., ST5 5BG, UK Tel ++44 1782 584111, Fax ++44 1782 713082, email {georg|bill}@cs.keele.ac.uk

Volume 5 pages 2495 - 2498

ABSTRACT

A model that is able to predict human performance in a simultaneous glide recognition task is described. The model combines a primitive, F0-guided segregation stage and a schema-driven stage with a heuristic that models whether listeners perceive one sound or two simultaneous sounds.

A0476.pdf


PROPERTIES OF AUDITORY MODEL REPRESENTATIONS

Authors: Fernando S. Perdigão; Luís V. Sá

Dept. Eng. Electrotécnica & Instituto de Telecomunicações Polo II Univ. Coimbra, University of Coimbra, 3030 Coimbra - Portugal E-mail: fp@it.uc.pt, luis.sa@it.uc.pt

Volume 5 pages 2499 - 2502

ABSTRACT

We address the problem of robustness of auditory models as front ends for speech recognition. Auditory models have been reported to be superior front ends when speech is corrupted by noise or linear filtering, but there is not yet a deep understanding of how they function. We analyze some commonly used auditory models and show that they present some interesting properties which are useful for robust speech recognition. In our view, the short-time adaptation provided by hair cell models is a key factor for this robustness. A disadvantage of auditory models is that the distributions of the obtained features are not well represented by Gaussian pdfs. We discuss the problem of parameter transformation in order to use a standard recognizer based on CDHMMs with Gaussian pdfs and present some digit recognition experiments.

A0669.pdf


Impact of "ascending sequence" AI (auditory primary cortex) cells on stop consonant perception.

Authors: Eduardo Sá Marta, Luis Vieira de Sá

email: EMARTA@IT.UC.PT Dep. Engenharia Electrotécnica, FCTUC (Universidade de Coimbra) Instituto de Telecomunicações - Pólo de Coimbra

Volume 5 pages 2503 - 2506

ABSTRACT

The existence of multiple information carriers for a single phonemic distinction is evident in studies of auditory and visual information integration for speech perception. Given the highly non-homogeneous nature of the auditorily-represented information carriers, we are applying the same principle within the auditory domain. Based on psychophysical experiments we have hypothesized that firing of "ascending sequence" cells in the primary auditory cortex is a primary information carrier for LABIAL place in stop-consonant discrimination. Partial implementation of a fuzzy-logic model for the firing of these cells, combined with a model for one other, secondary, information carrier, has yielded 1% errors in discrimination of /p/ vs. /t/ or /k/ in an "E-set" Portuguese research CV database. Exactly the same partial model, applied to /b/ vs. /d/ discrimination in an American English spelled letters database (ISOLET-1), yielded just 5% errors, providing strong evidence for the role of these cells in stop consonant discrimination across languages.

A0770.pdf