ABSTRACT
The AM-map model [1] can be improved by adding two supplementary integration stages: the pooled map and the identification map. The pooled map's representation corresponds to a systematic bottom-up grouping of the first harmonics extracted at the level of the primary AM map. The identification map's representation corresponds to a classification of spectra segregated along the pitch axis. This labelling allows selection, at the pooled map level, of the two salient vowels according to the distribution of energy across the pitch axis. The selected labels are those associated with the higher peaks. During this selection stage, F0s are not given. Simulations show that the model is able to separate spectra according to F0 differences. The model therefore predicts qualitatively (1) the ability of listeners to segregate concurrent vowels, and (2) the effects of vowels' duration and relative level on segregation performance.
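As an illustration only, the selection step described above can be sketched as picking the labels attached to the two highest local maxima of an energy profile along the pitch axis. The function name `select_two_vowels`, the list-based representation of the pooled map, and the simple local-maximum rule are assumptions made for this sketch; the abstract does not specify how peaks are detected.

```python
def select_two_vowels(energy, labels):
    """Return the labels at the two highest local maxima of `energy`.

    `energy[i]` is a hypothetical pooled-map energy for pitch channel i and
    `labels[i]` is the vowel label the identification map assigned to it.
    No F0 values are passed in: the choice relies on the energy profile only.
    """
    if len(energy) != len(labels):
        raise ValueError("energy and labels must have the same length")

    # A channel is a peak if it rises above its left neighbour and is not
    # lower than its right neighbour (assumed rule, interior channels only).
    peaks = [
        i
        for i in range(1, len(energy) - 1)
        if energy[i] > energy[i - 1] and energy[i] >= energy[i + 1]
    ]

    # Keep the two peaks with the most energy, reported in pitch-axis order.
    strongest = sorted(peaks, key=lambda i: energy[i], reverse=True)[:2]
    return [labels[i] for i in sorted(strongest)]


# Toy profile with two clear peaks, at channels 1 and 4.
toy_energy = [0.1, 0.9, 0.2, 0.3, 0.7, 0.1]
toy_labels = ["a", "a", "i", "u", "u", "e"]
selected = select_two_vowels(toy_energy, toy_labels)
```

In this toy profile the two peaks carry the labels "a" and "u", so those are the labels selected.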
ABSTRACT
This paper presents the results of psychophysical experiments dealing with pitch-marker positioning within the Pitch Synchronous OverLap and Add (PSOLA) framework. Sustained natural vowels were PSOLA-modified in fundamental frequency. The experiments were aimed at determining the auditory sensitivity to (1) deterministic shifts of either all or single pitch markers within a sequence, and (2) random shifts of all pitch markers ("jitter"). For deterministic shifts of all pitch markers, the results were in reasonable agreement with results obtained previously for synthetic formant signals. For deterministic shifts of single pitch markers, thresholds depended on position in the sequence. Detection thresholds for jittered shifts were comparable to thresholds for detecting jitter in pulse trains. The ranking of the thresholds for these three conditions indicated that the auditory system is more sensitive to dynamic (modulation) cues than to static (timbral) cues arising from shifts in pitch-marker positioning.
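The three marker manipulations named above can be pictured with a minimal sketch that operates on a list of pitch-marker times. All function names, the constant-F0 marker grid, and the Gaussian distribution used for the jitter are assumptions made for illustration; the abstract does not state how the shifts were generated, and the sketch leaves out the PSOLA resynthesis itself.

```python
import random


def pitch_markers(f0_hz, n_markers):
    """Evenly spaced marker times in seconds for a constant F0 (assumed)."""
    period = 1.0 / f0_hz
    return [i * period for i in range(n_markers)]


def shift_all(markers, shift_s):
    """Deterministic shift: move every marker by the same amount."""
    return [t + shift_s for t in markers]


def shift_single(markers, index, shift_s):
    """Deterministic shift of one marker; all others stay in place."""
    shifted = list(markers)
    shifted[index] += shift_s
    return shifted


def jitter_all(markers, sd_s, seed=0):
    """Random shift ("jitter") of every marker.

    Gaussian jitter with standard deviation `sd_s` is an assumption here.
    """
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, sd_s) for t in markers]


base = pitch_markers(100.0, 5)           # 10 ms period
moved_all = shift_all(base, 0.001)       # all markers 1 ms later
moved_one = shift_single(base, 2, 0.001) # only the third marker moves
jittered = jitter_all(base, 0.0002)      # 0.2 ms standard deviation
```

Shifting all markers keeps the intervals between them constant, which matches the static (timbral) cue in the abstract, whereas single-marker shifts and jitter change the intervals from period to period.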
ABSTRACT
The recognition of words presented in noise by 4- to 7-year-old children with normal speech (NS) and with speech and language impairments (SLI) was studied. Children in both groups made more errors and had longer reaction times than adults. Moreover, SLI children performed worse than NS children. In both groups, older (> 5 years) children recognized words in noise better than younger (< 5 years) ones. NS children perceived words acquired at an early age with fewer errors than words acquired at a later age. The relations between the development of speech perception, noise resistance, and speech production are discussed.
ABSTRACT
A model that predicts human performance in a simultaneous glide recognition task is described. The model combines a primitive, F0-guided segregation stage and a schema-driven stage with a heuristic that models whether listeners perceive one sound or two simultaneous sounds.
ABSTRACT
We address the problem of robustness of auditory models as front ends for speech recognition. Auditory models have been reported to be superior front ends when speech is corrupted by noise or linear filtering, but their functioning is not yet deeply understood. We analyze some commonly used auditory models and show that they have properties that are useful for robust speech recognition. In our view, the short-time adaptation provided by hair cell models is a key factor in this robustness. A disadvantage of auditory models is that the distributions of the resulting features are not well represented by Gaussian pdfs. We discuss the problem of parameter transformation in order to use a standard recognizer based on CDHMMs with Gaussian pdfs, and present some digit recognition experiments.
ABSTRACT
The existence of multiple information carriers for a single phonemic distinction is evident in studies of auditory and visual information integration for speech perception. Given the highly non-homogeneous nature of the auditorily represented information carriers, we are applying the same principle within the auditory domain. Based on psychophysical experiments, we have hypothesized that firing of "ascending sequence" cells in the primary auditory cortex is a primary information carrier for LABIAL place in stop-consonant discrimination. Partial implementation of a fuzzy-logic model for the firing of these cells, combined with a model for one other, secondary, information carrier, has yielded 1% errors in discrimination of /p/ vs. /t/ or /k/ in an "E-set" Portuguese research CV database. Exactly the same partial model, applied to /b/ vs. /d/ discrimination in an American English spelled-letters database (ISOLET-1), yielded just 5% errors, providing strong evidence for the role of these cells in stop-consonant discrimination across languages.