Human Speech Perception 4

Orthografik Inkoncistensy Ephekts in Foneme Detektion?

Authors:

Anne Cutler, Max Planck Institute for Psycholinguistics (The Netherlands)
Rebecca Treiman, Wayne State University (USA)
Brit van Ooijen, Laboratoire de Sciences Cognitives et Psycholinguistique CNRS (France)

Page (NA) Paper number 94

Abstract:

The phoneme detection task is widely used in spoken word recognition research. Alphabetically literate participants, however, are more used to explicit representations of letters than of phonemes. The present study explored whether phoneme detection is sensitive to how target phonemes are, or may be, orthographically realised. Listeners detected the target sounds [b,m,t,f,s,k] in word-initial position in sequences of isolated English words. Response times were faster to the targets [b,m,t], which have consistent word-initial spelling, than to the targets [f,s,k], which are inconsistently spelled, but only when listeners' attention was drawn to spelling by the presence in the experiment of many irregularly spelled fillers. Within the inconsistent targets [f,s,k], there was no significant difference between responses to targets in words with majority and minority spellings. We conclude that performance in the phoneme detection task is not necessarily sensitive to orthographic effects, but that salient orthographic manipulation can induce such sensitivity.
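
The consistency effect reported here is, computationally, a comparison of mean detection latencies across target sets. Purely as an illustration, with invented response times and a generic two-sample t-test standing in for whatever analysis the authors actually ran:

```python
# Illustrative only: comparing mean phoneme-detection latencies for the
# consistently spelled targets [b,m,t] against the inconsistently spelled
# [f,s,k]. All response times below are invented placeholders, and the
# two-sample t-test is a generic stand-in, not the paper's analysis.
from statistics import mean
from scipy import stats

consistent_rts = [412, 398, 405, 431, 388, 420]    # ms, targets [b,m,t]
inconsistent_rts = [455, 470, 442, 468, 459, 481]  # ms, targets [f,s,k]

t, p = stats.ttest_ind(consistent_rts, inconsistent_rts)
print(f"consistent mean = {mean(consistent_rts):.0f} ms, "
      f"inconsistent mean = {mean(inconsistent_rts):.0f} ms")
print(f"t = {t:.2f}, p = {p:.3f}")
```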

SL980094.PDF (From Author) SL980094.PDF (Rasterized)

The Effect of Orthographic Knowledge on the Segmentation of Speech

Authors:

Bruce L. Derwing, University of Alberta (Canada)
Terrance M. Nearey, University of Alberta (Canada)
Yeo Bom Yoon, Seoul National University of Education (Korea)

Page (NA) Paper number 978

Abstract:

This study is part of a cross-linguistic investigation of phonological units, with emphasis on such ostensible universals as the segment (C or V), the syllable (e.g., CVC), and the rime (VC). Prior work has indicated that speakers of some languages (e.g., Chinese) may not segment words into units smaller than the whole syllable, while in other languages (e.g., Korean and Japanese) units called the body (CV) or the mora may supplant the rime. However, the native speakers tested so far were all relatively well educated, literate, and often bilingual; they had thus all been exposed to writing systems that might have influenced their performance. Since knowledge of orthography was not controlled in previous studies, the present research tests speakers of English and Korean who have not yet come under the influence of spelling. These include preliterate children, who are the focus of the study.
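
The competing sub-syllabic units at issue can be made concrete with a toy decomposition. The sketch below is an assumption-laden illustration (orthographic vowels, a single-vowel CVC input), not material from the study:

```python
# Illustrative only: splitting a simple CVC string into the competing
# sub-syllabic units named in the abstract. The vowel set and example
# word are simplifying assumptions, not materials from the study.
VOWELS = set("aeiou")

def subunits(syllable: str) -> dict:
    """Decompose a CVC syllable into onset (C), body (CV), and rime (VC)."""
    v = next(i for i, ch in enumerate(syllable) if ch in VOWELS)
    return {
        "onset": syllable[:v],      # C
        "body": syllable[:v + 1],   # CV, the unit favoured in Korean
        "rime": syllable[v:],       # VC, the unit favoured in English
    }

print(subunits("cat"))  # {'onset': 'c', 'body': 'ca', 'rime': 'at'}
```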

SL980978.PDF (From Author) SL980978.PDF (Rasterized)

Spotting (Different Types of) Words in (Different Types of) Context

Authors:

James M. McQueen, Max Planck Institute for Psycholinguistics (The Netherlands)
Anne Cutler, Max Planck Institute for Psycholinguistics (The Netherlands)

Page (NA) Paper number 33

Abstract:

Results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
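
The viability constraint the results support lends itself to a one-line formalisation: a residue context is a possible word only if it contains a vowel. A minimal sketch, using orthographic vowels as a stand-in for Dutch phonology:

```python
# A minimal sketch of the possible-word constraint the results support:
# a nonsense context is a viable residue only if it contains a vowel and
# so could itself be a word. Orthographic vowels stand in for Dutch
# phonological vowels, a deliberate simplification.
VOWELS = set("aeiou")

def viable_residue(context: str) -> bool:
    """True if the context contains at least one vowel (a possible word)."""
    return any(ch in VOWELS for ch in context)

print(viable_residue("s"))    # False: purely consonantal context
print(viable_residue("es"))   # True: not a word, but a possible one
```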

SL980033.PDF (From Author) SL980033.PDF (Rasterized)

Correlation Between Consonantal VC Transitions And Degree Of Perceptual Confusion Of Place Contrast In Hindi

Authors:

Manjari Ohala, San José State University (USA)
John J. Ohala, University of California, Berkeley (USA)

Page (NA) Paper number 238

Abstract:

A previous study of Hindi VC transitions revealed that the language's five places of stop articulation exhibit considerable contextual variability. We examined whether these VC formant patterns nevertheless contain sufficient cues to differentiate place, or whether the cues at the stop release are also required. Twenty-one listeners were asked to identify the final stop in syllables with varying place and preceding vowel, with and without the final stop release. Judgments of place were 86% correct with the stop releases present but only 63% correct without them. In the gated (release-removed) condition the responses showed a marked asymmetry: stops that normally have weak bursts (labial and dental) were most often correctly identified but were also the favoured erroneous response for the other stops. The results further suggest a qualification to Steriade's claim that the VC place cues of retroflex stops are more robust than their release cues: this may be true after low vowels but is not true after /i/.
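
For readers unfamiliar with the measure, percent-correct place identification is simply the diagonal mass of a stimulus-response confusion matrix. A toy illustration with invented counts (only the 86%/63% figures come from the abstract):

```python
# Toy arithmetic only: percent-correct place identification is the
# diagonal mass of a stimulus-response confusion matrix. The counts
# below are invented; only the 86%/63% figures come from the abstract.
def percent_correct(confusions: dict) -> float:
    """confusions maps (intended, perceived) place pairs to response counts."""
    total = sum(confusions.values())
    hits = sum(n for (said, heard), n in confusions.items() if said == heard)
    return 100.0 * hits / total

gated = {("dental", "dental"): 63,
         ("dental", "retroflex"): 20,
         ("retroflex", "dental"): 17}
print(f"{percent_correct(gated):.0f}% correct")  # 63% with these toy counts
```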

SL980238.PDF (From Author) SL980238.PDF (Rasterized)

Perception Of Tonal Rises And Falls For Accentuation And Phrasing In Swedish

Authors:

David House, Department of Languages, University of Skövde (Sweden)
Dik Hermes, IPO, Center for Research on User-System Interaction (The Netherlands)
Frédéric Beaugendre, Lernout and Hauspie Speech Products (Belgium)

Page (NA) Paper number 44

Abstract:

In previous experiments with Dutch, French and Swedish listeners, it was shown that the location in the syllable of the onset of a rising or falling pitch movement is critical for the perception of accentuation. As the onset of the pitch movement is shifted through the syllable, there is a point at which the percept of accentuation shifts from one syllable to the next. This point is termed the accentuation boundary. It has also been proposed that in certain positions, the percept of accentuation conflicts with the percept of phrasing. An experiment with Swedish listeners was carried out using the same stimuli as used for the accentuation study, but now the task was to determine the phrasing of the syllables. The results indicate that perceptual phrase boundaries can be determined in the same way as accentuation boundaries. Differences in the locations of the boundaries can be interpreted in terms of strengths of tonal cues for accentuation and phrasing.
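
A boundary of this kind is typically estimated as the onset position at which listener judgements cross 50%. A hedged sketch, with made-up positions and proportions and simple linear interpolation; the paper's actual estimation procedure is not specified in the abstract:

```python
# Hedged sketch: estimating an accentuation boundary as the onset
# position at which judgements cross 50% "accent perceived on the next
# syllable". Positions and proportions are invented for illustration.
def crossover(positions, proportions):
    """Linearly interpolate the 50% point along the onset continuum."""
    points = list(zip(positions, proportions))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= 0.5 <= y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None

onsets = [0, 40, 80, 120, 160]           # pitch-movement onset (ms)
second = [0.05, 0.20, 0.45, 0.80, 0.95]  # proportion "next syllable"
print(f"boundary near {crossover(onsets, second):.0f} ms")  # ~86 ms
```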

SL980044.PDF (From Author) SL980044.PDF (Rasterized)

Speech Intelligibility Derived From Exceedingly Sparse Spectral Information

Authors:

Steven Greenberg, International Computer Science Institute (USA)
Takayuki Arai, Sophia University (Japan)
Rosaria Silipo, International Computer Science Institute (USA)

Page (NA) Paper number 74

Abstract:

Traditional models of speech assume that a detailed analysis of the acoustic spectrum is essential for understanding spoken language. The validity of this assumption was tested by partitioning the spectrum of spoken sentences into 1/3-octave channels ("slits") and measuring the intelligibility associated with each channel presented alone and in concert with the others. Four spectral channels, distributed over the speech-audio range (0.3 - 6 kHz) are sufficient for human listeners to decode sentential material with nearly 90% accuracy, although more than 70% of the spectrum is missing. Word recognition often remains relatively high (60-83%) when just two or three channels are presented concurrently, even though the intelligibility of these same slits, presented in isolation, is less than 9%. Such data suggest that intelligibility is derived from a compound "image" of the modulation spectrum distributed across the frequency spectrum. Because intelligibility seriously degrades when slits are desynchronized by more than 25 ms, this image is probably derived from both the amplitude and phase components of the modulation spectrum.
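
The geometry of the 1/3-octave "slits" can be reconstructed arithmetically: adjacent band edges differ by a factor of 2^(1/3). The sketch below picks four illustrative centre frequencies spanning the stated range; the abstract does not give the actual centres, so these are assumptions:

```python
# Back-of-envelope reconstruction of the stimulus geometry: a 1/3-octave
# band spans a factor of 2**(1/3), i.e. 2**(1/6) on each side of its
# centre. The four centre frequencies are assumptions for illustration;
# the abstract specifies only the overall 0.3-6 kHz range.
RATIO_HALF = 2 ** (1 / 6)  # half of a 1/3 octave

def slit_edges(fc: float) -> tuple:
    """Lower and upper edges (Hz) of a 1/3-octave band centred on fc."""
    return fc / RATIO_HALF, fc * RATIO_HALF

for fc in (330, 850, 2200, 5600):  # spread over 0.3-6 kHz
    lo, hi = slit_edges(fc)
    print(f"centre {fc:>4} Hz: {lo:6.0f} - {hi:6.0f} Hz")
```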

SL980074.PDF (From Author) SL980074.PDF (Rasterized)
