Segmentation, Labelling and Speech Corpora 1

ICSLP'98 Proceedings

Acoustic Indicators Of Topic Segmentation

Authors:

Julia Hirschberg, AT&T Labs / Research (USA)
Christine H. Nakatani, AT&T Labs / Research (USA)

Page (NA) Paper number 976

Abstract:

The segmentation of text and speech into topics and subtopics is an important step in document interpretation. For text, formatting information, such as headings and paragraphing, is available to aid in this endeavor, although this information is by no means sufficient. For speech, the task is even more difficult. We present results of the application of machine learning techniques to the automatic identification of intonational phrases beginning and ending 'topics' determined independently by annotators for two corpora: the Boston Directions Corpus and the Broadcast News (HUB-4) DARPA/NIST database.

SL980976.PDF (From Author) SL980976.PDF (Rasterized)



IViE - A Comparative Transcription system for Intonational Variation in English

Authors:

Esther Grabe, University of Cambridge (U.K.)
Francis Nolan, University of Cambridge (U.K.)
Kimberley J. Farrar, University of Cambridge (U.K.)

Page (NA) Paper number 99

Abstract:

In this paper, we offer an alternative to ToBI, the current de facto standard for machine-readable labelling of English prosody. We have three reasons for arguing that an alternative is needed. Firstly, the ToBI tone inventory is not maximally constrained; it appears to be difficult for transcribers to reach high inter-transcriber agreement scores for tone labels. Secondly, the growing demand for prosodically labelled data from non-standard varieties of English suggests a need for a transparent comparative transcription system. ToBI was not designed for this purpose. Thirdly, the low inter-transcriber agreement scores for ToBI suggest that the system is not as easy to apply as it may at first appear. In the present paper, we describe an alternative: the IViE system (Intonational Variation in English). We describe the structure of IViE and discuss its application with examples.

SL980099.PDF (From Author) SL980099.PDF (Rasterized)



Automatic Segmental and Prosodic Labeling of Mandarin Speech Database

Authors:

Fu-Chiang Chou, Dept. of Electrical Engineering, National Taiwan University (Taiwan)
Chiu-Yu Tseng, Institute of Linguistics, Preparatory Office, Academia Sinica (Taiwan)
Lin-Shan Lee, Dept. of Electrical Engineering, National Taiwan University (Taiwan)

Page (NA) Paper number 266

Abstract:

In this paper we describe the techniques and methodology developed for automatic labeling of segmental and prosodic information for the Mandarin speech database. There are two major procedures. First, the text is converted into a phonetic network of possible pronunciations, and this network is aligned with the speech data by recognition processes. Second, many acoustic prosodic features are derived, and the break indices are labeled from these features by decision trees. For the segmental labeling, 96.5% of automatically determined segment boundaries are accurate within a range of 20 ms. For the prosodic labeling, 84.9% of the automatically labeled break indices are the same as the manually labeled ones.
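
The two figures reported above are agreement rates between automatic and manual labels. As a minimal sketch of how such rates could be computed (not the authors' evaluation code), the Python below assumes that automatic and manual boundaries are already paired one-to-one and given in milliseconds; the function names `boundary_accuracy` and `label_agreement` and the sample values are hypothetical.

```python
from typing import Sequence


def boundary_accuracy(auto_ms: Sequence[int],
                      manual_ms: Sequence[int],
                      tol_ms: int = 20) -> float:
    """Fraction of automatic boundaries within tol_ms of the paired manual boundary.

    Assumption: the two sequences are aligned, i.e. auto_ms[i] and manual_ms[i]
    mark the same segment boundary.
    """
    if len(auto_ms) != len(manual_ms) or not auto_ms:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tol_ms)
    return hits / len(auto_ms)


def label_agreement(auto_labels: Sequence[int],
                    manual_labels: Sequence[int]) -> float:
    """Fraction of positions where the automatic break index equals the manual one."""
    if len(auto_labels) != len(manual_labels) or not auto_labels:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for a, m in zip(auto_labels, manual_labels) if a == m)
    return same / len(auto_labels)


# Hypothetical data: four boundaries (ms) and five break indices.
print(boundary_accuracy([100, 255, 410, 600], [105, 250, 440, 598]))  # 0.75
print(label_agreement([1, 2, 3, 1, 4], [1, 2, 2, 1, 4]))              # 0.8
```

Working in integer milliseconds keeps the 20 ms tolerance comparison exact; with boundaries stored as floating-point seconds, a value lying exactly on the tolerance could be counted either way.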

SL980266.PDF (From Author) SL980266.PDF (Rasterized)



Automatic Labelling of German Prosody

Authors:

Stefan Rapp, Sony International (Europe) GmbH (Germany)

Page (NA) Paper number 907

Abstract:

We present research on an automatic labelling system that is able to produce a phonological tonal labelling according to the ToBI-like intonation model for German developed by Fery. The current system was trained on about 1 hour of expert prosodically labelled speech from a single male radio news announcer. We present experiments for finding a suitable feature set drawn from features that describe the prosodic correlates fundamental frequency, duration and intensity, as well as some lexical and syntactic features. With the best feature set, we achieve a recognition rate of 78.7% for speaker-dependent recognition of ToBI labels (simultaneously predicting prominence and phrasing) and 86.9% for the simpler accented/not accented decision. Although the system's accuracy is well below that of human transcribers, it is a useful tool actively used in our laboratory due to its ability to process large amounts of speech data at low cost.

SL980907.PDF (From Author) SL980907.PDF (Rasterized)
