Segmentation, Labelling and Speech Corpora 4

1: ICSLP'98 Proceedings

A Multilingual Prosodic Database

Authors:

Estelle Campione, Université de Provence (France)
Jean Véronis, Université de Provence (France)

Page (NA) Paper number 844

Abstract:

We present a prosodic corpus in five languages (French, English, Italian, German and Spanish) comprising 4 hours and 20 minutes of speech and involving 50 different speakers (5 male and 5 female per language). The recordings on which the corpus is based are extracted from the EUROM 1 speech database and consist of passages of about five sentences. The corpus was stylized automatically by an algorithm which factors out microprosodic effects and represents the intonation contour of utterances by a series of target points. Once interpolated by a smooth curve (spline), these points produce a contour indistinguishable from the original when re-synthesized, apart from a few detection errors. A symbolic coding of the 50,000 pitch movements of the corpus is also provided, along with the time-alignment of the orthographic transcription to the signal at the word level. The entire corpus was verified and manually corrected by experts for each language. It will be made available at production cost for research through the European Language Resources Association (ELRA).
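
As an illustration of how a contour can be rebuilt from target points, the sketch below joins consecutive targets with two parabolas that meet at the temporal midpoint and have zero slope at each target. The abstract only says "a smooth curve (spline)", so this particular spline, the function name interpolate_targets and the target values are assumptions made for the example, not the authors' implementation.

```python
from bisect import bisect_right

def interpolate_targets(targets, t):
    """Evaluate a smooth pitch contour at time t from (time, f0) target points.

    Assumption: targets are sorted by strictly increasing time, and each pair
    of neighbours is joined by two parabolas with zero slope at the targets.
    """
    times = [p[0] for p in targets]
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    i = bisect_right(times, t) - 1        # segment [times[i], times[i+1]] holds t
    (t1, h1), (t2, h2) = targets[i], targets[i + 1]
    x = (t - t1) / (t2 - t1)              # normalised position inside the segment
    if x <= 0.5:
        return h1 + 2.0 * (h2 - h1) * x * x
    return h2 - 2.0 * (h2 - h1) * (1.0 - x) ** 2

# Hypothetical target points: time in seconds, pitch in Hz.
targets = [(0.0, 120.0), (0.4, 180.0), (1.0, 100.0)]
contour = [interpolate_targets(targets, k / 100) for k in range(101)]
```

The curve passes through every target and reaches the mean of two neighbouring targets halfway between them, which keeps the first derivative continuous along the whole contour.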

SL980844.PDF (From Author) SL980844.PDF (Rasterized)



The CSLU Speaker Recognition Corpus

Authors:

Ronald A. Cole, CSLU (USA)
Mike Noel, CSLU (USA)
Victoria Noel, CSLU (USA)

Page (NA) Paper number 856

Abstract:

This paper describes the CSLU Speaker Recognition Corpus data collection. The corpus was motivated by a need for speech data from many speakers, under different environmental conditions, with each speaker providing data over a significant period of time. The corpus was designed to provide sufficient data to study phonetic variability within and across sessions, and to design and evaluate systems for both vocabulary-independent and vocabulary-specific recognition and verification tasks. The protocol includes fixed-vocabulary phrases, digit strings, personal utterances (e.g., eye color), and fluent speech. The resulting Speaker Recognition Corpus is a collection of telephone speech recordings from over 500 participants collected over a two-year period. We describe the data collection procedure, the protocol, the transcription methods and the current status of the Speaker Recognition Corpus.

SL980856.PDF (From Author) SL980856.PDF (Rasterized)



How Effective Is Unsupervised Data Collection For Children's Speech Recognition?

Authors:

Gregory Aist, Carnegie Mellon University (USA)
Peggy Chan, Carnegie Mellon University (USA)
Xuedong Huang, Microsoft Research (USA)
Li Jiang, Microsoft Research (USA)
Rebecca Kennedy, Carnegie Mellon University (USA)
DeWitt Latimer, Carnegie Mellon University (USA)
Jack Mostow, Carnegie Mellon University (USA)
Calvin Yeung, Carnegie Mellon University (USA)

Page (NA) Paper number 929

Abstract:

Children present a unique challenge to automatic speech recognition. Today's state-of-the-art speech recognition systems still have problems handling children's speech because their acoustic models are trained on data collected from adult speakers. In this paper we describe an inexpensive way to address this problem. We collected children's speech as they interacted with an automated reading tutor. These data were subsequently transcribed by a speech recognition system and automatically filtered. We studied how to use these automatically collected data to improve the performance of a children's speech recognition system. Experiments indicate that automatically collected data can significantly reduce the error rate on children's speech.

SL980929.PDF (From Author) SL980929.PDF (Rasterized)



An Algorithm for Automatic Generation of Mandarin Phonetic Balanced Corpus

Authors:

Jyh-Shing Shyuu, Department of Computer Science and Information Engineering (Taiwan)
Wang Jhing-Fa, Department of Computer Science and Information Engineering (Taiwan)

Page (NA) Paper number 960

Abstract:

This paper proposes an algorithm for the automatic generation of a phonetically balanced Mandarin corpus. The design of a phonetically balanced corpus is particularly important when collecting a continuous speech database, in order to reduce co-articulation effects in continuous speech recognition (CSR). Traditionally, balanced corpora are generated manually or semi-automatically. Our proposed algorithm tries to find a minimum number of sentences from a large text corpus while ensuring that the 408 Mandarin base syllables and the 38*22 co-articulations between vowels and consonants are all represented in the extracted sentences.
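
Selecting few sentences that together cover a fixed inventory of units is a set-cover problem. The sketch below uses the standard greedy heuristic, which is not necessarily the algorithm proposed in the paper and does not guarantee a true minimum. The function select_sentences, the units_of callback and the toy sentences are hypothetical names and data chosen for the example.

```python
def select_sentences(sentences, units_of):
    """Greedily pick sentences until every unit found in the pool is covered.

    units_of(sentence) returns the units (e.g. syllables) a sentence contains.
    """
    coverage = {s: set(units_of(s)) for s in sentences}
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Take the sentence that adds the most still-missing units.
        best = max(sentences, key=lambda s: len(coverage[s] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

# Toy pool: each "sentence" is a space-separated string of syllables.
pool = ["ni hao", "hao ma", "ni hao ma", "wo"]
chosen = select_sentences(pool, lambda s: s.split())
```

In a real setting units_of would also emit the vowel-consonant transitions that the corpus is meant to balance, so that a single pass covers both inventories.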

SL980960.PDF (From Author) SL980960.PDF (Rasterized)



Towards a Formal Framework for Linguistic Annotations

Authors:

Steven Bird, LDC, University of Pennsylvania (USA)
Mark Liberman, LDC, University of Pennsylvania (USA)

Page (NA) Paper number 774

Abstract:

'Linguistic annotation' is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of annotation formats and demonstrate a common conceptual core. This provides the foundation for an algebraic framework which encompasses the representation, archiving and query of linguistic annotations, while remaining consistent with many alternative file formats.
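
To make the idea of a format-independent logical structure concrete, the sketch below assumes one possible common core: every annotation is a labelled span between two time offsets and belongs to a named type. The abstract does not spell out the framework, so the Annotation class, its field names and the within query are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: float   # offset into the signal, in seconds
    end: float
    tier: str      # hypothetical type name, e.g. "word" or "phone"
    label: str

def within(annotations, tier, start, end):
    """Return the labels on one tier whose spans lie inside [start, end]."""
    return [a.label for a in annotations
            if a.tier == tier and a.start >= start and a.end <= end]

# Toy data: two words and the phones of the first word.
annotations = [
    Annotation(0.00, 0.32, "word", "she"),
    Annotation(0.00, 0.15, "phone", "sh"),
    Annotation(0.15, 0.32, "phone", "iy"),
    Annotation(0.32, 0.70, "word", "had"),
]
phones_in_she = within(annotations, "phone", 0.00, 0.32)
```

Because the query works only on times, tiers and labels, it is independent of whichever file format the annotations were read from.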

SL980774.PDF (From Author) SL980774.PDF (Rasterized)



Forming Generic Models Of Speech For Uniform Database Access

Authors:

Toomas Altosaar, Helsinki University of Technology (Finland)
Martti Vainio, University of Helsinki (Finland)

Page (NA) Paper number 887

Abstract:

This paper presents a formalism that models speech from different databases generically. For each utterance in a speech database, a communication framework is first constructed, composed of a set of communication planes such as acoustic, orthographic, linguistic, and phonetic. Each plane in turn is made up of a set of levels that represent the plane's structural hierarchy; for the linguistic plane, for example, levels such as sentence, word, syllable, and phoneme may exist. Information from speech databases is parsed and compiled into objects that exhibit both individual and class-inherited behaviour. Once placed into the framework, these objects can have their relationships to other objects explicitly defined by links on the same level, across different levels, and across different planes. Speech from databases covering different languages and annotation styles can therefore be modelled generically, allowing for uniform database access. Searches can be performed on the framework and the results used for further analyses.
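
The plane, level and link structure described above can be pictured with a small object model. This is a minimal sketch under assumed names (Unit, link, linked) and toy data; it is not the authors' system, and it omits the class-inherited behaviour the abstract mentions.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Unit:
    plane: str     # e.g. "linguistic", "orthographic"
    level: str     # e.g. "word", "phoneme"
    label: str
    links: list = field(default_factory=list, repr=False)

    def link(self, other):
        """Record an explicit two-way relationship between two units."""
        self.links.append(other)
        other.links.append(self)

def linked(unit, plane, level):
    """Search the links of a unit for units on a given plane and level."""
    return [u.label for u in unit.links if u.plane == plane and u.level == level]

# Toy utterance fragment: one word, its phonemes, and its written form.
word = Unit("linguistic", "word", "cat")
for symbol in ["k", "ae", "t"]:
    word.link(Unit("linguistic", "phoneme", symbol))   # link across levels
word.link(Unit("orthographic", "word", "cat"))          # link across planes

phonemes = linked(word, "linguistic", "phoneme")
```

The same search function works for any database once its annotations are loaded into such units, which is the sense in which access becomes uniform.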

SL980887.PDF (From Author) SL980887.PDF (Rasterized)
