Authors:
Estelle Campione, Université de Provence (France)
Jean Véronis, Université de Provence (France)
Page (NA) Paper number 844
Abstract:
We present a prosodic corpus in five languages (French, English, Italian,
German and Spanish) comprising 4 hours and 20 minutes of speech and
involving 50 different speakers (5 male and 5 female per language).
The recordings on which the corpus is based are extracted from the
EUROM 1 speech database and consist of passages of about five sentences.
The corpus was stylized automatically by an algorithm which factors
out microprosodic effects and represents the intonation contour of
utterances by a series of target points. Once interpolated by a smooth
curve (spline), these points produce a contour indistinguishable from
the original when re-synthesized, apart from a few detection errors.
A symbolic coding of the 50,000 pitch movements in the corpus is also
provided, along with a word-level time alignment of the orthographic
transcription with the signal. The entire corpus was verified and manually
corrected by experts for each language. It will be made available at
production cost for research through the European Language Resource
Association (ELRA).
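The abstract does not say which kind of spline interpolates the target points. As a minimal sketch, assuming a natural cubic spline (quadratic splines are another common choice for F0 stylization), the Python function below turns a list of (time, F0) target points into a smooth contour that can be resampled at any time. The function name `natural_cubic_spline` and the target values are hypothetical, and this is not presented as the authors' algorithm; note that cubic splines can overshoot between widely spaced targets.

```python
from bisect import bisect_right
from typing import Callable, List


def natural_cubic_spline(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Interpolate target points (xs[i], ys[i]) with a natural cubic spline.

    xs must be strictly increasing (e.g. times in seconds); ys are the values
    at those times (e.g. F0 in Hz). Outside [xs[0], xs[-1]] the first/last
    cubic piece is extrapolated.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length xs, ys")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    if any(step <= 0 for step in h):
        raise ValueError("xs must be strictly increasing")

    # Second derivatives at the knots; "natural" means m[0] = m[-1] = 0.
    m = [0.0] * n
    k = n - 2  # number of interior knots
    if k > 0:
        # Tridiagonal system for m[1..n-2], solved with the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        for j in range(1, k):  # forward elimination
            w = sub[j] / diag[j - 1]
            diag[j] -= w * sup[j - 1]
            rhs[j] -= w * rhs[j - 1]
        sol = [0.0] * k
        sol[-1] = rhs[-1] / diag[-1]
        for j in range(k - 2, -1, -1):  # back substitution
            sol[j] = (rhs[j] - sup[j] * sol[j + 1]) / diag[j]
        m[1:n - 1] = sol

    def spline(x: float) -> float:
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)  # segment containing x
        hi, a, b = h[i], xs[i + 1] - x, x - xs[i]
        return (m[i] * a ** 3 / (6.0 * hi) + m[i + 1] * b ** 3 / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * a
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * b)

    return spline


# Hypothetical target points: (time in s, F0 in Hz).
times = [0.0, 0.35, 0.8, 1.2]
f0 = [120.0, 180.0, 140.0, 100.0]
contour = natural_cubic_spline(times, f0)
samples = [contour(k / 100) for k in range(121)]  # one value every 10 ms
```

The curve passes exactly through every target point, which is what allows a handful of targets to stand in for the full F0 track when re-synthesizing.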
Authors:
Ronald A. Cole, CSLU (USA)
Mike Noel, CSLU (USA)
Victoria Noel, CSLU (USA)
Page (NA) Paper number 856
Abstract:
This paper describes the data collection for the CSLU Speaker Recognition Corpus.
The corpus was motivated by a need for speech data from many speakers,
under different environmental conditions, with each speaker providing
data over a significant period of time. The corpus was designed to
provide sufficient data to study phonetic variability within and across
sessions, and to design and evaluate systems for both vocabulary-independent
and vocabulary-specific recognition and verification tasks. The protocol
includes fixed vocabulary phrases, digit strings, personal utterances
(e.g., eye color), and fluent speech. The resulting Speaker Recognition
Corpus is a collection of telephone speech recordings from over 500
participants collected over a two-year period. We describe the data
collection procedure, the protocol, the transcription methods and the
current status of the Speaker Recognition Corpus.
Authors:
Gregory Aist, Carnegie Mellon University (USA)
Peggy Chan, Carnegie Mellon University (USA)
Xuedong Huang, Microsoft Research (USA)
Li Jiang, Microsoft Research (USA)
Rebecca Kennedy, Carnegie Mellon University (USA)
DeWitt Latimer, Carnegie Mellon University (USA)
Jack Mostow, Carnegie Mellon University (USA)
Calvin Yeung, Carnegie Mellon University (USA)
Page (NA) Paper number 929
Abstract:
Children present a unique challenge to automatic speech recognition.
Today's state-of-the-art speech recognition systems still have problems
handling children's speech because their acoustic models are trained on
adult speech. In this paper we describe an inexpensive way to address
this problem. We collected children's speech as they interacted with
an automated reading tutor. These data were subsequently transcribed
by a speech recognition system and automatically filtered. We studied
how to use these automatically collected data to improve recognition
performance on children's speech. Experiments indicate that the
automatically collected data can significantly reduce the error rate
on children's speech.
Authors:
Jyh-Shing Shyuu, Department of Computer Science and Information Engineering (Taiwan)
Wang Jhing-Fa, Department of Computer Science and Information Engineering (Taiwan)
Page (NA) Paper number 960
Abstract:
This paper proposes an algorithm for automatically generating a
phonetically balanced Mandarin corpus. A phonetically balanced corpus
is particularly important when collecting a continuous speech database,
in order to reduce the impact of co-articulation effects in continuous
speech recognition (CSR). Traditionally, balanced corpora have been
generated manually or semi-automatically. The proposed algorithm tries
to find a minimum number of sentences from a large text corpus while
ensuring that all 408 Mandarin base syllables and the 38 × 22
co-articulations between vowels and consonants appear in the extracted
sentences.
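The abstract does not describe how the sentences are chosen. Finding a truly minimal subset is an instance of the set-cover problem, which is NP-hard, so a common practical approach is greedy selection: repeatedly take the sentence that covers the most still-missing units, which is known to stay within a logarithmic factor of the optimum. The sketch below assumes each candidate sentence has already been converted into a set of unit labels (base syllables and vowel-consonant transitions); the function name `greedy_select` and the toy labels are hypothetical, and this is not necessarily the authors' algorithm.

```python
from typing import Dict, List, Set, Tuple


def greedy_select(sentences: Dict[str, Set[str]],
                  required: Set[str]) -> Tuple[List[str], Set[str]]:
    """Greedy set cover: pick sentences until every required unit appears.

    sentences: sentence id -> set of unit labels it contains (hypothetical labels)
    required:  units that must be covered at least once
    Returns the selected ids (in selection order) and any units no sentence covers.
    """
    uncovered = set(required)
    remaining = dict(sentences)
    selected: List[str] = []
    while uncovered:
        best_id, best_gain = None, 0
        for sid, units in remaining.items():
            gain = len(units & uncovered)  # new units this sentence would add
            if gain > best_gain:
                best_id, best_gain = sid, gain
        if best_id is None:
            break  # no remaining sentence covers any missing unit
        selected.append(best_id)
        uncovered -= remaining.pop(best_id)
    return selected, uncovered


# Toy example: syllables plus one vowel-consonant transition "a+m".
sentences = {"s1": {"ba", "ma", "a+m"}, "s2": {"ma"}, "s3": {"ta", "ba"}}
chosen, missing = greedy_select(sentences, {"ba", "ma", "ta", "a+m"})
```

Here `s1` is picked first (three new units), then `s3` (adds "ta"), and `s2` is never needed. Returning the uncovered units makes it easy to spot syllables or transitions that the text corpus simply does not contain.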
Authors:
Steven Bird, LDC, University of Pennsylvania (USA)
Mark Liberman, LDC, University of Pennsylvania (USA)
Page (NA) Paper number 774
Abstract:
"Linguistic annotation" is a term covering any transcription, translation
or annotation of textual data or recorded linguistic signals. While
there are several ongoing efforts to provide formats and tools for
such annotations and to publish annotated linguistic databases, the
lack of widely accepted standards is becoming a critical problem.
Proposed standards, to the extent they exist, have focussed on file
formats. This paper focuses instead on the logical structure of linguistic
annotations. We survey a wide variety of annotation formats and demonstrate
a common conceptual core. This provides the foundation for an algebraic
framework which encompasses the representation, archiving and querying
of linguistic annotations, while remaining consistent with many alternative
file formats.
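One way to picture a format-independent core of this kind is a labelled directed graph whose nodes may be anchored to time offsets in the signal and whose arcs carry a type (tier) and a label; this is the idea behind Bird and Liberman's annotation graphs. The Python sketch below is a much-simplified, hypothetical illustration under that assumption rather than the paper's formal definition: class and method names are invented, and the toy query only considers arcs whose two endpoints both carry times.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Arc:
    src: int
    dst: int
    kind: str   # tier name, e.g. "word" or "phone" (hypothetical)
    label: str


@dataclass
class AnnotationGraph:
    # node id -> time offset in seconds, or None for an unanchored node
    offsets: Dict[int, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node: int, time: Optional[float] = None) -> None:
        self.offsets[node] = time

    def add_arc(self, src: int, dst: int, kind: str, label: str) -> None:
        for n in (src, dst):
            if n not in self.offsets:
                raise KeyError(f"unknown node {n}")
        self.arcs.append(Arc(src, dst, kind, label))

    def query(self, kind: str, start: float, end: float) -> List[str]:
        """Labels of `kind` arcs lying entirely inside [start, end].

        Arcs with an unanchored endpoint are skipped in this simple sketch.
        """
        hits = []
        for arc in self.arcs:
            t0, t1 = self.offsets[arc.src], self.offsets[arc.dst]
            if arc.kind != kind or t0 is None or t1 is None:
                continue
            if start <= t0 and t1 <= end:
                hits.append(arc.label)
        return hits


g = AnnotationGraph()
for node, t in [(0, 0.0), (1, 0.42), (2, None), (3, 0.9)]:
    g.add_node(node, t)
g.add_arc(0, 1, "word", "oh")
g.add_arc(1, 3, "word", "no")
g.add_arc(1, 2, "phone", "n")   # node 2 has no time: the phone boundary is unknown
g.add_arc(2, 3, "phone", "o")
```

Because word and phone tiers share nodes 1 and 3, the same graph records both the time alignment and the structural relation between tiers, independently of how a particular file format would serialize them.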
Authors:
Toomas Altosaar, Helsinki University of Technology (Finland)
Martti Vainio, University of Helsinki (Finland)
Page (NA) Paper number 887
Abstract:
This paper presents a formalism that models speech from different databases
generically. For each utterance in a speech database a communication
framework is first constructed which is composed of a set of communication
planes, such as acoustic, orthographic, linguistic, and phonetic. Each
plane in turn is made up of a set of levels to represent the plane's
structural hierarchy, e.g., for the linguistic plane, levels such as
sentence, word, syllable, and phoneme may exist. Information from
speech databases is parsed and compiled into such objects, which exhibit
both individual and class-inherited behaviour. Once placed into the
framework these objects can have their relationships to other objects
explicitly defined by links on the same level, across different levels,
and across different planes. Speech from databases covering different
languages and annotation styles can therefore be modelled generically,
allowing uniform database access. Searches can be performed on
the framework and the results used for further analyses.
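A rough Python sketch of the plane/level/object organisation described above, assuming objects are identified by string ids and that links are simple symmetric relations. All class, method, plane, level, and label names are hypothetical, and the class-inherited behaviour mentioned in the abstract is not modelled here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Tuple


class Framework:
    """Hypothetical sketch: planes contain levels, levels contain units, and
    units can be linked on the same level, across levels, or across planes."""

    def __init__(self) -> None:
        self.planes: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
        self.units: Dict[str, Tuple[str, str, str]] = {}  # id -> (plane, level, label)
        self.links: Dict[str, Set[str]] = defaultdict(set)

    def add_unit(self, uid: str, plane: str, level: str, label: str) -> None:
        self.units[uid] = (plane, level, label)
        self.planes[plane][level].append(uid)

    def link(self, a: str, b: str) -> None:
        if a not in self.units or b not in self.units:
            raise KeyError("both units must exist before linking")
        # Links are stored symmetrically in this sketch.
        self.links[a].add(b)
        self.links[b].add(a)

    def search(self, plane: str, level: str, pred: Callable[[str], bool]) -> List[str]:
        """Ids of units on plane/level whose label satisfies pred."""
        ids = self.planes.get(plane, {}).get(level, [])
        return [u for u in ids if pred(self.units[u][2])]

    def linked(self, uid: str, plane: Optional[str] = None,
               level: Optional[str] = None) -> List[str]:
        """Ids linked to uid, optionally restricted to one plane and/or level."""
        result = []
        for v in sorted(self.links.get(uid, set())):
            p, lvl, _ = self.units[v]
            if (plane is None or p == plane) and (level is None or lvl == level):
                result.append(v)
        return result


fw = Framework()
fw.add_unit("w1", "linguistic", "word", "cat")
for uid, seg in [("s1", "k"), ("s2", "ae"), ("s3", "t")]:
    fw.add_unit(uid, "phonetic", "segment", seg)
    fw.link("w1", uid)  # cross-plane links: word -> phonetic segments
```

A search on one plane followed by `linked` on another plane is the kind of uniform, database-independent query the abstract describes, e.g. finding words starting with "c" and then retrieving their phonetic segments.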