Authors:
Julia Hirschberg, AT&T Labs / Research (USA)
Christine H. Nakatani, AT&T Labs / Research (USA)
Page (NA) Paper number 976
Abstract:
The segmentation of text and speech into topics and subtopics is an
important step in document interpretation. For text, formatting information,
such as headings and paragraphing, is available to aid in this endeavor,
although this information is by no means sufficient. For speech, the
task is even more difficult. We present results of the application
of machine learning techniques to the automatic identification of intonational
phrases beginning and ending 'topics' determined independently by annotators
for two corpora: the Boston Directions Corpus and the Broadcast News
(HUB-4) DARPA/NIST database.
Authors:
Esther Grabe, University of Cambridge (U.K.)
Francis Nolan, University of Cambridge (U.K.)
Kimberley J. Farrar, University of Cambridge (U.K.)
Page (NA) Paper number 99
Abstract:
In this paper, we offer an alternative to ToBI, the current de facto
standard for machine-readable labelling of English prosody. We have
three reasons for arguing that an alternative is needed. Firstly, the
ToBI tone inventory is not maximally constrained; it appears to be
difficult for transcribers to reach high inter-transcriber agreement
scores for tone labels. Secondly, the growing demand for prosodically
labelled data from non-standard varieties of English suggests a need
for a transparent comparative transcription system. ToBI was not designed
for this purpose. Thirdly, the low inter-transcriber agreement scores
for ToBI suggest that the system is not as easy to apply as it may
at first appear. In the present paper, we describe an alternative:
the IViE system (Intonational Variation in English). We describe the
structure of IViE and discuss its application with examples.
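The abstract does not say how inter-transcriber agreement is scored. As a minimal sketch, assuming agreement is measured as the mean proportion of identical tone labels over all pairs of transcribers, it could be computed as follows. The function name and the example labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_agreement(transcriptions):
    """Mean proportion of identical labels over all transcriber pairs.

    `transcriptions` is a list of label sequences, one per transcriber,
    all labelling the same positions (assumed to be pre-aligned).
    """
    if len(transcriptions) < 2:
        raise ValueError("need at least two transcribers")
    length = len(transcriptions[0])
    if length == 0 or any(len(t) != length for t in transcriptions):
        raise ValueError("label sequences must be non-empty and equal in length")

    scores = []
    for a, b in combinations(transcriptions, 2):
        matches = sum(1 for x, y in zip(a, b) if x == y)
        scores.append(matches / length)
    return sum(scores) / len(scores)


# Hypothetical tone labels from three transcribers for four accent positions.
t1 = ["H*", "L*", "H*", "L*"]
t2 = ["H*", "L*", "L*", "L*"]
t3 = ["H*", "H*", "H*", "L*"]
score = pairwise_agreement([t1, t2, t3])
```

Raw pairwise agreement of this kind does not correct for chance agreement, so a smaller label inventory will tend to raise the score on its own.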
Authors:
Fu-Chiang Chou, Dept. of Electrical Engineering, National Taiwan University (Taiwan)
Chiu-Yu Tseng, Institute of Linguistics, Preparatory Office, Academia Sinica (Taiwan)
Lin-Shan Lee, Dept. of Electrical Engineering, National Taiwan University (Taiwan)
Page (NA) Paper number 266
Abstract:
In this paper we describe the techniques and methodology developed
for automatic labeling of segmental and prosodic information for the
Mandarin speech database. There are two major procedures. First, the
text is converted into the phonetic network of possible pronunciations,
and this network is aligned with the speech data by recognition processes.
Secondly, many acoustic prosodic features are derived and the break
indices are labeled with these features by decision trees. For the
segmental labeling, 96.5% of automatically determined segment boundaries
are accurate to within 20 ms. For the prosodic labeling, 84.9%
of the automatically labeled break indices agree with the manually
labeled ones.
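The two figures above are tolerance-based and exact-match scores. As a rough sketch of how such scores might be computed, assuming automatic and manual boundaries are already paired one-to-one and given in milliseconds, consider the following. The function names and sample values are illustrative assumptions, not the authors' evaluation code.

```python
def boundary_accuracy(auto_ms, manual_ms, tolerance_ms=20.0):
    """Fraction of automatic boundaries within `tolerance_ms` of the
    paired manual boundary (one-to-one pairing is assumed)."""
    if len(auto_ms) != len(manual_ms) or not auto_ms:
        raise ValueError("boundary lists must be non-empty and equal in length")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms)
    return hits / len(auto_ms)


def label_agreement(auto_labels, manual_labels):
    """Fraction of positions where the automatic break index equals the manual one."""
    if len(auto_labels) != len(manual_labels) or not auto_labels:
        raise ValueError("label lists must be non-empty and equal in length")
    same = sum(1 for a, m in zip(auto_labels, manual_labels) if a == m)
    return same / len(auto_labels)


# Made-up boundary times (ms) and break indices, for illustration only.
seg_acc = boundary_accuracy([100, 250, 410, 600], [105, 280, 400, 615])
brk_acc = label_agreement([1, 2, 3, 1, 4], [1, 2, 2, 1, 4])
```

In this made-up example three of the four boundaries fall within 20 ms, and four of the five break indices match.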
Authors:
Stefan Rapp, Sony International (Europe) GmbH (Germany)
Page (NA) Paper number 907
Abstract:
We present research on an automatic labelling system that is able to
produce a phonological tonal labelling according to the ToBI-like intonation
model for German developed by Féry. The current system was trained
on about 1 hour of expert prosodically labelled speech from a single
male radio news announcer. We present experiments for finding a suitable
feature set drawn from features that describe the prosodic correlates
fundamental frequency, duration and intensity as well as some lexical
and syntactic features. With the best feature set, we achieve a recognition
rate of 78.7% for speaker dependent recognition of ToBI labels (simultaneously
predicting prominence and phrasing) and 86.9% for the simpler accented/not
accented decision. Although the system's accuracy is well below that
of human transcribers, it is a useful tool actively used in our laboratory
due to its ability to process large amounts of speech data at low
cost.