Segmentation, Labelling and Speech Corpora 3


A Recursive Algorithm for the Forced Alignment of Very Long Audio Segments

Authors:

Pedro J. Moreno, Compaq Computer Corporation (USA)
Chris Joerg, Compaq Computer Corporation (USA)
Jean-Manuel Van Thong, Compaq Computer Corporation (USA)
Oren Glickman, Compaq Computer Corporation (USA)

Page (NA) Paper number 68

Abstract:

In this paper we address the problem of aligning very long (often more than one hour) audio files to their corresponding textual transcripts in an effective manner. We present an efficient recursive technique that works well even on noisy speech signals. The key idea of the algorithm is to turn the forced alignment problem into a recursive speech recognition problem with a gradually restricted dictionary and language model. The algorithm is tolerant of acoustic noise and of errors or gaps in the text transcript or audio tracks. We report experimental results on a 3-hour audio file containing TV and radio broadcasts, showing accurate alignments on speech under a variety of real acoustic conditions, such as speech over music and speech over telephone lines. We also report results when the same audio stream has been corrupted with additive white noise or compressed with a popular web encoding format such as RealAudio. The algorithm has been used in our internal multimedia indexing project, where it has processed more than 200 hours of audio from varied sources, such as WGBH NOVA documentaries and NPR web audio files. The system aligns speech media content in about one to five times real time, depending on the acoustic conditions of the audio signal.
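The recursive idea described in the abstract can be sketched as follows. This is a toy illustration, not the authors' implementation: here `hyps` stands in for recognizer output as (word, time) pairs for the current span, anchor selection is a simplified uniqueness heuristic, and the base case spreads the remaining words evenly instead of using a final fine alignment pass.

```python
def spread(words, t0, t1):
    # Fallback: distribute words evenly over [t0, t1] (no reliable anchor).
    n = len(words)
    return [(w, t0 + (t1 - t0) * (i + 0.5) / n) for i, w in enumerate(words)]

def align(hyps, transcript, t0, t1):
    """Recursively assign a time to each transcript word within [t0, t1].

    hyps: simulated recognizer output, a time-sorted list of (word, time)
    pairs for this span (in the paper this would come from recognition with
    a dictionary and language model restricted to the current chunk)."""
    if len(transcript) <= 2 or not hyps:
        return spread(transcript, t0, t1)
    # Choose an anchor word: unique in both the transcript chunk and the
    # hypotheses, preferably near the middle so the split is balanced.
    mid = len(transcript) // 2
    best = None
    for i, w in enumerate(transcript):
        unique = (transcript.count(w) == 1
                  and sum(1 for hw, _ in hyps if hw == w) == 1)
        if unique and (best is None or abs(i - mid) < abs(best[0] - mid)):
            best = (i, w)
    if best is None:
        return spread(transcript, t0, t1)
    i, w = best
    t_anchor = next(t for hw, t in hyps if hw == w)
    # Split the hypotheses and transcript at the anchor; recurse on each half.
    left = [h for h in hyps if h[1] < t_anchor]
    right = [h for h in hyps if h[1] > t_anchor]
    return (align(left, transcript[:i], t0, t_anchor)
            + [(w, t_anchor)]
            + align(right, transcript[i + 1:], t_anchor, t1))
```

Even a word the recognizer missed (e.g. one obscured by noise) receives a time by interpolation between the surrounding anchors, which is how gaps in the audio or transcript are tolerated.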

SL980068.PDF (From Author) SL980068.PDF (Rasterized)

The Selection of Pronunciation Variants: Comparing the Performance of Man and Machine

Authors:

Judith M. Kessens, A2RT, University of Nijmegen (The Netherlands)
Mirjam Wester, A2RT, University of Nijmegen (The Netherlands)
Catia Cucchiarini, A2RT, University of Nijmegen (The Netherlands)
Helmer Strik, A2RT, University of Nijmegen (The Netherlands)

Page (NA) Paper number 372

Abstract:

In this paper the performance of an automatic transcription tool is evaluated. The tool is a continuous speech recognizer (CSR) running in forced recognition mode. For evaluation, the performance of the CSR was compared to that of nine expert listeners, with both the listeners and the machine carrying out exactly the same task: deciding, in 467 cases, whether a segment was present or not. The performance of the CSR turned out to be comparable to that of the experts.
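One way to quantify this kind of man-machine comparison (a hypothetical sketch; the paper's own evaluation details may differ) is to score the recognizer's per-case present/absent decision against the listeners' majority vote:

```python
def agreement_rate(machine_decisions, listener_votes):
    """Fraction of cases where the machine's present/absent decision matches
    the majority vote of the expert listeners.

    machine_decisions: list of bool, one per case.
    listener_votes: list of lists of bool, the listeners' decisions per case."""
    matches = 0
    for machine, votes in zip(machine_decisions, listener_votes):
        majority = sum(votes) * 2 > len(votes)  # strict majority says "present"
        matches += machine == majority
    return matches / len(machine_decisions)
```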

SL980372.PDF (From Author) SL980372.PDF (Rasterized)

Acoustic Confidence Measures for Segmenting Broadcast News

Authors:

Jon Barker, University of Sheffield (U.K.)
Gethin Williams, University of Sheffield (U.K.)
Steve Renals, University of Sheffield (U.K.)

Page (NA) Paper number 643

Abstract:

In this paper we define an acoustic confidence measure based on the estimates of local posterior probabilities produced by an HMM/ANN large vocabulary continuous speech recognition system. We use this measure to segment continuous audio into regions where it is, and is not, appropriate to expend recognition effort. The segmentation is computationally inexpensive and reduces both overall word error rate and decoding time. The technique is evaluated using material from the Broadcast News corpus.
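A minimal sketch of how a per-frame posterior-based confidence can drive such a segmentation decision (the frame posteriors, window size, and threshold below are illustrative assumptions, not the paper's configuration):

```python
import math

def confidence(frames):
    # Mean log posterior of the winning class across the frames: high when
    # the acoustic model is consistently confident, low for noise/non-speech.
    return sum(math.log(max(p)) for p in frames) / len(frames)

def worth_decoding(frames, win, threshold):
    # Slide a non-overlapping window over the frame posteriors and mark each
    # window as worth spending recognition effort on (True) or not (False).
    return [confidence(frames[i:i + win]) >= threshold
            for i in range(0, len(frames) - win + 1, win)]
```

Windows marked False would simply be skipped by the recognizer, which is where the decoding-time saving comes from.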

SL980643.PDF (From Author) SL980643.PDF (Rasterized)

A Duration-Based Confidence Measure for Automatic Segmentation of Noise Corrupted Speech

Authors:

Bryan L. Pellom, Duke University (USA)
John H.L. Hansen, Duke University (USA)

Page (NA) Paper number 853

Abstract:

In this study, a duration-based measure is formulated for assigning confidence scores to the phonetic time alignments produced by an automatic speech segmentation system. For speech corrupted by additive noise or telephone-channel environments, the proposed confidence measure is shown to provide a reliable means of automatically detecting gross segmentation errors and marking them for manual correction. The measure is evaluated by computing Receiver Operating Characteristic (ROC) curves, which illustrate the expected trade-off between the probability of detecting gross segmentation errors and the false alarm rate.
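As an illustration of the evaluation methodology (not the paper's specific measure), the ROC curve for a detector that flags segments whose confidence score falls below a threshold can be computed by sweeping the threshold over the observed scores:

```python
def roc_points(error_scores, correct_scores):
    """ROC operating points for a threshold-based gross-error detector.

    error_scores: confidence scores of segments with gross alignment errors.
    correct_scores: confidence scores of correctly aligned segments.
    Returns (false_alarm_rate, detection_rate) pairs; a segment is flagged
    as a gross error when its confidence is <= the threshold."""
    points = []
    for t in sorted(set(error_scores) | set(correct_scores)):
        detection = sum(s <= t for s in error_scores) / len(error_scores)
        false_alarm = sum(s <= t for s in correct_scores) / len(correct_scores)
        points.append((false_alarm, detection))
    return points
```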

SL980853.PDF (From Author)

Segmentation and Classification of Broadcast News Audio

Authors:

Thomas Hain, Cambridge University (U.K.)
Philip C. Woodland, Cambridge University (U.K.)

Page (NA) Paper number 851

Abstract:

Broadcast news audio data contains a wide variety of speakers and audio conditions (channel and background noise). This paper describes a segmentation, gender detection and audio classification scheme for such data which aims to provide a speech recogniser with a stream of reasonably sized segments, each from a single speaker and audio type, while discarding non-speech data. Each segment is labelled as either narrow-band or wide-band, and as coming from either a female or a male speaker. The segmentation system has been evaluated on the DARPA 1997 broadcast news data set and detailed segmentation accuracy results are presented. It is shown that the speech recognition accuracy for these automatically derived segments is very nearly the same as that for manually segmented data.
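A toy sketch of maximum-likelihood audio-type labelling in this spirit (the single-feature Gaussian class models and their parameters are invented for illustration; the paper's system uses richer models and acoustic features):

```python
import math

# One illustrative (mean, variance) Gaussian per audio type, over a single
# made-up scalar feature; real systems model full feature vectors.
CLASS_MODELS = {
    "female/wide-band": (0.8, 0.1),
    "male/wide-band": (0.2, 0.1),
    "non-speech": (0.5, 0.5),
}

def log_likelihood(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(segment_frames):
    # Label a segment with the class whose model scores the highest average
    # per-frame log-likelihood; "non-speech" segments would be discarded
    # before recognition.
    scores = {
        label: sum(log_likelihood(x, m, v) for x in segment_frames) / len(segment_frames)
        for label, (m, v) in CLASS_MODELS.items()
    }
    return max(scores, key=scores.get)
```

Labelling each segment this way lets the recogniser pick gender- and bandwidth-matched acoustic models, which is the practical payoff of the classification step.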

SL980851.PDF (From Author) SL980851.PDF (Rasterized)

Speaker Recruitment Methods And Speaker Coverage - Experiences From A Large Multilingual Speech Database Collection

Authors:

Børge Lindberg, CPK, Aalborg University (Denmark)
Robrecht Comeyne, Lernout & Hauspie Speech Products NV (Belgium)
Christoph Draxler, IPSK, University of Munich (Germany)
Francesco Senia, CSELT (Italy)

Page (NA) Paper number 1126

Abstract:

With the globalisation of voice-driven man-machine interfaces and their evolving technology, there is a growing demand for spoken language resources covering speaker populations representative of many languages and countries. In this paper, experience from work within a large consortium creating large multilingual speech databases for tele-services is reported. In particular, the methods used to recruit speakers for such recordings, and the experiences gained, are reported across a number of participating partners. The work reported is from the SpeechDat project (http://speechdat.phonetik.uni-muenchen.de).

SL981126.PDF (From Author) SL981126.PDF (Rasterized)
