Large Vocabulary Continuous Speech Recognition 6

Sharable Software Repository for Japanese Large Vocabulary Continuous Speech Recognition

Authors:

Tatsuya Kawahara, Kyoto Univ. (Japan)
Tetsunori Kobayashi, Waseda Univ. (Japan)
Kazuya Takeda, Nagoya Univ. (Japan)
Nobuaki Minematsu, Toyohashi Univ. of Tech. (Japan)
Katsunobu Itou, ETL (Japan)
Mikio Yamamoto, Tsukuba Univ. (Japan)
Atsushi Yamada, ASTEM (Japan)
Takehito Utsuro, Nara Institute of Science and Technology (Japan)
Kiyohiro Shikano, Nara Institute of Science and Technology (Japan)

Page (NA) Paper number 763

Abstract:

This paper introduces a project to build a Japanese LVCSR (Large Vocabulary Continuous Speech Recognition) platform. The project is a collaboration among researchers from different academic institutes and aims to develop a sharable software repository of not only databases but also models and programs. The platform consists of a standard recognition engine, Japanese phone models, and Japanese statistical language models. A set of Japanese phone HMMs is trained on ASJ (Acoustical Society of Japan) databases of 20K sentence utterances per gender. Japanese word N-gram (2-gram and 3-gram) models are constructed from a four-year corpus of the Mainichi newspaper. The recognition engine JULIUS is developed for assessment of both the acoustic and language models. The modules are integrated into a Japanese LVCSR system and evaluated on a 5,000-word dictation task. The software repository is available to the public.
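
The platform's language models are standard word 2-gram and 3-gram models estimated from newspaper text. The abstract does not spell out the estimation recipe, so the following is only a minimal sketch of bigram estimation with add-alpha smoothing over a tokenized corpus; the smoothing scheme and the toy corpus are assumptions, not details taken from the paper.

from collections import Counter

def train_bigram_lm(sentences, alpha=1.0):
    """Estimate P(word | prev) with add-alpha smoothing.

    `sentences` is an iterable of token lists. The smoothing scheme is an
    illustrative assumption, not the method used for the JULIUS models.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        unigrams.update(padded[:-1])                   # history counts only
        bigrams.update(zip(padded[:-1], padded[1:]))
    vocab_size = len(unigrams) + 1                     # +1 for unseen words

    def prob(prev, word):
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

    return prob

# Toy usage with a stand-in corpus; the real models were built from four
# years of Mainichi newspaper text.
lm = train_bigram_lm([["音声", "認識"], ["音声", "合成"]])
print(lm("音声", "認識"))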

SL980763.PDF (From Author) SL980763.PDF (Rasterized)

The Design of the Newspaper-Based Japanese Large Vocabulary Continuous Speech Recognition Corpus

Authors:

Katunobu Itou, ETL (Japan)
Mikio Yamamoto, Univ. of Tsukuba (Japan)
Kazuya Takeda, Nagoya Univ. (Japan)
Toshiyuki Takezawa, ATR (Japan)
Tatsuo Matsuoka, NTT (Japan)
Tetsunori Kobayashi, Waseda Univ. (Japan)
Kiyohiro Shikano, NAIST (Japan)
Shuichi Itahashi, University of Tsukuba (Japan)

Page (NA) Paper number 722

Abstract:

In this paper we present the first public Japanese speech corpus for large vocabulary continuous speech recognition (LVCSR) technology, which we have titled JNAS (Japanese Newspaper Article Sentences). We designed it to be comparable to the corpora used in American and European LVCSR projects. The corpus contains speech recordings (60 hrs.) and their orthographic transcriptions for 306 speakers (153 male and 153 female) reading excerpts from newspaper articles and phonetically balanced (PB) sentences. The corpus contains about 45,000 sentence utterances in total, with each speaker reading about 150 sentences. JNAS is being distributed on 16 CD-ROMs.

SL980722.PDF (From Author) SL980722.PDF (Rasterized)

Indexing and Classification of TV News Articles Based on Speech Dictation Using Word Bigram

Authors:

Jun Ogata, Ryukoku University (Japan)
Yasuo Ariki, Ryukoku University (Japan)

Page (NA) Paper number 126

Abstract:

To construct a news database with a video-on-demand (VOD) function, news articles must be classified into topics. In this paper, we propose a method to automatically index and classify TV news articles into 10 topics based on speech dictation using speaker-independent triphone HMMs and a word bigram language model.
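
The abstract does not give the classification rule; as a minimal sketch under assumed details, the snippet below labels a dictated transcript with the topic whose smoothed unigram statistics give it the highest log-probability. The per-topic word counts and the smoothing constant are illustrative placeholders, not the paper's actual models.

import math
from collections import Counter

def classify_topic(transcript_words, topic_word_counts, alpha=0.5):
    """Return the topic whose smoothed unigram model best explains the transcript.

    `topic_word_counts` maps topic name -> Counter of word frequencies.
    The scoring scheme is an illustrative assumption, not the paper's method.
    """
    vocab = {w for counts in topic_word_counts.values() for w in counts}
    best_topic, best_score = None, float("-inf")
    for topic, counts in topic_word_counts.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
            for w in transcript_words
        )
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic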

SL980126.PDF (From Author) SL980126.PDF (Rasterized)

Parametric Trajectory Mixtures for LVCSR

Authors:

Man-Hung Siu, GTE/BBN Technologies (USA)
Rukmini Iyer, GTE/BBN Technologies (USA)
Herbert Gish, GTE/BBN Technologies (USA)
Carl Quillen, GTE/BBN Technologies (USA)

Page (NA) Paper number 890

Abstract:

Parametric trajectory models explicitly represent the temporal evolution of the speech features as a Gaussian process with time-varying parameters. HMMs are a special case of such models, one in which the trajectory constraints in the speech segment are ignored by the assumption of conditional independence across frames within the segment. In this paper, we investigate in detail some extensions to our trajectory modeling approach aimed at improving LVCSR performance: (i) improved modeling of mixtures of trajectories via better initialization, (ii) modeling of context dependence, and (iii) improved segment boundaries by means of search. We present results in terms of both phone classification and recognition accuracy on the Switchboard corpus.
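
To make the modeling idea concrete (this is a rough sketch, not the authors' implementation), the snippet below scores a feature segment under a trajectory model whose mean is a polynomial in normalized time with a diagonal Gaussian residual; the quadratic order and the per-segment least-squares fit are assumptions. A single HMM state corresponds to the degenerate order-0 case, where the mean is constant across the segment.

import numpy as np

def trajectory_loglik(segment, order=2, var_floor=1e-3):
    """Log-likelihood of a (T, D) feature segment under a polynomial-mean
    trajectory model with a diagonal Gaussian residual.

    The polynomial order and the per-segment least-squares fit are
    illustrative assumptions, not the paper's training procedure.
    """
    T, D = segment.shape
    t = np.linspace(0.0, 1.0, T)
    basis = np.vander(t, order + 1, increasing=True)   # (T, order+1) design matrix
    coeffs, *_ = np.linalg.lstsq(basis, segment, rcond=None)
    mean = basis @ coeffs                              # time-varying mean trajectory
    resid = segment - mean
    var = np.maximum(resid.var(axis=0), var_floor)     # diagonal residual variances
    return -0.5 * float(np.sum(resid**2 / var + np.log(2 * np.pi * var)))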

SL980890.PDF (From Author) SL980890.PDF (Rasterized)
