Authors:
Tatsuya Kawahara, Kyoto Univ. (Japan)
Tetsunori Kobayashi, Waseda Univ. (Japan)
Kazuya Takeda, Nagoya Univ. (Japan)
Nobuaki Minematsu, Toyohashi Univ. of Tech. (Japan)
Katsunobu Itou, ETL (Japan)
Mikio Yamamoto, Tsukuba Univ. (Japan)
Atsushi Yamada, ASTEM (Japan)
Takehito Utsuro, Nara Institute of Science and Technology (Japan)
Kiyohiro Shikano, Nara Institute of Science and Technology (Japan)
Page (NA) Paper number 763
Abstract:
A project to build a Japanese LVCSR (Large Vocabulary Continuous Speech Recognition)
platform is introduced. It is a collaboration of researchers from different
academic institutes, intended to develop a sharable software repository
of not only databases but also models and programs. The platform consists
of a standard recognition engine, Japanese phone models, and Japanese
statistical language models. A set of Japanese phone HMMs is trained
with ASJ (Acoustical Society of Japan) databases of 20K sentence utterances
per gender. Japanese word N-gram (2-gram and 3-gram) models are
constructed from a four-year corpus of the Mainichi newspaper. The
recognition engine JULIUS is developed for assessment of both acoustic
and language models. The modules are integrated into a Japanese LVCSR
system and evaluated on a 5000-word dictation task. The software repository
is available to the public.
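The word N-gram construction mentioned above can be illustrated with a toy add-k smoothed bigram estimator. This is a minimal sketch of the general technique, not the models trained for the platform; the function names, sentence padding symbols, and smoothing choice are illustrative assumptions.

```python
from collections import Counter

def train_bigram(sentences):
    # Count unigrams and bigrams over sentences padded with <s> and </s>.
    uni, bi = Counter(), Counter()
    for words in sentences:
        toks = ["<s>"] + words + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def bigram_prob(uni, bi, w1, w2, vocab_size, k=1.0):
    # Add-k smoothed conditional probability P(w2 | w1).
    return (bi[(w1, w2)] + k) / (uni[w1] + k * vocab_size)
```

In practice, newspaper-scale models use far more sophisticated smoothing and pruning; the sketch only shows the counting and conditional-probability structure shared by all such models.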
Authors:
Katunobu Itou, ETL (Japan)
Mikio Yamamoto, Univ. of Tsukuba (Japan)
Kazuya Takeda, Nagoya Univ. (Japan)
Toshiyuki Takezawa, ATR (Japan)
Tatsuo Matsuoka, NTT (Japan)
Tetsunori Kobayashi, Waseda Univ. (Japan)
Kiyohiro Shikano, NAIST (Japan)
Shuichi Itahashi, University of Tsukuba (Japan)
Page (NA) Paper number 722
Abstract:
In this paper we present the first public Japanese speech corpus for
large vocabulary continuous speech recognition (LVCSR) technology,
which we have titled JNAS (Japanese Newspaper Article Sentences).
We designed it to be comparable to the corpora used in the American
and European LVCSR projects. The corpus contains speech recordings
(60 hrs.) and their orthographic transcriptions for 306 speakers (153
males and 153 females) reading excerpts from newspaper articles
and phonetically balanced (PB) sentences. The corpus contains about
45,000 sentence utterances in total, with each speaker reading about
150 sentences. JNAS is being distributed on 16 CD-ROMs.
Authors:
Jun Ogata, Ryukoku University (Japan)
Yasuo Ariki, Ryukoku University (Japan)
Page (NA) Paper number 126
Abstract:
In order to construct a news database with a video-on-demand (VOD)
function, news articles must be classified into topics. In this
paper, we propose a method to automatically index and classify TV news
articles into 10 topics based on speech dictation techniques using
speaker-independent triphone HMMs and a word bigram model.
Authors:
Man-Hung Siu, GTE/BBN Technologies (USA)
Rukmini Iyer, GTE/BBN Technologies (USA)
Herbert Gish, GTE/BBN Technologies (USA)
Carl Quillen, GTE/BBN Technologies (USA)
Page (NA) Paper number 890
Abstract:
Parametric trajectory models explicitly represent the temporal evolution
of the speech features as a Gaussian process with time-varying parameters.
HMMs are a special case of such models, one in which the trajectory
constraints in the speech segment are ignored by the assumption of
conditional independence across frames within the segment. In this
paper, we investigate in detail some extensions to our trajectory modeling
approach aimed at improving LVCSR performance: (i) improved modeling
of mixtures of trajectories via better initialization, (ii) modeling
of context dependence, and (iii) improved segment boundaries by means
of search. We will present results in terms of both phone classification
and recognition accuracy on the Switchboard corpus.
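A minimal one-dimensional sketch of the trajectory idea described above: the segment mean is a polynomial function of normalized time fitted by least squares, and degree 0 reduces to the constant per-state mean that a conventional HMM assumes. The function names and the least-squares fit are illustrative assumptions, not the authors' estimator.

```python
import numpy as np

def fit_trajectory(frames, degree=2):
    # Fit a polynomial mean trajectory mu(t) over normalized time [0, 1];
    # degree 0 recovers the constant segment mean an HMM state would use.
    t = np.linspace(0.0, 1.0, len(frames))
    coeffs = np.polyfit(t, frames, degree)
    var = (frames - np.polyval(coeffs, t)).var()
    return coeffs, var

def segment_loglik(frames, coeffs, var):
    # Gaussian log-likelihood of the segment under the time-varying mean mu(t).
    t = np.linspace(0.0, 1.0, len(frames))
    mu = np.polyval(coeffs, t)
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var)
                                + (frames - mu) ** 2 / var)))
```

For a feature sequence with a clear temporal trend, the fitted trajectory yields a smaller residual variance, and hence a higher segment likelihood, than the constant-mean (HMM-like) fit, which is the effect the abstract's extensions aim to exploit.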