Large Vocabulary Continuous Speech Recognition 1

Real-Time Recognition of Broadcast News

Authors:

Gary Cook, Cambridge University (U.K.)
Tony Robinson, Cambridge University (U.K.)
James Christie, Cambridge University (U.K.)

Page (NA) Paper number 65

Abstract:

Although the performance of state-of-the-art automatic speech recognition systems on the challenging task of broadcast news transcription has improved considerably in recent years, many of these systems operate at 130-300 times real time. Many applications of automatic broadcast news transcription, e.g. closed-caption subtitles for television broadcasts, require real-time operation. This paper describes a connectionist-HMM system for broadcast news transcription, and the modifications to this system necessary for real-time operation. We show that real-time operation is possible with a relative increase in word error rate of about 12%.
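
The two figures of merit quoted above are the real-time factor and the relative (not absolute) change in word error rate. A minimal sketch of both in Python; the numeric values are hypothetical placeholders, not results from the paper:

```python
def real_time_factor(processing_seconds, audio_seconds):
    # RTF <= 1.0 is required for live closed-captioning; the offline
    # systems mentioned in the abstract run at an RTF of roughly 130-300.
    return processing_seconds / audio_seconds

def relative_wer_increase(baseline_wer, fast_wer):
    # Relative change in WER, the sense in which "about 12%" is quoted.
    return (fast_wer - baseline_wer) / baseline_wer

# Hypothetical illustration: a baseline at 20.0% WER degrading to 22.4%
# is a 12% relative increase, though the absolute gap is only 2.4 points.
assert abs(relative_wer_increase(0.200, 0.224) - 0.12) < 1e-9
```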

SL980065.PDF (From Author) SL980065.PDF (Rasterized)



Automatic Recognition of Korean Broadcast News Speech

Authors:

Ha-Jin Yu, LG Corporate Institute of Technology (Korea)
Hoon Kim, LG Corporate Institute of Technology (Korea)
Jae-Seung Choi, LG Corporate Institute of Technology (Korea)
Joon-Mo Hong, LG Corporate Institute of Technology (Korea)
Kew-Suh Park, LG Corporate Institute of Technology (Korea)
Jong-Seok Lee, LG Corporate Institute of Technology (Korea)
Hee-Youn Lee, LG Corporate Institute of Technology (Korea)

Page (NA) Paper number 412

Abstract:

This paper describes preliminary results of automatic recognition of Korean broadcast-news speech. We have been working on flexible-vocabulary isolated-word speech recognition, and the same HMM models are used for broadcast-news continuous speech recognition. The recognizer is trained on phonetically balanced isolated-word speech rather than on broadcast news speech itself. In this research, we use several different lexica to investigate how recognition performance varies with word length. We also propose a long-distance bigram language model that can be used in the first stage of the search, to reduce the recognition errors caused by early pruning of correct hypotheses.
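
One common formulation of a long-distance bigram conditions each word on the word a fixed distance d back, rather than on the immediately preceding word, which keeps it cheap enough for the first search pass. A minimal maximum-likelihood training sketch under that assumption (the paper's exact definition and smoothing are not given in the abstract):

```python
from collections import Counter, defaultdict

def train_long_distance_bigram(sentences, distance=2):
    """Estimate P(w[t] | w[t-distance]) by relative frequency.
    distance=1 recovers the ordinary bigram model."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for t in range(distance, len(words)):
            pair_counts[words[t - distance]][words[t]] += 1
    model = {}
    for history, successors in pair_counts.items():
        total = sum(successors.values())
        model[history] = {w: c / total for w, c in successors.items()}
    return model

# Hypothetical usage:
lm = train_long_distance_bigram(["the news starts at nine",
                                 "the news ends at ten"], distance=2)
print(lm["news"])   # {'at': 1.0} -- the word two positions after "news"
```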

SL980412.PDF (From Author) SL980412.PDF (Rasterized)



Telephone-Based Conversational Speech Recognition in the JUPITER Domain

Authors:

James R. Glass, MIT Lab for Computer Science (USA)
Timothy J. Hazen, MIT Lab for Computer Science (USA)

Page (NA) Paper number 593

Abstract:

This paper describes our experiences with developing a telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recognizer vocabulary, pronunciations, language and acoustic models for this system, and report on the current performance of the recognizer under several different conditions.
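
One concrete task in that development loop is growing the recognizer vocabulary from the collected data. A hedged sketch of the routine out-of-vocabulary-rate check such work involves (illustrative only; the code and data below are not from the paper):

```python
def oov_rate(vocabulary, utterances):
    """Fraction of word tokens in collected utterances that fall
    outside a candidate recognizer vocabulary."""
    vocab = set(vocabulary)
    tokens = [w for utt in utterances for w in utt.lower().split()]
    return sum(w not in vocab for w in tokens) / len(tokens)

# Hypothetical usage with made-up weather-domain utterances:
vocab = ["what", "is", "the", "weather", "in", "boston", "today"]
data = ["what is the weather in Seattle today"]
print(f"{oov_rate(vocab, data):.1%}")   # 14.3%: "seattle" is unseen
```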

SL980593.PDF (From Author) SL980593.PDF (Rasterized)



Japanese Large-Vocabulary Continuous Speech Recognition System Based on Microsoft Whisper

Authors:

Hsiao-Wuen Hon, Microsoft Research (USA)
Yun-Cheng Ju, Microsoft Research (USA)
Keiko Otani, Microsoft Research (USA)

Page (NA) Paper number 597

Abstract:

Input of Asian ideographic characters has traditionally been one of the biggest impediments to information processing in Asia. Speech is arguably the most effective and efficient input method for Asian non-spelling characters. This paper presents a Japanese large-vocabulary continuous speech recognition system based on Microsoft Whisper technology. We focus on the aspects of the system that are language-specific and demonstrate the adaptability of the Whisper system to new languages. We show that our pronunciation- and part-of-speech-distinguished morpheme-based language models and Whisper-based Japanese senonic acoustic models yield state-of-the-art Japanese LVCSR performance. The speaker-independent character and Kana error rates on the JNAS database are 10% and 5% respectively.
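
The quoted character and Kana error rates are, in the standard formulation, Levenshtein edit distances normalized by reference length. A minimal sketch of that metric, assuming the usual definition rather than anything Whisper-specific:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two symbol sequences,
    computed with a single rolling row of the DP table."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            prev, row[j] = row[j], min(row[j] + 1,         # deletion
                                       row[j - 1] + 1,     # insertion
                                       prev + (r != h))    # substitution
    return row[-1]

def character_error_rate(ref, hyp):
    """CER = edit distance / reference length; the 10% character and
    5% Kana figures are error rates of this form."""
    return edit_distance(ref, hyp) / len(ref)

# Hypothetical usage on kana strings (one deletion out of 7 symbols):
print(character_error_rate("きょうのてんき", "きょうてんき"))  # ~0.143
```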

SL980597.PDF (From Author) SL980597.PDF (Rasterized)



Partitioning And Transcription Of Broadcast News Data

Authors:

Jean-Luc Gauvain, LIMSI/CNRS (France)
Lori F. Lamel, LIMSI/CNRS (France)
Gilles Adda, LIMSI/CNRS (France)

Page (NA) Paper number 84

Abstract:

Radio and television broadcasts consist of a continuous stream of data composed of segments of differing linguistic and acoustic nature, which poses challenges for transcription. In this paper we report on our recent work in transcribing broadcast news data, including the problem of partitioning the data into homogeneous segments prior to word recognition. Gaussian mixture models are used to identify speech and non-speech segments. A maximum-likelihood segmentation/clustering process is then applied to the speech segments using GMMs and an agglomerative clustering algorithm. The clustered segments are then labeled according to bandwidth and gender. The recognizer is a continuous-mixture-density, tied-state, cross-word, context-dependent HMM system with a 65k trigram language model. Decoding is carried out in three passes, with the final pass incorporating cluster-based test-set MLLR adaptation. The overall word transcription error on the Nov'97 unpartitioned evaluation test data was 18.5%.
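
A hedged sketch of the first step, GMM-based speech/non-speech labelling, using scikit-learn in place of LIMSI's own tools; the feature choice, model sizes, and decision rule below are illustrative assumptions, not the paper's configuration:

```python
from sklearn.mixture import GaussianMixture

def train_class_gmms(speech_frames, nonspeech_frames, n_components=16):
    """Fit one GMM per acoustic class on labelled cepstral feature
    frames (each argument is an array of shape [n_frames, n_dims])."""
    speech_gmm = GaussianMixture(n_components).fit(speech_frames)
    nonspeech_gmm = GaussianMixture(n_components).fit(nonspeech_frames)
    return speech_gmm, nonspeech_gmm

def label_frames(frames, speech_gmm, nonspeech_gmm):
    """Frame-level speech/non-speech decision by comparing per-frame
    GMM log-likelihoods. A real partitioner would smooth this with
    minimum-duration constraints before clustering the speech segments
    and labelling them by bandwidth and gender."""
    return speech_gmm.score_samples(frames) > nonspeech_gmm.score_samples(frames)
```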

SL980084.PDF (From Author) SL980084.PDF (Rasterized)
