Authors:
Ulla Uebler, Bavarian Research Center for Knowledge Based Systems (Germany)
Michael Schüßler, Bavarian Research Center for Knowledge Based Systems (Germany)
Heinrich Niemann, Bavarian Research Center for Knowledge Based Systems (Germany)
Paper number 337
Abstract:
In this paper, we report on our investigations into the use of
adaptation and retraining in our bilingual (Italian, German) and
multidialectal recognition system. Our approach to bilingual speech
recognition is to treat the two languages as one, which is best suited
to a task in which Italian and German natives speak both languages,
resulting in a variety of accents and dialects. We performed adaptation
on single speakers and on speaker groups built from combinations of
spoken and native language. Furthermore, we performed retraining on
partitions of the adaptation or training data. Our experiments led to
an error rate reduction in all cases: compared to the baseline system,
we achieved overall improvements of 14%, 12-14%, and 7% for speaker
adaptation, speaker group adaptation, and retraining, respectively.
Furthermore, we found, among other things, that performance for Italian
is rather stable between adaptation and retraining, whereas for German
adaptation outperforms retraining by far.
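
As a rough illustration of the experimental design, the Python sketch
below groups adaptation data by combinations of spoken and native
language and computes the kind of relative error rate reduction quoted
above. The record fields, toy data, and grouping key are assumptions
made for illustration; they are not taken from the paper.

    # Minimal sketch, assuming utterance records tagged with the
    # speaker's spoken and native language (illustrative field names).
    from collections import defaultdict

    utterances = [
        {"speaker": "sp01", "native": "German",  "spoken": "Italian"},
        {"speaker": "sp02", "native": "Italian", "spoken": "Italian"},
        {"speaker": "sp03", "native": "Italian", "spoken": "German"},
    ]

    # One adaptation set per (spoken language, native language) pair.
    groups = defaultdict(list)
    for utt in utterances:
        groups[(utt["spoken"], utt["native"])].append(utt)

    def relative_improvement(baseline_wer, adapted_wer):
        """Relative error rate reduction, e.g. 0.14 for the 14% above."""
        return (baseline_wer - adapted_wer) / baseline_wer

    print(relative_improvement(0.200, 0.172))  # ~0.14, i.e. 14%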
Authors:
Tanja Schultz, Interactive Systems Laboratories (Germany)
Alex Waibel, Interactive Systems Laboratories (USA)
Paper number 577
Abstract:
This paper describes the design of a multilingual speech recognizer
using an LVCSR dictation database collected under the GlobalPhone
project. This project at the University of Karlsruhe investigates LVCSR
systems in 15 languages of the world, namely Arabic, Chinese, Croatian,
English, French, German, Italian, Japanese, Korean, Portuguese,
Russian, Spanish, Swedish, Tamil, and Turkish. Based on a global
phoneme set, we built different multilingual speech recognition systems
for five of the 15 languages. Context-dependent phoneme models are
created in a data-driven way by introducing questions about languages
and language groups into our polyphone clustering procedure. We apply
the resulting multilingual models to unseen languages and present
several recognition results in language-independent and
language-adaptive setups. The results indicate that the method of
parameter sharing should be chosen depending on whether multilingual or
crosslingual speech recognition is intended.
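
To make the clustering idea concrete, here is a minimal Python sketch
of one decision-tree split in which language-membership questions
compete with phonetic-context questions. The entropy criterion and the
toy sample records are assumptions made for illustration; the actual
procedure splits context-dependent (polyphone) model states with a
likelihood-based criterion on acoustic data.

    # Minimal sketch: pick the question whose yes/no split most reduces
    # label entropy. Sample records and questions are illustrative.
    import math
    from collections import Counter

    def entropy(samples):
        counts = Counter(s["model"] for s in samples)
        total = sum(counts.values())
        return -sum(c / total * math.log2(c / total)
                    for c in counts.values())

    def best_split(samples, questions):
        base = entropy(samples)
        best = None
        for name, test in questions:
            yes = [s for s in samples if test(s)]
            no = [s for s in samples if not test(s)]
            if not yes or not no:
                continue  # a usable split must separate the data
            child = (len(yes) * entropy(yes)
                     + len(no) * entropy(no)) / len(samples)
            if best is None or base - child > best[0]:
                best = (base - child, name)
        return best

    # Questions mix phonetic context with language membership, so models
    # stay shared across languages unless the data favor a language split.
    questions = [
        ("left-is-vowel",    lambda s: s["left"] in {"a", "e", "i", "o", "u"}),
        ("lang-is-Japanese", lambda s: s["language"] == "Japanese"),
    ]
    samples = [
        {"model": "t-front", "left": "i", "language": "German"},
        {"model": "t-front", "left": "e", "language": "Japanese"},
        {"model": "t-back",  "left": "k", "language": "German"},
        {"model": "t-back",  "left": "s", "language": "Japanese"},
    ]
    print(best_split(samples, questions))  # -> (1.0, 'left-is-vowel')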
Authors:
Goh Kawai, University of Tokyo (Japan)
Keikichi Hirose, University of Tokyo (Japan)
Paper number 782
Abstract:
The problem addressed is automatically detecting, measuring and correcting
nonnative pronunciation characteristics (so-called "foreign accents")
in foreign language speech. Systemic, structural and realizational
differences between L1 (native language) and L2 (target language) appear
as phone insertions, deletions and substitutions. A bilingual phone
recognizer using native-trained acoustic models of the learner's L1
and L2 was developed to identify insertions, deletions and substitutions
of L2 phones. Recognition results are combined with knowledge of
phonetics, phonology, and pedagogy to show learners which phones were
mispronounced and to instruct them in how to modify their articulatory
gestures for more native-sounding speech. The degree of the learner's
foreign accent is measured from the number of alternate pronunciations
the learner uses; this number decreases as learning progresses.
Evaluation experiments
using Japanese and American English indicate that the system is an
effective component technology for computer-aided pronunciation learning.
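
Identifying insertions, deletions, and substitutions can be pictured as
a phone-level string alignment between the canonical L2 pronunciation
and the recognized phone sequence. The Python sketch below uses a
standard Levenshtein alignment with backtrace; it stands in for
whatever alignment procedure the authors used, and the phone symbols
are illustrative.

    # Minimal sketch: label each recognized phone as a match,
    # substitution, deletion, or insertion relative to the reference.
    def align(reference, hypothesis):
        n, m = len(reference), len(hypothesis)
        # dp[i][j] = edit distance of reference[:i] vs hypothesis[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i
        for j in range(m + 1):
            dp[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match/substitution
                               dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1)         # insertion
        ops, i, j = [], n, m  # backtrace recovers the operations
        while i > 0 or j > 0:
            if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
                    0 if reference[i - 1] == hypothesis[j - 1] else 1):
                kind = ("match" if reference[i - 1] == hypothesis[j - 1]
                        else "substitution")
                ops.append((kind, reference[i - 1], hypothesis[j - 1]))
                i, j = i - 1, j - 1
            elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
                ops.append(("deletion", reference[i - 1], None))
                i -= 1
            else:
                ops.append(("insertion", None, hypothesis[j - 1]))
                j -= 1
        return list(reversed(ops))

    # /r/ -> /l/ substitution and vowel epenthesis, both typical of
    # Japanese-accented English ("right" realized as "lighto").
    print(align(["r", "ay", "t"], ["l", "ay", "t", "o"]))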