Session M4C Language Identification

Chairperson: Mark Zissman, MIT Lincoln Laboratory, USA


PREDICTING, DIAGNOSING AND IMPROVING AUTOMATIC LANGUAGE IDENTIFICATION PERFORMANCE

Authors: Marc A. Zissman

Lincoln Laboratory Massachusetts Institute of Technology 244 Wood Street Lexington, MA 02173–9108 USA Voice: +1 617 981-2547 Fax: +1 617 981-0186 E-mail: MAZ@SST.LL.MIT.EDU

Volume 1 pages 51 - 54

ABSTRACT

Language-identification (LID) techniques that use multiple single-language phoneme recognizers followed by n-gram language models have consistently yielded top performance at NIST evaluations. In our study of such systems, we have recently cut our LID error rate by modeling the output of the n-gram language models more carefully. Additionally, we are now able to produce meaningful confidence scores along with our LID hypotheses. Finally, we have developed diagnostic measures that can predict the performance of our LID algorithms.
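The architecture described above (several phone recognizers, each followed by per-language n-gram models) can be illustrated with a minimal sketch. It assumes each front-end recognizer has already decoded the utterance into a phone string and that every (front end, language) pair has its own add-one-smoothed phone-bigram model. The function names, the smoothing, and the plain unweighted summation of log likelihoods across front ends are illustrative assumptions; the sketch does not reproduce the paper's refined modeling of the n-gram outputs or its confidence scores.

```python
import math
from collections import Counter

def train_bigram(phone_seqs, vocab, alpha=1.0):
    """Return a function giving the add-alpha smoothed log P(b | a)."""
    bigrams, histories = Counter(), Counter()
    for seq in phone_seqs:
        padded = ["<s>"] + list(seq)  # sentence-start symbol as first history
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            histories[a] += 1
    v = len(vocab)

    def logprob(a, b):
        return math.log((bigrams[(a, b)] + alpha) / (histories[a] + alpha * v))

    return logprob

def sequence_loglik(seq, logprob):
    """Sum of bigram log probabilities over one decoded phone string."""
    padded = ["<s>"] + list(seq)
    return sum(logprob(a, b) for a, b in zip(padded, padded[1:]))

def identify(decoded, models):
    """decoded: front_end -> phone string; models: front_end -> lang -> logprob.

    Each language's score is the sum of its log likelihoods over all front
    ends (a hypothetical, unweighted combination); the top score wins.
    """
    langs = next(iter(models.values())).keys()
    scores = {
        lang: sum(sequence_loglik(decoded[fe], models[fe][lang]) for fe in models)
        for lang in langs
    }
    return max(scores, key=scores.get), scores
```

With one front end and two toy languages, `identify({"fe1": ["a", "b", "a", "b"]}, {"fe1": {"A": lm_a, "B": lm_b}})` picks whichever bigram model assigns the decoded string the higher likelihood. Adding more front ends only adds more terms to each language's sum.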

A0047.pdf

LANGUAGE IDENTIFICATION WITH LANGUAGE-INDEPENDENT ACOUSTIC MODELS

Authors: C. Corredor-Ardoy, J.L. Gauvain, M. Adda-Decker, L. Lamel

Spoken Language Processing Group LIMSI-CNRS, BP 133 91403 Orsay cedex, FRANCE {corredor,gauvain,madda,lamel}@limsi.fr

Volume 1 pages 55 - 58

ABSTRACT

In this paper we explore the use of language-independent acoustic models for language identification (LID). The phone sequence output by a single language-independent phone recognizer is rescored with language-dependent phonotactic models approximated by phone bigrams. The language-independent phoneme inventory was obtained by agglomerative hierarchical clustering, using a measure of similarity between phones. This system is compared with a parallel language-dependent phone architecture, which optimally combines the acoustic log likelihood and the phonotactic score for language identification. Experiments were carried out on IDEAL, a 4-language telephone speech corpus containing calls in British English, Spanish, French and German. Results show that the language-independent approach performs as well as the language-dependent one: 9% versus 10% error rate on 10-second chunks for the 4-language task.
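The phone-inventory merging step can be sketched as bottom-up agglomerative clustering. This is a minimal illustration, assuming the phone similarity measure has already been turned into a pairwise distance (smaller means more similar) and that clusters are compared by average linkage; the distance values, the `agglomerate` name and the language-prefixed phone labels are hypothetical, not taken from the paper.

```python
from itertools import combinations

def agglomerate(phones, dist, n_clusters):
    """Merge phones bottom-up until n_clusters groups remain.

    dist maps frozenset({p, q}) -> distance between phones p and q
    (an assumed conversion of the paper's similarity measure).
    """
    clusters = [frozenset([p]) for p in phones]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster phone pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of current clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return clusters
```

Each resulting cluster would become one language-independent phone unit; stopping earlier or later trades acoustic detail against how much training data each shared model receives.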

A0156.pdf

BAYESIAN METHODS FOR LANGUAGE VERIFICATION

Authors: Eluned S. Parris(1), Harvey Lloyd-Thomas(1), Michael J. Carey(1) and Jerry H. Wright(2).

(1) Ensigma Ltd, Turing House, Station Road, Chepstow, Monmouthshire, NP6 5PB, U.K. (2) Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1TR, U.K.

Volume 1 pages 59 - 62

ABSTRACT

This paper describes a number of techniques for language verification based on acoustic processing and n-gram language modelling. A new technique is described which uses anti-models to model the general class of languages. These models are then used to normalise the acoustic score, giving a 34% reduction in the error rate of the system. An approach to automatically generating discriminative subword strings for language verification is presented. The occurrence of recurrent strings is scored using a Poisson-based significance test. It is shown that when significant sub-strings do occur in the test material, they are strong indicators that the target language is present.
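A Poisson-based significance test for recurring sub-strings, together with anti-model normalisation, can be sketched as below. This assumes the count of a given sub-string in a test segment is Poisson-distributed under non-target (background) speech with mean `background_rate * n_units`, and that anti-model normalisation takes the form of a log-likelihood ratio; both forms, the 0.01 significance level, and all function names are illustrative assumptions rather than the paper's exact formulation.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def is_significant(count, background_rate, n_units, alpha=0.01):
    """Flag a sub-string whose count is improbably high under the background.

    background_rate: hypothetical expected occurrences per recognised unit in
    non-target speech; n_units: length of the test segment in units.
    """
    lam = background_rate * n_units
    return poisson_tail(count, lam) < alpha

def normalised_score(target_loglik, anti_loglik):
    """Anti-model normalisation as a log-likelihood ratio (assumed form)."""
    return target_loglik - anti_loglik
```

For example, a string expected 0.1 times in a 100-unit segment but observed 5 times is flagged as significant, while a string expected once and seen once is not.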

A0258.pdf

USE OF RECURRENT NETWORK FOR UNKNOWN LANGUAGE REJECTION IN LANGUAGE IDENTIFICATION SYSTEM

Authors: HingKeung Kwan and Keikichi Hirose

Dept. of Information and Communication Engineering University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan E-mail: kan@gavo.t.u-tokyo.ac.jp, hirose@gavo.t.u-tokyo.ac.jp

Volume 1 pages 63 - 66

ABSTRACT

In earlier work, we used a multilayer perceptron neural network to prevent inputs in unknown languages from being misidentified as one of the target languages of a language identification system. However, a multilayer perceptron cannot exploit the temporal information in the utterances. Results show that with phonemic unigrams as input features to a recurrent neural network of the Jordan architecture, an identification rate of 98.1% can be achieved for 3 target languages. When the output thresholds are set to 0.6 so that inputs from 2 additional unknown languages are rejected, the overall rate falls to 85.9%.
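A Jordan-style recurrent network feeds its own previous outputs back in as context units, which is what lets it use temporal information that a plain multilayer perceptron ignores. The sketch below shows one possible forward pass and the threshold-based rejection rule; the sigmoid units, the omission of bias terms, the context decay `mu = 0.5`, the weight shapes and the language labels are all hypothetical choices for illustration, and only the 0.6 threshold comes from the abstract.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def jordan_forward(inputs, W_in, W_ctx, W_out, mu=0.5):
    """Run a Jordan network over a sequence of feature vectors.

    Context units hold a decayed sum of past outputs (assumed decay mu) and
    feed the hidden layer alongside the current input. Biases are omitted.
    Returns the output vector after the last input.
    """
    n_out = len(W_out)
    context = [0.0] * n_out
    y = [0.0] * n_out
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_ctx, context))]
        h = [sigmoid(z) for z in pre]
        y = [sigmoid(z) for z in matvec(W_out, h)]
        context = [mu * c + o for c, o in zip(context, y)]
    return y

def decide(outputs, languages, threshold=0.6):
    """Return the best target language, or None to reject as unknown."""
    best = max(range(len(outputs)), key=outputs.__getitem__)
    if outputs[best] < threshold:
        return None  # no output is confident enough: treat as unknown language
    return languages[best]
```

Raising the threshold rejects more unknown-language inputs but also rejects more genuine target-language inputs, which is the trade-off behind the drop from 98.1% to 85.9%.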

A0279.pdf

LANGUAGE-IDENTIFICATION BASED ON CROSS-LANGUAGE ACOUSTIC MODELS AND OPTIMISED INFORMATION COMBINATION

Authors: Ove Andersen and Paul Dalsgaard

Center for PersonKommunikation (CPK) Aalborg University, Denmark

Volume 1 pages 67 - 70

ABSTRACT

This work is concerned with language-identification (LID). Two central issues are addressed. The first is the trade-off between detailed acoustic modelling and robust estimation of acoustic and language models. The second is finding the optimal combination of acoustic and language scores for language identification. Experiments are carried out on three languages from the OGI-TS database: American English, German and Spanish. On average, the acoustic models recognise 46.3% of the phones correctly across the three languages, with insertion and deletion rates of 35.7% and 6.6%, respectively. Language-identification performance is 82.6% with the full set of acoustic models. Performance increases to 83.7% after 80 iterations of a hierarchical clustering in which phones are merged across the languages.
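One simple way to combine acoustic and language scores is a weighted sum of log scores, with the weight tuned for accuracy on held-out data. The sketch below assumes that linear form and a grid search over candidate weights; the abstract does not say how the optimal combination was found, so the form, the search, and the function names are illustrative assumptions.

```python
def combined_decision(acoustic, language, weight):
    """acoustic, language: dict lang -> log score. Return the best language."""
    return max(acoustic, key=lambda lang: acoustic[lang] + weight * language[lang])

def tune_weight(dev_set, weights):
    """dev_set: list of (acoustic, language, true_lang) tuples.

    Returns (best_weight, accuracy) over the candidate weights; ties keep
    the earliest candidate.
    """
    best = None
    for w in weights:
        correct = sum(combined_decision(a, l, w) == truth for a, l, truth in dev_set)
        acc = correct / len(dev_set)
        if best is None or acc > best[1]:
            best = (w, acc)
    return best
```

On a toy two-utterance development set over "en", "de" and "es" (made-up scores), a weight of 0 ignores the language model, while a very large weight lets the language score swamp the acoustic evidence; the search keeps whichever candidate classifies the most development utterances correctly.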

A0383.pdf

PHONETIC-CONTEXT MAPPING IN LANGUAGE IDENTIFICATION

Authors: Jiri Navratil and Werner Zuhlke

Department of Communication and Measurement Technical University of Ilmenau, P.O.Box 100565, 98684 Ilmenau, Germany e-mail: jiri.navratil@e-technik.tu-ilmenau.de

Volume 1 pages 71 - 74

ABSTRACT

This paper deals with the problem of exploiting information from a wide phonetic context for the purpose of language identification. Two approaches to language modeling are presented: 1) modified bigrams with a context-mapping matrix and 2) language models based on binary decision trees. Both models were incorporated in a phonotactic language identifier with a double-bigram decoding architecture and were shown to consistently improve on the performance of standard bigrams. Measured on the NIST'95 evaluation set, the described system outperforms state-of-the-art phonotactic components while being computationally less expensive.
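The idea behind a decision-tree language model is that each prediction can condition on questions about any earlier phone, not just the immediately preceding one as in a bigram. The sketch below shows only how such a tree would score a phone sequence once it exists; tree growing is not shown, and the node layout (a question of the form "is the phone `offset` positions back in `phone_set`?"), the probability floor and the toy tree are hypothetical choices, not the paper's models.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Internal node: "is the phone `offset` positions back in `phone_set`?"
    offset: int = 1
    phone_set: frozenset = frozenset()
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    # Leaf node: probability distribution over the next phone.
    dist: Optional[dict] = None

def leaf_for(node, history):
    """Walk the tree using the phone history until a leaf is reached."""
    while node.dist is None:
        # Positions before the start of the utterance read as "<s>".
        ctx = history[-node.offset] if node.offset <= len(history) else "<s>"
        node = node.yes if ctx in node.phone_set else node.no
    return node.dist

def tree_logprob(seq, root, floor=1e-6):
    """Log probability of a phone sequence under the tree model."""
    total = 0.0
    for t, phone in enumerate(seq):
        dist = leaf_for(root, seq[:t])
        total += math.log(dist.get(phone, floor))  # floor for unseen phones
    return total
```

A root question with `offset=2` lets the model react to the phone two positions back, context that a standard bigram table cannot see, while scoring each phone still costs only one walk down the tree.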

A0526.pdf