ABSTRACT
Language-identification (LID) techniques that use multiple single-language phoneme recognizers followed by n-gram language models have consistently yielded top performance at NIST evaluations. In our study of such systems, we have recently cut our LID error rate by modeling the output of n-gram language models more carefully. Additionally, we are now able to produce meaningful confidence scores along with our LID hypotheses. Finally, we have developed some diagnostic measures that can predict performance of our LID algorithms.
ABSTRACT
In this paper we explore the use of language-independent acoustic models for language identification (LID). The phone sequence output by a single language-independent phone recognizer is rescored with language-dependent phonotactic models approximated by phone bigrams. The language-independent phoneme inventory was obtained by Agglomerative Hierarchical Clustering, using a measure of similarity between phones. This system is compared with a parallel language-dependent phone architecture, which makes optimal use of the acoustic log-likelihood and the phonotactic score for language identification. Experiments were carried out on the 4-language telephone speech corpus IDEAL, containing calls in British English, Spanish, French and German. Results show that the language-independent approach performs as well as the language-dependent one: a 9% versus 10% error rate on 10-second chunks, for the 4-language task.
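The rescoring step described above can be illustrated with a minimal sketch. It assumes add-alpha smoothing and a maximum-score decision rule; neither detail is given in the abstract, and the names `train_bigram`, `score` and `identify` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sequences, inventory, alpha=1.0):
    """Estimate smoothed phone-bigram log-probabilities for one language.

    Assumption: add-alpha smoothing; the abstract does not name a method.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    size = len(inventory)
    return {
        (prev, cur): math.log(
            (pair_counts[(prev, cur)] + alpha)
            / (context_counts[prev] + alpha * size)
        )
        for prev in inventory
        for cur in inventory
    }


def score(seq, model):
    """Sum the bigram log-probabilities along a decoded phone sequence."""
    return sum(model[(prev, cur)] for prev, cur in zip(seq, seq[1:]))


def identify(seq, models):
    """Return the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda lang: score(seq, models[lang]))
```

Here `models` maps each language name to the table returned by `train_bigram`, and the same decoded phone sequence is scored against every table, which is what allows a single recognizer to serve all languages.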
ABSTRACT
This paper describes a number of techniques for language verification based on acoustic processing and n-gram language modelling. A new technique is described which uses anti-models to model the general class of languages. These models are then used to normalise the acoustic score, giving a 34% reduction in the error rate of the system. An approach to automatically generate discriminative subword strings for language verification is presented. The occurrence of recurrent strings is scored using a Poisson-based significance test. It is shown that when significant sub-strings do occur in the test material, they are strong indicators that the target language is present.
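One way such a Poisson-based test could look is sketched below. The abstract does not give the exact statistic, so this is an assumption: the count of a string is compared against a Poisson distribution whose mean is the count expected from a background rate. The names `poisson_tail` and `is_significant`, and the 0.01 threshold, are hypothetical.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def is_significant(observed, background_rate, n_positions, threshold=0.01):
    """Flag a string whose count is unlikely under the background rate.

    Assumption: the expected count is background_rate * n_positions.
    """
    expected = background_rate * n_positions
    return poisson_tail(observed, expected) < threshold
```

For example, a string expected 0.5 times in the test material but observed 5 times would be flagged, while a single occurrence would not.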
ABSTRACT
In the past, we attempted to use a multilayer perceptron neural network to prevent unknown-language inputs from being misidentified as one of the target languages in a language identification system. However, the multilayer perceptron could not exploit the temporal information in the utterances. Results show that with phonemic unigrams as input features to a recurrent neural network of Jordan architecture, an identification rate of 98.1% can be achieved for 3 target languages. When the output thresholds are set to 0.6 to reject 2 additional unknown languages, a lower overall rate of 85.9% is obtained.
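The thresholded rejection can be sketched as follows. This is a minimal illustration, assuming that the threshold is applied to the largest network output and that an input is rejected when that output falls below 0.6; the abstract does not spell out the exact rule, and `classify_with_rejection` is a hypothetical name.

```python
def classify_with_rejection(outputs, threshold=0.6):
    """Return the best-scoring target language, or None for an unknown language.

    outputs maps each target language to its network output activation.
    Assumption: rejection happens when the largest output is below threshold.
    """
    best = max(outputs, key=outputs.get)
    return best if outputs[best] >= threshold else None
```

Raising the threshold rejects more unknown-language inputs at the cost of also rejecting some target-language utterances, which is consistent with the lower overall rate reported once rejection is enabled.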
ABSTRACT
This work is concerned with the subject of language identification (LID). Two central issues are addressed. The first is to analyse the trade-off between detailed acoustic modelling and robust estimation of acoustic and language models. The second is to find the optimal combination of acoustic and language scores for language identification. Experiments are carried out using the three languages American English, German and Spanish from the OGI-TS database. It is shown that, on average, the acoustic modelling is able to recognise 46.3% of the phones correctly across the three languages. The insertion and deletion rates are 35.7% and 6.6%, respectively. Language-identification performance is 82.6% with the full set of acoustic models. The performance is increased to 83.7% after 80 iterations of a hierarchical clustering in which phones are merged across the languages.
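The search for a score combination can be sketched as below. This is only an illustration under stated assumptions: the two scores are combined linearly with a single weight, and the weight is chosen by grid search on held-out data. The abstract does not say how the combination is actually optimised, and `combine`, `accuracy` and `best_weight` are hypothetical names.

```python
def combine(acoustic, language, weight):
    """Linearly interpolate an acoustic score and a language-model score."""
    return weight * acoustic + (1.0 - weight) * language


def accuracy(dev_set, weight):
    """Fraction of development utterances identified correctly.

    dev_set is a list of (scores, truth) pairs, where scores maps each
    language to an (acoustic, language) pair of log scores.
    """
    correct = 0
    for scores, truth in dev_set:
        guess = max(scores, key=lambda lang: combine(*scores[lang], weight))
        correct += guess == truth
    return correct / len(dev_set)


def best_weight(dev_set, steps=10):
    """Grid-search the interpolation weight on [0, 1]; ties keep the smallest."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda w: accuracy(dev_set, w))
```

A weight of 0 uses the language-model score alone and a weight of 1 uses the acoustic score alone, so the grid covers both single-score systems as well as the mixtures in between.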
ABSTRACT
This paper deals with the problem of exploiting information from a wide phonetic context for the purpose of language identification. Two approaches to language modeling are presented here: 1) modified bigrams with a context-mapping matrix and 2) language models based on binary decision trees. Both models were incorporated in a phonotactic language identifier with a double-bigram decoding architecture and were shown to consistently improve the performance of standard bigrams. Measured on the NIST'95 evaluation set, the described system outperforms the state-of-the-art phonotactic components and is, at the same time, computationally less expensive.