ABSTRACT
To improve recognition accuracy for large-vocabulary speech recognition systems we use language models based on linguistic classes (extended POS). In this paper an adaptation technique is presented that profits from linguistic knowledge about unknown words of the new domain. When switching from the base domain to the new domain we keep the bigram probabilities of the linguistic classes fixed and adapt only the unigram word probabilities. In our experiments we use three different corpora: financial columns of a newspaper corpus and two medical corpora (computer tomography and magnetic resonance). Adapted language models show an improvement in test-set perplexity of 48% to 51% compared to putting unknown words into the language model's ``unknown'' class.
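The adaptation scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i) is the standard class-bigram form, and all names and probability values are hypothetical.

```python
def class_bigram_prob(w, w_prev, class_of, class_bigram, word_given_class):
    """P(w | w_prev) ~= P(c(w) | c(w_prev)) * P(w | c(w))."""
    c, c_prev = class_of[w], class_of[w_prev]
    return class_bigram[(c_prev, c)] * word_given_class[(w, c)]

# The class bigram table stays fixed (trained on the base domain);
# only the word-given-class probabilities are re-estimated on the
# new domain, so unknown new-domain words just need a class label.
class_of = {"the": "DET", "scan": "NOUN", "report": "NOUN"}
class_bigram = {("DET", "NOUN"): 0.6}            # fixed, base domain
word_given_class = {("the", "DET"): 0.9,          # adapted, new domain
                    ("scan", "NOUN"): 0.3,
                    ("report", "NOUN"): 0.2}

p = class_bigram_prob("scan", "the", class_of, class_bigram, word_given_class)
# p = 0.6 * 0.3 = 0.18
```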
ABSTRACT
A new method is presented to quickly adapt a given language model to local text characteristics. The basic approach is to choose the adaptive models as close as possible to the background estimates while constraining them to respect the locally estimated unigram probabilities. Several means are investigated to speed up the calculations. We measure both perplexity and word error rate to gauge the quality of our model.
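The idea of staying close to the background model while matching locally estimated unigrams can be sketched as a single rescaling step: each conditional background probability is scaled by the ratio of local to background unigram mass and renormalized. This is an illustrative sketch only; the paper's actual estimator and its speed-ups are not specified here, and all probability values are hypothetical.

```python
def adapt_distribution(p_background, local_unigram, background_unigram):
    """Rescale a background conditional P_bg(w | h) toward locally
    estimated unigrams, then renormalize over the vocabulary."""
    scaled = {w: p * local_unigram[w] / background_unigram[w]
              for w, p in p_background.items()}
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}

p_bg = {"stock": 0.5, "scan": 0.1, "the": 0.4}        # P_bg(w | h), one history h
uni_bg = {"stock": 0.20, "scan": 0.05, "the": 0.75}   # background unigrams
uni_local = {"stock": 0.05, "scan": 0.25, "the": 0.70}  # local unigrams

p_adapted = adapt_distribution(p_bg, uni_local, uni_bg)
# "scan" gains probability because it is locally frequent; "stock" loses it.
```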
ABSTRACT
Standard statistical language modeling techniques suffer from sparse-data problems when applied to real tasks in speech recognition, where large amounts of domain-dependent text are not available. In this work, we introduce a modified representation of the standard word n-gram model using part-of-speech (POS) labels that compensates for word and POS usage differences across domains. Two different approaches are explored: (i) imposing an explicit transformation of the out-of-domain n-gram distributions before combining with an in-domain model, and (ii) POS smoothing of multi-domain n-gram components. Results are presented on a spontaneous speech recognition task (Switchboard), showing that the POS smoothing framework reduces word error rate and perplexity over a standard word n-gram model on in-domain data, with increased gains using multi-domain models.
ABSTRACT
The amount of text data available from a corpus for training language models is usually limited. Data from larger general or related corpora can be utilized to improve the performance of the language model on the corpus of interest. We explore one method of adapting a prior model from a large corpus to a smaller one of interest. Perplexity results of adapting a prior model constructed using the NAB corpus to the Switchboard and ATIS corpora are presented and compared with those of interpolated models.
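The interpolated baseline mentioned above can be sketched in a few lines: the in-domain and general models are mixed with a weight λ, typically tuned on held-out data. This is a generic sketch of linear interpolation, not the paper's prior-adaptation method; all values are hypothetical.

```python
def interpolate(p_in_domain, p_general, lam):
    """P(w | h) = lam * P_in(w | h) + (1 - lam) * P_gen(w | h)."""
    vocab = set(p_in_domain) | set(p_general)
    return {w: lam * p_in_domain.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
            for w in vocab}

# In-domain model (small corpus) backed off by a general model (large corpus).
p_in = {"flight": 0.4, "the": 0.6}
p_gen = {"flight": 0.1, "the": 0.7, "stock": 0.2}
p_mix = interpolate(p_in, p_gen, lam=0.5)
# p_mix["flight"] = 0.5*0.4 + 0.5*0.1 = 0.25
```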
ABSTRACT
The first class-based adaptation approaches [FGH+97, Ueb97] take the use of classes in the construction of statistical m-gram models one significant step further than just using them as a smoothing technique: the m-gram of classes is trained on the large background corpus, while the word likelihoods given the class are estimated on the small target corpus. To make full use of this technique, a specialized clustering algorithm has been developed [FGH+97, Ueb97]. In this paper we extend class adaptation to make use of the m-gram distribution of the target domain. As a second, independent contribution, this paper introduces an efficient morphing algorithm that tries to achieve adaptation through a stochastic mapping of words between the vocabularies of the respective domains. As a result we can show that for small adaptation steps class-based adaptation is a very useful technique. For larger adaptation steps the perplexity of the modified model is greatly improved, yet no improvement over the unadapted model was observed when it was used in linear interpolation. Whether this is due to the fact that we use class-based adaptation or that we modify only the unigram distribution remains unresolved, although the new stochastic mapping technique might help answer this question in the future.
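The stochastic word mapping ("morphing") idea can be sketched as pushing a background-domain distribution through a conditional mapping P(w_target | v_background): each background word's mass is redistributed over target-vocabulary words. This is a plausible reading of the abstract only; the mapping estimation itself is not described here, and all words and probabilities are invented for illustration.

```python
def morph(p_background, mapping):
    """Map a background-domain distribution into the target vocabulary
    via a stochastic word mapping P(w_target | v_background)."""
    p_target = {}
    for v, pv in p_background.items():
        for w, pw_given_v in mapping.get(v, {}).items():
            p_target[w] = p_target.get(w, 0.0) + pv * pw_given_v
    return p_target

# Background (financial) words mapped onto target (medical) vocabulary.
p_bg = {"stocks": 0.7, "bonds": 0.3}
mapping = {"stocks": {"scans": 0.8, "images": 0.2},   # rows sum to 1
           "bonds": {"reports": 1.0}}
p_target = morph(p_bg, mapping)
# p_target = {"scans": 0.56, "images": 0.14, "reports": 0.3}
```

Because each mapping row is itself a probability distribution, the morphed result remains normalized.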
ABSTRACT
The subject matter of any conversation or document can typically be described as some combination of elemental topics. We have developed a language model adaptation scheme that takes a piece of text, chooses the most similar topic clusters from a set of over 5000 elemental topics, and uses topic-specific language models built from the topic clusters to rescore N-best lists. We are able to achieve a 15% reduction in perplexity and a small improvement in WER by using this adaptation. We also investigate the use of a topic tree, where the amount of training data for a specific topic can be judiciously increased in cases where the elemental topic cluster has too few word tokens to build a reliably smoothed and representative language model. Our system is able to fine-tune topic adaptation by interpolating models chosen from thousands of topics, allowing for adaptation to unique, previously unseen combinations of subjects.
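The topic-selection step can be sketched as ranking topic clusters by cosine similarity between the text's term vector and each cluster's term vector, then interpolating the language models of the top-ranked topics. The similarity measure and all term vectors below are illustrative assumptions, not the paper's actual configuration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb)

def choose_topics(doc_vec, topic_vecs, k):
    """Return the k topic clusters most similar to the document."""
    ranked = sorted(topic_vecs, key=lambda t: cosine(doc_vec, topic_vecs[t]),
                    reverse=True)
    return ranked[:k]

doc = {"ct": 3, "scan": 2, "patient": 1}
topics = {"medical": {"scan": 1, "patient": 1, "ct": 1},
          "finance": {"stock": 1, "bond": 1}}
top = choose_topics(doc, topics, k=1)
# top = ["medical"]
```

The language models of the selected clusters would then be interpolated and used to rescore the N-best lists.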