ABSTRACT
Over the last few years, some alternatives to N-gram language models have been proposed that are based on stochastic regular grammars. These grammars are estimated from data through Grammatical Inference algorithms. In particular, the Morphic Generator Grammatical Inference (MGGI) methodology has been applied to tasks of written natural language queries to databases. As with N-gram models, language models obtained through this methodology require the use of smoothing techniques. This work incorporates a version of the well-known Back-Off smoothing method into the MGGI language models to solve the problem of estimating events unseen in the training corpus, and shows the behaviour of the smoothed MGGI models in two tasks of written sentences. The results illustrate that the smoothed MGGI model works better than the standard smoothed bigram model.
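For reference, the standard Katz back-off estimate for a bigram (a generic sketch, not the paper's exact MGGI formulation; d is the usual discount factor and \alpha the back-off weight) is

    P(w_i \mid w_{i-1}) =
      \begin{cases}
        d(w_{i-1}, w_i)\, C(w_{i-1} w_i) / C(w_{i-1}) & \text{if } C(w_{i-1} w_i) > 0 \\
        \alpha(w_{i-1})\, P(w_i) & \text{otherwise,}
      \end{cases}

with \alpha(w_{i-1}) chosen so that the conditional distribution sums to one.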
ABSTRACT
For the traditional n-gram model, the small value of n is an inherent limitation when estimating language probabilities in speech recognition, simply because the estimation cannot exploit longer-range word associations and relies only on short sequences of adjacent words. This has a strong effect on the performance of speech recognition. This paper introduces an integrated language model combining an n-gram model and a word association model (abbreviated as the WA model). This model integrates two kinds of joint probabilities, the traditional n-gram probability and the word association probability, to estimate the actual output probability. The WA model is based on a combined probability estimate over ordered word associations without strict distance or sequence limitations. In addition, two kinds of local linguistic constraints have also been incorporated into the n-gram estimation, to smooth sparse data and to adjust the scores of special language units locally. A substantial improvement in the performance of Chinese phonetic-to-text transcription in speech recognition has been obtained.
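Purely as an illustration of how such an integration might look (the abstract does not give the exact combination rule, so the linear form and the weight \lambda below are assumptions), one simple scheme interpolates the two probabilities:

    P(w_i \mid h) \approx \lambda\, P_{n\text{-gram}}(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) + (1-\lambda)\, P_{\mathrm{WA}}(w_i \mid h),

where P_{WA} scores w_i against associated words anywhere in the history h rather than only in the immediately preceding positions.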
ABSTRACT
We introduce a statistical model for dialogues. We describe a dynamic programming algorithm that can be used to bracket a dialogue into segments and label each segment with its speech act. We evaluate the performance of the model. We also use this model for language modelling and obtain a reduction in perplexity.
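A minimal sketch of this kind of segmentation dynamic program, assuming a hypothetical segment_score(words, act) that returns the log score of labelling a word span with a given speech act (the paper's actual model and scores are not reproduced here):

    def segment_and_label(words, acts, segment_score, max_len=20):
        """Bracket a word sequence into segments, labelling each with a speech act.

        best[j] holds the best log score of any segmentation of words[:j];
        back[j] remembers the segment start and act that achieved it.
        """
        n = len(words)
        best = [float("-inf")] * (n + 1)
        best[0] = 0.0
        back = [None] * (n + 1)
        for j in range(1, n + 1):
            for i in range(max(0, j - max_len), j):
                for act in acts:
                    score = best[i] + segment_score(words[i:j], act)
                    if score > best[j]:
                        best[j], back[j] = score, (i, act)
        # Recover the segmentation by walking the back-pointers.
        segments, j = [], n
        while j > 0:
            i, act = back[j]
            segments.append((words[i:j], act))
            j = i
        return list(reversed(segments))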
ABSTRACT
The CMU Statistical Language Modeling toolkit was released in 1994 in order to facilitate the construction and testing of bigram and trigram language models. It is currently in use in over 40 academic, government and industrial laboratories in over 12 countries. This paper presents a new version of the toolkit. We outline the conventional language modeling technology, as implemented in the toolkit, and describe the extra efficiency and functionality that the new toolkit provides as compared to previous software for this task. Finally, we give an example of the use of the toolkit in constructing and testing a simple language model.
ABSTRACT
In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for speech recognition in French. The text normalization process defines what is considered to be a word by the recognition system. Depending on this definition we can measure different lexical coverages and language model perplexities, both of which are closely related to the speech recognition accuracies obtained on read newspaper texts. Different text normalizations of up to 185M words of newspaper texts are presented along with corresponding lexical coverage and perplexity measures. Some normalizations were found to be necessary to achieve good lexical coverage, while others were more or less equivalent in this regard. The choice of normalization used to create language models for the recognition experiments with read newspaper texts was based on these findings. Our best system configuration obtained an 11.2% word error rate in the AUPELF 'French-speaking' speech recognizer evaluation test held in February 1997.
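For concreteness, lexical coverage as used here is the fraction of running-text tokens found in the recognition lexicon (one minus the OOV rate); the sketch below is a generic illustration, with the tokenization and lexicon as placeholders rather than the paper's normalization pipeline:

    def lexical_coverage(tokens, lexicon):
        """Fraction of tokens covered by the lexicon (1 - OOV rate)."""
        lexicon = set(lexicon)
        in_vocab = sum(1 for t in tokens if t in lexicon)
        return in_vocab / len(tokens) if tokens else 0.0

    # Example: a fixed word list measured against normalized newspaper text.
    # coverage = lexical_coverage(normalized_text.split(), word_list)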
ABSTRACT
In this paper, a new method to cluster words into classes is proposed in order to define a statistical language model. The purpose of this algorithm is to decrease the computational cost of the clustering task while not degrading speech recognition performance. The algorithm provides a bottom-up hierarchical clustering using the reciprocal neighbours method. This technique consists of merging several pairs of classes within a single iteration. Experiments on a spontaneous speech corpus are presented. Results are given both in terms of perplexity and word recognition error rate. We obtain a large reduction in the number of iterations necessary to build a classification tree, and thus a reduction in the CPU time needed to build the model, as well as a reduction in both perplexity and word error rate.
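A rough sketch of the reciprocal-neighbours step, assuming some distance(a, b) between classes (for instance a likelihood-loss criterion; the actual criterion is not specified in the abstract): all pairs of classes that are mutual nearest neighbours are merged within the same iteration, rather than a single pair per iteration.

    def reciprocal_neighbour_merges(classes, distance):
        """Indices of class pairs that are mutual nearest neighbours."""
        n = len(classes)
        nearest = [min((j for j in range(n) if j != i),
                       key=lambda j: distance(classes[i], classes[j]))
                   for i in range(n)]
        merges, used = [], set()
        for i in range(n):
            j = nearest[i]
            if nearest[j] == i and i not in used and j not in used:
                merges.append((i, j))
                used.update((i, j))
        return merges  # all of these merges are applied within one iteration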
ABSTRACT
This paper proposes a novel variable-length class-based language model that integrates local and global constraints. In this model, the classes are iteratively recreated by grouping consecutive words and by splitting initial part-of-speech (POS) clusters into finer clusters (word-classes). The main characteristic of this modeling is that these operations of grouping and splitting are carried out selectively, taking into account global constraints between noncontiguous words on the basis of a minimum entropy criterion. To capture the global constraints, the model takes into account the sequences of the function words and of the content words, which are expected to represent the syntactic and semantic relationships between words, respectively. Experiments showed that the perplexity of the proposed model on the test corpus is lower than that of conventional models and that this model requires a small number of statistical parameters, showing the model's effectiveness.
ABSTRACT
This paper describes the combination of a stochastic language model and a formal grammar, modelled as a unification grammar. The stochastic model is trained over 42 million words extracted from the newspaper Le Monde. The stochastic model is based on smoothed 3-gram and 3-class models. The 3-class model is represented by a Markov chain made up of four states. Several experiments have been carried out to determine which values are best for specific training and test corpora. The experiments indicate that the unification grammar strongly reduces the number of hypotheses (sentences) produced by the stochastic model.
ABSTRACT
In this paper, we describe three approaches to continuous speech recognition. Two of them (referred to as the (W,P) and (W',P) models) take into account pronunciation variants of words. They make it possible to handle (very common) French phonological phenomena such as liaisons or mute-e elision. The (W',P) model introduces the phonotypical level as defined in the MHAT Model [4,5]. Comparing the (W,P) and (W',P) models shows a significant improvement in recognition accuracy when a contextual language model is introduced at this phonotypical level.
ABSTRACT
In our paper, we address the problem of estimating stochastic language models based on n-gram statistics. We present a novel approach, rational interpolation, for the combination of a competing set of conditional n-gram word probability predictors, which consistently outperforms the traditional linear interpolation scheme. The superiority of rational interpolation is substantiated by experimental results from language modeling, speech recognition, dialog act classification, and language identification.
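For reference, the traditional linear interpolation baseline mentioned above combines M predictors with fixed non-negative weights that sum to one,

    P(w \mid h) = \sum_{m=1}^{M} \lambda_m\, P_m(w \mid h), \qquad \lambda_m \ge 0, \quad \sum_{m} \lambda_m = 1;

the rational scheme of the paper replaces this weighted sum with a ratio of weighted terms (its exact form is given in the paper and is not reproduced here).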
ABSTRACT
This paper describes an N-gram language model adaptation technique. As an N-gram model requires a large sample corpus for probability estimation, it is difficult to use an N-gram model for a specific small task. In this paper, N-gram task adaptation is proposed using a large corpus from the general task (TI text) and a small corpus from the specific task (AD text). A simple weighting is employed to mix the TI and AD text. In addition to mixing the two texts, the effect of the vocabulary is also investigated. The experimental results show that the adapted N-gram model with a proper vocabulary size has significantly lower perplexity than the task-independent models.
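One plausible reading of the "simple weighting" is a weighted mixture of counts from the two corpora (equivalently, at the probability level, a linear interpolation); the weight w and the use of raw counts below are assumptions for illustration only:

    from collections import Counter

    def mix_counts(ad_counts, ti_counts, w):
        """Combine adaptation-task (AD) and task-independent (TI) n-gram counts.

        The weight w scales down the large general-task corpus so that the
        small in-domain corpus is not swamped.
        """
        mixed = Counter()
        for ngram, c in ad_counts.items():
            mixed[ngram] += c
        for ngram, c in ti_counts.items():
            mixed[ngram] += w * c
        return mixed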
ABSTRACT
Recent progress in variable n-gram language modeling provides an efficient representation of n-gram models and makes training of higher order n-grams possible. In this paper, we apply the variable n-gram design algorithm to conversational speech, extending the algorithm to learn skips and classes in context to handle conversational speech characteristics such as repetitions and disfluency markers. We show that using the extended variable n-gram, we can build a language model that uses fewer parameters for longer context and improves the test perplexity and recognition accuracy.
ABSTRACT
Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N-best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how to interpolate with a Part-of-Speech (POS) tag-based language model as an example of a class-based model in which a word can be a member of many different classes. Here the actual class membership of a word in the lattice becomes a hidden event of the A-algorithm used for rescoring. A forward-type algorithm is defined as an extension of the lattice rescorer to handle these hidden events in a mathematically sound fashion. Applying this mixture of Viterbi and forward rescoring to the German Spontaneous Scheduling Task (GSST) yields some improvement in word accuracy. Above all, the rescoring procedure enables the use of any fuzzy/stochastic class definition for recognition units that might be determined through automatic clustering algorithms in the future.
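To make the hidden-event issue concrete: with a class bigram in which a word may carry several POS tags, one common formulation sums the word-level probability over the unobserved tag assignments, which is what a forward-style extension of the rescorer has to compute (written here for a bigram only; this is an illustration, not the paper's exact model):

    P(w_i \mid w_{i-1}) = \sum_{c_{i-1}} \sum_{c_i} P(c_{i-1} \mid w_{i-1})\, P(c_i \mid c_{i-1})\, P(w_i \mid c_i).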
ABSTRACT
We have proposed a concept-driven semantic interpretation method for a spoken dialogue system that robustly understands various expressions uttered by a naive user. The method is now being improved for practical application. Domain knowledge is important for this improvement. The system must also have portability. This paper discusses the generalization of the semantic interpretation method, and proposes a method that integrates concepts using general linguistic knowledge of conceptual dependency. Speech understanding for various utterances about Kamakura sightseeing with a 1000-word vocabulary was empirically evaluated. The results show that this method can achieve a satisfactory understanding rate.
ABSTRACT
This work proposes the use of hierarchical LMs as an effective method both for efficiently dealing with context-dependent LMs in a dialogue system and for increasing the robustness of LM estimation and adaptation. Starting from basic LMs that express elementary semantic units, concepts, or data-types, sentence-level LMs are recursively built. The resulting LMs may be a combination of grammars, word classes, and statistical LMs. Moreover, these LMs can be efficiently compiled into probabilistic recursive transition networks. A speech decoding algorithm directly exploits the recursive representation and produces the most probable parse tree matching the speech signal. The proposed approach has been implemented for a data-entry task which covers structured data, e.g. numbers, dates, and proper names, as well as free text. In this task, the active LM must continuously change according to the current status, the active form, and the data entered so far. Finally, while the hierarchical approach proves very convenient for this task, it also looks very general and can give advantages in other applications, e.g. dictation.
ABSTRACT
This paper presents a study on the use of wide-coverage semantic knowledge for large vocabulary (theoretically unrestricted) domain-independent speech recognition. A machine readable dictionary was used to provide the semantic information about the words and a semantic model was developed based on the conceptual association between words as computed directly from the textual representations of their meanings. The findings of our research suggest that the model is capable of capturing phenomena of semantic associativity or connectivity between words in texts and considerably reducing the semantic ambiguity in natural language. The model can cover both short and long-distance semantic relationships between words and has shown signs of robustness across various text genres. Experiments with simulated speech recognition hypotheses indicate that the model can efficiently be used to reduce the word error rates when applied to word lattices or N-best sentence hypotheses.
ABSTRACT
This paper proposes a novel spontaneous speech recognition approach to obtain not a whole utterance but reliably recognized partial segments of an utterance to achieve robust speech understanding. Our method obtains reliably recognized partial segments of an utterance by using both grammatical and n-gram based statistical language constraints cooperatively, and uses a robust parsing technique to apply the grammatical constraints. Through an experiment, it has been confirmed that the proposed method can recognize partial segments of an utterance with a higher reliability than conventional continuous speech recognition methods using an n-gram based statistical language model.
ABSTRACT
This paper describes a method for using intonation to reduce word error rate in a speech recognition system designed to recognise spontaneous dialogue speech. We use a form of dialogue analysis based on the theory of conversational games. Different move types under this analysis conform to different language models. Different move types are also characterised by different intonational tunes. Our overall recognition strategy is first to predict from intonation the type of game move that a test utterance represents, and then to use a bigram language model for that type of move during recognition.
ABSTRACT
Language models for speech recognition tend to concentrate solely on recognizing the words that were spoken. In this paper, we redefine the speech recognition problem so that its goal is to find both the best sequence of words and their syntactic role (part-of-speech) in the utterance. This is a necessary first step towards tightening the interaction between speech recognition and natural language understanding.
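Stated as an equation, the usual criterion \hat{W} = argmax_W P(A \mid W) P(W) over word sequences W given acoustics A is replaced by a joint search over words and their tags; this restates the redefinition above, with C denoting the part-of-speech sequence:

    (\hat{W}, \hat{C}) = \operatorname{argmax}_{W, C}\; P(A \mid W)\, P(W, C).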
ABSTRACT
We report results from using language model confidence measures based on the degree of backoff used in a trigram language model. Both utterance-level and word-level confidence metrics proved useful for a dialog manager to identify out-of-domain utterances. The metric assigns successively lower confidence as the language model estimate is backed off to a bigram or unigram. It also bases its estimates on sequences of backoff degree. Experimental results with utterances from the domain of medical records management showed that the distributions of the confidence metric for in-domain and out-of-domain utterances are separated. Use of the corresponding word-level confidence metric shows similar encouraging results.
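A minimal sketch of a word-level confidence based on back-off degree, assuming access to the hit level of the trigram model for each word; the score values below and the omission of the sequence-based refinement are assumptions, not the paper's settings:

    def backoff_confidence(words, hit_level, scores={3: 1.0, 2: 0.6, 1: 0.3}):
        """Per-word and utterance-level confidence from back-off degree.

        hit_level(i) should return 3 if the trigram ending at word i was found
        in the model, 2 if only the bigram was, and 1 if the model backed off
        to the unigram.
        """
        word_conf = [scores[hit_level(i)] for i in range(len(words))]
        utt_conf = sum(word_conf) / len(word_conf) if word_conf else 0.0
        return word_conf, utt_conf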
ABSTRACT
We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar. Such a grammar expresses the relations between words by a directed graph. Because the edges of this graph may connect words that are arbitrarily far apart in a sentence, this technique can incorporate the predictive power of words that lie outside of bigram or trigram range. We have built several simple dependency models, as we call them, and tested them in a speech recognition experiment. We report experimental results for these models here, including one that has a small but statistically significant advantage (p<.02) over a bigram language model.
ABSTRACT
Language modeling, especially for spontaneous speech, often suffers from a mismatch of utterance segmentations between training and test conditions. In particular, training often uses linguistically-based segments, whereas testing occurs on acoustically determined segments, resulting in degraded performance. We present an N-best rescoring algorithm that removes the effect of segmentation mismatch. Furthermore, we show that explicit language modeling of hidden linguistic segment boundaries is improved by including turn-boundary events in the model.
ABSTRACT
The use of several n-gram and hybrid language models with and without cache is examined in the context of producing court transcripts. Language models with cache (in which words that have recently been uttered are preferred) have seen considerable use. The suitability of cache models (with a fixed-size cache) in the production of court transcripts is not clear. A decrease in perplexity and an improvement in the word error rate are observed with some of the models when using a cache; however, performance deteriorates with increasing cache size.
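The fixed-size cache models referred to here typically interpolate the static n-gram with a unigram cache estimated over the last K words; a generic form (the weight \lambda and cache size K are illustrative, not the paper's settings) is

    P(w \mid h) = \lambda\, P_{n\text{-gram}}(w \mid h) + (1-\lambda)\, P_{\mathrm{cache}}(w), \qquad P_{\mathrm{cache}}(w) = \frac{\#\{w \text{ in the last } K \text{ words}\}}{K}.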
ABSTRACT
We present an approach to statistical part-of-speech tagging that uses two different tagsets, one for its internal and one for its external representation. The internal tagset is used in the underlying Markov model, while the external tagset constitutes the output of the tagger. The internal tagset can be modified and optimized to increase tagging accuracy (with respect to the external tagset). We evaluate this approach in an experiment and show that it performs significantly better than approaches using only one tagset.
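To illustrate the two-tagset idea with a hedged sketch: tagging is performed over the internal (finer) tagset, and each internal tag is then projected onto the external tagset through a many-to-one map; the tagger, tagsets, and map below are placeholders, not the paper's implementation:

    def tag_with_two_tagsets(tokens, internal_tagger, internal_to_external):
        """Tag with the internal tagset, then output external tags only."""
        internal_tags = internal_tagger(tokens)  # e.g. Viterbi over the internal-tag HMM
        return [internal_to_external[t] for t in internal_tags]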