Authors:
Steven C. Lee, MIT Lab for Computer Science (USA)
James R. Glass, MIT Lab for Computer Science (USA)
Page (NA) Paper number 594
Abstract:
In this work, we investigate modifications to a probabilistic segmentation
algorithm to achieve real-time, pipelined capability for a segment-based
speech recognizer. The existing algorithm used a Viterbi and backwards
A* search to hypothesize phonetic segments. We were able to reduce
the computational requirements of this algorithm by reducing the effective
search space to acoustic landmarks, and were able to achieve pipelined
capability by executing the A* search in blocks defined by reliably
detected phonetic boundaries. The new algorithm produces 30% fewer
segments, and improves TIMIT phonetic recognition performance by 2.4%
over an acoustic segmentation baseline. We were also able to produce
30% fewer segments on a word recognition task in a weather information
domain.
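
The blocking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame-index representation of landmarks, and the set of "reliable" boundaries are all assumptions made for illustration, and no Viterbi or A* scoring is performed. It only shows how restricting segment endpoints to landmarks, and cutting the landmark sequence at reliable boundaries, yields small independent blocks that could be searched one at a time.

```python
from itertools import combinations


def split_into_blocks(landmarks, reliable):
    """Split a sorted landmark list into blocks that end at reliable boundaries.

    `landmarks` and `reliable` are hypothetical inputs: frame indices of
    detected acoustic landmarks, and the subset treated as reliable
    phonetic boundaries.
    """
    blocks, current = [], []
    for t in landmarks:
        current.append(t)
        if t in reliable and len(current) > 1:
            blocks.append(current)
            current = [t]  # the boundary also opens the next block
    if len(current) > 1:
        blocks.append(current)  # trailing block after the last reliable boundary
    return blocks


def candidate_segments(block):
    """Enumerate every (start, end) pair of landmarks inside one block."""
    return list(combinations(block, 2))


if __name__ == "__main__":
    landmarks = [0, 10, 25, 40, 55, 70]
    reliable = {25, 55}
    for block in split_into_blocks(landmarks, reliable):
        print(block, candidate_segments(block))
```

Because each block can be handed to the search as soon as its closing boundary is detected, processing does not have to wait for the end of the utterance, which is the sense of "pipelined" assumed in this sketch.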
Authors:
Guillaume Gravier, ENST/TSI CNRS-URA 820 (France)
Marc Sigelle, ENST/TSI CNRS-URA 820 (France)
Gérard Chollet, ENST/TSI CNRS-URA 820 (France)
Page (NA) Paper number 560
Abstract:
In this paper, we present a new technique for statistical modeling
of speech segments based on Markov random fields. Classical and multi-stream
HMMs are particular cases of this more general family of models. However,
the Random Field Model (RFM) proposed here can be seen as an extension
of the multi-band HMM in which interactions between the frequency bands
have been added. In a first experiment, samples are drawn from different
models and compared to real observations. This experiment shows that
the RFM is able to produce realistic samples but a single HMM still
performs better. Isolated word recognition experiments show that more
work is needed on the RFM before it reaches the performance of classical
hidden Markov modeling techniques. For the moment, the
RFM parameters are estimated using a heuristic. We believe that a real
maximum likelihood parameter estimation algorithm should improve the
results. The main advantage of this new model is that it can easily
be extended since a model is defined by some local interactions and
the Gibbs potential functions associated with those interactions.
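
For reference, the general form of a Gibbs distribution over a Markov random field can be written as below. This is the standard textbook form, not a formula taken from the paper; the symbols $x$, $\mathcal{C}$, $V_c$ and $Z$ are notation assumed here.

```latex
P(x) = \frac{1}{Z} \exp\bigl(-U(x)\bigr),
\qquad
U(x) = \sum_{c \in \mathcal{C}} V_c(x_c),
\qquad
Z = \sum_{x'} \exp\bigl(-U(x')\bigr)
```

Here $\mathcal{C}$ is the set of cliques encoding the local interactions and $V_c$ is the potential function attached to clique $c$, so extending a model amounts to adding cliques and their potentials.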
Authors:
Rukmini Iyer, GTE/BBN Technologies (USA)
Herbert Gish, GTE/BBN Technologies (USA)
Man-Hung Siu, GTE/BBN Technologies (USA)
George Zavaliagkos, GTE/BBN Technologies (USA)
Spyros Matsoukas, GTE/BBN Technologies (USA)
Page (NA) Paper number 891
Abstract:
Current state-of-the-art statistical speech recognition systems use
hidden Markov models (HMM) for modeling the speech signal. However,
it is well known that HMMs do not exploit the time-dependence in the
speech process, since they are limited by the assumption of conditional
independence of observations given the state sequence. Alternative
techniques, such as segment modeling approaches, can effectively exploit
time-dependencies in the acoustic signal by discarding the observation
independence assumption. However, losing the basic HMM structure is
often a high computational price to pay for improved acoustic models.
In this paper, we introduce the parallel path HMM that exploits the
time-dependence in speech via parametric trajectory models while maintaining
the HMM framework. We present preliminary results on Switchboard, a
large vocabulary conversational speech recognition task, demonstrating
both improved modeling and potential for improved recognition performance.
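
The conditional independence assumption mentioned above is usually written as the factorization below. This is the standard HMM formulation in assumed notation ($o_t$ for the observation at frame $t$, $s_t$ for the state), not an equation from the paper.

```latex
p(o_1, \dots, o_T \mid s_1, \dots, s_T) = \prod_{t=1}^{T} p(o_t \mid s_t)
```

Each observation depends only on the current state, so any correlation between successive frames within a state is ignored; segment and trajectory models relax exactly this factorization.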