ABSTRACT
Recently, context-dependent phone units, such as triphones, have been used to model subword units in speech recognition based on Hidden Markov Models (HMMs). While most such methods employ clustering of the HMM parameters (e.g., subword clustering, state clustering, etc.) to control HMM size so as to avoid poor recognition accuracy due to an insufficiency of training data, none of them provides an effective criterion for the optimal degree of clustering that should be performed. This paper proposes a method in which state clustering is accomplished by way of phonetic decision trees and in which the MDL criterion is used to optimize the degree of clustering. Large-vocabulary Japanese recognition experiments show that the models obtained by this method achieved the highest accuracy among the models of various sizes obtained with conventional clustering approaches.
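As a rough illustration of MDL-based model selection of the kind described above, a candidate clustering can be scored by its negative log-likelihood plus a complexity penalty growing with the number of free parameters; the clustering with the smallest description length wins. The function names and all numbers below are hypothetical, not the paper's implementation:

```python
import math

def description_length(log_likelihood, n_params, n_frames):
    """MDL cost: negative log-likelihood plus a complexity penalty
    of (n_params / 2) * log(n_frames)."""
    return -log_likelihood + 0.5 * n_params * math.log(n_frames)

def select_clustering(candidates, n_frames):
    """Pick the candidate clustering (log_likelihood, n_params, label)
    that minimizes the MDL description length."""
    return min(candidates, key=lambda c: description_length(c[0], c[1], n_frames))

# Hypothetical candidates: finer clusterings fit the data better but
# pay a larger parameter penalty.
candidates = [
    (-70000.0, 1000, "coarse"),
    (-50000.0, 4000, "medium"),
    (-49500.0, 16000, "fine"),
]
best = select_clustering(candidates, n_frames=100000)  # selects "medium"
```

Unlike a fixed likelihood-gain threshold, this rule needs no tuning: the penalty term automatically stops cluster splitting once the likelihood gain no longer pays for the extra parameters.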
ABSTRACT
This paper proposes an utterance verification system for hidden Markov model (HMM) based automatic speech recognition systems. A verification objective function, based on a multi-layer perceptron (MLP), is adopted which combines confidence measures from both the recognition and verification models. Discriminative minimum verification error training is applied to optimize the parameters of the MLP and the verification models. Our proposed system provides a framework for combining different knowledge sources for utterance verification using an objective function that is consistently applied during both training and testing. Experimental results on telephone-based connected digits are presented.
ABSTRACT
This paper presents an efficient approximation of the Gaussian mixture state probability density functions of continuous observation density hidden Markov models (CHMMs). In CHMMs, the Gaussian mixtures carry a high computational cost, which amounts to a significant fraction (e.g. 30% to 70%) of the total computation. To achieve higher computation and memory efficiency, we approximate the Gaussian mixtures by (a) decomposition into functions defined on subspaces of the feature space, and (b) clustering the resulting subspace pdf's. Intuitively, when clustering in a subspace of few dimensions, even a few function codewords can provide a small distortion. Therefore, we obtain a significant reduction of the total computation (up to a factor of two) and memory savings (up to a factor of twelve), without significant changes in the CHMMs' accuracy.
ABSTRACT
Phonetic decision trees have been widely used for obtaining robust context-dependent models in HMM-based systems. There are five key issues to consider when constructing phonetic decision trees: the alignment of data with the chosen phone classes; the quality of the modelling of the underlying data; the choice of partitioning method at each node; the goodness-of-split criterion; and the method for determining appropriate tree sizes. A popular existing method uses efficient but crude approximate methods for each of these. This paper introduces and evaluates more detailed alternatives to the standard approximations.
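A minimal sketch of the standard goodness-of-split approximation referenced above, assuming each node's data is modelled by a single diagonal-covariance Gaussian fitted to that node (the helper names and toy frames are illustrative, not the paper's implementation):

```python
import math

def node_loglik(frames):
    """Log-likelihood of frames under a single diagonal-covariance
    Gaussian fitted by maximum likelihood to those same frames."""
    n = len(frames)
    d = len(frames[0])
    ll = 0.0
    for j in range(d):
        col = [f[j] for f in frames]
        mean = sum(col) / n
        var = max(sum((x - mean) ** 2 for x in col) / n, 1e-8)  # floor variance
        # Per-dimension ML Gaussian log-likelihood: -(n/2)(log(2*pi*var) + 1)
        ll += -0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    return ll

def split_gain(parent, left, right):
    """Likelihood gain of splitting a node by a phonetic question."""
    return node_loglik(left) + node_loglik(right) - node_loglik(parent)
```

A question that separates two acoustically distinct clusters yields a large positive gain; tree growing greedily takes the best question at each node and stops when the gain falls below a threshold.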
ABSTRACT
Semi-continuous Hidden Markov Models (SCHMM) with Gaussian distributions are often used in continuous speech or handwriting recognition systems. Our paper compares Gaussian and tree-structured polynomial classifiers, which have been successfully used in pattern recognition for many years. In our system, the binary classifier tree is generated by clustering HMM states using an entropy measure. For handwriting recognition, Gaussians are clearly outperformed by polynomial classification. However, for speech recognition, polynomial classification currently performs slightly worse because some system parameters are not yet optimized.
ABSTRACT
This paper describes a new approach to the ML-SSS (Maximum Likelihood Successive State Splitting) algorithm that uses a tied-mixture representation of the output probability density function instead of a single Gaussian during the splitting phase of the ML-SSS algorithm. The tied-mixture representation results in a better state split gain, because it is able to measure differences in the phoneme environment space that ML-SSS cannot. With this more informative gain, the new algorithm can choose a better split state and corresponding data. Phoneme clustering experiments were conducted, yielding up to 38% error reduction compared to the ML-SSS algorithm.
ABSTRACT
In this paper, we present results for the Minimum Classification Error (MCE) [1] framework for discriminative training applied to tasks in continuous phoneme recognition. The results obtained using MCE are compared with results for Maximum Likelihood Estimation (MLE). We examine the ability of MCE to attain high recognition performance with a small number of parameters. Phoneme-level and string-level MCE loss functions were used as the optimization criteria for a Prototype-Based Minimum Error Classifier (PBMEC) [2] and an HMM [3]. The former was optimized using Generalized Probabilistic Descent; the latter was optimized using an approximated second-order method, the Quickprop algorithm. Two databases were used in this evaluation: 1) the ATR 5240 isolated word dataset for 6 speakers, in both speaker-dependent and multi-speaker modes; 2) the TIMIT database. For both databases, MCE training yielded striking gains in performance and classifier compactness compared to MLE baselines. For instance, through MCE training, performance similar to that of the Maximum Likelihood Successive State Splitting algorithm (ML-SSS) [4] could be obtained with 20 times fewer parameters.
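The MCE loss mentioned above is commonly built from a misclassification measure that contrasts the correct class's discriminant score with a soft-max over competitor scores, pushed through a sigmoid to make it differentiable; a sketch under that standard formulation (the function name and default smoothing parameters are illustrative):

```python
import math

def mce_loss(g_correct, g_competitors, eta=1.0, gamma=1.0):
    """Smooth 0-1 loss: d > 0 roughly means a misclassification.
    eta controls how sharply the competitor soft-max approximates
    the best competitor; gamma controls the sigmoid slope."""
    m = len(g_competitors)
    # Soft-max (log-mean-exp) of the competing discriminant scores.
    anti = (1.0 / eta) * math.log(
        sum(math.exp(eta * g) for g in g_competitors) / m)
    d = -g_correct + anti                    # misclassification measure
    return 1.0 / (1.0 + math.exp(-gamma * d))  # sigmoid loss in (0, 1)
```

When the correct score dominates, the loss approaches 0; when a competitor dominates, it approaches 1, so gradient descent on the summed loss directly targets classification errors rather than likelihood.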
ABSTRACT
We present a novel approach to hidden Markov model (HMM) state clustering based on the use of broad phone classes and an allophone class entropy measure. Most state-of-the-art large-vocabulary speech recognizers are based on context-dependent (CD) phone HMMs that use Gaussian mixture models for the state-conditioned observation densities. A common approach for robust HMM parameter estimation is to cluster HMM states where each state cluster shares a set of parameters such as the components of a Gaussian mixture model. In all the current state clustering algorithms, the HMM states are clustered only within their respective allophone classes. While this makes some intuitive sense, it prevents the clustering of states across allophone class boundaries, even when the states are acoustically similar. Our algorithm allows clustering across allophone class boundaries by defining broad phone groups within which two states from different allophone classes can be clustered together. An allophone class entropy measure is used to control the clustering of states belonging to different allophone classes. Experimental results on three test sets are presented.
ABSTRACT
Speech recognition requires solving many space and time problems that can have a critical effect on the overall system performance. We describe the use of two general new algorithms [5] that transform recognition networks into equivalent ones that require much less time and space in large-vocabulary speech recognition. The new algorithms generalize classical automata determinization and minimization to deal properly with the probabilities of alternative hypotheses and with the relationships between units (distributions, phones, words) at different levels in the recognition system.
ABSTRACT
Computer speech recognition has been very successful in limited domains and for isolated word recognition. However, widespread use of large-vocabulary continuous-speech recognizers is limited by the speed of current recognizers, which cannot reach acceptable error rates while running in real time. This paper shows how to harness shared-memory multiprocessors, which are becoming increasingly common, to significantly increase the speed, and therefore the accuracy or vocabulary size, of a speech recognizer. We describe the parallelization of an existing high-quality speech recognizer, achieving speedups by factors of 3, 5 and 6 on 4, 8 and 12 processors, respectively, for the benchmark North American Business News (NAB) recognition task.
ABSTRACT
This paper studies algorithms for reducing the computational effort of the mixture density calculations in HMM-based speech recognition systems. These likelihood calculations take about 70% of the total recognition time in the RWTH system for large vocabulary continuous speech recognition. To reduce the computational cost of the likelihood calculations, we investigate several space partitioning methods. A detailed comparison of these techniques is given on the North American Business Corpus (NAB'94) for a 20,000-word task. As a result, the so-called projection search algorithm in combination with the VQ method reduces the cost of likelihood computation by a factor of about 8 with no significant loss in word recognition accuracy.
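The VQ idea above can be sketched as a preselection step (all names, the codebook, and the shortlists below are hypothetical): each frame is quantized to its nearest codeword, and only the densities associated with that cell of the partition are actually scored:

```python
def nearest_codeword(x, codebook):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

def preselected_logliks(x, codebook, shortlists, loglik_fn):
    """Quantize the frame, then evaluate only the densities on that
    cell's shortlist; all remaining densities are skipped entirely."""
    cell = nearest_codeword(x, codebook)
    return {g: loglik_fn(g, x) for g in shortlists[cell]}
```

The saving comes from the shortlist being far smaller than the full density set; the quantization itself is cheap relative to scoring thousands of mixtures.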
ABSTRACT
To cope with the prohibitive growth of lexical-tree-based search graphs when using cross-word context-dependent (CD) phone models, a novel, efficient search topology was developed. The lexicon is stored as a compact static network with no language model (LM) information attached to it. The static representation avoids the cost of dynamic tree expansion, facilitates the integration of additional pronunciation information (e.g. assimilation rules), and is easier to integrate into existing search engines. Moreover, the network representation also results in a compact structure when words have alternative pronunciations, and, due to its construction, it offers partial LM forwarding at no extra cost. Next, all knowledge sources (pronunciation information, language model and acoustic models) are combined by a slightly modified token-passing algorithm, resulting in a one-pass time-synchronous recognition system.
ABSTRACT
We present a decision-tree based procedure to quantize the feature space of a speech recognizer, with the motivation of reducing the computation time required for evaluating Gaussians in a speech recognition system. The entire feature space is quantized into non-overlapping regions, where each region is bounded by a number of hyperplanes. Further, each region is characterized by the occurrence of only a small number of the total alphabet of allophones (sub-phonetic speech units); by identifying the region in which a test feature vector lies, only the Gaussians that model the density of allophones that exist in that region need be evaluated. The quantization of the feature space is done in a hierarchical manner using a binary decision tree. Each node of the decision tree represents a region of the feature space, and is further characterized by a hyperplane (a vector v_n and a scalar threshold value h_n) that subdivides the region corresponding to the current node into two non-overlapping regions corresponding to the two children of the current node. Given a test feature vector, the process of finding the region that it lies in involves traversing this binary decision tree, which is computationally inexpensive. We present results of experiments showing that the Gaussian computation time can be reduced by as much as a factor of 20 with negligible degradation in accuracy.
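The traversal described above can be sketched as follows (the `Node` class and the two-leaf example tree are illustrative, not the paper's data structures): each internal node tests one dot product against its threshold, and the leaf reached yields the allophone shortlist to evaluate.

```python
def dot(v, x):
    return sum(a * b for a, b in zip(v, x))

class Node:
    """Internal node: hyperplane (v, h) splitting its region in two.
    Leaf node: the shortlist of allophones observed in that region."""
    def __init__(self, v=None, h=None, left=None, right=None, allophones=None):
        self.v, self.h, self.left, self.right = v, h, left, right
        self.allophones = allophones

def shortlist(root, x):
    """Descend the tree with one dot product per level; the leaf's
    allophone list tells us which Gaussians need be evaluated for x."""
    node = root
    while node.allophones is None:
        node = node.left if dot(node.v, x) <= node.h else node.right
    return node.allophones
```

The cost per frame is one dot product per tree level, i.e. logarithmic in the number of regions, which is negligible next to full Gaussian evaluation.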
ABSTRACT
We describe a sub-vector clustering technique to reduce the memory size and computational cost of continuous density hidden Markov models (CHMMs). Acoustic models in modern large-vocabulary, continuous speech recognition systems are typically CHMMs. Systems with 100,000 Gaussian distributions of 40-60 dimensions are common, needing several tens of MB of memory. Computing HMM state likelihoods is several tens of times slower than real time. We show that by clustering and quantizing the Gaussian distributions a few dimensions at a time, both computation and memory costs can be reduced several fold without significant loss of recognition accuracy. On the 1994 Wall Street Journal 20K test set, this technique reduced the acoustic model size by a factor of 9-10, and HMM state output likelihood computation time by a factor of 4-5.
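A minimal sketch of the sub-vector scoring idea just described, assuming diagonal covariances (the data layout and function names are hypothetical): per frame, every (sub-vector, codeword) pair is scored once, after which each Gaussian's log-likelihood reduces to a sum of table lookups.

```python
import math

def subvector_loglik(x_sub, mean_sub, var_sub):
    """Diagonal-Gaussian log-likelihood restricted to one sub-vector."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(x_sub, mean_sub, var_sub))

def frame_loglik(x, quantized_gaussians, codebooks, subdims):
    """Each Gaussian stores one codeword index per sub-vector.
    Per frame, score each codeword of each sub-codebook once, then
    assemble every Gaussian's log-likelihood from the tables."""
    tables = []
    start = 0
    for sub, book in zip(subdims, codebooks):
        x_sub = x[start:start + sub]
        tables.append([subvector_loglik(x_sub, m, v) for (m, v) in book])
        start += sub
    return [sum(tables[s][idx] for s, idx in enumerate(g))
            for g in quantized_gaussians]
```

Memory shrinks because many Gaussians share each sub-vector codeword, and computation shrinks because the per-frame Gaussian evaluations collapse into additions of precomputed values.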
ABSTRACT
In this paper, the incorporation of path merging within BT's dynamic speech recognition architecture [1] is discussed. One of the disadvantages of dynamic network generation is the size of the network generated. This is to a large extent due to the creation of many duplicate network portions. The use of a path merging strategy can redress this problem to some extent. This paper discusses the theory behind path merging, demonstrating a 22% speed improvement on a typical recognition task with no loss in top-N accuracy.
ABSTRACT
This paper describes the use of an explicit word duration model in the environment of an HMM-based time-asynchronous stack search decoder. The benefit of the method is demonstrated on the task of connected digit recognition. Analysis of typical errors observed on this task suggests that appropriate word duration modeling can improve recognition accuracy. A duration model based on the Gamma distribution, applied as a post-processing step during iterations of the search algorithm, reduces the error rate of the baseline system by 14%.
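A sketch of a Gamma-based duration penalty applied as a rescoring step, in the spirit of the method above (the shape/rate parameterization, the `rescore` signature, and the weighting are assumptions, not the paper's exact formulation):

```python
import math

def gamma_log_duration(d, shape, rate):
    """Log-probability density of a word lasting d frames under a
    Gamma(shape, rate) distribution, using lgamma for the normalizer."""
    return (shape * math.log(rate) + (shape - 1) * math.log(d)
            - rate * d - math.lgamma(shape))

def rescore(path_score, durations, models, weight=1.0):
    """Add a weighted Gamma duration term for each (word, frames) pair
    on a hypothesis, as a post-processing step on its path score."""
    return path_score + weight * sum(
        gamma_log_duration(d, *models[w]) for w, d in durations)
```

Hypotheses whose word durations are implausibly short or long (a common failure mode in connected-digit errors) are thereby penalized relative to well-timed alternatives.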
ABSTRACT
We show that the standard hypothesis scoring paradigm used in maximum-likelihood-based speech recognition systems is not optimal with regard to minimizing the word error rate, the commonly used performance metric in speech recognition. This can lead to sub-optimal performance, especially in high-error-rate environments where word error and sentence error are not necessarily monotonically related. To address this discrepancy, we developed a new algorithm that explicitly minimizes expected word error for recognition hypotheses. First, we approximate the posterior hypothesis probabilities using N-best lists. We then compute the expected word error for each hypothesis with respect to the posterior distribution, and choose the hypothesis with the lowest error. Experiments show improved recognition rates on two spontaneous speech corpora.
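The selection step described above can be sketched as follows, assuming softmax-normalized posteriors over the N-best list and word-level Levenshtein distance (the function names and the score scale are illustrative):

```python
import math

def word_errors(hyp, ref):
    """Levenshtein distance between two word sequences."""
    m, n = len(hyp), len(ref)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (hyp[i - 1] != ref[j - 1]))
        prev = cur
    return prev[n]

def min_expected_wer_hyp(nbest, scores, scale=1.0):
    """Approximate posteriors over the N-best list by a softmax of the
    (scaled) hypothesis scores, then return the hypothesis whose
    expected word error under that posterior is smallest."""
    mx = max(scores)
    post = [math.exp(scale * (s - mx)) for s in scores]
    z = sum(post)
    post = [p / z for p in post]
    def expected_error(h):
        return sum(p * word_errors(h, r) for p, r in zip(post, nbest))
    return min(nbest, key=expected_error)
```

Note how this can disagree with the standard maximum-posterior rule: a hypothesis that is close in words to many probable competitors can have lower expected word error than the single top-scoring hypothesis.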
ABSTRACT
In this paper, we describe the new BBN BYBLOS efficient 2-Pass N-Best decoder used for the 1996 Hub-4 Benchmark Tests. The decoder uses a quick fastmatch to determine the likely word endings. Then, in the second pass, it performs a time-synchronous beam search using a detailed continuous-density HMM and a trigram language model to decide the word starting positions. From these word starts, the decoder, without looking at the input speech, constructs a trigram word lattice and generates the top N likely hypotheses. This new 2-pass N-Best decoder maintains recognition performance comparable to the old 4-pass N-Best decoder, while its search strategy is simpler and much more efficient.
ABSTRACT
To improve the performance of continuous speech recognition, it is effective to incorporate grammatical knowledge of the task into a word network in FSN (finite state network) form. However, such networks can require huge memory, so we introduce an efficient memory management method for large word networks: a distributed FSN model and a hierarchical memory model. The system keeps the word network divided into small sub-networks and activates each sub-network when necessary. Using this method, we can recognize continuously spoken sentences of Japanese addresses, which are made up of 390K geographic names, with only 5.6 Mbytes of local memory on average.