Session Th3C Towards Robust ASR for Car and Telephone Applications

Chairperson Jean-Claude Junqua Panasonic Technologies Inc., California, USA


METHODS FOR MICROPHONE EQUALIZATION IN SPEECH RECOGNITION

Authors: L. Fissore, G. Micca and C. Vair

CSELT - Centro Studi e Laboratori Telecomunicazioni Via G. Reiss Romoli 274 - 10148 Torino, Italy E-Mail fissore/micca/vair@cselt.stet.it

Volume 5 pages 2415 - 2418

ABSTRACT

This paper reviews current research carried out at various laboratories to increase the robustness of speech recognition systems to channel and microphone variations. It presents a comparative analysis of several techniques used in recent studies on microphone independence: Cepstral High-Pass Filtering, Cepstral-Mean Normalization, the RATZ algorithm, and Bayesian learning. It also reports results obtained at CSELT labs with these methods, specifically addressing the robustness of ASR systems to microphone variations.
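As an illustration of one of the techniques named in this abstract, the following is a minimal sketch of Cepstral-Mean Normalization. It relies on the standard observation that a fixed linear channel or microphone response appears, to a first approximation, as a constant offset added to every cepstral frame, so subtracting the per-utterance mean removes it. The function name, the toy frames, and the offset values are assumptions made for the example; this is not the implementation evaluated in the paper.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean over the utterance from every frame."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy data (hypothetical): three frames of two cepstral coefficients each.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
# A fixed microphone response modelled as a constant cepstral offset.
channel_offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(f, channel_offset)] for f in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_distorted = cepstral_mean_normalize(distorted)
```

In this toy case the normalized features of the clean and distorted utterances coincide, because the offset is exactly constant. Real channels only approximate this behaviour, and the mean also absorbs some speech-dependent information on short utterances.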

A2004.pdf


ROOM ACOUSTICS AND REVERBERATION: IMPACT ON HANDS-FREE RECOGNITION

Authors: Satoshi NAKAMURA and Kiyohiro SHIKANO

Graduate School of Information Science, Nara Institute of Science and Technology 8916-5, Takayama-cho, Ikoma-shi, Nara, 630-01, JAPAN nakamura@is.aist-nara.ac.jp

Volume 5 pages 2419 - 2422

ABSTRACT

Hands-free speech recognition is an important requirement for a natural human-machine interface. Distant-talking speech in real environments is distorted by noise and by the reverberation of the room. This paper describes the characteristics of room acoustical distortion and their influence on speech recognition accuracy. It then outlines prospects for a solution based on previous studies and our own research efforts. In particular, a microphone-array-based method and a model adaptation method are discussed. The microphone array can reduce the influence of the acoustical distortion by beamforming. The model adaptation method, on the other hand, can estimate the acoustical transfer function and adapt the speech models to the distorted observation signals. Finally, the paper addresses hands-free speech recognition that incorporates automatic lip reading.
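The beamforming idea mentioned in this abstract can be sketched with a delay-and-sum beamformer, the simplest member of that family. The abstract does not say which beamformer the authors use, so this is an assumed stand-in. The sketch further assumes that the per-microphone delays are known, non-negative integers in samples; the function name and toy signals are hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its integer delay (in samples), then average."""
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Only keep output samples for which every shifted channel has data.
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(n_out)
    ]


# Toy scene (hypothetical): the far microphone hears the source 2 samples late.
source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
mic_near = list(source)
mic_far = [0.0, 0.0] + source

# Steering with the correct delays aligns the two copies of the source.
aligned = delay_and_sum([mic_near, mic_far], [0, 2])
# Without steering, the misaligned copies partly cancel each other.
unsteered = delay_and_sum([mic_near, mic_far], [0, 0])
```

Signals arriving from the steered direction add coherently, while noise and reflections arriving with other delays are averaged down. Practical arrays need fractional delays and delay estimation, which this sketch leaves out.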

A2005.pdf


ECHO AND NOISE REDUCTION FOR HANDS-FREE TERMINALS - STATE OF THE ART -

Authors: Gerard FAUCON, Regine LE BOUQUIN-JEANNES

Laboratoire du Traitement du Signal et de l'Image - Universite de Rennes 1 Bat. 22 - Campus de Beaulieu - 35042 RENNES CEDEX - FRANCE

Volume 5 pages 2423 - 2426

ABSTRACT

This paper deals with speech enhancement in hands-free telecommunication systems. We summarize and discuss recent results on methods that jointly address the two major problems encountered in such systems: acoustic echo cancellation and noise reduction. Both single-microphone and two-microphone approaches are addressed. Finally, we outline the limitations of the different techniques and suggest some prospects.
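For readers unfamiliar with acoustic echo cancellation, the following is a minimal sketch of a normalized LMS (NLMS) adaptive filter, a common baseline for that task. It covers only the echo cancellation half of the problem and is not one of the combined echo and noise reduction methods surveyed in the paper. The function name, filter length, step size, and the two-tap toy echo path are all assumptions chosen for the example.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_cancel(far_end: Sequence[float], mic: Sequence[float],
                     n_taps: int = 4, step: float = 0.5,
                     eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Adaptively estimate the echo of far_end in mic and subtract it.

    Returns the residual signal and the final filter weights.
    """
    weights = [0.0] * n_taps
    history = [0.0] * n_taps  # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalize the update by the input energy (eps avoids division by 0).
        energy = sum(h * h for h in history) + eps
        weights = [w + step * e * h / energy
                   for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Toy setup (hypothetical): white far-end signal, two-tap echo path,
# no near-end speech and no background noise.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]

residual, weights = nlms_echo_cancel(far_end, mic)
```

In this noise-free toy case the weights approach the echo path and the residual decays towards zero. With near-end speech or car noise present, the adaptation is disturbed, which is one reason echo cancellation and noise reduction are studied together.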

A2006.pdf


Robust Speech Recognition for Wireless Networks and Mobile Telephony

Authors: Reinhold Haeb-Umbach

Philips GmbH Forschungslaboratorien P.O. Box 50 01 45 D-52085 Aachen, Germany Email: haeb@pfa.research.philips.com

Volume 5 pages 2427 - 2430

ABSTRACT

The increased popularity of mobile telephony introduces both challenges and opportunities for automatic speech recognition. ASR offers ways to simplify the use of mobile phones, notably in hands- and eyes-busy situations. However, the acoustic environment can be severely degraded, and the wireless network may introduce additional distortions into the speech signal. This paper gives an overview of the sources of degradation and of approaches to robust speech recognition for mobile communications. Emphasis is placed on approaches that are suitable for implementation in mobile terminals. Two example applications are described that illustrate the robustness issues and design considerations typical of low-cost noisy speech recognition: voice dialling in a GSM phone and hands-free digit recognition in the car.

A2007.pdf


SPEECH RECOGNITION IN THE CAR From Phone Dialing to Car Navigation

Authors: Dirk Van Compernolle

Lernout & Hauspie Speech Products NV St Krispijnstraat 7, 8900 Ieper, Belgium Tel. +32 2 456 05 00, Fax +32 2 460 01 72, E-mail Dirk.VanCompernolle@lhs.be

Volume 5 pages 2431 - 2434

ABSTRACT

This paper focuses on the evolving demands for speech recognition in the car and their impact on algorithmic and technological development. Until now, the major demand for speech recognition in the car has been hands-free operation of the telephone. This functionality could be provided satisfactorily with a word-based system, which also allowed for simpler noise suppression algorithms. Entirely new speech recognition systems are now required to cope with the demands of voice control of car navigation systems. These applications require noise-robust, phoneme-based, large-vocabulary recognition and a much more advanced user interface. The very large perplexity of a car navigation task requires a spelling recognizer to be built in. Hardware and software design for this new application must also be approached with the understanding that it will be one part, albeit a central one, of a fully integrated speech control system inside the car.

A2009.pdf