ABSTRACT
This paper presents a review of current research carried out at various laboratories to increase the robustness of speech recognition systems to channel and microphone variations. Several techniques used in recent studies on microphone independence are discussed and compared: these include cepstral high-pass filtering, cepstral mean normalization, the Ratz algorithm, and Bayesian learning. Some results obtained at CSELT labs using the above-mentioned methods are also reported, specifically addressing the robustness of ASR systems to microphone variations.
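As an illustration of one of the techniques named above, the following is a minimal sketch of cepstral mean normalization, not the implementation evaluated in the paper. It assumes that a fixed channel or microphone response appears approximately as a constant offset added to every cepstral vector of an utterance, so subtracting the per-utterance mean removes it. The function name and the toy data are hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean, computed over all frames of one utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy data (assumption): three frames with two cepstral coefficients each.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
# A fixed channel modelled as a constant additive offset in the cepstral domain.
offset = [0.5, -1.5]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_shifted = cepstral_mean_normalize(shifted)
```

Under this assumption, `norm_clean` and `norm_shifted` agree up to rounding, which is the sense in which the normalization makes the features insensitive to a fixed microphone or channel.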
ABSTRACT
Hands-free speech recognition is a very important issue for a natural human-machine interface. Distant-talking speech in real environments is distorted by noise and by the reverberation of the room. This paper introduces the characteristics of room acoustic distortion and their influence on speech recognition accuracy. It then outlines prospective solutions based on previous studies and our own research efforts. In particular, a microphone-array-based method and a model adaptation method are discussed. The microphone array can reduce the influence of the acoustic distortion by beamforming. The model adaptation method, on the other hand, can estimate the acoustic transfer function and adapt the speech models to the distorted observation signals. Furthermore, this paper also addresses hands-free speech recognition that incorporates automatic lip reading.
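The beamforming idea mentioned above can be sketched with the simplest variant, a delay-and-sum beamformer. This is an illustrative sketch only and is not claimed to be the method used in the paper. It assumes that the arrival delay at each microphone is already known as a non-negative integer number of samples; the function name and the toy signals are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its known integer sample delay, then average.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   non-negative integer arrival delay (in samples) per microphone.
    """
    # Only keep output samples for which every aligned channel has data.
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    out = []
    for n in range(n_out):
        # Advancing channel i by delays[i] lines the target signal up in time.
        aligned_sum = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(aligned_sum / len(channels))
    return out


# Toy scenario (assumption): the same source reaches mic 1 two samples later.
source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
mic0 = source + [0.0, 0.0]
mic1 = [0.0, 0.0] + source

enhanced = delay_and_sum([mic0, mic1], [0, 2])
```

In this noise-free toy case `enhanced` reproduces `source`. With real recordings, signals arriving from the steered direction add coherently, while noise and reflections from other directions are averaged down.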
ABSTRACT
This paper deals with speech enhancement in hands-free telecommunication systems. We summarize and discuss recent results on methods that jointly address the two major problems encountered in such systems: acoustic echo cancellation and noise reduction. Both single-microphone and two-microphone approaches are addressed. Finally, we outline the limitations of the different techniques and propose some prospects.
ABSTRACT
The increased popularity of mobile telephony introduces both challenges and opportunities for automatic speech recognition. ASR offers ways to simplify the use of mobile phones, notably in hands-busy and eyes-busy situations. However, the acoustic environment can be severely degraded, and the wireless network may introduce further distortions into the speech signal. This paper gives an overview of the sources of degradation and of attempts at robust speech recognition for mobile communications. Emphasis is placed on approaches that are suitable for implementation in mobile terminals. Two example applications are described that illustrate the robustness issues and design considerations typical of low-cost noisy speech recognition: voice dialling in a GSM phone and hands-free digit recognition in the car.
ABSTRACT
This paper focuses on the evolving demands for speech recognition in the car and their impact on algorithmic and technological development. Until now, the major demand for speech recognition in the car has been related to hands-free operation of the telephone. This functionality could be provided satisfactorily with a word-based system, which also allowed for simpler noise suppression algorithms. Entirely new speech recognition systems are now required to cope with the demands of voice control of car navigation systems. These applications require noise-robust, phoneme-based, large-vocabulary recognition systems and a much more advanced user interface. The very large perplexity of a car navigation task requires a built-in spelling recognizer. Hardware and software design for this new application must also be approached with the understanding that it will be one part, albeit a central one, of a fully integrated speech control system inside the car.