Session WMD: Applications of Speech Technology

Chairperson: Richard Winski, Vocalis, UK



ANALYSIS OF INFANT CRIES FOR THE EARLY DETECTION OF HEARING IMPAIRMENT

Authors: S. Möller*, R. Schönweiler**

*Institute of Communication Acoustics, Ruhr-Universität Bochum, D-44780 Bochum, Germany Tel. +49 234 700 3979, Fax +49 234 709 4165, E-mail: moeller@ika.ruhr-uni-bochum.de **Department of Phoniatrics and Pedaudiology, Hannover Medical School, D-30623 Hannover, Germany Tel. +49 511 532 9104, Fax +49 511 532 4609

Volume 4 pages 1759 - 1762

ABSTRACT

Auditory and instrumental analyses of cries from normally hearing and profoundly hearing-impaired infants (2-11 months) are presented. Results from listening experiments suggest that differences exist between the cries of the two infant groups. The attributes expressing the difference relate to the emotional state of the infant, to prosodic features, and to voice quality. Signal analysis of the cries confirms these findings, showing statistically significant differences for spectral parameters and for parameters describing the melody contour of the cries. The usability of neural networks for automatic classification and discrimination of cries is discussed. If the tendencies shown here hold for other data sets, the findings can be used to develop a new screening method for detecting hearing impairment and auditory perception disorders at a very early age.
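The melody-contour parameters mentioned above can be illustrated with a small sketch. The feature set below (mean F0, F0 range, overall slope of the contour) is an invented simplification, not the authors' actual parameter set:

```python
import numpy as np

def melody_features(f0):
    """A few descriptors of a cry's melody contour (invented feature set)."""
    f0 = np.asarray(f0, dtype=float)
    t = np.arange(len(f0))
    slope = np.polyfit(t, f0, 1)[0]      # overall rise or fall of the contour
    return {
        "mean_f0": float(f0.mean()),
        "range_f0": float(f0.max() - f0.min()),
        "slope": float(slope),
    }

# Two toy F0 contours (Hz): a falling cry melody vs. a flat, monotonous one
falling = [600, 580, 550, 510, 470, 430]
flat = [700, 705, 698, 702, 699, 701]
print(melody_features(falling)["slope"] < 0)   # True: the contour descends
```

Descriptors like these could then feed the neural classifier the abstract alludes to.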
A0069.pdf 



Optical Logo-Therapy (OLT) : A Computer-Based Real Time Visual Feedback Application for Speech Training

Authors: A. Hatzis (1), P.D. Green (1), S.J. Howard (2)

(1) Speech and Hearing Group, Dept. of Computer Science, University of Sheffield, Tel. +44-114-222 (1879) (1836), FAX. +44-114-278 0972, E-mail: a.hatzis@dcs.shef.ac.uk, p.green@dcs.shef.ac.uk (2) Dept. of Human Communication Science, University of Sheffield, Tel. +44-114-222 2448, FAX: +44-114-278 2403, E-mail: s.howard@sheffield.ac.uk

Volume 4 pages 1763 - 1766

ABSTRACT

Traditional speech training methods can prove cumbersome because of the difficulty of providing the subject with good feedback, maintaining her/his motivation over long periods and stabilising the improvement in articulation. In this work we provide visual feedback based on displaying a trajectory in a 2-Dimensional 'phonetic space'. Data are presented from a small-scale efficacy study, which illustrate the use of OLT in speech therapy for misarticulated sibilant fricatives. Results for the contrasting articulations are compared and the potential of OLT as a therapeutic technique is discussed.
A0144.pdf




INTELLIGENT RETRIEVAL OF VERY LARGE CHINESE DICTIONARIES WITH SPEECH QUERIES

Authors: Sung-Chien Lin (1), Lee-Feng Chien (2), Ming-Chiuan Chen (2), Lin-Shan Lee (1),(2), Ker-Jiann Chen (2)

(1) Dept. of Computer Science and Information Engineering, National Taiwan University (2) Institute of Information Science, Academia Sinica, Taipei, Taiwan, Republic of China. E-mail: lsc@speech.ee.ntu.edu.tw

Volume 4 pages 1767 - 1770

ABSTRACT

To retrieve a word from a Chinese dictionary, the user needs to know exactly the first character of the desired word. Because there are more than 10,000 Chinese characters, this makes a Chinese dictionary relatively difficult to use. To reduce this problem, this paper presents intelligent retrieval techniques for very large Chinese dictionaries with speech queries. The proposed techniques integrate Mandarin speech recognition and Chinese information retrieval technologies in a syllable-based approach that exploits the mono-syllabic structure of the language. Moreover, it is highly desirable to be able to retrieve all relevant word entries from the dictionary using speech queries that describe only "general concepts" of the desired words. To achieve this challenging function, relevance-feedback techniques are also included. Based on these techniques, a retrieval system was successfully implemented on a Pentium PC for a very large Chinese dictionary that includes 160,000 word entries, with the total length of the lexical information under the word entries exceeding 20,000,000 words.
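The syllable-based matching idea, comparing a recognized syllable sequence against syllable representations of the dictionary entries, can be sketched as n-gram overlap ranking. The entries and the scoring below are invented simplifications, not the paper's actual retrieval model:

```python
def syllable_ngrams(syllables, n=2):
    """Overlapping syllable n-grams, e.g. ['guo','yu','ci','dian'] -> 3 bigrams."""
    return {tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)}

def rank(query_syls, dictionary):
    """Rank entries by shared syllable bigrams with the recognized query."""
    q = syllable_ngrams(query_syls)
    scored = [(len(q & syllable_ngrams(syls)), word) for word, syls in dictionary]
    return [word for score, word in sorted(scored, reverse=True) if score > 0]

# Toy dictionary keyed by Pinyin syllable sequences (invented entries)
dictionary = [
    ("國語辭典", ["guo", "yu", "ci", "dian"]),
    ("電腦", ["dian", "nao"]),
    ("語言", ["yu", "yan"]),
]
print(rank(["guo", "yu", "ci"], dictionary))   # the matching entry ranks first
```

Matching on syllables rather than characters sidesteps the need to know the first character, since the recognizer output is itself a syllable sequence.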
A0196.pdf 



PRELIMINARY RESULTS OF A MULTILINGUAL INTERACTIVE VOICE ACTIVATED TELEPHONE SERVICE FOR PEOPLE-ON-THE-MOVE

Authors: Fulvio Leonardi, Giorgio Micca, Sheyla Militello, Mario Nigra

CSELT - Centro Studi e Laboratori Telecomunicazioni Via G.Reiss Romoli, 274 I-10148 Torino (Italia) (leonardi,micca,militello,nigra)@cselt.it

Volume 4 pages 1771 - 1774

ABSTRACT

The EURESCOM P502 project, Multilingual Interactive Voice Activated (MIVA) telephone services, launched in 1995 for a three-year term, aimed at designing and experimenting with an automatic multilingual telephone assistant for people-on-the-move, providing them with instructions on the use of the most important telephone services in the country in which they are travelling. The core information provided by the system covers emergency services, international and national calls, and card calls. Six European telecom research laboratories were involved in the project: CNET, the project leader; British Telecom; Deutsche Telekom; KPN; Portugal Telecom; and CSELT. The final prototype is to include a language selection module and a menu-driven procedure, using a common structure for the information contents in all languages. Several factors are currently being investigated, such as the impact of a talk-through capability, the effect of the cellular network and of different national networks on ASR performance, and the optimization of the dialogue strategy at the system interface level. The prototypes are being tested within the individual national research units, and cross-country tests will follow. As a further benefit, the potential savings obtainable by sharing the development costs of ASR-based multilingual telephone services will be estimated subsequently. A final field trial of the national implementations of the systems is to be carried out starting in October 1997 for a thorough evaluation of the multilingual services.
A0197.pdf 



Assessment of an Operational Dialogue System used by a Blind Telephone Switchboard Operator

Authors: Jean-Christophe Dubois(1), Yolande Anglade(2), Dominique Fohr(1)

(1) CRIN-CNRS & INRIA Lorraine, BP 239, F54506 Vandoeuvre, France (2) IRISA-LLI, IUT, BP 150, 6 rue E. Branly, F22300, Lannion, France

Volume 4 pages 1775 - 1778

ABSTRACT

This paper presents the assessment of a dialogue system that is used daily by a blind telephone switchboard operator. The purpose of the system is to provide the operator with information about the company members called by external correspondents (phone extensions, department, etc.). In building this system, we had to solve two main problems: on the one hand, the recognition of confusable letters (like P and T); on the other hand, access to a database with a name that may be misspelled. In the paper, we present the assessment, which allowed us to measure the system's performance and also to validate the two methods designed for this project.
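Database access robust to misspelled names is commonly handled with edit-distance matching; the sketch below is an illustrative stand-in, not the paper's actual method, and the directory names are invented:

```python
def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,       # insertion
                                     prev + (ca != cb))   # substitution / match
    return dp[-1]

def closest_name(spelled, directory):
    """Pick the directory entry nearest to a possibly misspelled name."""
    return min(directory, key=lambda name: edit_distance(spelled, name))

directory = ["DUPONT", "DURAND", "MARTIN"]     # invented company members
print(closest_name("TUPONT", directory))        # a P/T-style confusion is absorbed
```

A recognizer confusing acoustically close letters produces exactly the kind of one-letter error that a small edit distance tolerates.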
A0223.pdf 



STACC: AN AUTOMATIC SERVICE FOR INFORMATION ACCESS USING CONTINUOUS SPEECH RECOGNITION THROUGH TELEPHONE LINE

Authors: Antonio J. Rubio, Pedro García, Ángel de la Torre, José C. Segura, Jesús Díaz-Verdejo, María C. Benítez, Victoria Sánchez, Antonio M. Peinado, Juan M. López-Soler, José L. Pérez-Córdoba

Dpto. de Electrónica y Tecnología de Computadores Universidad de Granada, 18071 GRANADA (Spain) e-mail rubio@hal.ugr.es telf: 34-58-243193 fax 34-58-243230

Volume 4 pages 1779 - 1782

ABSTRACT

This work presents STACC, Sistema Telefónico Automático de Consulta de Calificaciones (Automatic Telephone System for Consulting Marks). The system was developed at our laboratory during 1996 and implements a telephone service that allows students to consult their marks by speech after the exams, by means of a simple phone call. This experience gave us an interesting point of view on the problems of real applications of speech technology. In this work we describe the system and present some statistics on the use of STACC by the students.
A0228.pdf 



A VOICE ACTIVATED DIALOGUE SYSTEM FOR FAST-FOOD RESTAURANT APPLICATIONS

Authors: R. López-Cózar, P. García, J. Díaz and A.J. Rubio

Dept. Electrónica y Tecnología de Computadores Universidad de Granada, 18071 Granada, España (Spain) Tel.: +34-58-243193, FAX: +34-58-243230, e-mail: rubio@hal.ugr.es

Volume 4 pages 1783 - 1786

ABSTRACT

We present a preliminary version of a voice dialogue system suitable for dealing with customer orders and questions in fast-food restaurants. The system consists of two main sub-systems, namely a dialogue sub-system and a voice interface. The dialogue sub-system is a natural language processing system that may be considered a rule-based expert system, whose behaviour is derived from a dialogue corpus recorded at a real restaurant. In this paper we give a general description of both sub-systems and focus on the knowledge representation, grammar, and module structure of the dialogue sub-system. The natural language generation mechanism is introduced, future work is outlined, and some conclusions are drawn.
A0230.pdf 



MULTI-MICROPHONE SUB-BAND ADAPTIVE SIGNAL PROCESSING FOR IMPROVEMENT OF HEARING AID PERFORMANCE

Authors: P. W. Shields and D. R. Campbell

Department of Electronic Engineering and Physics, University of Paisley, High Street Paisley, Renfrewshire, PA1 2BE, SCOTLAND, U.K. paul@diana22.paisley.ac.uk

Volume 4 pages 1787 - 1790

ABSTRACT

A scheme for binaural pre-processing of speech signals for input to a standard linear hearing aid is proposed. The system is based on that of Toner & Campbell [1], who applied the Least Mean Squares (LMS) algorithm in sub-bands to speech signals from various acoustic environments and signal-to-noise ratios (SNRs). The processing scheme attempts to take advantage of the multiple inputs to perform noise cancellation. The use of sub-bands enables a diverse processing mechanism to be employed: the wide-band signal is split into smaller frequency-limited sub-bands, which can subsequently be processed according to their signal characteristics. The results of a series of intelligibility tests are presented, from experiments in which acoustic speech and noise data generated in a simulated room were presented to hearing-impaired volunteers.
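The core of such a scheme is an adaptive LMS noise canceller in each band. The sketch below shows a single full-band LMS canceller only (the sub-band analysis/synthesis stage is omitted), with invented signals and an invented step size:

```python
import numpy as np

def lms_cancel(primary, reference, taps=8, mu=0.02):
    """LMS adaptive noise canceller: estimate the noise leaking into the
    primary channel from a correlated reference input, then subtract it.
    (Full-band sketch; the paper applies LMS per sub-band.)"""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps - 1, len(primary)):
        x = reference[n - taps + 1:n + 1][::-1]  # most recent reference samples
        y = w @ x                                # current noise estimate
        e = primary[n] - y                       # error = cleaned output
        w += mu * e * x                          # stochastic-gradient update
        out[n] = e
    return out

rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
speech = np.sin(0.05 * np.arange(4000))          # stand-in for a speech signal
noisy = speech + 0.8 * noise                     # noise leaks into the primary mic
cleaned = lms_cancel(noisy, noise)
print("residual error power:", np.mean((cleaned[1000:] - speech[1000:]) ** 2))
```

Splitting into sub-bands, as the paper does, lets each band use its own step size and adapt only where the noise actually sits.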
A0257.pdf 



Tactile Transmission of Intonation and Stress

Authors: Hans Georg Piroth and Thomas Arnhold

Institut für Phonetik und Sprachliche Kommunikation der Universität München Schellingstr. 3, D-80799 Munich, Germany Tel. +49 89 2178 2655, Fax +49 89 2178 2652, E-mail: piroth@phonetik.uni-muenchen.de

Volume 4 pages 1791 - 1794

ABSTRACT

The development of technical communication devices for hearing-impaired or deaf persons is one of the main topics in current research on aids for the disabled and elderly. Besides the well-known advances in the construction of analogue and digital hearing aids for the hard-of-hearing, there is also a long tradition of research on tactile speech aids. Since the beginning of this century, many attempts have been made to solve the problem of speech substitution for deaf people and those suffering from severe hearing loss. A series of experiments was carried out to determine a robust coding method for the tactile transmission of F0 and stress, to support the speech perception of deaf or severely hearing-impaired persons unable to extract suprasegmental speech features through the auditory sense.
A0379.pdf 



HEARING IMPAIRMENT SIMULATION: AN INTERACTIVE MULTIMEDIA PROGRAMME ON THE INTERNET FOR STUDENTS OF SPEECH THERAPY

Authors: K. Huttunen (1), P. Körkkö (1) and M. Sorri (2)

(1) Department of Finnish, Saami and Logopedics, University of Oulu, P.O. Box 111, FIN-90571 Oulu, Finland Tel. +358-8-553 3426, FAX +358-8-553 3383, E-mail: khuttune@cc.oulu.fi (2) Department of Otolaryngology, University of Oulu, Kajaanintie 50, FIN-90220 Oulu, Finland Tel. +358-8-315 2011, FAX +358-8-315 5317, E-mail: msorri@cc.oulu.fi

Volume 4 pages 1795 - 1798

ABSTRACT

Students of speech therapy and audiology often find it difficult to obtain a realistic view of the speech reception abilities of the hearing impaired. Illustrating speech recognition defects is a demanding task for the teaching staff, too. To improve the students' awareness of the effects of hearing defects, an interactive multimedia programme allowing the simulation of various types of hearing impairment was constructed and placed on an Internet media server for online use. The simulated hearing-impaired speech material was produced using digital signal processing (e.g. mixing and filtering of speech and noise) and multimedia and audio technologies that enable the streaming of sound files on the Internet. In the programme, word recognition scores for several degrees and types of hearing impairment, under varying conditions of background noise and reverberation time, can also be computed.
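The signal-processing steps mentioned (filtering to mimic a hearing loss, mixing speech with noise at a chosen SNR) can be sketched as follows; the one-pole filter is a crude illustrative stand-in for the audiogram-shaped filtering a real simulation would use:

```python
import numpy as np

def simulate_loss(signal, alpha=0.2):
    """Crude sloping high-frequency loss: a one-pole low-pass filter
    (an illustrative stand-in for audiogram-shaped filtering)."""
    out = np.zeros_like(signal)
    acc = 0.0
    for i, s in enumerate(signal):
        acc += alpha * (s - acc)          # y[n] = y[n-1] + alpha*(x[n] - y[n-1])
        out[i] = acc
    return out

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the mixture has the requested speech-to-noise ratio."""
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    gain = np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(1)
speech = np.sin(0.3 * np.arange(2000))    # stand-in for a speech signal
noisy = mix_at_snr(speech, rng.standard_normal(2000), snr_db=10)
degraded = simulate_loss(noisy)           # what the simulated listener would hear
print(np.var(degraded) < np.var(noisy))   # True: filtering removes signal power
```

Streaming pre-rendered files like these is what lets the programme run in an ordinary browser.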
A0548.pdf 

A0548G01.gif



ANALYSIS OF DYSARTHRIC SPEECH BY MEANS OF FORMANT-TO-AREA MAPPING

Authors: S. Ciocea (1) , J. Schoentgen (1) , L. Crevier-Buchman (2)

(1) Laboratory of Experimental Phonetics, Institute of Modern Languages and Phonetics, CP110, Université Libre de Bruxelles, Av. F. D. Roosevelt, 50, B-1050 Brussels, Belgium. Tel. +32 2 650 2010, Fax: 32 2 650 2007, E-mail: sciocea@ulb.ac.be (2) Laboratoire Voix, Biomatériaux et Cancérologie ORL, Service d'ORL et de Chirurgie de la Face et du Cou, Hôpital Laënnec, Paris, France

Volume 4 pages 1799 - 1802

ABSTRACT

This article presents a preliminary study of dysarthric speech by means of formant-to-area mapping. Dysarthria is a speech impairment which is the result of paralysis or ataxia of the speech muscles. Formant-to-area mapping is the inference of the shape of a tract model via observed formant frequencies. The corpus is composed of vowel- vowel sequences [iaia] produced by speakers suffering from amyotrophic lateral sclerosis (ALS) and normal speakers. The results show that the shapes and movements of the acoustically mapped area function models are typical of the motions and postures of the vocal tracts of ALS speakers.
A0645.pdf 



AN INTELLIGENT TELEPHONE ANSWERING SYSTEM USING SPEECH RECOGNITION

Authors: Lobanov, B.M., Brickle, S.V., Kubashin, A.V., Levkovskaja, T.V.

Institute of Engineering Cybernetics, Academy of Science of Belarus Surganov St. 6, 220011 Minsk, Belarus Tel. +375 172 685295, Email lobanov@novcom.bas-net.by

Volume 4 pages 1803 - 1806

ABSTRACT

The computer system described in this paper answers incoming telephone calls and employs speaker-independent speech recognition to identify callers. The users of the system can define caller-specific treatment and change this treatment using a graphical user interface. Apart from relaying and receiving spoken messages, the system also offers advanced telephony features such as paging and call forwarding, provided the required subscription services are available from the telephone company. Standard interfaces to the telephony and audio hardware are used, so that the system runs on a desktop PC equipped with a voice-enabled modem.
A0676.pdf 



SPEEDATA: A PROTOTYPE FOR MULTILINGUAL SPOKEN DATA-ENTRY

Authors: U. Ackermann (2) , B. Angelini (1) , F. Brugnara (1) , M. Federico (1) , D. Giuliani (1) , R. Gretter (1) , H. Niemann (2)

(1) IRST - Istituto per la Ricerca Scientifica e Tecnologica, I-38050 Povo, Trento, Italy. (2) FORWISS - Bayerisches Forschungszentrum für Wissensbasierte Systeme, D-91058 Erlangen, Germany.

Volume 4 pages 1807 - 1810

ABSTRACT

In this work we describe the development and evaluation of SpeeData, a prototype for multilingual spoken data-entry. The SpeeData project aims at developing a demonstrator that provides a user-friendly interface for spoken data-entry in two languages: Italian and German. A real-world application domain is considered, namely the Land Register of an Italian region in which both languages are officially spoken. Original topics of this paper are the interaction modality for spoken data-entry, the evaluation of a data-entry system, bilingual speech recognition, and bilingual speaker adaptation.
A0713.pdf 



APPLICATIONS FOR THE HEARING-IMPAIRED: EVALUATION OF FINNISH PHONEME RECOGNITION METHODS

Authors: Matti Karjalainen (1), Peter Boda (2), Panu Somervuo (3), and Toomas Altosaar (1)

(1) Acoustics Laboratory, Helsinki University of Technology, P.O. Box 3000, FIN-02015 HUT, Finland, E-mail: matti.karjalainen@hut.fi (2) Speech and Audio Systems Laboratory, Nokia Research Center, P.O. Box 100, FIN-33721 Tampere, Finland, E-mail: peter.boda@research.nokia.com (3) Neural Networks Research Centre, Helsinki University of Technology, P.O. Box 2200, FIN-02015 HUT, Finland, E-mail: panu.somervuo@hut.fi

Volume 4 pages 1811 - 1814

ABSTRACT

It has been hypothesized that the Finnish language is well suited to speech-to-text conversion for the communication aids of the hearing impaired. In a related study it was shown that, depending on context, 10 to 20 % of phoneme errors can be tolerated with good comprehension when reading text converted from raw phonemic recognition. Two sets of phoneme recognition experiments were carried out in this study in order to evaluate the performance of existing speech recognition systems in this application. For telephone-bandwidth speech, both systems showed speaker-dependent error scores of about 10 % or below, thus supporting the feasibility of the application. For speaker-independent cases the error rate was typically more than 20 %, which is too high for effortless and fluent communication.
A0724.pdf 



APPLICATIONS FOR THE HEARING-IMPAIRED: COMPREHENSION OF FINNISH TEXT WITH PHONEME ERRORS

Authors: Nina Alarotu (1), Mietta Lennes (1), Toomas Altosaar (2), Anja Malm (3), and Matti Karjalainen (2)

(1)Department of Phonetics University of Helsinki PO Box 3, FIN-00014 University of Helsinki, Finland E-mail: nina.alarotu@helsinki.fi (2) Acoustics Laboratory Helsinki University of Technology PO Box 2200, FIN-02015 HUT Finland E-mail: toomas.altosaar@hut.fi (3)Finnish Association of the Deaf PO Box 57, FIN-00401 Helsinki Finland Tel. +358-9-580 3460

Volume 4 pages 1815 - 1818

ABSTRACT

This study simulates the phoneme errors made by speech recognizers and determines the phoneme error level at which a reasonable comprehension of text can still be achieved. Finnish is written almost phonemically and Finnish-speakers have no trouble in comprehending phonemic text. Phonemically corrupted text was presented to normal-hearing, hearing-impaired as well as deaf subjects and their comprehension levels were measured. According to this study, current speech recognition methods allow for limited applications in this field.
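The corruption procedure can be sketched as random phoneme substitution at a controlled error rate. The letter inventory and the substitution-only error model below are invented simplifications of whatever error model the study actually used:

```python
import random

FINNISH_PHONES = list("aeiouyäöklmnprstvhjdg")  # rough one-letter inventory

def corrupt(text, error_rate, seed=0):
    """Substitute each letter with a random phoneme-letter at the given
    rate, mimicking recognizer substitution errors (insertions and
    deletions are omitted for brevity)."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < error_rate:
            out.append(rng.choice(FINNISH_PHONES))
        else:
            out.append(ch)
    return "".join(out)

sentence = "puhetta voidaan tunnistaa melko hyvin"
print(corrupt(sentence, 0.10))   # roughly one substitution per ten letters
```

Because Finnish orthography is nearly phonemic, corrupting letters this way is a reasonable proxy for corrupting the recognizer's phoneme output.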
A0727.pdf 



ACCeSS - Automated Call Center Through Speech Understanding System

Authors: Ute Ehrlich, Gerhard Hanrieder*, Ludwig Hitzenberger**, Paul Heisterkamp, Klaus Mecklenburg, Peter Regel-Brietzmann

Daimler-Benz AG, Institute for Information Technology, Wilhelm-Runge-Str. 11, D-89081 Ulm, Germany, regel@dbag.ulm.daimlerbenz.com * Daimler-Benz Aerospace, Wörthstr. 85, D-89077 Ulm, Germany ** University of Regensburg, FG Information Science, D-93040 Regensburg, Germany

Volume 4 pages 1819 - 1822

ABSTRACT

This paper describes the results of a highly sophisticated speech application project. ACCeSS is an EU project with Greek and German partners; our Greek partners are Knowledge S.A. and the University of Patras. In this paper we report on the German application of the project. The project addresses a first step towards the automation of call centers for personnel-intensive applications in the insurance business. New forms of insurance operation increasingly use the telephone or direct mailing for the contact between an insurance company and its customers. This makes the business more efficient and more direct than the classical operation with agents. Routine contractual details can easily be handled in this way, and all other kinds of insurance transactions can be realised with such communication media, too. A direct insurance company running a large call center is involved in the project and pays attention to the functionality of the system. The user needs are analysed using Wizard-of-Oz experiments. The system structure, a sketch of the algorithms and modules, and first evaluation results are given. Keywords: speech dialogue, call center, Wizard-of-Oz, dialogue strategy, semantics, ACCeSS
A0819.pdf 



INTEGRATING A RADIO MODEL WITH A SPOKEN LANGUAGE INTERFACE FOR MILITARY SIMULATIONS

Authors: E. Richard Anthony, Charles Bowen, Margot T. Peet, Susan Tammaro

The MITRE Corporation 1820 Dolley Madison Drive, McLean, VA 22102-3481 USA Email: mpeet@mitre.org

Volume 4 pages 1823 - 1826

ABSTRACT

We incorporated a simulated military radio into a spoken language interface to a distributed simulation environment for military commander training. The resulting architecture bypassed the inherent problem of acoustic mismatch that arises in integrating radio output with a speech recognition front end, while at the same time preserving the realism of speech synthesis output through a military radio. We assessed the utility of formal evaluation methods to benchmark the impact of the radio model on a commercial speech synthesizer.
A0932.pdf 



ON FIELD EXPERIMENTS OF CONTINUOUS DIGIT RECOGNITION OVER THE TELEPHONE NETWORK

Authors: D. Falavigna, R. Gretter

IRST - Istituto per la Ricerca Scientifica e Tecnologica, 38050 Pante di Povo, Trento, Italy. E-mail: falavi@irst.itc.it, gretter@irst.itc.it

Volume 4 pages 1827 - 1830

ABSTRACT

In this paper a real-time continuous digit recognizer operating over the telephone network is described. The activity has led to the realization of a system, installed in some Italian telephone exchanges, for providing semi-automatic collect call services. Data collection has also been performed, and a field database was built. Both a continuous digit recognition task and a confirmation task, requiring rejection, have been defined. Recognition results are presented.
A1110.pdf 



An HMM-based phoneme recognizer applied to assessment of dysarthric speech

Authors: Xavier Menéndez-Pidal † , Polikoff, J.B., and Bunnell, H.T.

Applied Science & Engineering Laboratories, duPont Hospital for Children, P.O. Box 269, Wilmington, DE 19899, USA † SONY Electronics Inc., 3300 Zanker Rd, MS SJ-2D4, San Jose, CA 95134, USA E-mail: bunnell@asel.udel.edu or xavier@lsi.sel.sony.com

Volume 4 pages 1831 - 1834

ABSTRACT

This paper describes work on the development of an HMM-based system for automatic speech assessment, particularly of dysarthric speech. As a first step, we compare recognizer performance on a closed-set, forced choice identification test of dysarthric speech with performance on the same test by untrained listeners. Results indicate that HMM recognition accuracy averaged over all utterances of a dysarthric talker is well-correlated with measures of overall talker intelligibility. However, on an utterance-by-utterance basis, the pattern of errors obtained from the human subjects and the machine, while significantly correlated, accounts for, at best, only about 25 percent of the variance. Potential methods for improving this performance are considered.
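The distinction the abstract draws, between a significant correlation and the share of variance it explains, is simply r versus r². The per-utterance error counts below are invented for illustration only:

```python
import numpy as np

# Invented per-utterance error counts: human listeners vs. the HMM system
human_errors = np.array([2, 5, 1, 7, 3, 6, 4, 8])
machine_errors = np.array([3, 4, 2, 6, 2, 8, 3, 5])

# Pearson correlation; its square is the fraction of variance accounted for
r = np.corrcoef(human_errors, machine_errors)[0, 1]
print(f"r = {r:.2f}, variance accounted for = {r ** 2:.0%}")
```

An r around 0.5, clearly significant with enough utterances, still leaves about 75 % of the variance unexplained, which is the situation the abstract reports.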
A1164.pdf 



MULTIAPPLICATION PLATFORM BASED ON TECHNOLOGY FOR MOBILE TELEPHONE NETWORK SERVICES

Authors: Celinda de la Torre (*) and Gonzalo Alonso (**)

(*) Speech Technology Group. Telefónica Investigación y Desarrollo, Emilio Vargas 6, 28043 Madrid, Spain (**) Dept. Desarrollo de Servicios. Telefónica Servicios Móviles. Llodio, 2, 28034 Madrid. Spain e-mail: (celinda)@craso.tid.es

Volume 4 pages 1835 - 1838

ABSTRACT

This paper describes a new platform developed at Telefónica I+D, based on speech technology, to be integrated into the mobile telephone network and especially suitable for the quick incorporation of new customer-demanded services. This multiapplication platform has been conceived as call-driven: the number called by the user determines which application must be run at any given time in order to serve him. The system permits the lines assigned to each application to be redistributed dynamically following traffic criteria. On the other hand, the use of dynamic libraries allows applications to be updated, incorporated, or eliminated quickly. The integration of computers with the telephone network in commercial equipment has become feasible thanks to the proliferation of inexpensive personal computers and advances in speech processing. The availability of the speech technology products of Telefónica of Spain [1] [2] [3] [4] has made it possible to develop speech-based servers for new telephone services in a rapid and reasonable way.
A1220.pdf 



FIELD TEST OF A CALLING CARD SERVICE BASED ON SPEAKER VERIFICATION AND AUTOMATIC SPEECH RECOGNITION

Authors: Els den Os (1) , Lou Boves (1) (2), David James (3), Richard Winski (4) , Kurt Fridh (5)

(1)KPN Research, (2)KUN , (3)Ubilab, (4)Vocalis, (5)Telia P.O. Box 421, 2260 AL Leidschendam, the Netherlands E.A.denOs@research.kpn.com

Volume 4 pages 1839 - 1842

ABSTRACT

In this research we studied several human factors problems connected with the deployment of speaker verification technology in telecommunication services. We investigate the perceived safety of a calling card service when it is protected by speaker verification on the 14-digit card number, and compare it with the perceived safety of speaker verification combined with a PIN. Moreover, we compare a voice-based interface to the service with a DTMF-based interface. The results are crucial for guiding the introduction and deployment of speaker verification technology in actual applications.
A1235.pdf 



SPEECH: A PRIVILEGED MODALITY

Authors: Luc E. JULIA, Adam J. CHEYER

STAR Laboratory, SRI International, 333 Ravenswood Ave., Menlo Park, California 94025, Tel. +1 415 859 4269, E-mail: julia@speech.sri.com; Artificial Intelligence Center, SRI International, 333 Ravenswood Ave., Menlo Park, California 94025, Tel. +1 415 859 4119, E-mail: cheyer@ai.sri.com

Volume 4 pages 1843 - 1846

ABSTRACT

Ever since the publication of Bolt's ground-breaking "Put-That-There" paper [1], providing multiple modalities as a means of easing the interaction between humans and computers has been a desirable attribute of user interface design. In Bolt's early approach, the style of modality combination required the user to conform to a rigid order when entering spoken and gestural commands. In the early 1990s, the idea of synergistic multimodal combination began to emerge [4], although actual implemented systems (generally using keyboard and mouse) remained far from being synergistic. Next-generation approaches involved time-stamped events to reason about the fusion of multimodal input arriving in a given time window, but these systems were hindered by time-consuming matching algorithms. To overcome this limitation, we proposed [6] a truly synergistic application and a distributed architecture for flexible interaction that reduces the need for explicit time stamping. Our slot-based approach is command directed, making it suitable for applications using speech as a primary modality. In this article, we use our interaction model to demonstrate that during multimodal fusion, speech should be a privileged modality, driving the interpretation of a query, and that in certain cases, speech has even more power to override and modify the combination of other modalities than previously believed.
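A slot-based, speech-driven fusion loop of the kind described can be sketched minimally as follows. The Command frame, token set, and gesture format are all invented for illustration (echoing Bolt's "put that there" example), not the authors' actual architecture:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Command:
    """Slot frame for one speech-driven command (slot names are invented)."""
    action: Optional[str] = None
    obj: Optional[str] = None
    location: Optional[Tuple[int, int]] = None

    def complete(self):
        return None not in (self.action, self.obj, self.location)

def fuse(speech_tokens, gestures):
    """Speech drives the interpretation; deictic words bind queued gestures."""
    cmd = Command()
    g = iter(gestures)
    for tok in speech_tokens:
        if tok in ("put", "move", "delete"):
            cmd.action = tok
        elif tok == "that":                  # resolve object from a pointing event
            cmd.obj = next(g)["target"]
        elif tok == "there":                 # resolve location the same way
            cmd.location = next(g)["point"]
    return cmd

cmd = fuse(["put", "that", "there"],
           [{"target": "red_block"}, {"point": (120, 45)}])
print(cmd.complete(), cmd.obj)   # True red_block
```

Because the spoken command decides which slots exist and when gestures are consumed, speech plays the privileged, interpretation-driving role the article argues for, without per-event time stamps.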
A1282.pdf 
