Session TMc: Technology for S&L Acquisition, Speech Processing Tools

Chairperson: Petros Maragos, ILSP, Greece



The "Sketchboard": A Dynamic Interpretative Memory and its Use for Spoken Language Understanding

Authors: Gérard SABAH

Language and Cognition Group LIMSI - CNRS B.P. 133, 91403 ORSAY Cedex, FRANCE Tel: 33 1 69 85 80 03, Fax: 33 1 69 85 80 88, E-mail: gs@limsi.fr

Volume 2 pages 617 - 620

ABSTRACT

Blackboards allow various knowledge sources to be triggered in an opportunistic way, but they do not allow higher-level modules to feed information back to lower-level modules. The solution presented here remedies this shortcoming, since our Sketchboard implements reactive feedback loops. Within the Sketchboard, modules are considered from two points of view: either they build a result (a sketch, possibly rough and vague) or they give back a response to the modules from which they received their input data. This response signals the degree of confidence the module has in its own result. These relations are generalized across all the modules that interact when solving a problem. As higher and higher level modules are triggered, the initial sketch becomes more and more precise, taking into account the higher modules' knowledge. Conceived for natural language processing, the Sketchboard is also useful for spoken language understanding, as shown by a detailed example.

A0025.pdf



SPEECH TECHNOLOGY INTEGRATION AND RESEARCH PLATFORM: A SYSTEM STUDY

Authors: Qiru Zhou, Chin-Hui Lee, Wu Chou and Andrew Pargellis

Multimedia Communication Research Laboratory Bell Laboratories, Lucent Technologies 600-700 Mountain Avenue Murray Hill, NJ 07974, USA {qzhou, chl, wuchou, anp}@research.bell-labs.com

Volume 2 pages 621 - 624

ABSTRACT

We present a generic speech technology integration platform for application development and research across different domains. The goal of the design is twofold. On the application development side, the system provides an intuitive developer's interface defined by a high-level application definition language and a set of convenient speech application building tools, allowing a novice developer to rapidly deploy and modify a spoken language dialogue application. On the system research and development side, the system uses a thin 'broker' layer to separate the system application programming interface from the service provider interface, which makes it easy to incorporate new technologies and new functional components. We also use a domain-independent acoustic model set to cover US English phones for general speech applications. The system grammar and lexicon engine creates grammars and lexicon dictionaries on the fly to enable a practically unrestricted vocabulary for many recognition and synthesis applications.

A0080.pdf



Speech Recognition on SPHERIC --- An IC for Command & Control Applications

Authors: Dieter Geller, Markus Lieb, Wolfgang Budde, Oliver Muelhens, Manfred Zinke

Philips GmbH Forschungslaboratorien Aachen P.O. Box 500145, D-52085 Aachen, Germany E-mail: {geller,lieb,budde,muelhens,zinke}@pfa.research.philips.com

Volume 2 pages 625 - 628

ABSTRACT

SPHERIC is a new IC that has been designed specially for automatic speech recognition applications in Consumer Electronics with a vocabulary of up to 126 words. It allows real-time recognition of both speaker-dependent and speaker-independent words spoken continuously or in an isolated way. Keyword spotting and playback of coded messages and user-trained words are additional features. After a short system overview, the hardware architecture and software structure are presented in this paper. The techniques for reducing computation time and necessary memory size are examined in more detail. Finally, the implemented speech recognition algorithm is described.

A0092.pdf



MUSE: A SCRIPTING LANGUAGE FOR THE DEVELOPMENT OF INTERACTIVE SPEECH ANALYSIS AND RECOGNITION TOOLS

Authors: Michael K. McCandless and James R. Glass

Spoken Language Systems Group Laboratory for Computer Science Massachusetts Institute of Technology Cambridge, Massachusetts 02139 USA http://www.sls.lcs.mit.edu, E-mail: {mike, jrg}@sls.lcs.mit.edu

Volume 2 pages 629 - 632

ABSTRACT

Speech research is a complex endeavor, as reflected in the numerous tools and specialized languages the modern researcher needs to learn. These tools, while adequate for what they have been designed for, are difficult to customize or extend in new directions, even though this is often required. We feel this situation can be improved and propose a new scripting language, MUSE, designed explicitly for speech research, in order to facilitate exploration of new ideas. MUSE is designed to support many modes of research from interactive speech analysis through compute-intensive speech understanding systems, and has facilities for automating some of the more difficult requirements of speech tools: user interactivity, distributed computation, and caching. In this paper we describe the design of the MUSE language and our current prototype MUSE interpreter.

A0100.pdf



LANGUAGE LEARNING BASED ON NON-NATIVE SPEECH RECOGNITION

Authors: Silke Witt, Steve Young

Cambridge University Engineering Department Trumpington Street, Cambridge CB2 1PZ United Kingdom E-mail: {smw24,sjy}@eng.cam.ac.uk

Volume 2 pages 633 - 636

ABSTRACT

This work presents methods of assessing non-native speech to aid computer-assisted pronunciation teaching. These methods are based on automatic speech recognition (ASR) techniques using Hidden Markov Models. Confidence scores at the phoneme level are calculated to provide detailed information about the pronunciation quality of a foreign language student. Experimental results are given based on both artificial data and a database of non-native speech, the latter being recorded specifically for this purpose. The presented results demonstrate the metrics' capability to locate and assess mispronunciations at the phoneme level.

A0122.pdf



TASK MODELLING BY SENTENCE TEMPLATES

Authors: Ute Kilian, Klaus Bader

Daimler-Benz AG, Research and Technology Wilhelm-Runge-Str. 11, D-89081 Ulm, Germany kilian@dbag.ulm.DaimlerBenz.COM

Volume 2 pages 637 - 640

ABSTRACT

Speech recognition applications always face the problem of changing vocabulary and functionality. The use of speech recognition systems will become more attractive if the system user is able to define or redefine the task himself in a suitable manner. Modelling a new task normally requires the experience of a human expert and a lot of time. Additionally, the expert always has to be contacted if system changes become necessary. In this paper we present a fully operational system for continuous speech recognition with a powerful user interface. Most of the internal aspects of the speech recognition system are hidden. The task may be divided into different subtasks corresponding to dialogue states. Each subtask is defined by a set of expected user utterances based on sentence templates. This definition is automatically transformed into a lexicon and a language model used by the speech recognition system.
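The transformation from sentence templates to a recognizer lexicon and language model might look roughly like this sketch. The angle-bracket slot syntax, the slot values, and the raw bigram counts are illustrative assumptions, not the paper's actual formalism:

```python
from itertools import product

def expand_templates(templates, slots):
    """Expand each template by substituting every combination of slot values."""
    sentences = []
    for template in templates:
        names = [n for n in slots if "<" + n + ">" in template]
        for values in product(*(slots[n] for n in names)):
            s = template
            for name, value in zip(names, values):
                s = s.replace("<" + name + ">", value)
            sentences.append(s.split())
    return sentences

def build_lexicon_and_bigrams(sentences):
    """Collect the vocabulary and raw bigram counts usable as a language model."""
    lexicon = sorted({w for s in sentences for w in s})
    bigrams = {}
    for s in sentences:
        for a, b in zip(["<s>"] + s, s + ["</s>"]):
            bigrams[(a, b)] = bigrams.get((a, b), 0) + 1
    return lexicon, bigrams

# A hypothetical command-and-control subtask defined by two templates.
templates = ["turn the <device> <state>", "switch the <device> <state>"]
slots = {"device": ["radio", "light"], "state": ["on", "off"]}
sentences = expand_templates(templates, slots)
lexicon, bigrams = build_lexicon_and_bigrams(sentences)
```

A real system would compile the counts into a probabilistic grammar rather than keep raw bigrams, but the template expansion step is the same in spirit.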

A0134.pdf



EXTRACTION AND REPRESENTATION OF RHYTHMIC COMPONENTS OF SPONTANEOUS SPEECH

Authors: S. Kitazawa, H. Ichikawa, S. Kobayashi & Y. Nishinuma*

Department of Computer Science, Faculty of Information, Shizuoka University, 5-1, 3-Chome, Jouhoku, Hamamatsu, 432, JAPAN Tel. +81 53 478 1471, FAX: +81 53 475 4595, kitazawa@cs.inf.shizuoka.ac.jp *CNRS, URA 261 "Laboratoire parole et langage", Universite de Provence, 13621 Aix-en-Provence, France

Volume 2 pages 641 - 644

ABSTRACT

Speech speed is measured and displayed with our specific algorithm TEMAX (Temporal Evaluation and Measurement Algorithm by KS). The TEMAX-gram, a sonagraphic display of the speech envelope obtained by a DFT with a 1-second window, is convenient for setting off isosyllabic characteristics. For Japanese it traces two dark bars, called rhythmic formants RF1 and RF2: the first around 8 Hz and the second at about half that frequency. RF1 corresponds to speech rate; RF2 represents the bimoraic rhythmic foot. As for English, its isochronic characteristics are observable as RF1 with a 2-second window. Furthermore, with a 1-second window the periodicity of syllables between stresses is displayed as RF2.
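The TEMAX-gram idea (a DFT of the speech amplitude envelope over a roughly 1-second window, with rhythmic formants appearing as low-frequency spectral peaks) can be illustrated with a toy sketch. The rectification-based envelope and the synthetic amplitude-modulated signal below are simplifying assumptions, not the paper's actual processing chain:

```python
import numpy as np

def rhythm_spectrum(signal, fs, win_s=1.0):
    """DFT of the rectified amplitude envelope over one window (TEMAX-style sketch)."""
    env = np.abs(signal)                         # crude envelope via rectification
    n = int(win_s * fs)
    spectrum = np.abs(np.fft.rfft(env[:n] - env[:n].mean()))  # drop the DC term
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    return freqs, spectrum

# Synthetic "speech": a 100 Hz carrier amplitude-modulated at an 8 Hz syllable rate,
# so the envelope spectrum should peak near 8 Hz (an RF1-like bar).
fs = 8000
t = np.arange(fs) / fs
signal = (1 + np.cos(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 100 * t)
freqs, spectrum = rhythm_spectrum(signal, fs)
peak_hz = freqs[np.argmax(spectrum)]
```

Sliding this window along the utterance and stacking the spectra column by column would give the sonagraphic display described above.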

A0140.pdf




AUTOMATIC PRONUNCIATION SCORING OF SPECIFIC PHONE SEGMENTS FOR LANGUAGE INSTRUCTION

Authors: Yoon Kim, Horacio Franco, and Leonardo Neumeyer

Speech Technology and Research Laboratory SRI International, Menlo Park, CA 94025 USA http://www.speech.sri.com

Volume 2 pages 645 - 648

ABSTRACT

The aim of the work described in this paper is to develop methods for automatically assessing the pronunciation quality of specific phone segments uttered by students learning a foreign language. From the phonetic time alignments generated by SRI's Decipher(TM) HMM-based speech recognition system, we use various probabilistic models to produce pronunciation scores for the phone utterance. We evaluate the performance of the proposed algorithms by measuring how well the machine-produced scores correlate with human judgments on a large database. Of the various algorithms considered, the one based on phone log-posterior probability produced the highest correlation (r_xy = 0.72) with the human ratings, which was comparable with correlations between human raters.
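A minimal sketch of the log-posterior scoring idea: average the frame-level log posterior of the intended phone over its aligned segment, so that confidently matched segments score near zero and poorly matched ones score strongly negative. The dictionary-per-frame representation is an illustrative assumption, not SRI's data format; real posteriors would come from the recognizer:

```python
import math

def phone_log_posterior_score(frame_posteriors, target_phone):
    """Average frame-level log posterior of the target phone over its segment.

    frame_posteriors: list of dicts mapping phone label -> posterior, one per frame.
    """
    logs = [math.log(max(p[target_phone], 1e-10)) for p in frame_posteriors]
    return sum(logs) / len(logs)

# A well-pronounced /a/ (high posteriors for "a") vs. a poor one (low posteriors).
good = [{"a": 0.9, "e": 0.1}, {"a": 0.8, "e": 0.2}]
poor = [{"a": 0.3, "e": 0.7}, {"a": 0.2, "e": 0.8}]
good_score = phone_log_posterior_score(good, "a")
poor_score = phone_log_posterior_score(poor, "a")
```

Scores like these, pooled per phone class, are the kind of quantity that can then be correlated against human ratings.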

A0205.pdf



AUTOMATIC DETECTION OF MISPRONUNCIATION FOR LANGUAGE INSTRUCTION

Authors: Orith Ronen, Leonardo Neumeyer, and Horacio Franco

Speech Technology and Research Laboratory SRI International, Menlo Park, California 94025 USA http://www.speech.sri.com

Volume 2 pages 649 - 652

ABSTRACT

This work is part of a project aimed at developing a speech recognition system for language instruction that can assess the quality of pronunciation, identify pronunciation problems, and provide the student with accurate feedback about specific mistakes. Previous work was mainly concerned with scoring the quality of pronunciation. In this work we focus on automatic detection of mispronunciation. While scoring quantifies the mispronunciation, detection identifies the occurrence of a specific problem. Detecting pronunciation problems is necessary for providing feedback to the student. We use pronunciation scoring techniques to evaluate the performance of our mispronunciation model.

A0206.pdf



CONTINUOUS FORMANT-TRACKING APPLIED TO VISUAL REPRESENTATIONS OF SPEECH AND SPEECH RECOGNITION

Authors: A. Alvarez, R. Martinez, V. Nieto, V. Rodellar and P. Gomez

Departamento de Arquitectura y Tecnologia de Sistemas Informaticos Universidad Politecnica de Madrid Campus de Montegancedo, s/n, 28660 Boadilla del Monte, Madrid, Spain Tel.: +34.1.336.73.84, Fax: +34.1.336.74.12, E-mail: pedro@pino.datsi.fi.upm.es

Volume 2 pages 653 - 656

ABSTRACT

In the present paper, a methodology to create Visual Representations of Speech for Speech Perception Enhancement Applications, based on the use of a Continuous Formant-Tracking Algorithm, is presented. The specific mathematical and computational issues introduced for such treatment are given, and a specific case for Computer-Aided Language Learning oriented to the Phonetic Specificities of English for Spanish Speakers is also presented. This specific technique may also be used in statistically normalizing Speech Data for Speech Recognition Systems. In this context, an example of a noise-robust Speech Recognizer, which uses Formant Dynamic Information, is shown.

A0261.pdf



A CALL SYSTEM USING SPEECH RECOGNITION TO TRAIN THE PRONUNCIATION OF JAPANESE LONG VOWELS, THE MORA NASAL AND MORA OBSTRUENTS

Authors: Goh Kawai and Keikichi Hirose

Department of Information and Communication Engineering The University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo 113 Japan E-mail: goh@kawai.com hirose@gavo.t.u-tokyo.ac.jp

Volume 2 pages 657 - 660

ABSTRACT

We developed a CALL (computer-aided language learning) system for teaching the pronunciation of Japanese long vowels, the mora nasal and mora obstruents to non-native speakers of Japanese. Long vowels and short vowels are spectrally almost identical but their phone durations differ significantly. Similar conditions exist between mora nasals and non-mora nasals, and between mora and non-mora obstruents. Our system uses speech recognition to measure the durations of each phone and compares them with distributions of native speakers while correcting for different speech rates. Results show that learners quickly capture the relevant duration cues. The amount of learning time spent on acquiring these durational skills is well within the time constraints of TJSL (teaching Japanese as a second language) curricula.
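The duration-scoring scheme described above can be sketched as follows. The rate-correction step (dividing each duration by the learner's average lengthening factor relative to native means) and the z-score comparison are plausible simplifications, not necessarily the paper's exact method:

```python
def rate_normalized_duration_score(durations, native_stats):
    """Compare speech-rate-corrected phone durations to native distributions.

    durations: {phone: measured duration in ms}
    native_stats: {phone: (native mean ms, native std ms)}
    Returns {phone: z-score after dividing out the learner's overall rate factor}.
    """
    # Overall rate factor: how much slower/faster the learner is than natives.
    rate = sum(durations[p] / native_stats[p][0] for p in durations) / len(durations)
    scores = {}
    for p, d in durations.items():
        mean, std = native_stats[p]
        scores[p] = (d / rate - mean) / std
    return scores

# Hypothetical native statistics for a short vowel, a long vowel, and a mora nasal.
native = {"a": (80.0, 10.0), "aa": (160.0, 20.0), "N": (90.0, 15.0)}
# A learner who speaks slower overall but makes the long vowel /aa/ too short.
learner = {"a": 100.0, "aa": 150.0, "N": 112.5}
scores = rate_normalized_duration_score(learner, native)
```

After rate correction, only the genuinely deviant phone (the too-short long vowel) stands out, which is exactly why correcting for speech rate matters before giving feedback.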

A0277.pdf



AN EDUCATIONAL AND EXPERIMENTAL WORKBENCH FOR VISUAL PROCESSING OF SPEECH DATA

Authors: Jan Nouza , Miroslav Holada , Daniel Hajek

SpeechLab, Dept. of Electronics and Signal Processing Technical University of Liberec, Halkova 5, 461 17 Liberec, Czech Republic Tel: +420-48-254 41/208, FAX: +420-48-510 71 26, E-mail: jan.nouza@vslib.cz

Volume 2 pages 661 - 664

ABSTRACT

This article focuses on the educational aspects of speech processing science. A set of tools that have been developed with the aim of presenting, visualizing and explaining basic topics of speech recognition is described. The set consists of programs, such as a signal analysis unit, a dynamic time warping (DTW) algorithm explorer and hidden Markov model (HMM) investigation tools, that are integrated into a single environment and allow for easy and highly illustrative learning through experiments with real speech data.

A0328.pdf



A 3 CHANNEL DIGITAL CVSD BIT-RATE CONVERSION SYSTEM USING A GENERAL PURPOSE DSP

Authors: Yong-Soo Choi (1), Hong-Goo Kang (2), Sung-Youn Kim (1), Young-Cheol Park (*), Dae-Hee Youn (1)

(1)ASSP Lab., Dept. of Electronic Eng., Yonsei University, Seoul 120-749, Korea E-mail: cando@caas.yonsei.ac.kr (2)AT&T-Labs Research, Murray Hill, NJ07974, USA *Samsung Biomedical Research Institute, Seoul Korea

Volume 2 pages 665 - 668

ABSTRACT

This paper presents a bit-rate conversion system for efficient communication between two CVSD systems with different bit-rates. To ensure robustness to external noise, the presented system is implemented in the digital domain using a general-purpose digital signal processor (DSP). In order to overcome the problems caused by different bit-rates and time constants, several methods are considered in this study. In addition, a significant simplification of the system complexity is obtained by introducing an IIR filter into the decimation/interpolation process. The IIR filter provides computational advantages over a conversion system employing FIR filters, because linear phase is not a critical issue in this application. By modifying the algorithm based on the IIR filter, a 3-channel full-duplex conversion algorithm was successfully implemented on a single DSP. Experimental results are presented to exhibit the consistent and reliable performance of the bit-rate conversion system.
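The IIR-based decimation structure can be sketched with a deliberately simple first-order low-pass as the anti-aliasing filter; a real converter would use a higher-order IIR design, but the filter-then-downsample structure, which an IIR makes cheap when linear phase is not required, is the same:

```python
import math

def one_pole_lowpass(x, cutoff, fs):
    """First-order IIR low-pass: y[n] = a*x[n] + (1-a)*y[n-1]."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    y, state = [], 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y

def decimate_by_2(x, fs):
    """Anti-alias with the IIR filter, then keep every other sample."""
    filtered = one_pole_lowpass(x, 0.2 * fs, fs)
    return filtered[::2]

fs = 32000                      # e.g. the sample rate of a higher-rate CVSD stream
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
y = decimate_by_2(x, fs)        # half-rate output, 128 samples
```

Interpolation is the mirror image: insert zeros (or repeat samples) to raise the rate, then apply the same kind of low-pass to smooth the result.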

A0337.pdf



SLIM Prosodic Module for Learning Activities in a Foreign Language

Authors: Rodolfo Delmonte, Mirela Petrea, Ciprian Bacalu

Università Ca' Foscari - Ca' Garzoni-Moro Laboratorio Linguistico Computazionale San Marco, 3417 - 30124 Venezia (Italy) Tel.:041-2578464/52/19 E-mail:delmont@unive.it WebSite:byron.cgm.unive.it

Volume 2 pages 669 - 672

ABSTRACT

The Prosodic Module of SLIM has been created in order to solve problems related to segmental and suprasegmental features of spoken English in a courseware for computer-assisted foreign language learning called SLIM - an acronym for Multimedia Interactive Linguistic Software - developed at the University of Venice. It is composed of two different sets of Learning Activities: the first deals with phonetic and prosodic problems at the word (segmental) level, the second with prosodic problems at the utterance (suprasegmental) level. The main goal of the Prosodic Activities is to provide feedback to the student intending to improve his/her pronunciation in a foreign language. The programme works by comparing two signals, the master's and the student's, where the master signal has previously been edited by a human tutor, who inserts orthographic syllabic information at segmentation marks automatically computed by the underlying acoustic segmenter called Prosodics (see [1]). When a student, after listening to and evaluating the master signal, tries to mimic the original utterance or word, the system assigns a score and, if needed, spots a mistake and indicates what it consists of. The elements of comparison are the acoustic correlates of prosodic features such as intonational contour, sentence accent and word stress, and rhythm and duration at word and sentence level.

A0425.pdf



BARGE-IN REVISED

Authors: B. Kaspar, K. Schuhmacher, S. Feldes

Deutsche Telekom Berkom GmbH Research Group Speech Processing D-64295 Darmstadt, Germany email {kaspar,schuhm,feldes}@tzd.telekom.de

Volume 2 pages 673 - 676

ABSTRACT

We consider speech dialogues allowing for simultaneous input (via speech recognition) and output (via speech synthesis or pre-recorded prompts), often referred to as "barge-in". We start with a collection of dialogue situations where simultaneous input and output is useful. It is argued that a variety of possible system behaviours is necessary in order to handle these situations adequately. We then define a formalism that allows this system behaviour to be controlled. We end by reporting some experience gathered both in lab tests and in a real-world pilot.

A0435.pdf



WaveEdit, An Interactive Speech Processing Environment for Microsoft Windows Platform

Authors: M. Akbar

Laboratoire de la Communication Langagière Interaction Personne Système Université Joseph Fourier, 38041 Grenoble cedex 9, France Tel. +33 4 76 51 45 26, FAX: +33 4 76 44 66 75, E-mail: Mohammad.Akbar@imag.fr

Volume 2 pages 677 - 680

ABSTRACT

This paper presents a new interactive speech processing environment designed for Microsoft Windows platforms. It is shown how an integrated speech processing environment was built following the Windows Interface Design Guidelines. The environment integrates many traditional time- and frequency-domain analysis algorithms as well as basic functions like recording, listening and labeling. Choosing the Component Object Model (COM) as the architectural framework ensures high maintainability, scripting capability and further expandability of the environment. Extensive use of the system in the laboratory has shown how this interactive environment improves users' performance in their everyday speech processing tasks.

A0566.pdf



SUBARASHII: JAPANESE INTERACTIVE SPOKEN LANGUAGE EDUCATION

Authors: Farzad Ehsani, Jared Bernstein, Amir Najmi, Ognjen Todic

Entropic Research Laboratory, Inc. 1040 Noel Dr., Menlo Park, CA 94025, USA Tel: 1-415-328-8877, FAX: 1-415-328-8866, E-mail: farzad@entropic.com

Volume 2 pages 681 - 684

ABSTRACT

Subarashii is a system that uses automatic speech recognition (ASR) to offer first-level, computer-based exercises in the Japanese language for beginning high school students. Building the Subarashii system has identified strengths and limitations of ASR technology and has led to some novel methods in the development of materials for computer-based interactive spoken language education.

A0570.pdf



Deploying Speech Applications over the Web

Authors: David Goddeau, William Goldenthal, and Chris Weikart

Digital Equipment Corporation Cambridge Research Laboratory 1 Kendall Square, Bldg 700 Cambridge, MA 02139 http://www.research.digital.com/CRL/ email: dg@crl.dec.com, thal@crl.dec.com, weikart@crl.dec.com

Volume 2 pages 685 - 688

ABSTRACT

At Digital Equipment Corporation's Cambridge Research Lab (CRL), the Speech Interaction Group has been focusing on building speech applications for deployment over the World-Wide Web. Web-based speech applications require the browser to capture and transmit speech to remote servers for back-end processing, maintain application state, and present multi-media responses. This paper describes the group's strategy for delivering speech applications built around a mechanism, the Digital Voice Plugin, for capturing and transmitting audio from a browser. It describes a conversational application implemented within this framework and discusses the problems of delivering these systems on the Web. In addition, we briefly touch upon some other Web-based speech applications that have been developed at CRL.

A0636.pdf



CSLUsh: AN EXTENDIBLE RESEARCH ENVIRONMENT

Authors: Johan Schalkwyk, Jacques de Villiers, Sarel van Vuuren and Pieter Vermeulen

Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, 20000 N.W. Walker Road, P.O. Box 91000, Portland, OR 97291-1000, USA

Volume 2 pages 689 - 692

ABSTRACT

The CSLU shell (CSLUsh) is a collection of modular building blocks which aim to provide the user with a powerful, extendible research, development and implementation environment. Implemented in C with standardized Tcl/Tk interfaces to provide a scripting and visualization environment, it allows a flexible cast for both research algorithms and system deployment. This shell is the architecture on which the CSLU Toolkit is built; the toolkit may be downloaded for non-commercial use from http://www.cse.ogi.edu/CSLU/toolkit.

A0680.pdf



A Flexible Client-Server Model for Multilingual CTS/TTS Development

Authors: Tibor Ferenczi*, Geza Nemeth*, Gabor Olaszy**, Zoltan Gaspar*

*Department of Telecommunications and Telematics, Technical University of Budapest, Hungary E-mail: ferenczi@ss20.ttt.bme.hu nemeth@ttt.bme.hu gaspar@ss20.ttt.bme.hu **Phonetics Laboratory, Linguistics Institute of the Hungarian Academy of Sciences, Budapest, Hungary E-mail: olaszy@ttt-202.ttt.bme.hu

Volume 2 pages 693 - 696

ABSTRACT

The efficiency of the development of CTS/TTS systems is influenced by the features and services of the software development tools used in the development process. A development system should be highly flexible, informative and user friendly to fulfil all or almost all the requirements the researcher could have. In this paper we present a development system, MVoxDev, that can provide an informative and flexible environment for the development of multilingual CTS/TTS systems. The development system helps the developer inspect and modify all the constituent parts of the CTS/TTS system as a client of the developed CTS/TTS system.

A0742.pdf



CRITICALLY SAMPLED PR FILTERBANKS OF NONUNIFORM RESOLUTION BASED ON BLOCK RECURSIVE FAMLET TRANSFORM

Authors: Unto K. Laine

Laboratory of Acoustics and Audio Signal Processing Helsinki University of Technology P.O. Box 3000, FIN-02015 Espoo, Finland Tel. +358 9 4512492, FAX: +358 9 460224, E-mail: Unto.Laine@hut.fi

Volume 2 pages 697 - 700

ABSTRACT

A new block recursive algorithm is introduced for effective FAMlet transform implementation. When the Fourier transform is combined with the algorithm, a nonuniform-resolution filterbank is created. The algorithm allows frequency resolutions of any type to be approximated, including the ERB-rate scale. The signals can be critically downsampled on a vector basis, which allows perfect reconstruction.

A0760.pdf



AUTOMATIC DETECTION OF ACCENT IN ENGLISH WORDS SPOKEN BY JAPANESE STUDENTS

Authors: Nobuaki MINEMATSU, Nariaki OHASHI, Seiichi NAKAGAWA

mine@tutics.tut.ac.jp ohashi@slp.tutics.tut.ac.jp nakagawa@tutics.tut.ac.jp Dept. of Information and Computer Sciences, Toyohashi Univ. of Tech., 1-1 Hibarigaoka, Tempaku-chou, Toyohashi-shi, Aichi-ken, 441 JAPAN Tel: +81-532-44-6767, FAX: +81-532-44-6757

Volume 2 pages 701 - 704

ABSTRACT

Acoustic realization of word accent differs among languages. While in Japanese it is fully represented by the F0 contour of a word, English word accent is characterized by power, duration, F0, vowel quality and so forth. In addition to the difference in syllable structure between the two languages, the difference in word accent makes it even more difficult for Japanese students to master correct pronunciation of English words. This indicates that the development of an automatic evaluation method for English word accent, as one of a set of English teaching tools, would be helpful especially to Japanese students. In this paper, as the first step towards this development, a method for detecting accent in English words spoken by Japanese is proposed, where syllable-size HMMs are built using positional information of the syllables and adequately detected syllable boundaries are used for the detection. Results of accent detection experiments show detection rates of 90% for Japanese students and 93% for native speakers.

A0780.pdf



AN ENGLISH CONVERSATION AND PRONUNCIATION CAI SYSTEM USING SPEECH RECOGNITION TECHNOLOGY

Authors: Yasuhiro Taniguchi, Allan A. Reyes, Hideyuki Suzuki and Seiichi Nakagawa

Toyohashi University of Technology Department of Information and Computer Sciences Tenpaku-cho, Toyohashi, 441, Japan Tel. +81 532 44 6777, FAX: +81 532 44 6777 E-mail: {taniguc1,nakagawa}@slp.tutics.tut.ac.jp

Volume 2 pages 705 - 708

ABSTRACT

This paper describes an English conversation and pronunciation CAI system using speech recognition techniques. The system was intended to recognize a user's utterances and to respond properly according to the recognition results. In the case of a learner with unskilled pronunciation, the speech recognition system cannot run normally because of differences in the phonemic system between the learner's mother tongue and the second language, so the system was improved to cope with such pronunciations. After this improvement, evaluation experiments were conducted. The results indicate that learners' ability in speaking and listening to English is improved by using the system.

A0781.pdf



BRINGING SPOKEN LANGUAGE SYSTEMS TO THE CLASSROOM

Authors: Sutton, S., Kaiser, E., Cronk, A. and Cole, R.

Center for Spoken Language Understanding Dept. Of Computer Science & Engineering Oregon Graduate Institute of Science and Technology PO Box 91000, Oregon 97291-1000, USA. sutton@cse.ogi.edu http://www.cse.ogi.edu/CSLU/

Volume 2 pages 709 - 712

ABSTRACT

Currently, there are few opportunities for people to learn about and experiment with the latest spoken language technology. Furthermore, most research and development activities are restricted to a handful of academic and industrial labs. In order to make the technology less exclusive, it must become more accessible to the general population. This is now feasible with the development of the CSLU Toolkit which combines easy-to-use authoring tools with state-of-the-art human language technology. In this paper, we focus on the educational role of the toolkit and describe how it is being used in several local schools.

A0890.pdf



AUTOMATIC ASSESSMENT OF FOREIGN SPEAKERS' PRONUNCIATION OF DUTCH

Authors: C. Cucchiarini and L. Boves

University of Nijmegen, Dept. of Language & Speech P.O. Box 9103, 6500 HD Nijmegen, The Netherlands tel: 31-24-3615785, fax: 31-24-3615939 {Cucchiarini, Boves}@let.kun.nl, http://lands.let.kun.nl/staff/

Volume 2 pages 713 - 716

ABSTRACT

The aim of the research reported on here is to develop a system for automatic assessment of foreign speakers' pronunciation of Dutch. In this paper similar studies carried out for English are first examined. Subsequently, suggestions are made for partly improving the methodology that is usually adopted in research on automatic pronunciation assessment. Finally, an experiment is presented in which automatic scores of telephone speech produced by native and nonnative speakers are compared with scores assigned by human raters. The approach used in this experiment is compared with those of previous studies.

A1129.pdf



USE OF LOW POWER EM RADAR SENSORS FOR SPEECH ARTICULATOR MEASUREMENTS

Authors: J.F. Holzrichter and G.C. Burnett

Lawrence Livermore National Laboratory, L-3 P.O. Box 808 Livermore, California 94550 email: holzrichter1@llnl.gov

Volume 2 pages 717 - 720

ABSTRACT

Very low power electromagnetic (EM) wave sensors are being used to measure speech articulator motions such as the vocal fold oscillations, jaw, tongue, and the soft palate. Data on vocal fold motions, which correlate well with established laboratory techniques, as well as data on the jaw, tongue and soft palate, are shown. The vocal fold measurements, together with a volume air flow model, are being used to perform pitch-synchronous estimates of the voiced transfer functions using ARMA techniques.

A1201.pdf



Real-time measurements of the vocal tract resonances during speech

Authors: Julien Epps, Annette Dowd, John Smith and Joe Wolfe

School of Physics The University of New South Wales Sydney 2052 Australia jepps@newt.phys.unsw.edu.au

Volume 2 pages 721 - 724

ABSTRACT

The formants of speech sounds are usually attributed to resonances of the vocal tract. Formant frequencies are usually estimated by inspection of spectrograms or by automated techniques such as linear prediction. In this paper we measure the frequencies of the first two resonances of the vocal tract directly, in real time, using acoustic impedance spectrometry. The vocal tract is excited by a carefully calibrated, broad band, acoustic current signal applied outside the lips while the subject is speaking. The sound pressure response is analysed to give the resonant frequencies. We compare this new method (Real-time Acoustic Vocal tract Excitation or RAVE) with linear prediction and we report the vocal tract resonances for eleven vowels of Australian English. We also report preliminary results of using feedback from vocal tract excitation as a speech trainer, and its effect on improving the pronunciation of foreign vowel sounds by monolingual anglophones.

A1582.pdf
