Session T3D Applications of Speech Technology I

Chairperson: Klaus Fellbaum, Univ. of Cottbus, Germany



WEBGALAXY – INTEGRATING SPOKEN LANGUAGE AND HYPERTEXT NAVIGATION

Authors: Raymond Lau, Giovanni Flammia, Christine Pao, and Victor Zue

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. http://www.sls.lcs.mit.edu, {raylau, flammia, pao, zue}@sls.lcs.mit.edu

Volume 2 pages 883 - 886

ABSTRACT

The growth in the quantity of information and services offered online has been phenomenal. Nevertheless, access mechanisms have remained relatively primitive, requiring users to point and click their way through a forest of Web links and to expend valuable cognitive capacity tracking the geography of the Web space. Conversational systems can provide an intuitive, flexible, multi-modal interface to online resources. The explosive growth of the World Wide Web, the continuing standardization of Web-related technologies, and the growing penetration of Internet access enable us to embed a very thin client inside a standard Web browser, making conversational interfaces available to a much wider audience. This paper presents WebGALAXY, a conversational spoken language system for access to selected online resources from within a typical browser. A thin Java-based client serves as the front end, with much of the speech and natural language processing occurring on remote servers.
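
As a rough illustration of the thin-client architecture the abstract describes, the Java sketch below ships captured audio to a remote recognition server over HTTP and reads back a text result. The endpoint URL, content type, and one-line response format are illustrative assumptions, not the actual WebGALAXY protocol.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal sketch of a thin speech client: the browser-side component
    // stays small while recognition and language processing run remotely.
    public class ThinSpeechClient {
        public static String recognize(byte[] audio) throws Exception {
            URL server = new URL("http://speech.example.org/recognize"); // hypothetical endpoint
            HttpURLConnection conn = (HttpURLConnection) server.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "audio/l16"); // raw 16-bit PCM, assumed
            try (OutputStream out = conn.getOutputStream()) {
                out.write(audio); // the heavy processing happens server-side
            }
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                return in.readLine(); // e.g. the recognized utterance
            }
        }
    }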

A0104.pdf



PITCH ESTIMATION OF SINGING FOR RE-SYNTHESIS AND MUSICAL TRANSCRIPTION

Authors: Michael J. Carey(1), Eluned S. Parris(1) and Graham D. Tattersall(2).

(1) Ensigma Ltd, Turing House, Station Road, Chepstow, Monmouthshire, NP6 5PB, U.K. (2) Snape Signals Research, New House, Friston, Saxmundham, Suffolk IP17 1PH, U.K. {michael,eluned}@ensigma.com, gdt@sys.uea.ac.uk

Volume 2 pages 887 - 890

ABSTRACT

This paper describes an algorithm that allows singing to be analysed in real time on a PC and then re-synthesised by the computer using whistled notes. The singing can also be transcribed as a series of notes on a musical stave, using a MIDI file as the interface. Pitch, amplitude, and spectral-change parameters are derived from the input waveform, and a sequence of musical notes is then derived from these parameters using a set of rules. The system is designed as an entertaining yet educational tool for children and will be embodied in an interactive multimedia system. In its electronic form the paper has attached files demonstrating the results of the re-synthesis algorithm.
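
The paper's note-derivation rules are not reproduced here, but transcription to a MIDI file ultimately rests on the standard mapping from estimated fundamental frequency to MIDI note numbers (A4 = 440 Hz = MIDI note 69). A minimal Java sketch of that mapping, offered as background rather than as the authors' algorithm:

    // Map an estimated fundamental frequency (Hz) to the nearest MIDI
    // note number: 12 semitones per octave on a logarithmic pitch scale.
    public class PitchToMidi {
        public static int midiNote(double f0Hz) {
            return (int) Math.round(69.0 + 12.0 * Math.log(f0Hz / 440.0) / Math.log(2.0));
        }

        public static void main(String[] args) {
            System.out.println(midiNote(261.63)); // middle C -> 60
        }
    }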

A0259.pdf

Recordings



Automated Lip Synchronisation for Human-Computer Interaction and Special Effect Animation

Authors: Christian Martyn Jones and Satnam Singh Dlay

Department of Electrical and Electronic Engineering, Merz Court, University of Newcastle, Newcastle upon Tyne, NE1 7RU, United Kingdom. Tel: +44 (0) 191 222 7340, FAX: +44 (0) 191 222 8180, E-mail: c.m.jones@newcastle.ac.uk & s.s.dlay@newcastle.ac.uk

Volume 2 pages 891 - 894

ABSTRACT

This research presents MARTI (Man-machine Animation Real-Time Interface) for the realisation of automated special-effect animation and human-computer interaction. Future developments in the Internet, video communications, multimedia, virtual reality, and animation will rely on a natural human-machine interface that immerses people, irrespective of technical know-how, in the latest technology, and allows them to interact with computers and one another using their own personality and idiosyncrasies. MARTI combines novel research in a number of engineering fields to realise the first natural interface and animation system capable of high performance for real users and real-world applications.

A0263.pdf



DEVELOPING WEB-BASED SPEECH APPLICATIONS

Authors: Charles T. Hemphill and Yeshwant K. Muthusamy

Media Technologies Laboratory, Texas Instruments, 8330 LBJ Freeway, MS 8374, Dallas, Texas 75243, USA. Tel: 972-997-6396, FAX: 972-997-5786, E-mail: hemphill@csc.ti.com

Volume 2 pages 895 - 898

ABSTRACT

We have developed a speech interface to the Web that allows easy access to information and an approach to intelligent user agents. The mechanisms developed apply to other multimedia applications where speech can serve as an input modality. We describe the benefits of our recognition system to speech-application developers: (1) Developers need not know about speech; in the simplest case, they simply define HTML links. (2) Developers need not worry about word pronunciations, since the system provides these, and they may specify grammars in a simple BNF syntax which the system automatically converts for use by the recognizer. (3) Developers with programming skills may use a Web server or the Java programming language to produce more sophisticated speech interfaces. (4) Developers reap the benefits of portability through general HTML browsers and languages such as Java, which also simplifies the development of portable graphical interfaces coupled with speech input.
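
To give a flavour of point (2), a developer-supplied grammar might look like the Java snippet below. The rule syntax and vocabulary are illustrative assumptions, not TI's actual grammar format; the paper's system would convert such a grammar into the recognizer's internal representation automatically.

    // Hypothetical developer-written BNF grammar handed to the recognizer
    // as plain text; pronunciations for the words are supplied by the system.
    public class WeatherGrammar {
        static final String BNF =
            "<query>   ::= <request> weather in <city>\n" +
            "<request> ::= show me the | what is the\n" +
            "<city>    ::= dallas | boston | seattle\n";
    }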

A0990.pdf



AUTOMATIC POST-SYNCHRONIZATION OF SPEECH UTTERANCES

Authors: Werner VERHELST

Vrije Universiteit Brussel, Faculty of Applied Science, Dept. of Electronics and Signal Processing (ETRO), Pleinlaan 2, B-1050 Brussels, Belgium. E-mail: wverhels@vnet3.vub.ac.be

Volume 2 pages 899 - 902

ABSTRACT

The paper considers a prototype for automatic post-synchronization that consists of two basic components. In a first step, dynamic time warping is applied to compute the time-correspondence between an original utterance and an utterance that serves as the timing reference signal. In a second step, a time-scaling algorithm modifies the time structure of the original utterance accordingly. Informal diagnostic evaluation has shown that good results are obtained when the similarity between the acoustic-phonetic contents of the utterances is high. Possible ways of improving robustness against acoustic-phonetic differences, such as those that result from different coarticulation, are suggested.
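
The first step admits a compact illustration. The Java sketch below computes the cumulative dynamic-time-warping cost between two sequences of feature frames; backtracking through the cost matrix would recover the warping path that drives the time-scaling step. The Euclidean frame distance is an assumption, since the abstract does not specify the local distance measure used.

    import java.util.Arrays;

    // Sketch of dynamic time warping over two frame sequences
    // (one double[] of features per frame).
    public class Dtw {
        public static double alignmentCost(double[][] ref, double[][] test) {
            int n = ref.length, m = test.length;
            double[][] d = new double[n + 1][m + 1];
            for (double[] row : d) Arrays.fill(row, Double.POSITIVE_INFINITY);
            d[0][0] = 0.0;
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= m; j++) {
                    double cost = dist(ref[i - 1], test[j - 1]);
                    d[i][j] = cost + Math.min(d[i - 1][j - 1],   // frames matched
                                     Math.min(d[i - 1][j],       // test frame held
                                              d[i][j - 1]));     // test frame skipped
                }
            }
            return d[n][m]; // cost of the best global alignment
        }

        private static double dist(double[] a, double[] b) {
            double s = 0.0;
            for (int k = 0; k < a.length; k++) s += (a[k] - b[k]) * (a[k] - b[k]);
            return Math.sqrt(s);
        }
    }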

A1083.pdf



AUTOMATIC GENERATION OF HYPERLINKS BETWEEN AUDIO AND TRANSCRIPT

Authors: J. Robert-Ribes (1),(2),(3) and R.G. Mukhtar(1),(3)

(1) Advanced Computational Systems CRC, (2) Computer Science Lab, Australian National Univ., (3) C.S.I.R.O. Mathematical and Information Sciences, Locked Bag 17, North Ryde NSW 2113, Australia. FAX: +61 2 9325 3200, E-mail: Jordi.Robert-Ribes@cmis.csiro.au

Volume 2 pages 903 - 906

ABSTRACT

We present a prototype that generates hyperlinks between an audio recording and its corresponding transcript. The main issue in generating such hyperlinks is determining common time points in the transcript and the audio, a process also known as alignment. The system is speaker-independent and can deal with inexact transcripts. It combines individually inaccurate modules in such a way that the final results are highly satisfactory.
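
Once alignment has attached a time offset to each transcript word, emitting the hyperlinks themselves is mechanical. A hedged Java sketch follows; the #t= media-fragment notation is a modern illustrative stand-in, not necessarily the link format the prototype actually produced.

    // Wrap an aligned transcript word in a hyperlink into the audio file.
    public class TranscriptLinker {
        public static String link(String word, double seconds) {
            // Locale.US keeps the decimal point URL-safe regardless of platform locale.
            return String.format(java.util.Locale.US,
                    "<a href=\"speech.wav#t=%.2f\">%s</a>", seconds, word);
        }

        public static void main(String[] args) {
            System.out.println(link("budget", 12.4));
            // prints: <a href="speech.wav#t=12.40">budget</a>
        }
    }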

A1238.pdf
