ABSTRACT
The growth in the quantity of information and services offered online has been phenomenal. Nevertheless, access mechanisms have remained relatively primitive, requiring users primarily to point and click their way through a forest of Web links and to expend valuable cognitive capacity tracking the geography of the Web space. Conversational systems can provide an intuitive, flexible, multi-modal interface to online resources. The explosive growth of the World Wide Web, the continuing standardization of Web-related technologies, and the growing penetration of Internet access enable us to embed a very thin client inside a standard Web browser, making conversational interfaces available to a much wider audience. This paper presents WebGALAXY, a conversational spoken language system for access to selected online resources from within a typical browser. A thin Java-based client is employed as the front end, with much of the speech and natural language processing occurring on remote servers.
ABSTRACT
This paper describes an algorithm which allows singing to be analysed in real time using a PC and then re-synthesised by the computer using whistled notes. The singing can also be transcribed as a series of notes on a musical stave, using a MIDI file as the interface. Pitch, amplitude, and spectral change parameters are derived from the input waveform, and a sequence of musical notes is then derived from these parameters using a set of rules. The system is designed as an entertaining yet educational tool for children, and will be embodied in an interactive multi-media system. In its electronic form, the paper has attached files demonstrating the results of the re-synthesis algorithm.
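The analysis and resynthesis chain summarised above (pitch estimation from the waveform, transcription to notes, whistled resynthesis) can be illustrated in outline. The abstract does not specify the actual algorithms, so the sketch below is an assumption throughout: it uses autocorrelation pitch estimation, the standard MIDI note-number mapping, and a pure sine wave as the "whistle".

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame of singing by
    autocorrelation (illustrative; not the paper's actual tracker)."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(fs / fmax)                      # shortest lag considered
    hi = min(int(fs / fmin), len(ac) - 1)    # longest lag considered
    lag = lo + int(np.argmax(ac[lo:hi]))     # lag of the strongest peak
    return fs / lag

def midi_note(f0):
    """Map a frequency in Hz to the nearest MIDI note number
    (standard 12-tone equal temperament, A4 = 440 Hz = note 69)."""
    return int(round(69 + 12 * np.log2(f0 / 440.0)))

def whistle(f0, duration, fs):
    """Re-synthesise a note as a pure sine 'whistle' at pitch f0."""
    t = np.arange(int(duration * fs)) / fs
    return np.sin(2 * np.pi * f0 * t)
```

A full system would, as the abstract notes, also track amplitude and spectral change to segment the waveform into notes before mapping each segment through `midi_note`.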
ABSTRACT
This research presents MARTI (Man-Machine Animation Real-Time Interface) for the realisation of automated special-effect animation and human-computer interaction. Future developments in the Internet, video communications, multi-media, virtual reality, and animation will rely on the derivation of a natural human-machine interface that immerses people in the latest technology, irrespective of technical know-how, and allows them to interact with computers and with one another using their own personality and idiosyncrasies. MARTI introduces novel research in a number of engineering fields to realise the first natural interface and animation system capable of high performance for real users and real-world applications.
ABSTRACT
We have developed a speech interface to the Web that allows easy access to information and an approach to intelligent user agents. The mechanisms developed apply to other multimedia applications where speech can serve as an input modality. We describe the benefits of our recognition system to speech-application developers: (1) Developers need not know about speech; in the simplest case, developers simply define HTML links. (2) Developers need not worry about word pronunciations, since the system provides these. Developers may specify grammars in a simple BNF syntax and the system automatically converts these for use by the recognizer. (3) Developers with programming skills may use a Web server or the Java programming language to easily produce more sophisticated speech interfaces. (4) Developers reap the benefits of portability through general HTML browsers and languages such as Java. Java also simplifies the development of portable graphical interfaces that couple with speech input.
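As an illustration of point (2), a simple BNF-style grammar can be mechanically expanded into the set of phrases a recognizer should accept. The grammar syntax, rule names, and the `expand` helper below are hypothetical; the abstract does not specify the system's actual grammar format or conversion step.

```python
def expand(grammar, symbol="<command>"):
    """Enumerate every phrase accepted by a toy BNF-style grammar.
    grammar maps a nonterminal to a list of alternatives; each
    alternative is a space-separated mix of words and nonterminals."""
    def expand_sym(sym):
        phrases = []
        for alt in grammar[sym]:
            seqs = [[]]  # partial phrases built token by token
            for tok in alt.split():
                # Recurse on nonterminals, keep literal words as-is.
                sub = expand_sym(tok) if tok in grammar else [tok]
                seqs = [s + [w] for s in seqs for w in sub]
            phrases.extend(" ".join(s) for s in seqs)
        return phrases
    return expand_sym(symbol)

# A hypothetical two-rule grammar for a browsing command.
grammar = {
    "<command>": ["show me <topic>", "go to <topic>"],
    "<topic>": ["the weather", "sports"],
}
```

Calling `expand(grammar)` yields the four phrases the recognizer would need to accept; a real converter would instead emit the recognizer's native grammar format rather than enumerating phrases.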
ABSTRACT
The paper considers a prototype for automatic post-synchronization that consists of two basic components. As a first step, dynamic time warping is applied to compute the time-correspondence between an original utterance and an utterance that serves as the timing reference signal. In a second step, a time-scaling algorithm modifies the time structure of the original utterance accordingly. Informal diagnostic evaluation has shown that good results are obtained if the similarity between the acoustic-phonetic contents of the utterances is high. Possible ways of improving robustness against acoustic-phonetic differences, such as those that result from different coarticulation, are suggested.
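The first component, computing a time-correspondence between the two utterances by dynamic time warping, can be sketched as follows. The frame features, local distance, and step pattern are assumptions (the abstract does not specify them); a standard DTW with diagonal, vertical, and horizontal steps over per-frame feature vectors is shown.

```python
import numpy as np

def dtw_path(ref, orig):
    """Align two feature sequences by dynamic time warping.

    ref  : (N, D) array of frames from the timing-reference utterance
    orig : (M, D) array of frames from the original utterance
    Returns a list of (i, j) pairs mapping reference frames to
    original frames; a time-scaling stage could then stretch or
    compress the original along this path.
    """
    N, M = len(ref), len(orig)
    # Local distance: Euclidean distance between frame vectors.
    dist = np.linalg.norm(ref[:, None, :] - orig[None, :, :], axis=2)

    # Accumulated cost with diagonal, vertical, horizontal steps.
    acc = np.full((N, M), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(N):
        for j in range(M):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
            )
            acc[i, j] = dist[i, j] + best

    # Backtrack from the end of both utterances to the start.
    i, j = N - 1, M - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            steps.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i, j))
    path.reverse()
    return path
```

For identical utterances the path is the diagonal; where the original lags the reference, the path stays on one original frame across several reference frames, which is exactly the information the second, time-scaling component consumes.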
ABSTRACT
We present a prototype that enables the generation of hyperlinks between audio and the corresponding transcript. The main issue in generating such hyperlinks is determining common time points in the transcript and the audio (this is also called aligning). The system is speaker independent and can deal with inexact transcripts. It combines inaccurate modules in such a way that the final results are extremely satisfactory.