Auditory and instrumental analysis of cries from normally hearing and profoundly hearing-impaired infants (2-11 months) is presented. Results from listening experiments suggest that differences exist between cries from the two infant groups. The attributes expressing the difference are related to the emotional state of the infant, to prosodic features, and to voice quality. Signal analysis of the cries confirms these findings, showing statistically significant differences for spectral parameters and for those describing the melody contour of the cries. The usability of neural networks for automatic classification and discrimination of cries is discussed. If the tendencies shown here hold true for other data sets, the findings can be used to develop a new screening method for detecting hearing impairment and auditory perception disorders at a very early age.
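As a rough illustration of how such an automatic classification might be set up, the sketch below extracts one spectral parameter (spectral centroid) and one melody-contour parameter (F0 slope) per cry and trains a small neural network. The feature choices, sampling rate, and network size are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not the authors' system): one spectral cue and one
# melody-contour cue per cry, classified by a small neural network.
import numpy as np
from sklearn.neural_network import MLPClassifier

def cry_features(signal, sr=8000, frame=256, hop=128):
    """Per-cry features: mean spectral centroid and slope of the F0 contour."""
    centroids, f0s = [], []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame] * np.hanning(frame)
        spec = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(frame, 1.0 / sr)
        if spec.sum() > 0:
            centroids.append((freqs * spec).sum() / spec.sum())
        # crude F0 via autocorrelation peak in a 200-600 Hz infant-cry range
        ac = np.correlate(x, x, mode="full")[frame - 1:]
        lo, hi = int(sr / 600), int(sr / 200)
        f0s.append(sr / (lo + np.argmax(ac[lo:hi])))
    slope = np.polyfit(np.arange(len(f0s)), f0s, 1)[0] if len(f0s) > 1 else 0.0
    return [np.mean(centroids), slope]

# Toy training run on synthetic data standing in for labelled cries
rng = np.random.default_rng(0)
X = [cry_features(rng.standard_normal(4000)) for _ in range(20)]
y = rng.integers(0, 2, 20)          # 0 = normal-hearing, 1 = hearing-impaired
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000).fit(X, y)
```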
A0069.pdf
Traditional speech training methods can prove cumbersome because of the difficulty of providing the subject with good feedback, maintaining his or her motivation over long periods, and stabilising the improvement in articulation. In this work we provide visual feedback based on displaying a trajectory in a two-dimensional 'phonetic space'. Data from a small-scale efficacy study illustrate the use of OLT in speech therapy for misarticulated sibilant fricatives. Results for the contrasting articulations are compared and the potential of OLT as a therapeutic technique is discussed.
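The abstract does not specify how the 2-D phonetic space is computed, so the sketch below uses a PCA projection of short-time log spectra purely as a stand-in to show the trajectory-display idea.

```python
# Hedged sketch of a 2-D "phonetic space" display: project short-time
# spectra onto two dimensions and draw the trajectory. PCA is only a
# stand-in for whatever mapping OLT actually uses.
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def trajectory(signal, frame=256, hop=128):
    frames = [np.abs(np.fft.rfft(signal[i:i + frame] * np.hanning(frame)))
              for i in range(0, len(signal) - frame, hop)]
    return PCA(n_components=2).fit_transform(np.log1p(np.array(frames)))

sig = np.random.default_rng(1).standard_normal(8000)   # stand-in utterance
xy = trajectory(sig)
plt.plot(xy[:, 0], xy[:, 1], marker=".")               # visual feedback path
plt.xlabel("dim 1"); plt.ylabel("dim 2"); plt.show()
```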
A0144.pdf
To retrieve a Chinese word from a Chinese dictionary, the user needs to know exactly the first character of the desired word. Because there are more than 10,000 Chinese characters, this makes a Chinese dictionary relatively difficult to use. To alleviate this problem, this paper presents intelligent retrieval techniques for very large Chinese dictionaries with speech queries. The proposed techniques properly integrate the technologies of Mandarin speech recognition and Chinese information retrieval in a syllable-based approach that utilizes the mono-syllabic structure of the language. Moreover, it is very desirable to provide the function of retrieving all relevant word entries from the dictionaries using speech queries describing "general concepts" of the desired words. To achieve this challenging function, techniques of relevance feedback are also included. Based on these techniques, a retrieval system was implemented successfully on a Pentium PC for a very large Chinese dictionary that includes 160,000 word entries, with the total length of the lexical information under the word entries exceeding 20,000,000 words.
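A minimal sketch of the syllable-based lookup idea, with a hypothetical two-entry dictionary: each word is indexed by its syllable sequence, so a spoken query can be matched without knowing the written form of the first character. Recognition ambiguity and relevance feedback, central to the real system, are omitted here.

```python
# Illustrative syllable-based indexing (hypothetical data, not the paper's
# lexicon): word entries keyed by their Mandarin syllable sequences.
from collections import defaultdict

dictionary = {"電腦": ["dian4", "nao3"], "電話": ["dian4", "hua4"]}
index = defaultdict(list)
for word, syllables in dictionary.items():
    index[tuple(syllables)].append(word)

def retrieve(recognized_syllables):
    """Look up word entries by a recognized syllable sequence."""
    return index.get(tuple(recognized_syllables), [])

print(retrieve(["dian4", "nao3"]))   # -> ['電腦']
```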
A0196.pdf
The EURESCOM P502 project, Multilingual Interactive Voice Activated (MIVA) telephone services, launched in 1995 for a three-year term, aimed at designing and experimenting with an automatic multilingual telephone assistant for people on the move, providing them with instructions about the use of the most important telephone services in the country in which they are traveling. The core information provided by the system covers emergency services, international and national calls, and card calls. Six European telecom research laboratories were involved in the project: CNET, the project leader; British Telecom, Deutsche Telekom, KPN, Portugal Telecom and CSELT. The final prototype includes a language selection module and a menu-driven procedure, using a common structure of the information contents in all languages. Several factors are currently being investigated, such as the impact of a talk-through capability, the effect of the cellular network, as well as of the usage of different national networks, on ASR performance, and the optimization of the dialogue strategy at the system interface level. The prototypes are in the process of being tested within the individual national research units, and cross-country tests will follow. As a further benefit, the potential savings that can be obtained by sharing the costs of development of ASR-based multilingual telephone services will be estimated subsequently. A final field trial of the national implementations of the systems will be carried out starting in October '97 for a thorough evaluation of the multilingual services.
A0197.pdf
This paper presents the assessment of a dialogue system which is used daily by a blind telephone switchboard operator. The purpose of the system is to provide this operator with some information about company members called by external correspondents (phone extensions, department, etc.). In order to realize this system, we had to solve two main problems: on the one hand, the recognition of confusable letters (like P and T), and on the other hand, access to a database with a name that may be misspelled. In the paper, we present the assessment which allowed us not only to measure the system's performance but also to validate the two methods designed for this project.
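One common way to handle the misspelled-name problem (not necessarily the method designed in this project) is to rank database entries by edit distance to the spelled-out query, as in this sketch with invented names and extensions.

```python
# Edit-distance name lookup: a standard approach to tolerant database
# access with possibly misspelled names (entries below are hypothetical).
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

members = {"DUPONT": "ext. 2314", "DURAND": "ext. 2200"}  # hypothetical
query = "DUPOND"                                          # misspelled input
best = min(members, key=lambda name: edit_distance(query, name))
print(best, members[best])                                # -> DUPONT ext. 2314
```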
A0223.pdf
This work presents STACC, the Sistema Telefónico Automático de Consulta de Calificaciones (Automatic Telephone System for Consulting Marks). This system was developed at our laboratory during 1996 and implements a service over the telephone line that allows students to consult their marks by speech after the exams by means of a simple phone call. This experience provided us with an interesting point of view on the problems of real applications of speech technology. In this work we describe the system and present some statistics about the use of STACC by the students.
A0228.pdf
We present a preliminary version of a voice dialogue system suitable for dealing with client orders and questions in fast-food restaurants. The system consists of two main sub-systems, namely a dialogue sub-system and a voice interface. The dialogue sub-system is a natural language processing system that may be considered a rule-based expert system, whose behaviour is derived from a dialogue corpus recorded at a real restaurant. In this paper we present a general description of both sub-systems, and focus on the knowledge representation, grammar, and module structure of the dialogue sub-system. The natural language generation mechanism used is introduced, and future work is mentioned. Finally, some conclusions are drawn.
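A toy illustration of the rule-based order-taking idea follows; the rules and menu items are invented, whereas the actual system's behaviour is derived from the recorded restaurant corpus.

```python
# Hypothetical order-taking rules for a fast-food dialogue sub-system.
ORDER_ITEMS = {"burger", "fries", "cola"}

def respond(utterance, order):
    words = set(utterance.lower().split())
    found = words & ORDER_ITEMS
    if found:
        order.extend(sorted(found))
        return f"Added {', '.join(sorted(found))}. Anything else?"
    if "no" in words or "that's all" in utterance.lower():
        return f"Your order: {', '.join(order)}. Please drive through."
    return "Sorry, could you repeat that?"

order = []
print(respond("A burger and fries please", order))
print(respond("No, that's all", order))
```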
A0230.pdf
A scheme for binaural pre-processing of speech signals for input to a standard linear hearing aid has been proposed. The system is based on that of Toner & Campbell [1], who applied the Least Mean Squares (LMS) algorithm in sub-bands to speech signals from various acoustic environments and signal-to-noise ratios (SNRs). The processing scheme attempts to take advantage of the multiple inputs to perform noise cancellation. The use of sub-bands enables a diverse processing mechanism to be employed, where the wide-band signal is split into smaller frequency-limited sub-bands, which can subsequently be processed according to their signal characteristics. The results of a series of intelligibility tests are presented from experiments in which acoustic speech and noise data, generated in a simulated room, were presented to hearing-impaired volunteers.
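The core of such a scheme is the LMS adaptive filter; the sketch below shows a single-band two-input noise canceller (in the scheme described, this would run independently in each sub-band after a filterbank split). Filter length and step size are illustrative choices.

```python
# LMS adaptive noise cancellation, single band shown for clarity.
import numpy as np

def lms_cancel(primary, reference, taps=32, mu=0.01):
    """primary: speech+noise mic; reference: noise-dominated second mic."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]      # reference signal history
        y = w @ x                            # noise estimate
        e = primary[n] - y                   # error = enhanced speech sample
        w += 2 * mu * e * x                  # LMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(2)
noise = rng.standard_normal(8000)
speech = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # stand-in speech
enhanced = lms_cancel(speech + 0.5 * noise, noise)
```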
A0257.pdf
The development of technical communication devices for hearing-impaired or deaf persons is one of the main topics in current research on aids for the disabled and elderly. Besides the well-known advances in the construction of analogue and digital hearing aids for the hard-of-hearing, there is also a long tradition of research on tactile speech aids. From the beginning of this century, many attempts have been made to resolve the problem of speech substitution for deaf people and those suffering from severe hearing loss. A series of experiments was carried out to determine a robust coding method for tactile transmission of F0 and stress to support speech perception by deaf or severely hearing-impaired persons unable to extract suprasegmental speech features by the auditory sense.
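As an illustration of what a coding method for tactile F0 transmission might look like, the sketch below quantizes an F0 contour into a few discrete vibrotactile levels on a log scale; the level count and frequency range are assumptions, not the coding the experiments settled on.

```python
# Hypothetical 4-level vibrotactile coding of an F0 contour.
import numpy as np

def f0_to_tactile_levels(f0_contour, n_levels=4, fmin=75.0, fmax=300.0):
    f0 = np.clip(np.asarray(f0_contour, dtype=float), fmin, fmax)
    # log-scale quantization, roughly matching pitch perception
    pos = (np.log(f0) - np.log(fmin)) / (np.log(fmax) - np.log(fmin))
    return np.minimum((pos * n_levels).astype(int), n_levels - 1)

print(f0_to_tactile_levels([90, 120, 180, 260]))   # -> [0 1 2 3]
```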
A0379.pdf
Students of speech therapy and audiology are often faced with the difficulty
of obtaining a realistic view of the speech reception abilities of the hearing
impaired. Illustrating speech recognition defects is a demanding task for
the teaching staff, too. To improve the students' awareness of the effects
of hearing defects, an interactive multimedia programme allowing simulation
of various types of hearing impairment was constructed and placed on an
Internet media server for online use. The simulated hearing-impaired speech
material was produced using digital signal processing (e.g. mixing and
filtering of speech and noise) and multimedia and audio technologies which
enable streaming of sound files on the Internet. In the programme, word
recognition scores for several degrees and types of hearing impairment
in varying conditions of background noise and reverberation time can also
be computed.
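The signal-processing operations mentioned (filtering and mixing of speech and noise) can be sketched as follows; the cutoff frequency and SNR are illustrative, not the programme's settings.

```python
# Sketch of simulating a high-frequency hearing loss: low-pass filter the
# speech and mix in noise at a chosen SNR (parameters are assumptions).
import numpy as np
from scipy.signal import butter, lfilter

def simulate_loss(speech, sr=16000, cutoff=1000.0, snr_db=10.0):
    b, a = butter(4, cutoff / (sr / 2))          # 4th-order low-pass
    filtered = lfilter(b, a, speech)
    noise = np.random.default_rng(3).standard_normal(len(speech))
    noise *= np.std(filtered) / (np.std(noise) * 10 ** (snr_db / 20))
    return filtered + noise

tone = np.sin(2 * np.pi * 3000 * np.arange(16000) / 16000)  # stand-in speech
degraded = simulate_loss(tone)
```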
A0548.pdf
This article presents a preliminary study of dysarthric speech by means
of formant-to-area mapping. Dysarthria is a speech impairment which is
the result of paralysis or ataxia of the speech muscles. Formant-to-area
mapping is the inference of the shape of a tract model via observed formant
frequencies. The corpus is composed of vowel-vowel sequences [iaia] produced
by speakers suffering from amyotrophic lateral sclerosis (ALS) and normal
speakers. The results show that the shapes and movements of the acoustically
mapped area function models are typical of the motions and postures of
the vocal tracts of ALS speakers.
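One classical route for formant-to-area mapping is the lossless-tube (Wakita-style) inversion sketched below: build an LPC polynomial from assumed formant frequencies and bandwidths, step down to reflection coefficients, and convert those to tube-section areas. This is a standard textbook mapping, not necessarily the procedure used in the paper, and the example formants are an [a]-like guess.

```python
# Lossless-tube inversion from formants to an area function.
import numpy as np

def formants_to_areas(formants, bandwidths, sr=10000, lip_area=1.0):
    # All-pole LPC polynomial: one conjugate pole pair per formant
    poly = np.array([1.0])
    for f, b in zip(formants, bandwidths):
        r = np.exp(-np.pi * b / sr)          # pole radius from bandwidth
        theta = 2 * np.pi * f / sr           # pole angle from frequency
        poly = np.convolve(poly, [1.0, -2 * r * np.cos(theta), r * r])
    a = poly.copy()
    p = len(a) - 1
    areas = [lip_area]
    for m in range(p, 0, -1):                # step-down (backward Levinson)
        k = a[m]                             # reflection coefficient
        areas.append(areas[-1] * (1 + k) / (1 - k))
        a = (a[:m + 1] - k * a[:m + 1][::-1]) / (1 - k * k)
    return np.array(areas)                   # sections from lips to glottis

print(formants_to_areas([700, 1200, 2600], [60, 90, 120]))
```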
A0645.pdf
The computer system described in this paper answers incoming telephone
calls and employs speaker-independent speech recognition to identify callers.
The users of the system can define caller-specific treatment and change
this treatment using a graphical user interface. Apart from relaying and receiving spoken messages, the system also offers advanced telephony features such as paging and call forwarding, provided the required subscription services are available from the telephone company. Standard interfaces
to the telephony and audio hardware are used, so that the system runs on
a desktop PC equipped with a voice-enabled modem.
A0676.pdf
In this work we describe the development and evaluation of SpeeData, a prototype for multilingual spoken data-entry. The SpeeData project aims at developing a demonstrator that provides a user-friendly interface for spoken data-entry in two languages: Italian and German. A real-world application domain is considered, which is the Land Register of an Italian region in which both languages are officially spoken. Original topics of this paper are the interaction modality for spoken data-entry, the evaluation of a data-entry system, bilingual speech recognition, and bilingual speaker adaptation.
A0713.pdf
It has been hypothesized that the Finnish language is well suited to speech-to-text conversion for the communication aids of the hearing impaired. In a related study it was shown that, depending on context, 10 to 20% of phoneme errors can be tolerated with good comprehension when reading text converted from raw phonemic recognition. Two sets of phoneme recognition experiments were carried out in this study in order to evaluate the performance of existing speech recognition systems in this application. For telephone-bandwidth speech, both systems showed speaker-dependent error scores of about 10% or below, thus supporting the feasibility of the application. For speaker-independent cases the error rate was typically more than 20%, which is too high for effortless and fluent communication.
A0724.pdf
This study simulates the phoneme errors made by speech recognizers and
determines the phoneme error level at which a reasonable comprehension
of text can still be achieved. Finnish is written almost phonemically and
Finnish-speakers have no trouble in comprehending phonemic text. Phonemically
corrupted text was presented to normal-hearing, hearing-impaired, and deaf subjects, and their comprehension levels were measured. According
to this study, current speech recognition methods allow for limited applications
in this field.
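A sketch of the corruption procedure described: substitute, delete, or insert phonemes at a controlled error rate. Since Finnish orthography is close to phonemic, characters stand in for phonemes here; the phone inventory and example sentence are illustrative.

```python
# Simulating recognizer phoneme errors on phonemic (Finnish-like) text.
import random

PHONES = "aeiouyäölmnprstvkh"

def corrupt(text, error_rate, rng=random.Random(4)):
    out = []
    for ch in text:
        if ch != " " and rng.random() < error_rate:
            op = rng.choice(["sub", "del", "ins"])
            if op == "sub":
                out.append(rng.choice(PHONES))
            elif op == "ins":
                out.extend([ch, rng.choice(PHONES)])
            # "del": append nothing, dropping the phoneme
        else:
            out.append(ch)
    return "".join(out)

print(corrupt("puhetta ymmärretään vielä hyvin", 0.15))  # ~15% phoneme errors
```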
A0727.pdf
This paper describes the results of a highly sophisticated speech application project. ACCeSS is an EU project with Greek and German partners. Our Greek partners are Knowledge S.A. and the University of Patras. In this paper we report on the German application of the project. The project addresses a first step towards the automation of call centers for personnel-intensive applications in the insurance industry. New forms of insurance operation increasingly use the telephone or direct mailing for contact between an insurer and its customers. This makes the business more efficient and more direct than the classical operation with agents. Routine contractual details can easily be handled in this way, and all other kinds of insurance actions can be realised with such communication media, too. A direct insurance company running a large call center is involved in the project and pays attention to the functionality of the system. The user needs are analysed using Wizard-of-Oz experiments. The system structure, a sketch of the algorithms and modules, and first results of the evaluation are given. Keywords: speech dialogue, Call Center, Wizard-of-Oz, dialogue strategy, semantics, ACCeSS
A0819.pdf
We incorporated a simulated military radio into a spoken language interface
to a distributed simulation environment for military commander training.
The resulting architecture bypassed the inherent problem of acoustic mismatch
that arises in integrating radio output with a speech recognition front
end, while at the same time preserving the realism of speech synthesis
output through a military radio. We assessed the utility of formal evaluation
methods to benchmark the impact of the radio model on a commercial speech
synthesizer.
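A crude stand-in for a radio-channel model applied to synthesizer output might band-limit the signal and add static, as sketched below; the band edges, noise level, and clipping are invented parameters, not the project's radio model.

```python
# Toy military-radio channel: narrow voice band, static, crude limiting.
import numpy as np
from scipy.signal import butter, lfilter

def radio_effect(speech, sr=16000, band=(300.0, 3000.0), noise_level=0.02):
    low, high = band[0] / (sr / 2), band[1] / (sr / 2)
    b, a = butter(4, [low, high], btype="band")   # band-pass filter
    out = lfilter(b, a, speech)
    out += noise_level * np.random.default_rng(5).standard_normal(len(out))
    return np.clip(out, -1.0, 1.0)                # crude amplitude limiting

tone = np.sin(2 * np.pi * 800 * np.arange(16000) / 16000)  # stand-in speech
degraded = radio_effect(tone)
```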
A0932.pdf
In this paper a continuous digit recognizer operating over the telephone network in real time is described. The activity has led to the realization of a system, installed in some Italian telephone exchanges, for providing semi-automatic collect call services. Data collection has also been performed, and a field database was built. Both a continuous digit recognition task and a confirmation task requiring rejection have been defined. Recognition results are presented.
A1110.pdf
This paper describes work on the development of an HMM-based system
for automatic speech assessment, particularly of dysarthric speech. As
a first step, we compare recognizer performance on a closed-set, forced
choice identification test of dysarthric speech with performance on the
same test by untrained listeners. Results indicate that HMM recognition
accuracy averaged over all utterances of a dysarthric talker is well-correlated
with measures of overall talker intelligibility. However, on an utterance-by-utterance
basis, the pattern of errors obtained from the human subjects and the machine,
while significantly correlated, accounts for, at best, only about 25 percent
of the variance. Potential methods for improving this performance are considered.
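The talker-level comparison reduces to correlating per-talker recognition accuracy with human intelligibility scores; the sketch below shows the computation on made-up numbers, including the variance-explained figure (r squared) used in the utterance-level result.

```python
# Correlating HMM accuracy with intelligibility (data invented, not the
# paper's measurements).
import numpy as np

hmm_accuracy = np.array([0.42, 0.55, 0.70, 0.81, 0.90])     # hypothetical
intelligibility = np.array([0.38, 0.60, 0.66, 0.85, 0.92])  # hypothetical
r = np.corrcoef(hmm_accuracy, intelligibility)[0, 1]
print(f"talker-level r = {r:.2f}, variance explained = {r*r:.2f}")
```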
A1164.pdf
This paper describes a new platform developed at Telefónica I+D, based on speech technology, to be integrated into the Mobile Telephone Network and especially suitable for the quick incorporation of new customer-demanded services. This multi-application platform has been conceived as "call-basis" driven, which means that the number called by the user determines which application must be run at any given time in order to serve the caller. The system permits the lines assigned to each application to be redistributed dynamically following traffic criteria. On the other hand, the use of dynamic libraries allows a quick procedure for updating, incorporating, or eliminating applications. The integration of computers with the telephone network into commercial equipment has become feasible due to the proliferation of inexpensive Personal Computers and advances in speech processing. The availability of the Speech Technology Products of Telefónica of Spain [1] [2] [3] [4] has made it possible to develop speech-based servers for new telephone services in a rapid and reasonable way.
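The "call-basis" dispatch can be pictured as a mapping from called number to application handler that can be changed at runtime, as in this sketch (the numbers and services are invented for illustration).

```python
# Called-number-driven dispatch with runtime (re)assignment of lines.
SERVICES = {}

def register(number, handler):
    SERVICES[number] = handler          # dynamic (re)assignment of a line

def dispatch(called_number, call):
    handler = SERVICES.get(called_number)
    return handler(call) if handler else "number not in service"

register("901234", lambda call: f"weather service answering {call}")
register("905678", lambda call: f"voice mail answering {call}")
print(dispatch("901234", "call-1"))
```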
A1220.pdf
In this research we have studied several human factors problems that
are connected to the deployment of speaker verification technology in telecommunication
services. We investigate the perception of the safety of a calling card
service when it is protected by speaker verification on the 14 digit card
number, and compare it to the perceived safety of speaker verification
and PIN. Moreover, we compare a voice based interface to the service with
a DTMF based interface. The results are crucial for guiding the introduction
and deployment of speaker verification technology in actual applications.
A1235.pdf
Ever since the publication of Bolt's ground-breaking "Put-That-There"
paper [1], providing multiple modalities as a means of easing the interaction
between humans and computers has been a desirable attribute of user interface
design. In Bolt's early approach, the style of modality combination required
the user to conform to a rigid order when entering spoken and gestural
commands. In the early 1990s, the idea of synergistic multimodal combination
began to emerge [4], although actual implemented systems (generally using
keyboard and mouse) remained far from being synergistic. Next-generation
approaches involved time-stamped events to reason about the fusion of multimodal
input arriving in a given time window, but these systems were hindered
by time-consuming matching algorithms. To overcome this limitation, we
proposed [6] a truly synergistic application and a distributed architecture
for flexible interaction that reduces the need for explicit time stamping.
Our slot-based approach is command directed, making it suitable for applications
using speech as a primary modality. In this article, we use our interaction
model to demonstrate that during multimodal fusion, speech should be a
privileged modality, driving the interpretation of a query, and that in
certain cases, speech has even more power to override and modify the combination
of other modalities than previously believed.
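A minimal sketch of the slot-based, command-directed fusion idea: the spoken command opens a frame whose empty slots are then filled by other modalities, with no explicit time-stamp matching. The class and slot names are illustrative, not the authors' API.

```python
# Slot-based multimodal fusion: speech proposes a command frame, other
# modalities fill its slots (names and values are hypothetical).
class CommandFrame:
    def __init__(self, verb, slots):
        self.verb = verb
        self.slots = dict.fromkeys(slots)   # unfilled slots

    def fill(self, slot, value):
        if slot in self.slots and self.slots[slot] is None:
            self.slots[slot] = value

    def complete(self):
        return all(v is not None for v in self.slots.values())

# speech ("put that there") drives interpretation and opens the frame
frame = CommandFrame("move", ["object", "target"])
frame.fill("object", "blue block")   # gesture 1 resolves "that"
frame.fill("target", (120, 45))      # gesture 2 resolves "there"
print(frame.complete(), frame.slots)
```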
A1282.pdf