Authors:
Yolanda Blanco, Universidad Publica de Navarra (Spain)
Maria Cuellar, Universidad Publica de Navarra (Spain)
Arantxa Villanueva, Universidad Publica de Navarra (Spain)
Fernando Lacunza, Universidad Publica de Navarra (Spain)
Rafael Cabeza, Universidad Publica de Navarra (Spain)
Beatriz Marcotegui, Universidad Publica de Navarra (Spain)
Paper number 1127
Abstract:
This paper presents SIVHA, a high-quality Spanish speech synthesis
system for severely disabled persons, controlled by their eye movements.
The system follows the user's eye gaze across the screen and
constructs the text from the selected words. Once the user considers
the message complete, its synthesis can be requested. The system is
divided into three modules: the first determines the point on the screen
the user is looking at, the second is an interface for constructing the
sentences, and the third is the synthesis itself.
Authors:
C.G. de Bruijn, University of Sheffield (U.K.)
Sandra P. Whiteside, University of Sheffield (U.K.)
P.A. Cudd, University of Sheffield (U.K.)
D. Syder, University of Sheffield (U.K.)
K.M. Rosen, KTH (Sweden)
L. Nord, KTH (Sweden)
Paper number 501
Abstract:
Literature and individual reports contain indications that the use
of speech-recognition-based human-computer interfaces could potentially
lead to vocal fatigue, or even to symptoms associated with dysphonia.
As more and more people opt for a speech-driven computer interface
as an alternative input method to the keyboard, and these speech recognition
systems become more widely used in both home and office environments,
it has become necessary to assess any potential risks of voice damage.
This study reports on ongoing research that investigates acoustic
changes in the voice after use of a discrete speech recognition system.
Acoustic analyses were carried out on two Swedish users of such a system.
So far, for one of the users, two of the acoustic parameters under
investigation that could be indicators of vocal fatigue show a significant
difference between measurements taken directly before and after use of the
speech recognition system.
Authors:
Robert Alexander Fearn, School of Physics, University of New South Wales (Australia)
Paper number 666
Abstract:
Cochlear implants were initially distributed in Western countries.
The speech processing strategies that have been developed to drive
the implants provide detailed information about the spectral envelope
and transients. This information is necessary to identify the phonemes
of English and other Western languages. Cochlear implants have more
recently been distributed in Eastern countries where tonal languages
such as Mandarin and Cantonese are spoken. In addition to spectral
envelope and transient information, tonal languages require finer resolution
of pitch to distinguish different words. Current speech processing strategies
provide the cochlear implant user with only relatively low pitch resolution.
This study investigates the importance of voice pitch (F0)
in the identification of phonetically identical Cantonese words that vary
in pitch. Simulations of speech processing strategies were performed
with a normal-hearing subject, and the results suggest that F0 is very
important for the correct identification of tonal words.
Authors:
Karin Brunnegaard, Dept. of Logopedics and Phoniatrics, Göteborg University, Göteborg (Sweden)
Katja Laakso, Dept. of Logopedics and Phoniatrics, Göteborg University, Göteborg (Sweden)
Lena Hartelius, Dept. of Logopedics and Phoniatrics, Göteborg University, Göteborg (Sweden)
Elisabeth Ahlsén, Dept. of Linguistics, Göteborg University, Göteborg (Sweden)
Paper number 496
Abstract:
This study describes the development of a test battery to assess high-level
language function in Swedish and reports the test performance
of a group of 9 individuals with multiple sclerosis (MS). The test
battery included tasks such as repetition of long sentences, understanding
of complicated logico-grammatical sentences, naming famous people,
resolving ambiguities, recreating sentences, understanding metaphors,
making inferences, and defining words. The MS group included individuals
with self-reported language problems as well as individuals without
any such problems. Their performance was compared with that of a group of
7 control subjects using a Kruskal-Wallis one-way ANOVA, which indicated
significantly different total mean scores. Post hoc analysis with Mann-Whitney
U-tests revealed that the group with self-reported language problems
had significantly lower mean scores when compared to control subjects
and to MS subjects without self-reported language problems. None of
the language difficulties were detected by a standard aphasia test.
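As a rough illustration of the statistical comparison described above, the following Python sketch runs a Kruskal-Wallis test followed by Mann-Whitney U post hoc tests; the group sizes follow the abstract, but the score values are placeholders, not the study's data.

# Hedged sketch of the reported analysis pipeline, with made-up score arrays:
# a Kruskal-Wallis test across the three groups followed by pairwise
# Mann-Whitney U tests as post hoc comparisons.
from scipy.stats import kruskal, mannwhitneyu

controls       = [92, 88, 90, 95, 87, 91, 89]   # 7 control subjects (placeholder scores)
ms_no_problems = [86, 84, 88, 85]               # MS, no self-reported problems (placeholder)
ms_problems    = [70, 65, 72, 68, 74]           # MS, self-reported problems (placeholder)

h, p = kruskal(controls, ms_no_problems, ms_problems)
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.4f}")

for name, (a, b) in [("MS w/ problems vs controls", (ms_problems, controls)),
                     ("MS w/ problems vs MS w/o", (ms_problems, ms_no_problems))]:
    u, p = mannwhitneyu(a, b, alternative="two-sided")
    print(f"Mann-Whitney U ({name}): U={u:.1f}, p={p:.4f}")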
Authors:
Shizuo Hiki, Graduate School of Human Sciences, Waseda University (Japan)
Kazuya Imaizumi, Graduate School of Human Sciences, Waseda University (Japan)
Yumiko Fukuda, Research Institute, National Rehabilitation Center for the Disabled (Japan)
Paper number 1091
Abstract:
The resolution of the fundamental frequency of speech required for
the design of a cochlear implant speech processor is investigated,
with special regard to transmitting voice pitch information in Asian
languages. Clinical application of the cochlear implant has spread
rapidly in recent years to Asian countries where a variety of languages
are spoken whose voice pitch information differs from that of English
and other European languages. The perceptually acceptable area and required
resolution of duration and fundamental frequency are estimated on a
two-dimensional chart consisting of logarithmic time and frequency
scales, based on the typical voice pitch contours of Japanese word
accent and Chinese syllabic tone. As a result, it is shown that much
finer quantization and time sampling of the change in fundamental frequency
are required than for the sentence intonation and emphasis common to
other languages. It is also shown that the amount of information conveyed
by the combined use of lipreading with a cochlear implant is not sufficient
to supplement the voice pitch information. A possible way of transmitting
such voice pitch information, by transmitting the speech waveform
directly to the auditory area of the cortex, where the waveform is
reconstructed and voice pitch is extracted, is discussed.
Authors:
Aileen K. Ho, Psychology Dept, Monash University (Australia)
John L. Bradshaw, Psychology Dept, Monash University (Australia)
Robert Iansek, Geriatric Research Unit, Kingston Centre (Australia)
Robin J. Alfredson, Dept of Mechanical Engineering, Monash University (Australia)
Paper number 11
Abstract:
Past studies on Parkinsonian speech have generally examined the parameters
of speech separately. Thus volume and suprasegmental duration have
largely been described independently of each other, on the assumption
that the two measures are not related. This assumption was tested by manipulating
intensity and examining the corresponding effect on duration. Twelve
Parkinson's disease (PD) patients and twelve normal healthy controls
read under three conditions: as softly as possible, as loudly
as possible, and with no volume instruction (at normal volume). Total
Duration of reading (with pauses), and Net Duration (without pauses)
were examined. For Net Duration, both groups were similar, and did
not vary across volume conditions. PD patients, however, demonstrated
decreased Total Duration as speech volume was increased. The abnormal
Parkinsonian relationship is suggestive of a trade-off between the
two parameters in order to achieve adequately loud reading, and may
be explained by increased attention associated with increased effort
when speaking louder.
Authors:
Cheol-Woo Jo, Changwon National University (Korea)
Dae-Hyun Kim, Changwon National University (Korea)
Paper number 691
Abstract:
In this paper, a method for analyzing pathological speech signals using
the wavelet transform is suggested. The pathological speech signals are
taken from a commercially available pathological voice database and analyzed
by the suggested method. Normal speech signals from the same database are
analyzed as well. The results are then compared to find the differences
between normal and pathological speech. A three-level wavelet transform is used.
Normalized energy ratios between the levels and normalized peak-to-peak
values are used as parameters. As a result, it was possible to distinguish
between normal and pathological speech signals.
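A minimal Python sketch of the kind of analysis described above (not the authors' implementation; the wavelet family, frame length, and toy signals are assumptions):

# Hedged sketch: three-level wavelet decomposition of a voice frame, with
# normalized band-energy ratios and normalized peak-to-peak values as features.
import numpy as np
import pywt

def wavelet_voice_features(frame, wavelet="db4", level=3):
    """Return (normalized energy per band, normalized peak-to-peak per band)."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)   # [cA3, cD3, cD2, cD1]
    energy = np.array([np.sum(c ** 2) for c in coeffs])
    energy_ratio = energy / energy.sum()                  # normalized energy ratios
    p2p = np.array([c.max() - c.min() for c in coeffs])
    p2p_norm = p2p / (np.abs(frame).max() + 1e-12)        # normalize by frame amplitude
    return energy_ratio, p2p_norm

# Toy usage: a clean synthetic vowel versus a noise-perturbed one, standing in
# for normal and pathological voice samples from a database.
fs = 16000
t = np.arange(fs) / fs
normal = np.sin(2 * np.pi * 120 * t)
pathological = normal + 0.3 * np.random.randn(len(t))
print(wavelet_voice_features(normal)[0])
print(wavelet_voice_features(pathological)[0])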
Authors:
Shigeyoshi Kitazawa, Department of Computer Science, Faculty of Information, Shizuoka University (Japan)
Hiroyuki Kirihata, Department of Computer Science, Faculty of Information, Shizuoka University (Japan)
Tatsuya Kitamura, Department of Computer Science, Faculty of Information, Shizuoka University (Japan)
Paper number 1036
Abstract:
This paper describes a speech coding strategy for a cochlear implant
system, assuming a Nucleus Cochlear Implant receiver-stimulator. The speech
processor converts input speech into a series of stimulation electrode
positions and stimulation current intensities. This process can be
optimized by decomposing the acoustic signal into a given
set of impulse responses corresponding to the set of electrode channels.
An error minimization algorithm can find an optimal stimulation sequence
that minimizes the distortion of the transferred speech and maximizes the
transferred phonological information as well as sound quality. Re-synthesized
sound quality was qualitatively evaluated. Environmental sounds can
also be recognized with this method.
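The abstract does not specify the error minimization algorithm; the following Python sketch illustrates one possible reading, in which a short segment is decomposed into a non-negative combination of time-shifted channel impulse responses via non-negative least squares. The impulse-response shapes, channel count, and solver choice are assumptions.

# Hedged sketch, not the paper's algorithm: choose pulse intensities that
# minimize the reconstruction error of an acoustic segment from a dictionary
# of time-shifted channel impulse responses.
import numpy as np
from scipy.optimize import nnls

n_channels, n_samples, pulse_len = 8, 256, 32
rng = np.random.default_rng(0)

# Assumed impulse responses: decaying sinusoids at channel-specific frequencies.
t = np.arange(pulse_len)
irs = [np.exp(-t / 10.0) * np.sin(2 * np.pi * (0.05 + 0.03 * k) * t)
       for k in range(n_channels)]

# Dictionary: each column is one impulse response placed at one time shift.
shift_list = list(range(0, n_samples - pulse_len, 8))
D = np.zeros((n_samples, n_channels * len(shift_list)))
cols, col = [], 0
for k, ir in enumerate(irs):
    for s in shift_list:
        D[s:s + pulse_len, col] = ir
        cols.append((k, s))
        col += 1

x = rng.standard_normal(n_samples)   # stand-in acoustic segment
a, err = nnls(D, x)                  # non-negative stimulation intensities
print("reconstruction error:", err)
print("active (channel, time) pulses:", [cols[i] for i in np.nonzero(a > 1e-3)[0]])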
Authors:
Eva Agelfors, CTT/TMH, KTH (Sweden)
Jonas Beskow, CTT/TMH, KTH (Sweden)
Martin Dahlquist, CTT/TMH, KTH (Sweden)
Björn Granström, CTT/TMH, KTH (Sweden)
Magnus Lundeberg, CTT/TMH, KTH (Sweden)
Karl-Erik Spens, CTT/TMH, KTH (Sweden)
Tobias Öhman, CTT/TMH, KTH (Sweden)
Paper number 362
Abstract:
In the Teleface project, the possibility of using a synthetic face as
a visual telephone communication aid for hearing-impaired persons is
evaluated. An earlier study, NH, involved a group of normal-hearing persons.
This paper describes the results of two multimodal intelligibility
tests with hearing-impaired persons, in which the additional information
provided by a synthetic as well as a natural face is evaluated. In
a first round with hearing-impaired persons, HI:1, twelve subjects
were presented with VCV syllables and "everyday sentences" together
with a questionnaire. The intelligibility score for the VCV syllables
presented as audio alone was 30%. With a synthetic face added, the
score improved to 55%; with the natural face instead, it was
58%. In a second round, HI:2, fifteen hearing-impaired persons were
presented with the sentence material and a questionnaire. The audio
track was filtered to simulate telephone bandwidth. The intelligibility
score for the audio only condition was 57% correctly identified keywords.
Together with a synthetic face it was 66% and with a natural face 83%.
Answers in the questionnaires were collected and analysed. The general
subjective rating of the synthetic face was positive, and the subjects
would like to use such an aid if it were available.
Authors:
Lois Martin, La Trobe University (Australia)
John Bench, La Trobe University (Australia)
Paper number 1006
Abstract:
The ability to understand speech in 27 hearing-impaired children was
assessed using the BKB/A Picture Related Sentence test for children.
The mean sentence score for the group was 72% (range 18-100%). Language
scores (CELF-R) and Verbal Scale IQ (WISC-R) scores were significantly
below the norm (72.8 and 89.2, respectively). Performance Scale IQ
scores were slightly above the norm (106.3). Sentence scores were
significantly correlated with language scores (r = 0.49). Further investigation
showed that the predictability of language scores could be improved
when sensation level was taken into account. Sensation level was negatively
correlated with language scores (r = - 0.51), demonstrating that children
with better language abilities perceived speech at relatively lower
intensity levels. The observed sensation levels from the group were
compared with the expected levels for normally hearing children. This
difference measure yielded a correlation coefficient of - 0.73 with
language scores.
Authors:
Oleg P. Skljarov, Research Institute of Ear, Throat, Nose and Speech (Russia)
Paper number 1149
Abstract:
An attempt to understand experimental data on the segmentation of a speech
signal by the voiced/unvoiced principle has led us to a hypothesis
of a pair of logistic dependences between the durations of these segments.
The segmentation was carried out with the help of a computer program
working in quasi-real time. The hypothesis of a logistic recurrent
dependence for the sequence of segment durations allowed us to conclude
that this sequence has a quasi-rhythmical organization. With
the help of the proposed recurrent dependences, it is possible to explain
statistical peculiarities of the speech behaviour of stutterers in comparison
with normal speech behaviour. These logistic dependences were confirmed
by direct experimental data. An assumption about the origins of the observed
rhythm is made: these origins are hidden at the level of the control of
speech production and perception. It is shown that the chaotic nature
of the proposed dynamics of the formation of large-scale temporal structure
makes it possible to introduce the concept of information in a
natural way.
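A minimal illustration of a logistic recurrence for normalized segment durations; the exact form and parameter values used by the author are not given in the abstract, so this is only a generic logistic map sketch in Python.

# Hedged illustration only: iterate x_{k+1} = r * x_k * (1 - x_k) on
# normalized durations; r near 3.2 gives a near-periodic ("quasi-rhythmical")
# sequence, while r near 3.9 gives a chaotic one.
def logistic_sequence(x0, r, n):
    """Iterate the logistic map for n steps, x in (0, 1)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

print(logistic_sequence(0.4, 3.2, 10))
print(logistic_sequence(0.4, 3.9, 10))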
Authors:
Ali-Asghar Soltani-Farani, University of Surrey (U.K.)
Edward H.S. Chilton, University of Surrey (U.K.)
Robin Shirley, University of Surrey (U.K.)
Paper number 438
Abstract:
Visual perception of speech through spectrogram reading has long been
a subject of research, as an aid for the deaf or hearing impaired.
Attributing the lack of success in this type of visual aid mainly to
the static form of information presented by the spectrograms, this
paper proposes a system of dynamic visualisation for speech sounds.
This system samples a highly resolved, auditory-based spectrogram with
a window of 20 milliseconds duration and, exploiting the periodicity
of the input sound, produces a phase-locked sequence of images.
This sequence is then animated at a rate of 50 images per second to
produce a movie-like image displaying both the time-varying and time-independent
information of the underlying sound. Results of several preliminary
experiments for evaluation of the potential usefulness of the system
for the deaf, undertaken by normal-hearing subjects, support the quick
learning and persistence of the gestures for small sets of single words
and motivate further investigations.
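A rough Python sketch of the frame-slicing idea alone, cutting a conventional spectrogram into 20 ms image frames at 50 frames per second; the auditory-based analysis and the phase-locked (pitch-synchronous) alignment described above are omitted, and all parameters and signals are assumptions.

# Hedged sketch: slice a spectrogram into 20 ms images, one per 1/50 s.
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # toy signal

f, times, S = spectrogram(x, fs=fs, nperseg=256, noverlap=192)  # fine time resolution
frame_dur = 0.020                                               # 20 ms per image
frames, start = [], 0.0
while start + frame_dur <= times[-1]:
    idx = (times >= start) & (times < start + frame_dur)
    frames.append(S[:, idx])                                    # one image per 20 ms slice
    start += frame_dur                                          # i.e. 50 images per second
print(f"{len(frames)} image frames; first frame shape {frames[0].shape}")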
Authors:
Rosemary A. Varley, University of Sheffield (U.K.)
Sandra P. Whiteside, University of Sheffield (U.K.)
Paper number 151
Abstract:
Contemporary psycholinguistic models suggest that there may be dual
routes operating in phonetic encoding: a direct route which uses stored
syllabic units, and an indirect route which relies on the on-line assembly
of sub-syllabic units. The more computationally efficient direct route
is more likely to be used for high frequency words, while the indirect
route is most likely to be used for novel or low frequency words. We
suggest that the acquired neurological disorder of apraxia of speech
(AOS) provides a window onto speech encoding mechanisms and that the
disorder represents an impairment of direct-route encoding mechanisms
and, therefore, a reliance on indirect mechanisms. We report an investigation
of the production of high- and low-frequency words across three subject
groups: non-brain-damaged controls (NBDC, N=3), brain-damaged controls
(BDC, N=3), and speakers with AOS (N=4). The results are presented and
discussed within the dual-route phonetic encoding hypothesis.
Authors:
M.F. Cheesman, University of Western Ontario (Canada)
K.L. Smilsky, University of Western Ontario (Canada)
T.M. Major, University of Western Ontario (Canada)
F. Lewis, University of Western Ontario (Canada)
L.M. Boorman, University of Western Ontario (Canada)
Paper number 841
Abstract:
A sample of 209 adults ranging from 20 to 79 years of age was studied
to measure speech communication profiles as a function of age in persons
who did not identify themselves as hearing impaired. The study was
conducted in order to evaluate age-related speech perception abilities
and communication profiles in a population who do not present for hearing
assessment and who are not included in census statistics as having
hearing problems. Audiometric assessment, demographic and hearing
history self-reports, speech reception thresholds, consonant discrimination
in quiet and noise, and the Communication Profile for the
Hearing Impaired (CPHI) were the instruments used to develop speech
communication profiles. Hearing performance decreased with increasing
age. However, despite self-reports of no hearing impairment, many subjects
over age 50 had audiometric thresholds that indicated hearing impairment.
The responses to the CPHI were correlated with audiometric thresholds,
but also with the age of the respondent when hearing thresholds had
been controlled statistically. A comparison of CPHI responses from
this study with those of two other samples from clinical populations revealed
only slightly different patterns of behaviour in the present sample
when confronted with communication difficulties.