Dorra Masmoudi, University of Bordeaux (France)
Dominique Dallet, University of Bordeaux (France)
Jean Paul Dom, University of Bordeaux (France)
This paper presents a new design of head-sized sensor arrays with simple delay-and-sum beamforming that provides a useful directivity index with sufficient robustness to errors. A frequency-independent sidelobe reduction is proposed to achieve optimal frequency characteristics. To obtain this control, a principle of combining multiple levels of array structures is established. Results are presented for spherically isotropic noise. It is found that good performance can be obtained for a head-sized array by combining multiple-level structures with a simple delay-and-sum beamformer.
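As a rough illustration of the core operation (not the multi-level array design itself), the following sketch applies a delay-and-sum beamformer to multichannel samples; the sensor geometry, look direction, and sampling rate are hypothetical, and the far-field plane-wave sign convention is an assumption.

```python
import numpy as np

def delay_and_sum(x, mic_positions, look_dir, fs, c=343.0):
    """Delay-and-sum beamformer using fractional delays in the frequency domain.

    x             : (num_mics, num_samples) array of microphone signals
    mic_positions : (num_mics, 3) sensor coordinates in metres
    look_dir      : unit vector pointing from the array toward the desired source
    fs            : sampling rate in Hz
    c             : speed of sound in m/s
    """
    num_mics, n = x.shape
    # Steering delays: project each sensor position onto the look direction.
    delays = mic_positions @ look_dir / c      # seconds, one per mic
    delays -= delays.min()                     # keep all delays non-negative
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    X = np.fft.rfft(x, axis=1)
    # Delay each channel so wavefronts from the look direction add coherently.
    X *= np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(X.sum(axis=0) / num_mics, n=n)
```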
Todd Schneider, Unitron (Canada)
Robert Brennan, Unitron (Canada)
Multi-channel compression schemes are a practical method of mapping the wide dynamic range of speech signals into the reduced dynamic range of hearing-impaired listeners. These systems address two of the shortcomings of single-channel compression systems: (1) the reduction of gain as a result of narrow-band non-speech stimuli and (2) the reduction of gain that often occurs when high-frequency speech components are followed by intense low-frequency speech components. They also provide the frequency-dependent compression ratios that are needed by many newer supra-threshold fitting strategies (e.g., DSL I/O). This paper presents a multi-channel compression scheme that employs an oversampled, polyphase DFT filterbank. In each compressor channel, the gain is controlled by an adjustable combination of an overall, dual time-constant input signal level and the individual channel signal level, which is measured with a short time-constant RMS detector. Informal listening tests have demonstrated that the design has very good audio quality and performs well in real-world listening situations. The design is suited to low-power, real-time operation.
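A minimal sketch of the per-channel gain computation described above; the blend weight, threshold, and compression ratio below are illustrative placeholders, not the authors' values.

```python
import numpy as np

def rms_level_db(frame):
    """Short time-constant RMS level of one filterbank channel frame, in dB."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.abs(frame) ** 2)) + 1e-12)

def channel_gain_db(overall_level_db, channel_level_db,
                    alpha=0.5, threshold_db=-40.0, ratio=2.0):
    """Static compression gain for one channel.

    The control level is an adjustable blend of the overall (dual
    time-constant) level estimate and this channel's short-time RMS level,
    as in the abstract; all parameter values here are assumptions.
    """
    control_db = alpha * overall_level_db + (1.0 - alpha) * channel_level_db
    over_db = max(control_db - threshold_db, 0.0)
    return -over_db * (1.0 - 1.0 / ratio)   # gain reduction above threshold
```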
Paul Shields, University of Paisley (U.K.)
Douglas R. Campbell, University of Paisley (U.K.)
A system for the binaural pre-processing of speech signals for input to a standard linear hearing aid is proposed. The work is based on that of Toner & Campbell, which applied the Least Mean Squares (LMS) algorithm in sub-bands to speech signals from various acoustic environments and signal-to-noise ratios (SNR). The method attempts to exploit the multiple inputs to perform noise cancellation. The use of sub-bands enables a diverse processing mechanism to be employed: the wide-band signal is split into smaller sub-bands, which can subsequently be processed according to their signal characteristics. The results of a series of intelligibility tests are presented from experiments in which acoustic speech and noise data, generated in a simulated room, were presented to normal-hearing volunteers.
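For orientation, a bare-bones normalized-LMS adaptive filter of the kind that would run in each sub-band of such a scheme; the filter length and step size are arbitrary here, and this is a generic sketch rather than the Toner & Campbell implementation.

```python
import numpy as np

def nlms_cancel(primary, reference, taps=32, mu=0.1, eps=1e-8):
    """Adaptive noise cancellation with NLMS for one sub-band.

    primary   : noisy speech (speech + noise) channel
    reference : noise reference channel
    Returns the error signal, i.e. the enhanced speech estimate.
    """
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]       # most recent sample first
        y = w @ x                             # noise estimate
        e = primary[n] - y                    # enhanced output
        w += mu * e * x / (x @ x + eps)       # normalized LMS update
        out[n] = e
    return out
```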
Kenzo Itoh, NTT HI Labs. (Japan)
Masahide Mizushima, NTT HI Labs. (Japan)
We propose a very practical and useful noise reduction system that has wide application for hearing-impaired persons, for example as a sound-gathering system in a lecture hall or conference room. The system uses two basic technologies: a speech/non-speech identification process and a new noise reduction process. The speech/non-speech identification process uses four characteristics of the time and frequency domains of the input signal. In the noise reduction process, a frequency weighting function is used for the basic spectral subtraction and loss control algorithms. Various kinds of environmental noise were reduced by this system, which showed excellent performance. Noise is further reduced by using a multi-microphone system as an acoustic noise suppressor. The results of intelligibility tests with hearing-impaired listeners show excellent noise reduction.
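A minimal spectral-subtraction sketch in the spirit of the noise reduction process described; the per-bin weighting and spectral floor are placeholders rather than the authors' actual functions, and the loss control applied during non-speech frames is omitted.

```python
import numpy as np

def spectral_subtract(frame_fft, noise_psd, weight=None, floor=0.1):
    """Subtract a weighted noise spectrum from one analysis frame.

    frame_fft : complex spectrum of the current frame
    noise_psd : noise power estimate (updated during non-speech frames)
    weight    : per-bin frequency weighting (illustrative; defaults to 1)
    """
    if weight is None:
        weight = np.ones_like(noise_psd)
    power = np.abs(frame_fft) ** 2
    clean_power = np.maximum(power - weight * noise_psd, floor * power)
    # Keep the noisy phase; rescale the magnitude only.
    return frame_fft * np.sqrt(clean_power / np.maximum(power, 1e-12))
```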
Russell Lambert, TRW (U.S.A.)
Anthony Bell, Salk Institute (U.S.A.)
We relate information-theoretic blind learning methods (infomax) and Bussgang blind equalization methods. The multipath extension of blind source separation methods can be formulated in the frequency domain using FIR matrix algebra (matrices of finite impulse response filters). Three forms of Bussgang algorithms are given. The blind serial update method of Cardoso and Laheld is related to the infomax objective of Bell and Sejnowski. The application emphasis is on speech separation. We demonstrate the robustness and power of the new techniques by blindly separating speech signals recorded in a multipath environment.
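As a reference point, here is the instantaneous (single-path) natural-gradient infomax update in the serial form of Cardoso and Laheld; the paper's multipath extension replaces the scalar matrix entries with FIR filters, which this sketch does not attempt.

```python
import numpy as np

def infomax_step(W, x_batch, lr=1e-3):
    """One natural-gradient infomax update for instantaneous mixtures.

    W       : (sources, sensors) unmixing matrix
    x_batch : (sensors, samples) block of mixed observations
    """
    y = W @ x_batch
    g = np.tanh(y)                     # score function for super-Gaussian sources
    n = x_batch.shape[1]
    # Serial / natural-gradient form: dW = (I - <g(y) y^T>) W
    dW = (np.eye(W.shape[0]) - (g @ y.T) / n) @ W
    return W + lr * dW
```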
Fernando De Bernardinis, Dip. Ing. Informazione, Univ. Pisa (Italy)
Roberto Roncella, Dip. Ing. Informazione, Univ. Pisa (Italy)
Roberto Saletti, Dip. Ing. Informazione, Univ. Pisa (Italy)
Pierangelo Terreni, Dip. Ing. Informazione, Univ. Pisa (Italy)
Graziano Bertini, IEI-CNR, Pisa (Italy)
This paper presents a new hardware implementation of additive synthesis for high-quality musical sound generation. A single chip can synthesize 1,200 sinusoids in real time, and the system is expandable to 13,200 partials by connecting 11 chips in series. Each sinusoid is generated by a marginally stable second-order IIR filter, and its frequency, amplitude, and phase can be specified independently. The system is clocked at 60 MHz when working with a 44.1 kHz sampling rate. Two completely independent output channels are available, and each sample uses a 20-bit representation to achieve an SNR of at least 110 dB, thanks to the internal 24-bit word length. The IC is designed in a 0.5 µm CMOS technology and has a core area of approximately 19 mm^2.
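The marginally stable second-order recursion that generates each sinusoid has a simple software analogue; this is a floating-point sketch, not the chip's fixed-point datapath.

```python
import numpy as np

def resonator(freq_hz, amp, phase, n_samples, fs=44100):
    """Digital resonator: y[n] = 2*cos(w)*y[n-1] - y[n-2].

    The poles sit on the unit circle, so the oscillation neither grows nor
    decays; the first two samples set the amplitude and phase.
    """
    w = 2.0 * np.pi * freq_hz / fs
    y = np.empty(n_samples)
    y[0] = amp * np.sin(phase)
    y[1] = amp * np.sin(phase + w)
    for n in range(2, n_samples):
        y[n] = 2.0 * np.cos(w) * y[n - 1] - y[n - 2]
    return y
```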
Carlo Drioli, University of Padova (Italy)
Davide Rocchesso, University of Padova (Italy)
A musical-tone generator based on physical modeling of the sound production mechanisms is presented. To make this scheme general for a wide class of musical instruments, the nonlinear part of the tone generator is modeled by a neural network. The system learns its parameters and the nonlinearity shape by means of nonlinear identification procedures based on waveform or spectral matching. Two possible applications of this model are discussed: sound compression can be obtained by treating the system as a nonlinear predictor, while sound synthesis can be obtained by adding control inputs to the network and training the system to respond as desired.
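A purely illustrative version of such a structure: a delay-line resonator whose feedback passes through a small trainable nonlinearity standing in for the neural network. The network size, weights, and loop topology are hypothetical, and the identification (training) step is not shown.

```python
import numpy as np

def synthesize(n_samples, delay=100):
    """Waveguide-style loop with a tiny one-hidden-layer nonlinearity in the feedback.

    The nonlinearity plays the role of the neural network that the
    identification procedure would fit to a target instrument sound.
    """
    rng = np.random.default_rng(0)
    hidden = 8
    W1 = rng.normal(scale=0.5, size=hidden)   # input-to-hidden weights (illustrative)
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)   # hidden-to-output weights (illustrative)

    line = rng.uniform(-1.0, 1.0, delay)      # initial excitation stored in the delay line
    out = np.empty(n_samples)
    for n in range(n_samples):
        x = line[n % delay]
        h = np.tanh(W1 * x + b1)              # hidden layer
        fb = np.tanh(w2 @ h)                  # bounded feedback keeps the loop stable
        line[n % delay] = fb
        out[n] = x
    return out
```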
Michael Macon, Oregon Graduate Institute (U.S.A.)
Leslie Jensen-Link, Momentum Data Systems (U.S.A.)
James Oliverio, Georgia Institute of Technology (U.S.A.)
Mark A. Clements, Georgia Institute of Technology (U.S.A.)
E. Bryan George, Texas Instruments, Dallas (U.S.A.)
Although sinusoidal models have been demonstrated to be capable of high-quality musical instrument synthesis, speech modification, and speech synthesis, little exploration of the application of these models to the synthesis of singing voice has been undertaken. In this paper, we propose a system framework similar to that employed in concatenation-based text-to-speech synthesizers and describe its extension to the synthesis of singing voice. The power and flexibility of the sinusoidal model used in the waveform synthesis portion of the system enable high-quality, computationally efficient synthesis and the incorporation of musical qualities such as vibrato and spectral tilt variation. Modeling of segmental phonetic characteristics is achieved by employing a "unit selection" procedure that selects sinusoidally modeled segments from an inventory of singing voice data collected from a human vocalist. The system, called LYRICOS, is capable of synthesizing very natural-sounding singing that maintains the characteristics and perceived identity of the analyzed vocalist.
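The unit-selection step can be pictured as a standard dynamic-programming search over the inventory. The sketch below uses generic target and concatenation costs and is not a description of LYRICOS's actual cost functions or features.

```python
import numpy as np

def select_units(targets, inventory, target_cost, join_cost):
    """Viterbi-style unit selection.

    targets      : list of desired segment specifications (e.g. phone + pitch)
    inventory    : inventory[i] is the list of candidate units for position i
    target_cost  : target_cost(t, u), mismatch between spec t and unit u
    join_cost    : join_cost(u_prev, u), cost of concatenating two units
    Returns the index of the chosen unit at each position.
    """
    n = len(targets)
    cost = [[target_cost(targets[0], u) for u in inventory[0]]]
    back = [[None] * len(inventory[0])]
    for i in range(1, n):
        row, brow = [], []
        for u in inventory[i]:
            best = min(range(len(inventory[i - 1])),
                       key=lambda j: cost[i - 1][j] + join_cost(inventory[i - 1][j], u))
            row.append(cost[i - 1][best]
                       + join_cost(inventory[i - 1][best], u)
                       + target_cost(targets[i], u))
            brow.append(best)
        cost.append(row)
        back.append(brow)
    # Trace back the lowest-cost path through the lattice.
    path = [int(np.argmin(cost[-1]))]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```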
Khaled N. Hamdy, University of Minnesota (U.S.A.)
Ahmed H. Tewfik, University of Minnesota (U.S.A.)
Satoshi Takagi, Sony Corporation (Japan)
Ting Chen, Stanford University (U.S.A.)
We propose a new time-scale modification method for high quality audio signals. Our approach strives to preserve pitch and timbre. In our method, the signal is represented as the sum of sinusoidal components and a residual (edges and noise). The decomposition is computed via a combined harmonic and wavelet representation. Time-scaling is performed on the harmonic components and residual components separately. The harmonic portion is time-scaled by demodulating each harmonic component to DC, interpolating and decimating the DC signal, and remodulating each component back to its original frequency. The residual portion is time-scaled by preserving edges and relative distances between the edges while time-scaling the stationary (noise) components between the edges.
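A sketch of the harmonic-track scaling step described above (demodulate to DC, interpolate/decimate, remodulate), using SciPy's polyphase resampler; the analysis windowing and the edge-preserving residual processing are omitted, and the function signature is an assumption for illustration.

```python
import numpy as np
from scipy.signal import resample_poly

def time_scale_harmonic(x, f0_hz, fs, up, down):
    """Time-scale one harmonic component by the rational factor up/down.

    x     : real narrow-band signal containing a single harmonic near f0_hz
    f0_hz : centre frequency of the harmonic
    up, down : time-scale factor (e.g. up=3, down=2 stretches by 1.5)
    """
    n = np.arange(len(x))
    carrier = np.exp(-2j * np.pi * f0_hz * n / fs)
    baseband = x * carrier                          # demodulate the harmonic to DC
    # Interpolate / decimate the slowly varying DC signal (real and imaginary parts).
    stretched = (resample_poly(baseband.real, up, down)
                 + 1j * resample_poly(baseband.imag, up, down))
    m = np.arange(len(stretched))
    # Remodulate back to the original frequency, preserving pitch.
    return np.real(stretched * np.exp(2j * np.pi * f0_hz * m / fs))
```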
Erhard Rank, Vienna University of Technology (Austria)
Gernot Kubin, Vienna University of Technology (Austria)
Starting from the waveguide model for plucked strings, a new digital signal processing model for the slapping technique on electric bass guitars is derived. The model includes amplitude limitation of the string at the frets and/or the fingerboard. These highly nonlinear elements are realized by conditional reflections that depend on the local string displacement. A model of the string dynamics for the two slap-bass techniques (knocking the string with the thumb knuckle and plucking very strongly with the index or middle finger) has been implemented as both MATLAB and C simulations and synthesizes sounds close to those of the natural instrument.
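An illustrative fragment of the conditional-reflection idea: a crude plucked-string delay-line loop in which the displacement is checked against a fret height and reflected when it would pass through the fret. All constants are invented for the example, and this single-delay-line loop is far simpler than the paper's two-polarization waveguide string.

```python
import numpy as np

def slap_string(n_samples, delay=200, fret_level=-0.4, loss=0.996):
    """Karplus-Strong-style string loop with a one-sided displacement limiter.

    When the local displacement tries to go below fret_level (the string
    hitting a fret or the fingerboard), it is reflected about that level,
    which is the conditional-reflection nonlinearity in a crude form.
    """
    rng = np.random.default_rng(1)
    line = rng.uniform(-1.0, 1.0, delay)   # strong pluck / knock excitation
    out = np.empty(n_samples)
    prev = 0.0
    for n in range(n_samples):
        y = line[n % delay]
        if y < fret_level:                 # collision with the fret
            y = 2.0 * fret_level - y       # reflect the displacement
        new = loss * 0.5 * (y + prev)      # lossy averaging models string damping
        prev = y
        line[n % delay] = new
        out[n] = y
    return out
```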
Shao-Po Wu, Stanford University (U.S.A.)
William Putnam, Stanford University (U.S.A.)
This paper addresses the problem of designing finite impulse response filters which optimally approximate desired frequency responses in the sense that they minimize a perceptual audio spectral measure. This measure is based on a simplified auditory model similar to those used in the area of perceptual audio quality measurement. It is shown that this problem can be cast as a logarithmic Chebyshev approximation problem, which can be solved efficiently using recent interior-point methods.
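To make the objective concrete, the fragment below evaluates the (unweighted) logarithmic Chebyshev error of a candidate FIR filter against a desired magnitude response on a frequency grid. The paper's contribution is solving the design problem that minimizes such a measure via interior-point methods; this sketch only evaluates the error, and the perceptual weighting of the auditory model is not included.

```python
import numpy as np

def log_chebyshev_error(h, desired_mag, n_grid=512):
    """Worst-case log-magnitude deviation of FIR filter h from a target.

    h           : FIR coefficients
    desired_mag : callable giving the desired magnitude at normalized
                  frequencies in [0, pi] (an assumption for this sketch)
    """
    w = np.linspace(0.0, np.pi, n_grid)
    H = np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h   # frequency response
    target = np.array([desired_mag(wi) for wi in w])
    err = np.abs(np.log(np.abs(H) + 1e-12) - np.log(target + 1e-12))
    return err.max()
```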
Xiaoshu Qian, URI (U.S.A.)
Yinong Ding, TI (U.S.A.)
This paper presents a least-squares quadratic phase interpolation algorithm for sinusoidal-model-based music synthesis. The algorithm uses two additions, with one parameter per data frame, to generate the phase samples of a component sine wave. Compared with the cubic phase interpolation algorithm proposed by McAulay and Quatieri, the proposed algorithm is more efficient in terms of computational complexity and parameter storage, and it also produces smoother frequency tracks. Unlike the existing quadratic phase interpolation algorithm, in which the phase measurements are ignored entirely ("magnitude-only"), the proposed algorithm interpolates the phase in a least-squares sense from both the phase and the frequency measurements at the data frame boundaries, so the resulting phase samples are approximately "locked" to the measured ones. Informal listening tests on various musical instrument tones indicate that the proposed algorithm clearly outperforms the magnitude-only synthesis approach and is qualitatively comparable to the cubic one.
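The two-addition phase generation can be read as a second-order difference recursion run across each frame; a sketch of that synthesis step follows, assuming two additions per output sample. The least-squares fit of the per-frame parameter from the boundary phase and frequency measurements is not shown.

```python
import numpy as np

def quadratic_phase_track(phi0, dphi0, alpha, frame_len):
    """Generate per-sample phases of one partial inside one frame.

    phi0  : phase at the frame boundary (radians)
    dphi0 : initial per-sample phase increment (2*pi*f0/fs)
    alpha : per-frame curvature parameter (the one stored parameter)
    """
    phase = np.empty(frame_len)
    phi, dphi = phi0, dphi0
    for n in range(frame_len):
        phase[n] = phi
        dphi += alpha            # first addition: update the increment
        phi += dphi              # second addition: advance the phase
    return phase
```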
Stéphan Tassart, IRCAM (France)
Philippe Depalle, IRCAM (France)
We propose in this paper a new point of view that unifies two well-known filter families for approximating the ideal fractional delay filter: Lagrange Interpolator Filters (LIF) and Thiran allpass filters. We achieve this unification by approximating the ideal Fourier transform of the fractional delay according to two different Padé approximations, series expansions and continued fraction expansions, and by proving that these approximations correspond exactly to the LIF family and to the allpass delay filter family, respectively. This leads to an efficient modular implementation of LIFs.
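For reference, the two families being unified have well-known closed-form coefficients; the sketch below computes an order-N Lagrange interpolator and an order-N Thiran allpass for a given total delay using the standard textbook formulas, not the Padé derivation of the paper.

```python
import numpy as np
from math import comb

def lagrange_fir(delay, order):
    """FIR Lagrange interpolator coefficients h[0..N] for a fractional delay."""
    h = np.ones(order + 1)
    for k in range(order + 1):
        for i in range(order + 1):
            if i != k:
                h[k] *= (delay - i) / (k - i)
    return h

def thiran_allpass(delay, order):
    """Denominator coefficients a[0..N] of a Thiran allpass, with a[0] = 1.

    The allpass numerator is the reversed denominator.
    """
    a = np.empty(order + 1)
    a[0] = 1.0
    for k in range(1, order + 1):
        prod = 1.0
        for n in range(order + 1):
            prod *= (delay - order + n) / (delay - order + k + n)
        a[k] = (-1) ** k * comb(order, k) * prod
    return a
```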
Lauri Savioja, Helsinki University of Technology (Finland)
Vesa Välimäki, Helsinki University of Technology (Finland)
The digital waveguide mesh is an extension of the one-dimensional digital waveguide technique. Waveguide meshes are used to simulate two- and three-dimensional wave propagation in musical instruments and acoustic spaces. The original waveguide mesh algorithm suffers from direction-dependent dispersion. In this paper we show that this problem can be reduced by using an interpolated rectilinear mesh. In the analysis part, we present the analytical solution for the wave propagation speed, together with numerical simulations of the magnitude response and phase speed of both the original and the interpolated two-dimensional waveguide mesh algorithms. We demonstrate by simulation that the wave propagation characteristics of the proposed interpolated waveguide mesh are independent of direction, so the remaining dispersion errors can be corrected with a postprocessor.
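For context, one time step of the original (non-interpolated) rectilinear 2-D waveguide mesh, in which each junction pressure is updated from its four axial neighbours; the interpolated mesh of the paper additionally draws on diagonal neighbours with interpolation weights, which are not reproduced here.

```python
import numpy as np

def mesh_step(p_now, p_prev):
    """One update of the 2-D rectilinear digital waveguide mesh (interior nodes).

    p(n+1) = 0.5 * (sum of the four axial neighbours at time n) - p(n-1)
    Boundary rows and columns are left untouched in this sketch.
    """
    p_next = p_prev.copy()
    neighbours = (p_now[:-2, 1:-1] + p_now[2:, 1:-1] +
                  p_now[1:-1, :-2] + p_now[1:-1, 2:])
    p_next[1:-1, 1:-1] = 0.5 * neighbours - p_prev[1:-1, 1:-1]
    return p_next
```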