R. Douglas Sharp, AT&T Laboratories (U.S.A.)
Enrico Bocchieri, AT&T Laboratories (U.S.A.)
Cecilia Castillo, AT&T Laboratories (U.S.A.)
S. Parthasarathy, AT&T Laboratories (U.S.A.)
Chris Rath, AT&T Laboratories (U.S.A.)
Michael Riley, AT&T Laboratories (U.S.A.)
James Rowland, AT&T Laboratories (U.S.A.)
In 1995, AT&T Research (then within Bell Labs) began work on a software-only automated speech recognition system named Watson. The goal was ambitious: Watson was to serve as a single code base supporting applications ranging from PC-desktop command and control to scalable telephony interactive voice services. Furthermore, the software was to be the new code base for the research group, allowing fast deployment of new algorithmic advances from the lab into the field. A set of C++ objects has been developed to support these objectives. This paper gives an overview of the Watson Automatic Speech Recognizer software architecture, describes the algorithms employed, and provides performance numbers for some sample tasks.
Johel Miteran, University of Burgundy (France)
Remy Bailly, University of Burgundy (France)
Patrick Gorria, University of Burgundy (France)
In this paper we present the realization of a classification board for real-time image segmentation. Each pixel is classified using real-time extraction of attributes and a geometric classification method based on stress polytope training, which ensures a high decision speed (100 ns per pixel) and good performance. The decision operator has been integrated as a full-custom circuit, and the extraction of parameters is performed using a single high-density FPGA.
Raymond C. Vasko, University of Pittsburgh (U.S.A.)
Amro El-Jaroudi, University of Pittsburgh (U.S.A.)
J. Robert Boston, University of Pittsburgh (U.S.A.)
At ICASSP '96, we presented an algorithm that estimates the topology of a hidden Markov model (HMM) given a set of time series data. The algorithm iteratively prunes state transitions from a large general HMM topology and selects a topology based on a likelihood criterion and a heuristic evaluation of complexity. In this paper, we apply the algorithm to estimate the dynamic structure of human body motion data from a repetitive lifting task. The estimated topology for low back pain patients was different from the topology for a control subject group. The body motions of patients tend not to change over the task, but the body motions of control subjects change systematically.
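As a rough illustration of the pruning idea only, the sketch below removes one transition from a row-stochastic transition matrix and renormalizes the affected row. The function name `prune_weakest_transition` and the rule of pruning the smallest nonzero probability are assumptions made for this example; the abstract does not state the pruning criterion, and the likelihood-based selection and complexity heuristic are not modeled here.

```python
def prune_weakest_transition(trans):
    """Zero the smallest nonzero transition and renormalize its row.

    `trans` is a list of rows, each a list of transition probabilities.
    Choosing the smallest probability is an assumed criterion, not the
    authors' stated rule. Returns (new_matrix, (from_state, to_state)).
    """
    candidates = [
        (p, i, j)
        for i, row in enumerate(trans)
        for j, p in enumerate(row)
        # Never remove the last remaining transition out of a state.
        if p > 0 and sum(1 for q in row if q > 0) > 1
    ]
    pruned = [row[:] for row in trans]  # leave the input untouched
    if not candidates:
        return pruned, None
    _, i, j = min(candidates)
    pruned[i][j] = 0.0
    total = sum(pruned[i])
    pruned[i] = [q / total for q in pruned[i]]
    return pruned, (i, j)


trans = [
    [0.5, 0.4, 0.1],
    [0.3, 0.7, 0.0],
    [0.2, 0.2, 0.6],
]
pruned, edge = prune_weakest_transition(trans)
```

In an iterative scheme, a step like this would be repeated, with the model retrained and scored after each removal.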
Jacob Griesbach, University of Colorado (U.S.A.)
Julie Wiejaczka, University of Colorado (U.S.A.)
Radu Frangopol, University of Colorado (U.S.A.)
Fransiska Harsono, University of Colorado (U.S.A.)
Delores Etter, University of Colorado (U.S.A.)
The Puzzle Project is an interactive software system that solves jigsaw puzzles. The voice interface includes speech synthesis and word recognition. The attributes of the puzzle pieces are determined using image processing techniques and wavelet decomposition. Two algorithms are used to solve the puzzles: an expert system and fuzzy logic. This paper describes the steps required to find the solution to the puzzle from image processing to decision-making algorithms. It also explains the techniques involved in designing the voice interface.
Nagendra Kumar, JHU (U.S.A.)
Wolfgang Himmelbauer, JHU (U.S.A.)
Gert Cauwenberghs, JHU (U.S.A.)
Andreas Andreou, JHU (U.S.A.)
We have developed a low-power analog VLSI chip for real-time signal processing, motivated by the principles of the human auditory system. An analog cochlear filter bank (implemented on the chip) decomposes the input audio signal into several frequency bands that have almost equal bandwidth on a log scale. This step is thus similar to computing the wavelet transform. The chip then computes the signal energies and zero-crossing time intervals of the frequency components in the cochlear filter bank. The chip is intended to work as the front end of a speech recognition system. We include experimental results on a VLSI implementation of the auditory front end. We present speech recognition results on the TI-DIGITS database obtained from computer simulations that model the functionality of the feature extraction VLSI hardware. We use Hidden Markov Models (HMM) in combination with Linear Discriminant Analysis (LDA) for the recognizer design.
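The two per-band features named above, signal energy and zero-crossing time intervals, can be sketched in software for a single band as follows. This is a minimal digital illustration, not a model of the analog circuit: the function names, the use of upward crossings only, and the linear interpolation between samples are all assumptions made for the example.

```python
import math


def zero_crossing_intervals(samples, sample_rate_hz):
    """Return the times (in seconds) between successive upward zero crossings."""
    crossings = []
    for n in range(1, len(samples)):
        prev, cur = samples[n - 1], samples[n]
        if prev < 0 <= cur:
            # Linearly interpolate the crossing instant between samples.
            frac = -prev / (cur - prev)
            crossings.append((n - 1 + frac) / sample_rate_hz)
    return [b - a for a, b in zip(crossings, crossings[1:])]


def band_energy(samples):
    """Return the sum of squared samples for one band."""
    return sum(v * v for v in samples)


# A 100 Hz tone sampled at 8 kHz stands in for one filter-bank output.
rate = 8000.0
tone = [math.sin(2 * math.pi * 100.0 * n / rate + 0.1) for n in range(400)]
intervals = zero_crossing_intervals(tone, rate)
energy = band_energy(tone)
```

For a narrow-band output, the interval between upward crossings approximates the period of the dominant frequency in that band.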
Matthias H. Weiss, Technical University of Dresden (Germany)
Ulrich Walther, Technical University of Dresden (Germany)
Gerhard P. Fettweis, Technical University of Dresden (Germany)
Recent performance-enhanced DSP (Digital Signal Processor) architectures incorporate either datapath add-ons, such as dual-MAC architectures, or tailored datapaths, such as Viterbi accelerators. Both strategies strongly influence the instruction set architecture (ISA). Since common ISAs are not designed for architectural enhancements, either a complete redesign is required or the enhancements cannot be fully exploited by the ISA. Taking the GSM Fullrate Vocoder as an example, this paper presents a structural approach to applying datapath add-ons or tailoring to increase DSP performance. To utilize architectural enhancements efficiently, we propose a modified VLIW (very long instruction word) ISA, called TVLIW (tagged VLIW). TVLIW combines VLIW performance with DSP code-width requirements. To demonstrate its applicability, we applied the TVLIW ISA to a highly pipelined quadruple-MAC architecture incorporating only one dual-port RAM and a 26-bit wide instruction word.
Amit Dutta, Oregon State University (U.S.A.)
Sayfe Kiaei, Oregon State University (U.S.A.)
Fading is a critical issue for next-generation digital cellular systems using DS-CDMA. The problem of reducing the bit error rate (BER) in the presence of multipath fading is addressed. A new method is proposed based on adaptive near-far resistant demodulation techniques. It can be modified to eliminate the detrimental effect of fading in the presence of power control. In addition, this method will drastically reduce hardware complexity and increase cell capacity for digital cellular systems.
Kenneth J. Turner, SPRC, Queensland University of Technology (Australia)
Farhan A. Faruqi, SPRC, Queensland University of Technology (Australia)
The problem of phase ambiguity resolution and filtering for interferometric GPS attitude determination is considered. Traditionally, the resolution of the phase ambiguity and the filtering stages were performed separately, with the filter formulated on the basis that the phase ambiguity is correctly resolved. Should the pre-processing stage not resolve the ambiguity correctly, erroneous results may occur. In response, a unified solution is proposed in which the ambiguity resolution and filtering processes are combined under a Gaussian Sum Filtering (GSF) framework. The GSF naturally accounts for the measurement ambiguity by generating multi-modal probability densities, which leads to a probabilistic interpretation of the attitude estimates. Simulations are performed to illustrate the effectiveness and functionality of the proposed solution.
Fabien Claveau, INO/NOI (Canada)
Michel Poirier, INO/NOI (Canada)
Denis Gingras, INO/NOI (Canada)
The National Optics Institute has recently developed an optical velocimeter composed of two parallel laser beams for measuring perpendicularly the speed and the length of vehicles. The system must be capable of measuring speeds varying from 0 to 150 km/h in both directions with an accuracy of 1%. This paper focuses on the algorithms and signal processing aspects of the system. The speed is measured by estimating the time delay using an FFT-based cross-covariance method between the signals generated by the optical velocimeter. The length is estimated using the speed and the time window corresponding to the entire vehicle. The measurement algorithms have been implemented to run in real time on a C31 DSP and a 486 processor.
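The delay-to-speed step can be sketched as follows: remove the mean from each beam signal, compute the cross-covariance through an FFT, take the lag of its peak as the time delay, and divide the beam spacing by that delay. This is an illustrative sketch under stated assumptions, not the implementation that runs on the DSP: the function names, the beam spacing, the sample rate, and the small radix-2 FFT are all hypothetical choices made to keep the example self-contained.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_delay(a, b):
    """Return the lag in samples by which signal b trails signal a."""
    n = len(a)
    size = 1
    while size < 2 * n:  # zero-pad so circular correlation equals linear
        size *= 2
    mean_a, mean_b = sum(a) / n, sum(b) / n
    pa = [v - mean_a for v in a] + [0.0] * (size - n)
    pb = [v - mean_b for v in b] + [0.0] * (size - n)
    fa, fb = fft(pa), fft(pb)
    # conj(A) * B is the spectrum of the cross-covariance sequence.
    cross = fft([x.conjugate() * y for x, y in zip(fa, fb)], inverse=True)
    best = max(range(size), key=lambda k: cross[k].real)
    return best if best < size // 2 else best - size  # negative lag: reverse direction


def speed_kmh(delay_samples, sample_rate_hz, beam_spacing_m):
    """Convert a nonzero delay between the two beams into a signed speed in km/h."""
    return 3.6 * beam_spacing_m * sample_rate_hz / delay_samples


# Synthetic example: the second beam sees the same pulse 5 samples later.
beam_1 = [0.0] * 64
beam_2 = [0.0] * 64
for i in range(10, 15):
    beam_1[i] = 1.0
    beam_2[i + 5] = 1.0
delay = estimate_delay(beam_1, beam_2)
speed = speed_kmh(delay, sample_rate_hz=1000.0, beam_spacing_m=0.1)
```

The sign of the estimated lag indicates the direction of travel, and the vehicle length would follow as the speed multiplied by the time the vehicle occupies a beam.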
William Phillips, University of Maryland (U.S.A.)
Rama Chellappa, University of Maryland (U.S.A.)
Constant False Alarm Rate (CFAR) detection in Synthetic Aperture Radar (SAR) is the first step in most ATR and image exploitation systems. In this paper several CFAR algorithms and their implementation on a 1-D SIMD array processor are investigated. We primarily focus on CFAR algorithms using the Weibull clutter model, but algorithms assuming K-distributed clutter should have similar implementations and runtimes. We show that high-resolution SAR requires reference windows much larger than those used in traditional search radars, which permits fast moment-based estimation instead of the computationally intensive maximum likelihood parameter estimates. We also extend a fast median filtering algorithm to the order statistic and censored CFAR algorithms. The running times of the CFAR algorithms are listed along with detection results using SAR imagery from the Northrop-Grumman TESAR sensor onboard the Predator unmanned aerial vehicle.
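For readers unfamiliar with order-statistic CFAR, a minimal one-dimensional sketch is given below: each cell under test is compared against a scaled order statistic of the surrounding reference cells, with guard cells excluded. The function name `os_cfar` and all parameter values are assumptions made for illustration. The fixed `scale` factor stands in for a threshold multiplier that would normally be derived from the clutter model and the desired false-alarm rate, and the sketch uses a plain sort rather than the fast median filtering extension or the SIMD implementation described in the paper.

```python
def os_cfar(power, num_ref=8, num_guard=1, rank=6, scale=3.0):
    """Return indices of cells whose power exceeds the OS-CFAR threshold.

    num_ref   : total reference cells, split evenly on both sides
    num_guard : guard cells skipped on each side of the cell under test
    rank      : which order statistic (1 = smallest) sets the clutter level
    scale     : assumed fixed threshold multiplier
    """
    half = num_ref // 2
    detections = []
    for i in range(len(power)):
        left = power[max(0, i - num_guard - half):max(0, i - num_guard)]
        right = power[i + num_guard + 1:i + num_guard + 1 + half]
        ref = sorted(left + right)
        if len(ref) < rank:
            continue  # too few reference cells near the array edge
        threshold = scale * ref[rank - 1]
        if power[i] > threshold:
            detections.append(i)
    return detections


# Flat clutter with a single strong return at index 10.
power = [1.0] * 20
power[10] = 20.0
hits = os_cfar(power)
```

Because the threshold comes from a ranked reference cell rather than the window mean, a strong target that falls inside a neighbor's reference window does not raise that neighbor's threshold.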
Michael Petronino, University of Massachusetts (U.S.A.)
Ray Bambha, University of Massachusetts (U.S.A.)
James Carswell, University of Massachusetts (U.S.A.)
Wayne P. Burleson, University of Massachusetts (U.S.A.)
We describe a 95 GHz radar for an unmanned aerial vehicle (UAV). The radar measures vertical profiles of the reflectivity and Doppler velocity of clouds, which are then telemetered to the ground for storage. Telemetry bandwidth requires that substantial real-time data processing be done on the UAV in a low-power (less than 100 watts) and small (less than 1 cubic foot) system. A prototype was developed in less than a year, so a flexible programmable technology was required. Although typical remote sensing radars use DSP chips, it was determined that our power, size, performance and design-time requirements were best met using FPGA technology. Our system is based on the Giga-Ops Spectrum system, which uses Xilinx FPGAs on a novel modular PCI board. Unlike numerous recent FPGA-based signal processors, this presents a new class of applications and embedded system requirements. Reconfigurable capabilities are currently being explored to support radar algorithms which can adapt to a changing environment.