SP Applications

The Watson Speech Recognition Engine

Authors:

R. Douglas Sharp, AT&T Laboratories (U.S.A.)
Enrico Bocchieri, AT&T Laboratories (U.S.A.)
Cecilia Castillo, AT&T Laboratories (U.S.A.)
S. Parthasarathy, AT&T Laboratories (U.S.A.)
Chris Rath, AT&T Laboratories (U.S.A.)
Michael Riley, AT&T Laboratories (U.S.A.)
James Rowland, AT&T Laboratories (U.S.A.)

Volume 5, Page 4065

Abstract:

In 1995, AT&T Research (then within Bell Labs) began work on a software-only automated speech recognition system named Watson. The goal was ambitious: Watson was to serve as a single code base supporting applications ranging from PC desktop command and control to scalable telephony interactive voice services. Furthermore, the software was to be the new code base for the research group, allowing fast deployment of new algorithmic advances from the lab into the field. A set of C++ objects has been developed to support these objectives. This paper gives an overview of the Watson Automatic Speech Recognizer software architecture, describes the algorithms employed, and provides performance numbers for some sample tasks.

ic974065.pdf

Classification board for real time image segmentation

Authors:

Johel Miteran, University of Burgundy (France)
Remy Bailly, University of Burgundy (France)
Patrick Gorria, University of Burgundy (France)

Volume 5, Page 4069

Abstract:

We present the realization of a classification board for real-time image segmentation. Each pixel is classified using real-time extraction of attributes and a geometric classification method based on stress polytope training, which ensures a high decision speed (100 ns per pixel) and good performance. The decision operator has been integrated as a full-custom circuit, and the extraction of parameters is performed using a single high-density FPGA.

ic974069.pdf

Application of Hidden Markov Model Topology Estimation to Repetitive Lifting Data

Authors:

Raymond C. Vasko, University of Pittsburgh (U.S.A.)
Amro El-Jaroudi, University of Pittsburgh (U.S.A.)
J. Robert Boston, University of Pittsburgh (U.S.A.)

Volume 5, Page 4073

Abstract:

At ICASSP '96, we presented an algorithm that estimates the topology of a hidden Markov model (HMM) given a set of time series data. The algorithm iteratively prunes state transitions from a large general HMM topology and selects a topology based on a likelihood criterion and a heuristic evaluation of complexity. In this paper, we apply the algorithm to estimate the dynamic structure of human body motion data from a repetitive lifting task. The estimated topology for low back pain patients was different from the topology for a control subject group. The body motions of patients tend not to change over the task, but the body motions of control subjects change systematically.
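The pruning idea in this abstract can be illustrated with a minimal Python sketch. It shows only one plausible pruning step: zero the weakest remaining transition of a row-stochastic transition matrix and renormalise that row. The function name `prune_weakest_transition` and the "smallest probability first" rule are assumptions made for illustration; the abstract does not say how candidate transitions are chosen, and the likelihood criterion and complexity heuristic used to select the final topology are not reproduced here.

```python
def prune_weakest_transition(trans):
    """Return a copy of `trans` with its smallest nonzero entry removed.

    `trans` is a list of rows, each row a list of transition probabilities
    that sums to 1. Rows with only one remaining transition are skipped so
    that every state keeps at least one way out. Returns None when nothing
    more can be pruned. (Hypothetical helper, not the authors' procedure.)
    """
    best = None  # (probability, row index, column index)
    for i, row in enumerate(trans):
        if sum(1 for p in row if p > 0) < 2:
            continue
        for j, p in enumerate(row):
            if p > 0 and (best is None or p < best[0]):
                best = (p, i, j)
    if best is None:
        return None
    _, i, j = best
    pruned = [list(row) for row in trans]
    pruned[i][j] = 0.0
    total = sum(pruned[i])
    pruned[i] = [p / total for p in pruned[i]]  # keep the row stochastic
    return pruned


# Start from a fully connected 3-state topology and prune step by step.
topology = [
    [0.70, 0.20, 0.10],
    [0.05, 0.90, 0.05],
    [0.30, 0.30, 0.40],
]
candidates = [topology]
while True:
    nxt = prune_weakest_transition(candidates[-1])
    if nxt is None:
        break
    candidates.append(nxt)
```

In a full procedure each entry of `candidates` would be re-trained and scored on the data before one topology is selected; that scoring step is deliberately left out because the abstract does not specify it.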

ic974073.pdf

The Puzzle Project: A Case Study in Multimedia Signal Processing

Authors:

Jacob Griesbach, University of Colorado (U.S.A.)
Julie Wiejaczka, University of Colorado (U.S.A.)
Radu Frangopol, University of Colorado (U.S.A.)
Fransiska Harsono, University of Colorado (U.S.A.)
Delores Etter, University of Colorado (U.S.A.)

Volume 5, Page 4077

Abstract:

The Puzzle Project is an interactive software system that solves jigsaw puzzles. The voice interface includes speech synthesis and word recognition. The attributes of the puzzle pieces are determined using image processing techniques and wavelet decomposition. Two algorithms are used to solve the puzzles: an expert system and fuzzy logic. This paper describes the steps required to find the solution to the puzzle from image processing to decision-making algorithms. It also explains the techniques involved in designing the voice interface.

ic974077.pdf

An Analog VLSI Architecture for Auditory-Based Feature Extraction

Authors:

Nagendra Kumar, JHU (U.S.A.)
Wolfgang Himmelbauer, JHU (U.S.A.)
Gert Cauwenberghs, JHU (U.S.A.)
Andreas Andreou, JHU (U.S.A.)

Volume 5, Page 4081

Abstract:

We have developed a low-power analog VLSI chip for real-time signal processing motivated by the principles of the human auditory system. An analog cochlear filter bank, implemented on the chip, decomposes the input audio signal into several frequency bands that have almost equal bandwidth on a log scale. This step is thus similar to computing the wavelet transform. The chip then computes the signal energies and zero-crossing time intervals of the frequency components in the cochlear filter bank. The chip is intended to work as the front end of a speech recognition system. We include experimental results on a VLSI implementation of the auditory front end. We present speech recognition results on the TI-DIGITS database obtained from computer simulations that model the functionality of the feature extraction VLSI hardware. We use Hidden Markov Models (HMM) in combination with Linear Discriminant Analysis (LDA) for the recognizer design.
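The two per-band features named in this abstract, signal energy and zero-crossing time intervals, can be sketched in software as follows. This is a digital illustration only, assuming one filter-bank output is already available as a list of samples; the function names, the sample rate parameter and the linear interpolation of crossing times are assumptions, and the analog circuit on the chip works differently.

```python
import math


def band_energy(samples):
    """Sum of squared samples for one frequency band."""
    return sum(s * s for s in samples)


def zero_crossing_intervals(samples, sample_rate_hz):
    """Time in seconds between successive sign changes of the band signal.

    A sample is treated as positive when it is >= 0. Crossing instants are
    refined by linear interpolation between the two neighbouring samples.
    """
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if (a < 0 <= b) or (a >= 0 > b):
            frac = a / (a - b)  # position of the crossing between i-1 and i
            crossings.append((i - 1 + frac) / sample_rate_hz)
    return [t2 - t1 for t1, t2 in zip(crossings, crossings[1:])]


# Example band output: a 100 Hz tone sampled at 8 kHz for 0.1 s.
rate = 8000.0
tone = [math.sin(2.0 * math.pi * 100.0 * k / rate + 0.3) for k in range(800)]
energy = band_energy(tone)
intervals = zero_crossing_intervals(tone, rate)
```

For a pure tone the intervals cluster around half the tone period (0.005 s here), which is why interval statistics carry frequency information within each band.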

ic974081.pdf

A Structural Approach for Designing Performance Enhanced Signal Processors: A 1-MIPS GSM Fullrate Vocoder Case Study

Authors:

Matthias H. Weiss, Technical University of Dresden (Germany)
Ulrich Walther, Technical University of Dresden (Germany)
Gerhard P. Fettweis, Technical University of Dresden (Germany)

Volume 5, Page 4085

Abstract:

Recent performance-enhanced DSP (Digital Signal Processor) architectures incorporate either datapath add-ons such as dual-MAC architectures or tailored datapaths such as Viterbi accelerators. Both strategies strongly influence the instruction set architecture (ISA). Since common ISAs are not designed for architectural enhancements, either a complete redesign is required or the enhancements cannot be fully exploited by the ISA. Taking the GSM Fullrate Vocoder as a case study, this paper presents a structural approach to applying datapath add-ons or tailoring to increase DSP performance. To utilize architectural enhancements efficiently, we propose a modified VLIW (very long instruction word) ISA called TVLIW (tagged VLIW). TVLIW combines VLIW performance with DSP code-width requirements. To demonstrate its applicability, we applied the TVLIW ISA to a highly pipelined quadruple-MAC architecture incorporating only one dual-port RAM and a 26-bit-wide instruction word.

ic974085.pdf

Modified Adaptive Multi-user Detector for DS-CDMA Multipath Fading

Authors:

Amit Dutta, Oregon State University (U.S.A.)
Sayfe Kiaei, Oregon State University (U.S.A.)

Volume 5, Page 4089

Abstract:

Fading is a critical issue for next-generation digital cellular systems using DS-CDMA. This paper addresses the problem of reducing the bit error rate (BER) in the presence of multipath fading. A new method is proposed based on adaptive near-far resistant demodulation techniques. It can be modified to eliminate the detrimental effect of fading in the presence of power control. In addition, this method will drastically reduce hardware complexity and increase cell capacity for digital cellular systems.

ic974089.pdf

A Gaussian Sum Filtering Approach for Phase Ambiguity Resolution in GPS Attitude Determination

Authors:

Kenneth J. Turner, SPRC, Queensland University of Technology (Australia)
Farhan A. Faruqi, SPRC, Queensland University of Technology (Australia)

Volume 5, Page 4093

Abstract:

The problem of phase ambiguity resolution and filtering for interferometric GPS attitude determination is considered. Traditionally, the resolution of the phase ambiguity and the filtering stages were performed separately, with the filter formulated on the basis that the phase ambiguity is correctly resolved. Should the pre-processing stage not resolve the ambiguity correctly, erroneous results may occur. In response, a unified solution is proposed in which the ambiguity resolution and filtering processes are combined under a Gaussian Sum Filtering (GSF) framework. The GSF naturally accounts for the measurement ambiguity by generating multi-modal probability densities, which leads to a probabilistic interpretation of the attitude estimates. Simulations are performed to illustrate the effectiveness and functionality of the proposed solution.
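To make the Gaussian sum idea concrete, the sketch below performs one measurement update on a scalar toy problem in Python. It is not the attitude model of the paper: the state is a single path difference expressed in wavelengths, the measurement is its fractional part plus noise, and the unknown whole number of cycles is drawn from a candidate list. The function name `gsf_update`, the scalar model and all numeric values are assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a scalar normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gsf_update(components, z, meas_var, integer_candidates):
    """One Gaussian sum measurement update with an integer ambiguity.

    components: list of (weight, mean, variance) describing the prior.
    z: measured fractional phase, in cycles.
    Each prior component is combined with each integer hypothesis n through
    a scalar Kalman update that treats z + n as a direct measurement of the
    state. Weights are scaled by the innovation likelihood and normalised.
    """
    updated = []
    for w, m, p in components:
        for n in integer_candidates:
            y = z + n             # measurement under hypothesis n
            s = p + meas_var      # innovation variance
            k = p / s             # Kalman gain
            updated.append((w * gaussian_pdf(y, m, s),
                            m + k * (y - m),
                            (1.0 - k) * p))
    total = sum(w for w, _, _ in updated)
    return [(w / total, m, p) for w, m, p in updated]


# Broad prior around 2.3 cycles, measured fractional phase 0.25 cycles.
posterior = gsf_update([(1.0, 2.3, 1.0)], z=0.25, meas_var=0.01,
                       integer_candidates=range(5))
best_weight, best_mean, best_var = max(posterior, key=lambda c: c[0])
```

The posterior keeps one narrow component per integer hypothesis, so the density is multi-modal and the weights can be read as the probability of each ambiguity candidate. A practical filter would also propagate the components through a dynamic model and merge or discard low-weight components, which is omitted here.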

ic974093.pdf

FFT-based cross-covariance processing of optical signals for speed and length measurement

Authors:

Fabien Claveau, INO/NOI (Canada)
Michel Poirier, INO/NOI (Canada)
Denis Gingras, INO/NOI (Canada)

Volume 5, Page 4097

Abstract:

The National Optics Institute has recently developed an optical velocimeter composed of two parallel laser beams for measuring the speed and length of vehicles crossing perpendicular to the beams. The system must be capable of measuring speeds from 0 to 150 km/h in both directions with an accuracy of 1%. This paper focuses on the algorithms and signal processing aspects of the system. The speed is measured by estimating the time delay between the signals generated by the optical velocimeter using an FFT-based cross-covariance method. The length is estimated using the speed and the time window corresponding to the entire vehicle. The measurement algorithms have been implemented to run in real time on a C31 DSP and a 486 processor.
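The delay estimation described in this abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the DSP implementation from the paper: the small recursive radix-2 FFT, the function names, the beam spacing of 0.5 m and the 1 kHz sample rate are all hypothetical. The two beam signals are mean-removed, zero-padded, multiplied in the frequency domain, and the lag of the cross-covariance peak is taken as the delay.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def delay_in_samples(a, b):
    """Lag, in samples, by which signal b trails signal a.

    Both signals must have the same length. Zero-padding to at least twice
    that length avoids circular wrap-around in the cross-covariance.
    """
    n = len(a)
    size = 1
    while size < 2 * n:
        size *= 2
    mean_a, mean_b = sum(a) / n, sum(b) / n
    pa = [v - mean_a for v in a] + [0.0] * (size - n)
    pb = [v - mean_b for v in b] + [0.0] * (size - n)
    fa, fb = fft(pa), fft(pb)
    cross = fft([fa[k].conjugate() * fb[k] for k in range(size)], inverse=True)
    peak = max(range(size), key=lambda i: cross[i].real)
    return peak if peak < size // 2 else peak - size  # upper half = negative lags


def speed_m_per_s(delay_samples, sample_rate_hz, beam_spacing_m):
    """Speed from the inter-beam delay; the sign gives the direction.

    A zero delay (no measurable motion) is not handled in this sketch.
    """
    return beam_spacing_m * sample_rate_hz / delay_samples


def vehicle_length_m(speed, occupancy_s):
    """Length from speed and the time window during which one beam is blocked."""
    return abs(speed) * occupancy_s


# Synthetic beam signals: the same pulse seen 25 samples later on beam 2.
beam1 = [math.exp(-0.5 * ((i - 100) / 5.0) ** 2) for i in range(256)]
beam2 = [math.exp(-0.5 * ((i - 125) / 5.0) ** 2) for i in range(256)]
lag = delay_in_samples(beam1, beam2)
speed = speed_m_per_s(lag, sample_rate_hz=1000.0, beam_spacing_m=0.5)
```

With these assumed values a 25-sample lag corresponds to 0.025 s between beams and a speed of 20 m/s (72 km/h). Swapping the two inputs flips the sign of the lag, which is how travel direction would be distinguished in this sketch.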

ic974097.pdf

SAR Target Detection Algorithms on Linear SIMD Arrays

Authors:

William Phillips, University of Maryland (U.S.A.)
Rama Chellappa, University of Maryland (U.S.A.)

Volume 5, Page 4101

Abstract:

Constant False Alarm Rate (CFAR) detection in Synthetic Aperture Radar (SAR) is the first step in most ATR and image exploitation systems. In this paper, several CFAR algorithms and their implementation on a 1-D SIMD array processor are investigated. We focus primarily on CFAR algorithms using the Weibull clutter model, but algorithms assuming K-distributed clutter should have similar implementations and runtimes. We show that high-resolution SAR requires reference windows much larger than those used in traditional search radars, which permits fast moment-based estimation instead of computationally intensive maximum likelihood parameter estimation. We also extend a fast median filtering algorithm to the order statistic and censored CFAR algorithms. The running times of the CFAR algorithms are listed along with detection results using SAR imagery from the Northrop-Grumman TESAR sensor onboard the Predator unmanned aerial vehicle.
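As an illustration of moment-based CFAR with Weibull clutter, the Python sketch below estimates the Weibull scale and shape from the mean and variance of the log-amplitudes in a reference window, then sets the threshold from the Weibull tail probability. This is one standard moment-style estimator and is an assumption here; the abstract does not state which moments the authors use. The code is a serial 1-D example with hypothetical names and window sizes, not the SIMD array implementation.

```python
import math

EULER_GAMMA = 0.5772156649015329


def weibull_from_log_moments(samples):
    """Estimate Weibull (scale, shape) from positive, non-constant samples.

    For Weibull data, ln(x) has mean ln(scale) - gamma/shape and variance
    pi^2 / (6 * shape^2), so both parameters follow from two log-moments.
    """
    logs = [math.log(s) for s in samples]
    n = len(logs)
    mean = sum(logs) / n
    var = sum((v - mean) ** 2 for v in logs) / n
    shape = math.pi / math.sqrt(6.0 * var)
    scale = math.exp(mean + EULER_GAMMA / shape)
    return scale, shape


def cfar_threshold(reference, pfa):
    """Threshold T with P(clutter > T) = pfa under the fitted Weibull model."""
    scale, shape = weibull_from_log_moments(reference)
    return scale * (-math.log(pfa)) ** (1.0 / shape)


def detect(row, guard, half_window, pfa):
    """Indices of cells in a 1-D row that exceed their local CFAR threshold.

    The reference window holds up to `half_window` cells on each side of the
    cell under test, skipping `guard` cells next to it and the cell itself.
    """
    hits = []
    for i in range(len(row)):
        lo, hi = i - guard - half_window, i + guard + half_window
        reference = [row[j] for j in range(lo, hi + 1)
                     if 0 <= j < len(row) and abs(j - i) > guard]
        if len(reference) < 2:
            continue
        if row[i] > cfar_threshold(reference, pfa):
            hits.append(i)
    return hits
```

A typical call would be `detect(row, guard=2, half_window=32, pfa=1e-3)` on one range line of amplitude data. The estimator needs only running sums of ln(x) and ln(x)^2 over the window, which is the kind of update that large reference windows make attractive compared with iterative maximum likelihood fitting.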

ic974101.pdf

An FPGA-based Data Acquisition System for a 95 GHz W-band Radar

Authors:

Michael Petronino, University of Massachusetts (U.S.A.)
Ray Bambha, University of Massachusetts (U.S.A.)
James Carswell, University of Massachusetts (U.S.A.)
Wayne P. Burleson, University of Massachusetts (U.S.A.)

Volume 5, Page 4105

Abstract:

We describe a 95 GHz radar for an unmanned aerial vehicle (UAV). The radar measures vertical profiles of the reflectivity and Doppler velocity of clouds, which are then telemetered to the ground for storage. Limited telemetry bandwidth requires that substantial real-time data processing be done on the UAV in a low-power (less than 100 watts) and small (less than 1 cubic foot) system. A prototype was developed in less than a year, which required a flexible programmable technology. Although typical remote sensing radars use DSP chips, it was determined that our power, size, performance and design-time requirements were best met using FPGA technology. Our system is based on the Giga-Ops Spectrum system, which uses Xilinx FPGAs on a novel modular PCI board. Unlike numerous recent FPGA-based signal processors, this presents a new class of applications and embedded system requirements. Reconfigurable capabilities are currently being explored to support radar algorithms that can adapt to a changing environment.