Abstract:
With the advance of modern high-speed computers, we can now try
computation-intensive approaches that were once deemed too inefficient
for practical problems. These approaches include adaptive learning
systems such as artificial neural networks and adaptive networks,
and innovative optimization techniques such as genetic algorithms (GA)
and simulated annealing.
These approaches, together with fuzzy set theory as a knowledge
representation tool, constitute what is known as soft computing,
which has been applied to real-world problems such as character
recognition, color recipe prediction, and adaptive control.
This project applies the aforementioned soft computing techniques
to a challenging real-world problem: automatic speaker recognition
(ASR). Given a speech input, the objective of ASR is to output
the identity of the person most likely to have spoken.
One application of ASR is to enhance human-machine interfaces.
For instance, a voice-activated computer could adapt and respond to
its current user. ASR also has many security applications, such as
verifying identity when someone enters a building or accesses a bank
account. Moreover, ASR offers the convenience of easy data collection
over the telephone.
This project emphasizes both research and software/hardware
implementation.
ASR is a difficult problem in pattern recognition. It typically
involves a huge amount of data, so digital signal processing
techniques are needed to reduce the data dimension and extract
relevant features for subsequent classification or discriminant
analysis. For such a difficult problem, a single approach is usually
not enough; we need a collection of complementary methodologies to
accomplish the task.
For the research part, we will tackle ASR with both soft-computing
techniques and conventional statistical pattern recognition.
We have been working on neuro-fuzzy and soft-computing techniques
for several years and the applications include
time series prediction, data classification, nonlinear system
identification, noise cancellation, channel equalization,
adaptive control, printed character recognition,
and inverse kinematics problems.
We shall apply the soft-computing techniques (neural networks, fuzzy
logic, adaptive neuro-fuzzy systems, genetic algorithms and simulated
annealing) we have developed over the years to ASR, and complement
them with conventional statistical pattern recognition such as the
Bayesian approach.
For software implementation, our primary tools are MATLAB and C.
MATLAB is an integrated environment for scientific computation and
data visualization. We have had positive experience using MATLAB
to deliver a GUI-based fuzzy product [], and we expect to deliver a
GUI-based demo as the product of this project. For computation-intensive
and non-vectorizable operations, we will resort to C for speed.
For hardware implementation, our goal is to set up a hardware system
using a Pentium PC and a dSPACE 1102 controller board that acquires
the audio signal from a speaker, performs FFT and feature extraction,
feeds the features to a trained classifier, and returns the identity
of the speaker on the fly. The whole process is time-consuming;
on-line identification is virtually impossible without hardware
support.
To sum up, this project is well balanced between research and
implementation. Research on soft computing and statistical approaches
to speaker recognition paves the way toward the more difficult problem
of speech recognition. The hardware implementation will demonstrate
the feasibility of the approach and provide a platform for further
exploration and possible commercialization.