- Chinese title: 語者辨認
- English title: Speaker Recognition
- Project number: NSC 86-2213-E-007-048
- Principal investigator: 張智星
- Funding agency: National Science Council (國科會)
- Project period: 1996/8/1 to 1997/7/31
- Keywords: Speaker recognition, pattern recognition, neuro-fuzzy modeling, artificial neural networks, fuzzy logic, digital signal processing
With the advance of modern high-speed computers, we can now try computation-intensive approaches that were once deemed too inefficient for practical problems. These approaches include adaptive learning systems such as artificial neural networks and adaptive networks, and innovative optimization techniques such as genetic algorithms (GA) and simulated annealing. Together with fuzzy set theory as a knowledge representation tool, they form the constituents of so-called soft computing, which has been applied to real-world problems such as character recognition, color recipe prediction, and adaptive control.
This project applies the aforementioned soft computing techniques to a challenging real-world problem: automatic speaker recognition (ASR). Given a speech input, the objective of ASR is to output the identity of the person most likely to have spoken it. One application of ASR is to enhance the human-machine interface; for instance, a voice-activated computer should be programmed to adapt and respond to its current user. Security applications of ASR are plentiful, for instance identity checks when entering a building or accessing a bank account. Moreover, ASR has the convenience of easy data collection over the telephone.
This project emphasizes both research and software/hardware implementation. ASR is a difficult problem in pattern recognition. It typically involves a huge amount of data, so we need digital signal processing techniques to reduce the data dimension and extract relevant features for subsequent data classification or discriminant analysis. For such a difficult problem, a single approach is usually not enough; we need a collection of methodologies that complement each other to accomplish the task.
For the research part, we will tackle ASR with both soft-computing techniques and conventional statistical pattern recognition. We have worked on neuro-fuzzy and soft-computing techniques for several years, with applications including time series prediction, data classification, nonlinear system identification, noise cancellation, channel equalization, adaptive control, printed character recognition, and inverse kinematics problems. We shall apply the experience we have gained with these techniques (neural networks, fuzzy logic, adaptive neuro-fuzzy systems, genetic algorithms, and simulated annealing) to ASR, and complement them with conventional statistical pattern recognition such as the Bayesian approach.
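To make the statistical side concrete, the following is a minimal sketch of Bayesian classification over feature vectors. It assumes each speaker's features follow an independent (diagonal) Gaussian per dimension, which is an illustrative choice and not a model stated in the project description. The function names, the `var_floor` parameter, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_bayes(samples, labels, var_floor=1e-6):
    """Estimate a prior, per-feature mean, and per-feature variance per speaker.

    Assumption: features are modeled as independent Gaussians per speaker.
    """
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)

    total = len(samples)
    model = {}
    for speaker, rows in grouped.items():
        n = len(rows)
        dim = len(rows[0])
        mean = [sum(r[d] for r in rows) / n for d in range(dim)]
        # Floor the variance so a constant feature cannot cause division by zero.
        var = [
            max(sum((r[d] - mean[d]) ** 2 for r in rows) / n, var_floor)
            for d in range(dim)
        ]
        model[speaker] = (n / total, mean, var)
    return model


def log_posteriors(model, x):
    """Return unnormalized log P(speaker) + log p(x | speaker) for each speaker."""
    scores = {}
    for speaker, (prior, mean, var) in model.items():
        score = math.log(prior)
        for xd, m, v in zip(x, mean, var):
            score += -0.5 * math.log(2 * math.pi * v) - (xd - m) ** 2 / (2 * v)
        scores[speaker] = score
    return scores


def classify(model, x):
    """Pick the speaker with the highest posterior score (Bayes decision rule)."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)


# Toy two-dimensional features for two hypothetical speakers "A" and "B".
train_x = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [4.0, 0.5], [4.2, 0.7], [3.9, 0.4]]
train_y = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_bayes(train_x, train_y)
print(classify(model, [1.1, 1.9]))
print(classify(model, [4.1, 0.6]))
```

Working in log space avoids numerical underflow when many feature dimensions are multiplied together, which matters once real speech features replace the toy vectors.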
For software implementation, our primary tools are MATLAB and C. MATLAB is an integrated environment for scientific computation and data visualization. We have had positive experience using MATLAB to deliver a GUI-based fuzzy product, and we expect a GUI-based demo to be one product of this project. For computation-intensive and non-vectorizable operations, we will resort to C for speed.
For hardware implementation, our goal is to set up a system using a Pentium PC and a dSPACE 1102 controller board that takes an audio signal from a speaker, performs FFT and feature extraction, feeds the features to a trained classifier, and returns the identity of the speaker on the fly. The whole process is time-consuming; on-line identification is virtually impossible without hardware support.
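The processing chain described above (audio frame, spectrum, features, classifier, identity) can be sketched in software as follows. This is an illustration under stated assumptions, not the project's implementation: it uses a naive O(N²) DFT in place of an FFT (the resulting spectrum is the same, only slower to compute), log band energies as a hypothetical feature set, and a nearest-centroid rule standing in for the trained classifier. The synthetic sine "speakers" and all function names are invented for the example.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive DFT; returns magnitudes of the first N/2 frequency bins.

    A real-time system would use an FFT here; the output is the same.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    return mags


def band_features(frame, n_bands=4):
    """Log energy in equal-width frequency bands (hypothetical feature set)."""
    mags = dft_magnitudes(frame)
    width = len(mags) // n_bands
    feats = []
    for b in range(n_bands):
        energy = sum(m * m for m in mags[b * width:(b + 1) * width])
        feats.append(math.log(energy + 1e-12))  # small offset avoids log(0)
    return feats


def nearest_speaker(centroids, feats):
    """Stand-in classifier: return the speaker whose centroid is closest."""
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(centroids[name], feats))

    return min(centroids, key=squared_distance)


def tone(freq_bin, n=64):
    """Synthetic frame: a sine wave centered on one DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


# "Training": one centroid per hypothetical speaker, from a single frame each.
centroids = {
    "speaker_low": band_features(tone(3)),
    "speaker_high": band_features(tone(27)),
}

# "Recognition": an unseen low-frequency frame is mapped to the closest centroid.
identity = nearest_speaker(centroids, band_features(tone(5)))
print(identity)
```

The inner DFT loop is the kind of computation-heavy step that motivates dedicated hardware or compiled code: its cost grows quadratically with frame length here, and even with an FFT it must finish within one frame period for on-line use.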
To sum up, this project is well balanced between research and implementation. We will benefit from the research on soft computing and statistical approaches for speaker recognition, which paves the way to the more difficult problem of speech recognition. The hardware implementation can prove feasibility and provide a demonstration for further exploration and possible commercialization.