17-4 Digit Recognition: Changing Acoustic Models

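In the previous sections, each Chinese syllable served as one acoustic model. In this section, we split each syllable into phones and use each phone as an acoustic model. This decomposition is called monophone modeling, as opposed to right-context-dependent (RCD) biphones. The syllable-to-phone decomposition is recorded in digitMonophone.pam, as follows: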

Source file (htk/chineseDigitRecog/training/digitMonophone.pam):
ba	b a
er	er
jiou	j i o u
ling	l i ng
liou	l i o u
qi	q i
san	s a n
si	s i
sil	sil
wu	w u
i	i
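To make the mapping concrete, here is a minimal Python sketch. It is not part of the original MATLAB/HTK toolbox. The sketch parses lines in the .pam layout above (a syllable followed by its phones) and expands a syllable sequence into monophones. For contrast, it also builds right-context biphone labels in HTK's `p+r` notation. The function names `parse_pam`, `to_monophones`, and `to_rcd_biphones` are hypothetical. Taking the right context only from phones inside the same syllable is an assumption made for this illustration.

```python
def parse_pam(text):
    """Return {syllable: [phone, ...]} from .pam-style lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        table[fields[0]] = fields[1:]  # syllable -> its phones
    return table


def to_monophones(syllables, table):
    """Concatenate the phone expansion of each syllable."""
    phones = []
    for syl in syllables:
        phones.extend(table[syl])
    return phones


def to_rcd_biphones(syllables, table):
    """HTK-style right-context labels 'p+r'; context taken within each syllable only (assumption)."""
    labels = []
    for syl in syllables:
        phones = table[syl]
        for k, p in enumerate(phones):
            if k + 1 < len(phones):
                labels.append(f"{p}+{phones[k + 1]}")
            else:
                labels.append(p)  # last phone of a syllable gets no right context here
    return labels


PAM = """ba\tb a
er\ter
jiou\tj i o u
ling\tl i ng
liou\tl i o u
qi\tq i
san\ts a n
si\ts i
sil\tsil
wu\tw u
i\ti
"""

table = parse_pam(PAM)
print(to_monophones(["sil", "san", "ba", "sil"], table))
# ['sil', 's', 'a', 'n', 'b', 'a', 'sil']
print(to_rcd_biphones(["san", "ba"], table))
# ['s+a', 'a+n', 'n', 'b+a', 'a']
```

In the monophone case, the "a" in "san" and the "a" in "ba" share one model. The biphone labels keep them apart (a+n vs. a).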

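Therefore, training and recognition-rate testing only require two changes to the examples in the previous sections. First, replace digitSyl.pam with digitMonophone.pam. Second, give names for the phone-level MLF file and the monophone list (phoneMlfFile and mnlFile). The following example shows this: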

Example 1: htk/chineseDigitRecog/training/goMonophone13.m
htkPrm=htkParamSet;                          % default parameters (13-dim MFCC_E features)
htkPrm.pamFile='digitMonophone.pam';         % syllable-to-phone table
htkPrm.phoneMlfFile='digitMonophone.mlf';    % phone-level MLF produced from the syllable MLF
htkPrm.mnlFile='digitMonophone.mnl';         % monophone list produced during training
disp(htkPrm)
[trainRR, testRR]=htkTrainTest(htkPrm);      % train, then test on inside and outside data
fprintf('Inside test = %g%%, outside test = %g%%\n', trainRR, testRR);

Output:
pamFile: 'digitMonophone.pam'
feaCfgFile: 'mfcc.cfg'
waveDir: '..\waveFile'
sylMlfFile: 'digitSyl.mlf'
phoneMlfFile: 'digitMonophone.mlf'
mnlFile: 'digitMonophone.mnl'
grammarFile: 'digit.grammar'
feaType: 'MFCC_E'
feaDim: 13
mixtureNum: 3
stateNum: 3
streamWidth: 13
Pruning-Off
Pruning-Off
Pruning-Off
Pruning-Off
Pruning-Off
Inside test = 79.24%, outside test = 75.89%

The monophone list generated by this run is as follows:

Source file (htk/chineseDigitRecog/training/output/digitMonophone.mnl):
sil
l
i
ng
er
s
a
n
w
u
o
q
b
j
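The list above contains exactly the 14 distinct phones that appear in digitMonophone.pam. The Python sketch below shows how such a list could be derived. It assumes the list is ordered by first appearance in the training transcriptions, which is a guess, not something stated in the original. The helper `monophone_list` and the sample utterance are hypothetical.

```python
# Syllable-to-phone table copied from digitMonophone.pam above.
PAM = {
    "ba": ["b", "a"], "er": ["er"], "jiou": ["j", "i", "o", "u"],
    "ling": ["l", "i", "ng"], "liou": ["l", "i", "o", "u"], "qi": ["q", "i"],
    "san": ["s", "a", "n"], "si": ["s", "i"], "sil": ["sil"],
    "wu": ["w", "u"], "i": ["i"],
}


def monophone_list(transcriptions, pam):
    """Return the distinct phones, ordered by their first appearance."""
    phones = (p for utt in transcriptions for syl in utt for p in pam[syl])
    # dict keeps insertion order (Python 3.7+), so this de-duplicates stably
    return list(dict.fromkeys(phones))


# Hypothetical training transcription (one utterance of syllable labels)
utt = "sil ling er san si wu liou qi ba jiou sil".split()
mnl = monophone_list([utt], PAM)
print("\n".join(mnl))  # one phone per line, like a .mnl file
```

For this hypothetical utterance, the output follows the same order as the list above. With real data, the order depends on the training transcriptions.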

The corresponding monophone-level MLF file is as follows:

Example (htk/chineseDigitRecog/training/output/digitMonophone.mlf):
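The Python sketch below gives a rough idea of what this phone-level MLF contains. It rewrites a syllable-level MLF (such as digitSyl.mlf) into phone labels using the .pam table. The sketch makes two assumptions: labels carry no time stamps, and each line holds one label. Labels sit between the standard `#!MLF!#` header, quoted label-file names, and the `.` terminator. The function `expand_mlf` and the file name `*/a001.lab` are hypothetical. In HTK, this kind of word-to-phone expansion is normally performed by HLEd's EX command with a dictionary.

```python
# Syllable-to-phone table copied from digitMonophone.pam above.
PAM = {
    "ba": ["b", "a"], "er": ["er"], "jiou": ["j", "i", "o", "u"],
    "ling": ["l", "i", "ng"], "liou": ["l", "i", "o", "u"], "qi": ["q", "i"],
    "san": ["s", "a", "n"], "si": ["s", "i"], "sil": ["sil"],
    "wu": ["w", "u"], "i": ["i"],
}


def expand_mlf(syl_mlf, pam):
    """Rewrite a syllable-level MLF (no time stamps, assumed) as a phone-level MLF."""
    out = []
    for line in syl_mlf.splitlines():
        label = line.strip()
        if not label:
            continue
        if label == "#!MLF!#" or label.startswith('"') or label == ".":
            out.append(label)        # header, label-file name, or end-of-file marker
        else:
            out.extend(pam[label])   # one syllable label -> its phones
    return "\n".join(out) + "\n"


syl_mlf = '#!MLF!#\n"*/a001.lab"\nsil\nsan\nba\nsil\n.\n'
print(expand_mlf(syl_mlf, PAM), end="")
```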

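To use 26-dimensional MFCC features instead, see the following example. The feature type is MFCC_E_D_Z: the 13 static features of MFCC_E plus their 13 delta coefficients. In HTK, the _Z qualifier denotes zero-mean (cepstral mean normalized) static coefficients.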

Example 2: htk/chineseDigitRecog/training/goMonoPhone26.m
htkPrm=htkParamSet;
htkPrm.pamFile='digitMonophone.pam';
htkPrm.phoneMlfFile='digitMonophone.mlf';
htkPrm.mnlFile='digitMonophone.mnl';
htkPrm.feaCfgFile='mfcc26.cfg';      % feature extraction config for 26-dim features
htkPrm.feaType='MFCC_E_D_Z';         % static + delta, zero-mean
htkPrm.feaDim=26;
htkPrm.streamWidth=[26];             % one stream holding all 26 dimensions
disp(htkPrm)
[trainRR, testRR]=htkTrainTest(htkPrm);
fprintf('Inside test = %g%%, outside test = %g%%\n', trainRR, testRR);

Output:
pamFile: 'digitMonophone.pam'
feaCfgFile: 'mfcc26.cfg'
waveDir: '..\waveFile'
sylMlfFile: 'digitSyl.mlf'
phoneMlfFile: 'digitMonophone.mlf'
mnlFile: 'digitMonophone.mnl'
grammarFile: 'digit.grammar'
feaType: 'MFCC_E_D_Z'
feaDim: 26
mixtureNum: 3
stateNum: 3
streamWidth: 26
Pruning-Off
Pruning-Off
Pruning-Off
Pruning-Off
Pruning-Off
Inside test = 83.71%, outside test = 87.5%

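To use 39-dimensional MFCC features, see the following example. The feature type is MFCC_E_D_A_Z, which further appends 13 acceleration coefficients.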

Example 3: htk/chineseDigitRecog/training/goMonoPhone39.m
htkPrm=htkParamSet;
htkPrm.pamFile='digitMonophone.pam';
htkPrm.phoneMlfFile='digitMonophone.mlf';
htkPrm.mnlFile='digitMonophone.mnl';
htkPrm.feaCfgFile='mfcc39.cfg';      % feature extraction config for 39-dim features
htkPrm.feaType='MFCC_E_D_A_Z';       % static + delta + acceleration, zero-mean
htkPrm.feaDim=39;
htkPrm.streamWidth=[39];             % one stream holding all 39 dimensions
disp(htkPrm)
[trainRR, testRR]=htkTrainTest(htkPrm);
fprintf('Inside test = %g%%, outside test = %g%%\n', trainRR, testRR);

Output:
pamFile: 'digitMonophone.pam'
feaCfgFile: 'mfcc39.cfg'
waveDir: '..\waveFile'
sylMlfFile: 'digitSyl.mlf'
phoneMlfFile: 'digitMonophone.mlf'
mnlFile: 'digitMonophone.mnl'
grammarFile: 'digit.grammar'
feaType: 'MFCC_E_D_A_Z'
feaDim: 39
mixtureNum: 3
stateNum: 3
streamWidth: 39
Pruning-Off
Pruning-Off
Pruning-Off
Pruning-Off
Pruning-Off
Inside test = 84.6%, outside test = 89.29%
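The feaDim values in the three examples (13, 26, 39) follow from the HTK parameter kinds. The Python sketch below reproduces them. It assumes 12 cepstral coefficients (HTK's default NUMCEPS) plus one log-energy term from _E. The helper `feature_dim` is hypothetical and handles only the qualifiers used in this section (_E, _D, _A, _Z).

```python
def feature_dim(param_kind, num_ceps=12):
    """Vector size of an HTK MFCC parameter kind such as 'MFCC_E_D_A_Z'."""
    base, *quals = param_kind.split("_")
    supported = {"E", "D", "A", "Z"}
    if base != "MFCC" or not set(quals) <= supported:
        raise ValueError(f"unsupported parameter kind: {param_kind}")
    if "A" in quals and "D" not in quals:
        raise ValueError("_A (acceleration) requires _D (delta) in HTK")
    static = num_ceps + (1 if "E" in quals else 0)  # cepstra + log energy
    blocks = 1                                      # static coefficients
    if "D" in quals:
        blocks += 1                                 # delta coefficients
    if "A" in quals:
        blocks += 1                                 # acceleration coefficients
    # _Z (zero-mean static coefficients) does not change the vector size
    return static * blocks


for kind in ["MFCC_E", "MFCC_E_D_Z", "MFCC_E_D_A_Z"]:
    print(kind, feature_dim(kind))
# MFCC_E 13
# MFCC_E_D_Z 26
# MFCC_E_D_A_Z 39
```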


Audio Signal Processing and Recognition (音訊處理與辨識)