In the previous section, we demonstrated how to use HTK for Mandarin digit recognition. In this and the following sections, we change various settings (such as the acoustic features and the acoustic model configuration) to improve the recognition rate.
For modularity, we have packed the basic training and test programs into an m-file function, htkTrainTest.m. This function takes a structure variable that specifies all the training parameters and generates the final test results.
Even if we keep the configuration of the acoustic models fixed, we can still change the acoustic features. In the previous section, we used 13-dimensional MFCC_E features. We can now change them to 26-dimensional MFCC_E_D or MFCC_E_D_Z, or go further to 39-dimensional MFCC_E_D_A or MFCC_E_D_A_Z. For simplicity, we use string representations for the various feature types, as explained next.
- E: Append energy.
- D: Apply the delta operator.
- A: Apply the acceleration operator.
- Z: Apply cepstral mean subtraction (CMS).
The first example uses 26-dimensional MFCC_E_D_Z for recognition.
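As a quick check on the dimension counts quoted for these feature strings, here is a minimal Python sketch. It is not part of htkTrainTest.m or the batch files; the function name `feature_dim` is hypothetical, and it assumes 12 cepstral coefficients per frame, which is what 13-dimensional MFCC_E implies once energy is appended. It covers only the four qualifiers listed above.

```python
def feature_dim(kind, num_ceps=12):
    """Return the vector size implied by a feature string such as 'MFCC_E_D_A'.

    Assumption: num_ceps static cepstral coefficients per frame
    (12 here, so that MFCC_E comes out as 13).
    """
    base, *qualifiers = kind.split("_")
    if base != "MFCC":
        raise ValueError("this sketch only handles MFCC-based strings")
    unknown = set(qualifiers) - {"E", "D", "A", "Z"}
    if unknown:
        raise ValueError(f"qualifiers not covered by this sketch: {sorted(unknown)}")

    # E appends one energy term to the static coefficients.
    static = num_ceps + (1 if "E" in qualifiers else 0)
    # D and A each append one more block the same size as the static block.
    blocks = 1 + ("D" in qualifiers) + ("A" in qualifiers)
    # Z (mean subtraction) changes the values but not the vector size.
    return static * blocks


for kind in ["MFCC_E", "MFCC_E_D", "MFCC_E_D_Z", "MFCC_E_D_A", "MFCC_E_D_A_Z"]:
    print(kind, feature_dim(kind))
```

Under these assumptions the sketch gives 13, 26, 26, 39, and 39 for the five strings, matching the dimensions stated in the text.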
The batch file for the 26-dimensional example is goSyl26.bat.
Furthermore, a second example uses 39-dimensional MFCC_E_D_A_Z.
The corresponding batch file is goSyl39.bat.
Since the batch files have not been packed into functions, their contents look more complicated. In fact, only two lines change from goSyl13.bat to goSyl26.bat. You can use the following command to verify the difference:
fc goSyl13.bat goSyl26.bat
Similarly, you can use the same command to verify the difference between goSyl26.bat and goSyl39.bat.
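On systems without `fc`, a similar comparison can be sketched in Python with the standard `difflib` module. The helper name `changed_lines` is hypothetical, and the two strings below are invented placeholders, not the real contents of goSyl13.bat or goSyl26.bat; in practice you would read the two files from disk instead.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (removed, added) line lists, roughly what a file compare reports."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old[i1:i2])  # lines present only in the old file
            added.extend(new[j1:j2])    # lines present only in the new file
    return removed, added


# Invented stand-ins for two batch files that differ in exactly two lines.
old_bat = "rem placeholder script\nset FEATURE=MFCC_E\nset DIM=13\necho done"
new_bat = "rem placeholder script\nset FEATURE=MFCC_E_D_Z\nset DIM=26\necho done"

removed, added = changed_lines(old_bat, new_bat)
print("removed:", removed)
print("added:  ", added)

# With real files, the texts could be read like this:
# old_bat = open("goSyl13.bat").read()
# new_bat = open("goSyl26.bat").read()
```

Comparing opcodes rather than parsing diff output keeps the helper independent of any particular diff text format.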
Audio Signal Processing and Recognition (音訊處理與辨識)