- (**) Mandarin digit recognition: Follow the steps in goSyl13.m to obtain the inside-test and outside-test recognition rates. The TA will provide the training and test corpora in class. Most of the work in this exercise is preparing the related scripts and parameter files, so your task reduces to writing an m-file script, goGenFile4htk.m, that collects information about the wave files and generates the following files for HTK training:
  - digitSyl.mlf
  - wav2fea.scp
  - trainFea.scp and testFea.scp

  Please be aware of the following facts:
  - Feature file names cannot contain Chinese characters, because HTK does not support them.
  - Every feature file name must be unique, so you need to convert the Chinese directory names into numbers and combine them with the original file names. For instance, 912508鄒銘軒\3a_7436_16017.wav ===> 00002-3a_7436_16017.fea.
  - HTK is case sensitive, so make sure every file name is spelled with exactly the same case wherever it appears (in file lists and so on).

  Hint: You can use recursiveFileList.m to retrieve all wave files under a given directory.

  Then you can start training and computing recognition rates. For this part, you need to write an m-file script, goHtkTrainTest.m, to display the results. In particular, you need to compute the recognition rates for both inside and outside tests with the MFCC dimension set to 13, 26, and 39, respectively. Please show the confusion matrices to the TA during your demo. My results are:
  - Based on goSyl13.m: inside test 86.38%, outside test 79.51%.
  - Based on goSyl26.m: inside test 92.07%, outside test 87.25%.
  - Based on goSyl39.m: inside test 95.17%, outside test 89.53%.
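The naming rule for feature files in the first exercise (a numeric prefix replacing the Chinese directory name, followed by the original file name) can be sketched as below. This is only an illustration in Python, not the required m-file: the function names `build_feature_names` and `scp_lines`, the output directory `feature`, the directory `speakerA`, and the choice to number directories from 1 in order of first appearance are all assumptions. The sketch also assumes that wav2fea.scp holds one "source target" pair per line, which is the layout HCopy script files use.

```python
from pathlib import PureWindowsPath


def build_feature_names(wav_paths):
    """Map each wave path to a unique, ASCII-only feature file name.

    Every distinct parent directory gets a 5-digit index (assumed here to be
    1-based, in order of first appearance). The index is joined to the
    original file stem, whose letter case is left untouched.
    """
    dir_index = {}
    names = {}
    for wav in wav_paths:
        path = PureWindowsPath(wav)
        parent = str(path.parent)
        if parent not in dir_index:
            dir_index[parent] = len(dir_index) + 1
        names[wav] = f"{dir_index[parent]:05d}-{path.stem}.fea"
    return names


def scp_lines(names, fea_dir="feature"):
    """Return one 'source target' line per wave file (assumed scp layout)."""
    return [f"{wav} {fea_dir}/{fea}" for wav, fea in names.items()]


wavs = [
    r"speakerA\0a_0001.wav",            # hypothetical first directory
    r"912508鄒銘軒\3a_7436_16017.wav",   # example path from the exercise
]
names = build_feature_names(wavs)
print(names[wavs[1]])  # 00002-3a_7436_16017.fea
```

Because the prefix depends only on the directory, two files with the same name in different speaker directories still receive distinct feature file names.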
- (***) Programming contest: Mandarin digit recognition: Repeat the previous exercise, trying any methods you like to maximize performance, defined as the average of the inside-test and outside-test recognition rates. Record all related settings (dimension of the acoustic features, unit of the acoustic model, number of states, number of streams, number of mixtures, etc.) in method.txt, together with a description of your approach. Please upload the following files:
  - The method description file, method.txt
  - The final macro file, named "final.mac"
  - All other files needed to compute the recognition rates

  The TA will use these files to reproduce your recognition rates.
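The contest score, the average of the inside-test and outside-test recognition rates, can be computed from the two confusion matrices. The Python sketch below assumes that the recognition rate is the fraction of utterances on the diagonal of a confusion matrix whose rows are true classes and columns are recognized classes. The function names and the toy 2x2 matrices are made up for illustration and are not results from the exercise.

```python
def recognition_rate(confusion):
    """Fraction of correctly recognized utterances.

    confusion[i][j] counts utterances of class i recognized as class j,
    so the correct decisions lie on the diagonal.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no utterances")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def contest_score(inside_confusion, outside_confusion):
    """Average of the inside-test and outside-test recognition rates."""
    inside = recognition_rate(inside_confusion)
    outside = recognition_rate(outside_confusion)
    return (inside + outside) / 2


# Toy matrices with made-up counts, for illustration only.
inside = [[4, 0], [0, 4]]    # 8 of 8 correct
outside = [[3, 1], [1, 3]]   # 6 of 8 correct
print(f"{contest_score(inside, outside):.2%}")  # 87.50%
```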
- (**) English letter recognition: Repeat exercise 1 using the corpus of English letters instead. The TA will provide the training and test corpora in class.
- (**) Programming contest: English letter recognition: Repeat exercise 2 using the corpus of English letters instead. The TA will provide the training and test corpora in class.
Audio Signal Processing and Recognition (音訊處理與辨識)