## Chapter 12: Exercises

1. (**) Compute and plot the formants of Chinese vowels: Write a MATLAB program vowelFormantRecogChinese01.m to accomplish the following tasks:
1. Record a 5-second clip of the Chinese vowels 「ㄚ、ㄧ、ㄨ、ㄝ、ㄛ」 with 16KHz/16Bits/Mono. (Please pronounce as steadily as possible and pause briefly between vowels to facilitate the subsequent automatic segmentation. You can listen to this sample file for reference.)
2. Use wave2formant.m (in ASR Toolbox) to extract the first two formants, with formantNum=2, frameSize=20ms, frameStep=10ms, and lpcOrder=12. (These are also the default parameters of wave2formant.m.)
3. Use endPointDetect.m (in SAP Toolbox) to find the start and end positions of these 5 vowels. Please adjust the endpoint detection parameters so that your program can automatically segment the 5 vowels correctly. You can set plotOpt=1 so that endPointDetect.m plots the endpoint detection result, which should be similar to the following figure:
4. Plot the formants of these 5 vowels in the 2D space, using a different color for each vowel. Your plot should be similar to the following figure:
5. Use knncLoo.m (in Machine Learning Toolbox) to compute the leave-one-out recognition rate of KNNC when classifying the data into 5 classes.
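For reference, formant extraction of this kind is typically based on the angles of the LPC poles. The following is a minimal per-frame sketch of that idea on a synthetic frame; it is only an illustration with made-up signal content, not the actual implementation of wave2formant.m:

```matlab
% From-scratch sketch of LPC-based formant estimation for a single frame.
% The synthetic frame below stands in for one 20-ms frame of a real vowel.
fs = 16000; lpcOrder = 12;                        % parameters from the exercise
t = (0:0.02*fs-1)'/fs;                            % time axis of one 20-ms frame
frame = sin(2*pi*700*t) + 0.5*sin(2*pi*1200*t);   % synthetic vowel-like signal
a = lpc(frame.*hamming(length(frame)), lpcOrder); % all-pole (LPC) coefficients
p = roots(a);                                     % poles of the model 1/A(z)
p = p(imag(p) > 0);                               % keep one of each conjugate pair
freq = sort(angle(p)*fs/(2*pi));                  % pole angles -> frequencies in Hz
formant = freq(1:2);                              % first two formants (formantNum=2)
```

wave2formant.m presumably repeats such a computation over all frames (frameSize=20ms, frameStep=10ms) to produce formant tracks.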
2. (**) Frame to MFCC conversion: Write a MATLAB function frame2mfcc.m that can compute 12-dimensional MFCC from a given speech/audio frame. Please follow the steps in the text.
Solution: Please check out the function in the SAP Toolbox.
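The usual MFCC pipeline is pre-emphasis, windowing, magnitude spectrum, a triangular mel filter bank, log, and DCT. The sketch below is a from-scratch illustration of those steps with assumed filter counts and mel-scale formulas; the actual frame2mfcc.m in the SAP Toolbox may differ in filter design and scaling:

```matlab
function mfcc = frame2mfccSketch(frame, fs)
% Illustrative MFCC computation for one frame (not the toolbox version).
numFilter = 20; numCoef = 12;                    % assumed filter/coefficient counts
frame = filter([1 -0.95], 1, frame(:));          % pre-emphasis
frame = frame.*hamming(length(frame));           % Hamming windowing
N = length(frame);
spec = abs(fft(frame));
spec = spec(1:floor(N/2)+1);                     % one-sided magnitude spectrum
% Triangular filters equally spaced on the mel scale
mel = @(f) 1127*log(1 + f/700);                  % Hz -> mel
melInv = @(m) 700*(exp(m/1127) - 1);             % mel -> Hz
edge = melInv(linspace(0, mel(fs/2), numFilter+2));  % filter edge frequencies (Hz)
freq = (0:floor(N/2))*fs/N;                      % FFT bin frequencies (row vector)
fbe = zeros(numFilter, 1);                       % filter-bank energies
for i = 1:numFilter
    lo = edge(i); mid = edge(i+1); hi = edge(i+2);
    w = max(0, min((freq-lo)/(mid-lo), (hi-freq)/(hi-mid)));  % triangular weights
    fbe(i) = w*spec;                             % weighted sum over the spectrum
end
mfcc = dct(log(fbe + eps));                      % log energies -> cepstrum
mfcc = mfcc(2:numCoef+1);                        % drop c0, keep 12 coefficients
```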
3. (***) Use MFCC for classifying vowels: Write an m-file script to do the following tasks:
1. Record a 5-second clip of the Chinese vowels 「ㄚ、ㄧ、ㄨ、ㄝ、ㄛ」 (or the English vowels "a, e, i, o, u") with 16KHz/16Bits/Mono. (Please try to maintain a stable pitch and volume, and keep a short pause between vowels to facilitate automatic vowel segmentation. Here is a sample file for your reference.)
2. Use epdByVol.m (in SAP Toolbox) to detect the starting and ending positions of these 5 vowels. If the segmentation is correct, you should have 5 sound segments from the 3rd output argument of epdByVol.m. Moreover, you should set plotOpt=1 to verify the segmentation result. Your plot should be similar to the following:
If the segmentation is not correct, you should adjust the parameters to epdByVol.m until you get the correct segmentation.
3. Use buffer2.m to do frame blocking on these 5 vowels, with frameSize=32ms and overlap=0ms. Please generate 5 MFCC plots (use frame2mfcc.m or wave2mfcc.m), one for each vowel. Each plot should contain as many MFCC curves as there are frames in that vowel. Your plots should be similar to those shown below:
4. Use knncLoo.m (in Machine Learning Toolbox) to compute the leave-one-out recognition rate when we use MFCC to classify each frame into 5 classes of different vowels. In particular, change the feature dimension from 1 to 12 and plot the leave-one-out recognition rates using KNNC with k=1. What is the maximum recognition rate? What is the corresponding optimum dimension? Your plot should be similar to the next one:
5. Record another clip of the same utterance and use it as the test data. Use the original clip as the training data. Use the optimum dimension found in the previous subproblem to compute the frame-based recognition rate of KNNC with k=1. What is the frame-based recognition rate? Plot the confusion matrix, which should be similar to the following figure:
6. What is the vowel-based recognition rate? Plot the confusion matrix, which should be similar to the following figure:
(Hint: The vowel-based recognition result is the voting of frame-based results. You can use mode to compute the result of voting.)
7. Perform feature selection based on sequential forward selection to select up to 12 features. Plot the leave-one-out recognition rates with respect to the selected features. Your plot should be similar to the next one:
What is the maximum recognition rate? What features are selected?
8. Use LDA to project the 12-dimensional data onto a 2D plane, and plot the data to see if it shows a tendency toward natural clustering according to its classes. Your plot should be similar to the next one:
What is the LOO recognition rate after LDA projection?
9. Repeat the previous sub-problem using PCA.
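As a rough illustration of sub-problems 4 to 6, the frame-based leave-one-out 1-NN loop and the vowel-based voting via mode can be sketched as follows. The data here is a synthetic stand-in (random vectors with class-dependent shifts); in the actual exercise you would feed the MFCC frames to knncLoo.m instead:

```matlab
% Frame-based leave-one-out 1-NN plus vowel-based voting, on synthetic data.
rng(0);
dim = 12; classNum = 5; framePerClass = 30;       % hypothetical sizes
X = []; label = [];
for c = 1:classNum
    X = [X, randn(dim, framePerClass) + 2*c];     % stand-in "MFCC" frames
    label = [label, c*ones(1, framePerClass)];
end
n = size(X, 2);
predicted = zeros(1, n);
for i = 1:n                                       % leave-one-out 1-NN
    dist = sum((X - X(:, i)).^2, 1);              % squared distances to all frames
    dist(i) = inf;                                % exclude the held-out frame
    [~, nearest] = min(dist);
    predicted(i) = label(nearest);
end
frameRate = mean(predicted == label);             % frame-based LOO recognition rate
vowelPredicted = zeros(1, classNum);              % vowel-based result: majority vote
for c = 1:classNum
    vowelPredicted(c) = mode(predicted(label == c));
end
vowelRate = mean(vowelPredicted == 1:classNum);   % vowel-based recognition rate
```

The voting step is exactly the hint in sub-problem 6: mode over a vowel's frame-level predictions yields that vowel's class label.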
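For sub-problem 8, a textbook LDA projection via the within-class and between-class scatter matrices looks like the following. This is a generic self-contained sketch on synthetic stand-in data, not the Machine Learning Toolbox routine:

```matlab
% Generic LDA sketch: project 12-dim data onto the top-2 discriminant axes.
rng(1);
dim = 12; classNum = 5;
X = []; label = [];
for c = 1:classNum                                 % synthetic stand-in data
    X = [X, randn(dim, 30) + c*linspace(0, 1, dim)'];
    label = [label, c*ones(1, 30)];
end
Sw = zeros(dim); Sb = zeros(dim);
globalMean = mean(X, 2);
for c = 1:classNum
    Xc = X(:, label == c);
    mc = mean(Xc, 2);
    Sw = Sw + (Xc - mc)*(Xc - mc)';                % within-class scatter
    Sb = Sb + size(Xc, 2)*(mc - globalMean)*(mc - globalMean)';  % between-class
end
[V, D] = eig(Sb, Sw);                              % generalized eigenproblem
[~, order] = sort(diag(D), 'descend');
W = V(:, order(1:2));                              % top-2 discriminant vectors
X2 = W'*X;                                         % 2D projection
scatter(X2(1, :), X2(2, :), 20, label);            % color points by class
```

Computing the leave-one-out rate on X2 (e.g., with the 1-NN loop from the previous sketch) then answers the LOO question; sub-problem 9 replaces the scatter-matrix step with the principal components of the data's covariance.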

Audio Signal Processing and Recognition (音訊處理與辨識)