Programming Contest:
The goal of this programming contest is to familiarize students with the use of GMM (Gaussian Mixture Model) for speaker recognition. Students are required to tune a set of parameters to improve the recognition rates.
- Data to download:
- exampleProgram.rar: All example programs
- Wave files of Tang Poems to be used for this contest. The TA will give the FTP address in class.
- How to execute the example program:
- Change addMyPath.m to point to the correct paths of the required toolboxes.
- Set the variable waveDir in goExtractFeature.m to the path containing all the wave files.
- Run the main program by typing "go" within MATLAB. The contents of go.m are shown next:
% This is the main file for speaker recognition
addpath d:/users/jang/matlab/toolbox/utility
addpath d:/users/jang/matlab/toolbox/machineLearning
addpath d:/users/jang/matlab/toolbox/sap
goExtractFeature;
goTrainGmm;
goUtteranceRr;
In other words, go.m invokes three other m-file scripts, with the following functions:
- goExtractFeature.m: Feature extraction.
- goTrainGmm.m: GMM training.
- goUtteranceRr.m: Evaluation of utterance-based recognition rates
According to the parameter settings in goExtractFeature.m, the script uses 10 utterances from each speaker: the five odd-indexed utterances are used for training and the five even-indexed ones for testing. The inside-test recognition rate is 100% while the outside-test one is 99.00%. (Note that these are all utterance-based recognition rates.) The script also displays the confusion matrix.
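As a reminder of how an utterance-based recognition rate relates to a confusion matrix, here is a minimal sketch in base MATLAB. The labels are made-up illustration data (3 speakers, 2 test utterances each), and the variable names trueId, predId, and confMat are hypothetical; they are not taken from the example programs.

```matlab
% Made-up labels for 3 speakers with 2 test utterances each (illustration only).
trueId = [1 1 2 2 3 3];      % ground-truth speaker index of each utterance
predId = [1 1 2 3 3 3];      % speaker index chosen by the classifier
speakerNum = 3;

% confMat(i,j) = number of utterances of speaker i recognized as speaker j
confMat = accumarray([trueId(:), predId(:)], 1, [speakerNum, speakerNum]);

% Recognition rate = correctly recognized utterances / all utterances
rr = sum(diag(confMat)) / sum(confMat(:));
fprintf('Utterance-based recognition rate = %.2f%%\n', 100*rr);
disp(confMat);
```

The diagonal of confMat counts the correct decisions, so off-diagonal entries show which speakers are confused with each other.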
- What you need to demonstrate during the class:
- Plot the frame-based recognition rates (both inside and outside tests) as a function of the number of mixtures, which takes the values 2, 4, 8, 16, 32. (You might need more sentences for training if the number of mixtures is large.)
- Use the number of mixtures at which the outside-test recognition rate is at its maximum. Plot the segment-based recognition rates (both inside and outside tests) as a function of the segment length (in terms of number of frames).
- Combine the above two plots into a single plot of the recognition rate with respect to the segment length. It should show 10 curves, corresponding to the inside and outside tests for mixture numbers = 2, 4, 8, 16, 32.
(Hint: Be sure to save the speakerData for further processing, since we do not change the feature set.)
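One possible way to obtain frame-based and segment-based recognition rates from the same data is sketched below. It assumes that a segment is scored by summing the per-frame log-likelihoods of each speaker model over non-overlapping segments, and that the frame-based rate is the special case of segment length 1; the example programs may define segments differently. The matrix logLike holds made-up numbers, and all variable names are hypothetical.

```matlab
% Made-up per-frame log-likelihoods: logLike(i,t) = log p(frame t | speaker model i).
% In practice these would come from evaluating each speaker's GMM on every frame.
logLike = [-1.0 -3.0 -1.0 -1.0 -3.0 -1.0;    % model of speaker 1
           -2.5 -2.5 -2.5 -2.5 -2.5 -2.5];   % model of speaker 2
trueId = 1;                      % all six frames belong to speaker 1
segLenList = [1 2 3];            % segment lengths (in frames) to evaluate
rr = zeros(size(segLenList));

for k = 1:numel(segLenList)
    segLen = segLenList(k);
    segNum = floor(size(logLike, 2) / segLen);   % drop the incomplete tail segment
    correct = 0;
    for s = 1:segNum
        idx = (s-1)*segLen + (1:segLen);         % frames in this segment
        segScore = sum(logLike(:, idx), 2);      % accumulated log-likelihood per model
        [~, predId] = max(segScore);             % most likely speaker for the segment
        correct = correct + (predId == trueId);
    end
    rr(k) = correct / segNum;                    % segment-based recognition rate
end

for k = 1:numel(segLenList)
    fprintf('Segment length %d: recognition rate = %.2f%%\n', segLenList(k), 100*rr(k));
end
```

The vector rr can then be plotted against segLenList, once per mixture number and per test type, to build the required curves. Because the per-frame log-likelihoods depend only on the trained models, they can be computed once and reused for every segment length.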
- How to modify the program to get better utterance-based recognition rates (please refer to "Robust Text-Independent Speaker Identification using Gaussian Mixture Speaker Models"):
- Our example program only takes 10 utterances from each speaker. If you take all utterances from a speaker, the computing time will be much longer. To get around this, try to find a fast computer for this contest.
- For end-point detection, you should try to get rid of silence as well as unvoiced sounds.
- For GMM model parameters, you can try the following items:
- Change the number of mixtures in GMM. Choose the number that gives the best outside-test recognition rate.
- Use different methods for the initialization of k-means. A better k-means can improve the performance of GMM.
- Use a VQ method based on center splitting. The function is vqLbg.m in the DCPR toolbox.
- Increase the iteration count to see if we can get a higher log probability.
- For feature extraction, you can try the following items:
- Use the feature extraction function by HTK (htkWave2mfccMex.dll).
- Use MFCC only, without the log energy.
- Try cepstral mean normalization
- For reducing the computing time:
- You can replace vqKmeans.m with vqKmeansMex.dll to speed up k-means.
- If you think gmmTrain.m is too slow, you can write a MEX file in C to speed it up.
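Of the suggestions above, cepstral mean normalization is simple enough to try in a few lines of base MATLAB. The sketch below assumes the feature matrix stores one frame per column and one cepstral coefficient per row; this layout is an assumption, so transpose the matrix if your features are stored the other way. The numbers are made up, and the variable names are hypothetical.

```matlab
% Made-up feature matrix: each column is one frame, each row one cepstral coefficient.
% This layout is an assumption; transpose if your features are stored frame-by-row.
feature = [1 2 3 6;
           0 4 4 0];

% Cepstral mean normalization: subtract the per-utterance mean of each coefficient.
featureMean = mean(feature, 2);                                  % mean over frames
featureCmn = feature - repmat(featureMean, 1, size(feature, 2));

disp(mean(featureCmn, 2));                                       % zero mean in every row
```

The normalization should be applied per utterance, and in the same way to both training and test utterances, so that the models and the test features stay comparable.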
- Performance evaluation: The TA will run both inside and outside tests on all the utterances (odd-indexed utterances as the training set and even-indexed ones as the test set) to compute the utterance-based recognition rates, and will take their average as the final performance index.
- The files for uploading:
- gmmPrm.mat: Parameters for GMM
- method.txt: Description of your methods
- Any files that are necessary for running your main program.