%% Tutorial on text-dependent speaker recognition
% In this tutorial, we shall explain the basics of text-dependent speaker
% identification, using MFCC as the features and DTW as the comparison
% method. The dataset is available upon request.

%% Preprocessing
% Before we start, let's add the necessary toolboxes to the search path of MATLAB:
addpath d:/users/jang/matlab/toolbox/utility
addpath d:/users/jang/matlab/toolbox/sap
addpath d:/users/jang/matlab/toolbox/machineLearning

%%
% Note that all the above toolboxes can be downloaded from the author's
% website.

%%
% For compatibility, here we list the platform and MATLAB version that we
% used to run this script:
fprintf('Platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);
scriptStartTime=tic;

%% Options for speaker identification
% First of all, all the options for our speaker identification can be
% obtained as follows:
sidOpt=sidOptSet

%%
% The contents of sidOpt can be shown next:
type sidOptSet.m

%%
% And we need to create the output folder if it does not exist yet:
if ~exist(sidOpt.outputDir, 'dir')
	fprintf('Creating %s...\n', sidOpt.outputDir);
	mkdir(sidOpt.outputDir);	% mkdir also creates any missing intermediate folders
end

%% Dataset construction
% Here are some facts about the speaker-ID corpus used in this tutorial:
%
% * All the audio clips were recorded with a single mobile phone to reduce channel distortion.
% * Each person recorded each of 10 speech passwords 3 times, leading to a total of 30 clips per person.
% * Close to 100 persons participated in the recordings.
% * There are 2 recording sessions, one week apart. We shall use the recordings of session 1 as the training data and those of session 2 as the test data.
%
% To collect all the audio files and perform feature extraction, we can
% invoke "sidFeaExtract" as follows:
tic
[speakerSet1, speakerSet2]=sidFeaExtract(sidOpt);
fprintf('Elapsed time = %g sec\n', toc);

%%
% Note that the extracted data will be stored as a MAT file for future use.
% (A sketch of this caching pattern is given in the appendix at the end of
% this page.)
%
% To evaluate the performance, we can invoke "sidPerfEval":
tic
[overallRr, speakerSet1, time]=sidPerfEval(speakerSet1, speakerSet2, sidOpt, 1);
fprintf('Elapsed time = %g sec\n', toc);
fprintf('Saving %s/speakerSet1.mat...\n', sidOpt.outputDir);
save(fullfile(sidOpt.outputDir, 'speakerSet1'), 'speakerSet1');

%% Post analysis
% We can display the scatter plots based on the features of each sentence,
% in order to visualize whether it is possible to separate "bad" utterances
% from "good" ones:
sentence=[speakerSet1.sentence];
DS.input=[[sentence.meanVolume]; [sentence.meanClarity]; [sentence.medianPitch]; [sentence.minDistance]; [sentence.frameNum]];
DS.inputName={'meanVolume', 'meanClarity', 'medianPitch', 'minDistance', 'frameNum'};
DS.input=inputNormalize(DS.input);
DS.output=2-[sentence.correct];	% Map correct (1/0) to class index (1/2)
dsProjPlot2(DS); figEnlarge;
print(fullfile(sidOpt.outputDir, 'dtwDataDistribution'), '-dpng');

%% Summary
% This is a brief tutorial on text-dependent speaker recognition based on
% DTW. (Minimal sketches of the matching and decision steps are given in
% the appendix at the end of this page.) There are several directions for
% further improvement:
%
% * Explore other features, such as magnitude spectra or filter-bank coefficients.
% * Try flexible anchor positions for DTW.
% * Use rejection to improve the overall accuracy of the system.
%

%% Appendix
% List of functions used in this script:
%
% * <../list.asp List of all files>
%

%%
% Overall elapsed time:
toc(scriptStartTime)

%%
% , created on
datetime

%%
% If you are interested in the original MATLAB code for this page, you can
% type "grabcode(URL)" under MATLAB, where URL is the web address of this
% page.
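%% Appendix: minimal sketches of the main steps
% The sketches below illustrate what the toolbox functions used above do
% internally, as referenced earlier on this page. They are illustrative
% only: they rely on stock MATLAB functions instead of the author's SAP
% toolbox routines, and every file or variable name introduced here (and
% not seen earlier in this script) is a hypothetical placeholder.
%
% First, a sketch of the caching pattern behind "sidFeaExtract": extract
% the features once, save them as a MAT file, and simply reload them on
% later runs. The cache file name "speakerSet.mat" is a hypothetical choice:
matFile=fullfile(sidOpt.outputDir, 'speakerSet.mat');
if exist(matFile, 'file')
	fprintf('Loading cached features from %s...\n', matFile);
	load(matFile, 'speakerSet1', 'speakerSet2');
else
	[speakerSet1, speakerSet2]=sidFeaExtract(sidOpt);	% Slow first run
	save(matFile, 'speakerSet1', 'speakerSet2');	% Cache for future use
end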
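%%
% Next, the core matching step: convert two utterances of the same speech
% password into MFCC streams and measure their DTW distance. This sketch
% uses the stock "mfcc" (Audio Toolbox) and "dtw" (Signal Processing
% Toolbox) functions as stand-ins for the SAP toolbox routines, and the
% two wave file names are hypothetical:
[y1, fs1]=audioread('trainClip.wav');	% Hypothetical training clip
[y2, fs2]=audioread('testClip.wav');	% Hypothetical test clip
c1=mfcc(mean(y1, 2), fs1);	% MFCC matrix of size frameNum1 x coefNum (forced to mono)
c2=mfcc(mean(y2, 2), fs2);	% MFCC matrix of size frameNum2 x coefNum
% "dtw" aligns the two streams frame by frame; it expects one column per
% frame, hence the transposes. A smaller distance indicates a better match.
minDistance=dtw(c1', c2');
fprintf('DTW distance = %g\n', minDistance);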
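%%
% Finally, identification reduces to a nearest-template search: the test
% utterance is assigned to the speaker whose training clip of the same
% password yields the smallest DTW distance. This sketch assumes a
% hypothetical cell array "trainMfcc" holding one MFCC matrix per training
% clip, and a vector "trainSpeaker" of the corresponding speaker indices:
dist=zeros(1, length(trainMfcc));
for i=1:length(trainMfcc)
	dist(i)=dtw(trainMfcc{i}', c2');	% Compare the test clip with each training clip
end
[minDist, best]=min(dist);
fprintf('Predicted speaker = %d, with DTW distance = %g\n', trainSpeaker(best), minDist);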