One typical application of DTW is text-dependent speaker identification, which proceeds in two stages:
At the registration stage, each speaker pronounces several utterances that serve as spoken passwords.
At the application stage, the speaker pronounces one of the spoken passwords, and the system identifies the speaker by comparing the utterance against the passwords recorded at the registration stage. The comparison is usually done with DTW because it is robust to variation in speech rate.
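The two stages can be illustrated with a minimal sketch. It assumes toy one-dimensional sequences in place of real speech features, hypothetical speaker labels (speakerA, speakerB), and a basic DTW recursion with horizontal, vertical, and diagonal steps; the toolbox's own DTW routine may use different local path constraints.

```matlab
% Registration stage: one toy 1-D "feature" sequence per speaker.
% (Hypothetical data; a real system compares sequences of frame-level feature vectors.)
registered  = { [1 2 3 4 3 2 1], [4 3 2 1 2 3 4] };
speakerName = { 'speakerA', 'speakerB' };   % hypothetical labels

% Application stage: speakerA's pattern pronounced at half the speed.
query = [1 1 2 2 3 3 4 4 3 3 2 2 1 1];

dtwDist = zeros(1, numel(registered));
for k = 1:numel(registered)
    x = registered{k};
    m = numel(x);
    n = numel(query);
    D = inf(m+1, n+1);   % accumulated-cost table with a padded border
    D(1, 1) = 0;
    for i = 1:m
        for j = 1:n
            localCost = abs(x(i) - query(j));
            % Best predecessor: diagonal, vertical, or horizontal step
            D(i+1, j+1) = localCost + min([D(i, j), D(i, j+1), D(i+1, j)]);
        end
    end
    dtwDist(k) = D(m+1, n+1);
end

% Identify the speaker whose registered utterance is closest to the query.
[minDistance, speakerIndex] = min(dtwDist);
fprintf('Identified speaker: %s (DTW distance = %g)\n', speakerName{speakerIndex}, minDistance);
```

Because the query is a time-stretched copy of the first registered sequence, DTW can align the two with zero accumulated cost, whereas a frame-by-frame comparison would not even be defined for sequences of different lengths.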
For instance, the "dataSet" directory of the ML toolbox contains a collection of recordings from two sessions with 3 subjects each. Each subject was requested to pronounce 10 spoken passwords 3 times in each session, so each subject's folder within a session contains 30 recordings.
First of all, it is good programming practice to put all parameters related to the task into a single function that returns a structure variable containing them:
Example 1: speakerIdTextDependent/sidPrmSet.m
function sidPrm=sidPrmSet
% sidPrmSet: Set parameters for speaker identification
% ====== Add required toolboxes to the search path
mltPath='/users/jang/matlab/toolbox/dcpr';
addpath(mltPath);
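% ====== Wave directories (mltRoot is assumed to be a function supplied by the toolbox added above)
% fullfile avoids sprintf interpreting '\d' and '\s' as escape sequences
sidPrm.waveDir01=fullfile(mltRoot, 'dataSet', 'speakerIdTextDependent', 'session01');
sidPrm.waveDir02=fullfile(mltRoot, 'dataSet', 'speakerIdTextDependent', 'session02');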
sidPrm.feaType='mfcc'; % 'mfcc', 'volume', 'pitch'
sidPrm.outputDir='output'; % Output directory
Note that the above function also puts required toolboxes into the MATLAB search path.
To read the data from the folder, see the next example:
Example 2: speakerIdTextDependent/goFeaExtract.m
% Feature extraction
sidPrm=sidPrmSet;
% ====== Read session 1
speakerData1=speakerDataRead(sidPrm.waveDir01);
fprintf('Get wave info of %d persons from %s\n', length(speakerData1), sidPrm.waveDir01);
speakerData1=speakerDataAddFea(speakerData1, sidPrm); % Add features to speakerData1
% ====== Read session 2
speakerData2=speakerDataRead(sidPrm.waveDir02);
fprintf('Get wave info of %d persons from %s\n', length(speakerData2), sidPrm.waveDir02);
speakerData2=speakerDataAddFea(speakerData2, sidPrm); % Add features to speakerData2
fprintf('Save speakerData1 and speakerData2 to speakerData.mat\n');
save speakerData speakerData1 speakerData2
Get wave info of 3 persons from \users\jang\matlab\toolbox\dcpr\dataSet\speakerIdTextDependent\session01
1/3: Feature extraction from 30 recordings by 9761215 ===> 0.444344 sec
2/3: Feature extraction from 30 recordings by 9761217 ===> 0.264503 sec
3/3: Feature extraction from 30 recordings by 9762115 ===> 0.348345 sec
Get wave info of 3 persons from \users\jang\matlab\toolbox\dcpr\dataSet\speakerIdTextDependent\session02
1/3: Feature extraction from 30 recordings by 9761215 ===> 0.353603 sec
2/3: Feature extraction from 30 recordings by 9761217 ===> 0.349893 sec
3/3: Feature extraction from 30 recordings by 9762115 ===> 0.311077 sec
Save speakerData1 and speakerData2 to speakerData.mat
To check data consistency, see the next example:
Example 3: speakerIdTextDependent/goDataCheck.m
% Check data consistency for speaker identification.
load speakerData.mat
sidPrm=sidPrmSet;
% ====== Check empty speaker folder in session01
sentenceNum=[speakerData1.sentenceNum];
index=find(sentenceNum==0);
speakerData1Empty=speakerData1(index);
title='Speakers with no recordings in session 1';
outputFile=sprintf('%s/%s.htm', sidPrm.outputDir, title);
if ~isempty(speakerData1Empty), structDispInHtml(speakerData1Empty, title, {'name'}, [], [], outputFile); end
speakerData1(index)=[];
% ====== Check empty speaker folder in session02
sentenceNum=[speakerData2.sentenceNum];
index=find(sentenceNum==0);
speakerData2Empty=speakerData2(index);
title='Speakers with no recordings in session 2';
outputFile=sprintf('%s/%s.htm', sidPrm.outputDir, title);
if ~isempty(speakerData2Empty), structDispInHtml(speakerData2Empty, title, {'name'}, [], [], outputFile); end
speakerData2(index)=[];
% ====== Speaker difference in both sessions
speaker1={speakerData1.name};
speaker2={speakerData2.name};
diffSet1=setdiff(speaker1, speaker2);
diffSet2=setdiff(speaker2, speaker1);
% === Speaker in session01 but not in session02
index1=[];
for i=1:length(diffSet1)
	index1=[index1, find(strcmp(diffSet1{i}, speaker1))];
end
title='Speakers only in session 1';
outputFile=sprintf('%s/%s.htm', sidPrm.outputDir, title);
if ~isempty(index1), structDispInHtml(speakerData1(index1), title, {'name'}, [], [], outputFile); end
speakerData1(index1)=[];
% === Speaker in session02 but not in session01
index2=[];
for i=1:length(diffSet2)
	index2=[index2, find(strcmp(diffSet2{i}, speaker2))];
end
title='Speakers only in session 2';
outputFile=sprintf('%s/%s.htm', sidPrm.outputDir, title);
if ~isempty(index2), structDispInHtml(speakerData2(index2), title, {'name'}, [], [], outputFile); end
speakerData2(index2)=[];
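The two find/strcmp loops above can also be written without loops. The following is a minimal sketch using ismember on hypothetical speaker lists (the names 9700001 and 9700002 are made up for illustration); it should select the same speakers for removal, although the indices come out in list order rather than in the sorted order returned by setdiff.

```matlab
% Hypothetical speaker lists; the last entry of each appears in one session only.
speaker1 = {'9761215', '9761217', '9762115', '9700001'};
speaker2 = {'9761215', '9761217', '9762115', '9700002'};

% Logical masks of speakers that have no counterpart in the other session
onlyIn1 = ~ismember(speaker1, speaker2);
onlyIn2 = ~ismember(speaker2, speaker1);

% Indices to report and remove, analogous to index1 and index2 above
index1 = find(onlyIn1);
index2 = find(onlyIn2);

% Speakers remaining after the unmatched ones are dropped
common1 = speaker1(~onlyIn1);
common2 = speaker2(~onlyIn2);
fprintf('Only in session 1: %s\n', speaker1{index1});
fprintf('Only in session 2: %s\n', speaker2{index2});
```

After this step both sessions contain the same set of speakers, which is what the evaluation in the next example relies on.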
To evaluate the performance using DTW, try the next example:
Example 4: speakerIdTextDependent/goPerfEval.m
% Performance evaluation
load speakerData.mat
% ====== Speaker ID by DTW
for i=1:length(speakerData2)
	tInit=clock;
	name=speakerData2(i).name;
	fprintf('%d/%d: speaker=%s\n', i, length(speakerData2), name);
	for j=1:length(speakerData2(i).sentence)
	%	fprintf('\tsentence=%d ==> ', j);
	%	t0=clock;
		inputSentence=speakerData2(i).sentence(j);
		[speakerIndex, sentenceIndex, minDistance]=speakerId(inputSentence, speakerData1);
		computedName=speakerData1(speakerIndex).name;
	%	fprintf('computedName=%s, time=%.2f sec\n', computedName, etime(clock, t0));
		speakerData2(i).sentence(j).correct=strcmp(name, computedName);
		speakerData2(i).sentence(j).computedSpeakerIndex=speakerIndex;
		speakerData2(i).sentence(j).computedSentenceIndex=sentenceIndex;
		speakerData2(i).sentence(j).computedSentencePath=speakerData1(speakerIndex).sentence(sentenceIndex).path;
	end
	speakerData2(i).correct=[speakerData2(i).sentence.correct];
	speakerData2(i).rr=sum(speakerData2(i).correct)/length(speakerData2(i).correct);
	fprintf('\tRR for %s = %.2f%%, ave. time = %.2f sec\n', name, 100*speakerData2(i).rr, etime(clock, tInit)/length(speakerData2(i).sentence));
end
correct=[speakerData2.correct];
overallRr=sum(correct)/length(correct);
fprintf('Overall RR = %.2f%%\n', 100*overallRr);
fprintf('Save speakerData1 and speakerData2 to speakerData.mat\n');
save speakerData speakerData1 speakerData2
1/3: speaker=9761215
RR for 9761215 = 80.00%, ave. time = 0.05 sec
2/3: speaker=9761217
RR for 9761217 = 100.00%, ave. time = 0.03 sec
3/3: speaker=9762115
RR for 9762115 = 100.00%, ave. time = 0.05 sec
Overall RR = 93.33%
Save speakerData1 and speakerData2 to speakerData.mat
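As a quick sanity check, the overall rate can be reproduced from the per-speaker rates in the output above. The sketch below assumes that each speaker contributes 30 test utterances, matching the 30 recordings per subject per session described earlier.

```matlab
% Assumption: 30 test utterances per speaker in session 2
numUtterances = 30;
perSpeakerRr  = [0.80 1.00 1.00];   % per-speaker rates from the output above

% Convert rates back to counts of correctly identified utterances
numCorrect = round(perSpeakerRr * numUtterances);

% Overall rate pools all utterances, as goPerfEval.m does with [speakerData2.correct]
overallRr = sum(numCorrect) / (numUtterances * numel(perSpeakerRr));
fprintf('Overall RR = %.2f%%\n', 100*overallRr);   % prints: Overall RR = 93.33%
```

Pooling utterances gives the same result as averaging the per-speaker rates here only because every speaker has the same number of utterances.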
After obtaining the overall recognition rate, we can compute statistics for each person and list the misclassified utterances together with the incorrect outputs, as shown in the following example:
Example 5: speakerIdTextDependent/goPostAnalysis.m
sidPrm=sidPrmSet;
load speakerData.mat
correct=[speakerData2.correct];
overallRr=sum(correct)/length(correct);
% ====== Display each person's performance
[junk, index]=sort([speakerData2.rr]);
sortedSpeakerData2=speakerData2(index);
outputFile=sprintf('%s/personRr_rr=%f%%.htm', sidPrm.outputDir, 100*overallRr);
structDispInHtml(sortedSpeakerData2, sprintf('Performance of all persons (Overall RR=%.2f%%)', 100*overallRr), {'name', 'rr'}, [], [], outputFile);
% ====== Display misclassified utterances
sentenceData=[sortedSpeakerData2.sentence];
sentenceDataMisclassified=sentenceData(~[sentenceData.correct]);
outputFile=sprintf('%s/sentenceMisclassified_rr=%f%%.htm', sidPrm.outputDir, 100*overallRr);
structDispInHtml(sentenceDataMisclassified, sprintf('Misclassified Sentences (Overall RR=%.2f%%)', 100*overallRr), {'path', 'computedSentencePath'}, [], [], outputFile);
This is an important step in error analysis, which guides further improvement of the classification system.
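The same kind of per-person summary can be sketched without the toolbox's HTML display function. The example below uses a hypothetical result structure (made-up names and outcomes, not the experiment data) and prints to the command window instead of writing HTML files.

```matlab
% Hypothetical per-speaker outcomes: true means the utterance was identified correctly.
result = struct( ...
    'name',    {'speakerA', 'speakerB', 'speakerC'}, ...
    'correct', {logical([1 1 0 1]), logical([1 1 1 1]), logical([0 1 0 1])});

% Recognition rate of each speaker
for i = 1:numel(result)
    result(i).rr = mean(result(i).correct);
end

% Sort in ascending order so that the worst-performing speaker comes first
[sortedRr, order] = sort([result.rr]);
sorted = result(order);

for i = 1:numel(sorted)
    numWrong = sum(~sorted(i).correct);
    fprintf('%s: RR = %.2f%%, misclassified utterances = %d\n', ...
        sorted(i).name, 100*sorted(i).rr, numWrong);
end
```

Listing the worst speakers first makes it easier to decide which recordings to inspect by hand.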
Data Clustering and Pattern Recognition (資料分群與樣式辨認)