Tutorial on coin recognition

This tutorial explains the basics of coin recognition based on the sound a coin makes when it is dropped on the ground.

Preprocessing

Before we start, let's add the necessary toolboxes to the MATLAB search path:

addpath d:/users/jang/matlab/toolbox/utility
addpath d:/users/jang/matlab/toolbox/sap
addpath d:/users/jang/matlab/toolbox/machineLearning

All the above toolboxes can be downloaded from the author's toolbox page. Make sure you are using the latest toolboxes to work with this script.

For compatibility, here we list the platform and MATLAB version that we used to run this script:

fprintf('Platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);
fprintf('Script starts at %s\n', char(datetime));
scriptStartTime=tic;	% Timing for the whole script
Platform: PCWIN64
MATLAB version: 8.5.0.197613 (R2015a)
Script starts at 04-Feb-2017 18:48:23

Dataset collection

First of all, we collect all the sound files. The dataset can be found at this link. We can use the command "mmDataCollect" to collect all the file information:

auDir='coinSound';
opt=mmDataCollect('defaultOpt');
opt.extName='wav';
auSet=mmDataCollect(auDir, opt, 1);
Collecting 20 files with extension "wav" from "coinSound"...

Next we perform feature extraction and put the whole dataset into a format that is easier to use for further processing, including classifier construction and evaluation.

myTic=tic;
if ~exist('ds.mat', 'file')
	opt=dsCreateFromMm('defaultOpt');
	opt.auFeaFcn=@auFeaMfcc;		% Function for feature extraction
	opt.auEpdOpt.method='vol';
	%opt.auEpdOpt.volRatio=0.02;	% To have the right EPD, but it doesn't help recognition!
	ds=dsCreateFromMm(auSet, opt);
	fprintf('Saving ds.mat...\n'); save ds ds
else
	fprintf('Loading ds.mat...\n'); load ds.mat
end
fprintf('time=%g sec\n', toc(myTic));
Loading ds.mat...
time=0.00108643 sec
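
To make the endpoint detection (EPD) step more concrete, here is a minimal sketch of volume-based EPD in base MATLAB, applied to a synthetic decaying tone. It is not the toolbox implementation: the frame size, the volume definition (sum of absolute values after removing the frame mean), and the local variable "volRatio" (named after the commented-out option above) are assumptions made for illustration.

```matlab
% Minimal sketch of volume-based EPD on a synthetic signal.
% frameSize, volRatio and the volume definition are assumptions for illustration,
% not the toolbox defaults.
fs = 16000;                          % Sample rate (Hz)
t = (0:fs-1)'/fs;                    % One second of samples
y = 0.001*sin(2*pi*50*t);            % Low-level background
burstIdx = 4001:8000;                % Samples where the synthetic "coin sound" occurs
y(burstIdx) = y(burstIdx) + sin(2*pi*3000*t(burstIdx)).*exp(-20*(t(burstIdx)-t(4001)));

frameSize = 400;                     % 25 ms frames, no overlap
frameNum = floor(length(y)/frameSize);
frameMat = reshape(y(1:frameNum*frameSize), frameSize, frameNum);  % One frame per column
frameMat = bsxfun(@minus, frameMat, mean(frameMat, 1));            % Remove per-frame DC offset
volume = sum(abs(frameMat), 1);      % Volume of each frame

volRatio = 0.1;                      % Threshold as a fraction of the maximum volume
soundFrame = find(volume > volRatio*max(volume));
startFrame = soundFrame(1);          % First frame above the threshold
endFrame = soundFrame(end);          % Last frame above the threshold
fprintf('Sound detected in frames %d to %d (of %d)\n', startFrame, endFrame, frameNum);
```

In this sketch the decaying tail of the burst falls below the threshold, so the detected end point comes before the true end of the sound. Lowering the ratio keeps more of the tail, which is the kind of trade-off the commented-out "volRatio" line above refers to.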

Now all the frame-based features are extracted and stored in "ds". Next we can try to plot the extracted features for each class:

figure; dsFeaVecPlot(ds);

Performance evaluation

Now we want to evaluate performance using LOFOCV (leave-one-file-out cross validation), where each file is a recording of a complete sound event. LOFOCV proceeds as follows:

opt=perfLoo4audio('defaultOpt');
[ds2, fileRr, frameRr]=perfLoo4audio(ds, opt);
fprintf('Frame-based leave-one-file-out RR=%g%%\n', frameRr*100);
fprintf('File-based leave-one-file-out RR=%g%%\n', fileRr*100);
1/20: Leave-one-file-out CV for "coinSound/01/1nt_1.wav", time=0.14366 sec
2/20: Leave-one-file-out CV for "coinSound/01/1nt_2.wav", time=0.12168 sec
3/20: Leave-one-file-out CV for "coinSound/01/1nt_3.wav", time=0.161968 sec
4/20: Leave-one-file-out CV for "coinSound/01/1nt_4.wav", time=0.119383 sec
5/20: Leave-one-file-out CV for "coinSound/01/1nt_5.wav", time=0.119165 sec
6/20: Leave-one-file-out CV for "coinSound/05/5nt_1.wav", time=0.12179 sec
7/20: Leave-one-file-out CV for "coinSound/05/5nt_2.wav", time=0.137571 sec
8/20: Leave-one-file-out CV for "coinSound/05/5nt_3.wav", time=0.128433 sec
9/20: Leave-one-file-out CV for "coinSound/05/5nt_4.wav", time=0.131138 sec
10/20: Leave-one-file-out CV for "coinSound/05/5nt_5.wav", time=0.119772 sec
11/20: Leave-one-file-out CV for "coinSound/10/10nt_1.wav", time=0.138463 sec
12/20: Leave-one-file-out CV for "coinSound/10/10nt_2.wav", time=0.11201 sec
13/20: Leave-one-file-out CV for "coinSound/10/10nt_3.wav", time=0.140275 sec
14/20: Leave-one-file-out CV for "coinSound/10/10nt_4.wav", time=0.112496 sec
15/20: Leave-one-file-out CV for "coinSound/10/10nt_5.wav", time=0.125312 sec
16/20: Leave-one-file-out CV for "coinSound/50/50nt_1.wav", time=0.12639 sec
17/20: Leave-one-file-out CV for "coinSound/50/50nt_2.wav", time=0.121933 sec
18/20: Leave-one-file-out CV for "coinSound/50/50nt_3.wav", time=0.136254 sec
19/20: Leave-one-file-out CV for "coinSound/50/50nt_4.wav", time=0.132687 sec
20/20: Leave-one-file-out CV for "coinSound/50/50nt_5.wav", time=0.120326 sec
Frame-based leave-one-file-out RR=68.1481%
File-based leave-one-file-out RR=95%
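
The gap between the frame-based and file-based recognition rates can be illustrated with a small sketch. It assumes a 1-nearest-neighbor classifier on frames and a majority vote over the frames of each file; these are assumptions for illustration, and "perfLoo4audio" may use a different classifier or combine frame decisions differently. The toy data and variable names are hypothetical.

```matlab
% Toy data: 1-D feature per frame, 4 files (2 per class), 3 frames per file.
input   = [0 1 9.4,  0.5 1.2 8,  10 11 2,  10.5 11.5 3];   % One column per frame
output  = [1 1 1,    1 1 1,      2 2 2,    2 2 2];          % Class ID of each frame
fileId  = [1 1 1,    2 2 2,      3 3 3,    4 4 4];          % File that each frame comes from

fileNum = max(fileId);
framePredicted = zeros(size(output));
fileClass = zeros(1, fileNum);
filePredicted = zeros(1, fileNum);
for i = 1:fileNum
	testIdx  = find(fileId == i);        % Frames of the held-out file
	trainIdx = find(fileId ~= i);        % Frames of all other files
	for j = testIdx
		dist = sum(bsxfun(@minus, input(:, trainIdx), input(:, j)).^2, 1);
		[~, nearest] = min(dist);        % 1-nearest-neighbor over the training frames
		framePredicted(j) = output(trainIdx(nearest));
	end
	fileClass(i) = output(testIdx(1));                 % Ground truth of the file
	filePredicted(i) = mode(framePredicted(testIdx));  % Majority vote over its frames
end
frameRr = mean(framePredicted == output);
fileRr  = mean(filePredicted == fileClass);
fprintf('Frame-based RR=%g%%, file-based RR=%g%%\n', frameRr*100, fileRr*100);
```

In this toy example two of the twelve frames are misclassified, but every file is still recognized correctly because the vote absorbs isolated frame errors. The same effect is one plausible reason why the file-based rate above is much higher than the frame-based rate.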

We can plot the frame-based confusion matrix:

confMat=confMatGet(ds2.output, ds2.frameClassIdPredicted);
confOpt=confMatPlot('defaultOpt');
confOpt.className=ds.outputName;
figure; confMatPlot(confMat, confOpt);

We can also plot the file-based confusion matrix:

confMat=confMatGet(ds2.fileClassId, ds2.fileClassIdPredicted);
confOpt=confMatPlot('defaultOpt');
confOpt.className=ds.outputName;
figure; confMatPlot(confMat, confOpt);
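
For readers without the toolbox, a confusion matrix can be computed in base MATLAB with "accumarray". The sketch below uses made-up labels and assumes that rows index the true class and columns index the predicted class; the convention used by "confMatGet" may differ.

```matlab
% Toy labels: desired (ground truth) and predicted class IDs for 8 samples, 3 classes.
desired   = [1 1 1 2 2 3 3 3];
predicted = [1 1 2 2 2 3 1 3];
classNum  = 3;
% Entry (i, j) counts samples of true class i that were predicted as class j
confMat = accumarray([desired(:), predicted(:)], 1, [classNum, classNum]);
recogRate = trace(confMat)/sum(confMat(:));   % Diagonal entries are correct decisions
disp(confMat);
fprintf('Recognition rate = %g%%\n', recogRate*100);
```

Off-diagonal entries show which pairs of classes are confused with each other, which is the information used in the discussion of the scatter plot later on.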

We can also list all the misclassified sounds in a table:

for i=1:length(auSet)
	auSet(i).classPredicted=ds.outputName{ds2.fileClassIdPredicted(i)};
end
mmDataList(auSet);

List of 1 misclassified cases

Index\Field  File        GT ==> Predicted  Hit    url
1            50nt_1.wav  50 ==> 10         false  /jang/books/audioSignalProcessing/appNote/coinType/coinSound/50/50nt_1.wav

Dimensionality reduction

In order to visualize the distribution of the dataset, we need to project the original dataset into 2-D space. This can be achieved by LDA (linear discriminant analysis):

ds2d=lda(ds);
ds2d.input=ds2d.input(1:2, :);
figure; dsScatterPlot(ds2d); xlabel('Input 1'); ylabel('Input 2');
title('MFCC projected on the first 2 lda vectors');

As can be seen from the scatter plot, the overlap between "10" and "50" is the largest among all class pairs, indicating that these two classes are likely to be confused with each other. This is also verified by the confusion matrices shown earlier.
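
The idea behind the LDA projection can be sketched in base MATLAB as follows: build the within-class and between-class scatter matrices, solve the generalized eigenproblem, and keep the directions with the largest eigenvalues. This is a textbook formulation on made-up data, not the code of the toolbox function "lda", whose scaling and ordering conventions may differ.

```matlab
% Toy data: 3-D features, 3 classes, one sample per column.
input = [1 2 3  6 7 8  1 2 3;
         1 3 2  1 3 2  7 9 8;
         0 1 0  1 0 1  0 1 1];
output = [1 1 1  2 2 2  3 3 3];
dim = size(input, 1);
classNum = max(output);
globalMean = mean(input, 2);
Sw = zeros(dim); Sb = zeros(dim);
for c = 1:classNum
	classData = input(:, output == c);
	classMean = mean(classData, 2);
	centered = bsxfun(@minus, classData, classMean);
	Sw = Sw + centered*centered';                       % Within-class scatter
	meanDiff = classMean - globalMean;
	Sb = Sb + size(classData, 2)*(meanDiff*meanDiff');  % Between-class scatter
end
[eigVec, eigVal] = eig(Sb, Sw);          % Generalized eigenproblem Sb*v = lambda*Sw*v
[eigVal, order] = sort(diag(eigVal), 'descend');
projection = eigVec(:, order(1:2));      % Keep the two most discriminative directions
input2d = projection'*input;             % 2-by-N projected data, ready for a scatter plot
```

The sketch requires the within-class scatter matrix to be nonsingular, which holds for this toy data; with few samples and many features a regularization term or a preceding PCA step is commonly used.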

We can also apply LDA projection and plot the recognition rate against the number of projected dimensions, using leave-one-out cross validation with KNNC:

opt=ldaPerfViaKnncLoo('defaultOpt');
opt.mode='exact';
recogRate1=ldaPerfViaKnncLoo(ds, opt);
ds2=ds; ds2.input=inputNormalize(ds2.input);	% input normalization
recogRate2=ldaPerfViaKnncLoo(ds2, opt);
[featureNum, dataNum] = size(ds.input);
plot(1:featureNum, 100*recogRate1, 'o-', 1:featureNum, 100*recogRate2, '^-'); grid on
legend('Raw data', 'Normalized data', 'location', 'southeast');
xlabel('No. of projected features based on LDA');
ylabel('LOO recognition rates using KNNC (%)');
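
Input normalization matters for distance-based classifiers such as KNNC because features with large numeric ranges would otherwise dominate the distance. The sketch below shows z-score normalization in base MATLAB on made-up data; it assumes that "inputNormalize" performs this kind of per-feature normalization, which the text above does not confirm.

```matlab
% Toy data: 2 features (rows) with very different scales, 5 samples (columns).
input = [1 2 3 4 5;
         100 300 200 500 400];
mu = mean(input, 2);                       % Per-feature mean
sigma = std(input, 0, 2);                  % Per-feature standard deviation
% Subtract the mean and divide by the standard deviation, feature by feature
inputNorm = bsxfun(@rdivide, bsxfun(@minus, input, mu), sigma);
disp(inputNorm);
```

After this step every feature has zero mean and unit standard deviation, so each contributes on a comparable scale to the distances computed by the classifier.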

We can also perform input selection to reduce dimensionality:

myTic=tic;
z=inputSelectSequential(ds, inf, [], [], 1); figEnlarge;
toc(myTic)
Construct 91  models, each with up to 13 inputs selected from 13 candidates...

Selecting input 1:
Model 1/91: selected={ 1} => Recog. rate = 31.9%
Model 2/91: selected={ 2} => Recog. rate = 26.3%
Model 3/91: selected={ 3} => Recog. rate = 32.2%
Model 4/91: selected={ 4} => Recog. rate = 35.0%
Model 5/91: selected={ 5} => Recog. rate = 28.3%
Model 6/91: selected={ 6} => Recog. rate = 39.4%
Model 7/91: selected={ 7} => Recog. rate = 37.8%
Model 8/91: selected={ 8} => Recog. rate = 42.4%
Model 9/91: selected={ 9} => Recog. rate = 50.9%
Model 10/91: selected={10} => Recog. rate = 44.4%
Model 11/91: selected={11} => Recog. rate = 52.8%
Model 12/91: selected={12} => Recog. rate = 52.4%
Model 13/91: selected={13} => Recog. rate = 30.6%
Currently selected inputs: 11

Selecting input 2:
Model 14/91: selected={11,  1} => Recog. rate = 57.0%
Model 15/91: selected={11,  2} => Recog. rate = 61.1%
Model 16/91: selected={11,  3} => Recog. rate = 56.1%
Model 17/91: selected={11,  4} => Recog. rate = 54.4%
Model 18/91: selected={11,  5} => Recog. rate = 53.1%
Model 19/91: selected={11,  6} => Recog. rate = 57.2%
Model 20/91: selected={11,  7} => Recog. rate = 54.1%
Model 21/91: selected={11,  8} => Recog. rate = 61.5%
Model 22/91: selected={11,  9} => Recog. rate = 65.7%
Model 23/91: selected={11, 10} => Recog. rate = 60.6%
Model 24/91: selected={11, 12} => Recog. rate = 69.6%
Model 25/91: selected={11, 13} => Recog. rate = 55.0%
Currently selected inputs: 11, 12

Selecting input 3:
Model 26/91: selected={11, 12,  1} => Recog. rate = 70.7%
Model 27/91: selected={11, 12,  2} => Recog. rate = 74.6%
Model 28/91: selected={11, 12,  3} => Recog. rate = 73.0%
Model 29/91: selected={11, 12,  4} => Recog. rate = 69.4%
Model 30/91: selected={11, 12,  5} => Recog. rate = 70.9%
Model 31/91: selected={11, 12,  6} => Recog. rate = 74.3%
Model 32/91: selected={11, 12,  7} => Recog. rate = 69.8%
Model 33/91: selected={11, 12,  8} => Recog. rate = 72.0%
Model 34/91: selected={11, 12,  9} => Recog. rate = 73.5%
Model 35/91: selected={11, 12, 10} => Recog. rate = 77.0%
Model 36/91: selected={11, 12, 13} => Recog. rate = 69.6%
Currently selected inputs: 11, 12, 10

Selecting input 4:
Model 37/91: selected={11, 12, 10,  1} => Recog. rate = 76.3%
Model 38/91: selected={11, 12, 10,  2} => Recog. rate = 77.8%
Model 39/91: selected={11, 12, 10,  3} => Recog. rate = 77.0%
Model 40/91: selected={11, 12, 10,  4} => Recog. rate = 76.1%
Model 41/91: selected={11, 12, 10,  5} => Recog. rate = 76.3%
Model 42/91: selected={11, 12, 10,  6} => Recog. rate = 78.5%
Model 43/91: selected={11, 12, 10,  7} => Recog. rate = 78.0%
Model 44/91: selected={11, 12, 10,  8} => Recog. rate = 76.3%
Model 45/91: selected={11, 12, 10,  9} => Recog. rate = 76.3%
Model 46/91: selected={11, 12, 10, 13} => Recog. rate = 76.7%
Currently selected inputs: 11, 12, 10,  6

Selecting input 5:
Model 47/91: selected={11, 12, 10,  6,  1} => Recog. rate = 78.1%
Model 48/91: selected={11, 12, 10,  6,  2} => Recog. rate = 78.7%
Model 49/91: selected={11, 12, 10,  6,  3} => Recog. rate = 78.1%
Model 50/91: selected={11, 12, 10,  6,  4} => Recog. rate = 76.9%
Model 51/91: selected={11, 12, 10,  6,  5} => Recog. rate = 76.7%
Model 52/91: selected={11, 12, 10,  6,  7} => Recog. rate = 78.7%
Model 53/91: selected={11, 12, 10,  6,  8} => Recog. rate = 76.9%
Model 54/91: selected={11, 12, 10,  6,  9} => Recog. rate = 78.3%
Model 55/91: selected={11, 12, 10,  6, 13} => Recog. rate = 77.8%
Currently selected inputs: 11, 12, 10,  6,  2

Selecting input 6:
Model 56/91: selected={11, 12, 10,  6,  2,  1} => Recog. rate = 77.8%
Model 57/91: selected={11, 12, 10,  6,  2,  3} => Recog. rate = 79.4%
Model 58/91: selected={11, 12, 10,  6,  2,  4} => Recog. rate = 78.9%
Model 59/91: selected={11, 12, 10,  6,  2,  5} => Recog. rate = 78.1%
Model 60/91: selected={11, 12, 10,  6,  2,  7} => Recog. rate = 79.6%
Model 61/91: selected={11, 12, 10,  6,  2,  8} => Recog. rate = 78.5%
Model 62/91: selected={11, 12, 10,  6,  2,  9} => Recog. rate = 78.7%
Model 63/91: selected={11, 12, 10,  6,  2, 13} => Recog. rate = 79.1%
Currently selected inputs: 11, 12, 10,  6,  2,  7

Selecting input 7:
Model 64/91: selected={11, 12, 10,  6,  2,  7,  1} => Recog. rate = 80.6%
Model 65/91: selected={11, 12, 10,  6,  2,  7,  3} => Recog. rate = 81.7%
Model 66/91: selected={11, 12, 10,  6,  2,  7,  4} => Recog. rate = 81.5%
Model 67/91: selected={11, 12, 10,  6,  2,  7,  5} => Recog. rate = 81.7%
Model 68/91: selected={11, 12, 10,  6,  2,  7,  8} => Recog. rate = 80.7%
Model 69/91: selected={11, 12, 10,  6,  2,  7,  9} => Recog. rate = 79.8%
Model 70/91: selected={11, 12, 10,  6,  2,  7, 13} => Recog. rate = 80.4%
Currently selected inputs: 11, 12, 10,  6,  2,  7,  3

Selecting input 8:
Model 71/91: selected={11, 12, 10,  6,  2,  7,  3,  1} => Recog. rate = 81.3%
Model 72/91: selected={11, 12, 10,  6,  2,  7,  3,  4} => Recog. rate = 82.2%
Model 73/91: selected={11, 12, 10,  6,  2,  7,  3,  5} => Recog. rate = 81.5%
Model 74/91: selected={11, 12, 10,  6,  2,  7,  3,  8} => Recog. rate = 81.7%
Model 75/91: selected={11, 12, 10,  6,  2,  7,  3,  9} => Recog. rate = 81.1%
Model 76/91: selected={11, 12, 10,  6,  2,  7,  3, 13} => Recog. rate = 81.1%
Currently selected inputs: 11, 12, 10,  6,  2,  7,  3,  4

Selecting input 9:
Model 77/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  1} => Recog. rate = 81.1%
Model 78/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  5} => Recog. rate = 82.0%
Model 79/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8} => Recog. rate = 82.2%
Model 80/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  9} => Recog. rate = 81.5%
Model 81/91: selected={11, 12, 10,  6,  2,  7,  3,  4, 13} => Recog. rate = 80.6%
Currently selected inputs: 11, 12, 10,  6,  2,  7,  3,  4,  8

Selecting input 10:
Model 82/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  1} => Recog. rate = 81.3%
Model 83/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  5} => Recog. rate = 81.7%
Model 84/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  9} => Recog. rate = 81.5%
Model 85/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8, 13} => Recog. rate = 80.7%
Currently selected inputs: 11, 12, 10,  6,  2,  7,  3,  4,  8,  5

Selecting input 11:
Model 86/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  1} => Recog. rate = 81.5%
Model 87/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  9} => Recog. rate = 80.4%
Model 88/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  5, 13} => Recog. rate = 80.9%
Currently selected inputs: 11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  1

Selecting input 12:
Model 89/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  1,  9} => Recog. rate = 81.3%
Model 90/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  1, 13} => Recog. rate = 82.4%
Currently selected inputs: 11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  1, 13

Selecting input 13:
Model 91/91: selected={11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  1, 13,  9} => Recog. rate = 82.8%
Currently selected inputs: 11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  1, 13,  9

Overall maximal recognition rate = 82.8%.
Selected 13 inputs (out of 13): 11, 12, 10,  6,  2,  7,  3,  4,  8,  5,  1, 13,  9
Elapsed time is 46.885277 seconds.

Feature selection is not very effective here, since the best recognition rate (82.8%) is obtained only when all 13 inputs are selected.
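
The greedy search shown in the log above can be sketched as a short loop. In the sketch, "scoreFcn" is a hypothetical stand-in for the leave-one-out recognition rate of a model built on the selected inputs, and the "usefulness" values are made up; this is not the code of "inputSelectSequential".

```matlab
% Toy stand-in for the LOO recognition rate of a model built on the inputs in "sel".
% The per-input "usefulness" values are made up for illustration only.
usefulness = [0.05 0.20 0.10 0.02 0.15];
scoreFcn = @(sel) sum(usefulness(sel));

inputNum = length(usefulness);
selected = [];                         % Inputs chosen so far, in order
remaining = 1:inputNum;                % Candidates not yet chosen
modelCount = 0;
while ~isempty(remaining)
	score = zeros(size(remaining));
	for k = 1:length(remaining)
		score(k) = scoreFcn([selected, remaining(k)]);   % Try adding one candidate
		modelCount = modelCount + 1;
	end
	[~, best] = max(score);            % Keep the candidate giving the best score
	selected = [selected, remaining(best)];
	remaining(best) = [];
end
fprintf('Selection order: %s\n', mat2str(selected));
fprintf('Models evaluated: %d\n', modelCount);
```

With n candidates the loop evaluates n + (n-1) + ... + 1 = n(n+1)/2 models. For the 13 MFCC inputs this gives 13*14/2 = 91, which matches the "91 models" reported in the log. Because the search is greedy, an input selected early is never removed, so the result is not guaranteed to be the best subset of a given size.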

After dimensionality reduction, we can try all combinations of classifiers and input normalization schemes to search for the best performance via leave-one-out cross validation:

myTic=tic;
poOpt=perfCv4classifier('defaultOpt');
poOpt.foldNum=inf;	% Leave-one-out cross validation
figure; [perfData, bestId]=perfCv4classifier(ds, poOpt, 1);
toc(myTic)
structDispInHtml(perfData, 'Performance of various classifiers via cross validation');
Elapsed time is 606.656238 seconds.

Then we can display the confusion matrix corresponding to the best classifier and the best input normalization scheme:

confMat=confMatGet(ds.output, perfData(bestId).bestComputedClass);
confOpt=confMatPlot('defaultOpt');
confOpt.className=ds.outputName;
figure; confMatPlot(confMat, confOpt);

We can also repeat LOFOCV with a different classifier, here "svmc" with its default options:

opt=perfLoo4audio('defaultOpt');
opt.classifier='svmc';
opt.classifierOpt=feval([opt.classifier, 'Train'], 'defaultOpt');
[ds2, fileRr, frameRr]=perfLoo4audio(ds, opt);
fprintf('Frame-based leave-one-file-out RR=%g%%\n', frameRr*100);
fprintf('File-based leave-one-file-out RR=%g%%\n', fileRr*100);
1/20: Leave-one-file-out CV for "coinSound/01/1nt_1.wav", time=0.174804 sec
2/20: Leave-one-file-out CV for "coinSound/01/1nt_2.wav", time=0.180456 sec
3/20: Leave-one-file-out CV for "coinSound/01/1nt_3.wav", time=0.152935 sec
4/20: Leave-one-file-out CV for "coinSound/01/1nt_4.wav", time=0.150658 sec
5/20: Leave-one-file-out CV for "coinSound/01/1nt_5.wav", time=0.150098 sec
6/20: Leave-one-file-out CV for "coinSound/05/5nt_1.wav", time=0.156212 sec
7/20: Leave-one-file-out CV for "coinSound/05/5nt_2.wav", time=0.1482 sec
8/20: Leave-one-file-out CV for "coinSound/05/5nt_3.wav", time=0.154819 sec
9/20: Leave-one-file-out CV for "coinSound/05/5nt_4.wav", time=0.153566 sec
10/20: Leave-one-file-out CV for "coinSound/05/5nt_5.wav", time=0.148955 sec
11/20: Leave-one-file-out CV for "coinSound/10/10nt_1.wav", time=0.148474 sec
12/20: Leave-one-file-out CV for "coinSound/10/10nt_2.wav", time=0.146257 sec
13/20: Leave-one-file-out CV for "coinSound/10/10nt_3.wav", time=0.146374 sec
14/20: Leave-one-file-out CV for "coinSound/10/10nt_4.wav", time=0.151159 sec
15/20: Leave-one-file-out CV for "coinSound/10/10nt_5.wav", time=0.152013 sec
16/20: Leave-one-file-out CV for "coinSound/50/50nt_1.wav", time=0.15416 sec
17/20: Leave-one-file-out CV for "coinSound/50/50nt_2.wav", time=0.1519 sec
18/20: Leave-one-file-out CV for "coinSound/50/50nt_3.wav", time=0.149667 sec
19/20: Leave-one-file-out CV for "coinSound/50/50nt_4.wav", time=0.155874 sec
20/20: Leave-one-file-out CV for "coinSound/50/50nt_5.wav", time=0.150383 sec
Frame-based leave-one-file-out RR=12.2222%
File-based leave-one-file-out RR=25%

With its default options, "svmc" performs far worse than the default classifier used in the first LOFOCV run: the file-based recognition rate drops from 95% to 25%, which is chance level for four classes with five files each.

Summary

This brief tutorial applies basic pattern recognition techniques to coin recognition from sound. There are several directions for further improvement:

Appendix

List of functions and datasets used in this script

Date and time when this script finished:

fprintf('Date & time: %s\n', char(datetime));
Date & time: 04-Feb-2017 18:59:33

Overall elapsed time:

toc(scriptStartTime)
Elapsed time is 670.254524 seconds.

Jyh-Shing Roger Jang.