%% Tutorial on music genre classification
% This tutorial explains the basics of music genre classification (MGC)
% using MFCC (mel-frequency cepstral coefficients) as the features for classification.
% The dataset used here is GTZAN.
%% Preprocessing
% Before we start, let's add the necessary toolboxes to the MATLAB search path:
% For using SVM classifier
%% 
% All the above toolboxes can be downloaded from the author's .
% Make sure you are using the latest toolboxes to work with this script.
%% 
% For compatibility, here we list the platform and MATLAB version that we used to run this script:
fprintf('Platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);
fprintf('Script starts at %s\n', char(datetime));
scriptStartTime=tic;		% Timing for the whole script
%% Dataset collection
% First of all, we shall collect all the audio files from the corpus
% directory. Note that
%
% * The audio files have the extension "au".
% * These files have been organized for easy parsing, with a subfolder for each class.
auDir='d:/dataSet/musicGenreClassification/GTZAN';
opt=mmDataCollect('defaultOpt');
opt.extName='au';
auSet=mmDataCollect(auDir, opt, 1);
%% Feature extraction
% For each audio file, we need to extract the corresponding feature vector for classification.
% We shall use the function <../mgcFeaExtract.m mgcFeaExtract.m> (which computes MFCC and its statistics) for feature extraction.
% We also need to put the whole dataset into a single variable "ds", which is easier for further processing, including classifier construction and evaluation.
if ~exist('ds.mat', 'file')
	myTic=tic;
	opt=dsCreateFromMm('defaultOpt');
	opt.auFeaFcn=@mgcFeaExtract;		% Function for feature extraction
	opt.auFeaOpt=feval(opt.auFeaFcn, 'defaultOpt');	% Feature options
	opt.auEpdFcn='';		% No need to do endpoint detection
	ds=dsCreateFromMm(auSet, opt, 1);
	fprintf('Time for feature extraction over %d files = %g sec\n', length(auSet), toc(myTic));
	fprintf('Saving ds.mat...\n');
	save ds ds
else
	fprintf('Loading ds.mat...\n');
	load ds.mat
end
%% 
% Note that if feature extraction is lengthy, we can simply load ds.mat, which has been saved by the above code snippet.
%% 
% Basically the extracted features are based on MFCC's statistics, including mean, std, min, and max along each dimension.
% Since MFCC has 39 dimensions, the extracted file-based features have 156 (= 39*4) dimensions.
% You can try "mgcFeaExtract" on one of the audio files to plot the result:
auFile=[auDir, '/disco/disco.00001.au'];
figure; mgcFeaExtract(auFile, [], 1);
%% Dataset visualization
% Once we have every piece of necessary information stored in "ds",
% we can invoke many different functions in Machine Learning Toolbox for
% data visualization and classification.
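%% 
% As a rough illustration of what "ds" holds, the sketch below builds a toy
% dataset with the same fields used later in this script (input, output,
% outputName). Each column of "input" is one file-level feature vector formed
% from the mean, std, min, and max of a frame-level feature matrix.
% The toy matrices, the 3-dimensional frame features, the class names, and
% the stacking order of the four statistics are assumptions made only for
% this illustration; mgcFeaExtract may order its output differently.
toyFrames={[1 2 3; 4 5 6; 7 8 9], [0 2 4; 1 1 1; 9 8 7], [2 2 2; 0 3 6; 5 5 8]};	% One (dim x frameNum) matrix per hypothetical file
toyDs.input=zeros(12, length(toyFrames));	% 3 dims * 4 statistics = 12 features per file
for toyI=1:length(toyFrames)
	toyF=toyFrames{toyI};
	% Stack the four per-dimension statistics into one column vector
	toyDs.input(:, toyI)=[mean(toyF, 2); std(toyF, 0, 2); min(toyF, [], 2); max(toyF, [], 2)];
end
toyDs.output=[1 1 2];				% Class index of each file
toyDs.outputName={'classA', 'classB'};		% Hypothetical class names
toyClassSize=accumarray(toyDs.output(:), 1)';	% Number of files in each class
fprintf('Toy ds: %d features x %d files, class sizes = %s\n', size(toyDs.input, 1), size(toyDs.input, 2), mat2str(toyClassSize));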
%% 
% For instance, we can display the size of each class:
figure;
[classSize, classLabel]=dsClassSize(ds, 1);
%% 
% We can plot the range of features of the dataset:
figure; dsRangePlot(ds);
%% 
% We can plot the feature vectors within each class:
figure; dsFeaVecPlot(ds); figEnlarge;
%% Dimensionality reduction
% The dimension of the feature vector is quite large:
dim=size(ds.input, 1);
fprintf('Feature dimensions = %d\n', dim);
%% 
% We shall consider dimensionality reduction via PCA (principal component
% analysis). First, let's plot the cumulative variance given the descending
% eigenvalues of PCA:
[input2, eigVec, eigValue]=pca(ds.input);
cumVar=cumsum(eigValue);
cumVarPercent=cumVar/cumVar(end)*100;
figure; plot(cumVarPercent, '.-');
xlabel('No. of eigenvalues');
ylabel('Cumulated variance percentage (%)');
title('Variance percentage vs. no. of eigenvalues');
%% 
% A reasonable choice is to retain the dimensionality such that the cumulative
% variance percentage is larger than a threshold, say, 95%, as follows:
cumVarTh=95;
index=find(cumVarPercent>cumVarTh);
newDim=index(1);
ds2=ds;
ds2.input=input2(1:newDim, :);
fprintf('Reduce the dimensionality to %d to keep %g%% cumulative variance via PCA.\n', newDim, cumVarTh);
%% 
% However, our experiment indicates that if we use PCA for dimensionality
% reduction, the accuracy will be lower. As a result, we shall keep all the
% features for further exploration.
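%% 
% To see what the cumulative-variance criterion does without any toolbox,
% here is a minimal sketch in base MATLAB on a small made-up matrix
% (rows are features and columns are samples, as in ds.input).
% It uses the textbook eigen-decomposition of the covariance matrix, which
% is not necessarily how the toolbox's "pca" is implemented. All pcaToy*
% names and the data are hypothetical.
pcaToyX=[1 2 3 4 5 6; 2 4 6 8 10 13; 1 0 1 0 1 0];
pcaToyXc=bsxfun(@minus, pcaToyX, mean(pcaToyX, 2));	% Center each feature
pcaToyCov=pcaToyXc*pcaToyXc'/(size(pcaToyX, 2)-1);	% Sample covariance matrix
[pcaToyVec, pcaToyVal]=eig(pcaToyCov);
[pcaToyVal, pcaToyOrder]=sort(diag(pcaToyVal), 'descend');	% Eigenvalues in descending order
pcaToyVec=pcaToyVec(:, pcaToyOrder);
pcaToyCumVarPercent=cumsum(pcaToyVal)/sum(pcaToyVal)*100;
pcaToyNewDim=find(pcaToyCumVarPercent>95, 1);		% Smallest dimension exceeding the 95% threshold
pcaToyReduced=pcaToyVec(:, 1:pcaToyNewDim)'*pcaToyXc;	% Project onto the leading eigenvectors
fprintf('Toy PCA: keep %d of %d dimensions, cumulative variance = %s %%\n', pcaToyNewDim, size(pcaToyX, 1), mat2str(pcaToyCumVarPercent', 4));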
%% 
% In order to visualize the distribution of the dataset,
% we can project the original dataset into 2-D space.
% This can be achieved by LDA (linear discriminant analysis):
ds2d=lda(ds);
ds2d.input=ds2d.input(1:2, :);
figure; dsScatterPlot(ds2d);
xlabel('Input 1'); ylabel('Input 2');
title('Features projected on the first 2 lda vectors');
%% 
% Apparently the classes are not well separated in this 2-D projection.
% This indicates that either the features or LDA is not very effective.
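%% 
% For reference, the idea behind an LDA projection can be sketched in base
% MATLAB as follows. This is the textbook formulation (within-class and
% between-class scatter matrices, then a generalized eigenproblem), offered
% only as an assumption about what such a projection computes; it is not
% taken from the toolbox's "lda" function. The 3-class toy data and all
% ldaToy* names are hypothetical.
ldaToyOffsets=[1 0 0 -1; 0 1 0 -1; 0 0 1 -1];	% Zero-mean offsets around each class center
ldaToyCenters=[0 5 0; 0 0 5; 0 1 2];		% One class center per column
ldaToyIn=[]; ldaToyOut=[];
for ldaToyC=1:3
	ldaToyIn=[ldaToyIn, bsxfun(@plus, ldaToyCenters(:, ldaToyC), ldaToyOffsets)];
	ldaToyOut=[ldaToyOut, ldaToyC*ones(1, 4)];
end
ldaToyMean=mean(ldaToyIn, 2);
ldaToySw=zeros(3); ldaToySb=zeros(3);
for ldaToyC=1:3
	ldaToyXc=ldaToyIn(:, ldaToyOut==ldaToyC);
	ldaToyMc=mean(ldaToyXc, 2);
	ldaToyD=bsxfun(@minus, ldaToyXc, ldaToyMc);
	ldaToySw=ldaToySw+ldaToyD*ldaToyD';		% Within-class scatter
	ldaToySb=ldaToySb+size(ldaToyXc, 2)*(ldaToyMc-ldaToyMean)*(ldaToyMc-ldaToyMean)';	% Between-class scatter
end
[ldaToyVec, ldaToyVal]=eig(ldaToySb, ldaToySw);	% Generalized eigenproblem Sb*v = lambda*Sw*v
[~, ldaToyOrder]=sort(real(diag(ldaToyVal)), 'descend');
ldaToyProj=real(ldaToyVec(:, ldaToyOrder(1:2)))'*ldaToyIn;	% Keep the 2 most discriminant directions
fprintf('Toy LDA: projected %d samples from %d-D to %d-D\n', size(ldaToyIn, 2), size(ldaToyIn, 1), size(ldaToyProj, 1));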
%% Classification
% We can try the most straightforward KNNC (k-nearest neighbor classifier):
[rr, ~]=knncLoo(ds);
fprintf('rr=%g%% for original ds\n', rr*100);
ds2=ds; ds2.input=inputNormalize(ds2.input);
[rr2, computed]=knncLoo(ds2);
fprintf('rr=%g%% for ds after input normalization\n', rr2*100);
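%% 
% To make the leave-one-out idea concrete, here is a minimal sketch of a
% 1-nearest-neighbor leave-one-out test in base MATLAB. It only illustrates
% the procedure; "knncLoo" may differ in details such as the value of k or
% tie-breaking. The toy data are made up and trivially separable, so the
% rate printed here says nothing about the GTZAN results above. All knnToy*
% names are hypothetical.
knnToyIn=[0 1 0 5 6 5; 0 0 1 5 5 6];	% Each column is a 2-D sample
knnToyOut=[1 1 1 2 2 2];		% Class index of each sample
knnToyNum=size(knnToyIn, 2);
knnToyPred=zeros(1, knnToyNum);
for knnToyI=1:knnToyNum
	knnToyDist=sum(bsxfun(@minus, knnToyIn, knnToyIn(:, knnToyI)).^2, 1);	% Squared distances to all samples
	knnToyDist(knnToyI)=inf;		% Leave the sample itself out
	[~, knnToyNearest]=min(knnToyDist);
	knnToyPred(knnToyI)=knnToyOut(knnToyNearest);	% Label of the nearest remaining sample
end
knnToyRr=mean(knnToyPred==knnToyOut);
fprintf('Toy leave-one-out rr=%g%%\n', knnToyRr*100);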
%% 
% Again, the vanilla KNNC does not give a satisfactory result.
% So we shall try other potentially better classifiers, such as SVM.
% Before trying SVM, we use a function <../mgcOptSet.m mgcOptSet.m> to put all the MGC-related options in a single file.
% This makes it easier to change a single option in this file and check its effect on the accuracy of MGC.
% Here is the code for using SVM:
myTic=tic;
mgcOpt=mgcOptSet;
if mgcOpt.useInputNormalize, ds.input=inputNormalize(ds.input); end		% Input normalization
cvPrm=crossValidate('defaultOpt');
cvPrm.foldNum=mgcOpt.foldNum;
cvPrm.classifier=mgcOpt.classifier;
plotOpt=1;
figure; [tRrMean, vRrMean, tRr, vRr, computedClass]=crossValidate(ds, cvPrm, plotOpt);
figEnlarge;
fprintf('Time for cross-validation = %g sec\n', toc(myTic));
%% 
% The recognition rate is 77%, indicating that SVM is a much more effective classifier than KNNC.
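%% 
% As a reminder of what a fold count such as cvPrm.foldNum controls, the
% sketch below splits 10 hypothetical samples into 3 folds, using each fold
% once for validation and the rest for training. The interleaved assignment
% is an assumption chosen for simplicity; "crossValidate" may partition the
% data differently (for example, by class). All cvToy* names are hypothetical.
cvToyNum=10; cvToyFoldNum=3;
cvToyFoldId=mod(0:cvToyNum-1, cvToyFoldNum)+1;	% Fold index of each sample: 1 2 3 1 2 3 ...
cvToyCount=zeros(1, cvToyNum);			% How many times each sample is used for validation
for cvToyK=1:cvToyFoldNum
	cvToyValidIndex=find(cvToyFoldId==cvToyK);
	cvToyTrainIndex=find(cvToyFoldId~=cvToyK);
	cvToyCount(cvToyValidIndex)=cvToyCount(cvToyValidIndex)+1;
	fprintf('Fold %d: %d training samples, %d validation samples\n', cvToyK, length(cvToyTrainIndex), length(cvToyValidIndex));
end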
%% 
% We can plot the confusion matrix:
computed=zeros(1, length(computedClass));	% Preallocate and discard the earlier KNNC output
for i=1:length(computedClass)
	computed(i)=computedClass{i};
end
desired=ds.output;
confMat = confMatGet(desired, computed);
cmOpt=confMatPlot('defaultOpt');
cmOpt.className=ds.outputName;
confMatPlot(confMat, cmOpt);
figEnlarge;
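%% 
% For readers unfamiliar with confusion matrices, here is a minimal sketch in
% base MATLAB of how one can be tallied from desired and computed class
% indices. Rows are taken as desired classes and columns as computed classes,
% which is an assumption and may not match the convention of "confMatGet".
% The label vectors and all cmToy* names are hypothetical.
cmToyDesired=[1 1 2 2 3 3];
cmToyComputed=[1 2 2 2 3 1];
cmToyMat=accumarray([cmToyDesired(:), cmToyComputed(:)], 1, [3 3]);	% Count each (desired, computed) pair
cmToyRr=trace(cmToyMat)/sum(cmToyMat(:));	% Diagonal entries are the correct decisions
fprintf('Toy confusion matrix = %s, rr=%g%%\n', mat2str(cmToyMat), cmToyRr*100);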
%% Summary
% This is a brief tutorial on music genre classification based on MFCC's statistics.
% There are several directions for further improvement:
% 
% * Explore other features and feature selection for MGC
% * Explore other classifiers (and their combinations) for MGC
% 
%% Appendix
% List of functions and datasets used in this script
% 
% * .
% * <../list.asp List of files in this folder>
% 
%% 
% Date and time when finishing this script:
fprintf('Date & time: %s\n', char(datetime));
%% 
% Overall elapsed time:
toc(scriptStartTime)