%% Tutorial on leaf recognition
% This tutorial covers the basics of leaf recognition based on shape and color statistics.
% The dataset is available at .
%% Preprocessing
% Before we start, let's add the necessary toolboxes to the MATLAB search path:
addpath d:/users/jang/matlab/toolbox/utility
addpath d:/users/jang/matlab/toolbox/machineLearning
%%
% All the above toolboxes can be downloaded from the author's .
% Make sure you are using the latest toolboxes to work with this script.
%%
% For compatibility, here we list the platform and MATLAB version used to run this script:
fprintf('Platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);
fprintf('Date & time: %s\n', char(datetime));
scriptStartTime=tic;
%% Dataset construction
% First of all, we collect all the image data from the image directory. Note that:
%
% * The images have been reorganized for easy parsing (with a subfolder for each class), and can be downloaded from <../leafSorted.rar here>.
% * For simplicity, we use only 5 classes instead of the original 32 classes.
% * During data collection, we also plot the leaves of each class.
imDir='D:\users\jang\books\dcpr\appNote\leafId\leafSorted';
opt=mmDataCollect('defaultOpt');
opt.extName='jpg';
opt.maxClassNum=5;
imageData=mmDataCollect(imDir, opt, 1);
%% Feature extraction
% For each image, we need to extract the corresponding features for classification.
% We shall use the function "leafFeaExtract" for feature extraction.
% We also need to put the whole dataset into a format that is easier for further
% processing, including classifier construction and evaluation.
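%%
% Before running the actual pipeline, it may help to see what such a feature
% extractor does conceptually. The following is only a rough sketch of a
% shape-plus-color extractor in the same spirit (Otsu thresholding, keeping the
% largest region, then region properties and color statistics); the function
% name "myLeafFea" and the exact feature set are assumptions for illustration,
% not the toolbox's "leafFeaExtract":

```matlab
function fea = myLeafFea(im)
% myLeafFea: hypothetical sketch of shape+color feature extraction.
% NOT the toolbox's leafFeaExtract; names and features are assumptions.
gray = rgb2gray(im);
bw = im2bw(gray, graythresh(gray));   % Binarize via Otsu's method
bw = imcomplement(bw);                % Assume the leaf is darker than the background
lbl = bwlabel(bw);                    % Label connected regions
stats = regionprops(lbl, 'Area', 'Eccentricity', 'Solidity', 'Extent');
[~, idx] = max([stats.Area]);         % Keep only the region with the maximum area
mask = (lbl == idx);
s = stats(idx);
r = double(im(:,:,1)); g = double(im(:,:,2)); b = double(im(:,:,3));
fea = [s.Eccentricity; s.Solidity; s.Extent; ...   % Shape statistics
       mean(r(mask)); std(r(mask)); ...            % Color statistics over the leaf region
       mean(g(mask)); std(g(mask)); ...
       mean(b(mask)); std(b(mask))];
```

% The real extractor used below may differ in both the segmentation details and
% the chosen features; type "leafFeaExtract" (as shown later) to see its self-demo.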
opt=dsCreateFromMm('defaultOpt');
if exist('ds.mat', 'file')
	fprintf('Loading ds.mat...\n');
	load ds.mat
else
	myTic=tic;
	opt=dsCreateFromMm('defaultOpt');
	opt.imFeaFcn=@leafFeaExtract;	% Function for feature extraction
	opt.imFeaOpt=feval(opt.imFeaFcn, 'defaultOpt');	% Feature options
	ds=dsCreateFromMm(imageData, opt);
	fprintf('Time for feature extraction over %d images = %g sec\n', length(imageData), toc(myTic));
	fprintf('Saving ds.mat...\n');
	save ds ds
end
%%
% Note that since feature extraction is a lengthy process, we have saved the resulting variable "ds" to "ds.mat".
% If needed, you can simply load the file to restore the dataset variable "ds" and play around with it.
% But if you have changed the feature extraction function, be sure to delete ds.mat first to force a fresh feature extraction.
%%
% Basically, the extracted features are based on the regions separated by Otsu's method.
% We consider only the region with the maximum area, and compute its region properties and color statistics as features.
% You can type "leafFeaExtract" for a self-demo of the function:
figure; leafFeaExtract;
%% Dataset visualization
% Once we have all the necessary information stored in "ds",
% we can invoke many different functions in the Machine Learning Toolbox for
% data visualization and classification.
%%
% For instance, we can display the size of each class:
figure; [classSize, classLabel]=dsClassSize(ds, 1);
%%
% We can plot the distribution of each feature within each class:
figure; dsBoxPlot(ds);
%%
% The box plots indicate that the ranges of the features vary a lot. To verify this,
% we can simply plot the range of each feature of the dataset:
figure; dsRangePlot(ds);
%%
% Big range differences cause problems in distance-based classification.
% To avoid this, we can simply apply z-normalization to each feature:
ds2=ds;
ds2.input=inputNormalize(ds2.input);
%%
% We can now plot the feature vectors within each class:
figure; dsFeaVecPlot(ds); figEnlarge;
%%
% We can also do scatter plots on each pair of the original features:
figure; dsProjPlot2(ds); figEnlarge;
%%
% The above plots are hard to read due to the large differences in the ranges of the features.
% We can try the same plot with normalized inputs:
figure; dsProjPlot2(ds2); figEnlarge;
%%
% We can also do the scatter plots in 3D space:
figure; dsProjPlot3(ds2); figEnlarge;
%%
% To visualize the distribution of the dataset,
% we can project the original dataset onto a 2-D space.
% This can be achieved by LDA (linear discriminant analysis):
ds2d=lda(ds);
ds2d.input=ds2d.input(1:2, :);
figure; dsScatterPlot(ds2d); xlabel('Input 1'); ylabel('Input 2');
title('Features projected on the first 2 LDA vectors');
%% Classification
% We can try the most straightforward KNNC (k-nearest-neighbor classifier):
rr=knncLoo(ds);
fprintf('rr=%g%% for ds\n', rr*100);
%%
% With the normalized dataset, we usually obtain better accuracy:
[rr, computed]=knncLoo(ds2);
fprintf('rr=%g%% for ds2 of normalized inputs\n', rr*100);
%%
% We can plot the confusion matrix:
confMat=confMatGet(ds2.output, computed);
opt=confMatPlot('defaultOpt');
opt.className=ds.outputName;
opt.mode='both';
figure; confMatPlot(confMat, opt);
%%
% We can perform sequential input selection to find the best features:
figure; tic; inputSelectSequential(ds2, inf, 'knnc'); toc
%%
% Since the number of features is not too big, we can also use exhaustive search to find the best features:
figure; tic; inputSelectExhaustive(ds2, inf, 'knnc'); toc
%%
% Exhaustive search is guaranteed to find the best feature subset, but
% at the cost of more computation.
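%%
% The toolbox functions "inputNormalize" and "knncLoo" hide the underlying
% computation. For reference, a bare-bones version of z-normalization followed
% by leave-one-out 1-nearest-neighbor evaluation might look like the sketch
% below (this is an illustrative reimplementation, not the toolbox code; it
% assumes ds.input is a dim-by-count matrix and ds.output a 1-by-count label
% vector, as used throughout this script, and it uses implicit expansion,
% which requires MATLAB R2016b or later):

```matlab
% Hedged sketch of z-normalization + leave-one-out 1-NN (not the toolbox code)
X = ds.input;                          % dim x count feature matrix
y = ds.output;                         % 1 x count class labels
X = (X - mean(X,2)) ./ std(X,0,2);     % z-normalization per feature (row)
n = size(X,2);
correct = 0;
for i = 1:n
	d = sum((X - X(:,i)).^2, 1);       % Squared distances from sample i to all samples
	d(i) = inf;                        % Leave sample i out of its own neighbor search
	[~, nb] = min(d);                  % Index of the nearest neighbor
	correct = correct + (y(nb) == y(i));
end
fprintf('LOO 1-NN recognition rate = %g%%\n', correct/n*100);
```

% For k > 1 neighbors, ties, or other distance metrics, the toolbox's knncLoo
% should be preferred over this sketch.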
%%
% We can even perform an exhaustive search over the classifiers and the ways
% of input normalization:
opt=perfCv4classifier('defaultOpt');
opt.foldNum=10;
tic; [perfData, bestId]=perfCv4classifier(ds, opt, 1); toc
structDispInHtml(perfData, 'Performance of various classifiers via cross validation');
%%
% We can then display the confusion matrix of the best classifier:
confMat=confMatGet(ds.output, perfData(bestId).bestComputedClass);
opt=confMatPlot('defaultOpt');
opt.className=ds.outputName;
figure; confMatPlot(confMat, opt);
%%
% We can also list all the misclassified images in a table:
for i=1:length(imageData)
	imageData(i).classIdPredicted=perfData(bestId).bestComputedClass(i);
	imageData(i).classPredicted=ds.outputName{imageData(i).classIdPredicted};
end
listOpt=mmDataList('defaultOpt');
mmDataList(imageData, listOpt);
%% Summary
% This is a brief tutorial on leaf recognition based on shape and color statistics.
% There are several directions for further improvement:
%
% * Explore other features, such as vein distribution
% * Try the classification problem using the whole dataset
% * Use template matching as an alternative to improve the performance
%
%% Appendix
% List of functions, scripts, and datasets used in this script:
%
% * <../leafSorted.rar Dataset> used in this script.
% * <../list.asp List of files in this folder>
%
%%
% Overall elapsed time:
toc(scriptStartTime)
%%
% , created on datetime
%%
% If you are interested in the original MATLAB code for this page, you can
% type "grabcode(URL)" under MATLAB, where URL is the web address of this
% page.