Tutorial on Vibrato Detection (by Roger Jang)

This tutorial explains how to use hidden Markov models (HMMs) for vibrato detection (VD) in the human singing voice. Here, "singing voice" refers to pure vocals without any accompaniment. Vibrato detection of this kind can be used for singing-voice scoring, especially in karaoke machines.

Preprocessing

Before we start, let's add the necessary toolboxes to the MATLAB search path:

addpath d:/users/jang/matlab/toolbox/utility
addpath d:/users/jang/matlab/toolbox/sap
addpath d:/users/jang/matlab/toolbox/machineLearning

All the above toolboxes can be downloaded from the author's toolbox page. Make sure you are using the latest toolboxes to work with this script.
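
Before running the rest of the script, it can help to confirm that the toolboxes are really on the search path. The following sketch is not part of the original script. It uses only the built-in exist function and checks a few function names that appear later in this tutorial; it makes no assumption about which toolbox each function lives in.

```matlab
% Function names taken from later sections of this tutorial
required = {'vdOptSet', 'mmDataCollect', 'auSetFeaExtract', 'inputNormalize', ...
	'hmmTrain4audio', 'hmmEval4audio', 'hmmPerfLoo4audio'};
% exist(name, 'file') returns 0 when nothing by that name is found on the path
isMissing = cellfun(@(f) exist(f, 'file')==0, required);
if any(isMissing)
	fprintf('Not found on the search path: %s\n', strjoin(required(isMissing), ', '));
else
	fprintf('All %d functions were found.\n', numel(required));
end
```

If any name is reported as missing, recheck the addpath lines above before continuing.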

For compatibility, here we list the platform and MATLAB version that we used to run this script:

fprintf('Platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);
fprintf('Date & time: %s\n', char(datetime));
scriptStartTime=tic;	% Timing for the whole script
Platform: PCWIN64
MATLAB version: 8.5.0.197613 (R2015a)
Date & time: 18-Jun-2017 21:38:26

Most of the modifiable options for this vibrato detection task are set in vdOptSet.m:

type vdOptSet
function vdOpt=vdOptSet
% vdOptSet: Set options for VD (vibrato detection)
%
%	Usage:
%		vdOpt=vdOptSet;
%
%	Description:
%		vdOpt=vdOptSet returns the default options for vibrato detection.
%
%	Example:
%		vdOpt=vdOptSet

%	Category: Options for vibrato detection
%	Roger Jang, 20130114

%% === Function for feature extraction and plotting
vdOpt.feaExtractFcn=@vdFeaExtractFromFile;
%vdOpt.feaExtractFcn=@vibratoFeaExtract;
vdOpt.hmmPlotFcn=@vdHmmPlot;
%% === Folder for wave files
vdOpt.audioDir='D:\dataset\vibrato\TeresaTeng\waveAndPitch';
%% === Parameters for VD
vdOpt.frameSize=512;
vdOpt.overlap=0;
vdOpt.pfType=1;		% For pitch tracking, 0 for AMDF, 1 for ACF
vdOpt.sFrameSizeInSec=0.3;	% Super frame size
vdOpt.sOverlapInSec=0;		% Super overlap
vdOpt.featureName={'aPitch', 'bPitch', 'distPitch', 'aVol', 'bVol', 'distVol'};
%vdOpt.featureName={'aPitch', 'bPitch', 'distPitch'};
vdOpt.outputName={'nonvibrato', 'vibrato'};
vdOpt.classNum=length(vdOpt.outputName);
vdOpt.gaussianNum=3;	% No. of Gaussians for each class

If you want to run this script, you need to change vdOpt.audioDir so that it points to a folder of sound files containing singing voices with vibrato. The dataset used in this script can be downloaded from <http://mirlab.org/dataset/public>.
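
To get a feel for the frame-related options listed in vdOptSet.m, the following arithmetic converts them into durations. The sampling rate fs is a hypothetical value chosen only for illustration; the real rate depends on the wave files, and how the toolbox actually groups frames into super frames is not shown in this tutorial.

```matlab
fs = 16000;              % ASSUMPTION: hypothetical sampling rate in Hz
frameSize = 512;         % vdOpt.frameSize, in samples
overlap = 0;             % vdOpt.overlap, in samples
sFrameSizeInSec = 0.3;   % vdOpt.sFrameSizeInSec, in seconds
hop = frameSize-overlap;                  % samples between successive frames
frameDuration = frameSize/fs;             % seconds covered by one frame
framesPerSuperFrame = floor(sFrameSizeInSec*fs/hop);   % whole frames in one super frame
fprintf('Frame duration = %g sec\n', frameDuration);
fprintf('Whole frames per super frame = %d\n', framesPerSuperFrame);
```

Under this assumed sampling rate, each 0.3-second super frame spans about nine pitch/volume frames, which is the amount of data each feature vector would summarize.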

Dataset collection

Once the sound files have been downloaded, we can use the command "mmDataCollect" to collect the file information:

vdOpt=vdOptSet;
opt=mmDataCollect('defaultOpt');
opt.extName='wav';
auData=mmDataCollect(vdOpt.audioDir, opt, 1);
Collecting 12 files with extension "wav" from "D:\dataset\vibrato\TeresaTeng\waveAndPitch"...

We can now read all the wave files (which have been labeled with vibrato segments), perform feature extraction, and store the result in a big structure variable auSet:

myTic=tic;
auSet=auSetFeaExtract(auData, vdOpt, 1);
fprintf('Saving vdAuSet.mat...\n');
save vdAuSet auSet
fprintf('time=%g sec\n', toc(myTic));
1/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/一水隔天涯.wav, duration=31.6071 sec, time=4.4513 sec
2/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/何日君再來.wav, duration=68.7801 sec, time=14.1175 sec
3/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/你怎麼說.wav, duration=56.1401 sec, time=9.36667 sec
4/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/再見我的愛人.wav, duration=65.3401 sec, time=10.3648 sec
5/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/夜來香.wav, duration=61.0001 sec, time=4.37216 sec
6/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/小媳婦回娘家.wav, duration=61.6901 sec, time=11.4443 sec
7/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/帝女花.wav, duration=76.8601 sec, time=5.55387 sec
8/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/梅花.wav, duration=242.012 sec, time=27.0435 sec
9/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/獨上西樓.wav, duration=52.9102 sec, time=7.73513 sec
10/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/相似淚.wav, duration=91.3201 sec, time=18.2425 sec
11/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/郊道.wav, duration=122.16 sec, time=25.4719 sec
12/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/高山青.wav, duration=60.2945 sec, time=4.84922 sec
Deleting 0 wave file (due to no tOutput)...
Total time=143.614 sec
Saving vdAuSet.mat...
time=143.666 sec

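We can also pack the features and labels into a dataset structure DS, with the instances flagged as invalid removed, for use with the data visualization tools. DS2 is an input-normalized copy of DS: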

feature=[auSet.feature];
output=[auSet.tOutput];
temp=[auSet.other]; invalidIndex=[temp.invalidIndex];
feature(:, invalidIndex)=[];
output(:, invalidIndex)=[];
DS.input=feature;
DS.output=output;
DS.inputName=vdOpt.featureName;
DS.outputName=vdOpt.outputName;
DS2=DS; DS2.input=inputNormalize(DS2.input);	% input normalization
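
The internals of inputNormalize are not shown in this tutorial. Its later use with mu and sigma arguments suggests per-feature z-score normalization, and the sketch below illustrates that idea under this assumption, using only built-in functions and toy data. It is not the toolbox implementation.

```matlab
% Toy data: 2 features (rows) x 5 instances (columns), the same layout as DS.input
X = [1 2 3 4 5; 10 30 20 50 40];
mu = mean(X, 2);          % per-feature mean
sigma = std(X, 0, 2);     % per-feature standard deviation
% Subtract the mean and divide by the standard deviation, feature by feature
Xn = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
% Reuse the same statistics on a new instance instead of recomputing them
xNew = [6; 60];
xNewN = (xNew-mu)./sigma;
disp(Xn);
disp(xNewN);
```

Reusing mu and sigma from the reference data keeps new instances on the same scale as the data the model was built on.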

Data analysis and visualization

Display the data amount:

[classSize, classLabel]=dsClassSize(DS, 1);
6 features
1328 instances
2 classes
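
The per-class counts matter because the share of the largest class is the accuracy of a trivial classifier that always predicts that class. The following sketch counts class sizes on toy labels with built-in functions only. It assumes that labels are integers starting at 1, which is an assumption about DS.output rather than something stated in this tutorial.

```matlab
% Toy labels (ASSUMPTION: 1 = nonvibrato, 2 = vibrato)
output = [1 1 2 1 2 2 2 1 1 1];
outputName = {'nonvibrato', 'vibrato'};
% Count how many instances fall into each class
classSize = accumarray(output(:), 1, [numel(outputName) 1])';
for i=1:numel(outputName)
	fprintf('%s: %d instances\n', outputName{i}, classSize(i));
end
% Accuracy of always predicting the largest class
baseline = max(classSize)/sum(classSize);
fprintf('Majority-class baseline = %g%%\n', 100*baseline);
```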

Display feature distribution among different classes:

figure; dsBoxPlot(DS);

Scatter plot of the original DS projected to 2D:

figure; dsProjPlot2(DS); figEnlarge;

Scatter plot of the input-normalized DS projected to 2D:

figure; dsProjPlot2(DS2); figEnlarge;

Scatter plot of the original DS projected to 3D:

figure; dsProjPlot3(DS); figEnlarge;

Scatter plot of the input-normalized DS projected to 3D:

figure; dsProjPlot3(DS2); figEnlarge;

Input selection based on KNNC (k-nearest-neighbor classifier), using the original dataset:

myTic=tic;
inputSelectExhaustive(DS); figEnlarge;
fprintf('time=%g sec\n', toc(myTic));
Construct 63 knnc models, each with up to 6 inputs selected from 6 candidates...
modelIndex 1/63: selected={aPitch} => Recog. rate = 60.316265%
modelIndex 2/63: selected={bPitch} => Recog. rate = 73.493976%
modelIndex 3/63: selected={distPitch} => Recog. rate = 61.219880%
modelIndex 4/63: selected={aVol} => Recog. rate = 57.003012%
modelIndex 5/63: selected={bVol} => Recog. rate = 57.228916%
modelIndex 6/63: selected={distVol} => Recog. rate = 54.216867%
modelIndex 7/63: selected={aPitch, bPitch} => Recog. rate = 76.280120%
modelIndex 8/63: selected={aPitch, distPitch} => Recog. rate = 67.846386%
modelIndex 9/63: selected={aPitch, aVol} => Recog. rate = 62.650602%
modelIndex 10/63: selected={aPitch, bVol} => Recog. rate = 61.596386%
modelIndex 11/63: selected={aPitch, distVol} => Recog. rate = 60.617470%
modelIndex 12/63: selected={bPitch, distPitch} => Recog. rate = 72.364458%
modelIndex 13/63: selected={bPitch, aVol} => Recog. rate = 69.578313%
modelIndex 14/63: selected={bPitch, bVol} => Recog. rate = 72.063253%
modelIndex 15/63: selected={bPitch, distVol} => Recog. rate = 71.762048%
modelIndex 16/63: selected={distPitch, aVol} => Recog. rate = 58.810241%
modelIndex 17/63: selected={distPitch, bVol} => Recog. rate = 58.283133%
modelIndex 18/63: selected={distPitch, distVol} => Recog. rate = 57.003012%
modelIndex 19/63: selected={aVol, bVol} => Recog. rate = 54.367470%
modelIndex 20/63: selected={aVol, distVol} => Recog. rate = 54.292169%
modelIndex 21/63: selected={bVol, distVol} => Recog. rate = 54.518072%
modelIndex 22/63: selected={aPitch, bPitch, distPitch} => Recog. rate = 78.990964%
modelIndex 23/63: selected={aPitch, bPitch, aVol} => Recog. rate = 74.021084%
modelIndex 24/63: selected={aPitch, bPitch, bVol} => Recog. rate = 76.129518%
modelIndex 25/63: selected={aPitch, bPitch, distVol} => Recog. rate = 73.192771%
modelIndex 26/63: selected={aPitch, distPitch, aVol} => Recog. rate = 67.093373%
modelIndex 27/63: selected={aPitch, distPitch, bVol} => Recog. rate = 70.331325%
modelIndex 28/63: selected={aPitch, distPitch, distVol} => Recog. rate = 66.340361%
modelIndex 29/63: selected={aPitch, aVol, bVol} => Recog. rate = 62.123494%
modelIndex 30/63: selected={aPitch, aVol, distVol} => Recog. rate = 57.981928%
modelIndex 31/63: selected={aPitch, bVol, distVol} => Recog. rate = 58.659639%
modelIndex 32/63: selected={bPitch, distPitch, aVol} => Recog. rate = 70.406627%
modelIndex 33/63: selected={bPitch, distPitch, bVol} => Recog. rate = 73.945783%
modelIndex 34/63: selected={bPitch, distPitch, distVol} => Recog. rate = 71.159639%
modelIndex 35/63: selected={bPitch, aVol, bVol} => Recog. rate = 68.524096%
modelIndex 36/63: selected={bPitch, aVol, distVol} => Recog. rate = 63.554217%
modelIndex 37/63: selected={bPitch, bVol, distVol} => Recog. rate = 69.578313%
modelIndex 38/63: selected={distPitch, aVol, bVol} => Recog. rate = 60.918675%
modelIndex 39/63: selected={distPitch, aVol, distVol} => Recog. rate = 55.346386%
modelIndex 40/63: selected={distPitch, bVol, distVol} => Recog. rate = 57.906627%
modelIndex 41/63: selected={aVol, bVol, distVol} => Recog. rate = 55.647590%
modelIndex 42/63: selected={aPitch, bPitch, distPitch, aVol} => Recog. rate = 75.903614%
modelIndex 43/63: selected={aPitch, bPitch, distPitch, bVol} => Recog. rate = 77.861446%
modelIndex 44/63: selected={aPitch, bPitch, distPitch, distVol} => Recog. rate = 76.581325%
modelIndex 45/63: selected={aPitch, bPitch, aVol, bVol} => Recog. rate = 71.987952%
modelIndex 46/63: selected={aPitch, bPitch, aVol, distVol} => Recog. rate = 65.210843%
modelIndex 47/63: selected={aPitch, bPitch, bVol, distVol} => Recog. rate = 70.105422%
modelIndex 48/63: selected={aPitch, distPitch, aVol, bVol} => Recog. rate = 66.114458%
modelIndex 49/63: selected={aPitch, distPitch, aVol, distVol} => Recog. rate = 59.789157%
modelIndex 50/63: selected={aPitch, distPitch, bVol, distVol} => Recog. rate = 64.382530%
modelIndex 51/63: selected={aPitch, aVol, bVol, distVol} => Recog. rate = 59.789157%
modelIndex 52/63: selected={bPitch, distPitch, aVol, bVol} => Recog. rate = 69.954819%
modelIndex 53/63: selected={bPitch, distPitch, aVol, distVol} => Recog. rate = 63.478916%
modelIndex 54/63: selected={bPitch, distPitch, bVol, distVol} => Recog. rate = 69.126506%
modelIndex 55/63: selected={bPitch, aVol, bVol, distVol} => Recog. rate = 61.370482%
modelIndex 56/63: selected={distPitch, aVol, bVol, distVol} => Recog. rate = 56.099398%
modelIndex 57/63: selected={aPitch, bPitch, distPitch, aVol, bVol} => Recog. rate = 75.075301%
modelIndex 58/63: selected={aPitch, bPitch, distPitch, aVol, distVol} => Recog. rate = 65.436747%
modelIndex 59/63: selected={aPitch, bPitch, distPitch, bVol, distVol} => Recog. rate = 74.021084%
modelIndex 60/63: selected={aPitch, bPitch, aVol, bVol, distVol} => Recog. rate = 63.629518%
modelIndex 61/63: selected={aPitch, distPitch, aVol, bVol, distVol} => Recog. rate = 59.939759%
modelIndex 62/63: selected={bPitch, distPitch, aVol, bVol, distVol} => Recog. rate = 61.144578%
modelIndex 63/63: selected={aPitch, bPitch, distPitch, aVol, bVol, distVol} => Recog. rate = 64.081325%

Overall max recognition rate = 79.0%.
Selected 3 inputs (out of 6): aPitch, bPitch, distPitch
time=3.91904 sec
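
The count of 63 models in the log above is simply the number of non-empty subsets of 6 features, 2^6-1. The sketch below enumerates them with the built-in nchoosek, which yields the same ordering as the log (by subset size, then lexicographically). It only reproduces the enumeration; how inputSelectExhaustive scores each subset is not shown here.

```matlab
featureName = {'aPitch', 'bPitch', 'distPitch', 'aVol', 'bVol', 'distVol'};
n = numel(featureName);
subsets = {};
for k=1:n
	comb = nchoosek(1:n, k);	% one row per subset of size k
	for r=1:size(comb, 1)
		subsets{end+1} = comb(r, :);	%#ok<SAGROW>
	end
end
fprintf('Number of candidate models = %d\n', numel(subsets));
% Show a few entries to compare against the modelIndex lines in the log
for index=[1 7 22 63]
	fprintf('modelIndex %d: {%s}\n', index, strjoin(featureName(subsets{index}), ', '));
end
```

Because the number of subsets doubles with every added feature, exhaustive search is practical for 6 candidates but quickly becomes expensive for larger feature sets.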

Input selection based on KNNC, using the input-normalized dataset:

clf;
myTic=tic;
inputSelectExhaustive(DS2); figEnlarge;
fprintf('time=%g sec\n', toc(myTic));
Construct 63 knnc models, each with up to 6 inputs selected from 6 candidates...
modelIndex 1/63: selected={aPitch} => Recog. rate = 60.316265%
modelIndex 2/63: selected={bPitch} => Recog. rate = 73.493976%
modelIndex 3/63: selected={distPitch} => Recog. rate = 61.219880%
modelIndex 4/63: selected={aVol} => Recog. rate = 57.003012%
modelIndex 5/63: selected={bVol} => Recog. rate = 57.228916%
modelIndex 6/63: selected={distVol} => Recog. rate = 54.216867%
modelIndex 7/63: selected={aPitch, bPitch} => Recog. rate = 75.376506%
modelIndex 8/63: selected={aPitch, distPitch} => Recog. rate = 69.201807%
modelIndex 9/63: selected={aPitch, aVol} => Recog. rate = 61.144578%
modelIndex 10/63: selected={aPitch, bVol} => Recog. rate = 62.650602%
modelIndex 11/63: selected={aPitch, distVol} => Recog. rate = 60.391566%
modelIndex 12/63: selected={bPitch, distPitch} => Recog. rate = 74.021084%
modelIndex 13/63: selected={bPitch, aVol} => Recog. rate = 72.590361%
modelIndex 14/63: selected={bPitch, bVol} => Recog. rate = 72.213855%
modelIndex 15/63: selected={bPitch, distVol} => Recog. rate = 73.268072%
modelIndex 16/63: selected={distPitch, aVol} => Recog. rate = 59.939759%
modelIndex 17/63: selected={distPitch, bVol} => Recog. rate = 60.993976%
modelIndex 18/63: selected={distPitch, distVol} => Recog. rate = 59.111446%
modelIndex 19/63: selected={aVol, bVol} => Recog. rate = 57.304217%
modelIndex 20/63: selected={aVol, distVol} => Recog. rate = 56.024096%
modelIndex 21/63: selected={bVol, distVol} => Recog. rate = 54.292169%
modelIndex 22/63: selected={aPitch, bPitch, distPitch} => Recog. rate = 78.313253%
modelIndex 23/63: selected={aPitch, bPitch, aVol} => Recog. rate = 74.472892%
modelIndex 24/63: selected={aPitch, bPitch, bVol} => Recog. rate = 73.569277%
modelIndex 25/63: selected={aPitch, bPitch, distVol} => Recog. rate = 74.924699%
modelIndex 26/63: selected={aPitch, distPitch, aVol} => Recog. rate = 68.072289%
modelIndex 27/63: selected={aPitch, distPitch, bVol} => Recog. rate = 67.469880%
modelIndex 28/63: selected={aPitch, distPitch, distVol} => Recog. rate = 67.168675%
modelIndex 29/63: selected={aPitch, aVol, bVol} => Recog. rate = 60.692771%
modelIndex 30/63: selected={aPitch, aVol, distVol} => Recog. rate = 58.283133%
modelIndex 31/63: selected={aPitch, bVol, distVol} => Recog. rate = 58.734940%
modelIndex 32/63: selected={bPitch, distPitch, aVol} => Recog. rate = 72.138554%
modelIndex 33/63: selected={bPitch, distPitch, bVol} => Recog. rate = 75.451807%
modelIndex 34/63: selected={bPitch, distPitch, distVol} => Recog. rate = 72.966867%
modelIndex 35/63: selected={bPitch, aVol, bVol} => Recog. rate = 70.933735%
modelIndex 36/63: selected={bPitch, aVol, distVol} => Recog. rate = 71.686747%
modelIndex 37/63: selected={bPitch, bVol, distVol} => Recog. rate = 72.966867%
modelIndex 38/63: selected={distPitch, aVol, bVol} => Recog. rate = 59.939759%
modelIndex 39/63: selected={distPitch, aVol, distVol} => Recog. rate = 59.036145%
modelIndex 40/63: selected={distPitch, bVol, distVol} => Recog. rate = 56.701807%
modelIndex 41/63: selected={aVol, bVol, distVol} => Recog. rate = 57.304217%
modelIndex 42/63: selected={aPitch, bPitch, distPitch, aVol} => Recog. rate = 77.786145%
modelIndex 43/63: selected={aPitch, bPitch, distPitch, bVol} => Recog. rate = 76.731928%
modelIndex 44/63: selected={aPitch, bPitch, distPitch, distVol} => Recog. rate = 75.753012%
modelIndex 45/63: selected={aPitch, bPitch, aVol, bVol} => Recog. rate = 74.548193%
modelIndex 46/63: selected={aPitch, bPitch, aVol, distVol} => Recog. rate = 74.924699%
modelIndex 47/63: selected={aPitch, bPitch, bVol, distVol} => Recog. rate = 74.096386%
modelIndex 48/63: selected={aPitch, distPitch, aVol, bVol} => Recog. rate = 66.189759%
modelIndex 49/63: selected={aPitch, distPitch, aVol, distVol} => Recog. rate = 67.996988%
modelIndex 50/63: selected={aPitch, distPitch, bVol, distVol} => Recog. rate = 60.843373%
modelIndex 51/63: selected={aPitch, aVol, bVol, distVol} => Recog. rate = 60.542169%
modelIndex 52/63: selected={bPitch, distPitch, aVol, bVol} => Recog. rate = 73.719880%
modelIndex 53/63: selected={bPitch, distPitch, aVol, distVol} => Recog. rate = 72.138554%
modelIndex 54/63: selected={bPitch, distPitch, bVol, distVol} => Recog. rate = 71.762048%
modelIndex 55/63: selected={bPitch, aVol, bVol, distVol} => Recog. rate = 73.192771%
modelIndex 56/63: selected={distPitch, aVol, bVol, distVol} => Recog. rate = 57.153614%
modelIndex 57/63: selected={aPitch, bPitch, distPitch, aVol, bVol} => Recog. rate = 75.301205%
modelIndex 58/63: selected={aPitch, bPitch, distPitch, aVol, distVol} => Recog. rate = 75.677711%
modelIndex 59/63: selected={aPitch, bPitch, distPitch, bVol, distVol} => Recog. rate = 73.268072%
modelIndex 60/63: selected={aPitch, bPitch, aVol, bVol, distVol} => Recog. rate = 74.472892%
modelIndex 61/63: selected={aPitch, distPitch, aVol, bVol, distVol} => Recog. rate = 60.993976%
modelIndex 62/63: selected={bPitch, distPitch, aVol, bVol, distVol} => Recog. rate = 71.762048%
modelIndex 63/63: selected={aPitch, bPitch, distPitch, aVol, bVol, distVol} => Recog. rate = 73.493976%

Overall max recognition rate = 78.3%.
Selected 3 inputs (out of 6): aPitch, bPitch, distPitch
time=3.64535 sec

LDA evaluation of approximate LOO

figure;
myTic=tic;
opt=ldaPerfViaKnncLoo('defaultOpt');
opt.mode='approximate';
recogRate1=ldaPerfViaKnncLoo(DS, opt);
recogRate2=ldaPerfViaKnncLoo(DS2, opt);
featureNum=size(DS.input, 1);
plot(1:featureNum, 100*recogRate1, 'o-', 1:featureNum, 100*recogRate2, '^-'); grid on
legend('Raw data', 'Normalized data', 'location', 'northOutside', 'orientation', 'horizontal');
xlabel('No. of projected features based on LDA');
ylabel('LOO recognition rates using KNNC (%)');
fprintf('time=%g sec\n', toc(myTic));
time=0.811839 sec

LDA evaluation of exact LOO

figure
myTic=tic;
opt=ldaPerfViaKnncLoo('defaultOpt');
opt.mode='exact';
recogRate1=ldaPerfViaKnncLoo(DS, opt);
recogRate2=ldaPerfViaKnncLoo(DS2, opt);
[featureNum, dataNum] = size(DS.input);
plot(1:featureNum, 100*recogRate1, 'o-', 1:featureNum, 100*recogRate2, '^-'); grid on
legend('Raw data', 'Normalized data', 'location', 'northOutside', 'orientation', 'horizontal');
xlabel('No. of projected features based on LDA');
ylabel('LOO recognition rates using KNNC (%)');
fprintf('time=%g sec\n', toc(myTic));
time=7.9215 sec
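
Both evaluations above score the LDA projections with KNNC under leave-one-out (LOO). How ldaPerfViaKnncLoo implements its 'approximate' and 'exact' modes is not shown in this tutorial, so the sketch below only illustrates the generic LOO idea with a 1-nearest-neighbor classifier on toy data, using built-in functions. The data and labels are made up for illustration.

```matlab
% Toy data: 2 features (rows) x 7 instances (columns)
X = [0 1 0 5 5.2 5   6;
     0 0 1 5 5   5.2 6];
y = [1 1 1 2 2 2 1];	% the last instance is deliberately labeled against its neighbors
n = size(X, 2);
correct = 0;
for i=1:n
	% Squared Euclidean distance from instance i to every instance
	d = sum(bsxfun(@minus, X, X(:, i)).^2, 1);
	d(i) = inf;		% leave instance i out of its own neighbor search
	[~, nearest] = min(d);
	correct = correct+(y(nearest)==y(i));
end
rr = correct/n;
fprintf('LOO recognition rate of 1-NN = %g%%\n', 100*rr);
```

Each instance is classified by a model that never saw it, which is why LOO rates are usually lower than rates measured on the training data itself.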

HMM training

Using the collected auSet, we can start HMM training for vibrato detection:

figure;
myTic=tic;
vdHmmModel=hmmTrain4audio(auSet, vdOpt, 1);
fprintf('time=%g sec\n', toc(myTic));
time=0.604109 sec
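
The option vdOpt.gaussianNum=3 sets the number of Gaussians for each class. How hmmTrain4audio fits and uses these mixtures is not shown in this tutorial. As a rough illustration of the underlying idea, the sketch below evaluates two hypothetical one-dimensional, three-component Gaussian mixtures at a single point and picks the class with the higher log-likelihood. All parameter values are made up.

```matlab
% Hypothetical 1-D mixtures, 3 components per class (weights sum to 1)
gmm(1).w = [0.5 0.3 0.2]; gmm(1).mu = [-1 0 1]; gmm(1).sigma = [0.5 0.5 0.5];
gmm(2).w = [0.2 0.3 0.5]; gmm(2).mu = [2 3 4];  gmm(2).sigma = [0.5 0.5 0.5];
% Density of a 1-D Gaussian, evaluated for each component at once
gaussPdf = @(x, mu, sigma) exp(-0.5*((x-mu)./sigma).^2)./(sqrt(2*pi)*sigma);
x = 2.4;	% a single observation
logLike = zeros(1, 2);
for c=1:2
	% Mixture density = weighted sum of the component densities
	logLike(c) = log(sum(gmm(c).w.*gaussPdf(x, gmm(c).mu, gmm(c).sigma)));
end
[~, bestClass] = max(logLike);
fprintf('Log-likelihoods = [%g %g], best class = %d\n', logLike(1), logLike(2), bestClass);
```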

HMM test

After training, we can test the HMM on a wave file:

figure;
myTic=tic;
auFile='D:\dataset\vibrato\female\combined-female.wav';
wObj=hmmEval4audio(auFile, vdOpt, vdHmmModel, 1);
fprintf('time=%g sec\n', toc(myTic));
Accuracy=89.0805%
time=12.096 sec
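
An HMM labels a sequence by finding the most likely state path rather than classifying each frame in isolation. The sketch below is a generic Viterbi decoder for two states on made-up per-frame likelihoods. The transition and prior values are hypothetical, and this is not the implementation inside hmmEval4audio, which is not shown in this tutorial.

```matlab
% Toy per-frame likelihoods: row 1 = nonvibrato, row 2 = vibrato (made-up numbers)
logB = log([0.9 0.8 0.4 0.2 0.1 0.6 0.2;
            0.1 0.2 0.6 0.8 0.9 0.4 0.8]);
logA = log([0.8 0.2; 0.2 0.8]);	% hypothetical transitions that favor staying in a state
logPi = log([0.5; 0.5]);		% hypothetical uniform prior
[stateNum, frameNum] = size(logB);
delta = zeros(stateNum, frameNum);	% best log-probability of a path ending in each state
psi = zeros(stateNum, frameNum);	% best previous state, kept for backtracking
delta(:, 1) = logPi+logB(:, 1);
for t=2:frameNum
	for j=1:stateNum
		[delta(j, t), psi(j, t)] = max(delta(:, t-1)+logA(:, j));
		delta(j, t) = delta(j, t)+logB(j, t);
	end
end
% Backtrack from the best final state
statePath = zeros(1, frameNum);
[~, statePath(frameNum)] = max(delta(:, frameNum));
for t=frameNum-1:-1:1
	statePath(t) = psi(statePath(t+1), t+1);
end
[~, frameWise] = max(logB, [], 1);	% frame-by-frame decision, for comparison
disp(frameWise);
disp(statePath);
```

In this toy example the frame-by-frame decision flips back to nonvibrato for a single frame, while the decoded path stays in the vibrato state, which shows how the transition probabilities smooth out isolated decisions.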

Performance evaluation of HMM via LOO

To evaluate the performance objectively, we can test the LOO accuracy by using "leave-one-file-out":

myTic=tic;
showPlot=1;
[outsideRr, cvData]=hmmPerfLoo4audio(auSet, vdOpt, showPlot);
fprintf('time=%g sec\n', toc(myTic));
1/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/一水隔天涯.wav
	outsideRr=88.0734%, time=0.472091 sec
2/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/何日君再來.wav
	outsideRr=83.1933%, time=0.415094 sec
3/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/你怎麼說.wav
	outsideRr=85.567%, time=0.437504 sec
4/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/再見我的愛人.wav
	outsideRr=82.7434%, time=0.396338 sec
5/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/夜來香.wav
	outsideRr=99.0521%, time=0.460564 sec
6/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/小媳婦回娘家.wav
	outsideRr=81.7757%, time=0.362063 sec
7/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/帝女花.wav
	outsideRr=90.6015%, time=0.459397 sec
8/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/梅花.wav
	outsideRr=88.5714%, time=0.343212 sec
9/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/獨上西樓.wav
	outsideRr=82.5137%, time=0.396055 sec
10/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/相似淚.wav
	outsideRr=88.0126%, time=0.264777 sec
11/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/郊道.wav
	outsideRr=83.9623%, time=0.332093 sec
12/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/高山青.wav
	outsideRr=96.6507%, time=0.441817 sec
Overall LOO accuracy=87.5546%
time=4.88907 sec
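
In leave-one-file-out, all instances of one file are held out together, so the model is never tested on a recording it was trained on. The tutorial does not state how the overall accuracy is aggregated across files. The sketch below, on made-up numbers, shows the split and contrasts two common choices: the mean of the per-file rates and the pooled rate over all instances.

```matlab
% Which file each instance came from (toy example with 3 files)
fileId = [1 1 1 2 2 3 3 3 3];
for f=unique(fileId)
	testIndex = find(fileId==f);	% all instances of the held-out file
	trainIndex = find(fileId~=f);	% instances of every other file
	fprintf('Fold %d: train on %d instances, test on %d\n', f, numel(trainIndex), numel(testIndex));
end
% Hypothetical per-file results: instance counts and correct counts
instanceNum = [10 50 20];
correctNum = [9 40 18];
perFileRr = correctNum./instanceNum;
meanOfRates = mean(perFileRr);				% every file weighted equally
pooledRr = sum(correctNum)/sum(instanceNum);	% every instance weighted equally
fprintf('Mean of per-file rates = %g%%\n', 100*meanOfRates);
fprintf('Pooled rate = %g%%\n', 100*pooledRr);
```

The two figures differ whenever files have different lengths, so it is worth checking which one a reported overall accuracy refers to.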

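Our earlier KNNC analysis suggests that input normalization can help in some settings. For example, with all six features the recognition rate rises from 64.1% to 73.5%, although the best three-feature subset scores slightly lower after normalization (78.3% vs. 79.0%). So here we try the normalized input for HMM training and test: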

myTic=tic;
[~, mu, sigma]=inputNormalize(DS.input);
for i=1:length(auSet)
	auSet(i).feature=inputNormalize(auSet(i).feature, mu, sigma);
end
[outsideRr, cvData]=hmmPerfLoo4audio(auSet, vdOpt, 1);
fprintf('time=%g sec\n', toc(myTic));
1/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/一水隔天涯.wav
	outsideRr=87.156%, time=0.525612 sec
2/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/何日君再來.wav
	outsideRr=82.3529%, time=0.501536 sec
3/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/你怎麼說.wav
	outsideRr=89.6907%, time=0.371044 sec
4/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/再見我的愛人.wav
	outsideRr=88.0531%, time=0.409394 sec
5/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/夜來香.wav
	outsideRr=99.0521%, time=0.492033 sec
6/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/小媳婦回娘家.wav
	outsideRr=79.9065%, time=0.428882 sec
7/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/帝女花.wav
	outsideRr=90.6015%, time=0.494059 sec
8/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/梅花.wav
	outsideRr=89.881%, time=0.345296 sec
9/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/獨上西樓.wav
	outsideRr=82.5137%, time=0.435193 sec
10/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/相似淚.wav
	outsideRr=83.2808%, time=0.423645 sec
11/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/郊道.wav
	outsideRr=84.9057%, time=0.443758 sec
12/12: file=D:\dataset\vibrato\TeresaTeng\waveAndPitch/高山青.wav
	outsideRr=96.6507%, time=0.454665 sec
Overall LOO accuracy=87.9335%
time=5.43335 sec

Summary

This is a brief tutorial on using HMM for vibrato detection. There are several directions for further improvement:

Appendix

List of functions, scripts, and datasets used in this script:

Date and time when finishing this script:

fprintf('Date & time: %s\n', char(datetime));
Date & time: 18-Jun-2017 21:41:41

Overall elapsed time:

toc(scriptStartTime)
Elapsed time is 194.615769 seconds.

Jyh-Shing Roger Jang, created on

datetime
ans = 

   18-Jun-2017 21:41:41

If you are interested in the original MATLAB code for this page, you can type "grabcode(URL)" under MATLAB, where URL is the web address of this page.