Tutorial on Vibrato Detection for Audio Music

In this tutorial, we shall explain the basics of HMM-based vibrato detection (VD) for audio music.

Preprocessing

Most of the modifiable options are set in vdOptSet.m:

type vdOptSet

Most of the time, you can simply change the options in this file to try other settings for VD. In particular, some of these options (such as paths) must be changed to match your local setup if you want to run the examples in this tutorial.
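Since vdOptSet.m is not reproduced here, the following is a purely hypothetical sketch of what such an option-setting function typically looks like; all field names and values below are illustrative assumptions, not the actual contents of the file:

```matlab
function vdOpt=vdOptSet
% Hypothetical sketch of an option-setting function for VD.
% The actual field names and values in vdOptSet.m may differ.
vdOpt.fs=16000;                              % sampling rate for feature extraction
vdOpt.frameSize=512;                         % frame size in samples
vdOpt.overlap=192;                           % overlap between successive frames
vdOpt.waveDir='D:\dataset\vibrato';          % root directory of labeled wave files
vdOpt.featureName={'pitch', 'pitchDiff'};    % names of the extracted features
vdOpt.outputName={'nonVibrato', 'vibrato'};  % class labels for the two states
```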

For compatibility, I'll also list my OS and MATLAB version:

fprintf('My platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);

Dataset collection

We can now read all the wave files (which have been labeled with vibrato segments), perform feature extraction, and store the result in a big structure variable waveData:

myTic=tic;
vdOpt=vdOptSet;
if ~exist('vdWaveData.mat', 'file')
	waveData=waveDataFeaCollect(vdOpt);
	fprintf('Saving vdWaveData.mat...\n');
	save vdWaveData waveData
else
	fprintf('Loading vdWaveData.mat...\n');
	load vdWaveData.mat
end
fprintf('time=%g sec\n', toc(myTic));

We can also create a dataset structure DS for use with various data visualization and analysis tools:

feature=[waveData.feature];
output=[waveData.tOutput];
temp=[waveData.other]; invalidIndex=[temp.invalidIndex];
feature(:, invalidIndex)=[];
output(:, invalidIndex)=[];
DS.input=feature;
DS.output=output;
DS.inputName=vdOpt.featureName;
DS.outputName=vdOpt.outputName;
DS2=DS; DS2.input=inputNormalize(DS2.input);	% input normalization
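Input normalization here presumably performs a z-score transform on each feature dimension. A minimal sketch of the idea (not the toolbox implementation of inputNormalize) is:

```matlab
% Minimal sketch of z-score input normalization: shift each feature (row)
% to zero mean and scale it to unit variance, so that features with large
% numeric ranges do not dominate distance-based classifiers such as KNNC.
X=DS.input;
mu=mean(X, 2);                        % per-feature mean over all frames
sigma=std(X, 0, 2);                   % per-feature standard deviation
Xn=(X-repmat(mu, 1, size(X, 2)))./repmat(sigma, 1, size(X, 2));
```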

Data analysis and visualization

Display data amount:

[classSize, classLabel]=dsClassSize(DS, 1);

Display data distribution among classes:

feature1=feature(:, output==1);
feature2=feature(:, output==2);
subplot(121);
boxplot(feature1', 'labels', DS.inputName, 'plotstyle', 'compact'); title('Features for class 1');
subplot(122);
boxplot(feature2', 'labels', DS.inputName, 'plotstyle', 'compact'); title('Features for class 2');

Scatter plot of the original DS to 2D:

figure;
dsProjPlot2(DS);

Scatter plot of the input-normalized DS to 2D:

dsProjPlot2(DS2);

Scatter plot of the original DS to 3D:

dsProjPlot3(DS);

Scatter plot of the input-normalized DS to 3D:

dsProjPlot3(DS2);

Input selection based on KNNC, using the original dataset:

myTic=tic;
inputSelectExhaustive(DS);
fprintf('time=%g sec\n', toc(myTic));

Input selection based on KNNC, using the input-normalized dataset:

clf;
myTic=tic;
inputSelectExhaustive(DS2);
fprintf('time=%g sec\n', toc(myTic));
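Conceptually, exhaustive input selection evaluates every nonempty subset of the features with a classifier (here KNNC with leave-one-out) and keeps the best-performing subset. A simplified sketch, assuming a hypothetical helper knncLoo that returns the KNNC leave-one-out recognition rate for a dataset, is:

```matlab
% Simplified sketch of exhaustive input (feature) selection.
% knncLoo is a hypothetical helper returning the KNNC leave-one-out
% recognition rate of the given dataset.
featureNum=size(DS.input, 1);
bestRr=0; bestSubset=[];
for i=1:2^featureNum-1
	subset=find(bitget(i, 1:featureNum));   % indices of the selected features
	DS3=DS; DS3.input=DS.input(subset, :);  % keep only those feature rows
	rr=knncLoo(DS3);
	if rr>bestRr, bestRr=rr; bestSubset=subset; end
end
fprintf('Best feature subset: %s (LOO RR=%.2f%%)\n', mat2str(bestSubset), bestRr*100);
```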

LDA evaluation via approximate LOO

myTic=tic;
clf;
recogRate1=ldaPerfViaKnncLoo(DS);
recogRate2=ldaPerfViaKnncLoo(DS2);
[featureNum, dataNum] = size(DS.input);
plot(1:featureNum, 100*recogRate1, 'o-', 1:featureNum, 100*recogRate2, '^-'); grid on
legend('Raw data', 'Normalized data', 'location', 'northOutside', 'orientation', 'horizontal');
xlabel('No. of projected features based on LDA');
ylabel('LOO recognition rates using KNNC (%)');
fprintf('time=%g sec\n', toc(myTic));
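Roughly speaking, for each number k of leading LDA-projected dimensions, this evaluation projects the data and computes the KNNC leave-one-out recognition rate; in the approximate mode, the LDA projection is computed once on the whole dataset rather than refit inside each LOO fold. A simplified sketch, with ldaProj and knncLoo as hypothetical helpers, is:

```matlab
% Rough sketch of "approximate" LOO evaluation of LDA + KNNC: the LDA
% projection is computed once on all data (hence "approximate"), and the
% KNNC LOO rate is measured on the first k projected features.
% ldaProj and knncLoo are hypothetical helpers.
W=ldaProj(DS);                               % LDA projection matrix from all data
featureNum=size(DS.input, 1);
rr=zeros(1, featureNum);
for k=1:featureNum
	DS3=DS; DS3.input=W(:, 1:k)'*DS.input;   % keep the first k LDA features
	rr(k)=knncLoo(DS3);
end
```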

LDA evaluation via exact LOO

myTic=tic;
opt=ldaPerfViaKnncLoo('defaultOpt');
opt.mode='exact';
recogRate1=ldaPerfViaKnncLoo(DS, opt);
recogRate2=ldaPerfViaKnncLoo(DS2, opt);
[featureNum, dataNum] = size(DS.input);
plot(1:featureNum, 100*recogRate1, 'o-', 1:featureNum, 100*recogRate2, '^-'); grid on
legend('Raw data', 'Normalized data', 'location', 'northOutside', 'orientation', 'horizontal');
xlabel('No. of projected features based on LDA');
ylabel('LOO recognition rates using KNNC (%)');
fprintf('time=%g sec\n', toc(myTic));

HMM training

Using the collected waveData, we can start HMM training for vibrato detection:

myTic=tic;
vdHmmModel=hmmTrain4audio(waveData, vdOpt, 1);
fprintf('time=%g sec\n', toc(myTic));
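Conceptually, the HMM treats each frame as being in one of two states (vibrato or non-vibrato); training estimates the state-wise observation models and the state-transition probabilities from the labeled segments. As a highly simplified illustration (not the actual hmmTrain4audio, which also trains the observation models), the transition matrix can be estimated from a frame-label sequence like this:

```matlab
% Simplified sketch: estimate a 2-state transition matrix from a sequence
% of frame labels (1=non-vibrato, 2=vibrato) by counting label transitions
% and row-normalizing the counts into probabilities.
labels=[1 1 1 2 2 2 2 1 1 2 2];        % example frame-label sequence
A=zeros(2);
for t=1:length(labels)-1
	A(labels(t), labels(t+1))=A(labels(t), labels(t+1))+1;
end
A=A./repmat(sum(A, 2), 1, 2);          % row-normalize to probabilities
disp(A);
```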

HMM test

After the training, we can test the HMM using a wave file:

myTic=tic;
waveFile='D:\dataset\vibrato\female\combined-female.wav';
wObj=hmmEval4audio(waveFile, vdOpt, vdHmmModel, 1);
fprintf('time=%g sec\n', toc(myTic));

Performance evaluation of HMM via LOO

To evaluate the performance objectively, we can test the LOO accuracy by using "leave-one-file-out":

myTic=tic;
showPlot=1;
[outsideRr, cvData]=hmmPerfLoo4audio(waveData, vdOpt, showPlot);
fprintf('time=%g sec\n', toc(myTic));
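"Leave-one-file-out" means that in each fold, one wave file is held out for testing while the HMM is trained on all the others, and the outside recognition rate is averaged over all folds. A simplified sketch of the loop, where hmmEvalRr is a hypothetical helper returning the frame-level recognition rate of a trained model on one file's data, is:

```matlab
% Simplified sketch of leave-one-file-out cross validation.
% hmmEvalRr is a hypothetical helper; hmmTrain4audio is used schematically.
fileNum=length(waveData);
rr=zeros(1, fileNum);
for i=1:fileNum
	trainData=waveData([1:i-1, i+1:fileNum]);   % all files except file i
	model=hmmTrain4audio(trainData, vdOpt, 0);  % train on the remaining files
	rr(i)=hmmEvalRr(waveData(i), vdOpt, model); % test on the held-out file
end
fprintf('Outside (LOO) recognition rate = %.2f%%\n', mean(rr)*100);
```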

Our previous analysis indicates that input normalization can improve the accuracy. So here we shall try the normalized input for HMM training and test:

myTic=tic;
[~, mu, sigma]=inputNormalize(DS.input);
for i=1:length(waveData)
	waveData(i).feature=inputNormalize(waveData(i).feature, mu, sigma);
end
[outsideRr, cvData]=hmmPerfLoo4audio(waveData, vdOpt, 1);
fprintf('time=%g sec\n', toc(myTic));

Summary

This is a brief tutorial on using HMM for vibrato detection. There are several directions for further improvement:

Jyh-Shing Roger Jang, 2013/01/08.