Tutorial on Vibrato Detection for Audio Music
In this tutorial, we shall explain the basics of vibrato detection (VD) for audio music.
Contents
Preprocessing
Most of the modifiable options are set in vdOptSet.m:
type vdOptSet
Most of the time, you can change options in this file to try other settings for VD. In particular, the following changes are mandatory if you want to run the examples in this tutorial:
- Change the "homeDir" statement to point to the right parent folder where the Utility, SAP, and Machine Learning toolboxes reside. These toolboxes are available at <http://mirlab.org/jang/matlab/toolbox>. (Make sure you have downloaded the latest versions of the toolboxes.)
- Change the "vdOpt.waveDir" statement to point to the right location of the vibrato corpus. The corpus is available under "\\140.114.88.246\g$\dataSet\vibrato".
For compatibility, I'll also list my OS and MATLAB version:
fprintf('My platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);
Dataset collection
We can now read all the wave files (which have been labeled with vibrato segments), perform feature extraction, and store the result in a big structure variable waveData:
myTic=tic;
vdOpt=vdOptSet;
if ~exist('vdWaveData.mat', 'file')
	waveData=waveDataFeaCollect(vdOpt);
	fprintf('Saving vdWaveData.mat...\n');
	save vdWaveData waveData
else
	fprintf('Loading vdWaveData.mat...\n');
	load vdWaveData.mat
end
fprintf('time=%g sec\n', toc(myTic));
We can also pack the features into a dataset variable DS for use with various data visualization tools:
feature=[waveData.feature];
output=[waveData.tOutput];
temp=[waveData.other]; invalidIndex=[temp.invalidIndex];
feature(:, invalidIndex)=[];
output(:, invalidIndex)=[];
DS.input=feature;
DS.output=output;
DS.inputName=vdOpt.featureName;
DS.outputName=vdOpt.outputName;
DS2=DS; DS2.input=inputNormalize(DS2.input); % input normalization
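Since the toolbox source is not shown here, the sketch below (in Python, for illustration only) shows what inputNormalize is assumed to do: shift each feature dimension to zero mean and scale it to unit variance (z-score normalization), following the tutorial's layout of rows = features, columns = samples.

```python
# Minimal sketch of z-score input normalization, assuming inputNormalize
# standardizes each feature row to zero mean and unit variance.
# Layout follows the tutorial: rows = features, columns = samples.

def input_normalize(rows):
    """Normalize each feature row to zero mean and unit standard deviation."""
    normalized = []
    for row in rows:
        n = len(row)
        mu = sum(row) / n
        var = sum((x - mu) ** 2 for x in row) / n
        sigma = var ** 0.5 or 1.0   # guard against a zero-variance feature
        normalized.append([(x - mu) / sigma for x in row])
    return normalized

features = [[1.0, 2.0, 3.0],
            [10.0, 20.0, 30.0]]
z = input_normalize(features)   # each row now has mean 0 and variance 1
```

Normalizing per feature keeps any single large-scale feature from dominating the distance computations used by KNNC later on.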
Data analysis and visualization
Display data amount:
[classSize, classLabel]=dsClassSize(DS, 1);
Display data distribution among classes:
feature1=feature(:, output==1);
feature2=feature(:, output==2);
subplot(121); boxplot(feature1', DS.inputName, 'plotstyle', 'compact'); title('Features for class 1');
subplot(122); boxplot(feature2', DS.inputName, 'plotstyle', 'compact'); title('Features for class 2');
Scatter plot of the original DS to 2D:
figure; dsProjPlot2(DS);
Scatter plot of the input-normalized DS to 2D:
dsProjPlot2(DS2);
Scatter plot of the original DS to 3D:
dsProjPlot3(DS);
Scatter plot of the input-normalized DS to 3D:
dsProjPlot3(DS2);
Input selection based on KNNC, using the original dataset:
myTic=tic;
inputSelectExhaustive(DS);
fprintf('time=%g sec\n', toc(myTic));
Input selection based on KNNC, using the input-normalized dataset:
clf;
myTic=tic;
inputSelectExhaustive(DS2);
fprintf('time=%g sec\n', toc(myTic));
LDA evaluation via approximate LOO
myTic=tic;
clf;
recogRate1=ldaPerfViaKnncLoo(DS);
recogRate2=ldaPerfViaKnncLoo(DS2);
[featureNum, dataNum]=size(DS.input);
plot(1:featureNum, 100*recogRate1, 'o-', 1:featureNum, 100*recogRate2, '^-'); grid on
legend('Raw data', 'Normalized data', 'location', 'northOutside', 'orientation', 'horizontal');
xlabel('No. of projected features based on LDA');
ylabel('LOO recognition rates using KNNC (%)');
fprintf('time=%g sec\n', toc(myTic));
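The core quantity plotted above is the LOO recognition rate of a nearest-neighbor classifier. As a language-agnostic illustration (the actual toolbox code is not reproduced here), the following Python sketch shows the assumed 1-NN leave-one-out loop: each sample is classified by its nearest neighbor among all the other samples, and the rate is the fraction classified correctly.

```python
# Hypothetical sketch of a 1-nearest-neighbor leave-one-out (LOO)
# recognition rate, the quantity ldaPerfViaKnncLoo is assumed to report.
# Samples are rows here for simplicity.

def knn1_loo_rate(samples, labels):
    correct = 0
    for i, x in enumerate(samples):
        # Nearest neighbor among all *other* samples (squared Euclidean distance).
        best_j = min((j for j in range(len(samples)) if j != i),
                     key=lambda j: sum((a - b) ** 2 for a, b in zip(x, samples[j])))
        correct += labels[best_j] == labels[i]
    return correct / len(samples)

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
y = [1, 1, 2, 2]
rate = knn1_loo_rate(X, y)   # → 1.0: every point's nearest neighbor shares its class
```

LOO is attractive for small datasets like this one because it uses all but one sample for training in every fold, at the cost of one classification per sample.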
LDA evaluation via exact LOO
myTic=tic;
opt=ldaPerfViaKnncLoo('defaultOpt');
opt.mode='exact';
recogRate1=ldaPerfViaKnncLoo(DS, opt);
recogRate2=ldaPerfViaKnncLoo(DS2, opt);
[featureNum, dataNum]=size(DS.input);
plot(1:featureNum, 100*recogRate1, 'o-', 1:featureNum, 100*recogRate2, '^-'); grid on
legend('Raw data', 'Normalized data', 'location', 'northOutside', 'orientation', 'horizontal');
xlabel('No. of projected features based on LDA');
ylabel('LOO recognition rates using KNNC (%)');
fprintf('time=%g sec\n', toc(myTic));
HMM training
Using the collected waveData, we can start HMM training for vibrato detection:
myTic=tic;
vdHmmModel=hmmTrain4audio(waveData, vdOpt, 1);
fprintf('time=%g sec\n', toc(myTic));
HMM test
After the training, we can test the HMM using a wave file:
myTic=tic;
waveFile='D:\dataset\vibrato\female\combined-female.wav';
wObj=hmmEval4audio(waveFile, vdOpt, vdHmmModel, 1);
fprintf('time=%g sec\n', toc(myTic));
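Internally, HMM evaluation of this kind typically runs Viterbi decoding over the per-frame features to recover the most likely state sequence (e.g., non-vibrato vs. vibrato per frame). The toolbox internals are not shown here, so the sketch below is a generic Viterbi decoder in log domain with toy parameters, not the actual trained model:

```python
# Generic Viterbi decoding sketch (toy parameters, 2 states:
# 0 = non-vibrato, 1 = vibrato). All probabilities are assumed values
# for illustration; a real system would use the trained HMM's parameters.
import math

def viterbi(log_init, log_trans, frame_loglik):
    n_states = len(log_init)
    delta = [log_init[s] + frame_loglik[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(frame_loglik)):
        ptr, new_delta = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            ptr.append(prev)
            new_delta.append(delta[prev] + log_trans[prev][s] + frame_loglik[t][s])
        back.append(ptr)
        delta = new_delta
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

L = math.log
init = [L(0.6), L(0.4)]                       # slightly favor non-vibrato at start
trans = [[L(0.9), L(0.1)], [L(0.1), L(0.9)]]  # "sticky" states discourage flicker
obs = [[L(0.8), L(0.2)], [L(0.7), L(0.3)],    # frames 1-2 look like non-vibrato
       [L(0.2), L(0.8)], [L(0.1), L(0.9)]]    # frames 3-4 look like vibrato
path = viterbi(init, trans, obs)              # → [0, 0, 1, 1]
```

The sticky transition probabilities are what give an HMM its advantage over frame-by-frame classification: isolated frame-level errors are smoothed out of the decoded segment boundaries.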
Performance evaluation of HMM via LOO
To evaluate the performance objectively, we can compute the LOO accuracy using "leave-one-file-out" cross validation:
myTic=tic;
showPlot=1;
[outsideRr, cvData]=hmmPerfLoo4audio(waveData, vdOpt, showPlot);
fprintf('time=%g sec\n', toc(myTic));
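The "leave-one-file-out" split itself is simple, and the point is worth spelling out: frames from the same recording are correlated, so holding out whole files (rather than individual frames) gives an honest accuracy estimate. A minimal Python sketch of the split logic (the file names are hypothetical):

```python
# "Leave-one-file-out" split sketch: for each wave file, train on all the
# other files and test on the held-out one, so frames from a single
# recording never appear in both the training and test sets.

def leave_one_file_out(file_names):
    """Yield (train_files, test_file) pairs, one per file."""
    for i, held_out in enumerate(file_names):
        train = file_names[:i] + file_names[i + 1:]
        yield train, held_out

files = ['a.wav', 'b.wav', 'c.wav']           # hypothetical file names
splits = list(leave_one_file_out(files))      # 3 train/test splits
```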
Our previous analysis indicates that input normalization can improve the accuracy. So here we shall try the normalized input for HMM training and test:
myTic=tic;
[~, mu, sigma]=inputNormalize(DS.input);
for i=1:length(waveData)
	waveData(i).feature=inputNormalize(waveData(i).feature, mu, sigma);
end
[outsideRr, cvData]=hmmPerfLoo4audio(waveData, vdOpt, 1);
fprintf('time=%g sec\n', toc(myTic));
Summary
This is a brief tutorial on using HMM for vibrato detection. There are several directions for further improvement:
- Investigate new features for VD.
- Change the configuration of the GMM used in HMM.
- Use other classifiers for VD.
Jyh-Shing Roger Jang, 2013/01/08.