This programming exercise provides a baseline for pitch tracking on singing recordings. You are expected to understand the code and fine-tune its parameters to improve pitch tracking accuracy.
Training set:
Recordings from 2016, with ground truth (human-labeled pitch).
Test set:
Recordings from 2017, without ground truth. Use this set to generate PV files, and upload them for performance evaluation.
How to run the example program
Modify the main program "goTest.m" as follows:
Add necessary toolboxes to the search path.
Assign the dataset path to the variable "auDir".
Run "goTest" under MATLAB to show the overall recognition rate as well as the recognition rate for each person.
After you have obtained the baseline recognition rate, you can perform error analysis.
Run "ptPersonRr(auSet)" to list the recognition rate of each person.
Run "ptFileCheck(auSet)" to check poorly performing files.
Run "ptFileCheck(auSet, '09608050PETR')" to check poorly performing files from a specific singer, '09608050PETR'.
How to get better accuracy
How to handle scattered abrupt changes in pitch (similar to salt-and-pepper noise in images)
Use a median filter to smooth the computed pitch.
Use a rule-based method to find such pitch points, and replace each of them with the average of the previous and next few pitch points.
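Both smoothing ideas can be sketched in a few lines. The Python below is a minimal illustration, not part of the provided MATLAB code; it assumes a pitch vector in semitones, and the names median_smooth and fix_spikes, as well as the parameters order, max_jump and k, are hypothetical and would need tuning.

```python
import numpy as np

def median_smooth(pitch, order=5):
    """Median filter over a sliding window of `order` frames (order should be odd).

    Edges are padded by repeating the first/last value, so the ends of the
    pitch curve are not pulled toward zero."""
    half = order // 2
    padded = np.pad(np.asarray(pitch, dtype=float), half, mode="edge")
    return np.array([np.median(padded[i:i + order]) for i in range(len(pitch))])

def fix_spikes(pitch, max_jump=3.0, k=2):
    """Rule-based spike removal (hypothetical rule and thresholds).

    A point that differs from BOTH immediate neighbours by more than `max_jump`
    semitones is replaced by the mean of up to `k` original points on each side."""
    p = np.asarray(pitch, dtype=float)
    out = p.copy()
    for i in range(1, len(p) - 1):
        if abs(p[i] - p[i - 1]) > max_jump and abs(p[i] - p[i + 1]) > max_jump:
            neighbours = np.concatenate([p[max(0, i - k):i], p[i + 1:i + 1 + k]])
            out[i] = neighbours.mean()
    return out
```

For example, both functions turn the isolated spike in [60, 60, 72, 60, 60] into 60, while fix_spikes leaves a genuine note change such as [60, 60, 60, 65, 65, 65] untouched. Edge padding is used on purpose: scipy.signal.medfilt zero-pads its input, which would drag the first and last pitch values toward zero.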
How to deal with pitch doubling/halving errors
Try other PDFs to see if they help. (Optimize the parameters of each PDF if possible.)
Try combining PDFs.
Keep several pitch candidates for each frame, and then use dynamic programming to obtain the smoothest pitch curve with the fewest doubling/halving errors.
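The dynamic programming idea can be sketched as a Viterbi search over per-frame candidates. This is a minimal Python sketch under stated assumptions: candidate pitches are in semitones, each candidate has a score taken from whichever PDF (or PDF combination) you use, and the cost is the negative score plus a penalty proportional to the pitch jump. That cost, and the names dp_pitch_path and penalty, are hypothetical; pitchTrack.m in the SAP toolbox may define its cost differently.

```python
import numpy as np

def dp_pitch_path(cands, scores, penalty=1.0):
    """Pick one pitch candidate per frame by dynamic programming (Viterbi).

    cands[t], scores[t]: 1-D sequences of candidate pitches (semitones, assumed)
    and their PDF-based scores (higher is better) for frame t.
    Minimizes  sum_t -score[t]  +  penalty * sum_t |pitch[t] - pitch[t-1]|.
    """
    cost = -np.asarray(scores[0], dtype=float)  # best total cost ending at each candidate
    back = []  # back[t-1][j]: best predecessor of candidate j at frame t
    for t in range(1, len(cands)):
        prev = np.asarray(cands[t - 1], dtype=float)
        cur = np.asarray(cands[t], dtype=float)
        # trans[i, j]: cost of reaching candidate j at frame t from candidate i at t-1
        trans = cost[:, None] + penalty * np.abs(prev[:, None] - cur[None, :])
        best_prev = np.argmin(trans, axis=0)
        cost = trans[best_prev, np.arange(len(cur))] - np.asarray(scores[t], dtype=float)
        back.append(best_prev)
    # Backtrack from the cheapest final candidate.
    idx = int(np.argmin(cost))
    path = [idx]
    for best_prev in reversed(back):
        idx = int(best_prev[idx])
        path.append(idx)
    path.reverse()
    return np.array([cands[t][i] for t, i in enumerate(path)], dtype=float)
```

For example, with candidates [60, 72] in three frames and scores [1.0, 0.9], [0.8, 1.0], [1.0, 0.5], picking the best candidate per frame gives 60, 72, 60 (a doubling error in the middle), whereas dp_pitch_path with penalty=1.0 returns 60, 60, 60, because the two octave jumps cost 12 each. With penalty=0.0 it reduces to per-frame peak picking.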
Estimate some statistics of the ground-truth pitch and apply them to your search:
Estimate the pitch range of a single file
Estimate the largest deviation between neighboring pitch points
Estimate the distribution (likelihood) of pitch values
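These statistics can be collected from the training set's ground truth in one pass. The Python sketch below assumes pitch vectors in semitones with 0 marking unvoiced frames (the zero convention follows the accuracy definition used in this exercise; the semitone unit is an assumption). The function name gt_stats and the bin_width parameter are hypothetical.

```python
import numpy as np

def gt_stats(gt_list, bin_width=1.0):
    """Summarize ground-truth pitch vectors (assumed semitones, 0 = unvoiced).

    Returns per-file pitch ranges, per-file largest jumps between neighbouring
    voiced frames, and a normalized histogram of all voiced pitch values."""
    ranges, max_jumps, voiced_all = [], [], []
    for gt in gt_list:
        gt = np.asarray(gt, dtype=float)
        voiced = gt[gt > 0]
        if voiced.size == 0:
            continue  # an all-unvoiced file tells us nothing about pitch
        ranges.append(float(voiced.max() - voiced.min()))
        # A jump counts only when both neighbouring frames are voiced.
        both = (gt[1:] > 0) & (gt[:-1] > 0)
        if both.any():
            max_jumps.append(float(np.abs(np.diff(gt))[both].max()))
        voiced_all.append(voiced)
    if not voiced_all:
        raise ValueError("no voiced frames in the ground truth")
    pitches = np.concatenate(voiced_all)
    lo, hi = np.floor(pitches.min()), np.ceil(pitches.max())
    edges = np.arange(lo, hi + bin_width + 1e-9, bin_width)
    # density=True: hist * bin_width sums to 1, i.e. an empirical pitch likelihood
    hist, edges = np.histogram(pitches, bins=edges, density=True)
    return {"ranges": ranges, "max_jumps": max_jumps, "hist": hist, "edges": edges}
```

One possible way to use the results, offered as a suggestion rather than a prescribed method: restrict the candidate search to the observed pitch range, treat the largest observed jump as an upper bound on transitions in the dynamic programming step, and add the log of the histogram value to each candidate's score as a prior.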
Existing pitch tracking functions in the SAP toolbox that you can use
pitchTrack.m: Simple peak picking and dynamic programming in PDF
pitchTrackForcedSmooth.m: Multiple passes of dynamic programming
(There is no guarantee that these functions will generate the best result.)
In order to separate the problems of "voiced segment detection" and "pitch tracking", we adopt raw pitch accuracy as the performance index: only the frames where the ground-truth pitch is non-zero are used to compute the accuracy. Frames with zero ground truth are ignored, so outputting a pitch there costs nothing, while outputting zero on a voiced frame always counts as an error. As a result, your best bet is to generate a pitch value for every frame, whether or not the frame actually has pitch.
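Raw pitch accuracy as described above can be sketched as follows. This Python version is illustrative only: it assumes pitch in semitones and a hit tolerance of 0.5 semitone (50 cents, a common choice in MIREX-style evaluation); the tolerance actually used for grading in this course is not stated here, so treat tol as a hypothetical parameter.

```python
import numpy as np

def raw_pitch_accuracy(est, gt, tol=0.5):
    """Fraction of voiced ground-truth frames (gt > 0) whose estimate lies
    within `tol` semitones of the ground truth. Frames with gt == 0 are ignored.
    tol=0.5 (50 cents) is an assumed tolerance, not the official one."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    voiced = gt > 0
    if not voiced.any():
        return float("nan")  # undefined when the file has no voiced frames
    hits = np.abs(est[voiced] - gt[voiced]) <= tol
    return float(hits.mean())
```

For example, with gt = [0, 60, 60, 62, 0], the estimate [55, 60.3, 0, 50, 70] scores 1/3: the values 55 and 70 on unvoiced frames are ignored, but the 0 on the third frame counts as a miss. The estimate [0, 60.3, 60, 62, 0] scores 1.0.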
What to upload for performance evaluation
In past years, you needed to upload the following files for evaluation:
myPt.m: Your main function for pitch tracking (PT)
myPtOptSet.m: The best parameters for myPt.m.
myMethod.txt: Please describe your method briefly, including the recognition rates and the lessons you learned.
Any other files used by your program.
However, since the number of files for evaluation keeps growing (due to the increasing number of students taking this class), we decided to follow the Kaggle convention. That is, all you need to upload are the PV (pitch vector) files generated by your pitch tracking program. To generate the PV files, simply run goPvGen.m after modifying the following two lines:
auDir='audioWithoutGt'; % Replace this with the recordings of this year, which contains audio files without PV files
outputPvDir='b00902024_pv2'; % All the generated PV files are put here, so you can compress the folder for upload. Please follow the convention of "studentID_pv2".
After you run goPvGen.m, it generates a folder "xxxxxxxxx_pv2" (where the leading characters are your student ID) containing the PV files (with extension "pv2") produced by your pitch tracking program. Please compress the whole directory tree and upload it for evaluation. (Note that manually modifying the PV files is considered cheating in this course. The TAs may ask you to submit the programs that generate the PV files for double-checking.)
Suggestions
If you find errors in the manually labeled data, please post them on FB so that other students are aware of them. (The TAs will do their best to correct the errors.)
If performance tuning is time-consuming, you can use a partial dataset (for example, the poorly performing files) to obtain a rough result quickly.
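Choosing such a partial dataset amounts to ranking files by their accuracy. The Python sketch below assumes you already have a per-file accuracy (for example, raw pitch accuracy) stored in a dictionary; the function name poor_subset and its parameters n and max_acc are hypothetical.

```python
def poor_subset(file_acc, n=10, max_acc=1.0):
    """Return at most n file names whose accuracy is <= max_acc, worst first.

    file_acc: dict mapping file name -> accuracy in [0, 1]."""
    poor = [name for name in file_acc if file_acc[name] <= max_acc]
    return sorted(poor, key=file_acc.get)[:n]
```

For example, poor_subset({"a": 0.9, "b": 0.5, "c": 0.7}, n=2) returns ["b", "c"]. Parameters tuned only on the worst files may not transfer to the rest, so it is probably worth confirming the final settings on the full training set.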
The H1 help of the example code
File convention:
pt*.m: Functions that you are not allowed to modify or upload.
go*.m: Main programs that you can execute directly.