Programming Contest: Query By Singing/Humming

Roger Jang (張智星)

The goal of this exercise is to familiarize students with the steps involved in QBSH (query by singing/humming). In particular, students are expected to fine-tune parameters or even find better methods for QBSH.

What to download
- Utility, SAP, and Machine Learning Toolboxes from Roger's toolbox homepage.
- Baseline program: exampleProgram.rar
- Dataset:
  - Small training set "queryInput": Included in the example program.
  - Small test set "queryInput-noGt": Included in the example program.
  - Training set: 2016 recordings (PV files only)
  - Test set: 2017 recordings (PV files only)
How to run the example program
- Modify two files as follows:
  - Modify "goTest.m" by assigning the dataset path to the variable "auDir".
  - Modify "myQbshOptSet.m" by adding necessary toolboxes to the search path.
- Run "goTest" under MATLAB to show the overall recognition rate as well as the recognition rate for each person. It also generates an output file under the "output" directory that lists the prediction for each file. You need to upload this output file to evaluate performance on the test set.
- After you have obtained the baseline recognition rate, you can perform error analysis. (This is the hardest part of the assignment, which requires time, patience, and insight.)
  - Run "qbshPersonRr(auSet)" to list the recognition rate of each person.
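As a rough illustration of what a per-person breakdown involves, the sketch below groups predictions by singer and computes a recognition rate for each. It is written in Python for illustration only and is not the toolbox's "qbshPersonRr"; the record layout (singer, ground-truth song, predicted song) and the toy data are assumptions.

```python
from collections import defaultdict

def person_recognition_rates(records):
    """Return {singer: recognition rate} from (singer, truth, predicted) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for singer, truth, predicted in records:
        totals[singer] += 1
        if truth == predicted:
            hits[singer] += 1
    return {singer: hits[singer] / totals[singer] for singer in totals}

# Hypothetical toy results, not taken from the contest data.
records = [
    ("singerA", "song01", "song01"),
    ("singerA", "song02", "song05"),
    ("singerB", "song03", "song03"),
    ("singerB", "song04", "song04"),
]

rates = person_recognition_rates(records)
overall = sum(truth == pred for _, truth, pred in records) / len(records)

# List the weakest singers first, since they are the natural starting
# point for error analysis.
for singer, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{singer}: {rate:.0%}")
print(f"overall: {overall:.0%}")
```

Sorting by rate puts the singers whose recordings fail most often at the top, which is usually where inspecting individual files pays off first.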
How to get better accuracy
- Please modify the parameter settings in myQbshOptSet.m. (Be sure to understand the meanings of these parameters before modifying them.)
- Possible ways for improving linear scaling:
  1. Increase the resolution.
  2. Change lowerRatio and upperRatio.
  3. Tune the resolution against computing time. For example, run "goPrmTune" to obtain a curve of recognition rate vs. computing time as the resolution in LS varies.
- Possible ways for improving DTW:
  1. Increase the number of key transpositions.
  2. Change the objective function of DTW to be the normalized distance (by number of mapping points) instead of total distance.
  3. Combine DTW distance and the number of mapping points to obtain a better score for ranking.
  4. Observe the wrong and right DTW mapping paths to get an idea of how to constrain the paths.
- Combine several methods to improve the recognition rate.
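To make the LS and DTW suggestions above more concrete, here is a minimal Python sketch of both ideas. It is not the toolbox implementation. The function names, default parameter values, mean subtraction for key handling, the both-ends-anchored DTW path, and the toy pitch vectors are all assumptions chosen for illustration. Note also that the DTW part normalizes the total distance by the number of mapping points after finding the minimum-total path, which only approximates a true normalized-distance objective.

```python
def resample(seq, new_len):
    """Linearly interpolate seq (at least 2 points) to new_len points."""
    if new_len == 1:
        return [seq[0]]
    step = (len(seq) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = min(int(pos), len(seq) - 2)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[lo + 1] * frac)
    return out

def zero_mean(seq):
    """Subtract the mean so that a constant key shift is ignored."""
    mean = sum(seq) / len(seq)
    return [x - mean for x in seq]

def linear_scaling_distance(query, ref, lower_ratio=0.5, upper_ratio=2.0,
                            resolution=11):
    """Smallest mean absolute difference over `resolution` scaling ratios."""
    best = float("inf")
    for k in range(resolution):
        ratio = lower_ratio + (upper_ratio - lower_ratio) * k / max(resolution - 1, 1)
        n = round(ratio * len(query))
        if n < 2 or n > len(ref):
            continue  # scaled query too short, or longer than the reference
        scaled = zero_mean(resample(query, n))
        target = zero_mean(ref[:n])  # compare from the start of the reference
        dist = sum(abs(a - b) for a, b in zip(scaled, target)) / n
        best = min(best, dist)
    return best

def dtw(query, ref):
    """Return (total distance, number of mapping points), both ends anchored."""
    inf = float("inf")
    n, m = len(query), len(ref)
    cost = [[inf] * m for _ in range(n)]
    points = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(query[i] - ref[j])
            if i == 0 and j == 0:
                cost[i][j], points[i][j] = d, 1
                continue
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], points[i - 1][j]))
            if j > 0:
                candidates.append((cost[i][j - 1], points[i][j - 1]))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], points[i - 1][j - 1]))
            # Ties in cost are broken in favor of the shorter path.
            prev_cost, prev_points = min(candidates)
            cost[i][j] = prev_cost + d
            points[i][j] = prev_points + 1
    return cost[n - 1][m - 1], points[n - 1][m - 1]

def best_normalized_dtw(query, ref, key_shifts=(-2, -1, 0, 1, 2)):
    """Try several key transpositions (in semitones); keep the best normalized distance."""
    best = float("inf")
    for shift in key_shifts:
        total, count = dtw([q + shift for q in query], ref)
        best = min(best, total / count)
    return best

# Toy pitch vectors in semitones (hypothetical, not contest data).
query = [60, 60, 62, 62, 64, 64]
ref = [p + 3 for p in resample(query, 9)]  # 1.5x slower, 3 semitones higher

fine = linear_scaling_distance(query, ref, resolution=11)
coarse = linear_scaling_distance(query, ref, resolution=3)
print("LS distance, resolution 11:", fine)
print("LS distance, resolution 3:", coarse)

total, count = dtw([0, 2, 4], [0, 0, 2, 2, 4, 4])
print("DTW total:", total, "mapping points:", count)
print("best normalized DTW:", best_normalized_dtw([0, 2, 4], [1, 1, 3, 3, 5, 5]))
```

In this toy case the finer LS grid includes a ratio that matches the tempo of the reference, while the coarse grid misses it, which is the kind of accuracy-versus-time trade-off that tuning the resolution explores. Returning both the total distance and the number of mapping points from DTW leaves room to combine them into a ranking score in other ways.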
What files to upload for performance evaluation:
- The output file of the test set, which should be "output/Result_2017-msar-pvOnly-noGt.txt" if your test set directory is "2017-msar-pvOnly-noGt". You can upload this file to the judge system for evaluation.
Be aware that
- If performance tuning is time-consuming, you can use a partial dataset (perhaps the files that are recognized poorly) to obtain a rough result quickly.
- The melody track in the song database is stored as songData.track in the format of [pitch, duration, pitch, duration, ...], where the units for pitch and duration are semitones and 1/64 second, respectively. If you want to hear the song, please refer to the script goSongPlay.m.
- Before using LS/DTW for comparison, we need to convert the melody track into PV format. If you want to hear the song in PV format of 31.25 points/second, please also refer to the script goSongPlay.m for details.
- The H1 help of the example code
  exampleProgram
  File convention:
  - qbsh*.m: Functions you do not need to modify.
  - go*.m: Main program that you can execute directly.
  - my*.*: Files you need to modify to accommodate your own method.
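The conversion from the [pitch, duration, ...] track format to a PV (pitch vector) at 31.25 points/second, mentioned in the notes above, can be sketched as follows. This is an illustrative Python version, not the toolbox's own conversion. The function name and the toy track are hypothetical, and rests are not treated specially because the source does not say how they are encoded.

```python
def track_to_pv(track, frame_rate=31.25, duration_unit=1 / 64):
    """Expand [pitch, duration, pitch, duration, ...] into one pitch per frame.

    Durations are in units of `duration_unit` seconds; the output has
    `frame_rate` points per second.
    """
    pitches = track[0::2]
    durations = track[1::2]
    pv = []
    note_end = 0.0  # end time of the current note, in seconds
    frame = 0       # index of the next frame to emit
    for pitch, duration in zip(pitches, durations):
        note_end += duration * duration_unit
        # Emit every frame whose start time falls inside this note.
        while frame / frame_rate < note_end:
            pv.append(pitch)
            frame += 1
    return pv

# Hypothetical track: pitch 60 for 1 second, then pitch 62 for 0.5 second.
track = [60, 64, 62, 32]
pv = track_to_pv(track)
print(len(pv), "frames")
```

Tracking the note end time cumulatively, rather than rounding each note's frame count separately, keeps rounding errors from accumulating over a long melody.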