Part-2 Tutorial on Singing Transcription (by Roger Jang)
This is the part-2 tutorial on singing transcription from polyphonic music, for the AI Cup Competition. This part focuses on MATLAB functions for singing transcription. You should have read the part-1 tutorial before reading this part.
Preprocessing
Before we start, let's add necessary toolboxes to the search path of MATLAB:
addpath d:/users/jang/matlab/toolbox/utility
addpath d:/users/jang/matlab/toolbox/sap
addpath d:/users/jang/matlab/toolbox/machineLearning
All the above toolboxes can be downloaded from Roger's toolbox page. Make sure you are using the latest toolboxes to work with this script.
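Before running the rest of the script, it can help to confirm that the toolbox functions are actually visible on the search path. The following is a minimal sketch, assuming the four functions called later on this page (pvPlay, notePlay, pv2note, noteVecSim) are the ones that matter; the variable names are illustrative only.

```matlab
% Functions called later in this tutorial; each should resolve once the
% toolboxes have been added to the search path.
requiredFcn = {'pvPlay', 'notePlay', 'pv2note', 'noteVecSim'};
isFound = false(size(requiredFcn));
for i = 1:numel(requiredFcn)
    % exist(..., 'file') is nonzero when a file, function, or folder of
    % that name is visible to MATLAB
    isFound(i) = exist(requiredFcn{i}, 'file') ~= 0;
    if ~isFound(i)
        fprintf('Missing from the search path: %s\n', requiredFcn{i});
    end
end
fprintf('%d of %d required functions found.\n', nnz(isFound), numel(requiredFcn));
```

If any function is reported missing, check the folder names passed to addpath against where the toolboxes were unpacked.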
For compatibility, here we list the platform and MATLAB version that we used to run this script:
fprintf('Platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);
fprintf('Date & time: %s\n', char(datetime));
scriptStartTime=tic; % Timing for the whole script
Platform: PCWIN64
MATLAB version: 9.6.0.1214997 (R2019a) Update 6
Date & time: 18-Jun-2020 00:17:32
Basic operations
In the part-1 tutorial, we saved "pv" and "note" for the first phrase of the YouTube video of 隱形的翅膀. Now we can load the saved file for further processing:
load transparentWings.mat
noteGt=note;
You can play the PV of the first phrase:
opt=pvPlay('defaultOpt');
opt.method=2;
figure; pvPlay(pv, opt, 1);

You can play the groundtruth note of the first phrase:
fprintf('Play the groundtruth music notes...\n');
opt=notePlay('defaultOpt');
opt.auFileName='note.wav';
figure; notePlay(noteGt, opt, 1);
Play the groundtruth music notes...
Saving note.wav (within note2au)...

Now we can try to segment the singing pitch vector into notes. First of all, we can cut PV into segments based on where the silence occurs. Note that the transition from silence (with zero pitch) to non-silence (with nonzero pitch) indicates the onset of a note. Each segment could contain several notes. However, for simplicity, we can assume each segment corresponds to a note. This simple method is specified by 'simple00', as follows.
opt=pv2note('defaultOpt');
opt.method='simple00';
opt.gtNote=noteGt;
figure; notePredicted=pv2note(pv, opt, 1);
fprintf('No. of GT notes=%d\n', length(noteGt.pitch));
fprintf('No. of predicted notes=%d\n', length(notePredicted.pitch));
fMeasure=noteVecSim(notePredicted, noteGt);
fprintf('fMeasure=%g\n', fMeasure);
No. of GT notes=12
No. of predicted notes=7
fMeasure=0.336842
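To make the silence-based idea concrete, here is a minimal sketch on a toy pitch vector. It is not the code inside pv2note; the toy data, the frame rate, and the choice of the median as the note pitch are all assumptions made for illustration.

```matlab
% Toy pitch vector in semitones; 0 marks silent frames (assumed convention)
pvToy = [0 0 60 60.2 59.9 0 0 62 62.1 64 64.2 0];
frameRate = 100;                      % frames per second (hypothetical value)

voiced = pvToy > 0;                   % true for non-silent frames
edges = diff([0 voiced 0]);           % +1 at silence->voiced, -1 at voiced->silence
onsetIdx  = find(edges == 1);         % first frame of each segment
offsetIdx = find(edges == -1) - 1;    % last frame of each segment

segCount = numel(onsetIdx);
notePitch = zeros(1, segCount);
noteDuration = zeros(1, segCount);
for i = 1:segCount
    seg = pvToy(onsetIdx(i):offsetIdx(i));
    notePitch(i) = median(seg);                 % one pitch per segment
    noteDuration(i) = numel(seg) / frameRate;   % duration in seconds
end
fprintf('Segments found: %d\n', segCount);
```

In this toy example the second segment spans two distinct pitches (about 62 and 64 semitones) but is reported as a single note. This is the kind of under-segmentation that the one-note-per-segment assumption can produce.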

If a segment contains large pitch variation, then it is likely to contain several music notes. We can use a simple criterion to cut a segment into notes whenever the pitch difference between neighboring frames is larger than 1 semitone. This method is specified by 'simple01', as follows:
opt=pv2note('defaultOpt');
opt.method='simple01';
opt.gtNote=noteGt;
figure; notePredicted=pv2note(pv, opt, 1);
fprintf('No. of GT notes=%d\n', length(noteGt.pitch));
fprintf('No. of predicted notes=%d\n', length(notePredicted.pitch));
fMeasure=noteVecSim(notePredicted, noteGt);
fprintf('fMeasure=%g\n', fMeasure);
No. of GT notes=12
No. of predicted notes=10
fMeasure=0.290909
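The 1-semitone criterion can be sketched on a single voiced segment as follows. Again, this is an illustrative sketch rather than the implementation inside pv2note; the toy segment and the use of the median as the note pitch are assumptions.

```matlab
% One voiced segment in semitones (toy data, no silent frames inside)
seg = [62 62.1 61.9 64 64.2 64.1 67 66.9];
jumpThreshold = 1;                        % semitones, as in the criterion above

% A new note starts wherever the frame-to-frame pitch change exceeds the threshold
isBreak = abs(diff(seg)) > jumpThreshold;
startIdx = [1, find(isBreak) + 1];        % first frame of each note
endIdx   = [find(isBreak), numel(seg)];   % last frame of each note

notePitch = zeros(1, numel(startIdx));
for i = 1:numel(startIdx)
    notePitch(i) = median(seg(startIdx(i):endIdx(i)));   % one pitch per note
end
fprintf('Notes found in this segment: %d\n', numel(notePitch));
```

A criterion like this reacts to any single-frame jump, including pitch-tracking glitches, so producing more notes does not necessarily yield a higher F-measure, as the numbers above suggest.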

Summary
This is a brief tutorial on singing transcription. There are several directions for further improvement:
- Explore other features from the original audio files
- Use other models, such as recurrent neural networks (RNNs), including LSTM and GRU variants
Appendix
List of functions, scripts, and datasets used in this script:
Date and time when finishing this script:
fprintf('Date & time: %s\n', char(datetime));
Date & time: 18-Jun-2020 00:17:36
Overall elapsed time:
toc(scriptStartTime)
Elapsed time is 3.942894 seconds.
Jyh-Shing Roger Jang, created on
datetime
ans =
  datetime
   18-Jun-2020 00:17:36
If you are interested in the original MATLAB code for this page, you can type "grabcode(URL)" under MATLAB, where URL is the web address of this page.