## 14-9 Exercise: QBSH via Optimum Assignment of Singing Pitch to Music Notes

Note that the description of this homework may be lengthy, but the concept is pretty straightforward. If you have questions about this homework, please come to our TA's office hours.

### QBSH via Optimum Assignment of Singing Pitch to Music Notes

Outlines

#### Problem definition

Dynamic programming (DP) is a very important and useful method for sequence comparision. It has been used extensively in speech recognition, natural language processing, music analysis, and a variety of other areas. In this homework, we shall explore the use of DP for query by singing/humming (QBSH), which is an interesting paradigm of music information retrieval. (For simplicity, we shall assume our QBSH system always starts the comparison from the beginning of a song.)

In particular, you need to find the alignment between the pitch vector of human's singing and the note vector of a given melody, both in the unit of semitone or MIDI number. The correspondence between MIDI numbers and piano keyboard can be found here. Moreover,

• Here is a typical singing clip and its corresponding pitch vector which can be played here.
• The note vector is the music notes corresponging to the intended song of the singing. For simplicity, there no duration information in the vector. Therefore for the song of "Twinkle twinkle little star", here is the music note vector for the first two phrases: [60 60 67 67 69 69 67 65 65 64 64 62 62 60].

Given a pitch vector and a note vector, your mission is to find an optimum mapping to assign each pitch point to a music note, such that the total distance is minimized. To be more specific, let's use the following notation:

• $p(i)$ is element $i$ of the pitch vector, where $1 \leq i \leq m$.
• $q(j)$ is element $j$ of the note vector, where $1 \leq j \leq n$.
The alignment path can be denoted by its coordinates, that is, $\{(1, u_1), (2, u_2), ..., (m, u_{m})\}$, where $u_k, k=1, 2, \cdots, m$ are indices into the note vector with the following constraints:
• $u_1=1$.
• $u_1 \leq u_2 \leq u_3 \cdots \leq u_{m}$.

For instance, the alignment path of the above figure is ${(1, 1),(2, 1),(3, 2),(4, 2),(5, 2),(6, 2),(7, 3),(8, 3),(9, 4),(10, 4),(11, 4),(12, 4)}$, with $m=12$ (the length of $p$) and $u_{m}=3$ (the last index of music note that have at least one pitch point being assigned to). In other words:

• p(1) and p(2) are assigned to q(1).
• p(3), p(4), p(5) and p(6) are assigned to q(2).
• p(7) and p(8) are assigned to q(3).
• p(9), p(10), p(11) and p(12) are assigned to q(4).

The distance between a pitch vector $p$ and note vector $q$ can then be defined as follows: $$dist(p, q)=\min_{u_1, u_2, ... u_{m}}\sum_{i=1}^{m} |p(i)-q(u_i)|,$$ subject to $u_1=1$ and $u_1 \leq u_2 \leq u_3 \cdots \leq u_{m}$.

#### Approach

Based on the above description, we can solve the task using DP. The 3-step formula for DP can be described as follows:

1. Optimum-value function $D(i,j)$ is defined as the minimum distance between $p(1:i)$ and $q(1:j)$.
2. Recurrent equation for $D(i,j)$ is shown next, with $i>0$ and $j>0$: $$D(i,j)=|p(i)-q(j)| +\min \left\{ \begin{matrix} D(i-1,j)\\ D(i-1, j-1)\\ \end{matrix} \right.$$ (See the local constraint in the previous figure.) Please derive the recurrent equation when $i=1$ or $j=1$ by yourself. The boundary condition is $D(1, 1)=|p(1)-q(1)|$.
3. Answer to the original task: $$dist(p, q) = \min_{1\leq j \leq n}D(m, j).$$

#### Input/output formats

You need to write a function myPvAlign.m with the following input/output format:
[minDistance, dpPath]=myPvAlign(p, q);
Note that
• p is the pitch vector
• q is the music note vector
• minDistance is the min. distance after alignment
• dpPath is a 2-by-m arrays indicating the best alignment path, where each column is a point in the path.

#### Requirements & suggestions

• You need to use DP for this homework, otherwise it will be too slow.
• If there are repeated notes, the optimal path might not be unique. As a result, TA's judge system will check your output based on the returned minimum distance. If the deviation between your distance and TA's distance is less than 1, it'll be considered as correct.
• The alignment path shown in the previous plot is based on X-Y mode, which is commonly used in the Cartesian coordinate system. If you want to adopt I-J mode (commonly used in matrix indexing), you can simply rotate the plot by 90-degree clockwise.
• The process of converting a singing clip into its pitch vector is call pitch tracking, which is out of the scope of this class.
• If you are interested in trying out QBSH systems, try our lab's system at http://mirlab.org/demo/miracle.
• You need to write the program from scratch. You cannot use any open-source or readily available solvers.

Slides and video

#### Test sets

In order to run the following test programs, you need to download the following two p files:
• myPvAlign.p: This is the function you need to implement.
• myPvAlignPlot.p: This is a function for plotting the DP path. (You don't need to implement this one.)
Here we have three test sets:
1. Example 1: myPvAlign01.m% Twinkle twinkle little star (id=8); p=[62.9 61.8 61.8 61.8 61.8 61.8 61.4 61.4 61.4 61.4 60.9 61.4 61.8 61.4 60.9 60.9 60.9 60.9 60.9 60.9 60.9 60.9 61.4 61.8 62.9 63.9 65.7 65.7 66.3 66.3 67 67 67 67 67.7 67.7 67.7 67.7 67 68.4 67 67 67 67 67 67 67 67 67 67 67 67 67.7 67.7 68.4 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 68.4 68.4 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 68.4 67.7 67 66.3 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 66.3 66.3 65.7 65.7 65.7 65.1 64.5 64.5 64.5 65.1 65.1 65.1 65.1 65.1 65.1 65.1 65.1 65.7 65.7 65.7 65.1 65.1 65.1 65.1 65.1 65.1 65.1 65.1 64.5 64.5 63.4 64.5 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 64.5 63.4 65.1 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.4 62.3 63.9 62.3 61.8 61.4 61.4 61.4 61.8 61.8 61.8 61.8 61.8 61.8 61.4 63.4 62.3 61.8 61.8 61.8 61.8 61.8 61.8 61.8 61.8 61.8 61.8 61.8 60.9 61.4 60.4 60 60.4 60 60.4 60.4 60.4 60.4 60.4 60 60 60 60 60 60 61.4]; q=[60 60 67 67 69 69 67 65 65 64 64 62 62 60 67 67 65 65 64 64 62 67 67 65 65 64 64 62 60 60 67 67 69 69 67 65 65 64 64 62 62 60]; showPlot=1; figure; [minDist, dpPath]=myPvAlignSP(p, q, showPlot); return % If you want to do playback, download SAP toolbox at http://mirlab.org/jang/matlab/toolbox/sap and proceed as follows. addpath d:/users/jang/matlab/toolbox/sap % You need to change this to add SAP toolbox to the search path. % Play back of the oringal note note=[60 32 60 32 67 32 67 32 69 32 69 32 67 64 65 32 65 32 64 32 64 32 62 32 62 32 60 64 67 32 67 32 65 32 65 32 64 32 64 32 62 64 67 32 67 32 65 32 65 32 64 32 64 32 62 64 60 32 60 32 67 32 67 32 69 32 69 32 67 64 65 32 65 32 64 32 64 32 62 32 62 32 60 32]; notePlay(note); % Playback of the singing pitch vector pvPlay(p); % Playback of the induced singing pitch vector pvPlay(inducedP); minDist=86.4 dpPath=[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207;1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 13 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14]

2. Example 2: myPvAlign02.m% 10 little indian (id=5) p=[43.9 57.4 58.7 59 59.5 60 60 60.2 60.2 60 60.1 60 59.8 60.3 60.5 60.5 60.3 59.9 60.9 60.4 60.2 60 60.7 82.1 61.2 60.5 60.2 60.4 60.3 60.4 60.4 60.5 60.5 60.3 59.3 60.1 60.4 60.3 60 58.1 66.4 61.4 60.8 60.4 60.7 61.3 65.7 65.7 65.6 65.4 65.2 65 65 65.2 65.2 65.3 65.7 67.5 68 68.2 67.8 67.8 69.5 68.5 68.4 68.3 68 67.6 66.9 66.3 66.3 65.9 65.5 65 64.4 64.7 64.8 64.9 64.9 64.5 63.6 61.7 60.2 60 60.4 60.8 61.2 61.7 62 62.1 66.6 63.7 63.6 63.4 63.3 63.3 63.3 63.4 63.3 63.3 63.1 62.9 62.6 63.2 63.2 63.2 62.8 64.7 63.9 63.3 63.2 63.5 62.8 64.1 63.7 63.5 63.5 63.5 63.5 63.5 63.4 63 61.7 62 62.8 63.7 63.4 63.5 63.2 63.6 64.6 64.2 63.9 63.7 63.2 61.3 59.6 59.4 59.7 59.6 59.9 60.3 61.2 62.7 63.2 63.2 63.4 67.7 64.5 64 63.6 63.4 63.4 63.3 62.5 61.1 60.5 59.9 60.2 60 59 59.8 60.2 60 60.2 60 59.1 57.6 57 57.2 83 60.7 60 60.2 60.3 60.1 60.2 61 61 60.8 60.5 60.6 60.7 61 60.6 60.2 60.2 59.3 61.4 60.9 60.5 60.2 60.3 60.2 59.7 59.5 60.2 60.3 60 59.9 59.8 60.2 60 60.4 60.8]; q=[60 60 60 60 60 60 64 67 67 64 64 60 62 62 62 62 62 62 59 62 62 59 59 55 60 60 60 60 60 60 64 67 67 64 64 60 67 65 65 64 64 62 60]; showPlot=1; figure; [minDist, dpPath]=myPvAlignSP(p, q, showPlot); return % If you want to do playback, download SAP toolbox at http://mirlab.org/jang/matlab/toolbox/sap and proceed as follows. addpath d:/users/jang/matlab/toolbox/sap % You need to change this to add SAP toolbox to the search path. % Play back of the oringal note note=[60 26 60 13 60 13 60 26 60 13 60 13 64 26 67 13 67 13 64 13 64 13 60 26 62 26 62 13 62 13 62 26 62 13 62 13 59 26 62 13 62 13 59 13 59 13 55 26 60 26 60 13 60 13 60 26 60 13 60 13 64 26 67 13 67 13 64 13 64 13 60 26 67 26 65 13 65 13 64 13 64 13 62 26 60 26]; notePlay(note); % Playback of the singing pitch vector pvPlay(p); % Playback of the induced singing pitch vector pvPlay(inducedP); minDist=245.1 dpPath=[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205;1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 14 15 16 17 18 19 19 19 19 19 19 19 20 21 22 22 23 24 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25]

3. Example 3: myPvAlign03.m% Happy birthday to you (id=21) p=[57.1 58 58.5 59.5 60.1 60.4 60.1 60.4 60.1 59.5 59.8 59.5 59.8 60.1 59.2 62.6 62.6 62.6 62.2 61.8 61.5 61.5 61.5 62.2 62.2 62.6 62.2 61.8 61.5 61.1 58.9 58 58.3 58.5 59.5 60.1 60.1 59.8 60.1 60.4 60.4 59.8 59.8 59.8 64.1 65 65 65.4 65.4 65.4 65.4 65.4 65.4 65.4 65.4 65 65 64.1 64.1 64.6 65.4 65.4 65.9 65.4 65 65 65 65 65 65 65 65 65 65 64.6 64.6 63.7 58.9 58.9 58.5 58.9 59.5 60.1 60.1 59.8 59.5 59.8 59.2 59.5 59.2 65.4 63.3 63.3 62.9 62.9 62.9 62.9 62.6 62.6 62.6 62.6 62.6 62.2 61.8 61.1 59.8 59.5 59.5 59.8 60.1 60.1 59.8 59.5 59.5 59.2 59.2 59.2 66.3 67.3 67.8 67.8 67.3 67.3 67.3 67.3 67.3 67.3 67.3 67.3 67.8 66.8 66.3 65.9 65.4 65.9 66.3 66.3 65.9 65.4 65.9 65.9 65.9 65.9 65.9 65.9 65.9 65.4 63.7 59.8 59.8 60.1 60.8 60.8 60.4 59.8 60.8 59.8 60.1 60.4 60.1 70 71.2 71.8 72.4 72.4 72.4 72.4 72.4 73.1 73.1 73.1 73.1 72.4 72.4 71.2 68.8 67.8 68.3 68.8 69.4 69.4 69.4 69.4 69.4 69.4]; q=[60 60 62 60 65 64 60 60 62 60 67 65 60 60 72 69 65 64 62 70 70 69 65 67 65]; showPlot=1; figure; [minDist, dpPath]=myPvAlignSP(p, q, showPlot); return % If you want to do playback, download SAP toolbox at http://mirlab.org/jang/matlab/toolbox/sap and proceed as follows. addpath d:/users/jang/matlab/toolbox/sap % You need to change this to add SAP toolbox to the search path. % Play back of the oringal note note=[60 29 60 10 62 38 60 38 65 38 64 77 60 29 60 10 62 38 60 38 67 38 65 77 60 29 60 10 72 38 69 38 65 38 64 38 62 77 77 70 29 70 10 69 38 65 38 67 38 65 38]; notePlay(note); % Playback of the singing pitch vector pvPlay(p); % Playback of the induced singing pitch vector pvPlay(inducedP); minDist=103.4 dpPath=[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185;1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 7 7 7 7 7 7 7 7 7 7 7 7 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16]

Audio Signal Processing and Recognition (­µ°T³B²z»P¿ëÃÑ)