Once we have successfully detect the onset of tapping, we can compute the IOI (inter onset interval) vector for comparing with those of the songs in the database. The IOI vector can be computed as the difference between two successtive oneset times.
Once the similarity (or distance) between two IOI vectors is defined, then we can return the most likely songs in the database according to their similarity/distance to the query input. Before defining possible distance functions, we shall describe the ideal characteristics of the distance function for query by tapping, as follows.
For simplicity, we should use t = [t1, t2, ..., tm] and r = [r1, r2, ..., rn] to denote the test and the reference IOI vectors, respectively, with m
- The distance function should be able to handle variations in tempos.
- The distance function should be able to handle insertion/deletion errors.
The distance functions for IOI vectors can be defined as follows:
Once we define the distance function, we can apply it to find the distance between the IOI query vector and that of each database songs. The system can then return the ranked list of all database songs. To give a single performance index of a method, we usually use the following figures:
- Simple Lp norm:
distance=Lp(t(1:m), r(1:m))This distance function assume that the query input has the same tempo as that of the intended song in the database. Thus it cannot handle tempo difference.
- To deal with tempo difference, we can define IOI ratio vector as a vector where each element is the ratio between consecutive elements in the original IOI vector. For instance, the ratio vector of t = [t1, t2, ..., tm] is t' = [t1/t2, t2/t3 ..., tm-1/tm]. Then the distance is
distance=Lp(t'(1:m-1), r'(1:m-1))This distance function assume that the query input has a constant tempo, such that we can take the ratio of consecutive IOI elements and then compared with those of the database songs. However, this distance function cannot deal with insertion and deletion errors. Similar distance functions are listed next: distance=minkLp(t(1:m), kr(1:m)) distance=variance of c(1:m), which c(i)=t(i)/r(i), i=1~m.
- To be able to deal with tempo difference and insertion/deletion errors, we can apply an alignment algorithm based on dynamic programming. The approach is detailed in this paper and the corresponding slides.
The following example demonstrates QBT using normalized L1-norm:
- Top-n hit rate: The retrieval is considered successfuy if the intended song appears in the top n of the returned ranked list. Otherwise it is consider failed. Usually we set n equal to 10 since it is easier for a user to browse the returned result of 10 songs. Sometimes we change the value of n to see how the performance varies with the value of n.
- Mean reciprocal rank (MRR): If the intended song appears as rank r in the returned ranked list, then it get a score of 1/r. For instance, if the intended song appear at rank 4 of the ranked list, then it receive a score of 1/4 = 0.25.
As you can see from the above example, the correct intended song was identified via the use of normalized L1-norm, with a distance of 0.100238.
Task of Query by Tapping at MIREX 2008
Audio Signal Processing and Recognition (音訊處理與辨識)