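(Note: the Chinese version of this page is not updated in sync with the English version!)
First, we introduce how to perform endpoint detection in the time domain.
The first method uses volume directly for endpoint detection. This is the simplest approach: whenever the volume of a frame falls below a certain threshold, we regard that frame as silence or noise. As for how to choose the threshold, besides relying on human intuition, the more objective way is to determine the best value from a large amount of test data.
Hint When computing volume, be sure to perform zero justification (subtract the mean from the signal) first.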
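To see why the hint matters, here is a minimal sketch on a synthetic signal (it is not one of the original examples). The sample rate, the 200 Hz tone, and the 0.1 DC bias are made-up values. Volume is taken as the sum of absolute values within a frame, in line with the 'Sum of Abs.' axis label used in Example 1. Frames are non-overlapping for brevity, and only base MATLAB functions are used instead of the SAP Toolbox.

```matlab
% Sketch: effect of a DC bias on frame volume (all values are assumptions).
fs = 8000;                                   % assumed sample rate
t = (0:fs/2-1)'/fs;                          % half a second of time stamps
y = [zeros(fs/2,1); 0.5*sin(2*pi*200*t)];    % silence followed by a 200 Hz tone
y = y + 0.1;                                 % hypothetical DC bias of the recorder
frameSize = 256;
frameNum = floor(length(y)/frameSize);
frameMat = reshape(y(1:frameNum*frameSize), frameSize, frameNum);  % one frame per column
volRaw = sum(abs(frameMat));                 % volume without zero justification
yZero = y - mean(y);                         % zero justification
frameMatZero = reshape(yZero(1:frameNum*frameSize), frameSize, frameNum);
volZero = sum(abs(frameMatZero));            % volume after zero justification
fprintf('First (silent) frame: raw volume = %g, zero-justified volume = %g\n', volRaw(1), volZero(1));
```

Without the correction, the silent frames carry a volume of frameSize times the bias, which can easily exceed a threshold derived from the minimum or first-frame volume.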
The following example uses volume to perform endpoint detection on sunday.wav. In this example we compute the volume threshold in four different ways and plot the results for inspection:
Example 1: epdByVolTh01.m
waveFile='sunday.wav';
au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
frameSize = 256;
overlap = 128;
y=y-mean(y); % zero-mean subtraction
frameMat=buffer2(y, frameSize, overlap); % frame blocking
frameNum=size(frameMat, 2); % no. of frames
volume=frame2volume(frameMat); % volume
volumeTh1=max(volume)*0.1; % volume threshold 1
volumeTh2=median(volume)*0.1; % volume threshold 2
volumeTh3=min(volume)*10; % volume threshold 3
volumeTh4=volume(1)*5; % volume threshold 4
index1 = find(volume>volumeTh1);
index2 = find(volume>volumeTh2);
index3 = find(volume>volumeTh3);
index4 = find(volume>volumeTh4);
endPoint1=frame2sampleIndex([index1(1), index1(end)], frameSize, overlap);
endPoint2=frame2sampleIndex([index2(1), index2(end)], frameSize, overlap);
endPoint3=frame2sampleIndex([index3(1), index3(end)], frameSize, overlap);
endPoint4=frame2sampleIndex([index4(1), index4(end)], frameSize, overlap);
subplot(2,1,1);
time=(1:length(y))/fs;
plot(time, y);
ylabel('Amplitude'); title('Waveform');
axis([-inf inf -1 1]);
line(time(endPoint1( 1))*[1 1], [-1, 1], 'color', 'm');
line(time(endPoint2( 1))*[1 1], [-1, 1], 'color', 'g');
line(time(endPoint3( 1))*[1 1], [-1, 1], 'color', 'k');
line(time(endPoint4( 1))*[1 1], [-1, 1], 'color', 'r');
line(time(endPoint1(end))*[1 1], [-1, 1], 'color', 'm');
line(time(endPoint2(end))*[1 1], [-1, 1], 'color', 'g');
line(time(endPoint3(end))*[1 1], [-1, 1], 'color', 'k');
line(time(endPoint4(end))*[1 1], [-1, 1], 'color', 'r');
legend('Waveform', 'Boundaries by threshold 1', 'Boundaries by threshold 2', 'Boundaries by threshold 3', 'Boundaries by threshold 4');
subplot(2,1,2);
frameTime=frame2sampleIndex(1:frameNum, frameSize, overlap);
plot(frameTime, volume, '.-');
ylabel('Sum of Abs.'); title('Volume');
axis tight;
line([min(frameTime), max(frameTime)], volumeTh1*[1 1], 'color', 'm');
line([min(frameTime), max(frameTime)], volumeTh2*[1 1], 'color', 'g');
line([min(frameTime), max(frameTime)], volumeTh3*[1 1], 'color', 'k');
line([min(frameTime), max(frameTime)], volumeTh4*[1 1], 'color', 'r');
legend('Volume', 'Threshold 1', 'Threshold 2', 'Threshold 3', 'Threshold 4');
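In the above example, we use four volume thresholds for endpoint detection:
0.1 times the maximum volume: this fails when the volume fluctuates strongly or when the noise is too strong.
0.1 times the median volume: this fails when silence occupies more than half of the recording, since the median is then itself a silence-level volume.
10 times the minimum volume: this fails when the noise is too strong.
5 times the volume of the first frame: this assumes that the recording starts with silence. If there is sound right from the start, or if the recording device has an initial offset, this approach easily fails.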
Example 2: epdByVol01.m
waveFile='singaporeIsAFinePlace.wav';
au=myAudioRead(waveFile);
opt=endPointDetect('defaultOpt');
opt.method='vol';
showPlot=1;
endPoint=endPointDetect(au, opt, showPlot);
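The volume threshold used here takes the following form, a fraction of the range between the minimum and maximum volume added to the minimum:
volTh=(volMax-volMin)*epdPrm.volRatio+volMin;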
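The formula volTh=(volMax-volMin)*volRatio+volMin can be read as a weighted average of the extremes, since it equals volRatio*volMax+(1-volRatio)*volMin. The sketch below applies it to a made-up volume vector. The numbers and the ratio 0.1 are illustrative assumptions, not the toolbox defaults, and the first and last frames above the threshold are taken as endpoints in the same way as in Example 1.

```matlab
% Sketch: threshold as a weighted average of min and max volume (values are assumptions).
volume = [0.2 0.3 0.2 4.0 9.5 10.2 8.7 3.1 0.4 0.2];  % hypothetical per-frame volume
volRatio = 0.1;                                       % assumed weighting ratio
volMin = min(volume);
volMax = max(volume);
volTh = (volMax - volMin)*volRatio + volMin;          % same form as the formula above
index = find(volume > volTh);                         % frames regarded as sound
endPointFrame = [index(1), index(end)];               % first and last sound frame
fprintf('volTh = %g, sound spans frames %d to %d\n', volTh, endPointFrame(1), endPointFrame(2));
```

Because the threshold is anchored at volMin, it adapts to the noise floor of each recording, but it still inherits the weakness of any max-based rule when a single loud burst inflates volMax.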
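Of course, the parameter values used in the above methods (0.1, 10, 5, and so on) only suit this particular audio file. To find parameter values that also work well for other recordings, we need to test on a large amount of data.
Of course, you can also rely on your own creativity and ingenuity to devise a better way of determining the volume threshold, for example a weighted average of the maximum and minimum volume.
If the recording is clean and the noise is low, volume-based endpoint detection works well. However, this simple method breaks down in the following situations:
The noise is relatively strong
There are many unvoiced sounds
The volume varies too much within the same utterance
In these cases a single volume threshold is hard to choose, and the accuracy of endpoint detection drops. In addition, for endpoint detection in general, if we want endpoints of high precision, we can increase the overlap between neighboring frames, but the amount of computation increases accordingly. The second commonly used method uses both volume and zero-crossing rate. It works as follows: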
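Use a high volume threshold (t_u) as the criterion to determine the initial endpoints.
Extend the endpoints outward, in both directions, to where the volume drops to a low volume threshold (t_l).
Extend the endpoints outward further to where the zero-crossing rate drops to a zero-crossing-rate threshold (t_zc), so that the unvoiced parts of the speech are included.
This method uses three parameters (t_u, t_l, t_zc). If enough computing power is available, these three parameters can be tuned with various search methods; otherwise we have to rely on observation and empirical values. The original page illustrates this method with a schematic diagram.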
The above improved method uses only three thresholds, hence it is possible to use grid search to find the best values via a set of labeled training data.
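The three steps can be sketched as follows. This is only an illustration under stated assumptions: the per-frame volume and zero-crossing-rate vectors are made-up numbers, the thresholds tu, tl, and tzc are hypothetical, and the sketch assumes a single utterance (it keeps only the first and last frame above tu). It is not the implementation used in the SAP Toolbox.

```matlab
% Sketch: endpoint detection with volume and zero-crossing rate (all values are assumptions).
volume = [1 1 2 3 6 12 15 14 9 4 2 1 1 1];        % hypothetical per-frame volume
zcr    = [5 6 30 28 10 8 7 8 9 12 35 33 6 5];     % hypothetical per-frame zero-crossing rate
tu = 10; tl = 3; tzc = 20;                        % assumed thresholds
frameNum = numel(volume);

% Step 1: initial endpoints from the high volume threshold
idx = find(volume > tu);
startF = idx(1);
endF = idx(end);

% Step 2: extend outward while the neighboring frame is above the low volume threshold
while startF > 1 && volume(startF-1) > tl, startF = startF - 1; end
while endF < frameNum && volume(endF+1) > tl, endF = endF + 1; end

% Step 3: extend outward while the neighboring frame is above the ZCR threshold
while startF > 1 && zcr(startF-1) > tzc, startF = startF - 1; end
while endF < frameNum && zcr(endF+1) > tzc, endF = endF + 1; end

fprintf('Detected sound from frame %d to frame %d\n', startF, endF);
```

In this toy data, frames 3, 4, 11, and 12 have low volume but high zero-crossing rate, so they are only picked up by the third step. Wrapping the three steps in a function of (tu, tl, tzc) would allow the thresholds to be scored against labeled endpoints in a grid search.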
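Hint The above method is better suited to speech recognition. For melody recognition we do not need to consider the zero-crossing rate, because unvoiced sounds have no pitch in the first place and therefore do not affect melody recognition.
We have a function for endpoint detection. Applying it to singaporeIsAFinePlace.wav gives the following result: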
Example 3: epdByVolZcr01.m
waveFile='singaporeIsAFinePlace.wav';
au=myAudioRead(waveFile);
opt=endPointDetect('defaultOpt');
opt.method='volZcr';
showPlot=1;
endPoint=endPointDetect(au, opt, showPlot);
Here the red line marks the start of the sound and the green line marks its end. This example uses the function endPointDetect.m from the SAP Toolbox, which, with opt.method='volZcr', determines the endpoints from volume and zero-crossing rate.
Now it should be obvious that the most difficult part of EPD is to distinguish unvoiced sounds from silence reliably. One way to achieve this is to use high-order differences of the waveform as a time-domain feature. For instance, in the following example, we compute order-1, 2, 3, and 4 differences on the waveform of singaporeIsAFinePlace.wav.
In other words, by repeatedly taking differences of the waveform and then computing the volume, we can emphasize the unvoiced parts:
Example 4: highOrderDiff01.m
waveFile='singaporeIsAFinePlace.wav';
au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
frameSize = 256;
overlap = 128;
y=y-mean(y); % zero-mean subtraction
frameMat=buffer2(y, frameSize, overlap); % frame blocking
frameNum=size(frameMat, 2); % no. of frames
volume=frame2volume(frameMat);
sumAbsDiff1=sum(abs(diff(frameMat)));
sumAbsDiff2=sum(abs(diff(diff(frameMat))));
sumAbsDiff3=sum(abs(diff(diff(diff(frameMat)))));
sumAbsDiff4=sum(abs(diff(diff(diff(diff(frameMat))))));
subplot(2,1,1);
time=(1:length(y))/fs;
plot(time, y); ylabel('Amplitude'); title('Waveform');
subplot(2,1,2);
frameTime=frame2sampleIndex(1:frameNum, frameSize, overlap)/fs;
plot(frameTime', [volume; sumAbsDiff1; sumAbsDiff2; sumAbsDiff3; sumAbsDiff4]', '.-');
legend('Volume', 'Order-1 diff', 'Order-2 diff', 'Order-3 diff', 'Order-4 diff');
xlabel('Time (sec)');
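In the plot above, as we repeatedly take differences of frameMat, the volume of the unvoiced parts becomes more and more prominent, so it can be used to detect the presence of unvoiced sounds.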
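The effect can be reproduced on synthetic frames. In the sketch below, a loud 150 Hz sinusoid stands in for a voiced frame and a weak 5000 Hz sinusoid stands in for an unvoiced frame. Both are crude stand-ins chosen for illustration (real unvoiced sounds are noise-like), and the sample rate and amplitudes are assumptions. Each difference operation amplifies high frequencies far more than low ones, so the ratio between the two frames grows with the order.

```matlab
% Sketch: high-order differences favor high-frequency frames (all values are assumptions).
fs = 16000;                              % assumed sample rate
frameSize = 256;
t = (0:frameSize-1)'/fs;
voiced   = 0.5*sin(2*pi*150*t);          % stand-in for a loud voiced frame
unvoiced = 0.05*sin(2*pi*5000*t);        % stand-in for a weak unvoiced frame
frameMat = [voiced, unvoiced];           % one frame per column, as in Example 4

ratio = zeros(1, 5);                     % unvoiced-to-voiced ratio for orders 0 to 4
d = frameMat;
for order = 0:4
    s = sum(abs(d));                     % order 0 is the plain volume
    ratio(order+1) = s(2)/s(1);
    d = diff(d);                         % difference along each column
end
disp(ratio);
```

At order 0 the weak high-frequency frame has a much smaller volume than the loud low-frequency one; after one difference the ordering is already reversed, and the gap widens with each further order.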
In the SAP Toolbox, epdByVolHod.m performs endpoint detection using volume and HOD (high-order difference); see the following usage example.
Example 5: epdByVolHod01.m
waveFile='singaporeIsAFinePlace.wav';
au=myAudioRead(waveFile);
opt=endPointDetect('defaultOpt');
opt.method='volHod';
showPlot=1;
endPoint=endPointDetect(au, opt, showPlot);
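In general, using volume together with HOD gives good endpoint detection for ordinary recordings. However, this method also has a weakness: when the environmental noise is too strong, its accuracy drops as well.
Of course, there are many other time-domain methods; it all depends on your ingenuity!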
Audio Signal Processing and Recognition (音訊處理與辨識)