6-2 EPD in Time Domain (Endpoint Detection: Time-Domain Methods)

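(Note: the Chinese version has not been updated in sync with the English version.)

We first introduce how to perform endpoint detection (EPD) in the time domain.

The first method uses volume directly for endpoint detection. It is the simplest approach: whenever the volume of a frame falls below a certain threshold, we regard that frame as silence or noise. As for how to choose the threshold, intuition can help, but the more objective approach is to determine the best value from a large amount of test data.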

Hint
When computing the volume, always remember to apply zero justification (subtracting the mean of the signal) first.
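
To see why this matters, here is a minimal sketch on a synthetic signal. The signal, the DC offset of 0.2, and the use of the sum of absolute values as volume are assumptions made for illustration; they are not taken from the SAP Toolbox.

```matlab
% Synthetic recording (all values are made up for illustration):
% 0.5 s of silence followed by 0.5 s of a 200 Hz tone, plus a DC offset.
fs = 8000;
t = (0:fs/2-1)'/fs;
y = [zeros(fs/2, 1); 0.5*sin(2*pi*200*t)] + 0.2;   % DC offset of 0.2

frameSize = 256;
firstFrame = y(1:frameSize);                  % lies entirely in the silence
rawVolume = sum(abs(firstFrame));             % 256*0.2 = 51.2, looks "loud"

yZero = y - mean(y);                          % zero justification
fixedVolume = sum(abs(yZero(1:frameSize)));   % close to 0, as silence should be

fprintf('Volume of a silent frame: %.4f before, %.4f after\n', rawVolume, fixedVolume);
```

Without the mean subtraction, the silent frame would exceed any threshold below 51.2, so the offset alone could be mistaken for sound.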

The following example uses volume to perform endpoint detection on sunday.wav. In this example we compute the volume threshold in four different ways and plot the results for inspection:

Example 1: epdByVolTh01.m
waveFile='sunday.wav';
au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
frameSize = 256;
overlap = 128;
y=y-mean(y);                                % zero-mean subtraction
frameMat=buffer2(y, frameSize, overlap);    % frame blocking
frameNum=size(frameMat, 2);                 % no. of frames
volume=frame2volume(frameMat);              % volume
volumeTh1=max(volume)*0.1;                  % volume threshold 1
volumeTh2=median(volume)*0.1;               % volume threshold 2
volumeTh3=min(volume)*10;                   % volume threshold 3
volumeTh4=volume(1)*5;                      % volume threshold 4
index1 = find(volume>volumeTh1);
index2 = find(volume>volumeTh2);
index3 = find(volume>volumeTh3);
index4 = find(volume>volumeTh4);
endPoint1=frame2sampleIndex([index1(1), index1(end)], frameSize, overlap);
endPoint2=frame2sampleIndex([index2(1), index2(end)], frameSize, overlap);
endPoint3=frame2sampleIndex([index3(1), index3(end)], frameSize, overlap);
endPoint4=frame2sampleIndex([index4(1), index4(end)], frameSize, overlap);

subplot(2,1,1);
time=(1:length(y))/fs;
plot(time, y);
ylabel('Amplitude'); title('Waveform');
axis([-inf inf -1 1]);
line(time(endPoint1(  1))*[1 1], [-1, 1], 'color', 'm');
line(time(endPoint2(  1))*[1 1], [-1, 1], 'color', 'g');
line(time(endPoint3(  1))*[1 1], [-1, 1], 'color', 'k');
line(time(endPoint4(  1))*[1 1], [-1, 1], 'color', 'r');
line(time(endPoint1(end))*[1 1], [-1, 1], 'color', 'm');
line(time(endPoint2(end))*[1 1], [-1, 1], 'color', 'g');
line(time(endPoint3(end))*[1 1], [-1, 1], 'color', 'k');
line(time(endPoint4(end))*[1 1], [-1, 1], 'color', 'r');
legend('Waveform', 'Boundaries by threshold 1', 'Boundaries by threshold 2', 'Boundaries by threshold 3', 'Boundaries by threshold 4');

subplot(2,1,2);
frameTime=frame2sampleIndex(1:frameNum, frameSize, overlap);
plot(frameTime, volume, '.-');
ylabel('Sum of Abs.'); title('Volume');
axis tight;
line([min(frameTime), max(frameTime)], volumeTh1*[1 1], 'color', 'm');
line([min(frameTime), max(frameTime)], volumeTh2*[1 1], 'color', 'g');
line([min(frameTime), max(frameTime)], volumeTh3*[1 1], 'color', 'k');
line([min(frameTime), max(frameTime)], volumeTh4*[1 1], 'color', 'r');
legend('Volume', 'Threshold 1', 'Threshold 2', 'Threshold 3', 'Threshold 4');

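In the above example, we used four volume thresholds for endpoint detection:

  1. 0.1 times the maximum volume: this fails when the volume fluctuates widely or the noise is too strong.
  2. 0.1 times the median volume: this is unreliable when silence takes up more than half of the frames, because the median is then itself a silence-level volume.
  3. 10 times the minimum volume: this fails when the noise is too strong.
  4. 5 times the volume of the first frame: this assumes that the recording starts with silence. If there is sound right at the start, or the recording equipment produces an offset at the beginning, this approach easily fails.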
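
The effect of strong noise on the maximum-based and minimum-based thresholds can be sketched with a small made-up volume curve. The numbers below are hypothetical and only meant to show the failure mode; the constants 0.1 and 10 are the ones used in Example 1.

```matlab
% Hypothetical per-frame volume: background noise near 2, speech near 15.
volume = [2.1 1.9 2.0 2.2 14.8 15.0 14.5 2.0 1.9 2.1];

thMax = max(volume)*0.1;    % 1.5, which is below the noise floor
thMin = min(volume)*10;     % 19, which is above the loudest speech frame

idxMax = find(volume > thMax);   % every frame passes: noise is taken as sound
idxMin = find(volume > thMin);   % no frame passes: the speech is missed

fprintf('max-based threshold keeps %d of %d frames\n', numel(idxMax), numel(volume));
fprintf('min-based threshold keeps %d of %d frames\n', numel(idxMin), numel(volume));
```

With a quieter background the same constants would work, which is why they have to be tuned on data that resembles the target recordings.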

Example 2: epdByVol01.m
waveFile='singaporeIsAFinePlace.wav';
au=myAudioRead(waveFile);
opt=endPointDetect('defaultOpt');
opt.method='vol';
showPlot=1;
endPoint=endPointDetect(au, opt, showPlot);

volTh=(volMax-volMin)*epdPrm.volRatio+volMin;

Of course, the parameter values used above (0.1, 10, 5, and so on) only suit this particular audio file. Finding values that also work well for other recordings requires testing on a large amount of data.

You can also rely on your own creativity and devise a better way of setting the volume threshold, for example a weighted average of the maximum and minimum volume.
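
As a minimal sketch of such a weighted average, the snippet below places the threshold a fixed fraction of the way from the minimum to the maximum volume, in the same form as the volTh formula shown above. The volume values and the ratio of 0.1 are assumptions chosen for illustration.

```matlab
% Hypothetical per-frame volume: background noise near 2, speech near 15.
volume = [2.1 1.9 2.0 2.2 14.8 15.0 14.5 2.0 1.9 2.1];
volRatio = 0.1;                      % assumed weight, to be tuned on data

volMin = min(volume);
volMax = max(volume);
% Same as the weighted average (1-volRatio)*volMin + volRatio*volMax
volTh = (volMax - volMin)*volRatio + volMin;

soundIndex = find(volume > volTh);   % frames regarded as sound
fprintf('Threshold = %.2f, sound frames = %s\n', volTh, mat2str(soundIndex));
```

Because the threshold is anchored to both extremes, it stays above the noise floor and below the speech level in this example, where the purely maximum-based or minimum-based thresholds may not.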

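If the recording is clean and the noise is low, volume-based endpoint detection works well. The simple method breaks down, however, under conditions such as the strong noise or widely fluctuating volume mentioned above. In those cases it is hard to pick a single volume threshold, and the accuracy of endpoint detection drops. In addition, for endpoint detection in general, more precise endpoints can be obtained by increasing the overlap between frames, at the cost of more computation.

The second commonly used method relies on both volume and zero-crossing rate (ZCR), and can be summarized as follows: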
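  1. Use the upper volume threshold (tu) to determine the initial endpoints.
  2. Extend the endpoints outward, in both directions, until the volume falls to the lower volume threshold (tl).
  3. Extend the endpoints further outward until the zero-crossing rate falls to the ZCR threshold (tzc), so that the unvoiced parts of the speech are included.
This method uses only three parameters (tu, tl, tzc). With enough computing power, they can be tuned by a search method such as grid search over a set of labeled training data; otherwise they have to be set by observation and experience. The method is illustrated as follows:
[Figure: illustration of endpoint detection with the thresholds tu, tl, and tzc]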
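
The three steps can be sketched on per-frame feature vectors as follows. This is only an illustration under simplifying assumptions: the volume and ZCR values and the three thresholds are made up, a single sound segment is assumed, and at least one frame is assumed to exceed tu. It is not the implementation used in the SAP Toolbox.

```matlab
% Hypothetical per-frame features (one value per frame).
volume = [1 1  1  2  6 12 12 12 5  2 1 1];
zcr    = [3 3 20 22 10  5  5  5 6 18 4 3];
tu = 10; tl = 4; tzc = 15;           % assumed thresholds
n = length(volume);

% Step 1: initial endpoints from the upper volume threshold
idx = find(volume > tu);
first = idx(1);
last = idx(end);

% Step 2: extend outward while the volume stays above the lower threshold
while first > 1 && volume(first-1) > tl, first = first - 1; end
while last < n && volume(last+1) > tl, last = last + 1; end
afterStep2 = [first last];

% Step 3: extend outward while the ZCR stays above the ZCR threshold
while first > 1 && zcr(first-1) > tzc, first = first - 1; end
while last < n && zcr(last+1) > tzc, last = last + 1; end
endPoint = [first last];

fprintf('After step 2: frames %d to %d\n', afterStep2(1), afterStep2(2));
fprintf('After step 3: frames %d to %d\n', endPoint(1), endPoint(2));
```

In this example step 3 pulls in frames 3, 4, and 10, which have low volume but a high zero-crossing rate, the typical signature of unvoiced sounds.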

Hint
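The above method is better suited to speech recognition. For melody recognition there is no need to consider the zero-crossing rate: unvoiced sounds have no pitch in the first place, so they do not affect melody recognition.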

We have a function for endpoint detection. Applying it to singaporeIsAFinePlace.wav gives the following result:

Example 3: epdByVolZcr01.m
waveFile='singaporeIsAFinePlace.wav';
au=myAudioRead(waveFile);
opt=endPointDetect('defaultOpt');
opt.method='volZcr';
showPlot=1;
endPoint=endPointDetect(au, opt, showPlot);

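In the plot, the red line marks the start of the sound and the green line marks its end. This example uses the function endPointDetect.m in the SAP Toolbox, which (with opt.method='volZcr') uses volume and zero-crossing rate to determine the endpoints.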

Now it should be obvious that the most difficult part of EPD is to reliably distinguish unvoiced sounds from silence. One way to achieve this is to use the high-order difference (HOD) of the waveform as a time-domain feature: we difference the waveform repeatedly and then compute the volume, which makes the unvoiced parts stand out. For instance, the following example applies order-1 to order-4 differences to the waveform of singaporeIsAFinePlace.wav:

Example 4: highOrderDiff01.m
waveFile='singaporeIsAFinePlace.wav';
au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
frameSize = 256;
overlap = 128;
y=y-mean(y);                                % zero-mean subtraction
frameMat=buffer2(y, frameSize, overlap);    % frame blocking
frameNum=size(frameMat, 2);                 % no. of frames
volume=frame2volume(frameMat);
sumAbsDiff1=sum(abs(diff(frameMat)));
sumAbsDiff2=sum(abs(diff(diff(frameMat))));
sumAbsDiff3=sum(abs(diff(diff(diff(frameMat)))));
sumAbsDiff4=sum(abs(diff(diff(diff(diff(frameMat))))));

subplot(2,1,1);
time=(1:length(y))/fs;
plot(time, y); ylabel('Amplitude'); title('Waveform');
subplot(2,1,2);
frameTime=frame2sampleIndex(1:frameNum, frameSize, overlap)/fs;
plot(frameTime', [volume; sumAbsDiff1; sumAbsDiff2; sumAbsDiff3; sumAbsDiff4]', '.-');
legend('Volume', 'Order-1 diff', 'Order-2 diff', 'Order-3 diff', 'Order-4 diff');
xlabel('Time (sec)');

In the above plot, the more times we difference frameMat, the more prominent the volume of the unvoiced sounds becomes, so this feature can be used to detect the presence of unvoiced sounds.
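
The reason is that differencing amplifies high frequencies relative to low ones. The following sketch shows this on two synthetic frames. Both are assumptions made for illustration: a loud 200 Hz sine stands in for a voiced sound, and a weak 3000 Hz sine is a crude, deterministic stand-in for a noise-like unvoiced sound.

```matlab
fs = 8000;
frameSize = 256;
t = (0:frameSize-1)'/fs;
voiced   = 0.5*sin(2*pi*200*t);      % loud, low-frequency frame
unvoiced = 0.05*sin(2*pi*3000*t);    % weak, high-frequency frame
frameMat = [voiced, unvoiced];       % one frame per column

% ratio(k+1) = (volume of unvoiced frame)/(volume of voiced frame)
% after differencing k times, for k = 0..3
ratio = zeros(1, 4);
d = frameMat;
for order = 0:3
    vol = sum(abs(d));               % column-wise sum of absolute values
    ratio(order+1) = vol(2)/vol(1);
    d = diff(d);                     % difference along each column
end
disp(ratio);
```

The ratio starts well below 1 and grows with every additional difference, so a frame that is nearly invisible in the plain volume curve becomes the dominant one after one or two differences.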

The function epdByVolHod.m in the SAP Toolbox uses volume and HOD for endpoint detection. See the following example:

Example 5: epdByVolHod01.m
waveFile='singaporeIsAFinePlace.wav';
au=myAudioRead(waveFile);
opt=endPointDetect('defaultOpt');
opt.method='volHod';
showPlot=1;
endPoint=endPointDetect(au, opt, showPlot);

In general, using volume together with HOD gives good endpoint detection on ordinary recordings. The method still has a weakness, though: when the environmental noise is too strong, its accuracy drops as well.

Of course, many other time-domain methods are possible. It all depends on your ingenuity!


Audio Signal Processing and Recognition (音訊處理與辨識)