22-2 Acoustic Features of Audio Signals

When analyzing an audio signal, we usually cut it into shorter units called frames. A frame must usually contain several fundamental periods in order to capture the characteristics of the signal adequately. We can then extract acoustic features from each frame for further analysis. Neighboring frames are usually allowed to overlap, and the number of frames per second is called the frame rate; the higher the frame rate, the more computing resources are required. The following figure illustrates cutting multiple frames out of an audio signal:


Figure 5: Cutting frames out of an audio signal.
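
The relationship between frame size, overlap, and frame rate can be made concrete with a short sketch. The values below (an 8 kHz sampling rate, 256-sample frames, a 128-sample overlap) and all variable names are assumptions chosen for illustration, and the random signal merely stands in for real audio.

```matlab
% Frame blocking sketch (illustrative values, not taken from the text)
fs = 8000;                      % assumed sampling rate in Hz
frameSize = 256;                % samples per frame
overlap = 128;                  % samples shared by neighboring frames
hop = frameSize - overlap;      % samples between successive frame starts
frameRate = fs / hop;           % frames per second

signal = randn(fs, 1);          % one second of noise standing in for audio
frameCount = floor((length(signal) - overlap) / hop);

% Store each frame as one column of a matrix
frames = zeros(frameSize, frameCount);
for i = 1:frameCount
    startIndex = (i - 1) * hop + 1;
    frames(:, i) = signal(startIndex:startIndex + frameSize - 1);
end

fprintf('Frame rate = %g frames/sec, frame count = %d\n', frameRate, frameCount);
```

A larger overlap shortens the hop and therefore raises the frame rate, which is why heavier overlap costs more computation.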

On hearing an audio signal, the human ear immediately perceives its volume, pitch, and timbre. To analyze audio by computer, however, we must describe these properties with mathematical formulas that "approximate" human perception. The numerical values or vectors extracted from each frame in this way are called acoustic features, as explained below.
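
As one illustration of describing a perceptual property with a formula, the sketch below computes the volume of a single frame in two commonly used ways: the sum of absolute sample values, and 10 times the base-10 logarithm of the sum of squared samples (in decibels). These particular formulas are assumptions here, since this section does not define volume precisely; the synthetic sine frame and all variable names are likewise illustrative.

```matlab
% Volume of one frame, two common definitions (assumed, not from the text)
fs = 8000;                              % assumed sampling rate in Hz
frameSize = 256;
n = (0:frameSize-1)';
frame = 0.5 * sin(2*pi*200*n/fs);       % synthetic 200 Hz tone, amplitude 0.5
frame = frame - mean(frame);            % remove any DC offset first

volumeAbs = sum(abs(frame));            % absolute-sum volume
volumeDb = 10 * log10(sum(frame.^2));   % log-energy volume in dB

% Doubling the amplitude doubles volumeAbs and adds about 6.02 dB to volumeDb
louder = 2 * frame;
louderAbs = sum(abs(louder));
louderDb = 10 * log10(sum(louder.^2));
fprintf('abs-sum: %g -> %g, dB: %g -> %g\n', volumeAbs, louderAbs, volumeDb, louderDb);
```

The decibel form compresses the range logarithmically, which is often considered closer to how loudness is perceived.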

In the time-domain waveform, the acoustic features above appear as follows:


Figure 5: Acoustic features as they appear in the time domain.

Example 3: frameDisp4fea01.m

% myAudioRead and boxOverlay are not built-in MATLAB functions;
% they are assumed to come from the toolboxes that accompany this book.
auFile='taiwanUniversity.wav';
au=myAudioRead(auFile);
n=2; au.fs=au.fs/n; au.signal=au.signal(1:n:end, 1);	% Down sampling
index1=round(0.66*au.fs);
frameSize=256;
index2=index1+frameSize-1;
frame=au.signal(index1:index2);	% Take the frame for display
audiowrite('frame.wav', frame, au.fs);
% Upper plot: the whole signal, with the selected frame marked
subplot(2,1,1); plot(au.signal); grid on
xlabel('Sample index'); ylabel('Amplitude'); title(auFile);
axis([1, length(au.signal), -1 1]);
line(index1*[1 1], [-1 1], 'color', 'm', 'linewidth', 1);
line(index2*[1 1], [-1 1], 'color', 'm', 'linewidth', 1);
% Lower plot: close-up of the selected frame
subplot(2,1,2); plot(frame, '.-'); grid on
xlabel('Sample index within the frame'); ylabel('Amplitude');
axis([1, length(frame), -1 1]);
message=sprintf('Timbre:\nWaveform in a fundamental period');
boxOverlay([60.5 -0.5 52 1.2], 'r', 1, message, 'top');
subplot(211); loc1=get(gca, 'position');
subplot(212); loc2=get(gca, 'position');
%% ===== arrow 1 for closeup
x1=[loc1(1)+(index1(1)-1)/(length(au.signal)-1)*loc1(3), loc2(1)];
y1=[loc1(2), loc2(2)+loc2(4)];
ah=annotation('arrow', x1, y1, 'color', 'm', 'linewidth', 1);
%% ====== arrow 2 for closeup
x2=[loc1(1)+(index2-1)/(length(au.signal)-1)*loc1(3), loc2(1)+loc2(3)];
y2=[loc1(2), loc2(2)+loc2(4)];
ah=annotation('arrow', x2, y2, 'color', 'm', 'linewidth', 1);
%% Double arrow for fundamental period
axisLimit=axis;		% axisLimit=[1, 256, -1, 1]
xPos=[171, 223];
yPos=[0.65, 0.65];
% Convert data coordinates to normalized figure coordinates for annotation()
xRel=loc2(1)+(xPos-axisLimit(1))/(axisLimit(2)-axisLimit(1))*loc2(3);
yRel=loc2(2)+(yPos-axisLimit(3))/(axisLimit(4)-axisLimit(3))*loc2(4);
ah=annotation('doublearrow', xRel, yRel, 'color', 'r');
textH=text(mean(xPos), mean(yPos), 'Fundamental period', 'horizontal', 'center', 'vertical', 'bottom');
%% Double arrow for volume
axisLimit=axis;		% axisLimit=[1, 256, -1, 1]
xPos=axisLimit(2)+[1, 1];
yPos=[min(frame), max(frame)];
xRel=loc2(1)+(xPos-axisLimit(1))/(axisLimit(2)-axisLimit(1))*loc2(3);
yRel=loc2(2)+(yPos-axisLimit(3))/(axisLimit(4)-axisLimit(3))*loc2(4);
ah=annotation('doublearrow', xRel, yRel, 'color', 'r');
textH=text(mean(xPos), mean(yPos), 'Volume', 'horizontal', 'center', 'vertical', 'top', 'rotation', 90);
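
Example 3 marks the fundamental period by hand (samples 171 to 223 of the frame, a span of 52 samples). As a rough sketch of how a computer might find such a period automatically, the code below applies autocorrelation to a synthetic frame. Autocorrelation is a technique swapped in here for illustration and is not what the example itself does; the 8 kHz sampling rate, the 60 to 500 Hz search range, and the synthetic waveform are all assumptions.

```matlab
% Pitch estimation by autocorrelation (illustrative sketch on synthetic data)
fs = 8000;                       % assumed sampling rate in Hz
frameSize = 256;
truePeriod = 52;                 % period of the synthetic frame, in samples
n = (0:frameSize-1)';
frame = sin(2*pi*n/truePeriod) + 0.5*sin(4*pi*n/truePeriod);

% Search only lags that correspond to plausible pitch values
minLag = round(fs/500);          % shortest period considered (500 Hz)
maxLag = round(fs/60);           % longest period considered (60 Hz)
acf = zeros(maxLag, 1);
for lag = minLag:maxLag
    % Inner product of the frame with a copy of itself shifted by lag samples
    acf(lag) = sum(frame(1:end-lag) .* frame(1+lag:end));
end

[~, bestLag] = max(acf);         % lag with the strongest self-similarity
pitch = fs / bestLag;            % fundamental frequency in Hz
fprintf('Estimated period = %d samples, pitch = %.2f Hz\n', bestLag, pitch);
```

Restricting the lag range keeps the trivial peak near lag zero out of the search; on real speech, a more robust method would usually be needed.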

If we use the FFT to convert the signal in a frame into a magnitude spectrum, the acoustic features above appear as follows:


Figure 5: Acoustic features as they appear in the frequency domain.

Example 4: frameDisp4fea02.m

% myAudioRead and fftOneSide are not built-in MATLAB functions; they are
% assumed to come from the toolboxes that accompany this book.
% hanning requires the Signal Processing Toolbox.
waveFile='taiwanUniversity.wav';
au=myAudioRead(waveFile);
n=2; au.fs=au.fs/n; au.signal=au.signal(1:n:end, 1);	% Down sampling
index1=round(0.66*au.fs);
frameSize=256;
index2=index1+frameSize-1;
frame=au.signal(index1:index2);	% Take the frame for display
frameBasic=frame.*hanning(frameSize);
% Zero-pad to 8 times the frame size for a finer frequency grid
frameExtended=zeros(8*frameSize, 1);
frameExtended(1:frameSize)=frameBasic;
[magSpec, phaseSpec, freqVec, powerSpecInDb]=fftOneSide(frameExtended, au.fs);
plot(freqVec, powerSpecInDb); grid on
title('Power spectrum'); xlabel('Frequency (Hz)'); ylabel('Power (dB)'); axis tight
% Fit a polynomial to the power spectrum to obtain a smoothed envelope
order=10;
freqNormalized=freqVec/max(freqVec);
p=polyfit(freqNormalized, powerSpecInDb, order);
f=polyval(p, freqNormalized);
line(freqVec, f, 'color', 'g', 'linewidth', 2);
%% Plot FF
axisLimit=axis;
axisLoc=get(gca, 'position');
xPos=[945.3125, 1085.9375];
yPos=[6.9, 6.9];
xRel=axisLoc(1)+((xPos-axisLimit(1))/(axisLimit(2)-axisLimit(1)))*axisLoc(3);
yRel=axisLoc(2)+((yPos-axisLimit(3))/(axisLimit(4)-axisLimit(3)))*axisLoc(4);
ah=annotation('doublearrow', xRel, yRel, 'color', 'r');
textH=text(mean(xPos), mean(yPos), 'FF', 'horizontal', 'center', 'vertical', 'bottom');
%% Plot first formant
[maxValue, maxId]=max(f);
xPos=[freqVec(maxId)+100, freqVec(maxId)];
yPos=[maxValue+20, maxValue];
line(xPos(2), yPos(2), 'marker', 'o', 'color', 'r');
xRel=axisLoc(1)+((xPos-axisLimit(1))/(axisLimit(2)-axisLimit(1)))*axisLoc(3);
yRel=axisLoc(2)+((yPos-axisLimit(3))/(axisLimit(4)-axisLimit(3)))*axisLoc(4);
ah=annotation('arrow', xRel, yRel, 'color', 'r');
textH=text(xPos(1), yPos(1), 'First formant', 'horizontal', 'center', 'vertical', 'bottom');
%% Plot second formant
f(1:round(length(f)/2))=-inf;	% Mask the lower half so that max() finds the next peak
[maxValue, maxId]=max(f);
xPos=[freqVec(maxId)+100, freqVec(maxId)];
yPos=[maxValue+30, maxValue];
line(xPos(2), yPos(2), 'marker', 'o', 'color', 'r');
xRel=axisLoc(1)+((xPos-axisLimit(1))/(axisLimit(2)-axisLimit(1)))*axisLoc(3);
yRel=axisLoc(2)+((yPos-axisLimit(3))/(axisLimit(4)-axisLimit(3)))*axisLoc(4);
ah=annotation('arrow', xRel, yRel, 'color', 'r');
textH=text(xPos(1), yPos(1), 'Second formant', 'horizontal', 'center', 'vertical', 'bottom');
%% Plot timbre curve
point=[1184, -41.83];
xPos=[point(1)+300, point(1)];
yPos=[point(2)+25, point(2)];
xRel=axisLoc(1)+((xPos-axisLimit(1))/(axisLimit(2)-axisLimit(1)))*axisLoc(3);
yRel=axisLoc(2)+((yPos-axisLimit(3))/(axisLimit(4)-axisLimit(3)))*axisLoc(4);
ah=annotation('arrow', xRel, yRel, 'color', 'r');
textH=text(xPos(1), yPos(1), 'Timbre: Smoothed power spectrum', 'horizontal', 'center', 'vertical', 'bottom');
%% Plot energy
point=[freqVec(1), powerSpecInDb(1)];
line(point(1), point(2), 'marker', 'o', 'color', 'r');
xPos=[point(1)+300, point(1)];
yPos=[point(2)-25, point(2)];
xRel=axisLoc(1)+((xPos-axisLimit(1))/(axisLimit(2)-axisLimit(1)))*axisLoc(3);
yRel=axisLoc(2)+((yPos-axisLimit(3))/(axisLimit(4)-axisLimit(3)))*axisLoc(4);
ah=annotation('arrow', xRel, yRel, 'color', 'r');
textH=text(xPos(1), yPos(1), 'Energy (volume)', 'horizontal', 'center', 'vertical', 'top');
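
Example 4 relies on the toolbox function fftOneSide. For readers without that toolbox, the sketch below shows one plausible way to obtain a one-sided spectrum with only the built-in fft function. It is an approximation under stated assumptions and not the toolbox implementation: the window formula is written out by hand (it should match hanning(frameSize)), the input is a synthetic 500 Hz tone, and the variable names merely mirror those in the example.

```matlab
% One-sided spectrum with built-in fft (sketch; not the toolbox fftOneSide)
fs = 8000;                          % assumed sampling rate in Hz
frameSize = 256;
toneFreq = 500;                     % synthetic test tone in Hz
n = (0:frameSize-1)';
frame = sin(2*pi*toneFreq*n/fs);

% Hann window written out explicitly to avoid a toolbox dependency
window = 0.5 - 0.5*cos(2*pi*(1:frameSize)'/(frameSize+1));

fftSize = 8 * frameSize;            % fft zero-pads the input to this length
spectrum = fft(frame .* window, fftSize);

halfCount = fftSize/2 + 1;          % keep bins from 0 Hz up to fs/2
magSpec = abs(spectrum(1:halfCount));
freqVec = (0:halfCount-1)' * fs / fftSize;
powerSpecInDb = 20 * log10(magSpec + eps);   % eps guards against log of zero

[~, peakIndex] = max(magSpec);
peakFreq = freqVec(peakIndex);
fprintf('Spectral peak at %.2f Hz\n', peakFreq);
```

Zero-padding does not add information, but it samples the spectrum on a finer frequency grid, which makes peaks easier to locate on a plot.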

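Once an audio signal has been cut into a set of frames, we can extract acoustic features from each frame. A feature may be a single value, such as volume or pitch, or a vector, such as a spectrum or MFCC. Different applications require different acoustic features, and the computer must be able to compute them automatically before any subsequent analysis or classification can proceed. The following sections describe the various applications of audio recognition, along with the acoustic features and machine learning methods they may use.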
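
To make the distinction between scalar and vector features concrete, the sketch below extracts one of each from every frame of a synthetic signal: an absolute-sum volume (one number per frame) and a one-sided magnitude spectrum (one column vector per frame). The feature definitions, parameter values, and variable names are assumptions for illustration only.

```matlab
% Per-frame scalar and vector features (illustrative sketch)
fs = 8000;                              % assumed sampling rate in Hz
frameSize = 256;
overlap = 128;
hop = frameSize - overlap;
n = (0:fs-1)';
signal = 0.3 * sin(2*pi*220*n/fs);      % one second of a synthetic 220 Hz tone

frameCount = floor((length(signal) - overlap) / hop);
binCount = frameSize/2 + 1;             % bins from 0 Hz up to fs/2
volume = zeros(1, frameCount);          % scalar feature: one value per frame
magSpec = zeros(binCount, frameCount);  % vector feature: one column per frame

for i = 1:frameCount
    startIndex = (i - 1) * hop + 1;
    frame = signal(startIndex:startIndex + frameSize - 1);
    volume(i) = sum(abs(frame - mean(frame)));
    spectrum = abs(fft(frame));
    magSpec(:, i) = spectrum(1:binCount);
end

fprintf('volume is %d x %d, magSpec is %d x %d\n', size(volume), size(magSpec));
```

A scalar feature thus becomes a sequence over time, while a vector feature becomes a matrix with one column per frame; later analysis or classification would operate on these arrays.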

Exercises

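  1. An audio signal has a sampling rate of 16 kHz and a frame length of 320 samples. Answer the following questions:
    1. If the overlap between frames is 120 samples, what is the corresponding frame rate?
    2. If the frame rate is 100 frames/sec, how many samples should the overlap between frames be?
  2. Suppose I extract a frame from my speech signal, as shown in the figure below. If the sampling rate is 8 kHz, compute the fundamental frequency of this frame. (When selecting fundamental periods for averaging, use as many fundamental periods as possible to obtain a stable estimate.)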


    Figure 5: Compute the fundamental frequency from this frame.

  3. Search the web for information to explain the following terms (using mathematical equations wherever possible), and describe when these phenomena are encountered in daily life:
    1. Beat frequency
    2. Doppler effect

Audio Signal Processing and Recognition