3-2 Basic Acoustic Features (基本聲學特徵)

(Please note: the Chinese version is not updated in sync with the English version!)

When we analyze sound, we usually rely on short-term analysis, because an audio signal is relatively stable over a short period of time. We typically cut the sound into frames first, each about 20 ms long, and then analyze the signal within each frame. Within a given frame, the three main acoustic features we can observe are described as follows:

These features can be illustrated graphically as follows:

Taking the human voice as an example, the physical meanings of these speech features are as follows:

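The extraction and analysis of these speech features are explained in detail in later chapters. Note in particular that these features all describe "what the human ear perceives", and there is no fixed mathematical formula for them. When we try to "quantify" these features, we do so only on the basis of data and experience, in order to approximate human perception as closely as possible. This does not mean that the "quantified" values or formulas fully represent the characteristics of the sound.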

The basic procedure for audio feature extraction is as follows:

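  1. Cut the audio signal into frames, each about 20~30 ms long. If the frame is too large, it cannot capture how the audio changes over time; conversely, if the frame is too small, it cannot capture the characteristics of the audio. In general, a frame must be long enough to contain several fundamental periods of the audio signal. (In addition, the frame length is usually a power of 2. If it is not, the frame must be zero-padded to a power of 2 before the Fourier transform so that the fast Fourier transform can be used.)
  2. If the change between adjacent frames should not be too large, frames may be allowed to overlap. The overlap can range from 1/2 to 2/3 of the frame length. (The larger the overlap, the greater the computational load.)
  3. Assuming the audio signal within a frame is stationary, extract features from the frame, such as zero-crossing rate, volume, pitch, MFCC parameters, and LPC parameters.
  4. Perform endpoint detection based on zero-crossing rate, volume, pitch, and so on, and keep the feature information within the endpoints for analysis or recognition.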
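
The first three steps above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not the implementation used in this book. The function names (`frame_signal`, `volume`, `zero_crossing_rate`, `next_power_of_two`), the 200 Hz test tone, and the particular definitions of volume (sum of absolute values after removing the mean) and zero-crossing rate (count of sign changes) are all assumptions chosen for illustration.

```python
import math

def frame_signal(signal, frame_size, overlap):
    """Split a list of samples into overlapping frames.

    An incomplete frame at the end of the signal is dropped.
    """
    if not 0 <= overlap < frame_size:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_size")
    hop = frame_size - overlap  # distance between the starts of two frames
    last_start = len(signal) - frame_size
    return [signal[i:i + frame_size] for i in range(0, last_start + 1, hop)]

def volume(frame):
    """One common volume measure (an assumption here): sum of absolute
    sample values after subtracting the frame mean."""
    mean = sum(frame) / len(frame)
    return sum(abs(x - mean) for x in frame)

def zero_crossing_rate(frame):
    """Count sign changes between consecutive samples in the frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

def next_power_of_two(n):
    """Smallest power of 2 that is >= n, i.e. the zero-padded FFT length."""
    return 1 << (n - 1).bit_length()

# Hypothetical test signal: one second of a 200 Hz sine wave at 16 kHz.
fs = 16000
signal = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]

# 25 ms frames (400 samples) with a 15 ms overlap (240 samples).
frames = frame_signal(signal, frame_size=400, overlap=240)
features = [(volume(f), zero_crossing_rate(f)) for f in frames]
fft_size = next_power_of_two(400)  # a 400-sample frame is padded to this length
```

Each entry of `features` holds the volume and zero-crossing count of one frame. Endpoint detection (step 4) would then threshold these per-frame values, which is left out of this sketch.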

Several terms are commonly used in the analysis described above. They are explained as follows:

Hint
Note that these terms are not standardized. Some papers use "frame step" to mean hop size or frame rate instead. Be careful when reading papers that use these terms.

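For example, if the sampling rate is fs = 16000 and each frame spans 25 ms with an overlap of 15 ms, then each frame contains 16000 × 0.025 = 400 samples, adjacent frames overlap by 16000 × 0.015 = 240 samples, and consecutive frames start 400 - 240 = 160 samples (10 ms) apart, which gives 16000 / 160 = 100 frames per second.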
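
The conversion from milliseconds to sample counts can be scripted for other settings. The sketch below is illustrative only: the helper `frame_parameters` and its dictionary keys are hypothetical names, and "hop size" is taken to mean frame size minus overlap, following the usage mentioned in the hint above.

```python
def frame_parameters(fs, frame_ms, overlap_ms):
    """Convert frame duration and overlap (in ms) to sample-based quantities."""
    frame_size = round(fs * frame_ms / 1000)   # samples per frame
    overlap = round(fs * overlap_ms / 1000)    # samples shared by adjacent frames
    hop_size = frame_size - overlap            # samples between frame starts
    if hop_size <= 0:
        raise ValueError("overlap must be shorter than the frame")
    frame_rate = fs / hop_size                 # frames per second
    return {
        "frame_size": frame_size,
        "overlap": overlap,
        "hop_size": hop_size,
        "frame_rate": frame_rate,
    }

params = frame_parameters(fs=16000, frame_ms=25, overlap_ms=15)
print(params)
```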


Audio Signal Processing and Recognition (音訊處理與辨識)