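Getting computers to understand human speech has long been one of humanity's dreams. In recent years, as computers have become faster, speech recognition has become increasingly widespread. Voice features on smartphones (such as Apple's Siri voice assistant, or speech-to-text on Android phones) and smart speakers (such as Amazon Alexa and Google Home) are practical speech recognition applications that have become part of people's everyday lives.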
Speech recognition applications can be classified in different ways. The first way is by who the users of the system are:
- Speaker dependent: the system is restricted to specific users.
- Speaker independent: the system works for the general public.
The second way is by the function of the system. Ordered by difficulty, the categories are:
- Voice command: the user speaks a command, and the system finds the most likely command in a predefined command set and performs the corresponding action.
- Keyword spotting: the user speaks a sentence (for example, "Please check what the weather is like today"), and the system detects whether it contains specific content (such as "today" and "weather").
- Dictation: the user speaks a passage (for example, a news broadcast), and the system automatically produces an accurate verbatim transcript, as YouTube's caption generation system does.
- Dialog: the user talks directly with the computer. After receiving an utterance, the computer understands the user's intention and replies reasonably and correctly in speech or text. Typical voice chatbots belong to this category.
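Before building a speech recognition system, we must first split the speech signal into frames and then extract timbre-related features from each frame. The most commonly used feature in speech recognition is MFCC (mel-frequency cepstral coefficients); each frame typically yields a 13-, 26-, or 39-dimensional MFCC vector. Explanations and code for computing MFCCs are readily available online.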
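As a rough sketch of the framing step, and of one common way the 26- and 39-dimensional variants are obtained, the Python below splits a signal into overlapping frames and appends first- and second-order regression deltas to 13-dimensional vectors. The frame length (400 samples, i.e. 25 ms at 16 kHz), the hop size, the delta width of 2, and the assumption that 26/39 dimensions mean 13 base coefficients plus deltas are illustrative choices, not taken from the text; the MFCC computation itself is omitted.

```python
def split_into_frames(signal, frame_size, hop_size):
    """Split a 1-D signal (a list of samples) into overlapping frames.
    Trailing samples that do not fill a whole frame are dropped."""
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop_size)]


def deltas(features, width=2):
    """Regression-based delta coefficients of a sequence of equal-length vectors.
    Frames beyond either end are replaced by the first/last frame."""
    n, dim = len(features), len(features[0])
    denom = 2 * sum(k * k for k in range(1, width + 1))
    result = []
    for t in range(n):
        vec = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, width + 1):
                later = features[min(t + k, n - 1)][d]
                earlier = features[max(t - k, 0)][d]
                acc += k * (later - earlier)
            vec.append(acc / denom)
        result.append(vec)
    return result


def stack_39(mfcc13):
    """Append deltas and delta-deltas to 13-dim vectors, giving 39-dim vectors
    (assumed convention: 13 base + 13 delta + 13 delta-delta)."""
    d1 = deltas(mfcc13)
    d2 = deltas(d1)
    return [base + v1 + v2 for base, v1, v2 in zip(mfcc13, d1, d2)]


# 1 second at 16 kHz, 25 ms frames (400 samples), 10 ms hop (160 samples)
frames = split_into_frames([0.0] * 16000, frame_size=400, hop_size=160)
print(len(frames))  # 98
```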
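According to the classification above, the simplest speech recognition system is a "speaker-dependent voice command recognition system," which usually amounts to "comparing your own voice against your own voice." On early mobile phones (such as the Sony Ericsson T18), for example, you could prerecord several voice tags, each mapped to a phone number; say, "ramen" mapped to the number of a ramen shop. When you then said "ramen" to the phone, the system compared your input against the prerecorded audio and, if they matched, dialed the ramen shop automatically. (For a computer, however, human speech varies enormously. If Takeshi Kaneshiro's sister said "ramen" to Takeshi Kaneshiro's phone, it might not work, because the recording used internally for comparison is Takeshi Kaneshiro's voice, not his sister's.)
To build a speaker-dependent voice command recognition system, the most basic approach is to compare utterances using dynamic time warping (DTW). DTW is a dynamic programming (DP) method: it compares utterances by their timbre while locally stretching or compressing the time axis to compensate for different speaking rates, so as to achieve the best alignment. A schematic is shown below.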
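Figure 5.: The optimal path found by DTW.
In this example, the speech on both the X axis and the Y axis is "Tsing Hua University" (清華大學), but the Y-axis utterance is spoken at an even pace, whereas the X-axis utterance is fast at the beginning and slow at the end. DTW finds the best alignment between the two and thereby computes the minimum distance between the two utterances. Therefore, to build a speaker-dependent voice command recognition system, we only need to ask the user to prerecord a set of voice commands (each command can be recorded several times, say three times). When the user utters a test command, we perform endpoint detection and compute its MFCCs, then compare this MFCC sequence with the MFCCs of each prerecorded command using DTW. The command with the shortest distance is the answer.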
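A minimal DTW template matcher along the lines described above might look like the following sketch. It assumes Euclidean distance between feature vectors and the common three-move step pattern (horizontal, vertical, diagonal); the text specifies neither, and the command labels in the test are hypothetical.

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Classic DTW between two sequences of feature vectors.
    Allowed moves into cell (i, j): from (i-1, j), (i, j-1), or (i-1, j-1)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_command(test_seq, templates):
    """templates: dict mapping a command label to a list of recorded sequences.
    Returns (label, distance) of the closest recording under DTW."""
    best_label, best_dist = None, float("inf")
    for label, recordings in templates.items():
        for rec in recordings:
            d = dtw_distance(test_seq, rec)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


# The same 1-D "utterance" spoken at two speeds aligns with zero cost.
fast = [[0.0], [1.0], [2.0], [3.0]]
slow = [[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]]
print(dtw_distance(slow, fast))  # 0.0
```

Recording each command several times simply adds more entries to that label's list; the closest recording decides.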
The most complex speech recognition system is a "speaker-independent dialog system," such as Apple's Siri voice assistant, Amazon's Alexa smart speaker, and the Google Home voice assistant. These systems act like virtual assistants: they can hold simple conversations and, by understanding the user's intention, help with simple tasks such as booking train tickets or looking up the weather or movie listings. Building such a system requires more sophisticated acoustic models. The features are still MFCCs, but we use different acoustic models to represent different sounds (consonants, vowels, and so on), and use each acoustic model to compute the probability density of a given MFCC vector. For example, we can collect the vowel "ㄚ" (a) spoken by 100 people, split the recordings into frames, extract a 39-dimensional MFCC vector from each frame, and then model these MFCC vectors with a high-dimensional probability density function (PDF). The most common way to build such a model is maximum likelihood estimation (MLE). The most commonly used PDF is the GMM (Gaussian mixture model), which is a weighted average of a set of Gaussian PDFs. With MLE, we can compute the optimal GMM parameters from a given set of MFCC vectors: the mean vector and covariance matrix of each Gaussian component, plus the weighting factors of the components.
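To show what "computing the probability density of an MFCC vector" involves, here is a sketch that evaluates the log density of a GMM. It assumes diagonal covariance matrices (a common simplification for MFCCs; the text allows full covariance matrices) and works in the log domain with log-sum-exp, since densities of 39-dimensional vectors can underflow.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (var: list of variances)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))


def log_gmm_density(x, weights, means, variances):
    """log p(x) for a GMM: log of sum_k w_k * N(x; mean_k, diag(var_k)),
    computed with log-sum-exp for numerical stability."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


# Example: a 2-component GMM over 39-dim vectors (parameters made up).
x = [0.1] * 39
print(log_gmm_density(x, [0.3, 0.7], [[0.0] * 39, [1.0] * 39], [[1.0] * 39, [2.0] * 39]))
```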
Below are typical examples of modeling 1-D data with a Gaussian PDF and a GMM PDF:
Figure 5.: Example of a 1-D Gaussian PDF.
Figure 5.: Example of a 1-D GMM PDF, a weighted average of three Gaussian PDFs.
Below are typical examples of modeling 2-D data with a Gaussian PDF and a GMM PDF:
Figure 5.: Example of a 2-D Gaussian PDF.
Figure 5.: Example of a 2-D GMM PDF, a weighted average of four Gaussian PDFs.
Following the same MLE approach, we can also model 39-D MFCC vectors; it is just hard to inspect the resulting model with simple surface plots or contour plots.
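The text does not say how the MLE parameters of a GMM are actually computed. The usual choice is the expectation-maximization (EM) algorithm, which iteratively increases the likelihood and converges to a local maximum. The 1-D sketch below uses EM; the quantile-based initialization and the variance floor are our own assumptions.

```python
import math
import random


def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, iters=100, min_var=1e-6):
    """Fit a k-component 1-D GMM to `data` with EM (a local maximum-likelihood fit)."""
    n = len(data)
    ordered = sorted(data)
    # Initialization (assumption): means at evenly spaced quantiles,
    # every variance equal to the global variance, equal weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted MLE of weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no data; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(resp, data)) / nj)
    return weights, means, variances


# Synthetic data: two clusters centred at -5 and +5.
rng = random.Random(1)
data = [rng.gauss(-5, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
weights, means, variances = fit_gmm_1d(data, k=2)
```

The same E-step/M-step structure carries over to 39-D MFCC vectors, with mean vectors and (often diagonal) covariance matrices in place of scalars.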
Using a GMM as the acoustic model is still a fairly basic approach. Once we take into account that pronunciation changes over time, modeling a sound with a single PDF is unreasonable. For example, while pronouncing the vowel "ㄞ" (ai), the shape of the mouth changes continuously, essentially from "ㄚ" (a) to "ㄧ" (i). To build a more accurate acoustic model, we can switch to HMMs (hidden Markov models), which are probability density functions for describing sequences. Each HMM consists of several states; each state is a static PDF, and transitions between states are described by transition probabilities. Below is a schematic of an HMM with three states:
Figure 5.: Schematic of an HMM with three states.
For example, if we use an HMM as the acoustic model of "ㄞ", we can use three states, each represented by a GMM, with the transitions between states described by a 3x3 transition probability matrix. The parameters of this acoustic model (the parameters of the three GMMs plus the transition probability matrix) are also computed by MLE. However, since we do not know in advance which state each frame's MFCC vector belongs to, the assignment must be refined iteratively until the likelihood reaches its maximum. This method is called segmental k-means, and its steps are as follows:
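- For each utterance, use DP to assign the utterance's MFCC vectors to the states.
- For each state, compute the optimal GMM parameters from all the MFCC vectors assigned to it.
- Compute the transition probability matrix from the state assigned to each frame.
- Go back to the first step, and repeat until all parameters converge.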
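The four steps above can be sketched in Python as follows. To keep it short, this version makes several assumptions not in the text: 1-D features instead of MFCC vectors, a single Gaussian per state instead of a GMM, a left-to-right topology (each state either loops or moves to the next), uniform segmentation as the starting assignment, add-one smoothing of the transition counts, and stopping when the frame-to-state assignment no longer changes rather than testing parameter convergence directly.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF


def viterbi_align(obs, means, variances, log_trans):
    """Best state sequence for a left-to-right HMM (self-loop or step to s+1),
    starting in state 0 and ending in the last state. Requires len(obs) >= #states."""
    S, T = len(means), len(obs)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_gauss(obs[0], means[0], variances[0])
    for t in range(1, T):
        for s in range(S):
            best, arg = max((score[t - 1][p] + log_trans[p][s], p)
                            for p in (s - 1, s) if p >= 0)
            score[t][s] = best + log_gauss(obs[t], means[s], variances[s])
            back[t][s] = arg
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def segmental_kmeans(utterances, S, max_iter=20, min_var=1e-3):
    # Initial assignment: split each utterance into S equal-length segments.
    aligns = [[t * S // len(u) for t in range(len(u))] for u in utterances]
    for _ in range(max_iter):
        # Step 2: per-state Gaussian MLE (mean, variance) from the assigned frames.
        means, variances = [], []
        for s in range(S):
            xs = [x for u, a in zip(utterances, aligns) for x, st in zip(u, a) if st == s]
            m = sum(xs) / len(xs)
            means.append(m)
            variances.append(max(min_var, sum((x - m) ** 2 for x in xs) / len(xs)))
        # Step 3: transition probabilities from counts of consecutive state pairs,
        # with add-one smoothing on the allowed moves (self-loop, next state).
        counts = [[0] * S for _ in range(S)]
        for s in range(S):
            counts[s][s] = 1
            if s + 1 < S:
                counts[s][s + 1] = 1
        for a in aligns:
            for p, q in zip(a, a[1:]):
                counts[p][q] += 1
        log_trans = [[safe_log(c / sum(row)) for c in row] for row in counts]
        # Step 1 (next round): re-assign frames to states with DP (Viterbi).
        new_aligns = [viterbi_align(u, means, variances, log_trans) for u in utterances]
        if new_aligns == aligns:  # Step 4: stop once the assignment is stable
            break
        aligns = new_aligns
    return means, variances, log_trans, aligns


# Two made-up 1-D "utterances", each with three segments near 0, 5 and 10.
u1 = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 5.0, 10.0, 10.1]
u2 = [0.2, -0.2, 5.2, 4.8, 9.9, 10.0, 10.2, 9.8]
means, variances, log_trans, aligns = segmental_kmeans([u1, u2], S=3)
```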
Representing an acoustic model with an HMM usually gives better results, because an HMM can capture how a sound changes over time.
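In practical recognition systems, we usually divide all sounds more finely into basic units of pronunciation called phonemes: the smallest units of sound in human speech that distinguish one pronunciation from another. We therefore build acoustic models based on phonemes, rather than simply on the consonants or vowels of the Bopomofo (Zhuyin) phonetic symbols. For example:
- "ㄚ" can be represented by one phoneme: a
- "ㄞ" can be represented by two phonemes: a and i (that is, "ㄚ" followed by "ㄧ")
- "ㄠ" can be represented by two phonemes: a and u (that is, "ㄚ" followed by "ㄨ")
As another example, ignoring tones, "你好" (hello) is written "ㄋㄧ-ㄏㄠ" in Bopomofo and "ni-hao" in Hanyu Pinyin; in both cases the phoneme sequence is "n_i-h_a_u".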
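The conversion from pinyin syllables to phonemes can be illustrated with a tiny lookup table. The INITIALS and FINALS tables below are hypothetical and cover only the syllables used in this section; a real lexicon would cover every Mandarin syllable (and would need to try multi-letter initials such as zh, ch, sh before single letters).

```python
# Hypothetical, deliberately tiny tables covering only the examples in the text.
INITIALS = ["n", "h", "p"]
FINALS = {
    "a": ["a"],
    "ai": ["a", "i"],   # "ㄞ" = a + i
    "ao": ["a", "u"],   # "ㄠ" = a + u
    "i": ["i"],
    "an": ["a", "n"],
    "ing": ["i", "ng"],
}


def syllable_to_phonemes(syllable):
    """Split a pinyin syllable into (optional) initial + final, then look up phonemes."""
    initial = ""
    for ini in INITIALS:
        if syllable.startswith(ini) and syllable[len(ini):] in FINALS:
            initial = ini
            break
    final = syllable[len(initial):]
    if final not in FINALS:
        raise KeyError(f"unknown final in {syllable!r}")
    return ([initial] if initial else []) + FINALS[final]


def pinyin_to_phoneme_string(syllables):
    """Format as in the text: phonemes joined by '_', syllables joined by '-'."""
    return "-".join("_".join(syllable_to_phonemes(s)) for s in syllables)


print(pinyin_to_phoneme_string(["ni", "hao"]))  # n_i-h_a_u
```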
Furthermore, to capture different pronunciations more precisely, we subdivide phonemes by their context when building acoustic models. Take "平安" ("ㄆㄧㄥ-ㄢ" or "ping-an") as an example:
- monophone: acoustic models built on single phonemes, giving p-i-ng-a-n
- biphone: acoustic models built on right-context-dependent (RCD) phonemes, giving sil+p, p+i, i+ng, ng+a, a+n, n+sil (sil stands for silence)
- triphone: acoustic models built on left- and right-context-dependent phonemes, giving sil+p-i, p+i-ng, i+ng-a, ng+a-n, a+n-sil
As one would expect, monophone acoustic models are coarser, but they take less space and need less training data and computation; triphone acoustic models are finer, but they take more space and need relatively more training data and computation.
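Expanding a monophone sequence into the context-dependent units listed above is mechanical. This sketch pads the sequence with sil and reproduces the notation used in this section (a+b for biphones, a+b-c for triphones); the function names are our own.

```python
def biphones(phones, lead_sil=True):
    """Consecutive pairs of the sil-padded sequence, written 'a+b'."""
    seq = (["sil"] if lead_sil else []) + phones + ["sil"]
    return [f"{a}+{b}" for a, b in zip(seq, seq[1:])]


def triphones(phones):
    """Consecutive triples of the sil-padded sequence, written 'a+b-c'."""
    seq = ["sil"] + phones + ["sil"]
    return [f"{a}+{b}-{c}" for a, b, c in zip(seq, seq[1:], seq[2:])]


ping_an = ["p", "i", "ng", "a", "n"]
print(biphones(ping_an))   # ['sil+p', 'p+i', 'i+ng', 'ng+a', 'a+n', 'n+sil']
print(triphones(ping_an))  # ['sil+p-i', 'p+i-ng', 'i+ng-a', 'ng+a-n', 'a+n-sil']
```

Passing `lead_sil=False` drops the leading silence, which matches the "你好" biphone sequence in the next step (n+i, i+h, h+a, a+u, u+sil).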
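Therefore, for a given sentence, we can first convert it to pinyin, then convert the pinyin to a phoneme sequence, and finally turn the phoneme sequence into a concatenation of HMM acoustic models. Taking the sentence "你好" as an example, the steps for building the biphone sequence (ignoring leading silence) are:
- Convert to pinyin: 你好 ==> ㄋㄧ-ㄏㄠ or ni-hao
- Convert to phonemes: ㄋㄧ-ㄏㄠ or ni-hao ==> n_i-h_a_u
- Convert to biphones: n+i, i+h, h+a, a+u, u+sil
- Concatenate into an HMM, as shown below: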
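Figure 5.: Schematic of the HMM corresponding to "你好".
For an utterance, we first compute its sequence of MFCC vectors and then feed this sequence to the concatenated acoustic model, using Viterbi search (another DP method) to obtain the maximum likelihood of the utterance given this HMM. We can think of the process as filling in a table: once the table is filled, we know which state each frame should be assigned to in order to achieve the maximum likelihood. This is illustrated below: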
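The likelihood obtained from Viterbi search can be thought of as how well the utterance matches the sentence: the higher the likelihood, the more likely it is that the speech signal corresponds to that sentence. If there are n recognizable command sentences, we can compute n likelihoods, and the sentence with the highest likelihood is the most likely recognition result. This is the basic principle of speaker-independent voice command recognition. In an actual implementation, the table can be very large. For example, ten seconds of speech has roughly 1,000 frames, and a five-character sentence produces about 30 HMM states (assuming 6 HMM states per character on average), so 1,000 × 30 = 30,000 cells must be filled. If there are 10,000 recognizable command sentences (5 characters each on average), the whole computation requires filling 300 million cells! In practice, we usually apply various optimizations and simplifications to meet real-time requirements.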
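The table filling described above can be sketched as Viterbi search over a left-to-right HMM. The sketch assumes the log emission scores log p(frame | state) have already been computed (for example, from per-state GMMs), that each state can only loop or move to the next state, and that the path must start in the first state and end in the last; these are our assumptions for illustration.

```python
import math

NEG_INF = float("-inf")


def viterbi_left_to_right(log_emit, log_self, log_next):
    """Viterbi search over a left-to-right HMM.

    log_emit[t][s]: log p(frame t | state s), a T x S table.
    log_self[s] / log_next[s]: log prob. of staying in s / moving from s to s+1.
    The path must start in state 0 and end in state S-1.
    Returns (best log-likelihood, best state for each frame).
    """
    T, S = len(log_emit), len(log_emit[0])
    table = [[NEG_INF] * S for _ in range(T)]  # the T x S table, filled cell by cell
    back = [[0] * S for _ in range(T)]         # which state each cell came from
    table[0][0] = log_emit[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = table[t - 1][s] + log_self[s]
            move = table[t - 1][s - 1] + log_next[s - 1] if s > 0 else NEG_INF
            if stay >= move:
                table[t][s], back[t][s] = stay, s
            else:
                table[t][s], back[t][s] = move, s - 1
            table[t][s] += log_emit[t][s]
    # Trace back from the last state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return table[T - 1][S - 1], path


half = math.log(0.5)
log_emit = [[0.0, -10.0], [0.0, -10.0], [-10.0, 0.0], [-10.0, 0.0]]
score, path = viterbi_left_to_right(log_emit, [half, half], [half, NEG_INF])
print(path)  # [0, 0, 1, 1]
```

For voice command recognition, one would run this once per command sentence's concatenated HMM and keep the sentence with the highest score.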
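Figure 5.: We can use Viterbi search to assign each frame to an HMM state so as to maximize the likelihood.
To go further and perform the more complex task of dictation, we must also consider which words people actually use when they speak, and how likely those words are to occur in sequence. The mathematical model that computes these probabilities is called a language model, and in speech recognition it complements the acoustic model described earlier. Language models are typically n-gram models, where an n-gram is a sequence of n words; simply put, such a model computes the probability that a given sequence of words occurs together. Taking English as an example, suppose an utterance could be recognized as either of two sentences: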
- It's hard to recognize speech.
- It's hard to wreck a nice beach.
These two sentences sound quite similar, but with a language model we know that people are far less likely to say "wreck a nice beach" than "recognize speech," so the computer should choose the first sentence as the recognition result, which is indeed the correct answer. In practice, we usually also build more complex data structures from trees or graphs (for example, a word lattice) and perform various searches and optimizations on them, so that the recognition result can be returned within an acceptable time (say, one second). This involves many details of data structures and algorithms that are beyond the scope of this section. A typical word lattice is shown below (source: http://berlin.csie.ntnu.edu.tw/SpeechProject/Research/Transcription/Acoustic_Lookahead.htm):
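Figure 5.: A typical word lattice, to which a language model can be applied to compute the recognition result.
The HMM-based approach above has been in use for decades. The more recent DNN (deep neural network) approaches achieve better recognition accuracy but require more computation. The basic idea is to replace the GMM with a DNN (so the original GMM-HMM architecture becomes DNN-HMM) and to use GPUs for the massive optimization computations, which is what makes the better recognition results possible.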
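Returning to the language model discussed earlier, a bigram (n = 2) model can be sketched as follows and used to compare the "recognize speech" and "wreck a nice beach" hypotheses. The four-sentence training corpus is made up for illustration, and add-one smoothing is our assumption; practical systems train on far larger corpora with better smoothing methods.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories; <s> and </s> mark sentence boundaries."""
    history, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(words[1:])
        history.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return history, bigrams, len(vocab)


def sentence_logprob(sentence, model):
    """Log probability of a sentence under an add-one-smoothed bigram model."""
    history, bigrams, V = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log((bigrams[(prev, cur)] + 1) / (history[prev] + V))
               for prev, cur in zip(words, words[1:]))


# Hypothetical toy training corpus.
corpus = [
    "it's hard to recognize speech",
    "speech recognition is hard",
    "we recognize speech with HMMs",
    "a nice day at the beach",
]
model = train_bigram(corpus)
for hyp in ["it's hard to recognize speech", "it's hard to wreck a nice beach"]:
    print(hyp, round(sentence_logprob(hyp, model), 2))
```

In a full recognizer this language-model score would be combined with the acoustic-model score of each hypothesis rather than used on its own.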
Audio Signal Processing and Recognition (音訊處理與辨識)