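In speech recognition in general, the acoustic model serves as the basic unit of recognition. Before training on speech data to estimate the acoustic model parameters, we must therefore first settle the structure of the acoustic model. The speech recognition system we commonly use takes the biphone as its acoustic model unit, and its hierarchy, running from syllable down to mixture, can be drawn as a schematic diagram. Several commonly used terms are introduced first: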
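- Acoustic model (or simply "model"): an abstract unit used in HMMs. An acoustic model usually contains several states. Either a syllable or a phoneme can serve as an acoustic model.
- Syllable: a complete unit of pronunciation. In Chinese, one character corresponds to one syllable; in English, one word can correspond to several syllables. For example, "tomorrow" has three syllables.
- Phoneme (or simply "phone"): the smallest unit of pronunciation. For example, the pronunciation of 「大」 can be decomposed into the two phonemes ㄉ and ㄚ. The decomposition is not fixed, however. For gliding vowels (diphthongs), a single Zhuyin symbol is usually split into two phonemes, as with ㄞ, ㄟ, ㄠ, and ㄡ, because these vowels change continuously while they are pronounced.
- Monophone: a single phoneme used as one acoustic model, for example ㄇ.
- Biphone: two consecutive phonemes used as one acoustic model, usually RCD (right-context dependent). For example, the ㄇ in ㄇ-ㄚ and the ㄇ in ㄇ-ㄧ are treated as two different acoustic models.
- Triphone: three consecutive phonemes used as one acoustic model. For example, the ㄨ in ㄎ+ㄨ-ㄢ and the ㄨ in ㄍ+ㄨ-ㄤ are treated as two different acoustic models.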
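The difference between the three model units can be sketched by expanding one phoneme sequence into model labels. This is a minimal illustration, not the labeling tool used by the system: the function names are hypothetical, the label notation simply follows the examples in the list above (center-right for biphones, left+center-right for triphones), and the handling of phonemes at the edges of the sequence, which have no neighbor on one side, is an assumption.

```python
def monophones(phones):
    """Context-independent labels: one model per phoneme."""
    return list(phones)


def right_biphones(phones):
    """Right-context-dependent labels, written 'center-right'.

    The last phoneme has no right neighbor, so it keeps its bare label
    (an assumption of this sketch).
    """
    labels = []
    for i, center in enumerate(phones):
        if i + 1 < len(phones):
            labels.append(f"{center}-{phones[i + 1]}")
        else:
            labels.append(center)
    return labels


def triphones(phones):
    """Labels written 'left+center-right', following the list's examples.

    A missing left or right neighbor is simply omitted from the label
    (an assumption of this sketch).
    """
    labels = []
    for i, center in enumerate(phones):
        label = center
        if i > 0:
            label = f"{phones[i - 1]}+{label}"
        if i + 1 < len(phones):
            label = f"{label}-{phones[i + 1]}"
        labels.append(label)
    return labels


if __name__ == "__main__":
    syllable = ["ㄎ", "ㄨ", "ㄢ"]
    print(monophones(syllable))
    print(right_biphones(syllable))
    print(triphones(syllable))
```

With this expansion, the ㄨ in ㄎㄨㄢ and the ㄨ in ㄍㄨㄤ receive different triphone labels, which is why they end up as separate acoustic models.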
In the figure above, each state is further divided into three streams: MFCC, ΔMFCC, and ΔΔMFCC. Because MFCC is the most important speech feature, we model it with 6 mixtures, while ΔMFCC and ΔΔMFCC are each modeled with 2 mixtures. The same structure can also be drawn from the viewpoint of the recognition network and the HMM.
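The stream and mixture layout described above (6 mixtures for MFCC, 2 each for ΔMFCC and ΔΔMFCC) can be sketched as a per-state likelihood computation. Several things here are assumptions rather than details from the text: the stream names `dMFCC` and `ddMFCC`, the feature dimension of 13 per stream, diagonal covariances, the toy parameter values, and combining streams by adding their log likelihoods with equal weight.

```python
import math

# Mixtures per stream, as stated in the text.
MIXTURES_PER_STREAM = {"MFCC": 6, "dMFCC": 2, "ddMFCC": 2}


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def log_gmm(x, mixtures):
    """Log likelihood of x under a list of (weight, mean, var) mixtures."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in mixtures]
    peak = max(terms)
    # Log-sum-exp keeps the sum numerically stable.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def state_log_likelihood(observation, state):
    """Add the per-stream log likelihoods (streams treated as independent)."""
    return sum(log_gmm(observation[name], state[name]) for name in state)


def make_state(dim):
    """Build a toy state: equal weights, distinct means, unit variances."""
    state = {}
    for name, count in MIXTURES_PER_STREAM.items():
        state[name] = [
            (1.0 / count, [float(k)] * dim, [1.0] * dim) for k in range(count)
        ]
    return state


if __name__ == "__main__":
    dim = 13  # hypothetical feature dimension per stream
    state = make_state(dim)
    frame = {name: [0.0] * dim for name in state}
    print(state_log_likelihood(frame, state))
```

In a trained system the weights, means, and variances would come from the estimated model rather than from `make_state`, which exists only to make the sketch runnable.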
The figure above marks three types of transition:
- Type 0: Transition between syllables
- Type 1: Transition between models
- Type 2: Transition between states
Once the structure of the acoustic model is settled, we can use HTK to estimate the probabilistic parameters of the acoustic models from a large speech corpus, as described in the next section.
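The three transition types listed above can be illustrated by labeling each step of a path through the recognition network. This is a hedged sketch: representing a position as a `(syllable, model, state)` tuple of indices is an assumption made for illustration, the function names are hypothetical, and a step that stays in the same state (a self-loop) is returned as `None` because the list does not assign it a type.

```python
def transition_type(prev, curr):
    """Classify the step between two (syllable, model, state) positions.

    Returns 0, 1, or 2 following the list above, or None when the
    position does not change (a self-loop, which the list does not number).
    """
    prev_syl, prev_model, prev_state = prev
    curr_syl, curr_model, curr_state = curr
    if curr_syl != prev_syl:
        return 0  # crossed a syllable boundary
    if curr_model != prev_model:
        return 1  # moved to another model within the same syllable
    if curr_state != prev_state:
        return 2  # moved to another state within the same model
    return None


def label_path(path):
    """Label every consecutive pair of positions along a path."""
    return [transition_type(a, b) for a, b in zip(path, path[1:])]


if __name__ == "__main__":
    path = [
        (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 2),
        (0, 1, 0), (0, 1, 1), (1, 0, 0),
    ]
    print(label_path(path))
```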
Audio Signal Processing and Recognition (音訊處理與辨識)