22-3 Speech Recognition

Getting machines to understand spoken language has long been a human dream. Thanks to the rapid growth of computing power in recent years, speech recognition applications have become increasingly mature and widespread; examples include voice assistants (such as Apple's Siri) and smart speakers (such as Amazon Alexa and Google Home), which have already become part of many people's daily lives.

Speech recognition applications can be categorized in several ways. The first is to classify them by the users the system is meant to serve:

  1. Speaker-dependent systems, which only need to recognize the voice of the specific user who recorded the reference utterances.
  2. Speaker-independent systems, which must work for any user.

The second is to classify them by the system's functionality, roughly in order of increasing difficulty:

  1. Speaker-dependent command recognition (voice-name dialing), where the user pre-records the commands to be recognized.
  2. Speaker-independent command recognition, where a fixed set of commands must be recognized no matter who speaks them.
  3. Dictation (large-vocabulary continuous speech recognition), where arbitrary sentences must be transcribed.

Before building a speech recognition system, we must first cut the speech signal into frames and then extract acoustic features from each frame. The most commonly used feature is the MFCC, which is the feature most often employed in speech recognition; each frame typically yields a 13-, 26-, or 39-dimensional MFCC vector. The details of this computation were covered in earlier chapters.
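
As an illustration of this front-end step, the following sketch computes 39-dimensional feature vectors (13 static MFCCs plus their first and second differences) for each frame. The use of the librosa library, the file name, and the parameter values are illustrative assumptions, not part of the original text.

    # A minimal MFCC front-end sketch; librosa, the file name, and the
    # parameter values are illustrative assumptions.
    import numpy as np
    import librosa

    y, sr = librosa.load("command.wav", sr=16000)        # hypothetical recording
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 x num_frames
    d1 = librosa.feature.delta(mfcc)                     # first-order differences
    d2 = librosa.feature.delta(mfcc, order=2)            # second-order differences

    # Stack static + delta + delta-delta to get one 39-dim vector per frame.
    features = np.vstack([mfcc, d1, d2]).T               # num_frames x 39
    print(features.shape)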

Hint

According to the classification above, the simplest speech recognition system is a speaker-dependent command recognition system, usually of the "record your own voice, recognize your own voice" variety. Early mobile phones (for example the Sony Ericsson T18) offered this: you could pre-record several short utterances, each associated with a phone number, for example recording "Mom" for your mother's number. Later, when you say "Mom" to the phone, the system compares the input sound against all the pre-recorded utterances, and if the match is good enough it automatically dials the corresponding number. (In other words, because the variability between different speakers' voices is large, if the wife recorded the entry, it only works well when she speaks it, since the template being matched is her voice and not her husband's.)

Hint
A T18 TV commercial featuring this kind of voice-name dialing, from around 2000: https://www.youtube.com/watch?v=M4hFuUYBGf0

To build a speaker-dependent command recognition system, the most basic method is dynamic time warping (DTW), a dynamic programming (DP) technique that compares two utterances frame by frame while locally stretching or compressing the time axis to compensate for differences in speaking rate, so as to achieve the best possible alignment. A schematic is shown below.


Figure: The optimal alignment path obtained by DTW.

In the example above, the utterances on both the horizontal and the vertical axes are "清華大學", but the utterance on the vertical axis is spoken at a fairly steady rate, while the one on the horizontal axis is spoken at an uneven rate. After DTW, the best alignment between the two is found, and from it the shortest distance between the two utterances can be computed. Therefore, to build a speaker-dependent command recognition system, we only need the user to pre-record a set of voice commands (each command can be recorded several times, say three). When the user utters a test phrase, we first perform endpoint detection and compute its MFCC sequence, then run DTW between this sequence and the MFCC sequence of every pre-recorded command; the command with the shortest distance is the answer we are looking for.
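
A minimal sketch of this matching scheme is given below, using plain NumPy. Both inputs are MFCC sequences shaped (frames x dimensions), like the `features` array in the earlier sketch; the Euclidean frame distance and the dictionary of pre-recorded templates are illustrative assumptions.

    import numpy as np

    def dtw_distance(a, b):
        """DTW distance between two MFCC sequences a (m x d) and b (n x d)."""
        m, n = len(a), len(b)
        # Local frame-to-frame distances (Euclidean).
        local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        acc = np.full((m, n), np.inf)
        acc[0, 0] = local[0, 0]
        for i in range(m):
            for j in range(n):
                if i == 0 and j == 0:
                    continue
                best_prev = min(
                    acc[i - 1, j] if i > 0 else np.inf,                 # advance in a only
                    acc[i, j - 1] if j > 0 else np.inf,                 # advance in b only
                    acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,   # diagonal step
                )
                acc[i, j] = local[i, j] + best_prev
        return acc[-1, -1]

    # Pick the pre-recorded command whose template is closest to the test utterance.
    # `templates` maps a command name to its MFCC sequence (both hypothetical).
    def recognize_command(test_mfcc, templates):
        return min(templates, key=lambda name: dtw_distance(test_mfcc, templates[name]))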

A more sophisticated kind of system is speaker-independent recognition, such as Apple's Siri voice assistant, the Amazon Alexa smart speaker, and the Google Home voice assistant. These systems act as assistants that can hold simple conversations with people and, with the help of cloud computing, understand the user's intent and carry out simple tasks such as setting alarms or looking up the weather or movie listings. To build this kind of system we need acoustic models for the matching step. The acoustic features are still MFCCs, but now we use a different acoustic model to represent each distinct sound, and the model is used to compute the probability density of a given MFCC vector. For example, we can collect recordings of the syllable "我" spoken by 100 people, cut them into frames, extract a 39-dimensional MFCC vector from each frame, and then use a high-dimensional probability density function (PDF) to model these MFCC vectors; the most common way to build such a model is maximum likelihood estimation (MLE). The most commonly used PDF is the GMM (Gaussian mixture model), which is a weighted combination of Gaussian PDFs. Using MLE, we can estimate the optimal GMM parameters from the collected MFCC vectors, namely the mean vector and covariance matrix of each Gaussian PDF, together with the weighting factors of the Gaussians.

Below are typical examples of using a single Gaussian PDF and a GMM PDF to model 1-D data:


Figure: An example of a 1-D Gaussian PDF.


Figure: An example of a 1-D GMM PDF, formed as a weighted combination of three Gaussian PDFs.

Below are typical examples of using a single Gaussian PDF and a GMM PDF to model 2-D data:


Figure: An example of a 2-D Gaussian PDF.


Figure: An example of a 2-D GMM PDF, formed as a weighted combination of four Gaussian PDFs.

Following the same MLE approach, we can also model 39-dimensional MFCC vectors; the only difference is that we can no longer inspect the fitted model with simple plots or surfaces as we did above.
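
The sketch below fits such a GMM to a batch of 39-dimensional vectors. It assumes scikit-learn's GaussianMixture, whose EM training realizes the maximum-likelihood estimation described above; the random stand-in data, the number of mixtures, and the diagonal-covariance choice are assumptions made for the example.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # `frames` would hold the 39-dim MFCC vectors pooled from many recordings of
    # one unit; random numbers are used here only as a stand-in.
    frames = np.random.randn(5000, 39)

    # Fit a GMM by maximum likelihood (EM); 8 mixtures with diagonal covariances
    # are common, space-saving choices, not values taken from the text.
    gmm = GaussianMixture(n_components=8, covariance_type="diag", max_iter=100)
    gmm.fit(frames)

    print(gmm.weights_)             # mixture weighting factors
    print(gmm.means_.shape)         # one 39-dim mean vector per mixture
    print(gmm.covariances_.shape)   # diagonal covariances, one row per mixture
    print(gmm.score(frames))        # average log-likelihood per frame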

Hint
The value produced by a PDF is a probability density. In practice we often need to multiply many densities together, so the result becomes smaller and smaller until floating-point arithmetic starts to lose accuracy (underflow). To avoid this problem, we usually take the logarithm of each probability density and replace the chain of multiplications by a sum of logarithms, which keeps the numerical error small. For this reason, the logarithm of the probability density (log probability density) is usually referred to as the log likelihood.
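
A tiny numerical illustration of this hint, with assumed representative density values: multiplying thousands of small densities underflows to zero in double precision, while summing their logarithms stays well-behaved.

    import numpy as np

    # Illustrative per-frame densities; real values vary frame by frame.
    densities = np.full(3000, 1e-5)

    direct_product = np.prod(densities)         # underflows to 0.0
    log_likelihood = np.sum(np.log(densities))  # about 3000 * log(1e-5) = -34538.8

    print(direct_product)    # 0.0
    print(log_likelihood)    # roughly -34539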

Using a GMM to build an acoustic model is still a fairly basic approach. If we take into account that a pronunciation changes over time, modeling it with a single static PDF is not entirely reasonable. For example, while "我" is being pronounced, the mouth shape and the sound change continuously, essentially moving from a "u" sound to an "o" sound. To build an acoustic model that captures this temporal ordering, we can use an HMM (hidden Markov model), a probability density function for describing sequences. Each HMM consists of several states; each state is a static PDF (such as a GMM), and the movement between states is described by transition probabilities. Below is a schematic of an HMM with three states:


Figure: Schematic of an HMM with three states.

For example, if we use an HMM as the acoustic model of "我", we could use 3 states, each represented by a GMM, with the state transitions represented by a 3x3 transition probability matrix. The parameters of this acoustic model (the three GMMs plus the transition probability matrix) are again obtained by MLE, but since we do not know in advance which state each frame's MFCC vector belongs to, the computation has to alternate between segmenting the frames into states and re-estimating the parameters, gradually increasing the likelihood toward its maximum. This procedure is called segmental k-means, and its steps are as follows (a simplified code sketch of the loop is given after the list):
  1. For each training utterance, use DP to assign the utterance's MFCC vectors to the states.
  2. For each state, use all the MFCC vectors assigned to it to compute the optimal GMM parameters.
  3. From the state assigned to each frame, compute the transition probability matrix.
  4. Go back to step 1 and repeat until all the parameters converge.
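
Below is a simplified sketch of this training loop, under two assumptions made only to keep the code short: each state is modeled by a single diagonal Gaussian rather than a full GMM, and the DP alignment in step 1 ignores transition costs. Function and variable names, and the equal-segment initialization, are illustrative.

    import numpy as np

    def gauss_loglik(x, mean, var):
        """Log density of a diagonal Gaussian, evaluated for all frames x (T x d)."""
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

    def align(frames, means, vars_):
        """Step 1: left-to-right DP alignment of frames to states (a Viterbi-style
        path, but without transition costs, to keep the sketch short)."""
        T, K = len(frames), len(means)
        ll = np.stack([gauss_loglik(frames, means[k], vars_[k]) for k in range(K)], axis=1)
        score = np.full((T, K), -np.inf)
        back = np.zeros((T, K), dtype=int)
        score[0, 0] = ll[0, 0]
        for t in range(1, T):
            for k in range(K):
                stay = score[t - 1, k]
                move = score[t - 1, k - 1] if k > 0 else -np.inf
                back[t, k] = k if stay >= move else k - 1
                score[t, k] = ll[t, k] + max(stay, move)
        path = [K - 1]                      # the path must end in the last state
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return np.array(path[::-1])

    def segmental_kmeans(frames, num_states=3, num_iters=10):
        """Steps 1-4 above, with one diagonal Gaussian per state."""
        # Initialize by cutting the utterance into equal-length segments.
        labels = np.minimum(np.arange(len(frames)) * num_states // len(frames),
                            num_states - 1)
        for _ in range(num_iters):
            # Step 2: re-estimate each state's Gaussian from its assigned frames.
            means = np.array([frames[labels == k].mean(axis=0) for k in range(num_states)])
            vars_ = np.array([frames[labels == k].var(axis=0) + 1e-3 for k in range(num_states)])
            # Step 3: count state-to-state transitions along the alignment.
            trans = np.zeros((num_states, num_states))
            for a, b in zip(labels[:-1], labels[1:]):
                trans[a, b] += 1
            trans = trans / np.maximum(trans.sum(axis=1, keepdims=True), 1)
            # Step 1 again; step 4: stop when the alignment no longer changes.
            new_labels = align(frames, means, vars_)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        return means, vars_, trans

    # e.g. means, vars_, trans = segmental_kmeans(features, num_states=3)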

Using an HMM to represent an acoustic model usually gives better results, because it can express the way a pronunciation changes over time.

In practical recognition systems, we usually go one step further and decompose all the pronunciations of a language into its most basic pronunciation units, called phonemes, which are the smallest sound units that can distinguish one pronunciation from another in human language. We therefore build one acoustic model per phoneme, rather than per syllable or per word.

For example, in Mandarin, the zhuyin symbols for "你好" are "ㄋㄧˇ-ㄏㄠˇ", the pinyin is "ni-hao", and the corresponding phoneme string is "n_i-h_a_u".

In addition, to describe different pronunciations more precisely, the acoustic models can be built at different levels of context dependency (monophone, biphone, or triphone units); the word "平安" (ping-an), for instance, can be modeled at any of these levels.

One can imagine that acoustic models built from monophones are coarser, but they take little storage and require relatively little training data and computation; acoustic models built from triphones are more precise, but they take more storage and require considerably more training data and computing resources.
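
As a rough back-of-the-envelope comparison (the phone-inventory size below is an assumed, illustrative number, not one given in the text):

    # Hypothetical phone inventory size; Mandarin phone sets are roughly this order.
    num_phones = 40
    monophone_models = num_phones          # one context-independent model per phone
    triphone_models = num_phones ** 3      # upper bound: one model per left/right context pair
    print(monophone_models, triphone_models)   # 40 vs 64000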

Therefore, given a sentence of text, we can first break it into phonemes and then convert the resulting phoneme sequence into a concatenation of HMM acoustic models. Taking the Mandarin phrase "你好" as an example, the steps to build its biphone sequence (ignoring the leading silence) are as follows (a small code sketch of steps 1-3 is given after the figure below):

  1. Text to pronunciation: 你好 ==> ㄋㄧˇ-ㄏㄠˇ, ni-hao
  2. Pronunciation to phonemes: ni-hao ==> n_i-h_a_u
  3. Phonemes to biphones: n+i, i+h, h+a, a+u, u+sil
  4. Concatenate the corresponding HMM models, as shown in the following figure:


    Figure: Schematic of the concatenated HMM models for "你好".
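
The following toy sketch covers steps 1-3 above: expanding syllables into phonemes and pairing each phoneme with its right neighbour. The small pronunciation table is an illustrative assumption, not a complete Mandarin inventory.

    # Toy syllable-to-phoneme table; illustrative only.
    syllable_to_phones = {"ni": ["n", "i"], "hao": ["h", "a", "u"]}

    def to_biphones(syllables):
        phones = [p for s in syllables for p in syllable_to_phones[s]]
        # Pair each phoneme with its right neighbour; the last one connects to silence.
        return [f"{a}+{b}" for a, b in zip(phones, phones[1:] + ["sil"])]

    print(to_biphones(["ni", "hao"]))
    # ['n+i', 'i+h', 'h+a', 'a+u', 'u+sil']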

Given an input utterance, we can cut it into frames, extract an MFCC vector from each frame, and then feed this vector sequence into the concatenated acoustic model. Using Viterbi search (which is also a DP method), we obtain the maximum likelihood of this utterance on the HMM; as a by-product of this process, we also learn which state each frame should be assigned to in order to achieve that maximum likelihood. This is illustrated below:


Figure: Viterbi search assigns each frame to an HMM state so that the overall likelihood is maximized.

The likelihood obtained from Viterbi search can be viewed as a measure of how well the speech matches a given sentence of text: the higher the likelihood, the more likely it is that this speech signal is saying that sentence. If there are several candidate command sentences to recognize, we can compute the likelihood of the speech against each candidate's concatenated model; the sentence with the highest likelihood is the most plausible result of the command recognition. This is the basic principle of speaker-independent command recognition. When implementing such a system, however, the search table can become very large: for example, if the utterance to be recognized contains up to 1000 frames and a 5-character sentence gives rise to at most 30 HMM states (assuming each character is represented by up to 6 states), then the DP table for one sentence already needs about 30,000 cells; with ten thousand recognizable command sentences (each of 5 characters), roughly 300 million cells would be required. In practice, therefore, we usually apply various optimizations and simplifications so that recognition can be carried out in real time.
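
A compact sketch of this scoring step is shown below. It assumes that, for each candidate command, the per-frame log-likelihoods of every state (for example from the states' GMMs), the log transition matrix, and the log initial-state probabilities have already been prepared; how these are built from the concatenated models is omitted here.

    import numpy as np

    def viterbi_loglik(frame_loglik, log_trans, log_init):
        """Maximum log-likelihood of an utterance on one (concatenated) HMM.

        frame_loglik : (T x K) log p(frame_t | state_k), e.g. from each state's GMM
        log_trans    : (K x K) log transition probabilities
        log_init     : (K,)    log initial-state probabilities
        """
        T, K = frame_loglik.shape
        score = log_init + frame_loglik[0]
        for t in range(1, T):
            # Best previous state for every current state, all in the log domain.
            score = np.max(score[:, None] + log_trans, axis=0) + frame_loglik[t]
        return np.max(score)

    # Pick the candidate command whose concatenated HMM gives the highest likelihood.
    # `candidates` maps a command string to its (frame_loglik, log_trans, log_init).
    def pick_best_command(candidates):
        return max(candidates, key=lambda cmd: viterbi_loglik(*candidates[cmd]))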

If we go one step further and want to do dictation, we must also consider which words a person is likely to use when speaking, and how likely these words are to appear next to one another. The mathematical model used to compute these likelihoods is called the language model, and in speech recognition it works hand in hand with the acoustic models described above. The most common language models are n-gram models; an n-gram is simply a sequence of n consecutive words, and, loosely speaking, a language model computes the probability of a group of words being joined together. Taking English as an example, suppose an utterance could be recognized as either "recognize speech" or "wreck a nice beach".

The two candidates sound quite similar, but if we apply a language model we find that people rarely say "wreck a nice beach", whereas "recognize speech" is far more common, so the latter is chosen as the recognition result, which is also the correct answer we want. In actual implementations, we usually organize the search with trees or graphs (for example a word lattice) and apply various search optimizations on this data structure so that the recognition result can be returned within an acceptable time (say, about a second); this involves a considerable amount of data-structure and algorithmic detail that we will not go into here. A typical word lattice is shown below (source: http://berlin.csie.ntnu.edu.tw/SpeechProject/Research/Transcription/Acoustic_Lookahead.htm):
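
The toy sketch below shows the idea with a bigram (2-gram) model: count adjacent word pairs in a small corpus, then score candidate sentences by summing log bigram probabilities. The tiny corpus and the add-one smoothing are illustrative assumptions.

    from collections import Counter
    import math

    # Toy corpus; a real language model is trained on far more text.
    corpus = [
        "it is hard to recognize speech",
        "speech recognition is hard",
        "we recognize speech every day",
    ]

    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    def bigram_logprob(sentence, alpha=1.0):
        """Log P(sentence) under a bigram model with simple add-alpha smoothing."""
        words = sentence.split()
        vocab = len(unigrams)
        logp = 0.0
        for a, b in zip(words, words[1:]):
            logp += math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab))
        return logp

    print(bigram_logprob("recognize speech"))
    print(bigram_logprob("wreck a nice beach"))  # lower: these pairs never occur in the corpus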


Figure: A typical word lattice, into which the language model can be incorporated to compute the recognition result.

The HMM-based approach described above has been in use for decades, but the currently popular DNN (deep neural network) approach yields noticeably better recognition accuracy, at the cost of much heavier computation. The basic idea is to use a DNN in place of the GMM (so the original GMM-HMM architecture becomes DNN-HMM) and to rely on GPUs for massive parallel numerical computation, which is what makes the improved recognition accuracy practical to obtain.
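
The sketch below illustrates this replacement in its simplest form, assuming PyTorch: a small feed-forward network maps each MFCC frame to posterior probabilities over HMM states, and dividing by the state priors turns the posteriors into scaled likelihoods that can be plugged into the same Viterbi search in place of the GMM scores. The layer sizes, the number of states, and the uniform prior are illustrative assumptions.

    import math
    import torch
    import torch.nn as nn

    # A small MLP mapping a 39-dim MFCC frame to posteriors over HMM states.
    # Real systems usually stack several neighbouring frames as input and use
    # many more states; the sizes here are illustrative.
    num_states = 1000
    model = nn.Sequential(
        nn.Linear(39, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, num_states),
    )

    frames = torch.randn(200, 39)                 # stand-in for one utterance
    log_post = torch.log_softmax(model(frames), dim=1)

    # Scaled likelihood: log p(x|s) ~ log p(s|x) - log p(s). A uniform prior is
    # assumed here; in practice log p(s) comes from state counts in the alignments.
    log_prior = torch.full((num_states,), -math.log(num_states))
    frame_loglik = log_post - log_prior           # use in place of GMM scores in Viterbi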

