[english][all] (½Ðª`·N¡G¤¤¤åª©¥»¨Ã¥¼ÀH^¤åª©¥»¦P¨B§ó·s¡I)
Slides
¶¥¼h¦¡¤À¸sªk¡]hierarchical clustering¡^³z¹L¤@ºØ¶¥¼h¬[ºcªº¤è¦¡¡A±N¸ê®Æ¼h¼h¤ÏÂЦa¶i¦æ¤Àµõ©Î»E¦X¡A¥H²£¥Í³Ì«áªº¾ðª¬µ²ºc¡A±`¨£ªº¤è¦¡¦³¨âºØ¡G
¥»¸`±N°w¹ï»E¦X¦¡¡]¥Ñ¤U¦Ó¤W¡^¶¥¼h¤À¸sªk¨Ó¶i¦æ»¡©ú¡C
- ¦pªG±Ä¥Î»E¦Xªº¤è¦¡¡A¶¥¼h¦¡¤À¸sªk¥i¥Ñ¾ðª¬µ²ºcªº©³³¡¶}©l¡A±N¸ê®Æ©Î¸s»E³v¦¸¦X¨Ö
- ¦pªG±Ä¥Î¤Àµõªº¤è¦¡¡A«h¥Ñ¾ðª¬µ²ºcªº³»ºÝ¶}©l¡A±N¸s»E³v¦¸¤Àµõ¡C
»E¦X¦¡¶¥¼h¤À¸sªk¡]agglomerative hierarchical clustering¡^¥Ñ¾ðª¬µ²ºcªº©³³¡¶}©l¼h¼h»E¦X¡C¤@¶}©l§Ú̱N¨C¤@µ§¸ê®Æµø¬°¤@Ó¸s»E¡]cluster¡^¡A°²³]§Ú̲{¦b¾Ö¦³nµ§¸ê®Æ¡A«h±N³onµ§¸ê®Æµø¬°nÓ¸s»E¡A¥ç§Y¨CÓ¸s»E¥]§t¤@µ§¸ê®Æ¡G
¤U¦C½d¨Ò®i¥Ü»E¦X¦¡¶¥¼h¤À¸sªk©Ò²£¥Íªº¾ðª¬¹Ï¡]dendrogram¡^¡C
- ±N¨Cµ§¸ê®Æµø¬°¤@Ó¸s»E Ci, i=1 1 to n.
- §ä¥X©Ò¦³¸s»E¶¡¡A¶ZÂ÷³Ì±µªñªº¨âÓ¸s»E Ci¡BCj
- ¦X¨Ö Ci¡B Cj ¦¨¬°¤@Ó·sªº¸s»E
- °²¦p¥Ø«eªº¸s»E¼Æ¥Ø¦h©ó§Ú̹w´Áªº¸s»E¼Æ¥Ø¡A«h¤ÏÂЫ½Æ¨BÆJ¤G¦Ü¥|¡Aª½¨ì¸s»E¼Æ¥Ø¤w±N°¨ì§ÚÌ©Òn¨Dªº¼Æ¥Ø¡C
¦b¨BÆJ¤G¤¤¡A¦ó¿×¡u¶ZÂ÷³Ì±µªñªº¨âÓ¸s»E Ci¡BCj¡v©O¡H¨Æ¹ê¤W¦b©w¸q¨âÓ¸s»E¤§¶¡ªº¶ZÂ÷¡A¦³¦UºØ¤£¦Pªº¤è¦¡¡A¨C¤@ºØ¤è¦¡©Ò±o¨ìªºµ²ªG³£¤£¤Ó¬Û¦P¡C³o¨Ç±`¥Îªº¸s»E¶ZÂ÷ªº©w¸q¥i¥H»¡©ú¦p¤U¡C
- ³æ¤@³sµ²»E¦Xºtºâªk¡]single-linkage agglomerative algorithm¡^¡G¸s»E»P¸s»E¶¡ªº¶ZÂ÷¥i¥H©w¸q¬°¤£¦P¸s»E¤¤³Ì±µªñ¨âÂI¶¡ªº¶ZÂ÷¡G $$d(C_i, C_j)=\min_{\mathbf{a}\in C_i, \mathbf{b}\in C_j} d(\mathbf{a}, \mathbf{b})$$
- §¹¾ã³sµ²»E¦Xºtºâªk¡]complete-linkage agglomerative algorithm¡^¡G¸s»E¶¡ªº¶ZÂ÷©w¸q¬°¤£¦P¸s»E¤¤³Ì»·¨âÂI¶¡ªº¶ZÂ÷¡G $$d(C_i, C_j)=\max_{\mathbf{a}\in C_i, \mathbf{b}\in C_j} d(\mathbf{a}, \mathbf{b})$$
- ¥§¡³sµ²»E¦Xºtºâªk¡]average-linkage agglomerative algorithm¡^¡G¸s»E¶¡ªº¶ZÂ÷«h©w¸q¬°¤£¦P¸s»E¶¡¦UÂI»P¦UÂI¶¡¶ZÂ÷Á`©Mªº¥§¡¡]¨ä¤¤¡A ªí¥Ü ªº¸ê®ÆӼơC¡^¡G $$d(C_i, C_j)=\sum_{\mathbf{a}\in C_i, \mathbf{b}\in C_j} \frac{d(\mathbf{a}, \mathbf{b})}{|C_i||C_j|},$$ where $|C_i|$ and $|C_j|$ are the sizes for $C_i$ and $C_j$, respectively.
- ¨U¼wªk¡]Ward's method¡^¡G¸s»E¶¡ªº¶ZÂ÷©w¸q¬°¦b±N¨â¸s¦X¨Ö«á¡A¦UÂI¨ì¦X¨Ö«áªº¸s¤¤¤ßªº¶ZÂ÷¥¤è©M¡]¨ä¤¤¡Am ªí¥Ü Ci¡åCj ªº¥§¡È¡^ $$d(C_i, C_j)=\sum_{\mathbf{a}\in C_i \cup C_j} \|\mathbf{a}-\mathbf{\mu}\|,$$ where $\mathbf{\mu}$ is the mean vector of $C_i \cup C_j$.
§ÚÌ¥i¥H¨Ï¥Î¤£¦Pªº¸s»E¶ZÂ÷¨Ó²£¥Í¶¥¼h¦¡¸s»E¾ð¡A©Ò±o¨ìªºµ²ªG¦p¤U¡G
¥Ñ¤Wz¸s»E¾ð¡A§ÚÌ¥i¥Hµoı¤U¦C¯S©Ê¡G
- single linkage ·|¦b¸s»Eªº¹Lµ{¤¤²£¥Í¡u¤jªÌ«í¤j¡vªº®ÄªG¡C
- ¦Ó complete linkage ©M average linkage ¤ñ¸û®e©ö²£¥Í¡u»ôÀY¨Ã¶i¡vªº®ÄªG¡C
YnÆ[¬Ý¦b¸s»E¹Lµ{¤¤ªº°ÊºA®i¥Ü¡A½Ð¨£¤U¤@Ó½d¨Ò¡]¦¹½d¨Ò¥H single linkage ¬°¸s»E¶ZÂ÷¡^¡G
¨Æ¹ê¤W§ÚÌ¥i¥HÃÒ©ú¡A¦pªG¸ê®Æ¶°¬O¤Gºû¥±¤WªºÂI©Ò¦¨ªº¶°¦X¡A¦Ó¥B§Ú̱ĥΠsingle linkage¡A«h©Ò±o¨ìªº³sµ²¹Ï§Y¬O³o¨ÇÂIªº minimum spanning tree¡C
¾ãÅé¨Ó»¡¡A¶¥¼h¦¡¤À¸sªkªºÀuÂI¦p¤U¡G
¦ý¬O¶¥¼h¦¡¤À¸sªk¤]¦³¯ÊÂI¡A¥¦³q±`¥u¾A¥Î©ó¤Ö¶q¸ê®Æ¡A«ÜÃø³B²z¤j¶q¸ê®Æ¡C
- ·§©À²³æ¡A¥i¥Î¾ðª¬µ²ºc¨Óªí²{¾ãÓpºâ¹Lµ{¡C
- ¥u»Ýn¸ê®ÆÂI¨â¨â¤§¶¡ªº¶ZÂ÷¡A´N¥i¥H«Øºc¤À¸sµ²ªG¡A¦Ó¤£»Ýn¸ê®ÆÂIªº¹ê»Ú®y¼Ð¡C¡]¦]¦¹¨C¤@Ó¸ê®ÆÂI¥i¥H¥Ü¤@Óª«¥ó¡A¦Ó¤£¥²¬OªÅ¶¡¤¤ªº¤@ÂI¡C¡^
Data Clustering and Pattern Recognition (¸ê®Æ¤À¸s»P¼Ë¦¡¿ë»{)