3-2 Hierarchical Clustering



Hierarchical clustering combines or divides a dataset into clusters iteratively, so that a tree-like hierarchical structure is created, primarily for data visualization. To construct such a hierarchical tree structure, we can adopt two approaches:

  1. Agglomerative (bottom-up): treat each data point as a cluster and merge clusters repeatedly until the desired number of clusters is reached.
  2. Divisive (top-down): treat the whole dataset as a single cluster and split clusters repeatedly until the desired number of clusters is reached.

We shall focus on agglomerative (bottom-up) hierarchical clustering in this section.

For agglomerative hierarchical clustering, we start with the leaves of the tree and move up. In other words, if we have a dataset of size n, then we have n clusters at the beginning, with each data point being a cluster. We then merge clusters iteratively to move up the tree structure. The algorithm is described next, followed by a minimal code sketch of the steps.

  1. At the beginning, each data point is a cluster denoted by $C_i, i=1, 2, \dots, n$.
  2. Find the two nearest clusters $C_i$ and $C_j$ among all clusters.
  3. Combine $C_i$ and $C_j$ into a new cluster.
  4. If the number of clusters is equal to the desired one, stop. Otherwise go back to step 2 to merge more clusters.
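To make these steps concrete, the following is a minimal MATLAB sketch of the merging loop, assuming single linkage as the cluster distance and a precomputed pairwise distance matrix. It is only an illustration (the function name aggClusteringSketch is made up for this sketch) and is not the toolbox function hierClustering used in the examples below.

% Minimal sketch of agglomerative clustering with single linkage.
% distMat: n-by-n pairwise distance matrix; clusterNum: desired number of clusters.
% Returns a cell array in which each cell holds the indices of the points in one cluster.
function clusters = aggClusteringSketch(distMat, clusterNum)
n = size(distMat, 1);
clusters = num2cell(1:n);                    % Step 1: each data point is its own cluster
while numel(clusters) > clusterNum           % Step 4: stop at the desired number of clusters
    best = inf; p = 0; q = 0;
    for i = 1:numel(clusters)                % Step 2: find the two nearest clusters
        for j = i+1:numel(clusters)
            d = min(min(distMat(clusters{i}, clusters{j})));   % Single-linkage distance
            if d < best, best = d; p = i; q = j; end
        end
    end
    clusters{p} = [clusters{p}, clusters{q}];   % Step 3: merge the two nearest clusters
    clusters(q) = [];                           % Remove the cluster that was merged away
end
end

For instance, aggClusteringSketch(distMat, 3) would return three clusters for the distance matrix of Example 1; the toolbox function hierClustering additionally keeps the merging history, which is why its output can be passed to hierClusteringPlot to draw the dendrogram.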
The following example demonstrates the dendrogram after agglomerative hierarchical clustering.

Example 1: hierClusteringPlot01.m

data=rand(2, 50);                  % 50 data instances of dim 2
distMat=distPairwise(data);        % Distance matrix of 50 by 50
hcOutput=hierClustering(distMat);
hierClusteringPlot(hcOutput);      % Plot the dendrogram

To make the algorithm more concrete, we need to define what is meant by "the two nearest clusters" in step 2. In fact, there are several distance functions for computing the distance between two clusters, and different cluster distance functions result in different tree structures. Commonly used cluster distance functions are listed next.

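For reference, two standard cluster distance functions, which are also the two options used in Example 2 below ('single' and 'complete'), can be written as follows, where $d(\mathbf{x}, \mathbf{y})$ denotes the distance between two individual data points (this notation is introduced here only for illustration):

  Single linkage (minimum distance): $d(C_i, C_j) = \min_{\mathbf{x} \in C_i,\, \mathbf{y} \in C_j} d(\mathbf{x}, \mathbf{y})$
  Complete linkage (maximum distance): $d(C_i, C_j) = \max_{\mathbf{x} \in C_i,\, \mathbf{y} \in C_j} d(\mathbf{x}, \mathbf{y})$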

We apply different cluster distance functions to obtain the corresponding tree structures, as follows:

Example 2: hierClusteringPlot02.m

data=rand(2, 50);                  % 50 data instances of dim 2
distMat=distPairwise(data);        % Distance matrix of 50 by 50
method='single';
hcOutput=hierClustering(distMat, method);
subplot(1,2,1); hierClusteringPlot(hcOutput); title(['method=', method]);
method='complete';
hcOutput=hierClustering(distMat, method);
subplot(1,2,2); hierClusteringPlot(hcOutput); title(['method=', method]);

From the resultant tree structures, we can observe the following trends: single linkage tends to chain nearby clusters together, producing elongated, loosely connected clusters, while complete linkage tends to produce compact clusters of roughly similar diameter.

If you want to see an animation of the clustering process, try the next example (which uses single linkage as the cluster distance):

Example 3: hierClusteringAnim01.m

data=dcData(6); data=data.input;
dataNum=size(data,2);
distMat=distPairwise(data, data);
distMat(1:dataNum+1:dataNum^2)=inf;   % Diagonal elements should always be inf.
method='single';                      % 'single' or 'complete'
level=hierClustering(distMat, method);
hierClusteringAnim(data, distMat, level);

It can be proved that the connections produced by single linkage over a 2D dataset actually form the minimum spanning tree of the dataset.

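As a quick way to visualize this claim, the minimum spanning tree of a random 2D dataset can be drawn with MATLAB's built-in graph functions and compared against the single-linkage connections shown by Example 3. This sketch uses only built-in functions (graph, minspantree, plot) rather than the toolbox helpers, and assumes a MATLAB release that provides them:

data = rand(2, 50);                        % 50 random 2D points
n = size(data, 2);
D = zeros(n);                              % Pairwise Euclidean distance matrix
for i = 1:n
    for j = i+1:n
        D(i,j) = norm(data(:,i)-data(:,j));
        D(j,i) = D(i,j);
    end
end
G = graph(D);                              % Complete weighted graph over the data points
T = minspantree(G);                        % Its minimum spanning tree
plot(T, 'XData', data(1,:), 'YData', data(2,:));   % Draw the MST at the data coordinates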

In general, the advantages of hierarchical clustering can be summarized as follows: the concept and the algorithm are simple; the resulting dendrogram visualizes the data at every level of granularity, so the number of clusters does not have to be fixed in advance; and only a pairwise distance matrix is required, so any distance function can be used.

The major disadvantage is that hierarchical clustering can only handle a small amount of data, since it must compute and store the pairwise distance matrix, whose size grows quadratically with the number of data points.
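As a rough, illustrative calculation (the figure of 10,000 points is arbitrary): with $n = 10{,}000$ data points the distance matrix has $n^2 = 10^8$ entries, which occupies about 800 MB in double precision, or roughly half that if only the $n(n-1)/2$ distinct pairwise distances are stored.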
