Hierarchical clustering iteratively combines or divides the data points of a dataset into clusters, creating a tree-like hierarchical structure that is used primarily for data visualization. Such a hierarchical tree can be constructed with two approaches:
- Agglomerative hierarchical clustering: This is a bottom-up approach. We start from the tree leaves (the individual data instances) and merge the two nearest clusters into one at each iteration.
- Divisive hierarchical clustering: This is a top-down approach. We start from the root of the tree (which contains all data instances) and select a cluster to split at each iteration.

We shall focus on agglomerative (bottom-up) hierarchical clustering in this section.
For agglomerative hierarchical clustering, we start with the leaves of the tree and move up. In other words, if we have a dataset of size n, then we have n clusters at the beginning, with each data point being a cluster. Then we merge clusters iteratively to move up the tree structure. The algorithm is described next.
1. At the beginning, each data point is its own cluster, denoted by $C_i, i=1, 2, \dots, n$.
2. Find the nearest two clusters $C_i$ and $C_j$ among all clusters.
3. Merge $C_i$ and $C_j$ into a new cluster.
4. If the number of clusters equals the desired number, stop. Otherwise, go back to step 2 to continue merging.

The following example demonstrates the dendrogram obtained by agglomerative hierarchical clustering.
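As a concrete illustration, here is a minimal sketch of these steps using SciPy on synthetic 2-D data (both the synthetic data and the use of SciPy are assumptions for illustration, not the original course example):

```python
# A minimal sketch of agglomerative clustering with SciPy on synthetic
# 2-D data (assumed here for illustration; not the original example).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
# Two loose groups of points, 10 points each
data = np.vstack([rng.normal(0, 1, size=(10, 2)),
                  rng.normal(5, 1, size=(10, 2))])

# Each row of Z records one merge of the two nearest clusters
Z = linkage(data, method='single')

# Stop merging when the desired number of clusters (here 2) is reached
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)

dendrogram(Z)   # visualize the full merge hierarchy
plt.show()
```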
In order to make the algorithm more concrete, we need to define what is meant by "the nearest two clusters". In fact, there are several distance functions for computing the distance between two clusters, and different cluster distance functions result in different tree structures. Commonly used cluster distance functions are listed next; a short code sketch follows the list.
- Single-linkage agglomerative algorithm: The distance between two clusters is the shortest distance between members of these two clusters: $$d(C_i, C_j)=\min_{\mathbf{a}\in C_i, \mathbf{b}\in C_j} d(\mathbf{a}, \mathbf{b})$$
- Complete-linkage agglomerative algorithm: The distance between two clusters is the longest distance between members of these two clusters: $$d(C_i, C_j)=\max_{\mathbf{a}\in C_i, \mathbf{b}\in C_j} d(\mathbf{a}, \mathbf{b})$$
- Average-linkage agglomerative algorithm: The distance between two clusters is the average distance between members of these two clusters: $$d(C_i, C_j)=\sum_{\mathbf{a}\in C_i, \mathbf{b}\in C_j} \frac{d(\mathbf{a}, \mathbf{b})}{|C_i||C_j|},$$ where $|C_i|$ and $|C_j|$ are the sizes of $C_i$ and $C_j$, respectively.
- Ward's method: The distance between two clusters is defined as the sum of the squared distances to the mean of the combined cluster: $$d(C_i, C_j)=\sum_{\mathbf{a}\in C_i \cup C_j} \|\mathbf{a}-\mathbf{\mu}\|^2,$$ where $\mathbf{\mu}$ is the mean vector of $C_i \cup C_j$.
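To make the four definitions above concrete, here is a small NumPy sketch that evaluates each cluster distance directly from the formulas; the function name and the test clusters are made up for illustration:

```python
# Sketch: evaluate the four cluster-distance functions defined above.
import numpy as np

def cluster_distance(Ci, Cj, method='single'):
    """Distance between two clusters given as (m, d) and (k, d) arrays."""
    # Pairwise Euclidean distances between members of Ci and Cj
    diff = Ci[:, None, :] - Cj[None, :, :]
    pair = np.sqrt((diff ** 2).sum(axis=-1))
    if method == 'single':      # shortest member-to-member distance
        return pair.min()
    if method == 'complete':    # longest member-to-member distance
        return pair.max()
    if method == 'average':     # average member-to-member distance
        return pair.mean()
    if method == 'ward':        # sum of squared distances to the merged mean
        merged = np.vstack([Ci, Cj])
        mu = merged.mean(axis=0)
        return ((merged - mu) ** 2).sum()
    raise ValueError(method)

# Two tiny example clusters (made-up values)
Ci = np.array([[0.0, 0.0], [1.0, 0.0]])
Cj = np.array([[4.0, 0.0], [5.0, 1.0]])
for m in ('single', 'complete', 'average', 'ward'):
    print(m, cluster_distance(Ci, Cj, m))
```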
We apply different cluster distance functions to obtain the corresponding tree structures, as follows:
From the resultant tree structures, we can observe the following trends:
- Single linkage tends to produce lopsided trees, with bigger clusters staying big and smaller clusters staying small.
- Complete linkage tends to generate balanced trees, with all clusters growing simultaneously.
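These tendencies can be reproduced with a quick sketch like the one below, which draws dendrograms side by side under several linkage methods (the synthetic data here is assumed and is not the dataset behind the trees shown above):

```python
# Sketch: compare dendrogram shapes under different linkage methods.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(1)
data = rng.normal(size=(30, 2))   # synthetic data for illustration

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, method in zip(axes, ('single', 'complete', 'average')):
    dendrogram(linkage(data, method=method), ax=ax, no_labels=True)
    ax.set_title(method + ' linkage')
plt.tight_layout()
plt.show()
```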
If you want to see the animation of the clustering process, try the next example:
It can be proved that the connections produced by single linkage over a 2D dataset actually form the minimum spanning tree of the dataset.
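This equivalence can be checked numerically. The sketch below (synthetic data assumed) compares the single-linkage merge distances with the edge weights of the minimum spanning tree computed over the same points:

```python
# Sketch: single-linkage merge distances vs. MST edge weights.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(2)
data = rng.random((15, 2))   # random 2-D points for illustration

# Single-linkage merge distances (third column of the linkage matrix)
merge_dists = np.sort(linkage(data, method='single')[:, 2])

# Edge weights of the minimum spanning tree over the complete distance graph
dist_matrix = squareform(pdist(data))
mst_weights = np.sort(minimum_spanning_tree(dist_matrix).data)

print(np.allclose(merge_dists, mst_weights))  # expected: True
```

The sorted merge distances and the sorted MST edge weights should match, which is exactly what the stated equivalence says.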
In general, the advantages of hierarchical clustering can be summarized as follows:
- The concept is quite straightforward, and the result can be represented by a tree structure.
- You only need the pairwise distances to start clustering; that is, you do not need to represent the objects in a cluster as points in a high-dimensional space (see the sketch below).

The major disadvantage is that hierarchical clustering can only handle a small amount of data, since all pairwise distances must be computed and the cost therefore grows quadratically with the dataset size.
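As a brief illustration of the second advantage, SciPy's linkage can work directly from a precomputed pairwise-distance matrix; the distances below are made up for this sketch:

```python
# Sketch: clustering from pairwise distances only (no coordinates needed).
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up symmetric distance matrix between 4 objects
D = np.array([[0.0, 1.0, 4.0, 5.0],
              [1.0, 0.0, 3.0, 4.5],
              [4.0, 3.0, 0.0, 1.5],
              [5.0, 4.5, 1.5, 0.0]])

Z = linkage(squareform(D), method='average')   # condensed form required
print(fcluster(Z, t=2, criterion='maxclust'))  # groups {0,1} and {2,3}
```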
Data Clustering and Pattern Recognition