3-1 Introduction

The objective of data clustering is to identify clusters within a given dataset such that similar data instances are likely to fall in the same cluster. The original dataset is thus decomposed into disjoint (or fuzzy) clusters, each represented by a center. We can use these cluster centers (also known as centroids or prototypes) to represent the original dataset and achieve the following goals:

Data visualization
Data compression
Noise suppression
Computation reduction

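Put another way, data clustering (or a clustering algorithm) is a method for partitioning data into groups. Its main purpose is to find groups of similar data points (clusters) and a representative point for each one, called the centroid or prototype. Using these few representative points in place of the original large set of data points is what makes the goals listed above attainable.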
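
As a rough illustration of representing a dataset by its cluster centers, the sketch below runs a few k-means iterations on six 2-D points. This section does not name a specific algorithm, so the choice of k-means is an assumption, and the function names (`nearest`, `kmeans`), the sample points, and the initial centers are all hypothetical. It uses only the Python standard library.

```python
import math

def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))

def kmeans(points, centers, iterations=20):
    """Minimal k-means sketch; `centers` holds the initial guesses."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    # Recompute labels so they match the centers that are returned.
    return centers, [nearest(p, centers) for p in points]

# Hypothetical data: two loose groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
          (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centers, labels = kmeans(points, centers=[points[0], points[3]])

# Data compression: store one center index per point instead of the point.
compressed = [centers[lab] for lab in labels]
print(len(points), "points represented by", len(centers), "centers")
```

Here each original point is replaced by the center of its cluster, so later computation can work with two centers instead of six points. The result depends on the initial centers, which are simply picked by hand in this sketch.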

Data Clustering and Pattern Recognition (資料分群與樣式辨認)