3-1 Introduction

The objective of data clustering is to identify clusters within a given dataset, such that similar data instances are likely to fall within the same cluster. The original dataset is thus decomposed into disjoint (or fuzzy) clusters, each with a center that represents the cluster. We can use the cluster centers (also known as centroids or prototypes) to represent the original dataset to achieve the following goals:

In general, clustering algorithms can be divided into two types: hierarchical clustering and partitional clustering.

Each data clustering task follows a similar procedure:

  1. Collect the dataset.
  2. Apply a certain clustering algorithm to get clustering results.
  3. Test the clustering results.
  4. If the test passes, stop. Otherwise, go back to step 2 and repeat the clustering process.
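The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method: k-means stands in for "a certain clustering algorithm", the test in step 3 is assumed to be a mean squared distance threshold, and "repeat the clustering process" is assumed to mean retrying with one more cluster. The names `kmeans`, `mean_distortion`, `cluster_until_acceptable`, and the `threshold` value are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers are k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers


def mean_distortion(points, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)


def cluster_until_acceptable(points, threshold, max_k=10):
    """Steps 2 to 4: cluster, test, and retry with a larger k until the test passes."""
    for k in range(1, max_k + 1):
        centers = kmeans(points, k)               # step 2: apply the algorithm
        score = mean_distortion(points, centers)  # step 3: test the result
        if score <= threshold:                    # step 4: stop if the test passes
            return True, k, centers, score
    return False, max_k, centers, score


# Step 1: collect the dataset (a toy set with two well-separated groups).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
passed, k, centers, score = cluster_until_acceptable(points, threshold=0.5)
print(passed, k, sorted(centers), score)
```

On this toy dataset a single cluster fails the assumed threshold, so the loop returns to step 2 and stops once two clusters are used. In practice the test in step 3 depends on the application, and the retry may change the algorithm or its parameters rather than only the number of clusters.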

Vector quantization (VQ) is a specific field of data clustering that emphasizes the algorithmic aspect of minimizing a distortion measure over a large amount of data. VQ is commonly used for image and audio data. Its goal is similar to that of partitional clustering, while its process is similar to that of hierarchical clustering.
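To make the idea of a distortion measure concrete, here is a minimal sketch of quantizing vectors against a fixed codebook. It assumes mean squared error as the distortion measure, which the text does not specify, and the function names `encode`, `decode`, and `distortion` as well as the sample codebook are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vectors, codebook):
    """Replace each vector by the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(v, codebook[i]))
        for v in vectors
    ]


def decode(indices, codebook):
    """Reconstruct an approximation of the data from codeword indices."""
    return [codebook[i] for i in indices]


def distortion(vectors, codebook):
    """Mean squared error between the vectors and their reconstructions."""
    reconstructed = decode(encode(vectors, codebook), codebook)
    return sum(squared_distance(v, r) for v, r in zip(vectors, reconstructed)) / len(vectors)


# A hypothetical two-codeword codebook and four vectors to quantize.
codebook = [(0.0, 0.0), (4.0, 4.0)]
vectors = [(0.0, 1.0), (1.0, 0.0), (4.0, 3.0), (5.0, 4.0)]

print(encode(vectors, codebook))      # [0, 0, 1, 1]
print(distortion(vectors, codebook))  # 1.0
```

Storing only the indices plus the small codebook is what makes VQ useful for compression. A VQ design algorithm would then search for the codebook that makes this distortion as small as possible for a given codebook size.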


Data Clustering and Pattern Recognition (資料分群與樣式辨認)