6-1 Intro. to Recognition Rate Estimate of Classifiers (簡)

(Note: the Chinese version is not updated in sync with the English version!)
Once we have constructed a classifier using a certain pattern recognition method, we need to evaluate its performance objectively, that is, to estimate in an effective way how well the classifier can be expected to perform. The performance evaluation of a classifier usually involves two factors:

Recognition rate: The higher, the better. The recognition rate is the probability that a sample is classified correctly. Some people prefer to use the error rate, which is the probability that a sample is misclassified. The error rate equals 1 minus the recognition rate, so the two always sum to 100%.
Computation load: The lower, the better. In fact, there are two types of computation load:

Computation load at the design stage
Computation load at the application (recognition) stage
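The relation between the recognition rate and the error rate can be illustrated with a minimal Python sketch. The function names and the label lists below are hypothetical and chosen only for illustration; they are not taken from the original text.

```python
def recognition_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def error_rate(true_labels, predicted_labels):
    """Error rate, defined as 1 minus the recognition rate."""
    return 1.0 - recognition_rate(true_labels, predicted_labels)


# Hypothetical labels for eight samples (illustrative data only).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

rr = recognition_rate(y_true, y_pred)
er = error_rate(y_true, y_pred)
print(f"recognition rate = {rr:.3f}, error rate = {er:.3f}")
```

Six of the eight hypothetical predictions match the true labels, so the sketch reports a recognition rate of 0.750 and an error rate of 0.250.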

The computation load depends heavily on the underlying classifier, and we shall not go into its details in this chapter. Instead, this chapter focuses on several methods for estimating the true recognition rate of a given classifier on a given dataset.
Moreover, for a simple binary classification problem, the misclassified cases can be divided into two types: false positives and false negatives. We shall also address how to select a threshold for the classifier based on the costs of false positives and false negatives.
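As a rough illustration of cost-based threshold selection, the Python sketch below assumes that the classifier outputs a score for each sample and predicts "positive" when the score is at or above a threshold. The scores, labels, cost values, and function names are all hypothetical; the exhaustive search over observed scores is one simple possibility, not necessarily the method used later in this chapter.

```python
def count_errors(scores, labels, threshold):
    """Count false positives and false negatives.

    A sample is predicted positive when its score >= threshold.
    """
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Return (threshold, total cost) with the lowest total cost."""
    # Candidates: every observed score, plus one value above the maximum
    # so that "predict everything negative" is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    best = None
    for t in candidates:
        fp, fn = count_errors(scores, labels, t)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

# False negatives are five times as costly as false positives.
t_fn_heavy, cost_fn_heavy = best_threshold(scores, labels, cost_fp=1, cost_fn=5)
# False positives are five times as costly as false negatives.
t_fp_heavy, cost_fp_heavy = best_threshold(scores, labels, cost_fp=5, cost_fn=1)

print(f"costly false negatives: threshold = {t_fn_heavy}, cost = {cost_fn_heavy}")
print(f"costly false positives: threshold = {t_fp_heavy}, cost = {cost_fp_heavy}")
```

With these hypothetical data, making false negatives expensive pushes the chosen threshold down to 0.35 (no positives are missed), whereas making false positives expensive pushes it up to 0.8 (no negatives are accepted).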
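In the real world, all sample data are finite, and collecting data takes time and manpower, so sample data are valuable. The more sample data we have, the more accurate the classifier we design is likely to be. However, in order to test the performance of the designed classifier, the design flow of a pattern recognition system divides the sample data into two parts:

Training data: Also called "design data". We use this data to design the classifier.
Test data: We use this data to test the performance of the classifier.
Different ways of splitting the data correspond to different methods of error rate estimation, as detailed in the following sections.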
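The idea of splitting the sample data can be sketched in Python as follows. The synthetic one-dimensional data, the 70/30 split, the nearest-mean classifier, and all function names are assumptions made for illustration; they are not prescribed by the text, and other ways of splitting the data lead to other estimates.

```python
import random


def split_data(samples, test_fraction, seed=0):
    """Shuffle the samples and split them into (training, test) lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_nearest_mean(training):
    """Design stage: compute the mean feature value of each class."""
    sums, counts = {}, {}
    for x, label in training:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(means, x):
    """Application stage: pick the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


# Synthetic one-dimensional data: class 0 centered at 0, class 1 at 3.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(100)]

# Use the training data to design the classifier and the test data,
# which the classifier has never seen, to estimate its recognition rate.
training, test = split_data(data, test_fraction=0.3)
means = train_nearest_mean(training)
correct = sum(classify(means, x) == label for x, label in test)
test_rate = correct / len(test)

print(f"{len(training)} training samples, {len(test)} test samples")
print(f"estimated recognition rate on test data: {test_rate:.3f}")
```

Because the test samples play no part in computing the class means, the reported rate estimates how the classifier behaves on unseen data rather than how well it fits the data used to design it.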
Data Clustering and Pattern Recognition (資料分群與樣式辨認)