[chinese][english] (請注意:中文版本並未隨英文版本同步更新!)
We can use three basic operations of "delete", "insert", and "substitute" to convert a string into another one. The edit distance between two strings is defined as the minimum number of the basic operations that are required to converting a string into another. Similarly, we can also define the costs of these operations and then find the required operations for conversion one string to another with the minimum cost.
ED 是另一種計算計算兩個字串之間距離的方法,其目標是在計算第一個字串要經過多少次的刪除(Delete)、插入(Insert)、代換(Substitute)等三個基本字串運算,才能得到第二個字串。
We can also use the concept of DP to compute the edit distance between two strings. Let the optimum-value function ED(a, b) defined as the edit distance between strings a and b. Then the recursive formula for ED is shown next, assuming the costs of "delete" and "insert" are 1, while "substitute" is 2.
The boundary conditions are ED(a, []) = len(a), ED([], b) = len(b). 相關的邊界條件則是:ED(a, []) = len(a), ED([], b) = len(b)。
- ED(ax, by) = ED(a, b) if x = y.
- ED(ax, by) = min(ED(a, b)+2, ED(ax, b)+1, ED(a, by)+1) if x ≠ y.
The following example demonstrates a typical result of edit distance of strings "prosperity" and "properties":
As shown in the plot, the circle positions indicate common characters.The next example demonstrates the result of edit distance between "execution" and "intention", which is the example used in 3.11 of "Speech and Language Processing".
Data Clustering and Pattern Recognition (資料分群與樣式辨認)![]()