8-1 Introduction to Dynamic Programming (妧AW)

[chinese][english]

(請注意:中文版本並未隨英文版本同步更新!)

Slides

Dynamic programming (DP) is an effective method for finding an optimum solution to a multi-stage decision problem, as long as the solution can be defined recursively. DP roots in the principle of optimality proposed by Richard Bellman:

「動態規劃」(Dynamic Programming,簡稱 DP)是一個很有效的方法來求得一個問題的最佳解,DP 的精神是來自於 Richard Bellman 所提出的 Principle of Optimality:

An optimal policy has the property that whatever the initial state and the initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

In terms of path finding problems, the principle of optimality states that any partial path of the optimum path is also an optimum path given the starting and ending nodes. This is an obvious principle that can be proved by contradiction. In practice, a number of seemingly unrelated applications can be effectively solved by DP.

簡單地說,就是在一條最佳路徑上,其中任一條子路徑(Partial Path)也都必須是相關子問題的最佳路徑,否則原路徑就不是最佳路徑。這是一個很明顯的道理,可以很直覺地使用矛盾法來加以證明,可是在實際應用上面,可以有許多複雜的變形。

To employ the procedure of DP to solve a problem, usually we need to specify the following items:

  1. Define the optimum-value function.
  2. Derive the recursive formula for the optimum-value function, together with its initial condition.
  3. Specify the answer of the problem in term of the optimum-value function.

We shall give an example of DP for a path finding problem shown in the following graph:

舉例來說,以下列的圖形為例:

In the above graph, we assume that:

假設:

From the start point, how can we find a path that requires the minimum time to pass through node 7? This is a typical problem of DP which can be solved systematically and efficiently by the following three steps:

假設起點是 Start,終點是 Node 7,請問我們如何找到一條路徑,使得從起點到終點所花的時間最短?這是一個很典型的 DP 問題,我們可以經由下列三個步驟來迅速地解決這個問題:

  1. First of all, we can define the optimum-value function t(h) as the minimum time from the start point to node h (including the time for passing node h).
  2. Secondly, the optimum-value function should satisfy the following recursive formula:
    t(h) = min{t(a)+p(a,h), t(b)+p(b,h), t(c)+p(c,h)} + q(h)

    In the above equation, we assume the fan-ins of node h are a, b, and c. Please refer to the next figure.

    在上述方程式中,我們假設連接到 Node h 的節點有三個,分別是 a, b, c。示意圖如下:

    And the initial condition is t(0)=0 where 0 is the start node.
  3. And finally, the answer to the original problem should be t(7). By using the recursion, we can have the following steps for computing the time required by the optimum path:
    1. t(0) = 0
    2. t(1) = t(0)+4+5 = 9
    3. t(2) = t(0)+2+4 = 6
    4. t(3) = t(0)+3+1 = 4
    5. t(4) = min(9+3, 6+2)+2 = 10
    6. t(5) = min(6+5, 4+1)+4 = 9
    7. t(6) = min(6+5, 4+8)+6 = 17
    8. t(7) = min(10+1, 9+3, 17+3)+1 = 12
    The value of the optimum-value function is shown as the red number in each node in the following figure:

    Hint
    You can open a new window by clicking the above figure, and then click the newly opened window to see each step of the recursion by DP.

    Hint
    你可以點選上圖以開啟新視窗,然後就可以使用滑鼠點選新視窗,看到 DP 逐步計算的結果。

The above formulation of DP is usually referred to as the forward DP. On the other hand, we can define the formula for backward DP in a similar manner:
  1. Firstly, the optimum-value function s(h) is defined as the minimum time from a node h to the end node (including the time for passing the end node but excluding the time for passing node h.)
  2. Secondly, the optimum-value function should satisfy the following recursive formula:
    s(h) = q(h) + min{p(h,a)+s(a), p(h,b)+s(b), p(h,c)+s(c)}
    where a, b, c are the fan-out of node h. The initial condition is s(7) = q(7).
  3. Finally, the answer to the original problem is s(0). By using the recursion, we can have the following steps for computing the time required by the optimum path:
    1. s(7) = 1
    2. s(6) = q(6)+3+s(7) = 6+3+1 = 10
    3. s(5) = q(5)+3+s(7) = 4+3+1 = 8
    4. s(4) = q(4)+1+s(7) = 2+1+1 = 4
    5. s(3) = q(3)+min{1+s(5), 8+s(6)} = 1 + min{9, 18} = 10
    6. s(2) = q(2)+min{2+s(4), 5+s(5), 5+s(6)} = 4 + min{6, 13, 15} = 10
    7. s(1) = q(1)+3+s(4) = 5 + 3 + 4 = 12
    8. s(0) = min{4+s(1), 2+s(2), 3+s(3)} = min(16, 12, 13} = 12
    The answer obtained from backward DP is the same as that of the forward DP.

The efficiency of DP is based on the fact that we can use some previous results iteratively. Some common characteristics of DP are summarized as follows:

換句話說,在我們計算 DP 的過程中,可以反覆利用先前所計算出來的結果,而省掉許多不必要的運算。DP 的特性,可列出如下:

  1. 在計算 DP 的過程中,必須在每個節點記錄最佳路徑的來源,才能在走到終點後,經由回溯(Back Tracking)來找到整個過程的最佳路徑。
  2. 若要尋求第二最佳路徑,則必須使用較複雜的 Top-N 計算方法,而不能只靠上述簡單的 DP 算法。

Data Clustering and Pattern Recognition (資料分群與樣式辨認)