8-1 Introduction to Dynamic Programming (������������)

[english][all]

(½Ðª`·N¡G¤¤¤åª©¥»¨Ã¥¼ÀH­^¤åª©¥»¦P¨B§ó·s¡I)

Slides

¡u°ÊºA³W¹º¡v¡]Dynamic Programming¡A²ºÙ DP¡^¬O¤@­Ó«Ü¦³®Äªº¤èªk¨Ó¨D±o¤@­Ó°ÝÃDªº³Ì¨Î¸Ñ¡ADP ªººë¯«¬O¨Ó¦Û©ó Richard Bellman ©Ò´£¥Xªº Principle of Optimality¡G

An optimal policy has the property that whatever the initial state and the initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

²³æ¦a»¡¡A´N¬O¦b¤@±ø³Ì¨Î¸ô®|¤W¡A¨ä¤¤¥ô¤@±ø¤l¸ô®|¡]Partial Path¡^¤]³£¥²¶·¬O¬ÛÃö¤l°ÝÃDªº³Ì¨Î¸ô®|¡A§_«h­ì¸ô®|´N¤£¬O³Ì¨Î¸ô®|¡C³o¬O¤@­Ó«Ü©úÅ㪺¹D²z¡A¥i¥H«Üª½Ä±¦a¨Ï¥Î¥Ù¬Þªk¨Ó¥[¥HÃÒ©ú¡A¥i¬O¦b¹ê»ÚÀ³¥Î¤W­±¡A¥i¥H¦³³\¦h½ÆÂøªºÅܧΡC

To employ the procedure of DP to solve a problem, usually we need to specify the following items:

  1. Define the optimum-value function.
  2. Derive the recursive formula for the optimum-value function, together with its initial condition.
  3. Specify the answer of the problem in term of the optimum-value function.

Á|¨Ò¨Ó»¡¡A¥H¤U¦Cªº¹Ï§Î¬°¨Ò¡G

°²³]¡G

°²³]°_ÂI¬O Start¡A²×ÂI¬O Node 7¡A½Ð°Ý§Ú­Ì¦p¦ó§ä¨ì¤@±ø¸ô®|¡A¨Ï±o±q°_ÂI¨ì²×ÂI©Òªáªº®É¶¡³Ìµu¡H³o¬O¤@­Ó«Ü¨å«¬ªº DP °ÝÃD¡A§Ú­Ì¥i¥H¸g¥Ñ¤U¦C¤T­Ó¨BÆJ¨Ó¨³³t¦a¸Ñ¨M³o­Ó°ÝÃD¡G

  1. First of all, we can define the optimum-value function t(h) as the minimum time from the start point to node h (including the time for passing node h).
  2. Secondly, the optimum-value function should satisfy the following recursive formula:
    t(h) = min{t(a)+p(a,h), t(b)+p(b,h), t(c)+p(c,h)} + q(h)

    ¦b¤W­z¤èµ{¦¡¤¤¡A§Ú­Ì°²³]³s±µ¨ì Node h ªº¸`ÂI¦³¤T­Ó¡A¤À§O¬O a, b, c¡C¥Ü·N¹Ï¦p¤U¡G

    And the initial condition is t(0)=0 where 0 is the start node.
  3. And finally, the answer to the original problem should be t(7). By using the recursion, we can have the following steps for computing the time required by the optimum path:
    1. t(0) = 0
    2. t(1) = t(0)+4+5 = 9
    3. t(2) = t(0)+2+4 = 6
    4. t(3) = t(0)+3+1 = 4
    5. t(4) = min(9+3, 6+2)+2 = 10
    6. t(5) = min(6+5, 4+1)+4 = 9
    7. t(6) = min(6+5, 4+8)+6 = 17
    8. t(7) = min(10+1, 9+3, 17+3)+1 = 12
    The value of the optimum-value function is shown as the red number in each node in the following figure:

    Hint
    §A¥i¥HÂI¿ï¤W¹Ï¥H¶}±Ò·sµøµ¡¡AµM«á´N¥i¥H¨Ï¥Î·Æ¹«ÂI¿ï·sµøµ¡¡A¬Ý¨ì DP ³v¨B­pºâªºµ²ªG¡C

The above formulation of DP is usually referred to as the forward DP. On the other hand, we can define the formula for backward DP in a similar manner:
  1. Firstly, the optimum-value function s(h) is defined as the minimum time from a node h to the end node (including the time for passing the end node but excluding the time for passing node h.)
  2. Secondly, the optimum-value function should satisfy the following recursive formula:
    s(h) = q(h) + min{p(h,a)+s(a), p(h,b)+s(b), p(h,c)+s(c)}
    where a, b, c are the fan-out of node h. The initial condition is s(7) = q(7).
  3. Finally, the answer to the original problem is s(0). By using the recursion, we can have the following steps for computing the time required by the optimum path:
    1. s(7) = 1
    2. s(6) = q(6)+3+s(7) = 6+3+1 = 10
    3. s(5) = q(5)+3+s(7) = 4+3+1 = 8
    4. s(4) = q(4)+1+s(7) = 2+1+1 = 4
    5. s(3) = q(3)+min{1+s(5), 8+s(6)} = 1 + min{9, 18} = 10
    6. s(2) = q(2)+min{2+s(4), 5+s(5), 5+s(6)} = 4 + min{6, 13, 15} = 10
    7. s(1) = q(1)+3+s(4) = 5 + 3 + 4 = 12
    8. s(0) = min{4+s(1), 2+s(2), 3+s(3)} = min(16, 12, 13} = 12
    The answer obtained from backward DP is the same as that of the forward DP.

´«¥y¸Ü»¡¡A¦b§Ú­Ì­pºâ DP ªº¹Lµ{¤¤¡A¥i¥H¤ÏÂЧQ¥Î¥ý«e©Ò­pºâ¥X¨Óªºµ²ªG¡A¦Ó¬Ù±¼³\¦h¤£¥²­nªº¹Bºâ¡CDP ªº¯S©Ê¡A¥i¦C¥X¦p¤U¡G

  1. ¦b­pºâ DP ªº¹Lµ{¤¤¡A¥²¶·¦b¨C­Ó¸`ÂI°O¿ý³Ì¨Î¸ô®|ªº¨Ó·½¡A¤~¯à¦b¨«¨ì²×ÂI«á¡A¸g¥Ñ¦^·¹¡]Back Tracking¡^¨Ó§ä¨ì¾ã­Ó¹Lµ{ªº³Ì¨Î¸ô®|¡C
  2. ­Y­n´M¨D²Ä¤G³Ì¨Î¸ô®|¡A«h¥²¶·¨Ï¥Î¸û½ÆÂøªº Top-N ­pºâ¤èªk¡A¦Ó¤£¯à¥u¾a¤W­z²³æªº DP ºâªk¡C

Data Clustering and Pattern Recognition (¸ê®Æ¤À¸s»P¼Ë¦¡¿ë»{)