[english][all] (Ъ`NG媩åH^媩PBsI)
Slides
uʺAWv]Dynamic ProgrammingA² DP^O@ӫܦĪkӨDo@ӰD̨θѡADP 믫OӦ۩ Richard Bellman ҴX Principle of OptimalityG
An optimal policy has the property that whatever the initial state and the initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.²aANOb@̨θ|WA䤤@l|]Partial Path^]OlD̨θ|A_h|NǪθ|CoO@ӫܩ㪺DzAiHܪıaϥΥ٬ުkӥ[HҩAiObΤWAiH\hܧΡC
To employ the procedure of DP to solve a problem, usually we need to specify the following items:
- Define the optimum-value function.
- Derive the recursive formula for the optimum-value function, together with its initial condition.
- Specify the answer of the problem in term of the optimum-value function.
|ҨӻAHUCϧάҡG
]G
- C@s]Link^N@AsWƦrhNqLһݪɶCڭ̨ϥ p(a, b) ӥNqLs Node a M Node b һݪɶC
- C@Ӹ`I]Node^N@ӫA`IWƦrhNqL{һݪɶCڭ̨ϥ q(a) ӥNqL Node a һݪɶC
]_IO StartAIO Node 7Aаݧڭ̦p@|Aϱoq_IIҪ᪺ɶ̵uHoO@ӫܨ嫬 DP DAڭ̥iHgѤUCTӨBJӨtaѨMoӰDG
The above formulation of DP is usually referred to as the forward DP. On the other hand, we can define the formula for backward DP in a similar manner:
- First of all, we can define the optimum-value function t(h) as the minimum time from the start point to node h (including the time for passing node h).
- Secondly, the optimum-value function should satisfy the following recursive formula:
t(h) = min{t(a)+p(a,h), t(b)+p(b,h), t(c)+p(c,h)} + q(h) bWz{Aڭ̰]s Node h `ITӡAOO a, b, cCܷNϦpUG
And the initial condition is t(0)=0 where 0 is the start node. - And finally, the answer to the original problem should be t(7). By using the recursion, we can have the following steps for computing the time required by the optimum path:
The value of the optimum-value function is shown as the red number in each node in the following figure:
- t(0) = 0
- t(1) = t(0)+4+5 = 9
- t(2) = t(0)+2+4 = 6
- t(3) = t(0)+3+1 = 4
- t(4) = min(9+3, 6+2)+2 = 10
- t(5) = min(6+5, 4+1)+4 = 9
- t(6) = min(6+5, 4+8)+6 = 17
- t(7) = min(10+1, 9+3, 17+3)+1 = 12
- Firstly, the optimum-value function s(h) is defined as the minimum time from a node h to the end node (including the time for passing the end node but excluding the time for passing node h.)
- Secondly, the optimum-value function should satisfy the following recursive formula:
s(h) = q(h) + min{p(h,a)+s(a), p(h,b)+s(b), p(h,c)+s(c)} where a, b, c are the fan-out of node h. The initial condition is s(7) = q(7).- Finally, the answer to the original problem is s(0). By using the recursion, we can have the following steps for computing the time required by the optimum path:
The answer obtained from backward DP is the same as that of the forward DP.
- s(7) = 1
- s(6) = q(6)+3+s(7) = 6+3+1 = 10
- s(5) = q(5)+3+s(7) = 4+3+1 = 8
- s(4) = q(4)+1+s(7) = 2+1+1 = 4
- s(3) = q(3)+min{1+s(5), 8+s(6)} = 1 + min{9, 18} = 10
- s(2) = q(2)+min{2+s(4), 5+s(5), 5+s(6)} = 4 + min{6, 13, 15} = 10
- s(1) = q(1)+3+s(4) = 5 + 3 + 4 = 12
- s(0) = min{4+s(1), 2+s(2), 3+s(3)} = min(16, 12, 13} = 12
yܻAbڭ̭p DP L{AiHЧQΥeҭpXӪGAӬٱ\hnBCDP SʡAiCXpUG
- bp DP L{AbCӸ`IǪθ|ӷA~bIAgѦ^]Back Tracking^ӧӹL{̨θ|C
- YnMDĤG̨θ|Ahϥθ Top-N pkAӤuaWz²檺 DP kC
Data Clustering and Pattern Recognition (ƤsP˦{)