Dynamic Programming and Optimal Control 第四章习题

阅读量：

3 Consider an inventory problem similar to the problem of Section 4.2 (zero fixed cost). The only difference is that at the beginning of each period $k$ the decision maker, in addition to knowing the current inventory level $x_k$ , receives an accurate forecast that the demand $w_k$ will be selected in accordance with one out of two possible probability distributions $P_{\ell},P_s$ (large demand, small demand). The a priori probability of a large demand forecast is known (d. Section 1.4).
(1) Obtain the optimal ordering policy for the case of a single-period problem.
(2) Extend the result to the $N$ -period case.
(3) Extend the result to the case of any finite number of possible distributions.
Solution. (1) Assume the priori probability of $P_{\ell},P_s$ is $\,p_{\ell},p_s$ respectively.
$J_1(x_1,y_1)=0$ $J_0(x_0,y_0)=p_s\cdot\min_{u_0}\left[cu_0+H(x_0+u_0-w_0)\right]+p_{\ell}\cdot\min_{u_0}\left[cu_0+H(x_0+u_0-w_0)\right]$ $=cu_0+p\cdot p_s(x_0+u_0-w_0)-h\cdot p_{\ell}(x_0+u_0-w_0)\qquad\qquad$ $=cu_0+\cdot (p\cdot p_s-h\cdot p_{\ell})(x_0+u_0-w_0)\qquad\qquad\;\;\;\;\;\qquad\qquad$
$?????????????$

4.5 Consider the inventory problem of Section 4.2 for the case where the cost has the general form
$\mathbf{E}\left\{\sum_{k=0}^Nr_k(x_k)\right\}$ The functions $r_k$ are convex and differentiable and
$\lim_{x\rightarrow -\infty}\frac{d r(x)}{dx}=-\infty,\qquad \lim_{x\rightarrow \infty}\frac{d r(x)}{dx}=\infty\qquad k=0,\cdots,N$ (1) Assume that the fixed cost is zero. Write the DP algorithm for this problem and show that the optimal ordering policy has the same form as the one derived in Section 4.2.
(2) Suppose there is a one-period time lag between the order and the delivery of inventory; that is, the system equation is of the form $x_{k+1}=x_k+u_{k-1}-w_k,\qquad k=0,1,\cdots,N-1$ where $u_{-1}$ is given. Reformulate the problem so that it has the form of the problem of part (1).
Solution.

4.13 (A Gambling Problem) A gambler enters a game whereby he may at time $k$ stake any amount $u_l\ge 0$ that does not exceed his current fortune $x_k$ (defined to be his initial capital plus his gain or minus his loss thus far). He wins his stake back and as much more with probability $p$ , $0<p<1$ , and he loses his stake with probability $(1-p)$ . Show that the gambling strategy that maximizes $\mathbf{E}\{\ln x_N\}$ , where $x_N$ denotes his fortune after $N$ plays, is to stake at each time $k$ an amount $u_k=(2p-1)x_k$ .

Solution. The gambling system can be formulated as follows: $x_{k+1}=x_k+w_ku_k$ where $w_k$ is a random variable with probability $p$ taking value $1$ , and probability $(1-p)$ taking value $-1$ . And the DP-algorithm can be written as: $J_N(x_N)=\ln (x_N)$ $J_k(x_k)=\mathop{\min}\limits_{u_k}\mathop\mathbf{E}\limits_{w_k}\left[J_{k+1}(x_k+w_ku_k)\right]$ For $k=N-1$ , we have $J_{N-1}(x_{N-1})=\mathop{\min}\limits_{u_k}\mathop\mathbf{E}\limits_{w_k}\left\{\ln[x_{N-1}+w_{N-1}x_{N-1}]\right\}\qquad\qquad\qquad\qquad\qquad\qquad$ $=p\cdot \ln[x_{N-1}+u_{N-1}]+(1-p)\ln(x_{N-1}-u_{N-1})$ Calculate the derivative at $u_{N-1}$ of the right hand side, we obtain $\frac{p}{x_{N-1}+u_{N-1}}-\frac{1-p}{x_{N-1}-u_{N-1}}=\frac{(2p-1)x_{N-1}-u_{N-1}}{x_{N-1}^2-u_{N-1}^2}=0$ yielding $u_{N-1}^*=(2p-1)x_{N-1}$ and $J_{N-1}^*=p\ln(2p)+(1-p)\ln(2-2p)+\ln x_{N-1}=C_p+\ln(2-2p)$ Using the similar augement and by induction, we can prove that for any $k=0,1,\cdots,N-1$ , $u_k^*=(2p-1)x_k$ and $J_k^*(x_k)=C_k+\ln x_k$ for some constant $C_k$ . $\qquad\qquad\qquad\qquad\qquad\qquad\Box$

全部评论 (0)

还没有任何评论哟~

Dynamic Programming and Optimal Control 第四章习题

3ConsideraninventoryproblemsimilartotheproblemofSection4.2zerofixedcost.Theonlydifferenceisthatatthe...

Dynamic Programming and Optimal Control 第一章习题

1Considerthesystemxk+1=xk+uk+wk,\qquadk=0,1,2,3withinitialstatex0=5,andthecostfunction\sumk=0^3xk^2+...

Dynamic Programming and Optimal Control 第三章习题

1SolvetheproblemofExample3.2.1forthecasewherethecostfunctionisxT^2+\int0^Tut^2dtAlso,calculatethecos...

四、Dynamic-programming algorithm Dynamic--LCS

学习笔记，无什参考价值！ 1问题 2算法 2.1BruteforceLCSalgorithm 检查每一个subsequence是否是y的子列时，遍历y的每一个元素，看是否依次可以全部覆盖subsequ...

Speeding Up Dynamic Programming Computation: Tips and

作者：禅与计算机程序设计艺术 1.简介动态规划Dynamicprogramming是一种解决最优化问题的关键算法。它通过将子问题的解重复计算而节省时间。对于多种问题都可以用动态规划求解。动态规划算法...

RL学习日志：（Reinforcement Learning for Sequential Decision and Optimal Control）Day1

HistoryofRL backgrand DynamicProgramming TrialandErrorLearning ExamplesofRLApplications TicTacToe Ch...

RL学习日志：（Reinforcement Learning for Sequential Decision and Optimal Control）Day3

ClassificationofRLMethods preface DefinitionofRLProblems Bellman'sPrincipleofOptimality TwoKindsofBe...

RL学习日志：（Reinforcement Learning for Sequential Decision and Optimal Control）Day2

FourElementsofRLProblems preface EnvironmentModel StateActionSample Policy TabularPolicy Parameteriz...

Dynamic Programming and State Compression (LeetCode #2209 & LeetCode #2212)

LeetCode2209 Youaregivena0indexedbinarystringfloor,whichrepresentsthecolorsoftilesonafloor: floor[i]...

刷题计划——动态规划dynamic programming(四)

322\.零钱兑换中等题目：给定不同面额的硬币coins和一个总金额amount。编写一个函数来计算可以凑成总金额所需的最少的硬币个数。如果没有任何一种硬币组合能组成总金额，返回1。

是否确定退出登录?

Dynamic Programming and Optimal Control 第四章习题

全部评论 (0)

相关文章推荐

Dynamic Programming and Optimal Control 第四章习题

Dynamic Programming and Optimal Control 第一章习题

Dynamic Programming and Optimal Control 第三章习题

四、Dynamic-programming algorithm Dynamic--LCS

Speeding Up Dynamic Programming Computation: Tips and

RL学习日志：（Reinforcement Learning for Sequential Decision and Optimal Control）Day1

RL学习日志：（Reinforcement Learning for Sequential Decision and Optimal Control）Day3

RL学习日志：（Reinforcement Learning for Sequential Decision and Optimal Control）Day2

Dynamic Programming and State Compression (LeetCode #2209 & LeetCode #2212)

刷题计划——动态规划dynamic programming(四)