Advertisement

Dynamic Programming and Optimal Control 第四章习题

阅读量:

3 Consider an inventory problem similar to the problem of Section 4.2 (zero fixed cost). The only difference is that at the beginning of each period k the decision maker, in addition to knowing the current inventory level x_k, receives an accurate forecast that the demand w_k will be selected in accordance with one out of two possible probability distributions P_{\ell},P_s (large demand, small demand). The a priori probability of a large demand forecast is known (d. Section 1.4).
(1) Obtain the optimal ordering policy for the case of a single-period problem.
(2) Extend the result to the N-period case.
(3) Extend the result to the case of any finite number of possible distributions.
Solution. (1) Assume the priori probability of P_{\ell},P_s is \,p_{\ell},p_s respectively.
J_1(x_1,y_1)=0 J_0(x_0,y_0)=p_s\cdot\min_{u_0}\left[cu_0+H(x_0+u_0-w_0)\right]+p_{\ell}\cdot\min_{u_0}\left[cu_0+H(x_0+u_0-w_0)\right] =cu_0+p\cdot p_s(x_0+u_0-w_0)-h\cdot p_{\ell}(x_0+u_0-w_0)\qquad\qquad =cu_0+\cdot (p\cdot p_s-h\cdot p_{\ell})(x_0+u_0-w_0)\qquad\qquad\;\;\;\;\;\qquad\qquad
?????????????

4.5 Consider the inventory problem of Section 4.2 for the case where the cost has the general form
\mathbf{E}\left\{\sum_{k=0}^Nr_k(x_k)\right\} The functions r_k are convex and differentiable and
\lim_{x\rightarrow -\infty}\frac{d r(x)}{dx}=-\infty,\qquad \lim_{x\rightarrow \infty}\frac{d r(x)}{dx}=\infty\qquad k=0,\cdots,N (1) Assume that the fixed cost is zero. Write the DP algorithm for this problem and show that the optimal ordering policy has the same form as the one derived in Section 4.2.
(2) Suppose there is a one-period time lag between the order and the delivery of inventory; that is, the system equation is of the form x_{k+1}=x_k+u_{k-1}-w_k,\qquad k=0,1,\cdots,N-1 where u_{-1} is given. Reformulate the problem so that it has the form of the problem of part (1).
Solution.

4.13 (A Gambling Problem) A gambler enters a game whereby he may at time k stake any amount u_l\ge 0 that does not exceed his current fortune x_k (defined to be his initial capital plus his gain or minus his loss thus far). He wins his stake back and as much more with probability p, 0<p<1, and he loses his stake with probability (1-p). Show that the gambling strategy that maximizes \mathbf{E}\{\ln x_N\}, where x_N denotes his fortune after N plays, is to stake at each time k an amount u_k=(2p-1)x_k.

Solution. The gambling system can be formulated as follows: x_{k+1}=x_k+w_ku_k where w_k is a random variable with probability p taking value 1, and probability (1-p) taking value -1. And the DP-algorithm can be written as: J_N(x_N)=\ln (x_N) J_k(x_k)=\mathop{\min}\limits_{u_k}\mathop\mathbf{E}\limits_{w_k}\left[J_{k+1}(x_k+w_ku_k)\right] For k=N-1, we have J_{N-1}(x_{N-1})=\mathop{\min}\limits_{u_k}\mathop\mathbf{E}\limits_{w_k}\left\{\ln[x_{N-1}+w_{N-1}x_{N-1}]\right\}\qquad\qquad\qquad\qquad\qquad\qquad =p\cdot \ln[x_{N-1}+u_{N-1}]+(1-p)\ln(x_{N-1}-u_{N-1}) Calculate the derivative at u_{N-1} of the right hand side, we obtain \frac{p}{x_{N-1}+u_{N-1}}-\frac{1-p}{x_{N-1}-u_{N-1}}=\frac{(2p-1)x_{N-1}-u_{N-1}}{x_{N-1}^2-u_{N-1}^2}=0 yielding u_{N-1}^*=(2p-1)x_{N-1}andJ_{N-1}^*=p\ln(2p)+(1-p)\ln(2-2p)+\ln x_{N-1}=C_p+\ln(2-2p) Using the similar augement and by induction, we can prove that for any k=0,1,\cdots,N-1, u_k^*=(2p-1)x_k and J_k^*(x_k)=C_k+\ln x_k for some constant C_k. \qquad\qquad\qquad\qquad\qquad\qquad\Box

全部评论 (0)

还没有任何评论哟~