
Dynamic Programming and Optimal Control: Chapter 1 Exercises


1 Consider the system x_{k+1}=x_k+u_k+w_k,\qquad k=0,1,2,3, with initial state x_0=5, and the cost function \sum_{k=0}^3(x_k^2+u_k^2). Apply the DP algorithm for the following three cases:
(a) The control constraint set U_k(x_k) is \{u\,|\,0\leq x_k+u\leq 5,\ u \text{: integer}\} for all x_k and k, and the disturbance w_k is equal to zero for all k.
(b) The control constraint and the disturbance w_k are as in (a), but there is in addition a constraint x_4=5 on the final state.
(c) The control constraint is as in part (a), and the disturbance w_k takes the values -1 and 1 with equal probability 1/2 for all x_k and u_k, except if x_k+u_k is equal to 0 or 5, in which case w_k=0 with probability 1.
Solution.
(a) Here N=4 and we take J_4(x_4)=g_4(x_4)=0. The system equation is x_{k+1}=x_k+u_k+w_k,\qquad k=0,1,2,3, and the stage cost is g_k=x_k^2+u_k^2. In this case w_k=0 for all k, so J_3(x_3)=\mathop{\min}\limits_{u_3\in U_3(x_3)} \mathop\mathbf{E}\limits_{w_3}\{g_3(x_3,u_3,w_3)+J_4(f_3(x_3,u_3,w_3))\}=\mathop{\min}\limits_{u_3=-x_3,1-x_3,\ldots,5-x_3}(x_3^2+u_3^2)=x_3^2 (for the reachable integer states 0\leq x_3\leq 5, where u_3=0 is feasible), i.e.
J_3(x_3)=x_3^2, \qquad \mu_3^*(x_3)=0.\qquad (1)
Then J_2(x_2)=\mathop{\min}\limits_{u_2=-x_2,1-x_2,\ldots,5-x_2}(x_2^2+u_2^2+(x_2+u_2)^2). Let h_{x_2}(u_2)=x_2^2+u_2^2+(x_2+u_2)^2; then h'_{x_2}(u_2)=4u_2+2x_2.

If x_2>10, then h'_{x_2}(u_2)=4(u_2+x_2)-2x_2\leq 20-2x_2<0 on the feasible set (since x_2+u_2\leq 5), so \mu_2^*(x_2)=5-x_2\;,\;J_2(x_2)=x_2^2+(5-x_2)^2+5^2=2x_2^2-10x_2+50.

If x_2<0, then h'_{x_2}(u_2)=4(u_2+x_2)-2x_2\geq-2x_2>0 (since x_2+u_2\geq 0), so \mu_2^*(x_2)=-x_2\;,\;J_2(x_2)=2x_2^2.

If 0\leq x_2\leq 10 and x_2 is even, then \mu_2^*(x_2)=-\frac{x_2}{2} and J_2(x_2) = \frac{3}{2}x_2^2.

If 0\leq x_2\leq 10 and x_2 is odd, then \mu_2^*(x_2)=-\frac{x_2-1}{2}\;\text{or}\;-\frac{x_2+1}{2} and J_2(x_2) = \frac{3}{2}x_2^2+\frac{1}{2}.

So
\begin{cases} \mu_2^*(x_2)=-x_2\;,\;J_2(x_2)=2x_2^2 & \qquad\text{if }\; x_2<0 \\ \mu_2^*(x_2)=-\frac{x_2}{2}\;,\;J_2(x_2) = \frac{3}{2}x_2^2 & \qquad\text{if }\;0\leq x_2\leq 10 \;\text{and} \;x_2\; \text{is even}\\ \mu_2^*(x_2)=-\frac{x_2-1}{2}\;\text{or}\;-\frac{x_2+1}{2}\;,\;J_2(x_2) = \frac{3}{2}x_2^2+\frac{1}{2} & \qquad\text{if }\;0\leq x_2\leq 10 \;\text{and} \;x_2\; \text{is odd}\\ \mu_2^*(x_2)=5-x_2\;,\;J_2(x_2)=x_2^2+(5-x_2)^2+5^2=2x_2^2-10x_2+50 & \qquad\text{if }\; x_2>10 \end{cases}

For k=1,
J_1(x_1)=\mathop{\min}\limits_{u_1=-x_1,1-x_1,\ldots,5-x_1}(x_1^2+u_1^2+J_2(x_1+u_1))=\mathop{\min}\limits_{u_1=-x_1,1-x_1,\ldots,5-x_1}\big(x_1^2+u_1^2+\frac{3}{2}(x_1+u_1)^2+\frac{1}{2}\cdot\mathbf{1}_{x_1+u_1\;\text{is odd}}\big)
(only the middle cases of J_2 are needed here, since 0\leq x_1+u_1\leq 5\leq 10). In this case we compute directly:

Let h_{x_1}(u_1)=x_1^2+u_1^2+\frac{3}{2}(x_1+u_1)^2+\frac{1}{2}\cdot\mathbf{1}_{x_1+u_1\;\text{is odd}}. Then
h_{x_1}(-x_1)=2x_1^2,\quad h_{x_1}(1-x_1)=2x_1^2-2x_1+3,\quad h_{x_1}(2-x_1)=2x_1^2-4x_1+10,
h_{x_1}(3-x_1)=2x_1^2-6x_1+23,\quad h_{x_1}(4-x_1)=2x_1^2-8x_1+40,\quad h_{x_1}(5-x_1)=2x_1^2-10x_1+63.
Comparing these values gives
\begin{cases} \mu_1^*(x_1)=-x_1\;,\;J_1(x_1)=2x_1^2 & \qquad\text{if }\; x_1\leq\frac{3}{2} \\ \mu_1^*(x_1)=1-x_1\;,\;J_1(x_1)=2x_1^2-2x_1+3 & \qquad\text{if }\; \frac{3}{2}<x_1\leq\frac{7}{2} \\ \mu_1^*(x_1)=2-x_1\;,\;J_1(x_1)=2x_1^2-4x_1+10 & \qquad\text{if }\; \frac{7}{2}<x_1\leq\frac{13}{2} \\ \mu_1^*(x_1)=3-x_1\;,\;J_1(x_1)=2x_1^2-6x_1+23 & \qquad\text{if }\; \frac{13}{2}<x_1\leq\frac{17}{2} \\ \mu_1^*(x_1)=4-x_1\;,\;J_1(x_1)=2x_1^2-8x_1+40 & \qquad\text{if }\; \frac{17}{2}<x_1\leq\frac{23}{2} \\ \mu_1^*(x_1)=5-x_1\;,\;J_1(x_1)=2x_1^2-10x_1+63 & \qquad\text{if }\; x_1>\frac{23}{2} \end{cases}

For k=0,\; x_0=5:
J_0(x_0)=\mathop{\min}\limits_{u_0=-x_0,1-x_0,\ldots,5-x_0}(x_0^2+u_0^2+J_1(x_0+u_0))=\mathop{\min}\limits_{u_0=-5,-4,\ldots,0}(25+u_0^2+J_1(5+u_0)).
Let h_0(u_0)=25+u_0^2+J_1(5+u_0); we have
h_0(-5)=50,\; h_0(-4)=43,\;h_0(-3)=41,\;h_0(-2)=44,\;h_0(-1)=52,\;h_0(0)=65. So \mu_0^*(x_0)=-3 and J_0(x_0)=41.
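The backward pass in part (a) can be checked mechanically. Below is a minimal Python sketch (the function name `solve_part_a` and the parametrization by the target state t = x_k + u_k are ours, not the book's); it runs the recursion over the reachable integer states 0..5.

```python
# Backward DP for part (a): x_{k+1} = x_k + u_k (w_k = 0), stage cost x_k^2 + u_k^2,
# integer controls with 0 <= x_k + u_k <= 5, horizon N = 4, x_0 = 5.
def solve_part_a(N=4, x0=5):
    states = range(0, 6)                       # reachable integer states 0..5
    J = {x: 0 for x in states}                 # terminal cost J_4(x) = 0
    policy = []
    for k in reversed(range(N)):
        Jk, mu = {}, {}
        for x in states:
            # choosing u is the same as choosing the target t = x + u in 0..5
            cost, t = min((x**2 + (t - x)**2 + J[t], t) for t in states)
            Jk[x], mu[x] = cost, t - x
        J = Jk
        policy.insert(0, mu)
    return J[x0], policy[0][x0]

J0, u0 = solve_part_a()
print(J0, u0)   # 41 -3
```

This reproduces J_0(5)=41 with the first move u_0 = -3, matching the table above.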

(b) In this case we still assume that J_4(x_4)=g_4(x_4)=0. By the additional constraint x_4=5, U_3(x_3)=\{5-x_3\}, and
J_3(x_3)=\mathop{\min}\limits_{u_3\in U_3(x_3)}(x_3^2+u_3^2)=x_3^2+(5-x_3)^2.
Then
J_2(x_2)=\mathop{\min}\limits_{u_2=-x_2,1-x_2,\ldots,5-x_2}\big(x_2^2+u_2^2+J_3(x_2+u_2)\big)=\mathop{\min}\limits_{u_2=-x_2,1-x_2,\ldots,5-x_2}\big(x_2^2+u_2^2+(x_2+u_2)^2+(5-x_2-u_2)^2\big).
Let h_{x_2}(u_2)=x_2^2+u_2^2+(x_2+u_2)^2+(5-x_2-u_2)^2; a short calculation gives
h_{x_2}(-x_2)=2x_2^2+25,\quad h_{x_2}(1-x_2)=2x_2^2-2x_2+18,\quad h_{x_2}(2-x_2)=2x_2^2-4x_2+17,
h_{x_2}(3-x_2)=2x_2^2-6x_2+22,\quad h_{x_2}(4-x_2)=2x_2^2-8x_2+33,\quad h_{x_2}(5-x_2)=2x_2^2-10x_2+50,
and we get
\begin{cases} \mu_2^*(x_2)=-x_2\;,\;J_2(x_2)=2x_2^2+25 & \qquad\text{if }\; x_2\leq-\frac{7}{2} \\ \mu_2^*(x_2)=1-x_2\;,\;J_2(x_2)=2x_2^2-2x_2+18 & \qquad\text{if }\; -\frac{7}{2}<x_2\leq-\frac{1}{2} \\ \mu_2^*(x_2)=2-x_2\;,\;J_2(x_2)=2x_2^2-4x_2+17 & \qquad\text{if }\; -\frac{1}{2}<x_2\leq\frac{5}{2} \\ \mu_2^*(x_2)=3-x_2\;,\;J_2(x_2)=2x_2^2-6x_2+22 & \qquad\text{if }\; \frac{5}{2}<x_2\leq\frac{11}{2} \\ \mu_2^*(x_2)=4-x_2\;,\;J_2(x_2)=2x_2^2-8x_2+33 & \qquad\text{if }\; \frac{11}{2}<x_2\leq\frac{17}{2} \\ \mu_2^*(x_2)=5-x_2\;,\;J_2(x_2)=2x_2^2-10x_2+50 & \qquad\text{if }\; x_2>\frac{17}{2} \end{cases}

Similarly, for k=1 we have
\begin{cases} \mu_1^*(x_1)=-x_1\;,\;J_1(x_1)=2x_1^2+17 & \qquad\text{if }\; x_1\leq-\frac{1}{2} \\ \mu_1^*(x_1)=1-x_1\;,\;J_1(x_1)=2x_1^2-2x_1+16 & \qquad\text{if }\; -\frac{1}{2}<x_1\leq\frac{5}{2} \\ \mu_1^*(x_1)=2-x_1\;,\;J_1(x_1)=2x_1^2-4x_1+21 & \qquad\text{if }\; \frac{5}{2}<x_1\leq 5 \\ \mu_1^*(x_1)=3-x_1\;,\;J_1(x_1)=2x_1^2-6x_1+31 & \qquad\text{if }\; 5<x_1\leq\frac{15}{2} \\ \mu_1^*(x_1)=4-x_1\;,\;J_1(x_1)=2x_1^2-8x_1+46 & \qquad\text{if }\; \frac{15}{2}<x_1\leq\frac{21}{2} \\ \mu_1^*(x_1)=5-x_1\;,\;J_1(x_1)=2x_1^2-10x_1+67 & \qquad\text{if }\; x_1>\frac{21}{2} \end{cases}

For k=0,\; x_0=5, let h_0(u_0)=25+u_0^2+J_1(5+u_0); then
h_0(-5)=66,\; h_0(-4)=57,\;h_0(-3)=54,\;h_0(-2)=56,\;h_0(-1)=63,\;h_0(0)=76.
So \mu_0^*(x_0)=-3 and J_0(x_0)=54.

(c) In this case w_k is no longer identically zero; it follows the distribution given in the problem statement. Still,
J_3(x_3)=\mathop{\min}\limits_{u_3\in U_3(x_3)} \mathop\mathbf{E}\limits_{w_3}\{g_3(x_3,u_3,w_3)+J_4(f_3(x_3,u_3,w_3))\}=\mathop{\min}\limits_{u_3\in U_3(x_3)}(x_3^2+u_3^2),
which is not affected by the distribution of w_3, so the result is the same as in case (a):

\begin{cases} \mu_3^*(x_3)=-x_3\;,\;J_3(x_3)=2x_3^2 & \qquad\text{if }\; x_3\leq\frac{1}{2} \\ \mu_3^*(x_3)=1-x_3\;,\;J_3(x_3)=2x_3^2-2x_3+1 & \qquad\text{if }\; \frac{1}{2}<x_3\leq\frac{3}{2} \\ \mu_3^*(x_3)=2-x_3\;,\;J_3(x_3)=2x_3^2-4x_3+4 & \qquad\text{if }\; \frac{3}{2}<x_3\leq\frac{5}{2} \\ \mu_3^*(x_3)=3-x_3\;,\;J_3(x_3)=2x_3^2-6x_3+9 & \qquad\text{if }\; \frac{5}{2}<x_3\leq\frac{7}{2} \\ \mu_3^*(x_3)=4-x_3\;,\;J_3(x_3)=2x_3^2-8x_3+16 & \qquad\text{if }\; \frac{7}{2}<x_3\leq\frac{9}{2} \\ \mu_3^*(x_3)=5-x_3\;,\;J_3(x_3)=2x_3^2-10x_3+25 & \qquad\text{if }\; x_3>\frac{9}{2} \end{cases}

In particular, we have J_3(0)=0, \,J_3(1)=1,\,J_3(2)=4,\,J_3(3)=9,\,J_3(4)=16,\,J_3(5)=25, which will be used in the case k=2. Now
J_2(x_2)=\mathop{\min}\limits_{u_2\in U_2(x_2)} \mathop\mathbf{E}\limits_{w_2}(x_2^2+u_2^2+J_3(x_2+u_2+w_2)).
We calculate the expectation on the right side for each of the possible values of u_2:
u_2=-x_2\;:\;\mathbf{E}\{\cdot\}=x_2^2+(-x_2)^2+J_3(0)=2x_2^2
u_2=1-x_2\;:\;\mathbf{E}\{\cdot\}=x_2^2+(1-x_2)^2+\frac{1}{2}[J_3(0)+J_3(2)]=2x_2^2-2x_2+3
u_2=2-x_2\;:\;\mathbf{E}\{\cdot\}=x_2^2+(2-x_2)^2+\frac{1}{2}[J_3(1)+J_3(3)]=2x_2^2-4x_2+9
u_2=3-x_2\;:\;\mathbf{E}\{\cdot\}=x_2^2+(3-x_2)^2+\frac{1}{2}[J_3(2)+J_3(4)]=2x_2^2-6x_2+19
u_2=4-x_2\;:\;\mathbf{E}\{\cdot\}=x_2^2+(4-x_2)^2+\frac{1}{2}[J_3(3)+J_3(5)]=2x_2^2-8x_2+33
u_2=5-x_2\;:\;\mathbf{E}\{\cdot\}=x_2^2+(5-x_2)^2+J_3(5)=2x_2^2-10x_2+50
Hence we have

\begin{cases} \mu_2^*(x_2)=-x_2\;,\;J_2(x_2)=2x_2^2 & \qquad\text{if }\; x_2\leq\frac{3}{2} \\ \mu_2^*(x_2)=1-x_2\;,\;J_2(x_2)=2x_2^2-2x_2+3 & \qquad\text{if }\; \frac{3}{2}<x_2\leq 3 \\ \mu_2^*(x_2)=2-x_2\;,\;J_2(x_2)=2x_2^2-4x_2+9 & \qquad\text{if }\; 3<x_2\leq 5 \\ \mu_2^*(x_2)=3-x_2\;,\;J_2(x_2)=2x_2^2-6x_2+19 & \qquad\text{if }\; 5<x_2\leq 7 \\ \mu_2^*(x_2)=4-x_2\;,\;J_2(x_2)=2x_2^2-8x_2+33 & \qquad\text{if }\; 7<x_2\leq\frac{17}{2} \\ \mu_2^*(x_2)=5-x_2\;,\;J_2(x_2)=2x_2^2-10x_2+50 & \qquad\text{if }\; x_2>\frac{17}{2} \end{cases}

In particular, J_2(0)=0,\,J_2(1)=2,\,J_2(2)=7,\,J_2(3)=15,\,J_2(4)=25,\,J_2(5)=39.

For k=1,
J_1(x_1)=\mathop{\min}\limits_{u_1\in U_1(x_1)} \mathop\mathbf{E}\limits_{w_1}(x_1^2+u_1^2+J_2(x_1+u_1+w_1)).
We calculate the expectation on the right side for each of the possible values of u_1:
u_1=-x_1\;:\;\mathbf{E}\{\cdot\}=x_1^2+(-x_1)^2+J_2(0)=2x_1^2
u_1=1-x_1\;:\;\mathbf{E}\{\cdot\}=x_1^2+(1-x_1)^2+\frac{1}{2}[J_2(0)+J_2(2)]=2x_1^2-2x_1+\frac{9}{2}
u_1=2-x_1\;:\;\mathbf{E}\{\cdot\}=x_1^2+(2-x_1)^2+\frac{1}{2}[J_2(1)+J_2(3)]=2x_1^2-4x_1+\frac{25}{2}
u_1=3-x_1\;:\;\mathbf{E}\{\cdot\}=x_1^2+(3-x_1)^2+\frac{1}{2}[J_2(2)+J_2(4)]=2x_1^2-6x_1+25
u_1=4-x_1\;:\;\mathbf{E}\{\cdot\}=x_1^2+(4-x_1)^2+\frac{1}{2}[J_2(3)+J_2(5)]=2x_1^2-8x_1+43
u_1=5-x_1\;:\;\mathbf{E}\{\cdot\}=x_1^2+(5-x_1)^2+J_2(5)=2x_1^2-10x_1+64
Hence we have

\begin{cases} \mu_1^*(x_1)=-x_1\;,\;J_1(x_1)=2x_1^2 & \qquad\text{if }\; x_1\leq\frac{9}{4} \\ \mu_1^*(x_1)=1-x_1\;,\;J_1(x_1)=2x_1^2-2x_1+\frac{9}{2} & \qquad\text{if }\; \frac{9}{4}<x_1\leq 4 \\ \mu_1^*(x_1)=2-x_1\;,\;J_1(x_1)=2x_1^2-4x_1+\frac{25}{2} & \qquad\text{if }\; 4<x_1\leq\frac{25}{4} \\ \mu_1^*(x_1)=3-x_1\;,\;J_1(x_1)=2x_1^2-6x_1+25 & \qquad\text{if }\; \frac{25}{4}<x_1\leq 9 \\ \mu_1^*(x_1)=4-x_1\;,\;J_1(x_1)=2x_1^2-8x_1+43 & \qquad\text{if }\; 9<x_1\leq\frac{21}{2} \\ \mu_1^*(x_1)=5-x_1\;,\;J_1(x_1)=2x_1^2-10x_1+64 & \qquad\text{if }\; x_1>\frac{21}{2} \end{cases}

In particular, J_1(0)=0,\,J_1(1)=2,\,J_1(2)=8,\,J_1(3)=\frac{33}{2},\,J_1(4)=\frac{57}{2},\,J_1(5)=\frac{85}{2}.

In the last step, the algorithm takes the form
J_0(x_0)=\mathop{\min}\limits_{u_0\in U_0(x_0)} \mathop\mathbf{E}\limits_{w_0}(x_0^2+u_0^2+J_1(x_0+u_0+w_0))=\mathop{\min}\limits_{u_0\in U_0(x_0)} \mathop\mathbf{E}\limits_{w_0}\big(25+u_0^2+J_1(5+u_0+w_0)\big).
u_0=-5\;:\;\mathbf{E}\{\cdot\}=25+25+J_1(0)=50
u_0=-4\;:\;\mathbf{E}\{\cdot\}=25+16+\frac{1}{2}[J_1(0)+J_1(2)]=45
u_0=-3\;:\;\mathbf{E}\{\cdot\}=25+9+\frac{1}{2}[J_1(1)+J_1(3)]=43\frac{1}{4}
u_0=-2\;:\;\mathbf{E}\{\cdot\}=25+4+\frac{1}{2}[J_1(2)+J_1(4)]=47\frac{1}{4}
u_0=-1\;:\;\mathbf{E}\{\cdot\}=25+1+\frac{1}{2}[J_1(3)+J_1(5)]=55\frac{1}{2}
u_0=0\;:\;\mathbf{E}\{\cdot\}=25+0+J_1(5)=67\frac{1}{2}
So \mu_0^*(x_0)=-3 and J_0(x_0)=43\frac{1}{4}.\qquad\Box
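The stochastic recursion of part (c) is easy to automate as a check. A minimal sketch (states 0..5, parametrized by the target t = x_k + u_k; the function name `solve_part_c` is ours) gives J_1(3)=33/2 and J_0(5)=43\frac{1}{4} with \mu_0^*(5)=-3.

```python
# Backward DP for part (c): after choosing the target t = x_k + u_k in 0..5,
# the next state is t + w_k with w_k = +/-1 (prob. 1/2 each), unless t is 0 or 5
# (then w_k = 0 with probability 1).
def solve_part_c(N=4, x0=5):
    states = range(0, 6)
    J = {x: 0.0 for x in states}            # terminal cost J_4 = 0
    mu = {}
    for k in reversed(range(N)):
        Jk = {}
        for x in states:
            def ecost(t):
                future = J[t] if t in (0, 5) else 0.5 * (J[t - 1] + J[t + 1])
                return x ** 2 + (t - x) ** 2 + future
            cost, t = min((ecost(t), t) for t in states)
            Jk[x], mu[(k, x)] = cost, t - x
        J = Jk
    return J[x0], mu[(0, x0)]

J0, u0 = solve_part_c()
print(J0, u0)    # 43.25 -3
```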
PS: In the book a hint is given: alternatively, you may use a terminal cost g_4(x_4) equal to a very large number for x_4\neq 5. Why a very large number? Is my calculation wrong?
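On the hint: the "very large number" only needs to exceed every cost achievable while meeting x_4=5; then the unconstrained DP never prefers ending at x_4\neq 5, and it reproduces the hard-constraint computation of part (b). A minimal sketch (penalty value and function name `solve_part_b` are ours):

```python
# Part (b) via a large terminal penalty instead of the hard constraint x_4 = 5.
BIG = 10**6                                  # any bound above all feasible costs works

def solve_part_b(N=4, x0=5):
    states = range(0, 6)
    J = {x: (0 if x == 5 else BIG) for x in states}   # g_4(x) = BIG for x != 5
    mu0 = None
    for k in reversed(range(N)):
        Jk, mu = {}, {}
        for x in states:
            # deterministic dynamics (w_k = 0); t = x + u is the target state
            cost, t = min((x**2 + (t - x)**2 + J[t], t) for t in states)
            Jk[x], mu[x] = cost, t - x
        J = Jk
        mu0 = mu
    return J[x0], mu0[x0]

print(solve_part_b())   # (54, -3)
```

So the direct constrained calculation is consistent with the penalty formulation; "very large" is just a safe way of saying "larger than any feasible cost".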

1.2 Carry out the calculations needed to verify that J_0(1)=2.67 and J_0(2)=2.608 in Example 1.3.2.
Solution. The values stated in the exercise appear to be wrong; they do not even match those in the text.
For x_0=1, we have J_0(1)=\min_{u_0=0,1} \mathop{\bf E}\limits_{w_0=0,1,2}\{u_0+(1+u_0-w_0)^2+J_1(\max(0,1+u_0-w_0))\}
u_0=0: \;\mathbf{E}\{\cdot\}=\mathop{\mathbf{E}}\limits_{w_0=0,1,2}\{(1-w_0)^2+J_1(\max(0,1-w_0))\}
\qquad\qquad\qquad=0.1\times[1+J_1(1)]+0.7\times J_1(0)+0.2\times[1+J_1(0)]
\qquad\qquad\qquad=0.1\times2.5+0.7\times 2.5 + 0.2\times 3.5
\qquad\qquad\qquad=2.7
u_0=1: \;\mathbf{E}(.)=\mathop{\mathbf{E}}\limits_{w_0=0,1,2}\{1+(2-w_0)^2+J_1(\max(0,2-w_0))\}
\qquad\qquad\qquad=0.1\times[5+J_1(2)]+0.7\times [2+J_1(1)]+0.2\times[1+J_1(0)]
\qquad\qquad\qquad=0.1\times6.68+0.7\times3.5 + 0.2\times 3.5
\qquad\qquad\qquad=3.818
So J_0(1)=2.7, \mu_0^*(1)=0.

For x_0=2, we have J_0(2)=\min_{u_0=0} \mathop{\bf E}\limits_{w_0=0,1,2}\{u_0+(2+u_0-w_0)^2+J_1(\max(0,2+u_0-w_0))\}=\mathop{\mathbf{E}}\limits_{w_0=0,1,2}\{(2-w_0)^2+J_1(\max(0,2-w_0))\}
\qquad\qquad\qquad\qquad=0.1\times[4+J_1(2)]+0.7\times[1+J_1(1)]+0.2\times J_1(0)
\qquad\qquad\qquad\qquad=0.1\times5.68+0.7\times2.5+0.2\times 2.5
\qquad\qquad\qquad\qquad=2.818
So J_0(2)=2.818,\mu_0^*(2)=0.\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\Box
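These two expectations are short enough to re-check numerically. A minimal Python sketch, using the stage-1 values quoted above from Example 1.3.2 (J_1(0)=2.5, J_1(1)=1.5, J_1(2)=1.68; the helper name `expected` is ours):

```python
# Re-check J_0(1) and J_0(2) with the J_1 values quoted from Example 1.3.2.
J1 = {0: 2.5, 1: 1.5, 2: 1.68}
p = {0: 0.1, 1: 0.7, 2: 0.2}                 # P(w_0 = 0, 1, 2)

def expected(x0, u0):
    # E{ u0 + (x0 + u0 - w)^2 + J_1(max(0, x0 + u0 - w)) }
    return sum(pw * (u0 + (x0 + u0 - w) ** 2 + J1[max(0, x0 + u0 - w)])
               for w, pw in p.items())

J0_1 = min(expected(1, u) for u in (0, 1))   # stock 1: u_0 in {0, 1}
J0_2 = expected(2, 0)                        # stock 2: only u_0 = 0 fits capacity
print(round(J0_1, 3), round(J0_2, 3))        # 2.7 2.818
```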

1.6 (Discounted Cost per Stage) In the framework of the basic problem, consider the case where the cost is of the form
\mathop\mathbf{E}\limits_{\mathop{w_k}\limits_{k=0,1,...,N-1}}\left\{\alpha^Ng_N(x_N)+\sum_{k=0}^{N-1}\alpha^kg_k(x_k,u_k,w_k)\right\}
where \alpha is a discount factor with 0<\alpha<1. Show that an alternate form of the DP algorithm is given by
V_N(x_N)=g_N(x_N),
V_k(x_k)=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{g_k(x_k,u_k,w_k)+\alpha V_{k+1}\big(f_k(x_k,u_k,w_k)\big)\right\}.
Solution. Arguing as in the proof of Proposition 1.3.1, in the discounted case the optimal cost takes the form
V^*(x_0)=\mathop{\min}\limits_{\pi\in\Pi}\mathop\mathbf{E}\limits_{\mathop{w_k}\limits_{k=0,1,...,N-1}}\left\{\alpha^Ng_N(x_N)+\sum_{k=0}^{N-1}\alpha^kg_k(x_k,u_k,w_k)\right\} Let V_k^*(x_k) be the optimal cost for the (N-k)-stage problem that starts at state x_k and time k, and ends at time N.
V_k^*(x_k)=\mathop{\min}\limits_{\pi^k}\mathop\mathbf{E}\limits_{w_k,...,w_{N-1}}\left\{\alpha^{N-k}g_N(x_N)+\sum_{i=k}^{N-1}\alpha^{i-k}g_i(x_i,u_i,w_i)\right\} Claim: \; V_k^*(x_k)=V_k(x_k).
We prove this claim by induction on k. Firstly, V_N^*(x_N)=g_N(x_N)=V_N(x_N). Suppose the claim is true for k+1,\ldots,N; then
V_k^*(x_k)=\mathop{\min}\limits_{\pi^k}\mathop\mathbf{E}\limits_{w_k,\ldots,w_{N-1}}\left\{g_k(x_k,u_k,w_k)+\alpha\left[\alpha^{N-k-1}g_N(x_N)+\sum_{i=k+1}^{N-1}\alpha^{i-k-1}g_i(x_i,u_i,w_i)\right]\right\}
=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{g_k(x_k,u_k,w_k)+\alpha\cdot\mathop{\min}\limits_{\pi^{k+1}}\mathop\mathbf{E}\limits_{w_{k+1},\ldots,w_{N-1}}\left[\alpha^{N-k-1}g_N(x_N)+\sum_{i=k+1}^{N-1}\alpha^{i-k-1}g_i(x_i,u_i,w_i)\right]\right\}
=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{g_k(x_k,u_k,w_k)+\alpha V_{k+1}^*\big(f_k(x_k,u_k,w_k)\big)\right\}
=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{g_k(x_k,u_k,w_k)+\alpha V_{k+1}\big(f_k(x_k,u_k,w_k)\big)\right\}\qquad\text{(induction hypothesis)}
=V_k(x_k).\qquad\Box
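As a numerical sanity check (not a proof), one can compare the V-recursion with ordinary DP applied to the time-weighted stage costs \alpha^k g_k: by induction, the latter produces \alpha^k V_k, so the two agree at k=0. The dynamics f, costs g, gN below are made-up examples, not from the book.

```python
alpha = 0.9
N = 3
S, U, W = (0, 1, 2), (0, 1), ((0, 0.5), (1, 0.5))   # states, controls, (w, prob)
f = lambda x, u, w: (x + u + w) % 3                  # made-up dynamics
g = lambda x, u, w: (x - u) ** 2 + w                 # made-up stage cost
gN = lambda x: x                                     # made-up terminal cost

# V-recursion from the exercise: V_k = min_u E{ g_k + alpha * V_{k+1} }
V = {x: gN(x) for x in S}
for k in reversed(range(N)):
    V = {x: min(sum(p * (g(x, u, w) + alpha * V[f(x, u, w)]) for w, p in W)
                for u in U) for x in S}

# ordinary DP on stage costs alpha^k g_k and terminal cost alpha^N g_N
J = {x: alpha ** N * gN(x) for x in S}
for k in reversed(range(N)):
    J = {x: min(sum(p * (alpha ** k * g(x, u, w) + J[f(x, u, w)]) for w, p in W)
                for u in U) for x in S}

assert all(abs(J[x] - V[x]) < 1e-9 for x in S)       # J_0 = alpha^0 V_0 = V_0
```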

1.7 (Exponential Cost Function) In the framework of the basic problem, consider the case where the cost is of the form
\mathop\mathbf{E}\limits_{\mathop{w_k}\limits_{k=0,1,...,N-1}}\left\{\exp\left(g_N(x_N)+\sum_{k=0}^{N-1}g_k(x_k,u_k,w_k)\right)\right\}
(a) Show that the optimal cost and an optimal policy can be obtained from the DP-like algorithm
J_N(x_N)=\exp\left(g_N(x_N)\right),
J_k(x_k)=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{J_{k+1}\left(f_k(x_k,u_k,w_k)\right)\exp\left(g_k(x_k,u_k,w_k)\right)\right\}
(b) Define the functions V_k(x_k)=\ln J_k(x_k). Assume also that g_k is a function of x_k and u_k only (and not of w_k). Show that the above algorithm can be rewritten as
V_N(x_N)=g_N(x_N),
V_k(x_k)=\mathop{\min}\limits_{u_k\in U_k(x_k)}\left\{g_k(x_k,u_k)+\ln \mathop\mathbf{E}\limits_{w_k}\left\{\exp\left(V_{k+1}\left(f_k(x_k,u_k,w_k)\right)\right)\right\}\right\}
Solution. (a) Define J_k^*(x_k)=\mathop{\min}\limits_{\pi^k}\mathop\mathbf{E}\limits_{\mathop{w_i}\limits_{i=k,k+1,...,N-1}}\left\{\exp\left(g_N(x_N)+\sum_{i=k}^{N-1}g_i(x_i,u_i,w_i)\right)\right\}
We show by induction that J_k^*(x_k)=J_k(x_k); the base case is J_N^*(x_N)=\exp(g_N(x_N))=J_N(x_N). For the induction step,
J_k^*(x_k)=\mathop{\min}\limits_{u_k,\pi^{k+1}}\mathop\mathbf{E}\limits_{w_k,\ldots,w_{N-1}}\left\{\exp\left(g_k(x_k,u_k,w_k)\right)\exp\left(g_N(x_N)+\sum_{i=k+1}^{N-1}g_i(x_i,u_i,w_i)\right)\right\}
=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{\exp\left(g_k(x_k,u_k,w_k)\right)\mathop{\min}\limits_{\pi^{k+1}}\mathop\mathbf{E}\limits_{w_{k+1},\ldots,w_{N-1}}\left\{\exp\left(g_N(x_N)+\sum_{i=k+1}^{N-1}g_i(x_i,u_i,w_i)\right)\right\}\right\}
(the interchange is justified because \exp(\cdot)\geq 0 and the inner minimization depends on w_k only through x_{k+1}=f_k(x_k,u_k,w_k))
=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{\exp\left(g_k(x_k,u_k,w_k)\right)J_{k+1}^*\big(f_k(x_k,u_k,w_k)\big)\right\}
=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{J_{k+1}\big(f_k(x_k,u_k,w_k)\big)\exp\left(g_k(x_k,u_k,w_k)\right)\right\}=J_k(x_k).
(b) Since g_k does not depend on w_k, the factor \exp(g_k(x_k,u_k)) can be pulled out of the expectation:
J_k(x_k)=\mathop{\min}\limits_{u_k\in U_k(x_k)}\left\{\exp\left(g_k(x_k,u_k)\right)\mathop\mathbf{E}\limits_{w_k}\left\{J_{k+1}\left(f_k(x_k,u_k,w_k)\right)\right\}\right\}
Taking logarithms (which commutes with the minimization, since \ln is increasing),
V_k(x_k)=\ln J_k(x_k)=\mathop{\min}\limits_{u_k\in U_k(x_k)}\left\{g_k(x_k,u_k)+\ln\mathop\mathbf{E}\limits_{w_k}\left\{\exp\left(V_{k+1}\left(f_k(x_k,u_k,w_k)\right)\right)\right\}\right\}\qquad\Box
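The (a)/(b) equivalence can likewise be checked on a toy problem. The dynamics and costs below are made up; the only property used is that g_k does not depend on w_k.

```python
import math

N = 3
S, U, W = (0, 1), (0, 1), ((0, 0.5), (1, 0.5))   # states, controls, (w, prob)
f = lambda x, u, w: (x + u + w) % 2              # made-up dynamics
g = lambda x, u: abs(x - u) * 0.3                # stage cost, independent of w
gN = lambda x: 0.1 * x

# (a): multiplicative recursion on J
J = {x: math.exp(gN(x)) for x in S}
for _ in range(N):
    J = {x: min(math.exp(g(x, u)) * sum(p * J[f(x, u, w)] for w, p in W)
                for u in U) for x in S}

# (b): log-domain recursion on V
V = {x: gN(x) for x in S}
for _ in range(N):
    V = {x: min(g(x, u) + math.log(sum(p * math.exp(V[f(x, u, w)]) for w, p in W))
                for u in U) for x in S}

assert all(abs(V[x] - math.log(J[x])) < 1e-9 for x in S)   # V_k = ln J_k
```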

1.8 (Terminating Process) In the framework of the basic problem, consider the case where the system evolution terminates at time i when a given value \bar{w_i} of the disturbance at time i occurs, or when a termination decision u_i is made by the controller. If termination occurs at time i, the resulting cost is T+\sum_{k=0}^ig_k(x_k,u_k,w_k), where T is a termination cost. If the process has not terminated up to the final time N, the resulting cost is g_N(x_N)+\sum_{k=0}^{N-1}g_k(x_k,u_k,w_k). Reformulate the problem into the framework of the basic problem.

1.9 (Multiplicative Cost) In the framework of the basic problem, consider the case where the cost has the multiplicative form \mathop\mathbf{E}\limits_{\mathop{w_k}\limits_{k=0,1,...,N-1}}\left\{g_N(x_N)\cdot g_{N-1}(x_{N-1}, u_{N-1}, w_{N-1})\ldots g_0(x_0,u_0,w_0)\right\} Develop a DP-like algorithm for this problem assuming that g_k(x_k,u_k,w_k)\ge0 for all x_k, u_k, w_k and k.
Solution. The DP-like algorithm for this problem is as follows:
J_N(x_N)=g_N(x_N),
J_k(x_k)=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop\mathbf{E}\limits_{w_k}\left\{g_k(x_k,u_k,w_k)J_{k+1}\left(f_k(x_k,u_k,w_k)\right)\right\}
(the assumption g_k\geq 0 is what allows the minimization to be interchanged with the expectation, as in Exercise 1.7). The proof is standard, as above: we may define
J_k^*(x_k)=\mathop{\min}\limits_{\pi^k}\mathop\mathbf{E}\limits_{\mathop{w_i}\limits_{i=k,\ldots,N-1}} \left\{g_N(x_N)\,g_{N-1}(x_{N-1},u_{N-1},w_{N-1})\cdots g_k(x_k,u_k,w_k) \right\}
and prove J_k^*(x_k)=J_k(x_k) by induction. \qquad\Box
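For a quick check of the multiplicative recursion, take a deterministic toy problem (so that minimizing over open-loop control sequences is equivalent to minimizing over policies) and compare the DP value with brute force. All names and the instance below are made up.

```python
from itertools import product

N = 3
S, U = (0, 1, 2), (0, 1)
f = lambda x, u: (x + u) % 3          # deterministic, made-up dynamics
g = lambda x, u: 1 + abs(x - u)       # nonnegative stage cost
gN = lambda x: 1 + x                  # nonnegative terminal cost

# DP-like multiplicative recursion (with the minimization over u_k)
J = {x: gN(x) for x in S}
for _ in range(N):
    J = {x: min(g(x, u) * J[f(x, u)] for u in U) for x in S}

# brute force over all open-loop control sequences (valid here: no randomness)
def cost(x0, us):
    c, x = 1.0, x0
    for u in us:
        c, x = c * g(x, u), f(x, u)
    return c * gN(x)

brute = {x0: min(cost(x0, us) for us in product(U, repeat=N)) for x0 in S}
assert all(abs(J[x] - brute[x]) < 1e-9 for x in S)
```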

1.10 Assume that we have a vessel whose maximum weight capacity is z and whose cargo is to consist of different quantities of N different items. Let v_i denote the value of the i-th type of item, w_i the weight of i-th type of item, and x_i the number of items of type i that are loaded in the vessel. The problem is to find the most valuable cargo, i.e., to maximize \sum_{i=1}^Nx_iv_i subject to the constraints \sum_{i=1}^Nx_iw_i\leq z and x_i=0,1,2,\ldots Formulate this problem in terms of DP.

Solution. We formulate this problem in terms of DP as follows:
\qquad the state x'_k is the total weight already loaded, x'_k=\mathop{\sum}\limits_{i=1}^{k-1}x_iw_i, with x'_1=0;
\qquad an action u'_k is chosen from the set U_k(x'_k)=\left\{0,1,\ldots,\left[\frac{z-x'_k}{w_k}\right]\right\};
\qquad the disturbance w'_k is identically 0;
\qquad the state transition function is x'_{k+1}=f_k(x'_k,u'_k,w'_k)=x'_k+u'_kw_k;
\qquad the cost function is g_k(x'_k,u'_k,w'_k)=-u'_kv_k.
So the problem is reformulated as the minimization
\mathop{\min}\limits_{\pi=\{u'_1,\ldots,u'_N\}}\sum_{k=1}^Ng_k(x'_k,u'_k,w'_k)=\mathop{\min}\limits_{\pi}\Big(-\sum_{k=1}^Nu'_kv_k\Big),
which can be solved by the following DP algorithm:
J_{N+1}(x'_{N+1})=0,
J_k(x'_k)=\mathop{\min}\limits_{u'_k\in U_k(x'_k)}\{-u'_kv_k+J_{k+1}(f_k(x'_k,u'_k,w'_k))\}\;,\qquad k=1,2,\ldots,N.\qquad\Box
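The formulation can be exercised on a toy instance. The instance (v, w, z) and the helper `best_value` below are hypothetical; the sketch maximizes \sum u'_k v_k directly, which is equivalent to minimizing the negated cost g_k=-u'_kv_k, and cross-checks against brute force.

```python
from itertools import product

# Hypothetical instance: item values v, item weights w, capacity z
v, w, z = [6, 10, 12], [1, 2, 3], 5
N = len(v)

# DP over the accumulated weight x'_k, as formulated above
def best_value(k, used):
    if k == N:
        return 0
    # u ranges over 0 .. floor((z - used) / w_k), as in U_k(x'_k)
    return max(u * v[k] + best_value(k + 1, used + u * w[k])
               for u in range((z - used) // w[k] + 1))

dp = best_value(0, 0)

# brute-force check over all feasible loadings
brute = max(sum(x * vv for x, vv in zip(xs, v))
            for xs in product(range(z + 1), repeat=N)
            if sum(x * ww for x, ww in zip(xs, w)) <= z)
assert dp == brute
print(dp)   # 30 for this instance (five copies of the first item)
```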

1.11 Consider a device consisting of N stages connected in series, where each stage consists of a particular component. The components are subject to failure, and to increase the reliability of the device duplicate components are provided. For j=1,2,\ldots,N, let (1+m_j) be the number of components for the j-th stage, let p_j(m_j) be the probability of successful operation of the j-th stage when (1+m_j) components are used, and let c_j denote the cost of a single component at the j-th stage. Formulate in terms of DP the problem of finding the number of components at each stage that maximize the reliability of the device expressed by p_1(m_1)\cdot p_2(m_2)\ldots p_N(m_N) subject to the cost constraint \sum_{j=1}^N c_jm_j\leq A, where A>0 is given.

1.12 (Minimization over a Subset of Policies) This problem is primarily of theoretical interest. Consider a variant of the basic problem whereby we seek \mathop{\min}\limits_{\pi\in\tilde{\Pi}}J_{\pi}(x_0) where \tilde{\Pi} is some given subset of the set of sequences \{\mu_0,\mu_1,\ldots, \mu_{N-1}\} of functions \mu_k:S_k\rightarrow C_k with \mu_k(x_k)\in U_k(x_k) for all x_k\in S_k. Assume that \pi^*=\{\mu_0^*,\mu_1^*,\ldots, \mu_{N-1}^*\} belongs to \tilde{\Pi} and attains the minimum in the DP algorithm; that is, for all k=0,1,\ldots,N-1 and x_k\in S_k, J_k(x_k)=\mathop{\mathbf{E}}\limits_{w_k}\left\{g_k(x_k,\mu_k^*(x_k),w_k)+J_{k+1}\left(f_k(x_k,\mu_k^*(x_k),w_k)\right)\right\}=\mathop{\min}\limits_{u_k\in U_k(x_k)}\mathop{\mathbf{E}}\limits_{w_k}\left\{g_k(x_k,u_k,w_k)+J_{k+1}\left(f_k(x_k,u_k,w_k)\right)\right\} with J_N(x_N)=g_N(x_N). Assume further that the functions J_k are real-valued and that the preceding expectations are well-defined and finite. Show that \pi^* is optimal within \tilde{\Pi} and that the corresponding optimal cost is equal to J_0(x_0).

1.13 (Semilinear Systems) Consider a problem involving the system x_{k+1}=A_kx_k+f_k(u_k)+w_k where x_k\in\mathfrak{R}^n, f_k are given functions, and A_k and w_k are random n\times n matrices and n-vectors, respectively, with given probability distributions that do not depend on x_k, u_k or prior values of A_k and w_k. Assume that the cost is of the form \mathop{\mathbf{E}}\limits_{\mathop{A_k,w_k}\limits_{k=0,1,\ldots,N-1}}\left\{c'_Nx_N+\sum_{k=0}^{N-1}\left(c'_kx_k+g_k\left(\mu_k(x_k)\right)\right)\right\} where c_k are given vectors and g_k are given functions. Show that if the optimal cost for this problem is finite and the control constraint sets U_k(x_k) are independent of x_k, then the cost-to-go functions of the DP algorithm are affine (linear plus constant). Assuming that there is at least one optimal policy, show that there exists an optimal policy that consists of constant functions \mu_k^*; that is, \mu_k^*(x_k)=\text{constant} for all x_k\in\mathfrak{R}^n.

1.14 A farmer annually producing x_k units of a certain crop stores (1-u_k)x_k units of his production, where 0\leq u_k\leq 1, and invests the remaining u_kx_k units, thus increasing the next year’s production to a level x_{k+1} given by x_{k+1}=x_k+w_ku_kx_k,\qquad k=0,1,\cdots,N-1. The scalars w_k are independent random variables with identical probability distributions that do not depend either on x_k or u_k. Furthermore, \mathbf{E}\{w_k\}=\overline{w}>0. The problem is to find the optimal investment policy that maximizes the total expected product stored over N years, \mathop\mathbf{E}\limits_{\mathop{w_k}\limits_{k=0,1,\ldots,N-1}}\left\{x_N+\sum_{k=0}^{N-1}(1-u_k)x_k\right\}. Show the optimality of the following policy that consists of constant functions:
(1) If \overline{w}>1, \mu_0^*(x_0)=\cdots=\mu_{N-1}^*(x_{N-1})=1.
(2) If 0<\overline{w}<1/N, \mu_0^*(x_0)=\cdots=\mu_{N-1}^*(x_{N-1})=0.
(3) If 1/N\leq\overline{w}\leq 1, \mu_0^*(x_0)=\cdots=\mu_{N-\bar{k}-1}^*(x_{N-\bar{k}-1})=1\mu_{N-\bar{k}}^*(x_{N-\bar{k}})=\cdots=\mu_{N-1}^*(x_{N-1})=0 where \bar{k} is such that 1/{(\bar{k}+1)}<\overline{w}\leq 1/{\bar{k}}.
Solution. The DP algorithm takes the form
J_N(x_N)=x_N,
J_k(x_k)=\mathop{\max}\limits_{u_k}\mathop{\mathbf E}\limits_{w_k}\left\{(1-u_k)x_k+J_{k+1}(x_k+w_ku_kx_k)\right\}\,,\;\; k=0,1,\cdots,N-1.
(1) In the case \overline{w}>1,
J_{N-1}(x_{N-1})=\mathop{\max}\limits_{u_{N-1}}\mathop{\mathbf E}\limits_{w_{N-1}}\left\{(1-u_{N-1})x_{N-1}+x_{N-1}+w_{N-1}u_{N-1}x_{N-1}\right\}
=\mathop{\max}\limits_{u_{N-1}}\mathop{\mathbf E}\limits_{w_{N-1}}\left\{2x_{N-1}+(w_{N-1}-1)u_{N-1}x_{N-1}\right\}
=\mathop{\max}\limits_{u_{N-1}}\left\{2x_{N-1}+(\overline{w}-1)u_{N-1}x_{N-1}\right\},
so \mu_{N-1}^*(x_{N-1})=1 and J_{N-1}(x_{N-1})=(1+\overline{w})x_{N-1}. By induction (if J_{k+1}(x)=(1+\overline{w})^{N-k-1}x, the coefficient of u_k is \big((1+\overline{w})^{N-k-1}\overline{w}-1\big)x_k>0, so investing everything remains optimal), we get
\mu_k^*(x_k)=1\;,\;J_k(x_k)=(1+\overline{w})^{N-k}x_k\qquad\text{for all }\;k=0,1,\cdots,N-1.
(2) In the case \overline{w}<1/N,
J_{N-1}(x_{N-1})=\mathop{\max}\limits_{u_{N-1}}\left\{2x_{N-1}+(\overline{w}-1)u_{N-1}x_{N-1}\right\}=2x_{N-1} with \mu_{N-1}^*(x_{N-1})=0. And then
J_{N-2}(x_{N-2})=\mathop{\max}\limits_{u_{N-2}}\mathop{\mathbf E}\limits_{w_{N-2}}\left\{(1-u_{N-2})x_{N-2}+2[x_{N-2}+w_{N-2}u_{N-2}x_{N-2}]\right\}=\mathop{\max}\limits_{u_{N-2}}\left\{3x_{N-2}+(2\overline{w}-1)u_{N-2}x_{N-2}\right\}=3x_{N-2} with \mu_{N-2}^*(x_{N-2})=0. Since (N-k)\overline{w}\leq N\overline{w}<1 for every k, we can prove by induction that for all k=0,1,\cdots,N-1, \mu_k^*(x_k)=0\;,\;J_k(x_k)=(N-k+1)x_k. (3) In the case 1/N\leq\overline{w}\leq 1, there is some \bar{k} satisfying 1/{(\bar{k}+1)}<\overline{w}\leq 1/{\bar{k}}, and the induction argument of (2) remains valid only for the last \bar{k} stages.
For any k\ge N-\bar{k}, we have J_k(x_k)=\mathop{\max}\limits_{u_k}\left\{(N-k+1)x_k+((N-k)\overline{w}-1)u_kx_k\right\}Since (N-k)\overline{w}\leq\bar{k}\overline{w}\leq 1, then \mu_k^*(x_k)=0 and J_k(x_k)=(N-k+1)x_k.
Next for k= N-\bar{k}-1
J_{N-\bar{k}-1}(x_{N-\bar{k}-1})=\mathop{\max}\limits_{u_{N-\bar{k}-1}}\left\{(\bar{k}+2)x_{N-\bar{k}-1}+((\bar{k}+1)\overline{w}-1)u_{N-\bar{k}-1}x_{N-\bar{k}-1}\right\} In this case, (\bar{k}+1)\overline{w}-1>0, yielding \mu_{N-\bar{k}-1}^*(x_{N-\bar{k}-1})=1 and J_{N-\bar{k}-1}(x_{N-\bar{k}-1})=[(\bar{k}+1)\overline{w}+\bar{k}+1]x_{N-\bar{k}-1}.
And then by induction, we can prove that for all k<N-\bar{k}-1, \mu_k^*(x_k)=1.\qquad\qquad\qquad\qquad\Box
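Because the cost-to-go is linear, J_k(x)=c_kx, the whole solution reduces to the scalar recursion c_N=1, c_k=\max\big(1+c_{k+1},\,(1+\overline{w})c_{k+1}\big) (storing vs. investing; the maximum over u_k\in[0,1] of a linear function is attained at an endpoint). A minimal sketch (the function name `policy` is ours) checks the three cases numerically:

```python
# Scalar recursion for J_k(x) = c_k x: c_N = 1 and
# c_k = max(1 + c_{k+1}, (1 + wbar) * c_{k+1})  (u_k = 0 vs. u_k = 1).
def policy(wbar, N):
    c, us = 1.0, []
    for _ in range(N):
        store, invest = 1 + c, (1 + wbar) * c
        us.insert(0, 1 if invest > store else 0)   # ties resolved by storing
        c = max(store, invest)
    return us

assert policy(1.5, 4) == [1, 1, 1, 1]            # wbar > 1: always invest
assert policy(0.1, 5) == [0, 0, 0, 0, 0]         # wbar < 1/N: always store
assert policy(0.4, 5) == [1, 1, 1, 0, 0]         # 1/3 < 0.4 <= 1/2: kbar = 2
```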

1.15 Let x_k denote the number of educators in a certain country at time k and let y_k denote the number of research scientists at time k. New scientists (potential educators or research scientists) are produced during the kth period by educators at a rate \gamma_k per educator, while educators and research scientists leave the field due to death, retirement, and transfer at a rate \delta_k. The scalars \gamma_k, k = 0,1,\ldots,N-1, are independent identically distributed random variables taking values within a closed and bounded interval of positive numbers. Similarly \delta_k, k = 0,1,\ldots,N-1, are independent identically distributed and take values in an interval [\delta,\delta'] with 0<\delta\leq\delta'<1. By means of incentives, a science policy maker can determine the proportion u_k of new scientists produced at time k who become educators. Thus, the number of research scientists and educators evolves according to the equations
x_{k+1}=(1-\delta_k)x_k+u_k\gamma_k x_k,
y_{k+1}=(1-\delta_k)y_k+(1-u_k)\gamma_k x_k.
The initial numbers x_0,y_0 are known, and it is required to find a policy \{\mu_0^*(x_0,y_0),\cdots,\mu_{N-1}^*(x_{N-1},y_{N-1})\} with 0<\alpha\leq \mu_k^*(x_k,y_k)\leq\beta<1,\qquad\text{for all }x_k,y_k\text{ and }k, which maximizes \mathbf{E}_{\gamma_k,\delta_k}\{y_N\} (i.e. the expected final number of research scientists after N periods). The scalars \alpha and \beta are given.
(1) Show that the cost-to-go functions J_k(x_k,y_k) are linear; that is, for some scalars \xi_k,\zeta_k, J_k(x_k,y_k)=\xi_kx_k+\zeta_ky_k (2) Derive an optimal policy \{\mu_0^*,\cdots,\mu_{N-1}^*\}, under the assumption \mathbf{E}\{\gamma_k\}>\mathbf{E}\{\delta_k\} and show that this optimal policy can consist of constant functions.
(3) Assume that the proportion of new scientists who become educators at time k is u_k+\epsilon_k (rather than u_k), where \epsilon_k are identically distributed independent random variables that are also independent of \gamma_k,\delta_k and take values in the interval [-\alpha,1-\beta]. Derive the form of the cost-to-go functions and the optimal policy.
