
Dynamic Programming and Optimal Control: Chapter 3 Exercises


3.1 Solve the problem of Example 3.2.1 for the case where the cost function is (x(T))^2+\int_0^T(u(t))^2\,dt. Also, calculate the cost-to-go function J^*(t,x) and verify that it satisfies the HJB equation.
Solution. The scalar system is \dot x(t)=u(t) with the constraint |u(t)|\leq 1 for all t\in [0,T]. The Hamiltonian is H(x,u,p)=u^2+pu, and the adjoint equation \dot p(t)=-\nabla_xH=0 shows that p(t) is a constant, with terminal condition p(T)=\nabla h(x^*(T))=2x^*(T). Minimizing the Hamiltonian gives the constant control u^*=-p/2 when |p|\leq 2 (and u^*=-\mathrm{sgn}(p) otherwise). In the unconstrained case, x^*(T)=x(0)+u^*T together with p=2x^*(T) yields u^*=-\frac{x(0)}{1+T}, which is admissible whenever |x(0)|\leq 1+T. Applying the same argument from an initial condition (t,x) gives the cost-to-go J^*(t,x)=\frac{x^2}{1+T-t}\qquad\text{for }|x|\leq 1+T-t. To verify the HJB equation, compute \frac{\partial J^*}{\partial t}=\frac{x^2}{(1+T-t)^2} and \frac{\partial J^*}{\partial x}=\frac{2x}{1+T-t}, so that \min_{|u|\leq 1}\left[u^2+\frac{\partial J^*}{\partial t}+\frac{\partial J^*}{\partial x}u\right] is attained at u^*=-\frac{x}{1+T-t} with value \frac{x^2}{(1+T-t)^2}-\frac{x^2}{(1+T-t)^2}=0, as required. \qquad\Box
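As a numerical sanity check (a sketch only: the horizon T and the sample points below are arbitrary), one can verify by finite differences that the candidate cost-to-go J^*(t,x)=x^2/(1+T-t), valid for |x|\leq 1+T-t, makes the HJB residual vanish:

```python
import math

T = 2.0  # horizon (arbitrary illustrative value)

def J(t, x):
    """Candidate cost-to-go J*(t,x) = x^2 / (1 + T - t), valid for |x| <= 1 + T - t."""
    return x * x / (1.0 + T - t)

def hjb_residual(t, x, h=1e-5):
    """HJB residual min_{|u|<=1} [u^2 + dJ/dt + (dJ/dx) u], derivatives by central differences."""
    Jt = (J(t + h, x) - J(t - h, x)) / (2 * h)
    Jx = (J(t, x + h) - J(t, x - h)) / (2 * h)
    u = max(-1.0, min(1.0, -Jx / 2.0))  # unconstrained minimizer -Jx/2, clipped to [-1, 1]
    return u * u + Jt + Jx * u

# The residual should vanish wherever |x| <= 1 + T - t.
for (t, x) in [(0.0, 0.5), (1.0, -1.2), (1.5, 0.9)]:
    print(abs(hjb_residual(t, x)) < 1e-6)  # True in each case
```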

3.2 A young investor has earned in the stock market a large amount of money S and plans to spend it so as to maximize his enjoyment through the rest of his life without working. He estimates that he will live exactly T more years and that his capital x(t) should be reduced to zero at time T, i.e., x(T)=0. Also he models the evolution of his capital by the differential equation \frac{dx(t)}{dt}=\alpha x(t)-u(t), where x(0)=S is his initial capital, \alpha>0 is a given interest rate, and u(t)\geq 0 is his rate of expenditure. The total enjoyment he will obtain is given by \int_0^Te^{-\beta t}\sqrt{u(t)}\,dt, where \beta is some positive scalar, which serves to discount future enjoyment. Find the optimal \{u(t)\mid t\in[0,T]\}.
Solution. We have f(x,u)=\alpha x-u and g(t,x,u)=e^{-\beta t}\sqrt{u}, giving the Hamiltonian H(x,u,p)=e^{-\beta t}\sqrt{u}+p(\alpha x-u), and the adjoint equation is
\dot p(t)=-\alpha p(t), yielding p(t)=C_1e^{-\alpha t}\qquad\text{for some constant }C_1. Notice that here the terminal state x(T)=0 is given, so the condition p(T)=\nabla h(x^*(T))=0 is no longer required; C_1 is instead determined from the boundary conditions on x^*.
\qquad The optimal control is obtained by maximizing the Hamiltonian with respect to u. Setting \frac{\partial H}{\partial u}=\frac{e^{-\beta t}}{2\sqrt{u}}-C_1e^{-\alpha t}=0 gives \sqrt{u^*(t)}=\frac{e^{(\alpha-\beta)t}}{2C_1}, so that
u^*(t)=\arg\max_{u\geq 0}\left[e^{-\beta t}\sqrt{u}+C_1e^{-\alpha t}(\alpha x^*-u)\right]=\frac{e^{2(\alpha-\beta)t}}{4C_1^2}\qquad (3.2.1) Then by the differential equation of the system we get \dot{x}^*(t)=\alpha x^*(t)-\frac{e^{2(\alpha-\beta)t}}{4C_1^2} Solving this linear equation (assuming \alpha\neq 2\beta), we obtain
x^*(t)=C_2e^{\alpha t}+\frac{e^{2(\alpha-\beta)t}}{4C_1^2(2\beta-\alpha)}\qquad\text{for some constant }C_2 Together with the initial condition x^*(0)=S and the final condition x^*(T)=0, this determines the constants: 4C_1^2(2\beta-\alpha)=\frac{1-e^{(\alpha-2\beta)T}}{S},\qquad C_2=-\frac{e^{(\alpha-2\beta)T}}{4C_1^2(2\beta-\alpha)}. So u^*(t) in (3.2.1) gives the optimal control. \qquad\qquad\qquad\qquad\qquad\Box
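As a numerical sanity check (a sketch: the values of \alpha, \beta, T, S below are chosen arbitrarily, with \alpha<2\beta assumed), one can compute the constants from the boundary conditions and integrate \dot x=\alpha x-u^*(t) forward to confirm that the capital indeed hits zero at time T:

```python
import math

# Arbitrary illustrative parameters (not from the text); note alpha < 2*beta here.
alpha, beta, T, S = 0.05, 0.1, 10.0, 1.0

# Constants from the boundary conditions x(0)=S, x(T)=0:
#   4*C1^2*(2*beta - alpha) = (1 - e^{(alpha-2beta)T}) / S
D = (1.0 - math.exp((alpha - 2 * beta) * T)) / S  # = 4*C1^2*(2*beta - alpha)
four_C1_sq = D / (2 * beta - alpha)               # = 4*C1^2

def u_star(t):
    """Optimal expenditure rate u*(t) = e^{2(alpha-beta)t} / (4 C1^2), eq. (3.2.1)."""
    return math.exp(2 * (alpha - beta) * t) / four_C1_sq

def rhs(t, x):
    """Capital dynamics dx/dt = alpha*x - u*(t)."""
    return alpha * x - u_star(t)

# Classical RK4 integration of the closed-loop capital trajectory from x(0)=S.
n = 1000
dt = T / n
x, t = S, 0.0
for _ in range(n):
    k1 = rhs(t, x)
    k2 = rhs(t + dt / 2, x + dt / 2 * k1)
    k3 = rhs(t + dt / 2, x + dt / 2 * k2)
    k4 = rhs(t + dt, x + dt * k3)
    x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt

print(x)  # capital at time T: should be ~0
```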

3.9 Use the Minimum Principle to solve the linear-quadratic problem of Example 3.2.2.
Solution. The n-dimensional linear-quadratic system is given by
\dot x(t)=Ax(t)+Bu(t) where A and B are given matrices, and the quadratic cost
x(T)'Q_Tx(T)+\int_0^T\left(x(t)'Qx(t)+u(t)'Ru(t)\right)dt where the matrices Q_T and Q are symmetric positive semidefinite, and the matrix R is symmetric positive definite.
\qquad The Hamiltonian here is H(x,u,p)=x'Qx+u'Ru+p'(Ax+Bu) and the adjoint equation \dot p(t)=-\nabla_xH gives
\dot p(t)=-\left(2Qx^*(t)+A'p(t)\right)\qquad (1) with the terminal condition p(T)=\nabla h(x^*(T))=2Q_Tx^*(T). The optimal control can be obtained by minimizing the Hamiltonian with respect to u, yielding
u^*(t)=\arg\min_{u}\left\{x^*(t)'Qx^*(t)+u'Ru+p'(Ax^*(t)+Bu)\right\} Since \nabla_u\{x^*(t)'Qx^*(t)+u'Ru+p'(Ax^*(t)+Bu)\}=2Ru+B'p, we get u^*(t)=-\frac{1}{2}R^{-1}B'p(t)\qquad(2) which, together with the system equation, leads to \dot x^*(t)=Ax^*(t)-\frac{1}{2}BR^{-1}B'p(t)\qquad (3) Equations (1) and (3) form a linear two-point boundary value problem in (x^*,p). It is solved by the ansatz p(t)=2K(t)x^*(t) for a symmetric matrix K(t): substituting into (1) and (3) and matching coefficients of x^*(t) gives the Riccati equation \dot K(t)=-K(t)A-A'K(t)+K(t)BR^{-1}B'K(t)-Q,\qquad K(T)=Q_T, and then (2) becomes the linear feedback u^*(t)=-R^{-1}B'K(t)x^*(t), in agreement with Example 3.2.2. \qquad\Box
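To make the Riccati step concrete, here is a minimal scalar sketch (the data A=0, B=Q=R=1, Q_T=0, T=1 are chosen purely for illustration). In this case the Riccati equation reduces to \dot K=K^2-1 with K(1)=0, whose closed-form solution is K(t)=\tanh(1-t), so backward integration should return K(0)=\tanh(1):

```python
import math

# Scalar illustrative data (assumed for this sketch, not from the text).
A, B, Q, R, QT, T = 0.0, 1.0, 1.0, 1.0, 0.0, 1.0

def riccati_rhs(K):
    """dK/dt = -K A - A' K + K B R^{-1} B' K - Q (scalar case)."""
    return -K * A - A * K + K * B * (1.0 / R) * B * K - Q

# Integrate backward from K(T) = Q_T to K(0) with classical RK4 (step -dt).
n = 1000
dt = T / n
K = QT
for _ in range(n):
    k1 = riccati_rhs(K)
    k2 = riccati_rhs(K - dt / 2 * k1)
    k3 = riccati_rhs(K - dt / 2 * k2)
    k4 = riccati_rhs(K - dt * k3)
    K -= dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

print(K)             # K(0) from backward integration
print(math.tanh(T))  # closed-form K(0) = tanh(T) for this data
```

The resulting feedback gain is then u^*(t) = -R^{-1}B'K(t)x^*(t).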

3.11 Use the discrete-time Minimum Principle to solve Exercise 1.14 of Chapter 1, assuming that each w_k is fixed at a known deterministic value.
Solution. Let w_k=\overline{w} for some fixed number \overline{w}>0. The system is characterized by x_{k+1}=f_k(x_k,u_k)=x_k+\overline{w}u_kx_k and the cost function becomes J(u)=x_N+\sum_{k=0}^{N-1}(1-u_k)x_k Then the Hamiltonian function can be written as H_k(x_k,u_k,p_{k+1})=(1-u_k)x_k+p_{k+1}(x_k+\overline{w}u_kx_k) By the discrete-time Minimum Principle, for k=0,1,\cdots,N-1, we have u_k^*=\arg\max_{u_k}H_k(x_k^*,u_k,p_{k+1})=\arg\max_{u_k}\left[(p_{k+1}\overline{w}-1)u_kx_k+(p_{k+1}+1)x_k\right]=\begin{cases} 1, & \text{ if }\; p_{k+1}\overline{w}>1\\ 0, & \text{ if }\; p_{k+1}\overline{w}\leq1 \end{cases}\qquad(3.11.1) using x_k^*>0. On the other hand, for k=0,1,\cdots,N-1, the adjoint equation reads p_k=\nabla_{x_k}H_k(x_k^*,u_k^*,p_{k+1})=(p_{k+1}\overline{w}-1)u_k^*+p_{k+1}+1\qquad(3.11.2) with the terminal condition p_N=\nabla g_N(x_N^*)=1.
Combining (3.11.1) with (3.11.2), we obtain the following implications:
p_{k+1}\overline{w}>1\;\Rightarrow\;u_k^*=1\;\Rightarrow\;p_k=(\overline{w}+1)p_{k+1}\qquad (3.11.3) p_{k+1}\overline{w}\leq1\;\Rightarrow\;u_k^*=0\;\Rightarrow\;p_k=p_{k+1}+1\;\;\qquad (3.11.4) So by induction, we can conclude the following optimal control results:
(1) If \overline{w}>1, u_0^*=\cdots=u_{N-1}^*=1.
(2) If 0<\overline{w}<1/N, u_0^*=\cdots=u_{N-1}^*=0.
(3) If 1/N\leq\overline{w}\leq 1, u_0^*=\cdots=u_{N-\bar{k}-1}^*=1,\quad u_{N-\bar{k}}^*=\cdots=u_{N-1}^*=0, where \bar{k} is such that 1/(\bar{k}+1)<\overline{w}\leq 1/\bar{k}. \qquad\qquad\qquad\qquad\qquad\qquad\qquad\Box
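The backward recursion (3.11.1)-(3.11.2) is easy to run directly; the following sketch (with arbitrary sample values of \overline{w} and N) reproduces the three cases above:

```python
def optimal_controls(w_bar, N):
    """Backward recursion (3.11.1)-(3.11.2): returns the list (u_0*, ..., u_{N-1}*)."""
    p = 1.0                       # terminal condition p_N = 1
    u = [0] * N
    for k in range(N - 1, -1, -1):
        if p * w_bar > 1:         # (3.11.1): invest
            u[k] = 1
            p = (w_bar + 1) * p   # (3.11.3)
        else:                     # do not invest
            u[k] = 0
            p = p + 1             # (3.11.4)
    return u

print(optimal_controls(2.0, 5))  # case (1), w_bar > 1:    [1, 1, 1, 1, 1]
print(optimal_controls(0.1, 5))  # case (2), w_bar < 1/N:  [0, 0, 0, 0, 0]
print(optimal_controls(0.3, 5))  # case (3), k_bar = 3:    [1, 1, 0, 0, 0]
```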

3.12 Use the discrete-time Minimum Principle to solve Exercise 1.15 of Chapter 1, assuming that each \gamma_k and \delta_k is fixed at a known deterministic value.
