
【Machine Learning】Optimization


These notes are based on the gradient-descent part of the lecture notes for Tsinghua University's Machine Learning course, essentially a cheat sheet I wrote a day or two before the exam. They cover a lot of material without much detail, and are mainly intended for review and memorization.

Smoothness Assumption

Upper bound for \nabla^2 f(w): \left\|\nabla^2 f(w)\right\|\le L

f(w')\le f(w)+\langle \nabla f(w), w'-w\rangle+\frac{L}{2}\left\|w'-w\right \|^2

equivalent to \left\|\nabla f(w)-\nabla f(w')\right \|\le L\left\|w-w'\right \|

Proof:

  * $\Rightarrow$: $\left\|\nabla f(w)-\nabla f(w')\right\|\le \left\|\nabla^2 f(\xi)\right\|\left\|w-w'\right\|\le L\left\|w-w'\right\|$ by the mean value theorem, with $\xi$ on the segment between $w$ and $w'$
  * $\Leftarrow$: $f(w')-f(w)-\langle \nabla f(w), w'-w\rangle=\int_0^1 \langle \nabla f(w+t(w'-w))-\nabla f(w), w'-w\rangle\, dt\le \int_0^1 Lt\left\|w'-w\right\|^2 dt=\frac{L}{2}\left\|w'-w\right\|^2$

Two-sided version: \left|f(w')-f(w)-\langle \nabla f(w), w'-w\rangle\right|\le \frac{L}{2}\left\|w'-w\right\|^2

This contains both the upper and the lower bound.
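As a sanity check, both the gradient-Lipschitz condition and the two-sided quadratic bound can be verified numerically on a small invented example, f(w)=w^2+\sin(w), whose second derivative 2-\sin(w) is bounded by L=3:

```python
import math

# Hypothetical 1-D example: f(w) = w**2 + sin(w), so f''(w) = 2 - sin(w)
# and |f''(w)| <= 3, i.e. f is L-smooth with L = 3.
L = 3.0

def f(w):
    return w * w + math.sin(w)

def grad(w):
    return 2.0 * w + math.cos(w)

pts = [i * 0.25 - 5.0 for i in range(41)]  # grid on [-5, 5]
ok = all(
    # gradient Lipschitz condition
    abs(grad(w) - grad(v)) <= L * abs(w - v) + 1e-9
    # two-sided quadratic bound
    and abs(f(v) - f(w) - grad(w) * (v - w)) <= L / 2 * (v - w) ** 2 + 1e-9
    for w in pts
    for v in pts
)
print(ok)  # True
```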

When w'=w-\eta\nabla f(w) with \eta=\frac{1}{L}, this guarantees

f(w')-f(w)\le -\frac{1}{2L}\left\|\nabla f(w)\right\|^2\le 0
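A minimal sketch of this one-step decrease, again on the invented function f(w)=w^2+\sin(w) with L=3:

```python
import math

# Toy L-smooth function f(w) = w**2 + sin(w) with L = 3 (|f''| <= 3):
# one gradient step with eta = 1/L must decrease f by at least |f'(w)|^2 / (2L).
L = 3.0
eta = 1.0 / L

def f(w):
    return w * w + math.sin(w)

def grad(w):
    return 2.0 * w + math.cos(w)

ok = all(
    f(w - eta * grad(w)) - f(w) <= -grad(w) ** 2 / (2.0 * L) + 1e-9
    for w in [-2.0, -0.5, 0.0, 1.0, 3.0]
)
print(ok)  # True
```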

Convex Function

Lower bound for \nabla^2 f(w)

f(w')\ge f(w)+\nabla f(w)^T(w'-w)

  * $\lambda_{\min}(\nabla^2 f(w))\ge 0$

Strongly convex function (with parameter \mu>0): f(w')\ge f(w)+\nabla f(w)^T(w'-w)+\frac{\mu}{2}\left\|w-w'\right\|^2

  * $\lambda_{\min}(\nabla^2 f(w))\ge \mu > 0$

Convergence Analysis

f is L-smooth

The iterates are w_0,w_1,\dots,w_t and the minimizer is w^*, with

w_i=w_{i-1}-\eta \nabla f(w_{i-1})

and step size \eta<\frac{2}{L}.

Gradient descent makes progress at every step:

f(w')-f(w)\le -\eta\left(1-\frac{L\eta}{2}\right)\left\|\nabla f(w)\right \|^2

  • Convex function: 1/t convergence rate

f(w_t)-f(w^*)\le \frac{\left\|w_0-w^*\right \|^2}{2t\eta}
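The 1/t bound can be checked on a toy convex quadratic (the example and its constants are invented for illustration): f(w)=w_1^2+4w_2^2 is L-smooth with L=8 (the largest Hessian eigenvalue), its minimizer is w^*=0, and \eta=1/L satisfies \eta<2/L:

```python
# Toy convex quadratic f(w) = w1^2 + 4*w2^2, L-smooth with L = 8,
# minimized at w* = 0; verify f(w_t) <= ||w0 - w*||^2 / (2 t eta) at every t.
L = 8.0
eta = 1.0 / L

def f(w):
    return w[0] ** 2 + 4.0 * w[1] ** 2

def grad(w):
    return [2.0 * w[0], 8.0 * w[1]]

w0 = [4.0, -2.0]
w = list(w0)
dist0_sq = w0[0] ** 2 + w0[1] ** 2

for t in range(1, 101):
    g = grad(w)
    w = [w[0] - eta * g[0], w[1] - eta * g[1]]
    # O(1/t) guarantee for convex L-smooth f
    assert f(w) <= dist0_sq / (2.0 * t * eta) + 1e-12
print(f(w))
```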

Stochastic Gradient Descent (SGD)

f is L-smooth and convex

w_{t+1}=w_t-\eta G_t,\quad E[G_t]=\nabla f(w_t),\quad E\left\|G_t-\nabla f(w_t)\right\|^2\le \sigma^2

E\, f(\bar{w}_t)-f(w^*)\le \frac{\left\|w_0-w^*\right\|^2}{2t\eta}+\eta \sigma^2, where \bar{w}_t=\frac{1}{t}\sum_{i=1}^{t} w_i is the averaged iterate

  • Choosing \eta\propto 1/\sqrt{t} balances the two terms, giving a 1/\sqrt{t} convergence rate
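A sketch of SGD on an invented 1-D least-squares problem, tracking the averaged iterate \bar{w} from the bound above (the data, step size, and iteration count are all illustrative choices):

```python
import random

# Hypothetical 1-D least squares: f(w) = (1/n) sum_i (w*x_i - y_i)^2.
# G_t is the gradient on one random sample, so E[G_t] = grad f(w_t).
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 0.1) for x in xs]  # true slope ~ 2

w = 0.0
eta = 0.1
T = 2000
avg = 0.0
for t in range(T):
    i = random.randrange(len(xs))
    g = 2.0 * (w * xs[i] - ys[i]) * xs[i]  # unbiased stochastic gradient
    w -= eta * g
    avg += w / T  # averaged iterate \bar{w}, as in the bound above

print(avg)  # close to the true slope 2
```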

SVRG (Stochastic Variance Reduced Gradient)

Proof: To read.
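A sketch of the standard SVRG scheme, assuming the usual update: keep a snapshot w_s with full gradient \mu=\nabla f(w_s), and in each inner step use \nabla f_i(w)-\nabla f_i(w_s)+\mu, an unbiased gradient estimate whose variance shrinks as w and w_s approach the optimum. The problem instance below is invented:

```python
import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100)]
ys = [3.0 * x for x in xs]  # noiseless labels, so the optimum is w* = 3

def grad_i(w, i):
    # gradient of the i-th sample loss (w * x_i - y_i)^2
    return 2.0 * (w * xs[i] - ys[i]) * xs[i]

def full_grad(w):
    return sum(grad_i(w, i) for i in range(len(xs))) / len(xs)

w = 0.0
eta = 0.1
for epoch in range(30):            # outer loop: refresh the snapshot
    w_snap, mu = w, full_grad(w)
    for _ in range(len(xs)):       # inner loop: variance-reduced steps
        i = random.randrange(len(xs))
        g = grad_i(w, i) - grad_i(w_snap, i) + mu
        w -= eta * g

print(w)  # converges linearly toward w* = 3
```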

Mirror Descent

After k iterations, f(\bar{x})-f(u)\le \frac{1}{k}\sum_{t=0}^{k-1}\langle \nabla f(x_t),x_t-u\rangle (the right-hand side is also called the average regret), and this bound shrinks as k grows.

f is \rho-Lipschitz, that is, \left\|\nabla f(x)\right\|\le \rho. After T=O\left(\frac{\rho^2}{\epsilon^2}\right) iterations, f(\bar{x})-f(x^*)\le \epsilon, i.e. a 1/\sqrt{t} convergence rate.
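A sketch of mirror descent with the entropy mirror map on the probability simplex (exponentiated gradient), on an invented linear objective f(x)=\langle c,x\rangle; the minimizer is the simplex vertex with the smallest c_i:

```python
import math

# Mirror descent with the entropy mirror map on the simplex reduces to the
# multiplicative update x_i <- x_i * exp(-eta * grad_i), renormalized.
c = [0.9, 0.2, 0.5]      # gradient of the linear objective f(x) = <c, x>
x = [1.0 / 3.0] * 3      # start at the simplex center
eta = 0.1

for _ in range(500):
    x = [xi * math.exp(-eta * ci) for xi, ci in zip(x, c)]
    z = sum(x)
    x = [xi / z for xi in x]  # renormalize back onto the simplex

print([round(xi, 3) for xi in x])  # [0.0, 1.0, 0.0]
```

The iterates concentrate on coordinate 2, the vertex minimizing f over the simplex.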

Linear Coupling: 1/t^2 convergence rate, so t\ge \Omega\left(\left(\frac{1}{\epsilon}\right)^{1/2}\right) iterations suffice for error \epsilon.
