Stanford CS229 (taught by Andrew Ng) Study Notes (3)
CS229-notes1-part3
- Overview
- Main content
  - Problem Set #1: Supervised learning
    - 1. Newton's method for computing least squares
    - 5. Exponential family and the geometric distribution
Overview
This note covers the third part of the cs229-notes1 handout and corresponds to the "04 Newton's Method" video on Bilibili. It mainly supplements some of the derivations in the handout and records the key points; solutions to the corresponding problem-set exercises and C++ implementations of the algorithms are also included.
See "Stanford CS229 (taught by Andrew Ng) Study Notes (1)" for the course videos, handouts, and other materials.
Main content
Problem Set #1: Supervised learning
1. Newton’s method for computing least squares
The original problem statement is as follows:

Solution:
(a)
Assume the dimensions:
X: m\times n,\quad \theta: n\times 1,\quad \vec y: m\times 1
\begin{aligned} J(\theta)=&\frac{1}{2}\sum^m_{i=1}(\theta^Tx^{(i)}-y^{(i)})^2\\ =&\frac{1}{2}(X\theta-\vec y)^T(X\theta-\vec y)\\ =&\frac{1}{2}\operatorname{tr}[(X\theta-\vec y)^T(X\theta-\vec y)]\\ =&\frac{1}{2}\operatorname{tr}[\theta^TX^TX\theta-(\vec y^TX\theta)^T-\vec y^TX\theta+\vec y^T \vec y] \end{aligned}
Therefore, applying the product rule to the two occurrences of \theta (the subscript c marks the copy of \theta held constant during differentiation),
\begin{aligned} \nabla_\theta J(\theta)=&\frac{1}{2}\nabla_\theta \operatorname{tr}[\theta^TX^TX\theta-(\vec y^TX\theta)^T-\vec y^TX\theta+\vec y^T \vec y]\\ =&\frac{1}{2}[\nabla_\theta \operatorname{tr}\,\theta^TX^TX\theta-2\nabla_\theta \operatorname{tr}\,\vec y^TX\theta]\\ =&\frac{1}{2}[\nabla_\theta \operatorname{tr}\,\theta^T_cX^TX\theta+\nabla_\theta \operatorname{tr}\,\theta^TX^TX\theta_c]-(\vec y^TX)^T\\ =&X^TX\theta-X^T\vec y \end{aligned}
so the Hessian is
\begin{aligned} H=\nabla_\theta\nabla_\theta J(\theta)=&\nabla_\theta( X^TX\theta-X^T\vec y)=X^TX \end{aligned}
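A side note not in the handout: this Hessian is positive semidefinite, since for any vector z
\begin{aligned} z^TX^TXz=(Xz)^T(Xz)=\|Xz\|^2\geq0 \end{aligned}
so J(\theta) is convex, and the Newton step in part (b) is well defined whenever X^TX is invertible.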
(b)
By the Newton-Raphson update rule,
\begin{aligned} \theta :=&\theta-H^{-1}\nabla_\theta J(\theta)\\ :=&\theta-(X^TX)^{-1}(X^TX\theta-X^T\vec y)\\ :=&\theta-(X^TX)^{-1}X^TX\theta+(X^TX)^{-1}X^T\vec y\\ :=&\theta-\theta+(X^TX)^{-1}X^T\vec y\\ :=&(X^TX)^{-1}X^T\vec y \end{aligned}
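Since J(\theta) is quadratic, this shows that a single Newton step from any starting point lands exactly on the normal-equations solution (X^TX)^{-1}X^T\vec y. Below is a minimal C++ sketch of that step; it assumes the Eigen linear-algebra library, and the data values are illustrative only.

```cpp
// One Newton step for least squares: theta := theta - H^{-1} * grad J,
// with grad J = X^T X theta - X^T y and H = X^T X, as derived above.
#include <Eigen/Dense>
#include <iostream>

int main() {
    // Toy design matrix (m = 4 examples, n = 2 features; first column
    // is the intercept term) and target vector; values are made up.
    Eigen::MatrixXd X(4, 2);
    X << 1, 1,
         1, 2,
         1, 3,
         1, 4;
    Eigen::VectorXd y(4);
    y << 6, 5, 7, 10;

    Eigen::VectorXd theta = Eigen::VectorXd::Zero(2);  // arbitrary start

    Eigen::MatrixXd H = X.transpose() * X;                 // Hessian
    Eigen::VectorXd grad = H * theta - X.transpose() * y;  // gradient
    theta -= H.ldlt().solve(grad);  // solve H*d = grad instead of inverting

    // theta now equals (X^T X)^{-1} X^T y, the least-squares solution.
    std::cout << theta << std::endl;
    return 0;
}
```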
5. Exponential family and the geometric distribution
The original problem statement is as follows:

Solution:
(a)
Geometric distribution: in a sequence of independent Bernoulli trials, the probability that the first success occurs on trial y, i.e. that the first y-1 trials all fail and the y-th trial succeeds.
\begin{aligned} p(y;\phi)=&(1-\phi)^{y-1}\phi\\ =&\exp(\log((1-\phi)^{y-1}\phi))\\ =&\exp((y-1)\log(1-\phi)+\log\phi)\\ =&\exp(y\log(1-\phi)-\log(1-\phi)+\log\phi)\\ =&\exp(y\log(1-\phi)+\log\frac{\phi}{1-\phi}) \end{aligned}
Matching this against the exponential-family form p(y;\eta)=b(y)\exp(\eta^TT(y)-a(\eta)), where \eta is called the natural parameter (or canonical parameter) and y the response variable, and noting that \eta=\log(1-\phi) inverts to \phi=1-e^{\eta}, we read off
\begin{aligned} b(y)=&1\\ \eta=&\log(1-\phi)\\ T(y)=&y\\ a(\eta)=&-\log\frac{\phi}{1-\phi}=-\log\frac{1-e^\eta}{e^\eta}=\eta-\log(1-e^\eta) \end{aligned}
(b)
By assumption, y follows a geometric distribution, so y is called the geometric response variable.
By the definition of the canonical response function,
\begin{aligned} g(\eta)=&E[T(y);\eta]\\ =&E[y;\eta]\\ =&\frac{1}{\phi}\\ =&\frac{1}{1-e^\eta} \end{aligned}
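As a cross-check not in the handout, the standard exponential-family identity E[T(y);\eta]=a'(\eta) gives the same mean: with a(\eta)=\eta-\log(1-e^\eta) from part (a),
\begin{aligned} a'(\eta)=1+\frac{e^\eta}{1-e^\eta}=\frac{1}{1-e^\eta}=\frac{1}{\phi} \end{aligned}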
(c)
By assumption, y follows a geometric distribution given x. How, then, do we build a geometric-distribution GLM for a training set (logistic regression, used earlier in the handout, is itself a GLM)? First, recall a few passages from the handout.

In short, given x, the goal is to predict the expected value of y with the hypothesis h(x), i.e. h(x)=E[y|x], and the natural parameter depends linearly on x: \eta=\theta^Tx. By part (b), the mean of the geometric distribution is \frac{1}{1-e^{\eta}}, so
h(x)=\frac{1}{1-e^{\eta}}=\frac{1}{1-e^{\theta^Tx}}
Since the problem asks for stochastic gradient ascent, which processes one training example at a time, the per-example log-likelihood is
\begin{aligned} \ell_i(\theta)=\log p(y^{(i)};\phi)=\log p(y^{(i)};\eta)=&\log p(y^{(i)}|x^{(i)};\theta)\\ =&\log(\exp(y^{(i)}\log(1-\phi)+\log\frac{\phi}{1-\phi}))\\ =&\log(\exp(y^{(i)}\log(1-(1-e^\eta))+\log\frac{1-e^\eta}{1-(1-e^\eta)}))\\ =&\log(\exp(y^{(i)}\eta-\log\frac{e^\eta}{1-e^\eta}))\\ =&\log(\exp(y^{(i)}\theta^Tx^{(i)}-\log\frac{e^{\theta^Tx^{(i)}}}{1-e^{\theta^Tx^{(i)}}}))\\ =&y^{(i)}\theta^Tx^{(i)}-\log\frac{e^{\theta^Tx^{(i)}}}{1-e^{\theta^Tx^{(i)}}}\\ =&y^{(i)}\theta^Tx^{(i)}+\log(e^{-\theta^Tx^{(i)}}-1) \end{aligned}
The stochastic gradient ascent rule is \theta_j:=\theta_j+\alpha \frac{\partial\ell_i(\theta)}{\partial \theta_j}, and the partial derivative works out to
\begin{aligned} \frac{\partial\ell_i(\theta)}{\partial \theta_j}=&x_j^{(i)}y^{(i)}+\frac{e^{-\theta^Tx^{(i)}}}{e^{-\theta^Tx^{(i)}}-1}(-x_j^{(i)})=(y^{(i)}-\frac{1}{1-e^{\theta^Tx^{(i)}}})x_j^{(i)} \end{aligned}
Hence the update rule is \theta_j:=\theta_j+\alpha(y^{(i)}-\frac{1}{1-e^{\theta^Tx^{(i)}}})x_j^{(i)}.
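Here is a minimal C++ sketch of this update using only the standard library; the training data, starting point, and learning rate below are hypothetical. Note that a valid geometric model needs \phi=1-e^{\theta^Tx}\in(0,1), i.e. \theta^Tx<0, which this sketch does not enforce.

```cpp
// Stochastic gradient ascent for the geometric-response GLM:
//   theta_j := theta_j + alpha * (y_i - 1/(1 - exp(theta^T x_i))) * x_ij
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical training set; each x starts with an intercept feature of 1.
    std::vector<std::vector<double>> xs = {{1, -2.0}, {1, -1.5}, {1, -3.0}};
    std::vector<double> ys = {2, 4, 1};       // geometric responses, y >= 1
    std::vector<double> theta = {-0.5, 0.5};  // chosen so theta^T x < 0
    const double alpha = 0.01;                // learning rate

    for (int epoch = 0; epoch < 100; ++epoch) {
        for (std::size_t i = 0; i < xs.size(); ++i) {
            double eta = 0.0;                 // eta = theta^T x^{(i)}
            for (std::size_t j = 0; j < theta.size(); ++j)
                eta += theta[j] * xs[i][j];
            double h = 1.0 / (1.0 - std::exp(eta));  // h(x) = E[y|x]
            for (std::size_t j = 0; j < theta.size(); ++j)
                theta[j] += alpha * (ys[i] - h) * xs[i][j];
        }
    }
    std::printf("theta = (%f, %f)\n", theta[0], theta[1]);
    return 0;
}
```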
