An Introduction to Linear Regression

Intent

supervised learning:
- regression (continuous output)
- classification (discrete output)

h_\theta(x)=\displaystyle\sum_{i=0}^{n}\theta_i x_i=\theta^{T}x

For historical reasons, this function h is called a hypothesis.

\theta_i\text{'s are the parameters (also called weights)}

(x^{(i)},y^{(i)})\qquad\text{(a training example)}

\{(x^{(i)},y^{(i)});\ i=1,\dots,m\}\qquad\text{(the training set)}

We want h_\theta(x^{(i)}) to be close to y^{(i)} for the examples in the training set.

cost function: J(\theta) =\frac{1}{2}\displaystyle\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2

To obtain a model with stable performance, it is necessary to trade off bias against variance. Regularized regression adds an L1 (Lasso) or L2 (Ridge) penalty term to the cost function; the L1 penalty also acts as a form of feature selection, since it drives some parameters exactly to zero. The main effect of the penalty is to reduce variance. Cross-validation and the bootstrap are other commonly used techniques for managing the bias–variance trade-off.
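
As a concrete illustration of the L1/L2 penalties (not from the notes themselves), here is a minimal scikit-learn sketch; the toy data and the alpha values are arbitrary choices for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Toy data: y depends on the first two features only; the other eight are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)       # no penalty
ridge = Ridge(alpha=1.0).fit(X, y)       # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)       # L1 penalty: zeroes out weak coefficients

print("OLS:  ", np.round(ols.coef_, 2))
print("Ridge:", np.round(ridge.coef_, 2))
print("Lasso:", np.round(lasso.coef_, 2))  # noise coefficients are typically exactly 0
```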

LMS

gradient descent

In gradient descent, \theta_j is updated to its current value minus the learning rate times the partial derivative of the cost function with respect to \theta_j, where \alpha denotes the learning rate:

\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)

Taking the partial derivative of J(\theta) with respect to \theta_j for a single training example gives (h_\theta(x)-y)\,x_j (the factor of \frac{1}{2} in J cancels against the 2 coming from the square).
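
For completeness, the chain-rule step behind that result, for one training example (the same derivation as in [1]):

\begin{aligned}
\frac{\partial}{\partial\theta_j}J(\theta)
&= \frac{\partial}{\partial\theta_j}\,\frac{1}{2}\left(h_\theta(x)-y\right)^2\\
&= 2\cdot\frac{1}{2}\left(h_\theta(x)-y\right)\cdot\frac{\partial}{\partial\theta_j}\left(h_\theta(x)-y\right)\\
&= \left(h_\theta(x)-y\right)\cdot\frac{\partial}{\partial\theta_j}\left(\sum_{i=0}^{n}\theta_i x_i - y\right)\\
&= \left(h_\theta(x)-y\right)x_j
\end{aligned}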

For a single training example, this gives the update rule:

\theta_j := \theta_j + \alpha\left(y^{(i)}-h_\theta(x^{(i)})\right)x^{(i)}_j

This rule is called the LMS update rule (LMS stands for “least mean squares”).

batch gradient descent
Repeat \; until \; convergence \; \{
\qquad \theta_j := \theta_j + \alpha\displaystyle\sum_{i=1}^{m}\left(y^{(i)}-h_\theta(x^{(i)})\right)x^{(i)}_j \qquad (\text{for every } j)
\}
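
A minimal NumPy sketch of this update rule; the toy data, learning rate, and iteration count are made-up choices for illustration:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Fit theta by batch gradient descent on the least-squares cost J(theta)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        error = y - X @ theta              # y^(i) - h_theta(x^(i)) for every example
        theta += alpha * (X.T @ error)     # one step uses the whole training set
    return theta

# Toy data: an intercept column x_0 = 1 and one feature, y ~ 1.5 + 2.0 * x + noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(size=50)])
y = 1.5 + 2.0 * X[:, 1] + rng.normal(scale=0.3, size=50)
print(batch_gradient_descent(X, y))        # roughly [1.5, 2.0]
```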

stochastic gradient descent
Loop \quad \{
\quad for \; i = 1 \; to \; m, \; \{
\qquad \theta_j := \theta_j + \alpha\left(y^{(i)}-h_\theta(x^{(i)})\right)x^{(i)}_j \qquad (\text{for every } j)
\quad \}
\}

Stochastic gradient descent usually gets \theta close to the minimum much faster than the batch method, since it makes progress after every single example instead of scanning the whole training set before each step.
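
And a matching sketch of the stochastic version, where each update uses a single training example (again, the data and hyperparameters are arbitrary):

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=50, seed=0):
    """Fit theta with the LMS rule applied to one training example at a time."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):       # visit the examples in random order
            error = y[i] - X[i] @ theta    # y^(i) - h_theta(x^(i))
            theta += alpha * error * X[i]  # update from this single example only
    return theta

# Same kind of toy data as in the batch sketch above.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(size=50)])
y = 1.5 + 2.0 * X[:, 1] + rng.normal(scale=0.3, size=50)
print(stochastic_gradient_descent(X, y))   # close to the batch-gradient result
```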

A Probabilistic Interpretation

Please refer to [1].

A Linear-Algebra Interpretation

In Ng's draft notes, linear regression is derived both from linear algebra and from probability theory. The probabilistic approach models the problem by maximum-likelihood estimation under a Gaussian assumption; the linear-algebra approach takes more work to carry through, but viewed in terms of projections it may be the more intuitive of the two [2].
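
In outline, the probabilistic derivation in [1] assumes Gaussian noise on top of the linear model and shows that maximizing the log-likelihood is the same as minimizing the least-squares cost:

y^{(i)} = \theta^{T}x^{(i)} + \epsilon^{(i)}, \qquad \epsilon^{(i)} \sim \mathcal{N}(0,\sigma^{2})

\ell(\theta) = \sum_{i=1}^{m}\log\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)
= m\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^{2}}\cdot\frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)}-\theta^{T}x^{(i)}\right)^{2}

so maximizing \ell(\theta) is exactly minimizing J(\theta)=\frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)}-\theta^{T}x^{(i)}\right)^{2}.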

Let me try to spell the projection view out a bit more clearly [2]:
Projection onto a line
A line goes through the origin in the direction of a = (a_1, \dots, a_m). Along that line, we want the point p closest to b = (b_1, \dots, b_m). The key to projection is orthogonality: the line from b to p is perpendicular to the vector a. This is the dotted line marked e for error in the figure below, which we now compute by algebra.

(Figure: projection onto a line)

The projection p will be some multiple of a. Call it p = \hat{x}a, that is, “x hat” times a. Computing this number \hat{x} will give the vector p, and from the formula for p we can read off the projection matrix P. These three steps lead to all projection matrices: find \hat{x}, then find the vector p, then find the matrix P.
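
Carrying out those three steps for a line (the standard formulas from [2]): the error e = b - \hat{x}a must be perpendicular to a, so

a^{T}(b-\hat{x}a)=0 \quad\Rightarrow\quad \hat{x}=\frac{a^{T}b}{a^{T}a}, \qquad p=\hat{x}a=\frac{aa^{T}}{a^{T}a}\,b, \qquad P=\frac{aa^{T}}{a^{T}a}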

Projection onto a Subspace

We now compute projections onto an n-dimensional subspace, spanned by vectors a_1, \dots, a_n (the columns of a matrix A), following the same three steps: find the vector \hat{x}, find the projection p = A\hat{x}, and find the projection matrix P.

The error b - A\hat{x} is perpendicular to each of the vectors a_1, \dots, a_n. These n right angles give n equations:

a^{T}_{1}(b-A\hat{x})=0\\
\qquad\vdots\\
a^{T}_{n}(b-A\hat{x})=0

The matrix with those rows a^{T}_i is A^{T}, so the n equations are exactly A^{T}(b - A\hat{x}) = 0. Rewritten in its famous form, this is the normal equation A^{T}A\hat{x} = A^{T}b. It is the equation that determines \hat{x}, and its coefficient matrix is A^{T}A. Now we can find \hat{x}, p, and P, in that order:
\hat{x}=(A^{T}A)^{-1}A^{T}b, \qquad p = A\hat{x}=A(A^{T}A)^{-1}A^{T}b
and the projection matrix is:
P=A(A^TA)^{-1}A^T
With the projection matrix P we can compute p = Pb, the projection of b onto the column space of A.
This requires the columns of A to be independent, so that A^{T}A is invertible.
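
This is exactly how ordinary least squares is solved in closed form: with the design matrix X playing the role of A and the targets y the role of b, \hat{\theta}=(X^{T}X)^{-1}X^{T}y. A small NumPy check on made-up data (illustration only):

```python
import numpy as np

# Toy design matrix with an intercept column, and noisy targets.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(size=(50, 2))])
y = 1.0 + 2.0 * X[:, 1] - 3.0 * X[:, 2] + rng.normal(scale=0.1, size=50)

# Normal equations: solve (X^T X) theta = X^T y rather than forming an explicit inverse.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# NumPy's least-squares routine should agree (up to floating-point error).
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_normal)
print(np.allclose(theta_normal, theta_lstsq))   # True
```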

classification (Logistic regression)

Logistic regression is just linear regression pushed through a sigmoid function, which hardly needs further explanation.
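
Concretely, the hypothesis becomes (as in [1]):

h_\theta(x) = g(\theta^{T}x) = \frac{1}{1+e^{-\theta^{T}x}}, \qquad g(z)=\frac{1}{1+e^{-z}}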

Generalized Linear Models

Both linear and logistic regression belong to the family of Generalized Linear Models (GLMs). Depending on the probability distribution assumed for the response variable y, a whole series of specific models can be derived: linear regression corresponds to a Gaussian response, logistic regression to a binary response, and softmax regression to a response that takes one of a finite set of values. For softmax regression there is an R implementation whose objective includes an elastic-net penalty, i.e. the cost function it solves is the elastic net's cost function.
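
For reference, the softmax hypothesis for a response taking one of k values models (following [1]):

p(y=j\mid x;\theta) = \frac{e^{\theta_{j}^{T}x}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}x}}, \qquad j=1,\dots,k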

  1. CS 229 Notes - Supervised Learning
  2. Introduction to Linear Algebra (Chapter 4), Gilbert Strang
