Elements of Statistical Learning, Exercise 2.4

Problem:

The edge effect problem discussed on page 23 is not peculiar to uniform sampling from bounded domains. Consider inputs drawn from a spherical multinormal distribution $X \sim N(0, I_p)$. The squared distance from any sample point to the origin has a $\chi^2_p$ distribution with mean $p$. Consider a prediction point $x_0$ drawn from this distribution, and let $a = x_0/\|x_0\|$ be an associated unit vector. Let $z_i = a^T x_i$ be the projection of each of the training points on this direction.
Show that the $z_i$ are distributed $N(0,1)$ with expected squared distance from the origin $1$, while the target point has expected squared distance $p$ from the origin. Hence for $p = 10$, a randomly drawn test point is about $3.1$ standard deviations from the origin, while all the training points are on average one standard deviation along direction $a$. So most prediction points see themselves as lying on the edge of the training set.

Approach:

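First, a remark on one point in the first part of the problem. For any random vector $x_i$, its squared distance to the origin is $\|x_i-0\|^2=\sum_{j=1}^p x_{ij}^2$. Each component $x_{ij}$ is standard normal, and since the covariance matrix is $I_p$, any two components are uncorrelated. For a multivariate normal distribution, uncorrelated components are independent, so the squared distance is a sum of squares of $p$ independent standard normal random variables, which is exactly a chi-squared distribution with $p$ degrees of freedom.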
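As a quick numerical check of the chi-squared claim, the following Monte Carlo sketch draws points from $N(0, I_p)$ and compares the sample mean and variance of $\|x\|^2$ with the $\chi^2_p$ values $p$ and $2p$. It uses only Python's standard library; the helper name `squared_norm_stats`, the sample size and the seed are arbitrary choices for illustration, not part of the exercise.

```python
import random


def squared_norm_stats(p, n_samples=20000, seed=0):
    """Monte Carlo mean and variance of ||x||^2 for x ~ N(0, I_p)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    values = []
    for _ in range(n_samples):
        # Independent standard normal coordinates give x ~ N(0, I_p).
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        values.append(sum(c * c for c in x))  # squared distance to the origin
    mean = sum(values) / n_samples
    var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    return mean, var


mean, var = squared_norm_stats(p=10)
# chi^2_10 has mean 10 and variance 20; the estimates should be close to these.
print(f"mean of ||x||^2: {mean:.2f}, variance: {var:.2f}")
```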

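If a finite-dimensional random vector follows a multivariate normal distribution, then any linear combination of its components follows a univariate normal distribution (see Wikipedia). Hence every $z_i = a^T x_i$ is normally distributed. Moreover, treating $a$ as fixed (the prediction point $x_0$ is drawn independently of the training points $x_i$),

$$E(z_i) = a^T E(x_i) = 0, \qquad \operatorname{Var}(z_i) = a^T \operatorname{Cov}(x_i)\, a = a^T I_p\, a = \|a\|^2 = 1.$$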
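The projection argument can be checked the same way. This sketch (hypothetical helper `projected_stats`, standard library only) draws one prediction point $x_0$, forms $a = x_0/\|x_0\|$, projects freshly drawn training points onto $a$, and reports the sample mean and variance of $z_i = a^T x_i$, which should be close to 0 and 1.

```python
import math
import random


def projected_stats(p, n_train=20000, seed=1):
    """Sample mean and variance of z_i = a^T x_i, with a = x0 / ||x0||."""
    rng = random.Random(seed)
    # Prediction point x0 ~ N(0, I_p), drawn independently of the training points.
    x0 = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm0 = math.sqrt(sum(c * c for c in x0))
    a = [c / norm0 for c in x0]  # unit vector along x0
    z = []
    for _ in range(n_train):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]  # training point x_i
        z.append(sum(ai * xi for ai, xi in zip(a, x)))  # z_i = a^T x_i
    mean = sum(z) / n_train
    var = sum((v - mean) ** 2 for v in z) / (n_train - 1)
    return mean, var


mean_z, var_z = projected_stats(p=10)
print(f"mean of z: {mean_z:.3f}, variance of z: {var_z:.3f}")
```

Changing `seed` changes the direction $a$ but should not change the result beyond sampling noise, in line with the fact that the distribution of $z_i$ does not depend on $a$.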

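Therefore $z_i$ follows the standard normal distribution; since this holds for every value of $a$, it also holds unconditionally. The squared distance of $z_i$ to the origin is $z_i^2$, which follows a chi-squared distribution with 1 degree of freedom, so its expectation is 1. The expected squared distance of $x_0$ to the origin, on the other hand, is

$$E\|x_0\|^2 = \sum_{i=1}^p E(x_{0i}^2) = \sum_{i=1}^p 1 = p,$$

where $E(x_{0i}^2) = \operatorname{Var}(x_{0i}) + E(x_{0i})^2 = 1 + 0 = 1$.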
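There are two natural ways to measure the typical distance of $x_0$ from the origin: the root mean square $\sqrt{E\|x_0\|^2} = \sqrt{p}$ derived above, and the mean distance $E\|x_0\|$, which is the mean of a chi distribution with $p$ degrees of freedom, $\sqrt{2}\,\Gamma((p+1)/2)/\Gamma(p/2)$. The short script below (hypothetical helper `chi_mean`) evaluates both; for $p = 10$ they are about 3.16 and 3.08, and both agree with the exercise's figure of "about 3.1".

```python
import math


def chi_mean(p):
    """Exact E||x0|| for x0 ~ N(0, I_p), i.e. the mean of the chi distribution with p d.o.f."""
    # sqrt(2) * Gamma((p + 1) / 2) / Gamma(p / 2), computed via lgamma to avoid overflow for large p.
    return math.sqrt(2.0) * math.exp(math.lgamma((p + 1) / 2) - math.lgamma(p / 2))


for p in (1, 2, 10, 100):
    rms = math.sqrt(p)  # sqrt(E||x0||^2) = sqrt(p)
    print(f"p = {p:3d}: sqrt(E||x0||^2) = {rms:.3f}, E||x0|| = {chi_mean(p):.3f}")
```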

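So $x_0$ lies roughly $\sqrt{p}$ standard deviations from the origin, whereas the other points, projected onto $a$, lie only about one standard deviation away. From the viewpoint of $x_0$, it therefore looks like an "outlier" relative to the other points, that is, it sits on the edge of the training set.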
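The "edge" conclusion itself can also be simulated. The sketch below (hypothetical helper `edge_fraction`, standard library only, arbitrary sample sizes and seed) draws several prediction points; for each one it projects fresh training points onto $a$ and records the fraction with $|z_i| < \|x_0\|$, i.e. training points lying closer to the origin along $a$ than $x_0$ does. For $p = 1$ this fraction averages one half by symmetry, while for $p = 10$ it should come out close to 1, so a typical $x_0$ lies beyond almost all of the training data along its own direction.

```python
import math
import random


def edge_fraction(p, n_test=100, n_train=500, seed=2):
    """Average over test points x0 of the fraction of training points with |a^T x_i| < ||x0||."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_test):
        x0 = [rng.gauss(0.0, 1.0) for _ in range(p)]
        r0 = math.sqrt(sum(c * c for c in x0))  # ||x0||, also x0's coordinate along a
        a = [c / r0 for c in x0]
        inside = 0
        for _ in range(n_train):
            x = [rng.gauss(0.0, 1.0) for _ in range(p)]
            z = sum(ai * xi for ai, xi in zip(a, x))  # projection z_i = a^T x_i
            if abs(z) < r0:
                inside += 1
        fractions.append(inside / n_train)
    return sum(fractions) / n_test


for p in (1, 10):
    print(f"p = {p:2d}: average fraction of training points inside x0 = {edge_fraction(p):.3f}")
```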
