Stanford CS224n - Assignment 1
Summary:
1. The softmax formula: applied at the output layer, it turns raw prediction scores into a probability distribution over the outputs.
2. Adding a constant to every input of softmax does not change the result. This property is used to avoid numerical overflow in the exponentials: during the computation, subtract from each element the maximum of its row, i.e. compute X - \max(X), where X is a matrix and \max(X) is the row-wise maximum.
3. The sigmoid function: an activation function that makes a linear transformation non-linear.
4. The loss function (cross-entropy).
5. Differentiating the loss all the way back to the input x.
Part 1: Softmax

1.1 Prove that softmax is invariant to adding a constant c to every component of an input vector x. In practice this is used with c = -\max_i x_i, taken row by row (np.max(x, axis=1) in the code): subtracting each row's maximum before exponentiating avoids overflow in e^{x_i} and is numerically more stable than evaluating the exponentials directly.
Prove: softmax(x) = softmax(x + c)
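For reference, writing softmax(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}, the proof is one line:

softmax(x + c)_i = \frac{e^{x_i + c}}{\sum_j e^{x_j + c}} = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}} = softmax(x)_i

so any constant shift cancels, and choosing c = -\max_i x_i makes every exponent non-positive, which prevents overflow.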

1.2 Implement the softmax function in Python so that it handles both vectors and matrices.
Things learned:
1.2.1 x_max = np.max(x, axis=1, keepdims=True): keepdims keeps the reduced result two-dimensional, so it still broadcasts against the original matrix (see the short demo after this list).
1.2.2 x = np.exp(x - x_max) / np.sum(np.exp(x - x_max), axis=1, keepdims=True): with axis=1, each row is one example, and each example's probabilities are normalized to sum to 1.
1.2.3 np.allclose(test1, ans1, rtol=1e-05, atol=1e-06): allclose checks whether two arrays are element-wise equal within a tolerance (the relative tolerance defaults to 1e-05).
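A tiny illustration of the keepdims point above (a throwaway snippet, not part of the assignment files):

import numpy as np

x = np.array([[1., 2., 3.],
              [4., 5., 6.]])
print(np.max(x, axis=1).shape)                 # (2,)   -- subtracting this from x raises a broadcasting error
print(np.max(x, axis=1, keepdims=True).shape)  # (2, 1) -- broadcasts cleanly against the (2, 3) matrix
print(x - np.max(x, axis=1, keepdims=True))    # each row shifted by its own maximum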
q1_softmax.py:

import numpy as np

def softmax(x):
    """Compute the softmax function for each row of the input x.

    Each row is one example and each column is a class score;
    normalization is done per row.

    It is crucial that this function is optimized for speed because
    it will be used frequently in later code. You might find numpy
    functions np.exp, np.sum, np.reshape, np.max, and numpy
    broadcasting useful for this task.

    Numpy broadcasting documentation:
    http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

    You should also make sure that your code works for a single
    D-dimensional vector (treat the vector as a single row) and
    for N x D matrices. This may be useful for testing later. Also,
    make sure that the dimensions of the output match the input.

    You must implement the optimization in problem 1(a) of the
    written assignment!

    Arguments:
    x -- A D dimensional vector or N x D dimensional numpy matrix.

    Return:
    x -- You are allowed to modify x in-place
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix: one example per row.
        ### YOUR CODE HERE
        # Row-wise maximum; keepdims keeps it 2-D so the subtraction
        # below broadcasts without errors.
        x_max = np.max(x, axis=1, keepdims=True)
        # axis=1: normalize the probabilities of each row (each example).
        x = np.exp(x - x_max) / np.sum(np.exp(x - x_max), axis=1, keepdims=True)
        ### END YOUR CODE
    else:
        # Vector: a single example.
        ### YOUR CODE HERE
        x_max = np.max(x)
        x = np.exp(x - x_max) / np.sum(np.exp(x - x_max))
        ### END YOUR CODE

    assert x.shape == orig_shape
    return x
# Autograder for q1: from q1_softmax import softmax
# return 20 if softmax(test input) == expected output else 0
def test_softmax_basic():
    """
    Some simple tests to get you started.
    Warning: these are not exhaustive.
    """
    print("Running basic tests...")
    test1 = softmax(np.array([1, 2]))
    print(test1)
    ans1 = np.array([0.26894142, 0.73105858])
    # np.allclose checks element-wise equality within a tolerance
    # (rtol defaults to 1e-05).
    assert np.allclose(test1, ans1, rtol=1e-05, atol=1e-06)

    test2 = softmax(np.array([[1001, 1002], [3, 4]]))
    print(test2)
    ans2 = np.array([
        [0.26894142, 0.73105858],
        [0.26894142, 0.73105858]])
    assert np.allclose(test2, ans2, rtol=1e-05, atol=1e-06)

    test3 = softmax(np.array([[-1001, -1002]]))
    print(test3)
    ans3 = np.array([0.73105858, 0.26894142])
    assert np.allclose(test3, ans3, rtol=1e-05, atol=1e-06)

    print("You should be able to verify these results by hand!\n")
def test_softmax():
    """
    Use this space to test your softmax implementation by running:
        python q1_softmax.py
    This function will not be called by the autograder, nor will
    your tests be graded.
    """
    print("Running your tests...")
    ### YOUR CODE HERE
    pass
    ### END YOUR CODE

if __name__ == "__main__":
    test_softmax_basic()
    test_softmax()
Part 2: Neural network basics
2.1 Derive the derivative of the sigmoid function.
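In short, with \sigma(x) = \frac{1}{1 + e^{-x}}:

\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,(1 - \sigma(x))

which is why sigmoid_grad in q2_sigmoid.py below only needs the sigmoid value s rather than the original input x.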


2.2 θ is the output of a fully connected layer and is passed to a softmax layer; the cross-entropy loss is then computed against the one-hot label y. The goal is to derive the gradient of the loss with respect to θ, i.e. ∂J/∂θ.

Softmax derivative: with \hat{y} = softmax(\theta),

\frac{\partial \hat{y}_i}{\partial \theta_j} = \hat{y}_i(\delta_{ij} - \hat{y}_j) = \begin{cases} \hat{y}_i(1 - \hat{y}_i) & i = j \\ -\hat{y}_i \hat{y}_j & i \neq j \end{cases}

and combining this with J = -\sum_i y_i \log \hat{y}_i (y one-hot) gives

\frac{\partial J}{\partial \theta} = \hat{y} - y.


2.3 Compute the gradient of the loss with respect to the input x (i.e. ∂J/∂x) for a one-hidden-layer neural network whose hidden layer uses a sigmoid activation and whose output layer uses softmax, where y is a one-hot label vector and cross-entropy is the loss. (Here σ'(x) denotes the derivative of the sigmoid; it is enough to name the intermediate variables used in the derivation.)

The derivation is just the chain rule: multiply together the derivatives obtained in the previous parts, and finally the derivative of the first layer with respect to x; a sketch follows.
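A sketch of that chain rule, naming the intermediate variables z_1 = x W_1 + b_1, h = \sigma(z_1), z_2 = h W_2 + b_2, \hat{y} = softmax(z_2) (the names are chosen here just for the derivation):

\delta_2 = \frac{\partial J}{\partial z_2} = \hat{y} - y

\delta_1 = \frac{\partial J}{\partial z_1} = (\delta_2 W_2^\top) \circ \sigma'(z_1) = (\delta_2 W_2^\top) \circ h \circ (1 - h)

\frac{\partial J}{\partial x} = \delta_1 W_1^\top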

2.4 How many parameters does this neural network have? For convenience, assume the input is Dx-dimensional, the output is Dy-dimensional, and the hidden layer has H units.

(Dx + 1) * H + (H + 1) * Dy, counting the bias term of each layer.
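For example, with the sizes used in sanity_check below (Dx = 10, H = 5, Dy = 10), that is (10 + 1) * 5 + (5 + 1) * 10 = 55 + 60 = 115 parameters, which is exactly the length of the params vector constructed there.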
In q2_sigmoid.py, fill in the code blocks that implement the sigmoid activation function and its gradient, then run python q2_sigmoid.py to test your implementation. The provided test cases are not exhaustive, so review your code carefully to convince yourself it is correct.

#!/usr/bin/env python

import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid function for the input here.

    Arguments:
    x -- A scalar or numpy array.

    Return:
    s -- sigmoid(x)
    """
    ### YOUR CODE HERE
    s = 1 / (1 + np.exp(-x))
    ### END YOUR CODE
    return s

def sigmoid_grad(s):
    """
    Compute the gradient for the sigmoid function here. Note that
    for this implementation, the input s should be the sigmoid
    function value of your original input x.

    Arguments:
    s -- A scalar or numpy array.

    Return:
    ds -- Your computed gradient.
    """
    ### YOUR CODE HERE
    ds = s * (1 - s)
    ### END YOUR CODE
    return ds
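A quick, hand-checkable sanity test for the two functions above (the test name and values here are just an illustration, not the graded tests):

def test_sigmoid_basic():
    """Check sigmoid and sigmoid_grad on a few values you can verify by hand."""
    x = np.array([[1, 2], [-1, -2]])
    f = sigmoid(x)
    g = sigmoid_grad(f)
    assert np.allclose(f, np.array([[0.73105858, 0.88079708],
                                    [0.26894142, 0.11920292]]))
    assert np.allclose(g, np.array([[0.19661193, 0.10499359],
                                    [0.19661193, 0.10499359]]))
    print("sigmoid tests passed")

if __name__ == "__main__":
    test_sigmoid_basic()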
To make debugging easier, implement a gradient checker in q2_gradcheck.py, and verify your code by running python q2_gradcheck.py.

The gradient checker is nothing more than the definition of the derivative: each partial derivative is approximated numerically by the centered difference (f(x + h) - f(x - h)) / (2h) and compared against the analytic gradient returned by f.
# First implement a gradient checker by filling in the following functions
import numpy as np
import random

def gradcheck_naive(f, x):
    """ Gradient check for a function f.

    Arguments:
    f -- a function that takes a single argument and outputs the
         cost and its gradients
    x -- the point (numpy array) to check the gradient at
    """
    rndstate = random.getstate()
    random.setstate(rndstate)
    # Evaluate function value at the original point; in the quadratic
    # sanity check below, fx is the sum-of-squares cost and grad is 2*x.
    fx, grad = f(x)
    h = 1e-4  # Do not change this!

    # Iterate over all indexes ix in x to check the gradient.
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index

        # Try modifying x[ix] with h defined above to compute numerical
        # gradients (numgrad).
        # Use the centered difference of the gradient.
        # It has smaller asymptotic error than forward / backward difference
        # methods. If you are curious, check out here:
        # https://math.stackexchange.com/questions/2326181/when-to-use-forward-or-central-difference-approximations
        # Make sure you call random.setstate(rndstate)
        # before calling f(x) each time. This will make it possible
        # to test cost functions with built in randomness later.
        ### YOUR CODE HERE:
        x[ix] += h
        random.setstate(rndstate)
        new_f1 = f(x)[0]
        x[ix] -= 2 * h
        random.setstate(rndstate)
        new_f2 = f(x)[0]
        x[ix] += h  # restore the original value
        numgrad = (new_f1 - new_f2) / (2 * h)
        ### END YOUR CODE

        # Compare gradients
        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
        if reldiff > 1e-5:
            print("Gradient check failed.")
            print("First gradient error found at index %s" % str(ix))
            print("Your gradient: %f \t Numerical gradient: %f" % (
                grad[ix], numgrad))
            return

        it.iternext()  # Step to next dimension

    print("Gradient check passed!")
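A minimal way to exercise the checker, following the sum-of-squares example mentioned in the comment above (f(x) = \sum x^2, whose exact gradient is 2x); the shapes used here are just for illustration:

def sanity_check():
    """Gradient-check a simple quadratic cost on a few array shapes."""
    quad = lambda x: (np.sum(x ** 2), x * 2)  # cost and its exact gradient

    print("Running sanity checks...")
    gradcheck_naive(quad, np.array(123.456))      # scalar test
    gradcheck_naive(quad, np.random.randn(3,))    # 1-D test
    gradcheck_naive(quad, np.random.randn(4, 5))  # 2-D test

if __name__ == "__main__":
    sanity_check()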
Now, in q2_neural.py, implement the forward and backward propagation for a neural network with one sigmoid hidden layer, and verify your implementation by running python q2_neural.py.
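For reference, using the same variable names as the code below (gradients are taken with respect to the weights, as the docstring notes), the quantities computed are:

h = \sigma(X W_1 + b_1), \qquad \hat{Y} = softmax(h W_2 + b_2), \qquad J = -\sum_m \log \hat{Y}_{m, y_m}

\delta_3 = \hat{Y} - labels, \qquad \nabla_{W_2} J = h^\top \delta_3, \qquad \nabla_{b_2} J = \textstyle\sum_{rows} \delta_3

\delta_h = (\delta_3 W_2^\top) \circ h \circ (1 - h), \qquad \nabla_{W_1} J = X^\top \delta_h, \qquad \nabla_{b_1} J = \textstyle\sum_{rows} \delta_h

where y_m is the true class of example m.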

#!/usr/bin/env python

import numpy as np
import random

from q1_softmax import softmax
from q2_sigmoid import sigmoid, sigmoid_grad
from q2_gradcheck import gradcheck_naive

def forward_backward_prop(X, labels, params, dimensions):
    """
    Forward and backward propagation for a two-layer sigmoidal network

    Compute the forward propagation and for the cross entropy cost,
    the backward propagation for the gradients for all parameters.

    Notice the gradients computed here are different from the gradients in
    the assignment sheet: they are w.r.t. weights, not inputs.

    Arguments:
    X -- M x Dx matrix, where each row is a training example x.
    labels -- M x Dy matrix, where each row is a one-hot vector.
    params -- Model parameters, these are unpacked for you.
    dimensions -- A tuple of input dimension, number of hidden units
                  and output dimension
    """
    ### Unpack network parameters (do not modify)
    ofs = 0
    Dx, H, Dy = (dimensions[0], dimensions[1], dimensions[2])

    W1 = np.reshape(params[ofs:ofs + Dx * H], (Dx, H))
    ofs += Dx * H
    b1 = np.reshape(params[ofs:ofs + H], (1, H))
    ofs += H
    W2 = np.reshape(params[ofs:ofs + H * Dy], (H, Dy))
    ofs += H * Dy
    b2 = np.reshape(params[ofs:ofs + Dy], (1, Dy))

    # Note: compute cost based on `sum` not `mean`.
    ### YOUR CODE HERE: forward propagation
    h = sigmoid(np.dot(X, W1) + b1)     # hidden activations, M x H
    yhat = softmax(np.dot(h, W2) + b2)  # output probabilities, M x Dy
    ### END YOUR CODE

    ### YOUR CODE HERE: backward propagation
    # Cross-entropy cost, summed over the batch (see the note above).
    cost = np.sum(-np.log(yhat[labels == 1]))

    d3 = yhat - labels                  # gradient w.r.t. the output scores, M x Dy
    gradW2 = np.dot(h.T, d3)            # H x Dy
    gradb2 = np.sum(d3, axis=0, keepdims=True)
    dh = np.dot(d3, W2.T)               # gradient w.r.t. h, M x H
    grad_h = sigmoid_grad(h) * dh       # gradient w.r.t. the hidden pre-activation, M x H
    gradW1 = np.dot(X.T, grad_h)        # Dx x H
    gradb1 = np.sum(grad_h, axis=0)
    ### END YOUR CODE

    ### Stack gradients (do not modify)
    grad = np.concatenate((gradW1.flatten(), gradb1.flatten(),
                           gradW2.flatten(), gradb2.flatten()))

    return cost, grad
def sanity_check():
    """
    Set up fake data and parameters for the neural network, and test using
    gradcheck.
    """
    print("Running sanity check...")

    N = 20
    dimensions = [10, 5, 10]
    data = np.random.randn(N, dimensions[0])  # each row will be a datum
    labels = np.zeros((N, dimensions[2]))
    for i in range(N):
        labels[i, random.randint(0, dimensions[2] - 1)] = 1

    params = np.random.randn((dimensions[0] + 1) * dimensions[1] + (
        dimensions[1] + 1) * dimensions[2], )

    gradcheck_naive(lambda params:
        forward_backward_prop(data, labels, params, dimensions), params)

def your_sanity_checks():
    """
    Use this space to add any additional sanity checks by running:
        python q2_neural.py
    This function will not be called by the autograder, nor will
    your additional tests be graded.
    """
    print("Running your sanity checks...")
    ### YOUR CODE HERE
    pass
    ### END YOUR CODE

if __name__ == "__main__":
    sanity_check()
    your_sanity_checks()
