
Stanford CS224n: Assignment 1


Summary:

1. The softmax function: applied at the output layer, it turns the raw prediction scores into a probability distribution over the outputs.

2. Adding the same constant to every element of the softmax input does not change the result. This property is used to avoid numerical overflow in the exponentials: during the computation we subtract the row maximum from each element, i.e. we evaluate softmax(X - \max(X)), where X is a matrix and \max(X) denotes the maximum of each row.

3. The sigmoid function: an activation function that turns a linear transformation into a non-linear one.

4. The loss function (cross-entropy).

5. Backpropagating the gradient from the loss all the way down to the input x.

1. Softmax

1.1 Prove that softmax is invariant to adding a constant c to every element of the input vector x, i.e. softmax(x) = softmax(x + c) for any input x and constant c. In practice this property is exploited as follows:

take c = -\max(x, axis=1), i.e. the negative of each row's maximum, and subtract the row maximum from every element of that row. This prevents e^{x_i} from overflowing and is numerically far more stable than evaluating the exponentials on the raw scores.

Proof that softmax(x) = softmax(x + c):
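One way to write the proof: the common factor $e^{c}$ cancels between the numerator and the denominator.

$$\mathrm{softmax}(x + c)_i = \frac{e^{x_i + c}}{\sum_j e^{x_j + c}} = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}} = \mathrm{softmax}(x)_i$$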

1.2 Implement the softmax function in Python so that it handles both vectors and matrices.

Key points learned:

 1.2.1 x_max = np.max(x, axis=1, keepdims=True)  # keepdims=True keeps the 2-D matrix structure after the reduction, so the result broadcasts against x
 1.2.2 x = np.exp(x - x_max) / np.sum(np.exp(x - x_max), axis=1, keepdims=True)  # axis=1: each row is one sample, so each sample's probabilities are normalized independently
 1.2.3 np.allclose(test1, ans1, rtol=1e-05, atol=1e-06)  # allclose checks whether two arrays are element-wise equal within a tolerance (rtol defaults to 1e-05)

The full implementation of q1_softmax.py:
 import numpy as np


 def softmax(x):
     """Compute the softmax function for each row of the input x.
     (Each row is one sample and each column a class score; normalization
     is done per row.)

     It is crucial that this function is optimized for speed because
     it will be used frequently in later code. You might find numpy
     functions np.exp, np.sum, np.reshape, np.max, and numpy
     broadcasting useful for this task.

     Numpy broadcasting documentation:
     http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

     You should also make sure that your code works for a single
     D-dimensional vector (treat the vector as a single row) and
     for N x D matrices. This may be useful for testing later. Also,
     make sure that the dimensions of the output match the input.

     You must implement the optimization in problem 1(a) of the
     written assignment!

     Arguments:
     x -- A D dimensional vector or N x D dimensional numpy matrix.

     Return:
     x -- You are allowed to modify x in-place
     """
     orig_shape = x.shape

     if len(x.shape) > 1:
         # Matrix
         ### YOUR CODE HERE
         # Row-wise maximum; keepdims=True keeps the 2-D shape so the
         # subtraction in the next line broadcasts correctly.
         x_max = np.max(x, axis=1, keepdims=True)
         # axis=1: each row is one sample; normalize each sample's probabilities.
         x = np.exp(x - x_max) / np.sum(np.exp(x - x_max), axis=1, keepdims=True)
         # raise NotImplementedError
         ### END YOUR CODE
     else:
         # Vector: a single sample.
         ### YOUR CODE HERE
         x_max = np.max(x)
         x = np.exp(x - x_max) / np.sum(np.exp(x - x_max))
         # raise NotImplementedError
         ### END YOUR CODE

     assert x.shape == orig_shape
     return x


 # from q1_softmax import softmax
 # return 20 if softmax(test input) == expected output else 0


 def test_softmax_basic():
     """
     Some simple tests to get you started.
     Warning: these are not exhaustive.
     """
     print("Running basic tests...")
     test1 = softmax(np.array([1, 2]))
     print(test1)
     ans1 = np.array([0.26894142, 0.73105858])
     # allclose checks element-wise equality within a tolerance (rtol defaults to 1e-05).
     assert np.allclose(test1, ans1, rtol=1e-05, atol=1e-06)

     test2 = softmax(np.array([[1001, 1002], [3, 4]]))
     print(test2)
     ans2 = np.array([
         [0.26894142, 0.73105858],
         [0.26894142, 0.73105858]])
     assert np.allclose(test2, ans2, rtol=1e-05, atol=1e-06)

     test3 = softmax(np.array([[-1001, -1002]]))
     print(test3)
     ans3 = np.array([0.73105858, 0.26894142])
     assert np.allclose(test3, ans3, rtol=1e-05, atol=1e-06)

     print("You should be able to verify these results by hand!\n")


 def test_softmax():
     """
     Use this space to test your softmax implementation by running:
         python q1_softmax.py
     This function will not be called by the autograder, nor will
     your tests be graded.
     """
     print("Running your tests...")
     ### YOUR CODE HERE
     # raise NotImplementedError
     ### END YOUR CODE


 if __name__ == "__main__":
     test_softmax_basic()
     test_softmax()
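As a side note on point 1.2.1, here is a minimal sketch (not part of the assignment file; the array a is only an illustration) of why keepdims=True matters for the row-wise normalization:

 a = np.random.randn(3, 4)                # N = 3 samples, D = 4 class scores each
 np.max(a, axis=1).shape                  # (3,)  -> subtracting this from a raises a broadcast error
 np.max(a, axis=1, keepdims=True).shape   # (3, 1) -> broadcasts against (3, 4), one max per row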

2. Neural network basics

2.1 Derive the derivative of the sigmoid function.
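One way to write the derivation, expressing the result in terms of $\sigma(x)$ itself:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\big(1 - \sigma(x)\big)$$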

2.2 Let θ be the output of the final fully connected layer; it is fed into a softmax layer and the cross-entropy loss is computed on the result. The goal is to derive the gradient of the loss with respect to θ.

Softmax derivative (derivation figures from the original post are not preserved):
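The result, assuming $\hat{y} = \mathrm{softmax}(\theta)$ and $J = CE(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$ with one-hot $y$, can be sketched as:

$$\frac{\partial \hat{y}_i}{\partial \theta_j} = \hat{y}_i(\delta_{ij} - \hat{y}_j) \quad\Longrightarrow\quad \frac{\partial J}{\partial \theta_j} = -\sum_i \frac{y_i}{\hat{y}_i}\,\hat{y}_i(\delta_{ij} - \hat{y}_j) = \hat{y}_j - y_j, \qquad \text{i.e.}\quad \frac{\partial J}{\partial \theta} = \hat{y} - y$$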

2.3 Compute the gradient of the loss with respect to the input x of a single-hidden-layer neural network, i.e. find ∂J/∂x. The network uses a sigmoid activation in the hidden layer and a softmax at the output, y is a one-hot label vector, and cross-entropy is used as the loss. (Here σ'(x) denotes the derivative of the sigmoid; it is enough to name the intermediate variables used in the derivation.)

The derivation here simply chains together the derivatives computed in the previous parts and finally multiplies by the derivative of the affine transformation of x, as written out below.
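A sketch of that chain, using intermediate names chosen to match the code in q2_neural.py below ($z_1 = xW_1 + b_1$, $h = \sigma(z_1)$, $\theta = hW_2 + b_2$, $\hat{y} = \mathrm{softmax}(\theta)$):

$$\delta_3 = \frac{\partial J}{\partial \theta} = \hat{y} - y, \qquad \delta_2 = \frac{\partial J}{\partial z_1} = (\delta_3 W_2^{\top}) \circ \sigma'(z_1) = (\delta_3 W_2^{\top}) \circ h \circ (1 - h), \qquad \frac{\partial J}{\partial x} = \delta_2 W_1^{\top}$$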

2.4 How many parameters does this neural network have? For convenience, assume the input is Dx-dimensional, the output is Dy-dimensional, and the hidden layer has H units.

(Dx + 1) * H + (H + 1) * Dy
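For example, with the sanity-check dimensions used later in q2_neural.py ($D_x = 10$, $H = 5$, $D_y = 10$):

$$(10 + 1) \times 5 + (5 + 1) \times 10 = 55 + 60 = 115 \ \text{parameters,}$$

which matches the length of the params vector built in sanity_check().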

In q2_sigmoid.py, implement the sigmoid activation function and compute its gradient, then verify the behaviour by running python q2_sigmoid.py. The provided test cases are not exhaustive, so review your code carefully to make sure it is correct.

 #!/usr/bin/env python

 import numpy as np


 def sigmoid(x):
     """
     Compute the sigmoid function for the input here.

     Arguments:
     x -- A scalar or numpy array.

     Return:
     s -- sigmoid(x)
     """

     ### YOUR CODE HERE
     s = 1 / (1 + np.exp(-x))
     ### END YOUR CODE

     return s


 def sigmoid_grad(s):
     """
     Compute the gradient for the sigmoid function here. Note that
     for this implementation, the input s should be the sigmoid
     function value of your original input x.

     Arguments:
     s -- A scalar or numpy array.

     Return:
     ds -- Your computed gradient.
     """

     ### YOUR CODE HERE
     # sigma'(x) = sigma(x) * (1 - sigma(x)), expressed in terms of s = sigma(x).
     ds = s * (1 - s)
     ### END YOUR CODE

     return ds
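A minimal usage sketch (not part of the assignment file): note that sigmoid_grad expects the already-computed sigmoid value s, not the raw input x.

 x = np.array([-1.0, 0.0, 1.0])
 s = sigmoid(x)           # approx. [0.26894142, 0.5, 0.73105858]
 ds = sigmoid_grad(s)     # approx. [0.19661193, 0.25, 0.19661193]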

To make debugging easier, implement a gradient checker in q2_gradcheck.py and verify your own code by running python q2_gradcheck.py.

The gradient checker is just the definition of the derivative, evaluated numerically.
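Concretely, the checker compares the analytic gradient against the centered-difference approximation, with h = 1e-4 fixed in the code:

$$\frac{\partial f}{\partial x_{ix}} \approx \frac{f(x + h\,e_{ix}) - f(x - h\,e_{ix})}{2h}$$

where $e_{ix}$ is the unit vector selecting the component being perturbed.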

 # First implement a gradient checker by filling in the following functions
 import random

 import numpy as np


 def gradcheck_naive(f, x):
     """ Gradient check for a function f.

     Arguments:
     f -- a function that takes a single argument and outputs the
          cost and its gradients
     x -- the point (numpy array) to check the gradient at
     """

     rndstate = random.getstate()
     random.setstate(rndstate)
     # Evaluate function value at the original point.
     # (In the quadratic sanity check, fx is the sum of squares and grad is 2 * x.)
     fx, grad = f(x)
     h = 1e-4        # Do not change this!

     # Iterate over all indexes ix in x to check the gradient.
     it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
     while not it.finished:
         ix = it.multi_index

         # Try modifying x[ix] with h defined above to compute numerical
         # gradients (numgrad).

         # Use the centered difference of the gradient.
         # It has smaller asymptotic error than forward / backward difference
         # methods. If you are curious, check out here:
         # https://math.stackexchange.com/questions/2326181/when-to-use-forward-or-central-difference-approximations

         # Make sure you call random.setstate(rndstate)
         # before calling f(x) each time. This will make it possible
         # to test cost functions with built in randomness later.

         ### YOUR CODE HERE:
         x[ix] += h
         random.setstate(rndstate)
         new_f1 = f(x)[0]
         x[ix] -= 2 * h
         random.setstate(rndstate)
         new_f2 = f(x)[0]
         x[ix] += h      # restore x[ix] to its original value
         numgrad = (new_f1 - new_f2) / (2 * h)
         ### END YOUR CODE

         # Compare gradients
         reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
         if reldiff > 1e-5:
             print("Gradient check failed.")
             print("First gradient error found at index %s" % str(ix))
             print("Your gradient: %f \t Numerical gradient: %f" % (
                 grad[ix], numgrad))
             return

         it.iternext()  # Step to next dimension

     print("Gradient check passed!")

Now, in q2_neural.py, implement the forward and backward propagation for a neural network with a single hidden layer and a sigmoid activation, and verify your implementation by running python q2_neural.py.

 #!/usr/bin/env python

 import numpy as np
 import random

 from q1_softmax import softmax
 from q2_sigmoid import sigmoid, sigmoid_grad
 from q2_gradcheck import gradcheck_naive


 def forward_backward_prop(X, labels, params, dimensions):
     """
     Forward and backward propagation for a two-layer sigmoidal network

     Compute the forward propagation and for the cross entropy cost,
     the backward propagation for the gradients for all parameters.

     Notice the gradients computed here are different from the gradients in
     the assignment sheet: they are w.r.t. weights, not inputs.

     Arguments:
     X -- M x Dx matrix, where each row is a training example x.
     labels -- M x Dy matrix, where each row is a one-hot vector.
     params -- Model parameters, these are unpacked for you.
     dimensions -- A tuple of input dimension, number of hidden units
                   and output dimension
     """

     ### Unpack network parameters (do not modify)
     ofs = 0
     Dx, H, Dy = (dimensions[0], dimensions[1], dimensions[2])

     W1 = np.reshape(params[ofs:ofs + Dx * H], (Dx, H))
     ofs += Dx * H
     b1 = np.reshape(params[ofs:ofs + H], (1, H))
     ofs += H
     W2 = np.reshape(params[ofs:ofs + H * Dy], (H, Dy))
     ofs += H * Dy
     b2 = np.reshape(params[ofs:ofs + Dy], (1, Dy))

     # Note: compute cost based on `sum` not `mean`.
     ### YOUR CODE HERE: forward propagation
     h = sigmoid(np.dot(X, W1) + b1)        # hidden layer activations, M x H
     yhat = softmax(np.dot(h, W2) + b2)     # output probabilities, M x Dy
     ### END YOUR CODE

     ### YOUR CODE HERE: backward propagation
     # Cross-entropy cost; this version divides by M, and the same 1/M factor
     # appears in d3, so cost and gradients stay consistent for the gradient check.
     cost = np.sum(-np.log(yhat[labels == 1])) / X.shape[0]
     d3 = (yhat - labels) / X.shape[0]      # dJ/dtheta for the softmax + cross-entropy layer
     gradW2 = np.dot(h.T, d3)
     gradb2 = np.sum(d3, 0, keepdims=True)
     dh = np.dot(d3, W2.T)
     grad_h = sigmoid_grad(h) * dh          # backprop through the sigmoid
     gradW1 = np.dot(X.T, grad_h)
     gradb1 = np.sum(grad_h, 0)
     ### END YOUR CODE

     ### Stack gradients (do not modify)
     grad = np.concatenate((gradW1.flatten(), gradb1.flatten(),
                            gradW2.flatten(), gradb2.flatten()))

     return cost, grad


 def sanity_check():
     """
     Set up fake data and parameters for the neural network, and test using
     gradcheck.
     """
     print("Running sanity check...")

     N = 20
     dimensions = [10, 5, 10]
     data = np.random.randn(N, dimensions[0])   # each row will be a datum
     labels = np.zeros((N, dimensions[2]))
     for i in range(N):
         labels[i, random.randint(0, dimensions[2] - 1)] = 1

     params = np.random.randn((dimensions[0] + 1) * dimensions[1] + (
         dimensions[1] + 1) * dimensions[2], )

     gradcheck_naive(lambda params:
         forward_backward_prop(data, labels, params, dimensions), params)


 def your_sanity_checks():
     """
     Use this space to add any additional sanity checks by running:
         python q2_neural.py
     This function will not be called by the autograder, nor will
     your additional tests be graded.
     """
     print("Running your sanity checks...")
     ### YOUR CODE HERE
     # raise NotImplementedError
     ### END YOUR CODE


 if __name__ == "__main__":
     sanity_check()
     your_sanity_checks()
