Stanford CS224n - Assignment 1
Summary:
1. The softmax formula: applied at the output layer, it turns raw prediction scores into a probability distribution over the outputs.
2. Adding a constant to every input of softmax does not change the result. This property is used to avoid numerical overflow in the exponentials: during the computation, subtract from each element the maximum of its row, i.e. compute X - \max(X), where X is a matrix and \max(X) is the row-wise maximum.
3. The sigmoid function: an activation function that makes a linear transformation non-linear.
4. The loss function (cross-entropy).
5. Differentiating the loss all the way back to the input x.
Part 1: Softmax

1.1 Prove that softmax is invariant to adding a constant c to every component of an input vector x. In practice this is used with c = -\max_i x_i, taken row by row (np.max(x, axis=1) in the code): subtracting each row's maximum before exponentiating avoids overflow in e^{x_i} and is numerically more stable than evaluating the exponentials directly.
Prove: softmax(x) = softmax(x + c)
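For reference, writing softmax(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}, the proof is one line:

softmax(x + c)_i = \frac{e^{x_i + c}}{\sum_j e^{x_j + c}} = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}} = softmax(x)_i

so any constant shift cancels, and choosing c = -\max_i x_i makes every exponent non-positive, which prevents overflow.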

1.2 Implement the softmax function in Python so that it handles both vectors and matrices.
Things learned:
1.2.1 x_max = np.max(x, axis=1, keepdims=True): keepdims keeps the reduced result two-dimensional, so it still broadcasts against the original matrix (see the short demo after this list).
1.2.2 x = np.exp(x - x_max) / np.sum(np.exp(x - x_max), axis=1, keepdims=True): with axis=1, each row is one example, and each example's probabilities are normalized to sum to 1.
1.2.3 np.allclose(test1, ans1, rtol=1e-05, atol=1e-06): allclose checks whether two arrays are element-wise equal within a tolerance (the relative tolerance defaults to 1e-05).
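A tiny illustration of the keepdims point above (a throwaway snippet, not part of the assignment files):

import numpy as np

x = np.array([[1., 2., 3.],
              [4., 5., 6.]])
print(np.max(x, axis=1).shape)                 # (2,)   -- subtracting this from x raises a broadcasting error
print(np.max(x, axis=1, keepdims=True).shape)  # (2, 1) -- broadcasts cleanly against the (2, 3) matrix
print(x - np.max(x, axis=1, keepdims=True))    # each row shifted by its own maximum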
q1_softmax.py:

import numpy as np

def softmax(x):
    """Compute the softmax function for each row of the input x.

    Each row is one example and each column is a class score;
    normalization is done per row.

    It is crucial that this function is optimized for speed because
    it will be used frequently in later code. You might find numpy
    functions np.exp, np.sum, np.reshape, np.max, and numpy
    broadcasting useful for this task.

    Numpy broadcasting documentation:
    http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

    You should also make sure that your code works for a single
    D-dimensional vector (treat the vector as a single row) and
    for N x D matrices. This may be useful for testing later. Also,
    make sure that the dimensions of the output match the input.

    You must implement the optimization in problem 1(a) of the
    written assignment!

    Arguments:
    x -- A D dimensional vector or N x D dimensional numpy matrix.

    Return:
    x -- You are allowed to modify x in-place
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix: one example per row.
        ### YOUR CODE HERE
        # Row-wise maximum; keepdims keeps it 2-D so the subtraction
        # below broadcasts without errors.
        x_max = np.max(x, axis=1, keepdims=True)
        # axis=1: normalize the probabilities of each row (each example).
        x = np.exp(x - x_max) / np.sum(np.exp(x - x_max), axis=1, keepdims=True)
        ### END YOUR CODE
    else:
        # Vector: a single example.
        ### YOUR CODE HERE
        x_max = np.max(x)
        x = np.exp(x - x_max) / np.sum(np.exp(x - x_max))
        ### END YOUR CODE

    assert x.shape == orig_shape
    return x
# Autograder for q1: from q1_softmax import softmax
# return 20 if softmax(test input) == expected output else 0
def test_softmax_basic():
    """
    Some simple tests to get you started.
    Warning: these are not exhaustive.
    """
    print("Running basic tests...")
    test1 = softmax(np.array([1, 2]))
    print(test1)
    ans1 = np.array([0.26894142, 0.73105858])
    # np.allclose checks element-wise equality within a tolerance
    # (rtol defaults to 1e-05).
    assert np.allclose(test1, ans1, rtol=1e-05, atol=1e-06)

    test2 = softmax(np.array([[1001, 1002], [3, 4]]))
    print(test2)
    ans2 = np.array([
        [0.26894142, 0.73105858],
        [0.26894142, 0.73105858]])
    assert np.allclose(test2, ans2, rtol=1e-05, atol=1e-06)

    test3 = softmax(np.array([[-1001, -1002]]))
    print(test3)
    ans3 = np.array([0.73105858, 0.26894142])
    assert np.allclose(test3, ans3, rtol=1e-05, atol=1e-06)

    print("You should be able to verify these results by hand!\n")
def test_softmax():
    """
    Use this space to test your softmax implementation by running:
        python q1_softmax.py
    This function will not be called by the autograder, nor will
    your tests be graded.
    """
    print("Running your tests...")
    ### YOUR CODE HERE
    pass
    ### END YOUR CODE

if __name__ == "__main__":
    test_softmax_basic()
    test_softmax()
Part 2: Neural network basics
2.1 Derive the derivative of the sigmoid function.
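In short, with \sigma(x) = \frac{1}{1 + e^{-x}}:

\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,(1 - \sigma(x))

which is why sigmoid_grad in q2_sigmoid.py below only needs the sigmoid value s rather than the original input x.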


2.2 θ is the output of a fully connected layer and is passed to a softmax layer; the cross-entropy loss is then computed against the one-hot label y. The goal is to derive the gradient of the loss with respect to θ, i.e. ∂J/∂θ.

Softmax derivative: with \hat{y} = softmax(\theta),

\frac{\partial \hat{y}_i}{\partial \theta_j} = \hat{y}_i(\delta_{ij} - \hat{y}_j) = \begin{cases} \hat{y}_i(1 - \hat{y}_i) & i = j \\ -\hat{y}_i \hat{y}_j & i \neq j \end{cases}

and combining this with J = -\sum_i y_i \log \hat{y}_i (y one-hot) gives

\frac{\partial J}{\partial \theta} = \hat{y} - y.


2.3 Compute the gradient of the loss with respect to the input x (i.e. ∂J/∂x) for a one-hidden-layer neural network whose hidden layer uses a sigmoid activation and whose output layer uses softmax, where y is a one-hot label vector and cross-entropy is the loss. (Here σ'(x) denotes the derivative of the sigmoid; it is enough to name the intermediate variables used in the derivation.)

The derivation is just the chain rule: multiply together the derivatives obtained in the previous parts, and finally the derivative of the first layer with respect to x; a sketch follows.
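A sketch of that chain rule, naming the intermediate variables z_1 = x W_1 + b_1, h = \sigma(z_1), z_2 = h W_2 + b_2, \hat{y} = softmax(z_2) (the names are chosen here just for the derivation):

\delta_2 = \frac{\partial J}{\partial z_2} = \hat{y} - y

\delta_1 = \frac{\partial J}{\partial z_1} = (\delta_2 W_2^\top) \circ \sigma'(z_1) = (\delta_2 W_2^\top) \circ h \circ (1 - h)

\frac{\partial J}{\partial x} = \delta_1 W_1^\top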

2.4 How many parameters does this neural network have? For convenience, assume the input is Dx-dimensional, the output is Dy-dimensional, and the hidden layer has H units.

(Dx + 1) * H + (H + 1) * Dy, counting the bias term of each layer.
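For example, with the sizes used in sanity_check below (Dx = 10, H = 5, Dy = 10), that is (10 + 1) * 5 + (5 + 1) * 10 = 55 + 60 = 115 parameters, which is exactly the length of the params vector constructed there.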
In q2_sigmoid.py, fill in the code blocks that implement the sigmoid activation function and its gradient, then run python q2_sigmoid.py to test your implementation. The provided test cases are not exhaustive, so review your code carefully to convince yourself it is correct.

#!/usr/bin/env python

import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid function for the input here.

    Arguments:
    x -- A scalar or numpy array.

    Return:
    s -- sigmoid(x)
    """
    ### YOUR CODE HERE
    s = 1 / (1 + np.exp(-x))
    ### END YOUR CODE
    return s

def sigmoid_grad(s):
    """
    Compute the gradient for the sigmoid function here. Note that
    for this implementation, the input s should be the sigmoid
    function value of your original input x.

    Arguments:
    s -- A scalar or numpy array.

    Return:
    ds -- Your computed gradient.
    """
    ### YOUR CODE HERE
    ds = s * (1 - s)
    ### END YOUR CODE
    return ds
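A quick, hand-checkable sanity test for the two functions above (the test name and values here are just an illustration, not the graded tests):

def test_sigmoid_basic():
    """Check sigmoid and sigmoid_grad on a few values you can verify by hand."""
    x = np.array([[1, 2], [-1, -2]])
    f = sigmoid(x)
    g = sigmoid_grad(f)
    assert np.allclose(f, np.array([[0.73105858, 0.88079708],
                                    [0.26894142, 0.11920292]]))
    assert np.allclose(g, np.array([[0.19661193, 0.10499359],
                                    [0.19661193, 0.10499359]]))
    print("sigmoid tests passed")

if __name__ == "__main__":
    test_sigmoid_basic()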
To make debugging easier, implement a gradient checker in q2_gradcheck.py, and verify your code by running python q2_gradcheck.py.

The gradient checker is nothing more than the definition of the derivative: each partial derivative is approximated numerically by the centered difference (f(x + h) - f(x - h)) / (2h) and compared against the analytic gradient returned by f.
# First implement a gradient checker by filling in the following functions
import numpy as np
import random

def gradcheck_naive(f, x):
    """ Gradient check for a function f.

    Arguments:
    f -- a function that takes a single argument and outputs the
         cost and its gradients
    x -- the point (numpy array) to check the gradient at
    """
    rndstate = random.getstate()
    random.setstate(rndstate)
    # Evaluate function value at the original point; in the quadratic
    # sanity check below, fx is the sum-of-squares cost and grad is 2*x.
    fx, grad = f(x)
    h = 1e-4  # Do not change this!

    # Iterate over all indexes ix in x to check the gradient.
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index

        # Try modifying x[ix] with h defined above to compute numerical
        # gradients (numgrad).
        # Use the centered difference of the gradient.
        # It has smaller asymptotic error than forward / backward difference
        # methods. If you are curious, check out here:
        # https://math.stackexchange.com/questions/2326181/when-to-use-forward-or-central-difference-approximations
        # Make sure you call random.setstate(rndstate)
        # before calling f(x) each time. This will make it possible
        # to test cost functions with built in randomness later.
        ### YOUR CODE HERE:
        x[ix] += h
        random.setstate(rndstate)
        new_f1 = f(x)[0]
        x[ix] -= 2 * h
        random.setstate(rndstate)
        new_f2 = f(x)[0]
        x[ix] += h  # restore the original value
        numgrad = (new_f1 - new_f2) / (2 * h)
        ### END YOUR CODE

        # Compare gradients
        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
        if reldiff > 1e-5:
            print("Gradient check failed.")
            print("First gradient error found at index %s" % str(ix))
            print("Your gradient: %f \t Numerical gradient: %f" % (
                grad[ix], numgrad))
            return

        it.iternext()  # Step to next dimension

    print("Gradient check passed!")
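A minimal way to exercise the checker, following the sum-of-squares example mentioned in the comment above (f(x) = \sum x^2, whose exact gradient is 2x); the shapes used here are just for illustration:

def sanity_check():
    """Gradient-check a simple quadratic cost on a few array shapes."""
    quad = lambda x: (np.sum(x ** 2), x * 2)  # cost and its exact gradient

    print("Running sanity checks...")
    gradcheck_naive(quad, np.array(123.456))      # scalar test
    gradcheck_naive(quad, np.random.randn(3,))    # 1-D test
    gradcheck_naive(quad, np.random.randn(4, 5))  # 2-D test

if __name__ == "__main__":
    sanity_check()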
Now, in q2_neural.py, implement the forward and backward propagation for a neural network with one sigmoid hidden layer, and verify your implementation by running python q2_neural.py.
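For reference, using the same variable names as the code below (gradients are taken with respect to the weights, as the docstring notes), the quantities computed are:

h = \sigma(X W_1 + b_1), \qquad \hat{Y} = softmax(h W_2 + b_2), \qquad J = -\sum_m \log \hat{Y}_{m, y_m}

\delta_3 = \hat{Y} - labels, \qquad \nabla_{W_2} J = h^\top \delta_3, \qquad \nabla_{b_2} J = \textstyle\sum_{rows} \delta_3

\delta_h = (\delta_3 W_2^\top) \circ h \circ (1 - h), \qquad \nabla_{W_1} J = X^\top \delta_h, \qquad \nabla_{b_1} J = \textstyle\sum_{rows} \delta_h

where y_m is the true class of example m.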

#!/usr/bin/env python

import numpy as np
import random

from q1_softmax import softmax
from q2_sigmoid import sigmoid, sigmoid_grad
from q2_gradcheck import gradcheck_naive

def forward_backward_prop(X, labels, params, dimensions):
    """
    Forward and backward propagation for a two-layer sigmoidal network

    Compute the forward propagation and for the cross entropy cost,
    the backward propagation for the gradients for all parameters.

    Notice the gradients computed here are different from the gradients in
    the assignment sheet: they are w.r.t. weights, not inputs.

    Arguments:
    X -- M x Dx matrix, where each row is a training example x.
    labels -- M x Dy matrix, where each row is a one-hot vector.
    params -- Model parameters, these are unpacked for you.
    dimensions -- A tuple of input dimension, number of hidden units
                  and output dimension
    """
    ### Unpack network parameters (do not modify)
    ofs = 0
    Dx, H, Dy = (dimensions[0], dimensions[1], dimensions[2])

    W1 = np.reshape(params[ofs:ofs + Dx * H], (Dx, H))
    ofs += Dx * H
    b1 = np.reshape(params[ofs:ofs + H], (1, H))
    ofs += H
    W2 = np.reshape(params[ofs:ofs + H * Dy], (H, Dy))
    ofs += H * Dy
    b2 = np.reshape(params[ofs:ofs + Dy], (1, Dy))

    # Note: compute cost based on `sum` not `mean`.
    ### YOUR CODE HERE: forward propagation
    h = sigmoid(np.dot(X, W1) + b1)     # hidden activations, M x H
    yhat = softmax(np.dot(h, W2) + b2)  # output probabilities, M x Dy
    ### END YOUR CODE

    ### YOUR CODE HERE: backward propagation
    # Cross-entropy cost, summed over the batch (see the note above).
    cost = np.sum(-np.log(yhat[labels == 1]))

    d3 = yhat - labels                  # gradient w.r.t. the output scores, M x Dy
    gradW2 = np.dot(h.T, d3)            # H x Dy
    gradb2 = np.sum(d3, axis=0, keepdims=True)
    dh = np.dot(d3, W2.T)               # gradient w.r.t. h, M x H
    grad_h = sigmoid_grad(h) * dh       # gradient w.r.t. the hidden pre-activation, M x H
    gradW1 = np.dot(X.T, grad_h)        # Dx x H
    gradb1 = np.sum(grad_h, axis=0)
    ### END YOUR CODE

    ### Stack gradients (do not modify)
    grad = np.concatenate((gradW1.flatten(), gradb1.flatten(),
                           gradW2.flatten(), gradb2.flatten()))

    return cost, grad
def sanity_check():
    """
    Set up fake data and parameters for the neural network, and test using
    gradcheck.
    """
    print("Running sanity check...")

    N = 20
    dimensions = [10, 5, 10]
    data = np.random.randn(N, dimensions[0])  # each row will be a datum
    labels = np.zeros((N, dimensions[2]))
    for i in range(N):
        labels[i, random.randint(0, dimensions[2] - 1)] = 1

    params = np.random.randn((dimensions[0] + 1) * dimensions[1] + (
        dimensions[1] + 1) * dimensions[2], )

    gradcheck_naive(lambda params:
        forward_backward_prop(data, labels, params, dimensions), params)

def your_sanity_checks():
    """
    Use this space to add any additional sanity checks by running:
        python q2_neural.py
    This function will not be called by the autograder, nor will
    your additional tests be graded.
    """
    print("Running your sanity checks...")
    ### YOUR CODE HERE
    pass
    ### END YOUR CODE

if __name__ == "__main__":
    sanity_check()
    your_sanity_checks()
