[Study Notes] Neural Networks and Deep Learning


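Preface

This article records my personal understanding of the tutorial "Neural Networks and Deep Learning" that Michael Nielsen provides at http://neuralnetworksanddeeplearning.com. Writing it down lets me revisit the material later and deepen my understanding.
At the start of the course, Nielsen explains why he wrote the book and offers some advice to learners. My brief takeaways:
1. "Neural Networks and Deep Learning" explains the structure of biologically inspired neural networks, their mathematical formulation, and the deep learning algorithms developed from such structures.
2. When studying the book, aim for depth rather than breadth: build up the fundamentals (the mathematical principles) and the basic techniques (Python's numerical tools), with real mastery as the main goal.
Following Nielsen's advice, I plan to read the book closely and then look for hands-on exercises based on deep learning frameworks such as PyTorch and TensorFlow. Along the way I will also practice various classic machine learning algorithms for a change of perspective. After all, I can already tell that Nielsen's book will be quite a brain-burner for me ;-P. As someone who came to this field mid-career, I find it a bit hard, so if you have study suggestions, feel free to share them~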

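Using Neural Networks to Recognize Handwritten Digits

Recognizing handwritten digits with a conventional, hand-coded program is difficult. Machine learning offers a solution: train on handwriting (font) data to build a model, which then performs the recognition.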

Perceptrons

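The structure of a perceptron is shown in the figure below.

The threshold is essentially the bias term (in Nielsen's notation, $b = -\text{threshold}$). It acts as a baseline against which the weighted sum of the input features is judged, so it plays a key role in deciding the output. Spelling this out helped me understand how the bias term works.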
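
A minimal sketch of the perceptron rule, assuming the convention from Nielsen's book that the bias equals the negative threshold. The function name `perceptron` and the NAND weights are illustrative choices and do not come from these notes.

```python
import numpy as np

def perceptron(x, w, b):
    """Output 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative weights: with w = (-2, -2) and b = 3 the perceptron behaves like a NAND gate.
w = np.array([-2.0, -2.0])
b = 3.0
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
nand_outputs = [perceptron(np.array(x), w, b) for x in inputs]
print(nand_outputs)  # [1, 1, 1, 0]
```

Raising the bias makes it easier for the perceptron to output 1, and lowering it makes that harder, which is the "baseline" role described above.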

Sigmoid Perceptrons

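If we want small adjustments to the weights to produce controlled, small changes in the output, the perceptron described above cannot provide that. Because its output jumps abruptly at the threshold set by the bias, a small adjustment can cause large swings in the network's behavior. For example, while training the network to recognize a 9 (say, tuning it away from answering 8 and toward answering 9), the state of a neuron can flip abruptly between 0 and 1, which may noticeably disturb the recognition of other digits. A smoother way of adjusting is therefore needed, and the sigmoid provides it.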


    import numpy as np
    import matplotlib.pyplot as plt
    
    # Sigmoid data
    x = np.arange(-10, 10, 0.1)
    y_s = 1 / (1 + np.exp(-x))
    # Step function data: 0 for negative inputs, 1 otherwise
    y_b = []
    for i in x:
        if i < 0:
            y_b.append(0)
        else:
            y_b.append(1)
    # Plot the results
    plt.figure()
    plt.plot(x, y_s, c="r", label="Sigmoid", linewidth=2, ls='-')
    plt.plot(x, y_b, c="g", label="Step Function", linewidth=2, ls='--')
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("Sigmoid and Bias perceptron")
    plt.grid()
    plt.legend()
    plt.show()
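
To put a number on the difference between the two curves plotted above, the following sketch nudges a single weight by a small amount and compares how much each activation changes. The input, weight, and bias values are made up for illustration.

```python
import numpy as np

def step(z):
    """Perceptron activation: jumps from 0 to 1 at z = 0."""
    return np.where(z < 0, 0.0, 1.0)

def sigmoid(z):
    """Smooth activation: small changes in z give small changes in the output."""
    return 1.0 / (1.0 + np.exp(-z))

x, b = 1.0, 0.0                    # hypothetical input and bias
w_before, w_after = -0.01, 0.01    # a tiny weight adjustment that crosses the threshold

step_change = float(step(w_after * x + b) - step(w_before * x + b))
sigmoid_change = float(sigmoid(w_after * x + b) - sigmoid(w_before * x + b))
print(step_change)      # the step output flips from 0 to 1
print(sigmoid_change)   # the sigmoid output moves by roughly 0.005
```

The same weight change that flips the step neuron completely moves the sigmoid neuron only slightly, which is what makes gradual learning possible.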

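Neural Network Architecture

The architecture of the neural network is shown in the figure. It consists of an input layer, hidden layers, and an output layer. The input and output layers are relatively simple, while the hidden layers can vary enormously.

To illustrate recognizing a handwritten 9, take an image made up of $64 \times 64 = 4096$ grayscale pixels. The input layer is then a $4096 \times 1$ vector, and the output is a single value in $[0,1]$. When the output is at least $0.5$, the result is judged to be a "9"; the closer the value is to $1$, the more confident that judgment.
This example concerns only the simplest neural network architecture.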
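
A sketch of the forward pass for such a network, assuming a single hidden layer of 15 neurons (an arbitrary size chosen for illustration) and random, untrained weights. A random vector stands in for the image, so the decision it prints carries no meaning; only the shapes and the decision rule matter here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.random((4096, 1))                  # stand-in for a flattened 64x64 grayscale image
w1 = rng.standard_normal((15, 4096))       # hypothetical hidden-layer weights
b1 = rng.standard_normal((15, 1))
w2 = rng.standard_normal((1, 15))          # hypothetical output-layer weights
b2 = rng.standard_normal((1, 1))

hidden = sigmoid(w1 @ x + b1)              # shape (15, 1)
output = sigmoid(w2 @ hidden + b2)         # shape (1, 1), a value between 0 and 1

is_nine = bool(output[0, 0] >= 0.5)        # decision rule from the text
print(output.shape, is_nine)
```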

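A Neural Network for Handwritten Digit Recognition

The image has already been segmented and is a 28×28 grayscale image. The network produces a 10×1 vector whose entries correspond to the ten possible results, 0 through 9.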
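
The 10×1 output is usually compared against a one-hot target vector. A small sketch: the helper name `vectorized_result` mirrors the one in Nielsen's `mnist_loader.py`, and the sample output values are invented for illustration.

```python
import numpy as np

def vectorized_result(j):
    """Return a 10x1 vector with 1.0 at index j and 0.0 elsewhere."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

target = vectorized_result(9)    # desired output for the digit 9

# A made-up network output; the predicted digit is the index of the largest entry.
output = np.array([[0.02], [0.01], [0.05], [0.03], [0.10],
                   [0.04], [0.01], [0.08], [0.20], [0.91]])
predicted_digit = int(np.argmax(output))
print(target.ravel(), predicted_digit)
```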

Gradient Descent

The cost function is as follows:
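
Assuming these notes use the quadratic (mean squared error) cost from chapter 1 of Nielsen's book, the cost sums the squared difference between $y(x)$ and $a$ over all $n$ training inputs $x$:

```latex
C(\omega, b) = \frac{1}{2n} \sum_{x} \left\| y(x) - a \right\|^{2}
```

The cost is never negative, and it approaches zero only when $y(x)$ is close to $a$ for every training input.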

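Find parameters $\omega$ and $b$ such that the cost function $C(\omega, b)$ approaches zero; here we assume $y(x) = \omega^\top x + b$.
1. A physical interpretation of gradient descent:
Suppose we consider only a single variable $\omega$ from the parameter set, together with its corresponding feature vector $x$. Picture the cost function $C(\omega, b)$ as a surface in three-dimensional space: as $\omega$ is adjusted step by step, the point moves from a higher position on the surface to a lower one, and eventually settles near zero.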

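The ball's position, represented by $\omega$, is updated iteratively until it approaches the minimum (ideally the global minimum); at each step the direction of motion is determined by the gradient of the function at that position.
The minimization can be described by the following expression:

$$\omega = \omega - \eta \cdot \nabla_{\omega} C(\omega, b)$$

Definition:

As can be seen, updating the parameter $\omega$ (the variable $v$ in the figure) in the direction opposite to the gradient makes the cost function keep decreasing, provided the learning rate $\eta$ is small enough, until the parameters converge to a stable state. This mathematical procedure mimics the concrete physical picture of a ball rolling downhill.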
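
The "ball rolling downhill" picture can be tried on a toy problem. This sketch assumes a simple bowl-shaped cost $C(v) = v_1^2 + v_2^2$ in place of the network's cost; the starting point and the learning rate are arbitrary.

```python
import numpy as np

def cost(v):
    """A simple bowl-shaped cost with its minimum at v = (0, 0)."""
    return v[0] ** 2 + v[1] ** 2

def grad(v):
    """Gradient of the cost above."""
    return 2 * v

eta = 0.1                        # learning rate
v = np.array([3.0, -4.0])        # starting position of the "ball"
costs = [cost(v)]
for _ in range(50):
    v = v - eta * grad(v)        # step against the gradient
    costs.append(cost(v))

print(costs[0], costs[-1])       # the cost starts at 25.0 and ends close to 0
```

Every step lowers the cost here because the learning rate is small relative to the curvature of the bowl; a much larger `eta` would overshoot the minimum.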

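Stochastic Gradient Descent

The iteration of the parameters $(\omega, b)$ proceeds as follows:

where,

When the data set is large (that is, $n$ is very large), computing the gradient over all samples becomes extremely expensive. This is what motivates stochastic gradient descent: each update uses only a randomly chosen mini-batch of $m$ samples, and with a sensibly chosen $m$ the algorithm becomes far more efficient.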
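
Written out under the assumption that these notes follow Nielsen's formulation (which is consistent with the `eta / len(mini_batch)` factor in the code further below), the update for a mini-batch $X_1, \dots, X_m$ would be:

```latex
\omega \rightarrow \omega' = \omega - \frac{\eta}{m} \sum_{j=1}^{m} \nabla_{\omega} C_{X_j},
\qquad
b \rightarrow b' = b - \frac{\eta}{m} \sum_{j=1}^{m} \nabla_{b} C_{X_j}
```

Here $C_{X_j}$ is the cost of the single training example $X_j$, and the mini-batch average serves as an estimate of the gradient averaged over all $n$ examples.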

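3. My own intuitive picture of this:
The cost function is a function of $(\omega, b)$, while the training inputs and outputs $(x, a)$ act as its coefficients. The larger and more complete the number of coefficients $n$, the more accurately the cost function describes the system's error, but the harder it is to compute. So we look for a set of moderate size $m$ to describe the system's error, balancing the accuracy of the error estimate against computational efficiency.
4. An everyday analogy:
When a government runs an opinion poll, surveying the entire population before every policy adjustment would be very accurate but costly in time and effort; using a sample survey for each adjustment can be both fast and accurate.
The extreme form of stochastic gradient descent uses only a single sample per iteration, in which case the gradient estimate becomes very noisy and the system may fail to converge.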
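
The polling analogy can be checked numerically. The sketch below uses a made-up one-parameter least-squares problem (not the MNIST network) and compares the gradient computed from all $n$ samples with estimates from a mini-batch of 100 samples and from a single sample. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
x = rng.standard_normal(n)
a = 2.0 * x + 0.1 * rng.standard_normal(n)   # synthetic targets scattered around a = 2x

def grad_w(w, xs, targets):
    """Gradient of the quadratic cost (1/2m) * sum((w*x - a)^2) with respect to w."""
    return np.mean((w * xs - targets) * xs)

w = 0.0
full = grad_w(w, x, a)                        # the "full census": uses all n samples

def mean_batch_error(m, trials=200):
    """Average absolute error of size-m mini-batch gradient estimates over several draws."""
    errors = []
    for _ in range(trials):
        idx = rng.choice(n, size=m, replace=False)
        errors.append(abs(grad_w(w, x[idx], a[idx]) - full))
    return float(np.mean(errors))

err_single = mean_batch_error(1)              # extreme case: one sample per estimate
err_batch = mean_batch_error(100)             # a moderate "sample survey"
print(full, err_single, err_batch)
```

In this toy setting the 100-sample estimate lands much closer to the full gradient than the single-sample estimate, while touching only 1% of the data per update.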

Implementing the Above in Code

The main body of the code is as follows:


    import numpy as np
    import matplotlib.pyplot as plt
    import random
    import mnist_loader
    
    #### Miscellaneous functions
    def sigmoid(z):
        """The sigmoid function."""
        return 1.0/(1.0+np.exp(-z))

    def sigmoid_prime(z):
        """Derivative of the sigmoid function."""
        return sigmoid(z)*(1-sigmoid(z))

    # Define a neural network class
    class Network(object):
        def __init__(self, sizes):
            self.num_layers = len(sizes)
            self.size = sizes
            self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
            self.weights = [np.random.randn(y, x)
                            for (x, y) in zip(sizes[:-1], sizes[1:])]

        def feedforward(self, a):
            '''
            Return the output of the network for input a
            '''
            for b, w in zip(self.biases, self.weights):
                a = sigmoid(np.dot(w, a) + b)
            return a

        def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
            if test_data:
                n_test = len(test_data)
            n = len(training_data)
            for j in range(epochs):
                # Shuffle training_data before each epoch
                random.shuffle(training_data)
                # Slice training_data into mini-batches of size mini_batch_size
                # for the gradient descent updates
                mini_batches = []
                for k in range(0, n, mini_batch_size):
                    mini_batches.append(training_data[k:k + mini_batch_size])
                # Update the parameters
                for mini_batch in mini_batches:
                    self.update_mini_batch(mini_batch, eta)
                # Report the result
                if test_data:
                    print('Epoch{0}: {1} / {2}'.format(j, self.evaluate(test_data), n_test))
                else:
                    print('Epoch{0} complete'.format(j))

        def update_mini_batch(self, mini_batch, eta):
            # Accumulators for the gradients summed over the mini-batch
            nabla_b = []
            nabla_w = []
            for b in self.biases:
                nabla_b.append(np.zeros(b.shape))
            for w in self.weights:
                nabla_w.append(np.zeros(w.shape))
            for x, y in mini_batch:
                delta_nabla_b, delta_nabla_w = self.backprop(x, y)
                for i in range(len(nabla_b)):
                    nabla_b[i] = nabla_b[i] + delta_nabla_b[i]
                for i in range(len(nabla_w)):
                    nabla_w[i] = nabla_w[i] + delta_nabla_w[i]
            # Step against the averaged gradient
            for i in range(len(self.biases)):
                self.biases[i] = self.biases[i] - (eta / len(mini_batch)) * nabla_b[i]
            for i in range(len(self.weights)):
                self.weights[i] = self.weights[i] - (eta / len(mini_batch)) * nabla_w[i]

        def backprop(self, x, y):
            """Return a tuple ``(nabla_b, nabla_w)`` representing the
            gradient for the cost function C_x.  ``nabla_b`` and
            ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
            to ``self.biases`` and ``self.weights``."""
            nabla_b = [np.zeros(b.shape) for b in self.biases]
            nabla_w = [np.zeros(w.shape) for w in self.weights]
            # feedforward
            activation = x
            activations = [x] # list to store all the activations, layer by layer
            zs = [] # list to store all the z vectors, layer by layer
            for b, w in zip(self.biases, self.weights):
                z = np.dot(w, activation)+b
                zs.append(z)
                activation = sigmoid(z)
                activations.append(activation)
            # backward pass
            delta = self.cost_derivative(activations[-1], y) * \
                sigmoid_prime(zs[-1])
            nabla_b[-1] = delta
            nabla_w[-1] = np.dot(delta, activations[-2].transpose())
            # Note that the variable l in the loop below is used a little
            # differently to the notation in Chapter 2 of the book.  Here,
            # l = 1 means the last layer of neurons, l = 2 is the
            # second-last layer, and so on.  It's a renumbering of the
            # scheme in the book, used here to take advantage of the fact
            # that Python can use negative indices in lists.
            for l in range(2, self.num_layers):
                z = zs[-l]
                sp = sigmoid_prime(z)
                delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
                nabla_b[-l] = delta
                nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
            return (nabla_b, nabla_w)

        def evaluate(self, test_data):
            """Return the number of test inputs for which the neural
            network outputs the correct result. Note that the neural
            network's output is assumed to be the index of whichever
            neuron in the final layer has the highest activation."""
            test_results = [(np.argmax(self.feedforward(x)), y)
                            for (x, y) in test_data]
            return sum(int(x == y) for (x, y) in test_results)

        def cost_derivative(self, output_activations, y):
            r"""Return the vector of partial derivatives \partial C_x /
            \partial a for the output activations."""
            return (output_activations - y)

*Note: while learning Python I found the zip() function hard to follow, so I tried replacing it with a plain for loop and recorded the test results here.


    import numpy as np

    a = [np.zeros(5),np.zeros(7)]
    b = [np.array([1,2,3,4,5]),np.array([1,1,1,1,1,1,1])]
    # Using zip
    c = [na+nb for na,nb in zip(a,b)]
    print(c)
    # Using a for loop instead, to compare the output
    for i in range(len(a)):
        a[i] = a[i] + b[i]
    print(a)

The output is as follows; the two results agree.


    [array([1., 2., 3., 4., 5.]), array([1., 1., 1., 1., 1., 1., 1.])]
    [array([1., 2., 3., 4., 5.]), array([1., 1., 1., 1., 1., 1., 1.])]

Running the program:


    import mnist_loader
    import network
    
    # Load the data
    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
    # Convert the zip-format data into lists
    training_data = list(training_data)
    validation_data = list(validation_data)
    test_data = list(test_data)

    # Initialize the neural network
    net = network.Network([784, 30, 10])

    # Run gradient descent to update the parameters and evaluate the model
    net.SGD(training_data, 20, 1000, 3.0, test_data=test_data)

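Note: for the details of how the data is loaded and processed, see the mnist_loader.py module provided by Nielsen; it is outside the focus of these notes.

Results

Rough draft notes, kept so that I can retrace my thinking after enough time has passed to forget it: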
