
Study Notes: Grokking Deep Learning


Table of Contents

  • Chapter 3: Introduction to neural prediction: forward propagation

    • A simple neural network
    • Making a prediction with multiple inputs
    • Single input, multiple outputs
    • Multiple inputs, multiple outputs
    • Predicting on predictions
  • Chapter 4: Gradient descent

    • 1. Hot and cold learning
    • 2. Adjusting the weight based on the error
    • 3. Breaking gradient descent (when the input is too large)
  • Chapter 5: Generalized gradient descent: learning multiple weights at a time

    • 1. Multiple inputs, single output: gradient descent learning multiple weights at a time
    • 2. Freezing one weight
    • 3. Gradient descent learning with multiple outputs
  • Chapter 6: Building your first deep neural network: backpropagation

    • 6.6 Learning the whole dataset
    • 6.7 Stochastic, full, and batch gradient descent
    • 6.22 Your first deep neural network: backpropagation in code
  • Chapter 8: Learning signal and ignoring noise: regularization and batching

    • 1. Training a neural network from scratch
    • 2. Regularization
    • 3. Batch gradient descent
  • Chapter 9: Modeling probability and nonlinearity: activation functions

    • 1. What is an activation function?
    • 2. Standard hidden-layer activation functions
    • 3. Upgrading the MNIST network
  • Chapter 10: Neural learning about edges and corners: intro to convolutional neural networks

    • 1. A simple implementation in NumPy

Chapter 3: Introduction to neural prediction: forward propagation

A simple neural network

    # Network code
    weight = 0.1

    def neural_network(input, weight):
        prediction = input * weight
        return prediction

    # Making a prediction with the trained network
    number_of_toes = [8.5, 9.5, 10, 9]

    input = number_of_toes[0]

    pred = neural_network(input, weight)
    print(pred)

Making a prediction with multiple inputs

Multiple inputs, single output

    import numpy as np

    weights = np.array([0.1, 0.2, 0])

    def neural_network(input, weights):
        pred = input.dot(weights)
        return pred

    toes = np.array([8.5, 9.5, 9.9, 10])
    wlrec = np.array([0.65, 0.8, 0.8, 0.9])
    nfans = np.array([1.2, 1.3, 0.5, 1.0])
    input = np.array([toes[0], wlrec[0], nfans[0]])

    pred = neural_network(input, weights)
    print(pred)

Single input, multiple outputs

    # Single input, multiple outputs
    # Network
    def ele_mul(number, vector):
        output = [0, 0, 0]
        assert(len(output) == len(vector))
        for i in range(len(output)):
            output[i] = number * vector[i]
        return output

    def neural_network(input, weights):
        pred = ele_mul(input, weights)
        return pred

    # Data
    wlrec = [0.65, 0.8, 0.8, 0.9]
    weights = [0.3, 0.2, 0.9]
    input = wlrec[0]
    pred = neural_network(input, weights)
    print(pred)

Multiple inputs, multiple outputs

    def w_sum(a, b):
        assert(len(a) == len(b))
        output = 0
        for i in range(len(a)):
            output += (a[i] * b[i])
        return output

    def vect_mat_mul(vect, matrix):
        assert(len(vect) == len(matrix))
        output = [0, 0, 0]
        for i in range(len(vect)):
            output[i] = w_sum(vect, matrix[i])
        return output

    def neural_network(input, weights):
        pred = vect_mat_mul(input, weights)
        return pred

    weights = [[0.1, 0.1, -0.3],
               [0.1, 0.2, 0.0],
               [0.0, 1.3, 0.1]]

    toes = [8.5, 9.5, 9.9, 10]
    wlrec = [0.65, 0.8, 0.8, 0.9]
    nfans = [1.2, 1.3, 0.5, 1.0]

    input = [toes[0], wlrec[0], nfans[0]]

    pred = neural_network(input, weights)
    print(pred)

Predicting on predictions

Neural networks can be stacked.

    import numpy as np

    ih_wgh = np.array([[0.1, 0.1, -0.1],
                       [-0.1, 0.1, 0.9],
                       [0.1, 0.4, 0.1]])

    hp_wgh = np.array([[0.3, 1.1, -0.3],
                       [0.1, 0.2, 0.0],
                       [0.0, 1.3, 0.1]])

    weights = [ih_wgh, hp_wgh]

    def neural_network(input, weights):
        hid = input.dot(weights[0])
        pred = hid.dot(weights[1])
        return pred

    toes = np.array([8.5, 9.5, 9.9, 10])
    wlrec = np.array([0.65, 0.8, 0.8, 0.9])
    nfans = np.array([1.2, 1.3, 0.5, 1.0])

    input = np.array([toes[0], wlrec[0], nfans[0]])

    pred = neural_network(input, weights)
    print(pred)

Chapter 4: Gradient descent

1. Hot and cold learning

Pros: simple and intuitive.
Cons: 1. it is inefficient; 2. sometimes it is impossible to hit the target prediction exactly.

    # Hot and cold learning

    weight = 0.5
    input = 0.5
    goal_prediction = 0.8
    # Step size (the learning rate?)
    step_amount = 0.001

    for _ in range(1101):
        prediction = input * weight
        error = (prediction - goal_prediction) ** 2

        print("Error:" + str(error) + " Prediction:" + str(prediction))

        up_prediction = input * (weight + step_amount)
        up_error = (up_prediction - goal_prediction) ** 2

        down_prediction = input * (weight - step_amount)
        down_error = (down_prediction - goal_prediction) ** 2

        if(down_error < error):
            weight -= step_amount
        if(up_error < error):
            weight += step_amount
Output (abridged):
    Error:4.000000000130569e-06 Prediction:0.7979999999999674
    Error:2.2500000000980924e-06 Prediction:0.7984999999999673
    Error:1.000000000065505e-06 Prediction:0.7989999999999673
    Error:2.5000000003280753e-07 Prediction:0.7994999999999672
    Error:1.0799505792475652e-27 Prediction:0.7999999999999672

2. Adjusting the weight based on the error

    weight = 0.5
    goal_pred = 0.8
    input = 0.5

    for _ in range(20):
        pred = input * weight
        error = (pred - goal_pred) ** 2
        # Signed: positive when the prediction is too high, negative when too low
        direction_and_amount = (pred - goal_pred) * input
        weight -= direction_and_amount
        print("Error:" + str(error) + " Prediction:" + str(pred))
Output (abridged):
    Error:0.000303525861459885 Prediction:0.7825780063867569
    Error:0.00017073329707118678 Prediction:0.7869335047900676
    Error:9.603747960254256e-05 Prediction:0.7902001285925507
    Error:5.402108227642978e-05 Prediction:0.7926500964444131
    Error:3.038685878049206e-05 Prediction:0.7944875723333098
    Error:1.7092608064027242e-05 Prediction:0.7958656792499823
    Error:9.614592036015323e-06 Prediction:0.7968992594374867
    Error:5.408208020258491e-06 Prediction:0.7976744445781151

3. Breaking gradient descent (when the input is too large)

Change input to 2:

    weight = 0.5
    goal_pred = 0.8
    input = 2

    for _ in range(20):
        pred = input * weight
        error = (pred - goal_pred) ** 2
        # Signed: positive when the prediction is too high, negative when too low
        direction_and_amount = (pred - goal_pred) * input
        weight -= direction_and_amount
        print("Error:" + str(error) + " Prediction:" + str(pred))
Output (abridged):
    Error:101674633133.15994 Prediction:-318863.79999999993
    Error:915071698198.4395 Prediction:956594.5999999997
    Error:8235645283785.954 Prediction:-2869780.599999999
    Error:74120807554073.56 Prediction:8609344.999999996
    Error:667087267986662.1 Prediction:-25828031.799999986
    Error:6003785411879960.0 Prediction:77484098.59999996
    Error:5.403406870691965e+16 Prediction:-232452292.5999999

The results are meaningless:
every weight update overcorrects.

Introducing alpha, the learning rate

alpha too large: each weight update takes a big step, easily jumping over the minimum, so the lowest point may never be reached
alpha too small: each update changes the weights only slightly, so a huge number of iterations is needed and training is inefficient

There is no hard rule for choosing alpha; in practice you try orders of magnitude: 10, 1, 0.1, 0.01, ...

    # Introducing alpha
    weight = 0.5
    goal_pred = 0.8
    input = 2
    alpha = 0.1

    for _ in range(20):
        pred = input * weight
        error = (pred - goal_pred) ** 2
        direction_and_amount = (pred - goal_pred) * input
        weight -= direction_and_amount * alpha
        print("Error:" + str(error) + " Prediction:" + str(pred))
Output (abridged):
    Error:1.1284439629823931e-05 Prediction:0.803359232
    Error:4.062398266736526e-06 Prediction:0.8020155392
    Error:1.4624633760252567e-06 Prediction:0.8012093235200001
    Error:5.264868153690924e-07 Prediction:0.8007255941120001
    Error:1.8953525353291194e-07 Prediction:0.8004353564672001
    Error:6.82326912718715e-08 Prediction:0.8002612138803201
    Error:2.456376885786678e-08 Prediction:0.8001567283281921
    Error:8.842956788836216e-09 Prediction:0.8000940369969153
    Error:3.1834644439835434e-09 Prediction:0.8000564221981492
    Error:1.1460471998340758e-09 Prediction:0.8000338533188895
    Error:4.125769919393652e-10 Prediction:0.8000203119913337
    Error:1.485277170987127e-10 Prediction:0.8000121871948003

Chapter 5: Generalized gradient descent: learning multiple weights at a time

1. Multiple inputs, single output: gradient descent learning multiple weights at a time

delta

delta = pred - true

Unlike delta, the error is always positive; it emphasizes big errors and downplays small ones:

error = (pred - true) ** 2

delta measures how much you want the current node's value to change in order to predict perfectly.

weight_delta

weights_deltas = ele_mul(delta, input)

weight_delta is a derivative-based estimate of the direction and amount each weight should move, with the goal of reducing delta.

    # Multiple inputs, single output: gradient descent

    def neural_network(input, weights):
        out = 0
        for i in range(len(input)):
            out += (input[i] * weights[i])
        return out

    def ele_mul(scalar, vector):
        out = [0, 0, 0]
        for i in range(len(out)):
            out[i] = vector[i] * scalar
        return out

    toes = [8.5, 9.5, 9.9, 10]
    wlrec = [0.65, 0.8, 0.8, 0.9]
    nfans = [1.2, 1.3, 0.5, 1.0]

    win_or_loss_binary = [1, 1, 0, 1]
    true = win_or_loss_binary[0]

    alpha = 0.01
    weights = [0.1, 0.2, -0.1]
    input = [toes[0], wlrec[0], nfans[0]]

    for iter in range(3):
        pred = neural_network(input, weights)

        error = (pred - true) ** 2
        delta = pred - true

        weights_deltas = ele_mul(delta, input)
        print("Error:" + str(error) + "     Prediction:" + str(pred) + "    Delta:" + str(delta)
                + "   Weight_delta:" + str(weights_deltas))
        for i in range(len(weights)):
            weights[i] -= weights_deltas[i] * alpha
Output:
    Error:0.01959999999999997     Prediction:0.8600000000000001    Delta:-0.1399999999999999   Weight_delta:[-1.189999999999999, -0.09099999999999994, -0.16799999999999987]
    Error:0.0013135188062500048     Prediction:0.9637574999999999    Delta:-0.036242500000000066   Weight_delta:[-0.30806125000000056, -0.023557625000000044, -0.04349100000000008]
    Error:8.802712522307997e-05     Prediction:0.9906177228125002    Delta:-0.009382277187499843   Weight_delta:[-0.07974935609374867, -0.006098480171874899, -0.011258732624999811]

2. Freezing one weight

Freeze the weight for toes[0] by setting its corresponding weights_delta to 0.

    # Freezing one weight

    def neural_network(input, weights):
        out = 0
        for i in range(len(input)):
            out += (input[i] * weights[i])
        return out

    def ele_mul(scalar, vector):
        out = [0, 0, 0]
        for i in range(len(out)):
            out[i] = vector[i] * scalar
        return out

    toes = [8.5, 9.5, 9.9, 10]
    wlrec = [0.65, 0.8, 0.8, 0.9]
    nfans = [1.2, 1.3, 0.5, 1.0]

    win_or_loss_binary = [1, 1, 0, 1]
    true = win_or_loss_binary[0]

    # Changed learning rate
    alpha = 0.3
    weights = [0.1, 0.2, -0.1]
    input = [toes[0], wlrec[0], nfans[0]]

    for iter in range(3):
        pred = neural_network(input, weights)

        error = (pred - true) ** 2
        delta = pred - true

        weights_deltas = ele_mul(delta, input)
        # Freeze the first weight
        weights_deltas[0] = 0
        print("Error:" + str(error) + "     Prediction:" + str(pred) + "    Delta:" + str(delta)
                + "   Weight_delta:" + str(weights_deltas))
        for i in range(len(weights)):
            weights[i] -= weights_deltas[i] * alpha
Output:
    Error:0.01959999999999997     Prediction:0.8600000000000001    Delta:-0.1399999999999999   Weight_delta:[0, -0.09099999999999994, -0.16799999999999987]
    Error:0.003816150624999989     Prediction:0.9382250000000001    Delta:-0.06177499999999991   Weight_delta:[0, -0.040153749999999946, -0.07412999999999989]
    Error:0.000743010489422852     Prediction:0.97274178125    Delta:-0.027258218750000007   Weight_delta:[0, -0.017717842187500006, -0.032709862500000006]

Because the weights share the same error, when one weight reaches the bottom of the gradient bowl (where the derivative is 0), all the weights reach the bottom.

A potentially negative property of neural networks: weight a (for toes[0]) may correspond to important input data and have a decisive influence on the prediction, but if the network happens to find a configuration on the training set that predicts accurately (error = 0) without needing a, then weight a stops contributing to the prediction entirely.

The error is determined by the training data. A network's weights can take any values, but given any particular set of weights, the error is completely determined by the data (the tiny sketch below makes this concrete).
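
A minimal check of this point (my own sketch, not the book's code), using the single-weight network from chapter 4: with the training data fixed, sweeping the weight traces out the error curve, so the error is a function of the data alone once the weight is given:

    import numpy as np

    input, goal = 0.5, 0.8  # fixed training data
    for weight in np.arange(0.0, 3.01, 0.5):
        # With the data fixed, the error is determined entirely by the weight choice
        error = (input * weight - goal) ** 2
        print("weight=" + str(weight) + "  error=" + str(round(error, 4)))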

3. Gradient descent learning with multiple outputs

Single input, multiple outputs

    # Single input, multiple outputs: gradient descent

    def ele_mul(scalar, vector):
        out = [0, 0, 0]
        for i in range(len(out)):
            out[i] = vector[i] * scalar
        return out

    def scalar_ele_mul(number, vector):
        output = [0, 0, 0]
        assert(len(output) == len(vector))
        for i in range(len(vector)):
            output[i] = number * vector[i]
        return output

    def neural_network(input, weights):
        pred = ele_mul(input, weights)
        return pred

    weights = [0.3, 0.2, 0.9]

    wlrec = [0.65, 0.8, 0.8, 0.9]

    hurt = [0.1, 0.0, 0.0, 0.9]
    win = [1, 1, 0, 1]
    sad = [0.1, 0.0, 0.1, 0.2]
    alpha = 0.1
    input = wlrec[0]
    true = [hurt[0], win[0], sad[0]]
    for _ in range(20):
        pred = neural_network(input, weights)
        error = [0, 0, 0]
        delta = [0, 0, 0]
        for i in range(len(true)):
            error[i] = (pred[i] - true[i]) ** 2
            delta[i] = pred[i] - true[i]

        weight_deltas = ele_mul(input, delta)
        for i in range(len(weights)):
            weights[i] -= (weight_deltas[i] * alpha)

        print("Error:" + str(error) + " Prediction:" + str(pred))
Output (abridged):
    Error:[0.003202573209336997, 0.268590322675587, 0.08347094550319063] Prediction:[0.15659128209660034, 0.481742995536397, 0.3889133875458018]
    Error:[0.0029376725664875124, 0.24637389092237108, 0.07656665146282832] Prediction:[0.15420030042801897, 0.5036393539749842, 0.3767067969219916]
    Error:[0.002694683163755034, 0.22599508993309528, 0.07023344567249623] Prediction:[0.15191033773493517, 0.5246105912695411, 0.36501593475203753]
    Error:[0.002471792614282734, 0.2073019201939724, 0.064424090603286] Prediction:[0.14971712596563416, 0.544695793788403, 0.3538190115087639]
    Error:[0.0022673384426793778, 0.19015495482149797, 0.059095255975540883] Prediction:[0.14761657739358613, 0.563932396500843, 0.34309515827251863]
    Error:[0.0020797956851018044, 0.17442629961812245, 0.054207195570977515] Prediction:[0.1456047769987071, 0.5823562527486824, 0.3328243878355047]
    Error:[0.0019077655149958386, 0.15999863914685317, 0.049723450777273845] Prediction:[0.14367797517051173, 0.6000017010700506, 0.3229875574494547]
    Error:[0.0017499648096583036, 0.14676436170973617, 0.04561057865394731] Prediction:[0.14183258071955762, 0.6169016291998409, 0.3135663331472152]

Multiple inputs, multiple outputs

    # Multiple inputs, multiple outputs: gradient descent
    import numpy as np

    def neural_network(input, weights):
        pred = [0, 0, 0]
        for i in range(len(input)):
            pred[i] = input.dot(weights[i])
        return pred

    def outer_prod(a, b):
        out = np.zeros((len(a), len(b)))
        for i in range(len(a)):
            for j in range(len(b)):
                out[i][j] = a[i] * b[j]
        return out

    toes = np.array([8.5, 9.5, 9.9, 9.0])
    wlrec = np.array([0.65, 0.8, 0.8, 0.9])
    nfans = np.array([1.2, 1.3, 0.5, 1.0])

    hurt = np.array([0.1, 0.0, 0.0, 0.9])
    win = np.array([1, 1, 0, 1])
    sad = np.array([0.1, 0.0, 0.1, 0.2])

    weights = np.array([[0.1, 0.1, -0.3],
                        [0.1, 0.2, 0.0],
                        [0.0, 1.3, 0.1]])
    alpha = 0.01
    input = np.array([toes[0], wlrec[0], nfans[0]])
    true = np.array([hurt[0], win[0], sad[0]])

    for _ in range(2000):
        pred = neural_network(input, weights)
        error = [0, 0, 0]
        delta = [0, 0, 0]
        for i in range(len(true)):
            error[i] = (pred[i] - true[i]) ** 2
            delta[i] = pred[i] - true[i]

        weight_deltas = outer_prod(input, delta)
        for i in range(len(weights)):
            for j in range(len(weights[i])):
                weights[i][j] -= (weight_deltas[i][j] * alpha)
        print("Error:" + str(error))

The result is disappointing: the error falls too slowly and seems unable to reach 0. I tried changing alpha, but the behavior showed no clear pattern, so I'll leave it here and treat this as a first exposure.

Output (abridged):
    Error:[0.011262153166908923, 0.0039575961607914495, 0.617454285079451]
    Error:[0.011262153166920988, 0.003957596160791897, 0.6174542850794454]
    Error:[0.011262153166914955, 0.0039575961607914495, 0.6174542850794567]
    Error:[0.011262153166914955, 0.003957596160791673, 0.617454285079451]
    Error:[0.011262153166920988, 0.003957596160791897, 0.617454285079451]
    Error:[0.011262153166914955, 0.003957596160791673, 0.617454285079451]
    Error:[0.011262153166914955, 0.003957596160791673, 0.6174542850794454]
    Error:[0.011262153166908923, 0.003957596160791897, 0.6174542850794567]
    Error:[0.011262153166914955, 0.003957596160792344, 0.6174542850794454]
    Error:[0.011262153166920988, 0.003957596160792344, 0.617454285079451]
    Error:[0.011262153166914955, 0.003957596160792121, 0.6174542850794567]
    Error:[0.011262153166920988, 0.003957596160792344, 0.617454285079451]
    Error:[0.011262153166914955, 0.003957596160792121, 0.6174542850794567]
    Error:[0.011262153166908923, 0.003957596160792121, 0.617454285079451]
    Error:[0.011262153166914955, 0.0039575961607925675, 0.6174542850794399]
    Error:[0.011262153166914955, 0.003957596160792791, 0.6174542850794454]
    Error:[0.011262153166920988, 0.003957596160792791, 0.617454285079451]
    Error:[0.011262153166914955, 0.0039575961607925675, 0.617454285079451]
    Error:[0.011262153166914955, 0.0039575961607925675, 0.617454285079451]
    Error:[0.011262153166920988, 0.003957596160792791, 0.6174542850794454]
    Error:[0.011262153166914955, 0.0039575961607925675, 0.6174542850794567]

Chapter 6: Building your first deep neural network: backpropagation

  • The streetlight problem
  • Matrices and the matrix relationship
  • Full, batch, and stochastic gradient descent
  • Neural networks learn correlation
  • Overfitting
  • Creating your own correlation
  • Backpropagation: long-distance error attribution
  • Linear vs. nonlinear
  • Your first deep network
  • Backpropagation in code: bringing it all together

6.6 Learning the whole dataset

    import numpy as np

    weights = np.array([0.5, 0.48, -0.7])
    alpha = 0.1

    streetlights = np.array([[1, 0, 1],
                             [0, 1, 1],
                             [0, 0, 1],
                             [1, 1, 1],
                             [0, 1, 1],
                             [1, 0, 1]])
    walk_vs_stop = np.array([0, 1, 0, 1, 1, 0])

    input = streetlights[0]
    goal_prediction = walk_vs_stop[0]

    for _ in range(40):
        error_for_all_lights = 0
        for row_index in range(len(walk_vs_stop)):
            input = streetlights[row_index]
            goal_prediction = walk_vs_stop[row_index]

            prediction = input.dot(weights)
            error = (prediction - goal_prediction) ** 2
            error_for_all_lights += error

            delta = prediction - goal_prediction
            weights = weights - delta * input * alpha
            print("Prediction:   " + str(prediction))
        print("Error:   " + str(error_for_all_lights) + '\n')
Output (abridged):

    Prediction:   -0.002388697618122871
    Prediction:   0.9977021355600483
    Prediction:   -0.01793930655497516
    Prediction:   1.0162137740080082
    Prediction:   0.9967128843019345
    Prediction:   -0.0028012842268006904
    Error:   0.0006143435674831474

    Prediction:   -0.0022410273814405524
    Prediction:   0.9978745386023716
    Prediction:   -0.016721264429884947
    Prediction:   1.0151127459893812
    Prediction:   0.9969492081270097
    Prediction:   -0.0026256193329783125
    Error:   0.00053373677328488

Note on notation: 2e-6 means 2 * (10^-6).

6.7 Stochastic, full, and batch gradient descent

  • Stochastic gradient descent: compute weight deltas and update the weights once per training example
  • (Full) gradient descent: compute the average weight delta over the entire dataset, then update once
  • Batch gradient descent: compute the average weight delta over each batch (typically 8 to 256 examples), then update (see the sketch below)
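
To make the three schedules concrete, here is a minimal sketch (my own, not the book's code) for the one-weight network from chapter 4; inputs and goals are made-up toy data:

    import numpy as np

    inputs = np.array([0.5, 1.0, 1.5, 2.0])  # made-up toy data
    goals = np.array([0.4, 0.8, 1.2, 1.6])
    alpha, batch_size = 0.1, 2

    # Stochastic gradient descent: one weight update per example
    w = 0.5
    for x, y in zip(inputs, goals):
        w -= alpha * (x * w - y) * x

    # (Full) gradient descent: average the weight deltas over the dataset, update once
    w = 0.5
    w -= alpha * np.mean((inputs * w - goals) * inputs)

    # Batch gradient descent: average the weight deltas over each batch, update per batch
    w = 0.5
    for i in range(0, len(inputs), batch_size):
        x, y = inputs[i:i+batch_size], goals[i:i+batch_size]
        w -= alpha * np.mean((x * w - y) * x)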

6.22 Your first deep neural network: backpropagation in code

    # Backpropagation
    import numpy as np

    # Seed the RNG so every run produces the same random numbers
    np.random.seed(1)

    # Returns x when x > 0, otherwise 0
    def relu(x):
        return (x > 0) * x

    # Returns 1 when input > 0, otherwise 0
    def relu2deriv(input):
        return input > 0

    alpha = 0.2
    hidden_size = 4

    streetlights = np.array([[1, 0, 1],
                             [0, 1, 1],
                             [0, 0, 1],
                             [1, 1, 1]])

    walk_vs_stop = np.array([1, 1, 0, 0])

    weights_0_1 = 2 * np.random.random((3, hidden_size)) - 1
    weights_1_2 = 2 * np.random.random((hidden_size, 1)) - 1

    for iteration in range(60):
        layer_2_error = 0
        for i in range(len(streetlights)):
            # s[1].shape = (3,), but s[1:2].shape = (1, 3): slicing keeps the row as a matrix
            layer_0 = streetlights[i:i+1]
            layer_1 = relu(np.dot(layer_0, weights_0_1))
            layer_2 = np.dot(layer_1, weights_1_2)

            layer_2_error += np.sum((layer_2 - walk_vs_stop[i]) ** 2)

            layer_2_delta = (walk_vs_stop[i] - layer_2)
            layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)

            weights_1_2 += alpha * np.dot(layer_1.T, layer_2_delta)
            weights_0_1 += alpha * np.dot(layer_0.T, layer_1_delta)

        if(iteration % 10 == 9):
            print("Error:   " + str(layer_2_error))
Output:
    Error:   0.6342311598444467
    Error:   0.35838407676317513
    Error:   0.0830183113303298
    Error:   0.006467054957103705
    Error:   0.0003292669000750734
    Error:   1.5055622665134859e-05

Forward propagation: compute the error.
Backpropagation: update the weights.

Chapter 8: Learning signal and ignoring noise: regularization and batching

1. Training a neural network from scratch

This uses the MNIST dataset bundled with keras; if you don't have keras, install it first:

conda install keras

    import sys, numpy as np
    from keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])

    one_hot_labels = np.zeros((1000, 10))

    for i, l in enumerate(labels):
        one_hot_labels[i][l] = 1
    labels = one_hot_labels

    test_images = x_test.reshape(len(x_test), 28*28) / 255
    test_labels = np.zeros((len(y_test), 10))
    for i, l in enumerate(y_test):
        test_labels[i][l] = 1

    np.random.seed(1)
    relu = lambda x: (x >= 0) * x
    relu2deriv = lambda x: x >= 0
    alpha = 0.005
    iterations = 350
    hidden_size = 40
    pixels_per_image = 784
    num_labels = 10

    weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
    weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

    for j in range(iterations):
        error, correct_cnt = (0.0, 0)
        for i in range(len(images)):
            layer_0 = images[i:i+1]
            layer_1 = relu(np.dot(layer_0, weights_0_1))
            layer_2 = np.dot(layer_1, weights_1_2)

            error += np.sum((labels[i:i+1] - layer_2) ** 2)
            correct_cnt += int(np.argmax(layer_2) == np.argmax(labels[i:i+1]))

            layer_2_delta = (labels[i:i+1] - layer_2)
            layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)

            weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
            weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

        if(j % 10 == 0 or j == iterations - 1):
            test_error, test_correct = (0.0, 0)

            for k in range(len(test_images)):
                layer_0 = test_images[k:k+1]
                layer_1 = relu(np.dot(layer_0, weights_0_1))
                layer_2 = np.dot(layer_1, weights_1_2)

                test_error += np.sum((test_labels[k:k+1] - layer_2) ** 2)
                test_correct += int(np.argmax(layer_2) == np.argmax(test_labels[k:k+1]))

        # Note: this print is outside the if, so it runs every epoch
        # (which is why the output below shows consecutive epoch numbers)
        print("Epochs: " + str(j)
              + "  Train_Error: " + str(error / float(len(images)))[0:5]
              + "  Train_Correct: " + str(correct_cnt / float(len(images)))
              + "  Test_Error: " + str(test_error / float(len(test_images)))[0:5]
              + "  Test_Correct: " + str(test_correct / float(len(test_images))))
Output (abridged):
    Epochs: 338  Train_Error: 0.109  Train_Correct: 1.0  Test_Error: 0.637  Test_Correct: 0.7125
    Epochs: 339  Train_Error: 0.109  Train_Correct: 1.0  Test_Error: 0.637  Test_Correct: 0.7125
    Epochs: 340  Train_Error: 0.109  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 341  Train_Error: 0.109  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 342  Train_Error: 0.109  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 343  Train_Error: 0.109  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 344  Train_Error: 0.109  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 345  Train_Error: 0.109  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 346  Train_Error: 0.108  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 347  Train_Error: 0.108  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 348  Train_Error: 0.108  Train_Correct: 1.0  Test_Error: 0.645  Test_Correct: 0.71
    Epochs: 349  Train_Error: 0.108  Train_Correct: 1.0  Test_Error: 0.653  Test_Correct: 0.7073

Clearly, accuracy is high on the training set and much lower on the test set; this is overfitting.

2. Regularization

  1. Early stopping (a minimal sketch follows this list)
  2. Dropout
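
Dropout is implemented in the code below; early stopping is not shown in this chapter's code, so here is a minimal sketch of the idea under stated assumptions: train_one_epoch and validation_error are hypothetical placeholders for one training pass and the error on held-out data.

    import copy

    def early_stopping_train(weights, max_epochs=300, patience=10):
        # train_one_epoch / validation_error are hypothetical placeholders
        best_error, best_weights, bad_epochs = float("inf"), weights, 0
        for epoch in range(max_epochs):
            weights = train_one_epoch(weights)
            val_error = validation_error(weights)
            if val_error < best_error:
                best_error = val_error
                best_weights = copy.deepcopy(weights)
                bad_epochs = 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:  # validation error stopped improving
                    break
        return best_weights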

Using dropout regularization

    import sys, numpy as np
    from keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])

    one_hot_labels = np.zeros((1000, 10))

    for i, l in enumerate(labels):
        one_hot_labels[i][l] = 1
    labels = one_hot_labels

    test_images = x_test.reshape(len(x_test), 28*28) / 255
    test_labels = np.zeros((len(y_test), 10))
    for i, l in enumerate(y_test):
        test_labels[i][l] = 1

    np.random.seed(1)
    relu = lambda x: (x >= 0) * x
    relu2deriv = lambda x: x >= 0
    alpha = 0.005
    iterations = 300
    hidden_size = 40
    pixels_per_image = 784
    num_labels = 10

    weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
    weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

    for j in range(iterations):
        error, correct_cnt = (0.0, 0)
        for i in range(len(images)):
            layer_0 = images[i:i+1]
            layer_1 = relu(np.dot(layer_0, weights_0_1))

            # Dropout: randomly generate a mask matrix of only 0s and 1s
            dropout_mask = np.random.randint(2, size=layer_1.shape)
            layer_1 *= dropout_mask

            layer_2 = np.dot(layer_1, weights_1_2)

            error += np.sum((labels[i:i+1] - layer_2) ** 2)
            correct_cnt += int(np.argmax(layer_2) == np.argmax(labels[i:i+1]))

            layer_2_delta = (labels[i:i+1] - layer_2)
            layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)
            # The same mask must be applied during backpropagation
            layer_1_delta *= dropout_mask

            weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
            weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

        if(j % 10 == 0 or j == iterations - 1):
            test_error, test_correct = (0.0, 0)

            for k in range(len(test_images)):
                layer_0 = test_images[k:k+1]
                layer_1 = relu(np.dot(layer_0, weights_0_1))
                layer_2 = np.dot(layer_1, weights_1_2)

                test_error += np.sum((test_labels[k:k+1] - layer_2) ** 2)
                test_correct += int(np.argmax(layer_2) == np.argmax(test_labels[k:k+1]))

            print("Epochs: " + str(j)
                  + "  Train_Error: " + str(error / float(len(images)))[0:5]
                  + "  Train_Correct: " + str(correct_cnt / float(len(images)))
                  + "  Test_Error: " + str(test_error / float(len(test_images)))[0:5]
                  + "  Test_Correct: " + str(test_correct / float(len(test_images))))
Output (abridged):
    Epochs: 180  Train_Error: 0.453  Train_Correct: 0.782  Test_Error: 0.432  Test_Correct: 0.7955
    Epochs: 190  Train_Error: 0.433  Train_Correct: 0.784  Test_Error: 0.436  Test_Correct: 0.7997
    Epochs: 200  Train_Error: 0.442  Train_Correct: 0.796  Test_Error: 0.436  Test_Correct: 0.803
    Epochs: 210  Train_Error: 0.441  Train_Correct: 0.79  Test_Error: 0.434  Test_Correct: 0.8031
    Epochs: 220  Train_Error: 0.434  Train_Correct: 0.777  Test_Error: 0.426  Test_Correct: 0.8102
    Epochs: 230  Train_Error: 0.431  Train_Correct: 0.803  Test_Error: 0.429  Test_Correct: 0.8058
    Epochs: 240  Train_Error: 0.430  Train_Correct: 0.788  Test_Error: 0.436  Test_Correct: 0.8055
    Epochs: 250  Train_Error: 0.433  Train_Correct: 0.789  Test_Error: 0.421  Test_Correct: 0.8053
    Epochs: 260  Train_Error: 0.422  Train_Correct: 0.79  Test_Error: 0.422  Test_Correct: 0.8102
    Epochs: 270  Train_Error: 0.430  Train_Correct: 0.803  Test_Error: 0.438  Test_Correct: 0.8062
    Epochs: 280  Train_Error: 0.425  Train_Correct: 0.79  Test_Error: 0.431  Test_Correct: 0.7991
    Epochs: 290  Train_Error: 0.428  Train_Correct: 0.792  Test_Error: 0.433  Test_Correct: 0.8028
    Epochs: 299  Train_Error: 0.407  Train_Correct: 0.815  Test_Error: 0.412  Test_Correct: 0.8027

I'm not sure why, but my run converges somewhat more slowly than the book's. No great harm done: you can still see that test-set accuracy improved.
The reason, it turns out: the book uses hidden_size = 100 at this point.
Amazing!

3. Batch gradient descent

    import sys, numpy as np
    from keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])

    one_hot_labels = np.zeros((1000, 10))

    for i, l in enumerate(labels):
        one_hot_labels[i][l] = 1
    labels = one_hot_labels

    test_images = x_test.reshape(len(x_test), 28*28) / 255
    test_labels = np.zeros((len(y_test), 10))
    for i, l in enumerate(y_test):
        test_labels[i][l] = 1

    np.random.seed(1)
    relu = lambda x: (x >= 0) * x
    relu2deriv = lambda x: x >= 0
    alpha = 0.01
    iterations = 300
    hidden_size = 100
    pixels_per_image = 784
    num_labels = 10

    # Batch size
    batch_size = 100

    weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
    weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

    for j in range(iterations):
        error, correct_cnt = (0.0, 0)
        for i in range(int(len(images) / batch_size)):
            # !!!!!! one forward/backward pass per batch, not per example
            batch_start, batch_end = ((i * batch_size), ((i+1) * batch_size))

            layer_0 = images[batch_start:batch_end]
            layer_1 = relu(np.dot(layer_0, weights_0_1))

            # Dropout: randomly generate a mask matrix of only 0s and 1s
            dropout_mask = np.random.randint(2, size=layer_1.shape)
            layer_1 *= dropout_mask

            layer_2 = np.dot(layer_1, weights_1_2)

            error += np.sum((labels[batch_start:batch_end] - layer_2) ** 2)
            # !!!!!! np.argmax flattens its input and returns the first maximum,
            # so this effectively reads the one-hot label at row batch_start+k
            for k in range(batch_size):
                correct_cnt += int(np.argmax(layer_2[k:k+1]) ==
                                   np.argmax(labels[batch_start+k:batch_end+k+1]))

            # Average the delta over the batch
            layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size

            layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)
            # The same mask must be applied during backpropagation
            layer_1_delta *= dropout_mask

            weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
            weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

        if(j % 10 == 0):
            test_error, test_correct = (0.0, 0)
            for k in range(len(test_images)):
                layer_0 = test_images[k:k+1]
                layer_1 = relu(np.dot(layer_0, weights_0_1))
                layer_2 = np.dot(layer_1, weights_1_2)

                test_error += np.sum((test_labels[k:k+1] - layer_2) ** 2)
                test_correct += int(np.argmax(layer_2) == np.argmax(test_labels[k:k+1]))

            print("Epochs: " + str(j)
                  + "  Train_Error: " + str(error / float(len(images)))[0:5]
                  + "  Train_Correct: " + str(correct_cnt / float(len(images)))
                  + "  Test_Error: " + str(test_error / float(len(test_images)))[0:5]
                  + "  Test_Correct: " + str(test_correct / float(len(test_images))))
Output (abridged):
    Epochs: 170  Train_Error: 0.543  Train_Correct: 0.736  Test_Error: 0.528  Test_Correct: 0.7524
    Epochs: 180  Train_Error: 0.544  Train_Correct: 0.716  Test_Error: 0.523  Test_Correct: 0.7549
    Epochs: 190  Train_Error: 0.537  Train_Correct: 0.718  Test_Error: 0.519  Test_Correct: 0.7577
    Epochs: 200  Train_Error: 0.535  Train_Correct: 0.717  Test_Error: 0.514  Test_Correct: 0.7614
    Epochs: 210  Train_Error: 0.530  Train_Correct: 0.727  Test_Error: 0.511  Test_Correct: 0.763
    Epochs: 220  Train_Error: 0.524  Train_Correct: 0.721  Test_Error: 0.507  Test_Correct: 0.7636
    Epochs: 230  Train_Error: 0.519  Train_Correct: 0.747  Test_Error: 0.503  Test_Correct: 0.7667
    Epochs: 240  Train_Error: 0.521  Train_Correct: 0.736  Test_Error: 0.500  Test_Correct: 0.7681
    Epochs: 250  Train_Error: 0.512  Train_Correct: 0.744  Test_Error: 0.497  Test_Correct: 0.7711
    Epochs: 260  Train_Error: 0.515  Train_Correct: 0.732  Test_Error: 0.495  Test_Correct: 0.7727
    Epochs: 270  Train_Error: 0.503  Train_Correct: 0.76  Test_Error: 0.491  Test_Correct: 0.777
    Epochs: 280  Train_Error: 0.513  Train_Correct: 0.758  Test_Error: 0.490  Test_Correct: 0.7762
    Epochs: 290  Train_Error: 0.502  Train_Correct: 0.771  Test_Error: 0.487  Test_Correct: 0.7798

Here too the convergence speed differs from the book's, and I don't know why.
The book's code has a trap: the weight-delta and weight-update steps of backpropagation, five lines in all, are indented one tab too deep.
This book has quite a few small code errors; I wonder whether the publisher is to blame.

Chapter 9: Modeling probability and nonlinearity: activation functions

1. What is an activation function?

Simply put, an activation function is any function that takes one number and returns another number.
But not every function can serve as an activation function.
An activation function should satisfy a few conditions (a quick reference sketch follows the list):

  1. The function must be continuous and defined over an infinite domain
  2. A good activation function is monotonic, never changing direction
  3. A good activation function is nonlinear
  4. A good activation function (and its derivative) should be efficient to compute
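
As a quick reference (my own sketch, not the book's code), here are the standard choices from this chapter together with the derivative forms used in backpropagation:

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def sigmoid2deriv(output):  # derivative, written in terms of the output
        return output * (1 - output)

    def tanh(x):
        return np.tanh(x)

    def tanh2deriv(output):  # derivative, written in terms of the output
        return 1 - (output ** 2)

    def relu(x):
        return (x > 0) * x

    def relu2deriv(output):
        return output > 0

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(x))
    print(tanh(x))
    print(relu(x))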

2. Standard hidden-layer activation functions

  • The basic go-to activation: sigmoid
  • For hidden layers, tanh is usually a better fit than sigmoid

Choosing activations and losses in keras (for example, see the compile sketch below the table):

Problem type                    Last-layer activation    Loss function
Binary classification           sigmoid                  binary_crossentropy
Multiclass, single-label        softmax                  categorical_crossentropy
Multiclass, multi-label         sigmoid                  binary_crossentropy
Regression to arbitrary values  (none)                   mse
Regression to values in 0~1     sigmoid                  mse or binary_crossentropy
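
For instance, a minimal (hypothetical) Keras binary classifier pairs a sigmoid output layer with the binary_crossentropy loss; the layer sizes here are made up:

    from keras.models import Sequential
    from keras.layers import Dense

    # Hypothetical model: sigmoid output + binary_crossentropy for binary classification
    model = Sequential([
        Dense(32, activation="relu", input_shape=(784,)),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])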

3. Upgrading the MNIST network

This chapter upgrades the earlier code with better-performing activation functions; there is a real improvement.

The book says that when using tanh, the random weight initialization should be narrowed to the range -0.01 to 0.01; tanh prefers a narrower random initialization.

    #tanh
    weights_0_1 = 0.02 * np.random.random((pixels_per_image, hidden_size)) - 0.01
    weights_1_2 = 0.02 * np.random.random((hidden_size, num_labels)) - 0.01
    #relu
    weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
    weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

Perhaps it has to do with the size of the function's derivative over these ranges?
I don't know for sure; the quick check below makes the guess concrete.
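
A quick numeric check of that guess (my own sketch): tanh'(x) = 1 - tanh(x)^2 is close to 1 near zero but collapses as |x| grows, so a narrower initialization keeps the pre-activations in the high-gradient region:

    import numpy as np

    for x in [0.01, 0.1, 1.0, 3.0]:
        print(x, 1 - np.tanh(x) ** 2)
    # 0.01 -> ~0.9999, 0.1 -> ~0.990, 1.0 -> ~0.420, 3.0 -> ~0.0099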

In the run below I did not change the weight initialization, and the results still came out reasonably well.

    import sys, numpy as np
    from keras.datasets import mnist
    np.random.seed(1)

    # Data preprocessing and one-hot encoding
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])

    one_hot_labels = np.zeros((1000, 10))

    for i, l in enumerate(labels):
        one_hot_labels[i][l] = 1
    labels = one_hot_labels

    test_images = x_test.reshape(len(x_test), 28*28) / 255
    test_labels = np.zeros((len(y_test), 10))
    for i, l in enumerate(y_test):
        test_labels[i][l] = 1

    # Activation functions
    def tanh(x):
        return np.tanh(x)

    def tanh2deriv(output):
        return 1 - (output ** 2)

    def softmax(x):
        temp = np.exp(x)
        return temp / np.sum(temp, axis=1, keepdims=True)

    # Hyperparameters
    alpha = 0.02
    iterations = 300
    hidden_size = 100
    pixels_per_image = 784
    num_labels = 10
    batch_size = 100

    # Randomly initialize the weights
    weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
    weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

    for j in range(iterations):
        # Training
        error, correct_cnt = (0.0, 0)
        for i in range(int(len(images) / batch_size)):
            batch_start, batch_end = ((i * batch_size), ((i+1) * batch_size))

            # Network layers
            layer_0 = images[batch_start:batch_end]
            layer_1 = tanh(np.dot(layer_0, weights_0_1))
            dropout_mask = np.random.randint(2, size=layer_1.shape)
            layer_1 *= dropout_mask
            layer_2 = softmax(np.dot(layer_1, weights_1_2))

            # Compute the error and the accuracy
            error += np.sum((labels[batch_start:batch_end] - layer_2) ** 2)
            for k in range(batch_size):
                correct_cnt += int(np.argmax(layer_2[k:k+1]) ==
                                   np.argmax(labels[batch_start+k:batch_end+k+1]))

            # Backpropagation: weight deltas
            layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size
            layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * tanh2deriv(layer_1)
            layer_1_delta *= dropout_mask

            # Update the weights
            weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
            weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

        # Testing: same network, but without dropout
        # Why does this work?
        # Because training has learned new weights, and those weights are the essence!
        if(j % 10 == 0):
            test_error, test_correct = (0.0, 0)
            for k in range(len(test_images)):
                layer_0 = test_images[k:k+1]
                layer_1 = tanh(np.dot(layer_0, weights_0_1))
                layer_2 = softmax(np.dot(layer_1, weights_1_2))

                test_error += np.sum((test_labels[k:k+1] - layer_2) ** 2)
                test_correct += int(np.argmax(layer_2) == np.argmax(test_labels[k:k+1]))

            print("Epochs: " + str(j)
                  + "  Train_Error: " + str(error / float(len(images)))[0:5]
                  + "  Train_Correct: " + str(correct_cnt / float(len(images)))
                  + "  Test_Error: " + str(test_error / float(len(test_images)))[0:5]
                  + "  Test_Correct: " + str(test_correct / float(len(test_images))))
Output (abridged):
    Epochs: 170  Train_Error: 0.179  Train_Correct: 0.89  Test_Error: 0.231  Test_Correct: 0.8525
    Epochs: 180  Train_Error: 0.179  Train_Correct: 0.892  Test_Error: 0.226  Test_Correct: 0.8562
    Epochs: 190  Train_Error: 0.173  Train_Correct: 0.892  Test_Error: 0.221  Test_Correct: 0.8587
    Epochs: 200  Train_Error: 0.153  Train_Correct: 0.916  Test_Error: 0.217  Test_Correct: 0.8596
    Epochs: 210  Train_Error: 0.168  Train_Correct: 0.897  Test_Error: 0.214  Test_Correct: 0.8616
    Epochs: 220  Train_Error: 0.150  Train_Correct: 0.903  Test_Error: 0.211  Test_Correct: 0.863
    Epochs: 230  Train_Error: 0.145  Train_Correct: 0.916  Test_Error: 0.209  Test_Correct: 0.8633
    Epochs: 240  Train_Error: 0.156  Train_Correct: 0.904  Test_Error: 0.206  Test_Correct: 0.8652
    Epochs: 250  Train_Error: 0.140  Train_Correct: 0.92  Test_Error: 0.204  Test_Correct: 0.8664
    Epochs: 260  Train_Error: 0.137  Train_Correct: 0.919  Test_Error: 0.203  Test_Correct: 0.8654
    Epochs: 270  Train_Error: 0.135  Train_Correct: 0.924  Test_Error: 0.199  Test_Correct: 0.869
    Epochs: 280  Train_Error: 0.140  Train_Correct: 0.917  Test_Error: 0.197  Test_Correct: 0.8703
    Epochs: 290  Train_Error: 0.139  Train_Correct: 0.92  Test_Error: 0.196  Test_Correct: 0.8706

And these are the results with the adjusted (narrower) weight initialization:

    Epochs: 170  Train_Error: 0.198  Train_Correct: 0.897  Test_Error: 0.261  Test_Correct: 0.8355
    Epochs: 180  Train_Error: 0.184  Train_Correct: 0.899  Test_Error: 0.253  Test_Correct: 0.8395
    Epochs: 190  Train_Error: 0.179  Train_Correct: 0.898  Test_Error: 0.246  Test_Correct: 0.8433
    Epochs: 200  Train_Error: 0.170  Train_Correct: 0.912  Test_Error: 0.239  Test_Correct: 0.8466
    Epochs: 210  Train_Error: 0.164  Train_Correct: 0.906  Test_Error: 0.234  Test_Correct: 0.8494
    Epochs: 220  Train_Error: 0.154  Train_Correct: 0.917  Test_Error: 0.230  Test_Correct: 0.8506
    Epochs: 230  Train_Error: 0.149  Train_Correct: 0.917  Test_Error: 0.226  Test_Correct: 0.8525
    Epochs: 240  Train_Error: 0.143  Train_Correct: 0.918  Test_Error: 0.222  Test_Correct: 0.8542
    Epochs: 250  Train_Error: 0.139  Train_Correct: 0.926  Test_Error: 0.219  Test_Correct: 0.856
    Epochs: 260  Train_Error: 0.131  Train_Correct: 0.929  Test_Error: 0.216  Test_Correct: 0.8573
    Epochs: 270  Train_Error: 0.127  Train_Correct: 0.933  Test_Error: 0.213  Test_Correct: 0.8584
    Epochs: 280  Train_Error: 0.124  Train_Correct: 0.933  Test_Error: 0.211  Test_Correct: 0.8589
    Epochs: 290  Train_Error: 0.122  Train_Correct: 0.928  Test_Error: 0.208  Test_Correct: 0.8592

Chapter 10: Neural learning about edges and corners: intro to convolutional neural networks

1. A simple implementation in NumPy

    '''
    A simple NumPy implementation of conv2D
    '''

    import numpy as np
    from keras.datasets import mnist

    np.random.seed(1)

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train)

    # One-hot encode the labels
    one_hot_labels = np.zeros((len(labels), 10))
    for i, l in enumerate(labels):
        one_hot_labels[i][l] = 1
    labels = one_hot_labels

    # Prepare the test data and labels
    test_images = x_test.reshape(len(x_test), 28*28) / 255
    test_labels = np.zeros((len(y_test), 10))
    for i, l in enumerate(y_test):
        test_labels[i][l] = 1

    def tanh(x):
        return np.tanh(x)

    def tanh2deriv(output):
        return 1 - (output ** 2)

    def softmax(x):
        temp = np.exp(x)
        return temp / np.sum(temp, axis=1, keepdims=True)

    alpha, iterations = (2, 300)
    pixels_per_image, num_labels = (784, 10)
    batch_size = 128

    input_rows = 28
    input_cols = 28

    kernel_rows = 3
    kernel_cols = 3
    num_kernels = 16
    # Number of hidden-layer nodes
    hidden_size = ((input_rows - kernel_rows) * (input_cols - kernel_cols)) * num_kernels
    # num_kernels convolution kernels
    kernels = 0.02 * np.random.random((kernel_rows * kernel_cols, num_kernels)) - 0.01
    # Weights
    weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

    def get_image_section(layer, row_from, row_to, col_from, col_to):
        '''
        Select the same subregion from every image in the batch
        '''
        section = layer[:, row_from:row_to, col_from:col_to]
        return section.reshape(-1, 1, row_to - row_from, col_to - col_from)

    # Training
    for j in range(iterations):
        correct_cnt = 0
        for i in range(int(len(images) / batch_size)):
            batch_start, batch_end = ((i * batch_size), ((i+1) * batch_size))
            layer_0 = images[batch_start:batch_end]
            layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)
            layer_0.shape  # redundant? (a bare expression with no effect)

            sects = list()
            for row_start in range(layer_0.shape[1] - kernel_rows):
                for col_start in range(layer_0.shape[2] - kernel_cols):
                    sect = get_image_section(layer_0,
                                             row_start,
                                             row_start + kernel_rows,
                                             col_start,
                                             col_start + kernel_cols)
                    sects.append(sect)

            expanded_input = np.concatenate(sects, axis=1)
            es = expanded_input.shape
            flattened_input = expanded_input.reshape(es[0] * es[1], -1)

            kernel_output = flattened_input.dot(kernels)
            layer_1 = tanh(kernel_output.reshape(es[0], -1))
            dropout_mask = np.random.randint(2, size=layer_1.shape)
            layer_1 *= dropout_mask
            layer_2 = softmax(np.dot(layer_1, weights_1_2))

            for k in range(batch_size):
                labelset = labels[batch_start+k : batch_start+k+1]
                _inc = int(np.argmax(layer_2[k:k+1]) == np.argmax(labelset))
                correct_cnt += _inc

            layer_2_delta = (labels[batch_start:batch_end] - layer_2) / (batch_size * layer_2.shape[0])

            layer_1_delta = layer_2_delta.dot(weights_1_2.T) * tanh2deriv(layer_1)
            layer_1_delta *= dropout_mask

            weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
            l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
            k_update = flattened_input.T.dot(l1d_reshape)
            kernels -= alpha * k_update

        # Evaluate on the test set
        test_correct_cnt = 0

        for i in range(len(test_images)):
            layer_0 = test_images[i:i+1]
            layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)
            layer_0.shape  # redundant? (a bare expression with no effect)

            sects = list()
            for row_start in range(layer_0.shape[1] - kernel_rows):
                for col_start in range(layer_0.shape[2] - kernel_cols):
                    sect = get_image_section(layer_0,
                                             row_start,
                                             row_start + kernel_rows,
                                             col_start,
                                             col_start + kernel_cols)
                    sects.append(sect)

            expanded_input = np.concatenate(sects, axis=1)
            es = expanded_input.shape
            flattened_input = expanded_input.reshape(es[0] * es[1], -1)

            kernel_output = flattened_input.dot(kernels)
            layer_1 = tanh(kernel_output.reshape(es[0], -1))
            layer_2 = np.dot(layer_1, weights_1_2)

            test_correct_cnt += int(np.argmax(layer_2) ==
                                    np.argmax(test_labels[i:i+1]))

        print("I: " + str(j)
              + "   Train_acc: " + str(correct_cnt / float(len(images)))
              + "   Test_acc: " + str(test_correct_cnt / float(len(test_images))))
Output (abridged):
    I: 0   Train_acc: 0.055   Test_acc: 0.0288
    I: 1   Train_acc: 0.037   Test_acc: 0.0273
    I: 2   Train_acc: 0.037   Test_acc: 0.028
    I: 3   Train_acc: 0.04   Test_acc: 0.0292
    I: 4   Train_acc: 0.046   Test_acc: 0.0339
    I: 5   Train_acc: 0.068   Test_acc: 0.0478
    I: 6   Train_acc: 0.083   Test_acc: 0.076
    I: 7   Train_acc: 0.096   Test_acc: 0.1316
    I: 8   Train_acc: 0.127   Test_acc: 0.2137
    I: 9   Train_acc: 0.148   Test_acc: 0.2941
    I: 10   Train_acc: 0.181   Test_acc: 0.3563
    I: 11   Train_acc: 0.209   Test_acc: 0.4023

The code is a bit long, but the underlying idea is fairly easy to follow. The results come out painfully slowly, and CPU usage shoots straight to 100%. Is the computation really that heavy? Then why doesn't running conv2d through the keras framework max out the CPU like this? Optimization?
This world needs frameworks!
