Study Notes: *Grokking Deep Learning*

Table of Contents

- Chapter 3: Introduction to Neural Prediction: Forward Propagation
  - A simple neural network
  - Making a prediction with multiple inputs
  - Single input, multiple outputs
  - Multiple inputs, multiple outputs
  - Predicting on predictions
- Chapter 4: Gradient Descent
  - 1. Hot and cold learning
  - 2. Adjusting weights based on error
  - 3. Breaking gradient descent (when the input is too large)
- Chapter 5: Generalized Gradient Descent: Learning Multiple Weights at a Time
  - 1. Gradient descent with multiple inputs, a single output
  - 2. Freezing one weight
  - 3. Gradient descent learning with multiple outputs
- Chapter 6: Building Your First Deep Neural Network: Backpropagation
  - 6.6 Learning the whole dataset
  - 6.7 Stochastic, full, and batch gradient descent
  - 6.22 Your first deep neural network: backpropagation in code
- Chapter 8: Learning Signal and Ignoring Noise: Regularization and Batching
  - 1. Training a neural network from scratch
  - 2. Regularization
  - 3. Batch gradient descent
- Chapter 9: Modeling Probabilities and Nonlinearities: Activation Functions
  - 1. What is an activation function?
  - 2. Standard hidden-layer activation functions
  - 3. Upgrading the MNIST network
- Chapter 10: Neural Learning about Edges and Corners: Intro to Convolutional Neural Networks
  - 1. A simple implementation in NumPy
Chapter 3: Introduction to Neural Prediction: Forward Propagation
A simple neural network
# The network
weight = 0.1
def neural_network(input, weight):
    prediction = input * weight
    return prediction

# Making a prediction with the trained network
number_of_toes = [8.5, 9.5, 10, 9]
input = number_of_toes[0]
pred = neural_network(input, weight)
print(pred)
Making a prediction with multiple inputs
Multiple inputs, single output
import numpy as np
weights = np.array([0.1, 0.2, 0])
def neural_network(input, weights):
    pred = input.dot(weights)
    return pred

toes = np.array([8.5, 9.5, 9.9, 10])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

input = np.array([toes[0], wlrec[0], nfans[0]])
pred = neural_network(input, weights)
print(pred)
Single input, multiple outputs
# Single input, multiple outputs
# The network
def ele_mul(number, vector):
    output = [0, 0, 0]
    assert(len(output) == len(vector))
    for i in range(len(output)):
        output[i] = number * vector[i]
    return output

def neural_network(input, weights):
    pred = ele_mul(input, weights)
    return pred

# The data
wlrec = [0.65, 0.8, 0.8, 0.9]
weights = [0.3, 0.2, 0.9]
input = wlrec[0]
pred = neural_network(input, weights)
print(pred)
Multiple inputs, multiple outputs
def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

def vect_mat_mul(vect, matrix):
    assert(len(vect) == len(matrix))
    output = [0, 0, 0]
    for i in range(len(vect)):
        output[i] = w_sum(vect, matrix[i])
    return output

def neural_network(input, weights):
    pred = vect_mat_mul(input, weights)
    return pred
weights = [[0.1, 0.1, -0.3],
[0.1, 0.2, 0.0],
[0.0, 1.3, 0.1]]
toes = [8.5, 9.5, 9.9, 10]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
input = [toes[0], wlrec[0], nfans[0]]
pred = neural_network(input, weights)
print(pred)
Predicting on predictions
Neural networks can be stacked.
import numpy as np
ih_wgh = np.array([[0.1, 0.1, -0.1],
[-0.1, 0.1, 0.9],
[0.1, 0.4, 0.1]])
hp_wgh = np.array([[0.3, 1.1, -0.3],
[0.1, 0.2, 0.0],
[0.0, 1.3, 0.1]])
weights = [ih_wgh, hp_wgh]
def neural_network(input, weights):
    hid = input.dot(weights[0])
    pred = hid.dot(weights[1])
    return pred
toes = np.array([8.5, 9.5, 9.9, 10])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])
input = np.array([toes[0], wlrec[0], nfans[0]])
pred = neural_network(input, weights)
print(pred)
Chapter 4: Gradient Descent
1. Hot and cold learning
Pros: simple and intuitive.
Cons: 1. it is inefficient; 2. sometimes it is impossible to predict the target exactly.
# Hot and cold learning
weight = 0.5
input = 0.5
goal_prediction = 0.8

# step size (a crude stand-in for a learning rate)
step_amount = 0.001

for _ in range(1101):
    prediction = input * weight
    error = (prediction - goal_prediction) ** 2
    print("Error:" + str(error) + " Prediction:" + str(prediction))

    up_prediction = input * (weight + step_amount)
    up_error = (up_prediction - goal_prediction) ** 2

    down_prediction = input * (weight - step_amount)
    down_error = (down_prediction - goal_prediction) ** 2

    if(down_error < error):
        weight -= step_amount
    if(up_error < error):
        weight += step_amount
Error:4.000000000130569e-06 Prediction:0.7979999999999674
Error:2.2500000000980924e-06 Prediction:0.7984999999999673
Error:1.000000000065505e-06 Prediction:0.7989999999999673
Error:2.5000000003280753e-07 Prediction:0.7994999999999672
Error:1.0799505792475652e-27 Prediction:0.7999999999999672
2. Adjusting weights based on error
weight = 0.5
goal_pred = 0.8
input = 0.5

for _ in range(20):
    pred = input * weight
    error = (pred - goal_pred) ** 2
    # signed: positive when the prediction is too high, negative when too low
    direction_and_amount = (pred - goal_pred) * input
    weight -= direction_and_amount
    print("Error:" + str(error) + " Prediction:" + str(pred))
Error:0.000303525861459885 Prediction:0.7825780063867569
Error:0.00017073329707118678 Prediction:0.7869335047900676
Error:9.603747960254256e-05 Prediction:0.7902001285925507
Error:5.402108227642978e-05 Prediction:0.7926500964444131
Error:3.038685878049206e-05 Prediction:0.7944875723333098
Error:1.7092608064027242e-05 Prediction:0.7958656792499823
Error:9.614592036015323e-06 Prediction:0.7968992594374867
Error:5.408208020258491e-06 Prediction:0.7976744445781151
3. Breaking gradient descent (when the input is too large)
Change input to 2:
weight = 0.5
goal_pred = 0.8
input = 2

for _ in range(20):
    pred = input * weight
    error = (pred - goal_pred) ** 2
    # signed: positive when the prediction is too high, negative when too low
    direction_and_amount = (pred - goal_pred) * input
    weight -= direction_and_amount
    print("Error:" + str(error) + " Prediction:" + str(pred))
Error:101674633133.15994 Prediction:-318863.79999999993
Error:915071698198.4395 Prediction:956594.5999999997
Error:8235645283785.954 Prediction:-2869780.599999999
Error:74120807554073.56 Prediction:8609344.999999996
Error:667087267986662.1 Prediction:-25828031.799999986
Error:6003785411879960.0 Prediction:77484098.59999996
Error:5.403406870691965e+16 Prediction:-232452292.5999999
The results are meaningless.
Every update overcorrects the weight.
Introduce alpha, the learning rate.
If alpha is too large, each weight update takes a big step, tends to jump over the minimum, and the error may never settle at its lowest value.
If alpha is too small, each update takes a tiny step, so reaching the minimum needs a huge number of iterations, which is inefficient.
There is no hard rule for choosing alpha; people usually try orders of magnitude: 10, 1, 0.1, 0.01, ...
# Introducing alpha
weight = 0.5
goal_pred = 0.8
input = 2
alpha = 0.1

for _ in range(20):
    pred = input * weight
    error = (pred - goal_pred) ** 2
    direction_and_amount = (pred - goal_pred) * input
    weight -= direction_and_amount * alpha
    print("Error:" + str(error) + " Prediction:" + str(pred))
Error:1.1284439629823931e-05 Prediction:0.803359232
Error:4.062398266736526e-06 Prediction:0.8020155392
Error:1.4624633760252567e-06 Prediction:0.8012093235200001
Error:5.264868153690924e-07 Prediction:0.8007255941120001
Error:1.8953525353291194e-07 Prediction:0.8004353564672001
Error:6.82326912718715e-08 Prediction:0.8002612138803201
Error:2.456376885786678e-08 Prediction:0.8001567283281921
Error:8.842956788836216e-09 Prediction:0.8000940369969153
Error:3.1834644439835434e-09 Prediction:0.8000564221981492
Error:1.1460471998340758e-09 Prediction:0.8000338533188895
Error:4.125769919393652e-10 Prediction:0.8000203119913337
Error:1.485277170987127e-10 Prediction:0.8000121871948003
Chapter 5: Generalized Gradient Descent: Learning Multiple Weights at a Time
1. Gradient descent with multiple inputs, a single output: learning multiple weights at a time
delta

delta = pred - true

Unlike the error, delta keeps its sign. The error is always positive, emphasizing large errors and discarding small ones:

error = (pred - true) ** 2

delta measures how much you want the current node's value to change in order to predict perfectly.

weight_delta

weights_deltas = ele_mul(delta, input)

weight_delta is a derivative-based estimate of the direction and amount each weight should move, with the goal of reducing delta.
# Gradient descent with multiple inputs, single output
def neural_network(input, weights):
    out = 0
    for i in range(len(input)):
        out += (input[i] * weights[i])
    return out

def ele_mul(scalar, vector):
    out = [0, 0, 0]
    for i in range(len(out)):
        out[i] = vector[i] * scalar
    return out

toes = [8.5, 9.5, 9.9, 10]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
win_or_loss_binary = [1, 1, 0, 1]
true = win_or_loss_binary[0]

alpha = 0.01
weights = [0.1, 0.2, -0.1]
input = [toes[0], wlrec[0], nfans[0]]

for iter in range(3):
    pred = neural_network(input, weights)
    error = (pred - true) ** 2
    delta = pred - true
    weights_deltas = ele_mul(delta, input)
    print("Error:" + str(error) + " Prediction:" + str(pred) + " Delta:" + str(delta)
          + " Weight_delta:" + str(weights_deltas))
    for i in range(len(weights)):
        weights[i] -= weights_deltas[i] * alpha
Error:0.01959999999999997 Prediction:0.8600000000000001 Delta:-0.1399999999999999 Weight_delta:[-1.189999999999999, -0.09099999999999994, -0.16799999999999987]
Error:0.0013135188062500048 Prediction:0.9637574999999999 Delta:-0.036242500000000066 Weight_delta:[-0.30806125000000056, -0.023557625000000044, -0.04349100000000008]
Error:8.802712522307997e-05 Prediction:0.9906177228125002 Delta:-0.009382277187499843 Weight_delta:[-0.07974935609374867, -0.006098480171874899, -0.011258732624999811]
2. Freezing one weight
Freeze the weight for toes[0] by setting its corresponding weight_delta to 0.
# Freezing one weight
def neural_network(input, weights):
    out = 0
    for i in range(len(input)):
        out += (input[i] * weights[i])
    return out

def ele_mul(scalar, vector):
    out = [0, 0, 0]
    for i in range(len(out)):
        out[i] = vector[i] * scalar
    return out

toes = [8.5, 9.5, 9.9, 10]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
win_or_loss_binary = [1, 1, 0, 1]
true = win_or_loss_binary[0]

# a larger learning rate this time
alpha = 0.3
weights = [0.1, 0.2, -0.1]
input = [toes[0], wlrec[0], nfans[0]]

for iter in range(3):
    pred = neural_network(input, weights)
    error = (pred - true) ** 2
    delta = pred - true
    weights_deltas = ele_mul(delta, input)
    # freeze the first weight
    weights_deltas[0] = 0
    print("Error:" + str(error) + " Prediction:" + str(pred) + " Delta:" + str(delta)
          + " Weight_delta:" + str(weights_deltas))
    for i in range(len(weights)):
        weights[i] -= weights_deltas[i] * alpha
Error:0.01959999999999997 Prediction:0.8600000000000001 Delta:-0.1399999999999999 Weight_delta:[0, -0.09099999999999994, -0.16799999999999987]
Error:0.003816150624999989 Prediction:0.9382250000000001 Delta:-0.06177499999999991 Weight_delta:[0, -0.040153749999999946, -0.07412999999999989]
Error:0.000743010489422852 Prediction:0.97274178125 Delta:-0.027258218750000007 Weight_delta:[0, -0.017717842187500006, -0.032709862500000006]
Because the weights share the same error, when one weight reaches the bottom of the bowl (derivative 0), all the weights reach the bottom of the bowl.
This reveals a potentially harmful property of neural networks: weight a (for toes[0]) may correspond to an important input that heavily influences the prediction, yet if the network happens to find a way to predict the training data accurately without it (error = 0), weight a ends up contributing nothing to the prediction.
The error is determined by the training data. A network's weights can take any values, but given a particular set of weights, the error is determined entirely by the data.
3. Gradient descent learning with multiple outputs
Single input, multiple outputs
# Gradient descent with a single input and multiple outputs
def ele_mul(scalar, vector):
    out = [0, 0, 0]
    for i in range(len(out)):
        out[i] = vector[i] * scalar
    return out

def scalar_ele_mul(number, vector):
    output = [0, 0, 0]
    assert(len(output) == len(vector))
    for i in range(len(vector)):
        output[i] = number * vector[i]
    return output

def neural_network(input, weights):
    pred = ele_mul(input, weights)
    return pred

weights = [0.3, 0.2, 0.9]
wlrec = [0.65, 0.8, 0.8, 0.9]

hurt = [0.1, 0.0, 0.0, 0.9]
win = [1, 1, 0, 1]
sad = [0.1, 0.0, 0.1, 0.2]

alpha = 0.1
input = wlrec[0]
true = [hurt[0], win[0], sad[0]]

for _ in range(20):
    pred = neural_network(input, weights)

    error = [0, 0, 0]
    delta = [0, 0, 0]
    for i in range(len(true)):
        error[i] = (pred[i] - true[i]) ** 2
        delta[i] = pred[i] - true[i]

    weight_deltas = ele_mul(input, delta)
    for i in range(len(weights)):
        weights[i] -= (weight_deltas[i] * alpha)
    print("Error:" + str(error) + " Prediction:" + str(pred))
Error:[0.003202573209336997, 0.268590322675587, 0.08347094550319063] Prediction:[0.15659128209660034, 0.481742995536397, 0.3889133875458018]
Error:[0.0029376725664875124, 0.24637389092237108, 0.07656665146282832] Prediction:[0.15420030042801897, 0.5036393539749842, 0.3767067969219916]
Error:[0.002694683163755034, 0.22599508993309528, 0.07023344567249623] Prediction:[0.15191033773493517, 0.5246105912695411, 0.36501593475203753]
Error:[0.002471792614282734, 0.2073019201939724, 0.064424090603286] Prediction:[0.14971712596563416, 0.544695793788403, 0.3538190115087639]
Error:[0.0022673384426793778, 0.19015495482149797, 0.059095255975540883] Prediction:[0.14761657739358613, 0.563932396500843, 0.34309515827251863]
Error:[0.0020797956851018044, 0.17442629961812245, 0.054207195570977515] Prediction:[0.1456047769987071, 0.5823562527486824, 0.3328243878355047]
Error:[0.0019077655149958386, 0.15999863914685317, 0.049723450777273845] Prediction:[0.14367797517051173, 0.6000017010700506, 0.3229875574494547]
Error:[0.0017499648096583036, 0.14676436170973617, 0.04561057865394731] Prediction:[0.14183258071955762, 0.6169016291998409, 0.3135663331472152]
Multiple inputs, multiple outputs
# Gradient descent with multiple inputs and multiple outputs
import numpy as np

def neural_network(input, weights):
    pred = [0, 0, 0]
    for i in range(len(input)):
        pred[i] = input.dot(weights[i])
    return pred

def outer_prod(a, b):
    out = np.zeros((len(a), len(b)))
    for i in range(len(a)):
        for j in range(len(b)):
            out[i][j] = a[i] * b[j]
    return out

toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

hurt = np.array([0.1, 0.0, 0.0, 0.9])
win = np.array([1, 1, 0, 1])
sad = np.array([0.1, 0.0, 0.1, 0.2])

weights = np.array([[0.1, 0.1, -0.3],
                    [0.1, 0.2, 0.0],
                    [0.0, 1.3, 0.1]])

alpha = 0.01
input = np.array([toes[0], wlrec[0], nfans[0]])
true = np.array([hurt[0], win[0], sad[0]])

for _ in range(2000):
    pred = neural_network(input, weights)

    error = [0, 0, 0]
    delta = [0, 0, 0]
    for i in range(len(true)):
        error[i] = (pred[i] - true[i]) ** 2
        delta[i] = pred[i] - true[i]

    weight_deltas = outer_prod(input, delta)
    for i in range(len(weights)):
        for j in range(len(weights[i])):
            weights[i][j] -= (weight_deltas[i][j] * alpha)
    print("Error:" + str(error))
The result is disappointing: the error drops very slowly and never seems to reach 0. I tried changing alpha, but the changes look random, so I'll leave it here as something to be aware of. A likely culprit: `outer_prod(input, delta)` builds `weight_deltas[i][j] = input[i] * delta[j]`, while `neural_network` uses `weights[i][j]` as the weight from input j to output i, so the true gradient for `weights[i][j]` would be `delta[i] * input[j]` (i.e. `outer_prod(delta, input)`); with the transposed update the error can settle at a nonzero fixed point.
Error:[0.011262153166908923, 0.0039575961607914495, 0.617454285079451]
Error:[0.011262153166920988, 0.003957596160791897, 0.6174542850794454]
Error:[0.011262153166914955, 0.0039575961607914495, 0.6174542850794567]
Error:[0.011262153166914955, 0.003957596160791673, 0.617454285079451]
Error:[0.011262153166920988, 0.003957596160791897, 0.617454285079451]
Error:[0.011262153166914955, 0.003957596160791673, 0.617454285079451]
Error:[0.011262153166914955, 0.003957596160791673, 0.6174542850794454]
Error:[0.011262153166908923, 0.003957596160791897, 0.6174542850794567]
Error:[0.011262153166914955, 0.003957596160792344, 0.6174542850794454]
Error:[0.011262153166920988, 0.003957596160792344, 0.617454285079451]
Error:[0.011262153166914955, 0.003957596160792121, 0.6174542850794567]
Error:[0.011262153166920988, 0.003957596160792344, 0.617454285079451]
Error:[0.011262153166914955, 0.003957596160792121, 0.6174542850794567]
Error:[0.011262153166908923, 0.003957596160792121, 0.617454285079451]
Error:[0.011262153166914955, 0.0039575961607925675, 0.6174542850794399]
Error:[0.011262153166914955, 0.003957596160792791, 0.6174542850794454]
Error:[0.011262153166920988, 0.003957596160792791, 0.617454285079451]
Error:[0.011262153166914955, 0.0039575961607925675, 0.617454285079451]
Error:[0.011262153166914955, 0.0039575961607925675, 0.617454285079451]
Error:[0.011262153166920988, 0.003957596160792791, 0.6174542850794454]
Error:[0.011262153166914955, 0.0039575961607925675, 0.6174542850794567]
Chapter 6: Building Your First Deep Neural Network: Backpropagation
- The streetlight problem
- Matrices and the matrix relationship
- Full, batch, and stochastic gradient descent
- Neural networks learn correlation
- Overfitting
- Creating your own correlation
- Backpropagation: long-distance error attribution
- Linear vs. nonlinear
- Your first deep network
- Backpropagation in code: bringing it all together
6.6 Learning the whole dataset
import numpy as np

weights = np.array([0.5, 0.48, -0.7])
alpha = 0.1

streetlights = np.array([[1, 0, 1],
                         [0, 1, 1],
                         [0, 0, 1],
                         [1, 1, 1],
                         [0, 1, 1],
                         [1, 0, 1]])

walk_vs_stop = np.array([0, 1, 0, 1, 1, 0])

input = streetlights[0]
goal_prediction = walk_vs_stop[0]

for _ in range(40):
    error_for_all_lights = 0
    for row_index in range(len(walk_vs_stop)):
        input = streetlights[row_index]
        goal_prediction = walk_vs_stop[row_index]

        prediction = input.dot(weights)
        error = (prediction - goal_prediction) ** 2
        error_for_all_lights += error

        delta = prediction - goal_prediction
        weights = weights - delta * input * alpha
        print("Prediction: " + str(prediction))
    print("Error: " + str(error_for_all_lights) + '\n')
Prediction: -0.002388697618122871
Prediction: 0.9977021355600483
Prediction: -0.01793930655497516
Prediction: 1.0162137740080082
Prediction: 0.9967128843019345
Prediction: -0.0028012842268006904
Error: 0.0006143435674831474

Prediction: -0.0022410273814405524
Prediction: 0.9978745386023716
Prediction: -0.016721264429884947
Prediction: 1.0151127459893812
Prediction: 0.9969492081270097
Prediction: -0.0026256193329783125
Error: 0.00053373677328488
2e-6 means 2 × 10^-6.
6.7 Stochastic gradient descent, gradient descent, batch gradient descent
- Stochastic gradient descent: compute and apply the weight_deltas once per training example.
- Gradient descent: average the weight_deltas over the entire dataset before updating.
- Batch gradient descent: average the weight_deltas over each batch (typically 8 to 256 examples); a minimal sketch comparing the three follows this list.
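To make the difference concrete, here is a minimal sketch of the three update styles on the streetlight data, using the same single-layer linear network as above; one pass each, and `batch_size` is just an illustrative choice, not from the book.

```python
import numpy as np

streetlights = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]])
walk_vs_stop = np.array([0.0, 1.0, 0.0, 1.0])
alpha, batch_size = 0.1, 2

# Stochastic: update after every single example
w = np.array([0.5, 0.48, -0.7])
for x, y in zip(streetlights, walk_vs_stop):
    delta = x.dot(w) - y
    w -= alpha * delta * x                      # one weight_delta per example

# Gradient descent: average the weight_deltas over the whole dataset, then update once
w = np.array([0.5, 0.48, -0.7])
deltas = streetlights.dot(w) - walk_vs_stop
w -= alpha * streetlights.T.dot(deltas) / len(streetlights)

# Batch: average the weight_deltas over each batch of `batch_size` examples
w = np.array([0.5, 0.48, -0.7])
for start in range(0, len(streetlights), batch_size):
    xb = streetlights[start:start + batch_size]
    yb = walk_vs_stop[start:start + batch_size]
    deltas = xb.dot(w) - yb
    w -= alpha * xb.T.dot(deltas) / batch_size
```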
6.22 Your first deep neural network: backpropagation in code
# Backpropagation
import numpy as np

# seed the RNG so every run produces the same random numbers
np.random.seed(1)

# returns x when x > 0, otherwise 0
def relu(x):
    return (x > 0) * x

# returns 1 when input > 0, otherwise 0
def relu2deriv(input):
    return input > 0

alpha = 0.2
hidden_size = 4

streetlights = np.array([[1, 0, 1],
                         [0, 1, 1],
                         [0, 0, 1],
                         [1, 1, 1]])

walk_vs_stop = np.array([1, 1, 0, 0])

weights_0_1 = 2 * np.random.random((3, hidden_size)) - 1
weights_1_2 = 2 * np.random.random((hidden_size, 1)) - 1

for iteration in range(60):
    layer_2_error = 0
    for i in range(len(streetlights)):
        # streetlights[1].shape == (3,), while streetlights[1:2].shape == (1, 3)
        layer_0 = streetlights[i:i+1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = np.dot(layer_1, weights_1_2)

        layer_2_error += np.sum((layer_2 - walk_vs_stop[i]) ** 2)

        layer_2_delta = (walk_vs_stop[i] - layer_2)
        layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)

        weights_1_2 += alpha * np.dot(layer_1.T, layer_2_delta)
        weights_0_1 += alpha * np.dot(layer_0.T, layer_1_delta)

    if(iteration % 10 == 9):
        print("Error: " + str(layer_2_error))
Error: 0.6342311598444467
Error: 0.35838407676317513
Error: 0.0830183113303298
Error: 0.006467054957103705
Error: 0.0003292669000750734
Error: 1.5055622665134859e-05
Forward propagation: make a prediction and compute the error.
Backpropagation: propagate the error back through the layers to compute each layer's delta and update the weights.
Chapter 8: Learning Signal and Ignoring Noise: Regularization and Batching
1. Training a neural network from scratch
This uses the MNIST dataset bundled with keras; if you don't have keras, install it first:
conda install keras
import sys, numpy as np
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])
one_hot_labels = np.zeros((1000, 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

test_images = x_test.reshape(len(x_test), 28*28) / 255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1

np.random.seed(1)
relu = lambda x: (x >= 0) * x
relu2deriv = lambda x: x >= 0

alpha = 0.005
iterations = 350
hidden_size = 40
pixels_per_image = 784
num_labels = 10

weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

for j in range(iterations):
    error, correct_cnt = (0.0, 0)
    for i in range(len(images)):
        layer_0 = images[i:i+1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = np.dot(layer_1, weights_1_2)

        error += np.sum((labels[i:i+1] - layer_2) ** 2)
        correct_cnt += int(np.argmax(layer_2) == np.argmax(labels[i:i+1]))

        layer_2_delta = (labels[i:i+1] - layer_2)
        layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)

        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

    if(j % 10 == 0 or j == iterations - 1):
        test_error, test_correct = (0.0, 0)
        for k in range(len(test_images)):
            layer_0 = test_images[k:k+1]
            layer_1 = relu(np.dot(layer_0, weights_0_1))
            layer_2 = np.dot(layer_1, weights_1_2)
            test_error += np.sum((test_labels[k:k+1] - layer_2) ** 2)
            test_correct += int(np.argmax(layer_2) == np.argmax(test_labels[k:k+1]))

        print("Epochs: " + str(j)
              + " Train_Error: " + str(error/float(len(images)))[0:5]
              + " Train_Correct: " + str(correct_cnt/float(len(images)))
              + " Test_Error: " + str(test_error/float(len(test_images)))[0:5]
              + " Test_Correct: " + str(test_correct/float(len(test_images))))
Epochs: 338 Train_Error: 0.109 Train_Correct: 1.0 Test_Error: 0.637 Test_Correct: 0.7125
Epochs: 339 Train_Error: 0.109 Train_Correct: 1.0 Test_Error: 0.637 Test_Correct: 0.7125
Epochs: 340 Train_Error: 0.109 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 341 Train_Error: 0.109 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 342 Train_Error: 0.109 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 343 Train_Error: 0.109 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 344 Train_Error: 0.109 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 345 Train_Error: 0.109 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 346 Train_Error: 0.108 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 347 Train_Error: 0.108 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 348 Train_Error: 0.108 Train_Correct: 1.0 Test_Error: 0.645 Test_Correct: 0.71
Epochs: 349 Train_Error: 0.108 Train_Correct: 1.0 Test_Error: 0.653 Test_Correct: 0.7073
Clearly, accuracy is high on the training set but much lower on the test set: the network has overfit.
2. Regularization
- Early stopping (a rough sketch follows this list)
- Dropout
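The book only names early stopping here; below is a rough sketch of the idea (my own code, not the book's), with hypothetical `train_one_epoch` and `eval_error` helpers: keep training while the validation error keeps improving, and keep the weights from the best epoch.

```python
# A rough sketch of early stopping. `weights` is assumed to be a list of NumPy
# weight matrices; `train_one_epoch(weights)` updates them in place for one epoch,
# and `eval_error(weights)` returns the error on a held-out validation set.
def train_with_early_stopping(weights, train_one_epoch, eval_error,
                              max_epochs=300, patience=10):
    best_error = float("inf")
    best_weights = None
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch(weights)
        err = eval_error(weights)
        if err < best_error:
            # new best validation error: remember a copy of these weights
            best_error = err
            best_weights = [w.copy() for w in weights]
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                # no improvement for `patience` epochs: stop training
                break
    return best_weights if best_weights is not None else weights
```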
Using dropout regularization:
import sys, numpy as np
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])
one_hot_labels = np.zeros((1000, 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

test_images = x_test.reshape(len(x_test), 28*28) / 255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1

np.random.seed(1)
relu = lambda x: (x >= 0) * x
relu2deriv = lambda x: x >= 0

alpha = 0.005
iterations = 300
hidden_size = 40
pixels_per_image = 784
num_labels = 10

weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

for j in range(iterations):
    error, correct_cnt = (0.0, 0)
    for i in range(len(images)):
        layer_0 = images[i:i+1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        # dropout: a random mask of 0s and 1s, same shape as layer_1
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1 *= dropout_mask
        layer_2 = np.dot(layer_1, weights_1_2)

        error += np.sum((labels[i:i+1] - layer_2) ** 2)
        correct_cnt += int(np.argmax(layer_2) == np.argmax(labels[i:i+1]))

        layer_2_delta = (labels[i:i+1] - layer_2)
        layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)
        # apply the same mask on the backward pass
        layer_1_delta *= dropout_mask

        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

    if(j % 10 == 0 or j == iterations - 1):
        test_error, test_correct = (0.0, 0)
        for k in range(len(test_images)):
            layer_0 = test_images[k:k+1]
            layer_1 = relu(np.dot(layer_0, weights_0_1))
            layer_2 = np.dot(layer_1, weights_1_2)
            test_error += np.sum((test_labels[k:k+1] - layer_2) ** 2)
            test_correct += int(np.argmax(layer_2) == np.argmax(test_labels[k:k+1]))

        print("Epochs: " + str(j)
              + " Train_Error: " + str(error/float(len(images)))[0:5]
              + " Train_Correct: " + str(correct_cnt/float(len(images)))
              + " Test_Error: " + str(test_error/float(len(test_images)))[0:5]
              + " Test_Correct: " + str(test_correct/float(len(test_images))))
Epochs: 180 Train_Error: 0.453 Train_Correct: 0.782 Test_Error: 0.432 Test_Correct: 0.7955
Epochs: 190 Train_Error: 0.433 Train_Correct: 0.784 Test_Error: 0.436 Test_Correct: 0.7997
Epochs: 200 Train_Error: 0.442 Train_Correct: 0.796 Test_Error: 0.436 Test_Correct: 0.803
Epochs: 210 Train_Error: 0.441 Train_Correct: 0.79 Test_Error: 0.434 Test_Correct: 0.8031
Epochs: 220 Train_Error: 0.434 Train_Correct: 0.777 Test_Error: 0.426 Test_Correct: 0.8102
Epochs: 230 Train_Error: 0.431 Train_Correct: 0.803 Test_Error: 0.429 Test_Correct: 0.8058
Epochs: 240 Train_Error: 0.430 Train_Correct: 0.788 Test_Error: 0.436 Test_Correct: 0.8055
Epochs: 250 Train_Error: 0.433 Train_Correct: 0.789 Test_Error: 0.421 Test_Correct: 0.8053
Epochs: 260 Train_Error: 0.422 Train_Correct: 0.79 Test_Error: 0.422 Test_Correct: 0.8102
Epochs: 270 Train_Error: 0.430 Train_Correct: 0.803 Test_Error: 0.438 Test_Correct: 0.8062
Epochs: 280 Train_Error: 0.425 Train_Correct: 0.79 Test_Error: 0.431 Test_Correct: 0.7991
Epochs: 290 Train_Error: 0.428 Train_Correct: 0.792 Test_Error: 0.433 Test_Correct: 0.8028
Epochs: 299 Train_Error: 0.407 Train_Correct: 0.815 Test_Error: 0.412 Test_Correct: 0.8027
For some reason my run converges more slowly than the book's, but it doesn't matter much; the test-set accuracy clearly improves.
Reason: the book uses hidden_size = 100 at this point.
amazing!
3. Batch gradient descent
import sys, numpy as np
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])
one_hot_labels = np.zeros((1000, 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

test_images = x_test.reshape(len(x_test), 28*28) / 255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1

np.random.seed(1)
relu = lambda x: (x >= 0) * x
relu2deriv = lambda x: x >= 0

alpha = 0.01
iterations = 300
hidden_size = 100
pixels_per_image = 784
num_labels = 10
# batch size
batch_size = 100

weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

for j in range(iterations):
    error, correct_cnt = (0.0, 0)
    for i in range(int(len(images) / batch_size)):
        # key change for batching: process batch_size examples at once
        batch_start, batch_end = ((i * batch_size), ((i+1) * batch_size))
        layer_0 = images[batch_start:batch_end]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        # dropout: a random mask of 0s and 1s, same shape as layer_1
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1 *= dropout_mask
        layer_2 = np.dot(layer_1, weights_1_2)

        error += np.sum((labels[batch_start:batch_end] - layer_2) ** 2)
        # count correct predictions example by example within the batch
        for k in range(batch_size):
            correct_cnt += int(np.argmax(layer_2[k:k+1]) ==
                               np.argmax(labels[batch_start+k:batch_end+k+1]))

        # average the delta over the batch
        layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size
        layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)
        # apply the same mask on the backward pass
        layer_1_delta *= dropout_mask

        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

    if(j % 10 == 0):
        test_error, test_correct = (0.0, 0)
        for k in range(len(test_images)):
            layer_0 = test_images[k:k+1]
            layer_1 = relu(np.dot(layer_0, weights_0_1))
            layer_2 = np.dot(layer_1, weights_1_2)
            test_error += np.sum((test_labels[k:k+1] - layer_2) ** 2)
            test_correct += int(np.argmax(layer_2) == np.argmax(test_labels[k:k+1]))

        print("Epochs: " + str(j)
              + " Train_Error: " + str(error/float(len(images)))[0:5]
              + " Train_Correct: " + str(correct_cnt/float(len(images)))
              + " Test_Error: " + str(test_error/float(len(test_images)))[0:5]
              + " Test_Correct: " + str(test_correct/float(len(test_images))))
Epochs: 170 Train_Error: 0.543 Train_Correct: 0.736 Test_Error: 0.528 Test_Correct: 0.7524
Epochs: 180 Train_Error: 0.544 Train_Correct: 0.716 Test_Error: 0.523 Test_Correct: 0.7549
Epochs: 190 Train_Error: 0.537 Train_Correct: 0.718 Test_Error: 0.519 Test_Correct: 0.7577
Epochs: 200 Train_Error: 0.535 Train_Correct: 0.717 Test_Error: 0.514 Test_Correct: 0.7614
Epochs: 210 Train_Error: 0.530 Train_Correct: 0.727 Test_Error: 0.511 Test_Correct: 0.763
Epochs: 220 Train_Error: 0.524 Train_Correct: 0.721 Test_Error: 0.507 Test_Correct: 0.7636
Epochs: 230 Train_Error: 0.519 Train_Correct: 0.747 Test_Error: 0.503 Test_Correct: 0.7667
Epochs: 240 Train_Error: 0.521 Train_Correct: 0.736 Test_Error: 0.500 Test_Correct: 0.7681
Epochs: 250 Train_Error: 0.512 Train_Correct: 0.744 Test_Error: 0.497 Test_Correct: 0.7711
Epochs: 260 Train_Error: 0.515 Train_Correct: 0.732 Test_Error: 0.495 Test_Correct: 0.7727
Epochs: 270 Train_Error: 0.503 Train_Correct: 0.76 Test_Error: 0.491 Test_Correct: 0.777
Epochs: 280 Train_Error: 0.513 Train_Correct: 0.758 Test_Error: 0.490 Test_Correct: 0.7762
Epochs: 290 Train_Error: 0.502 Train_Correct: 0.771 Test_Error: 0.487 Test_Correct: 0.7798
The convergence speed here still differs from the book's, and I'm not sure why.
The book's listing has a gotcha: the five lines that compute the backpropagation weight deltas and apply the weight updates are off by one tab of indentation.
This book has quite a few small errors in its code; maybe that's the publisher's fault.
Chapter 9: Modeling Probabilities and Nonlinearities: Activation Functions
1. What is an activation function?
Simply put, an activation function is any function that takes one number and returns another number.
But not every function makes a good activation function.
An activation function should satisfy a few constraints (a quick reference sketch follows this list):
- It must be continuous and defined over an infinite domain.
- A good activation function is monotonic: it never changes direction.
- A good activation function is nonlinear.
- A good activation function (and its derivative) should be cheap to compute.
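As a quick reference (my own sketch in the same NumPy style as the rest of these notes, not from the book), here are the common activation functions and the output-based derivatives used later in this chapter:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid2deriv(output):           # derivative written in terms of the layer's output
    return output * (1 - output)

def tanh(x):
    return np.tanh(x)

def tanh2deriv(output):
    return 1 - output ** 2

def relu(x):
    return (x > 0) * x

def relu2deriv(output):
    return output > 0

def softmax(x):                       # row-wise class probabilities; expects a 2-D batch
    temp = np.exp(x)
    return temp / np.sum(temp, axis=1, keepdims=True)
```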
2. Standard hidden-layer activation functions
- The baseline activation function: sigmoid
- For hidden layers, tanh works better than sigmoid.
Choosing activations and losses in keras (a hedged example follows the table):

| Problem type | Last-layer activation | Loss function |
|---|---|---|
| Binary classification | sigmoid | binary_crossentropy |
| Multiclass, single-label classification | softmax | categorical_crossentropy |
| Multiclass, multi-label classification | sigmoid | binary_crossentropy |
| Regression to arbitrary values | none | mse |
| Regression to values in the 0-1 range | sigmoid | mse or binary_crossentropy |
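For example, a minimal Keras model for the "multiclass, single-label" row of the table might look like the sketch below; it uses the standard Keras Sequential API, and the layer sizes are arbitrary choices of mine, not from the book.

```python
from keras.models import Sequential
from keras.layers import Dense

# softmax output + categorical_crossentropy, per the table above
model = Sequential()
model.add(Dense(100, activation='tanh', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```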
3. Upgrading the MNIST network
This chapter upgrades the earlier code with better-performing activation functions, and the results do improve.
The book says that when using tanh, the random weight initialization should be narrowed to the range -0.01 to 0.01; tanh prefers a narrower initial distribution.
#tanh
weights_0_1 = 0.02 * np.random.random((pixels_per_image, hidden_size)) - 0.01
weights_1_2 = 0.02 * np.random.random((hidden_size, num_labels)) - 0.01
#relu
weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1
Maybe it has to do with the size of the function's derivative over those ranges?
Not sure.
Below I did not change the weight initialization, and the results are still decent.
import sys, numpy as np
from keras.datasets import mnist

np.random.seed(1)

# Data preprocessing, one-hot encoding
(x_train, y_train), (x_test, y_test) = mnist.load_data()
images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])
one_hot_labels = np.zeros((1000, 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

test_images = x_test.reshape(len(x_test), 28*28) / 255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1

# Activation functions
def tanh(x):
    return np.tanh(x)

def tanh2deriv(output):
    return 1 - (output ** 2)

def softmax(x):
    temp = np.exp(x)
    return temp / np.sum(temp, axis=1, keepdims=True)

# Hyperparameters
alpha = 0.02
iterations = 300
hidden_size = 100
pixels_per_image = 784
num_labels = 10
batch_size = 100

# Randomly initialize the weights
weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

for j in range(iterations):
    # Training
    error, correct_cnt = (0.0, 0)
    for i in range(int(len(images) / batch_size)):
        batch_start, batch_end = ((i * batch_size), ((i+1) * batch_size))
        # Forward pass through the layers
        layer_0 = images[batch_start:batch_end]
        layer_1 = tanh(np.dot(layer_0, weights_0_1))
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1 *= dropout_mask
        layer_2 = softmax(np.dot(layer_1, weights_1_2))

        # Compute error and accuracy
        error += np.sum((labels[batch_start:batch_end] - layer_2) ** 2)
        for k in range(batch_size):
            correct_cnt += int(np.argmax(layer_2[k:k+1]) ==
                               np.argmax(labels[batch_start+k:batch_end+k+1]))

        # Backpropagation: weight deltas
        layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size
        layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * tanh2deriv(layer_1)
        layer_1_delta *= dropout_mask

        # Update the weights
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)

    # Testing: same network, but no dropout here.
    # Why does this still work? Because training has learned new weights,
    # and those weights are where the value lies.
    if(j % 10 == 0):
        test_error, test_correct = (0.0, 0)
        for k in range(len(test_images)):
            layer_0 = test_images[k:k+1]
            layer_1 = tanh(np.dot(layer_0, weights_0_1))
            layer_2 = softmax(np.dot(layer_1, weights_1_2))
            test_error += np.sum((test_labels[k:k+1] - layer_2) ** 2)
            test_correct += int(np.argmax(layer_2) == np.argmax(test_labels[k:k+1]))

        print("Epochs: " + str(j)
              + " Train_Error: " + str(error/float(len(images)))[0:5]
              + " Train_Correct: " + str(correct_cnt/float(len(images)))
              + " Test_Error: " + str(test_error/float(len(test_images)))[0:5]
              + " Test_Correct: " + str(test_correct/float(len(test_images))))
Epochs: 170 Train_Error: 0.179 Train_Correct: 0.89 Test_Error: 0.231 Test_Correct: 0.8525
Epochs: 180 Train_Error: 0.179 Train_Correct: 0.892 Test_Error: 0.226 Test_Correct: 0.8562
Epochs: 190 Train_Error: 0.173 Train_Correct: 0.892 Test_Error: 0.221 Test_Correct: 0.8587
Epochs: 200 Train_Error: 0.153 Train_Correct: 0.916 Test_Error: 0.217 Test_Correct: 0.8596
Epochs: 210 Train_Error: 0.168 Train_Correct: 0.897 Test_Error: 0.214 Test_Correct: 0.8616
Epochs: 220 Train_Error: 0.150 Train_Correct: 0.903 Test_Error: 0.211 Test_Correct: 0.863
Epochs: 230 Train_Error: 0.145 Train_Correct: 0.916 Test_Error: 0.209 Test_Correct: 0.8633
Epochs: 240 Train_Error: 0.156 Train_Correct: 0.904 Test_Error: 0.206 Test_Correct: 0.8652
Epochs: 250 Train_Error: 0.140 Train_Correct: 0.92 Test_Error: 0.204 Test_Correct: 0.8664
Epochs: 260 Train_Error: 0.137 Train_Correct: 0.919 Test_Error: 0.203 Test_Correct: 0.8654
Epochs: 270 Train_Error: 0.135 Train_Correct: 0.924 Test_Error: 0.199 Test_Correct: 0.869
Epochs: 280 Train_Error: 0.140 Train_Correct: 0.917 Test_Error: 0.197 Test_Correct: 0.8703
Epochs: 290 Train_Error: 0.139 Train_Correct: 0.92 Test_Error: 0.196 Test_Correct: 0.8706
And these are the results with the narrower weight initialization:
Epochs: 170 Train_Error: 0.198 Train_Correct: 0.897 Test_Error: 0.261 Test_Correct: 0.8355
Epochs: 180 Train_Error: 0.184 Train_Correct: 0.899 Test_Error: 0.253 Test_Correct: 0.8395
Epochs: 190 Train_Error: 0.179 Train_Correct: 0.898 Test_Error: 0.246 Test_Correct: 0.8433
Epochs: 200 Train_Error: 0.170 Train_Correct: 0.912 Test_Error: 0.239 Test_Correct: 0.8466
Epochs: 210 Train_Error: 0.164 Train_Correct: 0.906 Test_Error: 0.234 Test_Correct: 0.8494
Epochs: 220 Train_Error: 0.154 Train_Correct: 0.917 Test_Error: 0.230 Test_Correct: 0.8506
Epochs: 230 Train_Error: 0.149 Train_Correct: 0.917 Test_Error: 0.226 Test_Correct: 0.8525
Epochs: 240 Train_Error: 0.143 Train_Correct: 0.918 Test_Error: 0.222 Test_Correct: 0.8542
Epochs: 250 Train_Error: 0.139 Train_Correct: 0.926 Test_Error: 0.219 Test_Correct: 0.856
Epochs: 260 Train_Error: 0.131 Train_Correct: 0.929 Test_Error: 0.216 Test_Correct: 0.8573
Epochs: 270 Train_Error: 0.127 Train_Correct: 0.933 Test_Error: 0.213 Test_Correct: 0.8584
Epochs: 280 Train_Error: 0.124 Train_Correct: 0.933 Test_Error: 0.211 Test_Correct: 0.8589
Epochs: 290 Train_Error: 0.122 Train_Correct: 0.928 Test_Error: 0.208 Test_Correct: 0.8592
Chapter 10: Neural Learning about Edges and Corners: Intro to Convolutional Neural Networks
1. A simple implementation in NumPy
'''
A simple conv2D implementation in NumPy
'''
import numpy as np
from keras.datasets import mnist

np.random.seed(1)

(x_train, y_train), (x_test, y_test) = mnist.load_data()
images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train)

# one-hot encode the labels
one_hot_labels = np.zeros((len(labels), 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

# prepare the test data and labels
test_images = x_test.reshape(len(x_test), 28*28) / 255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1

def tanh(x):
    return np.tanh(x)

def tanh2deriv(output):
    return 1 - (output ** 2)

def softmax(x):
    temp = np.exp(x)
    return temp / np.sum(temp, axis=1, keepdims=True)

alpha, iterations = (2, 300)
pixels_per_image, num_labels = (784, 10)
batch_size = 128

input_rows = 28
input_cols = 28

kernel_rows = 3
kernel_cols = 3
num_kernels = 16

# number of hidden nodes
hidden_size = ((input_rows - kernel_rows) * (input_cols - kernel_cols)) * num_kernels

# num_kernels convolution kernels
kernels = 0.02 * np.random.random((kernel_rows * kernel_cols, num_kernels)) - 0.01

# output weights
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

def get_image_section(layer, row_from, row_to, col_from, col_to):
    '''
    Select the same subregion from every image in the batch.
    '''
    section = layer[:, row_from:row_to, col_from:col_to]
    return section.reshape(-1, 1, row_to - row_from, col_to - col_from)

# training
for j in range(iterations):
    correct_cnt = 0
    for i in range(int(len(images) / batch_size)):
        batch_start, batch_end = ((i * batch_size), ((i+1) * batch_size))
        layer_0 = images[batch_start:batch_end]
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)
        layer_0.shape  # redundant? (a no-op left over from the book's listing)

        sects = list()
        for row_start in range(layer_0.shape[1] - kernel_rows):
            for col_start in range(layer_0.shape[2] - kernel_cols):
                sect = get_image_section(layer_0,
                                         row_start,
                                         row_start + kernel_rows,
                                         col_start,
                                         col_start + kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0] * es[1], -1)

        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1 *= dropout_mask
        layer_2 = softmax(np.dot(layer_1, weights_1_2))

        for k in range(batch_size):
            labelset = labels[batch_start+k : batch_start+k+1]
            _inc = int(np.argmax(layer_2[k:k+1]) == np.argmax(labelset))
            correct_cnt += _inc

        layer_2_delta = (labels[batch_start:batch_end] - layer_2) / (batch_size * layer_2.shape[0])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * tanh2deriv(layer_1)
        layer_1_delta *= dropout_mask
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
        k_update = flattened_input.T.dot(l1d_reshape)
        kernels -= alpha * k_update

    # testing
    test_correct_cnt = 0
    for i in range(len(test_images)):
        layer_0 = test_images[i:i+1]
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)
        layer_0.shape  # redundant, as above

        sects = list()
        for row_start in range(layer_0.shape[1] - kernel_rows):
            for col_start in range(layer_0.shape[2] - kernel_cols):
                sect = get_image_section(layer_0,
                                         row_start,
                                         row_start + kernel_rows,
                                         col_start,
                                         col_start + kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0] * es[1], -1)

        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        layer_2 = np.dot(layer_1, weights_1_2)

        test_correct_cnt += int(np.argmax(layer_2) == np.argmax(test_labels[i:i+1]))

    print("I: " + str(j)
          + " Train_acc: " + str(correct_cnt / float(len(images)))
          + " Test_acc: " + str(test_correct_cnt / float(len(test_images))))
I: 0 Train_acc: 0.055 Test_acc: 0.0288
I: 1 Train_acc: 0.037 Test_acc: 0.0273
I: 2 Train_acc: 0.037 Test_acc: 0.028
I: 3 Train_acc: 0.04 Test_acc: 0.0292
I: 4 Train_acc: 0.046 Test_acc: 0.0339
I: 5 Train_acc: 0.068 Test_acc: 0.0478
I: 6 Train_acc: 0.083 Test_acc: 0.076
I: 7 Train_acc: 0.096 Test_acc: 0.1316
I: 8 Train_acc: 0.127 Test_acc: 0.2137
I: 9 Train_acc: 0.148 Test_acc: 0.2941
I: 10 Train_acc: 0.181 Test_acc: 0.3563
I: 11 Train_acc: 0.209 Test_acc: 0.4023
The code is fairly long, but the underlying idea is not hard to follow. It runs very slowly, and CPU usage shoots straight to 100%. Is the computation really that heavy? Then why doesn't running Conv2D through the keras framework show such high usage? Optimization, presumably?
The world needs frameworks!
