
[Deep Learning Paper Notes][Image Classification] Very Deep Convolutional Networks for Large-Scale Image Recognition


Simonyan, Karen and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014). [Citations: 1986].

1 Motivation

Ways to Improve Accuracy
• This paper's answer: increase network depth while keeping every convolution filter small (3 × 3).

2 Architecture

[In a Nutshell (138M Parameters)]
• Input (3 × 224 × 224).
• conv1-1 (64@3 × 3, s1, p1), relu1-1.
• conv1-2 (64@3 × 3, s1, p1), relu1-2.
• pool1 (2 × 2, s2), output 64 × 112 × 112.
• conv2-1 (128@3 × 3, s1, p1), relu2-1.
• conv2-2 (128@3 × 3, s1, p1), relu2-2.
• pool2 (2 × 2, s2), output 128 × 56 × 56.
• conv3-1 (256@3 × 3, s1, p1), relu3-1.
• conv3-2 (256@3 × 3, s1, p1), relu3-2.
• conv3-3 (256@3 × 3, s1, p1), relu3-3.
• pool3 (2 × 2, s2), output 256 × 28 × 28.
• conv4-1 (512@3 × 3, s1, p1), relu4-1.
• conv4-2 (512@3 × 3, s1, p1), relu4-2.
• conv4-3 (512@3 × 3, s1, p1), relu4-3.
• pool4 (2 × 2, s2), output 512 × 14 × 14.
• conv5-1 (512@3 × 3, s1, p1), relu5-1.
• conv5-2 (512@3 × 3, s1, p1), relu5-2.
• conv5-3 (512@3 × 3, s1, p1), relu5-3.
• pool5 (2 × 2, s2), output 512 × 7 × 7 = 25088.
• fc6 (4096), relu6, drop6.
• fc7 (4096), relu7, drop7.
• fc8 (1000).
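The 138M figure quoted above can be checked layer by layer. A minimal sketch (counting weights plus biases for the VGG-16 configuration listed above):

```python
# Count VGG-16 parameters (weights + biases) layer by layer.
conv_cfg = [  # (in_channels, out_channels) for each 3x3 conv layer
    (3, 64), (64, 64),                    # block 1
    (64, 128), (128, 128),                # block 2
    (128, 256), (256, 256), (256, 256),   # block 3
    (256, 512), (512, 512), (512, 512),   # block 4
    (512, 512), (512, 512), (512, 512),   # block 5
]
fc_cfg = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]

# A conv layer has c_out filters of size c_in x 3 x 3, plus c_out biases.
total = sum(c_out * (c_in * 3 * 3 + 1) for c_in, c_out in conv_cfg)
# An FC layer has n_out x n_in weights, plus n_out biases.
total += sum(n_out * (n_in + 1) for n_in, n_out in fc_cfg)
print(total)  # 138357544, i.e. ~138M
```

Note that the three fully-connected layers account for roughly 124M of the 138M parameters.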

Data Preparation (Training)
• Isotropically rescale each image so that its shorter side equals the training scale S (fixed S = 256 or 384, or sampled from [256, 512]), then take a 224 × 224 crop.

Data Preparation (Testing)
• Isotropically rescale the image so that its shorter side equals the test scale Q, then apply the network densely (fully convolutionally) over the whole image.

[Data Augmentation (Training)]

• Random crop.
• Horizontal flips.
• Color jittering.

Data Augmentation (Testing)

• Horizontally flip the images and average the final scores.
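The flip-and-average step can be sketched in a few lines of NumPy. Here `model` is a hypothetical stand-in for any callable mapping an H × W × 3 array to a vector of class scores:

```python
import numpy as np

def predict_with_flip(model, image):
    """Average class scores over an image and its horizontal flip,
    as in the test-time augmentation described above."""
    scores = model(image)
    scores_flipped = model(image[:, ::-1, :])  # flip along the width axis
    return (scores + scores_flipped) / 2.0

# Toy check with a stand-in "model" that just sums over pixels;
# its output is flip-invariant, so averaging changes nothing.
img = np.arange(12, dtype=float).reshape(2, 2, 3)
toy_model = lambda x: x.sum(axis=(0, 1))
out = predict_with_flip(toy_model, img)
```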

[Why 3 × 3 conv?] A stack of 3 × 3 conv layers covers the same receptive field as a single larger filter.
• Two 3 × 3 layers — 5 × 5 receptive field.
• Three 3 × 3 layers — 7 × 7 receptive field.
• Moreover, the stacked 3 × 3 layers insert more non-linearities, which makes the decision function more discriminative.
Fewer parameters
• E.g., both the input and output size are D × H × W.
• A single 7 × 7 layer has parameters: D^2 × 7 × 7 = 49 D^2.
• Three 3 × 3 layers have parameters: 3 × (D^2 × 3 × 3) = 27 D^2.
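Both claims are easy to verify numerically. A minimal sketch (stride-1 layers, D an example channel count):

```python
def stacked_rf(n_layers, k=3):
    """Receptive field of n stacked k x k conv layers with stride 1:
    each additional layer extends the field by (k - 1)."""
    rf = 1
    for _ in range(n_layers):
        rf += k - 1
    return rf

assert stacked_rf(2) == 5   # two 3x3 layers see a 5x5 region
assert stacked_rf(3) == 7   # three 3x3 layers see a 7x7 region

# Parameter comparison at equal receptive field (input/output depth D).
D = 256
params_7x7 = D * D * 7 * 7            # 49 D^2
params_3x3_stack = 3 * (D * D * 3 * 3)  # 27 D^2, ~45% fewer
```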

3 Training Details

Optimized with SGD with momentum α = 0.9.
• Batch size B = 256.
• Weight decay λ = 5 × 10^-4.
• The first 4 conv layers and the last 3 fully-connected layers were initialized from the pre-trained shallower configuration (net A).
• The remaining weights were sampled from a normal distribution with mean μ = 0 and variance σ^2 = 10^-2; the biases were set to zero.
• Base learning rate η = 10^-2.
• Trained for E = 74 epochs.
• The learning rate was divided by 10 whenever the validation accuracy stopped improving; it was decreased 3 times in total.
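The reduce-on-plateau schedule above can be sketched as a small helper class. This is a hypothetical illustration, not the paper's code; the `patience` parameter (how many non-improving evaluations trigger a decay) is an assumption:

```python
class ReduceLROnPlateau:
    """Divide the learning rate by `factor` when the validation error
    has not improved for `patience` consecutive evaluations."""

    def __init__(self, lr=1e-2, factor=10.0, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_error):
        if val_error < self.best:      # improvement: reset the counter
            self.best = val_error
            self.bad_evals = 0
        else:                          # plateau: count and maybe decay
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.lr /= self.factor
                self.bad_evals = 0
        return self.lr
```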

4 Results

Second place in ILSVRC-2014 classification, by top-5 test error:
• Ensemble of 7 CNNs (competition submission): 7.3%.
• Single best CNN (post-submission): 7.0%.
• Ensemble of the 2 best CNNs (post-submission): 6.8%.

5 Analysis

• When a ConvNet is applied to a single crop, the convolved feature maps are padded with zeros.
• In the fully-convolutional (dense) case, the padding for the same region naturally comes from the neighbouring parts of the image (due to both the convolutions and the spatial pooling), which substantially increases the overall network receptive field, so more context is captured.
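Dense evaluation relies on the fact that a fully-connected layer over a 7 × 7 × 512 map is equivalent to a 7 × 7 convolution with the same weights; on a larger input, sliding those filters produces a spatial map of class scores. A minimal NumPy sketch of the equivalence at a single spatial position:

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((512, 7, 7))        # pool5 output
W = rng.standard_normal((4096, 512 * 7 * 7))   # fc6 weight matrix

# FC view: flatten the feature map, then a matrix-vector product.
fc_out = W @ feat.reshape(-1)

# Conv view: each fc6 unit becomes a 512x7x7 filter applied at the
# single valid position of the 7x7 map.
W_conv = W.reshape(4096, 512, 7, 7)
conv_out = np.einsum('ochw,chw->o', W_conv, feat)

assert np.allclose(fc_out, conv_out)  # identical outputs
```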

[LRN Does Not Improve Accuracy]

[Deeper is Better]
Also, a 3 × 3 conv layer outperforms the corresponding 1 × 1 conv layer: the extra non-linearity alone does not make up for the lost spatial context.

[Multi-Scale]
• Multi-scale training (scale jittering) is better than fixed-scale training.

• Multi-scale evaluation is better than fixed-scale evaluation.

An ensemble of only the two best-performing models outperforms an ensemble of all models.

