Stanford CS231n Course Notes (Part 2)
Lecture 8: Deep Learning Software
Lecture 9: CNN Architectures
AlexNet

VGGNet

GoogLeNet

- 22 total layers with weights (including each parallel layer in an Inception module)
Inception module: design a good local network topology (a "network within a network") and stack these modules on top of each other.

Naive Inception module: the original Inception module is too computationally expensive.
Apply parallel filter operations on the input from the previous layer:
- Multiple convolution sizes (1×1, 3×3, and 5×5)
- A pooling operation (e.g. 3×3 max pooling)
Concatenate all filter outputs together depth-wise.
What is the problem with this? Computational complexity
Solution: "bottleneck" layers that use 1×1 convolutions to reduce feature depth
- Preserves spatial dimensions, reduces depth!
- Projects depth to lower dimension (combination of 32 feature maps)
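The depth-reduction effect of a 1×1 bottleneck convolution can be sketched in a few lines of numpy. This is a minimal illustration (the shapes and variable names are assumptions, not from the lecture): a 1×1 conv is just a per-pixel linear projection across the channel axis.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (H, W, C_in), w is (C_in, C_out).

    A 1x1 conv is a per-pixel linear map over depth, so it preserves
    spatial dimensions and changes only the number of channels.
    """
    return x @ w  # matmul broadcasts over the H and W axes

# illustrative shapes: project 256 input channels down to 64
x = np.random.randn(28, 28, 256)
w = np.random.randn(256, 64)
out = conv1x1(x, w)
print(out.shape)  # (28, 28, 64): spatial dims preserved, depth reduced
```

Because the spatial extent of the filter is 1×1, the cost is linear in H·W·C_in·C_out, which is why GoogLeNet uses these projections before the expensive 3×3 and 5×5 convolutions.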


ResNet

What happens when we keep stacking deeper layers on a plain CNN?

56-layer model performs worse on both training and test error
-> The deeper model performs worse on both the training and the test set, so the problem is not overfitting!
The problem is optimization: deeper networks are harder to optimize.
In principle, a deeper model should be able to perform at least as well as a shallower one.
A solution by construction: copy the learned layers from the shallower model and set the additional layers to identity mappings.
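The identity-mapping argument above is exactly what a residual block makes easy to learn. Here is a minimal numpy sketch (weights and shapes are assumptions for illustration): when the block's weights are zero, the block collapses to the identity, so adding residual blocks can never make the constructed network worse than the shallower one.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, w1, w2):
    # F(x) = W2 @ relu(W1 @ x); the block outputs relu(F(x) + x)
    out = relu(x @ w1)
    out = out @ w2
    return relu(out + x)  # shortcut connection adds the input back

d = 8
x = np.abs(np.random.randn(4, d))   # non-negative input so the final ReLU is transparent
zeros = np.zeros((d, d))
# with zero weights, F(x) = 0 and the block is the identity mapping
print(np.allclose(residual_block(x, zeros, zeros), x))  # True
```

Learning F(x) = H(x) - x (the residual) is easier than learning H(x) directly when the desired mapping is close to the identity.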


Training ResNet in practice:
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used
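The update rule implied by the recipe above can be sketched as follows (a minimal illustration; variable names are assumptions, and the learning-rate drop is simulated rather than driven by real validation error):

```python
import numpy as np

lr, momentum, weight_decay = 0.1, 0.9, 1e-5  # hyperparameters from the notes

def sgd_momentum_step(w, v, grad, lr):
    grad = grad + weight_decay * w   # L2 weight decay adds a penalty gradient
    v = momentum * v - lr * grad     # accumulate velocity
    return w + v, v                  # take the step

w = np.zeros(3)
v = np.zeros(3)
w, v = sgd_momentum_step(w, v, np.ones(3), lr)
print(w)  # first step from zero velocity: w = -lr * grad = [-0.1 -0.1 -0.1]

# "divide by 10 when validation error plateaus":
lr /= 10  # lr is now 0.01
```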
Lecture 10: Recurrent Neural Networks
RNN


LSTM



Lecture 11: Detection and Segmentation
R-CNN
Object detection
Lecture 12: Visualizing and Understanding
Lecture 13: Generative Models
Generative models are a form of unsupervised learning (no external labels needed).
Given training data, generate new samples from the same distribution.
Want to learn p_model(x) similar to p_data(x)
- PixelRNN/CNN: explicit density estimation; directly computes the probability density function. Explicit methods explicitly define and solve for p_model(x).
- GAN: implicit density estimation; the network learns p_model(x) by learning to generate samples, without explicitly defining the distribution.

PixelRNN and PixelCNN




Variational Autoencoders(VAE)



After training, throw away decoder

Autoencoders can reconstruct data, and the learned latent features can be used to initialize a supervised model.
The features capture factors of variation in the training data. Can we use an autoencoder to generate new images?

We want to estimate the true parameters of this generative model.
How should we represent this model?
Choose prior p(z) to be simple, e.g. Gaussian.
Reasonable for latent attributes, e.g. pose, how much smile.
The conditional distribution p(x|z) is complex (it must generate an image), so we represent it with a neural network.
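Sampling from this generative model can be sketched directly: draw z from the simple prior, then push it through the network that represents p(x|z). This is a minimal numpy illustration; the two-layer decoder, all sizes, and all names are hypothetical, not from the lecture.

```python
import numpy as np

def decoder(z, w1, w2):
    """Hypothetical two-layer network standing in for p(x|z)."""
    h = np.tanh(z @ w1)   # hidden layer
    return h @ w2         # e.g. the mean of a Gaussian over pixels

z_dim, h_dim, x_dim = 2, 16, 784  # illustrative sizes (28x28 image flattened)
rng = np.random.default_rng(0)
w1 = rng.normal(size=(z_dim, h_dim))
w2 = rng.normal(size=(h_dim, x_dim))

z = rng.normal(size=(1, z_dim))  # z ~ p(z) = N(0, I), the simple prior
x = decoder(z, w1, w2)           # the complex p(x|z) is the neural network
print(x.shape)  # (1, 784)
```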







During training, for every minibatch of input data, compute the variational lower bound in the forward pass, then backprop to maximize it.
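The per-minibatch lower bound has a simple closed form when q(z|x) is a diagonal Gaussian and the prior is N(0, I). A minimal numpy sketch (the Gaussian-likelihood reconstruction term reduces to negative squared error up to a constant; names are assumptions):

```python
import numpy as np

def vae_lower_bound(x, x_recon, mu, log_var):
    # Reconstruction term: Gaussian log-likelihood up to a constant (negative MSE)
    recon = -np.sum((x - x_recon) ** 2, axis=1)
    # KL term: closed form for diagonal Gaussian q(z|x) against the N(0, I) prior
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return np.mean(recon - kl)  # average over the minibatch; maximize this bound

# sanity check: perfect reconstruction with q(z|x) = N(0, I) gives a bound of 0
x = np.zeros((2, 3))
bound = vae_lower_bound(x, x, np.zeros((2, 4)), np.zeros((2, 4)))
print(bound)  # 0.0
```

The KL term is what pushes the encoder's posterior toward the prior, which is why sampling z ~ N(0, I) at generation time produces sensible images.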


Diagonal prior on z => independent latent variables
Different dimensions of z encode interpretable factors of variation

Generative Adversarial Networks(GANs)
GANs: don’t work with any explicit density function!
Instead, take a game-theoretic approach: learn to generate from the training distribution through a two-player game between two networks.




Aside: jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes improves training, and this is an active area of research.


Architecturally, a GAN is both clever and simple (note the similarity to earlier classic ideas [6~7]) and very easy to understand.
The whole model has only two components: 1. a generator G; 2. a discriminator D.
Generative models themselves are not new; they had been developed and refined for a long time before GANs, and the generator alone only scratches the surface. The core idea is a generator G that models a hypothetical data distribution meant to imitate the real one: through repeated iterations it optimizes its outputs to be as close to real samples as possible.
Without a discriminator D, each iteration would directly measure the difference between generated and real samples and turn that difference into a loss for parameter updates.
The introduction of the discriminator D changes this setup: its core goal is to accurately distinguish generated samples from real ones.
The generator G's training objective then shifts from minimizing the difference between generated and real samples to degrading D's ability to recognize them (the training objective now contains D's output).
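The two objectives in this game can be sketched as binary cross-entropy losses over the discriminator's outputs. A minimal numpy illustration (function names are assumptions; the generator loss shown is the common non-saturating variant rather than the raw minimax form):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D wants D(real) -> 1 and D(G(z)) -> 0: standard binary cross-entropy
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating G objective: maximize log D(G(z)) instead of
    # minimizing log(1 - D(G(z))), which gives stronger early gradients
    return -np.mean(np.log(d_fake))

# a confident, correct discriminator has low loss; a fooled one has high loss
print(discriminator_loss(np.array([0.9]), np.array([0.1])))  # ~0.21
print(generator_loss(np.array([0.9])))                       # ~0.11: D is fooled
```

Training alternates gradient steps on these two losses: update D to tell real from fake, then update G to fool the updated D.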
The overall structure of the GAN model is shown in the figure below:

Summary:

