
[Paper Reading Notes] [Semantic Segmentation] FC-DenseNet Reading Notes


jingwenlai 2018-8-15

Introduction & Main Contributions

Tiramisu - FC-DenseNet (https://arxiv.org/abs/1611.09326)

Classic CNN-based approaches to semantic image segmentation consist of three components:

(a) a downsampling path, mainly used to extract features;

(b) an upsampling path, trained to gradually recover the input resolution at the output;

(c) optionally, a post-processing module (e.g., Conditional Random Fields) to refine the model output.

The classic U-Net is exactly this kind of structure.

DenseNet has shown excellent performance on image classification; its architectural design yields more accurate models while also making them easier to train.

Without relying on pretrained models or post-processing, FC-DenseNet achieves state-of-the-art performance on the CamVid and Gatech datasets. The architecture also has far fewer parameters and trains well.

Code & experiments: https://github.com/SimJeg/FC-DenseNet/blob/master/train.py

A good PyTorch implementation with an explanation: https://github.com/bfortuner/pytorch_tiramisu

Approach & Method

The structure of a dense block is as follows:
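The original figure is not reproduced here. As a substitute, here is a minimal PyTorch sketch of a dense block, assuming the layer composition from the paper (BN → ReLU → 3×3 conv → dropout 0.2); the class names are my own:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One layer inside a dense block: BN -> ReLU -> 3x3 conv -> dropout.
    Each layer produces `growth_rate` new feature maps."""
    def __init__(self, in_channels, growth_rate, p_drop=0.2):
        super().__init__()
        self.layer = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
            nn.Dropout2d(p_drop),
        )

    def forward(self, x):
        return self.layer(x)

class DenseBlock(nn.Module):
    """Every layer sees the concatenation of all previous feature maps
    (dense connectivity). The block returns only the newly created maps,
    which is what FC-DenseNet upsamples; in the downsampling path the
    caller concatenates this output with the block input."""
    def __init__(self, in_channels, growth_rate, n_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(n_layers)
        ])

    def forward(self, x):
        new_features = []
        for layer in self.layers:
            out = layer(x)
            new_features.append(out)
            x = torch.cat([x, out], dim=1)  # dense connection to all later layers
        return torch.cat(new_features, dim=1)
```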

From DenseNet to FC-DenseNet

As is common practice, the number of feature maps grows while pooling operations reduce the spatial resolution.

The last layer of the downsampling path is called the bottleneck.
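Between dense blocks, the downsampling path uses transition down modules; the paper specifies BN → ReLU → 1×1 convolution → dropout 0.2 → 2×2 max pooling. A minimal sketch (class name is mine):

```python
import torch.nn as nn

class TransitionDown(nn.Module):
    """Halves the spatial resolution after a dense block:
    BN -> ReLU -> 1x1 conv -> dropout -> 2x2 max pooling."""
    def __init__(self, channels, p_drop=0.2):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.Dropout2d(p_drop),
            nn.MaxPool2d(kernel_size=2),
        )

    def forward(self, x):
        return self.block(x)
```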

To restore the input resolution, fully convolutional networks (FCNs) typically employ upsampling operations such as transposed convolutions and unpooling. In FC-DenseNet, the upsampling path replaces plain convolutions with dense blocks, and the upsampling operation (a transposed convolution) is termed a transition up.

To avoid memory problems, the transposed convolution is applied only to the feature maps produced by the most recent dense block, not to all of the feature maps concatenated so far.

This works because the last dense block already aggregates information from all of its preceding layers at that resolution, so upsampling only its output loses little, and the skip connections still carry the higher-resolution information from the downsampling path. The exact connectivity follows the dense connection formula from the paper, x_l = H_l([x_{l-1}, x_{l-2}, ..., x_0]).

This upsampling strategy makes it possible to build very deep FC-DenseNets without a feature-map explosion. The alternative, as in U-Net and FCN-like architectures, would be to keep upsampling the full concatenation with successive transposed convolutions and skip connections.
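To make this concrete, here is a sketch of a transition up under my reading of the paper: a 3×3 transposed convolution with stride 2 upsamples only the preceding dense block's output, and the result is concatenated with the skip connection (spatial sizes are assumed to match; the class name is mine):

```python
import torch
import torch.nn as nn

class TransitionUp(nn.Module):
    """Upsamples ONLY the feature maps from the last dense block with a
    3x3 stride-2 transposed convolution, then concatenates the skip
    connection from the downsampling path. The full concatenation is
    never upsampled, which keeps memory bounded in deep networks."""
    def __init__(self, channels):
        super().__init__()
        self.upconv = nn.ConvTranspose2d(channels, channels, kernel_size=3,
                                         stride=2, padding=1, output_padding=1)

    def forward(self, block_output, skip):
        up = self.upconv(block_output)       # doubles H and W for even sizes
        return torch.cat([up, skip], dim=1)  # input to the next dense block
```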

The model is trained by minimizing the pixel-wise cross-entropy loss.

The full FC-DenseNet architecture is as follows:
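The figure is not reproduced here. From my recollection of the paper's architecture table, the deepest variant (FC-DenseNet103) is configured roughly as below; treat the numbers as my reading of the paper, not a verbatim copy:

```python
# FC-DenseNet103 layout (my recollection of the paper's Table 2):
fc_densenet103 = {
    "first_conv_channels": 48,         # 3x3 conv before the first dense block
    "growth_rate": 16,                 # feature maps added per dense layer
    "down_blocks": [4, 5, 7, 10, 12],  # layers per dense block, each followed by a transition down
    "bottleneck_layers": 15,           # dense block at the lowest resolution
    "up_blocks": [12, 10, 7, 5, 4],    # each preceded by a transition up
    # a final 1x1 conv maps to per-pixel class scores, followed by softmax
}
```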

Training Details

The pixel-wise cross-entropy loss is defined as

L = -(1/N) · Σ_i Σ_c y_(i,c) log ŷ_(i,c)

where N is the number of labeled pixels, y_(i,c) is the one-hot ground truth for pixel i and class c, and ŷ_(i,c) is the predicted softmax probability.
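In PyTorch this is simply `nn.CrossEntropyLoss` applied to per-pixel logits; a minimal sketch with the usual segmentation shapes (the 11-class count matches CamVid, otherwise the tensors are dummies):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # averages the loss over all pixels by default

logits = torch.randn(3, 11, 224, 224, requires_grad=True)  # (batch, classes, H, W)
targets = torch.randint(0, 11, (3, 224, 224))              # (batch, H, W) class indices

loss = criterion(logits, targets)  # pixel-wise cross-entropy
loss.backward()
```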

Along with a few training tricks (a PyTorch sketch of this setup follows the list):

1. Initialize the model with HeUniform.

2. Train with RMSprop: initial learning rate 1e-3, decayed by 0.995 after each epoch.

3. Data augmentation: random crops and vertical flips.

4. Fine-tune the model on full-size images with learning rate 1e-4.

5. Use the validation set to early-stop both training and fine-tuning.
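A sketch of this training setup in PyTorch, assuming a placeholder model and a hypothetical training loop body; the patience value is my own choice:

```python
import torch
import torch.nn as nn

# Placeholder network standing in for FC-DenseNet; the training tricks are the point.
model = nn.Sequential(nn.Conv2d(3, 11, kernel_size=3, padding=1))

def he_uniform_init(m):
    """Trick 1: HeUniform initialization (kaiming uniform in PyTorch terms)."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(he_uniform_init)

# Trick 2: RMSprop with initial lr 1e-3, multiplied by 0.995 after each epoch.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.995)

best_val, patience, bad_epochs = float("inf"), 50, 0
for epoch in range(1000):
    # ... one epoch of training on random crops with flips (trick 3) ...
    scheduler.step()

    val_loss = 0.0  # ... evaluate on the validation set here ...
    # Trick 5: early stopping on the validation metric.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
# Trick 4: afterwards, repeat the loop on full-size images with lr 1e-4.
```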

Results

CamVid contains 367 training frames, 101 validation frames, and 233 test frames, at a resolution of 360×480. FC-DenseNet is trained on random 224×224 crops with batch size 3, then fine-tuned on full-size images before being evaluated on the test set. [TODO: how exactly to implement this in PyTorch?]
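On the TODO: one way to do joint crop-and-flip augmentation for an image and its mask in PyTorch is torchvision's functional API. A sketch, with the crop size from the paper and a helper name of my own:

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms import RandomCrop

def augment(image, mask, crop_size=(224, 224)):
    """Apply the SAME random crop and vertical flip to an image and its mask."""
    i, j, h, w = RandomCrop.get_params(image, output_size=crop_size)
    image, mask = TF.crop(image, i, j, h, w), TF.crop(mask, i, j, h, w)
    if random.random() < 0.5:  # flip image and mask together
        image, mask = TF.vflip(image), TF.vflip(mask)
    return image, mask
```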

Results are also reported on the Gatech database.

Summary & Extensions

The central idea of DenseNets is the dense block, which iteratively concatenates feature maps. The authors design an upsampling path that mitigates the linear growth in the number of feature maps that a naive extension of DenseNets would cause.

The resulting network is very deep yet has roughly 10× fewer parameters than comparable state-of-the-art models, and it requires no post-processing, pretraining, or temporal data.
