[Paper Notes] DeconvNet for Semantic Segmentation

Noh et al., 2015, "Learning Deconvolution Network for Semantic Segmentation" (cited 4488 times)

Background and introduction

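The encoder uses the convolutional layers of a pretrained VGG-16 for feature extraction. The decoder upsamples with deconvolution (transposed convolution) and unpooling operations to recover the detail of the original image.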

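Region proposals (generated with Edge Boxes) are fed into the trained network. Each proposal is processed independently, and the per-proposal results are combined into the segmentation of the whole image. This eases the difficulty caused by objects of very different sizes and improves on existing FCN-based methods.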

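This approach has a drawback: relying on a separate proposal step makes the pipeline less automated. For that reason I would advise against it.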
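The proposal-wise inference described above can be sketched as follows. This is only an illustration: the function name `aggregate_proposals`, the `(top, left, height, width)` box format, and the choice of a pixel-wise maximum as the combination rule are assumptions made here, and the score maps are assumed to be non-negative (for example softmax probabilities) and already resized to their boxes.

```python
import torch


def aggregate_proposals(image_hw, boxes, score_maps):
    """Paste per-proposal class-score maps back onto the full image.

    image_hw:   (H, W) of the full image
    boxes:      list of (top, left, height, width) proposal boxes
    score_maps: list of tensors shaped (num_classes, height, width),
                one per box, with non-negative scores
    """
    num_classes = score_maps[0].shape[0]
    canvas = torch.zeros(num_classes, *image_hw)
    for (top, left, h, w), scores in zip(boxes, score_maps):
        region = canvas[:, top:top + h, left:left + w]
        # Pixel-wise maximum: keep the strongest response seen so far.
        canvas[:, top:top + h, left:left + w] = torch.maximum(region, scores)
    return canvas


# Two overlapping 2x2 proposals on a 3x3 image, one class.
boxes = [(0, 0, 2, 2), (1, 1, 2, 2)]
score_maps = [torch.full((1, 2, 2), 0.5), torch.full((1, 2, 2), 0.75)]
canvas = aggregate_proposals((3, 3), boxes, score_maps)
```

In the overlap the larger score wins, and pixels that no proposal covers stay at zero.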

Critical limitations of FCN

(1) The receptive field has a predefined, fixed size, so large objects can be segmented incorrectly. When the receptive field cannot cover the whole object, the network sees only part of it, and label prediction is done with only local information. Regions that belong to the same object may then receive different, inconsistent labels, so one object is split into several pieces: as the figure shows, multiple small regions of a single large object are labeled as separate objects. Because every pixel receives its own label, some of these small regions may still be labeled correctly.

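(2) Small objects are easily misclassified as background. The receptive field is large relative to such objects, so it is dominated by the surrounding context rather than by the object itself. The very small people in the figure are an example: FCN tends to treat them as background texture and fails to recognize them.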

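After repeated downsampling the feature map is coarse: its resolution and spatial size shrink sharply, and object detail is easily lost along the way. Feeding this coarse map into a deconvolution layer for upsampling still cannot recover a fine segmentation.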
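To put rough numbers on the fixed receptive field and the downsampling discussed above, here is a small calculation. It assumes the backbone follows the VGG-16 convolutional layout (3x3 stride-1 convolutions, 2x2 stride-2 max pooling) and stops at the last pooling layer, ignoring any fully connected layers; the names `VGG16_LAYERS` and `receptive_field` are made up for this sketch.

```python
# (kernel_size, stride) for every layer of the VGG-16 convolutional stack.
CONV, POOL = (3, 1), (2, 2)
VGG16_LAYERS = [CONV] * 2 + [POOL] + [CONV] * 2 + [POOL] + ([CONV] * 3 + [POOL]) * 3


def receptive_field(layers):
    """Return (receptive field size, cumulative stride), both in input pixels."""
    size, jump = 1, 1
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # each layer widens the field by (k - 1) steps
        jump *= stride               # spacing between neighbouring output units
    return size, jump


rf, stride = receptive_field(VGG16_LAYERS)
print(rf, stride)  # 212 32
```

Under these assumptions each unit after the fifth pooling layer sees a fixed 212 x 212 window, and neighbouring units are 32 input pixels apart, which is why objects much larger or much smaller than that window are handled poorly.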

DeconvNet model structure

Both the encoder and the decoder are modeled on VGG-16.

Encoder (called the convolution network): two "conv-conv-pool" blocks followed by three "conv-conv-conv-pool" blocks.

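Two fully connected layers sit in the middle. I do not think they are necessary. First, they make the parameter count grow sharply. Second, flattening a 2-D feature map into a 1-D vector adds nothing of real value, and the gain in model performance is not significant.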
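The parameter growth can be checked with simple arithmetic. The sketch below uses the layer sizes from the implementation later in this note (a 512 x 11 x 15 feature map and 4096 hidden units) and compares them with the 13 VGG-16 convolution layers; the helper names are hypothetical.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of a square 2-D convolution."""
    return in_channels * out_channels * kernel * kernel + out_channels


flat = 512 * 11 * 15  # flattened encoder output for a 352 x 480 input
fc_total = linear_params(flat, 4096) + linear_params(4096, flat)

# (in_channels, out_channels) of the 13 convolution layers in VGG-16.
vgg16_channels = [(3, 64), (64, 64), (64, 128), (128, 128),
                  (128, 256), (256, 256), (256, 256),
                  (256, 512), (512, 512), (512, 512),
                  (512, 512), (512, 512), (512, 512)]
conv_total = sum(conv_params(i, o) for i, o in vgg16_channels)

print(fc_total, conv_total)  # 692148736 14714688
```

With these sizes the two fully connected layers hold roughly 47 times as many parameters as the whole convolutional encoder.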

Decoder (called the deconvolution network): three "unpool-deconv-deconv-deconv" blocks followed by two "unpool-deconv-deconv" blocks, mirroring the encoder.
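A quick size calculation shows where the upsampling in the decoder comes from. It assumes 2x2 stride-2 unpooling and 3x3 stride-1 transposed convolutions with padding 1, as in the implementation later in this note, and uses the standard output-size formulas with dilation 1 and no output padding; the function names are made up for this sketch.

```python
def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def unpool_out(size, kernel=2, stride=2):
    """Default output length of max unpooling (no padding)."""
    return (size - 1) * stride + kernel


h, w = 11, 15  # encoder output for a 352 x 480 input
for num_deconvs in (3, 3, 3, 2, 2):
    h, w = unpool_out(h), unpool_out(w)  # unpooling doubles the size
    for _ in range(num_deconvs):
        # 3x3, stride 1, padding 1 keeps the size unchanged
        h = conv_transpose_out(h, 3, padding=1)
        w = conv_transpose_out(w, 3, padding=1)

print(h, w)  # 352 480
```

In this configuration all of the spatial upsampling is done by the unpooling layers, while the transposed convolutions densify the sparse unpooled maps without changing their size.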

A final softmax classification layer.
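As a minimal illustration of what the softmax layer does for segmentation, the snippet below turns a made-up score tensor of shape (batch, classes, height, width) into per-pixel class probabilities and labels. The values are invented for the example.

```python
import torch

# Scores for 3 classes on a 1 x 2 image: shape (1, 3, 1, 2).
scores = torch.tensor([[[[2.0, 0.0]],
                        [[0.0, 1.0]],
                        [[1.0, 3.0]]]])

probs = torch.softmax(scores, dim=1)  # normalize over the class dimension
labels = probs.argmax(dim=1)          # most likely class per pixel

print(labels)  # tensor([[[0, 2]]])
```

Each pixel's probabilities sum to one across classes, and the predicted label map has the same height and width as the score map.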

Role of pooling:

Pooling in a convolution network filters noisy activations from the lower layer by abstracting the activations in a receptive field with a single representative value.

However, pooling also weakens the network's ability to capture positional information in the input image: spatial information within a receptive field is lost during pooling.
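The lost positions can be recorded during pooling and reused when upsampling, which is what unpooling does. Below is a small PyTorch demonstration with `nn.MaxPool2d(return_indices=True)` and `nn.MaxUnpool2d`; the 4 x 4 input is made up for the example.

```python
import torch
from torch import nn

x = torch.tensor([[[[1., 2., 0., 0.],
                    [3., 4., 0., 5.],
                    [0., 0., 6., 0.],
                    [7., 0., 0., 0.]]]])

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

pooled, indices = pool(x)  # indices are flat positions inside each 4 x 4 plane
restored = unpool(pooled, indices, output_size=x.size())

print(pooled)    # the maximum of each 2 x 2 window
print(indices)   # where each maximum came from
print(restored)  # maxima put back in place, zeros elsewhere
```

The restored map is sparse: only the maxima return to their original positions, so the layers that follow unpooling are needed to fill in the rest.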

Code implementation

Parameters and output sizes of each layer (the shape comments in the code assume a 3 x 352 x 480 input):

    import torch
    import torchvision.models as models
    from torch import nn


    def decoder(input_channel, output_channel, num=3):
        """Stack `num` (2 or 3) size-preserving 3x3 transposed convolutions."""
        if num == 3:
            decoder_body = nn.Sequential(
                nn.ConvTranspose2d(input_channel, input_channel, 3, padding=1),
                nn.ConvTranspose2d(input_channel, input_channel, 3, padding=1),
                nn.ConvTranspose2d(input_channel, output_channel, 3, padding=1))
        elif num == 2:
            decoder_body = nn.Sequential(
                nn.ConvTranspose2d(input_channel, input_channel, 3, padding=1),
                nn.ConvTranspose2d(input_channel, output_channel, 3, padding=1))
        else:
            raise ValueError("num must be 2 or 3")
        return decoder_body


    class VGG16_deconv(torch.nn.Module):
        def __init__(self, num_classes=12):
            super(VGG16_deconv, self).__init__()

            # pretrained=False avoids a weight download (newer torchvision
            # releases spell this weights=None). Load ImageNet weights instead
            # to get the pretrained encoder described in the notes.
            features = models.vgg16(pretrained=False).features

            # Indices of the five max-pooling layers in VGG-16's `features`.
            # return_indices=True makes each pool also return the argmax
            # positions that the matching unpooling layer needs.
            pool_list = [4, 9, 16, 23, 30]
            for index in pool_list:
                features[index].return_indices = True

            self.encoder1 = features[:4]
            self.pool1 = features[4]

            self.encoder2 = features[5:9]
            self.pool2 = features[9]

            self.encoder3 = features[10:16]
            self.pool3 = features[16]

            self.encoder4 = features[17:23]
            self.pool4 = features[23]

            self.encoder5 = features[24:30]
            self.pool5 = features[30]

            # Fully connected bottleneck. The 11 x 15 size is hard-coded, so
            # the input must be 352 x 480. These two layers alone hold about
            # 692M parameters (roughly 2.8 GB in float32).
            self.classifier = nn.Sequential(
                torch.nn.Linear(512 * 11 * 15, 4096),
                torch.nn.ReLU(),
                torch.nn.Linear(4096, 512 * 11 * 15),
                torch.nn.ReLU(),
            )

            self.decoder5 = decoder(512, 512)
            self.unpool5 = nn.MaxUnpool2d(2, 2)

            self.decoder4 = decoder(512, 256)
            self.unpool4 = nn.MaxUnpool2d(2, 2)

            self.decoder3 = decoder(256, 128)
            self.unpool3 = nn.MaxUnpool2d(2, 2)

            self.decoder2 = decoder(128, 64, 2)
            self.unpool2 = nn.MaxUnpool2d(2, 2)

            # The original listing uses 12 output classes.
            self.decoder1 = decoder(64, num_classes, 2)
            self.unpool1 = nn.MaxUnpool2d(2, 2)

        def forward(self, x):                       # 3, 352, 480
            encoder1 = self.encoder1(x)             # 64, 352, 480
            output_size1 = encoder1.size()          # 64, 352, 480
            pool1, indices1 = self.pool1(encoder1)  # 64, 176, 240

            encoder2 = self.encoder2(pool1)         # 128, 176, 240
            output_size2 = encoder2.size()          # 128, 176, 240
            pool2, indices2 = self.pool2(encoder2)  # 128, 88, 120

            encoder3 = self.encoder3(pool2)         # 256, 88, 120
            output_size3 = encoder3.size()          # 256, 88, 120
            pool3, indices3 = self.pool3(encoder3)  # 256, 44, 60

            encoder4 = self.encoder4(pool3)         # 512, 44, 60
            output_size4 = encoder4.size()          # 512, 44, 60
            pool4, indices4 = self.pool4(encoder4)  # 512, 22, 30

            encoder5 = self.encoder5(pool4)         # 512, 22, 30
            output_size5 = encoder5.size()          # 512, 22, 30
            pool5, indices5 = self.pool5(encoder5)  # 512, 11, 15

            # Flatten, pass through the bottleneck, and restore the 4-D shape.
            # Using the actual batch size instead of a hard-coded 1.
            pool5 = pool5.view(pool5.size(0), -1)
            fc = self.classifier(pool5)
            fc = fc.view(x.size(0), 512, 11, 15)

            unpool5 = self.unpool5(input=fc, indices=indices5, output_size=output_size5)        # 512, 22, 30
            decoder5 = self.decoder5(unpool5)   # 512, 22, 30

            unpool4 = self.unpool4(input=decoder5, indices=indices4, output_size=output_size4)  # 512, 44, 60
            decoder4 = self.decoder4(unpool4)   # 256, 44, 60

            unpool3 = self.unpool3(input=decoder4, indices=indices3, output_size=output_size3)  # 256, 88, 120
            decoder3 = self.decoder3(unpool3)   # 128, 88, 120

            unpool2 = self.unpool2(input=decoder3, indices=indices2, output_size=output_size2)  # 128, 176, 240
            decoder2 = self.decoder2(unpool2)   # 64, 176, 240

            unpool1 = self.unpool1(input=decoder2, indices=indices1, output_size=output_size1)  # 64, 352, 480
            decoder1 = self.decoder1(unpool1)   # 12, 352, 480

            # Raw per-pixel class scores; softmax is left to the loss function.
            return decoder1


    if __name__ == "__main__":
        rgb = torch.randn(1, 3, 352, 480)
        net = VGG16_deconv()
        out = net(rgb)
        print(out.shape)  # torch.Size([1, 12, 352, 480])
