
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition


Introduction

Most convolutional neural networks (CNNs) use small kernels (typically 3×3) by default, because increasing the kernel size causes a steep rise in the number of parameters and in computational cost. To compensate for the fact that small kernels cannot cover a large region of the input, CNNs stack chains of small-kernel convolutions interleaved with downsampling layers, progressively reducing the spatial size of the input and enlarging the network's receptive field. In practice this design runs into two problems. First, although many current CNNs have a theoretical receptive field that covers a large part of the input, or even all of it, it has been shown [19] that the effective receptive field is considerably smaller, by a factor of more than 2.7 in the higher layers of the network. Second, downsampling the input without enough context (especially in complex scenes such as the one in Figure 1) can significantly hurt both the learning process and the recognition performance of the network.

Within a single image, the same object class can appear at very different scales (Figure 1).

To address these challenges, this work makes the following main contributions:

  1. We propose pyramidal convolution (PyConv), which contains several levels of kernels of varying size and depth. Besides enlarging the receptive field, PyConv processes the input in parallel with these different kernel types to capture details at multiple scales, while keeping a computational cost and parameter count similar to those of standard convolution.
  2. We propose two network architectures for image classification built on PyConv, which improve recognition accuracy over the corresponding baselines while remaining comparable, and in some configurations favorable, in parameter count and computational cost.
  3. We propose a new framework for semantic segmentation whose core component is a novel head for parsing the backbone feature maps, allowing the model to recognize objects effectively at both local and global scale under a limited computational budget.

Related Work

Among the many approaches to image recognition, the family of residual networks (ResNets) is one of the most influential and widely used. By introducing skip connections, these architectures ease the training of deep models and serve as strong backbones for complex tasks such as object detection and instance segmentation. In this work we use ResNet as our baseline and build on it when constructing architectures of different depths.

ResNeXt improves recognition by aggregating grouped convolutions inside the building blocks of the network. Our design also relies on grouped convolution, but applies it with different group sizes at the different levels of the kernel pyramid to keep feature extraction efficient. Other works [17, 22] introduce squeeze-and-excitation and non-local blocks to capture channel and spatial relationships. While such auxiliary blocks can boost performance, they have to be inserted throughout a standard CNN, and they therefore add a noticeable number of parameters and extra computation.

In the challenging field of semantic segmentation, PSPNet [23] performs remarkably well: it adds a pyramid pooling module (PPM) on top of the backbone to extract multi-level details from the scene. The work in [24] explores a different design on top of the backbone, the more elaborate atrous spatial pyramid pooling (ASPP) head. In contrast to these approaches, we propose a novel head for parsing the backbone feature maps that captures not only local spatial relationships but also, through multi-level aggregation, the global context of the input.

Pyramidal Convolution

A minimal PyTorch implementation of PyConv:

import torch
from torch import nn


def ConvBNReLU(in_channels, out_channels, kernel_size, stride, groups=1):
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                  stride=stride, padding=kernel_size // 2, groups=groups),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True)
    )


class PyConv(nn.Module):

    def __init__(self, in_channels, out_channels, kernel_sizes, groups, stride=1):
        super(PyConv, self).__init__()
        # One list entry per pyramid level: output maps, kernel sizes and groups must match up.
        assert len(out_channels) == len(kernel_sizes) == len(groups)

        self.pyconv_list = nn.ModuleList()
        for i in range(len(kernel_sizes)):
            self.pyconv_list.append(ConvBNReLU(in_channels=in_channels, out_channels=out_channels[i],
                                               kernel_size=kernel_sizes[i], stride=stride, groups=groups[i]))

    def forward(self, x):
        # Every level sees the same input; the level outputs are concatenated over channels.
        outputs = []
        for pyconv in self.pyconv_list:
            outputs.append(pyconv(x))
        return torch.cat(outputs, 1)


if __name__ == '__main__':
    input = torch.randn(1, 64, 64, 64)
    model = PyConv(64, [16, 16, 8, 24], [3, 5, 7, 9], groups=[4, 4, 2, 8])
    output = model(input)
    print(output.shape)  # torch.Size([1, 64, 64, 64])

A standard convolution has a single kernel size and a single depth. The spatial size of the kernel determines the receptive field, and with it how much spatial context the kernel captures: larger kernels see more of the input. The depth of the kernel equals the number of input feature maps it spans. Both the number of parameters and the FLOPs of a standard convolution grow quadratically with the kernel size, which is what makes large kernels expensive.
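For reference, the usual counts for a standard convolution with K×K kernels over FM_i input feature maps, producing FM_o output feature maps of spatial size W×H (bias terms ignored), are:

    parameters = K² · FM_i · FM_o
    FLOPs ≈ K² · FM_i · FM_o · W · H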

PyConv, illustrated in Figure 2(b), is a new convolution operation that contains a pyramid of n levels of kernels of different sizes. Its goal is to process the input at several kernel scales without increasing the computational cost or the model complexity (in terms of parameters). The spatial size of the kernels grows from the bottom of the pyramid (level 1, the smallest kernels) to the top (level n, the largest). At the same time, the depth of the kernels decreases as their spatial size increases: the kernels at level 1 are the deepest, the kernels at level n the shallowest.

To use kernels of different depths at the different levels, PyConv relies on grouped convolution: the input feature maps are split into groups, and the kernels of a level are applied independently within each group. The more groups a level uses, the smaller the depth of each of its kernels, which is what keeps the larger kernels affordable.
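A quick check in PyTorch makes the depth reduction concrete (the settings here, a single 9×9 level with 16 groups, are chosen only for illustration):

import torch.nn as nn

# One pyramid level: 64 input maps, 16 output maps, 9x9 kernels, split into 16 groups.
# Each kernel spans only 64 / 16 = 4 input maps, so its depth is 4 instead of 64.
level = nn.Conv2d(64, 16, kernel_size=9, padding=4, groups=16, bias=False)
print(level.weight.shape)  # torch.Size([16, 4, 9, 9])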

Main advantages of PyConv

Multi-scale processing.

Beyond enlarging the receptive field at no extra cost compared with standard convolution, PyConv applies, in parallel, kernels of different spatial sizes and depths, so the input is parsed at multiple scales and richer information is captured. The kernel types form a double pyramid: as the spatial size of the kernels increases, their depth (connectivity) decreases, and vice versa. This trade-off gives PyConv a wide variety of kernel configurations to learn from, ranging from high connectivity with a small receptive field to low connectivity with a large receptive field, whose outputs are complementary. The branches with small receptive fields focus on fine details and can capture small objects and object parts, while the branches with larger kernels extract more reliable information about larger objects and their context.

Efficiency

Compared with standard convolution, PyConv maintains a similar number of parameters and a similar computational cost (the parameter counts below illustrate this). Moreover, since the pyramid levels are independent of each other, they can be computed in parallel, so PyConv is also highly parallelizable.
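As a rough sanity check, here is a minimal parameter count comparing a standard 3×3 convolution with a four-level PyConv using the first-stage settings of PyConvResNet (from the listing below); the exact numbers depend on the chosen groups and output splits:

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# Standard 3x3 convolution: 64 -> 64 feature maps.
std = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)

# Four-level PyConv: kernels 3/5/7/9, 16 output maps per level, groups 1/4/8/16.
pyconv = nn.ModuleList([
    nn.Conv2d(64, 16, k, padding=k // 2, groups=g, bias=False)
    for k, g in zip([3, 5, 7, 9], [1, 4, 8, 16])])

print(n_params(std))     # 36864
print(n_params(pyconv))  # 27072 -- comparable, despite kernels up to 9x9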

Flexibility.

PyConv also opens the door to a great variety of network architectures, without increasing the computational cost. The number of pyramid levels, the kernel size and depth at each level, and the number of output feature maps per level can all be chosen independently. Different PyConv settings can be used at different layers of a network: for instance, a layer can devote more output maps to small receptive fields that focus on local detail, or to large receptive fields that capture context, depending on the task. Furthermore, the settings can follow the resolution of the feature maps: at high resolution the network can afford many pyramid levels with large kernels, and as the resolution decreases the number of levels can be reduced accordingly (this is exactly what the PyConvResNet code below does, going from four levels at 56×56 down to a single 3×3 level at 7×7). This makes PyConv-based models easy to adapt to a wide range of visual recognition tasks; a sketch of two such settings follows.
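For illustration, two such stage settings expressed with the minimal PyConv class from the first code block (the channel splits are hypothetical, chosen only to show the range of options):

# High-resolution stage: four levels, kernels up to 9x9.
high_res = PyConv(64, [16, 16, 16, 16], kernel_sizes=[3, 5, 7, 9], groups=[1, 4, 8, 16])

# Low-resolution stage: a single 3x3 level (a 9x9 kernel is of little use on a 7x7 map).
low_res = PyConv(512, [512], kernel_sizes=[3], groups=[1])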

PyConv Networks for Image Classification

For image classification, our PyConv network architecture starts from the residual bottleneck building block of [7]. Figure 4 shows how the block is instantiated in the first stage of the network. A 1×1 convolution first reduces the input to 64 feature maps; then the proposed PyConv is applied with four levels of kernels (9×9, 7×7, 5×5 and 3×3), each level producing 16 feature maps, for a total of 64; finally, another 1×1 convolution brings the feature maps to the block's output width. Batch normalization [6] and a ReLU activation [25] follow each convolution, and a shortcut connection adds the block's input to its output, as in the original residual design (see PyConvBlock in the listing below).
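As a concrete instantiation, here is a sketch using the PyConvBlock class from the listing below; the shapes match the first-stage block just described:

import torch

block = PyConvBlock(256, 64, pyconv_kernels=[3, 5, 7, 9], pyconv_groups=[1, 4, 8, 16])
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)  # torch.Size([1, 256, 56, 56])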

Because the network has kernels of several sizes available at each PyConv level, downsampling of the feature maps can be performed directly by the (strided) PyConv layers themselves, with the larger kernels still covering a reasonable area of the input.

""" PyConv networks for image recognition as presented in our paper:
    Duta et al. "Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition"
    https://arxiv.org/pdf/2006.11538.pdf
"""
import torch
import torch.nn as nn
import os
from div.download_from_url import download_from_url  # helper shipped with the official repository

try:
    from torch.hub import _get_torch_home
    torch_cache_home = _get_torch_home()
except ImportError:
    torch_cache_home = os.path.expanduser(
        os.getenv('TORCH_HOME', os.path.join(
            os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch')))
default_cache_path = os.path.join(torch_cache_home, 'pretrained')

__all__ = ['PyConvResNet', 'pyconvresnet18', 'pyconvresnet34', 'pyconvresnet50', 'pyconvresnet101', 'pyconvresnet152']


model_urls = {
    'pyconvresnet50': 'https://drive.google.com/uc?export=download&id=128iMzBnHQSPNehgb8nUF5cJyKBIB7do5',
    'pyconvresnet101': 'https://drive.google.com/uc?export=download&id=1fn0eKdtGG7HA30O5SJ1XrmGR_FsQxTb1',
    'pyconvresnet152': 'https://drive.google.com/uc?export=download&id=1zR6HOTaHB0t15n6Nh12adX86AhBMo46m',
}


class PyConv2d(nn.Module):
    """PyConv2d with padding (general case). Applies a 2D PyConv over an input signal composed of several input planes.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (list): Number of channels for each pyramid level produced by the convolution
        pyconv_kernels (list): Spatial size of the kernel for each pyramid level
        pyconv_groups (list): Number of blocked connections from input channels to output channels for each pyramid level
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``False``

    Example::
        >>> # PyConv with two pyramid levels, kernels: 3x3, 5x5
        >>> m = PyConv2d(in_channels=64, out_channels=[32, 32], pyconv_kernels=[3, 5], pyconv_groups=[1, 4])
        >>> input = torch.randn(4, 64, 56, 56)
        >>> output = m(input)

        >>> # PyConv with three pyramid levels, kernels: 3x3, 5x5, 7x7
        >>> m = PyConv2d(in_channels=64, out_channels=[16, 16, 32], pyconv_kernels=[3, 5, 7], pyconv_groups=[1, 4, 8])
        >>> input = torch.randn(4, 64, 56, 56)
        >>> output = m(input)
    """
    def __init__(self, in_channels, out_channels, pyconv_kernels, pyconv_groups, stride=1, dilation=1, bias=False):
        super(PyConv2d, self).__init__()

        assert len(out_channels) == len(pyconv_kernels) == len(pyconv_groups)

        self.pyconv_levels = [None] * len(pyconv_kernels)
        for i in range(len(pyconv_kernels)):
            self.pyconv_levels[i] = nn.Conv2d(in_channels, out_channels[i], kernel_size=pyconv_kernels[i],
                                              stride=stride, padding=pyconv_kernels[i] // 2,
                                              groups=pyconv_groups[i], dilation=dilation, bias=bias)
        self.pyconv_levels = nn.ModuleList(self.pyconv_levels)

    def forward(self, x):
        out = []
        for level in self.pyconv_levels:
            out.append(level(x))

        return torch.cat(out, 1)


def conv(in_planes, out_planes, kernel_size=3, stride=1, padding=1, dilation=1, groups=1):
    """standard convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
                     padding=padding, dilation=dilation, groups=groups, bias=False)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class PyConv4(nn.Module):

    def __init__(self, inplans, planes, pyconv_kernels=[3, 5, 7, 9], stride=1, pyconv_groups=[1, 4, 8, 16]):
        super(PyConv4, self).__init__()
        self.conv2_1 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[0], padding=pyconv_kernels[0] // 2,
                            stride=stride, groups=pyconv_groups[0])
        self.conv2_2 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[1], padding=pyconv_kernels[1] // 2,
                            stride=stride, groups=pyconv_groups[1])
        self.conv2_3 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[2], padding=pyconv_kernels[2] // 2,
                            stride=stride, groups=pyconv_groups[2])
        self.conv2_4 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[3], padding=pyconv_kernels[3] // 2,
                            stride=stride, groups=pyconv_groups[3])

    def forward(self, x):
        return torch.cat((self.conv2_1(x), self.conv2_2(x), self.conv2_3(x), self.conv2_4(x)), dim=1)


class PyConv3(nn.Module):

    def __init__(self, inplans, planes, pyconv_kernels=[3, 5, 7], stride=1, pyconv_groups=[1, 4, 8]):
        super(PyConv3, self).__init__()
        self.conv2_1 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[0], padding=pyconv_kernels[0] // 2,
                            stride=stride, groups=pyconv_groups[0])
        self.conv2_2 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[1], padding=pyconv_kernels[1] // 2,
                            stride=stride, groups=pyconv_groups[1])
        self.conv2_3 = conv(inplans, planes // 2, kernel_size=pyconv_kernels[2], padding=pyconv_kernels[2] // 2,
                            stride=stride, groups=pyconv_groups[2])

    def forward(self, x):
        return torch.cat((self.conv2_1(x), self.conv2_2(x), self.conv2_3(x)), dim=1)


class PyConv2(nn.Module):

    def __init__(self, inplans, planes, pyconv_kernels=[3, 5], stride=1, pyconv_groups=[1, 4]):
        super(PyConv2, self).__init__()
        self.conv2_1 = conv(inplans, planes // 2, kernel_size=pyconv_kernels[0], padding=pyconv_kernels[0] // 2,
                            stride=stride, groups=pyconv_groups[0])
        self.conv2_2 = conv(inplans, planes // 2, kernel_size=pyconv_kernels[1], padding=pyconv_kernels[1] // 2,
                            stride=stride, groups=pyconv_groups[1])

    def forward(self, x):
        return torch.cat((self.conv2_1(x), self.conv2_2(x)), dim=1)


def get_pyconv(inplans, planes, pyconv_kernels, stride=1, pyconv_groups=[1]):
    if len(pyconv_kernels) == 1:
        return conv(inplans, planes, kernel_size=pyconv_kernels[0], stride=stride, groups=pyconv_groups[0])
    elif len(pyconv_kernels) == 2:
        return PyConv2(inplans, planes, pyconv_kernels=pyconv_kernels, stride=stride, pyconv_groups=pyconv_groups)
    elif len(pyconv_kernels) == 3:
        return PyConv3(inplans, planes, pyconv_kernels=pyconv_kernels, stride=stride, pyconv_groups=pyconv_groups)
    elif len(pyconv_kernels) == 4:
        return PyConv4(inplans, planes, pyconv_kernels=pyconv_kernels, stride=stride, pyconv_groups=pyconv_groups)


class PyConvBlock(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None, pyconv_groups=1, pyconv_kernels=1):
        super(PyConvBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = norm_layer(planes)
        self.conv2 = get_pyconv(planes, planes, pyconv_kernels=pyconv_kernels, stride=stride,
                                pyconv_groups=pyconv_groups)
        self.bn2 = norm_layer(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class PyConvBasicBlock1(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None, pyconv_groups=1, pyconv_kernels=1):
        super(PyConvBasicBlock1, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = get_pyconv(inplanes, planes, pyconv_kernels=pyconv_kernels, stride=stride,
                                pyconv_groups=pyconv_groups)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = get_pyconv(planes, planes, pyconv_kernels=pyconv_kernels, stride=1,
                                pyconv_groups=pyconv_groups)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class PyConvBasicBlock2(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None, pyconv_groups=1, pyconv_kernels=1):
        super(PyConvBasicBlock2, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = get_pyconv(inplanes, planes, pyconv_kernels=pyconv_kernels, stride=stride,
                                pyconv_groups=pyconv_groups)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv1x1(planes, planes * self.expansion)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class PyConvResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False, norm_layer=None, dropout_prob0=0.0):
        super(PyConvResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        self.inplanes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = norm_layer(64)
        self.relu = nn.ReLU(inplace=True)

        # Note the decreasing number of pyramid levels as the spatial resolution drops.
        self.layer1 = self._make_layer(block, 64, layers[0], stride=2, norm_layer=norm_layer,
                                       pyconv_kernels=[3, 5, 7, 9], pyconv_groups=[1, 4, 8, 16])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, norm_layer=norm_layer,
                                       pyconv_kernels=[3, 5, 7], pyconv_groups=[1, 4, 8])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, norm_layer=norm_layer,
                                       pyconv_kernels=[3, 5], pyconv_groups=[1, 4])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, norm_layer=norm_layer,
                                       pyconv_kernels=[3], pyconv_groups=[1])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

        if dropout_prob0 > 0.0:
            self.dp = nn.Dropout(dropout_prob0, inplace=True)
            print("Using Dropout with the prob to set to 0 of: ", dropout_prob0)
        else:
            self.dp = None

        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, PyConvBlock):
                    nn.init.constant_(m.bn3.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, norm_layer=None, pyconv_kernels=[3], pyconv_groups=[1]):
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        downsample = None
        if stride != 1 and self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.MaxPool2d(kernel_size=3, stride=stride, padding=1),
                conv1x1(self.inplanes, planes * block.expansion),
                norm_layer(planes * block.expansion),
            )
        elif self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion),
                norm_layer(planes * block.expansion),
            )
        elif stride != 1:
            downsample = nn.MaxPool2d(kernel_size=3, stride=stride, padding=1)

        layers = []
        layers.append(block(self.inplanes, planes, stride=stride, downsample=downsample, norm_layer=norm_layer,
                            pyconv_kernels=pyconv_kernels, pyconv_groups=pyconv_groups))
        self.inplanes = planes * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, norm_layer=norm_layer,
                                pyconv_kernels=pyconv_kernels, pyconv_groups=pyconv_groups))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)

        if self.dp is not None:
            x = self.dp(x)

        x = self.fc(x)

        return x


def pyconvresnet18(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-18 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    # model = PyConvResNet(PyConvBasicBlock1, [2, 2, 2, 2], **kwargs)  # params=11.21M GFLOPs 1.55
    model = PyConvResNet(PyConvBasicBlock2, [2, 2, 2, 2], **kwargs)  # params=5.91M GFLOPs 0.88
    if pretrained:
        raise NotImplementedError("Not available the pretrained model yet!")

    return model


def pyconvresnet34(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-34 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    # model = PyConvResNet(PyConvBasicBlock1, [3, 4, 6, 3], **kwargs)  # params=20.44M GFLOPs 3.09
    model = PyConvResNet(PyConvBasicBlock2, [3, 4, 6, 3], **kwargs)  # params=11.09M GFLOPs 1.75
    if pretrained:
        raise NotImplementedError("Not available the pretrained model yet!")

    return model


def pyconvresnet50(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-50 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = PyConvResNet(PyConvBlock, [3, 4, 6, 3], **kwargs)
    if pretrained:
        os.makedirs(default_cache_path, exist_ok=True)
        model.load_state_dict(torch.load(download_from_url(model_urls['pyconvresnet50'],
                                                           root=default_cache_path)))
    return model


def pyconvresnet101(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-101 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = PyConvResNet(PyConvBlock, [3, 4, 23, 3], **kwargs)
    if pretrained:
        os.makedirs(default_cache_path, exist_ok=True)
        model.load_state_dict(torch.load(download_from_url(model_urls['pyconvresnet101'],
                                                           root=default_cache_path)))
    return model


def pyconvresnet152(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-152 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = PyConvResNet(PyConvBlock, [3, 8, 36, 3], **kwargs)
    if pretrained:
        os.makedirs(default_cache_path, exist_ok=True)
        model.load_state_dict(torch.load(download_from_url(model_urls['pyconvresnet152'],
                                                           root=default_cache_path)))
    return model
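A quick usage sketch (assuming the listing above is saved as pyconvresnet.py; the module name is only for illustration):

import torch
from pyconvresnet import pyconvresnet50

model = pyconvresnet50()  # randomly initialized; pretrained=True downloads the released weights
model.eval()
with torch.no_grad():
    out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 1000])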

PyConv Networks for Semantic Segmentation

Figure 5 presents our architecture for scene parsing (semantic segmentation). Building an effective parsing pipeline requires a head that can properly exploit the feature maps of the backbone: it must focus on fine local details while also accounting for the interactions between distant parts of the scene. To this end we propose a novel head, the PyConv parsing head (PyConvPH), which considers both local and global information at multiple kernel scales.

PyConvPH contains three main components:

The local PyConv block (LocalPyConv) is mainly responsible for smaller objects, extracting fine local details at multiple scales; it can also be seen as a local multi-scale context aggregator. Figure 6(a) details its components. LocalPyConv takes the output feature maps of the backbone (2048 for a ResNet-50 backbone) and applies a 1×1 convolution to reduce them to 512; it then runs a four-level PyConv (9×9, 7×7, 5×5 and 3×3 kernels) to capture different local details at multiple scales, using grouped convolution (G groups) at each level for parameter efficiency; finally, a 1×1 convolution merges the information coming from the different kernel sizes and depths. As usual, every convolution is followed by batch normalization [6] and a ReLU activation.

The global PyConv block (GlobalPyConv) is responsible for collecting global, scene-level information and for handling larger objects; it is a multi-scale global aggregator. Figure 6(b) shows its components. Since the input image size can vary, we first bring the feature maps to a fixed maximum spatial size of 9 through adaptive average pooling (9×9 for a square input), which reduces the computational cost while keeping a reasonable spatial resolution. A 1×1 convolution then reduces the number of feature maps to 512. As in LocalPyConv, a four-level PyConv follows; because the spatial size is now only 9×9, its kernels cover up to the entire input, providing a truly global view (see Figure 5). A further 1×1 convolution fuses the information across scales, and finally bilinear interpolation upsamples the feature maps back to the spatial size they had before the adaptive pooling.
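In isolation, the fixed-size trick of GlobalPyConv looks like this (a sketch; the full block is GlobalPyConvBlock in the listing below):

import torch
import torch.nn.functional as F
from torch import nn

x = torch.randn(1, 512, 60, 60)      # reduced backbone features, variable spatial size
pooled = nn.AdaptiveAvgPool2d(9)(x)  # always 9x9, whatever the input size
# ... the four-level PyConv and the 1x1 fusion run here, at 9x9 ...
up = F.interpolate(pooled, size=x.shape[2:], mode='bilinear', align_corners=True)
print(pooled.shape, up.shape)        # [1, 512, 9, 9] [1, 512, 60, 60]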

The merge block first concatenates the output feature maps of LocalPyConv and GlobalPyConv and then applies a single-level PyConv (a 3×3 convolution) to the resulting 1024 maps. Since the previous blocks have already aggregated the contextual information, this stage focuses on merging it in preparation for the final classification. To produce the output, the framework upsamples the feature maps (with bilinear interpolation) back to the spatial size of the input image and applies a 1×1 convolution that maps them to the number of classes, as shown in Figure 5. The ability to parse the image at multiple kernel scales, both locally and globally, is what drives the framework; in the paper's experiments it compares favorably with state-of-the-art alternatives.

""" PyConv network for semantic segmentation as presented in our paper:
    Duta et al. "Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition"
    https://arxiv.org/pdf/2006.11538.pdf
"""
import torch
from torch import nn
import torch.nn.functional as F

from model.build_backbone_layers import build_backbone_layers  # helper from the official repository


class PyConv2d(nn.Module):
    """PyConv2d with padding (general case). Applies a 2D PyConv over an input signal composed of several input planes.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (list): Number of channels for each pyramid level produced by the convolution
        pyconv_kernels (list): Spatial size of the kernel for each pyramid level
        pyconv_groups (list): Number of blocked connections from input channels to output channels for each pyramid level
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``False``

    Example::
        >>> # PyConv with two pyramid levels, kernels: 3x3, 5x5
        >>> m = PyConv2d(in_channels=64, out_channels=[32, 32], pyconv_kernels=[3, 5], pyconv_groups=[1, 4])
        >>> input = torch.randn(4, 64, 56, 56)
        >>> output = m(input)

        >>> # PyConv with three pyramid levels, kernels: 3x3, 5x5, 7x7
        >>> m = PyConv2d(in_channels=64, out_channels=[16, 16, 32], pyconv_kernels=[3, 5, 7], pyconv_groups=[1, 4, 8])
        >>> input = torch.randn(4, 64, 56, 56)
        >>> output = m(input)
    """
    def __init__(self, in_channels, out_channels, pyconv_kernels, pyconv_groups, stride=1, dilation=1, bias=False):
        super(PyConv2d, self).__init__()

        assert len(out_channels) == len(pyconv_kernels) == len(pyconv_groups)

        self.pyconv_levels = [None] * len(pyconv_kernels)
        for i in range(len(pyconv_kernels)):
            self.pyconv_levels[i] = nn.Conv2d(in_channels, out_channels[i], kernel_size=pyconv_kernels[i],
                                              stride=stride, padding=pyconv_kernels[i] // 2,
                                              groups=pyconv_groups[i], dilation=dilation, bias=bias)
        self.pyconv_levels = nn.ModuleList(self.pyconv_levels)

    def forward(self, x):
        out = []
        for level in self.pyconv_levels:
            out.append(level(x))

        return torch.cat(out, 1)


class PyConv4(nn.Module):

    def __init__(self, inplans, planes, pyconv_kernels=[3, 5, 7, 9], stride=1, pyconv_groups=[1, 4, 8, 16]):
        super(PyConv4, self).__init__()

        self.conv2_1 = nn.Conv2d(inplans, planes // 4, kernel_size=pyconv_kernels[0], stride=stride,
                                 padding=pyconv_kernels[0] // 2, dilation=1, groups=pyconv_groups[0], bias=False)
        self.conv2_2 = nn.Conv2d(inplans, planes // 4, kernel_size=pyconv_kernels[1], stride=stride,
                                 padding=pyconv_kernels[1] // 2, dilation=1, groups=pyconv_groups[1], bias=False)
        self.conv2_3 = nn.Conv2d(inplans, planes // 4, kernel_size=pyconv_kernels[2], stride=stride,
                                 padding=pyconv_kernels[2] // 2, dilation=1, groups=pyconv_groups[2], bias=False)
        self.conv2_4 = nn.Conv2d(inplans, planes // 4, kernel_size=pyconv_kernels[3], stride=stride,
                                 padding=pyconv_kernels[3] // 2, dilation=1, groups=pyconv_groups[3], bias=False)

    def forward(self, x):
        return torch.cat((self.conv2_1(x), self.conv2_2(x), self.conv2_3(x), self.conv2_4(x)), dim=1)


class GlobalPyConvBlock(nn.Module):
    def __init__(self, in_dim, reduction_dim, bins, BatchNorm):
        super(GlobalPyConvBlock, self).__init__()
        self.features = nn.Sequential(
            # Bring the feature maps to a fixed bins x bins size (9x9 in PyConvHead).
            nn.AdaptiveAvgPool2d(bins),
            nn.Conv2d(in_dim, reduction_dim, kernel_size=1, bias=False),
            BatchNorm(reduction_dim),
            nn.ReLU(inplace=True),
            PyConv4(reduction_dim, reduction_dim),
            BatchNorm(reduction_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduction_dim, reduction_dim, kernel_size=1, bias=False),
            BatchNorm(reduction_dim),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x_size = x.size()
        # Upsample the globally aggregated features back to the input resolution.
        x = F.interpolate(self.features(x), x_size[2:], mode='bilinear', align_corners=True)
        return x


class LocalPyConvBlock(nn.Module):
    def __init__(self, inplanes, planes, BatchNorm, reduction1=4):
        super(LocalPyConvBlock, self).__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(inplanes, inplanes // reduction1, kernel_size=1, bias=False),
            BatchNorm(inplanes // reduction1),
            nn.ReLU(inplace=True),
            PyConv4(inplanes // reduction1, inplanes // reduction1),
            BatchNorm(inplanes // reduction1),
            nn.ReLU(inplace=True),
            nn.Conv2d(inplanes // reduction1, planes, kernel_size=1, bias=False),
            BatchNorm(planes),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.layers(x)


class MergeLocalGlobal(nn.Module):
    def __init__(self, inplanes, planes, BatchNorm):
        super(MergeLocalGlobal, self).__init__()

        self.features = nn.Sequential(
            nn.Conv2d(inplanes, planes, kernel_size=3, padding=1, groups=1, bias=False),
            BatchNorm(planes),
            nn.ReLU(inplace=True)
        )

    def forward(self, local_context, global_context):
        x = torch.cat((local_context, global_context), dim=1)
        x = self.features(x)
        return x


class PyConvHead(nn.Module):
    def __init__(self, inplanes, planes, BatchNorm):
        super(PyConvHead, self).__init__()

        out_size_local_context = 512
        out_size_global_context = 512

        self.local_context = LocalPyConvBlock(inplanes, out_size_local_context, BatchNorm, reduction1=4)
        self.global_context = GlobalPyConvBlock(inplanes, out_size_global_context, 9, BatchNorm)

        self.merge_context = MergeLocalGlobal(out_size_local_context + out_size_global_context, planes, BatchNorm)

    def forward(self, x):
        x = self.merge_context(self.local_context(x), self.global_context(x))
        return x


class PyConvSegNet(nn.Module):
    def __init__(self, layers=50, dropout=0.1, classes=2, zoom_factor=8,
                 criterion=nn.CrossEntropyLoss(ignore_index=255), BatchNorm=nn.BatchNorm2d, pretrained=True,
                 backbone_output_stride=16, backbone_net='resnet'):
        super(PyConvSegNet, self).__init__()
        assert layers in [50, 101, 152, 200]
        assert classes > 1
        assert zoom_factor in [1, 2, 4, 8]
        self.zoom_factor = zoom_factor
        self.criterion = criterion
        self.layer0, self.layer1, self.layer2, self.layer3, self.layer4 = build_backbone_layers(
            backbone_net, layers, pretrained,
            backbone_output_stride=backbone_output_stride, convert_bn=BatchNorm)
        backbone_output_maps = 2048
        out_merge_all = 256
        self.pyconvhead = PyConvHead(backbone_output_maps, out_merge_all, BatchNorm)

        # Auxiliary classifier on the stage-3 features (used only for the auxiliary loss during training).
        self.aux = nn.Sequential(
            nn.Conv2d(1024, 256, kernel_size=3, padding=1, bias=False),
            BatchNorm(256),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p=dropout),
            nn.Conv2d(256, classes, kernel_size=1)
        )

        self.cls = nn.Sequential(
            nn.Dropout2d(p=dropout),
            nn.Conv2d(out_merge_all, classes, kernel_size=1)
        )

    def forward(self, x, y=None):
        x_size = x.size()
        assert (x_size[2] - 1) % 8 == 0 and (x_size[3] - 1) % 8 == 0
        h = int((x_size[2] - 1) / 8 * self.zoom_factor + 1)
        w = int((x_size[3] - 1) / 8 * self.zoom_factor + 1)

        x = self.layer0(x)
        x = self.layer1(x)
        x = self.layer2(x)
        out_stage3 = self.layer3(x)
        x = self.layer4(out_stage3)

        x = self.pyconvhead(x)

        x = self.cls(x)

        if self.zoom_factor != 1:
            x = F.interpolate(x, size=(h, w), mode='bilinear', align_corners=True)

        if self.training:
            main_loss = self.criterion(x, y)

            aux = self.aux(out_stage3)
            if self.zoom_factor != 1:
                aux = F.interpolate(aux, size=(h, w), mode='bilinear', align_corners=True)
            # The auxiliary loss is computed in all cases (in the original listing its
            # indentation accidentally skipped it when zoom_factor == 1).
            aux_loss = self.criterion(aux, y)

            return x.max(1)[1], main_loss, aux_loss

        else:
            return x


if __name__ == '__main__':
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = '0, 1'
    size_input = 473  # 817 # 577 # 473 # 713
    input = torch.rand(1, 3, size_input, size_input)  # .cuda()
    model = PyConvSegNet(layers=50, dropout=0.1, classes=150, zoom_factor=8,
                         pretrained=False, backbone_output_stride=8, backbone_net='resnet')  # .cuda()
    model.eval()
    print(model)
    output = model(input)
    print('PyConvSegNet', output.size())

References:

Duta et al., "Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition", arXiv:2006.11538, https://arxiv.org/pdf/2006.11538.pdf

The full Python implementation, on which the listings above are based, is provided in the authors' official repository.
