
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition


Introduction

Most convolutional neural networks (CNNs) use small kernels (typically 3×3) by default, because increasing the kernel size causes a steep rise in the number of parameters and in computational cost. To compensate for the fact that small kernels cannot cover a large region of the input, CNNs stack chains of small-kernel convolutions interleaved with downsampling layers, progressively reducing the spatial size of the input and enlarging the network's receptive field. In practice this design runs into two problems. First, although many current CNNs have a theoretical receptive field that covers a large part of the input, or even all of it, it has been shown [19] that the effective receptive field is considerably smaller, by a factor of more than 2.7 in the higher layers of the network. Second, downsampling the input without enough context (especially in complex scenes such as the one in Figure 1) can significantly hurt both the learning process and the recognition performance of the network.

Within a single image, the same object class can appear at very different scales (Figure 1).

To address these challenges, this work makes the following main contributions:

  1. We propose pyramidal convolution (PyConv), which contains several levels of kernels of varying size and depth. Besides enlarging the receptive field, PyConv processes the input in parallel with these different kernel types to capture details at multiple scales, while keeping a computational cost and parameter count similar to those of standard convolution.
  2. We propose two network architectures for image classification built on PyConv, which improve recognition accuracy over the corresponding baselines while remaining comparable, and in some configurations favorable, in parameter count and computational cost.
  3. We propose a new framework for semantic segmentation whose core component is a novel head for parsing the backbone feature maps, allowing the model to recognize objects effectively at both local and global scale under a limited computational budget.

Related Work

Among the many approaches to image recognition, the family of residual networks (ResNets) is one of the most influential and widely used. By introducing skip connections, these architectures ease the training of deep models and serve as strong backbones for complex tasks such as object detection and instance segmentation. In this work we use ResNet as our baseline and build on it when constructing architectures of different depths.

ResNeXt improves recognition by aggregating grouped convolutions inside the building blocks of the network. Our design also relies on grouped convolution, but applies it with different group sizes at the different levels of the kernel pyramid to keep feature extraction efficient. Other works [17, 22] introduce squeeze-and-excitation and non-local blocks to capture channel and spatial relationships. While such auxiliary blocks can boost performance, they have to be inserted throughout a standard CNN, and they therefore add a noticeable number of parameters and extra computation.

In the challenging field of semantic segmentation, PSPNet [23] performs remarkably well: it adds a pyramid pooling module (PPM) on top of the backbone to extract multi-level details from the scene. The work in [24] explores a different design on top of the backbone, the more elaborate atrous spatial pyramid pooling (ASPP) head. In contrast to these approaches, we propose a novel head for parsing the backbone feature maps that captures not only local spatial relationships but also, through multi-level aggregation, the global context of the input.

Pyramidal Convolution

A minimal PyTorch implementation of PyConv:

import torch
from torch import nn


def ConvBNReLU(in_channels, out_channels, kernel_size, stride, groups=1):
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                  stride=stride, padding=kernel_size // 2, groups=groups),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True)
    )


class PyConv(nn.Module):

    def __init__(self, in_channels, out_channels, kernel_sizes, groups, stride=1):
        super(PyConv, self).__init__()
        # One list entry per pyramid level: output maps, kernel sizes and groups must match up.
        assert len(out_channels) == len(kernel_sizes) == len(groups)

        self.pyconv_list = nn.ModuleList()
        for i in range(len(kernel_sizes)):
            self.pyconv_list.append(ConvBNReLU(in_channels=in_channels, out_channels=out_channels[i],
                                               kernel_size=kernel_sizes[i], stride=stride, groups=groups[i]))

    def forward(self, x):
        # Every level sees the same input; the level outputs are concatenated over channels.
        outputs = []
        for pyconv in self.pyconv_list:
            outputs.append(pyconv(x))
        return torch.cat(outputs, 1)


if __name__ == '__main__':
    input = torch.randn(1, 64, 64, 64)
    model = PyConv(64, [16, 16, 8, 24], [3, 5, 7, 9], groups=[4, 4, 2, 8])
    output = model(input)
    print(output.shape)  # torch.Size([1, 64, 64, 64])

A standard convolution has a single kernel size and a single depth. The spatial size of the kernel determines the receptive field, and with it how much spatial context the kernel captures: larger kernels see more of the input. The depth of the kernel equals the number of input feature maps it spans. Both the number of parameters and the FLOPs of a standard convolution grow quadratically with the kernel size, which is what makes large kernels expensive.
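For reference, the usual counts for a standard convolution with K×K kernels over FM_i input feature maps, producing FM_o output feature maps of spatial size W×H (bias terms ignored), are:

    parameters = K² · FM_i · FM_o
    FLOPs ≈ K² · FM_i · FM_o · W · H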

PyConv, illustrated in Figure 2(b), is a new convolution operation that contains a pyramid of n levels of kernels of different sizes. Its goal is to process the input at several kernel scales without increasing the computational cost or the model complexity (in terms of parameters). The spatial size of the kernels grows from the bottom of the pyramid (level 1, the smallest kernels) to the top (level n, the largest). At the same time, the depth of the kernels decreases as their spatial size increases: the kernels at level 1 are the deepest, the kernels at level n the shallowest.

To use kernels of different depths at the different levels, PyConv relies on grouped convolution: the input feature maps are split into groups, and the kernels of a level are applied independently within each group. The more groups a level uses, the smaller the depth of each of its kernels, which is what keeps the larger kernels affordable.
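A quick check in PyTorch makes the depth reduction concrete (the settings here, a single 9×9 level with 16 groups, are chosen only for illustration):

import torch.nn as nn

# One pyramid level: 64 input maps, 16 output maps, 9x9 kernels, split into 16 groups.
# Each kernel spans only 64 / 16 = 4 input maps, so its depth is 4 instead of 64.
level = nn.Conv2d(64, 16, kernel_size=9, padding=4, groups=16, bias=False)
print(level.weight.shape)  # torch.Size([16, 4, 9, 9])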

Main advantages of PyConv

Multi-scale processing.

Beyond enlarging the receptive field at no extra cost compared with standard convolution, PyConv applies, in parallel, kernels of different spatial sizes and depths, so the input is parsed at multiple scales and richer information is captured. The kernel types form a double pyramid: as the spatial size of the kernels increases, their depth (connectivity) decreases, and vice versa. This trade-off gives PyConv a wide variety of kernel configurations to learn from, ranging from high connectivity with a small receptive field to low connectivity with a large receptive field, whose outputs are complementary. The branches with small receptive fields focus on fine details and can capture small objects and object parts, while the branches with larger kernels extract more reliable information about larger objects and their context.

Efficiency

Compared with standard convolution, PyConv maintains a similar number of parameters and a similar computational cost (the parameter counts below illustrate this). Moreover, since the pyramid levels are independent of each other, they can be computed in parallel, so PyConv is also highly parallelizable.
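As a rough sanity check, here is a minimal parameter count comparing a standard 3×3 convolution with a four-level PyConv using the first-stage settings of PyConvResNet (from the listing below); the exact numbers depend on the chosen groups and output splits:

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# Standard 3x3 convolution: 64 -> 64 feature maps.
std = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)

# Four-level PyConv: kernels 3/5/7/9, 16 output maps per level, groups 1/4/8/16.
pyconv = nn.ModuleList([
    nn.Conv2d(64, 16, k, padding=k // 2, groups=g, bias=False)
    for k, g in zip([3, 5, 7, 9], [1, 4, 8, 16])])

print(n_params(std))     # 36864
print(n_params(pyconv))  # 27072 -- comparable, despite kernels up to 9x9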

Flexibility.

PyConv also opens the door to a great variety of network architectures, without increasing the computational cost. The number of pyramid levels, the kernel size and depth at each level, and the number of output feature maps per level can all be chosen independently. Different PyConv settings can be used at different layers of a network: for instance, a layer can devote more output maps to small receptive fields that focus on local detail, or to large receptive fields that capture context, depending on the task. Furthermore, the settings can follow the resolution of the feature maps: at high resolution the network can afford many pyramid levels with large kernels, and as the resolution decreases the number of levels can be reduced accordingly (this is exactly what the PyConvResNet code below does, going from four levels at 56×56 down to a single 3×3 level at 7×7). This makes PyConv-based models easy to adapt to a wide range of visual recognition tasks; a sketch of two such settings follows.
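For illustration, two such stage settings expressed with the minimal PyConv class from the first code block (the channel splits are hypothetical, chosen only to show the range of options):

# High-resolution stage: four levels, kernels up to 9x9.
high_res = PyConv(64, [16, 16, 16, 16], kernel_sizes=[3, 5, 7, 9], groups=[1, 4, 8, 16])

# Low-resolution stage: a single 3x3 level (a 9x9 kernel is of little use on a 7x7 map).
low_res = PyConv(512, [512], kernel_sizes=[3], groups=[1])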

PyConv Networks for Image Classification

For image classification, our PyConv network architecture starts from the residual bottleneck building block of [7]. Figure 4 shows how the block is instantiated in the first stage of the network. A 1×1 convolution first reduces the input to 64 feature maps; then the proposed PyConv is applied with four levels of kernels (9×9, 7×7, 5×5 and 3×3), each level producing 16 feature maps, for a total of 64; finally, another 1×1 convolution brings the feature maps to the block's output width. Batch normalization [6] and a ReLU activation [25] follow each convolution, and a shortcut connection adds the block's input to its output, as in the original residual design (see PyConvBlock in the listing below).
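As a concrete instantiation, here is a sketch using the PyConvBlock class from the listing below; the shapes match the first-stage block just described:

import torch

block = PyConvBlock(256, 64, pyconv_kernels=[3, 5, 7, 9], pyconv_groups=[1, 4, 8, 16])
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)  # torch.Size([1, 256, 56, 56])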

Because the network has kernels of several sizes available at each PyConv level, downsampling of the feature maps can be performed directly by the (strided) PyConv layers themselves, with the larger kernels still covering a reasonable area of the input.

""" PyConv networks for image recognition as presented in our paper:
    Duta et al. "Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition"
    https://arxiv.org/pdf/2006.11538.pdf
"""
import torch
import torch.nn as nn
import os
from div.download_from_url import download_from_url  # helper shipped with the official repository

try:
    from torch.hub import _get_torch_home
    torch_cache_home = _get_torch_home()
except ImportError:
    torch_cache_home = os.path.expanduser(
        os.getenv('TORCH_HOME', os.path.join(
            os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch')))
default_cache_path = os.path.join(torch_cache_home, 'pretrained')

__all__ = ['PyConvResNet', 'pyconvresnet18', 'pyconvresnet34', 'pyconvresnet50', 'pyconvresnet101', 'pyconvresnet152']


model_urls = {
    'pyconvresnet50': 'https://drive.google.com/uc?export=download&id=128iMzBnHQSPNehgb8nUF5cJyKBIB7do5',
    'pyconvresnet101': 'https://drive.google.com/uc?export=download&id=1fn0eKdtGG7HA30O5SJ1XrmGR_FsQxTb1',
    'pyconvresnet152': 'https://drive.google.com/uc?export=download&id=1zR6HOTaHB0t15n6Nh12adX86AhBMo46m',
}


class PyConv2d(nn.Module):
    """PyConv2d with padding (general case). Applies a 2D PyConv over an input signal composed of several input planes.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (list): Number of channels for each pyramid level produced by the convolution
        pyconv_kernels (list): Spatial size of the kernel for each pyramid level
        pyconv_groups (list): Number of blocked connections from input channels to output channels for each pyramid level
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``False``

    Example::
        >>> # PyConv with two pyramid levels, kernels: 3x3, 5x5
        >>> m = PyConv2d(in_channels=64, out_channels=[32, 32], pyconv_kernels=[3, 5], pyconv_groups=[1, 4])
        >>> input = torch.randn(4, 64, 56, 56)
        >>> output = m(input)

        >>> # PyConv with three pyramid levels, kernels: 3x3, 5x5, 7x7
        >>> m = PyConv2d(in_channels=64, out_channels=[16, 16, 32], pyconv_kernels=[3, 5, 7], pyconv_groups=[1, 4, 8])
        >>> input = torch.randn(4, 64, 56, 56)
        >>> output = m(input)
    """
    def __init__(self, in_channels, out_channels, pyconv_kernels, pyconv_groups, stride=1, dilation=1, bias=False):
        super(PyConv2d, self).__init__()

        assert len(out_channels) == len(pyconv_kernels) == len(pyconv_groups)

        self.pyconv_levels = [None] * len(pyconv_kernels)
        for i in range(len(pyconv_kernels)):
            self.pyconv_levels[i] = nn.Conv2d(in_channels, out_channels[i], kernel_size=pyconv_kernels[i],
                                              stride=stride, padding=pyconv_kernels[i] // 2,
                                              groups=pyconv_groups[i], dilation=dilation, bias=bias)
        self.pyconv_levels = nn.ModuleList(self.pyconv_levels)

    def forward(self, x):
        out = []
        for level in self.pyconv_levels:
            out.append(level(x))

        return torch.cat(out, 1)


def conv(in_planes, out_planes, kernel_size=3, stride=1, padding=1, dilation=1, groups=1):
    """standard convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
                     padding=padding, dilation=dilation, groups=groups, bias=False)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class PyConv4(nn.Module):

    def __init__(self, inplans, planes, pyconv_kernels=[3, 5, 7, 9], stride=1, pyconv_groups=[1, 4, 8, 16]):
        super(PyConv4, self).__init__()
        self.conv2_1 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[0], padding=pyconv_kernels[0] // 2,
                            stride=stride, groups=pyconv_groups[0])
        self.conv2_2 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[1], padding=pyconv_kernels[1] // 2,
                            stride=stride, groups=pyconv_groups[1])
        self.conv2_3 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[2], padding=pyconv_kernels[2] // 2,
                            stride=stride, groups=pyconv_groups[2])
        self.conv2_4 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[3], padding=pyconv_kernels[3] // 2,
                            stride=stride, groups=pyconv_groups[3])

    def forward(self, x):
        return torch.cat((self.conv2_1(x), self.conv2_2(x), self.conv2_3(x), self.conv2_4(x)), dim=1)


class PyConv3(nn.Module):

    def __init__(self, inplans, planes, pyconv_kernels=[3, 5, 7], stride=1, pyconv_groups=[1, 4, 8]):
        super(PyConv3, self).__init__()
        self.conv2_1 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[0], padding=pyconv_kernels[0] // 2,
                            stride=stride, groups=pyconv_groups[0])
        self.conv2_2 = conv(inplans, planes // 4, kernel_size=pyconv_kernels[1], padding=pyconv_kernels[1] // 2,
                            stride=stride, groups=pyconv_groups[1])
        self.conv2_3 = conv(inplans, planes // 2, kernel_size=pyconv_kernels[2], padding=pyconv_kernels[2] // 2,
                            stride=stride, groups=pyconv_groups[2])

    def forward(self, x):
        return torch.cat((self.conv2_1(x), self.conv2_2(x), self.conv2_3(x)), dim=1)


class PyConv2(nn.Module):

    def __init__(self, inplans, planes, pyconv_kernels=[3, 5], stride=1, pyconv_groups=[1, 4]):
        super(PyConv2, self).__init__()
        self.conv2_1 = conv(inplans, planes // 2, kernel_size=pyconv_kernels[0], padding=pyconv_kernels[0] // 2,
                            stride=stride, groups=pyconv_groups[0])
        self.conv2_2 = conv(inplans, planes // 2, kernel_size=pyconv_kernels[1], padding=pyconv_kernels[1] // 2,
                            stride=stride, groups=pyconv_groups[1])

    def forward(self, x):
        return torch.cat((self.conv2_1(x), self.conv2_2(x)), dim=1)


def get_pyconv(inplans, planes, pyconv_kernels, stride=1, pyconv_groups=[1]):
    if len(pyconv_kernels) == 1:
        return conv(inplans, planes, kernel_size=pyconv_kernels[0], stride=stride, groups=pyconv_groups[0])
    elif len(pyconv_kernels) == 2:
        return PyConv2(inplans, planes, pyconv_kernels=pyconv_kernels, stride=stride, pyconv_groups=pyconv_groups)
    elif len(pyconv_kernels) == 3:
        return PyConv3(inplans, planes, pyconv_kernels=pyconv_kernels, stride=stride, pyconv_groups=pyconv_groups)
    elif len(pyconv_kernels) == 4:
        return PyConv4(inplans, planes, pyconv_kernels=pyconv_kernels, stride=stride, pyconv_groups=pyconv_groups)


class PyConvBlock(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None, pyconv_groups=1, pyconv_kernels=1):
        super(PyConvBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = norm_layer(planes)
        self.conv2 = get_pyconv(planes, planes, pyconv_kernels=pyconv_kernels, stride=stride,
                                pyconv_groups=pyconv_groups)
        self.bn2 = norm_layer(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class PyConvBasicBlock1(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None, pyconv_groups=1, pyconv_kernels=1):
        super(PyConvBasicBlock1, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = get_pyconv(inplanes, planes, pyconv_kernels=pyconv_kernels, stride=stride,
                                pyconv_groups=pyconv_groups)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = get_pyconv(planes, planes, pyconv_kernels=pyconv_kernels, stride=1,
                                pyconv_groups=pyconv_groups)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class PyConvBasicBlock2(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, norm_layer=None, pyconv_groups=1, pyconv_kernels=1):
        super(PyConvBasicBlock2, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = get_pyconv(inplanes, planes, pyconv_kernels=pyconv_kernels, stride=stride,
                                pyconv_groups=pyconv_groups)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv1x1(planes, planes * self.expansion)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class PyConvResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False, norm_layer=None, dropout_prob0=0.0):
        super(PyConvResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        self.inplanes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = norm_layer(64)
        self.relu = nn.ReLU(inplace=True)

        # Note the decreasing number of pyramid levels as the spatial resolution drops.
        self.layer1 = self._make_layer(block, 64, layers[0], stride=2, norm_layer=norm_layer,
                                       pyconv_kernels=[3, 5, 7, 9], pyconv_groups=[1, 4, 8, 16])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, norm_layer=norm_layer,
                                       pyconv_kernels=[3, 5, 7], pyconv_groups=[1, 4, 8])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, norm_layer=norm_layer,
                                       pyconv_kernels=[3, 5], pyconv_groups=[1, 4])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, norm_layer=norm_layer,
                                       pyconv_kernels=[3], pyconv_groups=[1])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

        if dropout_prob0 > 0.0:
            self.dp = nn.Dropout(dropout_prob0, inplace=True)
            print("Using Dropout with the prob to set to 0 of: ", dropout_prob0)
        else:
            self.dp = None

        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, PyConvBlock):
                    nn.init.constant_(m.bn3.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, norm_layer=None, pyconv_kernels=[3], pyconv_groups=[1]):
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        downsample = None
        if stride != 1 and self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.MaxPool2d(kernel_size=3, stride=stride, padding=1),
                conv1x1(self.inplanes, planes * block.expansion),
                norm_layer(planes * block.expansion),
            )
        elif self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion),
                norm_layer(planes * block.expansion),
            )
        elif stride != 1:
            downsample = nn.MaxPool2d(kernel_size=3, stride=stride, padding=1)

        layers = []
        layers.append(block(self.inplanes, planes, stride=stride, downsample=downsample, norm_layer=norm_layer,
                            pyconv_kernels=pyconv_kernels, pyconv_groups=pyconv_groups))
        self.inplanes = planes * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, norm_layer=norm_layer,
                                pyconv_kernels=pyconv_kernels, pyconv_groups=pyconv_groups))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)

        if self.dp is not None:
            x = self.dp(x)

        x = self.fc(x)

        return x


def pyconvresnet18(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-18 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    # model = PyConvResNet(PyConvBasicBlock1, [2, 2, 2, 2], **kwargs)  # params=11.21M GFLOPs 1.55
    model = PyConvResNet(PyConvBasicBlock2, [2, 2, 2, 2], **kwargs)  # params=5.91M GFLOPs 0.88
    if pretrained:
        raise NotImplementedError("Not available the pretrained model yet!")

    return model


def pyconvresnet34(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-34 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    # model = PyConvResNet(PyConvBasicBlock1, [3, 4, 6, 3], **kwargs)  # params=20.44M GFLOPs 3.09
    model = PyConvResNet(PyConvBasicBlock2, [3, 4, 6, 3], **kwargs)  # params=11.09M GFLOPs 1.75
    if pretrained:
        raise NotImplementedError("Not available the pretrained model yet!")

    return model


def pyconvresnet50(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-50 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = PyConvResNet(PyConvBlock, [3, 4, 6, 3], **kwargs)
    if pretrained:
        os.makedirs(default_cache_path, exist_ok=True)
        model.load_state_dict(torch.load(download_from_url(model_urls['pyconvresnet50'],
                                                           root=default_cache_path)))
    return model


def pyconvresnet101(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-101 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = PyConvResNet(PyConvBlock, [3, 4, 23, 3], **kwargs)
    if pretrained:
        os.makedirs(default_cache_path, exist_ok=True)
        model.load_state_dict(torch.load(download_from_url(model_urls['pyconvresnet101'],
                                                           root=default_cache_path)))
    return model


def pyconvresnet152(pretrained=False, **kwargs):
    """Constructs a PyConvResNet-152 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = PyConvResNet(PyConvBlock, [3, 8, 36, 3], **kwargs)
    if pretrained:
        os.makedirs(default_cache_path, exist_ok=True)
        model.load_state_dict(torch.load(download_from_url(model_urls['pyconvresnet152'],
                                                           root=default_cache_path)))
    return model
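A quick usage sketch (assuming the listing above is saved as pyconvresnet.py; the module name is only for illustration):

import torch
from pyconvresnet import pyconvresnet50

model = pyconvresnet50()  # randomly initialized; pretrained=True downloads the released weights
model.eval()
with torch.no_grad():
    out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 1000])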

PyConv Networks for Semantic Segmentation

Figure 5 presents our architecture for scene parsing (semantic segmentation). Building an effective parsing pipeline requires a head that can properly exploit the feature maps of the backbone: it must focus on fine local details while also accounting for the interactions between distant parts of the scene. To this end we propose a novel head, the PyConv parsing head (PyConvPH), which considers both local and global information at multiple kernel scales.

PyConvPH contains three main components:

The local PyConv block (LocalPyConv) is mainly responsible for smaller objects, extracting fine local details at multiple scales; it can also be seen as a local multi-scale context aggregator. Figure 6(a) details its components. LocalPyConv takes the output feature maps of the backbone (2048 for a ResNet-50 backbone) and applies a 1×1 convolution to reduce them to 512; it then runs a four-level PyConv (9×9, 7×7, 5×5 and 3×3 kernels) to capture different local details at multiple scales, using grouped convolution (G groups) at each level for parameter efficiency; finally, a 1×1 convolution merges the information coming from the different kernel sizes and depths. As usual, every convolution is followed by batch normalization [6] and a ReLU activation.

The global PyConv block (GlobalPyConv) is responsible for collecting global, scene-level information and for handling larger objects; it is a multi-scale global aggregator. Figure 6(b) shows its components. Since the input image size can vary, we first bring the feature maps to a fixed maximum spatial size of 9 through adaptive average pooling (9×9 for a square input), which reduces the computational cost while keeping a reasonable spatial resolution. A 1×1 convolution then reduces the number of feature maps to 512. As in LocalPyConv, a four-level PyConv follows; because the spatial size is now only 9×9, its kernels cover up to the entire input, providing a truly global view (see Figure 5). A further 1×1 convolution fuses the information across scales, and finally bilinear interpolation upsamples the feature maps back to the spatial size they had before the adaptive pooling.
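In isolation, the fixed-size trick of GlobalPyConv looks like this (a sketch; the full block is GlobalPyConvBlock in the listing below):

import torch
import torch.nn.functional as F
from torch import nn

x = torch.randn(1, 512, 60, 60)      # reduced backbone features, variable spatial size
pooled = nn.AdaptiveAvgPool2d(9)(x)  # always 9x9, whatever the input size
# ... the four-level PyConv and the 1x1 fusion run here, at 9x9 ...
up = F.interpolate(pooled, size=x.shape[2:], mode='bilinear', align_corners=True)
print(pooled.shape, up.shape)        # [1, 512, 9, 9] [1, 512, 60, 60]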

The merge block first concatenates the output feature maps of LocalPyConv and GlobalPyConv and then applies a single-level PyConv (a 3×3 convolution) to the resulting 1024 maps. Since the previous blocks have already aggregated the contextual information, this stage focuses on merging it in preparation for the final classification. To produce the output, the framework upsamples the feature maps (with bilinear interpolation) back to the spatial size of the input image and applies a 1×1 convolution that maps them to the number of classes, as shown in Figure 5. The ability to parse the image at multiple kernel scales, both locally and globally, is what drives the framework; in the paper's experiments it compares favorably with state-of-the-art alternatives.

""" PyConv network for semantic segmentation as presented in our paper:
    Duta et al. "Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition"
    https://arxiv.org/pdf/2006.11538.pdf
"""
import torch
from torch import nn
import torch.nn.functional as F

from model.build_backbone_layers import build_backbone_layers  # helper from the official repository


class PyConv2d(nn.Module):
    """PyConv2d with padding (general case). Applies a 2D PyConv over an input signal composed of several input planes.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (list): Number of channels for each pyramid level produced by the convolution
        pyconv_kernels (list): Spatial size of the kernel for each pyramid level
        pyconv_groups (list): Number of blocked connections from input channels to output channels for each pyramid level
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``False``

    Example::
        >>> # PyConv with two pyramid levels, kernels: 3x3, 5x5
        >>> m = PyConv2d(in_channels=64, out_channels=[32, 32], pyconv_kernels=[3, 5], pyconv_groups=[1, 4])
        >>> input = torch.randn(4, 64, 56, 56)
        >>> output = m(input)

        >>> # PyConv with three pyramid levels, kernels: 3x3, 5x5, 7x7
        >>> m = PyConv2d(in_channels=64, out_channels=[16, 16, 32], pyconv_kernels=[3, 5, 7], pyconv_groups=[1, 4, 8])
        >>> input = torch.randn(4, 64, 56, 56)
        >>> output = m(input)
    """
    def __init__(self, in_channels, out_channels, pyconv_kernels, pyconv_groups, stride=1, dilation=1, bias=False):
        super(PyConv2d, self).__init__()

        assert len(out_channels) == len(pyconv_kernels) == len(pyconv_groups)

        self.pyconv_levels = [None] * len(pyconv_kernels)
        for i in range(len(pyconv_kernels)):
            self.pyconv_levels[i] = nn.Conv2d(in_channels, out_channels[i], kernel_size=pyconv_kernels[i],
                                              stride=stride, padding=pyconv_kernels[i] // 2,
                                              groups=pyconv_groups[i], dilation=dilation, bias=bias)
        self.pyconv_levels = nn.ModuleList(self.pyconv_levels)

    def forward(self, x):
        out = []
        for level in self.pyconv_levels:
            out.append(level(x))

        return torch.cat(out, 1)


class PyConv4(nn.Module):

    def __init__(self, inplans, planes, pyconv_kernels=[3, 5, 7, 9], stride=1, pyconv_groups=[1, 4, 8, 16]):
        super(PyConv4, self).__init__()

        self.conv2_1 = nn.Conv2d(inplans, planes // 4, kernel_size=pyconv_kernels[0], stride=stride,
                                 padding=pyconv_kernels[0] // 2, dilation=1, groups=pyconv_groups[0], bias=False)
        self.conv2_2 = nn.Conv2d(inplans, planes // 4, kernel_size=pyconv_kernels[1], stride=stride,
                                 padding=pyconv_kernels[1] // 2, dilation=1, groups=pyconv_groups[1], bias=False)
        self.conv2_3 = nn.Conv2d(inplans, planes // 4, kernel_size=pyconv_kernels[2], stride=stride,
                                 padding=pyconv_kernels[2] // 2, dilation=1, groups=pyconv_groups[2], bias=False)
        self.conv2_4 = nn.Conv2d(inplans, planes // 4, kernel_size=pyconv_kernels[3], stride=stride,
                                 padding=pyconv_kernels[3] // 2, dilation=1, groups=pyconv_groups[3], bias=False)

    def forward(self, x):
        return torch.cat((self.conv2_1(x), self.conv2_2(x), self.conv2_3(x), self.conv2_4(x)), dim=1)


class GlobalPyConvBlock(nn.Module):
    def __init__(self, in_dim, reduction_dim, bins, BatchNorm):
        super(GlobalPyConvBlock, self).__init__()
        self.features = nn.Sequential(
            # Bring the feature maps to a fixed bins x bins size (9x9 in PyConvHead).
            nn.AdaptiveAvgPool2d(bins),
            nn.Conv2d(in_dim, reduction_dim, kernel_size=1, bias=False),
            BatchNorm(reduction_dim),
            nn.ReLU(inplace=True),
            PyConv4(reduction_dim, reduction_dim),
            BatchNorm(reduction_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduction_dim, reduction_dim, kernel_size=1, bias=False),
            BatchNorm(reduction_dim),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x_size = x.size()
        # Upsample the globally aggregated features back to the input resolution.
        x = F.interpolate(self.features(x), x_size[2:], mode='bilinear', align_corners=True)
        return x


class LocalPyConvBlock(nn.Module):
    def __init__(self, inplanes, planes, BatchNorm, reduction1=4):
        super(LocalPyConvBlock, self).__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(inplanes, inplanes // reduction1, kernel_size=1, bias=False),
            BatchNorm(inplanes // reduction1),
            nn.ReLU(inplace=True),
            PyConv4(inplanes // reduction1, inplanes // reduction1),
            BatchNorm(inplanes // reduction1),
            nn.ReLU(inplace=True),
            nn.Conv2d(inplanes // reduction1, planes, kernel_size=1, bias=False),
            BatchNorm(planes),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.layers(x)


class MergeLocalGlobal(nn.Module):
    def __init__(self, inplanes, planes, BatchNorm):
        super(MergeLocalGlobal, self).__init__()

        self.features = nn.Sequential(
            nn.Conv2d(inplanes, planes, kernel_size=3, padding=1, groups=1, bias=False),
            BatchNorm(planes),
            nn.ReLU(inplace=True)
        )

    def forward(self, local_context, global_context):
        x = torch.cat((local_context, global_context), dim=1)
        x = self.features(x)
        return x


class PyConvHead(nn.Module):
    def __init__(self, inplanes, planes, BatchNorm):
        super(PyConvHead, self).__init__()

        out_size_local_context = 512
        out_size_global_context = 512

        self.local_context = LocalPyConvBlock(inplanes, out_size_local_context, BatchNorm, reduction1=4)
        self.global_context = GlobalPyConvBlock(inplanes, out_size_global_context, 9, BatchNorm)

        self.merge_context = MergeLocalGlobal(out_size_local_context + out_size_global_context, planes, BatchNorm)

    def forward(self, x):
        x = self.merge_context(self.local_context(x), self.global_context(x))
        return x


class PyConvSegNet(nn.Module):
    def __init__(self, layers=50, dropout=0.1, classes=2, zoom_factor=8,
                 criterion=nn.CrossEntropyLoss(ignore_index=255), BatchNorm=nn.BatchNorm2d, pretrained=True,
                 backbone_output_stride=16, backbone_net='resnet'):
        super(PyConvSegNet, self).__init__()
        assert layers in [50, 101, 152, 200]
        assert classes > 1
        assert zoom_factor in [1, 2, 4, 8]
        self.zoom_factor = zoom_factor
        self.criterion = criterion
        self.layer0, self.layer1, self.layer2, self.layer3, self.layer4 = build_backbone_layers(
            backbone_net, layers, pretrained,
            backbone_output_stride=backbone_output_stride, convert_bn=BatchNorm)
        backbone_output_maps = 2048
        out_merge_all = 256
        self.pyconvhead = PyConvHead(backbone_output_maps, out_merge_all, BatchNorm)

        # Auxiliary classifier on the stage-3 features (used only for the auxiliary loss during training).
        self.aux = nn.Sequential(
            nn.Conv2d(1024, 256, kernel_size=3, padding=1, bias=False),
            BatchNorm(256),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p=dropout),
            nn.Conv2d(256, classes, kernel_size=1)
        )

        self.cls = nn.Sequential(
            nn.Dropout2d(p=dropout),
            nn.Conv2d(out_merge_all, classes, kernel_size=1)
        )

    def forward(self, x, y=None):
        x_size = x.size()
        assert (x_size[2] - 1) % 8 == 0 and (x_size[3] - 1) % 8 == 0
        h = int((x_size[2] - 1) / 8 * self.zoom_factor + 1)
        w = int((x_size[3] - 1) / 8 * self.zoom_factor + 1)

        x = self.layer0(x)
        x = self.layer1(x)
        x = self.layer2(x)
        out_stage3 = self.layer3(x)
        x = self.layer4(out_stage3)

        x = self.pyconvhead(x)

        x = self.cls(x)

        if self.zoom_factor != 1:
            x = F.interpolate(x, size=(h, w), mode='bilinear', align_corners=True)

        if self.training:
            main_loss = self.criterion(x, y)

            aux = self.aux(out_stage3)
            if self.zoom_factor != 1:
                aux = F.interpolate(aux, size=(h, w), mode='bilinear', align_corners=True)
            # The auxiliary loss is computed in all cases (in the original listing its
            # indentation accidentally skipped it when zoom_factor == 1).
            aux_loss = self.criterion(aux, y)

            return x.max(1)[1], main_loss, aux_loss

        else:
            return x


if __name__ == '__main__':
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = '0, 1'
    size_input = 473  # 817 # 577 # 473 # 713
    input = torch.rand(1, 3, size_input, size_input)  # .cuda()
    model = PyConvSegNet(layers=50, dropout=0.1, classes=150, zoom_factor=8,
                         pretrained=False, backbone_output_stride=8, backbone_net='resnet')  # .cuda()
    model.eval()
    print(model)
    output = model(input)
    print('PyConvSegNet', output.size())

References:

Duta et al., "Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition", arXiv:2006.11538, https://arxiv.org/pdf/2006.11538.pdf

The full Python implementation, on which the listings above are based, is provided in the authors' official repository.
