
Deep Learning Paper: RepGhost: A Hardware-Efficient Ghost Module via Re-parameterization and Its PyTorch Implementation


1 Overview

RepGhostNet realizes feature reuse through a structural re-parameterization mechanism, turning GhostNet into a hardware-friendly design. Experiments on ImageNet classification and COCO detection baselines validate the accuracy and efficiency of the proposed approach.


2 RepGhostNet

2-1 Feature Reuse via Re-parameterization

GhostNet uses cheap operations to achieve feature reuse, but the Concat operation it relies on is not hardware-friendly. Although Concat has no parameters and costs no FLOPs, its runtime overhead is far from negligible (as shown in the figure below), and the runtime gap between Concat and Add widens as the batch size grows (see also the timing sketch after the figure).

[Figure: runtime of Concat vs. Add at increasing batch sizes]
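
The exact numbers depend on the hardware, but the effect is easy to reproduce. The following is a rough timing sketch of my own (not the paper's benchmark); the tensor shapes are arbitrary and the script runs on CPU:

    # Rough comparison of Concat vs. Add on equally shaped feature maps.
    # Both are FLOPs-free, yet Concat still has to materialize a larger tensor.
    import time
    import torch

    def bench(fn, a, b, iters=200):
        for _ in range(10):          # warm-up
            fn(a, b)
        start = time.perf_counter()
        for _ in range(iters):
            fn(a, b)
        return (time.perf_counter() - start) / iters * 1e3   # ms per call

    x1 = torch.randn(32, 64, 56, 56)
    x2 = torch.randn(32, 64, 56, 56)
    print('cat: %.3f ms' % bench(lambda a, b: torch.cat([a, b], dim=1), x1, x2))
    print('add: %.3f ms' % bench(lambda a, b: a + b, x1, x2))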

Re-parameterization vs. Concatenation

Concatenation-based feature reuse can be expressed as

[Equation figure from the paper]

and re-parameterization-based feature reuse can be expressed as

[Equation figure from the paper]

This motivates us to replace the Concatenation operation with Re-parameterization, so that feature fusion happens implicitly in weight space rather than explicitly in feature space.
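
The underlying observation can be checked numerically: concatenating two feature maps and applying a pointwise convolution is equivalent to summing the outputs of two convolutions whose weights are slices of the original kernel, i.e. the fusion moves from feature space into weight space. A minimal sketch of my own (not code from the paper):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    c = 8
    x1 = torch.randn(1, c, 16, 16)
    x2 = torch.randn(1, c, 16, 16)

    # One 1x1 convolution applied to the concatenated features
    w = torch.randn(16, 2 * c, 1, 1)
    b = torch.randn(16)
    y_cat = F.conv2d(torch.cat([x1, x2], dim=1), w, b)

    # The same result from two weight slices, fused by Add instead of Concat
    y_add = F.conv2d(x1, w[:, :c]) + F.conv2d(x2, w[:, c:], b)

    print(torch.allclose(y_cat, y_add, atol=1e-5))   # True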

2-2 RepGhost Module

[Figure: evolution from the Ghost module to the RepGhost module]

The figure above shows the evolution from the Ghost module to the RepGhost module, which involves three changes:

  • Replace Concat with Add to cut the feature-fusion overhead;
  • Move the ReLU activation after the fusion so that the parallel branches can be re-parameterized;
  • Add a shortcut branch with Batch Normalization, which brings extra non-linearity during training and can be fused into the convolution at inference.

PyTorch code:

    import numpy as np
    import torch
    import torch.nn as nn
    from mmcv.cnn import build_activation_layer  # builds an activation layer from a config dict


    class RepGhostModule(nn.Module):
        def __init__(self, inp, oup, kernel_size=1, dw_size=3, stride=1,
                     act_type=dict(type='ReLU', inplace=True), use_act=True,
                     deploy=False, reparam_bn=True, reparam_identity=False):
            super(RepGhostModule, self).__init__()
            init_channels = oup
            new_channels = oup
            self.deploy = deploy

            # Primary pointwise convolution producing the intrinsic features
            self.primary_conv = nn.Sequential(
                nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size // 2, bias=False),
                nn.BatchNorm2d(init_channels),
                build_activation_layer(act_type) if use_act else nn.Sequential(),
            )

            # Training-time fusion branches (identity + BN, and optionally a pure identity);
            # they are folded into the cheap operation when switching to deploy mode
            fusion_conv = []
            fusion_bn = []
            if not deploy and reparam_bn:
                fusion_conv.append(nn.Identity())
                fusion_bn.append(nn.BatchNorm2d(init_channels))
            if not deploy and reparam_identity:
                fusion_conv.append(nn.Identity())
                fusion_bn.append(nn.Identity())

            self.fusion_conv = nn.Sequential(*fusion_conv)
            self.fusion_bn = nn.Sequential(*fusion_bn)

            # Cheap operation: depth-wise convolution generating the "ghost" features
            self.cheap_operation = nn.Sequential(
                nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size // 2,
                          groups=init_channels, bias=self.deploy),
                nn.BatchNorm2d(new_channels) if not self.deploy else nn.Sequential(),
            )

            if self.deploy:
                self.cheap_operation = self.cheap_operation[0]

            # The activation is applied after the Add fusion (moved backward compared with GhostNet)
            self.relu = build_activation_layer(act_type) if use_act else nn.Sequential()

        def forward(self, x):
            x1 = self.primary_conv(x)
            x2 = self.cheap_operation(x1)
            # Feature reuse via Add instead of Concat
            for conv, bn in zip(self.fusion_conv, self.fusion_bn):
                x2 = x2 + bn(conv(x1))
            return self.relu(x2)

        def get_equivalent_kernel_bias(self):
            # Fuse the depth-wise conv with its BN, then fold every fusion branch into it
            kernel3x3, bias3x3 = self._fuse_bn_tensor(self.cheap_operation[0], self.cheap_operation[1])
            for conv, bn in zip(self.fusion_conv, self.fusion_bn):
                kernel, bias = self._fuse_bn_tensor(conv, bn, kernel3x3.shape[0], kernel3x3.device)
                kernel3x3 += self._pad_1x1_to_3x3_tensor(kernel)
                bias3x3 += bias
            return kernel3x3, bias3x3

        @staticmethod
        def _pad_1x1_to_3x3_tensor(kernel1x1):
            if kernel1x1 is None:
                return 0
            else:
                return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

        @staticmethod
        def _fuse_bn_tensor(conv, bn, in_channels=None, device=None):
            in_channels = in_channels if in_channels else bn.running_mean.shape[0]
            device = device if device else bn.weight.device
            if isinstance(conv, nn.Conv2d):
                kernel = conv.weight
                assert conv.bias is None
            else:
                # An Identity branch is equivalent to a depth-wise 1x1 conv with an all-ones kernel
                assert isinstance(conv, nn.Identity)
                kernel_value = np.zeros((in_channels, 1, 1, 1), dtype=np.float32)
                for i in range(in_channels):
                    kernel_value[i, 0, 0, 0] = 1
                kernel = torch.from_numpy(kernel_value).to(device)

            if isinstance(bn, nn.BatchNorm2d):
                running_mean = bn.running_mean
                running_var = bn.running_var
                gamma = bn.weight
                beta = bn.bias
                eps = bn.eps
                std = (running_var + eps).sqrt()
                t = (gamma / std).reshape(-1, 1, 1, 1)
                return kernel * t, beta - running_mean * gamma / std
            assert isinstance(bn, nn.Identity)
            return kernel, torch.zeros(in_channels).to(kernel.device)

        def switch_to_deploy(self):
            if len(self.fusion_conv) == 0 and len(self.fusion_bn) == 0:
                return
            kernel, bias = self.get_equivalent_kernel_bias()
            # Replace the training-time branches with a single depth-wise conv that carries a bias
            self.cheap_operation = nn.Conv2d(in_channels=self.cheap_operation[0].in_channels,
                                             out_channels=self.cheap_operation[0].out_channels,
                                             kernel_size=self.cheap_operation[0].kernel_size,
                                             padding=self.cheap_operation[0].padding,
                                             dilation=self.cheap_operation[0].dilation,
                                             groups=self.cheap_operation[0].groups,
                                             bias=True)
            self.cheap_operation.weight.data = kernel
            self.cheap_operation.bias.data = bias
            self.__delattr__('fusion_conv')
            self.__delattr__('fusion_bn')
            self.fusion_conv = []
            self.fusion_bn = []
            self.deploy = True
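
A quick way to verify the re-parameterization (a usage sketch of my own, not part of the original code): in eval mode the deployed module should reproduce the training-time output exactly, because BatchNorm then relies on its running statistics and can be folded into the convolution.

    # Sanity check: training-time module vs. its re-parameterized (deploy) form
    module = RepGhostModule(inp=16, oup=32)
    module.eval()                               # BN uses running statistics, so fusion is exact
    x = torch.randn(1, 16, 32, 32)
    y_train = module(x)
    module.switch_to_deploy()                   # fold fusion_conv/fusion_bn into cheap_operation
    y_deploy = module(x)
    print(torch.allclose(y_train, y_deploy, atol=1e-5))   # expected: True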

2-3 RepGhost Bottleneck

The changes in the RepGhost bottleneck mainly concern the channel dimensions: replacing Concat with Add changes the number of channels a module outputs, so the channel numbers of the intermediate layers are adjusted accordingly, while the input and output channels of the bottleneck remain unchanged (a toy illustration of this channel accounting follows the figure below).

[Figure: Ghost bottleneck vs. RepGhost bottleneck]
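
A rough illustration of the channel accounting with made-up numbers (not the paper's exact configuration):

    # Ghost module: the primary and cheap features are concatenated, doubling the width.
    # RepGhost module: the two are added, so the width stays that of a single branch,
    # and the bottleneck's middle channels are adjusted to compensate.
    mid = 120                        # desired width in the middle of the bottleneck
    ghost_branch = mid // 2          # Ghost: each branch emits mid/2, Concat -> mid
    repghost_branch = mid // 2       # RepGhost: Add keeps mid/2
    print(ghost_branch * 2, repghost_branch)   # 120 vs. 60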

2-4 RepGhostNet

The overall architecture of the proposed RepGhostNet is shown below.

[Figure: RepGhostNet architecture]

3 Experiments

Performance on the ImageNet and COCO datasets is shown below.

[Figures: results on ImageNet and COCO]
