【ASFF】《Learning Spatial Fusion for Single-Shot Object Detection》


arXiv-2019

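Table of Contents

- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Method
  - 4.1 Strong Baseline
  - 4.2 Adaptively Spatial Feature Fusion
    - 4.2.1 Feature Resizing
    - 4.2.2 Adaptive Fusion
  - 4.3 Consistency Property
- 5 Experiments
  - 5.1 Datasets and Metrics
  - 5.2 Ablation Study
  - 5.3 Evaluation on Other Single-Shot Detector
  - 5.4 Comparison to State of the Art
- 6 Conclusion（own） / Future work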

1 Background and Motivation

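In object detection, feature pyramids are widely used to cope with the scale variation of objects of the same class, which makes the model more robust.

The inconsistency among features at different scales is a primary limitation of single-shot detectors built on feature pyramids. Objects of the same class are represented at different pyramid levels depending on their size, but nothing forces the representations at those levels to agree with one another.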

when an image includes both small and large objects, the inconsistency among features at different levels tends to occupy the major portion of the feature pyramid

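The authors propose Adaptively Spatial Feature Fusion (ASFF). It improves scale invariance and markedly improves accuracy, at almost no extra computational cost during inference.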

2 Related Work

Feature pyramid representations or multi-level features still suffer from the inconsistency across different scales.

The authors' method:

It adaptively learns the importance of each feature level at every spatial location, which avoids spatial inconsistency.

3 Advantages / Contributions

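Introduce the ASFF module (plug-and-play and nearly cost-free) to strengthen the feature pyramid and address the inconsistency across pyramid levels in single-shot detectors.

Validate its effectiveness on the COCO dataset.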

4 Method

4.1 Strong Baseline

Built on the open-source YOLOv3, with several effective tricks added; the improvement is significant.

BoF stands for Bag of Freebies.

See the paper "Bag of Freebies for Training Object Detection Neural Networks".

GA stands for guided anchoring.

Wang J, Chen K, Yang S, et al. Region proposal by guided anchoring[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2965-2974.

For details, jump to the summary section at the end of this post.

IoU means that an additional IoU loss is introduced.
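As a rough illustration of what an IoU loss measures, here is a minimal sketch. The notes do not say which variant is used, so the `1 - IoU` form, the `(x1, y1, x2, y2)` box layout, and the helper names `iou` and `iou_loss` are all assumptions made for this example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    # Assumed form: 1 - IoU, so perfect overlap gives zero loss.
    return 1.0 - iou(pred, target)


# Hypothetical boxes: a 2x2 prediction shifted by one unit from a 2x2 target.
print(iou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

Unlike a coordinate-wise regression loss, this quantity depends only on the overlap of the two boxes, which is the same measure used at evaluation time.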

4.2 Adaptively Spatial Feature Fusion

Adaptively learn the spatial weights used to fuse the feature maps at each scale.

This is the core of the paper.

4.2.1 Feature Resizing

For 2x downsampling: a 3x3 convolution with stride 2.

For 4x downsampling: a stride-2 max pooling layer is added before the stride-2 convolution.

Upsampling uses interpolation.
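The resizing rules above can be checked with plain shape arithmetic. The sketch below assumes feature maps of 13, 26, and 52 pixels per side (typical for YOLOv3 at a 416x416 input, which is an assumption here), a padding of 1 for the 3x3 layers, and hypothetical helper names. It tracks only the spatial size, not the channels.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_2x(size):
    # 3x3 convolution with stride 2 and padding 1.
    return conv_out_size(size, kernel=3, stride=2, padding=1)


def downsample_4x(size):
    # Stride-2 max pooling (3x3, padding 1) followed by the stride-2 convolution.
    pooled = conv_out_size(size, kernel=3, stride=2, padding=1)
    return downsample_2x(pooled)


def upsample(size, scale_factor):
    # Interpolation multiplies the spatial size by the scale factor.
    return size * scale_factor


# Assumed pyramid sizes: level 0 is the coarsest map, level 2 the finest.
sizes = {0: 13, 1: 26, 2: 52}

to_level_0 = [sizes[0], downsample_2x(sizes[1]), downsample_4x(sizes[2])]
to_level_1 = [upsample(sizes[0], 2), sizes[1], downsample_2x(sizes[2])]
to_level_2 = [upsample(sizes[0], 4), upsample(sizes[1], 2), sizes[2]]
print(to_level_0, to_level_1, to_level_2)
```

For each target level, all three sources end up with the same spatial size, which is what makes the per-position weighted sum possible.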

4.2.2 Adaptive Fusion

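Core formula

Let $x_{ij}^{n \rightarrow l}$ denote the feature vector at position $(i,j)$ on the feature map resized from level $n$ to level $l$.

After the pyramid features are resized to a common scale, they are fused by a weighted sum. The weights are learned, shared across all channels, and behave much like a spatial attention mechanism.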
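Written out, the fusion at level $l$ presumably takes the following form. This is a sketch assembled from the definition of $x_{ij}^{n \rightarrow l}$ above and the weighted sum in the code later in this section. $y_{ij}^{l}$ is the name assumed here for the fused output vector at position $(i,j)$, and levels are numbered 1 to 3.

$$
y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \rightarrow l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \rightarrow l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \rightarrow l}
$$

Each weight is a scalar per position, so a single value of $\alpha_{ij}^{l}$ scales every channel of $x_{ij}^{1 \rightarrow l}$.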

$\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1$

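The weights are constrained to sum to 1; in practice this is implemented with a softmax.

$\lambda$ are the control parameters; they do not seem to appear explicitly in the code.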
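A minimal sketch of the softmax step for a single position $(i,j)$, assuming the $\lambda$ values act as the softmax logits (one per source level). In the code that follows, the 3-channel output of `weight_levels` would appear to play that role. The function names and the numbers are hypothetical.

```python
import math


def fusion_weights(lam_alpha, lam_beta, lam_gamma):
    """Softmax over the three control parameters at one position (i, j)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(lam_alpha, lam_beta, lam_gamma)
    exps = [math.exp(v - m) for v in (lam_alpha, lam_beta, lam_gamma)]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(x1, x2, x3, weights):
    """Weighted sum of three resized feature vectors.

    There is one weight per level, and it is shared by all channels.
    """
    a, b, g = weights
    return [a * u + b * v + g * w for u, v, w in zip(x1, x2, x3)]


# Hypothetical values for a single position with 4 channels.
weights = fusion_weights(2.0, 0.0, -1.0)
fused = fuse([1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0], [3.0, 3.0, 3.0, 3.0], weights)
print(weights, sum(weights))
print(fused)
```

Because the softmax output is non-negative and sums to 1, the fused vector is a convex combination of the three resized vectors at that position.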

Let's look at the code

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class ASFF(nn.Module):
        def __init__(self, level, rfb=False, vis=False):
            super(ASFF, self).__init__()
            self.level = level
            self.dim = [512, 256, 256]
            self.inter_dim = self.dim[self.level]
            if level == 0:
                self.stride_level_1 = add_conv(256, self.inter_dim, 3, 2)
                self.stride_level_2 = add_conv(256, self.inter_dim, 3, 2)
                self.expand = add_conv(self.inter_dim, 1024, 3, 1)
            elif level == 1:
                self.compress_level_0 = add_conv(512, self.inter_dim, 1, 1)
                self.stride_level_2 = add_conv(256, self.inter_dim, 3, 2)
                self.expand = add_conv(self.inter_dim, 512, 3, 1)
            elif level == 2:
                self.compress_level_0 = add_conv(512, self.inter_dim, 1, 1)
                self.expand = add_conv(self.inter_dim, 256, 3, 1)

            # when adding rfb, we use half number of channels to save memory
            compress_c = 8 if rfb else 16

            self.weight_level_0 = add_conv(self.inter_dim, compress_c, 1, 1)
            self.weight_level_1 = add_conv(self.inter_dim, compress_c, 1, 1)
            self.weight_level_2 = add_conv(self.inter_dim, compress_c, 1, 1)

            self.weight_levels = nn.Conv2d(compress_c * 3, 3, kernel_size=1, stride=1, padding=0)
            self.vis = vis

        def forward(self, x_level_0, x_level_1, x_level_2):
            if self.level == 0:
                level_0_resized = x_level_0
                level_1_resized = self.stride_level_1(x_level_1)

                level_2_downsampled_inter = F.max_pool2d(x_level_2, 3, stride=2, padding=1)
                level_2_resized = self.stride_level_2(level_2_downsampled_inter)

            elif self.level == 1:
                level_0_compressed = self.compress_level_0(x_level_0)
                level_0_resized = F.interpolate(level_0_compressed, scale_factor=2, mode='nearest')
                level_1_resized = x_level_1
                level_2_resized = self.stride_level_2(x_level_2)
            elif self.level == 2:
                level_0_compressed = self.compress_level_0(x_level_0)
                level_0_resized = F.interpolate(level_0_compressed, scale_factor=4, mode='nearest')
                level_1_resized = F.interpolate(x_level_1, scale_factor=2, mode='nearest')
                level_2_resized = x_level_2

            # compress each resized feature map to compress_c channels (16 by default)
            level_0_weight_v = self.weight_level_0(level_0_resized)
            level_1_weight_v = self.weight_level_1(level_1_resized)
            level_2_weight_v = self.weight_level_2(level_2_resized)
            # concatenate along the channel dimension
            levels_weight_v = torch.cat((level_0_weight_v, level_1_weight_v, level_2_weight_v), 1)
            levels_weight = self.weight_levels(levels_weight_v)  # reduce to 3 channels, one per level
            levels_weight = F.softmax(levels_weight, dim=1)  # softmax along the channel dimension

            # weighted sum of the resized feature maps; each weight map broadcasts over channels
            fused_out_reduced = level_0_resized * levels_weight[:, 0:1, :, :] + \
                                level_1_resized * levels_weight[:, 1:2, :, :] + \
                                level_2_resized * levels_weight[:, 2:, :, :]

            out = self.expand(fused_out_reduced)  # expand the number of channels

            if self.vis:
                return out, levels_weight, fused_out_reduced.sum(dim=1)
            else:
                return out
    
    

where add_conv is defined as follows

    import torch.nn as nn


    def add_conv(in_ch, out_ch, ksize, stride, leaky=True):
        """
        Add a conv2d / batchnorm / leaky ReLU block.
        Args:
            in_ch (int): number of input channels of the convolution layer.
            out_ch (int): number of output channels of the convolution layer.
            ksize (int): kernel size of the convolution layer.
            stride (int): stride of the convolution layer.
            leaky (bool): use LeakyReLU(0.1) if True, otherwise ReLU6.
        Returns:
            stage (Sequential) : Sequential layers composing a convolution block.
        """
        stage = nn.Sequential()
        # "same"-style padding for odd kernel sizes
        pad = (ksize - 1) // 2
        stage.add_module('conv', nn.Conv2d(in_channels=in_ch,
                                           out_channels=out_ch, kernel_size=ksize, stride=stride,
                                           padding=pad, bias=False))
        stage.add_module('batch_norm', nn.BatchNorm2d(out_ch))
        if leaky:
            stage.add_module('leaky', nn.LeakyReLU(0.1))
        else:
            stage.add_module('relu6', nn.ReLU6(inplace=True))
        return stage
    
    

(Figures for level = 0, level = 1, and level = 2.)

4.3 Consistency Property

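Back-propagation derivation

Simplified

It seems to me that when resizing uses a convolution layer together with an activation function, this simplification is not so easy to justify.

Simplified further, for the case where the feature maps are fused by add or concat

The result is

The back-propagation formula for the authors' method is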
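The steps above can be sketched as follows, written from the fusion rule in Section 4.2.2. The symbols $L$ (the loss) and $y_{ij}^{l}$ (the fused output at level $l$) are names assumed here, levels are numbered 1 to 3, and the fusion weights are treated as constants with respect to $x$. By the chain rule, the gradient with respect to the level-1 feature at position $(i,j)$ is

$$
\frac{\partial L}{\partial x_{ij}^{1}} = \frac{\partial y_{ij}^{1}}{\partial x_{ij}^{1}} \cdot \frac{\partial L}{\partial y_{ij}^{1}} + \frac{\partial x_{ij}^{1 \rightarrow 2}}{\partial x_{ij}^{1}} \cdot \frac{\partial y_{ij}^{2}}{\partial x_{ij}^{1 \rightarrow 2}} \cdot \frac{\partial L}{\partial y_{ij}^{2}} + \frac{\partial x_{ij}^{1 \rightarrow 3}}{\partial x_{ij}^{1}} \cdot \frac{\partial y_{ij}^{3}}{\partial x_{ij}^{1 \rightarrow 3}} \cdot \frac{\partial L}{\partial y_{ij}^{3}}
$$

Assuming the resizing acts approximately as an identity at each position, $\partial x_{ij}^{1 \rightarrow l} / \partial x_{ij}^{1} \approx 1$, this simplifies to

$$
\frac{\partial L}{\partial x_{ij}^{1}} = \frac{\partial y_{ij}^{1}}{\partial x_{ij}^{1}} \cdot \frac{\partial L}{\partial y_{ij}^{1}} + \frac{\partial y_{ij}^{2}}{\partial x_{ij}^{1 \rightarrow 2}} \cdot \frac{\partial L}{\partial y_{ij}^{2}} + \frac{\partial y_{ij}^{3}}{\partial x_{ij}^{1 \rightarrow 3}} \cdot \frac{\partial L}{\partial y_{ij}^{3}}
$$

For fusion by add or concat, $\partial y_{ij}^{l} / \partial x_{ij}^{1 \rightarrow l} = 1$, so

$$
\frac{\partial L}{\partial x_{ij}^{1}} = \frac{\partial L}{\partial y_{ij}^{1}} + \frac{\partial L}{\partial y_{ij}^{2}} + \frac{\partial L}{\partial y_{ij}^{3}}
$$

With the weighted fusion, $\partial y_{ij}^{l} / \partial x_{ij}^{1 \rightarrow l} = \alpha_{ij}^{l}$, so

$$
\frac{\partial L}{\partial x_{ij}^{1}} = \alpha_{ij}^{1} \cdot \frac{\partial L}{\partial y_{ij}^{1}} + \alpha_{ij}^{2} \cdot \frac{\partial L}{\partial y_{ij}^{2}} + \alpha_{ij}^{3} \cdot \frac{\partial L}{\partial y_{ij}^{3}}
$$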

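In this way, setting $\alpha$ appropriately shields a level from the gradients of the other levels.

For example, if the object is assigned to level 1 for prediction, then $\alpha_{ij}^1 = 1$, $\alpha_{ij}^2 = 0$, $\alpha_{ij}^3 = 0$.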
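A small numeric sketch of this effect at one position. The upstream gradient values are made up for illustration, and the two helper functions are hypothetical. The fusion weights are treated as constants, and the resizing is assumed to pass gradients through unchanged.

```python
def grad_sum_fusion(upstream_grads):
    """Gradient reaching the level-1 feature when levels are fused by add or concat."""
    return sum(upstream_grads)


def grad_asff(upstream_grads, alphas):
    """Gradient reaching the level-1 feature when each level scales it by its alpha."""
    return sum(a * g for a, g in zip(alphas, upstream_grads))


# Hypothetical gradients dL/dy at position (i, j) coming from levels 1, 2, 3:
# level 1 treats the position as an object, levels 2 and 3 treat it as background.
upstream = [1.0, -0.5, -0.5]

plain = grad_sum_fusion(upstream)
gated = grad_asff(upstream, [1.0, 0.0, 0.0])
print(plain, gated)
```

With plain summation, the conflicting signals from levels 2 and 3 cancel the signal from level 1. With the weights set to (1, 0, 0), only the gradient from level 1 reaches the feature.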

5 Experiments

5.1 Datasets and Metrics

MS COCO 2017

5.2 Ablation Study

（1）Solid Baseline

Table 1; already covered in Section 4 above.

（2） Effectiveness of Adjacent Ignore Regions

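The earlier gradient discussion suggested that ignoring is not ideal, yet here an "ignore region" is used. My understanding may not be deep enough yet, so this point needs further reading of the related literature and the code.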

（3）Adaptively Spatial Feature Fusion

exhibit the images that have several objects of different sizes

5.3 Evaluation on Other Single-Shot Detector

This demonstrates that ASFF is plug-and-play.

5.4 Comparison to State of the Art

6 Conclusion（own） / Future work

- learned to find the optimal fusion
- differentiable, so it can be trained by back-propagation

Wang J, Chen K, Yang S, et al. Region proposal by guided anchoring[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2965-2974.

A Summary of Positive/Negative Sample Assignment and Balancing Strategies in Object Detection (Part 3)
