Focal and Global Knowledge Distillation for Detectors: FGD Paper Walkthrough

Paper: Focal and Global Knowledge Distillation for Detectors (CVPR 2022), https://arxiv.org/abs/2111.11837

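I. Problems addressed

1. Foreground-background imbalance in object detection

Knowledge distillation has the student mimic the teacher so that it produces outputs close to the teacher's and thereby improves its own performance. The authors first visualize the feature maps of the student and the teacher. The visualizations show clear differences between the two in both spatial attention and channel attention. In spatial attention, the gap is large in foreground regions and small in background regions, so the two kinds of regions are not equally hard for the student to learn.

To examine the roles of foreground and background in distillation, the authors distill them separately. Distilling the whole feature map uniformly performs noticeably worse, while separating foreground from background gives better results.

Given the attention gap between student and teacher and the difference between foreground and background, the authors propose Focal Distillation. It separates foreground from background and uses the teacher's spatial and channel attention to guide the student's feature learning, which leads to a focal distillation loss weighted by these masks.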

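II. Method

General form of the feature distillation loss:

$$L_{fea} = \frac{1}{CHW}\sum_{k=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(F^{T}_{k,i,j} - f\left(F^{S}_{k,i,j}\right)\right)^2$$

C, H, W: the number of channels, height, and width of the feature map.

$F^{T}$ and $F^{S}$ are the feature outputs of the teacher and student models, and $f$ is the adaptation layer that reshapes $F^{S}$ to the dimensions of $F^{T}$.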
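The plain feature-imitation loss described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `feature_mse` is made up here, and the student feature is assumed to have already been projected to the teacher's shape, so the adaptation layer is left out.

```python
import numpy as np


def feature_mse(feat_t, feat_s):
    """Squared teacher-student difference averaged over a (C, H, W) map.

    Assumes feat_s already has the same shape as feat_t.
    """
    assert feat_t.shape == feat_s.shape
    c, h, w = feat_t.shape
    return float(np.sum((feat_t - feat_s) ** 2) / (c * h * w))


rng = np.random.default_rng(0)
feat_t = rng.normal(size=(4, 8, 8))   # toy teacher feature
feat_s = feat_t + 0.1                 # student is off by 0.1 everywhere
loss = feature_mse(feat_t, feat_s)
print(loss)                           # close to 0.1 ** 2
```

Every position contributes equally here, which is exactly what the focal masks in the following sections change.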

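2.1 Separating foreground and background

Foreground/background mask

A binary mask M is defined:

$$M_{i,j} = \begin{cases} 1, & \text{if } (i,j) \in r \\ 0, & \text{otherwise} \end{cases}$$

r denotes a ground-truth (GT) bounding box: a feature-map position gets the value 1 if it falls inside a GT box and 0 otherwise.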
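As a rough sketch of the binary mask, the snippet below marks every feature-map cell covered by a GT box. It assumes the boxes are already given in feature-map cell coordinates with inclusive corners; the helper name `foreground_mask` and the box format are illustrative choices, not taken from the paper or its code.

```python
import numpy as np


def foreground_mask(height, width, boxes):
    """Binary mask: 1 inside any GT box, 0 elsewhere.

    boxes: iterable of (x1, y1, x2, y2) in feature-map cells, inclusive
    (an assumed format for this sketch).
    """
    mask = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2 + 1, x1:x2 + 1] = 1.0
    return mask


M = foreground_mask(4, 6, [(1, 1, 2, 2), (4, 0, 5, 1)])
print(M)
```

In practice the GT boxes live in image coordinates and would first have to be rescaled to each feature level's resolution.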

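2.2 Scale

Scale mask

The scale mask balances large against small targets, and foreground against background:

$$S_{i,j} = \begin{cases} \dfrac{1}{H_r W_r}, & \text{if } (i,j) \in r \\ \dfrac{1}{N_{bg}}, & \text{otherwise} \end{cases} \qquad N_{bg} = \sum_{i=1}^{H}\sum_{j=1}^{W}\left(1 - M_{i,j}\right)$$

$H_r$ and $W_r$ are the height and width of the GT bounding box r, and $N_{bg}$ is the number of background positions. When boxes overlap, that is, when a pixel belongs to several targets, the smallest box is used to compute S.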
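The scale mask, including the smallest-box rule for overlapping targets, might look like the following sketch. Boxes are again assumed to be in feature-map cells with inclusive corners, and `scale_mask` is a hypothetical helper name.

```python
import numpy as np


def scale_mask(height, width, boxes):
    """Per-position scale weight: 1/(box area) inside boxes, 1/N_bg outside.

    boxes: iterable of (x1, y1, x2, y2) in feature-map cells, inclusive
    (an assumed format for this sketch).
    """
    scale = np.zeros((height, width), dtype=np.float64)
    for x1, y1, x2, y2 in boxes:
        area = (y2 - y1 + 1) * (x2 - x1 + 1)
        region = scale[y1:y2 + 1, x1:x2 + 1]
        # On overlap the smallest box wins: it has the largest 1/area.
        scale[y1:y2 + 1, x1:x2 + 1] = np.maximum(region, 1.0 / area)
    is_bg = scale == 0
    n_bg = int(is_bg.sum())
    if n_bg > 0:
        scale[is_bg] = 1.0 / n_bg
    return scale


# A 4x2 box (area 8) overlapping a 2x2 box (area 4) on a 4x4 map.
S = scale_mask(4, 4, [(0, 0, 3, 1), (1, 1, 2, 2)])
print(S)
```

With this weighting each box contributes about the same total weight regardless of its size, and the whole background contributes a total weight of 1.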

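2.2 Spatial and channel attention

Spatial and channel attention

$$G^{S}(F) = \frac{1}{C}\sum_{c=1}^{C}\left|F_{c}\right|, \qquad G^{C}(F) = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left|F_{i,j}\right|$$

C, H, W: the number of channels, height, and width of the feature map.

$G^{S}$ and $G^{C}$ denote the spatial attention map and the channel attention map.

Attention mask:

$$A^{S}(F) = H \cdot W \cdot \mathrm{softmax}\left(G^{S}(F)/T\right), \qquad A^{C}(F) = C \cdot \mathrm{softmax}\left(G^{C}(F)/T\right)$$

T is the distillation temperature; the paper sets it to 0.5.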
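A minimal NumPy sketch of the two attention masks follows, assuming a single (C, H, W) feature map: absolute values are averaged over channels for the spatial map and over positions for the channel map, then passed through a temperature softmax and rescaled so the mask values average to 1. The function names are illustrative only.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_masks(feat, temperature=0.5):
    """Return (spatial mask of shape (H, W), channel mask of shape (C,))."""
    c, h, w = feat.shape
    abs_feat = np.abs(feat)
    g_spatial = abs_feat.mean(axis=0)        # mean over channels -> (H, W)
    g_channel = abs_feat.mean(axis=(1, 2))   # mean over positions -> (C,)
    a_spatial = h * w * softmax(g_spatial.reshape(-1) / temperature)
    a_channel = c * softmax(g_channel / temperature)
    return a_spatial.reshape(h, w), a_channel


rng = np.random.default_rng(0)
feat = rng.normal(size=(3, 4, 5))
A_s, A_c = attention_masks(feat)
print(A_s.sum(), A_c.sum())   # H*W and C, up to rounding
```

A lower temperature sharpens the softmax, so positions and channels with strong activations receive proportionally more weight.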

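2.3 Global distillation

Loss of global information

Focal Distillation distills foreground and background separately. This cuts the relation between them and loses feature-level global information. The authors therefore add global distillation: a GcBlock extracts global context from the student's and the teacher's feature maps, and this context is used to compute a global distillation loss.

The GcBlock captures global information so that the student can learn the relation between foreground and background from the teacher model.

The loss is computed as follows: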

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalRelationLoss(nn.Module):
    # The excerpt omits the enclosing class, so this wrapper name is
    # illustrative. Student features are assumed to already have
    # `teacher_channels` channels.

    def __init__(self, teacher_channels):
        super().__init__()
        # 1x1 convs that score each spatial position (context-mask logits).
        # conv_mask_s is used in spatial_pool but was cut from the excerpt;
        # it is assumed to mirror conv_mask_t.
        self.conv_mask_s = nn.Conv2d(teacher_channels, 1, kernel_size=1)
        self.conv_mask_t = nn.Conv2d(teacher_channels, 1, kernel_size=1)
        # Bottleneck transforms applied to the pooled global context.
        self.channel_add_conv_s = nn.Sequential(
            nn.Conv2d(teacher_channels, teacher_channels // 2, kernel_size=1),
            nn.LayerNorm([teacher_channels // 2, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(teacher_channels // 2, teacher_channels, kernel_size=1))
        self.channel_add_conv_t = nn.Sequential(
            nn.Conv2d(teacher_channels, teacher_channels // 2, kernel_size=1),
            nn.LayerNorm([teacher_channels // 2, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(teacher_channels // 2, teacher_channels, kernel_size=1))

    def spatial_pool(self, x, in_type):
        """Attention-pool x into a global context (0 = student, 1 = teacher)."""
        batch, channel, height, width = x.size()
        # [N, C, H * W] -> [N, 1, C, H * W]
        input_x = x.view(batch, channel, height * width).unsqueeze(1)
        # [N, 1, H, W]
        if in_type == 0:
            context_mask = self.conv_mask_s(x)
        else:
            context_mask = self.conv_mask_t(x)
        # [N, 1, H * W], softmax over all spatial positions
        context_mask = context_mask.view(batch, 1, height * width)
        context_mask = F.softmax(context_mask, dim=2)
        # [N, 1, H * W, 1]
        context_mask = context_mask.unsqueeze(-1)
        # [N, 1, C, 1]
        context = torch.matmul(input_x, context_mask)
        # [N, C, 1, 1]
        context = context.view(batch, channel, 1, 1)
        return context

    def get_rela_loss(self, preds_S, preds_T):
        loss_mse = nn.MSELoss(reduction='sum')

        context_s = self.spatial_pool(preds_S, 0)
        context_t = self.spatial_pool(preds_T, 1)

        # Add the transformed global context back (broadcast over H and W).
        out_s = preds_S + self.channel_add_conv_s(context_s)
        out_t = preds_T + self.channel_add_conv_t(context_t)

        # Summed squared error, averaged over the batch.
        rela_loss = loss_mse(out_s, out_t) / len(out_s)
        return rela_loss
```

![](https://ad.itadn.com/c/weblog/blog-img/images/2025-08-18/gZOumyTCdNcD3G8pVtjP2oar6XLM.png)

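2.4 Final loss

Focal feature loss, with foreground and background weighted separately:

$$L_{fea} = \alpha \sum_{k=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W} M_{i,j}\, S_{i,j}\, A^{S}_{i,j}\, A^{C}_{k} \left(F^{T}_{k,i,j} - f\left(F^{S}_{k,i,j}\right)\right)^2 + \beta \sum_{k=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W} \left(1 - M_{i,j}\right) S_{i,j}\, A^{S}_{i,j}\, A^{C}_{k} \left(F^{T}_{k,i,j} - f\left(F^{S}_{k,i,j}\right)\right)^2$$

Here M and S are the foreground mask and the scale mask, $A^{S}$ and $A^{C}$ are the teacher's spatial and channel attention masks, and f is the adaptation layer applied to the student feature.

alpha = 0.001, beta = 0.0005

In addition, an attention loss guides the student to mimic the teacher's spatial and channel attention masks:

$$L_{at} = \gamma \cdot \left(l\left(A^{S}_{t}, A^{S}_{s}\right) + l\left(A^{C}_{t}, A^{C}_{s}\right)\right)$$

where l is the L1 loss and the subscripts t and s mark teacher and student.

gamma = 0.0005

Final loss:

$$L_{focal} = L_{fea} + L_{at}, \qquad L_{global} = \lambda \sum \left(R(F^{T}) - R(F^{S})\right)^2, \qquad L = L_{original} + L_{focal} + L_{global}$$

$R(F)$ is the feature after the GcBlock context has been added to it, and $L_{original}$ is the detector's own training loss.

lambda = 0.000005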
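To make the weighting concrete, here is a small NumPy sketch of the focal feature loss and the attention loss for a single image. It assumes the masks have already been computed, uses a summed L1 distance for the attention term, and takes the default weights from the values quoted above. The function names and the toy inputs are illustrative, not the reference implementation.

```python
import numpy as np


def focal_feature_loss(feat_t, feat_s, fg_mask, scale, a_spatial, a_channel,
                       alpha=0.001, beta=0.0005):
    """Masked, attention-weighted squared error on (C, H, W) features."""
    # Per-element weight S_ij * A^S_ij * A^C_k, broadcast to (C, H, W).
    weight = scale[None] * a_spatial[None] * a_channel[:, None, None]
    sq_err = weight * (feat_t - feat_s) ** 2
    fg_term = np.sum(fg_mask[None] * sq_err)
    bg_term = np.sum((1.0 - fg_mask[None]) * sq_err)
    return float(alpha * fg_term + beta * bg_term)


def attention_loss(a_s_t, a_c_t, a_s_s, a_c_s, gamma=0.0005):
    """Summed L1 gap between teacher and student attention masks."""
    gap = np.abs(a_s_t - a_s_s).sum() + np.abs(a_c_t - a_c_s).sum()
    return float(gamma * gap)


# Toy 2x2x2 case: one foreground cell, uniform teacher attention.
feat_t = np.ones((2, 2, 2))
feat_s = np.zeros((2, 2, 2))
fg_mask = np.array([[1.0, 0.0], [0.0, 0.0]])
scale = np.array([[1.0, 1 / 3], [1 / 3, 1 / 3]])
ones_hw, ones_c = np.ones((2, 2)), np.ones(2)

l_fea = focal_feature_loss(feat_t, feat_s, fg_mask, scale, ones_hw, ones_c)
l_at = attention_loss(ones_hw, ones_c, 0.5 * ones_hw, ones_c)
l_focal = l_fea + l_at
print(l_fea, l_at, l_focal)
```

In the toy case the single foreground cell and the three background cells each carry a total scale weight of 1 per channel, so only alpha and beta decide how much each side contributes.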

About the hyperparameters

Final results:
