
Few-Shot Object Detection via Classification Refinement and Distractor Retreatment (CVPR 2021)


1. Motivation

The current state-of-the-art approach TFA [17] is still far away from satisfaction compared with those general data-abundant detection tasks.

Given the fact that TFA is IOU-aware but less semantic discriminative, our key insight is to enhance the original classification results by injecting additional category-discriminative information.

In this work we focus on a unique but practically-existent problem of FSOD: the presence of distractor samples due to incomplete annotations, where objects belonging to novel classes can exist in the base set but remain unlabelled.

The paper argues that TFA can be improved along two axes: (1) IoU awareness, i.e., robustness to hard negatives, and (2) category discriminability, i.e., robustness to confusion between categories.

  • IOU awareness, i.e., robustness to hard negatives
  • category discriminability, i.e., robustness to category confusion.

The probing setup used by the authors: take the prediction scores for a poor box on some object (IoU = 0.4). To isolate IoU awareness, erase the classification score of the ground-truth class and leave the other classes unchanged; to isolate category discriminability, erase all scores except the ground-truth class and leave the ground-truth score unchanged.

This took some thought; it becomes clear when analyzed with the two cross-entropy terms below. In the first case, the score of the correct class is 0, so loss_+ becomes very large and the network is pushed toward learning such false-positive samples.

In the second case, every score except the correct class is zeroed, so loss_- is 0 and the network still tends to optimize only the ground-truth class.
loss_+ = -\,y \log \hat y \qquad loss_{\_} = -\,(1-y) \log(1 - \hat y)
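The two probing operations and the resulting loss terms can be sketched numerically. This is a toy illustration, not the paper's code; the score vector, class indices, and eps are made up:

```python
import numpy as np

def ce_terms(y, y_hat, eps=1e-12):
    """Positive and negative cross-entropy terms for one class."""
    loss_pos = -y * np.log(y_hat + eps)
    loss_neg = -(1 - y) * np.log(1 - y_hat + eps)
    return loss_pos, loss_neg

# Toy softmax scores for a poor box (IoU ~ 0.4); ground-truth class = 0
scores = np.array([0.6, 0.3, 0.1])

# Probe 1 (IoU awareness): erase the gt-class score, keep the rest
probe_iou = scores.copy()
probe_iou[0] = 0.0

# Probe 2 (category discriminability): keep only the gt-class score
probe_cat = np.zeros_like(scores)
probe_cat[0] = scores[0]

# With the gt score erased, loss_+ on the gt class blows up,
# pushing the network toward this false-positive pattern.
lp, _ = ce_terms(1, probe_iou[0])

# With only the gt score kept, loss_- on the other classes is ~0,
# so only the gt class keeps being optimized.
_, ln = ce_terms(0, probe_cat[1])
```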

Via the experiments in Figure 1, the paper ablates the two aspects separately and concludes that TFA is IoU-aware but less discriminative.
image-20210804221037373

2. Contribution

The paper proposes the FSCN architecture to improve FSOD performance; the correction network eliminates the false positives caused by category confusion.

We explore the limitations of the classifier rebalancing method (TFA) for FSOD problems and propose a novel few-shot classification refinement framework for exhaustively boosting its FSOD performance. A novel few-shot correction network is developed to achieve great semantic discriminability so as to eliminate false positives caused by category confusion.

The CGDP method is applied during fine-tuning.

We are the first to address the destructive distractor issue for FSOD. Instead of blindly treating it, a confidence-guided filtering strategy is proposed to exclude the distractors for base detector fine-tuning.

A semi-supervised distractor utilization strategy is proposed to cooperate with FSCN, which not only stabilizes the training process but also significantly promotes the learning on data-scarce novel classes with no extra annotation cost.

Our proposed FSOD framework achieves the state-of-the-art results in various datasets with remarkable few-shot performance and knowledge retention ability.

3. Method

image-20210806115419580

3.1 Problem Definition

  • The definition of the distractor phenomenon in FSOD is that some images \{I^{bs}_i\} in D_{bs} may possibly contain unlabelled objects belonging to C_{nv}.

The paper points out that some images in D_{bs} carry no annotations for the novel classes, so during stage-one base detector training those objects are treated as background. In stage-two fine-tuning, however, the training set covers novel + base classes, so if those objects remain unlabelled, performance degrades: novel-class objects that used to count as background in base images can no longer be treated as background during fine-tuning.

  • Therefore, handling the distractor through a delicate algorithm is of great significance to avoid the huge annotation cost and improve the FSOD performance.

3.2. Framework Overview with Few-Shot Classification Refinement

The overall architecture is shown in Figure 3; it consists mainly of the base detector F_d(\cdot) and the FSCN F_r(\cdot).

The input to F_r is the image patch cropped according to the box predicted by the base detector's regression branch, I_p = Cr(I, p); F_r outputs a classification over N_t + 1 classes, where N_t = N_{bs} + N_{nv}.
image-20210806160201672

Here s_r is the confidence score over all N_t + 1 classes. The core idea is to use the FSCN to strengthen the base detector's classification of proposals; the two classification scores are combined with a weighted operation:
image-20210806160832046
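The paper's exact fusion formula is in the equation screenshot above and is not reproduced here; as a sketch, a simple convex combination of the two score vectors, with a made-up fusion weight `alpha`, conveys the idea:

```python
import numpy as np

def refine_scores(s_d, s_r, alpha=0.5):
    """Fuse base-detector scores s_d with FSCN scores s_r.
    alpha is an assumed fusion weight; the paper's actual
    combination is given in the equation screenshot above."""
    s_d, s_r = np.asarray(s_d), np.asarray(s_r)
    fused = alpha * s_d + (1 - alpha) * s_r
    return fused / fused.sum()  # renormalize over classes

s_d = [0.7, 0.2, 0.1]   # base detector: confuses classes 0 and 1
s_r = [0.2, 0.7, 0.1]   # FSCN: more category-discriminative
fused = refine_scores(s_d, s_r)
```

The FSCN's discriminative score pulls down the base detector's confusion-driven false positive, which is exactly the refinement the framework is after.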

3.3 Few-Shot Correction Network (FSCN)

The FSCN consists of two parts: a feature extractor \varphi_v and a linear classifier \varphi_w.

CGNL enlarges the receptive field of the FSCN.

  • In this work, a Compact Generalized Non-Local (CGNL) module [21] is equipped with FSCN to achieve global receptive field

A cosine similarity metric is used.

  • In this work, we introduce cosine similarity metric into FSCN, which can well encourage the unified recognition over all classes.
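A cosine-similarity classifier scores a feature by the cosine between it and each class weight instead of a raw dot product, so weight/feature magnitudes no longer bias the prediction toward data-rich base classes. A minimal sketch; the scale factor `tau` is an assumed hyperparameter, not taken from the paper:

```python
import numpy as np

def cosine_logits(z, W, tau=20.0):
    """Cosine-similarity classification logits.
    z: (d,) feature; W: (num_classes, d) class weights; tau: assumed scale."""
    z = z / (np.linalg.norm(z) + 1e-12)
    W = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    return tau * (W @ z)  # each logit is tau * cos(angle)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
z = 100.0 * W[2]            # huge norm, but aligned with class 2
logits = cosine_logits(z, W)
```

Scaling `z` by 100 changes nothing: only the direction matters, which is the point of the metric.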

3.3.1 Network Description

image-20210806162929104

3.3.2 Weight Imprinting for Novel Classes

Weight imprinting extends the FSCN from the base to the novel classes by averaging normalized feature vectors, expanding the classifier weights from \{w_j\}^{N_{bs}+1}_{j=1} to \{w_j\}^{N_{bs}+N_{nv}+1}_{j=1}.

  • an intuitive way to set their weights is to average the corresponding normalized feature vectors z_p
    image-20210806163449481
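The imprinting step above amounts to averaging the L2-normalized support features of a novel class and using the result as that class's weight vector. A sketch (feature dimension and shot count are arbitrary):

```python
import numpy as np

def imprint_weight(features):
    """Imprint a novel-class weight from K support features z_p.
    features: (K, d) raw feature vectors for one novel class."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = z.mean(axis=0)
    return w / np.linalg.norm(w)   # re-normalize the averaged vector

rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 16))   # 5-shot, 16-dim toy features
w_new = imprint_weight(feats)
```

Because the classifier is cosine-based, the imprinted weight is directly comparable with the learned base-class weights with no extra training.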

3.4. Semi-Supervised Distractor Utilization Loss

This loss is used in the FSCN.

  • we formulate it as a semi-supervised learning problem and tackle it with the pseudo hard labeling technique.
  • Specifically, given a background proposal I_{bp} from D_{bs}, the prediction confidence is:
    image-20210806165301757

However, if every background proposal were treated as a positive sample, there would be no negative samples left for training. The original background class C_b is therefore merged with the generated pseudo class c_p to form the augmented background class C^+_b.

  • However, if all the background proposals are labeled as positive samples of novel classes, there will be no negative samples for FSCN training.
  • To address this issue, we further introduce a new concept of background augmentation, which defines an Augmented Background class C^+_{b} by merging the original background class C_b with the generated pseudo class C_p:
    image-20210806165318618 image-20210806171124565
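The pseudo hard-labeling step can be sketched as follows. The threshold, class indexing, and the reduction of C_b^+ to a single label are all simplifying assumptions; the paper's exact rule is in the screenshots above:

```python
import numpy as np

BACKGROUND = 0  # assumed index of the (augmented) background class

def pseudo_label(conf, thresh=0.8):
    """Assign a pseudo hard label to a background proposal from D_bs.
    conf: (num_fg_classes,) FSCN confidence over foreground classes.
    Returns a pseudo class label c_p when the network is confident
    (a likely unlabelled novel object), otherwise background."""
    c_p = int(np.argmax(conf)) + 1          # foreground classes start at 1
    return c_p if conf.max() >= thresh else BACKGROUND

lab_confident = pseudo_label(np.array([0.1, 0.9]))  # likely distractor
lab_uncertain = pseudo_label(np.array([0.4, 0.5]))  # kept as background
```

Keeping the uncertain proposals as background is what preserves negative samples for FSCN training, per the background-augmentation idea above.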

The distractor utilization loss is defined as follows:

Note that the DULoss is only applied to D_{bs} during fine-tuning; for D_{nv}, the ordinary cross-entropy is still used.
image-20210806172628880

In addition, an unlabelled object mining (UOM) strategy is proposed to automatically select unlabelled objects with good confidence; the criterion is given in Eq. 11 and resembles an IoU-based measure.

  • we propose an unlabelled object mining (UOM) strategy to automatically select the high-possibility unlabelled objects
  • a spatial metric M_{sp} is developed for performing effective training sample selection
    image-20210806173118120
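Since the spatial metric M_{sp} in Eq. 11 is IoU-like, a plain IoU filter against the annotated boxes can stand in for it as a sketch of such training-sample selection. The overlap threshold is made up, and the paper's actual metric may differ:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def select_unlabelled(candidates, annotated, max_overlap=0.3):
    """Keep candidate boxes that barely overlap any annotated gt box,
    i.e. likely unlabelled objects rather than duplicate detections."""
    return [c for c in candidates
            if all(iou(c, g) <= max_overlap for g in annotated)]

gt = [(0, 0, 10, 10)]
cands = [(1, 1, 9, 9), (20, 20, 30, 30)]
kept = select_unlabelled(cands, gt)
```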

3.5. Confidence-Guided Dataset Pruning (CGDP)

CGDP is a method added on the base detector side. It consists of two stages: the indicator learning stage and the pruning stage.

The motivation of CGDP is to remove distractors from D_{bs} and form a clean subset that facilitates few-shot generalization, cleaning the distractor samples with self-supervision.

The motivation for CGDP is to form a small clean subset with less distractors from D_{bs} to facilitate the base detector few-shot adaptation.

we aim at developing an automatic pipeline by taking the advantage of self-supervision to effectively clean the distractor samples.

Basically, our proposed CGDP is a two-stage process which consists of the indicator learning stage and the dataset pruning stage.

The query function Q estimates the likelihood that an image contains distractor objects.

The proposed query function Q(·) that estimates the likelihood of an image to have distractors is defined as
image-20210806175519525 image-20210806180106226

  • we construct a class-specific data pool for each category:
    image-20210806180111510

The top m images with the lowest likelihood are selected from each pool to form the clean set D_{cln}.

From each data pool, we select its top m samples which have the lowest likelihood in order to form the clean balanced training set D_{cln}.

Overall, the proposed CGDP pipeline can be formulated as:
image-20210806180332257
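The pipeline above (score each image with Q, build class-specific pools, keep the top m lowest-likelihood samples per class) can be sketched end to end. The query function Q itself is in the screenshots, so here its outputs are simply supplied as precomputed per-image scores:

```python
def cgdp_prune(images, m=2):
    """images: list of (image_id, class_id, distractor_likelihood),
    where the likelihood stands in for the output of Q(.).
    Builds a pool per class and keeps the m images with the lowest
    distractor likelihood from each pool, forming D_cln."""
    pools = {}
    for img_id, cls, q in images:
        pools.setdefault(cls, []).append((q, img_id))
    clean = []
    for cls, pool in pools.items():
        pool.sort()                      # lowest likelihood first
        clean += [img_id for _, img_id in pool[:m]]
    return clean

imgs = [("a", 1, 0.9), ("b", 1, 0.1), ("c", 1, 0.2),
        ("d", 2, 0.5), ("e", 2, 0.4)]
d_cln = cgdp_prune(imgs, m=2)   # drops "a", the likely distractor image
```

Pooling per class before taking the top m keeps the pruned set class-balanced, matching the "clean balanced training set" described above.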
