
[CenterMask] CenterMask: Real-Time Anchor-Free Instance Segmentation (CVPR 2020)


1. Contribution

We design a scale-adaptive RoI assignment function that takes the input scale into account and is better suited to one-stage object detectors.

We also propose a more effective backbone network, VoVNetV2, built on VoVNet. VoVNet itself achieves better accuracy and faster speed than ResNet [10] and DenseNet [14] thanks to its One-Shot Aggregation (OSA) module.

We add a residual connection to each OSA module to ease optimization. This allows VoVNet to go deeper, which in turn boosts performance.

Squeeze-Excitation (SE) reduces the channel dimension in its bottleneck to lower the computational burden, and this reduction unexpectedly causes channel information loss. We therefore redesign the SE module as effective SE (eSE), replacing the two FC layers with a single FC layer that keeps the channel dimension. This prevents the information loss and in turn improves performance.

2. CenterMask


2.1 Architecture

CenterMask consists of three parts:

  • a backbone for feature extraction (VoVNetV2);
  • an FCOS detection head;
  • a mask head (the spatial attention-guided mask, SAG-Mask).

2.2 Adaptive RoI Assignment Function

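After object proposals are predicted by the FCOS box head, CenterMask predicts segmentation masks from the predicted box regions, in the same vein as Mask R-CNN. (Note: once FCOS has predicted the boxes, they must be mapped back onto the original image, where they play the role of the preset "anchors" in the Mask R-CNN family. This is why the branch after FCOS still connects to the FPN: each box has to be assigned to one FPN level before its features are extracted, and that is the motivation for this subsection.)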

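Mask R-CNN assigns each RoI to an FPN feature level $P_k$ using Equation 1:

$$k = \left\lfloor k_0 + \log_2\left(\sqrt{wh} / 224\right) \right\rfloor \qquad (1)$$

where $k_0 = 4$ and $w$, $h$ are the width and height of each RoI. Equation 1 is not suitable for a CenterMask-style one-stage detector, for two reasons: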

  • First, Equation 1 is tuned for two-stage detectors (e.g., FPN), which use different feature levels than one-stage detectors (e.g., FCOS, RetinaNet). In FPN-based Mask R-CNN, RoIs are pooled from P2 to P5, whereas FCOS predicts from P3 to P7.
  • Second, the canonical ImageNet pre-training size 224 in Equation 1 is hard-coded, so the assignment does not adapt to feature scale variation.
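To make the second limitation concrete, here is a small Python sketch of Equation 1. The function name `fpn_roi_level` is hypothetical, and the clamp to the range P2 to P5 is an assumption borrowed from common FPN implementations rather than part of Equation 1 itself. Note that the function has no input-size argument at all: a box of a given pixel size always lands on the same level, however large the image is.

```python
import math


def fpn_roi_level(w: float, h: float, k0: int = 4, canonical: float = 224.0,
                  k_min: int = 2, k_max: int = 5) -> int:
    """Equation 1: k = floor(k0 + log2(sqrt(w * h) / 224)).

    Hypothetical helper. The clamp to [k_min, k_max] (P2..P5) is an assumed
    implementation detail; Equation 1 itself has no clamp.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


# Square boxes of increasing size; the input image size never enters the formula.
for side in (32, 112, 224, 448, 896):
    print(side, fpn_roi_level(side, side))
# 32 -> 2 (clamped), 112 -> 3, 224 -> 4, 448 -> 5, 896 -> 5 (clamped)
```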

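Therefore, we define Equation 2 as a new RoI assignment function suited for CenterMask-style one-stage detectors:

$$k = \left\lceil K_{max} - \log_2\left(A_{input} / A_{RoI}\right) \right\rceil \qquad (2)$$

where $K_{max} = 7$ is the last feature level (P7), and $A_{input}$ and $A_{RoI}$ are the areas of the input image and the RoI, respectively. Because the level depends only on the ratio of the two areas, an RoI that covers a large fraction of the image is assigned to a coarse level and a small one to a fine level, whatever the absolute input resolution.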
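A minimal Python sketch of Equation 2, for comparison with Equation 1. The name `adaptive_roi_level` is hypothetical, and the lower clamp to P3 (the finest FCOS level) is an assumption added so that very small RoIs do not fall below the pyramid; the upper clamp only matters for boxes larger than the image.

```python
import math


def adaptive_roi_level(roi_w, roi_h, img_w, img_h, k_max=7, k_min=3):
    """Equation 2: k = ceil(K_max - log2(A_input / A_RoI)).

    Hypothetical helper. Clamping to [k_min, k_max] (P3..P7) is an assumption,
    not part of Equation 2 as written.
    """
    a_input = img_w * img_h
    a_roi = roi_w * roi_h
    k = math.ceil(k_max - math.log2(a_input / a_roi))
    return max(k_min, min(k_max, k))


# The same 112x112 box in two input sizes: unlike Equation 1, the level
# changes with the input scale.
print(adaptive_roi_level(112, 112, 224, 224))  # 5
print(adaptive_roi_level(112, 112, 448, 448))  # 3
```

Using an area ratio instead of the fixed constant 224 is what makes the assignment scale-adaptive: doubling both the image and the box leaves the level unchanged.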

2.3 Spatial Attention-Guided Mask

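To exploit the spatial attention map $A_{sag}(X_i) \in \mathbb{R}^{1 \times W \times H}$ as a feature descriptor for an input feature map $X_i \in \mathbb{R}^{C \times W \times H}$, SAG-Mask first applies average pooling and max pooling along the channel axis to obtain $P_{avg}, P_{max} \in \mathbb{R}^{1 \times W \times H}$, and concatenates them. A 3×3 convolution followed by a sigmoid then produces the attention map, which reweights the input element-wise:

$$A_{sag}(X_i) = \sigma\left(F_{3 \times 3}\left(P_{max} \circ P_{avg}\right)\right)$$

$$X_{sag} = A_{sag}(X_i) \otimes X_i$$

where $F_{3 \times 3}$ is a 3×3 convolution, $\circ$ denotes concatenation, $\sigma$ is the sigmoid function, and $\otimes$ is element-wise multiplication broadcast over the channels.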
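The spatial attention step can be sketched in NumPy as follows: channel-wise max and average pooling, concatenation, a 3×3 convolution, a sigmoid, and element-wise reweighting of the input. This is an illustrative sketch, not the official CenterMask code. The helper names (`conv3x3_same`, `sag_attention`) are hypothetical, the layout is NumPy-style (C, H, W) rather than the paper's C×W×H, and the kernel weights are random stand-ins for learned parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv3x3_same(x, weight, bias=0.0):
    """3x3 cross-correlation with zero padding 1.

    x: (C_in, H, W), weight: (C_in, 3, 3) -> output (H, W), one output channel.
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.full((h, w), bias, dtype=float)
    for dy in range(3):
        for dx in range(3):
            # Sum over input channels for this kernel tap.
            out += np.tensordot(weight[:, dy, dx], padded[:, dy:dy + h, dx:dx + w], axes=1)
    return out


def sag_attention(x, weight, bias=0.0):
    """Spatial attention-guided reweighting of a feature map x with shape (C, H, W)."""
    p_max = x.max(axis=0)                    # (H, W): max over channels
    p_avg = x.mean(axis=0)                   # (H, W): mean over channels
    pooled = np.stack([p_max, p_avg])        # (2, H, W): P_max concatenated with P_avg
    a_sag = sigmoid(conv3x3_same(pooled, weight, bias))[None]  # (1, H, W)
    return a_sag * x, a_sag                  # broadcast the map over all C channels


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 14, 14))         # e.g. an 8-channel 14x14 RoI feature
kernel = rng.standard_normal((2, 3, 3)) * 0.1  # random, untrained kernel
x_sag, a = sag_attention(x, kernel)
print(x_sag.shape, a.shape)  # (8, 14, 14) (1, 14, 14)
```

Because the attention map has a single channel, every channel at a given spatial position is scaled by the same factor, which is what lets it emphasize informative pixels and suppress noisy ones.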

2.4 VoVNetV2 backbone


2.4.1 Residual connection

2.4.2 Effective Squeeze-Excitation (eSE)

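Given the diversified feature map $X_{div} \in \mathbb{R}^{C \times W \times H}$ produced by the OSA module, the channel attention map $A_{eSE}(X_{div}) \in \mathbb{R}^{C \times 1 \times 1}$ is computed and applied as:

$$A_{eSE}(X_{div}) = \sigma\left(W_C\left(F_{gap}(X_{div})\right)\right), \qquad F_{gap}(X) = \frac{1}{WH}\sum_{i,j=1}^{W,H} X_{i,j}$$

$$X_{refine} = A_{eSE}(X_{div}) \otimes X_{div}$$

where $F_{gap}$ is channel-wise global average pooling, $W_C$ is the weight of the single fully connected layer (C inputs, C outputs), $\sigma$ is the sigmoid function, and $\otimes$ is element-wise multiplication broadcast over spatial positions.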
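A minimal NumPy sketch of eSE, assuming a (C, H, W) layout, the logistic sigmoid for σ, and random untrained weights. The names `ese` and `fc_params` are hypothetical; this is not the official VoVNetV2 implementation. The parameter-count helper contrasts SE's two-layer bottleneck (C to C/r to C) with eSE's single C-to-C layer, which is the change that keeps the channel dimension intact.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ese(x_div, w_c, b_c):
    """effective Squeeze-Excitation on x_div with shape (C, H, W).

    w_c: (C, C) weight of the single FC layer, b_c: (C,) bias.
    """
    squeezed = x_div.mean(axis=(1, 2))       # F_gap: global average pool -> (C,)
    a_ese = sigmoid(w_c @ squeezed + b_c)    # (C,): one attention weight per channel
    return a_ese[:, None, None] * x_div      # reweight each channel map


def fc_params(c, r=None):
    """FC weight count (biases ignored): SE with reduction ratio r, or eSE if r is None."""
    return c * c if r is None else 2 * c * (c // r)


rng = np.random.default_rng(0)
c = 16
x_div = rng.standard_normal((c, 7, 7))
y = ese(x_div, rng.standard_normal((c, c)) * 0.1, np.zeros(c))
print(y.shape)                             # (16, 7, 7)
print(fc_params(256, 16), fc_params(256))  # 8192 65536
```

eSE spends more FC weights than SE at the same width, but it never squeezes the channel vector through a C/r bottleneck, so no channel information is discarded before the attention weights are computed.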
