[CenterMask] CenterMask: Real-Time Anchor-Free Instance Segmentation (CVPR 2020)

1. Contribution
We design a scale-adaptive RoI assignment function that takes the input scale into account and is better suited to one-stage object detectors.
We also propose VoVNetV2, a more effective backbone network based on VoVNet; thanks to its One-shot Aggregation (OSA), VoVNet shows better performance and faster speed than ResNet [10] and DenseNet [14].
We add a residual connection to each OSA module to ease optimization, which makes it possible to build a deeper VoVNet and, in turn, boosts performance.
The fully connected layers of the Squeeze-Excitation (SE) module reduce the channel size, which lowers the computational burden but unexpectedly causes channel information loss. Thus, we re-design the SE module as effective SE (eSE), replacing the two FC layers with one FC layer that maintains the channel dimension, which prevents the information loss and, in turn, improves performance.
2. CenterMask

2.1 Architecture
CenterMask consists of three parts:
backbone for feature extraction;
FCOS detection head;
mask head.
2.2 Adaptive RoI Assignment Function
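After object proposals are predicted in the FCOS box head, CenterMask predicts segmentation masks using the predicted box regions, in the same vein as Mask R-CNN. (Note to expand: once the bounding boxes predicted by the FCOS network are obtained, they have to be mapped back onto the original image, where they play the role of the preset 'anchors' of the Mask R-CNN family; the branch after FCOS therefore still connects to the FPN, which is the motivation for this section.)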
Equation 1, the RoI assignment function used by Mask R-CNN based two-stage detectors, is not suitable for CenterMask, which builds on a one-stage detector, for two reasons.
k = \lfloor k_0 + \log_2(\sqrt{wh} / 224) \rfloor    (1)
where k_0 is 4 and w, h are the width and height of each RoI.
- First, Equation 1 is tuned to two-stage detectors (e.g., FPN) that use different feature levels compared to one-stage detectors (e.g., FCOS, RetinaNet).
- Second, the canonical ImageNet pretraining size 224 in Equation 1 is hard-coded and not adaptive to feature scale variation.
Therefore, we define Equation 2 as a new RoI assignment function suited for CenterMask-based one-stage detectors.
k = \lceil k_{max} - \log_2(A_{input} / A_{RoI}) \rceil    (2)
where k_{max} = 7, and A_{input} and A_{RoI} are the areas of the input image and the RoI, respectively.
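To make the difference between the two assignment rules concrete, here is a minimal Python sketch of both. The function names, the clamping to a `[k_min, k_max]` range, and the default bounds for Equation 1 are assumptions made for illustration; they are not taken from any particular implementation.

```python
import math


def level_eq1(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Equation 1: k = floor(k0 + log2(sqrt(w*h) / 224)).

    k_min and k_max are assumed clamping bounds for the available levels.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


def level_eq2(roi_area, input_area, k_max, k_min):
    """Equation 2: k = ceil(k_max - log2(A_input / A_RoI)).

    The result is clamped to [k_min, k_max]; both bounds are passed in
    explicitly because they depend on which feature levels the model uses.
    """
    k = math.ceil(k_max - math.log2(input_area / roi_area))
    return max(k_min, min(k_max, k))


if __name__ == "__main__":
    input_area = 1024 * 1024
    # Equation 1 depends only on the RoI size in pixels.
    print(level_eq1(224, 224))
    # Equation 2 depends on the RoI area relative to the input area.
    for roi_area in (input_area, input_area // 2, input_area // 1024):
        print(level_eq2(roi_area, input_area, k_max=7, k_min=3))
```

Equation 1 maps a 224 x 224 RoI to the same level no matter how large the input image is, while Equation 2 changes the level whenever the ratio between the RoI area and the input area changes, which is the scale adaptivity argued for above.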
2.3 Spatial Attention-Guided Mask
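To exploit the spatial attention map A_{sag}(X_i) \in R^{1 \times W \times H} as a feature descriptor for an input feature map X_i \in R^{C \times W \times H}, the spatial attention module (SAM) pools X_i along the channel axis with average and max pooling, giving P_{avg}, P_{max} \in R^{1 \times W \times H}, concatenates them, applies a 3 \times 3 conv layer, and normalizes the result with a sigmoid:
A_{sag}(X_i) = \sigma(F_{3 \times 3}(P_{max} \circ P_{avg}))
X_{sag} = A_{sag}(X_i) \otimes X_i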
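As a rough illustration of how such a spatial attention map can be computed, the NumPy sketch below pools a feature map along the channel axis (max and average), stacks the two pooled maps, applies a 3 x 3 convolution with zero padding, and squashes the result with a sigmoid. The function name, the `(C, H, W)` array layout, and the explicit `kernel` and `bias` arguments are assumptions for this example; a real model would learn the convolution weights.

```python
import numpy as np


def spatial_attention(x, kernel, bias=0.0):
    """Sketch of a spatial attention module.

    x:      feature map of shape (C, H, W)
    kernel: 3x3 conv weights of shape (2, 3, 3), one slice per pooled map
    Returns the attention map (1, H, W) and the re-weighted features (C, H, W).
    """
    p_max = x.max(axis=0)                  # max pooling over channels: (H, W)
    p_avg = x.mean(axis=0)                 # average pooling over channels: (H, W)
    pooled = np.stack([p_max, p_avg])      # concatenation: (2, H, W)

    # Zero-pad spatially so the 3x3 conv keeps the H x W resolution.
    padded = np.pad(pooled, ((0, 0), (1, 1), (1, 1)))
    h, w = x.shape[1:]
    logits = np.full((h, w), bias, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + w]   # (2, H, W)
            logits += np.tensordot(kernel[:, dy, dx], window, axes=1)

    attn = 1.0 / (1.0 + np.exp(-logits))   # sigmoid, values in (0, 1)
    attn = attn[None]                      # (1, H, W)
    return attn, attn * x                  # broadcast over the channel axis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 14, 14))
    weights = rng.normal(size=(2, 3, 3))
    attn, refined = spatial_attention(feats, weights)
    print(attn.shape, refined.shape)
```

The attention map has a single channel, so every channel of the input is scaled by the same per-pixel weight.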


2.4 VoVNetV2 backbone

2.4.1 Residual connection
2.4.2 Effective Squeeze-Excitation(eSE)
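Given the diversified feature map X_{div} \in R^{C \times W \times H} and the channel attention map A_{eSE} \in R^{C \times 1 \times 1}, the eSE process is defined as:
A_{eSE}(X_{div}) = \sigma(W_C(F_{gap}(X_{div})))
X_{refine} = A_{eSE}(X_{div}) \otimes X_{div}
where F_{gap} is channel-wise global average pooling, W_C is the weight of the single FC layer, and \sigma is the sigmoid function.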
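A small NumPy sketch may help to see what a single-FC channel attention of this kind does: global average pooling per channel, one fully connected layer that keeps the channel dimension C, a sigmoid, and a channel-wise rescaling of the input. The function name `ese`, the `(C, H, W)` layout, and the explicit `weight` and `bias` arguments are assumptions for illustration only.

```python
import numpy as np


def ese(x_div, weight, bias):
    """Sketch of an effective Squeeze-Excitation (eSE) step.

    x_div:  feature map of shape (C, H, W)
    weight: FC weights of shape (C, C), so the channel dimension is kept
    bias:   FC bias of shape (C,)
    Returns the channel attention (C, 1, 1) and the refined features (C, H, W).
    """
    squeezed = x_div.mean(axis=(1, 2))             # global average pooling: (C,)
    logits = weight @ squeezed + bias              # single FC layer: (C,)
    attn = 1.0 / (1.0 + np.exp(-logits))           # sigmoid, values in (0, 1)
    a_ese = attn[:, None, None]                    # (C, 1, 1)
    return a_ese, a_ese * x_div                    # channel-wise rescaling


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = 16
    feats = rng.normal(size=(c, 7, 7))
    a_ese, refined = ese(feats, rng.normal(size=(c, c)), np.zeros(c))
    print(a_ese.shape, refined.shape)
```

Because the FC weight is C x C, no intermediate bottleneck of reduced width appears, in contrast to the two-FC SE design that first shrinks and then restores the channel dimension.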

