Instance Segmentation of Remote Sensing Images: Training Cascade Faster RCNN on Your Own Data

Cascade R-CNN keeps the two-stage design of the R-CNN family but adds a multi-stage detection head. In the RoI head, candidate boxes are labeled for classification by computing their IoU with the ground-truth boxes: a proposal counts as a positive sample if its IoU exceeds 0.5, and as a negative otherwise. The authors argue that a single IoU threshold such as 0.5 works poorly, for two main reasons:
1) With a threshold of 0.5, many low-quality proposals enter training as positive samples.
2) At training time the IoU threshold can be used to sample proposals, but at inference time there are no ground-truth boxes, so every proposal is fed in as if it were a positive. The positives seen in training are therefore of higher quality than the boxes seen at inference; this is the mismatch problem described in the paper.
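The mismatch can be illustrated with synthetic numbers: at training time the head only sees proposals that pass the IoU filter, while at inference it sees everything. A toy sketch (the IoU values below are randomly generated, not from a real model):

```python
import random

def quality(ious, threshold=None):
    """Mean IoU of proposals; optionally keep only those above a threshold,
    mimicking the positive sampling done at training time."""
    kept = [iou for iou in ious if threshold is None or iou >= threshold]
    return sum(kept) / len(kept)

random.seed(0)
# Hypothetical proposal-vs-ground-truth IoUs: a mix of low- and high-quality boxes
ious = [random.uniform(0.1, 0.95) for _ in range(1000)]

train_quality = quality(ious, threshold=0.5)   # training: sampled positives only
infer_quality = quality(ious)                  # inference: every proposal goes in
print(train_quality > infer_quality)  # True: the head sees better boxes in training
```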
Can we then simply raise the IoU threshold? The authors point out that raising it alone runs into two problems:
1. A higher threshold leaves fewer positive samples, which leads to overfitting.
2. The mismatch between training and inference becomes even worse.
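The first point is easy to see numerically: count how many of a fixed set of proposals survive each threshold (synthetic IoU values, not from a real model):

```python
import random

random.seed(0)
# 1000 hypothetical proposal-vs-ground-truth IoU values
ious = [random.uniform(0.1, 0.95) for _ in range(1000)]

# Raising the foreground threshold shrinks the positive set
counts = {t: sum(iou >= t for iou in ious) for t in (0.5, 0.6, 0.7)}
print(counts[0.5] > counts[0.6] > counts[0.7])  # True: fewer positives each step
```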
To address these problems, the authors propose Cascade R-CNN: the RoI head module is given a cascade structure whose stages use progressively higher IoU thresholds to separate foreground from background (0.5, 0.6 and 0.7 for the three stages).
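The cascade idea can be sketched with a toy refinement loop: each stage's regressor improves the boxes, so the next stage can afford a stricter threshold without starving for positives. A minimal sketch with hand-picked numbers (the "move halfway toward the ground truth" step stands in for a real regressor):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = (0, 0, 100, 100)
box = (15, 15, 115, 115)           # a proposal that barely clears IoU 0.5
for thresh in (0.5, 0.6, 0.7):     # the three cascade stages
    assert iou(box, gt) >= thresh  # good enough to be a positive at this stage
    # the stage's regressor refines the box (here: halfway toward the ground truth)
    box = tuple((c + g) / 2 for c, g in zip(box, gt))
```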
1. Cascade Faster RCNN
1.1 Cascade RCNN
Compared with the standard Faster R-CNN, the cascade runs the RoI head three times to implement the Cascade RoI Head. I rewrote the RoI head of the CascadeRCNN model based on the RoIHeads module from the torchvision library; the code lives in the roi_heads.py file under the cascadercnn subfolder of instance_detection in the RS_Detection-pytorch folder.
At initialization, the CascadeRoIHeads class needs three proposal_matcher instances, one per stage, to label proposals as foreground or background; each matcher uses a different IoU threshold (0.5, 0.6 and 0.7).
# fg_iou_thresh = [0.5, 0.6, 0.7]
# bg_iou_thresh = [0.5, 0.6, 0.7]
self.proposal_matchers = []
for i in range(3):
    proposal_matcher = det_utils.Matcher(
        fg_iou_thresh[i],
        bg_iou_thresh[i],
        allow_low_quality_matches=False)
    self.proposal_matchers.append(proposal_matcher)
The RoI heads module also encodes and decodes bounding boxes. The box coder takes a weights parameter, and each stage uses different weights, so they must be configured per stage; see the linked article for a detailed explanation of what these weights do.
if bbox_reg_weights is None:
    bbox_reg_weights = [(10., 10., 5., 5.), (20., 20., 10., 10.), (30., 30., 15., 15.)]
self.box_coders = []
for i in range(3):
    self.box_coders.append(det_utils.BoxCoder(bbox_reg_weights[i]))
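As a sketch of what det_utils.BoxCoder does with these weights, here is the standard Faster R-CNN (dx, dy, dw, dh) parameterization in plain Python. Larger weights simply scale the regression targets the network must predict, and decoding is the exact inverse of encoding, so the recovered box does not depend on the weights:

```python
import math

def encode(gt, prop, weights):
    """Box -> regression target (dx, dy, dw, dh), scaled by per-stage weights.
    Boxes are (x1, y1, x2, y2); weights are (wx, wy, ww, wh)."""
    px, py = (prop[0] + prop[2]) / 2, (prop[1] + prop[3]) / 2
    pw, ph = prop[2] - prop[0], prop[3] - prop[1]
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    wx, wy, ww, wh = weights
    return (wx * (gx - px) / pw, wy * (gy - py) / ph,
            ww * math.log(gw / pw), wh * math.log(gh / ph))

def decode(deltas, prop, weights):
    """Inverse of encode: apply predicted deltas to a proposal box."""
    px, py = (prop[0] + prop[2]) / 2, (prop[1] + prop[3]) / 2
    pw, ph = prop[2] - prop[0], prop[3] - prop[1]
    wx, wy, ww, wh = weights
    cx = px + deltas[0] / wx * pw
    cy = py + deltas[1] / wy * ph
    w = pw * math.exp(deltas[2] / ww)
    h = ph * math.exp(deltas[3] / wh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

prop = (10.0, 10.0, 50.0, 60.0)
gt = (12.0, 14.0, 55.0, 64.0)
for weights in [(10., 10., 5., 5.), (20., 20., 10., 10.), (30., 30., 15., 15.)]:
    deltas = encode(gt, prop, weights)
    # encode/decode round-trips to the ground truth for every stage's weights
    assert all(abs(a - b) < 1e-6 for a, b in zip(decode(deltas, prop, weights), gt))
```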
All the forward logic is consolidated in the _forward_box function. During training, each stage takes the boxes decoded by the previous stage as its proposals and re-labels them as foreground or background; the stage's RoI pooling layer, RoI head and RoI predictor then run, and a classification loss and a box-regression loss are computed per stage. At inference, the classification scores of the three stages are averaged into the final score, and the final boxes are produced by decoding the output of the last stage's regressor.
def _forward_box(self, features, proposals, image_sizes, targets, labels, regression_targets):
    head_outputs = []
    prev_pred_boxes = None
    proposal_per_img = [proposal.shape[0] for proposal in proposals]
    for k in range(self.num_cascade_stages):
        if k > 0:
            # clip the boxes produced by the previous stage to the image bounds
            prev_pred_boxes = prev_pred_boxes.split(proposal_per_img)
            proposals = []
            for prev_pred_boxes_per_img, image_size in zip(prev_pred_boxes, image_sizes):
                proposals.append(box_ops.clip_boxes_to_image(prev_pred_boxes_per_img, image_size))
            if self.training:
                proposals, matched_idxs, labels, regression_targets = self._match_and_label_boxes(proposals, k, targets)
            else:
                matched_idxs = None
                labels = None
                regression_targets = None
        predictions = self._run_stage(features, proposals, k, image_sizes)
        # decode the boxes
        prev_pred_boxes = self.box_coders[k].decode(predictions[1], proposals).squeeze()
        head_outputs.append((predictions, labels, regression_targets, proposals))
    if self.training:
        losses = {}
        for stage, (predictions, labels, regression_targets, _) in enumerate(head_outputs):
            class_logits, box_regression = predictions
            loss_classifier, loss_box_reg = fastrcnn_loss(
                class_logits, box_regression, labels, regression_targets
            )
            stage_losses = {
                "loss_classifier": loss_classifier,
                "loss_box_reg": loss_box_reg
            }
            losses.update({k + "_stage{}".format(stage): v for k, v in stage_losses.items()})
        return losses
    else:
        scores_per_stage = [F.softmax(head_output[0][0], -1) for head_output in head_outputs]
        # average the scores of the 3 stages
        scores = (scores_per_stage[0] + scores_per_stage[1] + scores_per_stage[2]) / self.num_cascade_stages
        # use the boxes output by the last stage
        predictions, _, _, proposals = head_outputs[-1]
        boxes = self.box_coders[-1].decode(predictions[1], proposals).squeeze()
        pred_instances = self.postprocess_detections(scores, boxes, proposals, image_sizes)
        return pred_instances
1.2 Cascade Faster RCNN
To build the Cascade Faster R-CNN network, the roi_heads of the original Faster R-CNN must be replaced. Note also that the box predictor in Cascade R-CNN uses class-agnostic regression: each proposal regresses only 4 coordinates, whereas the predictor in a standard Faster R-CNN is class-aware and regresses 4 × num_classes coordinates per proposal.
class FastRCNNPredictor(nn.Module):
    """
    Standard classification + bounding box regression layers
    for Fast R-CNN.
    Arguments:
        in_channels (int): number of input channels
        num_classes (int): number of output classes (including background)
    """
    def __init__(self, in_channels, num_classes):
        super(FastRCNNPredictor, self).__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)
        # self.bbox_pred = nn.Linear(in_channels, 4 * num_classes)  # original Faster R-CNN box regression, class-aware
        self.bbox_pred = nn.Linear(in_channels, 4)  # Cascade R-CNN box regression, class-agnostic

    def forward(self, x):
        if x.dim() == 4:
            assert list(x.shape[2:]) == [1, 1]
        x = x.flatten(start_dim=1)
        scores = self.cls_score(x)
        bbox_deltas = self.bbox_pred(x)
        return scores, bbox_deltas
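To make the output-size difference concrete, a small helper (hypothetical, just for illustration) counting box-regression outputs per image:

```python
def regression_outputs(num_proposals, num_classes, class_agnostic):
    """Number of box-regression outputs per image: 4 deltas per proposal for the
    class-agnostic Cascade head, 4 per class per proposal for plain Faster R-CNN."""
    per_proposal = 4 if class_agnostic else 4 * num_classes
    return num_proposals * per_proposal

# Hypothetical setting: 128 sampled proposals, 2 classes (1 object class + background)
print(regression_outputs(128, 2, class_agnostic=True))   # 512  -> Cascade R-CNN
print(regression_outputs(128, 2, class_agnostic=False))  # 1024 -> standard Faster R-CNN
```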
class FasterRCNN(GeneralizedRCNN):
    def __init__(self, backbone, num_classes=None,
                 # transform parameters
                 min_size=800, max_size=1333,
                 image_mean=None, image_std=None,
                 # RPN parameters
                 rpn_anchor_generator=None, rpn_head=None,
                 rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,
                 rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,
                 rpn_nms_thresh=0.7,
                 rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,
                 rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,
                 # Box parameters
                 box_roi_pool=None, box_head=None, box_predictor=None,
                 box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,
                 box_fg_iou_thresh=[0.5, 0.6, 0.7], box_bg_iou_thresh=[0.5, 0.6, 0.7],
                 box_batch_size_per_image=128, box_positive_fraction=0.25,
                 bbox_reg_weights=None):
        if not hasattr(backbone, "out_channels"):
            raise ValueError(
                "backbone should contain an attribute out_channels "
                "specifying the number of output channels (assumed to be the "
                "same for all the levels)")
        assert isinstance(rpn_anchor_generator, (AnchorGenerator, type(None)))
        assert isinstance(box_roi_pool, (MultiScaleRoIAlign, type(None)))
        if num_classes is not None:
            if box_predictor is not None:
                raise ValueError("num_classes should be None when box_predictor is specified")
        else:
            if box_predictor is None:
                raise ValueError("num_classes should not be None when box_predictor "
                                 "is not specified")
        out_channels = backbone.out_channels
        if rpn_anchor_generator is None:
            anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
            aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
            rpn_anchor_generator = AnchorGenerator(
                anchor_sizes, aspect_ratios
            )
        if rpn_head is None:
            rpn_head = RPNHead(
                out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
            )
        rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)
        rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)
        rpn = RegionProposalNetwork(
            rpn_anchor_generator, rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)
        if box_roi_pool is None:
            box_roi_pool = MultiScaleRoIAlign(
                featmap_names=['0', '1', '2', '3'],
                output_size=7,
                sampling_ratio=2)
        if box_head is None:
            resolution = box_roi_pool.output_size[0]
            representation_size = 1024
            box_heads = []
            for i in range(3):
                box_head = TwoMLPHead(
                    out_channels * resolution ** 2,
                    representation_size)
                box_heads.append(box_head)
            box_heads = nn.ModuleList(box_heads)
        if box_predictor is None:
            representation_size = 1024
            box_predictors = []
            for i in range(3):
                box_predictor = FastRCNNPredictor(
                    representation_size,
                    num_classes)
                box_predictors.append(box_predictor)
            box_predictors = nn.ModuleList(box_predictors)
        # replace the RoI heads with the Cascade RoI heads
        roi_heads = CascadeRoIHeads(
            # Box
            box_roi_pool, box_heads, box_predictors,
            box_fg_iou_thresh, box_bg_iou_thresh,
            box_batch_size_per_image, box_positive_fraction,
            bbox_reg_weights,
            box_score_thresh, box_nms_thresh, box_detections_per_img)
        if image_mean is None:
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]
        transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)
        super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)
2. Testing on Colab
We upload the code and data to Google Drive and run the test on the GPU provided by Colab.
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
loading annotations into memory...
Done (t=0.18s)
creating index...
index created!
FasterRCNN(
(transform): GeneralizedRCNNTransform(
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Resize(min_size=(800,), max_size=1333, mode='bilinear')
)
(backbone): BackboneWithFPN(
(body): IntermediateLayerGetter(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): FrozenBatchNorm2d(64, eps=1e-05)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64, eps=1e-05)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64, eps=1e-05)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256, eps=1e-05)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): FrozenBatchNorm2d(256, eps=1e-05)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64, eps=1e-05)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64, eps=1e-05)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256, eps=1e-05)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(64, eps=1e-05)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(64, eps=1e-05)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(256, eps=1e-05)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128, eps=1e-05)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128, eps=1e-05)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512, eps=1e-05)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(512, eps=1e-05)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128, eps=1e-05)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128, eps=1e-05)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512, eps=1e-05)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128, eps=1e-05)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128, eps=1e-05)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512, eps=1e-05)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(128, eps=1e-05)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(128, eps=1e-05)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(512, eps=1e-05)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(1024, eps=1e-05)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512, eps=1e-05)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512, eps=1e-05)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048, eps=1e-05)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d(2048, eps=1e-05)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512, eps=1e-05)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512, eps=1e-05)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048, eps=1e-05)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(512, eps=1e-05)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(512, eps=1e-05)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(2048, eps=1e-05)
(relu): ReLU(inplace=True)
)
)
)
(fpn): FeaturePyramidNetwork(
(inner_blocks): ModuleList(
(0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(3): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
)
(layer_blocks): ModuleList(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(extra_blocks): LastLevelMaxPool()
)
)
(rpn): RegionProposalNetwork(
(anchor_generator): AnchorGenerator()
(head): RPNHead(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
(bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
)
)
(roi_heads): CascadeRoIHeads(
(box_roi_pool): MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], output_size=(7, 7), sampling_ratio=2)
(box_head): ModuleList(
(0): TwoMLPHead(
(fc6): Linear(in_features=12544, out_features=1024, bias=True)
(fc7): Linear(in_features=1024, out_features=1024, bias=True)
)
(1): TwoMLPHead(
(fc6): Linear(in_features=12544, out_features=1024, bias=True)
(fc7): Linear(in_features=1024, out_features=1024, bias=True)
)
(2): TwoMLPHead(
(fc6): Linear(in_features=12544, out_features=1024, bias=True)
(fc7): Linear(in_features=1024, out_features=1024, bias=True)
)
)
(box_predictor): ModuleList(
(0): FastRCNNPredictor(
(cls_score): Linear(in_features=1024, out_features=2, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=4, bias=True)
)
(1): FastRCNNPredictor(
(cls_score): Linear(in_features=1024, out_features=2, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=4, bias=True)
)
(2): FastRCNNPredictor(
(cls_score): Linear(in_features=1024, out_features=2, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=4, bias=True)
)
)
)
)
start train
end train
Epoch: [0] [ 0/188] eta: 0:13:08 lr: 0.000032 loss: 3.5076 (3.5076) loss_classifier_stage0: 0.7077 (0.7077) loss_box_reg_stage0: 0.2480 (0.2480) loss_classifier_stage1: 0.7394 (0.7394) loss_box_reg_stage1: 0.1427 (0.1427) loss_classifier_stage2: 0.6609 (0.6609) loss_box_reg_stage2: 0.0788 (0.0788) loss_objectness: 0.6979 (0.6979) loss_rpn_box_reg: 0.2323 (0.2323) time: 4.1919 data: 0.6933 max mem: 7380
Epoch: [0] [ 1/188] eta: 0:24:39 lr: 0.000058 loss: 3.5076 (3.6157) loss_classifier_stage0: 0.7047 (0.7062) loss_box_reg_stage0: 0.2480 (0.3035) loss_classifier_stage1: 0.7370 (0.7382) loss_box_reg_stage1: 0.1427 (0.1680) loss_classifier_stage2: 0.6609 (0.6634) loss_box_reg_stage2: 0.0788 (0.0911) loss_objectness: 0.6970 (0.6974) loss_rpn_box_reg: 0.2323 (0.2478) time: 7.9141 data: 4.4158 max mem: 7759
Epoch: [0] [ 2/188] eta: 0:27:30 lr: 0.000085 loss: 3.5685 (3.6000) loss_classifier_stage0: 0.7074 (0.7066) loss_box_reg_stage0: 0.2876 (0.2982) loss_classifier_stage1: 0.7394 (0.7389) loss_box_reg_stage1: 0.1427 (0.1500) loss_classifier_stage2: 0.6627 (0.6632) loss_box_reg_stage2: 0.0788 (0.0854) loss_objectness: 0.6979 (0.6976) loss_rpn_box_reg: 0.2634 (0.2601) time: 8.8731 data: 5.3943 max mem: 7759
......
......
......
Epoch: [9] [184/188] eta: 0:00:17 lr: 0.000005 loss: 2.9140 (2.8897) loss_classifier_stage0: 0.2181 (0.2122) loss_box_reg_stage0: 0.4110 (0.3996) loss_classifier_stage1: 0.1722 (0.1778) loss_box_reg_stage1: 0.7743 (0.7939) loss_classifier_stage2: 0.1754 (0.1764) loss_box_reg_stage2: 0.9149 (0.9671) loss_objectness: 0.0527 (0.0611) loss_rpn_box_reg: 0.0787 (0.1015) time: 4.3346 data: 0.8974 max mem: 7763
Epoch: [9] [185/188] eta: 0:00:13 lr: 0.000005 loss: 2.8553 (2.8882) loss_classifier_stage0: 0.2145 (0.2120) loss_box_reg_stage0: 0.4029 (0.3996) loss_classifier_stage1: 0.1674 (0.1777) loss_box_reg_stage1: 0.7703 (0.7933) loss_classifier_stage2: 0.1560 (0.1762) loss_box_reg_stage2: 0.9141 (0.9667) loss_objectness: 0.0525 (0.0610) loss_rpn_box_reg: 0.0787 (0.1016) time: 4.3141 data: 0.8771 max mem: 7763
Epoch: [9] [186/188] eta: 0:00:08 lr: 0.000005 loss: 2.8040 (2.8865) loss_classifier_stage0: 0.2145 (0.2118) loss_box_reg_stage0: 0.3973 (0.3993) loss_classifier_stage1: 0.1674 (0.1776) loss_box_reg_stage1: 0.7559 (0.7929) loss_classifier_stage2: 0.1560 (0.1761) loss_box_reg_stage2: 0.8998 (0.9664) loss_objectness: 0.0519 (0.0610) loss_rpn_box_reg: 0.0787 (0.1015) time: 4.3106 data: 0.8744 max mem: 7763
Epoch: [9] [187/188] eta: 0:00:04 lr: 0.000005 loss: 2.8040 (2.8839) loss_classifier_stage0: 0.2145 (0.2117) loss_box_reg_stage0: 0.3942 (0.3988) loss_classifier_stage1: 0.1617 (0.1774) loss_box_reg_stage1: 0.7455 (0.7924) loss_classifier_stage2: 0.1553 (0.1758) loss_box_reg_stage2: 0.8969 (0.9655) loss_objectness: 0.0519 (0.0608) loss_rpn_box_reg: 0.0812 (0.1015) time: 4.2365 data: 0.8658 max mem: 7763
Epoch: [9] Total time: 0:13:34 (4.3340 s / it)
creating index...
index created!
Test: [ 0/40] eta: 0:01:54 model_time: 2.1751 (2.1751) evaluator_time: 0.0700 (0.0700) time: 2.8614 data: 0.6108 max mem: 7763
Test: [39/40] eta: 0:00:02 model_time: 2.0807 (2.0847) evaluator_time: 0.0797 (0.0778) time: 2.8834 data: 0.7142 max mem: 7763
Test: Total time: 0:01:53 (2.8462 s / it)
Averaged stats: model_time: 2.0807 (2.0847) evaluator_time: 0.0797 (0.0778)
Accumulating evaluation results...
DONE (t=0.13s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.517
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.814
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.585
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.199
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.640
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.620
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.088
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.499
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.598
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.709
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.715
3. Test Results

