
Remote Sensing Image Instance Segmentation: Training Cascade Faster R-CNN on Your Own Data


Cascade R-CNN keeps the original two-stage R-CNN architecture and extends it with a multi-stage design. In the RoI Head, proposals are classified as positive or negative samples by computing their IoU against the ground-truth boxes: a proposal counts as positive if its IoU exceeds 0.5. The authors argue, however, that dividing proposals with a single IoU threshold such as 0.5 works poorly, for the following two main reasons:

    1) An IoU threshold of 0.5 lets many low-quality proposals into training as positive samples.
    2) During training, an IoU threshold can be used to sample proposals; but at inference there are no ground-truth boxes, so every proposal is fed in as a positive. The positive boxes seen during training are therefore of higher quality than those seen at inference — this is the mismatch problem described in the paper.
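To make the assignment rule above concrete, here is a small sketch (with hypothetical boxes, not the author's code) of how proposals are labeled positive or negative by their IoU with a ground-truth box:

```python
import torch

def box_iou(boxes1, boxes2):
    # boxes are (x1, y1, x2, y2); returns an [N, M] pairwise IoU matrix
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])  # top-left of intersection
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])  # bottom-right of intersection
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

# hypothetical proposals and one ground-truth box
proposals = torch.tensor([[0., 0., 10., 10.],     # perfect overlap
                          [5., 0., 15., 10.],     # partial overlap, IoU = 1/3
                          [20., 20., 30., 30.]])  # no overlap
gt = torch.tensor([[0., 0., 10., 10.]])

iou = box_iou(proposals, gt).max(dim=1).values
labels = iou >= 0.5  # positive if best IoU exceeds the 0.5 threshold
# → labels = [True, False, False]
```

At inference no `gt` exists, so this labeling step cannot run — which is exactly the train/test mismatch described above.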

So can we simply raise the IoU threshold? The paper's authors point out that doing so runs into the following two problems:

    1) Raising the threshold reduces the number of positive samples, which leads to overfitting.
    2) The mismatch between the training and inference stages becomes even more severe.

To address these problems, the authors propose Cascade R-CNN, which adds a cascade structure to the ROIHead and raises the IoU threshold used to separate foreground from background stage by stage (the three stages use IoU thresholds of 0.5, 0.6, and 0.7 respectively).
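The effect of the rising thresholds can be sketched with hypothetical proposal IoUs: each jump in the threshold leaves fewer positives, which is why each cascade stage is fed the refined, higher-IoU boxes regressed by the previous stage instead of the raw RPN proposals:

```python
import torch

torch.manual_seed(0)
# hypothetical IoUs of 1000 first-stage proposals against their best-matching ground truth
ious = torch.rand(1000)

for thresh in (0.5, 0.6, 0.7):
    positives = (ious >= thresh).sum().item()
    print(f"IoU >= {thresh}: {positives} positives")
# each increase in the threshold shrinks the positive set; applying 0.7
# directly to first-stage proposals would starve training, whereas the
# cascade re-applies each threshold to progressively refined boxes
```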

1. Cascade Faster RCNN

1.1 Cascade RCNN

Compared with the standard Faster R-CNN, the Cascade RoI Head runs the ROIHead step three times in sequence. Starting from the RoIHeads module in the torchvision library, I rewrote the RoIHead part of the CascadeRCNN model; the code lives in roi_heads.py under the cascadercnn subfolder of instance_detection in the RS_Detection-pytorch folder. At initialization, the CascadeRoIHead class needs three proposal_matcher instances, one per IOU threshold (0.5, 0.6, and 0.7), each used at its stage to label proposals as foreground or background.

        # fg_iou_thresh = [0.5, 0.6, 0.7]
        # bg_iou_thresh = [0.5, 0.6, 0.7]
        self.proposal_matchers = []
        for i in range(3):
            proposal_matcher = det_utils.Matcher(
                fg_iou_thresh[i],
                bg_iou_thresh[i],
                allow_low_quality_matches=False)
            self.proposal_matchers.append(proposal_matcher)

The roiheads module also performs bounding-box encoding and decoding. The box coder takes a weights parameter whose values differ from stage to stage, so a separate set of weights must be configured per stage; see the linked article for a detailed explanation of what the weights do.

        if bbox_reg_weights is None:
            bbox_reg_weights = [(10., 10., 5., 5.), (20., 20., 10., 10.), (30., 30., 15., 15.)]
        self.box_coders = []
        for i in range(3):
            self.box_coders.append(det_utils.BoxCoder(bbox_reg_weights[i]))
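To see what the weights do, here is a minimal re-implementation of the standard Faster R-CNN box parameterization (the same scheme torchvision's BoxCoder uses; the boxes below are hypothetical). Larger weights scale the regression targets up, which in later stages compensates for the smaller residual offsets left after earlier refinement:

```python
import math

def encode(ref, gt, weights):
    # ref, gt: (x1, y1, x2, y2); returns weighted (dx, dy, dw, dh) regression targets
    wx, wy, ww, wh = weights
    ref_w, ref_h = ref[2] - ref[0], ref[3] - ref[1]
    ref_cx, ref_cy = ref[0] + 0.5 * ref_w, ref[1] + 0.5 * ref_h
    gt_w, gt_h = gt[2] - gt[0], gt[3] - gt[1]
    gt_cx, gt_cy = gt[0] + 0.5 * gt_w, gt[1] + 0.5 * gt_h
    return (wx * (gt_cx - ref_cx) / ref_w,
            wy * (gt_cy - ref_cy) / ref_h,
            ww * math.log(gt_w / ref_w),
            wh * math.log(gt_h / ref_h))

ref = (0.0, 0.0, 10.0, 10.0)
gt = (1.0, 1.0, 11.0, 11.0)   # same size, shifted by (1, 1)

stage0 = encode(ref, gt, (10., 10., 5., 5.))    # → (1.0, 1.0, 0.0, 0.0)
stage2 = encode(ref, gt, (30., 30., 15., 15.))  # → (3.0, 3.0, 0.0, 0.0)
```

The same geometric offset produces a three-times-larger target under the stage-2 weights, keeping the regression loss at a useful magnitude as the boxes get closer to the ground truth.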

All forward computation is gathered in the _forward_box function. During training, each stage takes the boxes decoded by the previous stage as its proposals and relabels them as foreground or background at its own threshold; the corresponding RoI pooling layer, RoI head, and RoI predictor then run in turn, and a classification loss and a box-regression loss are computed for each stage. At inference, the classification scores of the three stages are averaged to give the final score, and the final boxes are produced by decoding the output of the last stage's regressor.

    def _forward_box(self, features, proposals, image_sizes, targets, labels, regression_targets):
        head_outputs = []
        prev_pred_boxes = None

        proposal_per_img = [proposal.shape[0] for proposal in proposals]

        for k in range(self.num_cascade_stages):
            if k > 0:
                # clip the boxes produced by the previous stage to the image bounds
                prev_pred_boxes = prev_pred_boxes.split(proposal_per_img)
                proposals = []
                for prev_pred_boxes_per_img, image_size in zip(prev_pred_boxes, image_sizes):
                    proposals.append(box_ops.clip_boxes_to_image(prev_pred_boxes_per_img, image_size))

                if self.training:
                    proposals, matched_idxs, labels, regression_targets = self._match_and_label_boxes(proposals, k, targets)
                else:
                    matched_idxs = None
                    labels = None
                    regression_targets = None

            predictions = self._run_stage(features, proposals, k, image_sizes)

            # decode this stage's boxes; they become the next stage's proposals
            prev_pred_boxes = self.box_coders[k].decode(predictions[1], proposals).squeeze()

            head_outputs.append((predictions, labels, regression_targets, proposals))

        if self.training:
            losses = {}

            for stage, (predictions, labels, regression_targets, _) in enumerate(head_outputs):
                class_logits, box_regression = predictions
                loss_classifier, loss_box_reg = fastrcnn_loss(
                    class_logits, box_regression, labels, regression_targets
                )
                stage_losses = {
                    "loss_classifier": loss_classifier,
                    "loss_box_reg": loss_box_reg
                }
                losses.update({k + "_stage{}".format(stage): v for k, v in stage_losses.items()})

            return losses

        else:
            scores_per_stage = [F.softmax(head_output[0][0], -1) for head_output in head_outputs]

            # average the scores of the 3 stages
            scores = (scores_per_stage[0] + scores_per_stage[1] + scores_per_stage[2]) / self.num_cascade_stages

            # use the boxes output by the last stage
            predictions, _, _, proposals = head_outputs[-1]
            boxes = self.box_coders[-1].decode(predictions[1], proposals).squeeze()

            pred_instances = self.postprocess_detections(scores, boxes, proposals, image_sizes)

            return pred_instances
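The inference-time ensembling in the branch above is just a mean of per-stage softmax scores. A self-contained sketch with random logits (the sizes are arbitrary):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_stages, num_proposals, num_classes = 3, 5, 2
# stand-ins for the class logits produced by the three stage classifiers
stage_logits = [torch.randn(num_proposals, num_classes) for _ in range(num_stages)]

# softmax each stage's logits, then average across stages
scores_per_stage = [F.softmax(logits, dim=-1) for logits in stage_logits]
scores = sum(scores_per_stage) / num_stages

# the averaged scores remain a valid probability distribution per proposal
print(scores.sum(dim=-1))  # each row sums to 1
```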

1.2 Cascade Faster RCNN

To build the Cascade Faster R-CNN network, the roi_heads module of the original Faster R-CNN must be replaced. Note also that the Cascade R-CNN box predictor uses class-agnostic box regression: each proposal regresses only four coordinates. By contrast, the standard Faster R-CNN predictor is class-aware and regresses 4 × num_classes coordinates per proposal.

    class FastRCNNPredictor(nn.Module):
        """
        Standard classification + bounding box regression layers
        for Fast R-CNN.
        Arguments:
            in_channels (int): number of input channels
            num_classes (int): number of output classes (including background)
        """

        def __init__(self, in_channels, num_classes):
            super(FastRCNNPredictor, self).__init__()
            self.cls_score = nn.Linear(in_channels, num_classes)
            # self.bbox_pred = nn.Linear(in_channels, 4 * num_classes)  # original Faster RCNN box regression, class-aware
            self.bbox_pred = nn.Linear(in_channels, 4)  # Cascade RCNN box regression, class-agnostic

        def forward(self, x):
            if x.dim() == 4:
                assert list(x.shape[2:]) == [1, 1]
            x = x.flatten(start_dim=1)
            scores = self.cls_score(x)
            bbox_deltas = self.bbox_pred(x)

            return scores, bbox_deltas
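A quick shape check (with arbitrary feature size and class count) contrasting the class-agnostic regressor above with the class-aware one it replaces:

```python
import torch
import torch.nn as nn

in_channels, num_classes = 1024, 2  # e.g. one foreground class + background
x = torch.randn(8, in_channels)     # 8 pooled RoI feature vectors

class_specific = nn.Linear(in_channels, 4 * num_classes)  # standard Faster RCNN head
class_agnostic = nn.Linear(in_channels, 4)                # Cascade RCNN head

print(class_specific(x).shape)  # torch.Size([8, 8])
print(class_agnostic(x).shape)  # torch.Size([8, 4])
```

With only 4 outputs per proposal, each cascade stage refines one box per proposal regardless of its predicted class, which is what lets the refined boxes feed straight into the next stage.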
    class FasterRCNN(GeneralizedRCNN):
        def __init__(self, backbone, num_classes=None,
                     # transform parameters
                     min_size=800, max_size=1333,
                     image_mean=None, image_std=None,
                     # RPN parameters
                     rpn_anchor_generator=None, rpn_head=None,
                     rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,
                     rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,
                     rpn_nms_thresh=0.7,
                     rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,
                     rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,
                     # Box parameters
                     box_roi_pool=None, box_head=None, box_predictor=None,
                     box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,
                     box_fg_iou_thresh=[0.5, 0.6, 0.7], box_bg_iou_thresh=[0.5, 0.6, 0.7],
                     box_batch_size_per_image=128, box_positive_fraction=0.25,
                     bbox_reg_weights=None):

            if not hasattr(backbone, "out_channels"):
                raise ValueError(
                    "backbone should contain an attribute out_channels "
                    "specifying the number of output channels (assumed to be the "
                    "same for all the levels)")

            assert isinstance(rpn_anchor_generator, (AnchorGenerator, type(None)))
            assert isinstance(box_roi_pool, (MultiScaleRoIAlign, type(None)))

            if num_classes is not None:
                if box_predictor is not None:
                    raise ValueError("num_classes should be None when box_predictor is specified")
            else:
                if box_predictor is None:
                    raise ValueError("num_classes should not be None when box_predictor "
                                     "is not specified")

            out_channels = backbone.out_channels

            if rpn_anchor_generator is None:
                anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
                aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
                rpn_anchor_generator = AnchorGenerator(
                    anchor_sizes, aspect_ratios
                )
            if rpn_head is None:
                rpn_head = RPNHead(
                    out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
                )

            rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)
            rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)

            rpn = RegionProposalNetwork(
                rpn_anchor_generator, rpn_head,
                rpn_fg_iou_thresh, rpn_bg_iou_thresh,
                rpn_batch_size_per_image, rpn_positive_fraction,
                rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

            if box_roi_pool is None:
                box_roi_pool = MultiScaleRoIAlign(
                    featmap_names=['0', '1', '2', '3'],
                    output_size=7,
                    sampling_ratio=2)

            if box_head is None:
                resolution = box_roi_pool.output_size[0]
                representation_size = 1024

                box_heads = []
                for i in range(3):
                    box_head = TwoMLPHead(
                        out_channels * resolution ** 2,
                        representation_size)
                    box_heads.append(box_head)
                box_heads = nn.ModuleList(box_heads)

            if box_predictor is None:
                representation_size = 1024

                box_predictors = []
                for i in range(3):
                    box_predictor = FastRCNNPredictor(
                        representation_size,
                        num_classes)
                    box_predictors.append(box_predictor)
                box_predictors = nn.ModuleList(box_predictors)

            # replace the original RoIHeads with Cascade ROIHeads
            roi_heads = CascadeRoIHeads(
                # Box
                box_roi_pool, box_heads, box_predictors,
                box_fg_iou_thresh, box_bg_iou_thresh,
                box_batch_size_per_image, box_positive_fraction,
                bbox_reg_weights,
                box_score_thresh, box_nms_thresh, box_detections_per_img)

            if image_mean is None:
                image_mean = [0.485, 0.456, 0.406]
            if image_std is None:
                image_std = [0.229, 0.224, 0.225]
            transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

            super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)
2. Testing on Colab

We upload the code and data to Google Drive and run the test on a GPU provided by Colab.

    loading annotations into memory...
    Done (t=0.15s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.18s)
    creating index...
    index created!
    FasterRCNN(
      (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
      )
      (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=1e-05)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=1e-05)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=1e-05)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=1e-05)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d(256, eps=1e-05)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=1e-05)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=1e-05)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=1e-05)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=1e-05)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
      )
      (layer2): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128, eps=1e-05)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128, eps=1e-05)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512, eps=1e-05)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(512, eps=1e-05)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128, eps=1e-05)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128, eps=1e-05)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128, eps=1e-05)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128, eps=1e-05)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128, eps=1e-05)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128, eps=1e-05)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
      )
      (layer3): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256, eps=1e-05)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256, eps=1e-05)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024, eps=1e-05)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(1024, eps=1e-05)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256, eps=1e-05)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256, eps=1e-05)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256, eps=1e-05)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256, eps=1e-05)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256, eps=1e-05)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256, eps=1e-05)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
        (4): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256, eps=1e-05)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256, eps=1e-05)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
        (5): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256, eps=1e-05)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256, eps=1e-05)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
      )
      (layer4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512, eps=1e-05)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512, eps=1e-05)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048, eps=1e-05)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(2048, eps=1e-05)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512, eps=1e-05)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512, eps=1e-05)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512, eps=1e-05)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512, eps=1e-05)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048, eps=1e-05)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (fpn): FeaturePyramidNetwork(
      (inner_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (3): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (layer_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (extra_blocks): LastLevelMaxPool()
    )
      )
      (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
      )
      (roi_heads): CascadeRoIHeads(
    (box_roi_pool): MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], output_size=(7, 7), sampling_ratio=2)
    (box_head): ModuleList(
      (0): TwoMLPHead(
        (fc6): Linear(in_features=12544, out_features=1024, bias=True)
        (fc7): Linear(in_features=1024, out_features=1024, bias=True)
      )
      (1): TwoMLPHead(
        (fc6): Linear(in_features=12544, out_features=1024, bias=True)
        (fc7): Linear(in_features=1024, out_features=1024, bias=True)
      )
      (2): TwoMLPHead(
        (fc6): Linear(in_features=12544, out_features=1024, bias=True)
        (fc7): Linear(in_features=1024, out_features=1024, bias=True)
      )
    )
    (box_predictor): ModuleList(
      (0): FastRCNNPredictor(
        (cls_score): Linear(in_features=1024, out_features=2, bias=True)
        (bbox_pred): Linear(in_features=1024, out_features=4, bias=True)
      )
      (1): FastRCNNPredictor(
        (cls_score): Linear(in_features=1024, out_features=2, bias=True)
        (bbox_pred): Linear(in_features=1024, out_features=4, bias=True)
      )
      (2): FastRCNNPredictor(
        (cls_score): Linear(in_features=1024, out_features=2, bias=True)
        (bbox_pred): Linear(in_features=1024, out_features=4, bias=True)
      )
    )
      )
    )
    start train
    end train
    Epoch: [0]  [  0/188]  eta: 0:13:08  lr: 0.000032  loss: 3.5076 (3.5076)  loss_classifier_stage0: 0.7077 (0.7077)  loss_box_reg_stage0: 0.2480 (0.2480)  loss_classifier_stage1: 0.7394 (0.7394)  loss_box_reg_stage1: 0.1427 (0.1427)  loss_classifier_stage2: 0.6609 (0.6609)  loss_box_reg_stage2: 0.0788 (0.0788)  loss_objectness: 0.6979 (0.6979)  loss_rpn_box_reg: 0.2323 (0.2323)  time: 4.1919  data: 0.6933  max mem: 7380
    Epoch: [0]  [  1/188]  eta: 0:24:39  lr: 0.000058  loss: 3.5076 (3.6157)  loss_classifier_stage0: 0.7047 (0.7062)  loss_box_reg_stage0: 0.2480 (0.3035)  loss_classifier_stage1: 0.7370 (0.7382)  loss_box_reg_stage1: 0.1427 (0.1680)  loss_classifier_stage2: 0.6609 (0.6634)  loss_box_reg_stage2: 0.0788 (0.0911)  loss_objectness: 0.6970 (0.6974)  loss_rpn_box_reg: 0.2323 (0.2478)  time: 7.9141  data: 4.4158  max mem: 7759
    Epoch: [0]  [  2/188]  eta: 0:27:30  lr: 0.000085  loss: 3.5685 (3.6000)  loss_classifier_stage0: 0.7074 (0.7066)  loss_box_reg_stage0: 0.2876 (0.2982)  loss_classifier_stage1: 0.7394 (0.7389)  loss_box_reg_stage1: 0.1427 (0.1500)  loss_classifier_stage2: 0.6627 (0.6632)  loss_box_reg_stage2: 0.0788 (0.0854)  loss_objectness: 0.6979 (0.6976)  loss_rpn_box_reg: 0.2634 (0.2601)  time: 8.8731  data: 5.3943  max mem: 7759
    ......
    ......
    ......
    Epoch: [9]  [184/188]  eta: 0:00:17  lr: 0.000005  loss: 2.9140 (2.8897)  loss_classifier_stage0: 0.2181 (0.2122)  loss_box_reg_stage0: 0.4110 (0.3996)  loss_classifier_stage1: 0.1722 (0.1778)  loss_box_reg_stage1: 0.7743 (0.7939)  loss_classifier_stage2: 0.1754 (0.1764)  loss_box_reg_stage2: 0.9149 (0.9671)  loss_objectness: 0.0527 (0.0611)  loss_rpn_box_reg: 0.0787 (0.1015)  time: 4.3346  data: 0.8974  max mem: 7763
    Epoch: [9]  [185/188]  eta: 0:00:13  lr: 0.000005  loss: 2.8553 (2.8882)  loss_classifier_stage0: 0.2145 (0.2120)  loss_box_reg_stage0: 0.4029 (0.3996)  loss_classifier_stage1: 0.1674 (0.1777)  loss_box_reg_stage1: 0.7703 (0.7933)  loss_classifier_stage2: 0.1560 (0.1762)  loss_box_reg_stage2: 0.9141 (0.9667)  loss_objectness: 0.0525 (0.0610)  loss_rpn_box_reg: 0.0787 (0.1016)  time: 4.3141  data: 0.8771  max mem: 7763
    Epoch: [9]  [186/188]  eta: 0:00:08  lr: 0.000005  loss: 2.8040 (2.8865)  loss_classifier_stage0: 0.2145 (0.2118)  loss_box_reg_stage0: 0.3973 (0.3993)  loss_classifier_stage1: 0.1674 (0.1776)  loss_box_reg_stage1: 0.7559 (0.7929)  loss_classifier_stage2: 0.1560 (0.1761)  loss_box_reg_stage2: 0.8998 (0.9664)  loss_objectness: 0.0519 (0.0610)  loss_rpn_box_reg: 0.0787 (0.1015)  time: 4.3106  data: 0.8744  max mem: 7763
    Epoch: [9]  [187/188]  eta: 0:00:04  lr: 0.000005  loss: 2.8040 (2.8839)  loss_classifier_stage0: 0.2145 (0.2117)  loss_box_reg_stage0: 0.3942 (0.3988)  loss_classifier_stage1: 0.1617 (0.1774)  loss_box_reg_stage1: 0.7455 (0.7924)  loss_classifier_stage2: 0.1553 (0.1758)  loss_box_reg_stage2: 0.8969 (0.9655)  loss_objectness: 0.0519 (0.0608)  loss_rpn_box_reg: 0.0812 (0.1015)  time: 4.2365  data: 0.8658  max mem: 7763
    Epoch: [9] Total time: 0:13:34 (4.3340 s / it)
    creating index...
    index created!
    Test:  [ 0/40]  eta: 0:01:54  model_time: 2.1751 (2.1751)  evaluator_time: 0.0700 (0.0700)  time: 2.8614  data: 0.6108  max mem: 7763
    Test:  [39/40]  eta: 0:00:02  model_time: 2.0807 (2.0847)  evaluator_time: 0.0797 (0.0778)  time: 2.8834  data: 0.7142  max mem: 7763
    Test: Total time: 0:01:53 (2.8462 s / it)
    Averaged stats: model_time: 2.0807 (2.0847)  evaluator_time: 0.0797 (0.0778)
    Accumulating evaluation results...
    DONE (t=0.13s).
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.517
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.814
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.585
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.199
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.640
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.620
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.088
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.499
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.598
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.343
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.709
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.715
3. Test Results
[Figure: detection results]
