
Real-Time Object Detection for Autonomous Vehicles Based on the DeepLabV3 Model


Team Name

Intel启动队

Problem Statement

Combining advanced computer vision techniques with the Intel® AI Analytics Toolkit, this project develops a real-time object detection system for autonomous driving. The team built a deep-learning-based system that detects and classifies objects such as sidewalks, cars, traffic signs, and traffic lights. The system must deliver high recognition accuracy at low latency, and it is designed and optimized to support safe, reliable path planning for autonomous vehicles.

Project Overview

This project uses the DeepLabV3 model and implements semantic segmentation through the following steps:

  1. Data preprocessing
  2. A ResNet module extracts feature representations from the input image
  3. An ASPP module aggregates contextual information to enrich the feature representation
  4. The DeepLabV3 network ties these parts together to produce the semantic segmentation prediction

With this pipeline in place, the system can segment roads, traffic signs, vehicles, and other elements out of video or still images and recognize the surrounding scene, providing technical support for the autonomous driving system.
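As a minimal sketch of the resulting inference flow (the DeepLabV3 class is defined later in this post; the checkpoint path and input file are illustrative assumptions, and input normalization is omitted for brevity):

import torch
from PIL import Image
import torchvision.transforms.functional as TF

# Hypothetical sketch: checkpoint path and image file are placeholders.
model = DeepLabV3(model_id="demo", project_dir="/root/deeplabv3")
model.load_state_dict(torch.load("/root/deeplabv3/training_logs/model_demo/checkpoints/latest.pth"))
model.eval()

img = TF.to_tensor(Image.open("frame.png")).unsqueeze(0)  # (1, 3, H, W)
with torch.no_grad():
    logits = model(img)          # (1, 20, H, W) per-pixel class scores
pred = logits.argmax(dim=1)      # (1, H, W) predicted trainId map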

Related Technologies

The system was built with the Intel® AI Analytics Toolkit and the following key components:

  1. The Intel® Deep Neural Network Library (oneDNN) with the PyTorch framework to speed up training: the library accelerates convolution operations in hardware, effectively shortening the training time of the DeepLabV3 model.
  2. Intel® VTune™ Profiler for performance analysis: it profiles the performance characteristics of each layer and node, helping identify and resolve performance bottlenecks and guiding optimization.

Compared with the traditional CUDA programming model, oneAPI offers the following advantages:

  1. Multi-platform compatibility: oneAPI supports different types of hardware, including Intel CPUs, GPUs, and FPGAs.
  2. Unified interface: it provides a single programming interface, so developers do not need to write separate code for each hardware architecture.
  3. Data parallelism: oneAPI uses a programming model based on data parallelism.
  4. Rich ecosystem: oneAPI provides a feature-rich, easy-to-use ecosystem of components.
  5. Open, common standard: oneAPI is an open, cross-vendor standard architecture.

Using oneAPI, we moved the model and optimizer onto Intel accelerators for training with no manual device management. Their well-supported data-parallel computation and neural-network acceleration significantly improved training efficiency. The concrete steps are as follows:

# Import Intel Extension for PyTorch:
import intel_extension_for_pytorch as ipex

# Create the optimizer before optimization so it can be optimized as well:
optimizer = torch.optim.Adam(params=network.parameters(), lr=0.003)

# Optimize the network and optimizer for Intel hardware; when an optimizer
# is passed, ipex.optimize returns the optimized (model, optimizer) pair,
# which replaces the stock PyTorch versions:
network.train()
network, optimizer = ipex.optimize(model=network, optimizer=optimizer)

This completes an efficient oneAPI-based deep-learning training pipeline.

Team Takeaways

We learned to use oneAPI for performance optimization:

oneAPI offers a rich set of tools and libraries that help improve code performance. One key tool is Intel® VTune™ Profiler, which not only identifies performance bottlenecks in the code but also suggests optimization strategies. These profiling tools markedly improved our development efficiency and the responsiveness of the system.

We learned to use oneAPI for deep-learning model training:

oneAPI includes libraries and toolkits for deep learning and neural networks. While learning how to build, train, and deploy deep-learning architectures in practice, we also studied concrete techniques for improving inference efficiency with the Intel® oneAPI Deep Neural Network Library.

Implementation

Data Preprocessing

The dataset contains 34 classes, but a given scenario or study usually only cares about a subset of them. It is common to select 19 train IDs for training an image semantic segmentation model.

A train ID (trainId) is an identifier designed to simplify data management and training: it maps the 34 classes in the raw labels onto a set of 19 classes. The main goals are to reduce computational complexity and improve model performance.

Here, each such identifier represents one of the 19 classes used for semantic segmentation on the Cityscapes dataset; a conversion sketch follows the mapping below.

Mapping (id -> trainId):
from collections import namedtuple

# Label tuple as defined in the official cityscapesScripts:
Label = namedtuple('Label', ['name', 'id', 'trainId', 'category',
                             'categoryId', 'hasInstances', 'ignoreInEval', 'color'])

labels = [
    #       name                     id    trainId   category            catId     hasInstances   ignoreInEval   color
    Label(  'unlabeled'            ,  0 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'ego vehicle'          ,  1 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'rectification border' ,  2 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'out of roi'           ,  3 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'static'               ,  4 ,      19 , 'void'            , 0       , False        , True         , (  0,  0,  0) ),
    Label(  'dynamic'              ,  5 ,      19 , 'void'            , 0       , False        , True         , (111, 74,  0) ),
    Label(  'ground'               ,  6 ,      19 , 'void'            , 0       , False        , True         , ( 81,  0, 81) ),
    Label(  'road'                 ,  7 ,       0 , 'flat'            , 1       , False        , False        , (128, 64,128) ),
    Label(  'sidewalk'             ,  8 ,       1 , 'flat'            , 1       , False        , False        , (244, 35,232) ),
    Label(  'parking'              ,  9 ,      19 , 'flat'            , 1       , False        , True         , (250,170,160) ),
    Label(  'rail track'           , 10 ,      19 , 'flat'            , 1       , False        , True         , (230,150,140) ),
    Label(  'building'             , 11 ,       2 , 'construction'    , 2       , False        , False        , ( 70, 70, 70) ),
    Label(  'wall'                 , 12 ,       3 , 'construction'    , 2       , False        , False        , (102,102,156) ),
    Label(  'fence'                , 13 ,       4 , 'construction'    , 2       , False        , False        , (190,153,153) ),
    Label(  'guard rail'           , 14 ,      19 , 'construction'    , 2       , False        , True         , (180,165,180) ),
    Label(  'bridge'               , 15 ,      19 , 'construction'    , 2       , False        , True         , (150,100,100) ),
    Label(  'tunnel'               , 16 ,      19 , 'construction'    , 2       , False        , True         , (150,120, 90) ),
    Label(  'pole'                 , 17 ,       5 , 'object'          , 3       , False        , False        , (153,153,153) ),
    Label(  'polegroup'            , 18 ,      19 , 'object'          , 3       , False        , True         , (153,153,153) ),
    Label(  'traffic light'        , 19 ,       6 , 'object'          , 3       , False        , False        , (250,170, 30) ),
    Label(  'traffic sign'         , 20 ,       7 , 'object'          , 3       , False        , False        , (220,220,  0) ),
    Label(  'vegetation'           , 21 ,       8 , 'nature'          , 4       , False        , False        , (107,142, 35) ),
    Label(  'terrain'              , 22 ,       9 , 'nature'          , 4       , False        , False        , (152,251,152) ),
    Label(  'sky'                  , 23 ,      10 , 'sky'             , 5       , False        , False        , ( 70,130,180) ),
    Label(  'person'               , 24 ,      11 , 'human'           , 6       , True         , False        , (220, 20, 60) ),
    Label(  'rider'                , 25 ,      12 , 'human'           , 6       , True         , False        , (255,  0,  0) ),
    Label(  'car'                  , 26 ,      13 , 'vehicle'         , 7       , True         , False        , (  0,  0,142) ),
    Label(  'truck'                , 27 ,      14 , 'vehicle'         , 7       , True         , False        , (  0,  0, 70) ),
    Label(  'bus'                  , 28 ,      15 , 'vehicle'         , 7       , True         , False        , (  0, 60,100) ),
    Label(  'caravan'              , 29 ,      19 , 'vehicle'         , 7       , True         , True         , (  0,  0, 90) ),
    Label(  'trailer'              , 30 ,      19 , 'vehicle'         , 7       , True         , True         , (  0,  0,110) ),
    Label(  'train'                , 31 ,      16 , 'vehicle'         , 7       , True         , False        , (  0, 80,100) ),
    Label(  'motorcycle'           , 32 ,      17 , 'vehicle'         , 7       , True         , False        , (  0,  0,230) ),
    Label(  'bicycle'              , 33 ,      18 , 'vehicle'         , 7       , True         , False        , (119, 11, 32) ),
    Label(  'license plate'        , -1 ,      19 , 'vehicle'         , 7       , False        , True         , (  0,  0,142) ),
]
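As a minimal preprocessing sketch (the post does not show the exact conversion code; the helper below is an assumption), the mapping can be applied to a raw Cityscapes label image with a lookup table:

import numpy as np

# Build a 256-entry lookup table from raw Cityscapes ids to trainIds.
# Raw ids fit in uint8; the 'license plate' id of -1 is skipped, and
# unmapped ids fall back to the void trainId 19.
id_to_trainid = np.full(256, 19, dtype=np.uint8)
for label in labels:
    if label.id >= 0:
        id_to_trainid[label.id] = label.trainId

def convert_to_trainid(label_img):
    # label_img: (H, W) uint8 array of raw ids -> (H, W) trainId map
    return id_to_trainid[label_img]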

Hyperparameter Settings

num_epochs = 400
batch_size = 3
learning_rate = 0.0001
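As a sketch only (the full training script is not included in this post; train_loader is assumed to exist, and DeepLabV3 is the class defined below), these hyperparameters plug into a standard PyTorch loop:

import torch
import torch.nn as nn

num_epochs = 400
batch_size = 3
learning_rate = 0.0001

model = DeepLabV3(model_id="1", project_dir="/root/deeplabv3")
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()  # targets are (H, W) maps of trainIds

for epoch in range(num_epochs):
    for imgs, label_imgs in train_loader:
        logits = model(imgs)                  # (batch_size, 20, H, W)
        loss = criterion(logits, label_imgs)  # label_imgs: (batch_size, H, W)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()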

ResNet Module

Overview

We built the core feature-extraction module on top of a pretrained ResNet, with configurable depth and output stride, to provide strong features for the downstream segmentation task. Defining the network architecture and loading pretrained weights yields a general-purpose ResNet feature extractor that underpins the later stages.

Implementation steps
  1. Choose the base ResNet architecture
  2. Design the layered structure
  3. Load the pretrained weights
  4. Produce the OS16 and OS8 variants (OS = output stride)
  5. Set up the forward (and backward) propagation interface
  6. Expose factory functions for each variant
Code Implementation
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


def make_layer(block, in_channels, channels, num_blocks, stride=1, dilation=1):
    strides = [stride] + [1] * (num_blocks - 1)

    blocks = []
    for stride in strides:
        blocks.append(block(in_channels=in_channels, channels=channels,
                            stride=stride, dilation=dilation))
        in_channels = block.expansion * channels

    layer = nn.Sequential(*blocks)

    return layer


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, channels, stride=1, dilation=1):
        super(BasicBlock, self).__init__()

        out_channels = self.expansion * channels

        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3, stride=stride,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)

        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

        if (stride != 1) or (in_channels != out_channels):
            conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False)
            bn = nn.BatchNorm2d(out_channels)
            self.downsample = nn.Sequential(conv, bn)
        else:
            self.downsample = nn.Sequential()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        out = out + self.downsample(x)

        out = F.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channels, channels, stride=1, dilation=1):
        super(Bottleneck, self).__init__()

        out_channels = self.expansion * channels

        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)

        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=stride,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

        self.conv3 = nn.Conv2d(channels, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)

        if (stride != 1) or (in_channels != out_channels):
            conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False)
            bn = nn.BatchNorm2d(out_channels)
            self.downsample = nn.Sequential(conv, bn)
        else:
            self.downsample = nn.Sequential()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # (shape: (batch_size, channels, h, w))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out = out + self.downsample(x)
        out = F.relu(out)

        return out


class ResNet_Bottleneck_OS16(nn.Module):
    def __init__(self, num_layers):
        super(ResNet_Bottleneck_OS16, self).__init__()

        if num_layers == 50:
            resnet = models.resnet50()
            # load pretrained model:
            resnet.load_state_dict(torch.load("/root/deeplabv3/pretrained_models/resnet/resnet50-19c8e357.pth"))
            # remove fully connected layer, avg pool and layer5:
            self.resnet = nn.Sequential(*list(resnet.children())[:-3])
            print("pretrained resnet, 50")
        elif num_layers == 101:
            resnet = models.resnet101()
            resnet.load_state_dict(torch.load("/root/deeplabv3/pretrained_models/resnet/resnet101-5d3b4d8f.pth"))
            self.resnet = nn.Sequential(*list(resnet.children())[:-3])
            print("pretrained resnet, 101")
        elif num_layers == 152:
            resnet = models.resnet152()
            # load pretrained model:
            resnet.load_state_dict(torch.load("/root/deeplabv3/pretrained_models/resnet/resnet152-b121ed2d.pth"))
            # remove fully connected layer, avg pool and layer5:
            self.resnet = nn.Sequential(*list(resnet.children())[:-3])
            print("pretrained resnet, 152")
        else:
            raise Exception("num_layers must be in {50, 101, 152}!")

        self.layer5 = make_layer(Bottleneck, in_channels=4 * 256, channels=512, num_blocks=3, stride=1, dilation=2)

    def forward(self, x):
        c4 = self.resnet(x)
        output = self.layer5(c4)
        return output


class ResNet_BasicBlock_OS16(nn.Module):
    def __init__(self, num_layers):
        super(ResNet_BasicBlock_OS16, self).__init__()

        if num_layers == 18:
            resnet = models.resnet18()
            resnet.load_state_dict(torch.load("/root/deeplabv3/pretrained_models/resnet/resnet18-5c106cde.pth"))
            self.resnet = nn.Sequential(*list(resnet.children())[:-3])
            num_blocks = 2
            print("pretrained resnet, 18")
        elif num_layers == 34:
            resnet = models.resnet34()
            resnet.load_state_dict(torch.load("/root/deeplabv3/pretrained_models/resnet/resnet34-333f7ec4.pth"))
            self.resnet = nn.Sequential(*list(resnet.children())[:-3])
            num_blocks = 3
            print("pretrained resnet, 34")
        else:
            raise Exception("num_layers must be in {18, 34}!")

        self.layer5 = make_layer(BasicBlock, in_channels=256, channels=512, num_blocks=num_blocks, stride=1, dilation=2)

    def forward(self, x):
        c4 = self.resnet(x)
        output = self.layer5(c4)
        return output


class ResNet_BasicBlock_OS8(nn.Module):
    def __init__(self, num_layers):
        super(ResNet_BasicBlock_OS8, self).__init__()

        if num_layers == 18:
            resnet = models.resnet18()
            resnet.load_state_dict(torch.load("/root/deeplabv3/pretrained_models/resnet/resnet18-5c106cde.pth"))
            self.resnet = nn.Sequential(*list(resnet.children())[:-4])
            num_blocks_layer_4 = 2
            num_blocks_layer_5 = 2
            print("pretrained resnet, 18")
        elif num_layers == 34:
            resnet = models.resnet34()
            resnet.load_state_dict(torch.load("/root/deeplabv3/pretrained_models/resnet/resnet34-333f7ec4.pth"))
            self.resnet = nn.Sequential(*list(resnet.children())[:-4])
            num_blocks_layer_4 = 6
            num_blocks_layer_5 = 3
            print("pretrained resnet, 34")
        else:
            raise Exception("num_layers must be in {18, 34}!")

        self.layer4 = make_layer(BasicBlock, in_channels=128, channels=256, num_blocks=num_blocks_layer_4, stride=1, dilation=2)

        self.layer5 = make_layer(BasicBlock, in_channels=256, channels=512, num_blocks=num_blocks_layer_5, stride=1, dilation=4)

    def forward(self, x):
        c3 = self.resnet(x)
        output = self.layer4(c3)
        output = self.layer5(output)
        return output


def ResNet18_OS16():
    return ResNet_BasicBlock_OS16(num_layers=18)


def ResNet34_OS16():
    return ResNet_BasicBlock_OS16(num_layers=34)


def ResNet50_OS16():
    return ResNet_Bottleneck_OS16(num_layers=50)


def ResNet101_OS16():
    return ResNet_Bottleneck_OS16(num_layers=101)


def ResNet152_OS16():
    return ResNet_Bottleneck_OS16(num_layers=152)


def ResNet18_OS8():
    return ResNet_BasicBlock_OS8(num_layers=18)


def ResNet34_OS8():
    return ResNet_BasicBlock_OS8(num_layers=34)
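A quick shape check (a sketch; the input size is illustrative, and the pretrained weight files above must be in place) confirms the OS8 backbone's output stride:

import torch

net = ResNet18_OS8()
x = torch.randn(1, 3, 512, 1024)
feat = net(x)
print(feat.shape)  # torch.Size([1, 512, 64, 128]) -- 512 channels at h/8 x w/8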

ASPP Module

Overview

We use an ASPP (Atrous Spatial Pyramid Pooling) module, which applies dilated convolutions at multiple rates together with image-level pooling to capture spatial context at different scales, noticeably improving semantic segmentation accuracy.

Two key modules were developed: ASPP and its bottleneck variant, ASPP_Bottleneck. The ASPP module is designed for the 512-channel feature maps produced by BasicBlock backbones, while the bottleneck variant takes the 4×512-channel feature maps produced by Bottleneck backbones. The bottleneck variant improves model performance without significantly increasing computational cost.

Code Implementation
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASPP(nn.Module):
    def __init__(self, num_classes):
        super(ASPP, self).__init__()

        self.conv_1x1_1 = nn.Conv2d(512, 256, kernel_size=1)
        self.bn_conv_1x1_1 = nn.BatchNorm2d(256)

        self.conv_3x3_1 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=6, dilation=6)
        self.bn_conv_3x3_1 = nn.BatchNorm2d(256)

        self.conv_3x3_2 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=12, dilation=12)
        self.bn_conv_3x3_2 = nn.BatchNorm2d(256)

        self.conv_3x3_3 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=18, dilation=18)
        self.bn_conv_3x3_3 = nn.BatchNorm2d(256)

        self.avg_pool = nn.AdaptiveAvgPool2d(1)

        self.conv_1x1_2 = nn.Conv2d(512, 256, kernel_size=1)
        self.bn_conv_1x1_2 = nn.BatchNorm2d(256)

        self.conv_1x1_3 = nn.Conv2d(1280, 256, kernel_size=1)  # (1280 = 5*256)
        self.bn_conv_1x1_3 = nn.BatchNorm2d(256)

        self.conv_1x1_4 = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, feature_map):
        # (feature_map has shape (batch_size, 512, h/16, w/16)) (assuming self.resnet is ResNet18_OS16 or ResNet34_OS16. If self.resnet instead is ResNet18_OS8 or ResNet34_OS8, it will be (batch_size, 512, h/8, w/8))
        feature_map_h = feature_map.size()[2]  # (== h/16)
        feature_map_w = feature_map.size()[3]  # (== w/16)

        out_1x1 = F.relu(self.bn_conv_1x1_1(self.conv_1x1_1(feature_map)))    # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_1 = F.relu(self.bn_conv_3x3_1(self.conv_3x3_1(feature_map)))  # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_2 = F.relu(self.bn_conv_3x3_2(self.conv_3x3_2(feature_map)))  # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_3 = F.relu(self.bn_conv_3x3_3(self.conv_3x3_3(feature_map)))  # (shape: (batch_size, 256, h/16, w/16))

        out_img = self.avg_pool(feature_map)                            # (shape: (batch_size, 512, 1, 1))
        out_img = F.relu(self.bn_conv_1x1_2(self.conv_1x1_2(out_img)))  # (shape: (batch_size, 256, 1, 1))
        out_img = F.upsample(out_img, size=(feature_map_h, feature_map_w), mode="bilinear")  # (shape: (batch_size, 256, h/16, w/16))

        out = torch.cat([out_1x1, out_3x3_1, out_3x3_2, out_3x3_3, out_img], 1)  # (shape: (batch_size, 1280, h/16, w/16))
        out = F.relu(self.bn_conv_1x1_3(self.conv_1x1_3(out)))  # (shape: (batch_size, 256, h/16, w/16))
        out = self.conv_1x1_4(out)                              # (shape: (batch_size, num_classes, h/16, w/16))

        return out


class ASPP_Bottleneck(nn.Module):
    def __init__(self, num_classes):
        super(ASPP_Bottleneck, self).__init__()

        self.conv_1x1_1 = nn.Conv2d(4*512, 256, kernel_size=1)
        self.bn_conv_1x1_1 = nn.BatchNorm2d(256)

        self.conv_3x3_1 = nn.Conv2d(4*512, 256, kernel_size=3, stride=1, padding=6, dilation=6)
        self.bn_conv_3x3_1 = nn.BatchNorm2d(256)

        self.conv_3x3_2 = nn.Conv2d(4*512, 256, kernel_size=3, stride=1, padding=12, dilation=12)
        self.bn_conv_3x3_2 = nn.BatchNorm2d(256)

        self.conv_3x3_3 = nn.Conv2d(4*512, 256, kernel_size=3, stride=1, padding=18, dilation=18)
        self.bn_conv_3x3_3 = nn.BatchNorm2d(256)

        self.avg_pool = nn.AdaptiveAvgPool2d(1)

        self.conv_1x1_2 = nn.Conv2d(4*512, 256, kernel_size=1)
        self.bn_conv_1x1_2 = nn.BatchNorm2d(256)

        self.conv_1x1_3 = nn.Conv2d(1280, 256, kernel_size=1)  # (1280 = 5*256)
        self.bn_conv_1x1_3 = nn.BatchNorm2d(256)

        self.conv_1x1_4 = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, feature_map):
        # (feature_map has shape (batch_size, 4*512, h/16, w/16))
        feature_map_h = feature_map.size()[2]  # (== h/16)
        feature_map_w = feature_map.size()[3]  # (== w/16)

        out_1x1 = F.relu(self.bn_conv_1x1_1(self.conv_1x1_1(feature_map)))    # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_1 = F.relu(self.bn_conv_3x3_1(self.conv_3x3_1(feature_map)))  # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_2 = F.relu(self.bn_conv_3x3_2(self.conv_3x3_2(feature_map)))  # (shape: (batch_size, 256, h/16, w/16))
        out_3x3_3 = F.relu(self.bn_conv_3x3_3(self.conv_3x3_3(feature_map)))  # (shape: (batch_size, 256, h/16, w/16))

        out_img = self.avg_pool(feature_map)                            # (shape: (batch_size, 4*512, 1, 1))
        out_img = F.relu(self.bn_conv_1x1_2(self.conv_1x1_2(out_img)))  # (shape: (batch_size, 256, 1, 1))
        out_img = F.upsample(out_img, size=(feature_map_h, feature_map_w), mode="bilinear")  # (shape: (batch_size, 256, h/16, w/16))

        out = torch.cat([out_1x1, out_3x3_1, out_3x3_2, out_3x3_3, out_img], 1)  # (shape: (batch_size, 1280, h/16, w/16))
        out = F.relu(self.bn_conv_1x1_3(self.conv_1x1_3(out)))  # (shape: (batch_size, 256, h/16, w/16))
        out = self.conv_1x1_4(out)                              # (shape: (batch_size, num_classes, h/16, w/16))

        return out

This code defines two modules: ASPP and ASPP_Bottleneck.

The ASPP module fuses multi-scale features from the input feature map and is used for the semantic segmentation task. Internally it combines several convolution branches: dilated 3×3 convolutions at different rates extract spatial detail at each scale, while adaptive average pooling produces a global, image-level feature. The branch outputs are concatenated with this global context and passed through additional convolution, normalization, and activation layers to produce the final prediction.

ASPP_Bottleneck follows the same design but is adapted to bottleneck backbones, whose feature maps have 4×512 channels, to strengthen the feature representation. It first applies the multi-scale pyramid of dilated convolutions to its input and uses adaptive average pooling to capture global spatial information; it then fuses the per-scale representations with the global context and applies the same projection and normalization steps to obtain the final prediction.
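A quick shape check for the ASPP module (a sketch; the feature-map size is illustrative):

import torch

aspp = ASPP(num_classes=20)
feature_map = torch.randn(1, 512, 32, 64)  # (batch, 512, h/16, w/16)
out = aspp(feature_map)
print(out.shape)  # torch.Size([1, 20, 32, 64])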

DeepLabV3 Module

Overview

This module implements the main structure and workflow of the DeepLabV3 network:

  1. Define the DeepLabV3 network class and pass in the training-related parameters;
  2. Select a ResNet variant as the backbone to extract feature maps;
  3. Initialize the ASPP module to aggregate context over the feature maps;
  4. Define the forward pass;
  5. Create the directories used to save training state.
Code Implementation
import torch
import torch.nn as nn
import torch.nn.functional as F

import os

from resnet import ResNet18_OS16, ResNet34_OS16, ResNet50_OS16, ResNet101_OS16, ResNet152_OS16, ResNet18_OS8, ResNet34_OS8
from aspp import ASPP, ASPP_Bottleneck


class DeepLabV3(nn.Module):
    def __init__(self, model_id, project_dir):
        super(DeepLabV3, self).__init__()

        self.num_classes = 20  # 19 Cityscapes train classes + 1 void class (trainId 19)

        self.model_id = model_id
        self.project_dir = project_dir
        self.create_model_dirs()

        # pair ASPP with BasicBlock backbones (512-channel output) and
        # ASPP_Bottleneck with Bottleneck backbones (4*512 channels):
        self.resnet = ResNet18_OS8()
        self.aspp = ASPP(num_classes=self.num_classes)

    def forward(self, x):
        h = x.size()[2]
        w = x.size()[3]

        feature_map = self.resnet(x)

        output = self.aspp(feature_map)

        # upsample the per-pixel logits back to the input resolution:
        output = F.upsample(output, size=(h, w), mode="bilinear")

        return output

    def create_model_dirs(self):
        self.logs_dir = self.project_dir + "/training_logs"
        self.model_dir = self.logs_dir + "/model_%s" % self.model_id
        self.checkpoints_dir = self.model_dir + "/checkpoints"
        if not os.path.exists(self.logs_dir):
            os.makedirs(self.logs_dir)
        if not os.path.exists(self.model_dir):
            os.makedirs(self.model_dir)
            os.makedirs(self.checkpoints_dir)
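A usage sketch (dummy input; the model_id and project_dir values are illustrative):

import torch

model = DeepLabV3(model_id="test", project_dir="/root/deeplabv3")
x = torch.randn(1, 3, 512, 1024)
out = model(x)            # (1, 20, 512, 1024) per-pixel class logits
pred = out.argmax(dim=1)  # (1, 512, 1024) predicted trainId map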

Training

We trained for 400 epochs in total, split into three runs covering epochs 0-100, 101-200, and 201-400.
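Staged training like this relies on saving and restoring checkpoints between runs; a hypothetical sketch (the exact script and file names are not shown in this post):

import torch

# End of a run: save the weights into the model's checkpoint directory.
torch.save(model.state_dict(), model.checkpoints_dir + "/model_epoch_100.pth")

# Start of the next run: restore the weights and continue training.
model.load_state_dict(torch.load(model.checkpoints_dir + "/model_epoch_100.pth"))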

Note: the axis labels of the training-loss plots for the 101-200 and 201-400 epoch runs were output incorrectly and have been corrected manually.

Training loss on the training set

Training loss on the validation set

Training results:

Histogram of per-image mean pixel IoU on the test set

Mean pixel IoU: 0.954983232183881

Histogram of per-image pixel accuracy on the test set

Mean pixel accuracy: 0.86769234085083

Histogram of per-image mean prediction time on the test set

Mean prediction time: 0.0030388580219639257 s
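For reference, a sketch of how such per-image metrics can be computed from predicted and ground-truth trainId maps (an assumption; the evaluation script is not shown in this post):

import numpy as np

def pixel_accuracy(pred, gt):
    # fraction of pixels whose predicted trainId matches the ground truth
    return float((pred == gt).mean())

def mean_iou(pred, gt, num_classes=20):
    # mean IoU over classes present in either prediction or ground truth
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))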

Demo
