
Autonomous Driving Scene Detection Based on a VGG-Unet Model


1. VGG

VGG stands for Visual Geometry Group, a research group in the Department of Engineering Science at the University of Oxford. The group developed a family of convolutional neural networks named after it, which are widely used in tasks such as face recognition and image classification. By default, the network takes 224×224 RGB images as input. The only preprocessing is that the per-channel mean, computed over the training set, is subtracted from each image before it is fed into the convolutional network. The architecture uses only small 3×3 (and occasionally 1×1) filters, and every convolution follows the same fixed pattern. The network consists of a stack of convolutional layers followed by three fully connected (FC) layers; the available configurations range from the smallest, VGG11 (8 convolutional layers and 3 FC layers), to the largest, VGG19 (16 convolutional layers and 3 FC layers). Note that pooling does not follow every convolution: max-pooling layers are placed only after groups of convolutions, and the number of convolutions between pooling layers varies from stage to stage.
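As a minimal sketch of this mean-subtraction preprocessing step, the snippet below computes a per-channel mean over a hypothetical training array and subtracts it from every image. The array name, its random contents, and its shape are illustrative assumptions, not part of the article's code.

    import numpy as np

    # Hypothetical training set: 100 RGB images of 224x224 (channels last).
    train_images = np.random.rand(100, 224, 224, 3).astype('float32')

    # Per-channel mean over the whole training set (one value each for R, G, B).
    channel_mean = train_images.mean(axis=(0, 1, 2))

    # Subtract the mean from every image before feeding it to the network.
    train_images -= channel_mean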

[Figure: VGG architecture diagram]

2. U-Net

UNet是一种领先的语义分割技术,在性能上远超同类算法,并广泛应用于医学影像分析等领域。在基本工作流程上与传统的方法相似,在神经网络架构设计上具有显著差异性特点:相比于传统的卷积神经网络而言,在CNN中处理的是整个图像级别的分类任务,在UNet中则是在更高层次的空间粒度进行操作以实现更精细的目标识别功能
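To make that distinction concrete, the hedged sketch below contrasts the output shapes of an image-level classification head and a per-pixel segmentation head. The layer choices and the class count of 10 are illustrative assumptions, not taken from the article's model.

    from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
    from keras.models import Model

    inp = Input(shape=(224, 224, 3))
    feat = Conv2D(64, (3, 3), activation='relu', padding='same')(inp)

    # Classification CNN: one label for the whole image -> shape (None, 10).
    cls_out = Dense(10, activation='softmax')(GlobalAveragePooling2D()(feat))

    # Segmentation network: one label per pixel -> shape (None, 224, 224, 10).
    seg_out = Conv2D(10, (1, 1), activation='softmax')(feat)

    print(Model(inp, cls_out).output_shape)  # (None, 10)
    print(Model(inp, seg_out).output_shape)  # (None, 224, 224, 10)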

[Figure: U-Net architecture]

The main code is as follows:

    from keras import backend as K
    from keras.layers import (BatchNormalization, Conv2D, Input, MaxPooling2D,
                              UpSampling2D, ZeroPadding2D, concatenate)
    # get_segmentation_model comes from the keras_segmentation package
    # (image-segmentation-keras), which this code is adapted from.
    from keras_segmentation.models.model_utils import get_segmentation_model

    # Data format of the backend ('channels_last' under TensorFlow) and the
    # axis along which encoder/decoder feature maps are concatenated.
    channel = K.image_data_format()
    merge_axis = 1 if channel == 'channels_first' else -1


    def get_vgg_encoder(input_height=224, input_width=224, channels=3):
        """VGG16 convolutional encoder: returns the input tensor and the
        feature maps f1..f5 taken after each of the five pooling stages."""
        if channel == 'channels_first':
            img_input = Input(shape=(channels, input_height, input_width))
        else:
            img_input = Input(shape=(input_height, input_width, channels))

        # Block 1: two 64-filter convolutions, then 2x2 max-pooling.
        x = Conv2D(64, (3, 3), activation='relu', padding='same',
                   name='block1_conv1', data_format=channel)(img_input)
        x = Conv2D(64, (3, 3), activation='relu', padding='same',
                   name='block1_conv2', data_format=channel)(x)
        x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool',
                         data_format=channel)(x)
        f1 = x

        # Block 2: two 128-filter convolutions, then pooling.
        x = Conv2D(128, (3, 3), activation='relu', padding='same',
                   name='block2_conv1', data_format=channel)(x)
        x = Conv2D(128, (3, 3), activation='relu', padding='same',
                   name='block2_conv2', data_format=channel)(x)
        x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool',
                         data_format=channel)(x)
        f2 = x

        # Block 3: three 256-filter convolutions, then pooling.
        x = Conv2D(256, (3, 3), activation='relu', padding='same',
                   name='block3_conv1', data_format=channel)(x)
        x = Conv2D(256, (3, 3), activation='relu', padding='same',
                   name='block3_conv2', data_format=channel)(x)
        x = Conv2D(256, (3, 3), activation='relu', padding='same',
                   name='block3_conv3', data_format=channel)(x)
        x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool',
                         data_format=channel)(x)
        f3 = x

        # Block 4: three 512-filter convolutions, then pooling.
        x = Conv2D(512, (3, 3), activation='relu', padding='same',
                   name='block4_conv1', data_format=channel)(x)
        x = Conv2D(512, (3, 3), activation='relu', padding='same',
                   name='block4_conv2', data_format=channel)(x)
        x = Conv2D(512, (3, 3), activation='relu', padding='same',
                   name='block4_conv3', data_format=channel)(x)
        x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool',
                         data_format=channel)(x)
        f4 = x

        # Block 5: three more 512-filter convolutions, then pooling.
        x = Conv2D(512, (3, 3), activation='relu', padding='same',
                   name='block5_conv1', data_format=channel)(x)
        x = Conv2D(512, (3, 3), activation='relu', padding='same',
                   name='block5_conv2', data_format=channel)(x)
        x = Conv2D(512, (3, 3), activation='relu', padding='same',
                   name='block5_conv3', data_format=channel)(x)
        x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool',
                         data_format=channel)(x)
        f5 = x

        return img_input, [f1, f2, f3, f4, f5]


    def _unet(classes, encoder, l1_skip_conn=True, input_height=416,
              input_width=608, channels=3):
        """U-Net decoder on top of the given encoder: upsample, concatenate
        the matching encoder feature map (skip connection), convolve, repeat."""
        img_input, levels = encoder(
            input_height=input_height, input_width=input_width,
            channels=channels)
        [f1, f2, f3, f4, f5] = levels

        # Start decoding from f4 (1/16 of the input resolution).
        o = f4
        o = ZeroPadding2D((1, 1), data_format=channel)(o)
        o = Conv2D(512, (3, 3), padding='valid', activation='relu',
                   data_format=channel)(o)
        o = BatchNormalization()(o)

        # Upsample to 1/8 resolution and merge the f3 skip connection.
        o = UpSampling2D((2, 2), data_format=channel)(o)
        o = concatenate([o, f3], axis=merge_axis)
        o = ZeroPadding2D((1, 1), data_format=channel)(o)
        o = Conv2D(256, (3, 3), padding='valid', activation='relu',
                   data_format=channel)(o)
        o = BatchNormalization()(o)

        # Upsample to 1/4 resolution and merge the f2 skip connection.
        o = UpSampling2D((2, 2), data_format=channel)(o)
        o = concatenate([o, f2], axis=merge_axis)
        o = ZeroPadding2D((1, 1), data_format=channel)(o)
        o = Conv2D(128, (3, 3), padding='valid', activation='relu',
                   data_format=channel)(o)
        o = BatchNormalization()(o)

        # Upsample to 1/2 resolution; the f1 skip connection is optional.
        o = UpSampling2D((2, 2), data_format=channel)(o)
        if l1_skip_conn:
            o = concatenate([o, f1], axis=merge_axis)

        o = ZeroPadding2D((1, 1), data_format=channel)(o)
        o = Conv2D(64, (3, 3), padding='valid', activation='relu',
                   data_format=channel, name="seg_feats")(o)
        o = BatchNormalization()(o)

        # Per-pixel class scores; the softmax is applied by the model builder.
        o = Conv2D(classes, (3, 3), padding='same', data_format=channel)(o)

        model = get_segmentation_model(img_input, o)
        return model
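A minimal usage sketch, assuming the two functions above are in scope; the class count of 50 and the optimizer/loss choices are placeholders, not values from the article.

    # Build and compile a VGG-Unet for, e.g., 50 scene classes.
    model = _unet(classes=50, encoder=get_vgg_encoder,
                  input_height=416, input_width=608)

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.summary()

Two design details are worth noting. Each decoder stage uses ZeroPadding2D followed by a 'valid' 3×3 convolution, which produces the same result as a 'same'-padded convolution and simply mirrors the keras_segmentation implementation this code follows. And because decoding starts from f4 and upsamples only three times, the predicted mask comes out at half the input resolution (208×304 for a 416×608 input); get_segmentation_model handles the final reshape and softmax.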
