
Object Detection -- PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection


https://www.arxiv.org/abs/1608.08021

Demo code: https://github.com/sanghoon/pva-faster-rcnn

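This paper studies multi-category object detection and combines recent technical innovations to reach state-of-the-art accuracy while keeping the computational cost low.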

We obtained solid results on well-known object detection benchmarks: 81.8% mean average precision (mAP) on VOC2007 and 82.5% mAP on VOC2012 (2nd place). The system takes 750 ms per image on an Intel i7-6700K CPU using a single core, and 46 ms per image on an NVIDIA Titan X GPU. In theory, the network requires only 12.3% of the computational cost of ResNet-101, the winner on VOC2012.

The overall detection pipeline:
It follows the common three-stage design: CNN feature extraction, region proposal, and RoI classification.
We mainly redesign the feature extraction stage, because the region proposal stage is not computationally expensive.
The classification stage can be compressed efficiently with truncated singular value decomposition (SVD).
Our design principle is "fewer channels with more layers".
The network adopts several building blocks: concatenated ReLU (C.ReLU), Inception, and HyperNet.
Training relies on batch normalization layers, residual connections, and learning rate scheduling based on plateau detection.
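
The SVD compression of the classifier mentioned above can be illustrated with a minimal NumPy sketch. It is not taken from the PVANET code: the function name `truncated_svd_factorize`, the layer sizes, and the rank `t` are hypothetical, chosen only to show how one fully connected weight matrix can be replaced by two thinner ones.

```python
import numpy as np

def truncated_svd_factorize(W, t):
    """Approximate W (out x in) by two thinner factors built from its top-t singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    first = s[:t, None] * Vt[:t]   # (t x in): applied to the input first
    second = U[:, :t]              # (out x t): applied to the t-dimensional result
    return first, second

rng = np.random.default_rng(0)
# Hypothetical layer: 64 inputs, 32 outputs, constructed to have rank 8
W = rng.standard_normal((32, 8)) @ rng.standard_normal((8, 64))
first, second = truncated_svd_factorize(W, t=8)

x = rng.standard_normal(64)
y_full = W @ x                  # original layer
y_fact = second @ (first @ x)   # two smaller layers in sequence

params_full = W.size                     # 32 * 64 weights
params_fact = first.size + second.size   # 8 * 64 + 32 * 8 weights
```

Because the toy matrix has exact rank 8, the factorized layers reproduce the original output up to floating-point error. For a real classifier layer the truncation is lossy, and `t` trades accuracy against cost.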

2 Details of the network design
2.1 C.ReLU: Earlier building blocks in feature generation


C.ReLU (concatenated ReLU) is applied in the early convolutional layers. The convolution computes only half of the desired output channels, and the channel count is then doubled by concatenating the outputs with their negation. This gives roughly a 2x speed-up in the early stage without losing accuracy.
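
The concatenate-with-negation idea can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the PVANET implementation: the function name `crelu`, the NCHW layout, and the toy input are hypothetical, and the sketch leaves out any learned per-channel scale and shift that an implementation might apply after the concatenation.

```python
import numpy as np

def crelu(x, channel_axis=1):
    """Concatenate x with its negation along the channel axis, then apply ReLU."""
    stacked = np.concatenate([x, -x], axis=channel_axis)
    return np.maximum(stacked, 0.0)

# Hypothetical conv output: batch of 1, 2 channels, 2x2 spatial (NCHW layout)
x = np.array([[[[1.0, -2.0], [0.5, -0.5]],
               [[-3.0, 4.0], [0.0, 1.5]]]])
y = crelu(x)  # 4 channels: ReLU(x) followed by ReLU(-x)
```

The first half of the output keeps the positive responses and the second half keeps the negative ones, so no information from the convolution is discarded even though only half of the channels were computed.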

2.2 Inception: Remaining building blocks in feature generation


Inception can capture both small and large objects. It does this by combining convolution kernels of different sizes, which gives the output features a range of receptive field sizes.
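
To make the link between kernel size and receptive field concrete, here is a small sketch using the standard receptive-field recurrence for stacked convolutions. The branch configurations are illustrative assumptions, not the exact PVANET layer specification, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions given (kernel_size, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # stride enlarges the step seen by later layers
    return rf

# Hypothetical Inception-style branches, all with stride 1
branches = {
    "1x1": [(1, 1)],
    "3x3": [(3, 1)],
    "3x3 + 3x3": [(3, 1), (3, 1)],
}
sizes = {name: receptive_field(layers) for name, layers in branches.items()}
```

Concatenating such branches lets one module respond to patterns at several spatial extents at once: the 1x1 branch preserves the receptive field of the previous layer, while the stacked 3x3 branch covers the same area as a single 5x5 kernel.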

2.3 HyperNet: Concatenation of multi-scale intermediate outputs

The main idea is to combine convolutional feature maps from different scales, which enables multi-scale object detection.


2.4 Deep network training

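Residual connections are added between the Inception modules. Batch normalization layers are placed before all ReLU activation layers. The learning rate is adjusted dynamically based on plateau detection.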
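
A plateau-based schedule can be sketched as follows. This is a simplified illustration, not the paper's exact rule: the class name `PlateauScheduler`, the parameters `factor`, `patience`, and `min_delta`, and the loss values are all hypothetical.

```python
class PlateauScheduler:
    """Multiply the learning rate by `factor` when the loss stops improving."""

    def __init__(self, lr, factor=0.1, patience=3, min_delta=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience    # checks without improvement before a drop
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
            if self.bad_checks >= self.patience:
                self.lr *= self.factor
                self.bad_checks = 0
        return self.lr

sched = PlateauScheduler(lr=0.1, factor=0.1, patience=2)
losses = [1.0, 0.8, 0.8, 0.8, 0.7]  # hypothetical validation losses
history = [sched.step(loss) for loss in losses]
```

In this run the loss stalls at 0.8 for two consecutive checks, so the learning rate drops by a factor of ten on the fourth step and stays there once the loss improves again.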

Our feature extraction network is plugged into Faster R-CNN. Three intermediate outputs, conv3/4 (with down-scaling), conv4/4, and conv5/4 (with up-scaling), are combined into 512-channel multi-scale output features, which serve as the input to the Faster R-CNN model.
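
The multi-scale combination described here can be sketched in NumPy. Everything beyond the three layer names and the 512 output channels is an assumption made for illustration: the per-layer channel counts, the toy spatial sizes, 2x2 max pooling for down-scaling, nearest-neighbour up-scaling, and a random 1x1 convolution to reach 512 channels. The variables `conv3_4`, `conv4_4`, and `conv5_4` stand in for conv3/4, conv4/4, and conv5/4.

```python
import numpy as np

def max_pool_2x(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """Nearest-neighbour 2x up-scaling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
# Hypothetical feature maps; each deeper layer has half the spatial resolution
conv3_4 = rng.standard_normal((128, 8, 8))
conv4_4 = rng.standard_normal((256, 4, 4))
conv5_4 = rng.standard_normal((384, 2, 2))

# Bring all three maps to the conv4_4 resolution and stack them along channels
fused = np.concatenate(
    [max_pool_2x(conv3_4), conv4_4, upsample_2x(conv5_4)], axis=0
)

# A 1x1 convolution is a per-pixel linear map over channels (assumed here)
W = rng.standard_normal((512, fused.shape[0]))
features = np.einsum("oc,chw->ohw", W, fused)
```

Aligning everything to the middle resolution keeps fine detail from the shallower layer and context from the deeper one in a single feature map.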

4 Experimental results

[Figures: experimental results; images not preserved]
