
Paper Notes | Rich feature hierarchies for accurate object detection and semantic segmentation


Authors

Ross Girshick / Jeff Donahue / Trevor Darrell / Jitendra Malik

Abstract

R-CNN: Regions with CNN features. It combines two key insights:
1. apply high-capacity CNNs to bottom-up region proposals in order to localize and segment objects
2. supervised pre-training on an auxiliary task, followed by domain-specific fine-tuning, when labeled training data is scarce

1 Introduction

1.1 from HOG to CNNs

The last decade of progress on visual recognition tasks was based on SIFT and HOG, which we can loosely associate with complex cells in V1, but performance eventually plateaued; recognition likely requires multi-stage processes for computing features.
Fukushima's neocognitron lacked a supervised training algorithm.
LeCun et al. showed that SGD with backpropagation is effective for training CNNs.
CNNs saw heavy use in the 1990s but fell out of fashion with the rise of the SVM; Krizhevsky et al. rekindled interest in CNNs at ILSVRC 2012 (rectifying non-linearities and "dropout" regularization).
This paper is the first to show that a CNN can lead to dramatically higher object detection performance on PASCAL VOC than systems based on simpler HOG-like features.

1.2 two problems

This paper focuses on two problems: localizing objects with a deep network, and training a high-capacity model with only a small quantity of annotated detection data.
1. localization methods
- as a regression problem
- sliding window (loses precision; used by OverFeat)
- this paper: "recognition using regions"
2. fine-tuning: supervised pre-training on a large auxiliary dataset (ILSVRC), followed by domain-specific fine-tuning on the small target dataset (PASCAL)

1.3 efficiency

1.4 dominant error mode

A simple bounding-box regression method significantly reduces mislocalizations, the dominant error mode.

2 Object detection with R-CNN

This system consists of three modules:
1. generate category-independent region proposals
2. extract a fixed-length feature vector from each region with a CNN
3. score each region with a set of class-specific linear SVMs

2.1 module design

2.1.1 region proposals

Some examples: (1) objectness, (2) selective search, (3) category-independent object proposals, (4) constrained parametric min-cuts (CPMC), (5) multi-scale combinatorial grouping, (6) Cireşan et al., ...
We use selective search to enable a controlled comparison with prior detection work.

    J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. IJCV, 2013.

2.1.2 Feature extraction

Regardless of the candidate region's size or aspect ratio, warp all pixels in a tight bounding box around it to the required size (227x227). Prior to warping, dilate the box by p = 16 pixels of surrounding context.
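The dilate-then-warp step can be sketched with plain NumPy. This is a simplification: nearest-neighbor resampling stands in for proper image interpolation, and `warp_proposal`, its argument names, and the zero-fill for out-of-image context are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def warp_proposal(image, box, out_size=227, pad=16):
    """Warp a region proposal to a fixed size with context padding.

    `image` is an HxWxC uint8 array; `box` is (x1, y1, x2, y2).
    The box is first dilated by `pad` pixels on each side, then the
    crop is resized to out_size x out_size with nearest-neighbor
    sampling. Pixels falling outside the image are filled with zeros.
    """
    x1, y1, x2, y2 = box
    x1, y1, x2, y2 = x1 - pad, y1 - pad, x2 + pad, y2 + pad
    h, w = image.shape[:2]
    # Sample a regular grid over the (possibly out-of-bounds) dilated box.
    ys = np.linspace(y1, y2 - 1, out_size).round().astype(int)
    xs = np.linspace(x1, x2 - 1, out_size).round().astype(int)
    out = np.zeros((out_size, out_size, image.shape[2]), dtype=image.dtype)
    valid_y = (ys >= 0) & (ys < h)
    valid_x = (xs >= 0) & (xs < w)
    yy = ys[valid_y][:, None]
    xx = xs[valid_x][None, :]
    out[np.ix_(valid_y, valid_x)] = image[yy, xx]
    return out
```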

2.2 test-time detection

This paper runs selective search on the test image to extract around 2000 region proposals (fast mode). Given all scored regions (SVM scores) in an image, a greedy non-maximum suppression is applied, for each class independently, that rejects a region if it has an intersection-over-union (**IoU**) overlap, larger than a learned threshold, with a higher-scoring selected region.
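As a concrete sketch of per-class greedy NMS (the function names are my own, not from the released R-CNN code):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(np.asarray(box)) + area(boxes) - inter)

def nms(boxes, scores, thresh=0.3):
    """Greedy NMS for one class: keep the highest-scoring box, drop any
    remaining box whose IoU with it exceeds `thresh`, then repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep
```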

2.2.1 run-time analysis

  1. the CNN shares parameters across all categories (which also makes scaling the number of categories easy)
  2. the feature vectors computed by the CNN are low-dimensional

13 s/image on a GPU, 53 s/image on a CPU.
The feature matrix is 2000x4096 and the SVM weight matrix is 4096xN, where N is the number of classes.
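The per-image, class-specific scoring cost therefore amounts to a single matrix product; a sketch with random stand-in features and weights (the 20-class count assumes PASCAL VOC):

```python
import numpy as np

rng = np.random.default_rng(0)
num_proposals, feat_dim, num_classes = 2000, 4096, 20  # PASCAL VOC has 20 classes

features = rng.standard_normal((num_proposals, feat_dim)).astype(np.float32)
svm_weights = rng.standard_normal((feat_dim, num_classes)).astype(np.float32)
svm_bias = rng.standard_normal(num_classes).astype(np.float32)

# One matrix product scores every proposal for every class at once.
scores = features @ svm_weights + svm_bias
```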

2.3 training

2.3.1 supervised pre-training

Pre-training uses the Caffe CNN library (the resulting network nearly matches the performance of Krizhevsky et al.).

2.3.2 domain-specific fine-tuning

Replace the 1000-way classification layer with a randomly initialized (N+1)-way classification layer, where N is the number of object classes (plus 1 for background).
We treat all region proposals with >= 0.5 IoU overlap with a ground-truth box as positives.
SGD starts at a learning rate of 0.001 (1/10th of the initial pre-training rate), which allows fine-tuning to make progress without clobbering the initialization.
In each SGD iteration, we uniformly sample 32 positive windows and 96 background windows to construct a mini-batch of size 128.
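The 32-positive / 96-background composition can be sketched as follows (`sample_minibatch` and sampling with replacement are my assumptions; the paper specifies only the batch composition):

```python
import random

def sample_minibatch(positives, negatives, batch_size=128, num_pos=32):
    """Build one fine-tuning mini-batch: 32 windows with IoU >= 0.5
    against a ground-truth box ('positives') and 96 background windows,
    sampled uniformly. Positives are sampled with replacement because
    they are rare relative to background windows."""
    pos = random.choices(positives, k=num_pos)
    neg = random.choices(negatives, k=batch_size - num_pos)
    batch = pos + neg
    random.shuffle(batch)
    return batch
```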

2.3.3 Object category classifiers

Labeling a background region is easy, but how should a region that partially overlaps a car be labeled? This issue is resolved with an IoU overlap threshold of 0.3 (chosen by grid search over {0, 0.1, ..., 0.5}; note the overlap threshold is defined differently during fine-tuning): regions below the threshold are defined as negatives.
Since the training data is too large to fit in memory, we adopt the standard hard negative mining method.
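One round of hard negative selection can be sketched as picking the negatives that violate the SVM margin, hardest first (`mine_hard_negatives` and its `margin`/`max_new` parameters are illustrative, not the exact released implementation):

```python
import numpy as np

def mine_hard_negatives(scores, margin=-1.0, max_new=500):
    """Return indices of negative examples that violate the SVM margin
    (a negative is 'hard' when its score exceeds -1), sorted hardest
    first and capped at max_new."""
    hard = np.flatnonzero(scores > margin)
    return hard[np.argsort(scores[hard])[::-1]][:max_new]
```

Each mining round scores every negative window with the current SVM, adds the returned indices to a cache, and retrains; only the cache, not the full negative set, must fit in memory.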

2.4 Results on PASCAL VOC 2010-2012

VOC 2010: 53.7% mAP; VOC 2011/12: 53.3% mAP.
UVA system (same region proposal algorithm + four-level spatial pyramid SIFT + nonlinear kernel SVM): 35.1% mAP.

2.5 Results on ILSVRC2013

R-CNN: 31.4% mAP
OverFeat: 24.3% mAP

3 Visualization, ablation, and modes of error

3.1 Visualizing learned features

The idea is to single out a particular unit (feature) in the network and use it as if it were an object detector in its own right.
The figure (omitted here) shows the top-scoring regions for six pool5 units; each pool5 unit has a receptive field of 195x195 pixels.

3.2 Ablation studies

3.2.1 Performance layer-by-layer, without fine-tuning.

  1. Features from fc7 generalize worse than features from fc6; this means 29% of the parameters can be removed without degrading mAP.
  2. pool5 features are computed using only 6% of the CNN's parameters, yet they produce quite good results.
  3. (1) + (2) → much of the CNN's representational power comes from its convolutional layers.
  4. (3) → this finding suggests potential utility in computing a dense feature map (HOG-like) using only the convolutional layers of the CNN. Such a representation would enable experimentation with sliding-window detectors on top of pool5 features.

3.2.2 Performance layer-by-layer, with fine-tuning.

  1. The boost from fine-tuning is much larger for fc6 and fc7 than for pool5.
  2. (1) → suggests that the pool5 features learned from ImageNet are general, and that most of the improvement comes from learning domain-specific non-linear classifiers on top of them.

3.2.3 comparison to recent feature learning methods

See Table 2, rows 8-10.

3.3 Network architectures

We found that the choice of architecture has a large effect on R-CNN detection performance.
1. O-Net (16-layer OxfordNet/VGG: 13 layers of 3x3 convolutions and 5 pooling layers) outperforms T-Net (TorontoNet).
2. A considerable drawback: O-Net's forward pass takes 7 times longer than T-Net's.

3.4 Detection error analysis

CNN features are much more discriminative than HOG. Loose localization likely results from our use of bottom-up region proposals and from the positional invariance learned when pre-training the CNN for whole-image classification.

    D. Hoiem, Y. Chodpathumwan, and Q. Dai. Diagnosing error in object detectors. In ECCV, 2012.

3.5 Bounding-box regression

This simple approach fixes a large number of mislocalized detections, boosting mAP by 3-4 points.

    P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.

4 The ILSVRC2013 detection dataset

4.1 dataset overview

The dataset is split into three sets: train (395,918 images), val (20,121), and test (40,152). Unlike the val and test sets, the train images are not exhaustively annotated.
Our general strategy is to rely heavily on the val set and use some of the train images as an auxiliary source of positive examples. To use val for both training and validation, it is split into roughly equally sized sets 'val1' and 'val2'.
Because some classes are rare, it is important to produce an approximately class-balanced partition: many candidate splits are generated, and the one with the smallest maximum relative class imbalance is selected.
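The paper produces the split with randomized local search; a simpler stand-in that captures the idea (generate candidate halves, keep the one minimizing the maximum relative class imbalance) might look like:

```python
import random

def relative_imbalance(counts1, counts2):
    """Max over classes of |n1 - n2| / (n1 + n2)."""
    worst = 0.0
    for c in set(counts1) | set(counts2):
        n1, n2 = counts1.get(c, 0), counts2.get(c, 0)
        if n1 + n2:
            worst = max(worst, abs(n1 - n2) / (n1 + n2))
    return worst

def balanced_split(image_classes, trials=100, seed=0):
    """image_classes maps image id -> list of class labels present.
    Try `trials` random half/half splits and keep the one with the
    smallest maximum relative class imbalance."""
    rng = random.Random(seed)
    images = list(image_classes)
    best, best_score = None, float("inf")
    for _ in range(trials):
        rng.shuffle(images)
        half = len(images) // 2
        s1, s2 = images[:half], images[half:]

        def counts(split):
            c = {}
            for img in split:
                for cls in image_classes[img]:
                    c[cls] = c.get(cls, 0) + 1
            return c

        score = relative_imbalance(counts(s1), counts(s2))
        if score < best_score:
            best, best_score = (s1, s2), score
    return best
```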

4.2 Region proposals

On val, selective search (fast mode) resulted in an average of 2403 region proposals per image, with 91.6% recall of all ground-truth bounding boxes (at 0.5 IoU); on PASCAL the corresponding recall is about 98%.

4.3 Training data

training data = val1 + N ground-truth boxes from train → "val1+trainN" (N ∈ {0, 500, 1000})
Training data is required for three procedures in R-CNN:
1. CNN fine-tuning (val1+trainN)
2. detector SVM training (val1+trainN)
3. bounding-box regressor training (val1 only)

4.4 Validation and evaluation

Data-usage choices and the effect of fine-tuning and bounding-box regression were validated on val2 (with the same hyperparameters as on PASCAL).

4.5 Ablation study

  1. no fine-tuning + val1 → 20.9% mAP
  2. no fine-tuning + val1+trainN → 24.1% (N = 500 vs. N = 1000 makes no difference)
  3. fine-tuning + val1 → 26.5% (overfitting due to the small number of positive examples)
  4. fine-tuning + val1+train1k → 29.7%
  5. + bounding-box regression → 31.0%

4.6 Relationship to OverFeat

OverFeat can be seen roughly as a special case of R-CNN:
- selective search **vs.** a multi-scale pyramid of regular square regions
- per-class bounding-box regressors **vs.** a single bounding-box regressor
A notable advantage: OverFeat is about 9x faster than R-CNN.

5 Semantic segmentation

Like O2P, this approach uses CPMC regions, warped to 227x227.
'full' strategy => ignores the region's shape and computes CNN features directly on the warped bounding box
'fg' strategy => computes features only on a region's foreground mask (background pixels are replaced with the image mean, so they are zero after mean subtraction)
Training full+fg takes about 1 hour, vs. 10+ hours for O2P.
P.S. for semantic segmentation, see also "Learning Deconvolution Network for Semantic Segmentation".
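The 'fg' masking trick can be sketched in NumPy (`mask_foreground` is an illustrative name; the point is that mean-replaced background becomes exactly zero after the CNN's mean subtraction):

```python
import numpy as np

def mask_foreground(region, fg_mask, mean_image):
    """'fg' strategy: keep foreground pixels, replace background pixels
    with the dataset mean image so that, after the CNN subtracts the
    mean, the background contributes exactly zero."""
    region = region.astype(np.float32)
    mask = fg_mask[..., None].astype(bool)  # broadcast over channels
    return np.where(mask, region, mean_image)
```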

6 Appendix

6.1 Object proposal transformations

In the figure (omitted here), the top row corresponds to p = 0 pixels of context padding, while the bottom row has p = 16 pixels (3-5 mAP points higher than p = 0).

6.2 Positive vs. negative examples, and softmax

6.2.1 Why IoU 0.5 for fine-tuning but 0.3 for the SVMs?

Hypothesis: the difference in how positives are defined is not fundamentally important; it arises because (1) fine-tuning data is limited, so the looser 0.5 definition yields many more ("jittered") positive examples, and (2) fine-tuning does not emphasize precise localization.

6.2.2 why not softmax?

Swapping the SVMs for the fine-tuned network's softmax layer drops mAP, likely because the softmax classifier was trained on randomly sampled negative examples rather than on the subset of "hard negatives" used for SVM training, and because the fine-tuning positives do not emphasize precise localization.
Still, the small size of the gap suggests it is possible to obtain close to the same level of performance without training SVMs after fine-tuning.

6.3 Bounding-box regression

(P_x, P_y, P_w, P_h) specifies the pixel coordinates of the center of proposal P's bounding box together with P's width and height in pixels; (G_x, G_y, G_w, G_h) is defined analogously for the ground-truth box. The regression targets are:

t_x = (G_x - P_x) / P_w
t_y = (G_y - P_y) / P_h
t_w = log(G_w / P_w)
t_h = log(G_h / P_h)

The predicted box is recovered by inverting this transform; the weights are learned by regularized (ridge) least squares on pool5 features.
Two issues:
1. regularization is important (λ = 1000, set on a validation set)
2. each training proposal P is assigned to the ground truth G it is nearest to (maximum IoU)
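The target transform and its inverse, in center-size coordinates (the ridge-regression fit of pool5 features to these targets is omitted; function names are mine):

```python
import numpy as np

def bbox_targets(P, G):
    """Regression targets t = (t_x, t_y, t_w, t_h) from proposal P to
    ground truth G, both given as (x, y, w, h) with (x, y) the center."""
    Px, Py, Pw, Ph = P
    Gx, Gy, Gw, Gh = G
    return np.array([(Gx - Px) / Pw, (Gy - Py) / Ph,
                     np.log(Gw / Pw), np.log(Gh / Ph)])

def apply_bbox_deltas(P, d):
    """Invert the transform: predict a box from proposal P and deltas d."""
    Px, Py, Pw, Ph = P
    dx, dy, dw, dh = d
    return np.array([Pw * dx + Px, Ph * dy + Py,
                     Pw * np.exp(dw), Ph * np.exp(dh)])
```

Applying `apply_bbox_deltas` to a proposal and the targets computed against its ground truth recovers that ground-truth box exactly, which is a handy sanity check for any implementation.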

6.4 Analysis of cross-dataset redundancy

